Tuesday, March 17, 2026

Watch Out ChatGPT—Alibaba’s New Qwen AI Talks, Listens, and Even Watches

Alibaba Cloud’s end-to-end multimodal AI model Qwen2.5-Omni-7B / Photo courtesy of Alibaba

Alibaba Cloud, a subsidiary of Alibaba Group, unveiled its groundbreaking end-to-end multimodal artificial intelligence (AI) model, Qwen2.5-Omni-7B, on Monday.

This cutting-edge model can process diverse input data, including text, images, voice, and video, while delivering real-time text and voice responses.

With 7 billion parameters, Qwen2.5-Omni-7B is a lightweight model that enables cost-effective AI agents and is well suited to intelligent voice applications. Its versatility allows for applications across various domains, from providing real-time audio descriptions for the visually impaired to offering cooking guidance and enhancing customer service systems.

An Alibaba Cloud spokesperson said the model delivers high performance at reduced cost thanks to its architecture. Key technologies include the “Thinker-Talker architecture,” which minimizes interference by separating text generation from speech synthesis, and “TMRoPE” (Time-aligned Multimodal RoPE), a position embedding technique that strengthens synchronization between video and audio.
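
To give a rough sense of what "time-aligned" positions could mean, the short Python sketch below places video frames and audio chunks on one shared clock, so that inputs occurring at about the same moment receive the same temporal position. It is an illustration only and not Alibaba's implementation: the 40 ms tick and the names `TICK_MS`, `temporal_position`, and `align` are assumptions made for this example.

```python
# Illustrative sketch only; not the actual TMRoPE implementation.
TICK_MS = 40  # assumed length of one temporal position, in milliseconds


def temporal_position(timestamp_ms: int, tick_ms: int = TICK_MS) -> int:
    """Map a timestamp to an integer position on a shared clock."""
    return timestamp_ms // tick_ms


def align(video_ts_ms, audio_ts_ms, tick_ms: int = TICK_MS):
    """Tag each video frame and audio chunk with a temporal position,
    then merge both streams in time order."""
    tokens = [("video", t, temporal_position(t, tick_ms)) for t in video_ts_ms]
    tokens += [("audio", t, temporal_position(t, tick_ms)) for t in audio_ts_ms]
    # Sort by temporal position first, then by modality name for a stable order.
    return sorted(tokens, key=lambda tok: (tok[2], tok[0]))


if __name__ == "__main__":
    # Hypothetical timestamps: two video frames and three audio chunks.
    for modality, ts, pos in align([0, 500], [0, 40, 480]):
        print(f"{modality:5s} t={ts:4d} ms -> position {pos}")
```

In this toy example, an audio chunk at 480 ms and a video frame at 500 ms land on the same position, which is the kind of alignment a time-aware embedding is meant to expose to the model.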

Trained on an extensive pre-training dataset, Qwen2.5-Omni-7B excels in various tasks, such as image-to-text, video-to-text, video-to-speech, and speech-to-text conversions. The model has notably achieved top-tier performance on OmniBench, a benchmark that assesses the integrated processing of visual, auditory, and textual information.

Alibaba Cloud has made the model open-source through popular platforms like Hugging Face and GitHub. It can also be accessed via ModelScope, Alibaba Cloud’s dedicated open-source community.

Alibaba Cloud has open-sourced more than 200 generative AI models in recent years.

Following the initial launch of Qwen2.5 in September 2024, Alibaba Cloud expanded its AI portfolio with Qwen2.5-Max in January 2025. The company has also introduced specialized models such as Qwen2.5-VL, for enhanced visual understanding, and Qwen2.5-1M, for processing extended inputs.
