
Alibaba Cloud, a subsidiary of Alibaba Group, unveiled Qwen2.5-Omni-7B, its end-to-end multimodal artificial intelligence (AI) model, on Monday.
The model can process diverse inputs, including text, images, audio, and video, while delivering real-time text and speech responses.
With 7 billion parameters, Qwen2.5-Omni-7B is a lightweight model that enables cost-effective AI agents and is well suited to building intelligent voice applications. Its versatility supports use cases across various domains, from providing real-time audio descriptions for the visually impaired to offering cooking guidance and enhancing customer service systems.
An Alibaba Cloud spokesperson said the model's innovative architecture allows it to deliver high performance at reduced cost. Key technologies include the “Thinker-Talker” architecture, which minimizes interference by separating text generation from speech synthesis, and “TMRoPE” (Time-aligned Multimodal RoPE), a position-embedding technique that strengthens synchronization between video and audio.
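The full TMRoPE design is specified in the model's open-source release; the toy Python sketch below only illustrates the underlying idea of time-aligned positions, where tokens from different modalities that occur at the same moment receive matching temporal indices. The 40 ms grid is a hypothetical resolution chosen for the example, not the model's actual setting.

```python
# Toy illustration of time-aligned multimodal positions. This is NOT the
# actual TMRoPE implementation; it only shows the core idea of quantizing
# timestamps onto a shared grid so that co-occurring audio and video
# tokens share a temporal position index.

def time_aligned_positions(timestamps_s, grid_s=0.04):
    """Map per-token timestamps (in seconds) to integer temporal positions
    on a shared grid (hypothetical 40 ms resolution)."""
    return [round(t / grid_s) for t in timestamps_s]

# Video frames every 0.5 s and audio tokens every 0.25 s: tokens that
# occur at the same moment (0.0 s, 0.5 s, 1.0 s) get identical indices.
video_ts = [0.0, 0.5, 1.0]
audio_ts = [0.0, 0.25, 0.5, 0.75, 1.0]
print(time_aligned_positions(video_ts))  # [0, 12, 25]
print(time_aligned_positions(audio_ts))  # [0, 6, 12, 19, 25]
```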
Leveraging extensive pre-training data, Qwen2.5-Omni-7B performs strongly on tasks such as image-to-text, video-to-text, video-to-speech, and speech-to-text conversion. Notably, the model has achieved top-tier performance on OmniBench, a benchmark that assesses the integrated processing of visual, auditory, and textual information.
Alibaba Cloud has open-sourced the model on popular platforms such as Hugging Face and GitHub. It is also available on ModelScope, Alibaba Cloud's own open-source model community.
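For readers who want to try it, a minimal sketch of fetching the weights from Hugging Face might look like the following; it assumes the repository id Qwen/Qwen2.5-Omni-7B (verify on the model card) and uses the standard huggingface_hub client.

```python
# Minimal sketch: download the released weights from the Hugging Face Hub.
# The repository id below matches the announced model name, but confirm it
# on the model card before relying on it.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("Qwen/Qwen2.5-Omni-7B")
print(f"Model files downloaded to: {local_dir}")
```

Actually running inference requires the model-specific classes documented on the model card; the snapshot above only mirrors the repository files locally.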
Alibaba Cloud has open-sourced more than 200 generative AI models in recent years.
Following the initial launch of Qwen2.5 in September 2024, Alibaba Cloud expanded its AI portfolio with Qwen2.5-Max in January 2025. The company has also introduced specialized models such as Qwen2.5-VL for enhanced visual understanding and Qwen2.5-1M for processing extended inputs.