
Competition between the United States and China for supremacy in generative artificial intelligence (AI) technology is expanding beyond the AI chatbot market into the AI agent market.
AI agents are intelligent systems capable of autonomous problem-solving beyond the abilities of conventional chatbots.
U.S. companies such as OpenAI (backed by Microsoft), Google, and Anthropic have taken the lead in this field. However, ByteDance, TikTok’s parent company, is quickly gaining ground with its new AI model, setting the stage for an intense showdown.

Industry sources reported on Tuesday that ByteDance unveiled its AI agent, UI-TARS, on January 22. The system can solve problems autonomously by perceiving and reasoning about graphical user interfaces (GUIs).
Unlike many rival models, UI-TARS reportedly works in both web browsers and mobile app environments.

ByteDance claimed that UI-TARS outperformed competitors such as GPT-4o and Claude 3.5 Sonnet on VisualWebBench, a benchmark that evaluates how well multimodal AI models understand web pages.
The company also claims that its “Doubao 1.5 Pro” model, released the same day, is more cost-effective than GPT-4o in coding, reasoning, and Chinese-language processing. Doubao is a popular Chinese chatbot with 60 million monthly active users (MAU).

OpenAI introduced its web browser-based AI agent, “Operator,” on January 23, just a day after ByteDance’s announcement.
Observers interpret this quick turnaround as a sign that UI-TARS influenced OpenAI’s timeline, much as the company made o3-mini available free of charge and launched its Deep Research agent soon after DeepSeek’s breakthrough.
Operator is powered by CUA (Computer-Using Agent), a model built on GPT-4o’s vision capabilities, which identifies elements in a web browser and executes commands step by step.
This allows the AI to operate a computer by interpreting the images displayed on its screen.

Both companies claim their AI agents can autonomously search for, recommend, and book travel itineraries. These systems can handle requests for flight bookings, hotel reservations, and Uber rides through voice or text commands. If problems arise, they attempt to resolve them on their own before asking the user to intervene.
While traditional chatbots based on large language models (LLMs) mainly produce text, these AI agents are designed to take actions on a user’s behalf, and they could become essential partners in daily life and work.
An industry insider noted that tech companies are investing heavily in AI technology to secure leadership and market dominance. The insider added that the DeepSeek breakthrough will likely intensify the U.S.-China rivalry in the AI agent market.