Jianghai Securities: Volcano Engine releases multiple Doubao models; firm remains optimistic about investment opportunities in AI applications

Zhitongcaijing · 04/18 09:33

The Zhitong Finance App learned that on April 17, 2025, Jianghai Securities released a research report saying that, on the same day, Volcano Engine released the Doubao 1.5 · Deep Thinking Model and upgraded the Doubao · Text-to-Image Model 3.0 and the Doubao · Visual Understanding Model. At the same time, for agent services it released an OS Agent solution and a large GUI Agent model, the Doubao 1.5 · UI-TARS model; for large-scale inference it released the AI Cloud Native · Serving Kit inference suite. The firm remains optimistic about investment opportunities in AI applications and suggests focusing on Hand Enterprise Solutions (300170.SZ), Dark Horse Technology (300688.SZ), and Intsig Information (688615.SH).

Jianghai Securities' main views are as follows:

Average daily token usage of the Doubao large model continues to rise sharply, benefiting the data elements and computing power sectors

By the end of March 2025, the Doubao large model's average daily token usage exceeded 12.7 trillion tokens, three times the level of December 2024 and 106 times the level at its initial release a year earlier. According to an IDC report, usage of large models on public clouds in China surged in 2024, and Volcano Engine ranked first in the Chinese market with a 46.4% share. The firm believes the continued rise in Doubao token usage benefits the data elements and computing power sectors.
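As a quick sanity check on the multiples quoted above, the earlier volumes they imply can be worked out directly. This is a minimal sketch: the helper name `implied_baseline` is illustrative, and because the report says "more than 12.7 trillion", the results should be read as rough lower bounds rather than reported figures.

```python
# Figures quoted in the report (tokens per day, end of March 2025).
DAILY_TOKENS_MAR_2025 = 12.7e12
MULTIPLE_VS_DEC_2024 = 3    # "three times" the December 2024 level
MULTIPLE_VS_LAUNCH = 106    # "106 times" the level at initial release


def implied_baseline(current: float, multiple: float) -> float:
    """Return the earlier volume implied by 'current = multiple x earlier'."""
    return current / multiple


dec_2024 = implied_baseline(DAILY_TOKENS_MAR_2025, MULTIPLE_VS_DEC_2024)
launch = implied_baseline(DAILY_TOKENS_MAR_2025, MULTIPLE_VS_LAUNCH)

print(f"Implied December 2024 volume: {dec_2024 / 1e12:.2f} trillion tokens/day")
print(f"Implied launch-time volume:   {launch / 1e12:.3f} trillion tokens/day")
```

Under these figures, the implied December 2024 volume is a little over 4 trillion tokens per day, and the implied launch-time volume is roughly 0.12 trillion tokens per day.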

Doubao 1.5 · Deep Thinking Model newly released, using an MoE architecture and a dual-track reward mechanism

The Doubao 1.5 · Deep Thinking Model was newly released. It excels at reasoning tasks in specialized fields such as mathematics, coding, and science, reaching or approaching the global top tier; on non-reasoning tasks such as creative writing, it also generalizes well and can handle a wider range of complex scenarios. To improve general capabilities, the model team optimized its data processing strategy, combining verifiable data with creative data to meet the needs of different tasks. Large-scale reinforcement learning is a key technique for training reasoning models. An innovative dual-track reward mechanism balances tasks that have clear right-or-wrong answers against tasks where reasonable judgments differ, which makes algorithm optimization more effective. The model uses a Mixture-of-Experts (MoE) architecture with 200B total parameters, of which only 20B are activated, giving it a significant cost advantage in both training and inference. Building on efficient algorithms, the model offers the industry's highest concurrency capacity and achieves latency as low as 20 ms. To solve specific problems, large models need to search the Internet for information and carry out multiple rounds of searching and thinking. Unlike other reasoning models, which "search first, then think", the Doubao App has been given targeted training on top of the Doubao 1.5 · Deep Thinking Model so that it can "search while thinking"; in addition, the model has visual understanding capabilities and can reason about the images it sees. The firm believes the model's key innovations are its MoE architecture (200B total parameters, only 20B activated) and its dual-track reward mechanism.
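The cost advantage attributed to the MoE design follows from the gap between total and activated parameters. The sketch below works through that arithmetic using only the 200B and 20B figures from the report. The "about 2 FLOPs per active parameter per token" figure is a common rule of thumb for a forward pass, not something stated in the report, and the dense 200B comparison model is purely hypothetical.

```python
TOTAL_PARAMS = 200e9    # total parameters quoted in the report
ACTIVE_PARAMS = 20e9    # parameters activated per token, per the report


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active / total


def forward_flops_per_token(active_params: float) -> float:
    """Rough forward-pass cost: ~2 FLOPs per active parameter (rule of thumb)."""
    return 2 * active_params


frac = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
moe_flops = forward_flops_per_token(ACTIVE_PARAMS)
# Hypothetical dense model with the same total size, for comparison only.
dense_flops = forward_flops_per_token(TOTAL_PARAMS)

print(f"Active fraction: {frac:.0%}")
print(f"Approx. compute vs. a dense 200B model: {moe_flops / dense_flops:.0%}")
```

Under these assumptions, each token touches about a tenth of the weights, so per-token compute is roughly a tenth of what an equally large dense model would need, although all 200B parameters still have to be held in memory.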

Doubao · Text-to-Image Model 3.0 upgraded, with better text layout and image generation quality

The newly upgraded Doubao · Text-to-Image Model 3.0 delivers better text layout, photorealistic image generation, and 2K high-definition image output. It can be widely used in marketing, e-commerce, and design scenarios such as film and television, posters, painting, and doll design. On the Artificial Analysis arena, a recent authoritative leaderboard in the text-to-image field, the Doubao Text-to-Image 3.0 model has surpassed many mainstream models in the industry and ranks first in the world. The firm believes the upgraded Doubao · Text-to-Image Model 3.0 is likely to be deployed in more application scenarios.

Doubao · Visual Understanding Model upgraded, with more accurate visual positioning and more intelligent video understanding

The Doubao · Visual Understanding Model has been newly upgraded. It has stronger visual positioning capabilities: it supports bounding-box and point positioning for multiple targets, small targets, and general targets, and it supports counting located objects, describing located content, and 3D positioning. It can be applied to offline store inspection, GUI agents, robot training, autonomous driving training, and similar scenarios. The new version also greatly improves video understanding, including memory, summarization, speed perception, and long-video understanding. Combined with vector search, the Doubao · Visual Understanding Model can run semantic searches directly over video and is widely used in commercial scenarios such as security and home care. The firm believes the upgraded model is likely to keep empowering industries such as robotics, smart cars, and security.

For agent services, Volcano Engine releases an OS Agent solution and a large GUI Agent model, the Doubao 1.5 · UI-TARS model; for large-scale inference, it releases the AI Cloud Native · Serving Kit inference suite

Volcano Engine believes AI agents will develop along two parallel tracks: application agents and OS agents. Application agents are more specialized, such as customer service agents, data agents, and code agents, and focus on completing tasks in specific fields. OS agents are versatile and flexible across scenarios: they can directly operate browsers, computers, mobile phones, or other agents to complete complex tasks. On this basis, Volcano Engine officially released its OS Agent solution. The solution packages Doubao's large-model capabilities through the Volcano Engine VeFaaS platform, enabling enterprises and developers to easily build lightweight Code Use and Browser Use agents. For complex OS agents, Volcano Engine also officially released a large GUI Agent model, the Doubao 1.5 · UI-TARS model. It integrates screen visual understanding, logical reasoning, and the positioning and operation of interface elements in a single model, overcoming the limitation of traditional automation tools that rely on preset rules.

In addition, Volcano Engine launched the Serving Kit inference suite to help enterprises deploy models quickly, optimize inference, and run operations and maintenance with full observability. Serving Kit can download and warm up the 671B-parameter DeepSeek-R1 within 2 minutes and load the inference engine within 13 seconds. The firm believes both application agents and OS agents are set for rapid development.
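To put the "671B within 2 minutes" figure in perspective, the sustained data rate it implies can be estimated. This is a back-of-the-envelope sketch only: the report does not state the weight precision, so the bytes-per-parameter values below are assumptions, and the estimate treats the whole 2 minutes as transfer time even though the quoted figure also covers warm-up.

```python
PARAMS = 671e9       # parameter count quoted in the report
WINDOW_S = 2 * 60    # "within 2 minutes", so the true rate is at least this high


def implied_throughput_gb_s(params: float, bytes_per_param: float,
                            seconds: float) -> float:
    """Minimum sustained rate (GB/s) needed to move the weights in the window."""
    return params * bytes_per_param / seconds / 1e9


# Assumed precisions (not given in the report): 1 byte (8-bit) or 2 bytes (16-bit).
for label, bytes_per_param in [("8-bit weights", 1), ("16-bit weights", 2)]:
    rate = implied_throughput_gb_s(PARAMS, bytes_per_param, WINDOW_S)
    print(f"{label}: at least ~{rate:.2f} GB/s")
```

Under these assumptions, the claim corresponds to moving weights at several gigabytes per second or more, sustained across the download and warm-up path.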

Risk warning: risk of changes in industrial policy; risk that AI application development falls short of expectations; risk that the performance of the recommended companies falls short of expectations.