- The new iRAG technology mitigates hallucinations in image generation, boosting the applicability of AI-generated images.
- Miaoda is a no-code tool that enables users to build applications using simple natural language.
- ERNIE now handles 1.5 billion daily API calls, marking an exceptional 30-fold growth over the previous year.
SHANGHAI, Nov. 13, 2024 /PRNewswire/ -- Baidu, Inc. (NASDAQ: BIDU and HKEX: 9888), a leading AI company with a strong Internet foundation, today hosted its annual flagship technology conference, Baidu World 2024, in Shanghai. At the event, the company announced a range of new AI technologies and solutions to accelerate the boom of AI applications, featuring iRAG (Image-Based Retrieval-Augmented Generation), a new technology designed to tackle hallucinations in image generation, and Miaoda, a no-code tool that empowers businesses and individuals to create applications.
The launch comes amid surging demand for Baidu's AI offerings, as evidenced by the ERNIE foundation model's daily API calls reaching 1.5 billion by early November. This represents a substantial 30-fold increase from the 50 million announced a year ago.
ERNIE's daily API calls reach 1.5 billion by early November 2024
"The growth rate exceeded my expectations," said Robin Li, Co-founder, Chairman, and CEO of Baidu, who described the steep increase as a reflection of the explosive growth in generative AI applications in China over the past two years.
Robin Li, Co-founder, Chairman, and CEO of Baidu, speaks at Baidu World 2024
Looking ahead, Li emphasized that agents will serve as the predominant form of AI applications and are approaching a tipping point of explosive growth. Underscoring this point, Li introduced the Top 100 Agents and Top 100 Industry Applications on the ERNIE AgentBuilder platform.
Baidu World 2024 also highlighted the latest user growth for ERNIE Bot, ERNIE's expanding role in enterprise applications, and featured the debut of Xiaodu AI Glasses by Xiaodu Technology.
New iRAG Technology to Mitigate Hallucinations in Image Generation
Hallucinations, a phenomenon in which AI generates false or misleading information, have remained one of the most intractable barriers to the widespread adoption of generative AI. In text generation, RAG technology has largely resolved the problem of hallucinations, greatly enhancing the accuracy of generated answers. However, in the field of multimodality, hallucinations remain a key obstacle, often manifesting as inaccurate depictions of people or landmarks.
Baidu's newly launched iRAG technology can mitigate hallucinations in text-to-image generation. Leveraging Baidu Search's vast collection of hundreds of millions of images and the company's strong foundation model capabilities, the new technology enables text-to-image models to deliver hyper-realistic visuals while also significantly reducing the cost of image production. By reducing hallucinations, iRAG makes images generated by text-to-image models more usable across visual mediums, including comics, storyboards, and posters, among others. Li described the reduction of hallucinations as laying the groundwork for the coming boom in AI applications.
Miaoda: Building Applications with Natural Language
Baidu also unveiled Miaoda, a no-code tool that makes it possible to build entire applications simply by describing them in natural language. Miaoda provides no-code programming, multi-agent collaboration, and multi-tool invocation. No-code programming allows anyone to generate code without writing a single line, lowering barriers to AI development and making it accessible to all. Its multi-agent collaboration leverages ERNIE's thinking and planning capabilities to coordinate and manage different agents effectively, while its multi-tool invocation taps into ERNIE's tool invocation abilities, extensively utilizing web search, iRAG, maps API, and other tools for a seamless workflow.
"Baidu isn't aiming to launch a 'super app'; instead, we aim to help more people and businesses create millions of 'super useful' applications," Li said.
Mirroring the real-world product development process, Miaoda draws on the abilities of different agents across multiple domains such as project management and planning, content editing, programming, and quality control. Miaoda can even automatically identify bugs and use a range of tools. Li called it "the most complex application case of multi-agent collaboration to date."
Miaoda empowers everyone with the capabilities of a programmer – anyone who can speak can create applications, greatly enhancing human productivity.
Agents as the Next Frontier in AI Application
"Today, while all leading global tech firms are paying attention to agents, few have made them as central to their strategy as Baidu has," Li said at the event. He likened the potential of agents to websites in the PC era and social media accounts in the mobile age.
"Agents are more human-like, more intelligent, and act like your sales, customer service representatives, or assistants. Agents will become a new vehicle for content, information and services," Li added.
Four types of agents – company, character, tool, and industry – were highlighted at the event to demonstrate the potential of agents in the AI era. For example, unlike traditional websites, which only show static company and product information, company agents can proactively recommend products based on customer needs and respond promptly to service requests, greatly improving the efficiency of interactive marketing. After deploying an agent powered by ERNIE, automaker BYD saw a 119% increase in lead conversion.
Li also offered the example of the tool agent Free Canvas, developed in collaboration between Baidu Wenku and Baidu Drive. It allows users to drag and drop documents, audio, video, and other materials onto a canvas-like interface to quickly generate multimodal content. Similarly, the industry agent Faxingbao was specifically developed for the legal field. It has answered over 16.6 million legal questions and can also calculate compensation, draft legal documents, and recommend suitable lawyers.
To date, Baidu's ERNIE AgentBuilder has garnered the interest of 150,000 businesses and 800,000 developers. The top 100 agents encompass character-based agents like the "Farmer Academician", as well as agents for tools, industries, workplace, emotions, and entertainment.
Expanding ERNIE's Reach with Growing Developer Community, Enterprise Applications and AI Glasses
Haifeng Wang, Chief Technology Officer of Baidu, announced that ERNIE Bot has amassed 430 million users, while the PaddlePaddle deep learning platform and ERNIE have a total of 18.08 million developers. Wang also mentioned that the ERNIE foundation model is still in training and that an even more powerful version is something to look forward to.
Dou Shen, Executive Vice President of Baidu and President of Baidu AI Cloud Group, highlighted that the Qianfan platform has helped customers fine-tune 33,000 models, and develop 770,000 enterprise applications. Shen noted that the explosion in new AI applications has begun on the enterprise front, supported by a new AI infrastructure comprising an enterprise-level foundation model engineering platform and a heterogeneous computing platform, poised to replace traditional cloud computing.
Ying Li, CEO of Xiaodu Technology, introduced Xiaodu AI Glasses, the first native AI glasses powered by a Chinese language foundation model. Powered by ERNIE and equipped with visual, audio, and location-based capabilities, these smart glasses serve as a versatile AI assistant across everyday scenarios. They function as a personal tour guide, offer information via Baidu Maps and Search, and excel in instant translation and content summarization from photos. Ideal for both academics and casual readers, they also assist with intelligent note-taking and can personalize music to enhance the user's surroundings. Li revealed that Xiaodu AI Glasses will be available for sale in the first half of 2025.