A Learnable Agent Collaboration Network Framework for Personalized Multimodal AI Search Engine

Yunxiao Shi, Min Xu, Haimin Zhang, Xing Zi, Qiang Wu
2024-09-01
Abstract: Large language models (LLMs) and retrieval-augmented generation (RAG) techniques have revolutionized traditional information access, enabling AI agents to search and summarize information on behalf of users during dynamic dialogues. Despite their potential, current AI search engines exhibit considerable room for improvement in several critical areas. These areas include support for multimodal information, the delivery of personalized responses, the capability to logically answer complex questions, and the facilitation of more flexible interactions. This paper proposes a novel AI search engine framework called the Agent Collaboration Network (ACN). The ACN framework consists of multiple specialized agents working collaboratively, each with a distinct role such as Account Manager, Solution Strategist, Information Manager, or Content Creator. The framework integrates mechanisms for picture content understanding, user profile tracking, and online evolution, enhancing the AI search engine's response quality, personalization, and interactivity. A highlight of the ACN is the introduction of a Reflective Forward Optimization (RFO) method, which supports online synergistic adjustment among agents. This feature endows the ACN with online learning capabilities, ensuring that the system has strong interactive flexibility and can promptly adapt to user feedback. This learning method may also serve as an optimization approach for agent-based systems, potentially influencing other domains of agent applications.
Information Retrieval, Multiagent Systems
What problem does this paper attempt to address?
### Problems the paper attempts to solve

This paper aims to address the deficiencies of current AI search engines in the following key areas:

1. **Multimodal information support**: Existing AI search engines mainly generate plain-text content, while web content spans multiple modalities such as text, images, tables, and video. Supporting multimodal content understanding is crucial for improving response quality and enabling richer content presentation.
2. **Personalized responses**: Current AI search engines provide uniform content to different users, ignoring personalization and customization. Although traditional search engines have incorporated some personalization features, AI search engines have not yet integrated this aspect effectively. For example, when asked for muscle-building diet advice for Indians, both GPT-4 and Perplexity recommend beef as the main source of protein, which contradicts India's cultural and dietary restrictions.
3. **Answering complex logical questions**: Current AI search engines can handle simple information retrieval and generation tasks, but perform poorly on complex, logic-intensive queries. These queries usually require multi-keyword searches and iterative retrieval, and the generated answer requires logical coherence and strategic planning.
4. **Timely learning and adjustment**: Current AI agents in the "expert center" rely on preset prompts and workflows, which limits their ability to adapt independently in response to user feedback.

To solve these problems, the paper proposes a new framework named the "Agent Collaboration Network" (ACN). The ACN framework consists of multiple agents with different roles, including Account Manager, Solution Strategist, Information Manager, and Content Creator. The framework integrates image content understanding, user profile tracking, and online evolution mechanisms, improving the response quality, personalization, and interactivity of AI search engines.

In addition, the paper introduces an optimization algorithm named "Reflective Forward Optimization" (RFO), which supports online collaborative adjustment among agents. RFO gives the ACN an online learning ability, so that the system has strong interactive flexibility and can quickly adapt to user feedback. This learning method may also serve as a general approach to optimizing agent systems and may influence other agent application areas.
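To make the role-based structure described above more concrete, here is a minimal Python sketch of a sequential pipeline with the four named roles, followed by a naive step that records user feedback in each agent's instructions. It is only an illustration under stated assumptions: the `Agent` class, `build_pipeline`, `answer`, and `apply_feedback` are hypothetical names, the `run` method is a placeholder for an LLM call, and the feedback step is a simple stand-in that is not the paper's RFO algorithm, whose details are not given in this summary.

```python
from dataclasses import dataclass, field
from typing import List

# Role names come from the summary; everything else here is hypothetical.
ROLES = [
    "Account Manager",
    "Solution Strategist",
    "Information Manager",
    "Content Creator",
]


@dataclass
class Agent:
    """One role in the pipeline; `instructions` stands in for its prompt."""
    role: str
    instructions: List[str] = field(default_factory=list)

    def run(self, message: str) -> str:
        # Placeholder for an LLM call: append the role name to the message
        # so the order of processing is visible in the output.
        return f"{message} -> {self.role}"


def build_pipeline() -> List[Agent]:
    """Create one agent per role, in the order listed in ROLES."""
    return [Agent(role) for role in ROLES]


def answer(pipeline: List[Agent], query: str) -> str:
    """Pass the query through each agent in turn (assumed linear topology)."""
    message = query
    for agent in pipeline:
        message = agent.run(message)
    return message


def apply_feedback(pipeline: List[Agent], feedback: str) -> None:
    """Naive online adjustment (an assumption, not RFO): walk the pipeline
    from the last agent back to the first and record the feedback in each
    agent's instructions."""
    for agent in reversed(pipeline):
        agent.instructions.append(f"User feedback: {feedback}")


if __name__ == "__main__":
    pipeline = build_pipeline()
    print(answer(pipeline, "muscle-building diet advice"))
    apply_feedback(pipeline, "avoid recommending beef")
    print(pipeline[0].instructions)
```

In a real system each `run` call would invoke a model with the agent's accumulated instructions, and the adjustment step would decide per agent what to change rather than copying the same note everywhere.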