Building an Intelligent Search Engine with CAMEL Agent
dingzh@dp.tech
Updated on 2025-03-14
Recommended image: camel-ai:0.1.0
Recommended machine type: c2_m4_cpu

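🏃🏻 Quick start
You can run this document directly on Bohrium Notebook. First, click the Connect button at the top of the interface, then select the camel-ai:0.1.0 image and the c2_m4_cpu machine configuration. After a short wait, the notebook is ready to run.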

📖 Source
This notebook is adapted from the Camel Docs. For more information, see the original documentation.

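Existing capabilities:

  • Literature and scholar query interfaces
  • User behavior data query interface
  • Large models: deepseek, gpt, qwen, doubao, claude, gemini, and others

Scenario: Bohrium AI academic search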


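Agent optimization ideas:

The goal is to use function calling and the internal databases to give users professional, accurate search results.

  • Intent recognition
    • Identify the user's role (role assignment): student, teacher, software engineer, general public, university president, academician, ...
    • Identify the user's intent (task specify): popular-science overview, writing a literature review, following hot topics, in-depth queries (molecule search, etc.)
    • Add personalized recommendations
  • Complex task decomposition (task decompose)
    • Example: number of papers and hot-topic distribution of OpenAI and DeepMind in the AI for Science field
      • Task 0.0: Search for the number of papers and hot-topic distribution of OpenAI in the AI for Science field
      • Task 0.1: Search for the number of papers and hot-topic distribution of DeepMind in the AI for Science field
      • Task 0.2: Compare the number of papers of OpenAI and DeepMind in the AI for Science field
      • Task 0.3: Compare the hot-topic distribution of OpenAI and DeepMind in the AI for Science field
  • Custom function tools
    • Multimodal capabilities, with model API calls wrapped per task
    • Query the internal paper library
    • Query the scholar library
    • Query web interfaces, and so on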
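To make the "query the internal paper library" item in the list above more concrete, here is a minimal sketch of what such a tool function might look like. The function name `search_papers`, its parameters, and the in-memory `_PAPERS` list are all hypothetical stand-ins; the real internal interface is not shown in this notebook. The type hints and docstring layout mirror the style used by the `add` and `sub` examples in the next section.

```python
from typing import Dict, List

# Hypothetical in-memory stand-in for the internal paper library.
_PAPERS: List[Dict[str, str]] = [
    {"title": "Placeholder paper on protein folding", "org": "Org A"},
    {"title": "Placeholder paper on language models", "org": "Org B"},
    {"title": "Placeholder paper on materials discovery", "org": "Org A"},
]


def search_papers(query: str, top_k: int = 3) -> List[Dict[str, str]]:
    r"""Searches the paper library by keyword.

    Args:
        query (str): Keyword to look for in paper titles.
        top_k (int): Maximum number of papers to return.

    Returns:
        List[Dict[str, str]]: Matching papers, at most top_k entries.
    """
    needle = query.lower()
    # Case-insensitive substring match on the title (an assumption; a real
    # backend would more likely call a search service).
    hits = [paper for paper in _PAPERS if needle in paper["title"].lower()]
    return hits[:top_k]


print(search_papers("protein"))
```

A function shaped like this keeps its inputs simple and JSON-friendly, which is what a function-calling model needs in order to fill in the arguments.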
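
Define your own Tool

Write a clear function definition, generate a standard docstring with the generate_docstring function, then wrap the function in FunctionTool. The result is a function tool in OpenAI schema format, which makes it easy for the large model to perform function calls.
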
[1]
from camel.toolkits import FunctionTool

def add(a: int, b: int) -> int:
    r"""Adds two numbers.

    Args:
        a (int): The first number to be added.
        b (int): The second number to be added.

    Returns:
        integer: The sum of the two numbers.
    """
    return a + b

def sub(a: int, b: int) -> int:
    r"""Subtracts two numbers.

    Args:
        a (int): The first number to be subtracted.
        b (int): The second number to be subtracted.

    Returns:
        integer: The difference of the two numbers.
    """
    return a - b

# Wrap the function with FunctionTool
add_tool = FunctionTool(add)

Inspect basic information

[2]
print(add_tool.get_function_name())
add
[3]
print(add_tool.get_function_description())
Adds two numbers.
[4]
print(add_tool.get_openai_function_schema())
{'name': 'add', 'description': 'Adds two numbers.', 'strict': True, 'parameters': {'properties': {'a': {'type': 'integer', 'description': 'The first number to be added.'}, 'b': {'type': 'integer', 'description': 'The second number to be added.'}}, 'required': ['a', 'b'], 'type': 'object', 'additionalProperties': False}}
[5]
print(add_tool.get_openai_tool_schema())
{'type': 'function', 'function': {'name': 'add', 'description': 'Adds two numbers.', 'strict': True, 'parameters': {'properties': {'a': {'type': 'integer', 'description': 'The first number to be added.'}, 'b': {'type': 'integer', 'description': 'The second number to be added.'}}, 'required': ['a', 'b'], 'type': 'object', 'additionalProperties': False}}}
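
The schema printed above is essentially a projection of the function signature: the name, the first line of the docstring, and one JSON-typed property per parameter. The following sketch shows that idea using only the standard library. It is a simplified illustration, not CAMEL's implementation; `sketch_tool_schema` and `_JSON_TYPES` are names made up for this example, and per-parameter descriptions are left out.

```python
import inspect
from typing import Any, Callable, Dict, get_type_hints

# Simplified mapping from Python annotations to JSON Schema type names.
_JSON_TYPES = {int: "integer", float: "number", str: "string", bool: "boolean"}


def sketch_tool_schema(func: Callable[..., Any]) -> Dict[str, Any]:
    """Build an OpenAI-style tool schema from a function signature."""
    hints = get_type_hints(func)
    params = inspect.signature(func).parameters
    # One property per parameter; unknown annotations fall back to "string".
    properties = {
        name: {"type": _JSON_TYPES.get(hints.get(name), "string")}
        for name in params
    }
    # Parameters without a default value are treated as required.
    required = [
        name
        for name, param in params.items()
        if param.default is inspect.Parameter.empty
    ]
    doc = inspect.getdoc(func) or ""
    description = doc.split("\n")[0] if doc else ""
    return {
        "type": "function",
        "function": {
            "name": func.__name__,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": properties,
                "required": required,
            },
        },
    }


def add(a: int, b: int) -> int:
    """Adds two numbers."""
    return a + b


schema = sketch_tool_schema(add)
print(schema["function"]["name"], schema["function"]["parameters"]["required"])
```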

Overview of the tools and technology stack already integrated in CAMEL

A quick trial run

[6]
from camel.toolkits import FunctionTool, SearchToolkit
from camel.configs import QwenConfig, ChatGPTConfig
from camel.agents import ChatAgent
from camel.messages import BaseMessage
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType

import os

# Set the proxies and API keys needed to reach the Google and Wikipedia APIs
os.environ["HTTP_PROXY"] = ""
os.environ["HTTPS_PROXY"] = ""
os.environ["QWEN_API_KEY"] = ""
os.environ["GOOGLE_API_KEY"] = ""
os.environ["SEARCH_ENGINE_ID"] = ""

# Azure OpenAI
# os.environ["DEFAULT_MODEL_PLATFORM_TYPE"] = "azure"
# os.environ["AZURE_OPENAI_API_KEY"] = ""
# os.environ["AZURE_OPENAI_BASE_URL"] = ""
# os.environ["AZURE_API_VERSION"] = "2024-06-01"
# os.environ["AZURE_DEPLOYMENT_NAME"] = "gpt-4o"
# model = ModelFactory.create(
#     model_platform=ModelPlatformType.AZURE,
#     model_type=ModelType.GPT_4O,
# )
# DeepSeek
# os.environ["DEEPSEEK_API_KEY"] = ""
# os.environ["DEEPSEEK_API_BASE_URL"] = ""
# model = ModelFactory.create(
#     model_platform=ModelPlatformType.DEEPSEEK,
#     model_type="",
# )

model = ModelFactory.create(
    model_platform=ModelPlatformType.QWEN,
    model_type=ModelType.QWEN_PLUS,
    model_config_dict=QwenConfig(temperature=0.2).as_dict(),
)


MATH_FUNCS: list[FunctionTool] = [
    FunctionTool(func) for func in [add, sub]
]

assistant_sys_msg = """You are a helpful assistant to do search task."""

tools_list = [
    FunctionTool(SearchToolkit().search_wiki),
    FunctionTool(SearchToolkit().search_google),
    *MATH_FUNCS,
]

agent = ChatAgent(
    assistant_sys_msg,
    model=model,
    tools=tools_list
)

# Set prompt for the search task
prompt_search = ("""When was University of Oxford set up""")

# Set prompt for the calculation task
prompt_calculate = ("""Assume now is 2024 in the Gregorian calendar, University of Oxford was set up in 1096, estimate the current age of University of Oxford""")

# Convert the two prompts into messages that the agent accepts
user_msg_search = BaseMessage.make_user_message(role_name="User", content=prompt_search)
user_msg_calculate = BaseMessage.make_user_message(role_name="User", content=prompt_calculate)

# Get response
assistant_response_search = agent.step(user_msg_search)
assistant_response_calculate = agent.step(user_msg_calculate)
[7]
print(assistant_response_search.info['tool_calls'])
[ToolCallingRecord(tool_name='search_wiki', args={'entity': 'University of Oxford'}, result="The University of Oxford is a collegiate research university in Oxford, England. There is evidence of teaching as early as 1096, making it the oldest university in the English-speaking world and the world's second-oldest university in continuous operation. It grew rapidly from 1167, when Henry II banned English students from attending the University of Paris. After disputes between students and Oxford townsfolk, some Oxford academics fled northeast to Cambridge, where they established the University of Cambridge in 1209. The two English ancient universities share many common features and are jointly referred to as Oxbridge.", tool_call_id='call_4fc80eca8795403b97911b')]
[8]
print(assistant_response_calculate.info['tool_calls'])
[ToolCallingRecord(tool_name='sub', args={'a': 2024, 'b': 1096}, result=928, tool_call_id='call_c91a5741d34143a59b48ce')]
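
The two records above show the shape of a function call: the model picks a tool name and supplies arguments, and the runtime executes the matching Python function. A minimal sketch of that dispatch step follows. It is a plain-Python illustration under the assumption that arguments arrive as a JSON string; `TOOLS` and `dispatch` are hypothetical names and this is not how CAMEL's ChatAgent is implemented internally.

```python
import json
from typing import Any, Callable, Dict


def add(a: int, b: int) -> int:
    return a + b


def sub(a: int, b: int) -> int:
    return a - b


# Registry mapping the tool names exposed to the model to Python callables.
TOOLS: Dict[str, Callable[..., Any]] = {"add": add, "sub": sub}


def dispatch(tool_name: str, arguments_json: str) -> Any:
    """Look up a tool by name and call it with JSON-decoded keyword arguments."""
    if tool_name not in TOOLS:
        raise KeyError(f"Unknown tool: {tool_name}")
    kwargs = json.loads(arguments_json)
    return TOOLS[tool_name](**kwargs)


# Replays the second recorded call: sub(a=2024, b=1096).
result = dispatch("sub", '{"a": 2024, "b": 1096}')
print(result)  # 928
```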

Complex task decomposition

Prompt templates:

[ ]

from camel.prompts import TextPrompt

TASK_DECOMPOSE_PROMPT = TextPrompt(
"""As a Task Decomposer with the role of {role_name}, your objective is to divide the given task into subtasks.
You have been provided with the following objective:

{content}

Please format the subtasks as a numbered list within <tasks> tags, as demonstrated below:
<tasks>
<task>Subtask 1</task>
<task>Subtask 2</task>
</tasks>

Each subtask should be concise, concrete, and achievable for a {role_name}.
Ensure that the task plan is created without asking any questions.
Be specific and clear.
"""
)


TASK_COMPOSE_PROMPT = TextPrompt(
"""As a Task composer with the role of {role_name}, your objective is to gather result from all sub tasks to get the final answer.
The root task is:

{content}

The additional information of the task is:

{additional_info}

The related tasks result and status:

{other_results}

so, the final answer of the root task is:
"""
)


TASK_EVOLVE_PROMPT = TextPrompt(
"""As a Task Creator for {role_name}, your objective is to draw inspiration from the provided task to develop an entirely new one.
The new task should fall within the same domain as the given task but be more complex and unique.
It must be reasonable, understandable, and actionable by {role_name}.
The created task must be enclosed within <task> </task> tags.
<task>
... created task
</task>

## given task
{content}

## created task
"""
)
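
The decompose prompt asks the model to return subtasks inside `<tasks>` and `<task>` tags, so the response has to be parsed back into individual subtasks. The sketch below shows one way to do that with a regular expression, numbering subtasks under the parent ID in the same `0.0`, `0.1` style seen later in this notebook. `parse_subtasks` is a hypothetical helper written for this illustration, not CAMEL's own parser.

```python
import re
from typing import List, Tuple


def parse_subtasks(response: str, parent_id: str) -> List[Tuple[str, str]]:
    """Extract <task>...</task> entries and number them under parent_id."""
    # "<task>" does not match the enclosing "<tasks>" tag, so only the
    # individual subtasks are captured.
    contents = re.findall(r"<task>(.*?)</task>", response, flags=re.DOTALL)
    cleaned = [text.strip() for text in contents if text.strip()]
    return [(f"{parent_id}.{i}", text) for i, text in enumerate(cleaned)]


sample_response = """<tasks>
<task>Search OpenAI papers</task>
<task>Search DeepMind papers</task>
</tasks>"""

for task_id, content in parse_subtasks(sample_response, "0"):
    print(f"Task {task_id}: {content}")
```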

Task and subtask structure

[9]
from camel.tasks import Task
# Creating the root task
root_task = Task(content="Prepare a meal", id="0")

# Creating subtasks for the root task
sub_task_1 = Task(content="Shop for ingredients", id="1")
sub_task_2 = Task(content="Cook the meal", id="2")
sub_task_3 = Task(content="Set the table", id="3")

# Creating subtasks under "Cook the meal"
sub_task_2_1 = Task(content="Chop vegetables", id="2.1")
sub_task_2_2 = Task(content="Cook rice", id="2.2")

# Adding subtasks to their respective parent tasks
root_task.add_subtask(sub_task_1)
root_task.add_subtask(sub_task_2)
root_task.add_subtask(sub_task_3)

sub_task_2.add_subtask(sub_task_2_1)
sub_task_2.add_subtask(sub_task_2_2)

# Printing the hierarchical task structure
print(root_task.to_string())
Task 0: Prepare a meal
  Task 1: Shop for ingredients
  Task 2: Cook the meal
    Task 2.1: Chop vegetables
    Task 2.2: Cook rice
  Task 3: Set the table
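
The indented printout above reflects a simple tree: each task holds a list of subtasks, and rendering walks the tree depth-first with one extra level of indentation per depth. As a rough mental model only, here is a minimal sketch of such a structure. `TaskNode` is a hypothetical class for illustration and is not the CAMEL `Task` implementation.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskNode:
    content: str
    id: str
    subtasks: List["TaskNode"] = field(default_factory=list)

    def add_subtask(self, task: "TaskNode") -> None:
        self.subtasks.append(task)

    def to_string(self, indent: str = "") -> str:
        # Render this node, then each child with two more spaces of indent.
        lines = [f"{indent}Task {self.id}: {self.content}"]
        for child in self.subtasks:
            lines.append(child.to_string(indent + "  "))
        return "\n".join(lines)


root = TaskNode("Prepare a meal", "0")
cook = TaskNode("Cook the meal", "2")
cook.add_subtask(TaskNode("Chop vegetables", "2.1"))
root.add_subtask(TaskNode("Shop for ingredients", "1"))
root.add_subtask(cook)
print(root.to_string())
```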

[10]
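model = ModelFactory.create(
    model_platform=ModelPlatformType.QWEN,
    model_type=ModelType.QWEN_PLUS,
    model_config_dict=QwenConfig(temperature=0.2).as_dict(),
)

# Set message for the assistant
assistant_sys_msg = """You are a helpful assistant to do search task."""

tools_list = [
    FunctionTool(SearchToolkit().search_google),
    *MATH_FUNCS,
]

# Set the agent
agent = ChatAgent(
    assistant_sys_msg,
    model=model,
    tools=tools_list
)

task = Task(
    content="Please compare the number of papers and the hot-topic distribution of OpenAI and DeepMind in the AI for Science field",
    id="0",
)

# Ask the agent to split the root task, then attach each subtask to it
new_tasks = task.decompose(agent=agent)
for t in new_tasks:
    print(t.to_string())
    task.add_subtask(t)

# Ask the agent to compose the final answer for the root task
task.compose(agent=agent)
print(task.get_result())
Task 0.0: Search for the number of papers and hot-topic distribution of OpenAI in the AI for Science field.

Task 0.1: Search for the number of papers and hot-topic distribution of DeepMind in the AI for Science field.

Task 0.2: Compare the number of papers of OpenAI and DeepMind in the AI for Science field.

Task 0.3: Compare the hot-topic distribution of OpenAI and DeepMind in the AI for Science field.

Task 0 result: Based on the search results, here is a comparison of the number of papers and hot-topic distribution of OpenAI and DeepMind in the AI for Science field:

### OpenAI
1. **Number of papers**: There are currently no clear statistics on the exact number of papers OpenAI has published in the AI for Science field.
2. **Hot-topic distribution**: OpenAI's research hot topics are mainly concentrated in natural language processing, image generation, reinforcement learning, and related areas. Although OpenAI's specific research hot topics in AI for Science are not explicitly mentioned, it can be inferred that its research may involve understanding and generating scientific text, and analyzing and predicting scientific data.

### DeepMind
1. **Number of papers**: DeepMind has published a number of important papers in the AI for Science field. For example, its research on protein structure prediction (AlphaFold) achieved a major breakthrough.
2. **Hot-topic distribution**: DeepMind's research hot topics include, but are not limited to, protein structure prediction, molecular dynamics simulation, and materials science. Its research mainly focuses on using deep learning to solve scientific problems, especially in biology and chemistry.

### Comparison
1. **Number of papers**: DeepMind has relatively more papers in the AI for Science field, especially in applied research in biology and chemistry.
2. **Hot-topic distribution**: DeepMind's hot topics are more focused on solving specific scientific problems, such as protein structure prediction and molecular dynamics simulation. OpenAI's hot topics are broader, covering natural language processing, image generation, and other areas, but its specific hot topics in AI for Science remain unclear.

In summary, DeepMind's research in the AI for Science field is deeper and more specific, while OpenAI's research is broader.
  Task 0.0 result: 
  Task 0.1 result: 
  Task 0.2 result: 
  Task 0.3 result: 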


Role-play related tests

[14]
from camel.agents import (
    TaskCreationAgent,
    TaskPlannerAgent,
    TaskPrioritizationAgent,
    TaskSpecifyAgent,
)
from camel.models import ModelFactory
from camel.types import ModelPlatformType, ModelType, TaskType

def test_task_specify_ai_society_agent(model):
    original_task_prompt = "Improving stage presence and performance skills"
    print(f"Original task prompt:\n{original_task_prompt}\n")
    task_specify_agent = TaskSpecifyAgent(model=model)
    specified_task_prompt = task_specify_agent.run(
        original_task_prompt,
        meta_dict=dict(assistant_role="Musician", user_role="Student"),
    )
    # No unfilled template braces should remain in the result
    assert "{" not in specified_task_prompt and "}" not in specified_task_prompt
    print(f"Specified task prompt:\n{specified_task_prompt}\n")

def test_task_specify_code_agent(model):
    original_task_prompt = "Modeling molecular dynamics"
    print(f"Original task prompt:\n{original_task_prompt}\n")
    task_specify_agent = TaskSpecifyAgent(model=model, task_type=TaskType.CODE)
    specified_task_prompt = task_specify_agent.run(
        original_task_prompt,
        meta_dict=dict(domain="Chemistry", language="Python"),
    )
    # No unfilled template braces should remain in the result
    assert "{" not in specified_task_prompt and "}" not in specified_task_prompt
    print(f"Specified task prompt:\n{specified_task_prompt}\n")



def test_task_planner_agent(model):
    original_task_prompt = "Modeling molecular dynamics"
    print(f"Original task prompt:\n{original_task_prompt}\n")
    task_specify_agent = TaskSpecifyAgent(
        model=model,
        task_type=TaskType.CODE,
    )
    specified_task_prompt = task_specify_agent.run(
        original_task_prompt,
        meta_dict=dict(domain="Chemistry", language="Python"),
    )
    print(f"Specified task prompt:\n{specified_task_prompt}\n")
    task_planner_agent = TaskPlannerAgent(model)
    planned_task_prompt = task_planner_agent.run(specified_task_prompt)
    print(f"Planned task prompt:\n{planned_task_prompt}\n")

def test_task_creation_agent(model):
    original_task_prompt = "Modeling molecular dynamics"
    task_creation_agent = TaskCreationAgent(
        model=model,
        role_name="PhD in molecular biology",
        objective=original_task_prompt,
    )
    task_list = ["Study the computational technology for dynamics modeling"]
    planned_task = task_creation_agent.run(
        task_list=task_list,
    )
    print(f"Planned task list:\n{planned_task}\n")
    assert isinstance(planned_task, list)

def test_task_prioritization_agent(model):
    original_task_prompt = (
        "A high school student wants to " "prove the Riemann hypothesis"
    )

    task_list = [
        "Drop out of high school",
        "Start a tech company",
        "Become a billionaire",
        "Buy a yacht",
        "Obtain a bachelor degree in mathematics",
        "Obtain a PhD degree in mathematics",
        "Become a professor of mathematics",
    ]

    task_prioritization_agent = TaskPrioritizationAgent(
        objective=original_task_prompt,
        model=model,
    )

    prioritized_task = task_prioritization_agent.run(task_list=task_list)
    print(f"Prioritized task list:\n{prioritized_task}\n")
    assert isinstance(prioritized_task, list)

model = ModelFactory.create(
    model_platform=ModelPlatformType.QWEN,
    model_type=ModelType.QWEN_PLUS,
    model_config_dict=QwenConfig(temperature=0.2).as_dict(),
)

test_task_specify_ai_society_agent(model)
test_task_specify_code_agent(model)
test_task_planner_agent(model)
test_task_creation_agent(model)
test_task_prioritization_agent(model)
Original task prompt:
Improving stage presence and performance skills

Specified task prompt:
Develop a captivating stage entrance, create a signature performance gesture, engage the audience with storytelling, and master confident eye contact while incorporating dynamic movement to enhance musical expression, ensuring a memorable and immersive experience for both Student and the audience.

Original task prompt:
Modeling molecular dynamics

Specified task prompt:
Simulate the folding process of a small protein in aqueous solution using Python, applying Newton's equations of motion, and visualize its conformational changes over time to identify stable secondary structures and analyze hydrogen bond formation dynamics.

Original task prompt:
Modeling molecular dynamics

Specified task prompt:
Simulate the folding process of a specific protein (e.g., insulin) using Python to model molecular dynamics with forces, temperatures, and interactions. Visualize its 3D conformation changes over time and analyze energy states to predict stable structures. Use libraries like NumPy and MDAnalysis for calculations and Matplotlib for visualization.

Planned task prompt:
1. **Define Protein Structure**: Load the protein's initial structure (e.g., insulin) using MDAnalysis or similar libraries.  
2. **Set Simulation Parameters**: Define forces, temperatures, and interaction potentials (e.g., Lennard-Jones, Coulombic).  
3. **Model Molecular Dynamics**: Use NumPy for calculations to simulate folding over time steps.  
4. **Visualize 3D Conformation**: Use Matplotlib or Mayavi to animate conformational changes.  
5. **Analyze Energy States**: Calculate potential and kinetic energy at each step to identify stable structures.  
6. **Predict Stable Structures**: Compare energy states to determine the most stable folded conformations.

Planned task list:
[]

Prioritized task list:
['Obtain a bachelor degree in mathematics', 'Obtain a PhD degree in mathematics', 'Become a professor of mathematics', 'Start a tech company', 'Become a billionaire', 'Buy a yacht', 'Drop out of high school']
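
The two `test_task_specify_*` functions above guard against unfilled template braces in the specified prompt. One detail worth knowing when writing such a check: in Python, `"{" and "}" not in text` is parsed as `"{" and ("}" not in text)`, so only the closing brace is actually tested. The short sketch below contrasts the two forms; the helper names and the sample string are made up for illustration.

```python
def has_no_braces_chained(text: str) -> bool:
    # Parsed as: "{" and ("}" not in text). The non-empty string "{" is
    # always truthy, so only the "}" check takes effect.
    return bool("{" and "}" not in text)


def has_no_braces(text: str) -> bool:
    # Check each brace explicitly.
    return "{" not in text and "}" not in text


leftover = "Improve {skill for the student"
print(has_no_braces_chained(leftover))  # True: the stray "{" is missed
print(has_no_braces(leftover))          # False
```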

[16]
# Role identification

from camel.agents import ChatAgent, RoleAssignmentAgent
from camel.messages import BaseMessage
from camel.responses import ChatAgentResponse
from camel.types import RoleType

num_roles = 5
task_prompt = "Identify the roles based on the user's query. The user's query is: I want to learn about the latest progress of deepseek-r1"

# Construct role assignment agent
role_description_agent = RoleAssignmentAgent(model)

# Generate the role description dictionary for the task prompt
role_description_dict = role_description_agent.run(task_prompt, num_roles)

for key, value in role_description_dict.items():
    print("Role name: ", key)
    print(value)
    print("==" * 10)
Role name:  Large Language Model Researcher
The researcher should have a deep understanding of large language models, specifically the DeepSeek series. They must stay updated with the latest advancements in the field, analyze technical papers, and provide insights into the latest progress of DeepSeek-R1.
====================
Role name:  Natural Language Processing Engineer
This engineer should possess expertise in natural language processing (NLP) and its applications. Their duties include evaluating the performance of DeepSeek-R1, understanding its architecture, and identifying any new features or improvements in the latest version.
====================
Role name:  Data Scientist specializing in Model Evaluation
The data scientist should focus on evaluating model performance metrics, benchmarking DeepSeek-R1 against other models, and analyzing datasets used for training and testing. They will document and communicate findings related to the model's capabilities and limitations.
====================
Role name:  Technical Writer with AI Focus
The technical writer must have experience in simplifying complex AI concepts for broader audiences. They will collaborate with researchers and engineers to create clear, concise reports or articles summarizing the latest developments in DeepSeek-R1 for users and stakeholders.
====================
Role name:  AI Product Manager
The product manager should have a strategic understanding of AI products and market trends. Their role involves gathering insights from the team, identifying potential use cases for DeepSeek-R1, and ensuring that the information provided aligns with user needs and expectations.
====================