
How to Hand-Build a Minimal Academic Perplexity with Coze

Tags: Coze, LLM, Perplexity, arxiv, product manager, no-code

wangmin@dp.tech

Published 2024-05-13

Recommended image: Basic Image:ubuntu22.04-py3.10

Recommended machine type: c2_m4_cpu


What is Perplexity

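Search may set the ceiling on how much information we can obtain. Today, as a flood of AI-generated content (AIGC) pours into the sea of information, our tools for finding a needle in a haystack need to improve as well.

Search engines, led by Google Search, are still the most effective information-filtering mechanism on the internet, but the next generation of information-screening tools may already be on the way. Perplexity may be that challenger: a search engine that understands what you need and gives direct, accurate answers. In the AI ranking published by a16z, Perplexity ranks seventh, and it is currently valued at over 500 million USD.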

Bilibili: a detailed introduction to Perplexity

Bilibili: using Perplexity for academic work

Coze

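Coze is ByteDance's open AI bot development platform. I recently tried hand-building AI bots on Coze and made a minimal academic version of Perplexity: it searches the ArXiv literature and replies directly with a summary. With the help of Coze's workflows, plugins, and large models, I, a product manager, barely had to write any code. Dragging and combining function modules, plus a bit of configuration, was enough to get the features I wanted.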

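A brief introduction to this AI bot: ArXiv Assistant

Ask about progress in a field you care about or about a specific research question. ArXiv Assistant searches arXiv, finds relevant papers, answers your question, and lists the related papers.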

ArXiv Assistant, international version (GPT-4 available for free): https://www.coze.com/store/bot/7367345403941044230?bot_id=true

ArXiv Assistant, China version (ByteDance Skylark model): https://www.coze.cn/store/bot/7368703222615293971?bid=6cgnsm16o9000&panel=1

DP colleagues can use the ArXiv Assistant bot directly in Feishu.

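Some tips for using the bot:

- The international version requires a VPN when accessed from mainland China.

- The official arxiv plugin on Coze requires English input, so the bot translates your query first; even so, Chinese queries work less well than English ones.

- The official arxiv search plugin does keyword matching and does not understand natural language, so enter a declarative phrase or keywords, such as "application of technology X in field Y". A future improvement could convert natural language into search conditions at this step.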
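The last tip mentions converting natural language into search conditions as a possible future optimization. As a rough illustration only (not part of the bot), the sketch below strips a small, hand-picked stopword list from an English question and joins the remaining words with AND, using the `all:` field prefix from arXiv's public search API. The helper name `to_keyword_query` and the stopword list are assumptions.

```python
import re

# Hand-picked stopwords for question-style queries (illustrative, not exhaustive).
STOPWORDS = {
    "a", "an", "the", "of", "in", "on", "for", "to", "and", "or", "is", "are",
    "what", "how", "which", "does", "do", "can", "with", "about", "me", "tell",
    "recent", "latest", "progress",
}


def to_keyword_query(question: str) -> str:
    """Turn an English question into an arXiv-style keyword query (hypothetical helper)."""
    words = re.findall(r"[a-z0-9-]+", question.lower())
    # Drop stopwords and duplicates while keeping the original word order.
    keywords = dict.fromkeys(w for w in words if w not in STOPWORDS)
    return " AND ".join(f"all:{w}" for w in keywords)


print(to_keyword_query("What is the recent progress of diffusion models in protein design?"))
# all:diffusion AND all:models AND all:protein AND all:design
```

A fixed stopword list breaks easily; in practice it would probably be more robust to ask the LLM to extract the keywords.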

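How to build this minimal Perplexity on Coze

The Coze platform has an international version at coze.com and a China version, 扣子, at coze.cn. You can start by quickly building your first, simplest bot.

The core of ArXiv Assistant is a Workflow called search_and_answer, which does two things:

first, it calls the arxiv search plugin to retrieve information about relevant papers; second, it calls an LLM block so that the LLM generates a reply based on the retrieved context.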
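To make the two-stage structure concrete, here is a plain-Python sketch of what search_and_answer does, outside of Coze. The `search` and `llm` callables are hypothetical stand-ins for the arxiv plugin and the LLM block, and the prompt is deliberately shortened.

```python
from typing import Callable

# Hypothetical stand-ins for the two Coze building blocks.
SearchFn = Callable[[str, int], list[dict]]  # (query, count) -> papers
LlmFn = Callable[[str], str]                 # prompt -> reply


def search_and_answer(query: str, search: SearchFn, llm: LlmFn, count: int = 9) -> str:
    # Stage 1: retrieve papers for the query.
    papers = search(query, count)
    # Number each paper so the LLM can cite it as [i].
    contexts = "\n\n".join(
        f"[{i}] {p['title']}\n{p['summary']}" for i, p in enumerate(papers, start=1)
    )
    # Stage 2: let the LLM answer from the retrieved contexts.
    prompt = (
        "Answer the query using only the contexts below and cite them as [i].\n\n"
        f"Query: {query}\n\nContexts:\n{contexts}"
    )
    answer = llm(prompt)
    # Append the numbered links so readers can check the sources.
    references = "\n".join(f"[{i}] {p['link']}" for i, p in enumerate(papers, start=1))
    return f"{answer}\n\n{references}"


# Usage with fake components:
papers = [{"title": "Paper A", "summary": "About A.", "link": "https://example.org/paper-a"}]
print(search_and_answer("what is A?", lambda q, n: papers[:n], lambda prompt: "A is ... [1]"))
```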


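Below I walk step by step through how each component is configured and chained together to build the bot's most important workflow:

1 Set the Workflow's input parameters

2 Get the user's language preference

3 Translate the user's query into English for arXiv

4 Call the arxiv search plugin to find relevant papers

5 Format the search results

6 Call the LLM to generate a reply

7 Set the Workflow's final output

8 Integrate the Workflow into the Bot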

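1 Set the Workflow's input parameters

start is the Workflow's Start node. The whole Workflow takes a single input parameter, the user's question (query), which later nodes can reference as many times as needed.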

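2 Get the user's language preference

Coze bots provide a Variable component for defining custom variables. I set up a variable called user_language to record the user's language preference.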
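The prompts used later follow a simple rule: use user_language if it is set, otherwise reply in the language of the query. A minimal sketch of that rule, assuming a crude character-range check for Chinese (the helper name and the heuristic are illustrative, not Coze behavior):

```python
import re
from typing import Optional

# Any CJK Unified Ideograph counts as "Chinese" in this rough heuristic.
CJK_CHAR = re.compile(r"[\u4e00-\u9fff]")


def resolve_language(user_language: Optional[str], query: str) -> str:
    """Prefer the stored user_language; otherwise guess from the query (hypothetical)."""
    if user_language:  # the Variable has been set for this user
        return user_language
    return "Chinese" if CJK_CHAR.search(query) else "English"


print(resolve_language(None, "介绍一下扩散模型"))  # Chinese
```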

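3 Translate the user's query into English for arXiv

The arxiv plugin officially requires English input, so I added an LLM block here as a translation assistant: it translates the query from the user's preferred language (the user_language variable) into English.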
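The translation step can be sketched as a plain function, assuming a generic `llm(prompt) -> str` callable. The early return for English or pure-ASCII input is an added shortcut that the original workflow does not have; it only avoids an unnecessary model call.

```python
from typing import Callable


def translate_to_english(query: str, user_language: str, llm: Callable[[str], str]) -> str:
    """Translate the query into English for the arxiv plugin (hypothetical helper)."""
    # Shortcut (an assumption): treat English or pure-ASCII input as already English.
    if user_language.lower() == "english" or query.isascii():
        return query
    prompt = (
        f"Translate the following {user_language} text into English. "
        "Output only the translation, nothing else.\n\n"
        f"{query}"
    )
    return llm(prompt).strip()
```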

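4 Call the arxiv search plugin to find relevant papers

This step uses the "arxiv" plugin provided by Coze. Its count parameter controls how many papers the search returns; we set it to 9.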
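How the Coze arxiv plugin works internally is not described here. For readers who want to reproduce the search step outside Coze, a minimal sketch against arXiv's public Atom API (`export.arxiv.org/api/query`, where `max_results` plays the role of count) might look like this. Mapping each feed entry to title, summary, author, and link is an assumption chosen to match the fields the next step reads.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace used by the arXiv API


def build_url(query: str, count: int = 9) -> str:
    params = {"search_query": f"all:{query}", "start": 0, "max_results": count}
    return "https://export.arxiv.org/api/query?" + urllib.parse.urlencode(params)


def parse_feed(xml_data: bytes) -> list[dict]:
    """Map each Atom <entry> to the fields the formatting step expects."""
    root = ET.fromstring(xml_data)
    items = []
    for entry in root.findall(f"{ATOM}entry"):
        names = [a.findtext(f"{ATOM}name", "") for a in entry.findall(f"{ATOM}author")]
        items.append({
            # Collapse the line breaks arXiv puts inside titles and abstracts.
            "title": " ".join(entry.findtext(f"{ATOM}title", "").split()),
            "summary": " ".join(entry.findtext(f"{ATOM}summary", "").split()),
            "author": ", ".join(names),
            "link": entry.findtext(f"{ATOM}id", ""),  # abstract page URL
        })
    return items


def search_arxiv(query: str, count: int = 9) -> list[dict]:
    # Network call; arXiv asks API clients to keep their request rate low.
    with urllib.request.urlopen(build_url(query, count), timeout=30) as resp:
        return parse_feed(resp.read())
```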

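5 Format the search results

The arxiv search plugin returns structured data, so here we use a "Code" block to insert a short piece of code that formats the search results into two strings for later use. One string (retrieved_contexts) concatenates the information from each search result; the other (references) concatenates the links to the papers found. The former is inserted into the LLM's prompt; the latter goes into the Workflow's final output, i.e., the list of reference links you see in the reply.

The code is simple; I just found an example and tweaked it.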


# Coze "Code" block (Python). Args and Output are provided by the Coze
# runtime, so no imports are needed here.
async def main(args: Args) -> Output:
    params = args.params
    # Paper list returned by the arxiv plugin (empty list if missing).
    raw_results = params.get("items") or []

    # Keep only results that have every field we need.
    filtered_results = [
        r for r in raw_results
        if r.get("title") and r.get("link") and r.get("summary") and r.get("author")
    ]

    # One numbered YAML-style block per paper; the LLM cites them as [i].
    result_template = """[{i}]
```YAML
Title: {title}
Summary: {summary}
Author: {author}
Link: {link}
```"""

    retrieved_contexts = "\n\n".join(
        result_template.format(
            i=i + 1,
            title=r["title"],
            summary=r["summary"],
            author=r["author"],
            link=r["link"],
        )
        for i, r in enumerate(filtered_results)
    )

    # Markdown reference list, one line per paper: [1][Title](link)
    references = "\n".join(
        f"[{i + 1}][{r['title']}]({r['link']})"
        for i, r in enumerate(filtered_results)
    )

    ret: Output = {
        "retrieved_contexts": retrieved_contexts,
        "references": references,
    }
    return ret


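6 Call the LLM to generate a reply

This is an "LLM" block. The international version uses GPT-3.5; the China version uses ByteDance's Skylark model. The block has four parameters in total: three of them, retrieved_contexts, query, and user_language, were prepared in the earlier steps. Then comes the most important part, the prompt: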


As a discerning reader, you possess the ability to meticulously analyze information from a plethora of sources, pinpoint the most significant details, and assess their veracity. Your approach to complex queries is that of a logical thinker, relying on evidence rather than fallible intuition to form conclusions. Additionally, you excel as a professional writer, skillfully organizing your thoughts and arguments coherently, ensuring that your prose is engaging and far from dull.

-----

You are given a user query, and please write clean, concise and accurate response to the query.

* Your response must be correct, accurate and written by an expert using an unbiased and professional tone. Do not give any information that is not related to the query, and do not repeat.

* Your response MUST be written in the language the user prefers: {{user_language}}. If the user does not specify any preferred language, use the same language that the user uses in their query.

-----

You will be given a set of related contexts to the query retrieved from the web, each starting with a heading like "[i]", where `i` is the index of this citation which is a number. Please use the context and cite the context at the end of each sentence if applicable. Please cite the contexts with the indexes of citation, in the format [i]. If a sentence comes from multiple contexts, please list all applicable citations, like [3][5].

Here is the user query: {{query}}

And here are the set of retrieved contexts:

{{retrieved_contexts}}

Additional requirements for how to use these contexts:

* Don't blindly repeat these contexts verbatim. Use it as a source of evidence for your reasoning process.

* You MUST write your own response. Do NOT merely provide the citation.

* Say "information is missing on" followed by the related topic, if the given contexts do not provide sufficient information.

-----

Remember your response MUST be written in the language the user prefers. Here is the user query: {{query}}

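Coze fills the {{...}} placeholders in this prompt with the block's input parameters. Purely to illustrate the mechanism (this is not Coze's code), a regex-based renderer could look like this; leaving unknown placeholders untouched is a design choice of the sketch.

```python
import re


def render(template: str, variables: dict[str, str]) -> str:
    """Fill {{name}} placeholders; unknown names are left as-is."""
    def substitute(match):
        name = match.group(1)
        return variables.get(name, match.group(0))
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


prompt = render(
    "Language: {{user_language}}\nQuery: {{query}}",
    {"user_language": "English", "query": "graph neural networks"},
)
print(prompt)
```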

Both this prompt and the overall approach of this article draw on the code and workflows of two experts. One is Jia Yangqing, whose code recreates Perplexity in 500 lines of Python (https://github.com/leptonai/search_with_lepton/blob/main/search_with_lepton.py).

The other is a blogger who shares hands-on Coze experience: https://mp.weixin.qq.com/s?__biz=MzU5MDM4ODIxMw==&mid=2247484056&idx=1&sn=885ada69ce18a45aaa7078b132db2704&chksm=fe3e4c02c949c514469ca8155e6963193961fc6c4fe0dd09025774b602bd3276e39d10b98545&scene=21#wechat_redirect

This LLM block outputs its summary of the papers as a variable named response, which is a string.
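Because the prompt asks for citations in the form [i] (or [3][5] for several sources), the response string can be checked mechanically. This is an optional extra, not part of the author's workflow: a small sketch that lists which context indices the response cites and flags any index outside the range of retrieved papers. Here `n_contexts` would be the number of papers that survived filtering (at most 9).

```python
import re


def check_citations(response: str, n_contexts: int) -> tuple[list[int], list[int]]:
    """Return (valid, out_of_range) citation indices written as [i] in the response."""
    cited = sorted({int(num) for num in re.findall(r"\[(\d+)\]", response)})
    valid = [i for i in cited if 1 <= i <= n_contexts]
    out_of_range = [i for i in cited if not 1 <= i <= n_contexts]
    return valid, out_of_range


print(check_citations("Method X works well [1]. It also scales [3][5]. See [12].", 9))
# ([1, 3, 5], [12])
```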

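7 Set the Workflow's final output

The Workflow's final output is the concatenation of two parts: response, the LLM's reply to the user's question based on the search results, and references, the list of reference links.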
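In plain Python, the concatenation in this step amounts to something like the following. The blank line and the "References:" label are assumptions about the layout; the original only says the two parts are joined.

```python
def final_output(response: str, references: str) -> str:
    """Join the LLM answer and the reference list (hypothetical layout)."""
    if not references:  # no paper survived filtering, so skip the empty section
        return response
    return f"{response}\n\nReferences:\n{references}"


print(final_output("Diffusion models are ... [1]", "[1][Paper A](https://example.org/paper-a)"))
```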

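8 Integrate the Workflow into the Bot

The Bot design for ArXiv Assistant is fairly simple. It mainly relies on our Workflow, the search_and_answer designed above, and does not add many other plugins, so it stays focused on retrieval.

The bot's prompt is as follows: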


# Your Persona

Greetings, seeker of knowledge! I am Dr. Know, your guide to the vast expanse of information. In a world brimming with questions, I stand as a beacon of enlightenment, ready to illuminate the shadows of uncertainty.

# Your Capabilities

## search_and_answer

Your most important capability is `search_and_answer`. When a user asks you a question or inquires about certain topics or concepts, you should ALWAYS search the web before providing a response. However, when a user asks you to DO SOMETHING, like translation, summarization, etc., you must decide whether it is reasonable to use the `search_and_answer` capability to enhance your ability to perform the task.

ALWAYS search the web with the exact original user query as the `query` argument. For example, if the user asks "介绍一下Stephen Wolfram的新书 What Is ChatGPT Doing ... and Why Does It Work?", then the `query` parameter of `search_and_answer` should be exactly this sentence without any changes.

# How to Interact with the User

Communicate with the user and search the web using the language the user prefers, which is set in the variable `user_language`. If this variable is not set, use the same language that the user uses in their query.

You need to add a summary answer to each user question.

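Coze wires the Workflow into the bot through its own UI, so no schema has to be written by hand. Purely as an analogy, expressing the same capability as an OpenAI-style function-calling tool definition (an assumption for illustration, not Coze's format) makes the key rule from the prompt explicit: `query` carries the user's message verbatim. The helper `make_tool_call` is hypothetical.

```python
import json

# OpenAI-style tool definition, used here only as an analogy for the Coze setup.
SEARCH_AND_ANSWER_TOOL = {
    "type": "function",
    "function": {
        "name": "search_and_answer",
        "description": "Search arXiv and answer the user's question with citations.",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {
                    "type": "string",
                    "description": "The user's original message, copied exactly.",
                },
            },
            "required": ["query"],
        },
    },
}


def make_tool_call(user_message: str) -> dict:
    """Build call arguments without rewriting the user's words (hypothetical helper)."""
    arguments = json.dumps({"query": user_message}, ensure_ascii=False)
    return {"name": "search_and_answer", "arguments": arguments}
```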

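Final thoughts:

ArXiv Assistant is my first hands-on Agent project, and I was deeply impressed by how complete and natural Coze's capabilities are. Bots, workflows, plugins, and knowledge bases are integrated so well on the Agent platform that the concepts explain themselves: I barely opened ByteDance's developer manual and got by with the hands-on tutorials I found. Moving between these modules is also natural. Clicking any component opens its corresponding page, and inside Coze they all sit side by side as peers, with no complex nesting exposed.

Coze's plugin and bot stores are not very rich yet. There are some basic star plugins, and the ecosystem is quietly starting to grow; I believe that as more plugins appear, Coze will give rise to more interesting applications and scenarios.

As a product manager, I can really feel the leap that large-model technology brings; this may be the moment when product managers can start building their own products independently. In this project, for example, the LLM handles translation between any languages and helps generate code. For the Variable component, I simply typed in natural language "help me record the user's usual language," and the variable was defined. If there is a product you have wanted to try building as a product manager, now is a great time to get hands-on.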
