

Chat凉宫春日 Chat-Haruhi-Suzumiya
Reviving Anime Character in Reality via Large Language Model
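Chat-Haruhi-Suzumiya (Chat凉宫春日) is a language model that imitates Haruhi Suzumiya and a range of other anime characters, chatting with a similar tone, personality, and storyline.
This project was developed by 李鲁鲁, 冷子昂, 闫晨曦, 封小洋, scixing, 沈骏一, Aria Fei, 王皓, 米唯实, 冷月, JunityZhan, 贾曜恺, 吴平宇, 孙浩甄, and others.
It is an open-source project whose members were recruited from open-source communities such as DataWhale.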
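李鲁鲁 (Cheng Li@SenseTime) initiated the project and designed and implemented most of its features.
冷子昂 (Ziang Leng@SenseTime) designed and implemented the overall training, data generation, and backend architecture of ChatHaruhi 1.0.
闫晨曦 (Chenxi Yan@Chengdu University of Information Technology) implemented and maintained the backend of ChatHaruhi 1.0.
沈骏一 (Junyi Shen@Zhejiang University) implemented the training code and contributed to generating the training dataset.
王皓 (Hao Wang) collected the script data for 武林外传 (My Own Swordsman) and contributed to generating the augmented data.
米唯实 (Weishi MI@Tsinghua University) contributed to augmented data generation, implemented ChatHaruhi 2.0 (with 李鲁鲁), and uploaded it to the shared PyPI account.
Yaying Fei (Aria Fei@Beijing University of Technology) implemented the ASR feature of the script tool and took part in the Openness-Aware Personality paper side project.
封小洋 (Xiaoyang Feng@Nanjing Agricultural University) integrated the features of the script recognition tool and took part in the Openness-Aware Personality paper side project.
冷月 (Song Yan) collected the data for The Big Bang Theory and implemented the script format conversion feature.
scixing (汪好盛, HaoSheng Wang) implemented voiceprint recognition in the script tool, as well as tts-vits speech synthesis.
Linkang Zhan (JunityZhan@Case Western Reserve University) collected the system prompts and story data for Genshin Impact.
贾曜恺 (Yaokai Jia) implemented the Vue frontend and put GPU-based BERT extraction into practice in the psychology project.
吴平宇 (Pingyu Wu@Juncai Shuyun) helped deploy the first version of the training code.
孙浩甄 (Haozhen Sun@Tianjin University) drew the collage of ChatHaruhi characters.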
This notebook demonstrates the basic usage of ChatHaruhi 2.0.
It uses the OpenAI GPT-3.5 Turbo API, accessed through LangChain, as the language model.
First, install the dependencies with pip.
Install and configure the environment
We have recently removed chromadb.
The Haruhi-2-Dev repository is then cloned into /content/Haruhi-2-Dev, which becomes the working directory.
You can also install the library with pip install chatharuhi.
In that case, change
from ChatHaruhi import ChatHaruhi
to
from chatharuhi import ChatHaruhi
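If the same notebook has to run against both the cloned repository and the pip package, the module name can be picked at run time. The following is only a minimal sketch: `find_chatharuhi_module` is a hypothetical helper, not part of the library, and it assumes the two module names mentioned above are the only candidates.

```python
import importlib
import importlib.util


def find_chatharuhi_module(candidates=("chatharuhi", "ChatHaruhi")):
    """Return the first importable module name from candidates, or None."""
    for name in candidates:
        # find_spec returns None when a top-level module cannot be located.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


module_name = find_chatharuhi_module()
if module_name is not None:
    # Import the class from whichever module is available.
    ChatHaruhi = importlib.import_module(module_name).ChatHaruhi
```

When neither module is installed, `module_name` is `None` and nothing is imported, so the caller can report a clear installation hint instead of a raw ImportError.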
role_from_hf is now the recommended way to load a character.
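Sample response:
春日:「哦,你是来向我请教问题的吗?还是有什么事情需要我帮忙的?」
(Haruhi: "Oh, have you come to ask me a question? Or is there something you need my help with?")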
The roles are stored at
https://huggingface.co/datasets/silk-road/ChatHaruhi-RolePlaying
This Hugging Face repository holds 32 characters; you can find other characters at
https://github.com/LC1332/Chat-Haruhi-Suzumiya/tree/main/characters/novel_collecting
If you have downloaded a jsonl file locally, you can load it through this interface:
from chatharuhi import ChatHaruhi
chatbot = ChatHaruhi(role_from_jsonl="Your Local File",
                     llm='openai',
                     verbose=True)
For a character that has not yet been built in the ChatHaruhi jsonl format,
you need to prepare
- all story files, collected in one folder
- the system prompt
and then use this interface:
from chatharuhi import ChatHaruhi
text_folder = '/content/Haruhi-2-Dev/data/characters/haruhi/texts'
system_prompt = '/content/Haruhi-2-Dev/data/characters/haruhi/system_prompt.txt'
chatbot = ChatHaruhi(system_prompt=system_prompt,
                     llm='debug',
                     story_text_folder=text_folder)
chatbot.chat(role='阿虚', text='Haruhi, 你好啊')
See the example at https://github.com/LC1332/Haruhi-2-Dev/blob/main/notebook/test_PrintLLM.ipynb
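Before building a chatbot from raw files, it can help to confirm that the two inputs are actually in place. The sketch below is illustrative only: `check_character_files` is a hypothetical helper, not part of chatharuhi, and the `.txt` suffix for story files is an assumption that the text above does not state.

```python
from pathlib import Path


def check_character_files(text_folder, system_prompt, suffix=".txt"):
    """Return the sorted story files, raising if either input is missing.

    Assumption: story files share one suffix (".txt" by default).
    """
    folder = Path(text_folder)
    prompt = Path(system_prompt)
    if not folder.is_dir():
        raise FileNotFoundError(f"story folder not found: {folder}")
    if not prompt.is_file():
        raise FileNotFoundError(f"system prompt not found: {prompt}")
    stories = sorted(
        p for p in folder.iterdir() if p.is_file() and p.suffix == suffix
    )
    if not stories:
        raise ValueError(f"no {suffix} story files in {folder}")
    return stories
```

Calling `check_character_files(text_folder, system_prompt)` with the two paths used above returns the list of story files, or raises an error that names the missing piece.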
Run with a Local Model
See this notebook:
https://github.com/LC1332/Chat-Haruhi-Suzumiya/blob/main/notebook/ChatHaruhi_x_Qwen7B.ipynb
Embedding Support
Currently we support
- "luotuo_openai": a distilled LuotuoBert model for Chinese, and the OpenAI API (text-embedding-ada-002) for English
- "bge_en": bge_small_en_v1.5
- "bge_zh": bge_small_zh_v1.5
At present, the jsonl file must already contain embeddings that match the selected embedding option.
We are developing an adapter that aims to support any-to-any conversion.
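Because the embeddings must already be present in the jsonl file, a quick scan can show which embedding options a file can serve. This is a rough sketch under a loudly stated assumption: it supposes each line is a JSON object that stores an embedding under a key equal to the embedding name. That layout is hypothetical and may differ from the real ChatHaruhi jsonl format. `embeddings_in_every_record` is likewise a hypothetical helper.

```python
import json

# Embedding option names as listed above.
SUPPORTED_EMBEDDINGS = ("luotuo_openai", "bge_en", "bge_zh")


def embeddings_in_every_record(jsonl_lines, names=SUPPORTED_EMBEDDINGS):
    """Return the embedding names found as keys in every non-empty record.

    Assumption (hypothetical layout): one JSON object per line, with each
    embedding stored under a key equal to its embedding name.
    """
    common = set(names)
    seen_any = False
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        common &= set(record)  # keep only names present in this record too
        seen_any = True
    return common if seen_any else set()


sample = [
    '{"text": "story A", "bge_zh": "...", "bge_en": "..."}',
    '{"text": "story B", "bge_zh": "..."}',
]
usable = embeddings_in_every_record(sample)
```

Here `usable` contains only `"bge_zh"`, since it is the one name present in both sample records. For a real file, the lines can be passed in by iterating over the opened file object.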



