ChatHaruhi2_demo
Public
Tags: Chat
xuxh@dp.tech
Updated 2024-08-27
Recommended image: Basic Image:bohrium-notebook:2023-04-07
Recommended machine type: c2_m4_cpu

Chat凉宫春日 Chat-Haruhi-Suzumiya

Reviving Anime Character in Reality via Large Language Model

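Chat凉宫春日 (Chat Haruhi Suzumiya) is a language model that imitates Haruhi Suzumiya and a range of other anime characters, chatting with a tone, personality, and storyline close to the originals.

The project was developed by 李鲁鲁, 冷子昂, 闫晨曦, 封小洋, scixing, 沈骏一, Aria Fei, 王皓, 米唯实, 冷月, JunityZhan, 贾曜恺, 吴平宇, 孙浩甄, and others.

This is an open-source project; all members were recruited from open-source communities such as DataWhale.

李鲁鲁 (Cheng Li@SenseTime) initiated the project and designed and implemented most of its features.

冷子昂 (Ziang Leng@SenseTime) designed and implemented the training, data generation, and backend architecture of ChatHaruhi 1.0.

闫晨曦 (Chenxi Yan@Chengdu University of Information Technology) implemented and maintained the ChatHaruhi 1.0 backend.

沈骏一 (Junyi Shen@Zhejiang University) implemented the training code and helped generate the training dataset.

王皓 (Hao Wang) collected the script data for 武林外传 (My Own Swordsman) and helped generate the augmented data.

米唯实 (Weishi MI@Tsinghua University) helped generate the augmented data, implemented ChatHaruhi 2.0 (with 李鲁鲁), and published it to the shared PyPI account.

Yaying Fei (Aria Fei@Beijing University of Technology) implemented the ASR feature of the script tool and took part in the Openness-Aware Personality paper spin-off project.

封小洋 (Xiaoyang Feng@Nanjing Agricultural University) integrated the features of the script-recognition tool and took part in the Openness-Aware Personality paper spin-off project.

冷月 (Song Yan) collected the data for The Big Bang Theory and implemented script format conversion.

scixing (汪好盛, HaoSheng Wang) implemented voiceprint (speaker) recognition in the script tool, as well as tts-vits speech synthesis.

Linkang Zhan (JunityZhan@Case Western Reserve University) collected the system prompts and story data for Genshin Impact.

贾曜恺 (Yaokai Jia) implemented the Vue frontend and applied GPU-based BERT extraction in the psychology project.

吴平宇 (Pingyu Wu@Juncai Shuyun) helped deploy the first version of the training code.

孙浩甄 (Haozhen Sun@Tianjin University) drew the ChatHaruhi character collage.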

This notebook demonstrates the basic usage of ChatHaruhi 2.0.

It uses OpenAI's gpt-3.5-turbo API, accessed through LangChain, as the language model.

First, install the dependencies with pip install

Install and configure environment

We have recently removed the chromadb dependency.

!pip -q install openai tiktoken langchain datasets
%cd /content
!rm -rf /content/Haruhi-2-Dev
!git clone https://github.com/LC1332/Haruhi-2-Dev
%cd /content/Haruhi-2-Dev

Alternatively, you can install the library with pip install chatharuhi.

In that case, change

from ChatHaruhi import ChatHaruhi

to

from chatharuhi import ChatHaruhi
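If the same notebook may run either against the cloned repo or against the pip package, a small fallback loader can try both module names. This is a sketch that relies only on what the text states: the pip package exposes `ChatHaruhi` from `chatharuhi`, and the cloned Haruhi-2-Dev repo exposes it from `ChatHaruhi`. The helper name `load_chatharuhi_class` is hypothetical.

```python
import importlib

# Module names taken from the text: "chatharuhi" for the pip package,
# "ChatHaruhi" when importing from the cloned Haruhi-2-Dev repo.
CANDIDATE_MODULES = ("chatharuhi", "ChatHaruhi")


def load_chatharuhi_class(candidates=CANDIDATE_MODULES):
    """Hypothetical helper: return the ChatHaruhi class from the first importable module."""
    errors = []
    for module_name in candidates:
        try:
            module = importlib.import_module(module_name)
        except ImportError as exc:
            errors.append(f"{module_name}: {exc}")
            continue
        return getattr(module, "ChatHaruhi")
    raise ImportError("ChatHaruhi not found; tried " + "; ".join(errors))
```

Usage, assuming one of the two installation routes has been followed: `ChatHaruhi = load_chatharuhi_class()`.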
import os

# Replace the placeholder below with your own OpenAI API key
key = "sk-Wafs"
os.environ["OPENAI_API_KEY"] = key
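Hard-coding the key in a notebook makes it easy to leak when the notebook is shared. A minimal alternative sketch uses an explicitly passed key if one is given, otherwise whatever is already in the environment, and fails early if neither exists. `OPENAI_API_KEY` is the variable the cell above sets; the helper name `ensure_openai_key` is hypothetical.

```python
import os


def ensure_openai_key(key=None, env_var="OPENAI_API_KEY"):
    """Hypothetical helper: make sure OPENAI_API_KEY is set, preferring an explicit key."""
    key = key or os.environ.get(env_var)
    if not key:
        raise RuntimeError(f"Set {env_var} or pass key=... before creating the chatbot.")
    os.environ[env_var] = key  # the OpenAI Python client reads its key from this variable
    return key
```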

role_from_hf is now the recommended way to load a character.

from ChatHaruhi import ChatHaruhi

chatbot = ChatHaruhi(role_from_hf="silk-road/ChatHaruhi-RolePlaying/haruhi",
                     llm='openai',
                     verbose=True)
# role: who is speaking to Haruhi (阿虚 is Kyon); text: the message ("Haruhi, hello")
response = chatbot.chat(role='阿虚', text='Haruhi, 你好啊')
print(response)
Downloading Luotuo-Bert
A new version of the following files was downloaded from https://huggingface.co/silk-road/luotuo-bert-medium:
- models.py
. Make sure to double-check they do not contain any added malicious code. To avoid downloading new versions of the code file, you can pin a revision.
Luotuo-Bert download complete
春日:「哦,你是来向我请教问题的吗?还是有什么事情需要我帮忙的?」
(Haruhi: "Oh, have you come to ask me a question? Or is there something you need my help with?")
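In the output above, the reply comes back as the speaker's name followed by the utterance in 「」 corner brackets. If you only want the line of dialogue (for example, to feed a speech-synthesis step), a small parser can split the two. This sketch is based solely on the format of this one output; the library does not promise that format, and `split_speaker` is a hypothetical helper.

```python
import re

# Observed format: 春日:「...」 (speaker, colon, utterance in corner brackets).
# Accept both the ASCII ":" and the full-width ":" colon.
_REPLY_PATTERN = re.compile(r"^\s*([^::「」]+?)\s*[::]\s*「(.*)」\s*$", re.S)


def split_speaker(response):
    """Hypothetical helper: return (speaker, utterance); speaker is None if the format differs."""
    match = _REPLY_PATTERN.match(response)
    if match is None:
        return None, response.strip()
    return match.group(1), match.group(2)
```

Usage: `speaker, line = split_speaker(response)`.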

The role is stored in the Hugging Face dataset

https://huggingface.co/datasets/silk-road/ChatHaruhi-RolePlaying

This repo contains 32 characters; you can find other characters in

https://github.com/LC1332/Chat-Haruhi-Suzumiya/tree/main/characters/novel_collecting
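Judging from the example, the `role_from_hf` string appears to be the dataset repo id (`silk-road/ChatHaruhi-RolePlaying`) followed by the character name (`haruhi`). The sketch below splits such a string and rebuilds the dataset URL given above. This reading of the format is an inference from this notebook, not documented library behavior, and `parse_role_from_hf` / `HFRole` are hypothetical names.

```python
from typing import NamedTuple


class HFRole(NamedTuple):
    repo_id: str  # e.g. "silk-road/ChatHaruhi-RolePlaying"
    role: str     # e.g. "haruhi"

    @property
    def dataset_url(self):
        return f"https://huggingface.co/datasets/{self.repo_id}"


def parse_role_from_hf(spec):
    """Hypothetical helper: split "<owner>/<dataset>/<role>" into repo id and role name."""
    parts = spec.strip("/").split("/")
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected '<owner>/<dataset>/<role>', got {spec!r}")
    owner, dataset, role = parts
    return HFRole(f"{owner}/{dataset}", role)
```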

If you have downloaded a local jsonl file, you can load it like this:

# Replace "Your Local File" with the path to your .jsonl file
chatbot = ChatHaruhi(role_from_jsonl="Your Local File",
                     llm='openai',
                     verbose=True)
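Before passing a downloaded file to `role_from_jsonl`, it can help to confirm that every line parses as a JSON object (the JSON Lines convention: one JSON value per line). The notebook does not describe the record fields, so this sketch makes no assumptions about them and only reports which top-level keys it finds; `inspect_jsonl` is a hypothetical helper.

```python
import json
from collections import Counter


def inspect_jsonl(path):
    """Hypothetical helper: parse a .jsonl file and count how often each top-level key appears."""
    key_counts = Counter()
    n_records = 0
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                record = json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"line {line_no} is not valid JSON: {exc}") from exc
            if not isinstance(record, dict):
                raise ValueError(f"line {line_no} is not a JSON object")
            key_counts.update(record.keys())
            n_records += 1
    return n_records, key_counts
```

Usage: `n, keys = inspect_jsonl("Your Local File")`, then check that `n` matches the number of entries you expect.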

For a character that has not yet been built in the ChatHaruhi jsonl format, you need to prepare

  • a folder containing all the story files
  • the system prompt

and then use this interface:

from chatharuhi import ChatHaruhi

text_folder = '/content/Haruhi-2-Dev/data/characters/haruhi/texts'
system_prompt = '/content/Haruhi-2-Dev/data/characters/haruhi/system_prompt.txt'

# llm='debug' is the setting used in the linked test_PrintLLM example
chatbot = ChatHaruhi(system_prompt=system_prompt,
                     llm='debug',
                     story_text_folder=text_folder)

chatbot.chat(role='阿虚', text='Haruhi, 你好啊')

See an example at https://github.com/LC1332/Haruhi-2-Dev/blob/main/notebook/test_PrintLLM.ipynb
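A quick pre-flight check can catch path mistakes before the chatbot is built. The sketch below assumes, based only on the example paths, that the system prompt is a single text file and that each story is a `.txt` file directly inside the folder; the library may accept other layouts, and `check_character_inputs` is a hypothetical helper.

```python
from pathlib import Path


def check_character_inputs(system_prompt, story_text_folder, pattern="*.txt"):
    """Hypothetical helper: verify the prompt file and story folder, return (prompt_text, story_paths)."""
    prompt_path = Path(system_prompt)
    folder = Path(story_text_folder)
    if not prompt_path.is_file():
        raise FileNotFoundError(f"system prompt not found: {prompt_path}")
    if not folder.is_dir():
        raise NotADirectoryError(f"story folder not found: {folder}")
    stories = sorted(folder.glob(pattern))  # assumption: stories are .txt files
    if not stories:
        raise ValueError(f"no files matching {pattern!r} in {folder}")
    return prompt_path.read_text(encoding="utf-8"), stories
```

Usage: `prompt, stories = check_character_inputs(system_prompt, text_folder)`.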

Embedding Support

Currently supported embeddings:

  • "luotuo_openai": a distilled LuotuoBert model for Chinese, and the OpenAI API (text-embedding-ada-002) for English
  • "bge_en": bge_small_en_v1.5
  • "bge_zh": bge_small_zh_v1.5

At present, the jsonl character library must already contain embeddings computed with the corresponding model.

We are developing an adapter that aims to support any-to-any conversion between these embeddings.
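Why the embeddings must match: retrieval compares the query vector with the stored story vectors, and vectors produced by different models (for example bge_small_zh_v1.5 versus text-embedding-ada-002) live in different spaces and often have different dimensions, so similarity scores across them are meaningless. The sketch below illustrates that constraint with plain cosine similarity. The record layout `(text, vector, embedding_name)` and the functions `cosine` and `retrieve` are assumptions for illustration, not ChatHaruhi's internal format.

```python
import math

# Embedding names listed as supported in the text above.
SUPPORTED = {"luotuo_openai", "bge_en", "bge_zh"}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors have different dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, query_embedding, stories, k=2):
    """Hypothetical retrieval: rank (text, vector, embedding_name) records by cosine similarity.

    Refuses to mix embeddings, mirroring the requirement that the jsonl library
    store vectors produced by the same embedding the chatbot uses.
    """
    if query_embedding not in SUPPORTED:
        raise ValueError(f"unsupported embedding: {query_embedding}")
    mismatched = [text for text, _, name in stories if name != query_embedding]
    if mismatched:
        raise ValueError(f"{len(mismatched)} stored records use a different embedding")
    ranked = sorted(stories, key=lambda s: cosine(query_vec, s[1]), reverse=True)
    return [text for text, _, _ in ranked[:k]]
```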
