Fastai Hub
Fast AI
Hugging Face
dingzh@dp.tech
Published on 2023-06-12

fastai_hf_colab.png

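Last modified by: dingzh@dp.tech

Description: This tutorial is largely based on the Hugging Face notebook and can be run directly on Bohrium Notebook. Click the blue Connect button at the top of the page, select the bohrium-notebook:2023-04-07 image and any GPU node configuration, and wait a moment for the notebook to start. If you run into any problems, contact bohrium@dp.tech

License: This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.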

Few have done as much as the fast.ai ecosystem to make Deep Learning accessible. Our mission at Hugging Face is to democratize good Machine Learning. Let's make exclusivity in access to Machine Learning, including pre-trained models, a thing of the past and let's push this amazing field even further.

fastai is an open-source Deep Learning library that leverages PyTorch and Python to provide high-level components to train fast and accurate neural networks with state-of-the-art outputs on text, vision, and tabular data. However, fast.ai, the company, is more than just a library; it has grown into a thriving ecosystem of open source contributors and people learning about neural networks. For example, check out their book and courses, and join the fast.ai Discord and forums. You are guaranteed to learn by being part of their community!

Because of all this, and more (the writer of this post started his journey thanks to the fast.ai course), we are proud to announce that fastai practitioners can now share and upload models to Hugging Face Hub with a single line of Python.

👉 In this post, we will introduce the integration between fastai and the Hub.

We want to thank the fast.ai community, notably Jeremy Howard, Wayde Gilliam, and Zach Mueller for their feedback 🤗. This blog is heavily inspired by the Hugging Face Hub section in the fastai docs.

Why share to the Hub?

The Hub is a central platform where anyone can share and explore models, datasets, and ML demos. It has the most extensive collection of Open Source models, datasets, and demos.

Sharing on the Hub amplifies the impact of your fastai models by making them available for others to download and explore. You can also use transfer learning with fastai models; load someone else's model as the basis for your task.

Anyone can access all the fastai models in the Hub by filtering the hf.co/models webpage by the fastai library, as in the image below.

Screen Shot 2022-04-26 at 23.28.02.png

In addition to free model hosting and exposure to the broader community, the Hub has built-in version control based on git (git-lfs, for large files) and model cards for discoverability and reproducibility. For more information on navigating the Hub, see this introduction.

Joining Hugging Face and installation

To share models in the Hub, you will need an account. Create one on the Hugging Face website.

The huggingface_hub library is a lightweight Python client with utility functions to interact with the Hugging Face Hub. To push fastai models to the Hub, you need some libraries pre-installed (fastai>=2.4, fastcore>=1.3.27 and toml). You can install them automatically by requesting the fastai extra when installing huggingface_hub, and your environment is good to go:

[ ]
!pip install "huggingface_hub[fastai]"
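
If you would like to confirm that the environment meets the minimum versions listed above before pushing anything, here is a minimal sketch using only the standard library. The helper names parse_version and check_requirements are ours, not part of huggingface_hub, and the version parsing is deliberately simple: it stops at the first non-numeric part, so pre-release tags are ignored.

```python
from importlib import metadata

# Minimum versions quoted in the text above; toml has no stated minimum.
REQUIREMENTS = {"fastai": (2, 4), "fastcore": (1, 3, 27), "toml": ()}


def parse_version(text):
    """Turn '1.3.27' into (1, 3, 27), stopping at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def check_requirements(requirements=REQUIREMENTS):
    """Return {package: problem} for every missing or outdated package."""
    problems = {}
    for name, minimum in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if parse_version(installed) < minimum:
            wanted = ".".join(map(str, minimum))
            problems[name] = f"found {installed}, need >= {wanted}"
    return problems


if __name__ == "__main__":
    print(check_requirements() or "fastai extras look fine")
```

An empty dictionary means nothing is missing; otherwise each entry says what to install or upgrade.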

📚 Creating a fastai Learner

Here we train the first model in the fastbook to identify cats 🐱. We highly recommend reading the entire fastbook.

[ ]
import fastai
print(f"We are using fastai version {fastai.__version__}")
[ ]
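# The six-line training example from chapter 1 of the fastbook.
from fastai.vision.all import *

# Download the Oxford-IIIT Pet dataset and point at its images folder.
path = untar_data(URLs.PETS)/'images'

# In this dataset, cat images have filenames that start with an uppercase letter.
def is_cat(x): return x[0].isupper()

dls = ImageDataLoaders.from_name_func(
    path, get_image_files(path), valid_pct=0.2, seed=42,
    label_func=is_cat, item_tfms=Resize(224))

# Fine-tune an ImageNet-pretrained ResNet-34 for one epoch.
learn = vision_learner(dls, resnet34, metrics=error_rate)
learn.fine_tune(1)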
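
To see what the label function does on its own, here is a minimal sketch that applies the same rule to a few filenames. The filenames are made up for illustration, in the naming style of the pet dataset (breed, underscore, index); nothing is downloaded and fastai is not needed.

```python
def is_cat(filename):
    """Same rule as the training cell: cat breeds start with a capital letter."""
    return filename[0].isupper()


# Hypothetical filenames in the style of the pet dataset.
examples = ["Abyssinian_12.jpg", "beagle_7.jpg", "Maine_Coon_3.jpg", "pug_101.jpg"]
labels = {name: is_cat(name) for name in examples}

for name, label in labels.items():
    print(f"{name:20s} -> {'cat' if label else 'dog'}")
```

The whole labelling step is this one check on the first character, which is why the training cell needs no separate annotation file.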

Sharing a Learner to the Hub

A Learner is a fastai object that bundles a model, data loaders, and a loss function. We will use the words Learner and Model interchangeably throughout this post.

First, log in to the Hugging Face Hub. You will need to create a write token in your Account Settings. Then there are three options to log in:

  1. Type huggingface-cli login in your terminal and enter your token.

  2. If you are in a Python notebook, you can use notebook_login.

[ ]
from huggingface_hub import notebook_login
notebook_login()

  3. Use the token argument of the push_to_hub_fastai function.

Call push_to_hub_fastai with the Learner you want to upload and the repository id for the Hub in the format "namespace/repo_name". The namespace can be an individual account or an organization you have write access to (for example, 'fastai/stanza-de'). For more details, refer to the Hub Client documentation.
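
As a rough illustration of the third login option and of the repo id format, here is a minimal sketch. It assumes the write token is stored in an environment variable (HF_TOKEN is the name chosen for this sketch), and split_repo_id and push_with_token are hypothetical helper names. The check enforces the two-part form described above and is not the Hub's own validation.

```python
import os


def split_repo_id(repo_id):
    """Split 'namespace/repo_name' into its two parts, or raise ValueError."""
    namespace, sep, name = repo_id.partition("/")
    if not sep or not namespace or not name or "/" in name:
        raise ValueError(f"expected 'namespace/repo_name', got {repo_id!r}")
    return namespace, name


def push_with_token(learner, repo_id, env_var="HF_TOKEN"):
    """Push a Learner using a token read from an environment variable.

    HF_TOKEN is an assumed variable name for this sketch.
    """
    split_repo_id(repo_id)  # fail early on a malformed id
    token = os.environ.get(env_var)
    if token is None:
        raise RuntimeError(f"set {env_var} to a write token first")
    # Imported lazily so split_repo_id works without huggingface_hub installed.
    from huggingface_hub import push_to_hub_fastai
    return push_to_hub_fastai(learner=learner, repo_id=repo_id, token=token)


print(split_repo_id("fastai/stanza-de"))
```

With a trained Learner in hand, push_with_token(learn, "YOUR_USERNAME/YOUR_LEARNER_NAME") would upload it without an interactive login.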

[ ]
from huggingface_hub import push_to_hub_fastai

# repo_id = "YOUR_USERNAME/YOUR_LEARNER_NAME"
repo_id = "dingzhaohan/identify-my-cat"

push_to_hub_fastai(learner=learn, repo_id=repo_id)

The Learner is now in the Hub in the repo named by repo_id (the original Hugging Face post used espejelomar/identify-my-cat). A model card is created automatically with some links and next steps. When uploading a fastai Learner (or any other model) to the Hub, it helps to edit its model card (image below) so that others can better understand your work (refer to the Hugging Face documentation).

Screen Shot 2022-05-05 at 15.05.31.png

If you want to learn more about push_to_hub_fastai, just run help(push_to_hub_fastai) in your notebook or go to the Hub Client documentation. There are some cool arguments you might be interested in 👀. Remember, your model is a Git repository with all the advantages that this entails: version control, commits, branches...

Loading a Learner from the Hugging Face Hub

Loading a model from the Hub is even simpler. We will load the Learner "espejelomar/identify-my-cat" and test it with a cat image (🦮?). This code is adapted from the first chapter of the fastbook.

First, upload an image of a cat (or possibly a dog?) using ipywidgets.

[ ]
%%capture
!pip install ipywidgets
[ ]
from ipywidgets import widgets

uploader = widgets.FileUpload()
uploader
[ ]
## This is your image:
img = PILImage.create(uploader.data[0])
img.to_thumb(100)
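
The cell above reads the upload through uploader.data[0]. To our understanding, that attribute belongs to older ipywidgets releases, while newer ones expose the files through uploader.value as a sequence of dictionaries with a "content" entry. The sketch below is a hedged way to support both shapes; first_upload_bytes is a hypothetical helper, and the two stand-in objects only imitate the widget so the example runs without a browser.

```python
from types import SimpleNamespace


def first_upload_bytes(uploader):
    """Return the raw bytes of the first uploaded file, whichever API is present."""
    data = getattr(uploader, "data", None)
    if data:  # older releases: a list of bytes objects
        return bytes(data[0])
    value = uploader.value
    if not value:
        raise ValueError("no file has been uploaded yet")
    # Newer releases: a sequence of dicts. Older releases: a dict keyed by filename.
    first = next(iter(value.values())) if isinstance(value, dict) else value[0]
    return bytes(first["content"])


# Stand-ins for the two widget shapes (assumptions, not real widgets).
old_style = SimpleNamespace(data=[b"old"], value={"cat.jpg": {"content": b"old"}})
new_style = SimpleNamespace(value=({"name": "cat.jpg", "content": memoryview(b"new")},))
print(first_upload_bytes(old_style), first_upload_bytes(new_style))
```

In the notebook this would be used as img = PILImage.create(first_upload_bytes(uploader)).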

Now let's load the Learner we just shared in the Hub and test it. This time we call it learner instead of learn.

[ ]
from huggingface_hub import from_pretrained_fastai

# repo_id = "YOUR_USERNAME/YOUR_LEARNER_NAME"
repo_id = "espejelomar/identify-my-cat"

learner = from_pretrained_fastai(repo_id)

It works 👇!

[ ]
_,_,probs = learner.predict(img)
print(f"Probability it is a cat: {100*probs[1].item():.2f}%")
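
The cell above reads probs[1] because the classes of this Learner are ordered [False, True], so index 1 is the "is a cat" class. A minimal sketch of the same lookup done by label instead of by position follows. The vocab and probs values are made-up stand-ins for learner.dls.vocab and the third item returned by learner.predict(img); probability_of is a hypothetical helper.

```python
def probability_of(label, vocab, probs):
    """Look up the probability of `label` by its position in the class list."""
    return float(probs[list(vocab).index(label)])


# Hypothetical values standing in for learner.dls.vocab and the
# probabilities returned by learner.predict(img).
vocab = [False, True]      # is_cat labels, in sorted order
probs = [0.0375, 0.9625]   # made-up numbers, not a real prediction

print(f"Probability it is a cat: {100 * probability_of(True, vocab, probs):.2f}%")
```

Looking the class up by label keeps the code correct if the class order ever differs from what you expect.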

The Hub Client documentation includes additional details on from_pretrained_fastai.
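
Because a Hub repo is a Git repository, a load can also be pinned to a particular branch, tag, or commit through the revision argument of from_pretrained_fastai. A minimal sketch, assuming huggingface_hub is installed with the fastai extra; load_learner is a hypothetical wrapper name.

```python
def load_learner(repo_id, revision=None):
    """Load a fastai Learner from the Hub, optionally at a fixed revision."""
    # Imported lazily so this definition can be read and imported
    # even where huggingface_hub is not installed.
    from huggingface_hub import from_pretrained_fastai
    return from_pretrained_fastai(repo_id, revision=revision)
```

For example, load_learner("espejelomar/identify-my-cat", revision="main") asks for the main branch explicitly; leaving revision as None uses the default.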

Blurr to mix fastai and Hugging Face Transformers (and share them)!

[Blurr is] a library designed for fastai developers who want to train and deploy Hugging Face transformers - Blurr Docs.

We will:

  1. Train a blurr Learner with the high-level Blurr API. It will load the distilbert-base-uncased model from the Hugging Face Hub and prepare a sequence classification model.
  2. Share it to the Hub under the repo id fastai/blurr_IMDB_distilbert_classification using push_to_hub_fastai.
  3. Load it with from_pretrained_fastai and try it with learner_blurr.predict().

Collaboration and open-source are fantastic!

First, install blurr and train the Learner.

[ ]
%%capture
!git clone https://github.com/ohmeow/blurr.git
%cd blurr
!pip install -e ".[dev]"
[ ]
import os
# Proxy for the dp.tech environment this notebook was run in; omit these two lines elsewhere.
os.environ['HTTP_PROXY'] = 'http://ga.dp.tech:8118'
os.environ['HTTPS_PROXY'] = 'http://ga.dp.tech:8118'
[ ]
import torch
import transformers
from fastai.text.all import *

from blurr.text.data.all import *
from blurr.text.modeling.all import *

path = untar_data(URLs.IMDB_SAMPLE)
model_path = Path("models")
imdb_df = pd.read_csv(path / "texts.csv")

learn_blurr = BlearnerForSequenceClassification.from_data(imdb_df, "distilbert-base-uncased", dl_kwargs={"bs": 4})
learn_blurr.fit_one_cycle(1, lr_max=1e-3)

Use push_to_hub_fastai to share with the Hub.

[ ]
from huggingface_hub import push_to_hub_fastai

# repo_id = "YOUR_USERNAME/YOUR_LEARNER_NAME"
repo_id = "fastai/blurr_IMDB_distilbert_classification"

push_to_hub_fastai(learner=learn_blurr, repo_id=repo_id)

Use from_pretrained_fastai to load a blurr model from the Hub.

[ ]
from huggingface_hub import from_pretrained_fastai

# repo_id = "YOUR_USERNAME/YOUR_LEARNER_NAME"
repo_id = "fastai/blurr_IMDB_distilbert_classification"

learner_blurr = from_pretrained_fastai(repo_id)

Try it with a couple of sentences and review their sentiment (negative or positive) with learner_blurr.predict().

[ ]
sentences = ["This integration is amazing!",
             "I hate this was not available before."]

probs = learner_blurr.predict(sentences)

print(f"Probability that sentence '{sentences[0]}' is negative is: {100*probs[0]['probs'][0]:.2f}%")
print(f"Probability that sentence '{sentences[1]}' is negative is: {100*probs[1]['probs'][0]:.2f}%")
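
The two print lines above index into the prediction list by hand. As a small sketch of the same idea for any number of sentences, the helper below pairs each sentence with the value at index 0 of its 'probs' entry, which the cell above treats as the negative class. The prediction values are made up, and their shape (a list of dictionaries with a 'probs' key) is assumed from how the cell above indexes the result; negative_probabilities is a hypothetical helper.

```python
def negative_probabilities(sentences, predictions):
    """Pair each sentence with the probability stored at index 0 of 'probs'."""
    return [(text, float(pred["probs"][0]))
            for text, pred in zip(sentences, predictions)]


sentences = ["This integration is amazing!",
             "I hate this was not available before."]
# Made-up predictions in the shape the cell above indexes into.
predictions = [{"probs": [0.02, 0.98]}, {"probs": [0.91, 0.09]}]

for text, p_negative in negative_probabilities(sentences, predictions):
    print(f"{p_negative:6.1%} negative: {text}")
```

With a real Learner, predictions would simply be the return value of learner_blurr.predict(sentences).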

What's next?

Take the fast.ai course (a new version is coming soon), follow Jeremy Howard and fast.ai on Twitter for updates, and start sharing your fastai models on the Hub 🤗. Or load one of the models that are already in the Hub.

📧 Feel free to contact us via the Hugging Face Discord and share if you have an idea for a project. We would love to hear your feedback 💖.

Would you like to integrate your library to the Hub?

This integration is made possible by the huggingface_hub library. If you want to add your library to the Hub, we have a guide for you! Or simply tag someone from the Hugging Face team.

A shout out to the Hugging Face team for all the work on this integration, in particular @osanseviero 🦙.

Thank you fastlearners and hugging learners 🤗.
