recognize_anything_demo
Tags: RAM, image recognition, RAM image recognition
xuxh@dp.tech
Updated 2024-07-31
Recommended image: Basic Image: bohrium-notebook:2023-04-07
Recommended machine type: c2_m4_cpu
🏷 Recognize Anything: A Strong Image Tagging Model & Tag2Text: Guiding Vision-Language Model via Image Tagging

©️ Copyright 2023 @ Authors
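Author: Xinyu Huang
License: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: Click the "Start Connection" button above, select the bohrium-notebook:2023-04-07 image and any CPU node configuration, then wait a moment for the notebook to become runnable.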


Official PyTorch Implementation of the Recognize Anything Model (RAM) and the Tag2Text Model.
  • RAM is an image tagging model that can recognize any common category with high accuracy.
  • Tag2Text is a tagging-guided vision-language model that supports captioning, retrieval, and tagging.
[1]
#@title Import dependencies
import ipywidgets as widgets
from IPython.display import clear_output, display, Image
import os
[2]
#@title Clone the repository
!git clone https://github.com/xinyu1205/recognize-anything.git
%cd recognize-anything
Cloning into 'Recognize_Anything-Tag2Text'...
remote: Enumerating objects: 398, done.
remote: Counting objects: 100% (234/234), done.
remote: Compressing objects: 100% (120/120), done.
remote: Total 398 (delta 145), reused 177 (delta 113), pack-reused 164
Receiving objects: 100% (398/398), 9.24 MiB | 17.42 MiB/s, done.
Resolving deltas: 100% (203/203), done.
/content/Recognize_Anything-Tag2Text
[3]
#@title Install dependencies
!pip install timm transformers fairscale pycocoevalcap

clear_output()
[4]
# Download checkpoints
model_widget = widgets.Dropdown(
    options=["RAM", "Tag2Text"],
    value="RAM",
    description="Select model:"
)
display(model_widget)
[5]
model = model_widget.value
[6]
def download_checkpoints(model):
    print('You selected', model)
    if not os.path.exists('pretrained'):
        os.makedirs('pretrained')

    if model == "RAM":
        ram_weights_path = 'pretrained/ram_swin_large_14m.pth'
        if not os.path.exists(ram_weights_path):
            !wget https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/ram_swin_large_14m.pth -O pretrained/ram_swin_large_14m.pth
        else:
            print("RAM weights already downloaded!")
    else:
        tag2text_weights_path = 'pretrained/tag2text_swin_14m.pth'
        if not os.path.exists(tag2text_weights_path):
            !wget https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/tag2text_swin_14m.pth -O pretrained/tag2text_swin_14m.pth
        else:
            print("Tag2Text weights already downloaded!")

download_checkpoints(model)
print(model, 'weights are downloaded!')

You selected Tag2Text
--2023-06-18 14:45:59--  https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/tag2text_swin_14m.pth
Resolving huggingface.co (huggingface.co)... 18.155.68.116, 18.155.68.38, 18.155.68.44, ...
Connecting to huggingface.co (huggingface.co)|18.155.68.116|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/e6/78/e678f8565485a3f321b1180e4c7e1e18a89a9295028358eedffb98981b37e11a/4ce96f0ce98f940a6680d567f66a38ccc9ca8c4e638e5f5c5c2e881a0e3502ac?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27tag2text_swin_14m.pth%3B+filename%3D%22tag2text_swin_14m.pth%22%3B&Expires=1687358760&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2U2Lzc4L2U2NzhmODU2NTQ4NWEzZjMyMWIxMTgwZTRjN2UxZTE4YTg5YTkyOTUwMjgzNThlZWRmZmI5ODk4MWIzN2UxMWEvNGNlOTZmMGNlOThmOTQwYTY2ODBkNTY3ZjY2YTM4Y2NjOWNhOGM0ZTYzOGU1ZjVjNWMyZTg4MWEwZTM1MDJhYz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODczNTg3NjB9fX1dfQ__&Signature=WTcI8lGdNwq6GKgWFTueFPmYJO%7Ejf4WKUyi-CrmshdAsQY9mhZrKjZEeSgBpV7MGftqK1501AbLi0DVKzanVwOqWLoT57q3kG8tNWIfgvoZz-U3JMSFL%7EdJbwLkBsWtm4kx0-kfRkLVwdmQGm7Ri6DL68dIGD8nDkWAaRnmZd-Te2eUlWyFCc%7EQyX%7EOE6-p8dlKij9Vu12sCoxGyTKg978jxPj4kp4kwy6ae2VlSvrov6zcd4iqd7AMu9FxQg2BFdeLO0CQ9djXGr-HCWVZJzfXbHw2aXInyDDLx37PmUnQbElJvpLDbwmQ-cpiWrlem2c6nKjpJi0kJ7TbAIkOkkw__&Key-Pair-Id=KVTP0A1DKRTAX [following]
--2023-06-18 14:45:59--  https://cdn-lfs.huggingface.co/repos/e6/78/e678f8565485a3f321b1180e4c7e1e18a89a9295028358eedffb98981b37e11a/4ce96f0ce98f940a6680d567f66a38ccc9ca8c4e638e5f5c5c2e881a0e3502ac?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27tag2text_swin_14m.pth%3B+filename%3D%22tag2text_swin_14m.pth%22%3B&Expires=1687358760&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2U2Lzc4L2U2NzhmODU2NTQ4NWEzZjMyMWIxMTgwZTRjN2UxZTE4YTg5YTkyOTUwMjgzNThlZWRmZmI5ODk4MWIzN2UxMWEvNGNlOTZmMGNlOThmOTQwYTY2ODBkNTY3ZjY2YTM4Y2NjOWNhOGM0ZTYzOGU1ZjVjNWMyZTg4MWEwZTM1MDJhYz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODczNTg3NjB9fX1dfQ__&Signature=WTcI8lGdNwq6GKgWFTueFPmYJO%7Ejf4WKUyi-CrmshdAsQY9mhZrKjZEeSgBpV7MGftqK1501AbLi0DVKzanVwOqWLoT57q3kG8tNWIfgvoZz-U3JMSFL%7EdJbwLkBsWtm4kx0-kfRkLVwdmQGm7Ri6DL68dIGD8nDkWAaRnmZd-Te2eUlWyFCc%7EQyX%7EOE6-p8dlKij9Vu12sCoxGyTKg978jxPj4kp4kwy6ae2VlSvrov6zcd4iqd7AMu9FxQg2BFdeLO0CQ9djXGr-HCWVZJzfXbHw2aXInyDDLx37PmUnQbElJvpLDbwmQ-cpiWrlem2c6nKjpJi0kJ7TbAIkOkkw__&Key-Pair-Id=KVTP0A1DKRTAX
Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 18.155.68.73, 18.155.68.94, 18.155.68.128, ...
Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|18.155.68.73|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4478705095 (4.2G) [binary/octet-stream]
Saving to: ‘pretrained/tag2text_swin_14m.pth’

pretrained/tag2text 100%[===================>]   4.17G   267MB/s    in 21s     

2023-06-18 14:46:21 (199 MB/s) - ‘pretrained/tag2text_swin_14m.pth’ saved [4478705095/4478705095]

Tag2Text weights are downloaded!
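
The download cell relies on the `!wget` shell escape, which only works inside a notebook. As a minimal sketch for plain Python scripts, the same logic could be written with the standard library. The URLs and file names are copied from the cell above; the helper names `checkpoint_spec` and `download_checkpoint` are hypothetical and not part of the repository.

```python
import os
import urllib.request

# Base URL and file names copied from the notebook's download cell.
BASE_URL = (
    "https://huggingface.co/spaces/xinyu1205/"
    "Recognize_Anything-Tag2Text/resolve/main"
)
CHECKPOINTS = {
    "RAM": "ram_swin_large_14m.pth",
    "Tag2Text": "tag2text_swin_14m.pth",
}


def checkpoint_spec(model, target_dir="pretrained"):
    """Return (url, local_path) for a model name (hypothetical helper)."""
    filename = CHECKPOINTS[model]  # raises KeyError for unknown model names
    return f"{BASE_URL}/{filename}", os.path.join(target_dir, filename)


def download_checkpoint(model, target_dir="pretrained"):
    """Download the checkpoint unless the file is already on disk."""
    url, path = checkpoint_spec(model, target_dir)
    os.makedirs(target_dir, exist_ok=True)
    if os.path.exists(path):
        print(f"{model} weights already downloaded!")
    else:
        # Large download: the Tag2Text file in the log above is about 4.2 GB.
        urllib.request.urlretrieve(url, path)
        print(f"{model} weights are downloaded!")
    return path
```

Calling `download_checkpoint("Tag2Text")` would then mirror the cell above. Unlike the original cell, this sketch prints the "are downloaded" message only when a download actually took place.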
[7]
images_dir_widget = widgets.Text(value="images/demo", description="Images dir:")
display(images_dir_widget)
[8]
images_dir = images_dir_widget.value
[9]
# Collect the image files in the selected directory
image_files = [
    f"{images_dir}/{file}"
    for file in sorted(os.listdir(images_dir))
    if file.lower().endswith(('.jpg', '.jpeg', '.png'))
]
image_path = image_files[0]

# Create dropdown widget
image_dropdown = widgets.Dropdown(
    options=image_files,
    description='Select Image:',
)

# Create image preview widget
image_preview = widgets.Output()

# Define function to update image preview
def update_preview(change):
    global image_path
    image_path = change.new
    with image_preview:
        image_preview.clear_output()
        display(Image(filename=image_path, width=400))

# Set the initial image preview
with image_preview:
    display(Image(filename=image_files[0], width=400))

# Attach the update function to the dropdown
image_dropdown.observe(update_preview, names='value')

# Display the widgets
display(image_dropdown, image_preview)
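
The cell above assumes the directory exists and contains at least one image; otherwise `image_files[0]` fails with an `IndexError`. A minimal sketch of the same file filter with clearer errors is shown below. The helper name `list_images` is hypothetical; the extension list and path format follow the cell above.

```python
import os

# Extensions accepted by the notebook's file filter.
IMAGE_EXTENSIONS = ('.jpg', '.jpeg', '.png')


def list_images(images_dir):
    """Return sorted image paths in images_dir (hypothetical helper)."""
    if not os.path.isdir(images_dir):
        raise FileNotFoundError(f"Not a directory: {images_dir}")
    files = [
        f"{images_dir}/{name}"
        for name in sorted(os.listdir(images_dir))
        if name.lower().endswith(IMAGE_EXTENSIONS)  # case-insensitive match
    ]
    if not files:
        raise ValueError(f"No image files found in {images_dir}")
    return files
```

With this helper, `image_files = list_images(images_dir)` could stand in for the list comprehension, and the rest of the widget code would stay unchanged.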
[10]
# Define the task and run inference
task_widget = widgets.Dropdown(
    options=["one image", "multiple images"],
    value="one image",
    description="Task:"
)
display(task_widget)
[11]
task = task_widget.value
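
Once a model and a task are chosen, inference comes down to picking a script and its flags. The mapping can be sketched as a function that builds an argument list for `subprocess.run`, which also works outside a notebook. The script names, flags, and checkpoint paths are the ones used by this notebook's inference cell; the function name `build_command` and its signature are hypothetical.

```python
# Checkpoint paths as used elsewhere in this notebook.
WEIGHTS = {
    "RAM": "pretrained/ram_swin_large_14m.pth",
    "Tag2Text": "pretrained/tag2text_swin_14m.pth",
}
# Single-image inference scripts, one per model.
SINGLE_IMAGE_SCRIPTS = {
    "RAM": "inference_ram.py",
    "Tag2Text": "inference_tag2text.py",
}


def build_command(model, task, image_path=None, images_dir=None):
    """Return the argv list for the chosen model and task (hypothetical helper)."""
    if model not in WEIGHTS:
        raise ValueError(f"Invalid model: {model}")
    if task == "one image":
        return ["python", SINGLE_IMAGE_SCRIPTS[model],
                "--image", image_path,
                "--pretrained", WEIGHTS[model]]
    if task == "multiple images":
        # batch_inference.py takes the model type in lowercase.
        return ["python", "batch_inference.py",
                "--image-dir", images_dir,
                "--pretrained", WEIGHTS[model],
                "--model-type", model.lower()]
    raise ValueError(f"Invalid task: {task}")
```

A call such as `subprocess.run(build_command(model, task, image_path, images_dir), check=True)` would then run the selected script from the repository root. Passing a list rather than a shell string avoids quoting problems with paths that contain spaces.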
[12]
print('You selected', model)
print('You selected', task)

def run_inference(model, task):
    if model == "Tag2Text" and task == "one image":
        !python inference_tag2text.py --image {image_path} \
            --pretrained pretrained/tag2text_swin_14m.pth
    elif model == "Tag2Text" and task == "multiple images":
        !python batch_inference.py --image-dir {images_dir} \
            --pretrained pretrained/tag2text_swin_14m.pth --model-type tag2text
    elif model == "RAM" and task == "one image":
        !python inference_ram.py --image {image_path} \
            --pretrained pretrained/ram_swin_large_14m.pth
    elif model == "RAM" and task == "multiple images":
        !python batch_inference.py --image-dir {images_dir} \
            --pretrained pretrained/ram_swin_large_14m.pth --model-type ram
    else:
        print('Invalid model or task')

run_inference(model, task)
You selected Tag2Text
You selected multiple images
Downloading (…)solve/main/vocab.txt: 100% 232k/232k [00:00<00:00, 549kB/s]
Downloading (…)okenizer_config.json: 100% 28.0/28.0 [00:00<00:00, 175kB/s]
Downloading (…)lve/main/config.json: 100% 570/570 [00:00<00:00, 2.65MB/s]
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
--------------
pretrained/tag2text_swin_14m.pth
--------------
load checkpoint from pretrained/tag2text_swin_14m.pth
vit: swin_b
2023-06-18 14:48:48.852442: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
{'filepath': 'images/demo/demo2.jpg', 'model_identified_tags': 'city | christmas tree | christmas market | market | town | people | snow | old', 'user_specified_tags': None, 'image_caption': 'christmas market in the old town of a city with people'}
{'filepath': 'images/demo/demo3.jpg', 'model_identified_tags': 'trail | flower | path | mountain | road | hill | lake | yellow', 'user_specified_tags': None, 'image_caption': 'a winding road with yellow flowers on the side and a lake and mountains in the distance'}
{'filepath': 'images/demo/demo1.jpg', 'model_identified_tags': 'room | table | couch | coffee table | home | living room | sofa | blanket | dog | plant | sit on | green | small | white', 'user_specified_tags': None, 'image_caption': 'a white dog sitting on a green couch in a living room with a small table and plants'}
{'filepath': 'images/demo/demo4.jpg', 'model_identified_tags': 'bicycle | bike | passenger train | train | track | person | man | ride | red', 'user_specified_tags': None, 'image_caption': 'a man riding a bike next to a red train on a track'}
Processed 4 images in 44.78 seconds.
{
  "status": 0,
  "message": "ok",
  "data": [
    {
      "filepath": "images/demo/demo2.jpg",
      "model_identified_tags": "city | christmas tree | christmas market | market | town | people | snow | old",
      "user_specified_tags": null,
      "image_caption": "christmas market in the old town of a city with people"
    },
    {
      "filepath": "images/demo/demo3.jpg",
      "model_identified_tags": "trail | flower | path | mountain | road | hill | lake | yellow",
      "user_specified_tags": null,
      "image_caption": "a winding road with yellow flowers on the side and a lake and mountains in the distance"
    },
    {
      "filepath": "images/demo/demo1.jpg",
      "model_identified_tags": "room | table | couch | coffee table | home | living room | sofa | blanket | dog | plant | sit on | green | small | white",
      "user_specified_tags": null,
      "image_caption": "a white dog sitting on a green couch in a living room with a small table and plants"
    },
    {
      "filepath": "images/demo/demo4.jpg",
      "model_identified_tags": "bicycle | bike | passenger train | train | track | person | man | ride | red",
      "user_specified_tags": null,
      "image_caption": "a man riding a bike next to a red train on a track"
    }
  ]
}
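
The batch output reports tags as a single string separated by ` | `. As a minimal sketch, the JSON above could be turned into per-image tag lists as follows. The function name `tags_by_image` is hypothetical, and treating `"status": 0` as success is an assumption inferred from the accompanying `"message": "ok"`. The sample record is copied from the output above.

```python
import json


def tags_by_image(result_json):
    """Map each filepath to its list of model-identified tags (hypothetical helper)."""
    result = json.loads(result_json)
    # Assumption: status 0 means success, as suggested by "message": "ok".
    if result.get("status") != 0:
        raise RuntimeError(f"Inference failed: {result.get('message')}")
    return {
        item["filepath"]: [
            tag.strip() for tag in item["model_identified_tags"].split("|")
        ]
        for item in result["data"]
    }


# One record taken from the batch output above.
example = """
{
  "status": 0,
  "message": "ok",
  "data": [
    {
      "filepath": "images/demo/demo4.jpg",
      "model_identified_tags": "bicycle | bike | passenger train | train | track | person | man | ride | red",
      "user_specified_tags": null,
      "image_caption": "a man riding a bike next to a red train on a track"
    }
  ]
}
"""

tags = tags_by_image(example)
print(tags["images/demo/demo4.jpg"])
```

Splitting on `|` and stripping whitespace keeps multi-word tags such as `passenger train` intact.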