recognize_anything_demo | Bohrium-玻尔科研空间站

recognize_anything_demo

Tags: RAM, image recognition, RAM image recognition

xuxh@dp.tech

Updated: 2024-07-31

Recommended image: Basic Image:bohrium-notebook:2023-04-07

Recommended machine type: c2_m4_cpu

🏷 Recognize Anything: A Strong Image Tagging Model & Tag2Text: Guiding Vision-Language Model via Image Tagging

©️ Copyright 2023 @ Authors
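Author: Xinyu Huang
License: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: Click the 开始连接 (Start Connecting) button above, select the bohrium-notebook:2023-04-07 image and any CPU node configuration, and wait a moment for the notebook to become ready to run.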

Official PyTorch Implementation of the Recognize Anything Model (RAM) and the Tag2Text Model.

RAM is an image tagging model that recognizes any common category with high accuracy.

Tag2Text is a tagging-guided vision-language model that supports captioning, retrieval, and tagging.

[1]

#@title Import dependencies
import os

import ipywidgets as widgets
from IPython.display import clear_output, display, Image

[2]

#@title Clone the repository
!git clone https://github.com/xinyu1205/recognize-anything.git
%cd recognize-anything

Cloning into 'Recognize_Anything-Tag2Text'...
remote: Enumerating objects: 398, done.
remote: Counting objects: 100% (234/234), done.
remote: Compressing objects: 100% (120/120), done.
remote: Total 398 (delta 145), reused 177 (delta 113), pack-reused 164
Receiving objects: 100% (398/398), 9.24 MiB | 17.42 MiB/s, done.
Resolving deltas: 100% (203/203), done.
/content/Recognize_Anything-Tag2Text

[3]

#@title Install dependencies
!pip install timm transformers fairscale pycocoevalcap
clear_output()

[4]

# Select which model's checkpoint to download
model_widget = widgets.Dropdown(
    options=["RAM", "Tag2Text"],
    value="RAM",
    description="Select model:"
)
display(model_widget)

[5]

model = model_widget.value

[6]

def download_checkpoints(model):
    print('You selected', model)
    # Create the checkpoint directory if it does not exist yet
    os.makedirs('pretrained', exist_ok=True)
    if model == "RAM":
        ram_weights_path = 'pretrained/ram_swin_large_14m.pth'
        # Skip the download when the weights are already on disk
        if not os.path.exists(ram_weights_path):
            !wget https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/ram_swin_large_14m.pth -O pretrained/ram_swin_large_14m.pth
        else:
            print("RAM weights already downloaded!")
    else:
        tag2text_weights_path = 'pretrained/tag2text_swin_14m.pth'
        if not os.path.exists(tag2text_weights_path):
            !wget https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/tag2text_swin_14m.pth -O pretrained/tag2text_swin_14m.pth
        else:
            print("Tag2Text weights already downloaded!")

download_checkpoints(model)
print(model, 'weights are downloaded!')

You selected Tag2Text
--2023-06-18 14:45:59--  https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/tag2text_swin_14m.pth
Resolving huggingface.co (huggingface.co)... 18.155.68.116, 18.155.68.38, 18.155.68.44, ...
Connecting to huggingface.co (huggingface.co)|18.155.68.116|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/e6/78/e678f8565485a3f321b1180e4c7e1e18a89a9295028358eedffb98981b37e11a/4ce96f0ce98f940a6680d567f66a38ccc9ca8c4e638e5f5c5c2e881a0e3502ac?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27tag2text_swin_14m.pth%3B+filename%3D%22tag2text_swin_14m.pth%22%3B&Expires=1687358760&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2U2Lzc4L2U2NzhmODU2NTQ4NWEzZjMyMWIxMTgwZTRjN2UxZTE4YTg5YTkyOTUwMjgzNThlZWRmZmI5ODk4MWIzN2UxMWEvNGNlOTZmMGNlOThmOTQwYTY2ODBkNTY3ZjY2YTM4Y2NjOWNhOGM0ZTYzOGU1ZjVjNWMyZTg4MWEwZTM1MDJhYz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODczNTg3NjB9fX1dfQ__&Signature=WTcI8lGdNwq6GKgWFTueFPmYJO%7Ejf4WKUyi-CrmshdAsQY9mhZrKjZEeSgBpV7MGftqK1501AbLi0DVKzanVwOqWLoT57q3kG8tNWIfgvoZz-U3JMSFL%7EdJbwLkBsWtm4kx0-kfRkLVwdmQGm7Ri6DL68dIGD8nDkWAaRnmZd-Te2eUlWyFCc%7EQyX%7EOE6-p8dlKij9Vu12sCoxGyTKg978jxPj4kp4kwy6ae2VlSvrov6zcd4iqd7AMu9FxQg2BFdeLO0CQ9djXGr-HCWVZJzfXbHw2aXInyDDLx37PmUnQbElJvpLDbwmQ-cpiWrlem2c6nKjpJi0kJ7TbAIkOkkw__&Key-Pair-Id=KVTP0A1DKRTAX [following]
--2023-06-18 14:45:59--  https://cdn-lfs.huggingface.co/repos/e6/78/e678f8565485a3f321b1180e4c7e1e18a89a9295028358eedffb98981b37e11a/4ce96f0ce98f940a6680d567f66a38ccc9ca8c4e638e5f5c5c2e881a0e3502ac?response-content-disposition=attachment%3B+filename*%3DUTF-8%27%27tag2text_swin_14m.pth%3B+filename%3D%22tag2text_swin_14m.pth%22%3B&Expires=1687358760&Policy=eyJTdGF0ZW1lbnQiOlt7IlJlc291cmNlIjoiaHR0cHM6Ly9jZG4tbGZzLmh1Z2dpbmdmYWNlLmNvL3JlcG9zL2U2Lzc4L2U2NzhmODU2NTQ4NWEzZjMyMWIxMTgwZTRjN2UxZTE4YTg5YTkyOTUwMjgzNThlZWRmZmI5ODk4MWIzN2UxMWEvNGNlOTZmMGNlOThmOTQwYTY2ODBkNTY3ZjY2YTM4Y2NjOWNhOGM0ZTYzOGU1ZjVjNWMyZTg4MWEwZTM1MDJhYz9yZXNwb25zZS1jb250ZW50LWRpc3Bvc2l0aW9uPSoiLCJDb25kaXRpb24iOnsiRGF0ZUxlc3NUaGFuIjp7IkFXUzpFcG9jaFRpbWUiOjE2ODczNTg3NjB9fX1dfQ__&Signature=WTcI8lGdNwq6GKgWFTueFPmYJO%7Ejf4WKUyi-CrmshdAsQY9mhZrKjZEeSgBpV7MGftqK1501AbLi0DVKzanVwOqWLoT57q3kG8tNWIfgvoZz-U3JMSFL%7EdJbwLkBsWtm4kx0-kfRkLVwdmQGm7Ri6DL68dIGD8nDkWAaRnmZd-Te2eUlWyFCc%7EQyX%7EOE6-p8dlKij9Vu12sCoxGyTKg978jxPj4kp4kwy6ae2VlSvrov6zcd4iqd7AMu9FxQg2BFdeLO0CQ9djXGr-HCWVZJzfXbHw2aXInyDDLx37PmUnQbElJvpLDbwmQ-cpiWrlem2c6nKjpJi0kJ7TbAIkOkkw__&Key-Pair-Id=KVTP0A1DKRTAX
Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 18.155.68.73, 18.155.68.94, 18.155.68.128, ...
Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|18.155.68.73|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4478705095 (4.2G) [binary/octet-stream]
Saving to: ‘pretrained/tag2text_swin_14m.pth’

pretrained/tag2text 100%[===================>]   4.17G   267MB/s    in 21s     

2023-06-18 14:46:21 (199 MB/s) - ‘pretrained/tag2text_swin_14m.pth’ saved [4478705095/4478705095]

Tag2Text weights are downloaded!
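
Outside a notebook the `!wget` shell escape is not available. The following is a minimal sketch of the same "download only if missing" logic using the standard library. The names `CHECKPOINTS`, `BASE_URL`, `checkpoint_path`, `ensure_checkpoint`, and the `fetch` parameter are hypothetical helpers introduced for illustration; only the file names and the URL prefix come from the cell above.

```python
import os
import urllib.request

# Checkpoint file names and URL prefix as used in the cell above.
CHECKPOINTS = {
    "RAM": "ram_swin_large_14m.pth",
    "Tag2Text": "tag2text_swin_14m.pth",
}
BASE_URL = (
    "https://huggingface.co/spaces/xinyu1205/"
    "Recognize_Anything-Tag2Text/resolve/main"
)


def checkpoint_path(model, directory="pretrained"):
    """Return the local path where the checkpoint for `model` is stored."""
    return os.path.join(directory, CHECKPOINTS[model])


def ensure_checkpoint(model, directory="pretrained",
                      fetch=urllib.request.urlretrieve):
    """Download the checkpoint for `model` unless it already exists.

    `fetch` is a hypothetical hook taking (url, destination) so that the
    network call can be swapped out; it defaults to urllib's urlretrieve.
    Returns (path, downloaded), where `downloaded` is False when the file
    was already present.
    """
    os.makedirs(directory, exist_ok=True)
    path = checkpoint_path(model, directory)
    if os.path.exists(path):
        return path, False
    fetch(f"{BASE_URL}/{CHECKPOINTS[model]}", path)
    return path, True
```

Calling `ensure_checkpoint("Tag2Text")` would fetch the roughly 4.2 GB file shown in the log above on the first call and return immediately on later calls.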

[7]

images_dir_widget = widgets.Text(value="images/demo", description="Images dir:")
display(images_dir_widget)

[8]

images_dir = images_dir_widget.value

[9]

image_files = [f"{images_dir}/{file}" for file in sorted(os.listdir(images_dir)) if file.lower().endswith(('.jpg', '.jpeg', '.png'))]
image_path = image_files[0]

# Create dropdown widget
image_dropdown = widgets.Dropdown(
    options=image_files,
    description='Select Image:',
)

# Create image preview widget
image_preview = widgets.Output()

# Define function to update image preview
def update_preview(change):
    global image_path
    image_path = change.new
    with image_preview:
        image_preview.clear_output()
        display(Image(filename=image_path, width=400))

# Set the initial image preview
with image_preview:
    display(Image(filename=image_files[0], width=400))

# Attach the update function to the dropdown
image_dropdown.observe(update_preview, names='value')

# Display the widgets
display(image_dropdown, image_preview)
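
The list comprehension in the cell above raises an `IndexError` on `image_files[0]` when the directory contains no matching images. A possible variant, sketched here with `pathlib`, fails with a clearer message instead. `list_images` and `IMAGE_SUFFIXES` are hypothetical names, not part of the repository.

```python
from pathlib import Path

# Same extensions the cell above accepts, compared case-insensitively.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}


def list_images(images_dir):
    """Return the sorted image paths found directly inside `images_dir`."""
    root = Path(images_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"Image directory not found: {images_dir}")
    files = sorted(
        str(p) for p in root.iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )
    if not files:
        raise ValueError(f"No .jpg, .jpeg or .png files in {images_dir}")
    return files
```

The result can be passed directly as the `options` argument of `widgets.Dropdown`, as the cell above does with `image_files`.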

[10]

# Select the task: tag a single image or a whole directory
task_widget = widgets.Dropdown(
    options=["one image", "multiple images"],
    value="one image",
    description="Task:"
)
display(task_widget)

[11]

task = task_widget.value

[12]

print('You selected', model)
print('You selected', task)

def run_inference(model, task):
    if model == "Tag2Text" and task == "one image":
        !python inference_tag2text.py --image {image_path} \
            --pretrained pretrained/tag2text_swin_14m.pth
    elif model == "Tag2Text" and task == "multiple images":
        !python batch_inference.py --image-dir {images_dir} \
            --pretrained pretrained/tag2text_swin_14m.pth --model-type tag2text
    elif model == "RAM" and task == "one image":
        !python inference_ram.py --image {image_path} \
            --pretrained pretrained/ram_swin_large_14m.pth
    elif model == "RAM" and task == "multiple images":
        !python batch_inference.py --image-dir {images_dir} \
            --pretrained pretrained/ram_swin_large_14m.pth --model-type ram
    else:
        print('Invalid model or task')

run_inference(model, task)
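
The four branches above differ only in the script name, the checkpoint, and the flags. As a rough sketch, the same dispatch can be written as a lookup that builds an argument list, which also works outside a notebook. `build_command`, `WEIGHTS`, and `SINGLE_SCRIPTS` are hypothetical names; the script names, flags, and checkpoint paths are copied from the cell above.

```python
import sys

# Checkpoint paths and single-image scripts, as used in the cell above.
WEIGHTS = {
    "RAM": "pretrained/ram_swin_large_14m.pth",
    "Tag2Text": "pretrained/tag2text_swin_14m.pth",
}
SINGLE_SCRIPTS = {
    "RAM": "inference_ram.py",
    "Tag2Text": "inference_tag2text.py",
}


def build_command(model, task, image_path=None, images_dir=None):
    """Return the argument list for the selected model and task."""
    if model not in WEIGHTS:
        raise ValueError(f"Invalid model: {model!r}")
    if task == "one image":
        return [sys.executable, SINGLE_SCRIPTS[model],
                "--image", image_path,
                "--pretrained", WEIGHTS[model]]
    if task == "multiple images":
        # The --model-type values in the cell above are the lowercased names.
        return [sys.executable, "batch_inference.py",
                "--image-dir", images_dir,
                "--pretrained", WEIGHTS[model],
                "--model-type", model.lower()]
    raise ValueError(f"Invalid task: {task!r}")
```

The resulting list could then be handed to `subprocess.run(cmd, check=True)` from the repository root, assuming the scripts are present there as in the cloned checkout.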

You selected Tag2Text
You selected multiple images
Downloading (…)solve/main/vocab.txt: 100% 232k/232k [00:00<00:00, 549kB/s]
Downloading (…)okenizer_config.json: 100% 28.0/28.0 [00:00<00:00, 175kB/s]
Downloading (…)lve/main/config.json: 100% 570/570 [00:00<00:00, 2.65MB/s]
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
--------------
pretrained/tag2text_swin_14m.pth
--------------
load checkpoint from pretrained/tag2text_swin_14m.pth
vit: swin_b
2023-06-18 14:48:48.852442: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
{'filepath': 'images/demo/demo2.jpg', 'model_identified_tags': 'city | christmas tree | christmas market | market | town | people | snow | old', 'user_specified_tags': None, 'image_caption': 'christmas market in the old town of a city with people'}
{'filepath': 'images/demo/demo3.jpg', 'model_identified_tags': 'trail | flower | path | mountain | road | hill | lake | yellow', 'user_specified_tags': None, 'image_caption': 'a winding road with yellow flowers on the side and a lake and mountains in the distance'}
{'filepath': 'images/demo/demo1.jpg', 'model_identified_tags': 'room | table | couch | coffee table | home | living room | sofa | blanket | dog | plant | sit on | green | small | white', 'user_specified_tags': None, 'image_caption': 'a white dog sitting on a green couch in a living room with a small table and plants'}
{'filepath': 'images/demo/demo4.jpg', 'model_identified_tags': 'bicycle | bike | passenger train | train | track | person | man | ride | red', 'user_specified_tags': None, 'image_caption': 'a man riding a bike next to a red train on a track'}
Processed 4 images in 44.78 seconds.
{
  "status": 0,
  "message": "ok",
  "data": [
    {
      "filepath": "images/demo/demo2.jpg",
      "model_identified_tags": "city | christmas tree | christmas market | market | town | people | snow | old",
      "user_specified_tags": null,
      "image_caption": "christmas market in the old town of a city with people"
    },
    {
      "filepath": "images/demo/demo3.jpg",
      "model_identified_tags": "trail | flower | path | mountain | road | hill | lake | yellow",
      "user_specified_tags": null,
      "image_caption": "a winding road with yellow flowers on the side and a lake and mountains in the distance"
    },
    {
      "filepath": "images/demo/demo1.jpg",
      "model_identified_tags": "room | table | couch | coffee table | home | living room | sofa | blanket | dog | plant | sit on | green | small | white",
      "user_specified_tags": null,
      "image_caption": "a white dog sitting on a green couch in a living room with a small table and plants"
    },
    {
      "filepath": "images/demo/demo4.jpg",
      "model_identified_tags": "bicycle | bike | passenger train | train | track | person | man | ride | red",
      "user_specified_tags": null,
      "image_caption": "a man riding a bike next to a red train on a track"
    }
  ]
}
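
The batch output above stores tags as a single string separated by ` | `. A small sketch of turning that JSON into per-file tag lists follows. `tags_by_file` is a hypothetical helper, and treating `status == 0` as success is an assumption based only on the `"message": "ok"` shown above.

```python
import json

# One record copied from the output above, wrapped in the same envelope.
sample = """
{
  "status": 0,
  "message": "ok",
  "data": [
    {
      "filepath": "images/demo/demo4.jpg",
      "model_identified_tags": "bicycle | bike | passenger train | train | track | person | man | ride | red",
      "user_specified_tags": null,
      "image_caption": "a man riding a bike next to a red train on a track"
    }
  ]
}
"""


def tags_by_file(payload):
    """Map each file path to its list of tags, splitting on the '|' separator."""
    result = json.loads(payload)
    # Assumption: a non-zero status signals a failed run.
    if result["status"] != 0:
        raise RuntimeError(f"Inference failed: {result['message']}")
    return {
        item["filepath"]: [
            tag.strip() for tag in item["model_identified_tags"].split("|")
        ]
        for item in result["data"]
    }


tags = tags_by_file(sample)
print(tags["images/demo/demo4.jpg"])
```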
