recognize_anything_demo
xuxh@dp.tech
Recommended image: Basic Image:bohrium-notebook:2023-04-07
Recommended machine type: c2_m4_cpu
🏷 Recognize Anything: A Strong Image Tagging Model & Tag2Text: Guiding Vision-Language Model via Image Tagging
©️ Copyright 2023 @ Authors
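Author: Xinyu Huang
License: This work is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Quick start: Click the Start Connection button above, select the bohrium-notebook:2023-04-07 image and any CPU node configuration, and the notebook will be ready to run after a short wait.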
Official PyTorch Implementation of the Recognize Anything Model (RAM) and the Tag2Text Model.
- RAM is an image tagging model that can recognize any common category with high accuracy.
- Tag2Text is a vision-language model guided by tagging; it supports captioning, retrieval, and tagging.
[1]
#@title Import dependencies
import ipywidgets as widgets
from IPython.display import clear_output, display, Image
import os
[2]
#@title Clone the repository
!git clone https://github.com/xinyu1205/recognize-anything.git
%cd recognize-anything
Cloning into 'Recognize_Anything-Tag2Text'...
remote: Enumerating objects: 398, done.
remote: Counting objects: 100% (234/234), done.
remote: Compressing objects: 100% (120/120), done.
remote: Total 398 (delta 145), reused 177 (delta 113), pack-reused 164
Receiving objects: 100% (398/398), 9.24 MiB | 17.42 MiB/s, done.
Resolving deltas: 100% (203/203), done.
/content/Recognize_Anything-Tag2Text
[3]
#@title Install dependencies
!pip install timm transformers fairscale pycocoevalcap
clear_output()
[4]
# Select which model's checkpoint to download
model_widget = widgets.Dropdown(
    options=["RAM", "Tag2Text"],
    value="RAM",
    description="Select model:"
)
display(model_widget)
[5]
model = model_widget.value
[6]
def download_checkpoints(model):
    print('You selected', model)
    if not os.path.exists('pretrained'):
        os.makedirs('pretrained')
    if model == "RAM":
        ram_weights_path = 'pretrained/ram_swin_large_14m.pth'
        if not os.path.exists(ram_weights_path):
            !wget https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/ram_swin_large_14m.pth -O pretrained/ram_swin_large_14m.pth
        else:
            print("RAM weights already downloaded!")
    else:
        tag2text_weights_path = 'pretrained/tag2text_swin_14m.pth'
        if not os.path.exists(tag2text_weights_path):
            !wget https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/tag2text_swin_14m.pth -O pretrained/tag2text_swin_14m.pth
        else:
            print("Tag2Text weights already downloaded!")

download_checkpoints(model)
print(model, 'weights are downloaded!')
You selected Tag2Text
--2023-06-18 14:45:59-- https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/tag2text_swin_14m.pth
Resolving huggingface.co (huggingface.co)... 18.155.68.116, 18.155.68.38, 18.155.68.44, ...
Connecting to huggingface.co (huggingface.co)|18.155.68.116|:443... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://cdn-lfs.huggingface.co/repos/e6/78/e678f8565485a3f321b1180e4c7e1e18a89a9295028358eedffb98981b37e11a/4ce96f0ce98f940a6680d567f66a38ccc9ca8c4e638e5f5c5c2e881a0e3502ac?[signed query parameters omitted] [following]
--2023-06-18 14:45:59-- https://cdn-lfs.huggingface.co/repos/e6/78/e678f8565485a3f321b1180e4c7e1e18a89a9295028358eedffb98981b37e11a/4ce96f0ce98f940a6680d567f66a38ccc9ca8c4e638e5f5c5c2e881a0e3502ac?[signed query parameters omitted]
Resolving cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)... 18.155.68.73, 18.155.68.94, 18.155.68.128, ...
Connecting to cdn-lfs.huggingface.co (cdn-lfs.huggingface.co)|18.155.68.73|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 4478705095 (4.2G) [binary/octet-stream]
Saving to: ‘pretrained/tag2text_swin_14m.pth’
pretrained/tag2text 100%[===================>] 4.17G 267MB/s in 21s
2023-06-18 14:46:21 (199 MB/s) - ‘pretrained/tag2text_swin_14m.pth’ saved [4478705095/4478705095]
Tag2Text weights are downloaded!
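The download cell relies on the `!wget` shell escape, which only works inside IPython. As a minimal sketch of the same "download only if missing" logic in plain Python, the helper below uses the standard library's `urllib.request.urlretrieve`. The function name `download_if_missing` and the `.part` temporary suffix are illustrative assumptions, not part of the repository.

```python
import os
import urllib.request


def download_if_missing(url, dest_path):
    """Download url to dest_path unless the file already exists.

    Hypothetical helper (not from the repository).
    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(dest_path):
        return False
    parent = os.path.dirname(dest_path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    # Write to a temporary name first so an interrupted transfer does not
    # leave a partial file that later looks like a complete checkpoint.
    tmp_path = dest_path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, dest_path)
    return True


# Example call (commented out because the checkpoint is several GB):
# download_if_missing(
#     "https://huggingface.co/spaces/xinyu1205/Recognize_Anything-Tag2Text/resolve/main/ram_swin_large_14m.pth",
#     "pretrained/ram_swin_large_14m.pth",
# )
```

Unlike the `-O` form of `wget`, this sketch never leaves a truncated file at the final path, so the `os.path.exists` check stays meaningful after a failed download.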
[7]
images_dir_widget = widgets.Text(value="images/demo", description="Images dir:")
display(images_dir_widget)
[8]
images_dir = images_dir_widget.value
[9]
image_files = [
    f"{images_dir}/{file}"
    for file in sorted(os.listdir(images_dir))
    if file.lower().endswith(('.jpg', '.jpeg', '.png'))
]
image_path = image_files[0]

# Create dropdown widget
image_dropdown = widgets.Dropdown(
    options=image_files,
    description='Select Image:',
)

# Create image preview widget
image_preview = widgets.Output()

# Define function to update image preview
def update_preview(change):
    global image_path
    image_path = change.new
    with image_preview:
        image_preview.clear_output()
        display(Image(filename=image_path, width=400))

# Set the initial image preview
with image_preview:
    display(Image(filename=image_files[0], width=400))

# Attach the update function to the dropdown
image_dropdown.observe(update_preview, names='value')

# Display the widgets
display(image_dropdown, image_preview)
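The listing above raises an `IndexError` at `image_files[0]` if the directory contains no matching images. A minimal sketch of a more defensive listing step follows; the helper name `list_images` and the constant `IMAGE_EXTENSIONS` are hypothetical and only mirror the extension filter used in the cell.

```python
from pathlib import Path

# Same extensions as the notebook cell filters on.
IMAGE_EXTENSIONS = (".jpg", ".jpeg", ".png")


def list_images(images_dir):
    """Return sorted image paths in images_dir, failing with a clear message.

    Hypothetical helper, not part of the repository.
    """
    directory = Path(images_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Image directory not found: {images_dir}")
    files = sorted(
        str(path)
        for path in directory.iterdir()
        if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS
    )
    if not files:
        raise ValueError(f"No .jpg/.jpeg/.png files found in {images_dir}")
    return files
```

With this in place, `image_files = list_images(images_dir)` could stand in for the list comprehension, and the rest of the widget code would not need to change.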
[10]
# Define the task and run inference
task_widget = widgets.Dropdown(
    options=["one image", "multiple images"],
    value="one image",
    description="Task:"
)
display(task_widget)
[11]
task = task_widget.value
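The inference cell in this notebook picks one of four shell commands from the selected model and task. As a sketch of the same mapping expressed as data, the function below builds an argument list that could be passed to `subprocess.run`. The script names, flags, and checkpoint paths are the ones used in this notebook; the names `build_command`, `CHECKPOINTS`, and `SINGLE_IMAGE_SCRIPTS` are illustrative assumptions.

```python
import sys

# Checkpoint paths and script names as used in this notebook.
CHECKPOINTS = {
    "RAM": "pretrained/ram_swin_large_14m.pth",
    "Tag2Text": "pretrained/tag2text_swin_14m.pth",
}
SINGLE_IMAGE_SCRIPTS = {
    "RAM": "inference_ram.py",
    "Tag2Text": "inference_tag2text.py",
}


def build_command(model, task, image_path=None, images_dir=None):
    """Return the argv list for the selected model and task (hypothetical helper)."""
    if model not in CHECKPOINTS:
        raise ValueError(f"Invalid model: {model!r}")
    if task == "one image":
        if image_path is None:
            raise ValueError("image_path is required for the 'one image' task")
        return [sys.executable, SINGLE_IMAGE_SCRIPTS[model],
                "--image", image_path,
                "--pretrained", CHECKPOINTS[model]]
    if task == "multiple images":
        if images_dir is None:
            raise ValueError("images_dir is required for the 'multiple images' task")
        return [sys.executable, "batch_inference.py",
                "--image-dir", images_dir,
                "--pretrained", CHECKPOINTS[model],
                "--model-type", model.lower()]
    raise ValueError(f"Invalid task: {task!r}")
```

A call such as `subprocess.run(build_command(model, task, image_path, images_dir), check=True)` (after `import subprocess`) would then run the chosen script and raise if it exits with an error, which the `!python` form does not do.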
[12]
print('You selected', model)
print('You selected', task)
def run_inference(model, task):
    if model == "Tag2Text" and task == "one image":
        !python inference_tag2text.py --image {image_path} \
            --pretrained pretrained/tag2text_swin_14m.pth
    elif model == "Tag2Text" and task == "multiple images":
        !python batch_inference.py --image-dir {images_dir} \
            --pretrained pretrained/tag2text_swin_14m.pth --model-type tag2text
    elif model == "RAM" and task == "one image":
        !python inference_ram.py --image {image_path} \
            --pretrained pretrained/ram_swin_large_14m.pth
    elif model == "RAM" and task == "multiple images":
        !python batch_inference.py --image-dir {images_dir} \
            --pretrained pretrained/ram_swin_large_14m.pth --model-type ram
    else:
        print('Invalid model or task')

run_inference(model, task)
You selected Tag2Text
You selected multiple images
Downloading (…)solve/main/vocab.txt: 100% 232k/232k [00:00<00:00, 549kB/s]
Downloading (…)okenizer_config.json: 100% 28.0/28.0 [00:00<00:00, 175kB/s]
Downloading (…)lve/main/config.json: 100% 570/570 [00:00<00:00, 2.65MB/s]
/encoder/layer/0/crossattention/self/query is tied
/encoder/layer/0/crossattention/self/key is tied
/encoder/layer/0/crossattention/self/value is tied
/encoder/layer/0/crossattention/output/dense is tied
/encoder/layer/0/crossattention/output/LayerNorm is tied
/encoder/layer/0/intermediate/dense is tied
/encoder/layer/0/output/dense is tied
/encoder/layer/0/output/LayerNorm is tied
/encoder/layer/1/crossattention/self/query is tied
/encoder/layer/1/crossattention/self/key is tied
/encoder/layer/1/crossattention/self/value is tied
/encoder/layer/1/crossattention/output/dense is tied
/encoder/layer/1/crossattention/output/LayerNorm is tied
/encoder/layer/1/intermediate/dense is tied
/encoder/layer/1/output/dense is tied
/encoder/layer/1/output/LayerNorm is tied
--------------
pretrained/tag2text_swin_14m.pth
--------------
load checkpoint from pretrained/tag2text_swin_14m.pth
vit: swin_b
2023-06-18 14:48:48.852442: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
{'filepath': 'images/demo/demo2.jpg', 'model_identified_tags': 'city | christmas tree | christmas market | market | town | people | snow | old', 'user_specified_tags': None, 'image_caption': 'christmas market in the old town of a city with people'}
{'filepath': 'images/demo/demo3.jpg', 'model_identified_tags': 'trail | flower | path | mountain | road | hill | lake | yellow', 'user_specified_tags': None, 'image_caption': 'a winding road with yellow flowers on the side and a lake and mountains in the distance'}
{'filepath': 'images/demo/demo1.jpg', 'model_identified_tags': 'room | table | couch | coffee table | home | living room | sofa | blanket | dog | plant | sit on | green | small | white', 'user_specified_tags': None, 'image_caption': 'a white dog sitting on a green couch in a living room with a small table and plants'}
{'filepath': 'images/demo/demo4.jpg', 'model_identified_tags': 'bicycle | bike | passenger train | train | track | person | man | ride | red', 'user_specified_tags': None, 'image_caption': 'a man riding a bike next to a red train on a track'}
Processed 4 images in 44.78 seconds.
{
  "status": 0,
  "message": "ok",
  "data": [
    {
      "filepath": "images/demo/demo2.jpg",
      "model_identified_tags": "city | christmas tree | christmas market | market | town | people | snow | old",
      "user_specified_tags": null,
      "image_caption": "christmas market in the old town of a city with people"
    },
    {
      "filepath": "images/demo/demo3.jpg",
      "model_identified_tags": "trail | flower | path | mountain | road | hill | lake | yellow",
      "user_specified_tags": null,
      "image_caption": "a winding road with yellow flowers on the side and a lake and mountains in the distance"
    },
    {
      "filepath": "images/demo/demo1.jpg",
      "model_identified_tags": "room | table | couch | coffee table | home | living room | sofa | blanket | dog | plant | sit on | green | small | white",
      "user_specified_tags": null,
      "image_caption": "a white dog sitting on a green couch in a living room with a small table and plants"
    },
    {
      "filepath": "images/demo/demo4.jpg",
      "model_identified_tags": "bicycle | bike | passenger train | train | track | person | man | ride | red",
      "user_specified_tags": null,
      "image_caption": "a man riding a bike next to a red train on a track"
    }
  ]
}
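The batch output reports tags as a single string joined with ` | `. As a minimal sketch of post-processing, the helpers below split that string into a list and build a per-file tag mapping from a JSON payload shaped like the one printed above. The function names are hypothetical, and the sketch assumes the JSON text is already available as a Python string; how to capture it from `batch_inference.py` is not shown in this notebook.

```python
import json


def split_tags(tag_string):
    """Split a ' | '-separated tag string into a list of tags."""
    if not tag_string:
        return []
    return [tag.strip() for tag in tag_string.split("|") if tag.strip()]


def tags_by_file(batch_output_json):
    """Map each filepath to its list of model-identified tags.

    Assumes the payload has the "status" and "data" fields seen in the output above.
    """
    payload = json.loads(batch_output_json)
    if payload.get("status") != 0:
        raise RuntimeError(f"Batch inference reported an error: {payload.get('message')}")
    return {
        item["filepath"]: split_tags(item["model_identified_tags"])
        for item in payload["data"]
    }


# One record copied from the output above, used as sample input.
sample = json.dumps({
    "status": 0,
    "message": "ok",
    "data": [{
        "filepath": "images/demo/demo4.jpg",
        "model_identified_tags": "bicycle | bike | passenger train | train | track | person | man | ride | red",
        "user_specified_tags": None,
        "image_caption": "a man riding a bike next to a red train on a track",
    }],
})
print(tags_by_file(sample)["images/demo/demo4.jpg"][:3])
# → ['bicycle', 'bike', 'passenger train']
```

Splitting on `|` and stripping whitespace keeps multi-word tags such as `passenger train` intact.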