Using the Visual Genome API with Python 3, plus dataset details
2022-07-01 15:15:45
The Visual Genome dataset
- Visual Genome homepage
- Visual Genome API
- Visual Genome Python Driver
- Visual Genome paper
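Note: the API is mostly written for Python 2. To run it under Python 3.8 I changed a few lines of its source; watch for the comments in the code below, and leave a comment if you run into problems.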
Installing the API
pip install visual-genome
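Because the fixes described in this post mean editing the installed package by hand, it helps to know where pip placed it. The following is a minimal sketch using only the standard library; `package_dir` is a hypothetical helper name, and `visual_genome` is the import name used in the script further down.

```python
import importlib.util
from pathlib import Path


def package_dir(package_name):
    """Return the directory of an installed package, or None if it is not found."""
    spec = importlib.util.find_spec(package_name)
    # Namespace packages have no single origin file, so treat them as "not found".
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent


if __name__ == "__main__":
    # Prints the folder that should contain api.py and utils.py, or None.
    print(package_dir("visual_genome"))
```

The printed folder is where the files named in the "代码问题" comments (api.py and utils.py) should live, provided the package was installed into the interpreter that runs this snippet.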
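Code
Note: two of the comments below are marked "代码问题" (code issue); at those points the source of the installed API has to be edited by hand.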
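The first of the two fixes (adding int in api.py) comes from a division difference: in Python 2, `/` between two integers yields an integer, while in Python 3 it yields a float, and `range()` rejects floats. The sketch below illustrates this under the assumption that the affected lines turn an image index into a page number by division; `page_bounds` and `ids_per_page=1000` are illustrative names and values, not the library's actual source.

```python
def page_bounds(start_index, end_index, ids_per_page=1000):
    """Return 1-based (start_page, end_page) as integers."""
    # "//" is floor division; int(start_index / ids_per_page) gives the same
    # result for non-negative indices, which is the "add int" fix.
    start_page = start_index // ids_per_page + 1
    end_page = end_index // ids_per_page + 1
    return start_page, end_page


# Under Python 3, true division produces floats, which range() refuses.
try:
    range(2000 / 1000 + 1, 2010 / 1000 + 2)
except TypeError as exc:
    print("float page numbers fail:", exc)

start_page, end_page = page_bounds(2000, 2010)
print(list(range(start_page, end_page + 1)))  # [3]
```

Either `//` or wrapping the division in `int()` restores the Python 2 behaviour for the non-negative indices used here.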
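'''
Fetch the dataset with the visual_genome API, version 1.1.1
Reference 1: https://github.com/ranjaykrishna/visual_genome_python_driver
Reference 2: https://visualgenome.org/api/v0/api_object_model.html
Install with: pip install visual-genome
Note: the package targets Python 2 by default; this script uses Python 3, with a few changes to the package source
'''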
from visual_genome import api
import matplotlib.pyplot as plt
import requests
from PIL import Image
from io import BytesIO
from matplotlib.patches import Rectangle
# get the list of all image ids in the Visual Genome dataset
ids = api.get_all_image_ids()
print(ids[0])
# >> 1
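# There are currently 108249 images. To get only the ids of images 2000 to 2010:
# Code issue (代码问题): because of a Python 2 vs. Python 3 difference, manually edit lines 27 and 28 of api.py and add int() there
# The result is stored in ids_in_range so that the built-in id() is not shadowed
ids_in_range = api.get_image_ids_in_range(start_index=2000, end_index=2010)
print(ids_in_range)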
# >>> [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]
# Get image data, include url, width, height, COCO and Flickr ids
image = api.get_image_data(id=61512)
print(image)
# >>> id: 61512, coco_id: 248774, flickr_id: 6273011878, width: 1024,
#     url: https://cs.stanford.edu/people/rak248/VG_100K/61512.jpg
# Get region descriptions for an image -- this is dense captions of an image
# Each region description is a textual description of a particular region in the image
# bbox format: x and y of the top-left corner, then width and height
regions = api.get_region_descriptions_of_image(id=61512)
print(regions[0])
# >>> id: 1, x: 511, y: 241, width: 206,height: 320, phrase: A brown, sleek horse with a bridle, image: 61512
# Get Region Graph from Region
# Region Graphs are tiny scene graphs for a particular region of an image,
# containing objects, attributes and relationships.
# We will get the scene graph of an image and print out the objects, attributes and relationships
graph = api.get_region_graph_of_region(image_id=61512,region_id=1)
# Remember that region description is 'A brown, sleek horse with a bridle'
print(graph.objects)
# >>> [horse]
print(graph.attributes)
# >>> [3015675: horse is brown]
print(graph.relationships)
# >>> []
# The region graph has one object (horse) and one attribute (brown) describing it; it has no relationships
# Get Scene Graph for an image
# Each scene graph has three components: objects, attributes, and relationships.
graph = api.get_scene_graph_of_image(id=61512)
# print the object, only the name and not the bbox
print(graph.objects)
# >>> [horse, grass, horse, bridle, truck, sign, gate, truck, tire, trough, window, door, building, halter,
# mane, mane,leaves, fence]
# print the attributes
print(graph.attributes)
# >>> [3015675: horse is brown, 3015676: horse is spotted, 3015677: horse is red, 3015678: horse is dark brown,
# 3015679: truck is red, 3015680: horse is brown, 3015681: truck is red, 3015682: sign is blue,
# 3015683: gate is red, 3015684: truck is white, 3015685: tire is blue, 3015686: gate is wooden,
# 3015687: horse is standing, 3015688: truck is red, 3420018: horse is brown, 3420019: horse is white,
# 3015690: building is tan, 3015691: halter is red, 3015692: horse is brown, 3015693: gate is wooden,
# 3015694: grass is grassy, 3015695: truck is red, 3015696: gate is orange, 3015697: halter is red,
# 3015698: tire is blue, 3015699: truck is white, 3015700: trough is white, 3420016: horse is brown,
# 3420017: horse is cream, 3015702: leaves is green, 3015703: grass is lush, 3015704: horse is enclosed,
# 3420022: horse is brown, 3420023: horse is white, 3015706: horse is chestnut, 3015707: gate is red,
# 3015708: leaves is green, 3015709: building is brick, 3015710: truck is large, 3015711: gate is red,
# 3015712: horse is chestnut colored, 3015713: fence is wooden]
# print the relationships
print(graph.relationships)
# >>> [3199950: horse stands on top of grass, 3199951: horse IN grass, 3199952: horse WEARING bridle,
# 3199953: trough for horse, 3199954: window next to door, 3199955: building has door,
# 3199956: horse nudging horse, 3199957: horse has mane, 3199958: horse has mane, 3199959: trough for horse]
# Get Question Answers for an image
# Each Question Answer object contains the id of the question-answer pair, the id of image,
# the question and the answer string, as well as the list of question objects and answer
# objects identified and canonicalized in the qa pair.
# Code issue (代码问题): manually change the line in visual_genome/utils.py to
# qas.append(QA(info['id'], image_map[info['image']],
qas = api.get_QA_of_image(id=61512)
# First print out some core information of the QA
print(qas[1])
# >>> id: 991155, image: 61512, question: What is the window treatment?, answer: White blinds.
# Now let's print out the question objects of the QA
print(qas[1].q_objects)
# >>> []
# Get all Questions Answers in the dataset
# We can get all 1.7 million QAs in the Visual Genome dataset. If we don't want all of the data,
# we can use the qtotal parameter to specify how many QAs the function should return
qas = api.get_all_QAs(qtotal=10)
print(qas[0])
# >>> id: 991155, image: 61512, question: What is the window treatment?, answer: White blinds.
# Get one type of Questions Answers from the entire dataset
# qtype can be one of: what, who, why, when, how
qas = api.get_QA_of_type(qtotal=10, qtype='why')
print(qas[0])
# >>> id: 133089, image: 1159910, question: Why is the man cosplaying?, answer: For an event.
# Visualizing some regions, refer to https://visualgenome.org/api/v0/api_beginners_tutorial.html
image = api.get_image_data(id=61512)
regions = api.get_region_descriptions_of_image(id=61512)
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)

def visualize_regions(image, regions):
    # Download the image and decode it in memory
    response = requests.get(image.url)
    response.raise_for_status()
    img = Image.open(BytesIO(response.content))
    plt.imshow(img)
    ax = plt.gca()
    for region in regions:
        # Draw each region's bounding box and label it with its phrase
        ax.add_patch(Rectangle((region.x, region.y),
                               region.width,
                               region.height,
                               fill=False,
                               edgecolor='red',
                               linewidth=3))
        ax.text(region.x, region.y, region.phrase, style='italic',
                bbox={'facecolor': 'white', 'alpha': 0.7, 'pad': 10})
    # Recent matplotlib versions expect booleans here, not the string 'off'
    plt.tick_params(labelbottom=False, labelleft=False)
    plt.show()

# visualize_regions(image, regions[:8])
# visualize_regions(image, regions)  # plot all
Region visualization
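Regions returned by the API carry `x`, `y`, `width` and `height`, whereas some tools (for example Pillow's `Image.crop`) expect `(left, upper, right, lower)` corners. The sketch below shows the conversion, clipped to the image bounds. The `Region` namedtuple is only a stand-in for the API's region objects, and the image height of 768 is an assumed value (the post only reports a width of 1024).

```python
from collections import namedtuple

# Stand-in for the API's region object; only the fields used here.
Region = namedtuple("Region", "x y width height phrase")


def to_corners(region, image_width, image_height):
    """Convert (x, y, width, height) to (left, upper, right, lower), clipped to the image."""
    left = max(0, region.x)
    upper = max(0, region.y)
    right = min(image_width, region.x + region.width)
    lower = min(image_height, region.y + region.height)
    return left, upper, right, lower


sample = Region(x=511, y=241, width=206, height=320,
                phrase="A brown, sleek horse with a bridle")
# image_height=768 is an assumption made for this example.
print(to_corners(sample, image_width=1024, image_height=768))  # (511, 241, 717, 561)
```

Clipping matters because annotated boxes can extend slightly past the image edge, which would otherwise produce crops padded with empty pixels.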
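Full dataset format
If you download the full dataset locally and read the JSON files to analyze the annotation format, the dataset format can be summarized as follows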
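To explore the annotation format yourself, a small helper can print the nested key and type structure of a parsed JSON file. This is a minimal sketch: `describe` is a hypothetical helper, and the sample record below is made up for illustration, so its field names should not be read as the dataset's real schema.

```python
import json


def describe(value, depth=0, max_depth=4):
    """Return lines describing the nested key/type structure of parsed JSON."""
    indent = "  " * depth
    if depth >= max_depth:
        return [f"{indent}..."]
    if isinstance(value, dict):
        lines = []
        for key, item in value.items():
            lines.append(f"{indent}{key}: {type(item).__name__}")
            if isinstance(item, (dict, list)):
                lines.extend(describe(item, depth + 1, max_depth))
        return lines
    if isinstance(value, list):
        # Only the first element is inspected; lists are assumed homogeneous.
        lines = [f"{indent}[list of {len(value)} items; first item shown]"]
        if value:
            lines.extend(describe(value[0], depth + 1, max_depth))
        return lines
    return [f"{indent}{type(value).__name__}"]


# Illustrative record only; replace with json.load(open(path)) for a real file.
sample = json.loads(
    '[{"id": 61512, "regions": [{"region_id": 1, "x": 511, "y": 241,'
    ' "width": 206, "height": 320, "phrase": "A brown, sleek horse with a bridle"}]}]'
)
print("\n".join(describe(sample)))
```

The full annotation files are large, so loading one with `json.load` can take a while and a fair amount of memory; the helper itself only walks the first element of each list.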
Paper summary
After reading the 40-plus-page paper, I distilled the main points as follows