Using the Visual Genome API with Python 3, and dataset details

程序员文章站 (Programmer Article Station) 2022-07-01 15:15:45

The Visual Genome dataset

Installing the API

pip install visual-genome
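
Before running the script, it can help to confirm that the package is importable from the interpreter you intend to use. The following is a minimal sketch using only the standard library; `is_importable` is a helper name introduced here for illustration, and `visual_genome` is the module name the script imports.

```python
import importlib.util
import sys


def is_importable(module_name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(module_name) is not None


# The script in this article targets Python 3.
print("Python version:", sys.version_info.major, sys.version_info.minor)
print("visual_genome importable:", is_importable("visual_genome"))
```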

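Code

Note: two comments in the code below are marked "Code issue". At those two points you need to edit the source of the installed API by hand.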
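
A likely cause of the first issue is the change in the division operator: in Python 3, `/` between two integers returns a float, and `range()` rejects floats. The sketch below reproduces that failure and the `int()` fix in isolation. It is not the package's source; `page_bounds_py2_style`, `page_bounds_fixed`, and the page size of 1000 are assumptions made for illustration.

```python
IDS_PER_PAGE = 1000  # assumed page size, for illustration only


def page_bounds_py2_style(start_index, end_index):
    """Page numbers computed with '/', which yields floats on Python 3."""
    return start_index / IDS_PER_PAGE + 1, end_index / IDS_PER_PAGE + 1


def page_bounds_fixed(start_index, end_index):
    """Same computation with int() added, so the results work with range()."""
    return int(start_index / IDS_PER_PAGE) + 1, int(end_index / IDS_PER_PAGE) + 1


first, last = page_bounds_py2_style(2000, 2010)
try:
    list(range(first, last + 1))
except TypeError as err:
    print("Python 3 rejects float bounds:", err)

first, last = page_bounds_fixed(2000, 2010)
print(list(range(first, last + 1)))  # [3]
```

For non-negative indices, integer division with `//` gives the same page numbers as wrapping the result in `int()`.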

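'''
Fetch the Visual Genome dataset through the visual_genome API, version 1.1.1
Reference 1: https://github.com/ranjaykrishna/visual_genome_python_driver
Reference 2: https://visualgenome.org/api/v0/api_object_model.html
Install with: pip install visual-genome
Note: the package targets Python 2 by default; this script uses Python 3,
with some changes to the package source
'''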
from visual_genome import api
import matplotlib.pyplot as plt
import requests
from PIL import Image
from io import BytesIO
from matplotlib.patches import Rectangle

# get the list of all image ids in the Visual Genome dataset
ids = api.get_all_image_ids()
print(ids[0])
# >> 1

# There are currently 108249 images. To get only the ids of images 2000 to 2010:
# Code issue: a Python 2 vs Python 3 difference. Edit lines 27 and 28 of api.py by hand and add int there
image_ids = api.get_image_ids_in_range(start_index=2000, end_index=2010)
print(image_ids)
# >>> [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]

# Get image data, include url, width, height, COCO and Flickr ids
image = api.get_image_data(id=61512)
print(image)
# >>> id: 61512, coco_id: 248774, flickr_id: 6273011878, width: 1024,
#     url: https://cs.stanford.edu/people/rak248/VG_100K/61512.jpg

# Get region descriptions for an image -- this is dense captions of an image
# Each region description is a textual description of a particular region in the image
# bbox format: x and y of the top-left corner, then width and height
regions = api.get_region_descriptions_of_image(id=61512)
print(regions[0])
# >>> id: 1, x: 511, y: 241, width: 206,height: 320, phrase: A brown, sleek horse with a bridle, image: 61512

# Get Region Graph from Region
# Region Graphs are tiny scene graphs for a particular region of an image,
# containing objects, attributes and relationships.
# We will get the scene graph of an image and print out the objects, attributes and relationships
graph = api.get_region_graph_of_region(image_id=61512,region_id=1)
# Remember that region description is 'A brown, sleek horse with a bridle'
print(graph.objects)
# >>> [horse]
print(graph.attributes)
# >>> [3015675: horse is brown]
print(graph.relationships)
# >>> []
# The region graph has one object (horse), one attribute (brown) describing the horse, and no relationships

# Get Scene Graph for an image
# Each scene graph has three components: objects, attributes, and relationships.
graph = api.get_scene_graph_of_image(id=61512)
# print the object, only the name and not the bbox
print(graph.objects)
# >>>  [horse, grass, horse, bridle, truck, sign, gate, truck, tire, trough, window, door, building, halter,
#        mane, mane,leaves, fence]
# print the attributes
print(graph.attributes)
# >>> [3015675: horse is brown, 3015676: horse is spotted, 3015677: horse is red, 3015678: horse is dark brown,
#       3015679: truck is red, 3015680: horse is brown, 3015681: truck is red, 3015682: sign is blue,
#       3015683: gate is red, 3015684: truck is white, 3015685: tire is blue, 3015686: gate is wooden,
#       3015687: horse is standing, 3015688: truck is red, 3420018: horse is brown, 3420019: horse is white,
#       3015690: building is tan, 3015691: halter is red, 3015692: horse is brown, 3015693: gate is wooden,
#       3015694: grass is grassy, 3015695: truck is red, 3015696: gate is orange, 3015697: halter is red,
#       3015698: tire is blue, 3015699: truck is white, 3015700: trough is white, 3420016: horse is brown,
#       3420017: horse is cream, 3015702: leaves is green, 3015703: grass is lush, 3015704: horse is enclosed,
#       3420022: horse is brown, 3420023: horse is white, 3015706: horse is chestnut, 3015707: gate is red,
#       3015708: leaves is green, 3015709: building is brick, 3015710: truck is large, 3015711: gate is red,
#       3015712: horse is chestnut colored, 3015713: fence is wooden]
# print the relationships
print(graph.relationships)
# >>> [3199950: horse stands on top of grass, 3199951: horse IN grass, 3199952: horse WEARING bridle,
#      3199953: trough for horse, 3199954: window next to door, 3199955: building has door,
#      3199956: horse nudging horse, 3199957: horse has mane, 3199958: horse has mane, 3199959: trough for horse]

# Get Question Answers for an image
# Each Question Answer object contains the id of the question-answer pair, the id of image,
#    the question and the answer string, as well as the list of question objects and answer
#    objects identified and canonicalized in the qa pair.
# Code issue: edit visual_genome/utils.py by hand so that the line reads
#   qas.append(QA(info['id'], image_map[info['image']],
qas = api.get_QA_of_image(id=61512)
# First print out some core information of the QA
print(qas[1])
# >>> id: 991155, image: 61512, question: What is the window treatment?, answer: White blinds.
# Now let's print out the question objects of the QA
print(qas[1].q_objects)
# >>> []

# Get all Questions Answers in the dataset
# We can get all 1.7 million QAs in the Visual Genome dataset. If we don't want all of the data,
#    we can specify how many QAs the function should return with the parameter qtotal
qas = api.get_all_QAs(qtotal=10)
print(qas[0])
# >>> id: 991155, image: 61512, question: What is the window treatment?, answer: White blinds.

# Get one type of Questions Answers from the entire dataset
# We can choose one type of <what, who, why, when, how>
qas = api.get_QA_of_type(qtotal=10, qtype='why')
print(qas[0])
# >>> id: 133089, image: 1159910, question: Why is the man cosplaying?, answer: For an event.

# Visualize some regions; see https://visualgenome.org/api/v0/api_beginners_tutorial.html
image = api.get_image_data(id=61512)
regions = api.get_region_descriptions_of_image(id=61512)
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)


def visualize_regions(image, regions):
    # Download the image and draw it on the current axes
    response = requests.get(image.url)
    img = Image.open(BytesIO(response.content))
    plt.imshow(img)
    ax = plt.gca()
    for region in regions:
        # Outline the region, then label it with its phrase at the top-left corner
        ax.add_patch(Rectangle((region.x, region.y),
                               region.width,
                               region.height,
                               fill=False,
                               edgecolor='red',
                               linewidth=3))
        ax.text(region.x, region.y, region.phrase, style='italic',
                bbox={'facecolor': 'white', 'alpha': 0.7, 'pad': 10})
    # Hide the tick labels (recent matplotlib expects booleans rather than 'off')
    plt.tick_params(labelbottom=False, labelleft=False)
    plt.show()


# visualize_regions(image, regions[:8])
# visualize_regions(image, regions)  # plot all
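
The scene graph output in the script lists some object names more than once (for example `horse` and `truck`). Here is a small sketch of tallying those names with `collections.Counter`. The list is copied from the printed output rather than fetched from the API. With a live graph one would presumably build it from `graph.objects`, and converting each object with `str()` is an assumption about how the objects print.

```python
from collections import Counter

# Object names copied from the scene graph output for image 61512.
object_names = [
    "horse", "grass", "horse", "bridle", "truck", "sign", "gate", "truck",
    "tire", "trough", "window", "door", "building", "halter",
    "mane", "mane", "leaves", "fence",
]

name_counts = Counter(object_names)

# Names that occur more than once, most frequent first.
repeated = [(name, n) for name, n in name_counts.most_common() if n > 1]
print(repeated)
```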

Region visualization

[Figure: region boxes and phrases drawn on the image]
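
The regions drawn by `visualize_regions` are given as a top-left corner plus width and height, while some tools (for example `PIL.Image.crop`) expect the left, upper, right, and lower edges. The following is a minimal conversion sketch. `Region` is a stand-in namedtuple defined here, not the API's class, and its values are taken from the first region printed in the script.

```python
from collections import namedtuple

# Stand-in for an API region; only the fields used below are modelled.
Region = namedtuple("Region", ["x", "y", "width", "height", "phrase"])


def to_corner_box(region):
    """Convert (x, y, width, height) into (left, upper, right, lower)."""
    return (region.x, region.y,
            region.x + region.width, region.y + region.height)


first_region = Region(x=511, y=241, width=206, height=320,
                      phrase="A brown, sleek horse with a bridle")
print(to_corner_box(first_region))  # (511, 241, 717, 561)
```

With a PIL image `img` loaded as in `visualize_regions`, `img.crop(to_corner_box(first_region))` would return just that region.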

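Full dataset format

If you download the complete dataset and read the JSON files to analyze the annotation format, the dataset format can be summarized as follows:
[Figure: summary of the dataset format]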
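
When exploring the downloaded JSON files, a quick way to see the annotation format is to print the nested keys and value types of the first record. The sketch below does this on an inline sample. The sample's layout (a list of per-image entries, each holding a `regions` list) is an assumption made for illustration, with field values borrowed from the region printed in the script, and `describe_structure` is a helper introduced here.

```python
import json


def describe_structure(value, indent=0):
    """Return lines describing the nested keys and value types of JSON data."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        for key, item in value.items():
            lines.append(f"{pad}{key}: {type(item).__name__}")
            if isinstance(item, (dict, list)):
                lines.extend(describe_structure(item, indent + 1))
    elif isinstance(value, list) and value:
        # Describe only the first element, assuming the list is homogeneous.
        lines.append(f"{pad}[0]: {type(value[0]).__name__} (of {len(value)})")
        lines.extend(describe_structure(value[0], indent + 1))
    return lines


# Hypothetical sample; a real file would be opened and passed to json.load().
sample = json.loads("""
[{"id": 61512,
  "regions": [{"id": 1, "x": 511, "y": 241, "width": 206, "height": 320,
               "phrase": "A brown, sleek horse with a bridle"}]}]
""")

print("\n".join(describe_structure(sample)))
```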

Paper summary

After reading the 40-plus-page paper, the main points can be distilled as follows:
[Figure: summary of the paper's main points]