Using the Visual Genome API with Python 3, plus dataset details

程序员文章站 2022-07-01 15:15:45

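The Visual Genome dataset

Visual Genome homepage
Visual Genome API
Visual Genome Python Driver
Visual Genome paper
Note: the API is mostly implemented for Python 2. To run it under Python 3.8, a few changes to its source code were needed; see the comments in the code, and leave a message if you run into problems.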

Installing the API

pip install visual-genome
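
After installing, it can help to confirm which version of the driver pip actually put in place, since the script below was written against version 1.1.1. The following is a minimal sketch using only the standard library (`importlib.metadata`, available from Python 3.8); `installed_version` is a hypothetical helper name, not part of the Visual Genome driver.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if the driver is installed, otherwise None.
print(installed_version("visual-genome"))
```

If this prints None, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.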

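Code

Note: two comments in the code below are marked "代码问题" ("code issue"); each one requires a manual change to the source code of the installed API.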
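
One of the two manual changes is described as a Python 2 versus Python 3 difference that is fixed by adding int. A likely cause is the `/` operator: in Python 2 it performs integer division on integers, while in Python 3 it always returns a float, and a float cannot be passed to `range()`. The sketch below illustrates the effect. It assumes, without confirmation from this article, that the driver turns an image index into a page number by dividing by a page size; `IDS_PER_PAGE` and `page_range` are hypothetical names, so check api.py in your own installation before editing it.

```python
IDS_PER_PAGE = 1000  # assumed page size, not taken from the article


def page_range(start_index, end_index, ids_per_page=IDS_PER_PAGE):
    """Return the 1-based page numbers that cover start_index..end_index."""
    # Floor division keeps the result an int under Python 3,
    # which has the same effect as wrapping a "/" division in int().
    start_page = start_index // ids_per_page + 1
    end_page = end_index // ids_per_page + 1
    return list(range(start_page, end_page + 1))


# True division yields a float, which range() rejects under Python 3.
try:
    range(2000 / IDS_PER_PAGE + 1)
except TypeError as exc:
    print("true division breaks range():", exc)

print(page_range(2000, 2010))  # [3]
```

Either `int(a / b)` or `a // b` works for non-negative indices; floor division simply avoids the intermediate float.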

'''
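Fetch the dataset with the visual_genome API, version 1.1.1
Reference: https://github.com/ranjaykrishna/visual_genome_python_driver
Reference 2: https://visualgenome.org/api/v0/api_object_model.html
Install: pip install visual-genome
Note: the driver targets Python 2 by default; here we use Python 3 and have modified parts of its source code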
'''
from visual_genome import api
import matplotlib.pyplot as plt
import requests
from PIL import Image
from io import BytesIO
from matplotlib.patches import Rectangle

# get the list of all image ids in the Visual Genome dataset
ids = api.get_all_image_ids()
print(ids[0])
# >> 1

# There are currently 108249 images. To get only the ids of images 2000 to 2010:
# Code issue (代码问题): a Python 2 vs Python 3 difference. Manually edit lines 27 and 28
# of api.py, adding int.
image_ids = api.get_image_ids_in_range(start_index=2000, end_index=2010)
print(image_ids)
# >>> [2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011]

# Get image data, include url, width, height, COCO and Flickr ids
image = api.get_image_data(id=61512)
print(image)
# >>> id: 61512, coco_id: 248774, flickr_id: 6273011878, width: 1024,
#     url: https://cs.stanford.edu/people/rak248/VG_100K/61512.jpg

# Get region descriptions for an image -- this is dense captions of an image
# Each region description is a textual description of a particular region in the image
# bbox format: x and y of the top-left corner, followed by width and height
regions = api.get_region_descriptions_of_image(id=61512)
print(regions[0])
# >>> id: 1, x: 511, y: 241, width: 206,height: 320, phrase: A brown, sleek horse with a bridle, image: 61512

# Get Region Graph from Region
# Region Graphs are tiny scene graphs for a particular region of an image,
# containing objects, attributes and relationships.
# We will get the region graph of a region and print out its objects, attributes and relationships
graph = api.get_region_graph_of_region(image_id=61512,region_id=1)
# Remember that region description is 'A brown, sleek horse with a bridle'
print(graph.objects)
# >>> [horse]
print(graph.attributes)
# >>> [3015675: horse is brown]
print(graph.relationships)
# >>> []
# The region graph has one object: horse and one attribute: brown to describe the horse. no relationships

# Get Scene Graph for an image
# Each scene graph has three components: objects, attributes, and relationships.
graph = api.get_scene_graph_of_image(id=61512)
# print the object, only the name and not the bbox
print(graph.objects)
# >>>  [horse, grass, horse, bridle, truck, sign, gate, truck, tire, trough, window, door, building, halter,
#        mane, mane,leaves, fence]
# print the attributes
print(graph.attributes)
# >>> [3015675: horse is brown, 3015676: horse is spotted, 3015677: horse is red, 3015678: horse is dark brown,
#       3015679: truck is red, 3015680: horse is brown, 3015681: truck is red, 3015682: sign is blue,
#       3015683: gate is red, 3015684: truck is white, 3015685: tire is blue, 3015686: gate is wooden,
#       3015687: horse is standing, 3015688: truck is red, 3420018: horse is brown, 3420019: horse is white,
#       3015690: building is tan, 3015691: halter is red, 3015692: horse is brown, 3015693: gate is wooden,
#       3015694: grass is grassy, 3015695: truck is red, 3015696: gate is orange, 3015697: halter is red,
#       3015698: tire is blue, 3015699: truck is white, 3015700: trough is white, 3420016: horse is brown,
#       3420017: horse is cream, 3015702: leaves is green, 3015703: grass is lush, 3015704: horse is enclosed,
#       3420022: horse is brown, 3420023: horse is white, 3015706: horse is chestnut, 3015707: gate is red,
#       3015708: leaves is green, 3015709: building is brick, 3015710: truck is large, 3015711: gate is red,
#       3015712: horse is chestnut colored, 3015713: fence is wooden]
# print the relationships
print(graph.relationships)
# >>> [3199950: horse stands on top of grass, 3199951: horse IN grass, 3199952: horse WEARING bridle,
#      3199953: trough for horse, 3199954: window next to door, 3199955: building has door,
#      3199956: horse nudging horse, 3199957: horse has mane, 3199958: horse has mane, 3199959: trough for horse]

# Get Question Answers for an image
# Each Question Answer object contains the id of the question-answer pair, the id of image,
#    the question and the answer string, as well as the list of question objects and answer
#    objects identified and canonicalized in the qa pair.
# Code issue (代码问题): manually edit visual_genome/utils.py so that the line reads
#   qas.append(QA(info['id'], image_map[info['image']],
qas = api.get_QA_of_image(id=61512)
# First print out some core information of the QA
print(qas[1])
# >>> id: 991155, image: 61512, question: What is the window treatment?, answer: White blinds.
# Now let's print out the question objects of the QA
print(qas[1].q_objects)
# >>> []

# Get all Questions Answers in the dataset
# We can get all 1.7 million QAs in the Visual Genome dataset, if we don't want to get all the data,
#    we can also specify how many QAs we want the function to return using the parameter qtotal
qas = api.get_all_QAs(qtotal=10)
print(qas[0])
# >>> id: 991155, image: 61512, question: What is the window treatment?, answer: White blinds.

# Get one type of Questions Answers from the entire dataset
# We can choose one type of <what, who, why, when, how>
qas = api.get_QA_of_type(qtotal=10, qtype='why')
print(qas[0])
# >>> id: 133089, image: 1159910, question: Why is the man cosplaying?, answer: For an event.

# Visualizing some regions; see https://visualgenome.org/api/v0/api_beginners_tutorial.html
image = api.get_image_data(id=61512)
regions = api.get_region_descriptions_of_image(id=61512)
fig = plt.gcf()
fig.set_size_inches(18.5, 10.5)
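def visualize_regions(image, regions):
    """Download the image and draw each region's box and phrase on top of it."""
    response = requests.get(image.url)
    response.raise_for_status()  # fail early if the download did not succeed
    img = Image.open(BytesIO(response.content))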
    plt.imshow(img)
    ax = plt.gca()
    for region in regions:
        ax.add_patch(Rectangle((region.x, region.y),
                               region.width,
                               region.height,
                               fill=False,
                               edgecolor='red',
                               linewidth=3))
        ax.text(region.x, region.y, region.phrase, style='italic', bbox={'facecolor':'white', 'alpha':0.7, 'pad':10})
    fig = plt.gcf()
    plt.tick_params(labelbottom=False, labelleft=False)  # hide tick labels; recent matplotlib expects booleans
    plt.show()
#visualize_regions(image, regions[:8])
#visualize_regions(image, regions) # plot all
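
Plotting every region at once tends to produce an unreadable image, and taking the first eight is arbitrary. As a minimal sketch, the helpers below convert the x, y, width, height box format into corner coordinates and pick the regions with the largest area. `Region` here is a hypothetical stand-in built with `namedtuple`, modelling only the fields visible in the printed region output; `to_corners` and `largest_regions` are illustrative names and not part of the driver.

```python
from collections import namedtuple

# Hypothetical stand-in for the driver's region objects.
Region = namedtuple("Region", ["id", "x", "y", "width", "height", "phrase"])


def to_corners(region):
    """Convert a top-left/width/height box to (x1, y1, x2, y2)."""
    return (region.x, region.y,
            region.x + region.width, region.y + region.height)


def largest_regions(regions, n):
    """Return the n regions with the largest box area, biggest first."""
    return sorted(regions, key=lambda r: r.width * r.height, reverse=True)[:n]


horse = Region(1, 511, 241, 206, 320, "A brown, sleek horse with a bridle")
print(to_corners(horse))  # (511, 241, 717, 561)
```

Because the helpers only read the x, y, width and height attributes, they should also accept the objects returned by get_region_descriptions_of_image, for example visualize_regions(image, largest_regions(regions, 8)); that combination has not been run here.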