欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Open Images数据集解析----下载Open Images V4指定的类别数据

程序员文章站 2022-05-30 09:41:14
...

1.下载Open Images的注释文件

注释文件如下:

 Class Names:    

           class-descriptions-boxable.csv      数据集内部使用的类名到人类可理解名称的对应

 Boxes:

          train-annotations-bbox.csv              训练图像中对象实例的边框注释
          validation-annotations-bbox.csv     验证图像中对象实例的边框注释
          test-annotations-bbox.csv                测试图像中对象实例的边框注释

下载地址:

wget https://storage.googleapis.com/openimages/2018_04/class-descriptions-boxable.csv
 
wget https://storage.googleapis.com/openimages/2018_04/train/train-annotations-bbox.csv
 
wget https://storage.googleapis.com/openimages/2018_04/validation/validation-annotations-bbox.csv
 
wget https://storage.googleapis.com/openimages/2018_04/test/test-annotations-bbox.csv

2.需要的包

管理AWS服务的统一工具

 sudo pip3 install awscli

sudo pip3 install tqdm

 3.运行脚本

python3 downloadOI.py    --classes 'Bicycle'    --mode train

可选的参数

parser.add_argument("--mode", help="Dataset category - train, validation or test", required=True)
parser.add_argument("--classes", help="Names of object classes to be downloaded", required=True)
parser.add_argument("--nthreads", help="Number of threads to use", required=False, type=int, default=cpu_count*2)
parser.add_argument("--occluded", help="Include occluded images", required=False, type=int, default=1)
parser.add_argument("--truncated", help="Include truncated images", required=False, type=int, default=1)
parser.add_argument("--groupOf", help="Include groupOf images", required=False, type=int, default=1)
parser.add_argument("--depiction", help="Include depiction images", required=False, type=int, default=1)
parser.add_argument("--inside", help="Include inside images", required=False, type=int, default=1)

4.downloadOI.py

#Author : Sunita Nayak, Big Vision LLC
 
#### Usage example: python3 downloadOI.py --classes 'Ice_cream,Cookie' --mode train
 
import argparse
import csv
import subprocess
import os
from tqdm import tqdm
import multiprocessing
from multiprocessing import Pool as thread_pool
 
cpu_count = multiprocessing.cpu_count()
 
parser = argparse.ArgumentParser(description='Download Class specific images from OpenImagesV4')
parser.add_argument("--mode", help="Dataset category - train, validation or test", required=True)
parser.add_argument("--classes", help="Names of object classes to be downloaded", required=True)
parser.add_argument("--nthreads", help="Number of threads to use", required=False, type=int, default=cpu_count*2)
parser.add_argument("--occluded", help="Include occluded images", required=False, type=int, default=1)
parser.add_argument("--truncated", help="Include truncated images", required=False, type=int, default=1)
parser.add_argument("--groupOf", help="Include groupOf images", required=False, type=int, default=1)
parser.add_argument("--depiction", help="Include depiction images", required=False, type=int, default=1)
parser.add_argument("--inside", help="Include inside images", required=False, type=int, default=1)
 
args = parser.parse_args()
 
run_mode = args.mode
 
threads = args.nthreads
 
classes = []
for class_name in args.classes.split(','):
    classes.append(class_name)
 
with open('./class-descriptions-boxable.csv', mode='r') as infile:
    reader = csv.reader(infile)
    dict_list = {rows[1]:rows[0] for rows in reader}
 
subprocess.run(['rm', '-rf', 'labels'])
subprocess.run([ 'mkdir', 'labels'])
 
subprocess.run(['rm', '-rf', 'JPEGImages'])
subprocess.run([ 'mkdir', 'JPEGImages'])
 
pool = thread_pool(threads)
commands = []
cnt = 0
 
for ind in range(0, len(classes)):
    
    class_name = classes[ind]
    print("Class "+str(ind) + " : " + class_name)
    
    subprocess.run([ 'mkdir', run_mode+'/'+class_name])
 
    command = "grep "+dict_list[class_name.replace('_', ' ')] + " ./" + run_mode + "-annotations-bbox.csv"
    class_annotations = subprocess.run(command.split(), stdout=subprocess.PIPE).stdout.decode('utf-8')
    class_annotations = class_annotations.splitlines()
 
    for line in class_annotations:
 
        line_parts = line.split(',')
        
        #IsOccluded,IsTruncated,IsGroupOf,IsDepiction,IsInside
        if (args.occluded==0 and int(line_parts[8])>0):
            print("Skipped %s",line_parts[0])
            continue
        if (args.truncated==0 and int(line_parts[9])>0):
            print("Skipped %s",line_parts[0])
            continue
        if (args.groupOf==0 and int(line_parts[10])>0):
            print("Skipped %s",line_parts[0])
            continue
        if (args.depiction==0 and int(line_parts[11])>0):
            print("Skipped %s",line_parts[0])
            continue
        if (args.inside==0 and int(line_parts[12])>0):
            print("Skipped %s",line_parts[0])
            continue
 
        cnt = cnt + 1
 
        command = 'aws s3 --no-sign-request --only-show-errors cp s3://open-images-dataset/'+run_mode+'/'+line_parts[0]+'.jpg '+ 'JPEGImages'+'/'+class_name+'/'+line_parts[0]+'.jpg'
        commands.append(command)
        
 
        with open('labels/%s.txt'%(line_parts[0]),'a') as f:
            f.write(' '.join([str(ind), str((float(line_parts[5]) + float(line_parts[4]))/2), str((float(line_parts[7]) + float(line_parts[6]))/2), str(float(line_parts[5])-float(line_parts[4])), str(float(line_parts[7])-float(line_parts[6]))])+'\n')
 
print("Annotation Count : "+str(cnt))
commands = list(set(commands))
print("Number of images to be downloaded : "+str(len(commands)))
 
list(tqdm(pool.imap(os.system, commands), total = len(commands) ))
 
pool.close()
pool.join()	

下载的对应Bicycle图片

Open Images数据集解析----下载Open Images V4指定的类别数据

以及labels

Open Images数据集解析----下载Open Images V4指定的类别数据