Knowledge Graph Datasets
Original link: https://blog.csdn.net/qq_21097885/article/details/104562276
DBpedia
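Introduction:
DBpedia is a distinctive example of a Semantic Web application. It extracts structured data from Wikipedia articles in order to strengthen Wikipedia's search capabilities and to link other datasets to Wikipedia. This semantic technology has enabled many novel and interesting applications of Wikipedia's vast body of information, such as mobile versions, map integration, faceted search, relationship queries, and document classification and annotation. DBpedia is also one of the world's largest multi-domain ontologies and is part of Linked Data. The US technology outlet ReadWriteWeb named DBpedia among the best Semantic Web application services of 2009.
The DBpedia 2014 release describes more than 4.58 million things, including 1,445,000 persons, 735,000 places, 123,000 music albums, 87,000 films, 19,000 video games, 241,000 organizations, 251,000 species, and 6,000 diseases. Its data is used by the BBC, Reuters, and The New York Times, and is also indexed by search engines such as Google and Yahoo.
The release published in 2016 contains 9.5 billion RDF triples: 1.3 billion extracted from the English edition of Wikipedia, 5.0 billion from other language editions, and 3.2 billion from DBpedia Commons and Wikidata.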
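To make the notion of an RDF triple concrete, the following is a minimal Python sketch that models a triple as a (subject, predicate, object) record and filters a small list by pattern. The three sample facts and the helper name `match` are assumptions made for illustration; they are not read from a DBpedia dump. The `dbr:` and `dbo:` prefixes follow the abbreviations commonly used for DBpedia resources and ontology terms.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Illustrative facts only (assumed for this sketch, not taken from a dump).
TRIPLES = [
    Triple("dbr:Berlin", "dbo:country", "dbr:Germany"),
    Triple("dbr:Germany", "dbo:capital", "dbr:Berlin"),
    Triple("dbr:Paris", "dbo:country", "dbr:France"),
]


def match(
    triples: List[Triple],
    subject: Optional[str] = None,
    predicate: Optional[str] = None,
    obj: Optional[str] = None,
) -> List[Triple]:
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]


if __name__ == "__main__":
    for t in match(TRIPLES, predicate="dbo:country"):
        print(t.subject, "has country", t.obj)
```

A real deployment would query a triple store with SPARQL rather than scan a Python list; the wildcard pattern above only mirrors the idea of a basic triple pattern.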
Reference:
@article{DBLP:journals/ws/BizerLKABCH09,
author = {Christian Bizer and
Jens Lehmann and
Georgi Kobilarov and
S{\"{o}}ren Auer and
Christian Becker and
Richard Cyganiak and
Sebastian Hellmann},
title = {DBpedia - {A} crystallization point for the Web of Data},
journal = {J. Web Semant.},
volume = {7},
number = {3},
pages = {154--165},
year = {2009},
url = {https://doi.org/10.1016/j.websem.2009.07.002},
doi = {10.1016/j.websem.2009.07.002},
timestamp = {Fri, 27 Dec 2019 21:12:44 +0100},
biburl = {https://dblp.org/rec/journals/ws/BizerLKABCH09.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Yago
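Website: https://www.mpi-inf.mpg.de/departments/databases-and-information-systems/research/yago-naga/yago/
Introduction (translated from Chinese):
Yago is an open-source dataset whose data is automatically extracted from multiple sources, including Wikipedia, WordNet, and GeoNames. As of 2012, it contained more than 10 million entities and 120 million facts.
Introduction (English):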
YAGO (Yet Another Great Ontology) is an open source knowledge base developed at the Max Planck Institute for Computer Science in Saarbrücken. It is automatically extracted from Wikipedia and other sources.
As of 2012, YAGO3 has knowledge of more than 10 million entities and contains more than 120 million facts about these entities. The information in YAGO is extracted from Wikipedia (e.g., categories, redirects, infoboxes), WordNet (e.g., synsets, hyponymy), and GeoNames. The accuracy of YAGO was manually evaluated to be above 95% on a sample of facts. To integrate it into the linked data cloud, YAGO has been linked to the DBpedia ontology and to the SUMO ontology.
YAGO3 is provided in Turtle and tsv formats. Dumps of the whole database are available, as well as thematic and specialized dumps. It can also be queried through various online browsers and through a SPARQL endpoint hosted by OpenLink Software. The source code of YAGO3 is available on GitHub.
YAGO has been used in the Watson artificial intelligence system.
Reference:
@inproceedings{DBLP:conf/www/SuchanekKW07,
author = {Fabian M. Suchanek and
Gjergji Kasneci and
Gerhard Weikum},
editor = {Carey L. Williamson and
Mary Ellen Zurko and
Peter F. Patel{-}Schneider and
Prashant J. Shenoy},
title = {Yago: a core of semantic knowledge},
booktitle = {Proceedings of the 16th International Conference on World Wide Web,
{WWW} 2007, Banff, Alberta, Canada, May 8-12, 2007},
pages = {697--706},
publisher = {{ACM}},
year = {2007},
url = {https://doi.org/10.1145/1242572.1242667},
doi = {10.1145/1242572.1242667},
timestamp = {Wed, 14 Nov 2018 10:55:41 +0100},
biburl = {https://dblp.org/rec/conf/www/SuchanekKW07.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
Freebase
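Introduction:
Similar to Wikipedia, Freebase consists of structured knowledge contributed by community members. Besides manual input, Freebase also actively imports structured knowledge from sources such as Wikipedia.
Freebase has since been acquired by Google.
Papers commonly use its subset FB13; for details see: https://blog.csdn.net/qq_21097885/article/details/103519703
Reference: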
@inproceedings{DBLP:conf/sigmod/BollackerEPST08,
author = {Kurt D. Bollacker and
Colin Evans and
Praveen Paritosh and
Tim Sturge and
Jamie Taylor},
editor = {Jason Tsong{-}Li Wang},
title = {Freebase: a collaboratively created graph database for structuring
human knowledge},
booktitle = {Proceedings of the {ACM} {SIGMOD} International Conference on Management
of Data, {SIGMOD} 2008, Vancouver, BC, Canada, June 10-12, 2008},
pages = {1247--1250},
publisher = {{ACM}},
year = {2008},
url = {https://doi.org/10.1145/1376616.1376746},
doi = {10.1145/1376616.1376746},
timestamp = {Tue, 27 Nov 2018 10:40:37 +0100},
biburl = {https://dblp.org/rec/conf/sigmod/BollackerEPST08.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
WordNet
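Website: https://wordnet.princeton.edu/
Introduction (translated from Chinese):
WordNet is a large lexical database of English. Nouns, verbs, adjectives, and adverbs are grouped into sets of cognitive synonyms called synsets, and each synset represents a distinct concept. Synsets are linked to one another through conceptual-semantic and lexical relations. WordNet is a commonly used tool in computational linguistics and natural language processing.
For Chinese, a comparable resource is HowNet.
Papers commonly use its subset WN11; for details see: https://blog.csdn.net/qq_21097885/article/details/103519635
and WN18; for details see: https://blog.csdn.net/qq_21097885/article/details/103519750
Introduction (English):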
WordNet is a large lexical database of English. Nouns, verbs, adjectives and adverbs are grouped into sets of cognitive synonyms (synsets), each expressing a distinct concept. Synsets are interlinked by means of conceptual-semantic and lexical relations. The resulting network of meaningfully related words and concepts can be navigated with the browser. WordNet is also freely and publicly available for download. WordNet’s structure makes it a useful tool for computational linguistics and natural language processing.
WordNet superficially resembles a thesaurus, in that it groups words together based on their meanings. However, there are some important distinctions. First, WordNet interlinks not just word forms (strings of letters) but specific senses of words. As a result, words that are found in close proximity to one another in the network are semantically disambiguated. Second, WordNet labels the semantic relations among words, whereas the groupings of words in a thesaurus do not follow any explicit pattern other than meaning similarity.
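The two distinctions above (links between senses rather than word forms, and labeled relations) can be illustrated with a small Python sketch. The synset names, glosses, and relations below are toy entries invented for this example; they are not copied from WordNet, and `senses` and `related` are hypothetical helper names.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Synset:
    name: str            # hypothetical identifier used only in this sketch
    lemmas: List[str]    # word forms that share this one sense
    gloss: str           # short definition of the concept
    # Labeled links to other synsets: (relation label, target synset name).
    relations: List[Tuple[str, str]] = field(default_factory=list)


# Toy data: the word form "bank" appears in two different synsets (senses).
SYNSETS: Dict[str, Synset] = {
    s.name: s
    for s in [
        Synset("bank.finance", ["bank", "depository"],
               "an institution that accepts deposits",
               [("hypernym", "institution")]),
        Synset("bank.river", ["bank", "riverside"],
               "sloping land beside a body of water",
               [("hypernym", "slope")]),
        Synset("institution", ["institution", "establishment"],
               "an organization founded for a purpose"),
        Synset("slope", ["slope", "incline"],
               "a stretch of ground that rises or falls"),
    ]
}


def senses(word: str) -> List[str]:
    """Return the names of all synsets that contain this word form."""
    return sorted(name for name, s in SYNSETS.items() if word in s.lemmas)


def related(synset_name: str, label: str) -> List[str]:
    """Follow only the links that carry the given relation label."""
    return [target for rel, target in SYNSETS[synset_name].relations if rel == label]


if __name__ == "__main__":
    for name in senses("bank"):
        print(name, "->", related(name, "hypernym"))
```

Looking up a word form returns one synset per sense, and each link carries an explicit label, which is what separates this structure from a flat thesaurus grouping.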
Reference:
@article{DBLP:journals/cacm/Miller95,
author = {George A. Miller},
title = {WordNet: {A} Lexical Database for English},
journal = {Commun. {ACM}},
volume = {38},
number = {11},
pages = {39--41},
year = {1995},
url = {http://doi.acm.org/10.1145/219717.219748},
doi = {10.1145/219717.219748},
timestamp = {Wed, 14 Nov 2018 10:22:30 +0100},
biburl = {https://dblp.org/rec/journals/cacm/Miller95.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
PDD
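Introduction (translated from Chinese):
PDD, short for Patient-Disease-Drug, is a medical dataset that captures the links among patients, diseases, and drugs.
Introduction (English):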
What is PDD Graph (Patient-Disease-Drug Graph):
Electronic medical records contain multi-format electronic medical data that hold an abundance of medical knowledge. Faced with patients' symptoms, experienced caregivers make the right medical decisions based on professional knowledge that accurately grasps the relationships between symptoms, diagnoses, and treatments. We aim to capture these relationships by constructing a large and high-quality heterogeneous graph linking patients, diseases, and drugs (PDD) in EMRs.
Specifically, we extract important medical entities from MIMIC-III (Medical Information Mart for Intensive Care III) and automatically link them with existing biomedical knowledge graphs, including the ICD-9 ontology and DrugBank. The PDD graph presented is accessible on the Web via the SPARQL endpoint, and provides a pathway for medical discovery and applications, such as effective treatment recommendations.
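As a rough illustration of a heterogeneous patient-disease-drug graph with entity links, here is a minimal Python sketch. Every node name, relation label, and external identifier is a hypothetical placeholder rather than data from MIMIC-III, ICD-9, or DrugBank, and the co-occurrence count is a simplification chosen for this example, not the method of the PDD paper.

```python
from collections import Counter
from typing import Dict, List, Tuple

# (head, relation, tail) edges between typed nodes; all names are placeholders.
EDGES: List[Tuple[str, str, str]] = [
    ("patient:1", "diagnosed_with", "disease:A"),
    ("patient:1", "prescribed", "drug:X"),
    ("patient:2", "diagnosed_with", "disease:A"),
    ("patient:2", "prescribed", "drug:X"),
    ("patient:2", "prescribed", "drug:Y"),
    ("patient:3", "diagnosed_with", "disease:B"),
    ("patient:3", "prescribed", "drug:Y"),
]

# Entity links from local nodes to external knowledge graphs (placeholder IDs).
LINKS: Dict[str, str] = {
    "disease:A": "icd9:placeholder-code-A",
    "drug:X": "drugbank:placeholder-id-X",
}


def neighbors(node: str, relation: str) -> List[str]:
    """Tails reachable from `node` over edges with the given relation."""
    return [t for h, r, t in EDGES if h == node and r == relation]


def drugs_cooccurring_with(disease: str) -> Counter:
    """Count drugs prescribed to patients diagnosed with `disease`."""
    counts: Counter = Counter()
    for head, relation, tail in EDGES:
        if relation == "diagnosed_with" and tail == disease:
            counts.update(neighbors(head, "prescribed"))
    return counts


if __name__ == "__main__":
    for drug, n in drugs_cooccurring_with("disease:A").most_common():
        print(drug, n, LINKS.get(drug, "no external link"))
```

The `LINKS` table stands in for the entity-linking step: once a local node maps to an external identifier, facts about that identifier in the external graph become reachable from the patient data.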
Reference:
@inproceedings{DBLP:conf/semweb/WangZLHWLL17,
author = {Meng Wang and
Jiaheng Zhang and
Jun Liu and
Wei Hu and
Sen Wang and
Xue Li and
Wenqiang Liu},
editor = {Claudia d'Amato and
Miriam Fern{\'{a}}ndez and
Valentina A. M. Tamma and
Freddy L{\'{e}}cu{\'{e}} and
Philippe Cudr{\'{e}}{-}Mauroux and
Juan F. Sequeda and
Christoph Lange and
Jeff Heflin},
title = {{PDD} Graph: Bridging Electronic Medical Records and Biomedical Knowledge
Graphs via Entity Linking},
booktitle = {The Semantic Web - {ISWC} 2017 - 16th International Semantic Web Conference,
Vienna, Austria, October 21-25, 2017, Proceedings, Part {II}},
series = {Lecture Notes in Computer Science},
volume = {10588},
pages = {219--227},
publisher = {Springer},
year = {2017},
url = {https://doi.org/10.1007/978-3-319-68204-4\_23},
doi = {10.1007/978-3-319-68204-4\_23},
timestamp = {Tue, 14 May 2019 10:00:53 +0200},
biburl = {https://dblp.org/rec/conf/semweb/WangZLHWLL17.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
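In recent years, Chinese institutions have also released knowledge graphs focused mainly on Chinese, such as Tsinghua University's XLore, Shanghai Jiao Tong University's Zhishi.me, and Fudan University's CN-DBpedia.
Tsinghua University's XLore
Introduction:
XLORE is a multilingual knowledge graph built by integrating Chinese and English Wikipedia, French Wikipedia, and Baidu Baike, structuring the encyclopedic knowledge and linking it across languages. It is a large-scale multilingual knowledge graph in which Chinese and English knowledge are relatively balanced in size. XLORE contains 16,284,901 instances, 2,466,956 concepts, 446,236 properties, and rich semantic relations.
Reference: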
@inproceedings{DBLP:conf/semweb/WangLWLLZSLZT13,
author = {Zhigang Wang and
Juanzi Li and
Zhichun Wang and
Shuangjie Li and
Mingyang Li and
Dongsheng Zhang and
Yao Shi and
Yongbin Liu and
Peng Zhang and
Jie Tang},
editor = {Eva Blomqvist and
Tudor Groza},
title = {XLore: {A} Large-scale English-Chinese Bilingual Knowledge Graph},
booktitle = {Proceedings of the {ISWC} 2013 Posters {\&} Demonstrations Track,
Sydney, Australia, October 23, 2013},
series = {{CEUR} Workshop Proceedings},
volume = {1035},
pages = {121--124},
publisher = {CEUR-WS.org},
year = {2013},
url = {http://ceur-ws.org/Vol-1035/iswc2013\_demo\_31.pdf},
timestamp = {Wed, 12 Feb 2020 16:44:51 +0100},
biburl = {https://dblp.org/rec/conf/semweb/WangLWLLZSLZT13.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
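Shanghai Jiao Tong University's Zhishi.me
Website: none
Introduction:
Zhishi.me is the first attempt to build a general-purpose Chinese knowledge graph by extracting structured data from open encyclopedia data. It currently integrates data from three major Chinese encyclopedias: Baidu Baike, Hudong Baike, and Chinese Wikipedia.
Reference: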
@inproceedings{DBLP:conf/semweb/NiuSWRQY11,
author = {Xing Niu and
Xinruo Sun and
Haofen Wang and
Shu Rong and
Guilin Qi and
Yong Yu},
editor = {Lora Aroyo and
Chris Welty and
Harith Alani and
Jamie Taylor and
Abraham Bernstein and
Lalana Kagal and
Natasha Fridman Noy and
Eva Blomqvist},
title = {Zhishi.me - Weaving Chinese Linking Open Data},
booktitle = {The Semantic Web - {ISWC} 2011 - 10th International Semantic Web Conference,
Bonn, Germany, October 23-27, 2011, Proceedings, Part {II}},
series = {Lecture Notes in Computer Science},
volume = {7032},
pages = {205--220},
publisher = {Springer},
year = {2011},
url = {https://doi.org/10.1007/978-3-642-25093-4\_14},
doi = {10.1007/978-3-642-25093-4\_14},
timestamp = {Thu, 28 Nov 2019 10:44:37 +0100},
biburl = {https://dblp.org/rec/conf/semweb/NiuSWRQY11.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}
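Fudan University's CN-DBpedia
Website: http://kw.fudan.edu.cn/cndbpedia/intro/
Introduction:
CN-DBpedia takes the accumulation of general encyclopedic knowledge as its main line and the accumulation of graphs in vertical, in-depth domains as a secondary line. It aims to provide rich background knowledge for machine semantic understanding and the necessary support for machine language cognition.
CN-DBpedia has extended from the encyclopedia domain to more than ten vertical domains, including law, business registration, finance, entertainment, technology, military, education, and healthcare, providing supporting knowledge services for intelligent applications across industries. Nearly one hundred organizations currently use it.
Reference: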
@inproceedings{DBLP:conf/ieaaie/XuXLXLCX17,
author = {Bo Xu and
Yong Xu and
Jiaqing Liang and
Chenhao Xie and
Bin Liang and
Wanyun Cui and
Yanghua Xiao},
editor = {Salem Benferhat and
Karim Tabia and
Moonis Ali},
title = {CN-DBpedia: {A} Never-Ending Chinese Knowledge Extraction System},
booktitle = {Advances in Artificial Intelligence: From Theory to Practice - 30th
International Conference on Industrial Engineering and Other Applications
of Applied Intelligent Systems, {IEA/AIE} 2017, Arras, France, June
27-30, 2017, Proceedings, Part {II}},
series = {Lecture Notes in Computer Science},
volume = {10351},
pages = {428--438},
publisher = {Springer},
year = {2017},
url = {https://doi.org/10.1007/978-3-319-60045-1\_44},
doi = {10.1007/978-3-319-60045-1\_44},
timestamp = {Tue, 14 May 2019 10:00:37 +0200},
biburl = {https://dblp.org/rec/conf/ieaaie/XuXLXLCX17.bib},
bibsource = {dblp computer science bibliography, https://dblp.org}
}