知识图谱3
程序员文章站
2022-06-12 17:05:08
...
一、引言
Neo4j 是当前较为主流和先进的原生图数据库之一,提供原生的图数据存储、检索和处理。它由 Neo Technology支持,从 2003 年开始开发,1.0 版本发布于 2010 年,2.0版本发布于 2013 年。经过十多年的发展,Neo4j 获得越来越高的关注度,它已经从一个 Java 领域内的图数据库逐渐发展成为适应多语言多框架的图数据库。Neo4j 支持ACID、集群、备份和故障转移,具有较高的可用性和稳定性;它具备非常好的直观性,通过图形化的界面表示节点和关系;同时它具备较高的可扩展性,能够承载上亿的节点、关系和属性,通过 REST 接口或者面向对象的 JAVA API进行访问。
二、Neo4j简介
(1)基本概念
Neo4j使用图相关的概念来描述数据模型,把数据保存为图中的节点以及节点之间的关系。数据主要由三部分构成:
- 节点。节点表示对象实例,每个节点有唯一的ID区别其它节点,节点带有属性;
- 关系。就是图里面的边,连接两个节点,另外这里的关系是有向的并带有属性;
- 属性。key-value对,存在于节点和关系中,如图1所示。
(2)索引
-
动机:Neo4j使用遍历操作进行查询。为了加速查询,Neo4j会建立索引,并根据索引找到遍历用的起始节点;
-
介绍:默认情况下,相关的索引是由Apache Lucene提供的。但也能使用其他索引实现来提供。
-
操作:用户可以创建任意数量的命名索引。每个索引控制节点或者关系,而每个索引都通过key/value/object三个参数来工作。其中object要么是一个节点,要么是一个关系,取决于索引类型。另外,Neo4j中有关于节点(关系)的索引,系统通过索引实现从属性到节点(关系)的映射。
-
作用:
- 查找操作:系统通过设定访问条件比如,遍历的方向,使用深度优先或广度优先算法等条件对图进行遍历,从一个节点沿着关系到其他节点;
- 删除操作:Neo4j可以快速的插入删除节点和关系,并更新节点和关系中的属性。
(3)Neo4j的优势
- 查询的高性能
- 设计的灵活性
- 开发的敏捷行
- 与其他数据库的比较
- 综合表现
- 闪电般的读/写速度,无与伦比的高性能表现;
- 非结构化数据存储方式,在数据库设计上具有很大的灵活性;
- 能很好地适应需求变化,并适合使用敏捷开发方法;
- 很容易使用,可以用嵌入式、服务器模式、分布式模式等方式来使用数据库;
- 使用简单框图就可以设计数据模型,方便建模;
- 图数据的结构特点可以提供更多更优秀的算法设计;
- 完全支持ACID完整的事务管理特性;
- 提供分布式高可用模式,可以支持大规模的数据增长;
- 数据库安全可靠,可以实时备份数据,很方便恢复数据;
- 图的数据结构直观而形象地表现了现实世界的应用场景。
三、主体类MedicalGraph介绍
class MedicalGraph:
def __init__(self):
pass
# 读取文件,获得实体,实体关系
def read_file(self):
psss
# 创建节点
def create_node(self, label, nodes):
pass
# 创建疾病节点的属性
def create_diseases_nodes(self, disease_info):
pass
# 创建知识图谱实体
def create_graphNodes(self):
pass
# 创建实体关系边
def create_graphRels(self):
pass
# 创建实体关系边
def create_relationship(self, start_node, end_node, edges, rel_type, rel_name):
pass
- 读取文件
def read_file(self):
"""
读取文件,获得实体,实体关系
:return:
"""
# cols = ["name", "alias", "part", "age", "infection", "insurance", "department", "checklist", "symptom",
# "complication", "treatment", "drug", "period", "rate", "money"]
# 实体
diseases = [] # 疾病
aliases = [] # 别名
symptoms = [] # 症状
parts = [] # 部位
departments = [] # 科室
complications = [] # 并发症
drugs = [] # 药品
# 疾病的属性:age, infection, insurance, checklist, treatment, period, rate, money
diseases_infos = []
# 关系
disease_to_symptom = [] # 疾病与症状关系
disease_to_alias = [] # 疾病与别名关系
diseases_to_part = [] # 疾病与部位关系
disease_to_department = [] # 疾病与科室关系
disease_to_complication = [] # 疾病与并发症关系
disease_to_drug = [] # 疾病与药品关系
all_data = pd.read_csv(self.data_path, encoding='gb18030').loc[:, :].values
for data in all_data:
disease_dict = {} # 疾病信息
# 疾病
disease = str(data[0]).replace("...", " ").strip()
disease_dict["name"] = disease
# 别名
line = re.sub("[,、;,.;]", " ", str(data[1])) if str(data[1]) else "未知"
for alias in line.strip().split():
aliases.append(alias)
disease_to_alias.append([disease, alias])
# 部位
part_list = str(data[2]).strip().split() if str(data[2]) else "未知"
for part in part_list:
parts.append(part)
diseases_to_part.append([disease, part])
# 年龄
age = str(data[3]).strip()
disease_dict["age"] = age
# 传染性
infect = str(data[4]).strip()
disease_dict["infection"] = infect
# 医保
insurance = str(data[5]).strip()
disease_dict["insurance"] = insurance
# 科室
department_list = str(data[6]).strip().split()
for department in department_list:
departments.append(department)
disease_to_department.append([disease, department])
# 检查项
check = str(data[7]).strip()
disease_dict["checklist"] = check
# 症状
symptom_list = str(data[8]).replace("...", " ").strip().split()[:-1]
for symptom in symptom_list:
symptoms.append(symptom)
disease_to_symptom.append([disease, symptom])
# 并发症
complication_list = str(data[9]).strip().split()[:-1] if str(data[9]) else "未知"
for complication in complication_list:
complications.append(complication)
disease_to_complication.append([disease, complication])
# 治疗方法
treat = str(data[10]).strip()[:-4]
disease_dict["treatment"] = treat
# 药品
drug_string = str(data[11]).replace("...", " ").strip()
for drug in drug_string.split()[:-1]:
drugs.append(drug)
disease_to_drug.append([disease, drug])
# 治愈周期
period = str(data[12]).strip()
disease_dict["period"] = period
# 治愈率
rate = str(data[13]).strip()
disease_dict["rate"] = rate
# 费用
money = str(data[14]).strip() if str(data[14]) else "未知"
disease_dict["money"] = money
diseases_infos.append(disease_dict)
return set(diseases), set(symptoms), set(aliases), set(parts), set(departments), set(complications), \
set(drugs), disease_to_alias, disease_to_symptom, diseases_to_part, disease_to_department, \
disease_to_complication, disease_to_drug, diseases_infos
- 创建节点
def create_node(self, label, nodes):
"""
创建节点
:param label: 标签
:param nodes: 节点
:return:
"""
count = 0
for node_name in nodes:
node = Node(label, name=node_name)
self.graph.create(node)
count += 1
print(count, len(nodes))
return
- 创建带有属性节点
def create_diseases_nodes(self, disease_info):
"""
创建疾病节点的属性
:param disease_info: list(Dict)
:return:
"""
count = 0
for disease_dict in disease_info:
node = Node("Disease", name=disease_dict['name'], age=disease_dict['age'],
infection=disease_dict['infection'], insurance=disease_dict['insurance'],
treatment=disease_dict['treatment'], checklist=disease_dict['checklist'],
period=disease_dict['period'], rate=disease_dict['rate'],
money=disease_dict['money'])
self.graph.create(node)
count += 1
print(count)
return
- 创建知识图谱实体
def create_graphNodes(self):
"""
创建知识图谱实体
:return:
"""
disease, symptom, alias, part, department, complication, drug, rel_alias, rel_symptom, rel_part, \
rel_department, rel_complication, rel_drug, rel_infos = self.read_file()
self.create_diseases_nodes(rel_infos)
self.create_node("Symptom", symptom)
self.create_node("Alias", alias)
self.create_node("Part", part)
self.create_node("Department", department)
self.create_node("Complication", complication)
self.create_node("Drug", drug)
return
- 创建知识图谱关系
def create_graphRels(self):
disease, symptom, alias, part, department, complication, drug, rel_alias, rel_symptom, rel_part, \
rel_department, rel_complication, rel_drug, rel_infos = self.read_file()
self.create_relationship("Disease", "Alias", rel_alias, "ALIAS_IS", "别名")
self.create_relationship("Disease", "Symptom", rel_symptom, "HAS_SYMPTOM", "症状")
self.create_relationship("Disease", "Part", rel_part, "PART_IS", "发病部位")
self.create_relationship("Disease", "Department", rel_department, "DEPARTMENT_IS", "所属科室")
self.create_relationship("Disease", "Complication", rel_complication, "HAS_COMPLICATION", "并发症")
self.create_relationship("Disease", "Drug", rel_drug, "HAS_DRUG", "药品")
- 创建实体关系边
def create_relationship(self, start_node, end_node, edges, rel_type, rel_name):
"""
创建实体关系边
:param start_node:
:param end_node:
:param edges:
:param rel_type:
:param rel_name:
:return:
"""
count = 0
# 去重处理
set_edges = []
for edge in edges:
set_edges.append('###'.join(edge))
all = len(set(set_edges))
for edge in set(set_edges):
edge = edge.split('###')
p = edge[0]
q = edge[1]
query = "match(p:%s),(q:%s) where p.name='%s'and q.name='%s' create (p)-[rel:%s{name:'%s'}]->(q)" % (
start_node, end_node, p, q, rel_type, rel_name)
try:
self.graph.run(query)
count += 1
print(rel_type, count, all)
except Exception as e:
print(e)
return
- 参考资料
- QASystemOnMedicalGraph
- 韩浩明.图数据库系统研究综述[J].计算机光盘软件与应用,2014,17(23):14-
上一篇: C# 枚举获取Descrption
下一篇: C#获取当前连接设备串口号