JanusGraph cluster setup, multi-graph configuration, and indexing

程序员文章站 2024-03-15 09:56:11


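Downloading and starting the instances
1. Download and start Cassandra. Production will use HBase as the storage backend, but deploying HBase in a development environment is fairly involved, so Cassandra is used here.

http://cassandra.apache.org/download/    Download version 3.11.2

After unpacking, run sh bin/cassandra -f. By default it binds to IP 127.0.0.1, port 9042.

Start Thrift with sh ./bin/nodetool enablethrift. By default it binds to IP 127.0.0.1, port 9160.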

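2. Download and start ES

The JanusGraph distribution bundles ES. Start it with sh /elasticsearch/bin/elasticsearch. By default it binds to IP 127.0.0.1, port 9200.

A standalone ES can also be used; version 5.0 or later is recommended.

PS: Problems encountered when starting ES 5.0+ on CentOS 6, and how to fix them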

[1]: max file descriptors [10240] for elasticsearch process is too low, increase to at least [65536]

ulimit -n 65536

[2]: max virtual memory areas vm.max_map_count [65530] is too low, increase to at least [262144]

Edit the /etc/sysctl.conf configuration file. First check whether the setting is already present:

cat /etc/sysctl.conf | grep vm.max_map_count
vm.max_map_count=262144

If it is not there, add it, then reload the settings:

echo "vm.max_map_count=262144" >>/etc/sysctl.conf

sysctl -p
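
The check-then-append step above can be wrapped in a small helper so that running it twice does not add a duplicate line. This is only a sketch: the function name ensure_setting is made up for illustration, and it leaves an existing line untouched even if its value differs.

```shell
#!/bin/sh
# ensure_setting FILE KEY VALUE  (hypothetical helper, not part of any tool)
# Appends "KEY=VALUE" to FILE only when FILE has no line that already sets KEY.
ensure_setting() {
    file=$1; key=$2; value=$3
    # Escape dots so that "vm.max_map_count" is matched literally by grep.
    pattern=$(printf '%s' "$key" | sed 's/\./\\./g')
    if grep -q "^[[:space:]]*${pattern}[[:space:]]*=" "$file" 2>/dev/null; then
        return 0    # already configured, nothing to do
    fi
    printf '%s=%s\n' "$key" "$value" >> "$file"
}

# Intended use, as root:
#   ensure_setting /etc/sysctl.conf vm.max_map_count 262144 && sysctl -p
```

Taking the file as a parameter makes it possible to try the helper on a scratch copy before touching /etc/sysctl.conf.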

[3]: system call filters failed to install; check the logs and fix your configuration or disable system call filters at your own risk

In elasticsearch.yml, set bootstrap.system_call_filter to false. Note that it must go under the Memory section:
bootstrap.memory_lock: false
bootstrap.system_call_filter: false

3. Download and install JanusGraph

https://github.com/JanusGraph/janusgraph/releases/    Download janusgraph-0.2.0-hadoop2.zip

Edit janusgraph-cassandra-es.properties and set the ES and Cassandra IPs to the ones from steps 1 and 2.

sh bin/gremlin.sh

graph = JanusGraphFactory.open("conf/janusgraph-cassandra-es.properties")

If the console prints standardjanusgraph[cassandrathrift:[127.0.0.1]], the configuration works.

4. Start Gremlin Server

Edit conf/gremlin-server/gremlin-server.yaml and make sure the Cassandra and ES IPs in the properties file it references are correct.

sh bin/gremlin-server.sh

Use gremlin.sh to test whether the server is reachable:

:remote connect tinkerpop.server conf/remote.yaml

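Building a cluster and multiple graphs
1. Gremlin Server multi-graph configuration

In the server's gremlin-server.yaml, the graphs entry can list one properties file per graph: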

graphs: {
graph: conf/gremlin-server/janusgraph-cassandra-es-server.properties,
pressureGraph: conf/gremlin-server/janusgraph-pressure.properties
}

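janusgraph-pressure.properties must declare its own keyspace and index-name. Otherwise storage and index are both created under the default name janusgraph, and multiple properties files end up pointing to the same graph.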

storage.hostname=10.*.*.*
storage.cassandra.keyspace=pressure

index.pressure.hostname=10.*.*.*:9200

index.pressure.index-name=pressure

PS: To inspect the resulting storage and indexes, run describe keyspaces in cqlsh and curl -XGET 10.*.*.*:9200/_cat/indices
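
Since every additional graph needs the same four keys with a different name, the properties fragment can be generated rather than copied by hand. The sketch below is an assumption-laden illustration: write_graph_properties is a made-up helper, it reuses the graph name as keyspace, index backend name, and index-name (as the pressure example does), and it assumes ES runs on the same host at port 9200. A real properties file also needs the remaining keys from the template file (storage backend, index backend, and so on), which are not shown here.

```shell
#!/bin/sh
# write_graph_properties NAME HOST OUTFILE  (hypothetical helper)
# Writes only the four per-graph keys discussed above into OUTFILE.
write_graph_properties() {
    name=$1; host=$2; out=$3
    cat > "$out" <<EOF
storage.hostname=${host}
storage.cassandra.keyspace=${name}
index.${name}.hostname=${host}:9200
index.${name}.index-name=${name}
EOF
}

# Example (the host below is a placeholder):
#   write_graph_properties pressure 10.0.0.5 /tmp/janusgraph-pressure.fragment
```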

In scripts/empty-sample.groovy, add one traversalSource per graph:

globals << [g : graph.traversal(), p : pressureGraph.traversal()]

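2. Cluster configuration

On the client, remote-objects.yaml lists the IPs of several server instances, and the driver load-balances across them:

hosts: [10.*.*.*,10.*.*.*]

jgex-remote.properties selects which traversalSource on the server to use: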

gremlin.remote.driver.sourceName=p

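Index tuning and verification
1. Composite index: implemented on the storage backend, supports equality (exact match) only, and is fast. In the examples here, mgmt is the management object from graph.openManagement() and each key is a PropertyKey, for example from mgmt.getPropertyKey('industryCode'); finish with mgmt.commit().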

    mgmt.buildIndex('copIndustryCode', Vertex.class).addKey(industryCode).buildCompositeIndex()

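2. Mixed index: implemented on the index backend, supports the features of that backend (for example ES full-text search), and is slower than a composite index. The last argument is the name of the index backend as configured under index.[name] in the properties file, which in this setup is pressure, the same as index-name.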

    mgmt.buildIndex('mixCompanyCode', Vertex.class).addKey(companyCode).buildMixedIndex("pressure")

3. Run queries in gremlin.sh to check. If the warning "Query requires iterating over all vertices [(companyCode = 10)]. For better performance, use indexes" appears, the query did not use an index.

If no such warning appears, the query used an index.

gremlin> g.V().has("companyCode", textContains("1000")).valueMap()
==>[companyCode:[1000],companyName:[机构1000]]
gremlin> g.V().has("companyCode", "10").valueMap()
19:34:32 WARN org.janusgraph.graphdb.transaction.StandardJanusGraphTx - Query requires iterating over all vertices [(companyCode = 10)]. For better performance, use indexes
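
When many queries are being checked, it can be easier to capture the console output to a file and count the full-scan warnings afterwards. The helper below is a sketch: count_full_scans and the log file name are made up, and it assumes the warning text is exactly the one shown above.

```shell
#!/bin/sh
# count_full_scans LOGFILE  (hypothetical helper)
# Prints how many lines of LOGFILE carry JanusGraph's full-scan warning.
count_full_scans() {
    # grep -c prints 0 (and exits non-zero) when there is no match.
    grep -c 'Query requires iterating over all vertices' "$1"
}

# Example, with console.log being a captured Gremlin console session:
#   count_full_scans console.log
```

A count of 0 suggests that every captured query was answered from an index.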

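4.1 Global configuration

# Disable automatic creation of vertex labels, edge labels, and property keys
schema.default=none

# Required to enable Metrics in JanusGraph
metrics.enabled = true
# Optional: make queries that cannot use an index fail instead of scanning all vertices
# query.force-index=true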

4.2 storage backend

4.3 index backend

References:

http://docs.janusgraph.org/latest/getting-started.html    Official documentation

https://groups.google.com/forum/#!topic/gremlin-users/yWqS6gzmoYY         Configuring multiple graphs and traversalSources
