Installing the IK Analyzer in Elasticsearch
Elasticsearch's default analyzer handles Chinese poorly: it splits text into individual characters. The IK analyzer handles Chinese much better and offers two modes, "ik_smart" and "ik_max_word".
Testing how Elasticsearch splits Chinese text by default:
curl -H "Content-Type:application/json" -XGET 'http://192.168.20.131:9200/_analyze?pretty' -d '{"text":"在潭州教育学习"}'
# Test result
{
  "tokens" : [
    {
      "token" : "在",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "潭",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "州",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "教",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "育",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "学",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "习",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    }
  ]
}
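The character-by-character split is easier to see if you filter the response down to just the tokens. A minimal sketch with grep and sed, working on a trimmed copy of the response above (the /tmp/analyze.json path is arbitrary):

```shell
# Trimmed copy of the standard analyzer's response shown above.
cat > /tmp/analyze.json <<'EOF'
{
  "tokens" : [
    { "token" : "在", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
    { "token" : "潭", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
    { "token" : "州", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2 },
    { "token" : "教", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3 },
    { "token" : "育", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 4 },
    { "token" : "学", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 5 },
    { "token" : "习", "start_offset" : 6, "end_offset" : 7, "type" : "<IDEOGRAPHIC>", "position" : 6 }
  ]
}
EOF
# Print one token per line: every token is a single character,
# because the default analyzer does no Chinese word segmentation.
grep -o '"token" : "[^"]*"' /tmp/analyze.json | sed 's/"token" : "\(.*\)"/\1/'
```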
Installing the IK analyzer
Method 1: install the IK analyzer online. Note: the CentOS machine must have network access.
On the IK analyzer's GitHub releases page, pick the release that matches your Elasticsearch version; this article uses Elasticsearch 6.1.1.
https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.1.4
Find the download URL for IK 6.1.1 and install it with the elasticsearch-plugin command (restart Elasticsearch afterwards so the plugin is loaded):
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
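The Elasticsearch version appears twice in that URL, so the link for a different version can be derived mechanically. A small sketch (set ES_VERSION to whatever your cluster runs; the plugin version must match the Elasticsearch version exactly):

```shell
# Build the IK plugin download URL for a given Elasticsearch version.
ES_VERSION=6.1.1
IK_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/elasticsearch-analysis-ik-${ES_VERSION}.zip"
echo "$IK_URL"
```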
Method 2: install the IK analyzer offline.
Follow the IK analyzer link above to download the installation package.
Upload the package to the Linux server, then unzip it into the plugins directory:
unzip elasticsearch-analysis-ik-6.1.1.zip -d plugins/analysis-ik
Enter the unpacked analysis-ik directory.
Move everything out of the nested elasticsearch directory, then delete it:
mv elasticsearch/* ./
rm -fr elasticsearch
Start Elasticsearch (as a non-root user, from the elasticsearch-6.1.1 directory):
bin/elasticsearch
Test the IK analyzer's ik_smart mode:
curl -H "Content-Type:application/json" -XGET 'http://192.168.20.131:9200/_analyze?pretty' -d '{"analyzer":"ik_smart","text":"在潭州教育学习"}'
# Test result
{
  "tokens" : [
    {
      "token" : "在",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "潭州",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "教育",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
ik_smart performs the coarsest-grained split; for example, it splits "在潭州教育学习" into "在, 潭州, 教育, 学习".
Test the IK analyzer's ik_max_word mode:
curl -H "Content-Type:application/json" -XGET 'http://192.168.20.131:9200/_analyze?pretty' -d '{"analyzer":"ik_max_word","text":"在潭州教育学习"}'
# Test result
{
  "tokens" : [
    {
      "token" : "在",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "潭州",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "教育学",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "教育",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
ik_max_word performs the finest-grained split and emits every plausible combination; for example, it splits "在潭州教育学习" into "在, 潭州, 教育学, 教育, 学习".
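Note the overlapping offsets in the ik_max_word output: "教育学" covers offsets 3–6 while "教育" covers 3–5, so ik_max_word emits more tokens over the same text than ik_smart (five here versus four). A quick way to confirm the count, using a trimmed copy of the response above (the /tmp path is arbitrary):

```shell
# Trimmed copy of the ik_max_word response shown above.
cat > /tmp/ik_max_word.json <<'EOF'
{
  "tokens" : [
    { "token" : "在",     "start_offset" : 0, "end_offset" : 1 },
    { "token" : "潭州",   "start_offset" : 1, "end_offset" : 3 },
    { "token" : "教育学", "start_offset" : 3, "end_offset" : 6 },
    { "token" : "教育",   "start_offset" : 3, "end_offset" : 5 },
    { "token" : "学习",   "start_offset" : 5, "end_offset" : 7 }
  ]
}
EOF
# Count the emitted tokens: 5 under ik_max_word, versus 4 under ik_smart.
grep -c '"token"' /tmp/ik_max_word.json
```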
With that, the IK analyzer is set up in Elasticsearch!
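To actually apply IK to your data, set it as the analyzer in an index mapping. A minimal sketch against the same server (the index name test_index, mapping type doc, and field content are made up for illustration), following the common IK convention of indexing with ik_max_word and searching with ik_smart:

```shell
curl -H "Content-Type:application/json" -XPUT 'http://192.168.20.131:9200/test_index' -d '
{
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}'
```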