Installing the IK Analyzer in Elasticsearch
Elasticsearch's default analyzer handles Chinese poorly: it splits text into individual characters. The IK analyzer handles Chinese much better and offers two modes, "ik_smart" and "ik_max_word".
Testing how Elasticsearch splits Chinese text by default:
curl -H "Content-Type:application/json" -XGET 'http://192.168.20.131:9200/_analyze?pretty' -d '{"text":"在潭州教育学习"}'
# Test result
{
  "tokens" : [
    {
      "token" : "在",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "<IDEOGRAPHIC>",
      "position" : 0
    },
    {
      "token" : "潭",
      "start_offset" : 1,
      "end_offset" : 2,
      "type" : "<IDEOGRAPHIC>",
      "position" : 1
    },
    {
      "token" : "州",
      "start_offset" : 2,
      "end_offset" : 3,
      "type" : "<IDEOGRAPHIC>",
      "position" : 2
    },
    {
      "token" : "教",
      "start_offset" : 3,
      "end_offset" : 4,
      "type" : "<IDEOGRAPHIC>",
      "position" : 3
    },
    {
      "token" : "育",
      "start_offset" : 4,
      "end_offset" : 5,
      "type" : "<IDEOGRAPHIC>",
      "position" : 4
    },
    {
      "token" : "学",
      "start_offset" : 5,
      "end_offset" : 6,
      "type" : "<IDEOGRAPHIC>",
      "position" : 5
    },
    {
      "token" : "习",
      "start_offset" : 6,
      "end_offset" : 7,
      "type" : "<IDEOGRAPHIC>",
      "position" : 6
    }
  ]
}
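The character-by-character split is easier to see if you filter the response down to just the tokens. A minimal sketch with grep and sed, working on a trimmed copy of the response above (the /tmp/analyze.json path is arbitrary):

```shell
# Trimmed copy of the standard analyzer's response shown above.
cat > /tmp/analyze.json <<'EOF'
{
  "tokens" : [
    { "token" : "在", "start_offset" : 0, "end_offset" : 1, "type" : "<IDEOGRAPHIC>", "position" : 0 },
    { "token" : "潭", "start_offset" : 1, "end_offset" : 2, "type" : "<IDEOGRAPHIC>", "position" : 1 },
    { "token" : "州", "start_offset" : 2, "end_offset" : 3, "type" : "<IDEOGRAPHIC>", "position" : 2 },
    { "token" : "教", "start_offset" : 3, "end_offset" : 4, "type" : "<IDEOGRAPHIC>", "position" : 3 },
    { "token" : "育", "start_offset" : 4, "end_offset" : 5, "type" : "<IDEOGRAPHIC>", "position" : 4 },
    { "token" : "学", "start_offset" : 5, "end_offset" : 6, "type" : "<IDEOGRAPHIC>", "position" : 5 },
    { "token" : "习", "start_offset" : 6, "end_offset" : 7, "type" : "<IDEOGRAPHIC>", "position" : 6 }
  ]
}
EOF
# Print one token per line: every token is a single character,
# because the default analyzer does no Chinese word segmentation.
grep -o '"token" : "[^"]*"' /tmp/analyze.json | sed 's/"token" : "\(.*\)"/\1/'
```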
Installing the IK analyzer
Method 1: install the IK analyzer online. Note: the CentOS machine must have network access.
On the IK analyzer's GitHub releases page, pick the release that matches your Elasticsearch version; this article uses Elasticsearch 6.1.1.
https://github.com/medcl/elasticsearch-analysis-ik/releases?after=v6.1.4
Find the download URL for IK 6.1.1 and install it with the elasticsearch-plugin command (restart Elasticsearch afterwards so the plugin is loaded):
bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v6.1.1/elasticsearch-analysis-ik-6.1.1.zip
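The Elasticsearch version appears twice in that URL, so the link for a different version can be derived mechanically. A small sketch (set ES_VERSION to whatever your cluster runs; the plugin version must match the Elasticsearch version exactly):

```shell
# Build the IK plugin download URL for a given Elasticsearch version.
ES_VERSION=6.1.1
IK_URL="https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v${ES_VERSION}/elasticsearch-analysis-ik-${ES_VERSION}.zip"
echo "$IK_URL"
```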
Method 2: install the IK analyzer offline.
Follow the IK analyzer link above to download the installation package.
Upload the package to the Linux server, then unzip it into the plugins directory:
unzip elasticsearch-analysis-ik-6.1.1.zip -d plugins/analysis-ik
Enter the unpacked analysis-ik directory.
Move everything out of the nested elasticsearch directory, then delete it:
mv elasticsearch/* ./
rm -fr elasticsearch
Start Elasticsearch (as a non-root user, from the elasticsearch-6.1.1 directory):
bin/elasticsearch
Test the IK analyzer's ik_smart mode:
curl -H "Content-Type:application/json" -XGET 'http://192.168.20.131:9200/_analyze?pretty' -d '{"analyzer":"ik_smart","text":"在潭州教育学习"}'
# Test result
{
  "tokens" : [
    {
      "token" : "在",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "潭州",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "教育",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 3
    }
  ]
}
ik_smart performs the coarsest-grained split; for example, it splits "在潭州教育学习" into "在, 潭州, 教育, 学习".
Test the IK analyzer's ik_max_word mode:
curl -H "Content-Type:application/json" -XGET 'http://192.168.20.131:9200/_analyze?pretty' -d '{"analyzer":"ik_max_word","text":"在潭州教育学习"}'
# Test result
{
  "tokens" : [
    {
      "token" : "在",
      "start_offset" : 0,
      "end_offset" : 1,
      "type" : "CN_CHAR",
      "position" : 0
    },
    {
      "token" : "潭州",
      "start_offset" : 1,
      "end_offset" : 3,
      "type" : "CN_WORD",
      "position" : 1
    },
    {
      "token" : "教育学",
      "start_offset" : 3,
      "end_offset" : 6,
      "type" : "CN_WORD",
      "position" : 2
    },
    {
      "token" : "教育",
      "start_offset" : 3,
      "end_offset" : 5,
      "type" : "CN_WORD",
      "position" : 3
    },
    {
      "token" : "学习",
      "start_offset" : 5,
      "end_offset" : 7,
      "type" : "CN_WORD",
      "position" : 4
    }
  ]
}
ik_max_word performs the finest-grained split and emits every plausible combination; for example, it splits "在潭州教育学习" into "在, 潭州, 教育学, 教育, 学习".
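Note the overlapping offsets in the ik_max_word output: "教育学" covers offsets 3–6 while "教育" covers 3–5, so ik_max_word emits more tokens over the same text than ik_smart (five here versus four). A quick way to confirm the count, using a trimmed copy of the response above (the /tmp path is arbitrary):

```shell
# Trimmed copy of the ik_max_word response shown above.
cat > /tmp/ik_max_word.json <<'EOF'
{
  "tokens" : [
    { "token" : "在",     "start_offset" : 0, "end_offset" : 1 },
    { "token" : "潭州",   "start_offset" : 1, "end_offset" : 3 },
    { "token" : "教育学", "start_offset" : 3, "end_offset" : 6 },
    { "token" : "教育",   "start_offset" : 3, "end_offset" : 5 },
    { "token" : "学习",   "start_offset" : 5, "end_offset" : 7 }
  ]
}
EOF
# Count the emitted tokens: 5 under ik_max_word, versus 4 under ik_smart.
grep -c '"token"' /tmp/ik_max_word.json
```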
With that, the IK analyzer is set up in Elasticsearch!
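To actually apply IK to your data, set it as the analyzer in an index mapping. A minimal sketch against the same server (the index name test_index, mapping type doc, and field content are made up for illustration), following the common IK convention of indexing with ik_max_word and searching with ik_smart:

```shell
curl -H "Content-Type:application/json" -XPUT 'http://192.168.20.131:9200/test_index' -d '
{
  "mappings": {
    "doc": {
      "properties": {
        "content": {
          "type": "text",
          "analyzer": "ik_max_word",
          "search_analyzer": "ik_smart"
        }
      }
    }
  }
}'
```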