Installing the IK plugin for Elasticsearch

程序员文章站 2022-05-13 17:44:03


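I wanted to install a Chinese word-segmentation plugin for Elasticsearch, but the material available online is somewhat out of date.

So here is a record of how I installed the IK plugin from source.

(Note: the version I am using is 0.90.2.)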

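1. Download the source

First, download the source code from IK's GitHub repository: https://github.com/medcl/elasticsearch-analysis-ik

After downloading the source, I found that it does not ship with a prebuilt jar, so I built one with mvn package.

The resulting jar is named: elasticsearch-analysis-ik-1.2.2.jar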
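The build step can be scripted. The following is only a sketch: it assumes `mvn` is on the PATH and relies on Maven's usual convention of writing the packaged jar into the project's `target/` directory. The helper names `build_plugin` and `find_built_jar` are made up for illustration.

```python
import subprocess
from pathlib import Path


def build_plugin(project_dir):
    """Run `mvn package` inside the checked-out source tree (assumes mvn is on PATH)."""
    subprocess.run(["mvn", "package"], cwd=str(project_dir), check=True)


def find_built_jar(project_dir):
    """Return the most recently modified IK jar under target/, or None if there is none."""
    target = Path(project_dir) / "target"
    if not target.is_dir():
        return None
    jars = sorted(target.glob("elasticsearch-analysis-ik-*.jar"),
                  key=lambda p: p.stat().st_mtime)
    return jars[-1] if jars else None
```

After a successful `build_plugin(checkout)`, `find_built_jar(checkout)` should point at the jar to copy in the next step.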

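2. Copy the files

This step is simple: copy the jar into the ES_HOME/plugins/analysis-ik directory.

Then copy the contents of the config/ik directory to ES_HOME/config/ik (my local machine runs Windows and ES runs on Linux, so I first packed the folder into a zip archive and then unpacked it on the server).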
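When both the build output and ES_HOME are reachable from the same machine, the two copies can be done in one go. This is a minimal sketch, not an official installer: `install_ik` is a hypothetical helper, and the layout (`plugins/analysis-ik` and `config/ik` under the Elasticsearch home) is the one described in this step.

```python
import shutil
from pathlib import Path


def install_ik(es_home, jar_path, ik_config_dir):
    """Copy the plugin jar and the IK config files into an Elasticsearch home directory."""
    es_home = Path(es_home)
    plugin_dir = es_home / "plugins" / "analysis-ik"
    config_dir = es_home / "config" / "ik"
    plugin_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(str(jar_path), str(plugin_dir))
    # dirs_exist_ok requires Python 3.8+; it merges into an existing config/ik directory
    shutil.copytree(str(ik_config_dir), str(config_dir), dirs_exist_ok=True)
    return plugin_dir, config_dir
```

A call would look like `install_ik("/opt/elasticsearch", "target/elasticsearch-analysis-ik-1.2.2.jar", "config/ik")`, where the first path is only an example location.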

3. Add the configuration

Edit elasticsearch.yml and append the following to the end of the file:

index:
  analysis:
    analyzer:
      ik:
          alias: [ik_analyzer]
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
          type: ik
          use_smart: false
      ik_smart:
          type: ik
          use_smart: true

Then restart Elasticsearch.

4. Test the analyzer plugin

For reasons I have not figured out, the following command does not work as a test:
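If the configuration has to be applied on several nodes, appending the block can be automated. The sketch below assumes the file does not already contain a top-level `index:` key (appending a second one would produce a duplicate YAML key, which would need merging by hand). `append_ik_config` is a hypothetical helper, and the check for `IkAnalyzerProvider` is only a crude way to avoid appending twice.

```python
from pathlib import Path

# The analyzer block from this step, kept verbatim.
IK_CONFIG = """\
index:
  analysis:
    analyzer:
      ik:
          alias: [ik_analyzer]
          type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      ik_max_word:
          type: ik
          use_smart: false
      ik_smart:
          type: ik
          use_smart: true
"""


def append_ik_config(config_file):
    """Append the IK analyzer block unless the file already mentions IkAnalyzerProvider."""
    path = Path(config_file)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    if "IkAnalyzerProvider" in existing:
        return False  # already configured, nothing to do
    separator = "" if existing == "" or existing.endswith("\n") else "\n"
    path.write_text(existing + separator + "\n" + IK_CONFIG, encoding="utf-8")
    return True
```

The function returns `True` when it changed the file, which is a convenient signal that a restart is still needed.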

curl 'http://localhost:9200/_analyze?analyzer=ik&pretty=true' -d'
{
	"text":"去北京怎么走"
}
'

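Judging from the ES log, however, the plugin has been loaded. A likely explanation is that the analyzers above are declared under index: and are therefore index-level settings, so they are only available when the request goes through an index.

Following the IK plugin's instructions, I created an index, and the same query works when it is run against that index.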

curl -XPUT http://localhost:9200/index

curl -XPOST http://localhost:9200/index/fulltext/_mapping -d'
{
    "fulltext": {
             "_all": {
            "indexAnalyzer": "ik",
            "searchAnalyzer": "ik",
            "term_vector": "no",
            "store": "false"
        },
        "properties": {
            "content": {
                "type": "string",
                "store": "no",
                "term_vector": "with_positions_offsets",
                "indexAnalyzer": "ik",
                "searchAnalyzer": "ik",
                "include_in_all": "true",
                "boost": 8
            }
        }
    }
}'

Test command:
curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d'
{
	"text":"去北京怎么走"
}
'
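The only difference between the failing and the working test command is the index name in the path. A small sketch of building both URLs and sending the request from Python follows. `analyze_url` and `analyze` are hypothetical helpers, the host and index name are the ones used in this post, and `analyze` needs a running node, so it is not called here.

```python
import urllib.parse
import urllib.request


def analyze_url(base_url, analyzer, index=None):
    """Build the _analyze URL, optionally scoped to an index."""
    parts = [base_url.rstrip("/")]
    if index:
        parts.append(urllib.parse.quote(index, safe=""))
    parts.append("_analyze")
    query = urllib.parse.urlencode({"analyzer": analyzer, "pretty": "true"})
    return "/".join(parts) + "?" + query


def analyze(base_url, analyzer, body, index=None):
    """POST the given body to _analyze and return the response text (needs a live node)."""
    request = urllib.request.Request(
        analyze_url(base_url, analyzer, index),
        data=body.encode("utf-8"),
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")


print(analyze_url("http://localhost:9200", "ik"))
print(analyze_url("http://localhost:9200", "ik", index="index"))
```

The second URL is the index-scoped form that worked in the test above.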

The tokenization result is as follows:

{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 4,
    "end_offset" : 8,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "去",
    "start_offset" : 11,
    "end_offset" : 12,
    "type" : "CN_CHAR",
    "position" : 2
  }, {
    "token" : "北京",
    "start_offset" : 12,
    "end_offset" : 14,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "怎么走",
    "start_offset" : 14,
    "end_offset" : 17,
    "type" : "CN_WORD",
    "position" : 4
  } ]
}
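To compare results quickly, it helps to reduce a response like this to the bare token strings. The sketch below does that with the standard `json` module; `token_strings` is a hypothetical helper, and the sample is the response above in condensed form. The leading `text` token appears to come from the JSON key in the request body, which this version seems to analyze along with the sentence. Skipping tokens of type `ENGLISH` is only a crude way to hide it, since it would also drop genuine English words.

```python
import json

# Condensed copy of the response shown above.
SAMPLE_RESPONSE = """
{
  "tokens" : [
    {"token": "text", "start_offset": 4, "end_offset": 8, "type": "ENGLISH", "position": 1},
    {"token": "去", "start_offset": 11, "end_offset": 12, "type": "CN_CHAR", "position": 2},
    {"token": "北京", "start_offset": 12, "end_offset": 14, "type": "CN_WORD", "position": 3},
    {"token": "怎么走", "start_offset": 14, "end_offset": 17, "type": "CN_WORD", "position": 4}
  ]
}
"""


def token_strings(response_text, skip_types=()):
    """Return the token strings from an _analyze response, ordered by position."""
    tokens = json.loads(response_text)["tokens"]
    tokens = sorted(tokens, key=lambda t: t["position"])
    return [t["token"] for t in tokens if t["type"] not in skip_types]


print(token_strings(SAMPLE_RESPONSE))                           # ['text', '去', '北京', '怎么走']
print(token_strings(SAMPLE_RESPONSE, skip_types=("ENGLISH",)))  # ['去', '北京', '怎么走']
```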

5. Addendum

When I tested segmentation of "中华人民共和国", I found that it was not split at all, as shown below:

curl 'http://localhost:9200/index/_analyze?analyzer=ik&pretty=true' -d'  
> {  
>     "text":"中华人民共和国"  
> }  
> '
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 12,
    "end_offset" : 16,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "中华人民共和国",
    "start_offset" : 19,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 2
  } ]
}

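But this is not the result we want. Is IK really so poor that it cannot segment this text? After some investigation, I found that IK has a smart mode, and that it is the default. In this mode, if you search for "*", you may not find documents that contain only "*". Switching to the ik_max_word analyzer fixes the problem. I am still exploring the analyzers.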

curl 'http://localhost:9200/index/_analyze?analyzer=ik_max_word&pretty=true' -d'  
> {  
>     "text":"中华人民共和国"  
> }  
> '
{
  "tokens" : [ {
    "token" : "text",
    "start_offset" : 12,
    "end_offset" : 16,
    "type" : "ENGLISH",
    "position" : 1
  }, {
    "token" : "中华人民共和国",
    "start_offset" : 19,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 2
  }, {
    "token" : "中华人民",
    "start_offset" : 19,
    "end_offset" : 23,
    "type" : "CN_WORD",
    "position" : 3
  }, {
    "token" : "中华",
    "start_offset" : 19,
    "end_offset" : 21,
    "type" : "CN_WORD",
    "position" : 4
  }, {
    "token" : "华人",
    "start_offset" : 20,
    "end_offset" : 22,
    "type" : "CN_WORD",
    "position" : 5
  }, {
    "token" : "人民共和国",
    "start_offset" : 21,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 6
  }, {
    "token" : "人民",
    "start_offset" : 21,
    "end_offset" : 23,
    "type" : "CN_WORD",
    "position" : 7
  }, {
    "token" : "共和国",
    "start_offset" : 23,
    "end_offset" : 26,
    "type" : "CN_WORD",
    "position" : 8
  }, {
    "token" : "共和",
    "start_offset" : 23,
    "end_offset" : 25,
    "type" : "CN_WORD",
    "position" : 9
  }, {
    "token" : "国",
    "start_offset" : 25,
    "end_offset" : 26,
    "type" : "CN_CHAR",
    "position" : 10
  } ]
}
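Why the choice of mode matters for search can be illustrated with a deliberately simplified model: a term can only match if exactly that term was produced at index time. This is a toy sketch, not how Elasticsearch actually scores queries. The token lists are a reading of the two outputs above for the same seven-character text (with the stray `text` token left out), and the query term is chosen purely for illustration.

```python
def term_matches(query_terms, indexed_terms):
    """True if the query shares at least one exact term with the indexed terms."""
    return bool(set(query_terms) & set(indexed_terms))


# Tokens produced for the same text by the two modes (transcribed from the outputs).
SMART_TOKENS = ["中华人民共和国"]
MAX_WORD_TOKENS = ["中华人民共和国", "中华人民", "中华", "华人",
                   "人民共和国", "人民", "共和国", "共和", "国"]

query = ["中华"]  # hypothetical search term, for illustration only
print(term_matches(query, SMART_TOKENS))     # False
print(term_matches(query, MAX_WORD_TOKENS))  # True
```

With the single long token of smart mode, the shorter term has nothing to match against. The finer-grained ik_max_word output contains it.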

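Please support the original post:

http://donlianli.iteye.com/blog/1948841

Interested in topics like this? Feel free to email me at donlianli@126.com

About me: from Handan; experienced in Java, JavaScript, Extjs, and Oracle SQL.

For more of my earlier articles, visit my space.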
