
Installing and Using the elasticsearch-analysis-ansj Tokenizer

程序员文章站 2022-07-04 22:13:23

1. Update the pom.xml configuration

<properties>
    <elasticsearch.version>1.7.1</elasticsearch.version>
</properties>

<dependency>
    <groupId>org.ansj</groupId>
    <artifactId>ansj_seg</artifactId>
    <classifier>min</classifier>
    <version>2.0.8</version>
    <scope>compile</scope>
</dependency>

2. Build the plugin

Run the following from the plugin's source directory; the release zip is written to target/releases:

mvn assembly:assembly

3. Install the plugin

elasticsearch-1.7.1\bin>plugin -u file:///C:\Users\Administrator\Desktop\elasticsearch-analysis-ansj\target\releases\elasticsearch-analysis-ansj-1.x.1-release.zip -i ansj
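The command above is for Windows. On Linux or macOS the same ES 1.x plugin script can be used with a file:// URL; the path below is illustrative — adjust it to wherever you built the release zip:

bin/plugin -u file:///home/user/elasticsearch-analysis-ansj/target/releases/elasticsearch-analysis-ansj-1.x.1-release.zip -i ansj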

4. Configure the ansj analyzers

index:
  analysis:
    analyzer:
      index_ansj:
        type: ansj_index
      query_ansj:
        type: ansj_query
      ik:
        alias: [news_analyzer_ik, ik_analyzer]
        type: org.elasticsearch.index.analysis.IkAnalyzerProvider
      mmseg:
        alias: [news_analyzer, mmseg_analyzer]
        type: org.elasticsearch.index.analysis.MMsegAnalyzerProvider

index.analysis.analyzer.default.type: "ansj_index"

For detailed configuration options, see elasticsearch.yml.example in the plugin source.
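With the analyzers registered, they can be wired to individual fields at mapping time. A minimal sketch using the ES 1.x mapping syntax (the index name articles matches this article; the type and field names are illustrative):

curl -XPUT 'http://127.0.0.1:9200/articles' -d '{
  "mappings": {
    "article": {
      "properties": {
        "title": {
          "type": "string",
          "index_analyzer": "ansj_index",
          "search_analyzer": "ansj_query"
        }
      }
    }
  }
}'

Note that index_analyzer/search_analyzer is the ES 1.x syntax; later versions replaced index_analyzer with analyzer.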

5. Testing and usage

  • Index-time tokenization
http://127.0.0.1:9200/articles/_analyze?analyzer=ansj_index&text=我们是中国人

Note: articles is the index name; every other part of the request URL is fixed. analyzer=ansj_index selects the index-time tokenizer, and the text parameter carries the content to tokenize.
Output:

{
  "tokens": [
    {
      "token": "我们",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "是",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "中国",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 5,
      "end_offset": 6,
      "type": "word",
      "position": 4
    }
  ]
}
  • Query-time tokenization
http://127.0.0.1:9200/articles/_analyze?analyzer=ansj_query&text=我们是中国人

Note: articles is the index name; every other part of the request URL is fixed. analyzer=ansj_query selects the query-time tokenizer, and the text parameter carries the content to tokenize.
Output:

{
  "tokens": [
    {
      "token": "我们",
      "start_offset": 0,
      "end_offset": 2,
      "type": "word",
      "position": 1
    },
    {
      "token": "是",
      "start_offset": 2,
      "end_offset": 3,
      "type": "word",
      "position": 2
    },
    {
      "token": "中国",
      "start_offset": 3,
      "end_offset": 5,
      "type": "word",
      "position": 3
    },
    {
      "token": "人",
      "start_offset": 5,
      "end_offset": 6,
      "type": "word",
      "position": 4
    }
  ]
}
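The same _analyze requests can be issued from the command line with curl; using -G with --data-urlencode takes care of URL-encoding the Chinese text:

curl -XGET 'http://127.0.0.1:9200/articles/_analyze' -G \
  --data-urlencode 'analyzer=ansj_index' \
  --data-urlencode 'text=我们是中国人'

Swap analyzer=ansj_index for analyzer=ansj_query to exercise the query-time tokenizer instead.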