Using the Elasticsearch IK Analyzer with Go

程序员文章站 2022-07-04 22:12:53

Related resources
https://github.com/olivere/elastic/wiki
https://github.com/olivere/elastic

IK analyzer

https://github.com/medcl/elasticsearch-analysis-ik

Installation

  • Find the IK version that matches your ES version in the table below
IK version    ES version
master        7.x -> master
6.x           6.x
5.x           5.x
1.10.6        2.4.6
1.9.5         2.3.5
1.8.1         2.2.1
1.7.0         2.1.1
1.5.0         2.0.0
1.2.6         1.0.0
1.2.5         0.90.x
1.1.3         0.20.x
1.0.0         0.16.2 -> 0.19.0
  • Download the release and unzip it into an ik directory under plugins
  • Restart ES
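
After the restart it can be useful to confirm that the plugin was loaded. The following is a minimal sketch, not part of the original article: it parses the JSON returned by GET /_cat/plugins?format=json and looks for a plugin component. The component name analysis-ik and the sample response body are assumptions; check them against your own cluster.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// catPlugin mirrors one row of GET /_cat/plugins?format=json.
type catPlugin struct {
	Name      string `json:"name"`      // node name
	Component string `json:"component"` // plugin name
	Version   string `json:"version"`
}

// hasPlugin reports whether the _cat/plugins JSON response lists the
// given component on at least one node.
func hasPlugin(body []byte, component string) (bool, error) {
	var rows []catPlugin
	if err := json.Unmarshal(body, &rows); err != nil {
		return false, err
	}
	for _, r := range rows {
		if r.Component == component {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	// Hypothetical response body; real values depend on your cluster.
	sample := []byte(`[{"name":"node-1","component":"analysis-ik","version":"7.10.0"}]`)
	ok, err := hasPlugin(sample, "analysis-ik") // "analysis-ik" is an assumed plugin name
	fmt.Println(ok, err)
}
```

In a real program the body would come from an HTTP GET against the cluster rather than from a literal.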

Examples

  • 1. Default analyzer
GET /cms_index/_analyze
{
  "text": "我是中国人"
}
  • 2. IK analyzer (ik_max_word)
GET /cms_index/_analyze
{
  "text": "我们是软件工程师",
  "tokenizer":"ik_max_word"
}
  • 3. IK analyzer (ik_smart)
GET /cms_index/_analyze
{
  "text":"我们是软件工程师",
  "tokenizer":"ik_smart"
}
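
From Go, the three _analyze calls above differ only in the tokenizer field of the request body. The sketch below uses only the standard library (it is not the olivere/elastic API) to build that body and to pull the token strings out of a response. The type and function names are hypothetical, and the sample response shows the shape only; its tokens are placeholders, not real IK output.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// analyzeRequest is the body sent to /<index>/_analyze.
// An empty Tokenizer is omitted, so the index default applies.
type analyzeRequest struct {
	Tokenizer string `json:"tokenizer,omitempty"`
	Text      string `json:"text"`
}

// analyzeResponse keeps only the token text from the _analyze response.
type analyzeResponse struct {
	Tokens []struct {
		Token string `json:"token"`
	} `json:"tokens"`
}

// analyzeBody serializes the request body for the given tokenizer and text.
func analyzeBody(tokenizer, text string) (string, error) {
	b, err := json.Marshal(analyzeRequest{Tokenizer: tokenizer, Text: text})
	return string(b), err
}

// tokensOf extracts the token strings from an _analyze response body.
func tokensOf(resp []byte) ([]string, error) {
	var r analyzeResponse
	if err := json.Unmarshal(resp, &r); err != nil {
		return nil, err
	}
	out := make([]string, 0, len(r.Tokens))
	for _, t := range r.Tokens {
		out = append(out, t.Token)
	}
	return out, nil
}

func main() {
	for _, tok := range []string{"", "ik_max_word", "ik_smart"} {
		body, err := analyzeBody(tok, "我们是软件工程师")
		if err != nil {
			panic(err)
		}
		fmt.Println(body)
	}
	// Placeholder response: shape only, not real analyzer output.
	tokens, err := tokensOf([]byte(`{"tokens":[{"token":"t1"},{"token":"t2"}]}`))
	fmt.Println(tokens, err)
}
```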
  • 4. Search with a match query
GET cms_index/_search
{
	"query":{
		"match":{"title":"测试"}
	}
}
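
The same search can be issued from Go. The olivere/elastic client linked above provides query builders for this; the sketch below deliberately uses only the standard library so that it runs without dependencies. It builds the HTTP request without sending it. The base URL http://localhost:9200 and the helper names are assumptions.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// m is shorthand for a JSON object.
type m map[string]interface{}

// matchQuery builds {"query":{"match":{field:text}}}.
func matchQuery(field, text string) m {
	return m{"query": m{"match": m{field: text}}}
}

// newSearchRequest builds a POST <baseURL>/<index>/_search request
// carrying the JSON-encoded body. It does not send the request.
func newSearchRequest(baseURL, index string, body interface{}) (*http.Request, error) {
	payload, err := json.Marshal(body)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, baseURL+"/"+index+"/_search", bytes.NewReader(payload))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	// Assumed local cluster address.
	req, err := newSearchRequest("http://localhost:9200", "cms_index", matchQuery("title", "测试"))
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL.String())
	// To execute it: resp, err := http.DefaultClient.Do(req)
}
```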
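  • What is the difference between ik_max_word and ik_smart?
ik_max_word: splits the text at the finest granularity and exhausts every possible combination. For example, “中华人民共和国国歌” is split into “中华人民共和国, 中华人民, 中华, 华人, 人民共和国, 人民, 人, 民, 共和国, 共和, 和, 国国, 国歌”.

ik_smart: splits the text at the coarsest granularity. For example, “中华人民共和国国歌” is split into “中华人民共和国, 国歌”.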
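
One common way to use the two modes together, which the article itself does not show, is to index with ik_max_word and analyze search input with ik_smart. The sketch below builds such an index-creation body in Go. It assumes ES 7.x typeless mappings, and the helper names are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// ikTextMapping returns an index-creation body that maps one text field
// with ik_max_word at index time and ik_smart at search time.
// Assumption: ES 7.x typeless mappings and an installed IK plugin.
func ikTextMapping(field string) map[string]interface{} {
	return map[string]interface{}{
		"mappings": map[string]interface{}{
			"properties": map[string]interface{}{
				field: map[string]interface{}{
					"type":            "text",
					"analyzer":        "ik_max_word",
					"search_analyzer": "ik_smart",
				},
			},
		},
	}
}

// toJSON marshals v and panics on failure (acceptable in a sketch).
func toJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// Body for PUT /cms_index when creating the index.
	fmt.Println(toJSON(ikTextMapping("title")))
}
```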

Comma analyzer

  • Example
GET cms_index/_search
{
	"query":{
		"match":{"tags":"一,二"}
	}
}
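
The article does not show how the comma analyzer is defined. One possible definition, given here purely as an assumption, is a custom analyzer built on Elasticsearch's pattern tokenizer with "," as the pattern. The Go sketch below builds such an index-creation body; the names comma and comma_tokenizer are hypothetical.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// m is shorthand for a JSON object.
type m map[string]interface{}

// commaIndexBody returns an index-creation body that defines a custom
// analyzer splitting on "," and applies it to the given field.
// Assumption: this is one possible setup, not the article's own.
func commaIndexBody(field string) m {
	return m{
		"settings": m{
			"analysis": m{
				"tokenizer": m{
					// pattern tokenizer: splits wherever the regex matches
					"comma_tokenizer": m{"type": "pattern", "pattern": ","},
				},
				"analyzer": m{
					"comma": m{"type": "custom", "tokenizer": "comma_tokenizer"},
				},
			},
		},
		"mappings": m{
			"properties": m{
				field: m{"type": "text", "analyzer": "comma"},
			},
		},
	}
}

func main() {
	b, err := json.MarshalIndent(commaIndexBody("tags"), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b))
}
```

Under this assumed setup the query string "一,二" would also be split on the comma at search time, so the match query above would find documents tagged with either value.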

  • Multi-condition query
GET cms_index/_search
{
    "query": {
        "bool": {
            "should": [
                {
                    "match": {
                        "title": {
                            "query": "测试"
                        }
                    }
                },
                {
                    "match": {
                        "tags": {
                            "query": "一"
                        }
                    }
                }
            ]
        }
    },
    "from": 0,
    "size": 10
}
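  • A larger bool query (the value of "query" in a search request): status must match 1, and at least one of the four should clauses must match
{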
    "bool": {
        "must": [
            {
                "bool": {
                    "should": [
                        {
                            "wildcard": {
                                "nickName": {
                                    "wildcard": "*测试*",
                                    "boost": 1
                                }
                            }
                        },
                        {
                            "match": {
                                "research": {
                                    "query": "测试",
                                    "operator": "OR",
                                    "analyzer": "ik_max_word",
                                    "prefix_length": 0,
                                    "max_expansions": 50,
                                    "fuzzy_transpositions": true,
                                    "lenient": false,
                                    "zero_terms_query": "NONE",
                                    "auto_generate_synonyms_phrase_query": true,
                                    "boost": 1
                                }
                            }
                        },
                        {
                            "match": {
                                "content": {
                                    "query": "测试",
                                    "operator": "OR",
                                    "analyzer": "ik_max_word",
                                    "prefix_length": 0,
                                    "max_expansions": 50,
                                    "fuzzy_transpositions": true,
                                    "lenient": false,
                                    "zero_terms_query": "NONE",
                                    "auto_generate_synonyms_phrase_query": true,
                                    "boost": 1
                                }
                            }
                        },
                        {
                            "match": {
                                "doctorStyle": {
                                    "query": "测试",
                                    "operator": "OR",
                                    "analyzer": "ik_max_word",
                                    "prefix_length": 0,
                                    "max_expansions": 50,
                                    "fuzzy_transpositions": true,
                                    "lenient": false,
                                    "zero_terms_query": "NONE",
                                    "auto_generate_synonyms_phrase_query": true,
                                    "boost": 1
                                }
                            }
                        }
                    ],
                    "adjust_pure_negative": true,
                    "boost": 1
                }
            },
            {
                "match": {
                    "status": {
                        "query": 1,
                        "operator": "OR",
                        "prefix_length": 0,
                        "max_expansions": 50,
                        "fuzzy_transpositions": true,
                        "lenient": false,
                        "zero_terms_query": "NONE",
                        "auto_generate_synonyms_phrase_query": true,
                        "boost": 1
                    }
                }
            }
        ],
        "adjust_pure_negative": true,
        "boost": 1
    }
}
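
Nested bool queries like the ones above are easier to assemble from small helpers than to write by hand. The following is a standard-library sketch of that idea, not the olivere/elastic builder API; the helper names match, boolQuery and searchBody are hypothetical, and only the query, must and should keys from the examples are covered.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// m is shorthand for a JSON object.
type m map[string]interface{}

// match builds {"match":{field:{"query":value}}}.
func match(field string, value interface{}) m {
	return m{"match": m{field: m{"query": value}}}
}

// boolQuery builds a bool query from must and should clauses,
// omitting whichever list is empty.
func boolQuery(must, should []m) m {
	b := m{}
	if len(must) > 0 {
		b["must"] = must
	}
	if len(should) > 0 {
		b["should"] = should
	}
	return m{"bool": b}
}

// searchBody wraps a query with paging parameters.
func searchBody(query m, from, size int) m {
	return m{"query": query, "from": from, "size": size}
}

// toJSON marshals v and panics on failure (acceptable in a sketch).
func toJSON(v interface{}) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err)
	}
	return string(b)
}

func main() {
	// First example: title OR tags, first page of 10.
	simple := boolQuery(nil, []m{match("title", "测试"), match("tags", "一")})
	fmt.Println(toJSON(searchBody(simple, 0, 10)))

	// Second example, reduced: (research OR content) AND status = 1.
	inner := boolQuery(nil, []m{match("research", "测试"), match("content", "测试")})
	fmt.Println(toJSON(boolQuery([]m{inner, match("status", 1)}, nil)))
}
```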

Common query types

  • term
    • term is an exact-match query: the search term is not analyzed (split into tokens) before the lookup.
GET cms_index/_search
{
  "query" : {
    "term": {
      "id": "22"
    }
  }
}
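  • terms
    • terms works like term but accepts a list of values; a document matches if the field equals any of them.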
GET cms_index/_search
{
  "query" : {
    "terms": {
      "id": ["22","23"]
    }
  }
}
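  • match
    • match first analyzes the search text into tokens, then matches documents against those tokens.
  • match_phrase
    • Known as phrase search: all tokens must appear in the document, adjacent to each other and in the same order.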
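
The differences between these four query types can be illustrated with a small in-memory model. This is a toy sketch, not how Elasticsearch is implemented: the analyze function is a stand-in that lowercases and splits on whitespace (a real index would use ik_max_word or ik_smart), and all function names are hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// analyze is a stand-in analyzer: lowercase, then split on whitespace.
func analyze(text string) []string {
	return strings.Fields(strings.ToLower(text))
}

// termMatch compares the value as-is against the indexed tokens.
func termMatch(docTokens []string, value string) bool {
	for _, t := range docTokens {
		if t == value {
			return true
		}
	}
	return false
}

// termsMatch is true if any of the values matches exactly.
func termsMatch(docTokens []string, values ...string) bool {
	for _, v := range values {
		if termMatch(docTokens, v) {
			return true
		}
	}
	return false
}

// matchQuery analyzes the query first, then matches if any resulting
// token is present (the default OR operator).
func matchQuery(docTokens []string, query string) bool {
	return termsMatch(docTokens, analyze(query)...)
}

// matchPhrase requires the analyzed query tokens to appear
// consecutively and in the same order.
func matchPhrase(docTokens []string, query string) bool {
	q := analyze(query)
	if len(q) == 0 {
		return false
	}
	for i := 0; i+len(q) <= len(docTokens); i++ {
		ok := true
		for j := range q {
			if docTokens[i+j] != q[j] {
				ok = false
				break
			}
		}
		if ok {
			return true
		}
	}
	return false
}

func main() {
	doc := analyze("Quick Brown Fox") // indexed tokens: quick, brown, fox
	fmt.Println(termMatch(doc, "Quick"))       // false: term input is not analyzed
	fmt.Println(termsMatch(doc, "cat", "fox")) // true: one value matches
	fmt.Println(matchQuery(doc, "Quick dog"))  // true: "quick" is present
	fmt.Println(matchPhrase(doc, "brown fox")) // true: adjacent, same order
	fmt.Println(matchPhrase(doc, "fox brown")) // false: wrong order
}
```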