Configuring the IK Chinese Analyzer and the Pinyin Analyzer in Elasticsearch 7.5
2022-07-09 18:50:00
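1. Install the plugins
1.1 Install the plugins
Pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin
Chinese analyzer: https://github.com/medcl/elasticsearch-analysis-ik
Pick the release of each plugin that matches your Elasticsearch version. The versions used here are: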
- Elasticsearch 7.5.1
- elasticsearch-analysis-ik 7.5.1
- elasticsearch-analysis-pinyin 7.5.1
From the Elasticsearch installation directory, install the two plugins online, one after the other:
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.5.1/elasticsearch-analysis-ik-7.5.1.zip
./bin/elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-pinyin/releases/download/v7.5.1/elasticsearch-analysis-pinyin-7.5.1.zip
After installation, restart Elasticsearch, then test that the analyzers work. A working analyzer returns a list of tokens.
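Before testing the analyzers, it can help to confirm that both plugins were actually loaded after the restart. The sketch below is one way to do that with the `_cat/plugins` API. It assumes the plugins register under the names `analysis-ik` and `analysis-pinyin`; the helper names `missing_plugins` and `fetch_plugin_rows` are made up for this example, and the `data:9200` host is the one used throughout this article.

```python
import json
import urllib.request

# Assumed plugin names and the versions used in this article.
REQUIRED = {"analysis-ik": "7.5.1", "analysis-pinyin": "7.5.1"}


def missing_plugins(rows, required=REQUIRED):
    """Return the required plugins that are absent or have the wrong version.

    `rows` is the parsed JSON from GET /_cat/plugins?format=json: a list of
    objects with "name" (node), "component" (plugin) and "version" keys.
    On a multi-node cluster this only checks that some node has the plugin.
    """
    installed = {(row["component"], row["version"]) for row in rows}
    return sorted(
        name for name, version in required.items()
        if (name, version) not in installed
    )


def fetch_plugin_rows(base_url="http://data:9200"):
    """Fetch the plugin list from a running node (needs a live cluster)."""
    with urllib.request.urlopen(base_url + "/_cat/plugins?format=json") as resp:
        return json.load(resp)


# Offline example with a hand-written response for a single node.
sample = [
    {"name": "node-1", "component": "analysis-ik", "version": "7.5.1"},
]
print(missing_plugins(sample))  # ['analysis-pinyin']
```

Against a live node, `missing_plugins(fetch_plugin_rows())` should return an empty list when both plugins are installed at 7.5.1.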
1.2 Test the Chinese analyzer
POST http://data:9200/_analyze
{
  "analyzer": "ik_smart",
  "text": "新型冠状病毒"
}
Result:
{
  "tokens": [
    {
      "token": "新型",
      "start_offset": 0,
      "end_offset": 2,
      "type": "CN_WORD",
      "position": 0
    },
    {
      "token": "冠状病毒",
      "start_offset": 2,
      "end_offset": 6,
      "type": "CN_WORD",
      "position": 1
    }
  ]
}
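If you would rather run this check from a script than from a REST client, the same request can be sketched in Python with only the standard library. The function names here are my own invention, and the base URL is the `data:9200` host used in this article; `analyze` needs a running node, while the last lines work offline on the response shown above.

```python
import json
import urllib.request


def analyze_request(analyzer, text, base_url="http://data:9200"):
    """Build (without sending) a POST /_analyze request."""
    body = json.dumps({"analyzer": analyzer, "text": text}).encode("utf-8")
    return urllib.request.Request(
        base_url + "/_analyze",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def token_texts(response):
    """Extract just the token strings from a parsed _analyze response."""
    return [tok["token"] for tok in response["tokens"]]


def analyze(analyzer, text, base_url="http://data:9200"):
    """Send the request to a running node and return the token strings."""
    request = analyze_request(analyzer, text, base_url)
    with urllib.request.urlopen(request) as resp:
        return token_texts(json.load(resp))


# The ik_smart response from this section, reduced to the fields used here.
ik_smart_response = {
    "tokens": [
        {"token": "新型", "position": 0},
        {"token": "冠状病毒", "position": 1},
    ]
}
print(token_texts(ik_smart_response))  # ['新型', '冠状病毒']
```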
1.3 Test the pinyin analyzer
POST http://data:9200/_analyze
{
  "analyzer": "pinyin",
  "text": "新型冠状病毒"
}
Result:
{
  "tokens": [
    {
      "token": "xin",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "xxgzbd",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 0
    },
    {
      "token": "xing",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 1
    },
    {
      "token": "guan",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 2
    },
    {
      "token": "zhuang",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 3
    },
    {
      "token": "bing",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 4
    },
    {
      "token": "du",
      "start_offset": 0,
      "end_offset": 0,
      "type": "word",
      "position": 5
    }
  ]
}
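One detail in this response is easy to miss: position 0 holds two tokens, the syllable `xin` and `xxgzbd`, which looks like the first-letter abbreviation of all six characters (the pinyin plugin's `keep_first_letter` option controls this). A small sketch that groups the tokens by position makes this visible. The function name is hypothetical, and the input is the response above reduced to the two fields that matter.

```python
from collections import defaultdict


def tokens_by_position(response):
    """Map each position to the list of tokens emitted at that position."""
    grouped = defaultdict(list)
    for tok in response["tokens"]:
        grouped[tok["position"]].append(tok["token"])
    return dict(sorted(grouped.items()))


# Tokens and positions copied from the pinyin response in this section.
pinyin_response = {
    "tokens": [
        {"token": "xin", "position": 0},
        {"token": "xxgzbd", "position": 0},
        {"token": "xing", "position": 1},
        {"token": "guan", "position": 2},
        {"token": "zhuang", "position": 3},
        {"token": "bing", "position": 4},
        {"token": "du", "position": 5},
    ]
}

for position, tokens in tokens_by_position(pinyin_response).items():
    print(position, tokens)  # first line printed: 0 ['xin', 'xxgzbd']
```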
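2. Modify the analyzer
The steps below change the analyzer configuration; every operation is performed on the song index.
2.1 Close the index
First close the index. Analysis settings are static index settings, so updating them on an open index returns an error.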
POST http://data:9200/song/_close
{
}
2.2 Configure IK + pinyin analysis
Then define a custom analyzer. Here I combine the ik_smart tokenizer with a pinyin token filter:
PUT http://data:9200/song/_settings
{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_smart",
          "filter": "pinyin_filter"
        }
      },
      "filter": {
        "pinyin_filter": {
          "type": "pinyin",
          "keep_first_letter": false
        }
      }
    }
  }
}
You can also use ik_max_word + pinyin. Both variants define the same analyzer name, so choose one of them:
PUT http://data:9200/song/_settings
{
  "index": {
    "analysis": {
      "analyzer": {
        "ik_pinyin_analyzer": {
          "type": "custom",
          "tokenizer": "ik_max_word",
          "filter": "pinyin_filter"
        }
      },
      "filter": {
        "pinyin_filter": {
          "type": "pinyin",
          "keep_first_letter": false
        }
      }
    }
  }
}
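The two settings bodies differ only in the tokenizer, so if you generate them from code, a single builder is enough. This is a minimal sketch: the helper name `ik_pinyin_settings` is hypothetical, and the filter chain is written as a one-element list, which is the form the Elasticsearch documentation uses for custom analyzers.

```python
import json


def ik_pinyin_settings(tokenizer="ik_smart"):
    """Build the settings body for ik_pinyin_analyzer with the given IK tokenizer."""
    if tokenizer not in ("ik_smart", "ik_max_word"):
        raise ValueError("tokenizer must be 'ik_smart' or 'ik_max_word'")
    return {
        "index": {
            "analysis": {
                "analyzer": {
                    "ik_pinyin_analyzer": {
                        "type": "custom",
                        "tokenizer": tokenizer,
                        # Filter chain with a single entry.
                        "filter": ["pinyin_filter"],
                    }
                },
                "filter": {
                    "pinyin_filter": {
                        "type": "pinyin",
                        # Same option value as in this article.
                        "keep_first_letter": False,
                    }
                },
            }
        }
    }


# Print the ik_max_word variant as JSON, ready to use as a request body.
print(json.dumps(ik_pinyin_settings("ik_max_word"), indent=2))
```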
2.3 Open the index
POST http://data:9200/song/_open
{
}
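Put together, the whole of section 2 is three requests in a fixed order: close, update settings, open. The sketch below lists them and adds a fourth request that is not part of the steps above, an `_analyze` call against the song index to check that ik_pinyin_analyzer is usable afterwards. The function names are invented for this example, the host is the `data:9200` used in this article, and `run` needs a live cluster. Note that `urlopen` raises on an HTTP error, so if the settings update fails the index stays closed until you open it yourself.

```python
import json
import urllib.request

# Settings body from section 2.2 (ik_smart variant).
SETTINGS = {
    "index": {
        "analysis": {
            "analyzer": {
                "ik_pinyin_analyzer": {
                    "type": "custom",
                    "tokenizer": "ik_smart",
                    "filter": ["pinyin_filter"],
                }
            },
            "filter": {
                "pinyin_filter": {"type": "pinyin", "keep_first_letter": False}
            },
        }
    }
}


def update_steps(index, settings, sample_text):
    """Return the ordered (method, path, body) requests for the procedure."""
    return [
        ("POST", f"/{index}/_close", None),
        ("PUT", f"/{index}/_settings", settings),
        ("POST", f"/{index}/_open", None),
        # Extra verification step, not in the original article.
        ("POST", f"/{index}/_analyze",
         {"analyzer": "ik_pinyin_analyzer", "text": sample_text}),
    ]


def run(steps, base_url="http://data:9200"):
    """Send each request in order; stops at the first HTTP error."""
    for method, path, body in steps:
        data = json.dumps(body).encode("utf-8") if body is not None else None
        request = urllib.request.Request(
            base_url + path,
            data=data,
            headers={"Content-Type": "application/json"},
            method=method,
        )
        with urllib.request.urlopen(request) as resp:
            print(method, path, resp.status)


# Offline: just show the order of the requests.
for method, path, _ in update_steps("song", SETTINGS, "新型冠状病毒"):
    print(method, path)
```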