[Elasticsearch] Highlighted Search
Overview
What is highlighting
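Highlighting emphasizes the parts of one or more fields in the search results that match the query, for example by rendering them in bold or in a color different from the surrounding text.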
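To highlight a field, Elasticsearch needs the field's actual content. If the field is stored (its mapping sets store to true), the stored value is used; otherwise Elasticsearch loads the _source and extracts the relevant field from it.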
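The fallback from stored fields to _source can be pictured with the small Python model below. It is a simplified sketch, not Elasticsearch code: `stored_fields` and `source` are plain dicts standing in for a document's stored fields and its `_source`, and the function name `text_for_highlighting` is hypothetical.

```python
from typing import Any, Mapping, Optional


def text_for_highlighting(
    field: str,
    stored_fields: Mapping[str, Any],
    source: Mapping[str, Any],
) -> Optional[Any]:
    """Return the text a highlighter would work on (simplified model).

    Hypothetical helper: prefer the stored value (mapping has "store": true);
    otherwise fall back to the value extracted from _source.
    """
    if field in stored_fields:
        return stored_fields[field]
    return source.get(field)


if __name__ == "__main__":
    source = {"address": "288 Mill Street"}
    # The field is not stored here, so the value comes from _source.
    print(text_for_highlighting("address", {}, source))
```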
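Three highlighter types
Elasticsearch provides three highlighters: Lucene's plain highlighter, the fast vector highlighter (fvh), and the postings highlighter. (This describes older releases: Elasticsearch 6.0 removed the postings highlighter and made the unified highlighter the default.)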
Plain Highlighter
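The plain highlighter is the default and is implemented with Lucene's standard Highlighter. It tries to reflect the query's matching logic accurately.
That accuracy is costly when many fields are highlighted with complex queries. To reflect the query logic precisely, it builds a tiny in-memory index and re-runs the original query through Lucene's query execution planner to obtain low-level match information for the current document. This is repeated for every field and every document that needs highlighting, so it can become a performance bottleneck; in that case, consider switching to a different highlighter type.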
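The highlighter can also be chosen explicitly per field with the `type` option of the highlight request. The sketch below builds such a request body as a Python dict; the helper name `highlight_request` is hypothetical, the field `address` is taken from the example later in this article, and the accepted `type` values (for example `plain`, `fvh`, `postings` or `unified`) depend on the Elasticsearch version.

```python
import json


def highlight_request(query: dict, field: str, highlighter: str) -> dict:
    """Build a _search body that selects one highlighter type for one field.

    Hypothetical helper; "type" is the per-field highlight option that
    picks the highlighter implementation.
    """
    return {
        "query": query,
        "highlight": {
            "fields": {
                field: {"type": highlighter},
            }
        },
    }


body = highlight_request({"match": {"address": "mill"}}, "address", "plain")
print(json.dumps(body, indent=2))
```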
Fast Vector Highlighter
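If a field's mapping sets term_vector to with_positions_offsets, the fast vector highlighter replaces the plain highlighter as the default for that field.
Its main characteristics:
- Faster than the plain highlighter, especially for large fields (> 1MB)
- Requires term vectors with positions and offsets, which increases the size of the index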
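Postings Highlighter
If a field's mapping sets index_options to offsets, the postings highlighter replaces the plain highlighter for that field.
Its main characteristics:
- Faster, because it does not need to re-analyze the text being highlighted; the larger the documents, the larger the gain
- Needs less disk space than the term vectors required by the fast vector highlighter
- Splits the text into sentences and highlights those, which works well for natural-language text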
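The mapping options named above can be collected in one place. The sketch below assumes the `text` field type (Elasticsearch 5.0 and later) and produces only the `properties` fragment, which sits inside the index mapping (under a mapping type name on older versions); the table and the helper `text_field_mapping` are hypothetical.

```python
import json
from typing import Dict

# Hypothetical table: extra mapping options each highlighter relies on.
HIGHLIGHTER_FIELD_OPTIONS: Dict[str, Dict[str, str]] = {
    "plain": {},  # works on any text field, no extra index structures
    "fvh": {"term_vector": "with_positions_offsets"},
    "postings": {"index_options": "offsets"},
}


def text_field_mapping(highlighter: str) -> Dict[str, str]:
    """Return a text-field mapping prepared for the given highlighter."""
    if highlighter not in HIGHLIGHTER_FIELD_OPTIONS:
        raise ValueError(f"unknown highlighter: {highlighter!r}")
    return {"type": "text", **HIGHLIGHTER_FIELD_OPTIONS[highlighter]}


# "properties" fragment for the address field used in this article's example.
properties = {"properties": {"address": text_field_mapping("fvh")}}
print(json.dumps(properties, indent=2))
```

These options are set when the field is created; changing them on an existing field generally requires reindexing the data.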
Example
Find the records whose address contains mill or Court, and highlight the matches.
The query:
GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  },
  "highlight": {
    "fields": {
      "address": {}
    }
  }
}
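The same request can be sent from a program. The sketch below uses only the Python standard library; it assumes an unsecured cluster reachable over plain HTTP (no authentication) with the `bank` index loaded, and the helper name `search` is hypothetical.

```python
import json
import urllib.request

# Same request body as the console query above.
QUERY_BODY = {
    "query": {
        "bool": {
            "should": [
                {"match": {"address": "mill"}},
                {"match": {"address": "Court"}},
            ]
        }
    },
    "highlight": {"fields": {"address": {}}},
}


def search(base_url: str, index: str, body: dict) -> dict:
    """POST body to <base_url>/<index>/_search and return the parsed JSON.

    Hypothetical helper; assumes no authentication and plain HTTP.
    """
    request = urllib.request.Request(
        f"{base_url}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With a local cluster running (the address `http://localhost:9200` is an assumption), `search("http://localhost:9200", "bank", QUERY_BODY)["hits"]["hits"]` would hold hits with a `highlight` object like those in the response that follows.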
The matching entries in the response's hits.hits array:
{
  "_index" : "bank",
  "_type" : "account",
  "_id" : "472",
  "_score" : 5.4032025,
  "_source" : {
    "account_number" : 472,
    "balance" : 25571,
    "firstname" : "Lee",
    "lastname" : "Long",
    "age" : 32,
    "gender" : "F",
    "address" : "288 Mill Street",
    "employer" : "Comverges",
    "email" : "leelong@comverges.com",
    "city" : "Movico",
    "state" : "MT"
  },
  "highlight" : {
    "address" : [
      "288 <em>Mill</em> Street"
    ]
  }
},
{
  "_index" : "bank",
  "_type" : "account",
  "_id" : "18",
  "_score" : 2.1248586,
  "_source" : {
    "account_number" : 18,
    "balance" : 4180,
    "firstname" : "Dale",
    "lastname" : "Adams",
    "age" : 33,
    "gender" : "M",
    "address" : "467 Hutchinson Court",
    "employer" : "Boink",
    "email" : "daleadams@boink.com",
    "city" : "Orick",
    "state" : "MD"
  },
  "highlight" : {
    "address" : [
      "467 Hutchinson <em>Court</em>"
    ]
  }
}
By default, Elasticsearch wraps each matched term in <em></em> tags.
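When rendering results, an application typically reads the fragments from each hit's `highlight` object and falls back to `_source` when a hit has no highlight for that field (a hit may match through other query clauses or fields). The sketch below does this over a trimmed copy of the two hits above, wrapped in the usual `hits.hits` envelope; the helper name `display_values` and the `" ... "` separator between fragments are arbitrary choices for illustration.

```python
from typing import Any, Dict, List

# Trimmed copy of the two hits above, inside the hits.hits envelope.
SAMPLE_RESPONSE: Dict[str, Any] = {
    "hits": {
        "hits": [
            {
                "_id": "472",
                "_source": {"address": "288 Mill Street"},
                "highlight": {"address": ["288 <em>Mill</em> Street"]},
            },
            {
                "_id": "18",
                "_source": {"address": "467 Hutchinson Court"},
                "highlight": {"address": ["467 Hutchinson <em>Court</em>"]},
            },
        ]
    }
}


def display_values(response: Dict[str, Any], field: str) -> List[str]:
    """One display string per hit: highlighted fragments if present, else _source."""
    values = []
    for hit in response["hits"]["hits"]:
        fragments = hit.get("highlight", {}).get(field)
        if fragments:
            # A field can yield several fragments; join them for display.
            values.append(" ... ".join(fragments))
        else:
            values.append(str(hit["_source"].get(field, "")))
    return values


print(display_values(SAMPLE_RESPONSE, "address"))
```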
Custom highlight tags
The syntax, placed inside the highlight object, is:
"pre_tags": ["<tag1>"],
"post_tags": ["</tag1>"],
The query:
GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  },
  "highlight": {
    "pre_tags": ["<a>"],
    "post_tags": ["</a>"],
    "fields": {
      "address": {}
    }
  }
}
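The highlight part of this query can be built programmatically. The helper `highlight_clause` below is hypothetical; requiring `pre_tags` and `post_tags` to be passed together is a choice of this sketch to keep opening and closing tags paired, not a documented Elasticsearch rule.

```python
import json
from typing import Dict, List, Optional


def highlight_clause(
    fields: List[str],
    pre_tags: Optional[List[str]] = None,
    post_tags: Optional[List[str]] = None,
) -> Dict:
    """Build the "highlight" object of a _search body (hypothetical helper).

    When no tags are given, Elasticsearch falls back to its default <em></em>.
    """
    # Sketch-specific rule: keep opening and closing tags paired.
    if (pre_tags is None) != (post_tags is None):
        raise ValueError("pass pre_tags and post_tags together")
    clause: Dict = {"fields": {name: {} for name in fields}}
    if pre_tags is not None:
        clause["pre_tags"] = pre_tags
        clause["post_tags"] = post_tags
    return clause


print(json.dumps(highlight_clause(["address"], ["<a>"], ["</a>"]), indent=2))
```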
The matching entries in the response's hits.hits array:
{
  "_index" : "bank",
  "_type" : "account",
  "_id" : "472",
  "_score" : 5.4032025,
  "_source" : {
    "account_number" : 472,
    "balance" : 25571,
    "firstname" : "Lee",
    "lastname" : "Long",
    "age" : 32,
    "gender" : "F",
    "address" : "288 Mill Street",
    "employer" : "Comverges",
    "email" : "leelong@comverges.com",
    "city" : "Movico",
    "state" : "MT"
  },
  "highlight" : {
    "address" : [
      "288 <a>Mill</a> Street"
    ]
  }
},
{
  "_index" : "bank",
  "_type" : "account",
  "_id" : "18",
  "_score" : 2.1248586,
  "_source" : {
    "account_number" : 18,
    "balance" : 4180,
    "firstname" : "Dale",
    "lastname" : "Adams",
    "age" : 33,
    "gender" : "M",
    "address" : "467 Hutchinson Court",
    "employer" : "Boink",
    "email" : "daleadams@boink.com",
    "city" : "Orick",
    "state" : "MD"
  },
  "highlight" : {
    "address" : [
      "467 Hutchinson <a>Court</a>"
    ]
  }
}
The default <em></em> tags have been replaced by the custom <a></a> tags.
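To make the effect of pre_tags and post_tags concrete, the toy function below wraps whole-word, case-insensitive matches of the query terms in a given tag pair and reproduces the two fragments above. It only illustrates the output format and is not how Elasticsearch highlights: the real highlighters work on analyzed tokens and their offsets and also split long text into fragments. The function name `wrap_terms` is hypothetical.

```python
import re
from typing import Iterable


def wrap_terms(text: str, terms: Iterable[str], pre_tag: str = "<em>", post_tag: str = "</em>") -> str:
    """Wrap each whole-word, case-insensitive match of any term in pre/post tags.

    Toy model only: Elasticsearch highlights analyzed tokens by their offsets.
    """
    # Longest terms first so a longer term wins over a shorter prefix of it.
    words = sorted({t for t in terms if t}, key=len, reverse=True)
    if not words:
        return text
    pattern = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b", re.IGNORECASE)
    # Keep the original casing of the matched text (e.g. "Mill", not "mill").
    return pattern.sub(lambda m: f"{pre_tag}{m.group(0)}{post_tag}", text)


print(wrap_terms("288 Mill Street", ["mill", "Court"], "<a>", "</a>"))
print(wrap_terms("467 Hutchinson Court", ["mill", "Court"], "<a>", "</a>"))
```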