
[Elasticsearch] Highlighted Search

程序员文章站 (Programmer Article Station), 2022-07-05 14:49:03


Overview

What is highlighting?

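Highlighting marks the parts of one or more fields that matched the query, so that the search results can display them differently from the surrounding text, for example in bold or in a different color.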

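Highlighting needs the actual content of the field. The field does not have to be stored: if the mapping does not set store to true, Elasticsearch loads the document's _source and extracts the relevant field from it.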
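To make the idea concrete, here is a toy highlighter in Python. It reads the field from a document's _source (as Elasticsearch does when the field is not stored) and wraps every word that matches a query term. `highlight_field` is a hypothetical name, and the regex tokenization plus lowercasing only roughly imitates an analyzer; this is an illustration, not Elasticsearch's implementation.

```python
import re
from typing import List


def highlight_field(source: dict, field: str, terms: List[str],
                    pre_tag: str = "<em>", post_tag: str = "</em>") -> List[str]:
    """Toy highlighter (hypothetical, not Elasticsearch's implementation).

    Reads the field from a document's _source and wraps every word that
    matches one of the query terms, ignoring case.
    """
    text = source.get(field)
    if text is None:
        return []  # field missing from _source: nothing to highlight
    wanted = {term.lower() for term in terms}

    def wrap(match) -> str:
        word = match.group(0)
        # Lowercasing roughly imitates what the standard analyzer does.
        return f"{pre_tag}{word}{post_tag}" if word.lower() in wanted else word

    # r"\w+" is a crude stand-in for a real analyzer's tokenizer.
    return [re.sub(r"\w+", wrap, text)]


doc = {"address": "288 Mill Street", "city": "Movico"}
print(highlight_field(doc, "address", ["mill", "Court"]))
```

The function returns a list to mirror the array of fragments that Elasticsearch puts under each field in the highlight section of a hit.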

Three highlighter types

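Elasticsearch provides three highlighter types: the plain highlighter based on Lucene, the fast vector highlighter (fvh), and the postings highlighter. This describes older releases: the postings highlighter was removed in Elasticsearch 6.0, where the unified highlighter replaced it and became the default.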

Plain Highlighter

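The plain highlighter is the default choice and is implemented with Lucene's standard highlighter. It tries to reflect the query's matching logic as accurately as possible.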

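It is not fast when you highlight many fields with complex queries. To reflect the query logic accurately, it builds a tiny in-memory index and re-runs the original query against it through Lucene's query execution planner, which yields low-level match information for the current document. This is repeated for every field and every document that needs highlighting, so the cost can add up. In that case, consider switching to a different highlighter type.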
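Switching highlighters is done per field with the type option of the highlight request. The helper below is a hypothetical sketch that builds that part of the search body. The accepted type values depend on the Elasticsearch version (plain, fvh, postings in older releases; unified, plain, fvh from 6.0 on), so treat the value passed in as an assumption about your cluster.

```python
import json
from typing import List, Optional


def highlight_section(fields: List[str], highlighter_type: Optional[str] = None) -> dict:
    """Build the "highlight" part of a search body (hypothetical helper).

    When highlighter_type is given, it is set as each field's "type" option,
    forcing that highlighter; otherwise Elasticsearch chooses one itself.
    """
    field_options = {} if highlighter_type is None else {"type": highlighter_type}
    # Copy the options dict so each field gets its own independent object.
    return {"fields": {name: dict(field_options) for name in fields}}


print(json.dumps(highlight_section(["address"], "fvh")))
```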

Fast Vector Highlighter

If a field's mapping sets term_vector to with_positions_offsets, the fast vector highlighter replaces the plain highlighter as the default for that field.

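Its main characteristics:

  1. It is faster than the plain highlighter, especially on large fields (over 1 MB).
  2. It requires term_vector set to with_positions_offsets, which increases the size of the index.
  3. It can combine matches from multiple fields into one highlighted result.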

Postings Highlighter

If the mapping sets index_options to offsets, the postings highlighter replaces the plain highlighter as the default for that field.

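Its main characteristics:

  1. It does not need to re-analyze the text being highlighted, so it is faster, and the larger the documents, the bigger the gain.
  2. It needs less disk space than the term vectors required by the fast vector highlighter.
  3. It breaks the text into sentences and highlights those, which suits natural-language text but works less well on fields containing markup such as HTML.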
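The two mapping settings described in the fvh and postings sections can be summarized in a small Python sketch that builds a field definition for each highlighter. `highlight_ready_field` is a hypothetical helper; it assumes the text field type (Elasticsearch 5.0+), and the outer mapping body uses the typeless format of Elasticsearch 7.x, whereas older versions nest properties under a type name such as account.

```python
import json


def highlight_ready_field(highlighter: str) -> dict:
    """Hypothetical helper: a text-field mapping prepared for a given highlighter.

    Assumes the "text" field type (Elasticsearch 5.0+).
    """
    field = {"type": "text"}
    if highlighter == "fvh":
        # Term vectors with positions and offsets, used by the fast vector highlighter.
        field["term_vector"] = "with_positions_offsets"
    elif highlighter == "postings":
        # Offsets stored in the postings list: used by the postings highlighter
        # in older releases and by the unified highlighter from 6.0 on.
        field["index_options"] = "offsets"
    elif highlighter != "plain":
        raise ValueError(f"unknown highlighter: {highlighter}")
    # The plain highlighter needs no extra index structures.
    return field


mapping = {"mappings": {"properties": {"address": highlight_ready_field("fvh")}}}
print(json.dumps(mapping, indent=2))
```

Keeping the choice in one place makes the disk-space trade-off explicit: only the fvh variant adds term vectors to the index.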

Example

Find the accounts whose address contains mill or Court, and highlight the matches.

The query:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  }, 
  "highlight": {
    "fields": {
      "address": {}
    }
  }
}
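The same request body can also be generated in code, for example before sending it with an HTTP client. A minimal sketch follows; `match_any_and_highlight` is a hypothetical helper, and sending the request is left out.

```python
import json
from typing import List


def match_any_and_highlight(field: str, terms: List[str]) -> dict:
    """Build a search body that matches any of `terms` in `field` and highlights it."""
    return {
        "query": {
            "bool": {
                # With only "should" clauses, a document must match at least one of them.
                "should": [{"match": {field: term}} for term in terms]
            }
        },
        # An empty options object uses the default highlighter settings.
        "highlight": {"fields": {field: {}}},
    }


body = match_any_and_highlight("address", ["mill", "Court"])
print(json.dumps(body, indent=2))
```

Because match queries are analyzed, the lowercase term mill still matches the indexed value Mill with a standard analyzer.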

The matching hits (excerpt from hits.hits):

{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "472",
    "_score" : 5.4032025,
    "_source" : {
        "account_number" : 472,
        "balance" : 25571,
        "firstname" : "Lee",
        "lastname" : "Long",
        "age" : 32,
        "gender" : "F",
        "address" : "288 Mill Street",
        "employer" : "Comverges",
        "email" : "[email protected]",
        "city" : "Movico",
        "state" : "MT"
    },
    "highlight" : {
        "address" : [
            "288 <em>Mill</em> Street"
        ]
    }
},
{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "18",
    "_score" : 2.1248586,
    "_source" : {
        "account_number" : 18,
        "balance" : 4180,
        "firstname" : "Dale",
        "lastname" : "Adams",
        "age" : 33,
        "gender" : "M",
        "address" : "467 Hutchinson Court",
        "employer" : "Boink",
        "email" : "[email protected]",
        "city" : "Orick",
        "state" : "MD"
    },
    "highlight" : {
        "address" : [
            "467 Hutchinson <em>Court</em>"
        ]
    }
}

By default, Elasticsearch wraps each matched term in <em></em> tags.
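On the client side, the highlighted terms can be pulled back out of the fragments. A minimal sketch, using the two hits above reduced to their _id and highlight sections; `highlighted_terms` is a hypothetical helper, and the tag arguments default to the <em></em> pair shown here.

```python
import re
from typing import List


def highlighted_terms(hit: dict, field: str,
                      pre_tag: str = "<em>", post_tag: str = "</em>") -> List[str]:
    """Return the text between pre_tag and post_tag in a hit's highlight fragments."""
    # Escape the tags so characters like "/" or "<" are matched literally;
    # the non-greedy group stops at the first closing tag.
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    terms = []
    for fragment in hit.get("highlight", {}).get(field, []):
        terms.extend(re.findall(pattern, fragment))
    return terms


hits = [
    {"_id": "472", "highlight": {"address": ["288 <em>Mill</em> Street"]}},
    {"_id": "18", "highlight": {"address": ["467 Hutchinson <em>Court</em>"]}},
]
for hit in hits:
    print(hit["_id"], highlighted_terms(hit, "address"))
```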

Custom highlight tags

Set pre_tags and post_tags inside the highlight object:

"pre_tags": ["<tag1>"],
"post_tags": ["</tag1>"],
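Since the tags are inserted into the fragments as plain strings, each opening tag in pre_tags should normally be paired with its closing tag in post_tags. A hypothetical Python helper that adds a tag pair to an existing search body, without modifying the original dict, might look like this:

```python
import copy


def with_custom_tags(body: dict, pre_tag: str, post_tag: str) -> dict:
    """Return a copy of a search body whose highlight section uses custom tags."""
    new_body = copy.deepcopy(body)  # leave the caller's dict untouched
    highlight = new_body.setdefault("highlight", {})
    highlight["pre_tags"] = [pre_tag]
    highlight["post_tags"] = [post_tag]
    return new_body


base = {"query": {"match": {"address": "mill"}}, "highlight": {"fields": {"address": {}}}}
custom = with_custom_tags(base, "<a>", "</a>")
print(custom["highlight"])
```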

The query:

GET /bank/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "address": "mill" } },
        { "match": { "address": "Court" } }
      ]
    }
  }, 
  "highlight": {
    "pre_tags": ["<a>"],
    "post_tags": ["</a>"], 
    "fields": {
      "address": {}
    }
  }
}

The matching hits (excerpt from hits.hits):

{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "472",
    "_score" : 5.4032025,
    "_source" : {
        "account_number" : 472,
        "balance" : 25571,
        "firstname" : "Lee",
        "lastname" : "Long",
        "age" : 32,
        "gender" : "F",
        "address" : "288 Mill Street",
        "employer" : "Comverges",
        "email" : "[email protected]",
        "city" : "Movico",
        "state" : "MT"
    },
    "highlight" : {
        "address" : [
            "288 <a>Mill</a> Street"
        ]
    }
},
{
    "_index" : "bank",
    "_type" : "account",
    "_id" : "18",
    "_score" : 2.1248586,
    "_source" : {
        "account_number" : 18,
        "balance" : 4180,
        "firstname" : "Dale",
        "lastname" : "Adams",
        "age" : 33,
        "gender" : "M",
        "address" : "467 Hutchinson Court",
        "employer" : "Boink",
        "email" : "[email protected]",
        "city" : "Orick",
        "state" : "MD"
    },
    "highlight" : {
        "address" : [
            "467 Hutchinson <a>Court</a>"
        ]
    }
}

The default <em></em> tags have been replaced with <a></a>.