Elasticsearch 6.x Study Notes: 17. Term-Level Queries

程序员文章站 2024-01-04 08:21:22

17.1 Introduction to term-level queries

Official documentation for term-level queries:
https://www.elastic.co/guide/en/elasticsearch/reference/6.1/term-level-queries.html

While the full text queries will analyze the query string before executing, the term-level queries operate on the exact terms that are stored in the inverted index.

These queries are usually used for structured data like numbers, dates, and enums, rather than full text fields. Alternatively, they allow you to craft low-level queries, foregoing (that is, skipping) the analysis process.

17.2 term query

Find documents which contain the exact term specified in the field specified.

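The term query searches for an exact term. It was already covered in chapters 7.3 (Document Search) and 15 (Getting Started with Search), so it is not repeated here.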

17.3 terms query

Find documents which contain any of the exact terms specified in the field specified.
Filters documents that have fields that match any of the provided terms (not analyzed).
Put simply, the terms query returns documents whose field contains any one of the given terms.

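Likewise, the terms query was already covered in chapters 7.3 (Document Search) and 15 (Getting Started with Search), so it is not repeated here.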

17.4 terms_set query

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-terms-set-query.html

The terms_set query is new, and its syntax may change in a future release.

Find documents which match one or more of the specified terms. The number of terms that must match depends on the specified minimum-should-match field or script.

PUT my-index
{
    "mappings": {
        "doc": {
            "properties": {
                "required_matches": {
                    "type": "long"
                }
            }
        }
    }
}

PUT /my-index/doc/1?refresh
{
    "codes": ["ghi", "jkl"],
    "required_matches": 2
}

PUT /my-index/doc/2?refresh
{
    "codes": ["def", "ghi"],
    "required_matches": 2
}

Using a minimum-should-match field:

GET /my-index/_search
{
    "query": {
        "terms_set": {
            "codes" : {
                "terms" : ["abc", "def", "ghi"],
                "minimum_should_match_field": "required_matches"
            }
        }
    }
}

Query result:

{
  "took": 66,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "codes": [
            "def",
            "ghi"
          ],
          "required_matches": 2
        }
      }
    ]
  }
}
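
To see why only document 2 comes back, here is a minimal Python sketch of the counting rule. It is an illustrative model, not Elasticsearch code: the function name `terms_set_match` is made up for this note, and each value in `codes` is assumed to be indexed as a single term.

```python
# Illustrative model of terms_set with minimum_should_match_field.
# This is NOT Elasticsearch code; terms_set_match is a hypothetical helper.
docs = {
    "1": {"codes": ["ghi", "jkl"], "required_matches": 2},
    "2": {"codes": ["def", "ghi"], "required_matches": 2},
}


def terms_set_match(doc, field, terms, minimum_field):
    """Return True when enough of the query terms occur in doc[field]."""
    matched = len(set(terms) & set(doc[field]))
    return matched >= doc[minimum_field]


query_terms = ["abc", "def", "ghi"]
hits = [
    doc_id
    for doc_id, doc in docs.items()
    if terms_set_match(doc, "codes", query_terms, "required_matches")
]
print(hits)
```

Document 1 shares only "ghi" with the query terms (one match, two required), while document 2 shares "def" and "ghi" (two matches, two required).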

The following example caps the required number of matching terms so that it never exceeds the number of terms specified in the query. The params.num_terms parameter is available in the script and holds the number of terms specified.

GET /my-index/_search
{
    "query": {
        "terms_set": {
            "codes" : {
                "terms" : ["abc", "def", "ghi"],
                "minimum_should_match_script": {
                   "source": "Math.min(params.num_terms, doc['required_matches'].value)"
                }
            }
        }
    }
}

{
  "took": 282,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.5753642,
    "hits": [
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "2",
        "_score": 0.5753642,
        "_source": {
          "codes": [
            "def",
            "ghi"
          ],
          "required_matches": 2
        }
      }
    ]
  }
}
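
The effect of the cap can be sketched in plain Python. This is only a model of the arithmetic in the script: `required_count` and `script_match` are hypothetical helper names, and the document with `required_matches` set to 5 is invented to show a case where the cap matters.

```python
# Illustrative model of the minimum_should_match_script above.
# NOT Elasticsearch code; helper names are hypothetical.
def required_count(num_terms, required_matches):
    # Mirrors Math.min(params.num_terms, doc['required_matches'].value)
    return min(num_terms, required_matches)


def script_match(doc_codes, required_matches, query_terms):
    """Return True when the number of matching terms reaches the capped minimum."""
    matched = len(set(query_terms) & set(doc_codes))
    return matched >= required_count(len(query_terms), required_matches)


query_terms = ["abc", "def", "ghi"]
doc1 = script_match(["ghi", "jkl"], 2, query_terms)
doc2 = script_match(["def", "ghi"], 2, query_terms)
# Invented document: it asks for 5 matches, but the query has only 3 terms.
capped = script_match(["abc", "def", "ghi"], 5, query_terms)
print(doc1, doc2, capped)
```

Without the cap, the invented document could never match a three-term query, because three matches can never reach five.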

17.5 range query

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-range-query.html

The range query matches documents whose numeric, date, or string field falls within a given range.

[Example] Find all documents whose age field is between 10 and 20

DELETE my-index

PUT my-index

PUT my-index/doc/1
{"age":12}

PUT my-index/doc/2
{"age":18}

PUT my-index/doc/3
{"age":21}

GET _search
{
    "query": {
        "range" : {
            "age" : {
                "gte" : 10,
                "lte" : 20,
                "boost" : 2.0
            }
        }
    }
}

{
  "took": 24,
  "timed_out": false,
  "_shards": {
    "total": 55,
    "successful": 55,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 2,
    "hits": [
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "2",
        "_score": 2,
        "_source": {
          "age": 18
        }
      },
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "1",
        "_score": 2,
        "_source": {
          "age": 12
        }
      }
    ]
  }
}
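
The bound operators can be modelled with a small Python sketch. It assumes the usual meaning of the four range keys (gt, gte, lt, lte); the `in_range` helper is a made-up name, not an Elasticsearch API.

```python
import operator

# Range keys mapped to comparison functions (an illustrative model only).
BOUNDS = {
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def in_range(value, **bounds):
    """Return True when value satisfies every bound that was given."""
    return all(BOUNDS[name](value, limit) for name, limit in bounds.items())


ages = {"1": 12, "2": 18, "3": 21}
hits = sorted(doc_id for doc_id, age in ages.items() if in_range(age, gte=10, lte=20))
print(hits)
```

Documents 1 and 2 fall inside the closed interval, and document 3 (age 21) does not. The _score of 2 on both hits is consistent with a constant score of 1 multiplied by the boost of 2.0.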

[Example] Date range query

GET website/_search
{
    "query": {
        "range" : {
            "postdate" : {
                "gte" : "2017-01-01",
                "lte" :  "2017-12-31",
                "format": "yyyy-MM-dd"
            }
        }
    }
}

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "8",
        "_score": 1,
        "_source": {
          "title": "es高亮",
          "author": "程裕强",
          "postdate": "2017-01-03",
          "abstract": "Elasticsearch查询关键字高亮",
          "url": "http://url/53991802"
        }
      }
    ]
  }
}
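
As a rough sketch of what the date comparison does, the Python below parses the bounds and compares them with the postdate values that appear in the responses in this note. The pattern "%Y-%m-%d" is the Python counterpart of "yyyy-MM-dd". The website index may hold other documents that are not shown here.

```python
from datetime import datetime


def parse(text, fmt="%Y-%m-%d"):
    """Parse a date string; %Y-%m-%d plays the role of yyyy-MM-dd."""
    return datetime.strptime(text, fmt).date()


# postdate values taken from the query results shown in this note
postdates = {
    "2": "2016-12-23",
    "3": "2016-12-25",
    "4": "2016-12-29",
    "6": "2016-12-30",
    "8": "2017-01-03",
}

low, high = parse("2017-01-01"), parse("2017-12-31")
hits = [doc_id for doc_id, text in postdates.items() if low <= parse(text) <= high]
print(hits)
```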

17.5 exists query

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-exists-query.html

Returns documents that have at least one non-null value in the original field.

PUT my-index/doc/1
{ "user": "jane" }

PUT my-index/doc/2
{ "user": "" } 

PUT my-index/doc/3
{ "user": [] }

PUT my-index/doc/4
{ "user": ["jane", null ] }

PUT my-index/doc/5
{ "age": 28 }

GET /_search
{
    "query": {
        "exists" : { "field" : "user" }
    }
}

Matching results:

{
  "took": 18,
  "timed_out": false,
  "_shards": {
    "total": 55,
    "successful": 55,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "2",
        "_score": 1,
        "_source": {
          "user": ""
        }
      },
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "4",
        "_score": 1,
        "_source": {
          "user": [
            "jane",
            null
          ]
        }
      },
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user": "jane"
        }
      }
    ]
  }
}

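Notes:
Documents that match

"user": "" : the user field exists and its value is non-null (an empty string)
"user": "jane" : the user field exists and its value is non-null
"user": ["jane", null] : the user field exists and at least one value is non-null

Documents that do not match

"user": [] : the user field exists but holds no values
"age": 28 : there is no user field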
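
The rule above can be written down as a short Python sketch. The helper name `has_non_null_value` is made up for this note; it only models the behaviour seen in the results and is not how Elasticsearch implements the query.

```python
def has_non_null_value(doc, field):
    """Return True when the field holds at least one value that is not None."""
    if field not in doc:
        return False
    value = doc[field]
    values = value if isinstance(value, list) else [value]
    return any(item is not None for item in values)


docs = {
    1: {"user": "jane"},
    2: {"user": ""},            # empty string still counts as a value
    3: {"user": []},            # empty array: no values at all
    4: {"user": ["jane", None]},
    5: {"age": 28},             # no user field
}

hits = [doc_id for doc_id, doc in docs.items() if has_non_null_value(doc, "user")]
print(hits)
```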

17.6 prefix query

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-prefix-query.html
[Example] Find users whose name starts with ki

GET /_search
{ "query": {
    "prefix" : { "user" : "ki" }
  }
}

{
  "took": 37,
  "timed_out": false,
  "_shards": {
    "total": 55,
    "successful": 55,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

[Example] Find users whose name starts with ja

GET /_search
{ "query": {
    "prefix" : { "user" : "ja" }
  }
}

{
  "took": 50,
  "timed_out": false,
  "_shards": {
    "total": 55,
    "successful": 55,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "4",
        "_score": 1,
        "_source": {
          "user": [
            "jane",
            null
          ]
        }
      },
      {
        "_index": "my-index",
        "_type": "doc",
        "_id": "1",
        "_score": 1,
        "_source": {
          "user": "jane"
        }
      }
    ]
  }
}
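
A small Python model of the two prefix searches, using the documents indexed in the exists section. It assumes each user value is indexed as one lowercase term, which holds for "jane" here but depends on the analyzer in general. `prefix_match` is a hypothetical helper name.

```python
def prefix_match(doc, field, prefix):
    """Return True when any string value of the field starts with prefix."""
    value = doc.get(field)
    values = value if isinstance(value, list) else [value]
    return any(isinstance(item, str) and item.startswith(prefix) for item in values)


docs = {
    1: {"user": "jane"},
    2: {"user": ""},
    3: {"user": []},
    4: {"user": ["jane", None]},
    5: {"age": 28},
}

ki_hits = [doc_id for doc_id, doc in docs.items() if prefix_match(doc, "user", "ki")]
ja_hits = [doc_id for doc_id, doc in docs.items() if prefix_match(doc, "user", "ja")]
print(ki_hits, ja_hits)
```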

17.7 wildcard query

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-wildcard-query.html

GET website/_search
{
    "query": {
        "wildcard" : { "title" : "*yum*" }
    }
}

{
  "took": 39,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "6",
        "_score": 1,
        "_source": {
          "title": "CentOS更换国内yum源",
          "author": "程裕强",
          "postdate": "2016-12-30",
          "abstract": "CentOS更换国内yum源",
          "url": "http://url.cn/53946911"
        }
      }
    ]
  }
}
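
The wildcard pattern is matched against individual indexed terms, not against the whole title. The Python sketch below models that: `*` stands for any run of characters and `?` for exactly one. The term list is an assumption about how the title might be tokenized (it depends on the analyzer configured for the field), and both helper names are made up.

```python
import re


def wildcard_to_regex(pattern):
    """Translate * and ? into a regular expression; other characters are literal."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def wildcard_match(terms, pattern):
    """Return True when the pattern matches at least one whole term."""
    regex = wildcard_to_regex(pattern)
    return any(regex.fullmatch(term) for term in terms)


# Assumed tokens for the title "CentOS更换国内yum源" (analyzer-dependent).
title_terms = ["centos", "更", "换", "国", "内", "yum", "源"]

print(wildcard_match(title_terms, "*yum*"))
print(wildcard_match(title_terms, "yu?"))
print(wildcard_match(title_terms, "cent"))
```

The last call returns False because a pattern without wildcards has to equal a whole term.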

17.8 regexp query (regular expression query)

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-regexp-query.html

The performance of a regexp query heavily depends on the regular expression chosen. Matching everything like .* is very slow, as is using lookaround regular expressions. If possible, you should try to use a long prefix before your regular expression starts. Wildcard matchers like .*?+ will mostly lower performance.

Most regular expression engines allow you to match any part of a string. If you want the regexp pattern to start at the beginning of the string or finish at the end of the string, then you have to anchor it specifically, using ^ to indicate the beginning or $ to indicate the end. Lucene's patterns, by contrast, are always anchored: the pattern provided must match the entire string.

Metacharacter	Meaning	Description	Example
`.`	Match any character	The period "." can be used to represent any character.	`ab.` matches "abc" and "ab1"
`+`	One-or-more	The plus sign "+" can be used to repeat the preceding shortest pattern once or more times.	`a+b+` matches "aaabbb"
`*`	Zero-or-more	The asterisk "*" can be used to match the preceding shortest pattern zero-or-more times.	`a*b*` matches "aaabbb"
`?`	Zero-or-one	The question mark "?" makes the preceding shortest pattern optional. It matches zero or one times.	`aaa?bbbb?` matches "aaabbb"
`{m}`, `{m,n}`	Min-to-max	Curly brackets "{}" can be used to specify a minimum and (optionally) a maximum number of times the preceding shortest pattern can repeat.	`a{3}b{3}` and `a{2,4}b{2,4}` match "aaabbb"
`()`	Grouping	Parentheses "()" can be used to form sub-patterns.	`(ab)+` matches "ababab"
`|`	Alternation	The pipe symbol "|" acts as an OR operator.	`aabb|bbaa` matches "aabb"
`[]`	Character classes	Ranges of potential characters may be represented as character classes by enclosing them in square brackets "[]". A leading ^ negates the character class.	`[abc]` matches "a", "b", or "c"
`~`	Complement	The shortest pattern that follows a tilde "~" is negated. For example, "ab~cd" means: starts with a, followed by b, followed by a string of any length that is anything but c, and ends with d.	`ab~df` and `a~(cb)def` match "abcdef"; `ab~cdef` and `a~(bc)def` do not
`<>`	Interval	The interval option enables the use of numeric ranges, enclosed by angle brackets "<>".	`foo<1-100>` matches "foo80"
`&`	Intersection	The ampersand "&" joins two patterns in a way that both of them have to match.	`aaa.+&.+bbb` matches "aaabbb"
`@`	Any string	The at sign "@" matches any string in its entirety.	`@&~(foo.+)` matches any string except those that begin with "foo" and continue with at least one more character
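
Several rows of the table can be checked with Python's re module, as a rough approximation only. Python and Lucene share the operators `.`, `+`, `*`, `?`, `{}`, `()`, `|`, and `[]`, while `~`, `<>`, `&`, and `@` are Lucene-specific and have no Python counterpart. `re.fullmatch` is used here because a regexp query pattern has to match the entire term.

```python
import re

# (pattern, term) pairs taken from the table above; only operators that
# Python's re module shares with Lucene are included.
examples = [
    ("ab.", "abc"),
    ("a+b+", "aaabbb"),
    ("a*b*", "aaabbb"),
    ("aaa?bbbb?", "aaabbb"),
    ("a{3}b{3}", "aaabbb"),
    ("a{2,4}b{2,4}", "aaabbb"),
    ("(ab)+", "ababab"),
    ("aabb|bbaa", "aabb"),
    ("[abc]", "b"),
]

results = {pattern: re.fullmatch(pattern, term) is not None for pattern, term in examples}
for pattern, matched in results.items():
    print(pattern, matched)

# A pattern that covers only part of the term does not match.
partial = re.fullmatch("aaa", "aaabbb") is not None
print("aaa", partial)
```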

GET website/_search
{
    "query": {
        "regexp":{
            "title": "gc.*"
        }
    }
}

{
  "took": 20,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "CentOS升级gcc",
          "author": "程裕强",
          "postdate": "2016-12-25",
          "abstract": "CentOS升级gcc",
          "url": "http://url.cn/53868915"
        }
      }
    ]
  }
}

17.9 fuzzy query

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-fuzzy-query.html

GET website/_search
{
    "query": {
        "fuzzy":{
            "title": "vmwere"
        }
    }
}

{
  "took": 22,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.81735766,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "4",
        "_score": 0.81735766,
        "_source": {
          "title": "vmware复制虚拟机",
          "author": "程裕强",
          "postdate": "2016-12-29",
          "abstract": "vmware复制虚拟机",
          "url": "http://url.cn/53946664"
        }
      }
    ]
  }
}
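
The misspelled "vmwere" still finds the "vmware" post because the two terms are one edit apart. The Python sketch below computes the plain Levenshtein distance, a simplification: Elasticsearch by default also counts a transposition of two adjacent characters as a single edit, which makes no difference for this pair. The `auto_fuzziness` helper models the documented AUTO thresholds (0 edits for terms of 1 to 2 characters, 1 edit for 3 to 5, 2 edits for longer terms). It also assumes the title was tokenized so that "vmware" is a term of its own.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Allowed edits under AUTO fuzziness, based on term length."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


distance = levenshtein("vmwere", "vmware")
allowed = auto_fuzziness("vmwere")
print(distance, allowed, distance <= allowed)
```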

17.9 type query

GET /_search
{
    "query": {
        "type" : {
            "value" : "my_type"
        }
    }
}

{
  "took": 33,
  "timed_out": false,
  "_shards": {
    "total": 55,
    "successful": 55,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 3,
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "2",
        "_score": 1,
        "_source": {
          "city": "York"
        }
      },
      {
        "_index": "index_1",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "text": "Document in index 1"
        }
      },
      {
        "_index": "my_index",
        "_type": "my_type",
        "_id": "1",
        "_score": 1,
        "_source": {
          "city": "New York"
        }
      }
    ]
  }
}

17.9 ids query

https://www.elastic.co/guide/en/elasticsearch/reference/6.1/query-dsl-ids-query.html

GET /_search
{
    "query": {
        "ids" : {
            "type" : "blog",
            "values" : ["2", "3"]
        }
    }
}

{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 55,
    "successful": 55,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 1,
    "hits": [
      {
        "_index": "website",
        "_type": "blog",
        "_id": "2",
        "_score": 1,
        "_source": {
          "title": "watchman源码编译",
          "author": "程裕强",
          "postdate": "2016-12-23",
          "abstract": "CentOS7.x的watchman源码编译",
          "url": "http://url.cn/53844169"
        }
      },
      {
        "_index": "website",
        "_type": "blog",
        "_id": "3",
        "_score": 1,
        "_source": {
          "title": "CentOS升级gcc",
          "author": "程裕强",
          "postdate": "2016-12-25",
          "abstract": "CentOS升级gcc",
          "url": "http://url.cn/53868915"
        }
      }
    ]
  }
}