Getting Started with Elasticsearch (2): Mapping + Field Types

程序员文章站 2022-07-05 08:04:24

Elasticsearch Reference [6.2] » Mapping
Based on the official English documentation: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping.html

Parts of the content also draw on: https://www.cnblogs.com/ljhdo/p/4981928.html

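Mapping is the process of defining how a document, and the fields it contains, are stored and indexed. Each index has one mapping type, which determines how its documents are indexed.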

1. Meta-fields
These include the document's _index, _type, _id, and _source fields.
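To make the meta-fields concrete, here is a small sketch that picks them out of a document lookup response. The response dict is a hypothetical example of what `GET /my_index/_doc/1` might return; the index name, id, and field values are made up for illustration.

```python
import json

# Hypothetical response of GET /my_index/_doc/1 (index name, id, and
# field values are invented for this example).
response = {
    "_index": "my_index",
    "_type": "_doc",
    "_id": "1",
    "_version": 1,
    "found": True,
    "_source": {"name": "zhangsan", "age": 28},
}

# The four meta-fields named in this section.
META_FIELDS = ("_index", "_type", "_id", "_source")

# Keep only the meta-fields; _source carries the original JSON body.
meta = {key: response[key] for key in META_FIELDS}
print(json.dumps(meta, indent=2))
```

The original document body lives entirely under `_source`; the other three meta-fields identify where the document is stored.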

2. Elasticsearch field datatypes:
https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-types.html

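String types
text, keyword
Numeric types
long, integer, short, byte, double, float, half_float, scaled_float
Date type
date
Boolean type
boolean
Binary type
binary
Range types
integer_range, float_range, long_range, double_range, date_range
Array datatype (arrays need no dedicated type: any field can hold zero or more values of the same type)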

[ "one", "two" ]
[ 1, 2 ]
[{ "name": "Mary", "age": 12 },{ "name": "John", "age": 10}]

Object datatype (JSON objects nested inside one another)

{ 
  "region": "US",
  "manager": { 
    "age":     30,
    "name": { 
      "first": "John",
      "last":  "Smith"
    }
  }
}
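Inner object fields are addressed by their dotted path, for example `manager.name.first`. The sketch below flattens the sample document above into such paths. The `flatten` helper is a hypothetical illustration of the naming scheme, not Elasticsearch code.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into dotted field paths (illustrative helper)."""
    flat = {}
    for key, value in obj.items():
        path = prefix + key
        if isinstance(value, dict):
            # Recurse into the inner object, extending the dotted prefix.
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat


doc = {
    "region": "US",
    "manager": {
        "age": 30,
        "name": {"first": "John", "last": "Smith"},
    },
}

flat = flatten(doc)
for path, value in flat.items():
    print(path, "=", value)
```

These dotted paths are also the names you would use when querying or sorting on an inner field.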

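Geo datatypes
geo_point, geo_shape (geo_shape is more complex, see the official documentation; geo_point is usually enough)
Specialised datatypes
ip (IPv4 and IPv6 addresses)
completion (auto-complete / search suggestions)
token_count (a numeric type: analyzes a string and indexes the number of tokens in it)
murmur3 (computes a hash of the field values at index time and stores it in the index; provided by the mapper-murmur3 plugin. A big help when running cardinality aggregations on high-cardinality or large string fields)
join (creates parent/child relations between documents in the same index)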
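As a rough sketch of how a few of these specialised types might be declared, the snippet below builds a mapping body as a plain Python dict. The field names (`ip_addr`, `suggest`, `name`, `my_join_field`) and the `question`/`answer` relation are assumptions chosen for illustration. murmur3 is left out because it depends on a plugin being installed.

```python
import json

# Illustrative mapping fragment; all field names are hypothetical.
mapping = {
    "properties": {
        "ip_addr": {"type": "ip"},
        "suggest": {"type": "completion"},
        "name": {
            "type": "text",
            "fields": {
                # token_count needs an analyzer to know how to count tokens.
                "length": {"type": "token_count", "analyzer": "standard"},
            },
        },
        # A join field declares which parent type has which child type.
        "my_join_field": {
            "type": "join",
            "relations": {"question": "answer"},
        },
    }
}

print(json.dumps(mapping, indent=2))
```

Here `name.length` is a multi-field: the same source value is indexed once as text and once as a token count.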

Below are mapping definitions and sample values for the most commonly used types.

Type	Mapping definition	Sample value
text	"name":{"type":"text"}	"name": "zhangsan"
keyword	"tags":{"type":"keyword"}	"tags": "food"
date	"date":{"type": "date"}	"date":"2015-01-01T12:10:30"
long	"age":{"type":"long"}	"age" :28
double	"score":{"type":"double"}	"score":98.8
boolean	"isgirl": { "type": "boolean" }	"isgirl" :true
ip	"ip_addr":{"type":"ip"}	"ip_addr": "192.168.1.1"
geo_point	"location": {"type":"geo_point"}	"location":{"lat":40.12,"lon":-71.34}
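Putting the rows of the table together, the following sketch assembles a create-index request body and a matching document. It assumes Elasticsearch 6.x, where the mapping sits under a single type name (`_doc` here). The index name in the usage note is hypothetical.

```python
import json

# Field definitions taken from the table above.
properties = {
    "name": {"type": "text"},
    "tags": {"type": "keyword"},
    "date": {"type": "date"},
    "age": {"type": "long"},
    "score": {"type": "double"},
    "isgirl": {"type": "boolean"},
    "ip_addr": {"type": "ip"},
    "location": {"type": "geo_point"},
}

# Assumed 6.x request shape: mappings -> type name -> properties.
create_index_body = {"mappings": {"_doc": {"properties": properties}}}

# A document using the sample values from the table.
document = {
    "name": "zhangsan",
    "tags": "food",
    "date": "2015-01-01T12:10:30",
    "age": 28,
    "score": 98.8,
    "isgirl": True,
    "ip_addr": "192.168.1.1",
    "location": {"lat": 40.12, "lon": -71.34},
}

# Sanity check: every top-level document field has a mapping entry.
unmapped = sorted(set(document) - set(properties))

print(json.dumps(create_index_body, indent=2))
print("unmapped fields:", unmapped)
```

The body could be sent with `PUT /my_index` and the document with `PUT /my_index/_doc/1`, using curl, Kibana Dev Tools, or any HTTP client.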

3. Mapping parameters

https://www.elastic.co/guide/en/elasticsearch/reference/6.2/mapping-params.html Parameters marked with * are the commonly used ones.

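Parameter	Default	Notes
*analyzer	"standard"	Analyzer for text fields: standard/simple/stop/keyword/whitespace/language analyzers (e.g. english). keyword means no tokenization: the whole value becomes a single token
normalizer	-	For keyword fields: works like an analyzer but is guaranteed to emit a single token (e.g. lowercasing), so values are normalized the same way at index time and at search time
boost	1.0	Relevance weight of the field, applied at query time
coerce	true	Coerces values to the field's numeric type, e.g. the string "5" becomes the number 5
copy_to	-	Copies field values into another field, e.g. firstname and lastname into fullname
doc_values	true	On-disk data structure built at index time; set to false to save disk space if the field is never used for sorting or aggregations
*dynamic	true	true/false/strict: whether new fields may be added dynamically. Leaving it as true is not recommended
enabled	true	Set to false to keep the value in _source only, without indexing or aggregating it, e.g. for session data
fielddata	false	text fields only. Loads the term-to-document relationship into memory at query time
eager_global_ordinals	false	Global ordinals assign each unique term an incrementing number; true builds them at refresh time instead of on first use at search time
*format	"strict_date_optional_time||epoch_millis"	Date pattern, e.g. "format": "yyyy-MM-dd HH:mm:ss" (HH is the 24-hour clock, hh the 12-hour clock)
ignore_above	2147483647	Integer. Strings longer than this are not indexed or stored in the field (they remain in _source). The default effectively means no limit; dynamically mapped keyword sub-fields use 256
ignore_malformed	false	Set to true to ignore the exception raised when a value of the wrong data type is indexed into the field: the malformed field is skipped and the rest of the document is indexed
index_options	positions/docs	docs (document numbers only) / freqs (document numbers and term frequencies) / positions (document numbers, term frequencies, and term positions) / offsets (document numbers, term frequencies, term positions, and character offsets). Analyzed text fields default to positions, all other fields to docs
*index	true	true/false: whether the field value is indexed; a field set to false cannot be queried. The older analyzed/not_analyzed/no values belong to the pre-5.0 string type (see the Chinese documentation); in 6.x the text and keyword types take over that distinction
fields	-	Multi-fields: index the same field in different ways, e.g. as text for full-text search and as keyword for sorting and aggregations
norms	true	Used for scoring; takes some disk space, so disable it on fields that do not need scoring
null_value	null	A null value cannot be indexed or searched; null_value substitutes a stand-in such as the string "NULL": "null_value": "NULL"
position_increment_gap	100	For text fields holding multiple values: the gap inserted between values so that phrase or proximity queries do not match across them
properties	-	Holds the field definitions of a mapping type or object field, e.g. when creating the index
*search_analyzer	the index-time analyzer	Indexing and searching normally use the same analyzer; set this when they need to differ
similarity	"BM25"	BM25/classic/boolean: the similarity (scoring) algorithm, mainly for text fields
*store	false	By default a field value is indexed and searchable, but the original value is not stored separately and cannot be retrieved on its own; _source already contains all values. Consider true when documents hold large text and you want to fetch individual fields without extracting them from _source
term_vector	"no"	no/yes/with_positions/with_offsets/with_positions_offsets: term vectors, i.e. the terms produced by the analysis process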
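To show how several of these parameters combine in practice, here is a minimal sketch of a mapping body built as a Python dict. All field names are hypothetical, and the choice of analyzers is only for illustration. The date pattern uses `HH` so that 24-hour times parse.

```python
import json

# Illustrative mapping; every field name below is an assumption.
mapping = {
    # strict: reject documents that contain fields not declared here.
    "dynamic": "strict",
    "properties": {
        # copy_to gathers both name parts into one searchable field.
        "first_name": {"type": "text", "copy_to": "full_name"},
        "last_name": {"type": "text", "copy_to": "full_name"},
        "full_name": {"type": "text"},
        "title": {
            "type": "text",
            "analyzer": "english",          # used at index time
            "search_analyzer": "standard",  # used at query time
            # Multi-field: an exact-value copy for sorting/aggregations.
            "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
        },
        "created": {"type": "date", "format": "yyyy-MM-dd HH:mm:ss"},
        # Explicit nulls are indexed as the stand-in string "NULL".
        "status": {"type": "keyword", "null_value": "NULL"},
        # Kept in _source only; not parsed or indexed.
        "session_data": {"type": "object", "enabled": False},
    },
}

body = json.dumps({"mappings": {"_doc": mapping}}, indent=2)
print(body)
```

With this mapping, a search on `title` would use the analyzed text, while sorting or aggregating would target `title.raw`.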
