Elasticsearch study notes: the store, _all, index and copy_to properties in index mappings, and an analysis of the advantages of bulk indexing

程序员文章站 2022-07-09 19:10:34


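Disabling field type guessing
Create the index blog, then index a document that carries an extra field, end, which is not declared in the mapping.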

PUT /blog
{
  "mappings":{
    "article":{
      "dynamic":"false",
      "properties": {
         "id":{"type": "text"},
         "content":{"type": "text"},
         "author":{"type": "text"}
      }
    }
  }
}

PUT /blog/article/1
{
  "id":"1",
  "content":"2",
  "author":"wlf",
  "end":1
}
### Check the mapping: no mapping was added dynamically for the end field
GET /blog/_mapping/article
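
With "dynamic": "false", a field that is missing from the mapping is still kept in _source but is not indexed, so it cannot be searched. The snippet below is a small client-side sketch (the helper name unmapped_fields is hypothetical, not part of any Elasticsearch client) that lists which top-level fields of a document would be left out of the index under such a mapping.

```python
def unmapped_fields(properties, document):
    """Return the top-level document keys that have no entry in the mapping."""
    return sorted(key for key in document if key not in properties)


# The "properties" section of the blog/article mapping above.
article_properties = {
    "id": {"type": "text"},
    "content": {"type": "text"},
    "author": {"type": "text"},
}

# The document indexed above, including the extra field "end".
doc = {"id": "1", "content": "2", "author": "wlf", "end": 1}

ignored = unmapped_fields(article_properties, doc)
print(ignored)  # ['end']
```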

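Index mapping structure
Common properties:
- index: true or false. Whether an index is built for the field; in practice this determines whether the field can be queried.

###### index is set to false for title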
PUT my_index1
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "store": false,
          "index": false
        },
        "date": {
          "type": "date",
          "store": false 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index1/_doc/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}

#### title has index set to false, so querying it returns an error
GET my_index1/_doc/_search
{
  "query": {
    "bool": {
      "must": { "match": { "title": "Some short title" } }
    }
  }
}

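store: by default, field values are indexed so that they can be searched, but the original values are not stored separately. The field can therefore be queried, but its original value cannot be retrieved as a stored field. This rarely matters, because the value is already part of the _source field and can be read from there, which is why store defaults to false. In some cases it does make sense to store a field. For example, if your documents have a title, a date and a very large content field, you may want to retrieve only title and date without extracting them from a large _source field: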

DELETE my_index
PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "title": {
          "type": "text",
          "store": true 
        },
        "date": {
          "type": "date",
          "store": true 
        },
        "content": {
          "type": "text"
        }
      }
    }
  }
}

PUT my_index/_doc/1
{
  "title":   "Some short title",
  "date":    "2015-01-01",
  "content": "A very long content field..."
}

#### The content field does not need to be fetched
GET my_index/_search
{
  "stored_fields": [ "title", "date" ]    
}

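_all: in 6.0+ this field is disabled by default and can no longer be enabled when an index is created. The _all field captures all fields: it concatenates the values of all other fields into one large string, using a space as the separator, which is then analyzed and indexed but not stored. This means it can be searched but not retrieved. Use copy_to to build a custom _all-style field instead.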

PUT myindex
{
  "mappings": {
    "mytype": {
      "_all": {"enabled": true},
      "properties": {
        "title": { 
          "type": "text",
          "boost": 2
        },
        "content": { 
          "type": "text"
        }
      }
    }
  }
}
##### The request fails with this error
{
  "error": {
    "root_cause": [
      {
        "type": "mapper_parsing_exception",
        "reason": "Failed to parse mapping [mytype]: Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field."
      }
    ],
    "type": "mapper_parsing_exception",
    "reason": "Failed to parse mapping [mytype]: Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field.",
    "caused_by": {
      "type": "illegal_argument_exception",
      "reason": "Enabling [_all] is disabled in 6.0. As a replacement, you can use [copy_to] on mapping fields to create your own catch all field."
    }
  },
  "status": 400
}

copy_to: names a target field; the values of the field that declares copy_to are copied into that target field.

PUT myindex
{
  "mappings": {
    "mytype": {
      "properties": {
        "first_name": {
          "type":    "text",
          "copy_to": "full_name" 
        },
        "last_name": {
          "type":    "text",
          "copy_to": "full_name" 
        },
        "full_name": {
          "type":    "text"
        }
      }
    }
  }
}

PUT myindex/mytype/1
{
  "first_name": "John",
  "last_name": "Smith"
}

GET myindex/_search
{
  "query": {
    "match": {
      "full_name": "John Smith"
    }
  }
}
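
Since copy_to is the suggested replacement for _all, the mapping for a catch-all field can be generated rather than written by hand. The following is a minimal sketch, assuming a plain dict of field definitions as input; the helper name with_catch_all and the extra age field are hypothetical and only serve the illustration.

```python
import json


def with_catch_all(properties, target="catch_all"):
    """Return a copy of `properties` where every text field copies into `target`."""
    result = {}
    for name, spec in properties.items():
        spec = dict(spec)  # shallow copy so the input is not modified
        if spec.get("type") == "text":
            spec["copy_to"] = target
        result[name] = spec
    # The target itself is an ordinary text field without copy_to.
    result[target] = {"type": "text"}
    return result


props = with_catch_all(
    {
        "first_name": {"type": "text"},
        "last_name": {"type": "text"},
        "age": {"type": "integer"},  # hypothetical non-text field, left untouched
    },
    target="full_name",
)

# Request body for PUT myindex, in the same shape as the mapping above.
print(json.dumps({"mappings": {"mytype": {"properties": props}}}, indent=2))
```

Only text fields are redirected here; whether other field types should also feed the catch-all field is a design choice that depends on what needs to be searchable.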

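Bulk indexing
1. The supported actions are index, create, delete and update.
  Syntax: every line ends with a newline character. A delete action has no request body line.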
  { action: { metadata }}\n
  { request body }\n
  { action: { metadata }}\n
  { request body }\n

POST /test_index/_bulk
{ "delete": { "_type": "test_type", "_id": "3" }} 
{ "create": { "_type": "test_type", "_id": "12" }}
{ "test_field":    "test12" }
{ "index":  { "_type": "test_type" }}
{ "test_field":    "auto-generate id test" }
{ "index":  { "_type": "test_type", "_id": "2" }}
{ "test_field":    "replaced test2" }
{ "update": { "_type": "test_type", "_id": "1", "_retry_on_conflict" : 3} }
{ "doc" : {"test_field2" : "bulk test1"} }
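
A body in this format is easy to produce programmatically. The sketch below uses only the Python standard library to serialize a list of operations into the newline-delimited form; the function name build_bulk_body and the tuple layout are assumptions made for this example, and sending the body over HTTP is left out.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into a bulk request body.

    `source` is None for actions that carry no request body (delete).
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}))
        if source is not None:
            lines.append(json.dumps(source))
    # Every line, including the last one, must end with a newline.
    return "\n".join(lines) + "\n"


operations = [
    ("delete", {"_type": "test_type", "_id": "3"}, None),
    ("create", {"_type": "test_type", "_id": "12"}, {"test_field": "test12"}),
    ("index", {"_type": "test_type"}, {"test_field": "auto-generate id test"}),
    ("update", {"_type": "test_type", "_id": "1"}, {"doc": {"test_field2": "bulk test1"}}),
]

body = build_bulk_body(operations)
print(body, end="")
```

json.dumps without an indent never emits a raw newline, so each action and each source stays on exactly one line.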

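Within a bulk request, the failure of any single operation does not affect the other operations; the failure is reported in the response.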
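
Because failures are reported per item rather than failing the whole request, the caller has to inspect the response. The following sketch assumes the usual response shape (a top-level errors flag and an items list with one single-key entry per operation); sample_response is hand-written and abbreviated for illustration, not captured from a real cluster.

```python
def failed_items(bulk_response):
    """Return (position, action, id, error) for each failed operation."""
    failures = []
    if not bulk_response.get("errors"):
        return failures
    for position, item in enumerate(bulk_response.get("items", [])):
        # Each item is a single-key dict: {"index": {...}}, {"create": {...}}, ...
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append((position, action, result.get("_id"), result["error"]))
    return failures


# Hand-written, abbreviated sample; the error text is illustrative only.
sample_response = {
    "took": 30,
    "errors": True,
    "items": [
        {"index": {"_id": "2", "status": 200, "result": "updated"}},
        {"create": {"_id": "12", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "document already exists"}}},
    ],
}

failures = failed_items(sample_response)
for position, action, doc_id, error in failures:
    print(position, action, doc_id, error["type"])
```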
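bulk size: a bulk request is loaded into memory, so if it is too large, performance actually drops. The best bulk size has to be found by repeated experiments. A common starting point is 1000~5000 documents per request, increased gradually. Measured by size, 5~15MB per request is a good range.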
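
One way to experiment with the bulk size is to cap each request by both document count and byte size. The sketch below is one possible batching helper under those assumptions; chunk_bulk_bodies and its default limits are hypothetical names and values chosen to match the starting points mentioned above.

```python
import json


def chunk_bulk_bodies(operations, max_docs=1000, max_bytes=5 * 1024 * 1024):
    """Yield bulk bodies holding at most max_docs operations and about max_bytes bytes.

    `operations` is an iterable of (action_dict, source_dict_or_None) pairs.
    An operation larger than max_bytes is sent in a body of its own.
    """
    chunk, count, size = [], 0, 0
    for action, source in operations:
        text = json.dumps(action) + "\n"
        if source is not None:
            text += json.dumps(source) + "\n"
        encoded = len(text.encode("utf-8"))
        # Flush the current chunk before it would exceed either limit.
        if chunk and (count + 1 > max_docs or size + encoded > max_bytes):
            yield "".join(chunk)
            chunk, count, size = [], 0, 0
        chunk.append(text)
        count += 1
        size += encoded
    if chunk:
        yield "".join(chunk)


ops = (
    ({"index": {"_type": "test_type", "_id": str(i)}}, {"test_field": f"value {i}"})
    for i in range(2500)
)
bodies = list(chunk_bulk_bodies(ops, max_docs=1000))
print([b.count("\n") // 2 for b in bodies])  # [1000, 1000, 500]
```

Raising max_docs or max_bytes step by step while measuring indexing throughput is then a matter of changing two arguments.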

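Why not use a more readable JSON array structure instead:
[{
  "action": { },
  "data": { }
}]
Parsing a JSON array into a JSONArray object means the whole payload exists twice in memory: once as the JSON text and once as the JSONArray object.
Suppose 100 bulk requests arrive at one node and each request is 10MB. That is 1000MB, roughly 1GB. If the JSON of every request is then copied into a JSONArray object, memory use doubles to about 2GB.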

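Using more memory can squeeze the memory available to other requests, such as the all-important search requests and analysis requests, and their performance may drop sharply. Higher memory use also makes the JVM garbage-collect more frequently, with more garbage objects to reclaim and more time spent on each collection, so the Elasticsearch JVM pauses its worker threads for longer.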

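With one JSON object per line, the body does not have to be converted into a single JSON object, so no duplicate copy of the data appears in memory. The body is simply cut at the newline characters, which improves performance.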
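
The idea of cutting at newlines can be sketched as follows. This is an illustration of the control flow only, not how Elasticsearch is implemented: the function iter_bulk_operations is hypothetical, reads the body line by line from a stream, parses just the short action line, and passes the source line through as raw text.

```python
import io
import json

NO_BODY_ACTIONS = {"delete"}  # actions that are not followed by a source line


def iter_bulk_operations(stream):
    """Yield (action, metadata, raw_source) while reading the body line by line."""
    for line in stream:
        if not line.strip():
            continue
        # Only the small action line is parsed into an object.
        (action, metadata), = json.loads(line).items()
        raw_source = None
        if action not in NO_BODY_ACTIONS:
            # The source line stays raw text; it is never parsed here.
            raw_source = stream.readline().rstrip("\n")
        yield action, metadata, raw_source


body = (
    '{"delete": {"_id": "3"}}\n'
    '{"index": {"_id": "2"}}\n'
    '{"test_field": "replaced test2"}\n'
)
parsed = list(iter_bulk_operations(io.StringIO(body)))
for action, metadata, raw_source in parsed:
    print(action, metadata, raw_source)
```

At any moment only one action line is held as a parsed object, whereas a JSON array would have to be parsed in full before the first operation could be handled.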
