Elasticsearch _reindex Usage Notes

程序员文章站 (Programmer Article Station), 2022-07-09 18:49:54


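# _reindex rebuilds an index by copying documents from a source to a destination. It can select a subset of fields during the copy and can also copy from a remote cluster.
# Some index properties cannot be changed after creation, such as the number of primary shards or the mapping of an existing field. Rebuilding the index is the way to change them.
# _reindex does not copy the source index's settings or mappings, so create the destination index beforehand or define an index template for it.
# To speed up writes, set the destination's number_of_replicas to 0 and disable refresh (refresh_interval: -1) before reindexing, then restore both afterwards.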
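
# The preparation steps above can be sketched as the two console requests below. This is a minimal sketch under stated assumptions:
# the index name my-dest-index, the shard count 3, and the restored replica count 1 are illustrative values, not taken from the original setup.

```
# Create the destination index up front, because _reindex does not copy settings.
# number_of_shards: 3 is an illustrative value.
PUT my-dest-index
{
  "settings": {
    "index": {
      "number_of_shards": 3,
      "number_of_replicas": 0,
      "refresh_interval": "-1"
    }
  }
}

# After the reindex has finished, restore replicas and refresh.
# null resets refresh_interval to its default; 1 replica is an assumed target value.
PUT my-dest-index/_settings
{
  "index": {
    "number_of_replicas": 1,
    "refresh_interval": null
  }
}
```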

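# ------------------------------------------------------------------------ Copy an index

# source.size is the batch size per scroll request (default 1000).
# The destination name must not match the source pattern: once the destination exists,
# _reindex refuses to write into an index it is also reading from.
POST _reindex
{
    "source": {
        "index": "xxxxx-*",
        "size": 10000
    },
    "dest": {
        "index": "new-xxxxx"
    }
}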
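
# For large indices a copy like the one above can take a long time. One option, sketched here, is to run it as a background task and poll it.
# The index names my-source-* and my-dest-index are placeholders, and <task_id> stands for the value returned by the first request.

```
# Start the copy in the background; the response contains a "task" ID.
POST _reindex?wait_for_completion=false
{
  "source": { "index": "my-source-*", "size": 10000 },
  "dest":   { "index": "my-dest-index" }
}

# Check progress; replace <task_id> with the "task" value returned above.
GET _tasks/<task_id>

# Cancel the copy if needed.
POST _tasks/<task_id>/_cancel
```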

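# ------------------------------------------------------------------------ Copy with a query

# Only documents matching the query are copied. "routing": "=cat" sets the routing value of every copied document to the literal string "cat".
# The "type" fields apply only to older versions with mapping types (deprecated in 7.x, removed in 8.0); omit them on newer clusters.
POST _reindex
{
  "source": {
    "index": "source",
    "type": "user",
    "query": {
      "match": {
        "company": "cat"
      }
    }
  },
  "dest": {
    "index": "dest",
    "type": "_doc",
    "routing": "=cat"
  }
}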
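
# A simple way to check a filtered copy like the one above is to compare document counts. This is a sketch that assumes
# the destination index was empty before the copy; the index names and the query are the ones used in the request above.

```
# How many documents in the source match the filter?
GET source/_count
{
  "query": { "match": { "company": "cat" } }
}

# Make the copied documents visible to search (needed if refresh was disabled).
POST dest/_refresh

# How many documents arrived in the destination?
GET dest/_count
```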

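# ------------------------------------------------------------------------ Change the document type and document ID

# The script stores the old type in a "type" field, prefixes the document ID with the old type, and writes every document under the single type _doc.
# It relies on ctx._type, so it applies only to versions that still have mapping types.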
POST _reindex
{
  "source": {
    "index": "twitter"
  },
  "dest": {
    "index": "new_twitter"
  },
  "script": {
    "source": """
      ctx._source.type = ctx._type;
      ctx._id = ctx._type + '-' + ctx._id;
      ctx._type = '_doc';
    """
  }
}

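# ------------------------------------------------------------------------ Rename the destination index with a script

# The script rewrites ctx._index for each document: the part after "metricbeat-" is kept and "-1" is appended,
# so documents from metricbeat-2016.05.30 are written to metricbeat-2016.05.30-1.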
POST _reindex
{
  "source": {
    "index": "metricbeat-*"
  },
  "dest": {
    "index": "metricbeat"
  },
  "script": {
    "lang": "painless",
    "source": "ctx._index = 'metricbeat-' + (ctx._index.substring('metricbeat-'.length(), ctx._index.length())) + '-1'"
  }
}

# ------------------------------------------------------------------------ Copy from a remote cluster

# "size": 3000 fetches 3000 documents per batch.
POST _reindex
{
  "source": {
    "remote": {
      "host": "http://otherhost:9200",
      "username": "user",
      "password": "pass",
      "socket_timeout": "1m",
      "connect_timeout": "10s"
    },
    "index": "my-index-000001",
    "size": 3000,
    "query": {
      "match": {
        "test": "data"
      }
    }
  },
  "dest": {
    "index": "my-new-index-000001"
  }
}

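# Tips:
# The remote host being copied from must be explicitly allowed through the reindex.remote.whitelist setting in elasticsearch.yml (edit it on the cluster that runs the reindex, not on the remote one).
# reindex.remote.whitelist: "otherhost:9200, another:9200, 127.0.10.*:9200, localhost:*"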

Migrating data with Logstash (create the mappings first)


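# Using Logstash:
# When run, the pipeline copies every matching index from the source cluster to the destination cluster, migrating the data index by index.
# Logstash does not transfer mappings: destination indices are created on first write with dynamically inferred mappings unless you create the mappings (or a template) beforehand.
# Recommendation: test before the real run with: stdout { codec => rubydebug { metadata => true } }

# Metadata:
# Since Logstash 1.5, each event carries an @metadata field that users can read and modify. It is not written to the output, so it does not affect the resulting document.
# As descriptive metadata of the event, it stays available throughout the input, filter, and output stages ...

# docinfo
# An option of the elasticsearch input plugin; the default is false.
# The documentation says: "If set, include Elasticsearch document information such as index, type, and the id in the event."
# When enabled, the index, type, and id of each document are recorded in the event, namely under @metadata.
# These values can then be used freely anywhere in the event's lifecycle ...

# The index option of the elasticsearch input plugin accepts wildcards, so "*" matches all indices.
# Thanks to @metadata, the output can directly "inherit" the index and type from the input
# and create indices on the destination cluster with the same index, type, and even document ID as the source (the mappings still have to be handled separately!).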

input {
    elasticsearch {
        hosts => ["XX.XX.XX.XX:9212","XX.XX.XX.XX:9212","XX.XX.XX.XX:9212"]
        index => "<INDEX>"
        size => 1000
        scroll => "5m"
        docinfo => true
        user => 'username...'
        password => "pass...."
    }
}

# filter {
#     mutate {
#         remove_field => ["@version"]
#     }
# }

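output {
    elasticsearch {
        hosts => ["XX.XX.XX.XX:9212","XX.XX.XX.XX:9212","XX.XX.XX.XX:9212"]
        index => "%{[@metadata][_index]}"
        document_id => "%{[@metadata][_id]}"    # keep the source document ID (available because docinfo => true)
        action => "create"      # index the document; fails if a document with the same ID already exists in the index
        user => "elasticsearch"
        password => "elastic"
        codec => "json"
    }
}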
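
# The heading asks for the mappings to be created before the migration. One way to do that by hand is sketched below, assuming
# a 7.x or later cluster (typeless mappings). <INDEX> is the same placeholder as in the input block, and the "company" field
# with type keyword is a hypothetical example; copy the real field definitions from the first response.

```
# On the source cluster: read the current mappings.
GET <INDEX>/_mapping

# On the destination cluster: create the index with those mappings
# before the Logstash pipeline starts writing to it.
PUT <INDEX>
{
  "mappings": {
    "properties": {
      "company": { "type": "keyword" }
    }
  }
}
```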
