Fixing Elasticsearch's "Result window is too large" error from Java code
程序员文章站
2023-11-06 14:47:04
A paginated query against Elasticsearch failed with this error:
QueryPhaseExecutionException[Result window is too large, from + size must be less than or equal to: [10000] but was [666000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]; }
The message says that from + size pagination is limited to the first 10,000 hits, and that going further requires raising the index.max_result_window setting.
I looked through the official Elasticsearch documentation:
index.max_result_window: The maximum value of from + size for searches to this index. Defaults to 10000.
Search requests take heap memory and time proportional to from + size and this limits that memory.
See Scroll or Search After for a more efficient alternative to raising this.
In other words, the traditional approach (from + size) costs heap memory and time that grow with the offset, so a limit was imposed.
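To make the limit concrete, here is a minimal sketch of the arithmetic using only the JDK. The class and method names are illustrative and not part of Elasticsearch, and the 1-based page numbering is an assumption that matches the search method shown later in this post.

```java
// Sketch: how the from + size window grows with the page number.
final class ResultWindow {
    // Default value of index.max_result_window, per the documentation quoted above.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // Window needed to serve a 1-based page: from = (page - 1) * size, plus size hits.
    static long requiredWindow(int page, int size) {
        long from = (long) Math.max(page - 1, 0) * size;
        return from + size;
    }

    // True when the request would be rejected under the default setting.
    static boolean exceedsDefault(int page, int size) {
        return requiredWindow(page, size) > DEFAULT_MAX_RESULT_WINDOW;
    }
}
```

With 20 hits per page, page 500 needs a window of exactly 10000 and is still allowed, while page 501 needs 10020 and is rejected. Page 33300 with 20 hits per page is a made-up combination that happens to reproduce the 666000 in the error message.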
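The problem is that scroll simply does not work for pagination in a back-office UI.
Leaving aside that scroll only lets you move forward one page at a time, which is unfriendly to users, the scroll IDs also become hard to manage once there are many pages.
So I kept tinkering with traditional pagination.
I searched online for ways to set max_result_window, and every one of them made the change with curl or a raw HTTP request.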
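For reference, the HTTP approach boils down to a PUT against the index's _settings endpoint. The sketch below builds that request with the JDK 11+ java.net.http client instead of curl. The base URL, index name, and window value are placeholders rather than values from this post, and the index name is assumed not to need URL encoding.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build (but do not send) the settings update as a plain HTTP request.
final class MaxResultWindowRequest {
    // JSON body that sets the index-level max_result_window.
    static String body(int maxResultWindow) {
        return "{\"index\":{\"max_result_window\":" + maxResultWindow + "}}";
    }

    // baseUrl (for example "http://localhost:9200") and index are placeholders.
    static HttpRequest build(String baseUrl, String index, int maxResultWindow) {
        return HttpRequest.newBuilder()
                .uri(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(body(maxResultWindow)))
                .build();
    }
}
```

The request can then be sent with HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()). It is only built here so that the example does not depend on a running cluster.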
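Later I happened across this post: https://blog.csdn.net/tzconn/article/details/83309516
I also remembered from browsing the Elastic Chinese community that this parameter is an index-level setting. So I gave it a quick try, and it actually worked.
The Java code is as follows: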
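import java.util.List;
import org.elasticsearch.action.admin.indices.settings.put.UpdateSettingsResponse;
import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.action.search.SearchType;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilder;
import org.elasticsearch.search.sort.SortOrder;

// getClient() is the author's own helper and is not shown here.
public SearchResponse search(String logIndex, String logType, QueryBuilder query,
        List<AggregationBuilder> agg, int page, int size) {
    page = page > 0 ? page - 1 : page; // 1-based page number to 0-based
    TransportClient client = getClient();
    SearchRequestBuilder searchRequestBuilder = client.prepareSearch(logIndex.split(","))
            .setTypes(logType.split(","))
            .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
            .addSort("createtime", SortOrder.DESC);
    if (agg != null && !agg.isEmpty()) {
        for (int i = 0; i < agg.size(); i++) {
            searchRequestBuilder.addAggregation(agg.get(i));
        }
    }
    updateIndexs(client, logIndex, page, size);
    SearchResponse searchResponse = searchRequestBuilder
            .setQuery(query)
            .setFrom(page * size)
            .setSize(size)
            .get();
    return searchResponse;
}

// Update the index's max_result_window setting; "from" is the 0-based page index.
private boolean updateIndexs(TransportClient client, String indices, int from, int size) {
    int records = from * size + size;
    if (records <= 10000) return true;
    UpdateSettingsResponse indexResponse = client.admin().indices()
            // split like prepareSearch does, so comma-separated index names work
            .prepareUpdateSettings(indices.split(","))
            .setSettings(Settings.builder()
                    .put("index.max_result_window", records)
                    .build()
            ).get();
    return indexResponse.isAcknowledged();
}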
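Done.
Of course, this code has a drawback:
every query that reaches past 10,000 records triggers another update of the index settings.
For from + size queries, which are already on the slow side, that only makes things worse.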
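One possible way to soften this is to remember the largest window already applied to each index and only issue an update when a request needs more. This is a sketch under several assumptions, not the original author's code. WindowCache and its updater hook are hypothetical names. The cache assumes that no other process changes the setting and that every index starts at the default of 10000. Here "from" is the actual offset (page * size), not the page index.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.ObjIntConsumer;

// Sketch: issue the settings update only when a request needs a larger
// window than the one this process has already applied to the index.
final class WindowCache {
    private static final int DEFAULT_WINDOW = 10_000;
    private final ConcurrentHashMap<String, Integer> applied = new ConcurrentHashMap<>();
    // Hypothetical hook; in practice it would wrap the real settings update call.
    private final ObjIntConsumer<String> updater;

    WindowCache(ObjIntConsumer<String> updater) {
        this.updater = updater;
    }

    // Returns true if a settings update was issued for this request.
    boolean ensure(String index, int from, int size) {
        int needed = from + size;
        boolean[] updated = {false};
        // compute() runs atomically per key, so concurrent callers do not
        // issue duplicate updates for the same index.
        applied.compute(index, (name, current) -> {
            int known = current == null ? DEFAULT_WINDOW : current;
            if (needed <= known) {
                return known;
            }
            updater.accept(name, needed);
            updated[0] = true;
            return needed;
        });
        return updated[0];
    }
}
```

With this in place, a deep page pays the update cost once, and later requests at the same or a shallower depth skip it. The trade-off is that the update runs while the map entry is locked, so other requests for the same index wait until it finishes.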