druid查询
druid为各种场景提供了丰富的查询类型。 查询由各种JSON属性组合而成,不同类型的查询,JSON属性不同,下面主要介绍常用的查询类型。
1. Components
-
Datasources
一个数据源等价于druid表。此外,一个查询也可以作为数据源,提供类似于子查询的功能。查询数据源目前只支持GroupBy查询
-
Table Data Source
最常用的类型
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName", "countryIsoCode" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
Union Data Source
联合数据源中的数据源必须拥有相同的schema。联合查询只能被发送给broker/router节点,不支持直接发送到历史节点
{ "type": "union", "dataSources": ["<string_value1>", "<string_value2>", "<string_value3>", ... ] }
-
-
Filters
-
Selector filter
等价于sql的
where countryIsoCode = 'US'
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "filter": { "type": "selector", "dimension": "countryIsoCode", "value": "US" }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
Column Comparison filter
等价于sql的
where countryName = cityName
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "filter": { "type": "columnComparison", "dimensions": [ "countryName", "cityName" ] }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
Regular expression filter
正则表达式,支持标准的java正则表达式,下面的查询表示countryIsoCode以U开头
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "filter": { "type": "regex", "dimension": "countryIsoCode", "pattern": "^U" }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName", "countryIsoCode" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
Logical expression filters
支持and or not,下面的等价于
where countryIsoCode = 'US' and cityName = 'New York'
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "filter": { "type": "and", "fields": [ { "type": "selector", "dimension": "countryIsoCode", "value": "US" }, { "type": "selector", "dimension": "cityName", "value": "New York" } ] }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName", "countryIsoCode" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
JavaScript filter
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "filter": { "type": "javascript", "dimension" : "countryIsoCode", "function" : "function(value) { return (value == 'US' || value == 'CN') }" }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName", "countryIsoCode" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
Search filter
用于部分字符串匹配,如下面的表示包含foo,并且对大小写不敏感
{ "filter": { "type": "search", "dimension": "product", "query": { "type": "insensitive_contains", "value": "foo" } } }
-
In filter
等价于
where countryIsoCode in ('US', 'CN')
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "filter": { "type": "in", "dimension" : "countryIsoCode", "values": ["US", "CN"] }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName", "countryIsoCode" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
Like filter
等价于
where countryIsoCode like '%U'
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "filter": { "type": "like", "dimension" : "countryIsoCode", "pattern": "%U" }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName", "countryIsoCode" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
Bound filter
等价于
"CN" < countryIsoCode < "US"
{ "queryType": "scan", "dataSource": { "type": "table", "name": "wikipedia3" }, "filter": { "type": "bound", "dimension" : "countryIsoCode", "lower": "CN", "lowerStrict": true, "upper": "US" , "ordering": "numeric", "upperStrict": true, "ordering": "lexicographic" }, "resultFormat": "list", "columns": [ "page", "countryName", "cityName", "countryIsoCode" ], "intervals": [ "2016-06-27/2016-06-28" ], "limit": 5 }
-
-
Aggregations
-
Count aggregator
select page, count(*) as num from wikipedia3 where "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-28 00:00:00' group by page order by num desc limit 5
{ "queryType": "topN", "dataSource": "wikipedia3", "dimension": "page", "threshold": 5, "metric": "num", "granularity": "all", "aggregations": [ { "type": "count", "name": "num" } ], "intervals": [ "2016-06-27/2016-06-28" ] }
-
Sum aggregators
longSum、doubleSum、floatSum
select page, sum(delta) as num from wikipedia3 where "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-28 00:00:00' group by page order by page asc limit 5
{ "queryType": "topN", "dataSource": "wikipedia3", "dimension": "page", "threshold": 5, "metric": "num", "granularity": "all", "aggregations": [ { "type": "longSum", "name": "num", "fieldName" : "delta" } ], "intervals": [ "2016-06-27/2016-06-28" ] }
-
Min / Max aggregators
doubleMin、doubleMax、floatMin、floatMax、longMin、longMax
select page, max(delta) as num from wikipedia3 where "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-28 00:00:00' group by page order by page asc limit 5
{ "queryType": "topN", "dataSource": "wikipedia3", "dimension": "page", "threshold": 5, "metric": "num", "granularity": "all", "aggregations": [ { "type": "longMax", "name": "num", "fieldName" : "delta" } ], "intervals": [ "2016-06-27/2016-06-28" ] }
-
First / Last aggregator
不能在数据摄入的时候使用,只能用于查询
Last:最大时间戳对应的数据,0 if no row exist;First最小时间戳对应的数据,0 if no row exist
-
JavaScript aggregator
-
-
Post Aggregations
对Aggregations的结果进行二次加工并输出,最终的结果既包含Aggregations的结果也包含Post Aggregations的结果
2. Timeseries
统计一段时间内的汇总数据
SELECT count(*) as num, sum(added) FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-27 23:59:59'
{ "queryType": "timeseries", "dataSource": "wikipedia3", "granularity": "all", "aggregations": [ { "type": "count", "name": "count" }, { "type": "longSum", "name": "added", "fieldName": "added" } ], "intervals": [ "2016-06-27/2016-06-28" ] }
3. TopN
返回前N条数据,并可以按照metric排序,可以支持维度,但只有一个
SELECT page, sum(added) as num FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-27 23:59:59' group by page order by num desc limit 5
{ "queryType": "topN", "dataSource": "wikipedia3", "dimension": "page", "threshold": 5, "metric": "added", "granularity": "all", "aggregations": [ { "type": "doubleSum", "name": "added", "fieldName": "added" } ], "intervals": [ "2016-06-27/2016-06-28" ] }
4. GroupBy
能对指定的多个维度分组,也支持对指定的维度排序,也支持limit,但是性能比TopN和Timeseries要差很多
SELECT page, countryName, sum(added) as num, sum(delta) as num2 FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-27 23:59:59' group by page,countryName order by num desc limit 5
{ "queryType": "groupBy", "dataSource": "wikipedia3", "granularity": "all", "dimensions": [ "page", "countryName" ], "limitSpec": { "type": "default", "limit": 5, "columns": [ { "dimension": "added", "direction": "descending", "dimensionOrder": { "type": "numeric" } } ] }, "aggregations": [ { "type": "longSum", "name": "added", "fieldName": "added" }, { "type": "longSum", "name": "delta", "fieldName": "delta" } ], "intervals": [ "2016-06-27/2016-06-28" ] }
5. Search
类似于like操作,可以查询多个维度列,不支持聚合
SELECT page, countryName FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-27 23:59:59' and page like '%C' or countryName like '%C' limit 5
{ "queryType": "search", "dataSource": "wikipedia3", "granularity": "all", "dimensions": [ "page", "countryName" ], "query": { "type": "insensitive_contains", "value": "C" }, "sort" : { "type": "lexicographic" }, "limit": 5, "intervals": [ "2016-06-27/2016-06-28" ] }
6. Select
查数据,不支持聚合,但支持分页,排序
SELECT * FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-27 23:59:59' limit 0,5
{ "queryType": "select", "dataSource": "wikipedia3", "granularity": "all", "dimensions":[], "metrics":[], "pagingSpec":{"pagingIdentifiers": {}, "threshold":5}, "intervals": [ "2016-06-27/2016-06-28" ] }
7. Scan
类似于Select,但不支持分页,但是如果没有分页需求,推荐使用这个,性能比Select好
SELECT page,countryName FROM wikipedia WHERE "__time" BETWEEN TIMESTAMP '2016-06-27 00:00:00' AND TIMESTAMP '2016-06-27 23:59:59' limit 5
{ "queryType": "scan", "dataSource": "wikipedia3", "resultFormat": "list", "columns":["page","countryName"], "intervals": [ "2016-06-27/2016-06-28" ], "batchSize":20480, "limit":5 }
上一篇: hadoop 提交程序并监控运行
下一篇: 自作孽不可活啊!