Filters in HBase (or intra row scanning part II)
程序员文章站
2022-04-08 15:34:52
...
Filters in HBase are a somewhat obscure and under-documented feature. (Even us committers are often not aware of their usefulness - see HBASE-5229, and HBASE-4256... Or maybe it's just me...). Intras row scanning can be done using ColumnRa
Filters in HBase are a somewhat obscure and under-documented feature. (Even us committers are often not aware of their usefulness - see HBASE-5229, and HBASE-4256... Or maybe it's just me...).Intras row scanning can be done using ColumnRangeFilter. Other filters such as ColumnPrefixFilter or MultipleColumnPrefixFilter might also be handy for this. All three filters have in common that they can provide scanners (see scanning in hbase) with what I will call "seek hints". These hints allow a scanner to seek to the next column, the next row, or an arbitrary next cell determined by the filter. This is far more efficient than having a dumb filter that is passed each cell and determines whether the cell is included in the result or not.
Many other filters also provide these "seek hints". The exception here are filters that filter on column values, as there is no inherent ordering between column values; these filters need to look at the value for each column.
For example check out this code in MultipleColumnPrefixFilter (ASF 2.0 license):
TreeSet
(TreeSet
if (lesserOrEqualPrefixes.size() != 0) {
byte [] largestPrefixSmallerThanQualifier = lesserOrEqualPrefixes.last();
if (Bytes.startsWith(qualifier, largestPrefixSmallerThanQualifier)) {
return ReturnCode.INCLUDE;
}
if (lesserOrEqualPrefixes.size() == sortedPrefixes.size()) {
return ReturnCode.NEXT_ROW;
} else {
hint = sortedPrefixes.higher(largestPrefixSmallerThanQualifier);
return ReturnCode.SEEK_NEXT_USING_HINT;
}
} else {
hint = sortedPrefixes.first();
return ReturnCode.SEEK_NEXT_USING_HINT;
}
(the
See how this code snippet allows the filter to
- seek to the next row if all prefixes are know to be less or equal the current qualifier (and the largest didn't match the passed column qualifier). Note that a single seek to the next row can potentially skip millions of columns with a single seek operation.
- seek to the next larger prefix if there are more prefixes, but the current does not match the qualifier.
- seek to the first prefix (the smallest) if none the prefixes are less or equal to the current qualifier.
I'm in the process of adding more information for these Filter to the HBase
原文地址:Filters in HBase (or intra row scanning part II), 感谢原作者分享。