HBase Filters: RowFilter
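HBase provides a set of filters for selecting data. Filters can operate on several dimensions of the data (row, column, and data version), so a filter can narrow the result down to a single cell, which is located by row key, column name, and timestamp. In practice, filtering by row key or by value is the most common use case.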
1. Create the test table student1
vi Student1.java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class Student1 {
    public static void main(String[] args) {
        // HBaseConfiguration.create() replaces the deprecated constructor.
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "h201,h202,h203");
        String tableName = "student1";
        try {
            // Drop the table if it already exists so the example starts clean.
            HBaseAdmin admin = new HBaseAdmin(config);
            if (admin.tableExists(tableName)) {
                admin.disableTable(tableName);
                admin.deleteTable(tableName);
            }

            // Create the table with a single column family, cf1.
            HTableDescriptor tableDesc = new HTableDescriptor(tableName);
            tableDesc.addFamily(new HColumnDescriptor("cf1"));
            admin.createTable(tableDesc);
            admin.close();

            // Insert three rows with row keys a101, a102, and a103.
            HTable table = new HTable(config, Bytes.toBytes(tableName));
            Put put1 = new Put(Bytes.toBytes("a101"));
            put1.add(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("zs1"));
            Put put2 = new Put(Bytes.toBytes("a102"));
            put2.add(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("ls1"));
            Put put3 = new Put(Bytes.toBytes("a103"));
            put3.add(Bytes.toBytes("cf1"), Bytes.toBytes("name"), Bytes.toBytes("ww1"));
            table.put(put1);
            table.put(put2);
            table.put(put3);
            table.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
- Using filters
1.1
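RowFilter: selects all rows whose row key matches. Its use is straightforward: with a BinaryComparator and the operator CompareFilter.CompareOp.EQUAL it selects the row with one specific row key, and by changing the comparison operator it can select multiple rows that satisfy a condition.
RowFilter filters on the row key.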
Operator | Description
LESS | less than
LESS_OR_EQUAL | less than or equal to
EQUAL | equal to
NOT_EQUAL | not equal to
GREATER_OR_EQUAL | greater than or equal to
GREATER | greater than
NO_OP | excludes everything

Comparator | Description
BinaryComparator | compares using Bytes.compareTo()
BinaryPrefixComparator | like BinaryComparator, but compares only the leading bytes (the prefix)
RegexStringComparator | matches against a regular expression
SubstringComparator | treats the data as a string and tests it with contains()
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.filter.BinaryComparator;
import org.apache.hadoop.hbase.filter.CompareFilter;
import org.apache.hadoop.hbase.filter.Filter;
import org.apache.hadoop.hbase.filter.RowFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class Hss1 {
    public static void main(String[] args) {
        Configuration config = HBaseConfiguration.create();
        config.set("hbase.zookeeper.quorum", "h201,h202,h203");
        try {
            HTable table = new HTable(config, Bytes.toBytes("student1"));
            Scan scan = new Scan();
            // Keep only the row whose key is exactly "a101".
            Filter filter1 = new RowFilter(CompareFilter.CompareOp.EQUAL,
                    new BinaryComparator(Bytes.toBytes("a101")));
            scan.setFilter(filter1);
            ResultScanner rst = table.getScanner(scan);
            for (Result r : rst) {
                // Print each cell as row:family,qualifier,value.
                for (KeyValue kv : r.raw()) {
                    StringBuilder s1 = new StringBuilder()
                            .append(Bytes.toString(kv.getRow())).append(":")
                            .append(Bytes.toString(kv.getFamily())).append(",")
                            .append(Bytes.toString(kv.getQualifier())).append(",")
                            .append(Bytes.toString(kv.getValue()));
                    System.out.println(s1.toString());
                }
            }
            rst.close();
            table.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
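To make the operator and comparator semantics more concrete, here is a minimal sketch in plain Java that needs no HBase cluster. It is not HBase code: the class `RowFilterSketch`, the enum `Op`, and the methods `compareBytes`, `matches`, and `select` are hypothetical names introduced only for illustration. It assumes a row is kept when "row key <operator> target" holds under a lexicographic, unsigned byte-by-byte comparison, which is how a binary comparison of row keys is generally understood to behave.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RowFilterSketch {
    // Hypothetical stand-in for the comparison operators (NO_OP omitted).
    public enum Op { LESS, LESS_OR_EQUAL, EQUAL, NOT_EQUAL, GREATER_OR_EQUAL, GREATER }

    // Lexicographic comparison of two byte arrays, treating each byte as unsigned.
    public static int compareBytes(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int x = a[i] & 0xff;
            int y = b[i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // All shared bytes are equal: the shorter array sorts first.
        return a.length - b.length;
    }

    // True when "rowKey <op> target" holds.
    public static boolean matches(Op op, String rowKey, String target) {
        int c = compareBytes(rowKey.getBytes(StandardCharsets.UTF_8),
                             target.getBytes(StandardCharsets.UTF_8));
        switch (op) {
            case LESS:             return c < 0;
            case LESS_OR_EQUAL:    return c <= 0;
            case EQUAL:            return c == 0;
            case NOT_EQUAL:        return c != 0;
            case GREATER_OR_EQUAL: return c >= 0;
            case GREATER:          return c > 0;
            default:               return false;
        }
    }

    // Returns the row keys that satisfy the operator against the target.
    public static List<String> select(Op op, String target, List<String> rowKeys) {
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (matches(op, key, target)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The three row keys inserted into student1.
        List<String> keys = Arrays.asList("a101", "a102", "a103");
        System.out.println(select(Op.EQUAL, "a101", keys));            // [a101]
        System.out.println(select(Op.GREATER_OR_EQUAL, "a102", keys)); // [a102, a103]
    }
}
```

With EQUAL the sketch returns a single key, matching what the scan above prints for a101; switching the operator widens the selection to several rows.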
1.2
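PrefixFilter: selects rows whose row key starts with a specific prefix. The same result can also be achieved with RowFilter combined with RegexStringComparator, but PrefixFilter offers a simpler way to do it.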
import org.apache.hadoop.hbase.filter.PrefixFilter;
Filter filter2 = new PrefixFilter(Bytes.toBytes("a"));
scan.setFilter(filter2);
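As a rough illustration of what a prefix match on row keys means, the following plain-Java sketch checks the leading bytes of each key. It does not call HBase; `PrefixMatchSketch`, `hasPrefix`, and `withPrefix` are hypothetical names, and the extra key `b201` is made up for the example and does not exist in student1.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class PrefixMatchSketch {
    // True when rowKey begins with every byte of prefix.
    public static boolean hasPrefix(byte[] rowKey, byte[] prefix) {
        if (rowKey.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (rowKey[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Returns the row keys that start with the given prefix.
    public static List<String> withPrefix(String prefix, List<String> rowKeys) {
        byte[] p = prefix.getBytes(StandardCharsets.UTF_8);
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (hasPrefix(key.getBytes(StandardCharsets.UTF_8), p)) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // a101..a103 come from student1; b201 is a made-up key for contrast.
        List<String> keys = Arrays.asList("a101", "a102", "a103", "b201");
        System.out.println(withPrefix("a", keys)); // [a101, a102, a103]
    }
}
```

On the student1 table, the prefix "a" therefore selects all three rows, since every row key begins with that byte.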
1.3
RegexStringComparator: regular-expression filtering
import org.apache.hadoop.hbase.filter.RegexStringComparator;
Filter filter3 = new RowFilter(CompareFilter.CompareOp.EQUAL, new RegexStringComparator("^a.*"));
scan.setFilter(filter3);
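To see which row keys a pattern such as "^a.*" would select, the sketch below applies java.util.regex to a list of keys in plain Java. It is only an approximation of the filter: it assumes the comparator performs an unanchored, `find()`-style match on the row key decoded as a string, which is worth verifying against the HBase version in use. `RegexRowKeySketch` and `matching` are hypothetical names, and `b201` is a made-up key that is not in student1.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class RegexRowKeySketch {
    // Returns the row keys in which the pattern is found (unanchored match).
    public static List<String> matching(String regex, List<String> rowKeys) {
        Pattern pattern = Pattern.compile(regex);
        List<String> kept = new ArrayList<>();
        for (String key : rowKeys) {
            if (pattern.matcher(key).find()) {
                kept.add(key);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // a101..a103 come from student1; b201 is a made-up key for contrast.
        List<String> keys = Arrays.asList("a101", "a102", "a103", "b201");
        System.out.println(matching("^a.*", keys));    // [a101, a102, a103]
        System.out.println(matching("10[23]$", keys)); // [a102, a103]
    }
}
```

The first pattern behaves like the prefix "a"; the second shows something a prefix cannot express, namely a condition on the end of the row key.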