[hadoop2.7.1] I/O: Programming Examples for the Latest SequenceFile API (Write and Read)
Write operation
As covered in the previous post, since Hadoop 2.x the many createWriter() overloads of SequenceFile.Writer have been deprecated in favor of a single consolidated createWriter() method: apart from the Configuration, every parameter is now passed as a SequenceFile.Writer.Option. The option classes provided by the new API are:
FileOption
FileSystemOption
StreamOption
BufferSizeOption
BlockSizeOption
ReplicationOption
KeyClassOption
ValueClassOption
MetadataOption
ProgressableOption
CompressionOption
These options cover a wide range of needs, and because each Option argument carries its own meaning, they can be passed in any order. This cuts down on boilerplate and makes the calls easier to read. Let's look at the method itself first (a short sketch right after the Javadoc illustrates the order independence); a full example follows.
createWriter

public static org.apache.hadoop.io.SequenceFile.Writer createWriter(
        Configuration conf,
        org.apache.hadoop.io.SequenceFile.Writer.Option... opts)
    throws IOException

Create a new Writer with the given options.

Parameters:
    conf - the configuration to use
    opts - the options to create the file with
Returns:
    a new Writer
Throws:
    IOException
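To see that the order of the opts varargs really is irrelevant, here is a minimal sketch (the class name and output path are hypothetical, not from the original post):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;

public class WriterOptionOrderSketch { // hypothetical demo class

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("file:///tmp/option-order.seq"); // hypothetical path

    // Each Option identifies itself, so file/keyClass/valueClass
    // may appear in any order in the varargs list.
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        Writer.valueClass(Text.class),       // value class first...
        Writer.file(path),                   // ...the file in the middle...
        Writer.keyClass(IntWritable.class)); // ...and the key class last
    try {
      writer.append(new IntWritable(1), new Text("option order does not matter"));
    } finally {
      writer.close();
    }
  }
}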
The fourth edition of Hadoop: The Definitive Guide provides a SequenceFileWriteDemo example:
// cc SequenceFileWriteDemo Writing a SequenceFile
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// vv SequenceFileWriteDemo
public class SequenceFileWriteDemo {

  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };

  public static void main(String[] args) throws IOException {
    String uri = args[0];
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(URI.create(uri), conf);
    Path path = new Path(uri);

    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;
    try {
      writer = SequenceFile.createWriter(fs, conf, path,
          key.getClass(), value.getClass());
      for (int i = 0; i < 100; i++) {
        key.set(100 - i);
        value.set(DATA[i % DATA.length]);
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
// ^^ SequenceFileWriteDemo
Rewriting the createWriter() call from the example above with the new consolidated method gives the following code:
// Note: this class sits in the org.apache.hadoop.io package because the
// concrete option classes (FileOption, KeyClassOption, ValueClassOption)
// are package-private; outside this package, declare the variables as
// SequenceFile.Writer.Option and drop the casts.
package org.apache.hadoop.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType; // was missing in the original listing
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.SequenceFile.Writer.FileOption;
import org.apache.hadoop.io.SequenceFile.Writer.KeyClassOption;
import org.apache.hadoop.io.SequenceFile.Writer.ValueClassOption;
import org.apache.hadoop.io.Text;

public class THT_testSequenceFile2 {

  private static final String[] DATA = {
    "One, two, buckle my shoe",
    "Three, four, shut the door",
    "Five, six, pick up sticks",
    "Seven, eight, lay them straight",
    "Nine, ten, a big fat hen"
  };

  public static void main(String[] args) throws IOException {
    // String uri = args[0];
    String uri = "file:///D://B.txt";
    Configuration conf = new Configuration();
    Path path = new Path(uri);
    IntWritable key = new IntWritable();
    Text value = new Text();
    SequenceFile.Writer writer = null;

    SequenceFile.Writer.FileOption option1 = (FileOption) Writer.file(path);
    SequenceFile.Writer.KeyClassOption option2 =
        (KeyClassOption) Writer.keyClass(key.getClass());
    SequenceFile.Writer.ValueClassOption option3 =
        (ValueClassOption) Writer.valueClass(value.getClass());

    try {
      writer = SequenceFile.createWriter(conf, option1, option2, option3,
          Writer.compression(CompressionType.RECORD)); // record-level compression
      for (int i = 0; i < 10; i++) {
        key.set(1 + i);
        value.set(DATA[i % DATA.length]);
        System.out.printf("[%s]\t%s\t%s\n", writer.getLength(), key, value);
        writer.append(key, value);
      }
    } finally {
      IOUtils.closeStream(writer);
    }
  }
}
The output is as follows:
2015-11-06 22:15:05,027 INFO compress.CodecPool (CodecPool.java:getCompressor(153)) - Got brand-new compressor [.deflate]
[128]	1	One, two, buckle my shoe
[173]	2	Three, four, shut the door
[220]	3	Five, six, pick up sticks
[264]	4	Seven, eight, lay them straight
[314]	5	Nine, ten, a big fat hen
[359]	6	One, two, buckle my shoe
[404]	7	Three, four, shut the door
[451]	8	Five, six, pick up sticks
[495]	9	Seven, eight, lay them straight
[545]	10	Nine, ten, a big fat hen
This run produces the sequence file D:/B.txt.
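The rewrite above passes Writer.compression(CompressionType.RECORD). The compression option also has a two-argument form that takes a codec instance. A minimal sketch, assuming block compression with Hadoop's built-in DefaultCodec (the class name and path are hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.CompressionType;
import org.apache.hadoop.io.SequenceFile.Writer;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.DefaultCodec;
import org.apache.hadoop.util.ReflectionUtils;

public class BlockCompressionSketch { // hypothetical demo class

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("file:///tmp/block-compressed.seq"); // hypothetical path

    // Instantiate the codec through ReflectionUtils so it gets a
    // Configuration (codecs are Configurable).
    CompressionCodec codec = ReflectionUtils.newInstance(DefaultCodec.class, conf);

    // compression(CompressionType, CompressionCodec) selects both the
    // granularity (BLOCK compresses batches of records) and the codec.
    SequenceFile.Writer writer = SequenceFile.createWriter(conf,
        Writer.file(path),
        Writer.keyClass(IntWritable.class),
        Writer.valueClass(Text.class),
        Writer.compression(CompressionType.BLOCK, codec));
    try {
      writer.append(new IntWritable(1), new Text("block-compressed record"));
    } finally {
      writer.close();
    }
  }
}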
Read operation
Option parameters provided by the new API:
FileOption - which file to read
InputStreamOption
StartOption
LengthOption - read only the specified number of bytes (combined with StartOption in the sketch after this list)
BufferSizeOption
OnlyHeaderOption
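StartOption and LengthOption can be combined to read just a byte window of the file. A minimal sketch, using offsets taken from the write output above (128 is the first record's position; the window length of 100 is an arbitrary illustration):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class ReaderWindowSketch { // hypothetical demo class

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("file:///D://B.txt"); // the file written above

    // Read only the byte window starting at 128 and spanning 100 bytes;
    // records whose start position falls inside the window are returned.
    SequenceFile.Reader reader = new SequenceFile.Reader(conf,
        Reader.file(path),
        Reader.start(128),   // a record boundary, per the write output
        Reader.length(100)); // arbitrary window length
    try {
      Writable key = (Writable) ReflectionUtils.newInstance(reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(reader.getValueClass(), conf);
      while (reader.next(key, value)) {
        System.out.println(key + "\t" + value);
      }
    } finally {
      reader.close();
    }
  }
}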
Here is the full read example using the latest API:
package org.apache.hadoop.io;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IOUtils;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.util.ReflectionUtils;

public class THT_testSequenceFile3 {

  public static void main(String[] args) throws IOException {
    // String uri = args[0];
    String uri = "file:///D://B.txt";
    Configuration conf = new Configuration();
    Path path = new Path(uri);

    SequenceFile.Reader.Option option1 = Reader.file(path);
    SequenceFile.Reader.Option option2 = Reader.length(174); // cap the read at 174 bytes

    SequenceFile.Reader reader = null;
    try {
      reader = new SequenceFile.Reader(conf, option1, option2);
      Writable key = (Writable) ReflectionUtils.newInstance(
          reader.getKeyClass(), conf);
      Writable value = (Writable) ReflectionUtils.newInstance(
          reader.getValueClass(), conf);
      long position = reader.getPosition();
      while (reader.next(key, value)) {
        String syncSeen = reader.syncSeen() ? "*" : "";
        System.out.printf("[%s%s]\t%s\t%s\n", position, syncSeen, key, value);
        position = reader.getPosition(); // beginning of next record
      }
    } finally {
      IOUtils.closeStream(reader);
    }
  }
}
Because the read length is capped at 174 bytes, only records that start before byte 174 are returned: the record at position 173 is still read, while the next record (which starts at byte 220, per the write output) is past the limit. The output is as follows:
2015-11-06 22:53:00,602 INFO compress.CodecPool (CodecPool.java:getDecompressor(181)) - Got brand-new decompressor [.deflate]
[128]	1	One, two, buckle my shoe
[173]	2	Three, four, shut the door
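Beyond the constructor options, an open SequenceFile.Reader also supports repositioning: seek(long) jumps to an exact record boundary, while sync(long) advances to the next sync point after the given position and therefore accepts arbitrary offsets. A short sketch of the difference, reusing record offsets from the demo file above (the class name is hypothetical):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.SequenceFile.Reader;
import org.apache.hadoop.io.Text;

public class ReaderSeekSketch { // hypothetical demo class

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Path path = new Path("file:///D://B.txt"); // the file written above
    SequenceFile.Reader reader = new SequenceFile.Reader(conf, Reader.file(path));
    try {
      IntWritable key = new IntWritable();
      Text value = new Text();

      // seek() must land exactly on a record boundary ([173] in the
      // write output above); reading after an arbitrary seek fails.
      reader.seek(173);
      reader.next(key, value); // reads key 2

      // sync() is safe for any offset: it moves to the next sync point
      // after the position (for a position before the header end, straight
      // to the first record).
      reader.sync(0);
      reader.next(key, value); // reads key 1
    } finally {
      reader.close();
    }
  }
}

This is why MapReduce can split SequenceFiles: a reader can sync() to the first record boundary at or after any split offset.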