Hadoop: Notes on Problems When Connecting to Hadoop from a Local Eclipse to Work with a MySQL Database

程序员文章站 (Programmer Article Site), 2024-03-23 08:27:46

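A while ago, in my previous post, I described some problems I ran into while building a fully distributed Hadoop environment in my spare time, along with some problems I hit when operating HDFS from Eclipse once the cluster was up. Recently I wanted to go a step further and learn how to work with a MySQL database from Hadoop.

Opinions on this are divided online. Many people may scoff: using something as "heavyweight" as Hadoop to operate a database, are you out of your mind? Hadoop's strength, through HDFS, lies in handling a distributed file system. There is nothing wrong with that view: database operations are supposed to be "safe, lightweight, and fast", and doing them through Hadoop goes against common sense. So why learn this at all?

Taking a step back: when Access databases still held a certain market share, many people chose to use files as their data store. Inserts, deletes, queries, and updates were all file operations, and a single file might be a whole database or a single table. So for operations with low real-time requirements and small data volumes, using Hadoop to work with a database is not out of the question. Enough of that; everyone has their own view, and of course it also depends on what your company requires. Below is one rather annoying problem I ran into, and once again it is a ClassNotFound problem: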

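First of all, be clear about your runtime environment: you need to know whether your code is actually running on the server or locally, and only then follow the matching code to reproduce the setup.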
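One way to make the "server or local" question concrete is to look at the job tracker setting. The sketch below is a plain-Java illustration, not Hadoop code. It assumes the Hadoop 1.x convention that `mapred.job.tracker` defaults to `local` (the job runs inside the submitting JVM) and that any `host:port` value means the job is submitted to a cluster. The class `RunModeCheck` and its method are hypothetical names, and a `Map` stands in for the real `JobConf`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: a Map stands in for Hadoop's JobConf.
class RunModeCheck {

    static final String JOB_TRACKER_KEY = "mapred.job.tracker";

    // Assumption: an unset key or the value "local" means the job runs
    // inside the submitting JVM; anything else names a remote JobTracker.
    public static boolean runsLocally(Map<String, String> settings) {
        String tracker = settings.getOrDefault(JOB_TRACKER_KEY, "local");
        return "local".equals(tracker);
    }

    public static void main(String[] args) {
        Map<String, String> settings = new HashMap<>();
        System.out.println("default settings local? " + runsLocally(settings));

        // Address taken from the listings in this post.
        settings.put(JOB_TRACKER_KEY, "192.168.56.10:9001");
        System.out.println("with job tracker local? " + runsLocally(settings));
    }
}
```

With this distinction in mind, a class that is visible in Eclipse is only guaranteed to be visible in the first case; in the second case it also has to reach the cluster nodes.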

Here are three kinds of ClassNotFound problems that show up when running locally.

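(1) The MySQL driver cannot be found. This is easy to fix: add the official MySQL driver jar to your project (marked with a red box in the original screenshot).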
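A quick way to confirm that the driver jar really is on the project's classpath is to try loading the driver class by name. This is a minimal sketch using only the JDK; `ClasspathCheck` is a hypothetical helper, and the driver class name is the one used in the listings in this post.

```java
// Hypothetical helper for checking that a class is visible to this JVM.
class ClasspathCheck {

    // Returns true if the named class can be located by the current class
    // loader. The class is not initialized, so no static blocks run.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Driver class name taken from the listings in this post.
        String driver = "com.mysql.jdbc.Driver";
        System.out.println(driver + " on classpath? " + isOnClasspath(driver));
    }
}
```

Note that this only checks the JVM it runs in. It says nothing about whether the cluster nodes can see the jar, which is the subject of the next case.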

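(2) Handling the JDBC jar

Although the program is compiled and run from Eclipse, it is ultimately submitted to the Hadoop cluster, so the JDBC jar must be available on the cluster. There are two ways to do this:

<1> Add the jar under ${HADOOP_HOME}/lib on every node and restart the cluster. This is the more primitive approach.

Our Hadoop installation is in "/usr/hadoop", so the jar goes into "/usr/hadoop/lib", followed by a restart. Remember to do this on every node in the Hadoop cluster, because a distributed program executes on each node's local machine.

<2> Create a "/lib" directory in the cluster's distributed file system, upload the JDBC jar there, and then add the DistributedCache.addFileToClassPath call to the main program (it appears in main in the listings below). This ensures that every node in the Hadoop cluster can use the jar. Keep in mind that the jar now lives on HDFS, not on the local file system.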

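(3) The entity class mapped to the database table cannot be found (the main problem this post solves): StudentRecord.class not found...

The source code that triggers this problem: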

package cn.hadoop.db;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

import cn.hadoop.db.DBAccessReader.Student.DBInputMapper;

public class DBAccessReader {
    
    public static class Student implements Writable, DBWritable{
        public int id;
        public  String name;
        public  String sex;
        public  int age;
        
        public Student() {
            
        }
        @Override
        public void write(PreparedStatement statement) throws SQLException {
            statement.setInt(1, this.id);
            statement.setString(2, this.name);
            statement.setString(3, this.sex);
            statement.setInt(4, this.age);
        }

        @Override
        public void readFields(ResultSet resultSet) throws SQLException {
            this.id = resultSet.getInt(1);
            this.name = resultSet.getString(2);
            this.sex = resultSet.getString(3);
            this.age = resultSet.getInt(4);
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(this.id);
            Text.writeString(out, this.name);
            Text.writeString(out, this.sex);
            out.writeInt(this.age);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.id = in.readInt();
            this.name = Text.readString(in);
            this.sex = Text.readString(in);
            this.age = in.readInt();
        }

        @Override
        public String toString() {
            return new String("Student [id=" + id + ", name=" + name + ", sex=" + sex
                    + ", age=" + age + "]");
        }
        
        public static class DBInputMapper extends MapReduceBase implements Mapper<LongWritable, cn.hadoop.db.DBAccessReader.Student, LongWritable, Text>{

            @Override
            public void map(LongWritable key, cn.hadoop.db.DBAccessReader.Student value,
                    OutputCollector<LongWritable, Text> collector,
                    Reporter reporter) throws IOException {
                collector.collect(new LongWritable(value.id), new Text(value.toString()));
                
            }
            
        }
        
        
        
    }
    public static void main(String[] args) throws IOException {

        JobConf conf = new JobConf(DBAccessReader.class);
        conf.set("mapred.job.tracker", "192.168.56.10:9001");

        FileSystem fileSystem = FileSystem.get(
                URI.create("hdfs://192.168.56.10:9000/"), conf);

        DistributedCache
                .addFileToClassPath(
                        new Path(
                                "hdfs://192.168.56.10:9000/lib/mysql-connector-java-5.1.18-bin.jar"),
                        conf, fileSystem);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setInputFormat(DBInputFormat.class);

        FileOutputFormat.setOutputPath(conf, new Path(
                "hdfs://192.168.56.10:9000/user/studentInfo"));

        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://192.168.56.109:3306/school", "root", "1qaz2wsx");

        String[] fields = { "id", "name", "sex", "age" };

        DBInputFormat.setInput(conf, cn.hadoop.db.DBAccessReader.Student.class,
                "student", null, "id", fields);

        conf.setMapperClass(DBInputMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        JobClient.runJob(conf);
    }
}

Running it produces the following error:

[Screenshot: error output]

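The error is clear: the entity class Student cannot be found. But I read the code many times and the class is plainly there, so why is it reported as missing? I was stuck on this for a long time and nothing I tried worked. In the end I went back to the log output, and it was obvious: the server side was looking for the jar of the DBAccessReader job, which we had never uploaded, so of course it could not be found. The cause is right here, at the top of the main method: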

JobConf conf = new JobConf(DBAccessReader.class);
conf.set("mapred.job.tracker", "192.168.56.10:9001");
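To see why no jar reaches the server, it helps to check where the job class is loaded from. The sketch below is a simplified, plain-Java approximation of a "find the jar containing this class" lookup; it is not Hadoop's actual code, and `JobJarLocator` is a hypothetical name. A class compiled by Eclipse is typically loaded from an output directory such as `bin/`, so its URL uses the `file` protocol rather than `jar`, and there is no jar to ship to the cluster.

```java
import java.net.URL;

// Hypothetical helper that reports where a class file was loaded from.
class JobJarLocator {

    // URL of the .class file for cls, or null if it cannot be located.
    public static URL classFileUrl(Class<?> cls) {
        String resource = cls.getName().replace('.', '/') + ".class";
        ClassLoader loader = cls.getClassLoader();
        // Classes from the JDK itself report a null class loader.
        return loader == null
                ? ClassLoader.getSystemResource(resource)
                : loader.getResource(resource);
    }

    // True only when the class file sits inside a jar archive.
    public static boolean isPackagedInJar(Class<?> cls) {
        URL url = classFileUrl(cls);
        return url != null && "jar".equals(url.getProtocol());
    }

    public static void main(String[] args) {
        Class<?> cls = JobJarLocator.class;
        System.out.println("loaded from: " + classFileUrl(cls));
        System.out.println("packaged in a jar? " + isPackagedInJar(cls));
    }
}
```

Running this from Eclipse against the job class would show whether a jar exists at all. If it does not, the cluster has nothing from which to load Student.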

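So, after changing the code as follows, the problem was solved. The change is at the top of main: the JobConf is created with the no-argument constructor and the mapred.job.tracker setting is removed. As the job ID job_local_0001 in the log further below shows, the job then runs in the local job runner.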

package cn.hadoop.db;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.lib.IdentityReducer;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBInputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

import cn.hadoop.db.DBAccessReader.Student.DBInputMapper;

public class DBAccessReader {

    public static class Student implements Writable, DBWritable {
        public int id;
        public String name;
        public String sex;
        public int age;

        public Student() {

        }

        @Override
        public void write(PreparedStatement statement) throws SQLException {
            statement.setInt(1, this.id);
            statement.setString(2, this.name);
            statement.setString(3, this.sex);
            statement.setInt(4, this.age);
        }

        @Override
        public void readFields(ResultSet resultSet) throws SQLException {
            this.id = resultSet.getInt(1);
            this.name = resultSet.getString(2);
            this.sex = resultSet.getString(3);
            this.age = resultSet.getInt(4);
        }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(this.id);
            Text.writeString(out, this.name);
            Text.writeString(out, this.sex);
            out.writeInt(this.age);
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            this.id = in.readInt();
            this.name = Text.readString(in);
            this.sex = Text.readString(in);
            this.age = in.readInt();
        }

        @Override
        public String toString() {
            return new String("Student [id=" + id + ", name=" + name + ", sex="
                    + sex + ", age=" + age + "]");
        }

        public static class DBInputMapper extends MapReduceBase
                implements
                Mapper<LongWritable, cn.hadoop.db.DBAccessReader.Student, LongWritable, Text> {

            @Override
            public void map(LongWritable key,
                    cn.hadoop.db.DBAccessReader.Student value,
                    OutputCollector<LongWritable, Text> collector,
                    Reporter reporter) throws IOException {
                collector.collect(new LongWritable(value.id),
                        new Text(value.toString()));

            }

        }

    }

    public static void main(String[] args) throws IOException {

        JobConf conf = new JobConf();
        FileSystem fileSystem = FileSystem.get(
                URI.create("hdfs://192.168.56.10:9000/"), conf);

        DistributedCache
                .addFileToClassPath(
                        new Path(
                                "hdfs://192.168.56.10:9000/lib/mysql-connector-java-5.1.18-bin.jar"),
                        conf, fileSystem);
        conf.setOutputKeyClass(LongWritable.class);
        conf.setOutputValueClass(Text.class);

        conf.setInputFormat(DBInputFormat.class);

        FileOutputFormat.setOutputPath(conf, new Path(
                "hdfs://192.168.56.10:9000/user/studentInfo"));

        DBConfiguration.configureDB(conf, "com.mysql.jdbc.Driver",
                "jdbc:mysql://192.168.56.109:3306/school", "root", "1qaz2wsx");

        String[] fields = { "id", "name", "sex", "age" };

        DBInputFormat.setInput(conf, cn.hadoop.db.DBAccessReader.Student.class,
                "student", null, "id", fields);

        conf.setMapperClass(DBInputMapper.class);
        conf.setReducerClass(IdentityReducer.class);

        JobClient.runJob(conf);
    }
}
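For reference, the arguments passed to `DBInputFormat.setInput` in the listing above (table `student`, no conditions, order by `id`, and the four field names) describe a SELECT statement. The sketch below shows the general shape of such a query in plain Java. It is a simplified illustration under that assumption, not Hadoop's implementation: `SelectQuerySketch` is a hypothetical name, and any paging that Hadoop adds per input split is left out.

```java
// Hypothetical helper that assembles a SELECT from setInput-style arguments.
class SelectQuerySketch {

    public static String buildSelect(String table, String conditions,
                                     String orderBy, String... fields) {
        StringBuilder query = new StringBuilder("SELECT ");
        // Columns appear in the same order as the fields array.
        query.append(String.join(", ", fields));
        query.append(" FROM ").append(table);
        if (conditions != null && !conditions.isEmpty()) {
            query.append(" WHERE (").append(conditions).append(")");
        }
        if (orderBy != null && !orderBy.isEmpty()) {
            query.append(" ORDER BY ").append(orderBy);
        }
        return query.toString();
    }

    public static void main(String[] args) {
        // Same arguments as in the listing: table, conditions, orderBy, fields.
        System.out.println(
                buildSelect("student", null, "id", "id", "name", "sex", "age"));
    }
}
```

The column order is what matters for the entity class: `readFields(ResultSet)` reads columns 1 to 4 by position, so the order of the fields array has to match the order in which `Student` assigns id, name, sex, and age.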

Here is the log output printed during the run (the JVM uses a Chinese locale: 警告 means WARNING and 信息 means INFO):

三月 13, 2016 5:39:57 下午 org.apache.hadoop.util.NativeCodeLoader <clinit>
警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
三月 13, 2016 5:39:57 下午 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles
警告: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
三月 13, 2016 5:39:57 下午 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles
警告: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
三月 13, 2016 5:39:57 下午 org.apache.hadoop.filecache.TrackerDistributedCacheManager downloadCacheObject
信息: Creating mysql-connector-java-5.1.18-bin.jar in /tmp/hadoop-hadoop/mapred/local/archive/2605709384407216388_-2048973133_91096108/192.168.56.10/lib-work-2076365714246383853 with rwxr-xr-x
三月 13, 2016 5:39:58 下午 org.apache.hadoop.filecache.TrackerDistributedCacheManager downloadCacheObject
信息: Cached hdfs://192.168.56.10:9000/lib/mysql-connector-java-5.1.18-bin.jar as /tmp/hadoop-hadoop/mapred/local/archive/2605709384407216388_-2048973133_91096108/192.168.56.10/lib/mysql-connector-java-5.1.18-bin.jar
三月 13, 2016 5:39:58 下午 org.apache.hadoop.filecache.TrackerDistributedCacheManager localizePublicCacheObject
信息: Cached hdfs://192.168.56.10:9000/lib/mysql-connector-java-5.1.18-bin.jar as /tmp/hadoop-hadoop/mapred/local/archive/2605709384407216388_-2048973133_91096108/192.168.56.10/lib/mysql-connector-java-5.1.18-bin.jar
三月 13, 2016 5:39:58 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: Running job: job_local_0001
三月 13, 2016 5:39:59 下午 org.apache.hadoop.mapred.Task initialize
信息:  Using ResourceCalculatorPlugin : null
三月 13, 2016 5:39:59 下午 org.apache.hadoop.mapred.MapTask runOldMapper
信息: numReduceTasks: 1
三月 13, 2016 5:39:59 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: io.sort.mb = 100
三月 13, 2016 5:39:59 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: data buffer = 79691776/99614720
三月 13, 2016 5:39:59 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: record buffer = 262144/327680
三月 13, 2016 5:39:59 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息:  map 0% reduce 0%
三月 13, 2016 5:40:04 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
信息: Starting flush of map output
三月 13, 2016 5:40:04 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
信息: Finished spill 0
三月 13, 2016 5:40:04 下午 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
三月 13, 2016 5:40:04 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息:
三月 13, 2016 5:40:04 下午 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_m_000000_0' done.
三月 13, 2016 5:40:05 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息:  map 100% reduce 0%
三月 13, 2016 5:40:05 下午 org.apache.hadoop.mapred.Task initialize
信息:  Using ResourceCalculatorPlugin : null
三月 13, 2016 5:40:05 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息:
三月 13, 2016 5:40:05 下午 org.apache.hadoop.mapred.Merger$MergeQueue merge
信息: Merging 1 sorted segments
三月 13, 2016 5:40:05 下午 org.apache.hadoop.mapred.Merger$MergeQueue merge
信息: Down to the last merge-pass, with 1 segments left of total size: 542 bytes
三月 13, 2016 5:40:05 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息:
三月 13, 2016 5:40:06 下午 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
三月 13, 2016 5:40:06 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息:
三月 13, 2016 5:40:06 下午 org.apache.hadoop.mapred.Task commit
信息: Task attempt_local_0001_r_000000_0 is allowed to commit now
三月 13, 2016 5:40:06 下午 org.apache.hadoop.mapred.FileOutputCommitter commitTask
信息: Saved output of task 'attempt_local_0001_r_000000_0' to hdfs://192.168.56.10:9000/user/studentInfo
三月 13, 2016 5:40:08 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: reduce > reduce
三月 13, 2016 5:40:08 下午 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_r_000000_0' done.
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息:  map 100% reduce 100%
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: Job complete: job_local_0001
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息: Counters: 20
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:   File Input Format Counters
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Bytes Read=0
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:   File Output Format Counters
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Bytes Written=513
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:   FileSystemCounters
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     FILE_BYTES_READ=1592914
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     HDFS_BYTES_READ=1579770
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     FILE_BYTES_WRITTEN=3270914
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     HDFS_BYTES_WRITTEN=513
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:   Map-Reduce Framework
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Reduce input groups=9
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Map output materialized bytes=546
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Combine output records=0
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Map input records=9
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Reduce shuffle bytes=0
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Reduce output records=9
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Spilled Records=18
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Map output bytes=522
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Total committed heap usage (bytes)=231874560
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Map input bytes=9
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Combine input records=0
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Map output records=9
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     SPLIT_RAW_BYTES=75
三月 13, 2016 5:40:09 下午 org.apache.hadoop.mapred.Counters log
信息:     Reduce input records=9

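This is the result of the run:

[Screenshot: result of the run]

That completes reading a table from the database with Hadoop and writing it out. This is only a simple demonstration that a real project would not use, but it helps in getting familiar with the workflow of working with a database from Hadoop. Below is sample code in which Hadoop processes files and then writes the result into the database; it is much the same as the code above: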

package cn.hadoop.db;

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.net.URI;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.Iterator;
import java.util.StringTokenizer;

import org.apache.hadoop.filecache.DistributedCache;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.lib.db.DBConfiguration;
import org.apache.hadoop.mapred.lib.db.DBOutputFormat;
import org.apache.hadoop.mapred.lib.db.DBWritable;

public class WriteDB {

    public static void main(String[] args) throws IOException {
        JobConf conf = new JobConf();

        FileSystem fileSystem = FileSystem.get(
                URI.create("hdfs://192.168.56.10:9000/"), conf);
        DistributedCache
                .addFileToClassPath(
                        new Path(
                                "hdfs://192.168.56.10:9000/lib/mysql-connector-java-5.1.18-bin.jar"),
                        conf, fileSystem);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(DBOutputFormat.class);

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Combine.class);
        conf.setReducerClass(Reduce.class);

        FileInputFormat.setInputPaths(conf, new Path(
                "hdfs://192.168.56.10:9000/user/db_in"));

        DBConfiguration
                .configureDB(
                        conf,
                        "com.mysql.jdbc.Driver",
                        "jdbc:mysql://192.168.56.109:3306/school?characterEncoding=UTF-8",
                        "root", "1qaz2wsx");

        String[] fields = { "word", "number" };

        DBOutputFormat.setOutput(conf, "wordcount", fields);
        JobClient.runJob(conf);

    }
}

class Map extends MapReduceBase implements
        Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);

    private Text word = new Text();

    @Override
    public void map(Object key, Text value,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        String line = value.toString();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }

}

class Combine extends MapReduceBase implements
        Reducer<Text, IntWritable, Text, IntWritable> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }

}

class Reduce extends MapReduceBase implements
        Reducer<Text, IntWritable, WordRecord, Text> {

    @Override
    public void reduce(Text key, Iterator<IntWritable> values,
            OutputCollector<WordRecord, Text> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        WordRecord wordcount = new WordRecord();
        wordcount.word = key.toString();
        wordcount.number = sum;
        output.collect(wordcount, new Text());
    }

}

class WordRecord implements Writable, DBWritable {

    public String word;
    public int number;

    @Override
    public void write(PreparedStatement statement) throws SQLException {
        statement.setString(1, this.word);
        statement.setInt(2, this.number);
    }

    @Override
    public void readFields(ResultSet resultSet) throws SQLException {
        this.word = resultSet.getString(1);
        this.number = resultSet.getInt(2);
    }

    @Override
    public void write(DataOutput out) throws IOException {
        Text.writeString(out, this.word);
        out.writeInt(this.number);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        this.word = Text.readString(in);
        this.number = in.readInt();
    }

}
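The call `DBOutputFormat.setOutput(conf, "wordcount", fields)` in the listing above names a table and its columns, and `WordRecord.write(PreparedStatement)` fills parameters 1 and 2. The sketch below shows the kind of parameterized INSERT this corresponds to, assuming one `?` placeholder per field in field order. It is a plain-Java illustration, not Hadoop's implementation, and `InsertStatementSketch` is a hypothetical name.

```java
import java.util.Collections;

// Hypothetical helper that assembles a parameterized INSERT statement.
class InsertStatementSketch {

    public static String buildInsert(String table, String... fields) {
        String columns = String.join(", ", fields);
        // One "?" placeholder per column, in the same order.
        String placeholders =
                String.join(", ", Collections.nCopies(fields.length, "?"));
        return "INSERT INTO " + table + " (" + columns + ") VALUES ("
                + placeholders + ")";
    }

    public static void main(String[] args) {
        // Table and field names taken from the listing above.
        System.out.println(buildInsert("wordcount", "word", "number"));
    }
}
```

Seen this way, `statement.setString(1, this.word)` binds the first placeholder (`word`) and `statement.setInt(2, this.number)` binds the second (`number`), so the order of the fields array and the parameter indexes in `write` have to agree.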

The log output from this run:

三月 13, 2016 6:09:31 下午 org.apache.hadoop.util.NativeCodeLoader <clinit>
警告: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
三月 13, 2016 6:09:31 下午 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles
警告: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
三月 13, 2016 6:09:31 下午 org.apache.hadoop.mapred.JobClient copyAndConfigureFiles
警告: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
三月 13, 2016 6:09:31 下午 org.apache.hadoop.mapred.FileInputFormat listStatus
信息: Total input paths to process : 2
三月 13, 2016 6:09:32 下午 org.apache.hadoop.filecache.TrackerDistributedCacheManager downloadCacheObject
信息: Creating mysql-connector-java-5.1.18-bin.jar in /tmp/hadoop-hadoop/mapred/local/archive/-8205516116475251282_-2048973133_91096108/192.168.56.10/lib-work-1371358416408211818 with rwxr-xr-x
三月 13, 2016 6:09:33 下午 org.apache.hadoop.filecache.TrackerDistributedCacheManager downloadCacheObject
信息: Cached hdfs://192.168.56.10:9000/lib/mysql-connector-java-5.1.18-bin.jar as /tmp/hadoop-hadoop/mapred/local/archive/-8205516116475251282_-2048973133_91096108/192.168.56.10/lib/mysql-connector-java-5.1.18-bin.jar
三月 13, 2016 6:09:33 下午 org.apache.hadoop.filecache.TrackerDistributedCacheManager localizePublicCacheObject
信息: Cached hdfs://192.168.56.10:9000/lib/mysql-connector-java-5.1.18-bin.jar as /tmp/hadoop-hadoop/mapred/local/archive/-8205516116475251282_-2048973133_91096108/192.168.56.10/lib/mysql-connector-java-5.1.18-bin.jar
三月 13, 2016 6:09:33 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: Running job: job_local_0001
三月 13, 2016 6:09:33 下午 org.apache.hadoop.mapred.Task initialize
信息:  Using ResourceCalculatorPlugin : null
三月 13, 2016 6:09:33 下午 org.apache.hadoop.mapred.MapTask runOldMapper
信息: numReduceTasks: 1
三月 13, 2016 6:09:33 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: io.sort.mb = 100
三月 13, 2016 6:09:34 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: data buffer = 79691776/99614720
三月 13, 2016 6:09:34 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: record buffer = 262144/327680
三月 13, 2016 6:09:34 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
信息: Starting flush of map output
三月 13, 2016 6:09:34 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
信息: Finished spill 0
三月 13, 2016 6:09:34 下午 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_m_000000_0 is done. And is in the process of commiting
三月 13, 2016 6:09:34 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息:  map 0% reduce 0%
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.56.10:9000/user/db_in/file2.txt:0+41
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_m_000000_0' done.
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.Task initialize
信息:  Using ResourceCalculatorPlugin : null
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.MapTask runOldMapper
信息: numReduceTasks: 1
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: io.sort.mb = 100
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: data buffer = 79691776/99614720
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer <init>
信息: record buffer = 262144/327680
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer flush
信息: Starting flush of map output
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.MapTask$MapOutputBuffer sortAndSpill
信息: Finished spill 0
三月 13, 2016 6:09:36 下午 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_m_000001_0 is done. And is in the process of commiting
三月 13, 2016 6:09:37 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息:  map 100% reduce 0%
三月 13, 2016 6:09:39 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: hdfs://192.168.56.10:9000/user/db_in/file1.txt:0+24
三月 13, 2016 6:09:39 下午 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_m_000001_0' done.
三月 13, 2016 6:09:39 下午 org.apache.hadoop.mapred.Task initialize
信息:  Using ResourceCalculatorPlugin : null
三月 13, 2016 6:09:39 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: 
三月 13, 2016 6:09:39 下午 org.apache.hadoop.mapred.Merger$MergeQueue merge
信息: Merging 2 sorted segments
三月 13, 2016 6:09:39 下午 org.apache.hadoop.mapred.Merger$MergeQueue merge
信息: Down to the last merge-pass, with 2 segments left of total size: 116 bytes
三月 13, 2016 6:09:39 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: 
三月 13, 2016 6:09:41 下午 org.apache.hadoop.mapred.Task done
信息: Task:attempt_local_0001_r_000000_0 is done. And is in the process of commiting
三月 13, 2016 6:09:42 下午 org.apache.hadoop.mapred.LocalJobRunner$Job statusUpdate
信息: reduce > reduce
三月 13, 2016 6:09:42 下午 org.apache.hadoop.mapred.Task sendDone
信息: Task 'attempt_local_0001_r_000000_0' done.
三月 13, 2016 6:09:42 下午 org.apache.hadoop.mapred.FileOutputCommitter cleanupJob
警告: Output path is null in cleanup
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息:  map 100% reduce 100%
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.JobClient monitorAndPrintJob
信息: Job complete: job_local_0001
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息: Counters: 19
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:   File Input Format Counters 
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Bytes Read=65
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:   File Output Format Counters 
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Bytes Written=0
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:   FileSystemCounters
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     FILE_BYTES_READ=2389740
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     HDFS_BYTES_READ=2369826
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     FILE_BYTES_WRITTEN=4905883
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:   Map-Reduce Framework
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Reduce input groups=7
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Map output materialized bytes=124
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Combine output records=9
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Map input records=5
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Reduce shuffle bytes=0
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Reduce output records=7
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Spilled Records=18
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Map output bytes=104
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Total committed heap usage (bytes)=482291712
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Map input bytes=65
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Combine input records=10
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Map output records=10
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     SPLIT_RAW_BYTES=198
三月 13, 2016 6:09:43 下午 org.apache.hadoop.mapred.Counters log
信息:     Reduce input records=9

The result in the database:

[Screenshot: database result]

All of the code above was tested and run by me personally. For the Hadoop version and server environment, see the previous post.