Using Java to Compare the Transfer File Sizes of JSON and Protocol Buffers

程序员文章站 (Programmer Article Station), 2024-03-07 22:29:39

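Everyone presumably knows what JSON is. If you don't, you really are out of the loop; go Google it. I won't introduce it here.
Protocol Buffers (protobuf) is probably much less familiar, but once you hear that it comes from Google you may well want to try it. Most of what Google ships is high quality, after all.
Protobuf is a data interchange format similar to JSON. Strictly speaking it is not a protocol, just a way of serializing data for transfer.
So how does it differ from JSON?
Cross-language support is one of its strengths. It ships with a compiler, protoc, which compiles a schema into Java, Python, or C++ code (the three targets available when this was written; later releases support more languages). The generated code can be used directly without writing anything else, and the parsing code is included. JSON is cross-language too, of course, but only on the basis of code you write yourself in each language.
If you want to dig deeper, see:
https://developers.google.com/protocol-buffers/docs/overview
Enough talk. Let's look at why it is worth comparing Protocol Buffers (GPB for short below) with JSON.
1. JSON has a fixed textual format and is stored as characters, so there is still room to shrink the data. With large volumes of data, GPB takes far less space than JSON, as the example later shows.
2. Performance varies considerably between JSON libraries. Jackson and Gson differed by a factor of roughly 5 to 10 (I measured this only once, so go easy on me if it is wrong). GPB needs only one library, so there is no such spread between implementations. Admittedly this point is mostly here to pad the list and can be ignored.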
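To make the size argument in point 1 concrete, here is a minimal sketch that hand-encodes one student record (id=11, name="shun", age=25, the values used later in this article) following the documented protobuf wire format, and compares it with the equivalent compact JSON text. The class `WireSizeSketch` and its helper methods are hypothetical names written for illustration; this is not the code protoc generates, and the JSON key names are assumed.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Illustration only: a hand-written encoder for one message, not protoc output.
class WireSizeSketch {

    // Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last.
    static void writeVarint(ByteArrayOutputStream out, long value) {
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7;
        }
        out.write((int) value);
    }

    // A field key is (field number << 3) | wire type; 0 = varint, 2 = length-delimited.
    static void writeTag(ByteArrayOutputStream out, int fieldNumber, int wireType) {
        writeVarint(out, ((long) fieldNumber << 3) | wireType);
    }

    // Encodes the three fields of the Student message: id = 1, name = 2, age = 3.
    static byte[] encodeStudent(int id, String name, int age) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeTag(out, 1, 0);
        writeVarint(out, id);
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        writeTag(out, 2, 2);
        writeVarint(out, nameBytes.length);
        out.write(nameBytes, 0, nameBytes.length);
        writeTag(out, 3, 0);
        writeVarint(out, age);
        return out.toByteArray();
    }

    // Compact JSON for the same record; the key names are an assumption.
    static String studentJson(int id, String name, int age) {
        return "{\"id\":" + id + ",\"name\":\"" + name + "\",\"age\":" + age + "}";
    }

    public static void main(String[] args) {
        byte[] gpb = encodeStudent(11, "shun", 25);
        byte[] json = studentJson(11, "shun", 25).getBytes(StandardCharsets.UTF_8);
        System.out.println("GPB bytes:  " + gpb.length);
        System.out.println("JSON bytes: " + json.length);
    }
}
```

The difference comes from what each format stores: JSON repeats every key name and punctuation mark as text, while the binary encoding stores a one-byte field key and a compact integer.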

Talk is cheap, show me the code.
In programming, code is what counts, so let's go straight to it.
Before running the code you need to download Protocol Buffers from:
https://github.com/google/protobuf

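1. First, GPB needs a file that resembles a class definition, called a proto file.
We will use students and teachers as the example.
We have the following two files: student.proto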

option java_package = "com.shun";
option java_outer_classname = "StudentProto";

message Student {
  required int32 id = 1;
  optional string name = 2;
  optional int32 age = 3;
}

teacher.proto

import "student.proto";
option java_package = "com.shun";
option java_outer_classname = "TeacherProto";

message Teacher {
  required int32 id = 1;
  optional string name = 2;

  repeated Student student_list = 3;
}

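Several unfamiliar keywords show up here:
import, int32, repeated, required, optional, option, and so on.
Let's go through them one by one:
1) import pulls in another proto file.
2) required and optional state whether a field may be left unset, which determines how protobuf handles a field that has no value. If a field is marked required but has no value when the message is processed, an error is raised; if it is marked optional, leaving it unset causes no problem.
3) repeated should be self-explanatory: the field may occur multiple times, much like a List in Java.
4) message is the equivalent of a class.
5) option sets an option. java_package is the package name used for the generated Java code, and java_outer_classname is the name of the generated outer class. Note that this class name must not be the same as the name of any message in the file.
For the other options and field types, please consult the official documentation.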
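The field rules above can be pictured with a small hand-written builder. This is only a sketch of the behavior, assuming simplified fields: `FieldRuleSketch`, `TeacherBuilder`, and `addStudentName` are hypothetical names, and students are reduced to plain strings. The real generated builder reports a missing required field from `build()` with `com.google.protobuf.UninitializedMessageException`; the sketch uses `IllegalStateException` instead so that it needs no library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration only: mimics required / optional / repeated, not protoc output.
class FieldRuleSketch {

    // Immutable message object, playing the role of the generated Teacher class.
    static final class Teacher {
        final int id;
        final String name;
        final List<String> studentNames;

        Teacher(int id, String name, List<String> studentNames) {
            this.id = id;
            this.name = name;
            this.studentNames = studentNames;
        }
    }

    static final class TeacherBuilder {
        private Integer id;           // required: must be set before build()
        private String name = "";     // optional: falls back to the type's default value
        private final List<String> studentNames = new ArrayList<String>(); // repeated: zero or more

        TeacherBuilder setId(int id) { this.id = id; return this; }
        TeacherBuilder setName(String name) { this.name = name; return this; }
        TeacherBuilder addStudentName(String studentName) { studentNames.add(studentName); return this; }

        Teacher build() {
            if (id == null) {
                throw new IllegalStateException("missing required field: id");
            }
            return new Teacher(id, name,
                    Collections.unmodifiableList(new ArrayList<String>(studentNames)));
        }
    }

    public static void main(String[] args) {
        // Optional name left unset: build() succeeds and name is the empty string.
        Teacher ok = new TeacherBuilder().setId(1).addStudentName("shun").build();
        System.out.println(ok.id + " '" + ok.name + "' " + ok.studentNames);

        // Required id left unset: build() fails.
        try {
            new TeacherBuilder().setName("testtea").build();
        } catch (IllegalStateException e) {
            System.out.println("build() failed: " + e.getMessage());
        }
    }
}
```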

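2. What can we do with these files?
Remember the compiler downloaded earlier? Unpack it and you get protoc.exe. That is the Windows build, of course; I have not tried other systems, so anyone interested can experiment.
Add it to PATH (optional, purely a matter of convenience), and then you can generate the classes we need from the files above.
protoc --java_out=<source output directory> --proto_path=<proto file directory> <proto files>
--proto_path names the directory that contains the proto files, not a single file. It is mainly used to locate imported files. It can be omitted, in which case protoc searches the current directory.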

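For example, I want the generated sources in d:\protobuffervsjson\src, and my proto files are in d:\protofiles.
My compile command is then:

protoc --java_out=d:\protobuffervsjson\src --proto_path=d:\protofiles d:\protofiles\teacher.proto d:\protofiles\student.proto

Note that the command must end with every file that needs compiling. --proto_path is given explicitly here so that protoc can locate the files and resolve the import of student.proto when the command is run from a different directory.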

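After compiling, you can see the generated files.
I won't paste the generated code because there is too much of it. Have a look yourself: it is full of Builder classes, so you will recognize the builder pattern at once.
You can now copy the code into your project. Naturally it will be full of errors, because the protobuf runtime classes it depends on are still missing.

Remember the source archive downloaded earlier? Go ahead and unpack it. Find src/main/java/ and copy its contents into your project. You could also build it with Ant or Maven, but I am not familiar with either, so I won't embarrass myself; I am used to copying the sources straight into the project.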

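The code still has errors. Ha, that's normal. I don't know why Google left this trap for us.
Go back to the java directory under the protobuf directory, where there is a README.txt. Find this line:

[Image from the original article (the README.txt excerpt); not preserved in this copy]

The more I looked at that command, the odder it seemed, as if something were off. In any case I did not run it as written. My command was:

protoc --java_out=<same source output directory as above> <path to descriptor.proto>

After running it, the errors in the code are gone.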

3. Next comes testing, of course.
First, the GPB write test:

package com.shun.test;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.shun.StudentProto.Student;
import com.shun.TeacherProto.Teacher;

public class ProtoWriteTest {

  public static void main(String[] args) throws IOException {

    Student.Builder stuBuilder = Student.newBuilder();
    stuBuilder.setAge(25);
    stuBuilder.setId(11);
    stuBuilder.setName("shun");

    // Build the list of students
    List<Student> stuBuilderList = new ArrayList<Student>();
    stuBuilderList.add(stuBuilder.build());

    Teacher.Builder teaBuilder = Teacher.newBuilder();
    teaBuilder.setId(1);
    teaBuilder.setName("testtea");
    teaBuilder.addAllStudentList(stuBuilderList);

    // Write the GPB message to a file
    FileOutputStream fos = new FileOutputStream("c:\\users\\shun\\desktop\\test\\test.protoout");
    teaBuilder.build().writeTo(fos);
    fos.close();
  }

}

Check the output directory; barring surprises, the file should have been created.
Having written it, we naturally want to read it back.

package com.shun.test;

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;

import com.shun.StudentProto.Student;
import com.shun.TeacherProto.Teacher;

public class ProtoReadTest {

  public static void main(String[] args) throws FileNotFoundException, IOException {

    // Parse the file written by ProtoWriteTest back into a Teacher message
    Teacher teacher = Teacher.parseFrom(new FileInputStream("c:\\users\\shun\\desktop\\test\\test.protoout"));
    System.out.println("teacher id:" + teacher.getId() + ",name:" + teacher.getName());
    for (Student stu : teacher.getStudentListList()) {
      System.out.println("student id:" + stu.getId() + ",name:" + stu.getName() + ",age:" + stu.getAge());
    }
  }

}

The code is simple because the classes GPB generated do all the work for us.
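To give a rough idea of the work the generated parser is hiding, here is a minimal sketch that walks the bytes of one encoded student record and lists its fields. It assumes the documented protobuf wire format and handles only the two wire types this article's messages need; `WireDecodeSketch` and `describe` are hypothetical names, and this is not how the real library is structured.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustration only: a tiny reader for varint and length-delimited fields.
class WireDecodeSketch {
    private final byte[] buf;
    private int pos;

    WireDecodeSketch(byte[] buf) {
        this.buf = buf;
    }

    // Reads one base-128 varint starting at the current position.
    long readVarint() {
        long result = 0;
        int shift = 0;
        while (true) {
            byte b = buf[pos++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
            shift += 7;
        }
    }

    // Returns "fieldNumber=value" for every field found in the message bytes.
    static List<String> describe(byte[] message) {
        WireDecodeSketch in = new WireDecodeSketch(message);
        List<String> fields = new ArrayList<String>();
        while (in.pos < message.length) {
            long key = in.readVarint();
            int fieldNumber = (int) (key >>> 3);
            int wireType = (int) (key & 0x7);
            if (wireType == 0) {            // varint, e.g. int32
                fields.add(fieldNumber + "=" + in.readVarint());
            } else if (wireType == 2) {     // length-delimited, treated here as a UTF-8 string
                int length = (int) in.readVarint();
                fields.add(fieldNumber + "=" + new String(message, in.pos, length, StandardCharsets.UTF_8));
                in.pos += length;
            } else {
                throw new IllegalArgumentException("wire type " + wireType + " is not handled in this sketch");
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        // Student {id=11, name="shun", age=25} in wire format
        byte[] student = {0x08, 0x0B, 0x12, 0x04, 's', 'h', 'u', 'n', 0x18, 0x19};
        System.out.println(describe(student));
    }
}
```

Note that the bytes carry only field numbers, not field names. The names come from the proto file, which is why both sides need the generated classes.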
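Now that the basic usage is clear, let's focus on the difference in file size between GPB and JSON. I won't paste the full JSON code here; a sample will be posted later for anyone interested in downloading it.
Here we use Gson to handle the JSON. Only the code that converts the objects to JSON and writes the file is shown.
The definitions of the two classes Student and Teacher are omitted; write them however you like. The code is as follows: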

package com.shun.test;

import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.gson.Gson;
import com.shun.Student;
import com.shun.Teacher;

public class GsonWriteTest {

  public static void main(String[] args) throws IOException {
    Student stu = new Student();
    stu.setAge(25);
    stu.setId(22);
    stu.setName("shun");

    List<Student> stuList = new ArrayList<Student>();
    stuList.add(stu);

    Teacher teacher = new Teacher();
    teacher.setId(22);
    teacher.setName("shun");
    teacher.setStuList(stuList);

    // Serialize the object graph to JSON and write it to a file
    String result = new Gson().toJson(teacher);
    FileWriter fw = new FileWriter("c:\\users\\shun\\desktop\\test\\json");
    fw.write(result);
    fw.close();
  }

}
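The article leaves out the plain Student and Teacher classes that the Gson code relies on. A minimal sketch of what they might look like follows, inferred only from the setters called above (`setId`, `setName`, `setAge`, `setStuList`). The field names and their order are assumptions, and the classes are nested in a hypothetical `PojoSketch` wrapper purely so the example fits in one file; in the article's project they would be separate classes in the `com.shun` package.

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: assumed shape of the article's omitted JSON model classes.
class PojoSketch {

    static class Student {
        private int id;
        private String name;
        private int age;

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getAge() { return age; }
        public void setAge(int age) { this.age = age; }
    }

    static class Teacher {
        private int id;
        private String name;
        private List<Student> stuList = new ArrayList<Student>();

        public int getId() { return id; }
        public void setId(int id) { this.id = id; }
        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public List<Student> getStuList() { return stuList; }
        public void setStuList(List<Student> stuList) { this.stuList = stuList; }
    }

    public static void main(String[] args) {
        Student stu = new Student();
        stu.setId(22);
        stu.setName("shun");
        stu.setAge(25);

        List<Student> stuList = new ArrayList<Student>();
        stuList.add(stu);

        Teacher teacher = new Teacher();
        teacher.setId(22);
        teacher.setName("shun");
        teacher.setStuList(stuList);

        System.out.println(teacher.getName() + " has " + teacher.getStuList().size() + " student(s)");
    }
}
```

The field names matter for the size comparison, because a JSON serializer writes each of them into the output as a key for every object.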

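Now for the real test. So far the list held a single object; next we test the file sizes that GPB and JSON produce for 100, 1,000, 10,000, 100,000, 1,000,000, and 5,000,000 objects.
First, modify the earlier GPB code so that it builds a list of the given size and then writes the file: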

package com.shun.test;

import java.io.FileOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.shun.StudentProto.Student;
import com.shun.TeacherProto.Teacher;

public class ProtoWriteTest {

  // Number of students to write; change this for each test run
  public static final int SIZE = 100;

  public static void main(String[] args) throws IOException {

    // Build the list of students
    List<Student> stuBuilderList = new ArrayList<Student>();
    for (int i = 0; i < SIZE; i++) {
      Student.Builder stuBuilder = Student.newBuilder();
      stuBuilder.setAge(25);
      stuBuilder.setId(11);
      stuBuilder.setName("shun");

      stuBuilderList.add(stuBuilder.build());
    }

    Teacher.Builder teaBuilder = Teacher.newBuilder();
    teaBuilder.setId(1);
    teaBuilder.setName("testtea");
    teaBuilder.addAllStudentList(stuBuilderList);

    // Write the GPB message to a file
    FileOutputStream fos = new FileOutputStream("c:\\users\\shun\\desktop\\test\\proto-" + SIZE);
    teaBuilder.build().writeTo(fos);
    fos.close();
  }

}

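Change SIZE to each of the test counts listed above in turn, and you get the following:

[Image from the original article showing the results; not preserved in this copy]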

Next, the JSON test code:

package com.shun.test;

import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import com.google.gson.Gson;
import com.shun.Student;
import com.shun.Teacher;

public class GsonWriteTest {

  // Number of students to write; change this for each test run
  public static final int SIZE = 100;

  public static void main(String[] args) throws IOException {

    List<Student> stuList = new ArrayList<Student>();
    for (int i = 0; i < SIZE; i++) {
      Student stu = new Student();
      stu.setAge(25);
      stu.setId(22);
      stu.setName("shun");

      stuList.add(stu);
    }

    Teacher teacher = new Teacher();
    teacher.setId(22);
    teacher.setName("shun");
    teacher.setStuList(stuList);

    // Serialize the object graph to JSON and write it to a file
    String result = new Gson().toJson(teacher);
    FileWriter fw = new FileWriter("c:\\users\\shun\\desktop\\test\\json" + SIZE);
    fw.write(result);
    fw.close();
  }

}

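Change SIZE in the same way and run the corresponding tests.

As the amount of data grows, the gap between the JSON and GPB file sizes becomes clearly visible; the JSON files are much larger.

[Table image from the original article comparing the file sizes; not preserved in this copy]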

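As the table above makes fairly clear, GPB has a big advantage with large data sets. In practice, though, clients and servers rarely exchange that much data directly; large transfers mostly happen between servers. If your requirement is to ship several hundred megabytes of log files to another server every day, GPB may be a great help.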
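For readers who want a feel for how the two formats scale, the sketch below estimates both file sizes arithmetically for the record counts used in the test. These are computed estimates, not the measurements from the article. They assume the documented protobuf wire format, the test values used above, compact JSON output, and JSON keys named `id`, `name`, `age`, and `stuList` (the key names are a guess based on the setters). `SizeProjectionSketch` is a hypothetical name.

```java
// Illustration only: estimated sizes under stated assumptions, not measured results.
class SizeProjectionSketch {

    // Encoded Student {id=11, name="shun", age=25}:
    // three 1-byte keys, two 1-byte varints, one length byte, four name bytes.
    static final int STUDENT_GPB = 10;

    // Characters in {"id":22,"name":"shun","age":25} (key names assumed).
    static final int STUDENT_JSON = "{\"id\":22,\"name\":\"shun\",\"age\":25}".length();

    // Number of bytes a non-negative value occupies as a base-128 varint.
    static int varintSize(long value) {
        int size = 1;
        while ((value >>>= 7) != 0) {
            size++;
        }
        return size;
    }

    // Teacher {id=1, name="testtea"} followed by n embedded students.
    static long gpbSize(long n) {
        long teacherFields = 2 + (1 + 1 + 7);                         // id field, then key + length + "testtea"
        long perStudent = 1 + varintSize(STUDENT_GPB) + STUDENT_GPB;  // key + length prefix + payload
        return teacherFields + n * perStudent;
    }

    // {"id":22,"name":"shun","stuList":[...]} with n students separated by commas.
    static long jsonSize(long n) {
        long wrapper = "{\"id\":22,\"name\":\"shun\",\"stuList\":[]}".length();
        long commas = Math.max(0, n - 1);
        return wrapper + n * STUDENT_JSON + commas;
    }

    public static void main(String[] args) {
        long[] counts = {100, 1000, 10000, 100000, 1000000, 5000000};
        for (long n : counts) {
            System.out.println(n + " students: GPB ~" + gpbSize(n) + " bytes, JSON ~" + jsonSize(n) + " bytes");
        }
    }
}
```

Under these assumptions each additional student costs 12 bytes in GPB and 33 bytes in JSON, so the gap grows linearly with the record count. Real data with longer strings or larger numbers would shift the ratio.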

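Although this was billed as an in-depth comparison, it mainly compares size. Timing is less comparable, and the difference there was not large.
This article used Gson. Interested readers can try Jackson, fastjson, or another library; the generated file size will be the same, and only the processing time differs.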
