Examples: Parsing XML Data with REXML in Ruby Programs

程序员文章站 2022-06-24 11:23:26

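REXML is a library written by Sean Russell. It is not Ruby's only XML library, but it is a popular one, and it is written in pure Ruby (NQXML is also written in Ruby, whereas XMLParser wraps the Jade library, which is written in C). In his REXML overview, Russell comments:
I have this problem: I dislike confusing APIs. There are several XML parser APIs for Java implementations. Most of them follow DOM or SAX and are very similar in philosophy to the many other Java APIs that keep appearing. That is, they look as if they were designed by theorists who never used their own APIs. In general, existing XML APIs are annoying. They take a markup language that was explicitly designed to be very simple, elegant, and powerful, and wrap it in an obnoxious, bloated, large API. Even for the most basic XML tree operations I always had to consult the API documentation; nothing was intuitive, and almost every operation was complicated.
Although I don't find it quite that irritating, I agree with Russell: XML APIs undoubtedly create too much work for most of the people who use them.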

Example
Consider the following books.xml:

<library shelf="recent acquisitions">
  <section name="ruby">
    <book isbn="0672328844">
      <title>the ruby way</title>
      <author>hal fulton</author>
      <description>
        second edition. the book you are now reading.
        ain't recursion grand?
      </description>
    </book>
  </section>
  <section name="space">
    <book isbn="0684835509">
      <title>the case for mars</title>
      <author>robert zubrin</author>
      <description>pushing toward a second home for the human
        race.
      </description>
    </book>
    <book isbn="074325631x">
      <title>first man: the life of neil a. armstrong</title>
      <author>james r. hansen</author>
      <description>definitive biography of the first man on
        the moon.
      </description>
    </book>
  </section>
</library>

1 Tree parsing (DOM-like)

We need to require the rexml/document library and include the REXML module:

require 'rexml/document'
include REXML

# Parse the whole file into an in-memory tree
doc = File.open("books.xml") { |input| Document.new(input) }

root = doc.root
puts root.attributes["shelf"]  # recent acquisitions

doc.elements.each("library/section") { |e| puts e.attributes["name"] }
# output:
# ruby
# space

doc.elements.each("*/section/book") { |e| puts e.attributes["isbn"] }
# output:
# 0672328844
# 0684835509
# 074325631x

# Integer indices are 1-based: the second <section>, then its first <book>
sec2 = root.elements[2]
author = sec2.elements[1].elements["author"].text  # robert zubrin

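Note that an element's attributes are exposed as a hash, so we can pull out the values we need with attributes[]. Child elements can be fetched either with a path-like string or with an integer index; integer indices are 1-based, not 0-based.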
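To make the indexing rules concrete, here is a small self-contained sketch. It parses an inline string instead of books.xml (the fragment is a hypothetical cut-down copy of the "space" section) and shows an attribute lookup, 1-based integer indexing, and a path-like string lookup. On Ruby 3.0 and later REXML ships as a bundled gem rather than a default library, so under Bundler it may need to be listed in the Gemfile.

```ruby
require 'rexml/document'

# Hypothetical inline copy of one <section> from books.xml
xml = <<~XML
  <section name="space">
    <book isbn="0684835509"><title>the case for mars</title></book>
    <book isbn="074325631x"><title>first man: the life of neil a. armstrong</title></book>
  </section>
XML

doc = REXML::Document.new(xml)
section = doc.root

# Attributes are looked up by name, like a Hash
puts section.attributes["name"]              # prints: space

# Integer indices are 1-based: elements[1] is the FIRST child element
first_book = section.elements[1]
puts first_book.attributes["isbn"]           # prints: 0684835509

# Only element children are counted; whitespace text nodes are skipped
second_book = section.elements[2]
puts second_book.attributes["isbn"]          # prints: 074325631x

# A path-like string returns the first matching descendant
puts section.elements["book/title"].text     # prints: the case for mars
```

Passing a string rather than an integer to elements[] is usually the more robust choice, because it does not break when sibling elements are added or reordered.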

2 Stream parsing (SAX-like)

Here we use a small trick: we define a listener class whose methods are called back while the document is being parsed:

require 'rexml/document'
require 'rexml/streamlistener'
include REXML

class MyListener
  include REXML::StreamListener

  # Called when the parser enters a start tag; args are the tag name and its attributes
  def tag_start(*args)
    puts "tag_start: #{args.map { |x| x.inspect }.join(', ')}"
  end

  # Called with each run of character data between tags
  def text(data)
    return if data.strip.empty?  # skip whitespace-only text
    abbrev = data[0..40] + (data.length > 40 ? "..." : "")
    puts " text : #{abbrev.inspect}"
  end
end

list = MyListener.new
File.open("books.xml") { |source| Document.parse_stream(source, list) }

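A word about the StreamListener module: it provides a set of empty callback methods, so you only override the ones you need for your own behavior. When the parser enters a tag, it calls tag_start; text works the same way, except that it is called whenever character data is read. The output looks like this: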

tag_start: "library", {"shelf"=>"recent acquisitions"} 
tag_start: "section", {"name"=>"ruby"} 
tag_start: "book", {"isbn"=>"0672328844"} 
tag_start: "title", {} 
text : "the ruby way"
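Because StreamListener supplies empty defaults, a listener only needs the callbacks it cares about. The sketch below is a hypothetical TitleCollector class (not part of REXML) that overrides tag_start, tag_end and text to gather the text of every title element from an inline sample document; Document.parse_stream accepts a String source as well as an IO.

```ruby
require 'rexml/document'
require 'rexml/streamlistener'

# Hypothetical listener that collects the text inside every <title> element.
# Only three callbacks are defined; StreamListener provides empty defaults
# for all the others (comments, CDATA, processing instructions, ...).
class TitleCollector
  include REXML::StreamListener
  attr_reader :titles

  def initialize
    @titles = []
    @in_title = false
  end

  # Remember whether we are currently inside a <title> start tag
  def tag_start(name, _attrs)
    @in_title = (name == "title")
  end

  def tag_end(name)
    @in_title = false if name == "title"
  end

  # Character data is kept only while inside <title>
  def text(data)
    @titles << data if @in_title
  end
end

xml = <<~XML
  <library>
    <book><title>the ruby way</title><author>hal fulton</author></book>
    <book><title>the case for mars</title><author>robert zubrin</author></book>
  </library>
XML

collector = TitleCollector.new
REXML::Document.parse_stream(xml, collector)
p collector.titles  # prints: ["the ruby way", "the case for mars"]
```

Tracking state in instance variables such as @in_title is the usual cost of the SAX-like style: no tree is built, so the listener itself has to remember where in the document it is.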

3 XPath

REXML provides XPath support through the XPath class. It also supports both DOM-like and SAX-like parsing. Using the same XML file as before, we can do the following with XPath:

require 'rexml/document'
include REXML

doc = File.open("books.xml") { |f| Document.new(f) }

book1 = XPath.first(doc, "//book")  # info for first book found
p book1

# print out all titles
XPath.each(doc, "//title") { |e| puts e.text }

# get an array of the text of every "author" element in the document
names = XPath.match(doc, "//author").map { |x| x.text }
p names

The output is similar to the following:

<book isbn='0672328844'> ... </> 
the ruby way 
the case for mars 
first man: the life of neil a. armstrong 
["hal fulton", "robert zubrin", "james r. hansen"]
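XPath expressions can do more than select elements by name. The following sketch, again run on a hypothetical inline copy of the book data, combines an attribute predicate with a 1-based positional predicate and selects attribute nodes directly with //book/@isbn; REXML returns those as REXML::Attribute objects, whose value method gives the string.

```ruby
require 'rexml/document'

# Hypothetical inline copy of the book data (titles trimmed to the essentials)
xml = <<~XML
  <library>
    <section name="ruby">
      <book isbn="0672328844"><title>the ruby way</title></book>
    </section>
    <section name="space">
      <book isbn="0684835509"><title>the case for mars</title></book>
      <book isbn="074325631x"><title>first man: the life of neil a. armstrong</title></book>
    </section>
  </library>
XML

doc = REXML::Document.new(xml)

# Attribute predicate plus positional predicate: second <book> in the "space" section
second_space = REXML::XPath.first(doc, "//section[@name='space']/book[2]")
puts second_space.attributes["isbn"]   # prints: 074325631x

# Look up a title by the book's ISBN
title = REXML::XPath.first(doc, "//book[@isbn='0672328844']/title").text
puts title                             # prints: the ruby way

# Select attribute nodes directly; each result is a REXML::Attribute
isbns = REXML::XPath.match(doc, "//book/@isbn").map(&:value)
p isbns  # prints: ["0672328844", "0684835509", "074325631x"]
```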
