How to Parse XML Files on Android with the Pull Method
程序员文章站
2023-12-06 10:38:22
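Pull parsing gives the application full control over how a document is parsed. Android supports pull parsing through an API made up mainly of two types:
org.xmlpull.v1.XmlPullParser
org.xmlpull.v1.XmlPullParserFactory
XmlPullParser (an interface) is the one you mostly work with; XmlPullParserFactory is a factory used to create XmlPullParser objects.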
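The application produces events by calling XmlPullParser.next() and similar methods, and then handles each event. This is how pull parsing differs from push parsing: with push parsing (SAX, for example), the parser generates events on its own and calls back into the application, whereas with pull parsing an event is produced only when the application calls a parser method.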
Suppose the XML contains the element <author country="United States">James Elliott</author>. Here author is the tag, country is an attribute, and "James Elliott" is the text.
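As a minimal sketch of how those three parts map onto XmlPullParser calls (the class and method names are hypothetical; the code runs on Android as-is, while a desktop JVM needs an XmlPullParser implementation such as kXML2 or XPP3 on the classpath):

```java
import java.io.IOException;
import java.io.StringReader;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class AuthorParts {
    /** Returns "tag|attribute|text" for one element such as <author country="...">...</author>. */
    static String describe(String xml) throws XmlPullParserException, IOException {
        // Factory defaults: namespace processing is off.
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));
        parser.nextTag();                                           // START_TAG <author>
        String tag = parser.getName();                              // the tag: "author"
        String country = parser.getAttributeValue(null, "country"); // the attribute value
        String text = parser.nextText();                            // the text; now on END_TAG
        return tag + "|" + country + "|" + text;
    }
}
```

With the sample element this yields author|United States|James Elliott.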
To parse a document, first create an XmlPullParser object:
final XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
factory.setNamespaceAware(true);
final XmlPullParser parser = factory.newPullParser();
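To illustrate what setNamespaceAware(true) changes, here is a small sketch (hypothetical class name and sample element; Android has the parser built in, a desktop JVM needs a library such as kXML2) that reads a prefixed element with and without namespace processing:

```java
import java.io.IOException;
import java.io.StringReader;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class NamespaceDemo {
    /** Returns the root element's name, plus its namespace URI when namespace-aware. */
    static String rootName(String xml, boolean namespaceAware)
            throws XmlPullParserException, IOException {
        XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
        factory.setNamespaceAware(namespaceAware);
        XmlPullParser parser = factory.newPullParser();
        parser.setInput(new StringReader(xml));
        parser.nextTag(); // START_TAG of the root element
        return namespaceAware
                ? parser.getName() + " in " + parser.getNamespace() // local name + URI
                : parser.getName();                                  // raw name, prefix kept
    }
}
```

For <a:item xmlns:a="urn:example"/>, the namespace-aware parser reports the local name item in urn:example, while the default parser reports the raw name a:item.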
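Pull parsing is a walk through the document: every call to next(), nextTag(), nextToken(), or nextText() advances through the document and leaves the parser positioned on some event. The parser can only move forward, never back.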
Then hand the document to the parser:
parser.setInput(new StringReader("<author country=\"United States\">James Elliott</author>"));
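XmlPullParser accepts input through two overloads: setInput(Reader) for character data and setInput(InputStream, String) for bytes plus an encoding name, where null asks the parser to detect the encoding; there is no single-argument InputStream version. A sketch of both (hypothetical class name; a desktop JVM needs an implementation such as kXML2):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.StringReader;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class InputDemo {
    /** Character input: the text is already decoded. */
    static String rootFromReader(String xml) throws XmlPullParserException, IOException {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));
        parser.nextTag(); // root START_TAG
        return parser.getName();
    }

    /** Byte input: pass an encoding name, or null to let the parser detect it. */
    static String rootFromStream(byte[] bytes, String encoding)
            throws XmlPullParserException, IOException {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new ByteArrayInputStream(bytes), encoding);
        parser.nextTag(); // root START_TAG
        return parser.getName();
    }
}
```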
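At this point the document has just been initialized, so the parser sits at the start of the document and the current event is START_DOCUMENT, which you can read with XmlPullParser.getEventType(). Calling next() then produces
START_TAG, which tells the application that a tag has started; getName() returns "author". Calling next() again produces
TEXT; getText() returns "James Elliott". Calling next() again produces
END_TAG, which tells you that a tag has been fully processed. Calling next() once more produces
END_DOCUMENT, which tells you that the whole document has been processed.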
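The sequence above can be checked with a short sketch that records every event the application pulls with next() (hypothetical class name; Android has the parser built in, a desktop JVM needs a library such as kXML2). XmlPullParser.TYPES maps each event constant to its name:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class EventTrace {
    static List<String> trace(String xml) throws XmlPullParserException, IOException {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));
        List<String> events = new ArrayList<>();
        int eventType = parser.getEventType(); // START_DOCUMENT right after setInput()
        while (true) {
            String entry = XmlPullParser.TYPES[eventType];
            if (eventType == XmlPullParser.START_TAG || eventType == XmlPullParser.END_TAG) {
                entry += " " + parser.getName();
            } else if (eventType == XmlPullParser.TEXT) {
                entry += " " + parser.getText();
            }
            events.add(entry);
            if (eventType == XmlPullParser.END_DOCUMENT) {
                break;
            }
            eventType = parser.next(); // the application pulls the next event
        }
        return events;
    }
}
```

For the sample element this returns START_DOCUMENT, START_TAG author, TEXT James Elliott, END_TAG author, END_DOCUMENT.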
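Besides next(), you can also use nextToken(), which reports more detailed events such as COMMENT, CDSECT, DOCDECL, and ENTITY_REF. If the program needs this lower-level information, it can drive the parser with nextToken() and handle the detailed events itself. Note that a TEXT event may contain nothing but whitespace, such as newlines or spaces.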
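Both points can be seen in a small sketch (hypothetical class and method names; Android has the parser built in, a desktop JVM needs a library such as kXML2): next() skips a comment that nextToken() reports, and next() reports the indentation between tags as whitespace-only TEXT events, detectable with isWhitespace():

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class TokenDemo {
    private static XmlPullParser newParser(String xml) throws XmlPullParserException {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));
        return parser;
    }

    /** Names of the events produced by next() or, if useNextToken, by nextToken(). */
    static List<String> types(String xml, boolean useNextToken)
            throws XmlPullParserException, IOException {
        XmlPullParser parser = newParser(xml);
        List<String> types = new ArrayList<>();
        int eventType = parser.getEventType();
        while (eventType != XmlPullParser.END_DOCUMENT) {
            types.add(XmlPullParser.TYPES[eventType]);
            eventType = useNextToken ? parser.nextToken() : parser.next();
        }
        return types;
    }

    /** Counts the TEXT events from next() that contain only whitespace. */
    static int whitespaceTextCount(String xml) throws XmlPullParserException, IOException {
        XmlPullParser parser = newParser(xml);
        int count = 0;
        for (int e = parser.next(); e != XmlPullParser.END_DOCUMENT; e = parser.next()) {
            if (e == XmlPullParser.TEXT && parser.isWhitespace()) {
                count++; // e.g. the newline and indentation between two tags
            }
        }
        return count;
    }
}
```

For "<list>\n  <item>x</item>\n</list>", whitespaceTextCount() finds two such events: the text after <list> and the text after </item>.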
There are also two very handy methods, nextTag() and nextText().
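nextTag(): it first skips whitespace, so when you know the next event will be a START_TAG or END_TAG you can call nextTag() to jump straight to it (if anything other than whitespace comes before the next tag, it throws an XmlPullParserException). It is typically used in two places: on a START_TAG whose element you know has child elements, nextTag() produces the first child's START_TAG; on an END_TAG when you know the document has not ended, nextTag() moves to the next tag, typically the next sibling's START_TAG. In both cases, next() would first report a TEXT event whose content is just a newline or spaces.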
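A minimal sketch of nextTag() walking an indented parent whose children are assumed to be empty elements (hypothetical class name; Android has the parser built in, a desktop JVM needs a library such as kXML2). None of the whitespace between the tags shows up:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class NextTagDemo {
    /** Names of the root's children; assumes every child is an empty element like <title/>. */
    static List<String> emptyChildNames(String xml) throws XmlPullParserException, IOException {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));
        parser.nextTag(); // START_TAG of the root
        List<String> names = new ArrayList<>();
        // Each nextTag() skips the whitespace-only TEXT that next() would report.
        while (parser.nextTag() == XmlPullParser.START_TAG) {
            names.add(parser.getName());
            parser.nextTag(); // END_TAG of the empty child
        }
        return names; // the parser is now on the root's END_TAG
    }
}
```

A child containing text would make the second nextTag() throw; such children are what nextText() is for.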
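nextText(): it may only be called on a START_TAG. If the next event is TEXT, the text is returned; if the next event is END_TAG, meaning the element is empty, an empty string is returned. When the method returns, the parser is positioned on the END_TAG. For example: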
<author>James Elliott</author>
<author></author>
<author/>
Calling nextText() on the START_TAG of each of these returns, in order:
"James Elliott"
""(empty)
""(empty)
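Those three results can be reproduced with a short sketch (hypothetical class name; Android has the parser built in, a desktop JVM needs a library such as kXML2) that also checks where nextText() leaves the parser:

```java
import java.io.IOException;
import java.io.StringReader;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class NextTextDemo {
    /** Returns the text of a single text-only element. */
    static String textOf(String xml) throws XmlPullParserException, IOException {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));
        parser.nextTag();                   // nextText() is only legal on a START_TAG
        String text = parser.nextText();
        if (parser.getEventType() != XmlPullParser.END_TAG) {
            throw new IllegalStateException("nextText() should leave the parser on END_TAG");
        }
        return text;
    }
}
```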
This method is handy for elements that have no child elements, for example:
<title>What is Hibernate</title>
<author>James Elliott</author>
<category>web</category>
These can be handled with the following code:
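// Fragment: assumes the parser is on the parent element's START_TAG
// and that TAG is the class's log tag.
int eventType = parser.nextTag(); // first child's START_TAG, or the parent's END_TAG
while (eventType != XmlPullParser.END_TAG) {
    switch (eventType) {
        case XmlPullParser.START_TAG:
            final String tag = parser.getName();
            final String content = parser.nextText(); // now on this child's END_TAG
            Log.e(TAG, tag + ": [" + content + "]");
            eventType = parser.nextTag(); // next sibling's START_TAG, or the parent's END_TAG
            break;
        default:
            break;
    }
}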
This is much more convenient than handling the same elements with next(), and far more readable.
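A self-contained sketch of the same loop, assuming the three elements are wrapped in a hypothetical <book> root (XML needs a single root) and collecting them into a map instead of logging. Because nextTag() only ever returns START_TAG or END_TAG, the switch can be dropped. The class name is hypothetical; a desktop JVM needs a parser library such as kXML2:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class FlatFields {
    /** Reads the root's text-only children into a map, in document order. */
    static Map<String, String> read(String xml) throws XmlPullParserException, IOException {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));
        parser.nextTag();                        // START_TAG of the root, e.g. <book>
        Map<String, String> fields = new LinkedHashMap<>();
        int eventType = parser.nextTag();        // first child, or the root's END_TAG
        while (eventType != XmlPullParser.END_TAG) {
            String name = parser.getName();
            fields.put(name, parser.nextText()); // leaves the parser on the child's END_TAG
            eventType = parser.nextTag();        // next sibling, or the root's END_TAG
        }
        return fields;
    }
}
```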
Finally, here is a complete example Android class that parses an RSS feed with the pull parser:
import java.io.IOException;
import java.io.InputStream;

import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

import android.util.Log;

// RssParser, FeedSettings, RssFeed and ReaderBaseException belong to the author's
// project and are not shown in this article.
public class RssPullParser extends RssParser {
    private static final String TAG = FeedSettings.GLOBAL_TAG;
    private final InputStream mInputStream;

    public RssPullParser(InputStream is) {
        mInputStream = is;
    }

    public void parse() throws ReaderBaseException, XmlPullParserException, IOException {
        if (mInputStream == null) {
            throw new ReaderBaseException("no input source, did you initialize this class correctly?");
        }
        try {
            final XmlPullParserFactory factory = XmlPullParserFactory.newInstance();
            factory.setNamespaceAware(true);
            final XmlPullParser parser = factory.newPullParser();
            // There is no setInput(InputStream); a null encoding lets the parser detect it.
            parser.setInput(mInputStream, null);
            int eventType = parser.getEventType();
            if (eventType != XmlPullParser.START_DOCUMENT) {
                throw new ReaderBaseException("not starting with 'START_DOCUMENT'");
            }
            eventType = parseRss(parser);
            if (eventType != XmlPullParser.END_DOCUMENT) {
                throw new ReaderBaseException("not ending with 'END_DOCUMENT', did you finish parsing?");
            }
        } finally {
            // Close the stream even when parsing fails.
            mInputStream.close();
        }
    }

    /**
     * Parses the XML document. The current event must be START_DOCUMENT.
     * After this call, the parser is positioned at END_DOCUMENT.
     * @param parser the parser to read from
     * @return the final event, END_DOCUMENT
     * @throws XmlPullParserException
     * @throws ReaderBaseException
     * @throws IOException
     */
    private int parseRss(XmlPullParser parser) throws XmlPullParserException, ReaderBaseException, IOException {
        int eventType = parser.getEventType();
        if (eventType != XmlPullParser.START_DOCUMENT) {
            throw new ReaderBaseException("not starting with 'START_DOCUMENT', is this a new document?");
        }
        Log.e(TAG, "starting document, are you aware of that!");
        eventType = parser.next();
        while (eventType != XmlPullParser.END_DOCUMENT) {
            switch (eventType) {
                case XmlPullParser.START_TAG: {
                    final String tagName = parser.getName();
                    Log.e(TAG, "start tag: '" + tagName + "'");
                    if (tagName.equals(RssFeed.TAG_RSS)) {
                        Log.e(TAG, "starting an RSS feed <<");
                        final int attrSize = parser.getAttributeCount();
                        for (int i = 0; i < attrSize; i++) {
                            Log.e(TAG, "attr '" + parser.getAttributeName(i) + "=" + parser.getAttributeValue(i) + "'");
                        }
                    } else if (tagName.equals(RssFeed.TAG_CHANNEL)) {
                        Log.e(TAG, "\tstarting a channel <<");
                        parseChannel(parser); // returns positioned on </channel>
                    }
                    break;
                }
                case XmlPullParser.END_TAG: {
                    final String tagName = parser.getName();
                    Log.e(TAG, "end tag: '" + tagName + "'");
                    if (tagName.equals(RssFeed.TAG_RSS)) {
                        Log.e(TAG, ">> ending an RSS feed");
                    } else if (tagName.equals(RssFeed.TAG_CHANNEL)) {
                        Log.e(TAG, "\t>> ending a channel");
                    }
                    break;
                }
                default:
                    break;
            }
            eventType = parser.next();
        }
        Log.e(TAG, "end of document, it is over");
        return parser.getEventType();
    }

    /**
     * Parses a channel. The current event must be the START_TAG of a channel,
     * otherwise an exception is thrown.
     * After this call, the parser is positioned at the END_TAG of the channel.
     * @param parser the parser to read from
     * @return the END_TAG event of the channel
     * @throws XmlPullParserException
     * @throws ReaderBaseException
     * @throws IOException
     */
    private int parseChannel(XmlPullParser parser) throws XmlPullParserException, ReaderBaseException, IOException {
        int eventType = parser.getEventType();
        final String tagName = parser.getName();
        if (eventType != XmlPullParser.START_TAG || !RssFeed.TAG_CHANNEL.equals(tagName)) {
            throw new ReaderBaseException("not starting with 'START_TAG', is this the start of a channel?");
        }
        Log.e(TAG, "\tstarting " + tagName);
        eventType = parser.nextTag();
        while (eventType != XmlPullParser.END_TAG) {
            switch (eventType) {
                case XmlPullParser.START_TAG: {
                    final String name = parser.getName();
                    if (name.equals(RssFeed.TAG_IMAGE)) {
                        parseImage(parser);
                    } else if (name.equals(RssFeed.TAG_ITEM)) {
                        parseItem(parser);
                    } else {
                        final String content = parser.nextText();
                        Log.e(TAG, name + ": [" + content + "]");
                    }
                    // Now it should be at END_TAG; make sure of it.
                    if (parser.getEventType() != XmlPullParser.END_TAG) {
                        throw new ReaderBaseException("not ending with 'END_TAG', did you finish parsing the sub item?");
                    }
                    eventType = parser.nextTag();
                    break;
                }
                default:
                    break;
            }
        }
        Log.e(TAG, "\tending " + parser.getName());
        return parser.getEventType();
    }

    /**
     * Parses the image in a channel.
     * Precondition: the parser is on a START_TAG whose name is 'image'.
     * Postcondition: the parser is on the END_TAG '/image'.
     * @throws IOException
     * @throws XmlPullParserException
     * @throws ReaderBaseException
     */
    private int parseImage(XmlPullParser parser) throws XmlPullParserException, IOException, ReaderBaseException {
        int eventType = parser.getEventType();
        String name = parser.getName();
        if (eventType != XmlPullParser.START_TAG || !RssFeed.TAG_IMAGE.equals(name)) {
            throw new ReaderBaseException("not starting with 'START_TAG', is this the start of an image?");
        }
        Log.e(TAG, "\t\tstarting image " + name);
        eventType = parser.nextTag();
        while (eventType != XmlPullParser.END_TAG) {
            switch (eventType) {
                case XmlPullParser.START_TAG:
                    name = parser.getName();
                    Log.e(TAG, name + ": [" + parser.nextText() + "]");
                    // Now it should be at END_TAG; make sure of it.
                    if (parser.getEventType() != XmlPullParser.END_TAG) {
                        throw new ReaderBaseException("not ending with 'END_TAG', did you finish parsing the sub item?");
                    }
                    eventType = parser.nextTag();
                    break;
                default:
                    break;
            }
        }
        Log.e(TAG, "\t\tending image " + parser.getName());
        return parser.getEventType();
    }

    /**
     * Parses an item in a channel.
     * Precondition: the parser is on a START_TAG whose name is 'item'.
     * Postcondition: the parser is on the END_TAG '/item'.
     * @throws IOException
     * @throws XmlPullParserException
     * @throws ReaderBaseException
     */
    private int parseItem(XmlPullParser parser) throws XmlPullParserException, IOException, ReaderBaseException {
        int eventType = parser.getEventType();
        String name = parser.getName();
        if (eventType != XmlPullParser.START_TAG || !RssFeed.TAG_ITEM.equals(name)) {
            throw new ReaderBaseException("not starting with 'START_TAG', is this the start of an item?");
        }
        Log.e(TAG, "\t\tstarting " + name);
        eventType = parser.nextTag();
        while (eventType != XmlPullParser.END_TAG) {
            switch (eventType) {
                case XmlPullParser.START_TAG:
                    name = parser.getName();
                    final String content = parser.nextText();
                    Log.e(TAG, name + ": [" + content + "]");
                    // Now it should be at END_TAG; make sure of it.
                    if (parser.getEventType() != XmlPullParser.END_TAG) {
                        throw new ReaderBaseException("not ending with 'END_TAG', did you finish parsing the sub item?");
                    }
                    eventType = parser.nextTag();
                    break;
                default:
                    break;
            }
        }
        Log.e(TAG, "\t\tending " + parser.getName());
        return parser.getEventType();
    }
}
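The class above depends on RssParser, FeedSettings, RssFeed and ReaderBaseException, which the article does not show, so it cannot run on its own. The sketch below keeps the same idea (one method per element, each entered on its START_TAG and returning on its END_TAG) in a self-contained form that collects item titles. The class name, the literal tag names, and the depth-counting skip() helper for uninteresting elements are additions of this sketch rather than part of the original program; XmlPullParser.require() replaces the hand-written start/end checks. A desktop JVM needs a parser library such as kXML2:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import org.xmlpull.v1.XmlPullParser;
import org.xmlpull.v1.XmlPullParserException;
import org.xmlpull.v1.XmlPullParserFactory;

class MiniRssReader {
    /** Returns the <title> of every <item> under rss/channel. */
    static List<String> itemTitles(String xml) throws XmlPullParserException, IOException {
        XmlPullParser parser = XmlPullParserFactory.newInstance().newPullParser();
        parser.setInput(new StringReader(xml));
        List<String> titles = new ArrayList<>();
        parser.nextTag();
        parser.require(XmlPullParser.START_TAG, null, "rss");
        while (parser.nextTag() == XmlPullParser.START_TAG) { // children of <rss>
            if ("channel".equals(parser.getName())) {
                readChannel(parser, titles);
            } else {
                skip(parser);
            }
        }
        parser.require(XmlPullParser.END_TAG, null, "rss");
        return titles;
    }

    // Pre: on START_TAG <channel>. Post: on END_TAG </channel>.
    private static void readChannel(XmlPullParser parser, List<String> titles)
            throws XmlPullParserException, IOException {
        while (parser.nextTag() == XmlPullParser.START_TAG) {
            if ("item".equals(parser.getName())) {
                titles.add(readItemTitle(parser));
            } else {
                skip(parser); // e.g. the channel's own <title> or an <image> block
            }
        }
        parser.require(XmlPullParser.END_TAG, null, "channel");
    }

    // Pre: on START_TAG <item>. Post: on END_TAG </item>. Children assumed text-only.
    private static String readItemTitle(XmlPullParser parser)
            throws XmlPullParserException, IOException {
        String title = null;
        while (parser.nextTag() == XmlPullParser.START_TAG) {
            String name = parser.getName();
            String text = parser.nextText(); // leaves the parser on this child's END_TAG
            if ("title".equals(name)) {
                title = text;
            }
        }
        parser.require(XmlPullParser.END_TAG, null, "item");
        return title;
    }

    // Skips the current element and everything inside it. Pre: START_TAG. Post: its END_TAG.
    private static void skip(XmlPullParser parser) throws XmlPullParserException, IOException {
        int depth = 1;
        while (depth != 0) {
            switch (parser.next()) {
                case XmlPullParser.START_TAG:
                    depth++;
                    break;
                case XmlPullParser.END_TAG:
                    depth--;
                    break;
                default:
                    break;
            }
        }
    }
}
```

On Android, the same itemTitles() logic could be fed from a network stream by switching to setInput(stream, null).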