Windows Phone 7: Parsing HTML Data

程序员文章站 (Programmer Article Station), 2022-04-30 20:21:01

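In my previous article I covered GB2312 decoding on Windows Phone 7:

/kf/201111/112551.html

That fixed the garbled text in downloaded HTML. In this article I'll show how to parse HTML data on Windows Phone 7 so that we can extract the data we want.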

First, let me introduce a library, HtmlAgilityPack (the previous article also used it for decoding). The library's DLL file is included with the demo.

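I'll use Sina News as the example for parsing.

First, look at the web version of a Sina News article:

http://news.sina.com.cn/w/sd/2011-11-27/070023531646.shtml

Then look at its page source. The news content is structured like this: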

<h1 id="artibodytitle" pid="1" tid="1" did="23531646" fid="1666">title</h1>

<a href="http://www.sina.com.cn">http://www.sina.com.cn</a> pub_date <a href="">media_name</a> <a href=""></a>

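Most of these elements also carry an id attribute (the code below relies on artibodytitle, artibody, art_source, pub_date, and media_name), which makes them much easier to parse.

Now let's start parsing.

Step 1: Add a reference to HtmlAgilityPack.dll.

Step 2: Use the WebClient or WebRequest class to download the HTML page and turn it into a string.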

    // Requires: using System; using System.IO; using System.Net;
    //           using System.Threading; using System.Windows;

    public delegate void CallbackEvent(object sender, DownloadEventArgs e);
    public event CallbackEvent DownloadCallbackEvent;

    public void HttpWebRequestDownloadGet(string url)
    {
        Thread _thread = new Thread(delegate()
        {
            Uri _uri = new Uri(url, UriKind.RelativeOrAbsolute);
            HttpWebRequest _httpWebRequest = (HttpWebRequest)WebRequest.Create(_uri);
            _httpWebRequest.Method = "GET";
            _httpWebRequest.BeginGetResponse(new AsyncCallback(delegate(IAsyncResult result)
            {
                HttpWebRequest _httpWebRequestCallback = (HttpWebRequest)result.AsyncState;
                HttpWebResponse _httpWebResponseCallback = (HttpWebResponse)_httpWebRequestCallback.EndGetResponse(result);
                Stream _streamCallback = _httpWebResponseCallback.GetResponseStream();
                // Decode the response body as GB2312
                StreamReader _streamReader = new StreamReader(_streamCallback, new HtmlAgilityPack.Gb2312Encoding());
                string _stringCallback = _streamReader.ReadToEnd();
                // Raise the event on the UI thread
                Deployment.Current.Dispatcher.BeginInvoke(new Action(() =>
                {
                    if (DownloadCallbackEvent != null)
                    {
                        DownloadEventArgs _downloadEventArgs = new DownloadEventArgs();
                        _downloadEventArgs._downloadStream = _streamCallback;
                        _downloadEventArgs._downloadString = _stringCallback;
                        DownloadCallbackEvent(this, _downloadEventArgs);
                    }
                }));
            }), _httpWebRequest);
        });
        _thread.Start();
    }

o(∩_∩)o! My version is fairly involved. The point is simply that we end up with the downloaded HTML data.
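The download code above passes a DownloadEventArgs object to its subscribers, but the article never shows that class. A minimal sketch of what it might look like follows. It is a hypothetical reconstruction: the field names are taken from how the download code uses them, and the demo's real class may differ.

```csharp
using System;
using System.IO;

// Hypothetical reconstruction: the article does not show this class.
// The two public fields mirror the names used by the download code.
public class DownloadEventArgs : EventArgs
{
    // The response stream. The download code has already read it to the
    // end by the time the event fires, so treat it as consumed.
    public Stream _downloadStream;

    // The response body decoded to text (GB2312 in the article's case).
    public string _downloadString;
}
```

Subscribers would normally only need _downloadString, since that is what gets handed to the HTML parser.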

Here is a simpler way to download the page:

    WebClient webClenet = new WebClient();
    webClenet.Encoding = new HtmlAgilityPack.Gb2312Encoding(); // add this line to set the encoding
    // Subscribe to the completion event before starting the download
    webClenet.DownloadStringCompleted += new DownloadStringCompletedEventHandler(webClenet_DownloadStringCompleted);
    webClenet.DownloadStringAsync(new Uri("http://news.sina.com.cn/s/2011-11-25/120923524756.shtml", UriKind.RelativeOrAbsolute));

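Now process the downloaded HTML in the callback. With WebClient this is e.Result; the code below reads e._downloadString from the custom download event shown earlier.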

    string _result = e._downloadString;
    HtmlDocument _doc = new HtmlDocument(); // create the HtmlAgilityPack.HtmlDocument object
    _doc.LoadHtml(_result); // load the HTML

    HtmlNode _htmlNode01 = _doc.GetElementbyId("artibodytitle"); // the news title (the h1)
    string _title = _htmlNode01.InnerText;

    HtmlNode _htmlNode02 = _doc.GetElementbyId("artibody"); // the article body
    string _content = _htmlNode02.InnerText;
    // int _count = _htmlNode02.ChildNodes.Where(new Func<HtmlNode, bool>("p"));

    // Cut the text off where the comment-block styles start.
    // IndexOf returns -1 when the marker is missing, so guard before calling Substring.
    int _pIndex = _content.IndexOf(" .blkcomment", StringComparison.OrdinalIgnoreCase);
    if (_pIndex >= 0)
    {
        _content = _content.Substring(0, _pIndex);
    }

    #region Sina tag
    HtmlNode _htmlNodo03 = _doc.GetElementbyId("art_source");
    string _www = _htmlNodo03.FirstChild.InnerText;
    string _wwwInt = _htmlNodo03.FirstChild.Attributes[0].Value;
    #endregion

    // string _source = _htmlNodo03;
    // _htmlNodo03.ChildNodes

    #region Publication time
    HtmlNode _htmlNodo04 = _doc.GetElementbyId("pub_date");
    string _pub_date = _htmlNodo04.InnerText;
    #endregion

    #region Source site information
    HtmlNode _htmlNodo05 = _doc.GetElementbyId("media_name");
    string _media_name = _htmlNodo05.FirstChild.InnerText;
    string _modia_source = _htmlNodo05.FirstChild.Attributes[0].Value;
    #endregion

    // Bind the extracted values to the page controls
    media_nameHyperlinkButton.Content = _pub_date + " " + _media_name;
    media_nameHyperlinkButton.NavigateUri = new Uri(_modia_source, UriKind.RelativeOrAbsolute);
    titleTextBlock.Text = _title;
    contentTextBlock.Text = _content;

The result is shown in the screenshot below:

[Screenshot: Windows Phone 7 parsing HTML data]

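Most tags on a web page have no id attribute, but fortunately HtmlAgilityPack supports XPath.

In those cases you locate the nodes you need with XPath expressions.

XPath tutorial: http://www.w3school.com.cn/xpath/index.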
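As a rough illustration of the XPath approach, the sketch below selects the same title element by XPath instead of by id, and collects the paragraphs inside the article body. It uses HtmlAgilityPack's SelectSingleNode and SelectNodes methods from the desktop build of the library. Whether the Windows Phone 7 DLL shipped with the demo exposes them is an assumption to verify. The class and method names (XPathSketch, ExtractTitle, ExtractParagraphs) are made up for this example, and the assumption that artibody contains p elements is not confirmed by the article.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

// Hypothetical helper class, not part of the article's demo.
public static class XPathSketch
{
    // Returns the text of the first <h1 id="artibodytitle">, or null if there is none.
    public static string ExtractTitle(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//h1[@id='artibodytitle']");
        return node == null ? null : node.InnerText;
    }

    // Returns the trimmed text of every <p> below the element with id="artibody".
    public static List<string> ExtractParagraphs(string html)
    {
        HtmlDocument doc = new HtmlDocument();
        doc.LoadHtml(html);
        List<string> paragraphs = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = doc.DocumentNode.SelectNodes("//*[@id='artibody']//p");
        if (nodes != null)
        {
            foreach (HtmlNode p in nodes)
            {
                paragraphs.Add(p.InnerText.Trim());
            }
        }
        return paragraphs;
    }
}
```

The same pattern works for elements without an id: replace the predicate with one on a class attribute or on position, for example //div[@class='...']//a, using whatever structure the target page actually has.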

Demo download:

http://115.com/file/dn87dl2d#

myframework_test.zip

Author: 青瓷
