
Parsing HTML Data on Windows Phone 7

程序员文章站 2022-04-30 20:21:01

 

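In my previous article (/kf/201111/112551.html) I covered GB2312 decoding on Windows Phone 7, which fixed the garbled text in downloaded HTML. This article covers parsing that HTML on Windows Phone 7 so that we can extract the data we want.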

 

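First, let me introduce a library: HtmlAgilityPack (the previous article also used it for decoding). The library's DLL is included with the demo download.

I'll use a Sina News article as the example. Here is the web version of the page:

http://news.sina.com.cn/w/sd/2011-11-27/070023531646.shtml

Looking at the page source, the article content is structured like this: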

 

    <p class="blkcontainersblk">
        <h1 id="artibodytitle" pid="1" tid="1" did="23531646" fid="1666">title</h1>
        <p class="artinfo"><span id="art_source"><a href="http://www.sina.com.cn">http://www.sina.com.cn</a></span>  <span id="pub_date">pub_date</span>  <span id="media_name"><a href="">media_name</a> <a href=""></a> </span></p>

        <!-- 正文内容begin -->
        <!-- google_ad_section_start -->

        <p class="blkcontainersblkcon" id="artibody"></p>
    </p>

 

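Most of these elements have an id attribute, which makes them easy to parse.

Now let's start parsing.

Step 1: Add a reference to HtmlAgilityPack.dll.

Step 2: Download the HTML page with the WebClient or WebRequest class and turn it into a string.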

 

    using System;
    using System.IO;
    using System.Net;
    using System.Threading;
    using System.Windows;

    // downloadeventargs is a custom event-args class (not a framework type);
    // it carries the response stream and the decoded HTML string.
    public delegate void callbackevent(object sender, downloadeventargs e);

    public event callbackevent downloadcallbackevent;

    public void httpwebrequestdownloadget(string url)
    {
        Thread _thread = new Thread(delegate()
        {
            Uri _uri = new Uri(url, UriKind.RelativeOrAbsolute);
            HttpWebRequest _httpwebrequest = (HttpWebRequest)WebRequest.Create(_uri);
            _httpwebrequest.Method = "GET";

            _httpwebrequest.BeginGetResponse(new AsyncCallback(delegate(IAsyncResult result)
            {
                HttpWebRequest _httpwebrequestcallback = (HttpWebRequest)result.AsyncState;
                HttpWebResponse _httpwebresponsecallback = (HttpWebResponse)_httpwebrequestcallback.EndGetResponse(result);
                Stream _streamcallback = _httpwebresponsecallback.GetResponseStream();

                // Decode the response as GB2312 (see the previous article).
                StreamReader _streamreader = new StreamReader(_streamcallback, new HtmlAgilityPack.Gb2312Encoding());
                string _stringcallback = _streamreader.ReadToEnd();

                // Raise the event on the UI thread.
                Deployment.Current.Dispatcher.BeginInvoke(new Action(() =>
                {
                    if (downloadcallbackevent != null)
                    {
                        downloadeventargs _downloadeventargs = new downloadeventargs();
                        _downloadeventargs._downloadstream = _streamcallback;
                        _downloadeventargs._downloadstring = _stringcallback;
                        downloadcallbackevent(this, _downloadeventargs);
                    }
                }));
            }), _httpwebrequest);
        });

        _thread.Start();
    }

 

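My version is on the complicated side, I admit :) All that matters is that we end up with the downloaded HTML.

Here is a simpler way to download it: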

 

    WebClient webclenet = new WebClient();

    webclenet.Encoding = new HtmlAgilityPack.Gb2312Encoding(); // add this line to set the encoding

    // Subscribe before starting the download so the completion event cannot be missed.
    webclenet.DownloadStringCompleted += new DownloadStringCompletedEventHandler(webclenet_downloadstringcompleted);

    webclenet.DownloadStringAsync(new Uri("http://news.sina.com.cn/s/2011-11-25/120923524756.shtml", UriKind.RelativeOrAbsolute));
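The article does not show the completion handler that the WebClient snippet subscribes. A minimal sketch of what it might look like follows. The class name NewsPageDownloader and the HtmlReady event are hypothetical names introduced only for illustration; the handler name is the one used in the subscription.

```csharp
using System;
using System.Net;

public class NewsPageDownloader
{
    // Hypothetical event: raised with the decoded HTML after a successful download.
    public event Action<string> HtmlReady;

    // Matches the DownloadStringCompletedEventHandler signature, so it can be
    // subscribed to WebClient.DownloadStringCompleted.
    public void webclenet_downloadstringcompleted(object sender, DownloadStringCompletedEventArgs e)
    {
        // Reading e.Result throws when the request failed or was cancelled,
        // so check those cases first.
        if (e.Cancelled || e.Error != null)
        {
            return;
        }

        Action<string> handler = HtmlReady;
        if (handler != null)
        {
            handler(e.Result);
        }
    }
}
```

Checking e.Cancelled and e.Error before touching e.Result keeps a network failure from surfacing as an exception inside the handler.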

 

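Now process the downloaded HTML in the callback (e.Result with WebClient, or e._downloadstring with my own event above):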

 

    string _result = e._downloadstring;

    HtmlDocument _doc = new HtmlDocument(); // create an HtmlAgilityPack.HtmlDocument
    _doc.LoadHtml(_result);                 // load the HTML

    HtmlNode _htmlnode01 = _doc.GetElementbyId("artibodytitle"); // the headline element
    string _title = _htmlnode01.InnerText;

    HtmlNode _htmlnode02 = _doc.GetElementbyId("artibody");      // the article body container
    string _content = _htmlnode02.InnerText;

    // Drop everything from the " .blkcomment" marker onward. IndexOf returns -1
    // when the marker is absent, which would make Substring throw.
    int _pindex = _content.IndexOf(" .blkcomment");
    if (_pindex >= 0)
    {
        _content = _content.Substring(0, _pindex);
    }

    #region Sina source link
    HtmlNode _htmlnodo03 = _doc.GetElementbyId("art_source");
    string _www = _htmlnodo03.FirstChild.InnerText;
    string _wwwint = _htmlnodo03.FirstChild.Attributes[0].Value; // first attribute of the <a>, i.e. its href
    #endregion

    #region Publication time
    HtmlNode _htmlnodo04 = _doc.GetElementbyId("pub_date");
    string _pub_date = _htmlnodo04.InnerText;
    #endregion

    #region Source media information
    HtmlNode _htmlnodo05 = _doc.GetElementbyId("media_name");
    string _media_name = _htmlnodo05.FirstChild.InnerText;
    string _modia_source = _htmlnodo05.FirstChild.Attributes[0].Value;
    #endregion

    // Show the parsed values in the page's controls.
    media_namehyperlinkbutton.Content = _pub_date + " " + _media_name;
    media_namehyperlinkbutton.NavigateUri = new Uri(_modia_source, UriKind.RelativeOrAbsolute);
    titletextblock.Text = _title;
    contenttextblock.Text = _content;
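The parsing code above assumes that every id is present in the page. GetElementbyId returns null when no element has the requested id, so a page with a slightly different layout would crash on the next property access. One possible guard is sketched here. The class name NodeText and the method ById are hypothetical helpers, not part of HtmlAgilityPack.

```csharp
using HtmlAgilityPack;

public static class NodeText
{
    // Hypothetical helper: returns the trimmed inner text of the element
    // with the given id, or the fallback when no such element exists.
    public static string ById(HtmlDocument doc, string id, string fallback)
    {
        HtmlNode node = doc.GetElementbyId(id);
        return node == null ? fallback : node.InnerText.Trim();
    }
}
```

With it, a lookup such as NodeText.ById(_doc, "pub_date", "") yields an empty string instead of throwing when the span is missing.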

 

 

 

The result is shown in the figure below:

[Screenshot: Parsing HTML Data on Windows Phone 7]

 

 

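On most web pages, the majority of tags have no id attribute. Fortunately, HtmlAgilityPack supports XPath, so we can use XPath expressions to find and match the nodes we need.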
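As a rough illustration of an XPath query, the sketch below collects the href of every link nested inside an element with a given class. It assumes the HtmlAgilityPack build you reference exposes SelectNodes (builds for some restricted platforms may not), and that className contains no single quote. The names XPathLinks and HrefsInClass are hypothetical.

```csharp
using System.Collections.Generic;
using HtmlAgilityPack;

public static class XPathLinks
{
    // Returns the href of every <a> that has an href attribute and sits
    // inside an element whose class attribute equals className exactly.
    public static List<string> HrefsInClass(string html, string className)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var hrefs = new List<string>();
        string xpath = "//*[@class='" + className + "']//a[@href]";

        // SelectNodes can return null rather than an empty collection
        // when nothing matches, so guard before iterating.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes(xpath);
        if (links == null)
        {
            return hrefs;
        }

        foreach (HtmlNode link in links)
        {
            hrefs.Add(link.GetAttributeValue("href", ""));
        }
        return hrefs;
    }
}
```

For the Sina markup shown earlier, calling HrefsInClass(html, "artinfo") would be one way to reach the source links without relying on their ids.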

 

XPath tutorial: http://www.w3school.com.cn/xpath/index.

 

 

 

Demo download:

 

http://115.com/file/dn87dl2d#

myframework_test.zip

 

Author: 青瓷