Extracting URLs, Titles, Images, and More with Regular Expressions: An Example (.NET, ASP, JavaScript/JS)
程序员文章站
2023-08-26 20:39:56
For scraping and filtering tasks, regular expressions have a clear advantage.
For example, take the following string:
<li><a href="http://www.abcxyz.com/something/article/143.htm" title="fckeditor高亮代码插件测试"><span class="article-date">[09/11]</span>fckeditor高亮代码插件测试</a></li>
The goal is to extract the URL after href, the date inside the square brackets, and the link text.
Implementations in C#, ASP, and JavaScript follow.
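All three implementations use the same pattern, so it may help to see it split into pieces first. The following is a minimal sketch in JavaScript; the `pieces` array and the names `html`, `linkPattern`, and `m` are introduced here purely for illustration and are not part of the original code.

```javascript
// Sample input from the article.
const html = '<li><a href="http://www.abcxyz.com/something/article/143.htm" title="fckeditor高亮代码插件测试"><span class="article-date">[09/11]</span>fckeditor高亮代码插件测试</a></li>';

// The article's pattern, split into commented pieces (illustrative only).
const pieces = [
  'http://([^\\s]+)"', // group 1: non-whitespace run after "http://", up to a closing quote
  '.+?span',           // skip lazily to the word "span"
  '.+?\\[(.+?)\\]',    // group 2: the text inside [ ]
  '.+?>',              // skip lazily to the ">" that ends </span>
  '(.+?)<',            // group 3: the link text, up to the next "<"
];
const linkPattern = new RegExp(pieces.join(''), 'i');

const m = linkPattern.exec(html);
console.log(m[1]); // www.abcxyz.com/something/article/143.htm
console.log(m[2]); // 09/11
console.log(m[3]); // fckeditor高亮代码插件测试
```

Note that group 1 starts after `http://`, so the captured URL does not include the scheme.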
C# implementation
using System;
using System.Text.RegularExpressions;

string strHtml = "<li><a href=\"http://www.abcxyz.com/something/article/143.htm\" title=\"fckeditor高亮代码插件测试\"><span class=\"article-date\">[09/11]</span>fckeditor高亮代码插件测试</a></li>";
// Group 1: URL after "http://"; group 2: date inside []; group 3: link text.
string pattern = "http://([^\\s]+)\".+?span.+?\\[(.+?)\\].+?>(.+?)<";
Regex reg = new Regex(pattern, RegexOptions.IgnoreCase);
MatchCollection mc = reg.Matches(strHtml);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        Console.WriteLine(m.Groups[1].Value); // URL (without the http:// prefix)
        Console.WriteLine(m.Groups[2].Value); // date
        Console.WriteLine(m.Groups[3].Value); // link text
    }
}
ASP implementation
<%
Dim str, reg, objMatches
str = "<li><a href=""http://localhost/z-blog18/article/143.htm"" title=""fckeditor高亮代码插件测试""><span class=""article-date"">[09/11]</span>fckeditor高亮代码插件测试</a></li>"
Set reg = New RegExp
reg.IgnoreCase = True
reg.Global = True
' SubMatches(0): URL after "http://"; SubMatches(1): date inside []; SubMatches(2): link text.
reg.Pattern = "http://([^\s]+)"".+?span.+?\[(.+?)\].+?>(.+?)<"
Set objMatches = reg.Execute(str)
If objMatches.Count > 0 Then
    Response.Write("URL: ")
    Response.Write(objMatches(0).SubMatches(0))
    Response.Write("<br>")
    Response.Write("Date: ")
    Response.Write(objMatches(0).SubMatches(1))
    Response.Write("<br>")
    Response.Write("Title: ")
    Response.Write(objMatches(0).SubMatches(2))
End If
%>
JavaScript implementation
<script type="text/javascript">
var str = '<li><a href="http://localhost/z-blog18/article/143.htm" title="fckeditor高亮代码插件测试"><span class="article-date">[09/11]</span>fckeditor高亮代码插件测试</a></li>';
// Group 1: URL after "http://"; group 2: date inside []; group 3: link text.
var pattern = /http:\/\/([^\s]+)".+?span.+?\[(.+?)\].+?>(.+?)</gi;
var mts = pattern.exec(str);
if (mts != null)
{
    alert(mts[1]); // URL (without the http:// prefix)
    alert(mts[2]); // date
    alert(mts[3]); // link text
}
</script>
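The JavaScript pattern above carries the `g` flag, but a single `exec` call returns only the first match. When the input holds several list items, one possible approach is `String.prototype.matchAll` with named capture groups, both available in modern JavaScript engines. This is a sketch under assumptions: the second list item is made-up sample data, and the names `listHtml`, `itemPattern`, and `items` are hypothetical.

```javascript
// Two list items; the second one is made-up sample data for illustration.
const listHtml =
  '<li><a href="http://localhost/z-blog18/article/143.htm" title="fckeditor高亮代码插件测试"><span class="article-date">[09/11]</span>fckeditor高亮代码插件测试</a></li>' +
  '<li><a href="http://localhost/z-blog18/article/144.htm" title="second post"><span class="article-date">[09/12]</span>second post</a></li>';

// Same pattern as above, with named groups; matchAll requires the g flag.
const itemPattern = /http:\/\/(?<url>[^\s]+)".+?span.+?\[(?<date>.+?)\].+?>(?<text>.+?)</gi;

// Collect one { url, date, text } object per matched list item.
const items = [];
for (const match of listHtml.matchAll(itemPattern)) {
  const { url, date, text } = match.groups;
  items.push({ url, date, text });
}
console.log(items);
```

Named groups keep the reading code independent of group order, which makes an off-by-one index into the match array harder to introduce.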