An ASP regex routine that collects every image URL on a web page into an array
程序员文章站
2023-01-25 09:10:34
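This code still has bugs. The latest test page is at: http://www.reallydo.com/getimg.asp
The regex analysis page is at: http://jorkin.reallydo.com/article.asp?id=380
If you find a bug, please leave a comment below. Thanks.
1.31 update
Fixed: src= followed by a space was not matched correctly.
Fixed: an empty src='' caused an error.
Bug found: when an image path contains several consecutive spaces, only one is kept. Not yet fixed.
2.18 update
Fixed: the bug where only one of several consecutive spaces in an image path was kept.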
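The three cases named in the changelog can be written down as sample inputs. This is only a sketch: the markup strings are made up for illustration, the variable names are hypothetical, and it assumes the getimg function from this article (plus the replaceall helper it requires) is defined on the same page. It prints how many URLs were found for each case rather than asserting a particular result.

```vbscript
<%
' Sample inputs for the three cases named in the changelog.
' The markup strings are invented for illustration; getimg is the
' function from this article and must be defined on the same page.
Dim aCases(2), aFound, n
aCases(0) = "<img src= ""a.gif"">"           ' space after src=
aCases(1) = "<img src='' alt=''>"            ' empty src
aCases(2) = "<img src=""my   photo.jpg"">"   ' several spaces in the path

For n = 0 To UBound(aCases)
    aFound = getimg(aCases(n))
    ' Defensive check in case a non-array value comes back
    If IsArray(aFound) Then
        Response.Write "case " & n & ": " & (UBound(aFound) + 1) & " URL(s)<br />"
    End If
Next
%>
```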
<%
' Purpose: collect every image URL in an HTML string into an array.
' Source: http://jorkin.reallydo.com/article.asp?id=448
' Requires the replaceall function: http://jorkin.reallydo.com/article.asp?id=406
Function getimg(sstring)
    Dim sreallydo, regex, ireallydo
    Dim omatches, cmatch
    '// Define an empty array
    ireallydo = -1
    ReDim areallydo(ireallydo)
    If IsNull(sstring) Then
        '// Return the empty array so callers can always use UBound
        getimg = areallydo
        Exit Function
    End If
    '// Normalize the HTML
    '// Put every <img on its own line so the regexes work per tag
    sreallydo = sstring
    '// Errors are suppressed here, so a missing replaceall fails silently
    On Error Resume Next
    sreallydo = Replace(sreallydo, vbCr, " ")
    sreallydo = Replace(sreallydo, vbLf, " ")
    sreallydo = Replace(sreallydo, vbTab, " ")
    sreallydo = Replace(sreallydo, "<img ", vbCrLf & "<img ", 1, -1, 1)
    sreallydo = Replace(sreallydo, "/>", " />", 1, -1, 1)
    sreallydo = replaceall(sreallydo, "= ", "=", True)
    sreallydo = replaceall(sreallydo, "> ", ">", True)
    sreallydo = Replace(sreallydo, "><", ">" & vbCrLf & "<")
    sreallydo = Trim(sreallydo)
    On Error GoTo 0
    Set regex = New RegExp
    regex.IgnoreCase = True
    regex.Global = True
    '// Strip event handlers such as onclick and onload.
    '// The original class [on] matched a single "o" or "n", so it could
    '// also remove attributes such as name=; match the literal prefix "on".
    regex.Pattern = "\son\w+\s*=\s*([""'])(.*?)\1"
    sreallydo = regex.Replace(sreallydo, "")
    '// Add quotes around unquoted src values
    regex.Pattern = "<img.*?\ssrc=([^""'\s][^""'\s>]*).*?>"
    sreallydo = regex.Replace(sreallydo, "<img src=""$1"" />")
    '// Match the src value of every image
    regex.Pattern = "<img.*?\ssrc=([""'])([^""']+?)\1.*?>"
    Set omatches = regex.Execute(sreallydo)
    '// Store each URL in the array
    For Each cmatch In omatches
        ireallydo = ireallydo + 1
        ReDim Preserve areallydo(ireallydo)
        '// SubMatches is zero-based: (0) is the quote, (1) is the URL
        areallydo(ireallydo) = cmatch.SubMatches(1)
    Next
    getimg = areallydo
End Function
%>
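The listing calls a replaceall helper that is only linked, not shown. The sketch below is a hypothetical stand-in, not the author's code: its signature (source, search string, replacement, ignore-case flag) is inferred solely from calls such as replaceall(sreallydo, "= ", "=", true), and the parameter names are invented. It assumes the helper's job is to repeat the replacement until no occurrence is left, so that runs like "=   " collapse to "=".

```vbscript
<%
' Hypothetical stand-in for the replaceall helper used by getimg.
' Signature inferred from the call sites only; the original helper
' at the linked URL may behave differently.
Function replaceall(sSource, sFind, sReplaceWith, bIgnoreCase)
    Dim nCompare, sResult
    If IsNull(sSource) Then
        replaceall = ""
        Exit Function
    End If
    If bIgnoreCase Then
        nCompare = vbTextCompare
    Else
        nCompare = vbBinaryCompare
    End If
    sResult = sSource
    ' Loop only when each pass shortens the string, which guarantees
    ' termination; otherwise fall back to a single ordinary Replace.
    If Len(sFind) = 0 Or Len(sReplaceWith) >= Len(sFind) Then
        replaceall = Replace(sResult, sFind, sReplaceWith, 1, -1, nCompare)
        Exit Function
    End If
    Do While InStr(1, sResult, sFind, nCompare) > 0
        sResult = Replace(sResult, sFind, sReplaceWith, 1, -1, nCompare)
    Loop
    replaceall = sResult
End Function
%>
```

A single Replace pass turns "=   x" into "=  x"; repeating the pass is what removes the remaining spaces.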
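A minimal usage sketch for the getimg function above. The sample markup and the variable names are made up for illustration, and both getimg and a replaceall helper are assumed to be defined on the same page.

```vbscript
<%
Dim sHtml, aImages, i
' Hypothetical sample markup with one double-quoted and one single-quoted src
sHtml = "<p><img src=""/images/a.gif"" alt=""A""><img src='/images/b.jpg' /></p>"

aImages = getimg(sHtml)

' Defensive check in case a non-array value comes back
If IsArray(aImages) Then
    ' An empty array has UBound = -1, so the loop body is simply skipped
    For i = 0 To UBound(aImages)
        Response.Write Server.HTMLEncode(aImages(i)) & "<br />"
    Next
End If
%>
```

Server.HTMLEncode is used because the URLs come from untrusted markup and are written back into a page.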