
Java Example: Reading Web Page Content and Downloading Images

程序员文章站 2024-02-28 12:55:10

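Many people do not know where to start when they first look into data scraping, and beginners in particular can feel lost. So I am sharing what I have learned here. If anything falls short, corrections are welcome. I wrote this hoping we can all improve together: I believe skills are not a matter of better or worse but of complementing one another, and that sharing is how we all grow.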

Example code:

import java.io.BufferedInputStream;
import java.io.BufferedReader;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URL;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GetContentPicture {

    // Download the image at httpUrl and save it under ./pic/,
    // using the last path segment of the URL as the file name.
    public void getHtmlPicture(String httpUrl) {
        System.out.println("Fetching image");
        String fileName = httpUrl.substring(httpUrl.lastIndexOf("/") + 1);
        File dir = new File("./pic/");
        // Create the target directory if needed; without it,
        // FileOutputStream throws FileNotFoundException.
        dir.mkdirs();

        // try-with-resources closes both streams even if the copy fails.
        try (BufferedInputStream in = new BufferedInputStream(new URL(httpUrl).openStream());
             FileOutputStream file = new FileOutputStream(new File(dir, fileName))) {
            int t;
            while ((t = in.read()) != -1) {
                file.write(t);
            }
            System.out.println("Image downloaded");
        } catch (IOException e) {
            // Also covers MalformedURLException and FileNotFoundException,
            // which are subclasses of IOException.
            e.printStackTrace();
        }
    }

    // Read the page at httpUrl and return its HTML as one string
    // (line breaks are dropped).
    public String getHtmlCode(String httpUrl) throws IOException {
        StringBuilder content = new StringBuilder();
        URL uu = new URL(httpUrl); // create the URL object
        // openStream() returns an input stream; wrap it in a BufferedReader
        // (the text is decoded with the default charset).
        try (BufferedReader ii = new BufferedReader(new InputStreamReader(uu.openStream()))) {
            String input;
            // Read line by line until the end of the stream.
            while ((input = ii.readLine()) != null) {
                content.append(input);
            }
        }
        return content.toString();
    }

    // Fetch the page at url, find image references in it, and download each one.
    public void get(String url) throws IOException {
        // Site-relative image paths, e.g. src="/img/logo.png".
        // Group 3 is the path without the leading slash.
        String searchImgReg = "(?x)(src|SRC|background|BACKGROUND)=('|\")/?(([\\w-]+/)*([\\w-]+\\.(jpg|JPG|png|PNG|gif|GIF)))('|\")";
        // Absolute http:// image URLs. Group 3 is the full URL.
        String searchImgReg2 = "(?x)(src|SRC|background|BACKGROUND)=('|\")(http://([\\w-]+\\.)+[\\w-]+(:[0-9]+)*(/[\\w-]+)*(/[\\w-]+\\.(jpg|JPG|png|PNG|gif|GIF)))('|\")";

        String content = this.getHtmlCode(url);
        System.out.println(content);

        Pattern pattern = Pattern.compile(searchImgReg);
        Matcher matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(3));
            // Plain concatenation: this assumes url is the site root and ends with "/".
            this.getHtmlPicture(url + matcher.group(3));
        }

        pattern = Pattern.compile(searchImgReg2);
        matcher = pattern.matcher(content);
        while (matcher.find()) {
            System.out.println(matcher.group(3));
            this.getHtmlPicture(matcher.group(3));
        }
    }

    public static void main(String[] args) throws IOException {
        String url = "http://www.baidu.com/";
        GetContentPicture gcp = new GetContentPicture();
        gcp.get(url);
    }
}
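
The example above builds each image address by appending the matched path to the page URL, which only works when the page URL is the site root. As a minimal sketch of one alternative, the snippet below resolves every match against the page URL with `java.net.URI.resolve`, and uses the `(?i)` flag so the upper-case and lower-case variants do not have to be listed separately. The class name `ImageLinkSketch`, the method `extractImageUrls`, the simplified regex, and the `example.com` addresses are all assumptions made for illustration; they are not part of the original code, and the sketch does no downloading.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the original example.
class ImageLinkSketch {

    // (?i) makes the match case-insensitive, so SRC= and .JPG are covered too.
    // Group 2 captures the attribute value between the quotes.
    private static final Pattern IMG_ATTR = Pattern.compile(
            "(?i)(src|background)=['\"]([^'\"]+\\.(jpg|png|gif))['\"]");

    // Returns every image reference found in html as an absolute URL,
    // resolved against pageUrl.
    static List<String> extractImageUrls(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        List<String> result = new ArrayList<>();
        Matcher m = IMG_ATTR.matcher(html);
        while (m.find()) {
            // resolve() handles absolute URLs, site-relative paths (/a/b.png)
            // and page-relative paths (b.png). It throws IllegalArgumentException
            // if the matched text is not a valid URI (for example, contains spaces).
            result.add(base.resolve(m.group(2)).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        String html = "<img src=\"/img/logo.png\"><img SRC='pics/a.JPG'>"
                + "<td background=\"http://cdn.example.com/bg.gif\">";
        for (String u : extractImageUrls("http://www.example.com/news/index.html", html)) {
            System.out.println(u);
        }
    }
}
```

Running `main` prints `http://www.example.com/img/logo.png`, `http://www.example.com/news/pics/a.JPG`, and `http://cdn.example.com/bg.gif`, one per line. Each of these strings could then be passed to a download method such as `getHtmlPicture`.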

If you have any questions, please leave a comment or join the discussion in the site's community. Thank you for reading; I hope this helps.