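Case study: scraping web page updates with Java and sending them by email (Spring Boot)
We are an internet company, and our sales team needs to win projects, so they have to keep an eye on new postings on 金采网 (cfcpn.com), yet they often miss them. The idea was to use Python or Java to scrape the tender notices published each day and send them to a mailbox as an email.
When I first heard this requirement I was completely lost. Python? I had only just installed it and had no idea how to write any, so I went searching online for what others had already done. Unfortunately I knew nothing about crawler techniques and could not get anything working in a short time. In the end I went back to my day job and wrote it in Java.
Requirement: scrape the notices published each day on http://www.cfcpn.com/plist/caigou and http://www.cfcpn.com/plist/zhengji, then send them by email to a configured list of recipients.
Email screenshot: the blue text is hyperlinked and leads directly to the corresponding article on the site.
Sending web page updates to a mailbox involves two steps:
1. Fetch the information you want from the web page
2. Send that information to the mailbox
The two articles below pointed me toward a working solution.
How to write a web crawler in Java to fetch web pages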
https://jingyan.baidu.com/album/2c8c281db5f6970009252a60.html?picindex=7
Sending email with Java (QQ Mail)
https://jingyan.baidu.com/album/c910274bb41859cd361d2d03.html?picindex=4
Now for the code. The whole project is built on Spring Boot.
Full code:
1. pom.xml
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.feeling.listener</groupId>
<artifactId>catchdemo</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>
<properties>
<spring.version>4.3.9.RELEASE</spring.version>
<slf4j.version>1.7.12</slf4j.version>
<log4j.version>1.2.14</log4j.version>
<commons.logging.version>1.1.1</commons.logging.version>
<commons.pool.version>1.6</commons.pool.version>
<logback.logstash.version>4.9</logback.logstash.version>
</properties>
<!-- Required Spring Boot parent -->
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.4.5.RELEASE</version>
<relativePath /> <!-- lookup parent from repository -->
</parent>
<dependencies>
<!-- https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient -->
<dependency>
<groupId>commons-httpclient</groupId>
<artifactId>commons-httpclient</artifactId>
<version>3.1</version>
</dependency>
<!-- JavaMail dependencies for sending mail through QQ Mail -->
<dependency>
<groupId>javax.activation</groupId>
<artifactId>activation</artifactId>
<version>1.1</version>
</dependency>
<dependency>
<groupId>javax.mail</groupId>
<artifactId>mail</artifactId>
<version>1.4</version>
</dependency>
<!-- jsoup: loads and parses the web page from a file -->
<dependency>
<groupId>org.jsoup</groupId>
<artifactId>jsoup</artifactId>
<version>1.7.3</version>
</dependency>
<!-- Required Spring Boot jars -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<!-- Jars loaded for Spring Boot web -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
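<!-- Builds on the basic IoC container with extended services and enterprise features: mail, task scheduling, JNDI lookup, EJB integration, remoting, caching, and support for several view-layer frameworks. -->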
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context</artifactId>
</dependency>
<!-- Spring core utilities -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
</dependency>
<!-- Basic Spring IoC implementation: access to configuration files, creating and managing beans, etc. -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-beans</artifactId>
</dependency>
<!-- Spring context support: integration classes for third-party libraries such as mail, scheduling and caching -->
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context-support</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
<dependency>
<groupId>commons-fileupload</groupId>
<artifactId>commons-fileupload</artifactId>
<version>1.3.1</version>
</dependency>
<dependency>
<groupId>org.apache.httpcomponents</groupId>
<artifactId>httpclient</artifactId>
</dependency>
</dependencies>
<build>
<plugins>
<!-- JDK version used for compilation -->
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>2.5.1</version>
<configuration>
<source>1.7</source>
<target>1.7</target>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-jar-plugin</artifactId>
<configuration>
<archive>
<addMavenDescriptor>false</addMavenDescriptor>
<manifest>
<addClasspath>true</addClasspath>
<classpathPrefix>lib/</classpathPrefix>
<mainClass>com.feeling.mail.CatchDemoApplication</mainClass>
</manifest>
</archive>
</configuration>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-dependency-plugin</artifactId>
<executions>
<execution>
<id>copy</id>
<phase>package</phase>
<goals>
<goal>copy-dependencies</goal>
</goals>
<configuration>
<outputDirectory>${project.build.directory}/lib</outputDirectory>
</configuration>
</execution>
</executions>
</plugin>
</plugins>
<!-- Without this resources configuration, content under webapp is not packaged into the jar by default; see http://blog.csdn.net/u012849872/article/details/51035938 -->
<resources>
<!-- Copy JSP files into META-INF when packaging -->
<resource>
<!-- Directory whose resource files the resources plugin should process -->
<directory>src/main/webapp</directory>
<!-- Package the contents of src/main/webapp under META-INF/resources -->
<targetPath>META-INF/resources</targetPath>
<includes>
<include>**/**</include>
</includes>
</resource>
<resource>
<directory>src/main/resources</directory>
<includes>
<include>**/**</include>
</includes>
<filtering>false</filtering>
</resource>
</resources>
</build>
</project>
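2. params.properties (you need to fill in the values yourself)
# SMTP host (fixed)
email.host=smtp.qq.com
# Sender mailbox
email.sendMail=
# Authorization code generated when enabling the QQ Mail SMTP service (see the second article); this project needs neither the QQ password nor the mailbox's real password
email.sendPassword=
# Multiple recipients, separated by commas
email.receiveMail=aaa@qq.com,aaa@qq.com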
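One thing to watch in this file: as far as I know, Spring's @PropertySource reads it with the ordinary java.util.Properties rules, where only a line that starts with # or ! is a comment. A remark written after a value with // simply becomes part of the value. The small check below is a sketch of my own (the class name PropertiesCommentDemo is made up for illustration) that shows the difference using only the JDK.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesCommentDemo {

    // Parses .properties text with the standard java.util.Properties rules.
    static Properties load(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            // Not expected for an in-memory reader.
            throw new IllegalStateException(e);
        }
        return props;
    }

    public static void main(String[] args) {
        // A trailing "//" remark is NOT a comment: it ends up inside the value.
        Properties bad = load("email.host=smtp.qq.com//fixed\n");
        System.out.println("[" + bad.getProperty("email.host") + "]");

        // A remark on its own line that starts with '#' is ignored.
        Properties good = load("# fixed SMTP host\nemail.host=smtp.qq.com\n");
        System.out.println("[" + good.getProperty("email.host") + "]");
    }
}
```

So it is safer to keep every remark on its own line starting with #, and to leave nothing after the value.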
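3. CatchDemoApplication.java (Spring Boot entry class)
This project does not configure a database, so the annotation below with its exclude list is required; otherwise startup fails with an error.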
@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class,DataSourceTransactionManagerAutoConfiguration.class,HibernateJpaAutoConfiguration.class})
package com.feeling.mail;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;
import org.springframework.boot.autoconfigure.jdbc.DataSourceTransactionManagerAutoConfiguration;
import org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.scheduling.annotation.EnableScheduling;
@EnableScheduling // enables scheduled tasks
@ComponentScan(basePackages={"com.feeling.mail.*"})
@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class,DataSourceTransactionManagerAutoConfiguration.class,HibernateJpaAutoConfiguration.class})
public class CatchDemoApplication {
public static void main(String[] args) {
SpringApplication.run(CatchDemoApplication.class, args);
}
}
4. CatchSchdule.java (scheduled task)
package com.feeling.mail.batch;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import com.feeling.mail.controller.CatchNews;
@Component
public class CatchSchdule {
@Autowired
private CatchNews catchnews;
// Every day at 15:23:00
// @Scheduled(cron = "0 23 15 * * ?")
// Every 3 minutes
@Scheduled(cron = "0 0/3 * * * ?")
public void CatchNewToSendMail() throws Exception {
// 1. Procurement notices
catchnews.catchNews("http://www.cfcpn.com/plist/caigou","采购公告","D:\\caigou.html");
// 2. Sourcing/solicitation notices
catchnews.catchNews("http://www.cfcpn.com/plist/zhengji","寻源/征集公告","D:\\zhengji.html");
}
}
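With the three-minute schedule above, every run mails all of today's notices again. The CatchNews class contains a note about checking each URL against a database before sending, which is not implemented in this project. As a rough sketch of that idea without a database, the class below remembers the URLs it has seen in memory. The class name SentUrlRegistry and its method are my own invention for illustration, and the memory is lost whenever the application restarts.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;

public class SentUrlRegistry {

    // URLs that have already been mailed. The set is synchronized because
    // scheduled tasks may run on a different thread than other callers.
    private final Set<String> sent = Collections.synchronizedSet(new HashSet<String>());

    /** Returns true only the first time a given URL is passed in. */
    public boolean markIfNew(String url) {
        return sent.add(url);
    }

    public static void main(String[] args) {
        SentUrlRegistry registry = new SentUrlRegistry();
        // Hypothetical notice URL, used only for illustration.
        String url = "http://www.cfcpn.com/front/newcontext/example";
        System.out.println(registry.markIfNew(url)); // first run: mail it
        System.out.println(registry.markIfNew(url)); // later run: skip it
    }
}
```

A scraper could call markIfNew for each notice link and only add the entry to the email when it returns true.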
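5. CatchNews.java: scrapes the project notices published today. This step is the important one; if you want to scrape updates from your own site, this is the part of the code you will need to change.
The code checks the date and only assembles entries that were published today.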
package com.feeling.mail.controller;
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Scanner;
import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;
/**
* catchNews is called by the scheduler to scrape the notices and send them by email.
*
* @author liangwenbo
*/
@Component
public class CatchNews {
@Autowired
private SendMail sendmail;
public void catchNews(String url, String title, String fileName) throws Exception {
// 1. Fetch the 金采网 listing page and save its content as an HTML file
// Two arguments: the URL of the page to fetch, and the path and name of the output file
getHtml(url, fileName);
// 2. Read the href and subject of each entry's <a></a> tag
File input = new File(fileName);
// Parse the saved file; the trailing base URI argument may be left out
Document doc = Jsoup.parse(input, "UTF-8", url);
Date now = new Date();
String nowDate = dateToString(now);
String context = "";
// Email content: title, time, URL, brief description
Elements contents = doc.select(".cfcpn_list_content");
for (Element content : contents) {
// Iterate over all entries
// Get the publication time
String dateDate = content.select(".cfcpn_list_date").first().text();
String date = dateDate.substring(5, 15);
// Keep only entries published today
if (nowDate.equals(date)) {
context = context + content.toString()+"<br /><br />";
}
}
System.out.println(context);
// 3. Send the collected entries to the configured mailboxes, with the title plus date as the subject
sendmail.sendMail(title+nowDate, context.replace("/front/newcontext", "http://www.cfcpn.com/front/newcontext"));
// TODO: look up each URL in a database; if it is not found, send the email and store the URL; if it is found, do nothing
// TODO: make the matching more precise
}
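private static void getHtml(String HttpUrl, String filename) throws ClientProtocolException, IOException {
HttpGet httpGet = new HttpGet(HttpUrl);
// try-with-resources closes the client, response, reader and writer even if an exception is thrown
try (CloseableHttpClient httpClient = HttpClients.createDefault();
CloseableHttpResponse response = httpClient.execute(httpGet)) {
HttpEntity entity = response.getEntity();
// Get the InputStream from the entity and write the returned content to the file.
// UTF-8 is given explicitly because the file is later parsed as UTF-8 by Jsoup.
InputStream is = entity.getContent();
try (Scanner sc = new Scanner(is, "UTF-8");
PrintWriter os = new PrintWriter(filename, "UTF-8")) {
while (sc.hasNextLine()) {
// println keeps the line breaks that nextLine() strips
os.println(sc.nextLine());
}
}
}
}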
public String dateToString(Date date){
SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
return sdf.format(date);
}
}
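Two spots in catchNews are fragile. The date check relies on substring(5, 15), which only works while the date sits at exactly that offset in the text, and the plain replace on "/front/newcontext" would also rewrite links that are already absolute. The helper below is a sketch of a more tolerant version using only the JDK. The class name NoticeHtmlHelper, its method names, and the sample strings are my own assumptions for illustration; I have not checked them against the live page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NoticeHtmlHelper {

    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    /** Returns the first yyyy-MM-dd date found in the text, or null if there is none. */
    static String extractDate(String text) {
        if (text == null) {
            return null;
        }
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group() : null;
    }

    /** Prefixes site-relative href values (href="/...") with the site's base URL. */
    static String absolutizeLinks(String html, String baseUrl) {
        // Only hrefs that start with a single slash are touched, so links
        // that are already absolute are left alone.
        String replacement = Matcher.quoteReplacement("href=\"" + baseUrl + "/");
        return html.replaceAll("href=\"/(?!/)", replacement);
    }

    public static void main(String[] args) {
        // Hypothetical date text and link, for illustration only.
        System.out.println(extractDate("Published: 2018-03-05 10:00"));
        System.out.println(absolutizeLinks(
                "<a href=\"/front/newcontext/1\">notice</a>", "http://www.cfcpn.com"));
    }
}
```

In catchNews, an entry would then be kept when nowDate.equals(extractDate(dateDate)), and entries without any recognizable date would simply be skipped instead of throwing an exception.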
6. SendMail.java (sends the email)
package com.feeling.mail.controller;
import java.util.Date;
import java.util.Properties;
import javax.mail.Message;
import javax.mail.Message.RecipientType;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.PropertySource;
import org.springframework.stereotype.Component;
/**
* Sends email with JavaMail
*/
@Component
@PropertySource(value = "classpath:params.properties")
public class SendMail {
@Value("${email.host}")
private String host;
@Value("${email.sendMail}")
private String sendMail;
@Value("${email.sendPassword}")
private String sendPassword;
@Value("${email.receiveMail}")
private String receiveMail;
public void sendMail(String title, String context) throws Exception {
// Mail settings
Properties props = new Properties();
props.setProperty("mail.transport.protocol", "smtp");
props.setProperty("mail.smtp.host", host);
props.setProperty("mail.smtp.auth", "true");
props.setProperty("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
props.setProperty("mail.smtp.port", "465");
props.setProperty("mail.smtp.socketFactory.port", "465");
// Create the session used to talk to the mail server from these settings
Session session = Session.getDefaultInstance(props);
session.setDebug(true);// debug mode prints a detailed send log
// Create the message
Message message = createMineMessage(session,title,context);
// Get the transport object from the session
Transport transport = session.getTransport();
// Connect with the mailbox account and authorization code; the authenticated mailbox must match the sender in the message, otherwise an error is raised
transport.connect(sendMail, sendPassword);
// Send the message
transport.sendMessage(message, message.getAllRecipients());
// Close the connection
transport.close();
}
private Message createMineMessage(Session session, String title, String context) throws Exception {
Message message = new MimeMessage(session);
// Display name
String nick=javax.mail.internet.MimeUtility.encodeText("金采网");
// Sender address
message.setFrom(new InternetAddress(nick+"<"+sendMail+">"));
// Recipient addresses; several recipients can be given, separated by commas
message.setRecipients(RecipientType.TO, InternetAddress.parse(receiveMail));
// Subject
message.setSubject(title);
// Body, sent as HTML
message.setContent(context,"text/html;charset=utf-8");
message.setSentDate(new Date());
// Save the changes
message.saveChanges();
return message;
}
}
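The recipient list comes straight from params.properties as one comma-separated string, so a typo or a repeated address there only shows up when the mail is sent. The sketch below is my own addition, not part of the project (the class name RecipientList is hypothetical). It uses only the JDK to trim the entries, drop blanks and duplicates, and fail early on anything that is clearly not an address. It is a rough sanity check, not full address validation.

```java
import java.util.ArrayList;
import java.util.List;

public class RecipientList {

    /** Splits a comma-separated recipient string into a clean list of addresses. */
    static List<String> parse(String raw) {
        List<String> result = new ArrayList<String>();
        if (raw == null) {
            return result;
        }
        for (String part : raw.split(",")) {
            String address = part.trim();
            if (address.isEmpty()) {
                continue; // ignore blanks such as "a@qq.com,,b@qq.com"
            }
            // Very rough check: something before and after the '@'.
            if (address.indexOf('@') <= 0 || address.endsWith("@")) {
                throw new IllegalArgumentException("Not an email address: " + address);
            }
            if (!result.contains(address)) {
                result.add(address); // keep each recipient only once
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Same placeholder addresses as in params.properties.
        System.out.println(parse("aaa@qq.com, aaa@qq.com"));
    }
}
```

An empty result could also be used as a signal to skip sending altogether.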
Here is how to package the project and deploy it to a Linux server:
The pom.xml must specify the main class.
Then see: https://www.cnblogs.com/larryzeal/p/6253356.html
http://www.linuxidc.com/Linux/2013-06/86588.htm
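Package the project.
Go to the project's directory inside your workspace.
Mine is E:\eclipse_workspace\apollo\catchdemo\target
Create a folder named demo and put the items highlighted in blue in the screenshot into it (the screenshot is not reproduced here; judging by the pom.xml above, these would be catchdemo-0.0.1-SNAPSHOT.jar and the lib directory).
Upload the folder to the Linux server and cd into it. Then run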
java -jar catchdemo-0.0.1-SNAPSHOT.jar
and the application starts.