Case study: scraping web page updates with Java and sending them by email (Spring Boot)

程序员文章站 2024-03-20 21:45:04

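We are an internet company whose sales team needs to win projects, so they have to keep a close eye on new postings on 金采网 (cfcpn.com), yet they often miss them. The idea was to use Python or Java to scrape the tenders published each day and send them to a mailbox by email.

When I first heard the requirement I was completely lost. I had only just installed Python and had no idea how to write this, so I searched online for ready-made examples. Unfortunately, knowing nothing about crawler techniques, I could not get it working in a short time, so in the end I went back to my main language, Java.

Project requirement: scrape the announcements published each day on http://www.cfcpn.com/plist/caigou and http://www.cfcpn.com/plist/zhengji, then email them to a specified list of recipient mailboxes.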

Email result (screenshot in the original post): the blue text is hyperlinked and opens the corresponding article on the site directly.

Scraping page updates and emailing them takes two steps:

1. Get the information you want from the web page

2. Send that information to a mailbox

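The following two articles pointed the way to a working solution.

How to write a web crawler in Java to fetch web pages
https://jingyan.baidu.com/album/2c8c281db5f6970009252a60.html?picindex=7

Sending email with Java (QQ Mail)
https://jingyan.baidu.com/album/c910274bb41859cd361d2d03.html?picindex=4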

Now for the code. The whole project is built on the Spring Boot framework.

Full code:

1. pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<groupId>com.feeling.listener</groupId>
	<artifactId>catchdemo</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<packaging>jar</packaging>
	<properties>
		<spring.version>4.3.9.RELEASE</spring.version>
		<slf4j.version>1.7.12</slf4j.version>
		<log4j.version>1.2.14</log4j.version>
		<commons.logging.version>1.1.1</commons.logging.version>
		<commons.pool.version>1.6</commons.pool.version>
		<logback.logstash.version>4.9</logback.logstash.version>
	</properties>

	<!-- Required by Spring Boot -->
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>1.4.5.RELEASE</version>
		<relativePath /> <!-- lookup parent from repository -->
	</parent>


	<dependencies>
		
		<!-- https://mvnrepository.com/artifact/commons-httpclient/commons-httpclient -->
		<dependency>
			<groupId>commons-httpclient</groupId>
			<artifactId>commons-httpclient</artifactId>
			<version>3.1</version>
		</dependency>
		<!-- JavaMail dependencies, used to send mail through QQ Mail -->
		<dependency>
			<groupId>javax.activation</groupId>
			<artifactId>activation</artifactId>
			<version>1.1</version>
		</dependency>
		<dependency>
			<groupId>javax.mail</groupId>
			<artifactId>mail</artifactId>
			<version>1.4</version>
		</dependency>
		<!-- jsoup: loads and parses a web page from a file -->
		<dependency>
			<groupId>org.jsoup</groupId>
			<artifactId>jsoup</artifactId>
			<version>1.7.3</version>
		</dependency>

		<!-- Spring Boot test support -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
		</dependency>
		
		<!-- Spring Boot web starter -->
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>
		
		
		<!-- Extends the basic IoC features and supports many enterprise services: mail, task scheduling, JNDI lookup, EJB integration, remoting, caching, and several view-layer frameworks. -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-context</artifactId>
		</dependency>
		<!-- Spring core utilities -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-core</artifactId>
		</dependency>
		<!-- Basic Spring IoC implementation: reading configuration files, creating and managing beans, and so on. -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-beans</artifactId>
		</dependency>
		
		<!-- Extensions to the Spring context (third-party integrations such as mail, scheduling, and caching) -->
		<dependency>
			<groupId>org.springframework</groupId>
			<artifactId>spring-context-support</artifactId>
		</dependency>
		
		
		
		
		<dependency>
			<groupId>org.slf4j</groupId>
			<artifactId>slf4j-api</artifactId>
		</dependency>
		
		
		<dependency>
			<groupId>commons-fileupload</groupId>
			<artifactId>commons-fileupload</artifactId>
			<version>1.3.1</version>
		</dependency>
		<dependency>
			<groupId>org.apache.httpcomponents</groupId>
			<artifactId>httpclient</artifactId>
		</dependency>
		
		
		
		
		
		
	</dependencies>
	<build>
		<plugins>
			<!-- JDK (compiler) settings -->
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-compiler-plugin</artifactId>
				<version>2.5.1</version>
				<configuration>
					<source>1.7</source>
					<target>1.7</target>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-jar-plugin</artifactId>
				<configuration>
					<archive>
						<addMavenDescriptor>false</addMavenDescriptor>
						<manifest>
							<addClasspath>true</addClasspath>
							<classpathPrefix>lib/</classpathPrefix>
							<mainClass>com.feeling.mail.CatchDemoApplication</mainClass>
						</manifest>
					</archive>
				</configuration>
			</plugin>
			<plugin>
				<groupId>org.apache.maven.plugins</groupId>
				<artifactId>maven-dependency-plugin</artifactId>
				<executions>
					<execution>
						<id>copy</id>
						<phase>package</phase>
						<goals>
							<goal>copy-dependencies</goal>
						</goals>
						<configuration>
							<outputDirectory>${project.build.directory}/lib</outputDirectory>
						</configuration>
					</execution>
				</executions>
			</plugin>
		</plugins>
		<!-- Without this resources configuration, content under webapp is not packaged into the jar by default; see http://blog.csdn.net/u012849872/article/details/51035938 -->
		<resources>
			<!-- Copy JSP files into META-INF when packaging -->
			<resource>
				<!-- The directory whose resource files the resources plugin processes -->
				<directory>src/main/webapp</directory>
				<!-- Package the contents of src/main/webapp under META-INF/resources -->
				<targetPath>META-INF/resources</targetPath>
				<includes>
					<include>**/**</include>
				</includes>
			</resource>
			<resource>
				<directory>src/main/resources</directory>
				<includes>
					<include>**/**</include>
				</includes>
				<filtering>false</filtering>
			</resource>
		</resources>


	</build>

</project>
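2. params.properties: you need to fill in the values in this file yourself

# SMTP host (fixed)
email.host=smtp.qq.com
# Sender mailbox
email.sendMail=
# Authorization code generated when enabling the QQ Mail SMTP service (see the second article above); this project needs neither the QQ password nor the mailbox's real password
email.sendPassword=
# Multiple recipients, separated by commas
email.receiveMail=aaa@qq.com,aaa@qq.com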
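
One detail worth checking when filling in this file: a .properties file treats only lines that begin with # or ! as comments, so a // note written after a value is read as part of the value. The following is a small sketch with the JDK's java.util.Properties that shows the difference; the class name PropertiesCommentDemo and the sample note text are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class PropertiesCommentDemo {

    // Parses the text exactly as a .properties file would be parsed.
    static Properties load(String text) throws IOException {
        Properties props = new Properties();
        props.load(new StringReader(text));
        return props;
    }

    public static void main(String[] args) throws IOException {
        // A "//" note on the value line ends up inside the value.
        String withSlashes = "email.host=smtp.qq.com//fixed\n";
        // A "#" line is a real comment and is ignored.
        String withHash = "# fixed\nemail.host=smtp.qq.com\n";

        System.out.println(load(withSlashes).getProperty("email.host")); // smtp.qq.com//fixed
        System.out.println(load(withHash).getProperty("email.host"));    // smtp.qq.com
    }
}
```

With the first form, the SMTP host handed to JavaMail would include the trailing note, so notes are safer on their own # lines.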
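3. CatchDemoApplication.java: the Spring Boot startup class

This project does not configure a database, so the following annotation is required; otherwise startup fails with an error.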

@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class,DataSourceTransactionManagerAutoConfiguration.class,HibernateJpaAutoConfiguration.class})

package com.feeling.mail;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration;
import org.springframework.boot.autoconfigure.jdbc.DataSourceTransactionManagerAutoConfiguration;
import org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration;
import org.springframework.context.annotation.ComponentScan;
import org.springframework.scheduling.annotation.EnableScheduling;


@EnableScheduling // enables scheduled tasks
@ComponentScan(basePackages={"com.feeling.mail.*"})
@SpringBootApplication(exclude = {DataSourceAutoConfiguration.class,DataSourceTransactionManagerAutoConfiguration.class,HibernateJpaAutoConfiguration.class})
public class CatchDemoApplication {
	
	public static void main(String[] args) {
		SpringApplication.run(CatchDemoApplication.class, args);
	}
	
	

}

4. CatchSchdule.java: the scheduled task

package com.feeling.mail.batch;

import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.scheduling.annotation.Scheduled;
import org.springframework.stereotype.Component;
import com.feeling.mail.controller.CatchNews;

@Component
public class CatchSchdule {
	
	
	@Autowired
	private CatchNews catchnews;
	
	
//	@Scheduled(cron = "0 23 15 * * ?") // once a day at 15:23:00
	@Scheduled(cron = "0 0/3 * * * ?") // every three minutes
	public void CatchNewToSendMail() throws Exception {
		// 1. Procurement announcements (采购公告)
		catchnews.catchNews("http://www.cfcpn.com/plist/caigou","采购公告","D:\\caigou.html");

		// 2. Sourcing/solicitation announcements (寻源/征集公告)
		catchnews.catchNews("http://www.cfcpn.com/plist/zhengji","寻源/征集公告","D:\\zhengji.html");
		
	}
	
	

}
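
The two cron expressions above use Spring's six-field format (second, minute, hour, day of month, month, day of week): "0 0/3 * * * ?" fires every three minutes, which suits testing, while the commented-out "0 23 15 * * ?" fires once a day at 15:23:00. To illustrate what the daily expression means, here is a plain-JDK sketch that computes the delay until the next 15:23. The class and method names are hypothetical, and this is not how Spring evaluates cron expressions internally.

```java
import java.util.Calendar;
import java.util.concurrent.TimeUnit;

public class DailyDelayDemo {

    // Milliseconds from 'now' until the next occurrence of hour:minute:00.
    static long millisUntilNext(Calendar now, int hour, int minute) {
        Calendar next = (Calendar) now.clone();
        next.set(Calendar.HOUR_OF_DAY, hour);
        next.set(Calendar.MINUTE, minute);
        next.set(Calendar.SECOND, 0);
        next.set(Calendar.MILLISECOND, 0);
        if (!next.after(now)) {
            // Today's slot has already passed (or is exactly now), so take tomorrow's.
            next.add(Calendar.DAY_OF_MONTH, 1);
        }
        return next.getTimeInMillis() - now.getTimeInMillis();
    }

    public static void main(String[] args) {
        long delay = millisUntilNext(Calendar.getInstance(), 15, 23);
        System.out.println("Next run in " + TimeUnit.MILLISECONDS.toMinutes(delay) + " minutes");
    }
}
```

Such a delay could feed a ScheduledExecutorService, but inside a Spring Boot application the @Scheduled annotation shown above is the simpler choice.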


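5. CatchNews.java: scrapes the project postings published on the page today. This step is important; if you want to scrape updates from your own page, this is the part of the code you need to change.

The code checks the date and only assembles items published today.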
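
The listing below takes the date with substring(5, 15), which assumes the date always starts at the same character offset in the text. As a more tolerant variant, this sketch finds the first yyyy-MM-dd date anywhere in the text with a regular expression. The class name, method names, and the sample text are assumptions for illustration; the real page format may differ.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TodayFilterDemo {

    private static final Pattern DATE = Pattern.compile("\\d{4}-\\d{2}-\\d{2}");

    // Returns the first yyyy-MM-dd date found in the text, or null if there is none.
    static String extractDate(String text) {
        Matcher m = DATE.matcher(text);
        return m.find() ? m.group() : null;
    }

    // True when the text contains a date equal to the given day.
    static boolean isPublishedOn(String text, Date day) {
        String expected = new SimpleDateFormat("yyyy-MM-dd").format(day);
        return expected.equals(extractDate(text));
    }

    public static void main(String[] args) {
        Date now = new Date();
        // Hypothetical sample text built from today's date.
        String sample = "Published: " + new SimpleDateFormat("yyyy-MM-dd").format(now) + " 10:00";
        System.out.println(isPublishedOn(sample, now));         // true
        System.out.println(isPublishedOn("no date here", now)); // false
    }
}
```

Because extractDate returns null when no date is present, an item without a date is simply skipped instead of raising a StringIndexOutOfBoundsException.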

package com.feeling.mail.controller;

import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.PrintWriter;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Scanner;

import org.apache.http.HttpEntity;
import org.apache.http.client.ClientProtocolException;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;


/**
 * Called by the scheduler: catchNews scrapes the announcements and sends them by email.
 *
 * @author liangwenbo
 */
@Component
public class CatchNews {

	
	@Autowired
	private SendMail sendmail;

	
	
	public void catchNews(String url, String title, String fileName) throws Exception {
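		// 1. Fetch the 金采网 announcement list page and save it as an HTML file.
		// Two arguments: the URL of the page to fetch, and the path and name of the file.
		getHtml(url, fileName);
		// 2. Read the href and subject of each item's <a></a> tag.
		File input = new File(fileName);
		// Parse the file; the trailing base URL argument is optional.
		Document doc = Jsoup.parse(input, "UTF-8", url);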
		Date now = new Date();
		String nowDate = dateToString(now);
		String context = "";
		
		// Email content: title, date, link, and a short summary of each item
		Elements contents = doc.select(".cfcpn_list_content");

		for (Element content : contents) {
			
			
			// Iterate over all items
			// Get the date
			String dateDate = content.select(".cfcpn_list_date").first().text();
			String date = dateDate.substring(5, 15);
			// Keep only items dated today
			if (nowDate.equals(date)) {
				context = context +  content.toString()+"<br /><br />";
			
			}

		}
		System.out.println(context);

		
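		// 3. Send the collected content to the configured mailboxes, using the title as the subject
		
		sendmail.sendMail(title+nowDate, context.replace("/front/newcontext", "http://www.cfcpn.com/front/newcontext"));
	
		// TODO: look each URL up in a database; send and store it only if the URL is not there yet, and do nothing if it is
		// TODO: make the matching more precise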

	}

	private static void getHtml(String HttpUrl, String filename) throws ClientProtocolException, IOException {

		HttpGet httpGet = new HttpGet(HttpUrl);
		// try-with-resources closes the client, response, and file even when an exception is thrown
		try (CloseableHttpClient httpClient = HttpClients.createDefault();
				CloseableHttpResponse response = httpClient.execute(httpGet);
				// Read and write as UTF-8 so the file matches the charset later passed to Jsoup.parse
				Scanner sc = new Scanner(response.getEntity().getContent(), "UTF-8");
				PrintWriter os = new PrintWriter(filename, "UTF-8")) {
			while (sc.hasNextLine()) {
				// println keeps the line breaks that nextLine() strips
				os.println(sc.nextLine());
			}
		}

	}
	
	public  String dateToString(Date date){
		SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd");
		return sdf.format(date);
	}

}
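
catchNews makes the links absolute with a string replace on the /front/newcontext prefix, which works as long as every link on the page starts with that path. A more general alternative, sketched here with the JDK's java.net.URI, resolves any relative href against the URL of the page it came from. The class name and the sample hrefs are made up for illustration.

```java
import java.net.URI;

public class LinkResolveDemo {

    // Resolves a possibly relative href against the URL of the page it came from.
    static String toAbsolute(String pageUrl, String href) {
        return URI.create(pageUrl).resolve(href).toString();
    }

    public static void main(String[] args) {
        String page = "http://www.cfcpn.com/plist/caigou";
        // A root-relative link gets the page's scheme and host.
        System.out.println(toAbsolute(page, "/front/newcontext/12345"));
        // prints http://www.cfcpn.com/front/newcontext/12345

        // A link that is already absolute is returned unchanged.
        System.out.println(toAbsolute(page, "http://example.com/a"));
        // prints http://example.com/a
    }
}
```

Since the listing already passes the page URL to Jsoup.parse as the base URI, jsoup's own absUrl("href") on each link element should give the same result without extra code.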
6. SendMail.java: sends the email

package com.feeling.mail.controller;

import java.util.Date;
import java.util.Properties;

import javax.mail.Message;
import javax.mail.Message.RecipientType;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;

import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.PropertySource;
import org.springframework.stereotype.Component;


/**
 * Sends email with JavaMail
 */
@Component
@PropertySource(value = "classpath:params.properties")
public class SendMail {

	@Value("${email.host}")
	private String host;
	@Value("${email.sendMail}")
	private String sendMail;
	@Value("${email.sendPassword}")
	private String sendPassword;
	@Value("${email.receiveMail}")
	private String receiveMail;
	
	
	

	public void sendMail(String title, String context) throws Exception {
		// Mail session settings
		Properties props = new Properties();
		props.setProperty("mail.transport.protocol", "smtp");
		props.setProperty("mail.smtp.host", host);
		props.setProperty("mail.smtp.auth", "true");
		props.setProperty("mail.smtp.socketFactory.class", "javax.net.ssl.SSLSocketFactory");
		props.setProperty("mail.smtp.port", "465");
		props.setProperty("mail.smtp.socketFactory.port", "465");

		// Create the session from the settings; it handles the exchange with the mail server
		Session session = Session.getDefaultInstance(props);
		session.setDebug(true); // debug mode prints a detailed send log

		// Create the message
		Message message = createMineMessage(session,title,context);
		// Get the transport from the session
		Transport transport = session.getTransport();
		// Connect with the mailbox account and password; the authenticated account must match the message's From address, otherwise an error is raised
		transport.connect(sendMail, sendPassword);
		// Send the message
		transport.sendMessage(message, message.getAllRecipients());
		// Close the connection
		transport.close();

		

	}

	private Message createMineMessage(Session session,  String title, String context) throws Exception {

		Message message = new MimeMessage(session);
		
		// Sender display name
		String nick=javax.mail.internet.MimeUtility.encodeText("金采网");

		// From address
		message.setFrom(new InternetAddress(nick+"<"+sendMail+">"));
		// To addresses; the comma-separated list may contain several recipients
		message.setRecipients(RecipientType.TO, InternetAddress.parse(receiveMail));
		// Subject
		message.setSubject(title);
		// Body, sent as HTML
		message.setContent(context,"text/html;charset=utf-8");

		message.setSentDate(new Date());
		// Save the changes
		message.saveChanges();
		return message;
	}

}

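How to package the project and deploy it to a Linux server:

pom.xml must specify the startup class (the mainClass element under maven-jar-plugin in the pom.xml above).

Then see: https://www.cnblogs.com/larryzeal/p/6253356.html

http://www.linuxidc.com/Linux/2013-06/86588.htm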

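Package the project.

Go to the project's directory inside your workspace. Mine is E:\eclipse_workspace\apollo\catchdemo\target

Create a folder named demo and put into it the files highlighted in blue in the original screenshot (judging from the pom.xml above, presumably the built jar and the lib directory of copied dependencies).

Upload the folder to the Linux server, cd into the directory, and run

java -jar catchdemo-0.0.1-SNAPSHOT.jar   

and the application starts.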