Spring boot开发从Excel表格导入大数据量(十万级/百万级)数据到数据库(MySQL)的工具
程序员文章站
2024-03-22 12:17:16
...
-
首先创建一个项目,我这里项目名称为:utils-self 。可使用Spring Assistant 创建,如下图,选择 web、jdbc、mysql三个模块的依赖,点击next。
-
设置配置文件 application.properties,如下:
server.port=8076
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/library_basic_v0431?serverTimezone=UTC&useUnicode=true&characterEncoding=utf8
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.username=root
spring.datasource.password=root
- 引入相应的依赖,我这里要用到的依赖如下:pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>2.2.1.RELEASE</version>
<relativePath/> <!-- lookup parent from repository -->
</parent>
<groupId>com.bai</groupId>
<artifactId>utils-self</artifactId>
<version>0.0.1-SNAPSHOT</version>
<name>utils-self</name>
<description>utils for work</description>
<properties>
<java.version>1.8</java.version>
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-jdbc</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
<exclusions>
<exclusion>
<groupId>org.junit.vintage</groupId>
<artifactId>junit-vintage-engine</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi</artifactId>
<version>RELEASE</version>
</dependency>
<dependency>
<groupId>org.apache.poi</groupId>
<artifactId>poi-ooxml</artifactId>
<version>RELEASE</version>
</dependency>
<!-- https://mvnrepository.com/artifact/xerces/xercesImpl -->
<dependency>
<groupId>xerces</groupId>
<artifactId>xercesImpl</artifactId>
<version>2.12.0</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
</plugin>
</plugins>
</build>
</project>
- 搭建好项目结构,建立对应工具类、实体类、接口及实现类,如图:
- 如上图顺序,依次贴上相应代码
AreaCodeIdRepository.java
package com.bai.excel2db.dao;
import com.bai.excel2db.model.AreaCodeId;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;
@Repository
public interface AreaCodeIdRepository extends JpaRepository<AreaCodeId,Integer> {
}
ExcelUtil.java
package com.bai.excel2db.excel;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
public class ExcelUtil extends DefaultHandler {
private SharedStringsTable sst;
private String lastContents;
private boolean nextIsString;
private int sheetIndex = -1;
private List<String> rowlist = new ArrayList<String>();
private List<List<String>> excelList=new ArrayList<List<String>>(); //存放整个sheet中的行 rowList
private int curRow = 0;
private int curCol = 0;
public List<List<String>> getExcelList() {
return excelList;
}
public void setExcelList(List<List<String>> excelList) {
this.excelList = excelList;
}
public static void main(String[] args)throws Exception{
String filepath = "D:/home/smart.xls";
System.out.println("开始");
ExcelUtil excelUtil = new ExcelUtil();
excelUtil.readExcel(filepath,3);
}
/**
* 读取Excel表格(支持Excel2003(.xls)和Excel2007(.xlsx)两种格式)
* @param filepath
* @return 表格List集合
*/
public List<List<String>> readExcel(String filepath){
return readExcel(filepath,1);
}
public List<List<String>> readExcel(String filepath,int sheetId){
List<List<String>> resultList = null;
try {
// readAllSheets(filepath);
readOneSheet(filepath,sheetId);
resultList = getExcelList();
} catch (Exception e) {
e.printStackTrace();
}
// System.out.println(resultList);
return resultList;
}
/**
* 读取单个工作簿的入口方法,带上要读取sheet的索引,不带则默认读第一个sheet:1
* @param path
*/
public void readOneSheet(String path) throws Exception {
readOneSheet(path,1);
}
public void readOneSheet(String path,int sheetId) throws Exception {
String sheetIndexOne = "rId"+sheetId;
OPCPackage pkg = OPCPackage.open(path);
XSSFReader r = new XSSFReader(pkg);
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
InputStream sheet = r.getSheet(sheetIndexOne);
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
}
/**
* 读取所有工作簿的入口方法,依次遍历
* @param path
* @throws Exception
*/
public void readAllSheets(String path) throws Exception {
OPCPackage pkg = OPCPackage.open(path);
XSSFReader r = new XSSFReader(pkg);
SharedStringsTable sst = r.getSharedStringsTable();
XMLReader parser = fetchSheetParser(sst);
Iterator<InputStream> sheets = r.getSheetsData();
while (sheets.hasNext()) {
curRow = 0;
sheetIndex++;
InputStream sheet = sheets.next();
InputSource sheetSource = new InputSource(sheet);
parser.parse(sheetSource);
sheet.close();
}
}
/**
* 该方法自动被调用,每读一行调用一次,在方法中写自己的业务逻辑即可
* @param sheetIndex 工作簿序号
* @param curRow 处理到第几行
* @param rowList 当前数据行的数据集合
*/
public void optRow(int sheetIndex, int curRow, List<String> rowList) {
if (curRow<=0)return; //第0行时,标题行不存入
List<String> onerowList=new ArrayList<String>();
for(String str : rowList) {
onerowList.add(str);
}
excelList.add(onerowList);
// System.out.println(onerowList);
}
public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
XMLReader parser = XMLReaderFactory
.createXMLReader("org.apache.xerces.parsers.SAXParser");
this.sst = sst;
parser.setContentHandler(this);
return parser;
}
public void startElement(String uri, String localName, String name,
Attributes attributes) throws SAXException {
// c => 单元格
if (name.equals("c")) {
// 如果下一个元素是 SST 的索引,则将nextIsString标记为true
String cellType = attributes.getValue("t");
if (cellType != null && cellType.equals("s")) {
nextIsString = true;
} else {
nextIsString = false;
}
}
// 置空
lastContents = "";
}
public void endElement(String uri, String localName, String name)
throws SAXException {
// 根据SST的索引值的到单元格的真正要存储的字符串
// 这时characters()方法可能会被调用多次
if (nextIsString) {
try {
int idx = Integer.parseInt(lastContents);
lastContents = new XSSFRichTextString(sst.getEntryAt(idx))
.toString();
} catch (Exception e) {
}
}
// v => 单元格的值,如果单元格是字符串则v标签的值为该字符串在SST中的索引
// 将单元格内容加入rowlist中,在这之前先去掉字符串前后的空白符
if (name.equals("v")) {
String value = lastContents.trim();
value = value.equals("") ? " " : value;
rowlist.add(curCol, value);
curCol++;
} else {
// 如果标签名称为 row ,这说明已到行尾,调用 optRows() 方法
if (name.equals("row")) {
// System.out.println("row----"+curRow);
optRow(sheetIndex, curRow, rowlist);
rowlist.clear();
curRow++;
curCol = 0;
}
}
}
public void characters(char[] ch, int start, int length)
throws SAXException {
// 得到单元格内容的值
lastContents += new String(ch, start, length);
}
}
分页工具类:PagingList.java
package com.bai.excel2db.excel;
import java.util.List;
import java.util.stream.Collectors;
public class PagingList<T> {
/**
* 总页数
*/
private int totalPage = 0;
/**
* 当前是第几页
*/
private int curPageNo = 0;
/**
* 每页的大小
*/
private int pageSize = 0;
/**
* 每页默认大小
*/
private static final int DEFAULT_PAGE_SIZE = 500;
private List<T> pageData = null;
public PagingList(List<T> pageResult, int pageSize) {
this.pageSize = pageSize;
this.pageData = pageResult;
init(pageResult, pageSize);
}
public PagingList(List<T> pageResult) {
this(pageResult, DEFAULT_PAGE_SIZE);
}
private void init(List<T> pageResult, int pageSize) {
if (pageSize <= 0) {
throw new IllegalArgumentException("Paging size must be greater than zero.");
}
if (null == pageResult) {
throw new NullPointerException("Paging resource list must be not null.");
}
if (pageResult.size() % pageSize > 0) {
this.totalPage = (pageResult.size() / pageSize) + 1;
} else {
this.totalPage = pageResult.size() / pageSize;
}
}
/**
* 返回当前剩余页数
*
* @return
*/
private int getSurplusPage() {
if (pageData.size() % pageSize > 0) {
return (pageData.size() / pageSize) + 1;
} else {
return pageData.size() / pageSize;
}
}
/**
* 返回是否还有下一页数据
*
* @return
*/
public boolean hasNext() {
return pageData.size() > 0;
}
/**
* 获取分页后,总的页数
*
* @return
*/
public int getTotalPage() {
return totalPage;
}
public List<T> next() {
List<T> pagingData = pageData.stream().limit(pageSize).collect(Collectors.toList());
pageData = pageData.stream().skip(pageSize).collect(Collectors.toList());
return pagingData;
}
/**
* 返回当前页数
*
* @return
*/
public int getCurPageNo() {
return totalPage - getSurplusPage();
}
}
实体类:AreaCodeId .java
package com.bai.excel2db.model;
import javax.persistence.*;
@Entity
public class AreaCodeId implements java.io.Serializable {
private static final long seriaVersionUID = 1L;
@Id
@GeneratedValue(strategy = GenerationType.IDENTITY)
private Integer id;
private String userId;
// @Column(name = "box_id")
private String boxId;
public AreaCodeId() {
}
public AreaCodeId(Integer id, String userId, String boxId) {
this.id = id;
this.userId = userId;
this.boxId = boxId;
}
public Integer getId() {
return id;
}
public void setId(Integer id) {
this.id = id;
}
public String getUserId() {
return userId;
}
public void setUserId(String userId) {
this.userId = userId;
}
public String getBoxId() {
return boxId;
}
public void setBoxId(String boxId) {
this.boxId = boxId;
}
}
业务层接口:AreaCodeIdService .java
package com.bai.excel2db.service;
import com.bai.excel2db.model.AreaCodeId;
import java.util.List;
public interface AreaCodeIdService {
void save(AreaCodeId areaCodeId);
void saveAll(List<AreaCodeId> areaCodeIdList);
}
业务层实现类:AreaCodeIdServiceImpl
package com.bai.excel2db.service.impl;
import com.bai.excel2db.dao.AreaCodeIdRepository;
import com.bai.excel2db.model.AreaCodeId;
import com.bai.excel2db.service.AreaCodeIdService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;
import java.util.List;
@Service
@Transactional
public class AreaCodeIdServiceImpl implements AreaCodeIdService {
@Autowired
private AreaCodeIdRepository areaCodeIdRepository;
@Override
public void save(AreaCodeId areaCodeId) {
if(areaCodeId==null)return;
boolean exit=areaCodeIdRepository.existsById(areaCodeId.getId());
if(!exit){
areaCodeIdRepository.save(areaCodeId);
}
}
@Override
public void saveAll(List<AreaCodeId> areaCodeIdList) {
if(areaCodeIdList==null||areaCodeIdList.size()==0)return;
areaCodeIdRepository.saveAll(areaCodeIdList);
}
}
入口类:UtilsSelfApplication.java
package com.bai;
import com.bai.excel2db.excel.ExcelUtil;
import com.bai.excel2db.excel.PagingList;
import com.bai.excel2db.model.AreaCodeId;
import com.bai.excel2db.service.AreaCodeIdService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;
import java.util.ArrayList;
import java.util.List;
@SpringBootApplication
public class UtilsSelfApplication {
public static void main(String[] args) {
SpringApplication.run(UtilsSelfApplication.class, args);
}
}
@Component
@Order(value = 1)
class Import implements CommandLineRunner{
@Autowired
AreaCodeIdService areaCodeIdService;
@Override
public void run(String... args) throws Exception {
String filepath = "D:/home/smart.xlsx";
ExcelUtil excelUtil = new ExcelUtil();
List<List<String>> result = excelUtil.readExcel(filepath,1);
// System.out.println("========="+result);
final List<AreaCodeId> finalList=new ArrayList<>();
result.forEach(list->{
AreaCodeId areaCodeId=new AreaCodeId();
areaCodeId.setUserId(list.get(0));
areaCodeId.setBoxId(list.get(1));
finalList.add(areaCodeId);
});
// System.out.println(finalList);
PagingList<AreaCodeId> paging = new PagingList(finalList, 1000);
while (paging.hasNext()) {
areaCodeIdService.saveAll(paging.next());
}
System.exit(0);
}
}
-
简单描述下:主要功能在 ExcelUtil.java 中,包括读取Excel表格(支持xls 和 xlsx 两种文件),存储到List集合中,调用写入数据库方法。
写入数据库方面,不了解的可以参考我的上篇文章:spring boot+spring-data-jpa操作数据库(MySQL)
我这里在入口类 UtilsSelfApplication.java 中添加了一个非 public 类:Import ,实现了 CommandLineRunner 接口(加上两个注解 @Component @Order(value = 1) ),重写run方法,可以在项目启动时就运行。具体逻辑大家根据需要修改。 -
结果: 我最后运行项目时,一个有13万条数据的Excel表格,有两个字段,插入数据库只用了2分钟时间。