欢迎您访问程序员文章站本站旨在为大家提供分享程序员计算机编程知识!
您现在的位置是: 首页

Spring boot开发从Excel表格导入大数据量(十万级/百万级)数据到数据库(MySQL)的工具

程序员文章站 2024-03-22 12:17:16
...
  1. 首先创建一个项目,我这里项目名称为:utils-self 。可使用Spring Assistant 创建,如下图,选择 web、jdbc、mysql三个模块的依赖,点击next。
    Spring boot开发从Excel表格导入大数据量(十万级/百万级)数据到数据库(MySQL)的工具

  2. 设置配置文件 application.properties,如下:

server.port=8076
spring.datasource.url=jdbc:mysql://127.0.0.1:3306/library_basic_v0431?serverTimezone=UTC&useUnicode=true&characterEncoding=utf8
spring.datasource.driver-class-name=com.mysql.cj.jdbc.Driver
spring.datasource.username=root
spring.datasource.password=root
  1. 引入相应的依赖,我这里要用到的依赖如下:pom.xml
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 https://maven.apache.org/xsd/maven-4.0.0.xsd">
	<modelVersion>4.0.0</modelVersion>
	<parent>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-parent</artifactId>
		<version>2.2.1.RELEASE</version>
		<relativePath/> <!-- lookup parent from repository -->
	</parent>
	<groupId>com.bai</groupId>
	<artifactId>utils-self</artifactId>
	<version>0.0.1-SNAPSHOT</version>
	<name>utils-self</name>
	<description>utils for work</description>

	<properties>
		<java.version>1.8</java.version>
	</properties>

	<dependencies>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-jdbc</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-web</artifactId>
		</dependency>

		<dependency>
			<groupId>mysql</groupId>
			<artifactId>mysql-connector-java</artifactId>
			<scope>runtime</scope>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-data-jpa</artifactId>
		</dependency>
		<dependency>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-starter-test</artifactId>
			<scope>test</scope>
			<exclusions>
				<exclusion>
					<groupId>org.junit.vintage</groupId>
					<artifactId>junit-vintage-engine</artifactId>
				</exclusion>
			</exclusions>
		</dependency>

		<dependency>
			<groupId>org.apache.poi</groupId>
			<artifactId>poi</artifactId>
			<version>RELEASE</version>
		</dependency>
		<dependency>
			<groupId>org.apache.poi</groupId>
			<artifactId>poi-ooxml</artifactId>
			<version>RELEASE</version>
		</dependency>
		<!-- https://mvnrepository.com/artifact/xerces/xercesImpl -->
		<dependency>
			<groupId>xerces</groupId>
			<artifactId>xercesImpl</artifactId>
			<version>2.12.0</version>
		</dependency>
	</dependencies>

	<build>
		<plugins>
			<plugin>
				<groupId>org.springframework.boot</groupId>
				<artifactId>spring-boot-maven-plugin</artifactId>
			</plugin>
		</plugins>
	</build>

</project>


  1. 搭建好项目结构,建立对应工具类、实体类、接口及实现类,如图:
    Spring boot开发从Excel表格导入大数据量(十万级/百万级)数据到数据库(MySQL)的工具
  2. 如上图顺序,依次贴上相应代码
    AreaCodeIdRepository.java
package com.bai.excel2db.dao;

import com.bai.excel2db.model.AreaCodeId;
import org.springframework.data.jpa.repository.JpaRepository;
import org.springframework.stereotype.Repository;

@Repository
public interface AreaCodeIdRepository extends JpaRepository<AreaCodeId,Integer> {

}

ExcelUtil.java

package com.bai.excel2db.excel;

import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xssf.eventusermodel.XSSFReader;
import org.apache.poi.xssf.model.SharedStringsTable;
import org.apache.poi.xssf.usermodel.XSSFRichTextString;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLReaderFactory;

import java.io.InputStream;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

public class ExcelUtil extends DefaultHandler {

    private SharedStringsTable sst;
    private String lastContents;
    private boolean nextIsString;

    private int sheetIndex = -1;
    private List<String> rowlist = new ArrayList<String>();
    private List<List<String>> excelList=new ArrayList<List<String>>();  //存放整个sheet中的行 rowList

    private int curRow = 0;
    private int curCol = 0;

    public List<List<String>> getExcelList() {
        return excelList;
    }
    public void setExcelList(List<List<String>> excelList) {
        this.excelList = excelList;
    }



    public static void main(String[] args)throws Exception{
        String filepath = "D:/home/smart.xls";
        System.out.println("开始");

        ExcelUtil excelUtil = new ExcelUtil();
        excelUtil.readExcel(filepath,3);

    }

    /**
     * 读取Excel表格(支持Excel2003(.xls)和Excel2007(.xlsx)两种格式)
     * @param filepath
     * @return 表格List集合
     */
    public List<List<String>> readExcel(String filepath){
        return readExcel(filepath,1);
    }
    public List<List<String>> readExcel(String filepath,int sheetId){
        List<List<String>> resultList = null;
        try {
         //  readAllSheets(filepath);
            readOneSheet(filepath,sheetId);

            resultList = getExcelList();
        } catch (Exception e) {
            e.printStackTrace();
        }
     //   System.out.println(resultList);
        return resultList;
    }

    /**
     * 读取单个工作簿的入口方法,带上要读取sheet的索引,不带则默认读第一个sheet:1
     * @param path
     */
    public void readOneSheet(String path) throws Exception {
        readOneSheet(path,1);
    }
    public void readOneSheet(String path,int sheetId) throws Exception {
        String sheetIndexOne = "rId"+sheetId;
        OPCPackage pkg = OPCPackage.open(path);
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);
        InputStream sheet = r.getSheet(sheetIndexOne);
        InputSource sheetSource = new InputSource(sheet);
        parser.parse(sheetSource);
        sheet.close();
    }

    /**
     * 读取所有工作簿的入口方法,依次遍历
     * @param path
     * @throws Exception
     */
    public void readAllSheets(String path) throws Exception {
        OPCPackage pkg = OPCPackage.open(path);
        XSSFReader r = new XSSFReader(pkg);
        SharedStringsTable sst = r.getSharedStringsTable();
        XMLReader parser = fetchSheetParser(sst);
        Iterator<InputStream> sheets = r.getSheetsData();
        while (sheets.hasNext()) {
            curRow = 0;
            sheetIndex++;
            InputStream sheet = sheets.next();
            InputSource sheetSource = new InputSource(sheet);
            parser.parse(sheetSource);
            sheet.close();
        }

    }

    /**
     * 该方法自动被调用,每读一行调用一次,在方法中写自己的业务逻辑即可
     * @param sheetIndex 工作簿序号
     * @param curRow 处理到第几行
     * @param rowList 当前数据行的数据集合
     */
    public void optRow(int sheetIndex, int curRow, List<String> rowList) {

        if (curRow<=0)return;   //第0行时,标题行不存入
        List<String> onerowList=new ArrayList<String>();
        for(String str : rowList) {
            onerowList.add(str);
        }
        excelList.add(onerowList);
      //  System.out.println(onerowList);

    }


    public XMLReader fetchSheetParser(SharedStringsTable sst) throws SAXException {
        XMLReader parser = XMLReaderFactory
                .createXMLReader("org.apache.xerces.parsers.SAXParser");
        this.sst = sst;
        parser.setContentHandler(this);
        return parser;
    }

    public void startElement(String uri, String localName, String name,
                             Attributes attributes) throws SAXException {
        // c => 单元格
        if (name.equals("c")) {
            // 如果下一个元素是 SST 的索引,则将nextIsString标记为true
            String cellType = attributes.getValue("t");
            if (cellType != null && cellType.equals("s")) {
                nextIsString = true;
            } else {
                nextIsString = false;
            }

        }
        // 置空
        lastContents = "";
    }


    public void endElement(String uri, String localName, String name)
            throws SAXException {
        // 根据SST的索引值的到单元格的真正要存储的字符串
        // 这时characters()方法可能会被调用多次
        if (nextIsString) {
            try {
                int idx = Integer.parseInt(lastContents);
                lastContents = new XSSFRichTextString(sst.getEntryAt(idx))
                        .toString();
            } catch (Exception e) {

            }
        }
        // v => 单元格的值,如果单元格是字符串则v标签的值为该字符串在SST中的索引
        // 将单元格内容加入rowlist中,在这之前先去掉字符串前后的空白符
        if (name.equals("v")) {
            String value = lastContents.trim();
            value = value.equals("") ? " " : value;
            rowlist.add(curCol, value);
            curCol++;
        } else {
            // 如果标签名称为 row ,这说明已到行尾,调用 optRows() 方法
            if (name.equals("row")) {
             //   System.out.println("row----"+curRow);
                optRow(sheetIndex, curRow, rowlist);
                rowlist.clear();
                curRow++;
                curCol = 0;
            }

        }
    }

    public void characters(char[] ch, int start, int length)
            throws SAXException {
        // 得到单元格内容的值
        lastContents += new String(ch, start, length);
    }

}

分页工具类:PagingList.java

package com.bai.excel2db.excel;

import java.util.List;
import java.util.stream.Collectors;

public class PagingList<T> {

    /**
     * 总页数
     */
    private int totalPage = 0;

    /**
     * 当前是第几页
     */
    private int curPageNo = 0;

    /**
     * 每页的大小
     */
    private int pageSize = 0;

    /**
     * 每页默认大小
     */
    private static final int DEFAULT_PAGE_SIZE = 500;


    private List<T> pageData = null;

    public PagingList(List<T> pageResult, int pageSize) {
        this.pageSize = pageSize;
        this.pageData = pageResult;
        init(pageResult, pageSize);
    }


    public PagingList(List<T> pageResult) {
        this(pageResult, DEFAULT_PAGE_SIZE);
    }


    private void init(List<T> pageResult, int pageSize) {
        if (pageSize <= 0) {
            throw new IllegalArgumentException("Paging size must be greater than zero.");
        }
        if (null == pageResult) {
            throw new NullPointerException("Paging resource list must be not null.");
        }
        if (pageResult.size() % pageSize > 0) {
            this.totalPage = (pageResult.size() / pageSize) + 1;
        } else {
            this.totalPage = pageResult.size() / pageSize;
        }
    }

    /**
     * 返回当前剩余页数
     *
     * @return
     */
    private int getSurplusPage() {
        if (pageData.size() % pageSize > 0) {
            return (pageData.size() / pageSize) + 1;
        } else {
            return pageData.size() / pageSize;
        }

    }

    /**
     * 返回是否还有下一页数据
     *
     * @return
     */
    public boolean hasNext() {
        return pageData.size() > 0;
    }

    /**
     * 获取分页后,总的页数
     *
     * @return
     */
    public int getTotalPage() {
        return totalPage;
    }

    public List<T> next() {
        List<T> pagingData = pageData.stream().limit(pageSize).collect(Collectors.toList());
        pageData = pageData.stream().skip(pageSize).collect(Collectors.toList());
        return pagingData;
    }

    /**
     * 返回当前页数
     *
     * @return
     */
    public int getCurPageNo() {
        return totalPage - getSurplusPage();
    }


}

实体类:AreaCodeId .java

package com.bai.excel2db.model;

import javax.persistence.*;

@Entity
public class AreaCodeId implements java.io.Serializable {
    private static final long seriaVersionUID = 1L;
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    private Integer id;

    private  String userId;

 //   @Column(name = "box_id")
    private  String boxId;

    public AreaCodeId() {
    }

    public AreaCodeId(Integer id, String userId, String boxId) {
        this.id = id;
        this.userId = userId;
        this.boxId = boxId;
    }

    public Integer getId() {
        return id;
    }

    public void setId(Integer id) {
        this.id = id;
    }

    public String getUserId() {
        return userId;
    }

    public void setUserId(String userId) {
        this.userId = userId;
    }

    public String getBoxId() {
        return boxId;
    }

    public void setBoxId(String boxId) {
        this.boxId = boxId;
    }
}

业务层接口:AreaCodeIdService .java

package com.bai.excel2db.service;

import com.bai.excel2db.model.AreaCodeId;
import java.util.List;

public interface AreaCodeIdService {

    void save(AreaCodeId areaCodeId);

    void saveAll(List<AreaCodeId> areaCodeIdList);

}

业务层实现类:AreaCodeIdServiceImpl

package com.bai.excel2db.service.impl;

import com.bai.excel2db.dao.AreaCodeIdRepository;
import com.bai.excel2db.model.AreaCodeId;
import com.bai.excel2db.service.AreaCodeIdService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import org.springframework.transaction.annotation.Transactional;

import java.util.List;

@Service
@Transactional
public class AreaCodeIdServiceImpl implements AreaCodeIdService {
    @Autowired
    private AreaCodeIdRepository areaCodeIdRepository;

    @Override
    public void save(AreaCodeId areaCodeId) {
        if(areaCodeId==null)return;

        boolean exit=areaCodeIdRepository.existsById(areaCodeId.getId());

        if(!exit){
            areaCodeIdRepository.save(areaCodeId);
        }

    }

    @Override
    public void saveAll(List<AreaCodeId> areaCodeIdList) {

        if(areaCodeIdList==null||areaCodeIdList.size()==0)return;

        areaCodeIdRepository.saveAll(areaCodeIdList);

    }

}

入口类:UtilsSelfApplication.java

package com.bai;

import com.bai.excel2db.excel.ExcelUtil;
import com.bai.excel2db.excel.PagingList;
import com.bai.excel2db.model.AreaCodeId;
import com.bai.excel2db.service.AreaCodeIdService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.core.annotation.Order;
import org.springframework.stereotype.Component;

import java.util.ArrayList;
import java.util.List;

@SpringBootApplication
public class UtilsSelfApplication {

	public static void main(String[] args) {
		SpringApplication.run(UtilsSelfApplication.class, args);
	}

}

@Component
@Order(value = 1)
class Import implements CommandLineRunner{

	@Autowired
	AreaCodeIdService areaCodeIdService;

	@Override
	public void run(String... args) throws Exception {

		String filepath = "D:/home/smart.xlsx";

		ExcelUtil excelUtil = new ExcelUtil();
		List<List<String>> result = excelUtil.readExcel(filepath,1);
		//   System.out.println("========="+result);
		final List<AreaCodeId> finalList=new ArrayList<>();
		result.forEach(list->{
			AreaCodeId areaCodeId=new AreaCodeId();
			areaCodeId.setUserId(list.get(0));
			areaCodeId.setBoxId(list.get(1));
			finalList.add(areaCodeId);
		});
	//	System.out.println(finalList);
		PagingList<AreaCodeId> paging = new PagingList(finalList, 1000);

		while (paging.hasNext()) {
			areaCodeIdService.saveAll(paging.next());
		}

		System.exit(0);

	}
}
  1. 简单描述下:主要功能在 ExcelUtil.java 中,包括读取Excel表格(支持xls 和 xlsx 两种文件),存储到List集合中,调用写入数据库方法。
    写入数据库方面,不了解的可以参考我的上篇文章:spring boot+spring-data-jpa操作数据库(MySQL)
    我这里在入口类 UtilsSelfApplication.java 中添加了一个非 public 类:Import ,实现了 CommandLineRunner 接口(加上两个注解 @Component @Order(value = 1) ),重写run方法,可以在项目启动时就运行。具体逻辑大家根据需要修改。

  2. 结果: 我最后运行项目时,一个有13万条数据的Excel表格,有两个字段,插入数据库只用了2分钟时间。