Generating PDF from Word with poi-ooxml (Part 1)

程序员文章站 2024-03-21 14:41:04


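The Backstory of Generating PDFs

A while ago I took over a legacy project whose main job is generating contracts. With a full head of hair to spare, I dove right in after the handover: edit the file, upload it, generate...

Ha, lucky me: the very first attempt threw an error.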

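The project generated a PDF roughly like this:

1. Manually convert the Word file to HTML
2. Maintain the generated HTML, e.g. write the template variables into it
3. Upload the HTML file to the system, which then renders it with freemarker to produce the PDF contract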

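By and large there is nothing wrong with these steps; at worst they are hard on the hands and the eyes. The error I got said that a tag had not been closed!

Damn!

One unclosed tag somewhere in several thousand lines of HTML, and who knows which one. To make it worse, a browser still renders the page just fine when a tag is left unclosed.

But that wasn't going to stop me. After all, I have plenty of hair.

Trusting my 5.2 eyesight (well above normal on the Chinese eye chart), I read the HTML file line by line and still could not tell which tag was unclosed!!!

Later it finally hit me: just search Baidu for an HTML tag validator, paste a copy of the file in, and let it do the checking. My bad. Had I known such a thing existed, I would not have wasted my time hunting by hand.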
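For the curious, the kind of check such a validator performs can be sketched with a simple stack. This is only an illustration, not what any particular online tool does: `UnclosedTagFinder` and its messages are made-up names, and the regex-based scan assumes attribute values contain no `>` and that the file has no script blocks with stray `<` characters.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the original project): reports tags that are never closed. */
final class UnclosedTagFinder {

    // Void elements never take a closing tag, so they are not pushed on the stack.
    private static final Set<String> VOID_TAGS = new HashSet<>(Arrays.asList(
            "area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"));

    // group 1: "/" for a closing tag, group 2: tag name, group 3: "/" for a self-closing tag.
    // Comments, doctype and freemarker directives (<#if>) do not match: their names do not start with a letter.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)\\b[^>]*?(/?)>");

    private static final class OpenTag {
        final String name;
        final int line;
        OpenTag(String name, int line) { this.name = name; this.line = line; }
    }

    private UnclosedTagFinder() { }

    static List<String> find(String html) {
        List<String> problems = new ArrayList<>();
        Deque<OpenTag> stack = new ArrayDeque<>();
        Matcher m = TAG.matcher(html);
        int line = 1;
        int scanned = 0;
        while (m.find()) {
            // Advance the line counter up to the start of this tag.
            for (; scanned < m.start(); scanned++) {
                if (html.charAt(scanned) == '\n') {
                    line++;
                }
            }
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (!closing) {
                if (!selfClosing && !VOID_TAGS.contains(name)) {
                    stack.push(new OpenTag(name, line));
                }
                continue;
            }
            if (!isOpen(stack, name)) {
                problems.add("</" + name + "> on line " + line + " has no opening tag");
                continue;
            }
            // Everything opened after the matching tag was left unclosed.
            while (!stack.peek().name.equals(name)) {
                problems.add(describe(stack.pop()));
            }
            stack.pop();
        }
        // Whatever is still open at the end of the file was never closed.
        while (!stack.isEmpty()) {
            problems.add(describe(stack.pop()));
        }
        return problems;
    }

    private static boolean isOpen(Deque<OpenTag> stack, String name) {
        for (OpenTag tag : stack) {
            if (tag.name.equals(name)) {
                return true;
            }
        }
        return false;
    }

    private static String describe(OpenTag tag) {
        return "<" + tag.name + "> opened on line " + tag.line + " is never closed";
    }
}
```

Calling `UnclosedTagFinder.find(html)` on the file contents returns one message per suspicious tag, with the line on which it was opened, which is exactly the information the freemarker error did not give me.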

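The moral of the story: converting Word to HTML by hand and then generating a PDF from it is a real pain!

To keep an unclosed tag from ever breaking the freemarker rendering again, I decided to switch to a different approach: generating the PDF with poi-ooxml.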

Preparation

Before writing any code, add the dependencies:

<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-ooxml</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi-scratchpad</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>org.apache.poi</groupId>
    <artifactId>poi</artifactId>
    <version>3.14</version>
</dependency>
<dependency>
    <groupId>fr.opensagres.xdocreport</groupId>
    <artifactId>org.apache.poi.xwpf.converter.pdf-gae</artifactId>
    <version>1.0.6</version>
</dependency>

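Code Example

import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.function.Consumer;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
// Package names as in xdocreport 1.0.x; adjust them if your converter artifact differs.
import org.apache.poi.xwpf.converter.pdf.PdfConverter;
import org.apache.poi.xwpf.converter.pdf.PdfOptions;

/**
 * Utility class for generating PDFs.
 *
 * @author b3
 */
public final class PdfUtil {

    private PdfUtil() {
        // utility class, no instances
    }

    /**
     * Generates a PDF from a Word file on disk.
     *
     * @param wordPath   path of the Word (.docx) file
     * @param outputPath path of the PDF file to write
     */
    public static void build(String wordPath, String outputPath) {
        try (FileInputStream in = new FileInputStream(wordPath);
             FileOutputStream out = new FileOutputStream(outputPath)) {
            build(in, out);
        } catch (IOException e) {
            // Fail loudly instead of silently producing no contract.
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Generates a PDF in memory and hands it to a callback.
     *
     * @param in       Word file stream; the caller is responsible for closing it
     * @param function callback that receives the generated PDF bytes
     */
    public static void build(InputStream in, Consumer<ByteArrayOutputStream> function) {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            build(in, out);
            function.accept(out);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /**
     * Converts the Word document to PDF.
     *
     * @param in  input stream of the Word file
     * @param out output stream for the PDF
     * @throws IOException if reading or converting fails
     */
    private static void build(InputStream in, OutputStream out)
            throws IOException {
        // Close the document after conversion so its resources are released.
        try (XWPFDocument xwpfDocument = new XWPFDocument(in)) {
            PdfOptions pdfOptions = PdfOptions.create().fontProvider(FontRegistry.get("SIMSUN"));
            PdfConverter.getInstance().convert(xwpfDocument, out, pdfOptions);
        }
    }

}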

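import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
// BaseFont, Font, DocumentException, Color and IFontProvider come from the iText build
// bundled with the converter artifact; import them from the packages on your classpath.

/**
 * Font registry.
 *
 * @author b3
 * 2021/1/13 10:11
 * @since 1.0.0
 */
public class FontRegistry {

    private static final Map<String, BaseFont> fontMap = new HashMap<>();
    static {
        // Windows font paths; ",0" selects the first font inside the .ttc collection.
        fontMap.put("SIMSUN", register("C:\\Windows\\Fonts\\simsun.ttc,0"));
        fontMap.put("SIMHEI", register("C:\\Windows\\Fonts\\simhei.ttf"));
        fontMap.put("TIMES_NEW_ROMAN", register("C:\\Windows\\Fonts\\times.ttf"));
    }

    /** Loads a font file; returns null (and logs) if it cannot be read. */
    private static BaseFont register(String fontPath) {
        try {
            return BaseFont.createFont(fontPath, BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
        } catch (DocumentException | IOException e) {
            e.printStackTrace();
        }
        return null;
    }

    /**
     * Returns a provider that renders all text with the given registered font,
     * whatever font family the document itself asks for.
     */
    public static IFontProvider get(String familyName) {
        return (requestedFamily, encoding, size, style, color) -> {
            BaseFont baseFont = fontMap.get(familyName);
            if (baseFont == null) {
                return new Font();
            }
            // Keep the size, style and color the document requested
            // (the first version hard-coded 10.5pt, normal, black).
            return new Font(baseFont, size, style, color);
        };
    }

}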
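The font paths above only work on Windows. If the same code has to run on a Linux server, one option is to look the font file up in a list of candidate directories first. The sketch below is only an assumption about how that could be done: `FontLocator` is a made-up name, and the directories to search have to come from your own deployment.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper (not part of the original project): finds a font file in candidate directories. */
final class FontLocator {

    private FontLocator() { }

    /**
     * Returns the first existing regular file named fileName in the given directories,
     * checked in order, or an empty Optional if none of them contains it.
     */
    static Optional<Path> locate(String fileName, List<String> directories) {
        for (String dir : directories) {
            Path candidate = Paths.get(dir, fileName);
            if (Files.isRegularFile(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty();
    }
}
```

The result could then be passed to `register`, for example `register(path + ",0")` for simsun.ttc, keeping the `,0` suffix that selects the first font in the collection. Subdirectories are not searched, so each directory that actually holds the font files has to be listed explicitly.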

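import java.io.BufferedOutputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

/**
 * Demo: converts e:/pdf/b3.docx to PDF.
 *
 * @author b3
 * 2021/1/12 14:48
 * @since 1.0.0
 */
public class MagicMirrorPDFDemo {

    public static void main(String[] args) {
        String basePath = "e:/pdf/";
        build1(basePath);
    }

    /** File-to-file conversion. */
    static void build0(String basePath) {
        PdfUtil.build(basePath + "b3.docx", basePath + "gen.pdf");
    }

    /** Stream-based conversion: the callback receives the PDF bytes and writes them out. */
    static void build1(String basePath) {
        // The output is a PDF, so it gets a .pdf extension (the first version wrote it to a .docx name).
        // Both streams are closed by try-with-resources.
        try (FileInputStream in = new FileInputStream(basePath + "b3.docx");
             BufferedOutputStream out = new BufferedOutputStream(
                     new FileOutputStream(basePath + "gen-test0.pdf"))) {
            PdfUtil.build(in, pdfBytes -> {
                try {
                    pdfBytes.writeTo(out);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

}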

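Conclusion

With the manual Word-to-HTML step gone, my hair is finally safe, and I no longer have to worry about unclosed tags every time.

Sweet!!!

Note that there is no data-filling step here. If you need to fill in data yourself, consider hooking your own logic in around the following call, for example by modifying xwpfDocument before it is converted: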

PdfConverter.getInstance().convert(xwpfDocument, out, pdfOptions);
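As a rough idea of what such a data-filling step might involve, here is a minimal sketch of the text substitution itself. The `${name}` placeholder syntax and the `PlaceholderRenderer` class are assumptions made for illustration; nothing in the original project prescribes them.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of the original project): fills ${name} placeholders in a string. */
final class PlaceholderRenderer {

    // Assumed placeholder syntax: ${name}, where name is made of letters, digits, '_' or '.'.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([A-Za-z0-9_.]+)\\}");

    private PlaceholderRenderer() { }

    /** Replaces every known placeholder with its value; unknown placeholders are left untouched. */
    static String render(String text, Map<String, String> values) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer result = new StringBuffer();
        while (m.find()) {
            String value = values.get(m.group(1));
            String replacement = value != null ? value : m.group();
            // quoteReplacement keeps '$' and '\' in the value from being read as group references.
            m.appendReplacement(result, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(result);
        return result.toString();
    }
}
```

Before the convert call, this could be applied to each run of each paragraph in xwpfDocument, reading the text with `XWPFRun.getText(0)` and writing it back with `setText(text, 0)`. Keep in mind that Word often splits what looks like one placeholder across several runs, so a run-by-run replacement only works when each placeholder sits inside a single run, and that text inside tables has to be visited separately.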
