
Poi-2: Personal notes on problems encountered while using POI

程序员文章站 2024-02-10 14:00:52

Converting doc and docx to HTML

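import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;

public static String docToHtml(String wPath) throws Exception {
        // GetFromFilePath is my own helper class (not shown here); it derives a name from the source path
        String fpath = new GetFromFilePath().getNameFromPath(wPath);
        File path = new File(fpath);
        String imagePathStr = path.getAbsolutePath() + "\\static\\image\\";
        String sourceFileName = wPath;

        String targetFileName = path.getAbsolutePath() + "\\static\\" + fpath + ".html";
        // Create the image directory if it does not exist yet
        File file = new File(imagePathStr);
        if (!file.exists()) {
            file.mkdirs();
        }

        org.w3c.dom.Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(document);
        // Save the embedded pictures
        wordToHtmlConverter.setPicturesManager((content, pictureType, name, width, height) -> {
            try (FileOutputStream out = new FileOutputStream(imagePathStr + name)) {
                out.write(content);
            } catch (Exception e) {
                e.printStackTrace();
            }
            // Relative path of the picture as referenced from the HTML
            return "image/" + name;
        });
        // Read the .doc file; the input stream is closed once the document has been processed
        try (InputStream in = new FileInputStream(sourceFileName)) {
            HWPFDocument wordDocument = new HWPFDocument(in);
            wordToHtmlConverter.processDocument(wordDocument);
        }
        // Serialize the generated DOM tree to an HTML file
        org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
        DOMSource domSource = new DOMSource(htmlDocument);
        StreamResult streamResult = new StreamResult(new File(targetFileName));
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer serializer = tf.newTransformer();
        serializer.setOutputProperty(OutputKeys.ENCODING, "GB2312");
        serializer.setOutputProperty(OutputKeys.INDENT, "yes");
        serializer.setOutputProperty(OutputKeys.METHOD, "html");
        serializer.transform(domSource, streamResult);
        return targetFileName;
    }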
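
The method above builds its output paths with hard-coded "\\" separators, so it only behaves as intended on Windows. The following is a minimal sketch of the same static/image and static/<name>.html layout built with java.nio.file instead. The names OutputPaths, imageDir and htmlFile are hypothetical and not part of the original code, and the temporary directory only stands in for the real base directory.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper: mirrors the <base>/static/image and <base>/static/<name>.html layout.
class OutputPaths {
    // Directory that receives the extracted pictures.
    static Path imageDir(Path base) {
        return base.resolve("static").resolve("image");
    }

    // Target HTML file for a document with the given base name.
    static Path htmlFile(Path base, String name) {
        return base.resolve("static").resolve(name + ".html");
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the real base directory (assumption for this demo).
        Path base = Files.createTempDirectory("poi-demo");
        // createDirectories does not fail if the directory already exists.
        Path images = Files.createDirectories(imageDir(base));
        System.out.println(images);
        System.out.println(htmlFile(base, "report"));
    }
}
```

Path.resolve inserts the platform's separator, so the same code should work on Windows and Linux; call toString() on the result where the POI code expects a String.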

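Converting docx to HTML, however, turned out to be the biggest problem. After searching for related material, I tried the following docx-to-HTML method: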

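import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import fr.opensagres.poi.xwpf.converter.core.BasicURIResolver;
import fr.opensagres.poi.xwpf.converter.core.FileImageExtractor;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;

public static void docx2Html(String fileName, String outPutFile) throws IOException {
        String fileOutName = outPutFile;
        long startTime = System.currentTimeMillis();
        File outFile = new File(fileOutName);
        outFile.getParentFile().mkdirs();
        // try-with-resources closes the document and both streams
        try (InputStream in = new FileInputStream(fileName);
             XWPFDocument document = new XWPFDocument(in);
             OutputStream out = new FileOutputStream(outFile)) {
            XHTMLOptions options = XHTMLOptions.getDefault();
            // Export the embedded images to this folder
            File imageFolder = new File("D:\\testfile\\static\\image");
            options.setExtractor(new FileImageExtractor(imageFolder));
            // URI resolver: directory prefix used for image links in the generated HTML
            options.URIResolver(new BasicURIResolver("image"));
            XHTMLConverter.getInstance().convert(document, out, options);
        }
        System.out.println("Generate " + fileOutName + " with " + (System.currentTimeMillis() - startTime) + " ms.");
    }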

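At first it reported that some classes were missing, so I downloaded the jars again and switched to the latest POI version, 4.2. The code then compiled without errors, but at runtime it failed with java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader: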

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader
    at org.openxmlformats.schemas.wordprocessingml.x2006.main.FontsDocument$Factory.parse(Unknown Source)
    at fr.opensagres.poi.xwpf.converter.core.styles.XWPFStylesDocument$FontsDocumentVisitor.visitDocumentPart(XWPFStylesDocument.java:1600)
    at fr.opensagres.poi.xwpf.converter.core.styles.XWPFStylesDocument$DocumentVisitor.visitDocument(XWPFStylesDocument.java:1496)
    at fr.opensagres.poi.xwpf.converter.core.styles.XWPFStylesDocument.getFontsDocument(XWPFStylesDocument.java:1618)
    at fr.opensagres.poi.xwpf.converter.core.styles.XWPFStylesDocument.<init>(XWPFStylesDocument.java:196)
    at fr.opensagres.poi.xwpf.converter.xhtml.internal.styles.CSSStylesDocument.<init>(CSSStylesDocument.java:103)
    at fr.opensagres.poi.xwpf.converter.xhtml.internal.XHTMLMapper.createStylesDocument(XHTMLMapper.java:121)
    at fr.opensagres.poi.xwpf.converter.core.XWPFDocumentVisitor.<init>(XWPFDocumentVisitor.java:175)
    at fr.opensagres.poi.xwpf.converter.xhtml.internal.XHTMLMapper.<init>(XHTMLMapper.java:111)
    at fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter.convert(XHTMLConverter.java:73)
    at fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter.doConvert(XHTMLConverter.java:64)
    at fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter.doConvert(XHTMLConverter.java:39)
    at fr.opensagres.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:46)
    at com.agree.DocxToHtml.docx2Html(DocxToHtml.java:84)
    at com.agree.DocxToHtml.main(DocxToHtml.java:33)
Caused by: java.lang.ClassNotFoundException: org.apache.poi.POIXMLTypeLoader
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 15 more
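
The trace shows the generated schema class FontsDocument asking for org.apache.poi.POIXMLTypeLoader. POI 4.0 appears to have moved that class to org.apache.poi.ooxml.POIXMLTypeLoader, so an error like this usually suggests that jars from different POI generations are mixed on the classpath (for example a schema jar built for POI 3.x next to POI 4.x). This is an assumption and was not verified against the setup above. A minimal sketch to check which of the two class names can actually be loaded; ClasspathCheck and isPresent are hypothetical names:

```java
// Hypothetical diagnostic: reports which POIXMLTypeLoader variant the classpath provides.
class ClasspathCheck {
    // Returns true if the named class can be located, without initializing it.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Old location, the one the stack trace asks for.
        System.out.println("org.apache.poi.POIXMLTypeLoader: "
                + isPresent("org.apache.poi.POIXMLTypeLoader"));
        // Location assumed for POI 4.x.
        System.out.println("org.apache.poi.ooxml.POIXMLTypeLoader: "
                + isPresent("org.apache.poi.ooxml.POIXMLTypeLoader"));
    }
}
```

Run it with the same classpath as the converter. If only the second name is found, the schema jar and the POI jars probably do not belong to the same release.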

I tried several POI versions and none of them worked. For now I am out of ideas.

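I then wondered whether I could simply rename the docx file to doc. That failed as well, because POI looks at the file content rather than the extension and still recognizes the renamed file as docx: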
java.lang.IllegalArgumentException: The document is really a OOXML file
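
The exception makes sense once you look at the first bytes of the file: a legacy .doc is an OLE2 compound file, while a .docx is a ZIP archive, and renaming changes neither. A minimal sketch, using only the JDK, that tells the two containers apart by their leading bytes; WordFormatSniffer, detect and detectFile are hypothetical names, and this only identifies the container, not whether the content is really a Word document:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

// Hypothetical helper: distinguishes OLE2 (.doc) from ZIP (.docx) containers by magic bytes.
class WordFormatSniffer {
    // First bytes of an OLE2 compound file (legacy .doc).
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // First bytes of a ZIP archive, the container used by .docx.
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    static String detect(byte[] header) {
        if (startsWith(header, OLE2)) return "OLE2";
        if (startsWith(header, ZIP)) return "ZIP";
        return "UNKNOWN";
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }

    // Reads up to 8 bytes from the file and classifies them.
    static String detectFile(String path) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(path))) {
            byte[] header = new byte[8];
            int read = 0;
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) break; // end of file reached early
                read += n;
            }
            return detect(Arrays.copyOf(header, read));
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length > 0) {
            System.out.println(args[0] + " -> " + detectFile(args[0]));
        }
    }
}
```

With a check like this, the code can route a file to docToHtml or docx2Html based on what it really is, regardless of its extension.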

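I also came across many posts about using jacob. After trying it, I can confirm that it does convert docx, but it only works on Windows: jacob-1.19-x86.dll has to be placed under jre/bin, and it seems that the corresponding software, such as Office or WPS, must be installed locally, although I have not confirmed this yet.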

So for now I have not managed to convert docx to HTML with POI. If anyone can recommend a good approach, I would be grateful.