Poi-2: Personal Notes on Problems Encountered While Using POI
2024-02-10 14:00:52
Converting doc and docx to HTML
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.converter.WordToHtmlConverter;

public static String docToHtml(String wPath) throws Exception {
    // GetFromFilePath is a helper class that is not shown here
    String fpath = new GetFromFilePath().getNameFromPath(wPath);
    File path = new File(fpath);
    String imagePathStr = path.getAbsolutePath() + "\\static\\image\\";
    String sourceFileName = wPath;
    String targetFileName = path.getAbsolutePath() + "\\static\\" + fpath + ".html";
    File file = new File(imagePathStr);
    if (!file.exists()) {
        file.mkdirs();
    }
    HWPFDocument wordDocument;
    // Close the input stream once the document has been read
    try (FileInputStream in = new FileInputStream(sourceFileName)) {
        wordDocument = new HWPFDocument(in);
    }
    org.w3c.dom.Document document = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
    WordToHtmlConverter wordToHtmlConverter = new WordToHtmlConverter(document);
    // Save the embedded pictures
    wordToHtmlConverter.setPicturesManager((content, pictureType, name, width, height) -> {
        try (FileOutputStream out = new FileOutputStream(imagePathStr + name)) {
            out.write(content);
        } catch (Exception e) {
            e.printStackTrace();
        }
        // Relative path of the picture as referenced from the HTML
        return "image/" + name;
    });
    wordToHtmlConverter.processDocument(wordDocument);
    org.w3c.dom.Document htmlDocument = wordToHtmlConverter.getDocument();
    DOMSource domSource = new DOMSource(htmlDocument);
    StreamResult streamResult = new StreamResult(new File(targetFileName));
    TransformerFactory tf = TransformerFactory.newInstance();
    Transformer serializer = tf.newTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "GB2312");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    serializer.setOutputProperty(OutputKeys.METHOD, "html");
    serializer.transform(domSource, streamResult);
    return targetFileName;
}
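The method above depends on GetFromFilePath, which is not shown, and builds its output paths with hard-coded Windows separators. A minimal sketch of how the same paths could be derived portably, assuming the helper returns the file name without its extension. That behaviour is a guess, and OutputPaths, baseName, htmlTarget and imageDir are hypothetical names, not part of POI:

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class OutputPaths {
    // Hypothetical stand-in for GetFromFilePath.getNameFromPath:
    // returns the file name without directory and without extension.
    static String baseName(String wordPath) {
        // Treat both '\' and '/' as separators so the result is the same on any OS.
        String normalised = wordPath.replace('\\', '/');
        String fileName = normalised.substring(normalised.lastIndexOf('/') + 1);
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // <baseDir>/static/<name>.html, built with the platform's own separator.
    static Path htmlTarget(Path baseDir, String wordPath) {
        return baseDir.resolve("static").resolve(baseName(wordPath) + ".html");
    }

    // <baseDir>/static/image, the folder the extracted pictures go into.
    static Path imageDir(Path baseDir) {
        return baseDir.resolve("static").resolve("image");
    }

    public static void main(String[] args) {
        Path base = Paths.get("report");
        System.out.println(htmlTarget(base, "D:\\docs\\report.doc"));
        System.out.println(imageDir(base));
    }
}
```

Resolving paths this way avoids the "\\" literals, so the same code would also run on Linux or macOS.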
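Converting docx to HTML, however, is where I ran into the biggest problem. I started searching for material and tried the following docx-to-HTML method: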
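import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import fr.opensagres.poi.xwpf.converter.core.BasicURIResolver;
import fr.opensagres.poi.xwpf.converter.core.FileImageExtractor;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter;
import fr.opensagres.poi.xwpf.converter.xhtml.XHTMLOptions;

public static void docx2Html(String fileName, String outPutFile) throws IOException {
    String fileOutName = outPutFile;
    long startTime = System.currentTimeMillis();
    XHTMLOptions options = XHTMLOptions.getDefault();
    // Export the embedded images to this folder
    File imageFolder = new File("D:\\testfile\\static\\image");
    options.setExtractor(new FileImageExtractor(imageFolder));
    // URI resolver: the image directory as referenced from the generated HTML
    options.URIResolver(new BasicURIResolver("image"));
    File outFile = new File(fileOutName);
    outFile.getParentFile().mkdirs();
    // try-with-resources closes both streams even if the conversion throws
    try (FileInputStream in = new FileInputStream(fileName);
         OutputStream out = new FileOutputStream(outFile)) {
        XHTMLConverter.getInstance().convert(new XWPFDocument(in), out, options);
    }
    System.out.println("Generate " + fileOutName + " with " + (System.currentTimeMillis() - startTime) + " ms.");
}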
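At first some classes were reported as missing, so I downloaded the jars again and switched to the latest POI version, 4.2. The code no longer showed compile errors, but at runtime it failed with java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader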
Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/poi/POIXMLTypeLoader
at org.openxmlformats.schemas.wordprocessingml.x2006.main.FontsDocument$Factory.parse(Unknown Source)
at fr.opensagres.poi.xwpf.converter.core.styles.XWPFStylesDocument$FontsDocumentVisitor.visitDocumentPart(XWPFStylesDocument.java:1600)
at fr.opensagres.poi.xwpf.converter.core.styles.XWPFStylesDocument$DocumentVisitor.visitDocument(XWPFStylesDocument.java:1496)
at fr.opensagres.poi.xwpf.converter.core.styles.XWPFStylesDocument.getFontsDocument(XWPFStylesDocument.java:1618)
at fr.opensagres.poi.xwpf.converter.core.styles.XWPFStylesDocument.<init>(XWPFStylesDocument.java:196)
at fr.opensagres.poi.xwpf.converter.xhtml.internal.styles.CSSStylesDocument.<init>(CSSStylesDocument.java:103)
at fr.opensagres.poi.xwpf.converter.xhtml.internal.XHTMLMapper.createStylesDocument(XHTMLMapper.java:121)
at fr.opensagres.poi.xwpf.converter.core.XWPFDocumentVisitor.<init>(XWPFDocumentVisitor.java:175)
at fr.opensagres.poi.xwpf.converter.xhtml.internal.XHTMLMapper.<init>(XHTMLMapper.java:111)
at fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter.convert(XHTMLConverter.java:73)
at fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter.doConvert(XHTMLConverter.java:64)
at fr.opensagres.poi.xwpf.converter.xhtml.XHTMLConverter.doConvert(XHTMLConverter.java:39)
at fr.opensagres.poi.xwpf.converter.core.AbstractXWPFConverter.convert(AbstractXWPFConverter.java:46)
at com.agree.DocxToHtml.docx2Html(DocxToHtml.java:84)
at com.agree.DocxToHtml.main(DocxToHtml.java:33)
Caused by: java.lang.ClassNotFoundException: org.apache.poi.POIXMLTypeLoader
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
... 15 more
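One reading of this trace: the missing class is requested from FontsDocument$Factory, which lives in the OOXML schema jar rather than in the conversion code. POI 4.0 moved POIXMLTypeLoader from org.apache.poi to org.apache.poi.ooxml, so a schema jar compiled against POI 3.x would still ask for the old name when it runs next to POI 4.x. That suggests mixed jar versions rather than a bug in docx2Html, although this is an inference from the trace and not something verified here. A minimal sketch for checking which layout is on the classpath; PoiClasspathProbe and its messages are made up for illustration:

```java
final class PoiClasspathProbe {
    // True if the class can be located; initialize=false avoids running static initialisers.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, PoiClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Turns the two lookups into a human-readable hint.
    static String describe(boolean oldLocation, boolean newLocation) {
        if (oldLocation && newLocation) {
            return "Both locations found: POI 3.x and 4.x jars are mixed on the classpath.";
        }
        if (newLocation) {
            return "POI 4.x layout: the schema and converter jars must be built for POI 4.x too.";
        }
        if (oldLocation) {
            return "POI 3.x layout: the schema and converter jars must be built for POI 3.x too.";
        }
        return "poi-ooxml does not appear to be on the classpath.";
    }

    public static void main(String[] args) {
        boolean oldLocation = isPresent("org.apache.poi.POIXMLTypeLoader");
        boolean newLocation = isPresent("org.apache.poi.ooxml.POIXMLTypeLoader");
        System.out.println(describe(oldLocation, newLocation));
    }
}
```

Run with the same classpath as the failing program. If only the POI 4.x layout is reported, the jar that still references the old class name is the one to replace.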
I tried several POI versions and none of them worked. At this point I am really out of ideas.
I then wondered whether I could simply rename the docx file to doc. That failed as well, because POI still recognises the renamed doc file as a docx:
java.lang.IllegalArgumentException: The document is really a OOXML file
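Renaming cannot help because the format is decided by the file's content, not its extension: a binary .doc is an OLE2 container and a .docx is a ZIP archive, and each starts with a different signature. A minimal sketch that checks the leading bytes to choose a converter, written in plain Java so it runs without POI. WordFormatSniffer and its enum are hypothetical names; POI has its own check for this (FileMagic, as far as I know), which is not used here:

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class WordFormatSniffer {
    enum Format { OLE2_BINARY, ZIP_OOXML, UNKNOWN }

    // Signature of an OLE2 compound file (used by .doc, .xls, .ppt).
    private static final int[] OLE2 = {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};
    // Signature of a ZIP local file header (used by .docx, .xlsx, .pptx).
    private static final int[] ZIP = {0x50, 0x4B, 0x03, 0x04};

    // Classifies a file by its leading bytes; the extension plays no role.
    static Format sniff(byte[] header) {
        if (startsWith(header, OLE2)) return Format.OLE2_BINARY;
        if (startsWith(header, ZIP)) return Format.ZIP_OOXML;
        return Format.UNKNOWN;
    }

    // Reads at most the first 8 bytes of the file and classifies them.
    static Format sniffFile(Path file) throws IOException {
        byte[] header = new byte[8];
        int read = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (read < header.length) {
                int n = in.read(header, read, header.length - read);
                if (n < 0) break;
                read += n;
            }
        }
        return sniff(Arrays.copyOf(header, read));
    }

    private static boolean startsWith(byte[] data, int[] magic) {
        if (data.length < magic.length) return false;
        for (int i = 0; i < magic.length; i++) {
            if ((data[i] & 0xFF) != magic[i]) return false;
        }
        return true;
    }
}
```

A caller could route OLE2_BINARY files to docToHtml and ZIP_OOXML files to docx2Html. The signatures only identify the container, so an .xls or .xlsx would pass the same check.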
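I also came across many posts about using jacob. After trying it, it can indeed convert docx, but it only works on Windows and requires placing jacob-1.19-x86.dll under jre/bin. It also seems that the relevant software, such as Office or WPS, has to be installed locally, though I am not yet sure about that.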
So for now I have not managed to convert docx to HTML with POI. If anyone has a good method to recommend, I would be grateful.