Reading PDF and Word Documents in Java: Example Code

程序员文章站 2024-02-29 18:36:04

Maven dependency for reading PDF files (Apache PDFBox):

<dependency>
  <groupId>org.apache.pdfbox</groupId>
  <artifactId>pdfbox</artifactId>
  <version>1.8.13</version>
</dependency>

Maven dependencies for reading Word files (Apache POI; the HWPF classes used below live in poi-scratchpad):

<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi-scratchpad</artifactId>
  <version>3.16-beta1</version>
</dependency>
<dependency>
  <groupId>org.apache.poi</groupId>
  <artifactId>poi</artifactId>
  <version>3.16-beta1</version>
</dependency>

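Method for reading a Word file (binary .doc format only; the HWPF WordExtractor does not read .docx, which needs XWPFWordExtractor from poi-ooxml):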

import java.io.FileInputStream;
import java.io.IOException;

import org.apache.poi.hwpf.extractor.WordExtractor;

/**
 *
 * @title: getTextFromWord
 * @description: reads the text of a Word (.doc) file
 * @param filePath
 *      path of the file
 * @return: String the text read from the Word document, or null if reading failed
 */
public static String getTextFromWord(String filePath) {
  String result = null;
  // try-with-resources closes both the extractor and the underlying stream,
  // so the @SuppressWarnings("resource") workaround is no longer needed
  try (FileInputStream fis = new FileInputStream(filePath);
       WordExtractor wordExtractor = new WordExtractor(fis)) {
    result = wordExtractor.getText();
  } catch (IOException e) {
    // FileNotFoundException is a subclass of IOException, so one catch covers both
    e.printStackTrace();
  }
  return result;
}
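The HWPF WordExtractor used above only understands the binary Word 97-2003 .doc format, which is an OLE2 compound file. A .docx file is a ZIP package and is not supported by it. As a minimal sketch, assuming you want to check the input before calling getTextFromWord, the leading bytes of the file can be compared against the two container signatures. WordFormatCheck and its detect methods are hypothetical names for illustration, not part of Apache POI.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Arrays;

/**
 * Hypothetical helper (not part of POI): classifies a file by its leading bytes
 * so that only OLE2 (.doc) input is handed to HWPF's WordExtractor.
 */
class WordFormatCheck {

    // OLE2 / Compound File Binary signature used by Word 97-2003 .doc files
    static final byte[] OLE2_SIGNATURE = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };
    // ZIP local-file-header signature "PK\3\4", used by .docx (OOXML) packages
    static final byte[] ZIP_SIGNATURE = {0x50, 0x4B, 0x03, 0x04};

    static boolean startsWith(byte[] data, byte[] prefix) {
        return data.length >= prefix.length
            && Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    /** Returns "doc", "docx-or-zip" or "unknown" for the given leading bytes. */
    static String detect(byte[] header) {
        if (startsWith(header, OLE2_SIGNATURE)) return "doc";
        if (startsWith(header, ZIP_SIGNATURE)) return "docx-or-zip";
        return "unknown";
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    static String detect(String filePath) throws IOException {
        try (InputStream in = Files.newInputStream(Paths.get(filePath))) {
            byte[] buf = new byte[8];
            int n = 0;
            int r;
            // read() may return fewer bytes than requested, so loop until 8 bytes or EOF
            while (n < buf.length && (r = in.read(buf, n, buf.length - n)) > 0) {
                n += r;
            }
            return detect(Arrays.copyOf(buf, n));
        }
    }
}
```

A result of "docx-or-zip" suggests the file should go to XWPFWordExtractor (poi-ooxml) rather than getTextFromWord. Note that other OLE2 files (such as .xls) share the same signature, so "doc" means "an OLE2 container", not a guaranteed Word file.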

Method for reading a PDF file (PDFBox 1.8.x API):

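import java.io.FileInputStream;
import java.io.IOException;

import org.apache.pdfbox.pdfparser.PDFParser;
import org.apache.pdfbox.pdmodel.PDDocument;
// PDFBox 1.8.x package; in PDFBox 2.x this class lives in org.apache.pdfbox.text
import org.apache.pdfbox.util.PDFTextStripper;

/**
 *
 * @title: getTextFromPdf
 * @description: reads the text content of a PDF file
 * @param filePath
 *      path of the file
 * @return: the text read from the PDF, or null if reading failed
 */
public static String getTextFromPdf(String filePath) {
  String result = null;
  FileInputStream is = null;
  PDDocument document = null;
  try {
    is = new FileInputStream(filePath);
    // PDFParser(InputStream) matches the PDFBox 1.8.x dependency declared above
    PDFParser parser = new PDFParser(is);
    parser.parse();
    document = parser.getPDDocument();
    PDFTextStripper stripper = new PDFTextStripper();
    result = stripper.getText(document);
  } catch (IOException e) {
    // FileNotFoundException is a subclass of IOException, so one catch covers both
    e.printStackTrace();
  } finally {
    // close the document first, then the stream it was parsed from
    if (document != null) {
      try {
        document.close();
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
    if (is != null) {
      try {
        is.close();
      } catch (IOException e) {
        e.printStackTrace();
      }
    }
  }
  return result;
}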
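A well-formed PDF file begins with a header such as %PDF-1.4. As a minimal sketch, assuming the header sits at byte 0 (some real-world files carry leading bytes that lenient parsers tolerate), a pre-check can read that version so that non-PDF input is rejected before it reaches getTextFromPdf. PdfHeaderCheck and pdfVersion are hypothetical names for illustration, not PDFBox API.

```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

/**
 * Hypothetical pre-check (not part of PDFBox): extracts the version from a
 * "%PDF-x.y" header found at the very start of the given bytes.
 */
class PdfHeaderCheck {

    private static final byte[] MAGIC = "%PDF-".getBytes(StandardCharsets.US_ASCII);

    /** Returns the version after "%PDF-" (e.g. "1.7"), or empty if there is no such header. */
    static Optional<String> pdfVersion(byte[] header) {
        if (header.length < MAGIC.length) return Optional.empty();
        for (int i = 0; i < MAGIC.length; i++) {
            if (header[i] != MAGIC[i]) return Optional.empty();
        }
        // Collect the ASCII digits and dots that follow the magic string, e.g. "1.7"
        StringBuilder version = new StringBuilder();
        for (int i = MAGIC.length; i < header.length; i++) {
            char c = (char) (header[i] & 0xFF);
            if ((c >= '0' && c <= '9') || c == '.') {
                version.append(c);
            } else {
                break;
            }
        }
        return version.length() == 0 ? Optional.empty() : Optional.of(version.toString());
    }
}
```

The bytes would typically come from the first few bytes of the file (for example, via Files.newInputStream). An empty result means the input does not start with a PDF header, so calling getTextFromPdf on it would most likely end in an IOException.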

I hope this example code helps you.