Reading the contents of TXT and Word files in Java and displaying them on a page

程序员文章站 2022-05-31 10:42:01

Note: Word 2003 files (.doc) and Word 2007 or later files (.docx) must be read with different jars!

1. TXT files:

import java.io.BufferedReader;
import java.io.FileReader;

// file is the java.io.File to read
StringBuffer texts = new StringBuffer();
BufferedReader br = new BufferedReader(new FileReader(file)); // decodes with the platform default charset
String line = null;
while ((line = br.readLine()) != null) {
    texts.append(line); // readLine() drops the line terminator
}
br.close();

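Note: read this way, Chinese text comes out garbled whenever the file's encoding differs from the platform default charset that FileReader uses!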

 

Fix: specify the file's encoding when creating the reader.

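import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.InputStreamReader;

StringBuffer texts = new StringBuffer();
InputStreamReader isr = new InputStreamReader(new FileInputStream(file), "UTF-8"); // decode explicitly as UTF-8
BufferedReader br = new BufferedReader(isr);
String line = null;
while ((line = br.readLine()) != null) {
    texts.append(line);
}
br.close(); // also closes the wrapped streams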
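
On Java 7 and later, the same idea can be written with java.nio and try-with-resources, so the reader is closed even if reading fails. This is only a minimal sketch, not part of the original code: the class name TextFileReader and the method readText are made up for illustration, and it appends '\n' after every line so that line breaks survive, which the loop above does not do.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, named here for illustration only.
final class TextFileReader {

    /** Reads the whole file with an explicit charset, ending every line with '\n'. */
    static String readText(Path path, Charset charset) throws IOException {
        StringBuilder texts = new StringBuilder();
        // try-with-resources closes the reader on both success and failure
        try (BufferedReader reader = Files.newBufferedReader(path, charset)) {
            String line;
            while ((line = reader.readLine()) != null) {
                texts.append(line).append('\n');
            }
        }
        return texts.toString();
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sample", ".txt");
        // "\u4e2d\u6587" is a two-character Chinese string, escaped so the
        // source compiles the same way regardless of the compiler's encoding
        Files.write(tmp, "\u4e2d\u6587\nabc\n".getBytes(StandardCharsets.UTF_8));
        System.out.print(readText(tmp, StandardCharsets.UTF_8));
        Files.delete(tmp);
    }
}
```

One difference worth knowing: Files.newBufferedReader throws a MalformedInputException when the bytes are not valid in the given charset, whereas InputStreamReader silently substitutes replacement characters. If the files may arrive in GBK rather than UTF-8, pass Charset.forName("GBK") instead.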

 

 

2. Word 2003 (.doc format):

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import org.apache.poi.hwpf.extractor.WordExtractor;

String text = null;
try {
    FileInputStream inputStream = new FileInputStream(file);
    WordExtractor extractor = new WordExtractor(inputStream);
    text = extractor.getText();
    inputStream.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
}

 

Or:

import java.io.FileInputStream;
import java.io.FileNotFoundException;
import org.textmining.text.extraction.WordExtractor; // a different package is imported

String text = null;
try {
    FileInputStream inputStream = new FileInputStream(file);
    WordExtractor extractor = new WordExtractor(); // no argument here
    text = extractor.extractText(inputStream); // the stream is passed here instead
    inputStream.close();
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (Exception e) {
    e.printStackTrace();
}

Note the differences pointed out in the comments!

 

3. Word 2007 and later (.docx format):

Jars used:
* poi-3.9-20121203.jar
* poi-ooxml-3.9-20121203.jar
* poi-ooxml-schemas-3.9-20121203.jar
* poi-scratchpad-3.9-20121203.jar
* xmlbeans-2.3.0.jar
* dom4j-1.6.1.jar

import java.io.IOException;
import org.apache.poi.POIXMLDocument;
import org.apache.poi.POIXMLTextExtractor;
import org.apache.poi.openxml4j.exceptions.OpenXML4JException;
import org.apache.poi.openxml4j.opc.OPCPackage;
import org.apache.poi.xwpf.extractor.XWPFWordExtractor;
import org.apache.xmlbeans.XmlException;

String text = null;
try {
    // filePath is the path of the .docx file as a String
    OPCPackage opcPackage = POIXMLDocument.openPackage(filePath);
    POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
    text = extractor.getText();
} catch (IOException e) {
    e.printStackTrace();
} catch (XmlException e) {
    e.printStackTrace();
} catch (OpenXML4JException e) {
    e.printStackTrace();
}

 

 

4. Worked example:

long id = Long.valueOf(request.getParameter("id"));
PolicyDao policyDao = new PolicyDao();
Policy policy = policyDao.getPolicy(id);
// Read the contents of the file
StringBuffer fileContent = new StringBuffer();
String fileName = policy.getFilePath();
String uploadPath = Configuration.getConfig().getString("policyFilesPath");
File file = new File(uploadPath + fileName);
if (file.exists()) {
    String suffix = file.getName().substring(file.getName().lastIndexOf(".") + 1);
    // Word 2003 (org.apache.poi.hwpf.extractor.WordExtractor)
    if (suffix.equals("doc")) {
        FileInputStream fis = new FileInputStream(file);
        WordExtractor wordExtractor = new WordExtractor(fis);
        String text = wordExtractor.getText();
        fileContent.append(text);
        fis.close();
    }
    // Word 2007
    else if (suffix.equals("docx")) {
        OPCPackage opcPackage = POIXMLDocument.openPackage(uploadPath + fileName);
        POIXMLTextExtractor extractor = new XWPFWordExtractor(opcPackage);
        String text = extractor.getText();
        fileContent.append(text);
    }
    // TXT
    else if (suffix.equals("txt")) {
        BufferedReader bufferReader = new BufferedReader(new InputStreamReader(new FileInputStream(file), "utf-8"));
        // Read one line at a time from the BufferedReader.
        String line = null;
        while ((line = bufferReader.readLine()) != null) {
            fileContent.append(line);
        }
        bufferReader.close();
    }
} else {
    System.out.println("File does not exist!");
}
// Output: hand the content to the JSP page
request.setAttribute("content", fileContent);
request.setAttribute("name", policy.getTitle());
request.setAttribute("id", policy.getId());
request.getRequestDispatcher("/frontShow/document-info.jsp").forward(request, response);
return;
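
The suffix test in the example has two edge cases. With substring(lastIndexOf(".") + 1), a file name that contains no dot at all yields the whole name, so a file literally named "doc" would be treated as a Word document. And equals("doc") does not match "DOC". The sketch below shows one way to handle both. The class FileSuffix and the method suffixOf are hypothetical names introduced only for this illustration.

```java
import java.util.Locale;

// Hypothetical helper, named here for illustration only.
final class FileSuffix {

    /**
     * Returns the lower-cased text after the last dot of fileName,
     * or "" when there is no dot or nothing follows it.
     */
    static String suffixOf(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return "";
        }
        // Locale.ROOT keeps the result independent of the server's locale
        return fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(suffixOf("policy.DOCX")); // prints "docx"
        System.out.println(suffixOf("doc").isEmpty()); // prints "true"
    }
}
```

With such a helper, the branches in the example could compare suffixOf(file.getName()) against "doc", "docx" and "txt" directly.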

 

 

You may run into an exception like: IOException: Unable to read entire block; 362 bytes read; expected 512 bytes

 

Fix: since it expects 512 bytes, write a full 512 bytes for every block.

ByteArrayOutputStream byteOS = new ByteArrayOutputStream();
FileInputStream fis = new FileInputStream(fileToBeRead);
byte[] by = new byte[512];
int t = fis.read(by, 0, by.length);
while (t > 0) {
    // Do not write t here; always write the full 512 (a crude fix for a crude Java API).
    // The last block is therefore padded with whatever the buffer still holds from the previous read.
    byteOS.write(by, 0, 512);
    t = fis.read(by, 0, by.length);
}
byteOS.close();
InputStream byteIS = new ByteArrayInputStream(byteOS.toByteArray());
HSSFWorkbook workbook = new HSSFWorkbook(byteIS);

 

The same workaround applied to the .doc branch of the example:

if (file.exists()) {
    String suffix = file.getName().substring(file.getName().lastIndexOf(".") + 1);
    if (suffix.equalsIgnoreCase("doc")) {
        FileInputStream fis = new FileInputStream(file);
        /*byte buf[] = IOUtils.toByteArray(fis);
        ByteArrayInputStream bs = new ByteArrayInputStream(buf);*/
        ByteArrayOutputStream byteOS = new ByteArrayOutputStream();
        byte[] by = new byte[512];
        int t = fis.read(by, 0, by.length);
        while (t > 0) {
            // Do not write t here; always write the full 512 (a crude fix for a crude Java API).
            byteOS.write(by, 0, 512);
            t = fis.read(by, 0, by.length);
        }
        InputStream byteIS = new ByteArrayInputStream(byteOS.toByteArray());
        WordExtractor wordExtractor = new WordExtractor(byteIS);
        String text = wordExtractor.getText();
        fileContent.append(text);
        fis.close();
        byteOS.close();
        byteIS.close();
    }
......
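
One caveat about the loop above: InputStream.read is allowed to return fewer than 512 bytes before the end of the stream, and in that case writing a full 512 copies stale bytes into the middle of the data. A more cautious variant, sketched below, copies only the bytes actually read and then pads the end with zeros up to the next multiple of 512. This is an assumption-laden sketch rather than the method used above: the names BlockPadding and readAndPad are made up, and whether POI accepts a file padded this way has not been verified here.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper, named here for illustration only.
final class BlockPadding {

    static final int BLOCK_SIZE = 512;

    /** Reads the whole stream, then zero-pads the result to a multiple of BLOCK_SIZE. */
    static byte[] readAndPad(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[BLOCK_SIZE];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // copy only the bytes actually read
        }
        int remainder = out.size() % BLOCK_SIZE;
        if (remainder != 0) {
            // a freshly allocated byte array is all zeros
            out.write(new byte[BLOCK_SIZE - remainder], 0, BLOCK_SIZE - remainder);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // 874 = 512 + 362, the sizes from the exception message
        byte[] padded = readAndPad(new ByteArrayInputStream(new byte[874]));
        System.out.println(padded.length); // prints 1024
    }
}
```

The padded array can then be wrapped in a ByteArrayInputStream and handed to the extractor in the same way as byteIS above.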

 
