Lucene: Indexing RTF Documents
1. Text can be extracted from an RTF document using classes from the Java standard library (the javax.swing.text and javax.swing.text.rtf packages).
2. A sample handler that extracts the text and wraps it in a Lucene Document:
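import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;

// DocumentHandler and DocumentHandlerException are assumed to be defined elsewhere in the project.
public class JavaBuiltInRTFHandler implements DocumentHandler {
    public Document getDocument(InputStream is) throws DocumentHandlerException {
        String bodyText = null;
        DefaultStyledDocument styledDoc = new DefaultStyledDocument();
        try {
            // Use Java's built-in RTFEditorKit to extract the text content of the RTF document
            new RTFEditorKit().read(is, styledDoc, 0);
            bodyText = styledDoc.getText(0, styledDoc.getLength());
        }
        catch (IOException e) {
            throw new DocumentHandlerException("cannot extract text from an RTF document", e);
        }
        catch (BadLocationException e) {
            throw new DocumentHandlerException("cannot extract text from an RTF document", e);
        }
        if (bodyText != null) {
            // Index the extracted text in the "body" field without storing it
            Document doc = new Document();
            doc.add(Field.UnStored("body", bodyText));
            return doc;
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        JavaBuiltInRTFHandler handler = new JavaBuiltInRTFHandler();
        Document doc = handler.getDocument(new FileInputStream(new File(args[0])));
        System.out.println(doc);
    }
}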
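To try the extraction step on its own, without Lucene or the handler interface on the classpath, a minimal sketch might look like the following. The class name RtfTextExtractor, the method extractText, and the inline sample RTF string are hypothetical names chosen for illustration. Only RTFEditorKit and DefaultStyledDocument come from the listing above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultStyledDocument;
import javax.swing.text.rtf.RTFEditorKit;

// Hypothetical helper: isolates the text-extraction step from the handler above.
public class RtfTextExtractor {

    // Parses the RTF stream into a styled document and returns its plain text.
    static String extractText(InputStream is) throws IOException, BadLocationException {
        DefaultStyledDocument styledDoc = new DefaultStyledDocument();
        new RTFEditorKit().read(is, styledDoc, 0);
        return styledDoc.getText(0, styledDoc.getLength());
    }

    public static void main(String[] args) throws Exception {
        // A tiny hand-written RTF document, used here only as sample input.
        String rtf = "{\\rtf1\\ansi Hello RTF}";
        try (InputStream in = new ByteArrayInputStream(rtf.getBytes(StandardCharsets.US_ASCII))) {
            // The extracted text may carry trailing whitespace, so trim it before printing.
            System.out.println(extractText(in).trim());
        }
    }
}
```

Keeping the extraction in a separate static method makes it easy to check what text RTFEditorKit actually returns for a given file before that text is passed to the indexer.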