Highlighting Search Results in Lucene

程序员文章站 2022-07-09 09:52:50

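The previous section covered the basics of using Lucene: creating an index and searching it.

Next, let's highlight the search results.

Required JARs (the versions are rather old):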

  <dependencies>
    <!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-core -->
	<dependency>
	    <groupId>org.apache.lucene</groupId>
	    <artifactId>lucene-core</artifactId>
	    <version>2.3.0</version>
	</dependency>
	
	<!-- https://mvnrepository.com/artifact/org.apache.lucene/lucene-highlighter -->
	<dependency>
	    <groupId>org.apache.lucene</groupId>
	    <artifactId>lucene-highlighter</artifactId>
	    <version>2.3.0</version>
	</dependency>
  </dependencies>

First, create two txt files in the analyzer/files/ directory under the project root, with the following contents respectively:

我是中国人， I am Chinese!

你好， Hello World

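Windows users editing with Notepad should make sure to save the files as UTF-8, because the indexing code below reads them as UTF-8.

With the preparation done, the next step is to build the index. Code: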
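If you would rather not depend on an editor's encoding settings, the two files can also be generated from code. The following is a minimal sketch using only the JDK; the class name SampleFiles and the file names 1.txt and 2.txt are assumptions made for illustration (the article does not name the files). The Chinese text is written with Unicode escapes so that the source compiles the same way regardless of the compiler's default encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper (not part of the original code) that writes the two sample files as UTF-8. */
class SampleFiles {
    // The two sample lines from the article, with the Chinese part as Unicode escapes.
    static final String FIRST = "\u6211\u662f\u4e2d\u56fd\u4eba\uff0c I am Chinese!";
    static final String SECOND = "\u4f60\u597d\uff0c Hello World";

    /** Creates dir if needed and writes 1.txt and 2.txt; the file names are assumptions. */
    static void create(Path dir) throws IOException {
        Files.createDirectories(dir);
        // getBytes(UTF_8) fixes the encoding explicitly instead of using the platform default.
        Files.write(dir.resolve("1.txt"), FIRST.getBytes(StandardCharsets.UTF_8));
        Files.write(dir.resolve("2.txt"), SECOND.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        create(Paths.get("analyzer", "files"));
    }
}
```

Running main once from the project root should leave both files in analyzer/files/, ready to be indexed.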

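import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;

/**
 * Builds the index.
 * @author TangXW
 *
 */
public class IndexHandler{
	public void doIndex() throws Exception{
		File fileDir = new File("analyzer/files/");  // directory containing the files to index
		File indexDir = new File("analyzer/");  // directory where the index is stored
		
		Analyzer standardAnalyzer = new StandardAnalyzer();  // analyzer
		// true means: overwrite any index that already exists
		IndexWriter indexWriter = new IndexWriter(indexDir, standardAnalyzer, true);
		
		File[] textFiles = fileDir.listFiles();
		if(textFiles != null && textFiles.length > 0){
			for(int i = 0; i < textFiles.length; i++){
				// only index txt files
				if(textFiles[i].isFile() && textFiles[i].getName().endsWith(".txt")){
					String temp = FileReaderAll(textFiles[i].getCanonicalPath(), "UTF-8");
					Document document = new Document();
					// create a "body" field holding the txt content; it is both tokenized and stored
					Field fieldBody = new Field("body", temp, Field.Store.YES, Field.Index.TOKENIZED);
					document.add(fieldBody);
					// add the document to the index
					indexWriter.addDocument(document);
				}
			}
		}
		indexWriter.optimize(); // merge and optimize the index
		indexWriter.close();
	}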
	
	
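	/**
	 * Reads the entire content of a file.
	 * @param fileName path of the file to read
	 * @param charset name of the character set used to decode the file
	 * @return the file content, with lines joined by '\n'
	 * @throws IOException if the file cannot be read
	 */
	public static String FileReaderAll(String fileName, String charset) throws IOException{
		StringBuilder temp = new StringBuilder();
		// try-with-resources closes the reader even if reading fails
		try(BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(fileName), charset))){
			String line;
			boolean first = true;
			while((line = reader.readLine()) != null){
				// readLine() strips the line terminator; put one back between lines
				// so that words on adjacent lines are not glued together
				if(!first){
					temp.append('\n');
				}
				temp.append(line);
				first = false;
			}
		}
		return temp.toString();
	}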
}

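This is essentially the same as in the previous article. The one thing to note is that the body field must be stored, i.e. Field.Store.YES. Highlighting works on the original text, so that text has to be retrievable from the index; otherwise doc.get("body") returns null at search time and the search code fails with an error.

Now write the search interface: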
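To make that failure mode concrete: when a field is not stored, looking it up on a retrieved document yields null, and passing null to new StringReader(...) throws a NullPointerException. The sketch below shows one way to fail with a clearer message. It is an illustration only: a plain Map stands in for a document's stored fields so that the example runs without Lucene, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration; a Map stands in for the stored fields of a document. */
class StoredFieldSketch {
    /** Returns a reader over the stored text, or fails clearly when the field was not stored. */
    static StringReader readerFor(Map<String, String> storedFields, String name) {
        String text = storedFields.get(name); // null when the field was not stored
        if (text == null) {
            // Without this guard, new StringReader(null) would throw a bare NullPointerException.
            throw new IllegalStateException(
                "field '" + name + "' is not stored; index it with Field.Store.YES");
        }
        return new StringReader(text);
    }

    public static void main(String[] args) {
        Map<String, String> stored = new HashMap<>();
        stored.put("body", "Hello World");
        readerFor(stored, "body");      // fine: the field is stored
        try {
            readerFor(stored, "title"); // "title" was never stored
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```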

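import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Hits;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;

public class SearchHandler {
	public void doSearch(String queryString) throws Exception{
		IndexSearcher searcher = new IndexSearcher("analyzer/");  // open the index built by IndexHandler
		try{
			Analyzer analyzer = new StandardAnalyzer();
			QueryParser qp = new QueryParser("body", analyzer);
			Query query = qp.parse(queryString);
			
			// configure highlighting: matched terms are wrapped in a red <span>
			SimpleHTMLFormatter simpleHTMLFormatter = new SimpleHTMLFormatter("<span style='color:red'>", "</span>");
			Highlighter highlighter = new Highlighter(simpleHTMLFormatter, new QueryScorer(query));
			
			Hits hits = searcher.search(query);
			System.out.println("hits.length = " + hits.length());
			for(int i = 0; i < hits.length(); i++){
				Document doc = hits.doc(i);
				String body = doc.get("body");
				
				/**
				 * Highlighting.
				 * Note that "body" in analyzer.tokenStream() is the field we indexed earlier.
				 * It must be stored, i.e. Field.Store.YES, because the original text
				 * is needed in order to display it.
				 */
				TokenStream tokenStream = analyzer.tokenStream("body", new StringReader(body));
				String content = highlighter.getBestFragment(tokenStream, body);
				System.out.println(content);
			}
		}finally{
			searcher.close();  // release the index files
		}
	}
}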

With all the pieces in place, let's test it:

public class Test {
	public static void main(String[] args) throws Exception{
		// 1. build the index
		IndexHandler indexHandler = new IndexHandler();
		indexHandler.doIndex();
		
		// 2. search
		SearchHandler searchHandler = new SearchHandler();
		searchHandler.doSearch("chinese");
	}
}

After running it, three new index files appear in the project directory; these are generated by Lucene.

Output:

hits.length = 1
我是中国人， I am <span style='color:red'>Chinese</span>!

OK, the match is highlighted. Here we used StandardAnalyzer; analyzers aimed at Chinese text, such as SmartChineseAnalyzer, can be used as well, but we won't go into them here.
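One detail in the output is worth a closer look: the query was the lowercase chinese, yet the highlighted text keeps its original capital C. StandardAnalyzer lowercases tokens for matching, while the highlighter uses each token's offsets to wrap the corresponding span of the original text. The sketch below imitates that idea with the JDK only. It is a simplified illustration, not Lucene's implementation: the class name is hypothetical, and a regular expression stands in for the analyzer.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Simplified illustration only; this is not how Lucene's Highlighter is implemented. */
class OffsetHighlightSketch {
    // Crude stand-in for an analyzer: a token is a run of letters or digits.
    private static final Pattern TOKEN = Pattern.compile("[\\p{L}\\p{N}]+");

    /** Wraps every token whose lowercased form is in queryTerms with pre and post. */
    static String highlight(String text, Set<String> queryTerms, String pre, String post) {
        StringBuilder out = new StringBuilder();
        Matcher m = TOKEN.matcher(text);
        int last = 0; // end offset of the original text copied so far
        while (m.find()) {
            // Match on the lowercased token, but copy from the original text by offset.
            if (queryTerms.contains(m.group().toLowerCase(Locale.ROOT))) {
                out.append(text, last, m.start())
                   .append(pre).append(text, m.start(), m.end()).append(post);
                last = m.end();
            }
        }
        return out.append(text, last, text.length()).toString();
    }

    public static void main(String[] args) {
        Set<String> terms = new HashSet<>(Arrays.asList("chinese"));
        System.out.println(highlight("I am Chinese!", terms, "<span style='color:red'>", "</span>"));
        // prints: I am <span style='color:red'>Chinese</span>!
    }
}
```

Because the wrapped text is copied from the original string rather than rebuilt from the lowercased token, capitalization and surrounding punctuation survive unchanged.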
