Lucene Analyzers

程序员文章站 2022-07-01 15:31:36


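For English text, the Lucene analyzer used most often is StandardAnalyzer. For Chinese text, the usual choice is MMAnalyzer from the JE-Analysis (极易分词) library, which is not bundled with Lucene: je-analysis-1.5.3.jar has to be added to the classpath separately.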

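English analysis proceeds in four steps: 1. tokenization -> 2. stop-word removal (e.g. "is", "of") -> 3. stemming (reducing "-ing", "-ed", plural forms and the like to a base form) -> 4. lowercasing. Note that StandardAnalyzer on its own only tokenizes, lowercases, and removes stop words; stemming needs an additional filter such as PorterStemFilter.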
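The four English steps can be sketched in plain Java without any Lucene dependency. This is only a toy illustration, not Lucene's implementation: the class name `ToyEnglishPipeline`, the stop list, and the suffix-stripping rules are all assumptions made up for the example. Lowercasing is done before the stop-word check so that "The" matches "the".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class ToyEnglishPipeline {

    // Hypothetical stop list, for illustration only.
    static final Set<String> STOP_WORDS =
            new HashSet<>(Arrays.asList("the", "is", "of", "can", "be", "or"));

    // Very naive suffix stripping; a real stemmer (e.g. Porter) has many more rules.
    static String stem(String word) {
        if (word.length() > 4 && word.endsWith("ing")) {
            return word.substring(0, word.length() - 3);
        }
        if (word.length() > 4 && word.endsWith("ed")) {
            return word.substring(0, word.length() - 2);
        }
        if (word.length() > 3 && word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, word.length() - 1);
        }
        return word;
    }

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<>();
        // Tokenization: split on anything that is not a letter or digit.
        for (String raw : text.split("[^A-Za-z0-9]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // Lowercasing.
            String lower = raw.toLowerCase(Locale.ROOT);
            // Stop-word removal.
            if (STOP_WORDS.contains(lower)) {
                continue;
            }
            // Stemming.
            terms.add(stem(lower));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The PGP signatures can be verified using PGP or GPG. "));
    }
}
```

With these toy rules, "signatures" becomes "signature" and "verified" becomes "verifi", which shows why stemmed terms are index keys rather than readable words.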

Chinese analysis proceeds in two steps: 1. word segmentation -> 2. stop-word removal (e.g. "的", "着").
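Dictionary-based Chinese segmenters often use forward maximum matching: at each position, take the longest dictionary word that starts there, falling back to a single character. The sketch below assumes that technique for illustration; it is not MMAnalyzer's actual implementation, and the class name `ToyForwardMaxMatch`, the tiny dictionary, and the stop list are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class ToyForwardMaxMatch {

    // Hypothetical mini dictionary; a real segmenter ships a large one.
    static final Set<String> DICT = new HashSet<>(Arrays.asList(
            "世界", "发达国家", "发达", "国家", "居民", "消费", "电能", "费用"));

    // Hypothetical stop list taken from the examples in the text.
    static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList("的", "着"));

    // Longest dictionary entry, in characters.
    static final int MAX_WORD_LEN = 4;

    static List<String> segment(String text) {
        List<String> words = new ArrayList<>();
        int start = 0;
        while (start < text.length()) {
            int end = Math.min(text.length(), start + MAX_WORD_LEN);
            // Shrink the window until it matches a dictionary word,
            // or until only a single character is left.
            while (end > start + 1 && !DICT.contains(text.substring(start, end))) {
                end--;
            }
            String word = text.substring(start, end);
            // Stop-word removal.
            if (!STOP_WORDS.contains(word)) {
                words.add(word);
            }
            start = end;
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(segment("世界发达国家居民消费的电能的费用"));
    }
}
```

Because the longest match wins, "发达国家" is kept as one word rather than being split into "发达" and "国家".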



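import java.io.StringReader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.SimpleAnalyzer;
import org.apache.lucene.analysis.Token;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;

// MMAnalyzer comes from je-analysis-1.5.3.jar, not from Lucene itself.
import jeasy.analysis.MMAnalyzer;

/**
 * Prints the tokens that an analyzer produces for a piece of text.
 * Written against the older Lucene 2.x API (Token, TokenStream.next(Token)).
 */
public class AnalyzerTest {

	static String  enText = "The PGP signatures can be verified using PGP or GPG. ";
	static String  chText = "世界发达国家居民消费1000度的电能的费用占全国月平均工资的6.79%";

	// English analyzers
	static Analyzer en1 = new StandardAnalyzer();
	static Analyzer en2 = new SimpleAnalyzer();
	// Chinese analyzer
	static Analyzer ch1 = new MMAnalyzer();

	public static void main(String[] args) throws Exception{
		// Swap in (enText, en1) or (enText, en2) to compare the English analyzers.
		new AnalyzerTest().analyze(chText, ch1);
	}

	/**
	 * Runs the text through the analyzer and prints each token.
	 */
	public void analyze(String text,Analyzer analyzer) throws Exception{
		TokenStream tokenStream = analyzer.tokenStream(null, new StringReader(text));
		// next(Token) reuses the passed-in Token and returns null at the end of the stream.
		for (Token token = new Token();(token = tokenStream.next(token))!= null;){
			System.out.println(token);
		}
		tokenStream.close();
	}

}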