IKAnalyzer with both fine-grained and smart tokenization on Solr 4.7.2
Download IK
IKAnalyzer2012FF_u1.jar
Configure schema.xml
<fieldType name="text_ik" class="solr.TextField">
    <!-- fine-grained tokenization -->
    <analyzer type="index" useSmart="false" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
    <!-- smart tokenization -->
    <analyzer type="query" useSmart="true" class="org.wltea.analyzer.lucene.IKAnalyzer"/>
</fieldType>
The goal is to use fine-grained tokenization at index time and smart tokenization at query time. But the useSmart attribute has no effect: both analyzers keep producing fine-grained tokens.
Root cause
Reading the source code shows why: when an analyzer is declared with class="...", Solr instantiates that Analyzer through its no-argument constructor and does not pass extra attributes such as useSmart along. IKAnalyzer's no-arg constructor hard-codes useSmart to false, so tokenization is always fine-grained:
public final class IKAnalyzer extends Analyzer {
    private boolean useSmart;

    // ... code omitted ...

    public IKAnalyzer() {
        this(false); // no-arg constructor: always fine-grained
    }

    public IKAnalyzer(boolean useSmart) {
        this.useSmart = useSmart;
    }

    // ... code omitted ...
}
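You can confirm that the boolean constructor itself works by building both modes directly and printing the tokens. A minimal sketch, assuming IKAnalyzer2012FF_u1.jar and lucene-core-4.7.2.jar are on the classpath (the exact token output depends on the IK dictionary):

import java.io.IOException;
import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.wltea.analyzer.lucene.IKAnalyzer;

public class IKSmartCheck {
    public static void main(String[] args) throws IOException {
        String text = "中华人民共和国"; // sample phrase; any Chinese text works
        print(new IKAnalyzer(false), text); // fine-grained: many overlapping tokens
        print(new IKAnalyzer(true), text);  // smart: fewer, longer tokens
    }

    static void print(Analyzer analyzer, String text) throws IOException {
        TokenStream ts = analyzer.tokenStream("f", new StringReader(text));
        CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
        ts.reset();
        while (ts.incrementToken()) {
            System.out.print(term.toString() + " | ");
        }
        ts.end();
        ts.close();
        System.out.println();
    }
}

Run directly like this, the two modes do tokenize differently, which shows the problem lies in how Solr constructs the analyzer, not in IK itself.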
Solution
Following IKAnalyzer as a template, write two classes of your own, UseSmartIKAnalyzer and NotUseSmartIKAnalyzer, whose no-argument constructors assign the appropriate initial value to useSmart. Solr can then select the mode by class name alone.
Implementation
1) IKAnalyzer2012FF_u1 depends on Lucene 4.x; keep the Lucene and Solr versions in sync, here both 4.7.2.
2) Write UseSmartIKAnalyzer.java
package org.wltea.analyzer.lucene;

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public final class UseSmartIKAnalyzer extends Analyzer {
    private boolean useSmart;

    public boolean useSmart() {
        return this.useSmart;
    }

    public void setUseSmart(boolean useSmart) {
        this.useSmart = useSmart;
    }

    public UseSmartIKAnalyzer() {
        // default is true: smart tokenization
        this.useSmart = true;
    }

    @Override
    protected Analyzer.TokenStreamComponents createComponents(String fieldName, Reader in) {
        Tokenizer _IKTokenizer = new IKTokenizer(in, useSmart());
        return new Analyzer.TokenStreamComponents(_IKTokenizer);
    }
}
3) Write NotUseSmartIKAnalyzer.java
package org.wltea.analyzer.lucene;

import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;

public final class NotUseSmartIKAnalyzer extends Analyzer {
    private boolean useSmart;

    public boolean useSmart() {
        return this.useSmart;
    }

    public void setUseSmart(boolean useSmart) {
        this.useSmart = useSmart;
    }

    public NotUseSmartIKAnalyzer() {
        // default is false: fine-grained tokenization
        this.useSmart = false;
    }

    @Override
    protected Analyzer.TokenStreamComponents createComponents(String fieldName, Reader in) {
        Tokenizer _IKTokenizer = new IKTokenizer(in, useSmart());
        return new Analyzer.TokenStreamComponents(_IKTokenizer);
    }
}
4) Put IKAnalyzer2012FF_u1.jar, the Lucene 4.7.2 dependency jars, UseSmartIKAnalyzer.java, and NotUseSmartIKAnalyzer.java in one directory; the jars must be on the classpath to compile the two source files.
If you cannot find all of the Lucene 4.7.2 jars, download Solr 4.7.2 and extract them from solr.war (at minimum lucene-core-4.7.2.jar, which provides Analyzer and Tokenizer).
5) Compile with javac
E:\>javac -encoding UTF-8 -classpath E:\ik\* E:\ik\NotUseSmartIKAnalyzer.java
E:\>javac -encoding UTF-8 -classpath E:\ik\* E:\ik\UseSmartIKAnalyzer.java
6) The compiled .class files are generated in the same directory as the sources.
7) Add NotUseSmartIKAnalyzer.class and UseSmartIKAnalyzer.class to IKAnalyzer2012FF_u1.jar under the org/wltea/analyzer/lucene path; an archive tool such as WinRAR can add them directly, and jd-gui can be used to verify the result.
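Alternatively, the JDK's jar tool can update the archive from the command line. A sketch, assuming you first move the .class files into a directory tree matching their package:
E:\>mkdir org\wltea\analyzer\lucene
E:\>move E:\ik\*.class org\wltea\analyzer\lucene
E:\>jar uf E:\ik\IKAnalyzer2012FF_u1.jar org\wltea\analyzer\lucene\UseSmartIKAnalyzer.class org\wltea\analyzer\lucene\NotUseSmartIKAnalyzer.class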
8) Configure schema.xml for the Solr index
<fieldType name="text_ik" class="solr.TextField">
    <!-- fine-grained tokenization at index time -->
    <analyzer type="index" class="org.wltea.analyzer.lucene.NotUseSmartIKAnalyzer"/>
    <!-- smart tokenization at query time -->
    <analyzer type="query" class="org.wltea.analyzer.lucene.UseSmartIKAnalyzer"/>
</fieldType>
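To put the type to use, reference it from a field declaration; the field name content here is only an illustration:
<field name="content" type="text_ik" indexed="true" stored="true"/>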
9) Check the tokenization results
On the Analysis screen of the Solr admin UI you can see that indexing now uses fine-grained tokenization while querying uses smart tokenization.