A Guide to Using the Search Tools in Java's Hibernate Framework
Hibernate provides full-text indexing through Hibernate Search, which works very well. This post gives a brief introduction to how to use it.
1. Add the dependencies to pom.xml
<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>${hibernate-search.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>${lucene.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-queryparser</artifactId>
    <version>${lucene.version}</version>
</dependency>
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-phonetic</artifactId>
    <version>${lucene.version}</version>
</dependency>
2. Configure the search index storage path in the Hibernate configuration
<bean id="sessionFactory" class="org.springframework.orm.hibernate4.LocalSessionFactoryBean" destroy-method="destroy">
    <property name="dataSource" ref="poolingDataSource" />
    <property name="configLocation">
        <value>classpath:hibernate.cfg.xml</value>
    </property>
    <property name="hibernateProperties">
        <props>
            <prop key="hibernate.dialect">${hibernate.dialect}</prop>
            <!-- Booleans can be used easily in expressions by declaring HQL query substitutions in the Hibernate configuration -->
            <prop key="hibernate.query.substitutions">true 'Y', false 'N'</prop>
            <!-- http://ehcache.org/documentation/integrations/hibernate -->
            <!-- http://www.tutorialspoint.com/hibernate/hibernate_caching.htm -->
            <prop key="hibernate.cache.use_second_level_cache">true</prop>
            <prop key="hibernate.cache.region.factory_class">org.hibernate.cache.ehcache.EhCacheRegionFactory</prop>
            <!-- Hibernate only caches individual persistent objects obtained with load(). To cache result
                 sets obtained via findAll(), list(), iterator(), createCriteria(), createQuery(), etc.,
                 hibernate.cache.use_query_cache must be set to true -->
            <prop key="hibernate.cache.use_query_cache">true</prop>
            <prop key="net.sf.ehcache.configurationResourceName">ehcache-hibernate.xml</prop>
            <!-- Hibernate Search index directory -->
            <prop key="hibernate.search.default.indexBase">indexes/</prop>
        </props>
    </property>
</bean>
Add the @Indexed annotation to any class that should be searchable, then annotate each searchable field with @Field. Enum fields usually do not need an analyzer for tokenization, while free-text fields do. When you don't need projection (returning only selected fields), there is no need to store the actual data in the index. You can use @AnalyzerDef to define different analyzers together with their corresponding token filters.
@Indexed
@AnalyzerDef(
    name = "enTopicAnalyzer",
    charFilters = {
        @CharFilterDef(factory = HTMLStripCharFilterFactory.class)
    },
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = StandardFilterFactory.class),
        @TokenFilterDef(factory = StopFilterFactory.class),
        @TokenFilterDef(factory = PhoneticFilterFactory.class, params = {
            @Parameter(name = "encoder", value = "DoubleMetaphone")
        }),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class, params = {
            @Parameter(name = "language", value = "English")
        })
    }
)
public class Topic {
    ......
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    @Analyzer(definition = "enTopicAnalyzer")
    private String title;
    ......
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    @Analyzer(definition = "enTopicAnalyzer")
    private String content;
    ......
    @Enumerated(EnumType.STRING)
    @Field(index = Index.YES, analyze = Analyze.NO, store = Store.NO,
        bridge = @FieldBridge(impl = EnumBridge.class))
    private TopicStatus status;
    ...
}
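The @FieldBridge above references an EnumBridge class whose implementation is not shown in this post. A minimal sketch of what such a bridge might do is below; in a real project the class would implement org.hibernate.search.bridge.StringBridge, but it is written here as a plain class (with an assumed method name matching that interface) so the conversion logic stands on its own.

```java
// Hypothetical sketch of the EnumBridge referenced above. The real class
// would implement org.hibernate.search.bridge.StringBridge; this plain
// version only shows the conversion an enum bridge typically performs.
public class EnumBridge {
    // Index an enum constant by its name, or null when the field is unset.
    public String objectToString(Object object) {
        return (object == null) ? null : ((Enum<?>) object).name();
    }
}
```

Indexing the enum by name is what makes an exact, unanalyzed keyword match on the status field possible.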
Build the index for existing data programmatically:
ApplicationContext context = new ClassPathXmlApplicationContext("spring-resources.xml");
SessionFactory sessionFactory = (SessionFactory) context.getBean("sessionFactory");
Session sess = sessionFactory.openSession();
FullTextSession fullTextSession = Search.getFullTextSession(sess);
try {
    fullTextSession.createIndexer().startAndWait();
} catch (InterruptedException e) {
    log.error(e.getMessage(), e);
} finally {
    fullTextSession.close();
}
((AbstractApplicationContext) context).close();
Create a FullTextSession for querying and fetch results according to the query conditions:
FullTextSession fullTextSession = Search.getFullTextSession(getSession());
QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
        .buildQueryBuilder().forEntity(Show.class).get();
org.apache.lucene.search.Query luceneQuery = null;
luceneQuery = queryBuilder.keyword() // .wildcard()
        .onFields("title", "content").matching(query.getKeyword())
        // .matching("*" + query.getKeyword() + "*")
        .createQuery();
FullTextQuery hibernateQuery = fullTextSession.createFullTextQuery(
        luceneQuery, Show.class);
return hibernateQuery.list();
Notes:
1. During one round of testing, I modified a value object and added a new indexed field, but forgot to rebuild the index. The unit tests passed, yet the production environment failed.
2. The search is not yet very powerful. For example, searching for "测" may not return results that contain "测试".
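Note 2 follows from how keyword queries work: the index stores whole tokens produced by the analyzer, and a keyword query only matches complete tokens, never substrings. The following is a plain-Java illustration of that difference, not actual Hibernate Search API; the class and method names are invented for the example.

```java
import java.util.List;

// Illustrates why a search for "测" misses a document tokenized as "测试":
// a keyword query compares against whole tokens, whereas a wildcard query
// such as *测* effectively performs substring matching.
public class TokenMatchDemo {
    public static boolean keywordMatch(List<String> tokens, String term) {
        return tokens.contains(term); // exact token match only
    }

    public static boolean wildcardMatch(List<String> tokens, String term) {
        return tokens.stream().anyMatch(t -> t.contains(term)); // substring
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("测试", "数据");
        System.out.println(keywordMatch(tokens, "测"));  // false
        System.out.println(wildcardMatch(tokens, "测")); // true
    }
}
```

In Hibernate Search, the wildcard behavior corresponds to the commented-out .wildcard() and .matching("*" + keyword + "*") calls shown in the query example above.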
Chinese word segmentation
Hibernate Search is built on Lucene, so any Chinese segmenter that works with Lucene can be used for Chinese analysis in Hibernate Search. Commonly used analyzers include paoding, IKAnalyzer, mmseg4j, and so on. Hibernate Search's default analyzer is org.apache.lucene.analysis.standard.StandardAnalyzer, which splits Chinese text character by character; that clearly does not meet our needs.
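To see what "splits character by character" means in practice, here is a rough plain-Java approximation of that behavior; it is not the actual Lucene tokenizer, just a sketch of splitting text into individual code points.

```java
import java.util.List;
import java.util.stream.Collectors;

// Rough approximation of per-character tokenization of CJK text,
// similar in effect to what StandardAnalyzer does with Chinese input.
public class CharSplitDemo {
    public static List<String> splitChars(String text) {
        return text.codePoints()
                .mapToObj(cp -> new String(Character.toChars(cp)))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(splitChars("全文搜索")); // [全, 文, 搜, 索]
    }
}
```

With single-character tokens, a multi-character query term like "搜索" no longer exists as one token in the index, which is why a proper Chinese segmenter is needed.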
This section shows how to configure Chinese segmentation in Hibernate, using the SmartChineseAnalyzer that ships with Lucene. It can be applied in three ways: by setting the analyzer in the Hibernate configuration file, by declaring the analyzer on each searchable class, or by configuring it on a single field. The first two approaches are shown below.
Via the Hibernate configuration:
<property name="hibernate.search.analyzer">org.apache.lucene.analysis.cn.smart.SmartChineseAnalyzer</property>
Configuring Chinese segmentation on the searchable class:
@Indexed
@Analyzer(impl = SmartChineseAnalyzer.class)
You also need the corresponding dependency in Maven:
<dependency>
    <groupId>org.apache.lucene</groupId>
    <artifactId>lucene-analyzers-smartcn</artifactId>
    <version>${lucene.version}</version>
</dependency>
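The third option mentioned above, configuring the analyzer on a single field, is not shown in the original post; a sketch of what it might look like follows (the field name is illustrative, and the @Field attributes mirror those used in the Topic entity earlier):

```java
// Per-field analyzer: only this field is tokenized with the Chinese analyzer;
// other fields keep the default or class-level analyzer.
@Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
@Analyzer(impl = SmartChineseAnalyzer.class)
private String title;
```

This is useful when a class mixes Chinese free-text fields with fields that should keep a different analyzer.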
Multi-condition queries
Hibernate Search can combine multiple conditions into a single query. Here is one practical approach to multi-condition queries.
A query on a single condition is straightforward:
luceneQuery = queryBuilder.keyword().onFields("title", "content").matching(query.getKeyword()).createQuery();
For an AND combination of conditions you need a must junction; for an OR combination, a should junction. Here is a must example:
// must == true
MustJunction term = queryBuilder.bool().must(queryBuilder.keyword()
        .onFields("title", "content")
        .matching(query.getKeyword()).createQuery());
// must == false (negated with not())
term.must(queryBuilder.keyword()
        .onField("status")
        .matching(query.getExcludeStatus()).createQuery()).not();
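For the OR case mentioned above, the bool() junction's should() method combines conditions so that at least one of them has to match. A sketch, analogous to the must example, reusing the field names from the Topic entity and assuming the same queryBuilder and query variables are in scope:

```java
// OR-combination: match the keyword in the title OR in the content.
org.apache.lucene.search.Query orQuery = queryBuilder.bool()
        .should(queryBuilder.keyword().onField("title")
                .matching(query.getKeyword()).createQuery())
        .should(queryBuilder.keyword().onField("content")
                .matching(query.getKeyword()).createQuery())
        .createQuery();
```

Note that onFields("title", "content") in the must example already ORs across those two fields internally; an explicit should() junction is for combining arbitrary sub-queries.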
A complete example:
private FullTextQuery findByKeywordQuery(TopicQuery query) {
    FullTextSession fullTextSession = Search.getFullTextSession(getSession());
    QueryBuilder queryBuilder = fullTextSession.getSearchFactory()
            .buildQueryBuilder().forEntity(Topic.class).get();
    org.apache.lucene.search.Query luceneQuery = null;
    if (null == query.getStatus() && null == query.getUsername()
            && null == query.getExcludeStatus()) {
        luceneQuery = queryBuilder.keyword() // .wildcard()
                .onFields("title", "content").matching(query.getKeyword())
                // .matching("*" + query.getKeyword() + "*")
                .createQuery();
        if (log.isDebugEnabled()) {
            log.debug("create clean keyword search query: " + luceneQuery.toString());
        }
    } else {
        MustJunction term = queryBuilder.bool().must(queryBuilder.keyword()
                .onFields("title", "content")
                .matching(query.getKeyword()).createQuery());
        if (null != query.getStatus()) {
            term.must(queryBuilder.keyword() // .wildcard()
                    .onField("status")
                    .matching(query.getStatus()).createQuery());
        }
        if (null != query.getExcludeStatus()) {
            term.must(queryBuilder.keyword()
                    .onField("status")
                    .matching(query.getExcludeStatus()).createQuery()).not();
        }
        if (null != query.getUsername()) {
            term.must(queryBuilder.keyword() // .wildcard()
                    .onField("owner.username")
                    .ignoreFieldBridge()
                    .matching(query.getUsername()).createQuery());
        }
        luceneQuery = term.createQuery();
        if (log.isDebugEnabled()) {
            log.debug("create complicated keyword search query: " + luceneQuery.toString());
        }
    }
    // BooleanQuery
    FullTextQuery hibernateQuery = fullTextSession.createFullTextQuery(
            luceneQuery, Topic.class);
    return hibernateQuery;
}