Lucene: Related Concepts and Definitions

程序员文章站 2024-01-27 21:48:46


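Original: http://wiki.apache.org/lucene-java/ConceptsAndDefinitions
Navigation: Lucene-java Wiki -> 1 Overview -> 1.1 Informational -> 1.1.2 ConceptsAndDefinitions
Note: "red" marks text the translator did not know or was unsure how to translate; "blue" marks the translator's own remarks.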

This page describes some Lucene-related concepts and definitions.

Definitions

Analyzer - The Lucene class used to analyze text for indexing. Most applications can use StandardAnalyzer for English and Latin-based languages.
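
To make the idea of analysis concrete, here is a minimal sketch of what an analyzer does conceptually: it turns raw text into a stream of normalized tokens. It uses only the Java standard library and is not StandardAnalyzer or any Lucene API; the class name ToyAnalyzer and its rules (split on non-alphanumeric characters, lowercase everything) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Toy analyzer: NOT Lucene's StandardAnalyzer, only an illustration of the idea.
final class ToyAnalyzer {
    // Splits text on runs of characters that are neither letters nor digits,
    // then lowercases each remaining piece.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.split("[^\\p{L}\\p{Nd}]+")) {
            if (!piece.isEmpty()) { // split() can yield an empty leading piece
                tokens.add(piece.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [lucene, indexes, text, fast]
        System.out.println(analyze("Lucene indexes Text, fast!"));
    }
}
```

A real analyzer typically does more (stop words, language-specific rules), but the input and output shape is the same: text in, terms out.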

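Payloads - A payload is an array of bytes stored at a term position.

Snowball Stemmers - Snowball stemmers are among the stemmers that Lucene includes. For more information, see the Snowball website.

Stemmer - The following explanation comes from Wikipedia. In short, a stemming algorithm reduces inflected or derived words to a common stem, and stemmers are often used to reduce the search space and the index size. The original passage reads: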

"A stemming algorithm, or stemmer, is a computer program or algorithm for reducing inflected (or sometimes derived) words to their stem, base or root form — generally a written word form." Stemmers are often used to reduce the search space and index size. Often times a user searching for "widgets" is interested in documents that contain the term "widget".

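The "widgets" versus "widget" example can be sketched in a few lines. This is a deliberately naive suffix stripper, not the Snowball or Porter algorithm and not a Lucene class; ToyStemmer and its two rules are assumptions chosen only to show how different word forms collapse to one stem.

```java
import java.util.Locale;

// Toy stemmer: handles only two English plural patterns, for illustration.
final class ToyStemmer {
    static String stem(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        // "queries" -> "query"
        if (w.length() > 4 && w.endsWith("ies")) {
            return w.substring(0, w.length() - 3) + "y";
        }
        // "widgets" -> "widget", but leave words such as "class" alone
        if (w.length() > 3 && w.endsWith("s") && !w.endsWith("ss")) {
            return w.substring(0, w.length() - 1);
        }
        return w;
    }

    public static void main(String[] args) {
        // Both forms map to the same stem, so a search for one can match the other.
        System.out.println(stem("widgets").equals(stem("widget"))); // prints true
    }
}
```

Because both forms are indexed and searched under the same stem, the index holds fewer distinct terms, which is the size reduction the quoted passage mentions.
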
Core Classes

Document

A Lucene Document is a record in the index. A Document has a list of fields; each field has a name and a textual value.

Term

A Term is Lucene's unit of indexing. In western languages, a Term is often a word.

TermEnum

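TermEnum is typically used to enumerate the terms of a given field, regardless of which documents those terms occur in.

Some query subclasses are implemented by enumerating the terms that match a pattern, for example WildcardQuery, PrefixQuery, and RangeQuery.

Original text: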

TermEnum is used to enumerate all terms in the index for a given field, regardless of which documents the terms occur in (or where they occur).

Some query subclasses are implemented by enumerating terms that match a pattern, and building a large OR query from the enumeration. E.g. WildcardQuery,PrefixQuery, RangeQuery.

See LuceneFAQ, How do I retrieve all the values of a particular field that exists within an index, across all documents? which also includes sample code.
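
The enumerate-then-OR idea can be illustrated without Lucene. The sketch below is not the TermEnum API; ToyTermEnumeration, termsWithPrefix, and expandPrefixQuery are hypothetical names, and a sorted TreeSet stands in for the term dictionary of a single field.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Toy term dictionary for one field; a stand-in for an index, not Lucene code.
final class ToyTermEnumeration {
    // Distinct terms kept in sorted order, so matching terms sit next to each other.
    private final TreeSet<String> terms = new TreeSet<>();

    void addTerm(String term) {
        terms.add(term);
    }

    // Starts at the first term >= prefix and stops at the first non-matching term.
    List<String> termsWithPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            if (!term.startsWith(prefix)) {
                break;
            }
            matches.add(term);
        }
        return matches;
    }

    // Builds one large OR expression from the enumerated terms.
    String expandPrefixQuery(String field, String prefix) {
        List<String> clauses = new ArrayList<>();
        for (String term : termsWithPrefix(prefix)) {
            clauses.add(field + ":" + term);
        }
        return String.join(" OR ", clauses);
    }

    public static void main(String[] args) {
        ToyTermEnumeration dictionary = new ToyTermEnumeration();
        for (String t : new String[] {"apple", "wide", "widget", "widgets", "window"}) {
            dictionary.addTerm(t);
        }
        // prints title:wide OR title:widget OR title:widgets
        System.out.println(dictionary.expandPrefixQuery("title", "wid"));
    }
}
```

Note that no document is consulted at any point: the enumeration works on the terms alone, which matches the "regardless of which documents" wording above.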

TermDocs

Unlike TermEnum (see above), TermDocs is used to determine which documents contain a given Term. TermDocs also provides the frequency of the term within each of those documents.
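
As a rough illustration of that lookup, the sketch below keeps, for each term, a map from document number to the number of times the term occurs in that document. It is not the TermDocs API; ToyPostings, addDocument, and docsFor are hypothetical names, and the in-memory maps are an assumption standing in for a real index.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

// Toy postings: term -> (document number -> frequency). Not Lucene code.
final class ToyPostings {
    private final Map<String, TreeMap<Integer, Integer>> postings = new HashMap<>();

    // Records every token of a document, counting repeated tokens.
    void addDocument(int docId, String... tokens) {
        for (String token : tokens) {
            postings.computeIfAbsent(token, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Returns the documents containing the term, with the term's frequency in each.
    Map<Integer, Integer> docsFor(String term) {
        TreeMap<Integer, Integer> docs = postings.get(term);
        if (docs == null) {
            return Collections.<Integer, Integer>emptyMap();
        }
        return Collections.unmodifiableMap(docs);
    }

    public static void main(String[] args) {
        ToyPostings index = new ToyPostings();
        index.addDocument(0, "widget", "gear", "widget");
        index.addDocument(1, "gear");
        System.out.println(index.docsFor("widget")); // prints {0=2}
        System.out.println(index.docsFor("gear"));   // prints {0=1, 1=1}
    }
}
```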

TermFreqVector

TermFreqVector is a data structure containing a given Document's term and frequency information.

Original text:

A TermFreqVector (aka Term Frequency Vector or just Term Vector) is a data structure containing a given Document's term and frequency information and can be retrieved from the IndexReader only when Term Vectors are stored during indexing.
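
Conceptually, a term vector is the per-document counterpart of the per-term view: one document, all of its terms, and how often each occurs. The sketch below computes such a structure directly from a token list. It is not the TermFreqVector API, and it sidesteps the storage requirement mentioned above; ToyTermVector and termFrequencies are hypothetical names.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy term vector: term -> frequency for a single document. Not Lucene code.
final class ToyTermVector {
    // Counts how many times each token occurs; terms come back in sorted order.
    static Map<String, Integer> termFrequencies(String... tokens) {
        Map<String, Integer> frequencies = new TreeMap<>();
        for (String token : tokens) {
            frequencies.merge(token, 1, Integer::sum);
        }
        return frequencies;
    }

    public static void main(String[] args) {
        // prints {gear=1, widget=2}
        System.out.println(termFrequencies("widget", "gear", "widget"));
    }
}
```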

IndexReader

IndexSearcher


