Lucene 4.6 (Part 1): Basic Usage

程序员文章站 2024-01-18 19:19:28

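Introduction to Lucene

Lucene has been updated rapidly in recent years; at the time of writing, the latest version is 4.6. Lucene is not a complete full-text search engine but a framework for building one. Many applications are built on it; the help system in Eclipse is one example. Lucene builds indexes over text data, so HTML, PDF, and Word content can also be indexed once it has been converted to text. The index is then stored on disk or in memory, and users can query it with their search criteria.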

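Commonly used Lucene classes:

Document: describes a document. A Document is made up of multiple Field objects. Think of a Document as one record and each Field as one attribute of that record.

Field: describes one attribute of a document. For example, a file can be described by two Fields: its file name and its content.

Analyzer: indexing text usually requires tokenization, and the Analyzer is responsible for that work. It is an abstract class.

IndexWriter: adds Documents to the index one by one.

IndexReader: reads the index; it is mainly used for searching documents.

Directory: the location where the Lucene index is stored. It is an abstract class. FSDirectory keeps the index files on disk; RAMDirectory keeps the index in memory.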

Next, let's look at how to build an index:

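import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.IntField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public static void index(boolean hasIndex) {
	int[] ids = {0,1,2,3,4,5};
	String[] emails = {"[email protected]","[email protected]","[email protected]","[email protected]","[email protected]","[email protected]"};
	String[] contents = {
			"incididunt ut labore et dolore magna aliqua. Ut enim ad lorem. ",
			"Lorem ipsum dolor sit amet lorem consectetur adipisicing",
			"dolor in reprehenderit in voluptate velit esse cillum nostrud exercitation ullamco laboris. ",
			"dolor in reprehenderit in voluptate velit esse cillum nostrud exercitation ullamco laboris. ",
			"Lorem ipsum dolor sit amet, consectetur adipisicing elit",
			"Consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna "
	};
	String[] names = {"zzp","lfd","lfx","tom","huanglili","tzp"};
	IndexWriter writer = null;
	Directory directory = null;
	try {
		directory = FSDirectory.open(new File("D:/Lucene"));
		//directory = new RAMDirectory(); // keep the index in memory instead
		writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_46,
				new StandardAnalyzer(Version.LUCENE_46)));
		// Rebuild the index: if one already exists, delete all of its documents first
		if (hasIndex) {
			writer.deleteAll();
		}

		int count = names.length;
		for (int i = 0; i < count; i++) {
			Document doc = new Document();

			/*
			 * Field types:
			 * Numeric fields hold a single fixed value. Another main use of
			 * these fields is sorting search results or running range queries:
			 *   FloatField, DoubleField, IntField, LongField
			 * Doc-values fields:
			 *   BinaryDocValuesField, NumericDocValuesField,
			 *   SortedDocValuesField, SortedSetDocValuesField
			 *
			 * StoredField   the whole value is stored
			 * StringField   a string that is indexed as-is, without tokenization
			 * TextField     a large block of text that is tokenized
			 *
			 * A custom FieldType can also be configured by hand:
			 *   FieldType fieldType = new FieldType();
			 *   fieldType.setIndexed(true);    // whether to index
			 *   fieldType.setStored(true);     // whether to store
			 *   fieldType.setTokenized(false); // whether to tokenize
			 */
			doc.add(new IntField("id", ids[i], Store.YES));
			doc.add(new StringField("email", emails[i], Store.YES));
			doc.add(new TextField("content", contents[i], Store.YES));
			// "name" uses a hand-built FieldType: indexed and stored
			FieldType type = new FieldType();
			type.setIndexed(true);
			type.setStored(true);
			doc.add(new Field("name", names[i], type));
			writer.addDocument(doc);
		}
		writer.commit();
	} catch (IOException e) {
		e.printStackTrace();
	} finally {
		try {
			if (writer != null) {
				writer.close();
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}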
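The practical difference between StringField and TextField is which terms end up in the index. The following is a minimal plain-Java sketch of that idea, not Lucene code: the class FieldTermsSketch and the method termsFor are hypothetical names, and the splitting rule is only a crude stand-in for an analyzer. The real StandardAnalyzer does more (for example, it can drop common English stop words).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

/** Plain-Java illustration only; none of this is Lucene API. */
class FieldTermsSketch {

    /**
     * Returns the terms a field value would contribute to an index.
     * tokenized = false mimics a StringField-style field: the whole value is one term.
     * tokenized = true mimics a TextField-style field, using a crude stand-in
     * for an analyzer (split on non-alphanumerics, then lowercase).
     */
    static List<String> termsFor(String value, boolean tokenized) {
        if (!tokenized) {
            return Collections.singletonList(value);
        }
        List<String> terms = new ArrayList<>();
        for (String token : value.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                terms.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        // Tokenized: several lowercase terms
        System.out.println(termsFor("Lorem ipsum dolor sit amet", true));
        // Not tokenized: a single term equal to the original value
        System.out.println(termsFor("Lorem ipsum dolor sit amet", false));
    }
}
```

Under this model, a value indexed without tokenization can only be found by its exact, complete value, while a tokenized value can be found by any one of its words.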

A simple query:

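import java.io.File;
import java.io.IOException;

import org.apache.lucene.document.Document;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public static void searcher01() {
	// Declared outside the try block so that the finally block can close it
	DirectoryReader reader = null;
	try {
		Directory directory = FSDirectory.open(new File("D:/Lucene"));
		reader = DirectoryReader.open(directory);
		IndexSearcher searcher = new IndexSearcher(reader);
		// Build the Query: match documents whose "content" field contains the term "consectetur".
		// A TermQuery is not analyzed, so the term is given in the lowercase form
		// that StandardAnalyzer produced at index time.
		Query query = new TermQuery(new Term("content", "consectetur"));
		// Fetch at most the 10 best-scoring hits
		TopDocs topDocs = searcher.search(query, 10);
		ScoreDoc[] scores = topDocs.scoreDocs;
		int length = scores.length;
		for (int i = 0; i < length; i++) {
			// scores[i].doc is the document's id in the index; use it to load the Document.
			// doc.get("xxx") returns the stored value of the Field with that name.
			Document doc = searcher.doc(scores[i].doc);
			System.out.println("id:" + doc.get("id") + "  email:" + doc.get("email") + "  content:" + doc.get("content") + " name:" + doc.get("name"));
		}
	} catch (IOException e) {
		e.printStackTrace();
	} finally {
		try {
			if (reader != null) {
				reader.close();
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}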
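Why does the lowercase term "consectetur" also find a document whose text starts with "Consectetur"? The terms were normalized when the text was indexed, and a term lookup compares terms exactly. Below is a toy inverted index in plain Java that illustrates this behavior. It is a sketch, not how Lucene is implemented: ToyInvertedIndex and its methods are hypothetical names, and the split-and-lowercase step only approximates what an analyzer does.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index for illustration; not Lucene code. */
class ToyInvertedIndex {
    // term -> ids of the documents that contain the term
    private final Map<String, Set<Integer>> postings = new HashMap<>();
    // document id -> original text, kept so that hits can be displayed
    private final List<String> stored = new ArrayList<>();

    /** Indexes the text under lowercase terms and returns the new document id. */
    int add(String content) {
        int id = stored.size();
        stored.add(content);
        for (String token : content.split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token.toLowerCase(Locale.ROOT), k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Exact term lookup: the query term is NOT analyzed or lowercased. */
    Set<Integer> search(String term) {
        return postings.getOrDefault(term, Collections.<Integer>emptySet());
    }

    /** Returns the original text of a document. */
    String storedContent(int id) {
        return stored.get(id);
    }

    public static void main(String[] args) {
        ToyInvertedIndex index = new ToyInvertedIndex();
        index.add("Lorem ipsum dolor sit amet, consectetur adipisicing elit");
        index.add("Consectetur adipisicing elit, sed do eiusmod tempor");
        for (int id : index.search("consectetur")) {
            System.out.println(id + ": " + index.storedContent(id));
        }
        // The capitalized form was never stored as a term, so nothing is found
        System.out.println(index.search("Consectetur").isEmpty());
    }
}
```

In this model, searching with the capitalized form returns nothing, which is the same reason a term-level query should use the already-normalized form of a word.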

Deleting from the index:

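import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public static void delIndex() {
	IndexWriter writer = null;
	try {
		// Open the same index directory that the indexing and search methods use
		Directory directory = FSDirectory.open(new File("D:/Lucene"));
		writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46)));
		// Delete every document whose "content" field contains the term "welcome"
		writer.deleteDocuments(new Term("content", "welcome"));
	} catch (IOException e) {
		e.printStackTrace();
	} finally {
		try {
			if (writer != null) {
				// Closing the writer also commits the pending deletion
				writer.close();
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}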

Updating the index:

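import java.io.File;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public static void updIndex() {
	IndexWriter writer = null;
	try {
		// Open the same index directory that the indexing and search methods use
		Directory directory = FSDirectory.open(new File("D:/Lucene"));
		writer = new IndexWriter(directory, new IndexWriterConfig(Version.LUCENE_46, new StandardAnalyzer(Version.LUCENE_46)));
		Document doc = new Document();
		doc.add(new StringField("id","11", Field.Store.YES));
		// As a StringField, the whole sentence is indexed as one untokenized term
		doc.add(new StringField("content", "incididunt ut labore et dolore magna aliqua. Ut enim ad lorem. ", Field.Store.YES));
		// Replace the documents matching the term id:0 with the new document.
		// Caveat: index() added "id" as an IntField, whose terms are numerically
		// encoded, so this string term may match nothing; in that case the new
		// document is simply added. Index the id as a StringField to use it as a key.
		writer.updateDocument(new Term("id","0"), doc);
	} catch (IOException e) {
		e.printStackTrace();
	} finally {
		try {
			if (writer != null) {
				writer.close();
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}
}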
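Lucene documents IndexWriter.updateDocument(Term, ...) as first deleting the documents that contain the term and then adding the new document. The plain-Java sketch below models only that delete-then-add behavior. UpdateSketch and its methods are hypothetical names, and a document here is just a map from field name to value, which is a deliberate simplification.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Models "update = delete by term, then add"; not Lucene code. */
class UpdateSketch {
    // Each "document" is a map from field name to field value
    private final List<Map<String, String>> docs = new ArrayList<>();

    void add(Map<String, String> doc) {
        docs.add(doc);
    }

    /** Removes every document whose field equals the value exactly; returns how many were removed. */
    int delete(String field, String value) {
        int removed = 0;
        Iterator<Map<String, String>> it = docs.iterator();
        while (it.hasNext()) {
            if (value.equals(it.next().get(field))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    /** Deletes the matching documents, then adds the replacement even if nothing matched. */
    int update(String field, String value, Map<String, String> replacement) {
        int removed = delete(field, value);
        add(replacement);
        return removed;
    }

    int size() {
        return docs.size();
    }

    /** Convenience factory for a two-field document. */
    static Map<String, String> doc(String id, String content) {
        Map<String, String> d = new HashMap<>();
        d.put("id", id);
        d.put("content", content);
        return d;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.add(doc("0", "first"));
        index.add(doc("1", "second"));
        // Matches one document: it is replaced, so the size stays the same
        System.out.println(index.update("id", "0", doc("11", "replacement")) + " removed, size " + index.size());
        // Matches nothing: the "update" degenerates into a plain add
        System.out.println(index.update("id", "99", doc("12", "extra")) + " removed, size " + index.size());
    }
}
```

The second call shows why the key term matters: when the term matches no document, nothing is replaced and the index simply grows by one document.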
