An Introduction to Lucene, with an Example

程序员文章站 2022-06-07 10:40:09

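1. Why introduce Lucene

Elasticsearch is built on Lucene. Lucene is a full-text information retrieval library that provides the basic retrieval operations, and developers have to integrate it into their own projects themselves. Elasticsearch, by contrast, is a complete solution: its RESTful interface hides the low-level details of indexing and makes search easier to use. Several of Elasticsearch's basic concepts also come from Lucene.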

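2. Some basic Lucene concepts

document: the basic unit of stored data (similar to a record in a database)

field: an attribute of a document (similar to a column in a database)

index: records where and how often terms occur in the fields of documents. Lucene uses an inverted index, which maps from a term to the documents that contain it, rather than from a document to the terms it contains.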
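
To make the idea of an inverted index concrete, here is a minimal sketch in plain Java. It is a toy illustration under simplifying assumptions (whitespace tokenization, everything held in memory), not how Lucene stores its index, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy inverted index: term -> (document ID -> positions of the term in that document).
// Illustration only; the names here are hypothetical and not part of Lucene.
class InvertedIndexSketch {

    private final Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();

    // Split the text on whitespace and record each term's position under the document ID.
    public void addDocument(int docId, String text) {
        String[] terms = text.toLowerCase().trim().split("\\s+");
        for (int pos = 0; pos < terms.length; pos++) {
            if (terms[pos].isEmpty()) continue;
            postings.computeIfAbsent(terms[pos], t -> new TreeMap<>())
                    .computeIfAbsent(docId, d -> new ArrayList<>())
                    .add(pos);
        }
    }

    // Documents containing the term, each with the positions where it occurs.
    // The size of a position list is the term's frequency in that document.
    public Map<Integer, List<Integer>> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptyMap());
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.addDocument(0, "lucene is a search library");
        index.addDocument(1, "elasticsearch is built on lucene");
        index.addDocument(2, "search search search");

        System.out.println(index.lookup("lucene")); // {0=[0], 1=[4]}
        System.out.println(index.lookup("search")); // {0=[3], 2=[0, 1, 2]}
    }
}
```

Looking up a term is a single map access, which is why going from "attribute to record" is fast even when there are many documents.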

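3. How Lucene works

3.1 Lucene exists only as a library. Its job is to manage data and the corresponding index and to parse queries from users. Its work falls into two parts: first, it builds an index from the data supplied by the application on top of it and stores that data in the appropriate place; second, it accepts queries from the application, finds the matching data, and returns it.

3.2 The flow at the code level

Building the index

Convert each data object into a Document, setting the Document's fields as needed
Write the Document into the index with IndexWriter

Searching

Wrap the search request in a Query object
Run the search with IndexSearcher's search method
Walk the ScoreDoc array in the returned TopDocs; IndexSearcher.doc(ScoreDoc.doc) retrieves the actual Document object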

4. Example

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.cjk.CJKAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.*;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;

/**
* Created by kisstheraik on 16/7/12.
* Description: a hands-on full-text search example that searches the files in a folder. Lucene version: 6.1
*
*/
public class search {

    // Directory where the index is stored, and directory holding the files to index
    public static String indexDir="/example/index";
    public static String dataDir="/example/data";

    public static void main(String[] args) throws Exception{

        search search=new search();

        // Index the files
        search.writeIndex();

        // Search for files whose name contains "红" and print their names
        search.search("红","filename");

    }

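    // Index the contents of the files in dataDir
    public void writeIndex() throws Exception{

        File indexFile=new File(indexDir);
        File dataFile=new File(dataDir);

        // mkdirs() also creates missing parent directories, which mkdir() does not
        if(!indexFile.exists()){
            indexFile.mkdirs();
        }
        if(!dataFile.exists()){
            dataFile.mkdirs();
        }

        // List the files before opening the writer, so an early return leaves nothing open
        File[] fileList=dataFile.listFiles();

        if(fileList==null)return;

        // Use the bundled CJK analyzer, which tokenizes CJK text into bigrams
        Analyzer analyzer=new CJKAnalyzer();

        IndexWriterConfig indexWriterConfig=new IndexWriterConfig(analyzer);

        // Open the on-disk index. try-with-resources closes the writer and the directory
        // even if indexing fails part-way. The default open mode appends to an existing
        // index, so running this twice indexes every file twice.
        try (Directory directory=FSDirectory.open(indexFile.toPath());
             IndexWriter indexWriter = new IndexWriter(directory, indexWriterConfig)) {

            // Convert each file into a Document
            for (File file : fileList){

                // Skip hidden files and anything that is not a regular file
                if(file.getName().startsWith(".") || !file.isFile())continue;
                Document document=new Document();
                document.add(new Field("filename",file.getName(), TextField.TYPE_STORED));
                document.add(new Field("filecontent",getFileContent(file), TextField.TYPE_STORED));

                indexWriter.addDocument(document);

            }

        }

    }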

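    /*
    * Description: search the index for a keyword
    * Params: key [ the keyword ] target [ the field to search ]
    */
    public void search(String key,String target)throws Exception{

        File indexFile=new File(indexDir);
        // Open the directory that holds the index; try-with-resources closes both when done
        try (Directory directory=FSDirectory.open(indexFile.toPath());
             DirectoryReader directoryReader=DirectoryReader.open(directory)) {

            IndexSearcher indexSearcher=new IndexSearcher(directoryReader);

            // Wrap the search key in a query. TermQuery does not analyze the key:
            // it only matches documents indexed with exactly this term.
            Term term=new Term(target,key);
            Query termQuery=new TermQuery(term);

            // Fetch at most the 1000 best-scoring hits
            TopDocs topDocs=indexSearcher.search(termQuery, 1000);

            for(ScoreDoc scoreDoc:topDocs.scoreDocs){

                // Print the stored value of the searched field (the file name when called from main)
                Document document=indexSearcher.doc(scoreDoc.doc);
                System.out.println(document.get(target));

            }

        }

    }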
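    /*
    * Description: read the contents of a file
    * Params: file [ File ]
    */
    public String getFileContent(File file) throws Exception{

        // StringBuilder avoids the cost of repeated String concatenation on large files
        StringBuilder result = new StringBuilder();

        // try-with-resources closes the reader even if reading fails
        try (BufferedReader reader = new BufferedReader(new FileReader(file))) {

            String tempString;

            while ((tempString = reader.readLine()) != null) {

                result.append(tempString).append("\n");

            }

        }

        return result.toString();

    }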

}
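
One caveat worth checking in the example above: TermQuery looks up the key exactly as given, while the CJK analyzer is described in the code as a bigram tokenizer. If runs of CJK characters are indexed as overlapping two-character terms, a file named 红楼梦 would be indexed under 红楼 and 楼梦, and the single-character query 红 would likely match nothing. The sketch below is a simplified stand-in for that tokenization, written in plain Java under the assumption that the input is one unbroken run of CJK characters. It is not Lucene's implementation, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified bigram tokenization: overlapping two-character terms.
// This is NOT Lucene's CJKAnalyzer, only an illustration of the idea.
// Assumes the input is a single run of CJK characters, one char each.
class BigramSketch {

    public static List<String> bigrams(String text) {
        List<String> terms = new ArrayList<>();
        if (text.length() == 1) {
            // A lone character has no neighbour, so it is kept as a one-character term
            terms.add(text);
        }
        for (int i = 0; i + 1 < text.length(); i++) {
            terms.add(text.substring(i, i + 2));
        }
        return terms;
    }

    public static void main(String[] args) {
        List<String> indexed = bigrams("红楼梦");
        System.out.println(indexed);                  // [红楼, 楼梦]
        System.out.println(indexed.contains("红"));   // false
        System.out.println(indexed.contains("红楼")); // true
    }
}
```

If single-character matching is required, one option is to query with a two-character term such as 红楼; another is to index with an analyzer that emits single-character terms.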

Reposted from: https://my.oschina.net/lovezfy/blog/717378