Lucene Search Engine: A Getting-Started Example

程序员文章站 2024-02-27 18:30:45

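Lucene is a toolkit for building full-text search. To use it, download a release from the official site and unpack it.
Official site: http://lucene.apache.org/
Download archive: http://archive.apache.org/dist/lucene/java/

Version used here: 4.10.3
JDK requirement: 1.7 or later (from version 4.8 onward, Lucene does not support JDKs older than 1.7).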

I. Create the project (Maven or plain Java)

II. Add the jars

The Maven pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.qf</groupId>
  <artifactId>lucene</artifactId>
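  <version>0.0.1-SNAPSHOT</version>
  <properties>
        <!-- Referenced by the compiler plugin's <encoding> setting -->
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
  </properties>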
  <dependencies>
        <dependency>
            <!-- Lucene core -->
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-core</artifactId>
            <version>4.10.3</version>
        </dependency>

        <dependency>
            <!-- Lucene analyzers (tokenizers and filters) -->
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-analyzers-common</artifactId>
            <version>4.10.3</version>
        </dependency>

        <dependency>
            <!-- Query parser -->
            <groupId>org.apache.lucene</groupId>
            <artifactId>lucene-queryparser</artifactId>
            <version>4.10.3</version>
        </dependency>

        <dependency>
            <!-- Unit testing -->
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.9</version>
        </dependency>

        <dependency>
            <!-- MySQL driver, used to read the source data -->
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.32</version>
        </dependency>

    </dependencies>


     <build>
        <plugins>
            <!-- Java compiler plugin -->
            <plugin>
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.6.0</version>
                <configuration>
                   <source>1.8</source>
                   <target>1.8</target>
                    <encoding>${project.build.sourceEncoding}</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build>
</project>

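For a non-Maven project, add the jars for the same dependencies to the classpath by hand: lucene-core, lucene-analyzers-common, lucene-queryparser, junit and mysql-connector-java.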

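III. Run the SQL script

The DAO in section V expects a MySQL database named lucene containing a book table with the columns id, name, price, pic and description.

IV. POJO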

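package com.xx.lucene.pojo;

public class Book {
    // Book ID
    private Integer id;
    // Book title
    private String name;
    // Book price
    private Float price;
    // Book cover image path
    private String pic;
    // Book description
    private String description;

    public Book() {
    }

    // Getters and setters used by the DAO and the indexing code
    public Integer getId() { return id; }
    public void setId(Integer id) { this.id = id; }
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public Float getPrice() { return price; }
    public void setPrice(Float price) { this.price = price; }
    public String getPic() { return pic; }
    public void setPic(String pic) { this.pic = pic; }
    public String getDescription() { return description; }
    public void setDescription(String description) { this.description = description; }
}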

V. DAO

package com.xx.lucene.dao;

import java.util.List;

import com.xx.lucene.pojo.Book;

public interface BookDao {
    // Returns every row of the book table
    List<Book> queryBooks();
}

package com.xx.lucene.dao.impl;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.util.ArrayList;
import java.util.List;

import com.xx.lucene.dao.BookDao;
import com.xx.lucene.pojo.Book;

public class BookDaoImpl implements BookDao {

    public List<Book> queryBooks() {
        // Result list
        List<Book> list = new ArrayList<Book>();
        // SQL statement
        String sql = "SELECT * FROM book";
        try {
            // Load the MySQL JDBC driver
            Class.forName("com.mysql.jdbc.Driver");
            // try-with-resources closes the connection, statement and result set
            try (Connection connection = DriverManager.getConnection(
                        "jdbc:mysql://localhost:3306/lucene", "root", "123456");
                 PreparedStatement preparedStatement = connection.prepareStatement(sql);
                 ResultSet resultSet = preparedStatement.executeQuery()) {
                // Map each row to a Book
                while (resultSet.next()) {
                    Book book = new Book();
                    book.setId(resultSet.getInt("id"));
                    book.setName(resultSet.getString("name"));
                    book.setPrice(resultSet.getFloat("price"));
                    book.setPic(resultSet.getString("pic"));
                    book.setDescription(resultSet.getString("description"));
                    list.add(book);
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
        return list;
    }

}

VI. Build the index

Create the test class IndexManager:

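package com.xx.lucene.test;

import java.io.File;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.FloatField;
import org.apache.lucene.document.StoredField;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;
import org.junit.Test;

import com.xx.lucene.dao.impl.BookDaoImpl;
import com.xx.lucene.pojo.Book;

public class IndexManager {

    @Test
    public void createIndex() throws Exception {
        // Collect the data
        BookDaoImpl dao = new BookDaoImpl();
        List<Book> books = dao.queryBooks();

        // Convert the collected data into Document objects
        List<Document> docs = new ArrayList<Document>();

        // One Book maps to one Document; each property becomes a Field
        // Store.YES: the original value is kept in the index and can be read back
        for (Book book : books) {
            Document document = new Document();
            // Book ID: not tokenized, indexed, stored
            Field id = new StringField("id", book.getId().toString(), Store.YES);
            // Book title: tokenized, indexed, stored
            Field name = new TextField("name", book.getName(), Store.YES);
            // Book price: not tokenized, indexed as a numeric value, stored
            Field price = new FloatField("price", book.getPrice(), Store.YES);
            // Book image path: not tokenized, not indexed, stored
            Field pic = new StoredField("pic", book.getPic());
            // Book description: tokenized, indexed, not stored
            Field description = new TextField("description", book.getDescription(), Store.NO);
            // Add the fields to the Document
            document.add(id);
            document.add(name);
            document.add(price);
            document.add(pic);
            document.add(description);
            docs.add(document);
        }

        // Create the analyzer
        // Tokenizing: split the content of a field into individual terms
        // Filtering: post-process the terms, e.g. drop punctuation, lowercase,
        //   reduce words to their base form (plural to singular, past to present tense), remove stop words
        // Stop words: words that carry little meaning on their own, e.g. 的, 啊 in Chinese
        //   or "this", "is", "a", "the" in English
        Analyzer analyzer = new StandardAnalyzer();
        // Create the IndexWriterConfig
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
        // Location of the index directory
        File indexFile = new File("G:\\bookindex\\");
        Directory directory = FSDirectory.open(indexFile);
        IndexWriter writer = new IndexWriter(directory, cfg);
        // Analyze each Document and write the resulting index entries to the index directory
        for (Document doc : docs) {
            writer.addDocument(doc);
        }
        // Close the writer (this also commits the changes)
        writer.close();
    }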
	
	
    @Test
    public void searchIndex() throws Exception {
        // Create the query
        // First argument: the default field to search
        // Second argument: the analyzer
        QueryParser queryParser = new QueryParser("description", new StandardAnalyzer());
        Query query = queryParser.parse("description:spring AND mybatis");

        // Create the IndexSearcher on the index directory
        File indexFile = new File("G:\\bookindex\\");
        Directory directory = FSDirectory.open(indexFile);
        IndexReader reader = DirectoryReader.open(directory);
        IndexSearcher searcher = new IndexSearcher(reader);
        // Search the index and return at most the top 10 hits
        TopDocs topDocs = searcher.search(query, 10);
        // Total number of matching documents (may exceed the number returned)
        int count = topDocs.totalHits;
        System.out.println("Total matches: " + count);

        ScoreDoc[] scoreDocs = topDocs.scoreDocs;
        for (ScoreDoc scoreDoc : scoreDocs) {
            // Lucene's internal document number
            int docId = scoreDoc.doc;
            Document document = searcher.doc(docId);
            System.out.println("Book ID: " + document.get("id"));
            System.out.println("Book title: " + document.get("name"));
            System.out.println("Book price: " + document.get("price"));
            System.out.println("Book image: " + document.get("pic"));
            // description was indexed with Store.NO, so this prints null
            System.out.println("Book description: " + document.get("description"));
        }
        // Release the index files
        reader.close();
    }
}

[Figure: the index directory created by createIndex]
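
The analysis steps named in the createIndex comments (tokenizing, lowercasing, stop-word removal) can be pictured with a small plain-Java sketch. This is only an illustration and not how StandardAnalyzer is implemented: the class name AnalysisSketch, the method analyze and the four-word stop list are assumptions made up for this example, and base-form reduction (stemming) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java illustration of tokenize -> lowercase -> stop-word filtering.
// NOT Lucene's StandardAnalyzer; names and stop words are hypothetical.
class AnalysisSketch {

    // Sample stop words, chosen only for this illustration
    static final Set<String> STOP_WORDS =
            new HashSet<String>(Arrays.asList("this", "is", "a", "the"));

    static List<String> analyze(String text) {
        List<String> terms = new ArrayList<String>();
        // Tokenize: split on anything that is not a letter or digit (drops punctuation)
        for (String token : text.split("[^\\p{L}\\p{N}]+")) {
            if (token.isEmpty()) {
                continue;
            }
            // Normalize: lowercase the token
            String term = token.toLowerCase(Locale.ROOT);
            // Filter: skip stop words
            if (!STOP_WORDS.contains(term)) {
                terms.add(term);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(analyze("This is a Lucene book, the BEST one!"));
    }
}
```

Only the terms that survive this kind of pipeline end up in the index, which is why a search has to run its query text through the same analyzer that was used at indexing time.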

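VII. Chinese word segmentation

• StandardAnalyzer:
Single-character segmentation: Chinese text is split one character at a time. For example, “我爱中国”
becomes “我”, “爱”, “中”, “国”.
• CJKAnalyzer:
Bigram segmentation: text is split into overlapping two-character pieces. For example, “我是中国人” becomes “我是”, “是中”, “中国”, “国人”.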
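
The two strategies can be reproduced in a few lines of plain Java. This is a sketch for illustration, not the code of StandardAnalyzer or CJKAnalyzer; the class and method names are hypothetical, and it assumes every character fits in a single Java char, which holds for the sample strings. The strings are written as Unicode escapes so the file compiles under any source encoding: \u6211\u7231\u4e2d\u56fd is 我爱中国 and \u6211\u662f\u4e2d\u56fd\u4eba is 我是中国人.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of single-character and bigram segmentation.
// NOT the Lucene analyzers themselves; names are hypothetical.
class CjkSegmentationSketch {

    // One token per character
    static List<String> unigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i < text.length(); i++) {
            tokens.add(text.substring(i, i + 1));
        }
        return tokens;
    }

    // Overlapping two-character tokens: positions (0,1), (1,2), (2,3), ...
    static List<String> bigrams(String text) {
        List<String> tokens = new ArrayList<String>();
        for (int i = 0; i + 1 < text.length(); i++) {
            tokens.add(text.substring(i, i + 2));
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(unigrams("\u6211\u7231\u4e2d\u56fd"));
        System.out.println(bigrams("\u6211\u662f\u4e2d\u56fd\u4eba"));
    }
}
```

Neither strategy knows where real word boundaries are, which is the motivation for switching to a dictionary-based Chinese analyzer such as IKAnalyzer.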

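This example uses IKAnalyzer2012FF_u1.jar. The jar is not available from the Maven repository, so add it to the project as a local jar using the steps below.
Reference:
Adding a local jar to Maven: http://www.tbdazhe.com/archives/586?doxsjq=91uk63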

1. Add the jar

Create a lib folder in the project root and copy IKAnalyzer2012FF_u1.jar into it.

2. Configure the pom

<!-- Chinese analyzer (IKAnalyzer) -->
<dependency>
      <groupId>org.wltea.analyzer</groupId>
      <artifactId>IKAnalyzer</artifactId>
      <version>2012FF_u1</version>
      <scope>system</scope>
      <systemPath>${project.basedir}/lib/IKAnalyzer2012FF_u1.jar</systemPath>
</dependency>

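3. Switch the analyzer in the code

Replace the standard analyzer with the Chinese analyzer:

        // Requires: import org.wltea.analyzer.lucene.IKAnalyzer;
        // Create the analyzer
        // Standard analyzer
        //Analyzer analyzer = new StandardAnalyzer();
        // Chinese analyzer
        Analyzer analyzer = new IKAnalyzer();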

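VIII. Using Luke

Entering a query

Like SQL for a database, Lucene full-text search has a fixed query syntax.
The most basic operators are AND, OR and NOT.
Example: a user wants the documents whose description contains both the keyword spring and the keyword mybatis.
The corresponding query is: description:spring AND mybatis
Only spring carries an explicit field prefix here; mybatis is matched against the parser's default field, which the searchIndex code in section VI sets to description.
[Figure: running this query in Luke]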
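
What AND, OR and NOT mean for a result set can be shown with a toy inverted index in plain Java. This is a sketch under stated assumptions, not Lucene's implementation: the class BooleanQuerySketch and its methods are hypothetical, it handles a single field only, and it does no scoring.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Toy inverted index: term -> ids of the documents whose description contains it.
// Illustration only; NOT Lucene's data structures. All names are hypothetical.
class BooleanQuerySketch {

    private final Map<String, TreeSet<Integer>> postings =
            new HashMap<String, TreeSet<Integer>>();

    // Index one description: lowercase it, split on non-word characters
    void add(int docId, String description) {
        for (String term : description.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            TreeSet<Integer> ids = postings.get(term);
            if (ids == null) {
                ids = new TreeSet<Integer>();
                postings.put(term, ids);
            }
            ids.add(docId);
        }
    }

    // Returns a copy so callers can modify the result freely
    TreeSet<Integer> lookup(String term) {
        TreeSet<Integer> ids = postings.get(term);
        return ids == null ? new TreeSet<Integer>() : new TreeSet<Integer>(ids);
    }

    // a AND b: documents containing both terms (set intersection)
    TreeSet<Integer> and(String a, String b) {
        TreeSet<Integer> result = lookup(a);
        result.retainAll(lookup(b));
        return result;
    }

    // a OR b: documents containing either term (set union)
    TreeSet<Integer> or(String a, String b) {
        TreeSet<Integer> result = lookup(a);
        result.addAll(lookup(b));
        return result;
    }

    // a NOT b: documents containing a but not b (set difference)
    TreeSet<Integer> not(String a, String b) {
        TreeSet<Integer> result = lookup(a);
        result.removeAll(lookup(b));
        return result;
    }

    public static void main(String[] args) {
        BooleanQuerySketch index = new BooleanQuerySketch();
        index.add(1, "Spring and MyBatis integration");
        index.add(2, "Spring in action");
        index.add(3, "MyBatis basics");
        System.out.println(index.and("spring", "mybatis"));
        System.out.println(index.or("spring", "mybatis"));
        System.out.println(index.not("spring", "mybatis"));
    }
}
```

With the three sample descriptions, spring AND mybatis selects only document 1, OR selects all three, and spring NOT mybatis selects document 2.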

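IX. Deleting from the index

1. Delete by Term

Every document that matches the term is deleted.
Note: to delete by name, the name field must have been indexed when the index was built.
In Solr, every document is added with a unique key, which makes it easy to delete data by that key.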

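    // The methods in sections IX and X go into IndexManager and additionally need:
    //   import java.io.IOException;
    //   import org.apache.lucene.index.Term;
    @Test
    public void deleteIndex() throws IOException {
        // Create the analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Create the IndexWriterConfig
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
        // Location of the index directory (the same one createIndex wrote to)
        File indexFile = new File("G:\\bookindex\\");
        Directory directory = FSDirectory.open(indexFile);
        IndexWriter writer = new IndexWriter(directory, cfg);
        // Delete the documents whose name field contains the term "solr".
        // A Term is compared with the indexed terms as-is (it is not analyzed),
        // so use the lowercase form that StandardAnalyzer wrote to the index.
        Term term = new Term("name", "solr");
        writer.deleteDocuments(term);
        writer.close();
    }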

2. Delete everything

    @Test
    public void deleteAllIndex() throws IOException {
        // Create the analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Create the IndexWriterConfig
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
        // Location of the index directory (the same one createIndex wrote to)
        File indexFile = new File("G:\\bookindex\\");
        Directory directory = FSDirectory.open(indexFile);
        IndexWriter writer = new IndexWriter(directory, cfg);
        // Remove every document from the index
        writer.deleteAll();
        writer.close();
    }

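X. Updating the index

    @Test
    public void updateIndex() throws IOException {
        // Create the analyzer
        Analyzer analyzer = new StandardAnalyzer();
        // Create the IndexWriterConfig
        IndexWriterConfig cfg = new IndexWriterConfig(Version.LUCENE_4_10_3, analyzer);
        // Location of the index directory (the same one createIndex wrote to)
        File indexFile = new File("G:\\bookindex\\");
        Directory directory = FSDirectory.open(indexFile);
        IndexWriter writer = new IndexWriter(directory, cfg);
        // updateDocument: find the documents whose name is 张三 and replace them with one named 李四
        // First argument: the term that selects the documents to replace
        // Second argument: the new Document
        // 1. If the term matches, the matching documents are deleted and the new Document is added.
        // 2. If nothing matches, the new Document is simply added.
        // In other words, an update is: look up, delete, then add.
        // Caveat: the Term is not analyzed. With StandardAnalyzer a TextField value such as
        // 张三 is indexed as single characters, so this term only matches if name was
        // indexed as one token (for example with a StringField).
        Document doc = new Document();
        doc.add(new TextField("name", "李四", Store.YES));
        writer.updateDocument(new Term("name", "张三"), doc);
        writer.close();
    }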
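
The delete-then-add behaviour described in the comments can be modelled with an in-memory list. This is a plain-Java sketch, not Lucene code: the class UpdateSketch, its methods and the sample names zhangsan, lisi and wangwu are hypothetical, and a "document" here is just a map from field name to value, matched by exact string equality.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

// In-memory model of "update = delete matching documents, then add the new one".
// Illustration only; NOT Lucene's IndexWriter. All names are hypothetical.
class UpdateSketch {

    final List<Map<String, String>> docs = new ArrayList<Map<String, String>>();

    void addDocument(Map<String, String> doc) {
        docs.add(doc);
    }

    // Removes every document whose field equals value exactly; returns how many were removed
    int deleteDocuments(String field, String value) {
        int removed = 0;
        Iterator<Map<String, String>> it = docs.iterator();
        while (it.hasNext()) {
            if (value.equals(it.next().get(field))) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    // Delete whatever matches, then add; with no match this is a plain add
    void updateDocument(String field, String value, Map<String, String> newDoc) {
        deleteDocuments(field, value);
        addDocument(newDoc);
    }

    // Helper: a document with a single name field
    static Map<String, String> named(String name) {
        Map<String, String> doc = new HashMap<String, String>();
        doc.put("name", name);
        return doc;
    }

    public static void main(String[] args) {
        UpdateSketch index = new UpdateSketch();
        index.addDocument(named("zhangsan"));
        index.addDocument(named("wangwu"));
        // A match exists: zhangsan is replaced by lisi, the count stays at 2
        index.updateDocument("name", "zhangsan", named("lisi"));
        // No match any more: the new document is added, the count grows to 3
        index.updateDocument("name", "zhangsan", named("lisi"));
        System.out.println(index.docs);
    }
}
```

The second call shows the case to watch for: when the term matches nothing, an update silently becomes an insert, so repeated updates with a non-matching term accumulate duplicates.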
