Implementing an Inverted Index in Java

程序员文章站 2022-04-28 18:10:29


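Suppose there are three documents, file1, file2, and file3, with the following contents:

file1 (word1, word2, word3, word4, ...)

file2 (word_a, word_b, word_c, word_d, ...)

file3 (word1, word_a, word3, word_d, ...)

The inverted index built from them looks like this:

word1 (file1, file3)

word2 (file1)

word3 (file1, file3)

word_a (file2, file3)

....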
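The mapping above can be sketched in a few lines of Java. This is only a minimal illustration and not the program from this article: the class name `InvertedIndexSketch`, the `build` method, and the in-memory placeholder documents are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: builds word -> files from documents held in memory.
class InvertedIndexSketch {

    // docs maps a file name to the words that file contains.
    static Map<String, List<String>> build(Map<String, List<String>> docs) {
        // TreeMap keeps the words sorted, so the printed order is predictable.
        Map<String, List<String>> index = new TreeMap<>();
        for (Map.Entry<String, List<String>> doc : docs.entrySet()) {
            for (String word : doc.getValue()) {
                List<String> files = index.computeIfAbsent(word, k -> new ArrayList<>());
                // Record each file at most once per word.
                if (!files.contains(doc.getKey())) {
                    files.add(doc.getKey());
                }
            }
        }
        return index;
    }

    public static void main(String[] args) {
        // LinkedHashMap preserves insertion order: file1, file2, file3.
        Map<String, List<String>> docs = new LinkedHashMap<>();
        docs.put("file1", List.of("word1", "word2", "word3", "word4"));
        docs.put("file2", List.of("word_a", "word_b", "word_c", "word_d"));
        docs.put("file3", List.of("word1", "word_a", "word3", "word_d"));

        // Prints one line per word, followed by the files that contain it.
        build(docs).forEach((word, files) -> System.out.println(word + " " + files));
    }
}
```

A list is used for the file names to mirror the listing above; a `Set` would make the duplicate check unnecessary if the order of files does not matter.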

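Term frequency is the number of times a word occurs in a given file. The code in this article instead counts the total number of times each word occurs across all files. If you know a more concise or more efficient way to write it, feel free to share.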
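To make the difference between the two counts concrete, here is a small sketch that keeps both the per-file frequency and the total across files. The class `TermFrequencySketch`, its `add` method, and the sample strings are hypothetical names chosen for illustration. `Map.merge` is used as one of the more concise ways to increment a counter.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch contrasting per-file counts with the total count.
class TermFrequencySketch {

    // file name -> (word -> occurrences in that file)
    final Map<String, Map<String, Integer>> perFile = new HashMap<>();
    // word -> occurrences across all files
    final Map<String, Integer> total = new HashMap<>();

    void add(String fileName, String text) {
        Map<String, Integer> counts = perFile.computeIfAbsent(fileName, k -> new HashMap<>());
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty or blank text yields one empty token
            }
            counts.merge(word, 1, Integer::sum); // frequency within this file
            total.merge(word, 1, Integer::sum);  // frequency over all files
        }
    }

    public static void main(String[] args) {
        TermFrequencySketch tf = new TermFrequencySketch();
        tf.add("a.txt", "to be or not to be");
        tf.add("b.txt", "to do");
        System.out.println(tf.perFile.get("a.txt").get("to")); // 2
        System.out.println(tf.perFile.get("b.txt").get("to")); // 1
        System.out.println(tf.total.get("to"));                // 3
    }
}
```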

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;


public class InvertedIndex {
	
	// word -> list of files that contain the word
	private final Map<String, List<String>> map = new HashMap<>();
	// word -> total number of occurrences across all indexed files
	private final Map<String, Integer> nums = new HashMap<>();
	
	public void createIndex(String filepath) {
		// try-with-resources closes the reader even if reading fails
		try (BufferedReader reader = new BufferedReader(new FileReader(filepath))) {
			String s;
			while ((s = reader.readLine()) != null) {
				// split the line into words; every line is indexed, not only the last one
				for (String word : s.trim().split("\\s+")) {
					if (word.isEmpty()) {
						continue; // blank line
					}
					List<String> list = map.get(word);
					if (list == null) {
						list = new ArrayList<>();
						map.put(word, list);
					}
					// add the file name only if it has not been recorded for this word yet
					if (!list.contains(filepath)) {
						list.add(filepath);
					}
					// total count of this word across all files
					nums.merge(word, 1, Integer::sum);
				}
			}
		} catch (IOException e) {
			e.printStackTrace();
		}
	}

	public static void main(String[] args) {
		InvertedIndex index = new InvertedIndex();
		
		// index the files 1.txt, 2.txt and 3.txt
		for (int i = 1; i <= 3; i++) {
			String path = "E:\\data\\" + i + ".txt";
			index.createIndex(path);
		}
		// print each word with the files that contain it
		for (Map.Entry<String, List<String>> entry : index.map.entrySet()) {
			System.out.println(entry.getKey() + ":" + entry.getValue());
		}
		// print each word with its total count
		for (Map.Entry<String, Integer> num : index.nums.entrySet()) {
			System.out.println(num.getKey() + ":" + num.getValue());
		}
	}
}

File contents:

1.txt: i live in hangzhou where are you

2.txt: i love you i love you

3.txt: i love you today is a good day

Output:

[Figure: screenshots of the program output (images not included)]
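As a way to check what the index should contain for the three sample files, the sketch below indexes the same three lines from memory instead of reading them from disk. The class `SampleIndexDemo` and its `index` method are hypothetical, and the file names are shortened to `1.txt`, `2.txt`, and `3.txt` rather than full paths, so the printed text differs from what the article's program prints.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: indexes the three sample texts held in memory.
class SampleIndexDemo {

    // word -> files containing it, in the order the files were indexed
    final Map<String, List<String>> postings = new TreeMap<>();
    // word -> total occurrences across all indexed texts
    final Map<String, Integer> totals = new TreeMap<>();

    void index(String name, String text) {
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue;
            }
            List<String> files = postings.computeIfAbsent(word, k -> new ArrayList<>());
            if (!files.contains(name)) {
                files.add(name);
            }
            totals.merge(word, 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        SampleIndexDemo demo = new SampleIndexDemo();
        demo.index("1.txt", "i live in hangzhou where are you");
        demo.index("2.txt", "i love you i love you");
        demo.index("3.txt", "i love you today is a good day");

        // "i" occurs in all three files, four times in total.
        System.out.println("i:" + demo.postings.get("i") + " total " + demo.totals.get("i"));
        // "love" occurs in 2.txt (twice) and 3.txt (once).
        System.out.println("love:" + demo.postings.get("love") + " total " + demo.totals.get("love"));
        // "hangzhou" occurs only in 1.txt.
        System.out.println("hangzhou:" + demo.postings.get("hangzhou") + " total " + demo.totals.get("hangzhou"));
    }
}
```

With these inputs, `i` and `you` each map to all three files with a total of 4, `love` maps to `2.txt` and `3.txt` with a total of 3, and every other word appears in exactly one file with a total of 1.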
