Java Algorithm: Counting Word Occurrences in a Given Paragraph (Java Code)

程序员文章站 (Programmer Article Site), 2024-03-16 12:50:40

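Write a Java program that counts how often each word occurs in a text file, words.txt.

For simplicity, you may assume:

words.txt contains only lowercase letters and the space character ' '.
Each word consists only of lowercase letters.
Words are separated by one or more space characters.

Example:

Suppose words.txt contains: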

the day is sunny the the
the sunny is is

Your program should print the following, sorted by descending frequency:

the 4
is 3
sunny 2
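day 1

Note:

You do not need to worry about the relative order of words with the same frequency, because every word's frequency is guaranteed to be unique.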
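Under the assumptions listed in the problem statement (lowercase letters and spaces only), the task reduces to splitting on runs of spaces, counting, and sorting by count. The following is a minimal sketch of that idea, not the article's own program: the class name `WordFrequencySketch` and the methods `count` and `format` are hypothetical, and the two example lines are held in memory instead of being read from words.txt.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class, used only for this illustration.
class WordFrequencySketch {

	// Counts words in the given lines, assuming only lowercase letters and spaces.
	static Map<String, Integer> count(List<String> lines) {
		Map<String, Integer> counts = new HashMap<>();
		for (String line : lines) {
			// trim() avoids an empty first token when a line starts with spaces
			String trimmed = line.trim();
			if (trimmed.isEmpty()) {
				continue;
			}
			// " +" matches one or more spaces between words
			for (String word : trimmed.split(" +")) {
				counts.merge(word, 1, Integer::sum);
			}
		}
		return counts;
	}

	// Formats the counts as "word count" rows, highest count first.
	static List<String> format(Map<String, Integer> counts) {
		List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
		entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
		List<String> rows = new ArrayList<>();
		for (Map.Entry<String, Integer> entry : entries) {
			rows.add(entry.getKey() + " " + entry.getValue());
		}
		return rows;
	}

	public static void main(String[] args) {
		// The two lines of the example words.txt
		List<String> lines = Arrays.asList("the day is sunny the the", "the sunny is is");
		for (String row : format(count(lines))) {
			System.out.println(row);
		}
	}
}
```

For the example input this prints `the 4`, `is 3`, `sunny 2`, `day 1`, one per line. The order is deterministic here only because all four frequencies differ.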

Algorithm design

package com.bean.algorithm.basic;

import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;

public class CountWords {

	public static void main(String[] args) {

		long startTime = System.currentTimeMillis(); // record the start time

		String string = "";
		Map<String, Integer> map = new HashMap<String, Integer>();
		// Read the whole file; try-with-resources closes the reader automatically.
		// (The problem statement calls the file words.txt; this path is the author's.)
		try (BufferedReader br = new BufferedReader(
				new InputStreamReader(new FileInputStream("G://CountWords.txt")))) {
			StringBuilder sb = new StringBuilder();
			String temp;
			while ((temp = br.readLine()) != null) {
				// readLine() strips the line terminator, so append a space to keep the
				// last word of one line apart from the first word of the next
				sb.append(temp).append(' ');
			}
			string = sb.toString();
		} catch (IOException e) {
			e.printStackTrace();
		}

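		// Split the text into words. The delimiters are comma, question mark, period,
		// exclamation mark, colon, double quote, single quote, space, and newline.
		// Passing them to the constructor makes hasMoreTokens() and nextToken() agree.
		StringTokenizer st = new StringTokenizer(string, ",?.!:\"' \n");
		// Number of occurrences of the current word seen so far
		int count;
		// The current word
		String word;
		while (st.hasMoreTokens()) {
			word = st.nextToken();
			if (map.containsKey(word)) {
				// Word already seen: increment the count stored in the HashMap
				count = map.get(word);
				map.put(word, count + 1);
			} else {
				// First occurrence of this word
				map.put(word, 1);
			}
		}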

		// Comparator that orders entries by count, highest first
		Comparator<Map.Entry<String, Integer>> valueComparator = new Comparator<Map.Entry<String, Integer>>() {
			@Override
			public int compare(Map.Entry<String, Integer> o1, Map.Entry<String, Integer> o2) {
				// Integer.compare avoids the overflow that subtraction can cause
				return Integer.compare(o2.getValue(), o1.getValue());
			}
		};
		// Copy the map entries into a list and sort it
		List<Map.Entry<String, Integer>> list = new ArrayList<Map.Entry<String, Integer>>(map.entrySet());
		Collections.sort(list, valueComparator);

		// Print the results
		System.out.println("---------------------Word analysis results: output----------");
		for (Map.Entry<String, Integer> entry : list) {
			System.out.println(entry.getKey() + ":" + entry.getValue());
		}

		long endTime = System.currentTimeMillis(); // record the end time
		System.out.println("Elapsed time: " + (endTime - startTime) + "ms");

	}

}

Sample text:

if you just want to try running findbugs against your own code, you can run findbugs using javawebstart. this will use our new gui under Java 1.5+ and our old gui under java 1.4. the new gui provides a number of new features, but requires java 1.5+. both use exactly the same analysis engine.

Program output

---------------------Word analysis results: output----------
new:3
1:3
gui:3
use:2
our:2
java:2
5+:2
you:2
the:2
findbugs:2
under:2
but:1
code:1
against:1
own:1
run:1
your:1
running:1
can:1
number:1
features:1
same:1
engine:1
and:1
provides:1
of:1
if:1
just:1
Java:1
a:1
using:1
will:1
old:1
want:1
this:1
exactly:1
analysis:1
both:1
4:1
javawebstart:1
try:1
to:1
requires:1
Elapsed time: 6ms
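
The output above shows three side effects of the approach on text that goes beyond the problem's lowercase-only assumption. `Java` and `java` are counted separately because matching is case-sensitive. `1.5+` is split into `1` and `5+` because the period is a delimiter. Words with equal counts appear in no particular order, because `HashMap` does not define an iteration order. The following is a hedged sketch of one way to handle all three, assuming a word is defined as a run of ASCII letters and that ties should be broken alphabetically. That definition, the class name `NormalizedWordCount`, and the method `countSorted` are assumptions made for illustration and are not part of the original program.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, used only for this illustration.
class NormalizedWordCount {

	// Assumption: a word is a run of ASCII letters; digits and punctuation are skipped.
	private static final Pattern WORD = Pattern.compile("[a-z]+");

	// Returns (word, count) entries sorted by count descending, then word ascending.
	static List<Map.Entry<String, Integer>> countSorted(String text) {
		Map<String, Integer> counts = new HashMap<>();
		// Lowercase first so that "Java" and "java" land in the same bucket
		Matcher matcher = WORD.matcher(text.toLowerCase(Locale.ROOT));
		while (matcher.find()) {
			counts.merge(matcher.group(), 1, Integer::sum);
		}
		List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
		entries.sort((a, b) -> {
			int byCount = Integer.compare(b.getValue(), a.getValue());
			// Break ties alphabetically so the output order is reproducible
			return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
		});
		return entries;
	}

	public static void main(String[] args) {
		// A fragment of the sample text above
		for (Map.Entry<String, Integer> entry : countSorted("Java 1.5+ and java 1.4")) {
			System.out.println(entry.getKey() + ":" + entry.getValue());
		}
		// prints java:2 then and:1
	}
}
```

Whether version numbers such as `1.5+` should be dropped, kept whole, or split is a design choice that depends on the text being analyzed; this sketch simply drops them.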
