692. Top K Frequent Words

程序员文章站 2022-04-25 19:06:48

...

692. Top K Frequent Words

方法1: priority_queue

易错点：

方法2: bucket-sort

易错点：

Given a non-empty list of words, return the k most frequent elements.

Your answer should be sorted by frequency from highest to lowest. If two words have the same frequency, then the word with the lower alphabetical order comes first.

Example 1:

Input: ["i", "love", "leetcode", "i", "love", "coding"], k = 2
Output: ["i", "love"]

Explanation: "i" and "love" are the two most frequent words.
    Note that "i" comes before "love" due to a lower alphabetical order.

Example 2:

Input: ["the", "day", "is", "sunny", "the", "the", "the", "sunny", "is", "is"], k = 4
Output: ["the", "is", "sunny", "day"]

Explanation: "the", "is", "sunny" and "day" are the four most frequent words,
    with the number of occurrence being 4, 3, 2 and 1 respectively.

Note:
You may assume k is always valid, 1 ≤ k ≤ number of unique elements.
Input words contain only lowercase letters.

Follow up:
Try to solve it in O(n log k) time and O(n) extra space.

方法1: priority_queue

思路：

如果按照常规的HashMap + PriorityQueue 需要O(nlogn)，不符合题目要求。优化成k个元素的PriorityQueue, 在放入O(n)个元素的过程中不断更替掉低频词而达到O(nlogk)

692. Top K Frequent Words

易错点：

要会写c++中的comparator。注意上文中的最后一段。因此priority_queue默认的comparator是std::less，从小到大排序但是先pop最大值。模仿这个写法，就应该是1. a.freq < b.freq, 2. a.word > b.word. 这样会优先词频，并在词频相等时先输出字母序靠前的单词。
但是这道题由于我们想在推堆的过程住不断替换掉最小值，所以应该用的是完全相反来实现让词频小且字母序靠后的在最前pop。也就是comparator定义为 1. a.freq > b.freq, 2. a.word < b.word。
cmp的function定义之后需要有；
priority的定义语法
最后还需要reverse一次

class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        vector<string> result;
        unordered_map<string, int> freqMap;
        for (string word: words) {
            ++freqMap[word];
        }
                
        // 这里first是word, second是词频
        // true的结果是a排在前面
        auto cmp = [](pair<string, int> & a, pair<string, int> & b){
            if (a.second == b.second){
                return a.first < b.first;
            }
            return a.second > b.second;
            
        };
        
        priority_queue<pair<string, int>, vector<pair<string, int>>, decltype(cmp)> freqHeap(cmp);
        for (auto item: freqMap) {
            freqHeap.push(item);
            if (freqHeap.size() > k){
                freqHeap.pop();
            }
        }
        
        for (int i = 0; i < k; i++){
            result.push_back(freqHeap.top().first);
            freqHeap.pop();
        }
        //先pop出来的小词频现在在前面，需要reverse
        std::reverse(result.begin(), result.end());
        return result;
    }
};

方法2: bucket-sort

用map<int, set<string>>的结构来建立bucket，{freq: {word1, word2…}}。如果使用ordered_map, which is default type of map (in contrast to unordered_map), 就可以直接从map中取最大的词直至取满k个。map中的相同词频的用set<string>来储存，就可以自动排字母序，每次从后面开始取。

易错点：

从bucket中取，和从bucket中的second，也就是vector<\string> 当中取都应该是reverse。用到rbegin(), rend()
while (!tmp2.empty() && k > 0 ) 这里内部也需要限制k。可以有一种整体判断的方式避免遗忘这个条件：每次先计算当前取出bucket的大小，如果累计不超过k全体推入，如果超过有限制的筛选（grandyang）

692. Top K Frequent Words


class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        vector<string> result;
        unordered_map<string, int> freqMap;
        map<int, set<string>> bucket;
        for (string word: words) {
            ++freqMap[word];
        }
        
        for (auto item: freqMap){
            bucket[item.second].insert(item.first);
        }
        
        for (auto it = bucket.rbegin(); it != bucket.rend(); it++){
            if (k <= 0 ) break;
            set<string> tmp = it->second;
            vector<string> tmp2(tmp.rbegin(), tmp.rend());
            while (!tmp2.empty() && k > 0 ){
                string a = tmp2.back() ;
                tmp2.pop_back();
                result.push_back(a);
                k--;
            }
        }
        return result;
    }
};

grandyang

class Solution {
public:
    vector<string> topKFrequent(vector<string>& words, int k) {
        vector<string> res;
        unordered_map<string, int> freq;
        vector<set<string>> v(words.size() + 1, set<string>());
        for (string word : words) ++freq[word];
        for (auto a : freq) {
            v[a.second].insert(a.first);
        }
        for (int i = v.size() - 1; i >= 0; --i) {
            if (k <= 0) break;
            vector<string> t(v[i].begin(), v[i].end());
            if (k >= t.size()) {
                res.insert(res.end(), t.begin(), t.end());
            } else {
                res.insert(res.end(), t.begin(), t.begin() + k);
            }
            k -= t.size();
        }
        return res;
    }
};

上一篇： jsp页面调用applet实现人民币的大小写转换（二）_MySQL

下一篇： Linux下MySQL忽略表名大小写_MySQL

692. Top K Frequent Words