
Crazy Search (hash algorithm)

程序员文章站 2022-05-06 08:01:01

Many people like to solve hard puzzles some of which may lead them to madness. One such puzzle could be finding a hidden prime number in a given text. Such number could be the number of different substrings of a given size that exist in the text. As you soon will discover, you really need the help of a computer and a good algorithm to solve such a puzzle.

Your task is to write a program that given the size, N, of the substring, the number of different characters that may occur in the text, NC, and the text itself, determines the number of different substrings of size N that appear in the text.

As an example, consider N=3, NC=4 and the text “daababac”. The different substrings of size 3 that can be found in this text are: “daa”; “aab”; “aba”; “bab”; “bac”. Therefore, the answer should be 5.

Input

The first line of input consists of two numbers, N and NC, separated by exactly one space. This is followed by the text where the search takes place. You may assume that the maximum number of substrings formed by the possible set of characters does not exceed 16 Millions.

Output

The program should output just an integer corresponding to the number of different substrings of size N found in the given text.

Sample Input

3 4
daababac

Sample Output
5

Problem summary: count how many distinct substrings of length N appear in a string.

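Approach:
1. Cut out every substring with substr and store it in a map or set: TLE (time limit exceeded).
2. Use string hashing. A hash function is a standard tool for deciding whether two strings are equal, or for counting how many times a string occurs.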
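For reference, approach 1 might look like the following minimal sketch. The function name countDistinctNaive is made up for illustration. Every insert copies a length-N substring and compares it character by character inside the set, which is the cost the hash approach avoids.

```cpp
#include <cstddef>
#include <set>
#include <string>

// Baseline (approach 1): store every length-n window in an ordered set.
// Each insert copies and compares up to n characters, which makes it slow
// on long texts.
std::size_t countDistinctNaive(const std::string& text, std::size_t n) {
    std::set<std::string> seen;
    if (n == 0 || text.size() < n) return 0;  // no window of that length exists
    for (std::size_t i = 0; i + n <= text.size(); ++i) {
        seen.insert(text.substr(i, n));
    }
    return seen.size();
}
```

On the sample, countDistinctNaive("daababac", 3) gives 5, matching the expected output, so it is useful for checking a faster solution on small inputs.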

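A string hash function maps a string of arbitrary length to a non-negative integer, with a collision probability that is very close to 0 in practice.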

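Pick a base p and treat the string as a number written in base p. Common choices are p = 131 or p = 13331 (I don't know why these in particular); with them, hash collisions are very unlikely. The hash value is that base-p number mod M. We take M = 2^64, which has two advantages: 1. the range of values is large; 2. with an unsigned 64-bit integer (unsigned long long), any result that exceeds 2^64 is reduced mod 2^64 automatically by overflow, so no explicit % is needed.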
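A minimal sketch of this idea, assuming the text contains only lowercase letters so that 'a' maps to 1, 'b' to 2, and so on. The names polyHash and BASE are made up for illustration. Unsigned arithmetic in C++ wraps modulo 2^64, so no % appears in the code.

```cpp
#include <cstddef>
#include <string>

typedef unsigned long long ull;

const ull BASE = 131;  // the base p; 13331 is the other common choice

// Read s as a base-131 number whose digits are 'a'->1, 'b'->2, ...
// Unsigned overflow reduces the value mod 2^64 automatically.
ull polyHash(const std::string& s) {
    ull h = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        h = h * BASE + static_cast<ull>(s[i] - 'a' + 1);
    }
    return h;
}
```

For example, polyHash("ab") is 1*131 + 2 = 133, while polyHash("ba") is 2*131 + 1 = 263, so the order of the characters matters.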

Precompute the prefix hashes: hash[i] = hash[i-1]*p + (S[i]-'a'+1). If this is unclear, simulate it on paper with the decimal number 1234: hash[1]=1, hash[2]=12, hash[3]=123, hash[4]=1234.
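The decimal simulation can be reproduced with a small sketch. The function prefixHashes and its parameters are hypothetical; it takes the digits and the base explicitly so that base 10 gives exactly the numbers above.

```cpp
#include <cstddef>
#include <vector>

typedef unsigned long long ull;

// h[i] is the value of the first i digits read as a number in the given base.
// h[0] = 0 stands for the empty prefix.
std::vector<ull> prefixHashes(const std::vector<int>& digits, ull base) {
    std::vector<ull> h(digits.size() + 1, 0);
    for (std::size_t i = 1; i <= digits.size(); ++i) {
        h[i] = h[i - 1] * base + static_cast<ull>(digits[i - 1]);
    }
    return h;
}
```

With digits 1, 2, 3, 4 and base 10 the array is 0, 1, 12, 123, 1234. Replacing the base with 131 and the digits with S[i]-'a'+1 gives the prefix hashes of the string.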

To get the hash of a substring [l, r], use the prefix-sum idea: sum = hash[r] - hash[l-1] * p^(r-l+1)

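Why multiply by p^(r-l+1)? It is the place value that separates the two prefixes, which differ by r-l+1 digits. In base 10, a difference of 3 digits is a factor of 10^3 = 1000, and a difference of 10 digits is a factor of 10^10. Taking the example above, the hash of 234 is obtained from 1234 as hash[4] - hash[1]*10^3 = 1234 - 1000 = 234.
Inside hash[r], the first l-1 characters carry powers of p that are r-l+1 higher than they carry in hash[l-1]. So hash[l-1] has to be multiplied by p^(r-l+1) before it is subtracted; otherwise the prefix is not removed completely and the result is larger than the true hash.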
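The subtraction can be sketched as a small helper. The struct PrefixHash and its parameters (base, first, firstValue) are made up for illustration: a character c is mapped to the digit c - first + firstValue, so ('0', 0) with base 10 reproduces the decimal example, and ('a', 1) with base 131 matches the mapping used in this article.

```cpp
#include <cstddef>
#include <string>
#include <vector>

typedef unsigned long long ull;

struct PrefixHash {
    std::vector<ull> f;   // f[i]  = hash of the first i characters
    std::vector<ull> pw;  // pw[i] = base^i (mod 2^64 via unsigned overflow)

    PrefixHash(const std::string& s, ull base, char first, ull firstValue)
        : f(s.size() + 1, 0), pw(s.size() + 1, 1) {
        for (std::size_t i = 1; i <= s.size(); ++i) {
            ull digit = static_cast<ull>(s[i - 1] - first) + firstValue;
            f[i] = f[i - 1] * base + digit;
            pw[i] = pw[i - 1] * base;
        }
    }

    // Hash of the characters at positions l..r (1-indexed, inclusive).
    // Shift the prefix f[l-1] left by r-l+1 places, then subtract it.
    ull range(std::size_t l, std::size_t r) const {
        return f[r] - f[l - 1] * pw[r - l + 1];
    }
};
```

For PrefixHash d("1234", 10, '0', 0), d.range(2, 4) is 1234 - 1*1000 = 234. For the sample text "daababac" with base 131, the two occurrences of "aba" at positions 3..5 and 5..7 get the same value.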

Code:

#include<iostream>
#include<algorithm> 
#include<cstring>
#include<cstdio>
#include<set>
const int mm=2e6;
using namespace std;
typedef unsigned long long ull;
char s[mm];
ull f[mm],p[mm],q[mm];
int len;
int main()
{
	int n,m;
	scanf("%d%d",&n,&m);  // n = substring length N; m = NC (read but not needed by this approach)
	scanf("%s",s+1);
	len=strlen(s+1);
	p[0]=1;
	for(int i=1;i<=len;i++)
	{
		f[i]=f[i-1]*131+(s[i]-'a'+1);
		p[i]=p[i-1]*131;  // p[i] = 131^i, the place value used when subtracting a prefix
	} 
	int k=0;
	for(int i=1;i<=len-n+1;i++)
	{
		q[i]=f[i+n-1]-f[i-1]*p[n];  // hash of the length-n substring starting at position i
		k++;	
	} 
	sort(q+1,q+k+1);
	k=unique(q+1,q+k+1)-(q+1);  // deduplicate only q[1..k]; including q[0]=0 would miscount if some hash were 0
	printf("%d\n",k);
	return 0;
}