DNA Prefix--字典树
Given a set of n DNA samples, where each sample is a string containing characters from {A, C, G, T}, we are trying to find a subset of samples in the set, where the length of the longest common prefix multiplied by the number of samples in that subset is maximum.
To be specific, let the samples be:
ACGT
ACGTGCGT
ACCGTGC
ACGCCGT
If we take the subset {ACGT} then the result is 4 (4 * 1), if we take {ACGT, ACGTGCGT, ACGCCGT} then the result is 3 * 3 = 9 (since ACG is the common prefix), if we take {ACGT, ACGTGCGT, ACCGTGC, ACGCCGT} then the result is 2 * 4 = 8.
Now your task is to report the maximum result we can get from the samples.
InputInput starts with an integer T (≤ 10), denoting the number of test cases.
Each case starts with a line containing an integer n (1 ≤ n ≤ 50000) denoting the number of DNA samples. Each of the next n lines contains a non empty string whose length is not greater than 50. And the strings contain characters from {A, C, G, T}.
OutputFor each case, print the case number and the maximum result that can be obtained.
Sample Input3
4
ACGT
ACGTGCGT
ACCGTGC
ACGCCGT
3
CGCGCGCGCGCGCCCCGCCCGCGC
CGCGCGCGCGCGCCCCGCCCGCAC
CGCGCGCGCGCGCCCCGCCCGCTC
2
CGCGCCGCGCGCGCGCGCGC
GGCGCCGCGCGCGCGCGCTC
Sample OutputCase 1: 9
Case 2: 66
Case 3: 20
分析:题目大意是给你一个集合,让你从这个集合里面选取一个子集,然后求解这个子集的最长公共序列和子集元素个数的乘积,求解是的这些可能里面的最大值
这道题一开始我想来想去感觉很是复杂,首先集合的组合就有很多种,更别说每种组合再去统计最长公共序列了!后来才了解到有字典树这种好玩的东西存在!
下面先简单介绍一下字典树:
定义:
又称单词查找树,Trie树,是一种树形结构,是一种哈希树的变种。典型应用是用于统计,排序和保存大量的字符串(但不仅限于字符串),所以经常被搜索引擎系统用于文本词频统计。它的优点是:利用字符串的公共前缀来减少查询时间,最大限度地减少无谓的字符串比较,查询效率比哈希树高。
性质:
根节点不包含字符,除根节点外每一个节点都只包含一个字符; 从根节点到某一节点,路径上经过的字符连接起来,为该节点对应的字符串; 每个节点的所有子节点包含的字符都不相同。
应用:
串的快速检索
“串”排序
最长公共前缀
/*字典树:
* 对于每个序列纵向建树, 实际上每个字符串都是从根到叶子节点的一条路径,并通过num的值来统计出现次数
* 可想而知,这是一颗多么漂亮而又庞大的数
* */
import java.util.*;
public class Main{
static Scanner in = new Scanner(System.in);
public static void main(String args[]){
int k=in.nextInt();
int ca = 0;
while(k-->0){
ca++;
int n = in.nextInt();
String s ;
int max = 0 ;
TreeNode root = new TreeNode();
for (int i = 0; i < n; i++) {
s = in.next();
max = root.insert(root, s);
}
System.out.println("Case "+ca+": "+ max);
}
System.exit(0);//作用是将jvm直接干掉,其实就是释放全部内存,oj会内存超时
}
}
//字典树节点
class TreeNode{
int size = 4;//字符的种类,即每个节点的孩子数目
int ans;
int num;////有多少单词通过这个节点,即由根至该节点组成的字符串模式出现的次数
TreeNode[] son;
TreeNode() {
num = 0;
son = new TreeNode[size];
}
int insert(TreeNode root,String str){
if(str==null||str.length()==0){
return 0;
}
TreeNode p = root;//每次从根节点开始
int len=str.length();
int id = 0,t;
for(int i=0; i<len; i++){
switch(str.charAt(i)){
case 'A': id=0; break;
case 'C': id=1; break;
case 'G': id=2; break;
case 'T': id=3; break;
}
if(p.son[id]==null){
p.son[id]=new TreeNode();
}
p=p.son[id]; //字符串每个字符向下一层
(p.num)++;
t=(i+1)*(p.num); //i代表公共子序列长度,num其实代表样本数目
ans=Math.max(ans, t);//求解最大的
}
return ans;
}
}
好玩的东西以后持续更新。。