Using CNCI: software for predicting RNA protein-coding potential

程序员文章站 (Programmer Article Station) 2024-03-03 18:10:58


(Bioinformatics) Using CNCI, an RNA protein-coding prediction tool

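Introduction to CNCI:

CNCI is an SVM-based (support vector machine) lncRNA prediction tool developed at the Chinese Academy of Sciences. It makes its predictions without relying on known RNA annotations, and it classifies incomplete transcripts and antisense transcripts well. This post summarizes some basic operations, following the instructions on GitHub.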

Installation and prerequisites:

Linux, 32-bit or 64-bit
Python 2 (the version is given as "2.74", presumably 2.7.4, or 2.0; for detailed installation steps see: https://blog.csdn.net/sherri_du/article/details/51810221)
CNCI download: https://github.com/www-bioinfo-org/CNCI#install-cnci
Installing CNCI

git clone git@github.com:www-bioinfo-org/CNCI.git
cd CNCI
unzip libsvm-3.0.zip
cd libsvm-3.0
make
cd ..
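
Before moving on, it can be worth confirming that the `make` step actually produced the libsvm executables. The sketch below is not part of CNCI. It assumes a stock libsvm-3.0 build, which normally yields `svm-train`, `svm-predict` and `svm-scale`, and that the clone lives in a folder named `CNCI`. The helper name `missing_libsvm_tools` is made up for illustration.

```python
import os

# Executables that a stock libsvm Makefile normally builds (assumption).
LIBSVM_TOOLS = ("svm-train", "svm-predict", "svm-scale")


def missing_libsvm_tools(libsvm_dir):
    """Return the expected libsvm executables that are absent from libsvm_dir."""
    missing = []
    for name in LIBSVM_TOOLS:
        path = os.path.join(libsvm_dir, name)
        # A tool counts as present only if it is a regular, executable file.
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Assumed location: the folder created by the git clone and unzip steps.
    missing = missing_libsvm_tools(os.path.join("CNCI", "libsvm-3.0"))
    if missing:
        print("missing: " + ", ".join(missing))
    else:
        print("libsvm build looks complete")
```

If anything is reported missing, re-running `make` inside libsvm-3.0 and reading its error output is the natural next step.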

Program overview:

Three .py scripts are available: compare.py, CNCI.py and filter_novel_lincRNA.py.
The official documentation describes them as follows:

1. compare.py: compare the merged/assembled transcripts with known gene annotation!
2. CNCI.py: A classification tool for identify coding or non-coding transcripts (fasta files and gtf files)
3. filter_novel_lincRNA.py: A tool that can convert the index file which produced by python CNCI_package/CNCI.py to four gene classes (novel_lincRNA,novel_coding, ambiguous_genes and filter_out_noncoding)
In short, compare.py compares assembled transcripts against known gene annotation, CNCI.py performs the lncRNA prediction itself, and filter_novel_lincRNA.py further classifies the resulting index file.

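Usage:

The test file used in this post is the test data shipped with PLEK (PLEK_test.fa); PLEK is another lncRNA prediction tool. Place the test dataset in the same directory that contains the CNCI folder. The command below refers to that folder as CNCI-master, the name it has when the repository is downloaded as a zip archive; after git clone it is named CNCI.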

 python CNCI-master/CNCI.py -f PLEK_test.fa  -o test -p 8 -m ve

The run produces two directories: "test_Tmp_Dir", which temporarily holds the score and sequence files, and "test", which contains the result file CNCI.index.

Parameters:

Link: https://github.com/www-bioinfo-org/CNCI#install-cnci
The full parameter descriptions are on the GitHub page; only a few basic ones are explained here:

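-f   # input file name; either FASTA or GTF format
-g   # required when the input is a GTF file; add -g after the file name, e.g. -f unannotation.gtf -g
-p   # number of threads
-m   # classification model to use
-o   # output name (used for the result directory)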

Two classification models are available, ve and pl. The official explanation is:

-m or --model : assign the classification models ("ve" for vertebrate species, "pl" for plat species)

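I take "pl" to mean plants; "plat" in the official text is most likely a typo for "plant", since it does not name any taxonomic group.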
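
To make the parameter rules concrete, here is a minimal sketch of a helper that assembles a CNCI.py command line. It is not part of CNCI: the function name `build_cnci_command` and its arguments are made up for illustration, the `-d` option is taken only from the GitHub example with `hg19.2bit`, and the interpreter name `python` is an assumption about the local setup.

```python
import subprocess

# "ve": vertebrate model, "pl": plant model, as described in the help text.
VALID_MODELS = ("ve", "pl")


def build_cnci_command(script, infile, outdir, model="ve", threads=1,
                       gtf=False, two_bit=None, python_exe="python"):
    """Assemble the argument list for one CNCI.py run (hypothetical helper)."""
    if model not in VALID_MODELS:
        raise ValueError("model must be one of %s, got %r" % (VALID_MODELS, model))
    cmd = [python_exe, script, "-f", infile, "-o", outdir,
           "-m", model, "-p", str(threads)]
    if gtf:
        cmd.append("-g")  # flag required when the input file is GTF
        if two_bit:
            # Genome file in 2bit format, as passed with -d in the GitHub example.
            cmd += ["-d", two_bit]
    return cmd


if __name__ == "__main__":
    cmd = build_cnci_command("CNCI-master/CNCI.py", "PLEK_test.fa", "test",
                             model="ve", threads=8)
    print(" ".join(cmd))
    # subprocess.check_call(cmd)  # uncomment to actually launch CNCI
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces, and the model check catches a mistyped `-m` value before a long run starts.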

Result file:

In the "test" folder you will find the prediction result file "CNCI.index", which contains six columns:

Transcript ID	index	score	start	end	length
gi|98961144|ref|NM_022571.5| Homo sapiens G protein-coupled receptor 135 (GPR135), mRNA	coding	0.484	9	1485	1834
gi|53793662|ref|NM_001005466.1| Homo sapiens olfactory receptor, family 10, subfamily G, member 2 (OR10G2), mRNA	coding	0.219	0	780	933

The index column shows whether the transcript has protein-coding potential; its value is either coding or noncoding.
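
For downstream work, the result file can be loaded with a few lines of Python. The sketch below assumes the file is tab-separated with exactly the six columns shown above and that score, start, end and length are numeric. The function names `read_cnci_index` and `count_labels` are made up for illustration.

```python
import os
from collections import Counter


def read_cnci_index(path):
    """Read a tab-separated CNCI.index file into a list of dicts."""
    records = []
    with open(path) as handle:
        next(handle, None)  # skip the header line
        for line in handle:
            line = line.rstrip("\r\n")
            if not line:
                continue
            # Split from the right so the free-text transcript ID stays intact.
            parts = line.rsplit("\t", 5)
            if len(parts) != 6:
                continue  # ignore lines that do not match the six-column layout
            transcript_id, label, score, start, end, length = parts
            records.append({
                "transcript_id": transcript_id,
                "index": label,
                "score": float(score),
                "start": int(start),
                "end": int(end),
                "length": int(length),
            })
    return records


def count_labels(records):
    """Count how many transcripts carry each label (coding / noncoding)."""
    return Counter(record["index"] for record in records)


if __name__ == "__main__":
    index_path = os.path.join("test", "CNCI.index")
    if os.path.exists(index_path):
        print(count_labels(read_cnci_index(index_path)))
```

Splitting from the right matters here because the transcript ID is a full FASTA header containing spaces, so splitting on arbitrary whitespace would break the columns.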

Additional notes:

Examples of CNCI usage from GitHub:

python CNCI_package/CNCI.py -f unannotation.gtf -g -o test -m ve -p 8 -d hg19.2bit
python filter_novel_lincRNA.py -i test.index -g unannotation.gtf -s 0 -l 200 -e exon_num -o out_dir 
python extract.py -i novel-noncoding.gtf,nov.gtf -n known-non-coding.gtf -c known-coding.gtf

[References]:

https://github.com/www-bioinfo-org/CNCI#install-cnci
http://nar.oxfordjournals.org/content/early/2013/08