
Genome Browser Usage Notes

程序员文章站 (Programmer Article Station), 2022-03-16 11:23:21

I will probably keep updating this, since I still don't understand what some of these files are for (facepalm).

Installation steps:

-->1: Install Docker

sudo apt-get update
sudo apt-get install docker.io
docker --version

-->2: Clone the NGB repository (NGB.git)

git clone https://github.com/epam/NGB.git

-->3: Build the Docker image

cd NGB
./gradlew buildDocker

-->4: Check that Docker now has an image named ngb:latest

docker images ngb

-->5: Run NGB from the ngb:latest image

docker run -p 8080:8080 -d --name ngbcore -v /home/2tong:/ngs ngb:latest

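Tips: ngbcore is just the container name and can be anything you like; /home/2tong is the host directory mounted as /ngs inside the container, so change it to the directory that holds your data.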
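The two freely changeable parts of the docker run command can be pulled out into parameters. This is a minimal sketch, assuming bash; start_ngb and the DRY_RUN switch are my own hypothetical names, not part of NGB. By default it only prints the command so you can check it first; set DRY_RUN=0 to actually start the container.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NGB): start the NGB container with a
# custom container name and host data directory.
# Usage: start_ngb [container_name] [host_data_dir]
start_ngb() {
  local name="${1:-ngbcore}"          # container name: any name works
  local host_dir="${2:-/home/2tong}"  # host directory, mounted as /ngs in the container
  local cmd=(docker run -p 8080:8080 -d --name "$name" -v "$host_dir:/ngs" ngb:latest)

  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"         # dry run (default): only print the command
  else
    "${cmd[@]}"                       # DRY_RUN=0: really start the container
  fi
}

start_ngb ngbcore /home/2tong
```

For example, DRY_RUN=0 start_ngb ngbcore2 /data/ngs would start a second container named ngbcore2 that sees /data/ngs as /ngs (the names here are only examples). Note that a second container would also need a different host port than 8080.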

-->6: Open a browser and go to http://localhost:8080/catgenome


-->7: Register data with NGB

docker exec -it ngbcore /bin/bash
ngb reg_ref /ngs/<Path to FASTA> -n my_genome -t
ngb reg_file my_genome /ngs/<Path to File> -n my_file1 -t
ngb reg_dataset my_genome my_sample_dataset my_file1
exit

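To put more files into this dataset, register them the same way (my_file2, my_file3 and so on) and list them all after the dataset name in reg_dataset.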

Registration time depends on file size: larger files take longer to register.
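The registration in step 7 can also be scripted from the host. This is a minimal sketch, assuming bash; ngb_in, register_dataset, NGB_CONTAINER and DRY_RUN are hypothetical names of mine, and the my_fileN naming just follows the example in step 7. It calls ngb through docker exec without -it: those flags are only needed for an interactive shell, and -t can fail with "the input device is not a TTY" when a script runs without a terminal. By default it only prints the commands; set DRY_RUN=0 to run them.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of NGB). File paths are the paths seen
# inside the container, i.e. under /ngs.

# Run one ngb command inside the container without an interactive shell.
ngb_in() {
  local cmd=(docker exec "${NGB_CONTAINER:-ngbcore}" ngb "$@")
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf '%s\n' "${cmd[*]}"   # dry run (default): only print the command
  else
    "${cmd[@]}"
  fi
}

# Usage: register_dataset <genome_name> <fasta> <dataset_name> <file>...
# Registers the reference, then each file as my_file1, my_file2, ...,
# and finally creates one dataset containing all of those files.
register_dataset() {
  if [ "$#" -lt 4 ]; then
    echo "usage: register_dataset <genome> <fasta> <dataset> <file>..." >&2
    return 2
  fi
  local genome="$1" fasta="$2" dataset="$3"
  shift 3
  local names=()
  local i=1
  local f

  ngb_in reg_ref "$fasta" -n "$genome" -t || return 1
  for f in "$@"; do
    ngb_in reg_file "$genome" "$f" -n "my_file$i" -t || return 1
    names+=("my_file$i")
    i=$((i + 1))
  done
  ngb_in reg_dataset "$genome" "$dataset" "${names[@]}"
}
```

For example, DRY_RUN=0 register_dataset my_genome /ngs/ref.fa my_sample_dataset /ngs/sample1.vcf /ngs/sample2.vcf would register both VCFs and group them into my_sample_dataset (all paths and names are placeholders).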

-->8: Select the files and start comparing them


Supplementary notes:

-->1: Common ngb commands

CLI for NGB server
All objects can be addressed by biologicalDataItemID or by name.

REFERENCE commands:
rr	reg_ref		: registers a reference file	{rr \path\to\file.fa -n grch38}
dr	del_ref		: unregisters a reference file	{dr grch38}
lr	list_ref	: lists all reference files, registered on the server	{lr}
ag	add_genes	: adds a gene file to the reference	{ag grch38 genes.gtf}
an	add_ann	: adds a annotation file to the reference	{an grch38 annotations.gtf}
ran	remove_ann	: remove a annotation file from the list of reference annotation files	{ran grch38 annotations.gtf}
rg	remove_genes	: removes a gene file from the reference	{rg grch38}

FILE commands:
rf	reg_file	: registers a feature file for a specified reference	{rf grch38 \path\to\file.bam?\path\to\file.bam.bai -n my_vcf}
df	del_file	: deletes a feature file one	{df my_vcf}
if	index_file	: creates a feature index for a file. 	 {if genes.gtf}

SEARCH commands:
s	search		: finds a reference or feature file by it's name, search can be configured by a '-c' option	{s -l vcf}

DATASET commands:
rd	reg_dataset	: creates a new dataset (ex project) for a specified reference	{rd grch38 my_dataset}
add	add_dataset	: adds files to a dataset	{add my_dataset sample.bam sample.vcf}
rmd	remove_dataset	: removes files from a dataset	{rmd my_dataset my_vcf}
dd	del_dataset	: removes a dataset	{dd my_dataset}
md	move_dataset	: changes the dataset parent to the dataset specified by the "-p" option, if option isn't provided, the dataset will be moved to the top level of the datasets hierarchy	{md my_dataset -p parent}
ld	list_dataset	: lists all datasets, registered on the server	{ld}

ADDITIONAL commands:
url		: generate url for displaying required files. {url my_dataset}

TOOLS commands:
sort		: sorts given feature file. If target path is not specified, sorted file will be stored in the same folder as the original one with the `.sorted.` suffix in the name.
CONFIGURATION commands:
srv	set_srv		: sets working server url for CLI	srv http://{SERVER_IP_OR_NAME}:{SERVER_PORT}/catgenome
v	version		: prints CLI version to the console standard output

Available options (options may go before, after or between the arguments):

 -c (--config, --configuration) PATH : path to the configuration file
 -f (--force)                        : defines if a dataset will be force
                                       deleted (default: false)
 -g (--genes) VAL                    : specifies a gene file for reference
                                       registration
 -h (--help)                         : prints help (default: true)
 -j (--json)                         : output request's result in a json,
                                       otherwise the output of all commands
                                       will be ignored, excluding search and
                                       list commands (default: false)
 -l (--like)                         : use non-strict search for file finding
                                       (default: false)
 -loc (--location) VAL               : location of view port in format:
                                       chr:start-end
 -m (--max_memory) N                 : specifies amount of memory in megabytes
                                       to use when sorting (default: 500)
                                       (default: 0)
 -n (--name) VAL                     : explicitly specifies file name for
                                       registration
 -ngc (--nogccontent)                : specifies if GC content shouldn't be
                                       calculated during reference registration
                                       (default: false)
 -ni (--no_index)                    : defines if a feature index should not be
                                       created for registered VCF or GFF/GTF
                                       file (default: false)
 -p (--parent) VAL                   : specifies dataset parent for registration
 -pt (--pretty) VAL                  : pretty name for datasets or biological
                                       data file
 -t (--table)                        : output request's result in a table,
                                       otherwise the output of all commands
                                       will be ignored, excluding search and
                                       list commands (default: false)
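In the rf example in the help text, a BAM file's index is passed after a "?" (file.bam?file.bam.bai). This is a small sketch, assuming bash and that the index sits next to the BAM as either file.bam.bai or file.bai; bam_with_index is a hypothetical name, not an ngb command. Run it wherever the files are visible under the paths you will pass to ngb (for example inside the container, under /ngs).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of NGB): build the "file.bam?index" argument
# for `ngb reg_file` when the BAM index lies in the same directory.
bam_with_index() {
  local bam="$1"
  if [ -f "$bam.bai" ]; then
    printf '%s\n' "$bam?$bam.bai"            # index named file.bam.bai
  elif [ -f "${bam%.bam}.bai" ]; then
    printf '%s\n' "$bam?${bam%.bam}.bai"     # index named file.bai
  else
    echo "no .bai index found for $bam" >&2
    return 1
  fi
}
```

Usage: ngb reg_file my_genome "$(bam_with_index /ngs/sample.bam)" -n my_bam -t. Keep the argument quoted: "?" is a shell wildcard, so an unquoted path containing it could be glob-expanded.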

-->2: Variant types (indel classification)

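Substitution: compared with the reference sequence, one base is replaced by another; written with ">". For example, c.125A>T means that the A at position 125 is replaced by T.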

Deletion: compared with the reference sequence, one or more bases are missing; shown as "DEL". For example, c.2054delA means that the A at position 2054 is deleted.

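Insertion: compared with the reference sequence, one or more bases are added; shown as "INS". For example, c.5750_5751insAGG means that the three bases AGG are inserted between positions 5750 and 5751.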

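Deletion-insertion: compared with the reference sequence, one or more bases are replaced by other bases, where the change is not a substitution, inversion, or conversion; usually written "delins" and shown here as "MIXED". For example, c.6776delinsGA means that the base at position 6776 is deleted and replaced by GA.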

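Duplication: compared with the reference sequence, a copy of one or more bases is inserted directly after the original bases; shown as "DUP". For example, c.6_8dup means that the bases at positions 6 through 8 are duplicated.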
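To make the five forms concrete, here is a rough classifier for simple HGVS coding-DNA (c.) strings. It is a sketch assuming bash, based only on the notation described above: it pattern-matches the keywords and does not validate positions or handle intronic offsets, complex alleles, or other HGVS forms. classify_hgvs is a hypothetical name, and "SUB" is my own placeholder label, since the note gives only the ">" symbol for substitutions. The order of the cases matters: delins must be tested before del and ins.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map a simple c. variant string to the labels above.
classify_hgvs() {
  local v="$1"
  case "$v" in
    c.*delins*)          echo "MIXED" ;;   # deletion-insertion, e.g. c.6776delinsGA
    c.*del*)             echo "DEL" ;;     # deletion, e.g. c.2054delA
    c.*ins*)             echo "INS" ;;     # insertion, e.g. c.5750_5751insAGG
    c.*dup*)             echo "DUP" ;;     # duplication, e.g. c.6_8dup
    c.*[ACGT]'>'[ACGT])  echo "SUB" ;;     # substitution, e.g. c.125A>T
    *)                   echo "UNKNOWN" ;; # anything else (including non-c. strings)
  esac
}

for v in 'c.125A>T' c.2054delA c.5750_5751insAGG c.6776delinsGA c.6_8dup; do
  printf '%s\t%s\n' "$v" "$(classify_hgvs "$v")"
done
```

Quote variants that contain ">" when passing them on the command line; unquoted, the shell treats ">" as an output redirection.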

One more note: silly me, I clearly need to brush up on Docker~~

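You can also run ngb straight from the host with docker exec, without opening a shell in the container first:

docker exec -it ngbcore3 ngb reg_ref .....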