Notes on Using the Genome Browser

程序员文章站 2022-03-16 11:23:21


I'll probably keep updating this, since I still don't understand what these files are for (facepalm).

Installation steps:

-->1: Install Docker

sudo apt-get update
sudo apt-get install docker.io
docker --version

-->2: Get NGB.git

git clone https://github.com/epam/NGB.git

-->3: Build it

cd NGB
./gradlew buildDocker

-->4: Confirm that Docker now has an image named ngb:latest

docker image ls

-->5: Run NGB from the existing ngb:latest image
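
The check in step 4 can also be scripted instead of read off the listing by eye. This is only a sketch: has_image is a made-up helper name, and it relies on `docker image inspect` exiting with a non-zero status when the image is not present locally.

```shell
# has_image is a hypothetical helper, not part of Docker or NGB.
# Succeeds if the given image exists locally, fails otherwise.
has_image() {
    docker image inspect "$1" >/dev/null 2>&1
}

# Usage (needs a running Docker daemon):
#   has_image ngb:latest && echo "ngb:latest is ready"
```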

docker run -p 8080:8080 -d --name ngbcore -v /home/2tong:/ngs ngb:latest

Tips: the container name ngbcore and the host directory /home/2tong can both be changed freely.

-->6: Open a browser and go to http://localhost:8080/catgenome

-->7: Register data with ngb
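
Since those two values are the only parts of the command that change, they can be pulled out into variables. A small sketch, assuming the port and image tag stay as in step 5; NGB_NAME, NGB_DATA and start_ngb are names I made up for illustration.

```shell
# Both values are placeholders; change them to suit your machine.
NGB_NAME="${NGB_NAME:-ngbcore}"        # container name
NGB_DATA="${NGB_DATA:-/home/2tong}"    # host directory mounted at /ngs

# start_ngb is a hypothetical helper wrapping the command from step 5.
start_ngb() {
    docker run -p 8080:8080 -d --name "$NGB_NAME" -v "$NGB_DATA:/ngs" ngb:latest
}

# Usage:
#   start_ngb
```

Whatever directory is given as NGB_DATA shows up inside the container as /ngs, which is why the registration commands later use /ngs/... paths.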
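
The server may need a little while after `docker run` before the page responds. A rough sketch of a wait loop, assuming curl is installed and that the URL above answers with a successful HTTP status once NGB is up (I have not verified the exact status it returns); wait_for_ngb is a hypothetical helper.

```shell
# wait_for_ngb is a hypothetical helper.
# Args: URL to poll, maximum number of attempts (default 30).
wait_for_ngb() {
    url=$1
    tries=${2:-30}
    while [ "$tries" -gt 0 ]; do
        # -f makes curl fail on HTTP errors; -s silences progress output
        if curl -sf -o /dev/null "$url"; then
            return 0
        fi
        tries=$((tries - 1))
        sleep 2
    done
    return 1
}

# Usage:
#   wait_for_ngb http://localhost:8080/catgenome && echo "NGB is up"
```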

docker exec -it ngbcore /bin/bash
ngb reg_ref /ngs/<Path to FASTA> -n my_genome -t
ngb reg_file my_genome /ngs/<Path to File> -n my_file1 -t
ngb reg_dataset my_genome my_sample_dataset my_file1
exit

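Adding my_file2, my_file3, and so on increases the number of files in this dataset.

How long registration takes depends on the size of the data.

-->8: Select the files and start comparing

Additional notes:

-->1: Commonly used ngb commands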
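
For more than a couple of files, typing the commands by hand gets tedious. Below is a sketch that only prints the reg_file and reg_dataset commands so they can be reviewed and then pasted into the container shell. print_dataset_cmds is a hypothetical helper, and registering each file under its base name is my own assumption, not something ngb requires.

```shell
# print_dataset_cmds is a hypothetical helper: it prints ngb commands
# without running them.
# Args: reference name, dataset name, then one or more paths under /ngs.
print_dataset_cmds() {
    ref=$1
    dataset=$2
    shift 2
    names=""
    for path in "$@"; do
        name=$(basename "$path")   # assumption: use the base name as the -n name
        echo "ngb reg_file $ref $path -n $name -t"
        names="$names $name"
    done
    echo "ngb reg_dataset $ref $dataset$names"
}

# Usage (the /ngs paths are made-up examples):
#   print_dataset_cmds my_genome my_sample_dataset /ngs/a.vcf /ngs/b.vcf
```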

CLI for NGB server
All objects can be addressed by biologicalDataItemID or by name.

REFERENCE commands:
rr	reg_ref		: registers a reference file	{rr \path\to\file.fa -n grch38}
dr	del_ref		: unregisters a reference file	{dr grch38}
lr	list_ref	: lists all reference files, registered on the server	{lr}
ag	add_genes	: adds a gene file to the reference	{ag grch38 genes.gtf}
an	add_ann	: adds a annotation file to the reference	{an grch38 annotations.gtf}
ran	remove_ann	: remove a annotation file from the list of reference annotation files	{ran grch38 annotations.gtf}
rg	remove_genes	: removes a gene file from the reference	{rg grch38}

FILE commands:
rf	reg_file	: registers a feature file for a specified reference	{rf grch38 \path\to\file.bam?\path\to\file.bam.bai -n my_vcf}
df	del_file	: deletes a feature file one	{df my_vcf}
if	index_file	: creates a feature index for a file. 	 {if genes.gtf}

SEARCH commands:
s	search		: finds a reference or feature file by it's name, search can be configured by a '-c' option	{s -l vcf}

DATASET commands:
rd	reg_dataset	: creates a new dataset (ex project) for a specified reference	{rd grch38 my_dataset}
add	add_dataset	: adds files to a dataset	{add my_dataset sample.bam sample.vcf}
rmd	remove_dataset	: removes files from a dataset	{rmd my_dataset my_vcf}
dd	del_dataset	: removes a dataset	{dd my_dataset}
md	move_dataset	: changes the dataset parent to the dataset specified by the "-p" option, if option isn't provided, the dataset will be moved to the top level of the datasets hierarchy	{md my_dataset -p parent}
ld	list_dataset	: lists all datasets, registered on the server	{ld}

ADDITIONAL commands:
url		: generate url for displaying required files. {url my_dataset}

TOOLS commands:
sort		: sorts given feature file. If target path is not specified, sorted file will be stored in the same folder as the original one with the `.sorted.` suffix in the name.
CONFIGURATION commands:
srv	set_srv		: sets working server url for CLI	srv http://{SERVER_IP_OR_NAME}:{SERVER_PORT}/catgenome
v	version		: prints CLI version to the console standard output

Available options (options may go before, after or between the arguments):

 -c (--config, --configuration) PATH : path to the configuration file
 -f (--force)                        : defines if a dataset will be force
                                       deleted (default: false)
 -g (--genes) VAL                    : specifies a gene file for reference
                                       registration
 -h (--help)                         : prints help (default: true)
 -j (--json)                         : output request's result in a json,
                                       otherwise the output of all commands
                                       will be ignored, excluding search and
                                       list commands (default: false)
 -l (--like)                         : use non-strict search for file finding
                                       (default: false)
 -loc (--location) VAL               : location of view port in format:
                                       chr:start-end
 -m (--max_memory) N                 : specifies amount of memory in megabytes
                                       to use when sorting (default: 500)
                                       (default: 0)
 -n (--name) VAL                     : explicitly specifies file name for
                                       registration
 -ngc (--nogccontent)                : specifies if GC content shouldn't be
                                       calculated during reference registration
                                       (default: false)
 -ni (--no_index)                    : defines if a feature index should not be
                                       created for registered VCF or GFF/GTF
                                       file (default: false)
 -p (--parent) VAL                   : specifies dataset parent for registration
 -pt (--pretty) VAL                  : pretty name for datasets or biological
                                       data file
 -t (--table)                        : output request's result in a table,
                                       otherwise the output of all commands
                                       will be ignored, excluding search and
                                       list commands (default: false)

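-->2: Classification of indels

Substitution: compared with the reference sequence, one base is replaced by another; written with the symbol ">"; for example, c.125A>T means that the A at position 125 is replaced by T relative to the reference.

Deletion: compared with the reference sequence, one or more bases are missing; shown as "DEL"; for example, c.2054delA means that the A at position 2054 is deleted relative to the reference.

Insertion: compared with the reference sequence, one or more bases are added; shown as "INS"; for example, c.5750_5751insAGG means that the three bases AGG are inserted between positions 5750 and 5751 relative to the reference.

Deletion-insertion: compared with the reference sequence, one or more bases are replaced by other bases, where the change is not a substitution, an inversion, or a conversion; usually written "delins", shown here as "MIXED"; for example, c.6776delinsGA means that the base at position 6776 is deleted and replaced by GA relative to the reference.

Duplication: compared with the reference sequence, a copy of one or more bases is inserted directly into the sequence; shown as "DUP"; for example, c.6_8dupT means that a duplication of T occurred from position 6 to position 8.

One more note, and I feel a bit silly about it: I need to brush up on Docker.

docker exec -it ngbcore3 ngb reg_ref ..... is all it takes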
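
The naming rules above can be turned into a quick lookup. This is only a sketch based on the keywords in each description, not a real HGVS parser; classify_variant is a hypothetical helper, and the labels it prints are the ones used in the list above.

```shell
# classify_variant is a hypothetical helper that maps an HGVS-style
# description to the labels used in the list above.
classify_variant() {
    case "$1" in
        *delins*) echo "MIXED" ;;         # must be tested before del and ins
        *dup*)    echo "DUP" ;;
        *del*)    echo "DEL" ;;
        *ins*)    echo "INS" ;;
        *">"*)    echo "substitution" ;;
        *)        echo "unknown" ;;
    esac
}

# Usage:
#   classify_variant c.6776delinsGA
```

The order of the patterns matters: "delins" contains both "del" and "ins", so it has to be matched first.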
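
To avoid retyping the docker exec prefix every time, it can be wrapped in a small function. A sketch, assuming the container is already running; ngbx and NGB_CONTAINER are names I made up, and the default of ngbcore should be changed to whatever name was passed to --name (ngbcore3 in the line above).

```shell
# Name of the running NGB container; override as needed.
NGB_CONTAINER="${NGB_CONTAINER:-ngbcore}"

# ngbx is a hypothetical wrapper: it forwards its arguments to the ngb CLI
# inside the running container, so no interactive shell is needed.
ngbx() {
    docker exec "$NGB_CONTAINER" ngb "$@"
}

# Usage:
#   ngbx list_ref -t
#   ngbx reg_dataset my_genome my_sample_dataset my_file1
```

The -it flags are left out here because nothing needs to be typed into the container; they are only needed for an interactive shell like the /bin/bash session in step 7.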