
GenBank Accession Numbers

程序员文章站 2022-05-30 16:10:45

Accession numbers are identifiers for sequences, for example P123456. An accession can carry a version number, written as a "." followed by a number, for example P123456.2. Versioning helps distinguish older and newer versions of a sequence and makes it possible to track which exact sequence was used in an analysis.
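The accession/version split described above can be sketched in a few lines of Python. This is only an illustration: the function name `split_accession` and the assumed character set (letters, digits, and underscores before an optional ".N" suffix) are choices made for this example, not an official grammar.

```python
import re

# Assumption for this sketch: an accession is a run of letters, digits, and
# underscores, optionally followed by "." and an integer version.
_VERSIONED = re.compile(r"(?P<accession>[A-Za-z0-9_]+)(?:\.(?P<version>\d+))?")


def split_accession(identifier):
    """Return (accession, version); version is None when there is no suffix."""
    match = _VERSIONED.fullmatch(identifier.strip())
    if match is None:
        raise ValueError(f"not a recognizable accession: {identifier!r}")
    version = match.group("version")
    if version is None:
        return match.group("accession"), None
    return match.group("accession"), int(version)


if __name__ == "__main__":
    for example in ("P123456", "P123456.2"):
        print(example, "->", split_accession(example))
```

Storing the version as a separate integer makes it straightforward to compare two records of the same accession and to record exactly which version an analysis used.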

NCBI Reference Sequence (RefSeq) accessions have their own syntax.

Accessions are allocated in batches to the different sequence repositories DDBJ, EMBL Database, and NCBI. Table 1 shows the format of some unversioned accession numbers.

Table 1: Some Accession Number Formats

| Database | POSIX regular expression | Perl regular expression |
|---|---|---|
| RefSeq | `[[:alpha:]]{2}_[[:digit:]]{6,9}` or `NZ_[[:alpha:]]{4}[[:digit:]]{6,9}` | `[A-Z]{2}_\d{6,9}` or `NZ_[A-Z]{4}\d{6,9}` |
| Swissprot | `[OPQ][[:digit:]][[:alnum:]]{3}[[:digit:]]` | (not given) |
| GenBank/EMBL/DDBJ | `[[:alpha:]][[:digit:]]{5}` or `[[:alpha:]]{2}[[:digit:]]{6}` | `[A-Z]\d{5}` or `[A-Z]{2}\d{6}` |
| PRF | `[[:digit:]]{6,7}[[:alpha:]]` | `\d{6,7}[A-Z]` |
| PDB | `[[:digit:]][[:alpha:]]{3}` | `\d[A-Z]{3}` |
| MMDB | `[[:digit:]]{4}` | `\d{4}` |
| GenBank GI | `[[:digit:]]{5,}` | `\d{5,}` |
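As a rough illustration of how the patterns in Table 1 might be applied, the Python sketch below tests an unversioned identifier against each row. The Perl-style patterns are copied from the table; the Swissprot pattern is an assumed translation of its POSIX form, since the table gives no Perl version for that row. The function name `matching_databases` is hypothetical. Because several formats overlap, the function returns every database whose pattern matches rather than a single answer.

```python
import re

# Patterns taken from the Perl column of Table 1, in table order.
# The Swissprot entry is an assumed translation of the POSIX pattern,
# using uppercase letters as the table's other Perl patterns do.
PATTERNS = [
    ("RefSeq", r"[A-Z]{2}_\d{6,9}|NZ_[A-Z]{4}\d{6,9}"),
    ("Swissprot", r"[OPQ]\d[A-Z0-9]{3}\d"),
    ("GenBank/EMBL/DDBJ", r"[A-Z]\d{5}|[A-Z]{2}\d{6}"),
    ("PRF", r"\d{6,7}[A-Z]"),
    ("PDB", r"\d[A-Z]{3}"),
    ("MMDB", r"\d{4}"),
    ("GenBank GI", r"\d{5,}"),
]

_COMPILED = [(name, re.compile(pattern)) for name, pattern in PATTERNS]


def matching_databases(accession):
    """Return the names of all table rows whose pattern matches the whole string."""
    return [name for name, regex in _COMPILED if regex.fullmatch(accession)]


if __name__ == "__main__":
    for example in ("NM_000546", "U49845", "P12345", "1ABC", "12345"):
        print(example, "->", matching_databases(example))
```

An identifier such as P12345 satisfies both the Swissprot and the GenBank/EMBL/DDBJ patterns, so format alone cannot always identify the source database.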