How to Download NGS Data - Using prefetch SRA and fastq-dump for Sequencing Reads

Most publications that generate next-generation sequencing (NGS) data deposit the raw reads in NCBI's Sequence Read Archive (SRA). Sharing the data this way lets other scientists reanalyze it, learn from it, or reuse it in their own studies to reach new conclusions.

All public SRA records can be searched at https://www.ncbi.nlm.nih.gov/sra. There you can look up a specific SRA record, or browse a species' whole-genome sequencing or RNA-seq data. Once you have chosen a record, note its run accession (the ID starting with SRR), which is shown on the record's page.
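
Run accessions have a fixed shape. NCBI runs start with SRR, and runs submitted through ENA and DDBJ (which the SRA mirrors) start with ERR and DRR. In every case the prefix is followed by digits. The sketch below uses a helper name of our own. It rejects anything else, such as a study (SRP) or BioProject (PRJNA) accession pasted by mistake, before you start a download:

```bash
# is_run_accession: hypothetical helper that succeeds (exit status 0) only
# when its argument looks like a run accession: SRR, ERR or DRR + digits.
is_run_accession() {
    [[ "$1" =~ ^[SED]RR[0-9]+$ ]]
}
```

$ is_run_accession SRR14546180 && echo "looks like a run accession"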

NCBI provides a command-line toolkit, the SRA Toolkit, for interacting with the database and with individual SRA records. On Debian or Ubuntu, you can install it from the system package manager with the following command:

$ sudo apt install sra-toolkit
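
Before you download anything, check that the toolkit commands are actually on your PATH. The following minimal sketch uses a hypothetical function name. It reports every missing command and returns a non-zero status if any are missing:

```bash
# require_tools: hypothetical helper; prints each command that cannot be
# found on PATH (to stderr) and returns non-zero if any is missing.
require_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

$ require_tools prefetch fastq-dump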

The two most commonly used SRA Toolkit commands are prefetch and fastq-dump. prefetch downloads a run's compressed archive (the .sra file for an SRR accession) to local storage. The following example downloads a whole-genome sequencing sample of a Quercus lobata tree:

$ prefetch SRR14546180 -O path/to/srr/output
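
The example above fetches a single run. To download several runs, you can loop over a text file that lists one accession per line. This is a sketch under stated assumptions: the list file name and the function name are hypothetical, and the only prefetch option used is the -O shown above.

```bash
# fetch_all: hypothetical helper; calls prefetch once per accession listed in
# a text file (one per line). Blank lines and lines starting with # are
# skipped. Returns non-zero at the first failed download.
fetch_all() {
    local list="$1" outdir="$2" acc
    # "|| [ -n "$acc" ]" also handles a last line without a trailing newline
    while IFS= read -r acc || [ -n "$acc" ]; do
        acc="${acc//[[:space:]]/}"          # drop spaces and Windows \r
        [ -z "$acc" ] && continue
        [[ "$acc" == \#* ]] && continue
        echo "Downloading $acc ..." >&2
        # </dev/null stops prefetch from reading the rest of the list on stdin
        prefetch "$acc" -O "$outdir" </dev/null || return 1
    done < "$list"
}
```

$ fetch_all accessions.txt path/to/srr/output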

Once the download has finished, extract the FASTQ files from the .sra file with fastq-dump:

$ fastq-dump --split-3 --gzip --outdir /path/to/fq/output SRR14546180.sra
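
Depending on the SRA Toolkit version, prefetch may save the run directly as path/to/srr/output/SRR14546180.sra. It may instead place it in a subdirectory named after the accession (path/to/srr/output/SRR14546180/SRR14546180.sra). Either way, fastq-dump needs the actual path. The sketch below uses a hypothetical helper that checks both layouts:

```bash
# find_sra: hypothetical helper; prints the path of <accession>.sra under the
# prefetch output directory, trying the per-accession subdirectory first.
find_sra() {
    local outdir="$1" acc="$2" candidate
    for candidate in "$outdir/$acc/$acc.sra" "$outdir/$acc.sra"; do
        if [ -f "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    echo "no .sra file for $acc under $outdir" >&2
    return 1
}
```

$ sra=$(find_sra path/to/srr/output SRR14546180) && fastq-dump --split-3 --gzip --outdir /path/to/fq/output "$sra"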

Here, --split-3 writes the two mates of each paired-end read to separate files (suffixed _1 and _2) and any reads without a mate to a third file. --gzip compresses each output file.
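
For a paired-end run, --split-3 --gzip typically produces SRR14546180_1.fastq.gz and SRR14546180_2.fastq.gz (plus SRR14546180.fastq.gz if some reads have no mate). The two mate files should contain the same number of reads. fastq-dump writes each read as four lines, so you can compare the counts quickly. In this sketch the function names are hypothetical, and it assumes unwrapped four-line records:

```bash
# count_reads: number of reads in a gzipped FASTQ, assuming four lines per
# record (header, sequence, '+', qualities).
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# check_pairs: hypothetical helper; compares the read counts of the _1 and _2
# files for accession $2 in directory $1. Fails if a file is missing or the
# counts differ.
check_pairs() {
    local dir="$1" acc="$2" f n1 n2
    for f in "$dir/${acc}_1.fastq.gz" "$dir/${acc}_2.fastq.gz"; do
        [ -f "$f" ] || { echo "missing: $f" >&2; return 1; }
    done
    n1=$(count_reads "$dir/${acc}_1.fastq.gz")
    n2=$(count_reads "$dir/${acc}_2.fastq.gz")
    echo "$acc: $n1 reads in _1, $n2 reads in _2"
    [ "$n1" -eq "$n2" ]
}
```

$ check_pairs /path/to/fq/output SRR14546180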

After obtaining the fastq.gz files, the next step is quality control with FastQC.


LMG_BIO
