How to obtain an Admixture bar plot using ANGSD (ngsTools)

Admixture bar plots visualize the genetic structure of populations by assigning a proportion of each individual's genome to each of K ancestral populations. Here are some key concepts:

  • Ancestral Populations (K): The number of distinct ancestral populations assumed in the analysis. K is not estimated by the software; the user must choose it before running the analysis.
  • Individuals: Each individual in the dataset is represented as a vertical bar in the plot.
  • Ancestry Proportions: The colors within each bar indicate the proportion of an individual's genome that comes from each ancestral population.
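
As a small illustration of these concepts, the sketch below builds a made-up table of ancestry proportions for three individuals with K = 2. The values and the helper name check_q_matrix are hypothetical and only serve to show the shape of the data behind a bar plot: one row per individual, one column per ancestral population, and each row summing to 1.

```python
# Hypothetical ancestry proportions for three individuals with K = 2.
# Each row is one individual (one bar); each column is one ancestral population.
q = [
    [0.90, 0.10],
    [0.50, 0.50],
    [0.05, 0.95],
]


def check_q_matrix(q, tol=1e-6):
    """Return True if every row is non-negative and sums to 1 (within tol)."""
    return all(min(row) >= 0 and abs(sum(row) - 1.0) <= tol for row in q)


# Print the coloured segments that would make up each bar.
for i, row in enumerate(q):
    segments = ", ".join(f"pop{k + 1}: {p:.0%}" for k, p in enumerate(row))
    print(f"individual {i}: {segments}")
```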

 

To obtain an Admixture bar plot, you first need to install ngsTools (instructions here). This software works with genotype likelihoods rather than hard genotype calls, and the analysis starts from BAM files.

Sample data used in this tutorial can be downloaded here.

Admixture proportions can be estimated from genotype likelihoods using ngsTools. Here are instructions for its installation.

 

First, we need to create the input file in BEAGLE format. To do that, run ANGSD with the following command:

$ ngsTools/angsd/angsd -P 4 -b <path_to_bamlist>/bam.list -ref <path_to_reference_genome>/chr1.fna -out <path_to_beagle>/samples -doCounts 1 -GL 1 -doMajorMinor 4 -doMaf 1 -skipTriallelic 1 -doGlf 2

In this case:

-P 4: Uses 4 threads for parallel processing.
-b: Path to the file listing input BAM files.
-ref: Path to the reference genome file.
-out: Prefix for output files.
-doCounts 1: Counts the number of reads covering each site.
-GL 1: Uses the SAMtools model for genotype likelihood calculations.
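-doMajorMinor 4: Uses the reference allele as the major allele (requires -ref); the minor allele is inferred from the genotype likelihoods.
-doMaf 1: Estimates allele frequencies, treating the major and minor alleles as known.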
-skipTriallelic 1: Skips sites with more than two alleles (triallelic sites).
-doGlf 2: Outputs genotype likelihoods in BEAGLE format.
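
Before moving on, it can be useful to check that the BEAGLE file contains what you expect. The sketch below is not part of ANGSD or ngsTools; it is a small stand-alone helper (the name summarize_beagle is made up here) that assumes the usual layout of the file written by -doGlf 2: a header line, then one line per site with three leading columns (marker, allele1, allele2) followed by three likelihood columns per individual.

```python
import gzip


def summarize_beagle(path):
    """Return (n_individuals, n_sites) for a gzipped BEAGLE likelihood file."""
    with gzip.open(path, "rt") as handle:
        header = handle.readline().split()
        # The first three columns are marker, allele1 and allele2; the rest
        # are assumed to be three likelihood columns per individual.
        n_like_cols = len(header) - 3
        if n_like_cols <= 0 or n_like_cols % 3 != 0:
            raise ValueError("unexpected number of columns in header")
        # Every remaining non-empty line is one site.
        n_sites = sum(1 for line in handle if line.strip())
    return n_like_cols // 3, n_sites
```

For example, summarize_beagle("samples.beagle.gz") should return the number of BAM files in bam.list as the first value, and the number of sites that passed the filters as the second.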

 

Let’s assume you want the admixture proportions for 2 ancestral components. These can be obtained using the following command:

$ ngsTools/angsd/misc/NGSadmix -likes <path_to_beagle>/samples.beagle.gz -K 2 -outfiles <path_to_output>/samples -P 4 -minMaf 0

In this case:

-likes: Specifies the input file with genotype likelihoods in BEAGLE format (samples.beagle.gz).
-K 2: Sets the number of ancestral populations (clusters) to 2.
-outfiles: Specifies the prefix for output files.
-P 4: Uses 4 threads for parallel processing.
-minMaf 0: Sets the minimum minor allele frequency threshold to 0, so all sites are included regardless of minor allele frequency.


A .qopt file with the admixture proportions is generated, with one row per individual and one column per ancestral population. The Python script plot_admix.py can be used to draw the bar plot. Edit the script so that q_file = on line 8 points to your .qopt output file, place the script in the output folder, and run it. An admixture_barplot.png file with the plot is then generated.
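
The plot_admix.py script itself is not reproduced in this post, so the following is only a rough sketch of what such a script might look like, not the script linked above. It assumes matplotlib is installed, and the function names load_qopt and plot_admixture are made up for this example. It reads the whitespace-separated proportions from the .qopt file and draws one stacked vertical bar per individual.

```python
def load_qopt(path):
    """Read a .qopt file: one row per individual, K proportions per row."""
    with open(path) as handle:
        return [[float(x) for x in line.split()] for line in handle if line.strip()]


def plot_admixture(q, out_png="admixture_barplot.png"):
    """Draw one stacked vertical bar per individual and save the plot as a PNG."""
    # matplotlib is a third-party dependency (assumed installed); it is
    # imported here so that load_qopt can be used without it.
    import matplotlib
    matplotlib.use("Agg")  # render to a file without needing a display
    import matplotlib.pyplot as plt

    n_ind, k = len(q), len(q[0])
    x = range(n_ind)
    bottom = [0.0] * n_ind
    fig, ax = plt.subplots(figsize=(max(4, 0.3 * n_ind), 3))
    for j in range(k):
        heights = [row[j] for row in q]
        # Stack cluster j on top of the clusters already drawn.
        ax.bar(x, heights, bottom=bottom, width=1.0, label=f"K{j + 1}")
        bottom = [b + h for b, h in zip(bottom, heights)]
    ax.set_xlim(-0.5, n_ind - 0.5)
    ax.set_ylim(0, 1)
    ax.set_xlabel("Individual")
    ax.set_ylabel("Ancestry proportion")
    ax.legend(title="Cluster", bbox_to_anchor=(1.01, 1), loc="upper left")
    fig.tight_layout()
    fig.savefig(out_png, dpi=150)
    plt.close(fig)
```

With the output prefix used above, calling plot_admixture(load_qopt("samples.qopt")) from the output folder should write admixture_barplot.png. The bars appear in the same order as the rows of the .qopt file, which should follow the order of the BAM files in bam.list.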
