- Determine SNP variation among the aligned DNAs for a genomic region. See below for how to count SNP variation. The output file (Your_name_snp.txt) should have two columns of numbers. The first column will indicate total number of SNP sites per species and the second will be the percent of sequences/species having that same number of variant nucleotides.
- Determine in-del variation among the aligned DNAs for a genomic region. The output file (Your_name_in_del.txt) should have two columns of numbers. The first column will indicate total number of in-del sites per species and the second will be the percent of sequences/species having that same number of in-dels.
- Determine overall variation (SNPs and in-dels) among the aligned DNAs for a genomic region. The output file (Your_name_both.txt) should have two columns of numbers. The first column will indicate total number of variant sites (SNP and in-del) per species and the second will be the percent of sequences/species having that same number of variant nucleotides. This will generate the same data used for the figure on page 3.
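The two-column output described in the tasks above can be sketched as follows. This is a minimal illustration, not the required solution: the function names `tabulate` and `write_table` are hypothetical, the tab separator and two-decimal formatting are assumptions, and the example counts are the Seq1 comparison totals from the sample computation in this handout.

```python
from collections import Counter

def tabulate(counts):
    """Return sorted (count, percent) rows for a list of variant counts."""
    total = len(counts)
    tally = Counter(counts)  # how many sequences share each count
    return [(n, 100.0 * k / total) for n, k in sorted(tally.items())]

def write_table(rows, path):
    """Write rows as two tab-separated columns (separator is an assumption)."""
    with open(path, "w") as handle:
        for n, pct in rows:
            handle.write(f"{n}\t{pct:.2f}\n")

# Example: total changes found when Seq1 is compared with Seq1..Seq5.
counts = [0, 3, 4, 7, 8]
rows = tabulate(counts)
for n, pct in rows:
    print(f"{n}\t{pct:.2f}")

# To produce a file such as Your_name_both.txt:
# write_table(rows, "Your_name_both.txt")
```

The same two functions would serve all three output files if they are fed SNP counts, in-del counts, or combined counts respectively.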
Sample Alignment: 48 bases, differences are highlighted
Seq1 ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
Seq2 AAAAATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC
Seq3 AAAAATGCATGCATGCA-GCATGCATGCATGCATGCATGCATGCATGC
Seq4 AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCATGCATGCATGC
Seq5 AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCAT-CATGCATGC
Computation: Compare Seq1 to Seq2, Seq3, Seq4, and Seq5, and count the differences (SNPs and in-dels) in each comparison.
Seq1:Seq1 = 0 changes
Seq1:Seq2 = 3 changes
Seq1:Seq3 = 4 changes
Seq1:Seq4 = 7 changes
Seq1:Seq5 = 8 changes
Repeat, using each of the other sequences in turn as the basis for comparison:
Seq2:Seq1 = 3 changes    Seq3:Seq1 = 4 changes
Seq2:Seq2 = 0 changes    Seq3:Seq2 = 1 change
Seq2:Seq3 = 1 change     Seq3:Seq3 = 0 changes
Seq2:Seq4 = 4 changes    Seq3:Seq4 = 3 changes
Seq2:Seq5 = 5 changes    Seq3:Seq5 = 4 changes

Seq4:Seq1 = 7 changes    Seq5:Seq1 = 8 changes
Seq4:Seq2 = 4 changes    Seq5:Seq2 = 5 changes
Seq4:Seq3 = 3 changes    Seq5:Seq3 = 4 changes
Seq4:Seq4 = 0 changes    Seq5:Seq4 = 1 change
Seq4:Seq5 = 1 change     Seq5:Seq5 = 0 changes
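The pairwise computation above can be sketched in Python as follows. This is one possible approach under stated assumptions: the helper name `count_differences` is hypothetical, sequences are assumed to be uppercase and of equal length, a column where exactly one sequence has `-` is counted as one in-del, and a column where both have `-` is counted as no change.

```python
from itertools import product

# Sample alignment from this handout (48 columns per sequence).
alignment = {
    "Seq1": "ATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC",
    "Seq2": "AAAAATGCATGCATGCATGCATGCATGCATGCATGCATGCATGCATGC",
    "Seq3": "AAAAATGCATGCATGCA-GCATGCATGCATGCATGCATGCATGCATGC",
    "Seq4": "AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCATGCATGCATGC",
    "Seq5": "AAAAATGCATGCATGCA-GCATGCATGCATTTTTGCAT-CATGCATGC",
}

def count_differences(a, b, gap="-"):
    """Return (snps, indels) between two aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    snps = indels = 0
    for x, y in zip(a, b):
        if x == y:
            continue  # identical columns (including gap vs gap) are not changes
        if x == gap or y == gap:
            indels += 1  # assumption: one gap column = one in-del
        else:
            snps += 1
    return snps, indels

# Use every sequence in turn as the basis for comparison.
for ref, other in product(alignment, repeat=2):
    snps, indels = count_differences(alignment[ref], alignment[other])
    print(f"{ref}:{other} = {snps + indels} changes "
          f"({snps} SNP, {indels} in-del)")
```

Returning SNPs and in-dels separately lets the same function feed all three output files; summing the pair gives the totals listed in the worked example.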
Our input file is a FASTA-format file of all sequences/species that has been previously aligned and trimmed. There are some odd characters in the file, so we'll have to deal with them.
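A first step toward handling the input might look like the sketch below. It reads FASTA records and reports any characters outside an expected alphabet. The handout does not say which odd characters occur or how they should be treated, so the `EXPECTED` set, the function names, and the tiny in-memory demo file are all assumptions for illustration; the sketch only finds unexpected characters and leaves the decision about them to you.

```python
import io
from collections import Counter

# Assumed alphabet for an aligned DNA file; adjust after inspecting the real data.
EXPECTED = set("ACGTN-")

def read_fasta(handle):
    """Parse FASTA records into {header: sequence}, joining wrapped lines."""
    records = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            name = line[1:].strip()
            records[name] = []
        elif name is not None:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}

def odd_characters(records):
    """Count every character that is not in the expected alphabet."""
    odd = Counter()
    for seq in records.values():
        odd.update(ch for ch in seq if ch not in EXPECTED)
    return odd

# Hypothetical two-record file standing in for the real input.
demo = io.StringIO(">Seq1\nATGC\nAT?C\n>Seq2\nATG-\natxc\n")
records = read_fasta(demo)
print(records)
print(odd_characters(records))
```

For the real file, replace the `io.StringIO` demo with `open(path)`; note that records sharing the same header would overwrite one another in this sketch.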