Since BWA does not report the similarity of the read to reference, we can use the edit distance tag “NM” to retrieve the information regarding the difference between read and reference on sequence. Afterwards, we are able to calculate the identity using:
$$identity = \frac{edit\ distance}{read\ length}$$
Here it a script using AWk to compute the identity and count the number of alignments mapped to “EamGingi” with identity above 97%.
1 2 3 4 5 6 7 8 9 10 11 12 | for file in *.bam; do samtools view -h -F4 $file| \ awk -F"\t" -v maxpct=0.03 'NF>=11&&$3=="EamGingi" { nm=gensub(/.*\tNM:i:([0-9]+)\t.*/, "\\1", "g", $0);nmr=nm/length($10); if(nmr<=maxpct) { print $1, $12,$13, nm } }'|wc -l; done |