Space of a muser

Insight

Enjoy life, relish the moment!


Calculate the identity of the BWA alignments

Since BWA does not report the similarity of the read to reference, we can use the edit distance tag “NM” to retrieve the information regarding the difference between read and reference on sequence. Afterwards, we are able to calculate the identity using:

$$identity = \frac{edit\ distance}{read\ length}$$

Here it a script using AWk to compute the identity and count the number of alignments mapped to “EamGingi” with identity above 97%.

1
2
3
4
5
6
7
8
9
10
11
12
for file in *.bam;
do 
	samtools view -h -F4 $file| \
	awk -F"\t" -v maxpct=0.03 'NF>=11&&$3=="EamGingi"
	{
		nm=gensub(/.*\tNM:i:([0-9]+)\t.*/, "\\1", "g", $0);nmr=nm/length($10);
		if(nmr<=maxpct)
		{
		print $1, $12,$13, nm
		}
	}'|wc -l;
done
最近的文章

Survival analysis using R and Python

于  TCGA
comments powered by Disqus