<ns4:p><ns4:bold>Background: </ns4:bold>High-throughput whole genome sequencing facilitates investigation of minority sub-populations from virus-positive samples. Minority variants are useful in understanding within- and between-host diversity and population dynamics, and can potentially help to elucidate person-to-person transmission chains. Several minority variant callers have been developed to describe minority variant sub-populations from whole genome sequence data. However, they differ in the bioinformatic and statistical approaches used to discriminate sequencing errors from low-frequency variants.</ns4:p>
<ns4:p> <ns4:bold>Methods: </ns4:bold>We evaluated the diagnostic performance of, and concordance between, published minority variant callers used to identify minority variants from whole genome sequence data. The ART-Illumina read simulation tool was used to generate three artificial short-read datasets of varying coverage and error profiles from an RSV reference genome. The datasets were spiked with nucleotide variants at predetermined positions and frequencies. Variants were called using FreeBayes, LoFreq, VarDict, and VarScan2. The variant callers' agreement in identifying known variants was quantified using two measures: concordance accuracy and inter-caller concordance.</ns4:p>
<ns4:p> <ns4:bold>Results: </ns4:bold>The variant callers differed in the minority variants they identified from the datasets. Concordance accuracy and inter-caller concordance were positively correlated with sample coverage. FreeBayes identified the majority of the variants, although it was characterised by variable sensitivity and precision and by a high false positive rate relative to the other minority variant callers, which varied with sample coverage. LoFreq was the most conservative caller.</ns4:p>
<ns4:p> <ns4:bold>Conclusions: </ns4:bold>We conducted a performance and concordance evaluation of four minority variant calling tools used to identify and quantify low-frequency variants. Inconsistency in the quality of sequenced samples affects the sensitivity and accuracy of minority variant callers. Our study suggests that combining at least three tools when identifying minority variants is useful in filtering errors when calling low-frequency variants.</ns4:p>
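The evaluation idea summarised above (scoring each caller against variants spiked at known positions, then keeping only calls supported by several tools) can be illustrated with a minimal sketch. It is not the authors' pipeline and does not reproduce the paper's definitions of concordance accuracy or inter-caller concordance. The function names `sensitivity_precision` and `consensus_calls`, the `(position, ref, alt)` tuple representation, and every variant in the toy data are assumptions invented purely for illustration.

```python
from collections import Counter


def sensitivity_precision(called, truth):
    """Score one caller's variant set against the known (spiked) variants."""
    tp = len(called & truth)   # spiked variants the caller found
    fp = len(called - truth)   # calls that were never spiked in
    fn = len(truth - called)   # spiked variants the caller missed
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, precision


def consensus_calls(calls_by_caller, min_callers=3):
    """Keep variants reported by at least `min_callers` of the callers."""
    counts = Counter(
        variant
        for calls in calls_by_caller.values()
        for variant in set(calls)
    )
    return {variant for variant, n in counts.items() if n >= min_callers}


# Toy data, invented for illustration only: (position, ref, alt) tuples.
truth = {(100, "A", "G"), (250, "C", "T"), (400, "G", "A")}
calls = {
    "FreeBayes": {(100, "A", "G"), (250, "C", "T"), (400, "G", "A"),
                  (512, "T", "C"), (777, "A", "C")},
    "LoFreq": {(100, "A", "G"), (250, "C", "T")},
    "VarDict": {(100, "A", "G"), (250, "C", "T"), (400, "G", "A")},
    "VarScan2": {(100, "A", "G"), (400, "G", "A"), (512, "T", "C")},
}

for name, called in calls.items():
    sens, prec = sensitivity_precision(called, truth)
    print(f"{name}: sensitivity={sens:.2f} precision={prec:.2f}")

print(sorted(consensus_calls(calls, min_callers=3)))
```

In this toy example the consensus filter drops the two calls supported by fewer than three tools, which is the kind of error filtering the conclusions suggest; the threshold `min_callers` would need tuning against real data.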

Original publication





Wellcome Open Research


F1000 Research Ltd

Publication Date





21 - 21