Haemophilus influenzae is a human-adapted pathogen that causes both respiratory and invasive diseases. Following the introduction of the H. influenzae type b (Hib) conjugate vaccine, non-typeable H. influenzae (NTHi) has emerged as the predominant cause of invasive disease, with reported cases continuing to rise. However, understanding of the species' population structure remains limited due to its genetic diversity, the lack of high-resolution analytical tools, and the scarcity of representative isolates from low- and middle-income countries (LMICs) such as Indonesia. This DPhil thesis addresses these gaps by developing a genomic framework for high-resolution typing, identifying genetic factors linked to invasiveness in NTHi, and characterising circulating strains in an underrepresented region.
A core genome multilocus sequence typing (cgMLST) scheme was developed using 2,297 high-quality genomes, resulting in a stable set of 1,037 core genes. These genes were functionally annotated and used to construct a cgMLST framework that accurately reflects phylogenetic relationships. The scheme was implemented in the PubMLST database to support accessible and standardised population analysis. Building on this framework, a genomic clustering system based on the Life Identification Number (LIN) code was applied to define consistent, hierarchical groupings within the species. Demonstrated using published data, the cgLIN scheme enables scalable classification of H. influenzae lineages and supports public health applications including antimicrobial resistance (AMR) surveillance and outbreak detection.
A genome-wide association study (GWAS) was conducted to investigate the genetic basis of invasiveness in NTHi. By comparing invasive and non-invasive isolates from global sources, the analysis identified variants in porin genes, TonB-dependent receptors, and regulatory elements, many of which correspond to known virulence factors. However, no single genetic determinant fully explained the phenotypic outcome of invasive disease in NTHi, suggesting that this trait is polygenic and shaped by extensive recombination, consistent with previous in vitro observations and further supported by the large-scale in silico genomic analyses in this study.
The final part of the thesis assessed the population structure of H. influenzae in Indonesia. A total of 113 isolates from both carriage and invasive disease were characterised using cgMLST and cgLIN. The Indonesian isolates were genetically diverse and largely composed of NTHi lineages similar to those found globally. Ampicillin resistance was common, mediated by mobile genetic elements and mutations in the ftsI gene. These findings demonstrate the importance of integrating genomic data with local epidemiological context to inform regional AMR surveillance.
Overall, this thesis provides a publicly available framework for the genomic characterisation of H. influenzae and applies it to key questions in population structure, virulence, and resistance. The results support improved surveillance, inform vaccine development, and enhance public health response efforts, particularly in regions with limited genomic infrastructure.
Thesis / Dissertation
2025-11-10T00:00:00+00:00
Haemophilus influenzae, typing scheme, bacterial genomics, whole genome sequencing