Dr Etienne de Villiers

Research Area: Bioinformatics & Stats (inc. Modelling and Computational Biology)
Technology Exchange: Bioinformatics
Scientific Themes: Genetics & Genomics
Keywords: Bioinformatics and Genomics

I am the Bioinformatics group leader at the Wellcome – KEMRI – Oxford collaborative research programme in Kilifi, Kenya, where I established a Bioinformatics and Genomics platform that supports research projects across the programme.

Name | Department | Institution | Country
Professor Kevin Marsh | Tropical Medicine | Oxford University, NDM Research Building | United Kingdom
Professor James A Berkley | Tropical Medicine | Oxford University, Kilifi | Kenya
Jean Langhorne | | National Institute of Medical Research, Mill Hill | United Kingdom
Professor Eduard Sanders | Tropical Medicine | Oxford University, Kilifi | Kenya
Dr Francis Ndungu | Tropical Medicine | Oxford University, Kilifi | Kenya

Omedo I, Mogeni P, Rockett K, Kamau A, Hubbart C, Jeffreys A, Ochola-Oyier LI, de Villiers EP, Gitonga CW, Noor AM et al. 2017. Geographic-genetic analysis of Plasmodium falciparum parasite populations from surveys of primary school children in Western Kenya. Wellcome Open Research, 2 pp. 29.

Background. Malaria control, and finally malaria elimination, requires the identification and targeting of residual foci or hotspots of transmission. However, the level of parasite mixing within and between geographical locations is likely to impact the effectiveness and durability of control interventions and thus should be taken into consideration when developing control programs. Methods. In order to determine the geographic-genetic patterns of Plasmodium falciparum parasite populations at a sub-national level in Kenya, we used the Sequenom platform to genotype 111 genome-wide distributed single nucleotide polymorphic (SNP) positions in 2486 isolates collected from children in 95 primary schools in western Kenya. We analysed these parasite genotypes for genetic structure using principal component analysis and assessed local and global clustering using statistical measures of spatial autocorrelation. We further examined the region for spatial barriers to parasite movement as well as directionality in the patterns of parasite movement. Results. We found no evidence of population structure and little evidence of spatial autocorrelation of parasite genotypes (correlation coefficients < 0.03 among parasite pairs in distance classes of 1 km, 2 km and 5 km; p value < 0.01). An analysis of the geographical distribution of allele frequencies showed weak evidence of variation in distribution of alleles, with clusters representing a higher than expected number of samples with the major allele being identified for 5 SNPs. Furthermore, we found no evidence of the existence of spatial barriers to parasite movement within the region, but observed directional movement of parasites among schools in two separate sections of the region studied. Conclusions. Our findings illustrate a pattern of high parasite mixing within the study region. If this mixing is due to rapid gene flow, then “one-off” targeted interventions may not be currently effective at the sub-national scale in Western Kenya, due to the high parasite movement that is likely to lead to re-introduction of infection from surrounding regions. However, repeated targeted interventions may reduce transmission in the surrounding regions.

Hernández-de-Diego R, de Villiers EP, Klingström T, Gourlé H, Conesa A, Bongcam-Rudloff E. 2017. The eBioKit, a stand-alone educational platform for bioinformatics. PLoS Comput Biol, 13 (9), pp. e1005616.

Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative.

Omedo I, Mogeni P, Bousema T, Rockett K, Amambua-Ngwa A, Oyier I, C Stevenson J, Y Baidjoe A, de Villiers EP, Fegan G et al. 2017. Micro-epidemiological structuring of Plasmodium falciparum parasite populations in regions with varying transmission intensities in Africa. Wellcome Open Res, 2 pp. 10.

Background: The first models of malaria transmission assumed a completely mixed and homogeneous population of parasites. Recent models include spatial heterogeneity and variably mixed populations. However, there are few empiric estimates of parasite mixing with which to parameterize such models. Methods: Here we genotype 276 single nucleotide polymorphisms (SNPs) in 5199 P. falciparum isolates from two Kenyan sites (Kilifi county and Rachuonyo South district) and one Gambian site (Kombo coastal districts) to determine the spatio-temporal extent of parasite mixing, and use Principal Component Analysis (PCA) and linear regression to examine the relationship between genetic relatedness and distance in space and time for parasite pairs. Results: Using 107, 177 and 82 SNPs that were successfully genotyped in 133, 1602, and 1034 parasite isolates from The Gambia, Kilifi and Rachuonyo South district, respectively, we show that there are no discrete geographically restricted parasite sub-populations, but instead we see a diffuse spatio-temporal structure to parasite genotypes. Genetic relatedness of sample pairs is predicted by relatedness in space and time. Conclusions: Our findings suggest that targeted malaria control will benefit the surrounding community, but unfortunately also that emerging drug resistance will spread rapidly through the population.

Gimode D, Odeny DA, de Villiers EP, Wanyonyi S, Dida MM, Mneney EE, Muchugi A, Machuka J, de Villiers SM. 2016. Identification of SNP and SSR Markers in Finger Millet Using Next Generation Sequencing Technologies. PLOS ONE, 11 (7), pp. e0159437.

Bishop RP, Fleischauer C, de Villiers EP, Okoth EA, Arias M, Gallardo C, Upton C. 2015. Comparative analysis of the complete genome sequences of Kenyan African swine fever virus isolates within p72 genotypes IX and X. Virus Genes, 50 (2), pp. 303-309.

Twelve complete African swine fever virus (ASFV) genome sequences are currently publicly available and these include only one sequence from East Africa. We describe genome sequencing and annotation of a recent pig-derived p72 genotype IX, and a tick-derived genotype X isolate from Kenya using the Illumina platform and comparison with the Kenya 1950 isolate. The three genomes constitute a cluster that was phylogenetically distinct from other ASFV genomes, but 98-99% conserved within the group. Vector-based compositional analysis of the complete genomes produced a similar topology. Of the 125 previously identified 'core' ASFV genes, two ORFs of unassigned function were absent from the genotype IX sequence, which was 184 kb in size as compared to 191 kb for the genotype X. There were multiple differences among East African genomes in the 360 and 110 multicopy gene families. The gene corresponding to 360-19R has transposed to the 5' variable region in both genotype X isolates. Additionally, there is a 110 ORF in the tick-derived genotype X isolate formed by fusion of 13L and 14L that is unique among ASFV genomes. In future, functional analysis based on the variations in the multicopy families may reveal whether they contribute to the observed differences in virulence between genotype IX and X viruses.

De Villiers EP, Bongcam-Rudloff E. 2014. eBioKit bioinformatics workshops in Dar es Salaam, Tanzania. EMBnet.journal, 20 pp. 755.

Zubair S, de Villiers EP, Fuxelius HH, Andersson G, Johansson K-E, Bishop RP, Bongcam-Rudloff E. 2013. Genome Sequence of Streptococcus agalactiae Strain 09mas018883, Isolated from a Swedish Cow. Genome Announc, 1 (4), pp. e00456-13.

We announce the complete genome sequence of Streptococcus agalactiae strain 09mas018883, isolated from the milk of a cow with clinical mastitis. The availability of this genome may allow identification of candidate genes, leading to discovery of antigens that might form the basis for development of a vaccine as an alternative means of mastitis control.

Zubair S, de Villiers EP, Younan M, Andersson G, Tettelin H, Riley DR, Jores J, Bongcam-Rudloff E, Bishop RP. 2013. Genome Sequences of Two Pathogenic Streptococcus agalactiae Isolates from the One-Humped Camel Camelus dromedarius. Genome Announc, 1 (4), pp. e00515-13.

Streptococcus agalactiae causes a range of clinical syndromes in camels (Camelus dromedarius). We report the genome sequences of two S. agalactiae isolates that induce abscesses in Kenyan camels. These genomes provide novel data on the composition of the S. agalactiae "pan genome" and reveal the presence of multiple genomic islands.

Fischer A, Liljander A, Kaspar H, Muriuki C, Fuxelius H-H, Bongcam-Rudloff E, de Villiers EP, Huber CA, Frey J, Daubenberger C et al. 2013. Camel Streptococcus agalactiae populations are associated with specific disease complexes and acquired the tetracycline resistance gene tetM via a Tn916-like element. Vet Res, 44 (1), pp. 86.

Camels are the most valuable livestock species in the Horn of Africa and play a pivotal role in the nutritional sustainability for millions of people. Their health status is therefore of utmost importance for the people living in this region. Streptococcus agalactiae, a Group B Streptococcus (GBS), is an important camel pathogen. Here we present the first epidemiological study based on genetic and phenotypic data from African camel derived GBS. Ninety-two GBS were characterized using multilocus sequence typing (MLST), capsular polysaccharide typing and in vitro antimicrobial susceptibility testing. We analysed the GBS using Bayesian linkage, phylogenetic and minimum spanning tree analyses and compared them with human GBS from East Africa in order to investigate the level of genetic exchange between GBS populations in the region. Camel GBS sequence types (STs) were distinct from other STs reported so far. We mapped specific STs and capsular types to major disease complexes caused by GBS. Widespread resistance (34%) to tetracycline was associated with acquisition of the tetM gene that is carried on a Tn916-like element, and observed primarily among GBS isolated from mastitis. The presence of tetM within different MLST clades suggests acquisition on multiple occasions. Wound infections and mastitis in camels associated with GBS are widespread and should ideally be treated with antimicrobials other than tetracycline in East Africa.

Ferguson ME, Hearne SJ, Close TJ, Wanamaker S, Moskal WA, Town CD, de Young J, Marri PR, Rabbi IY, de Villiers EP. 2012. Identification, validation and high-throughput genotyping of transcribed gene SNPs in cassava. Theor Appl Genet, 124 (4), pp. 685-695.

The availability of genomic resources can facilitate progress in plant breeding through the application of advanced molecular technologies for crop improvement. This is particularly important in the case of less researched crops such as cassava, a staple and food security crop for more than 800 million people. Here, expressed sequence tags (ESTs) were generated from five drought stressed and well-watered cassava varieties. Two cDNA libraries were developed: one from root tissue (CASR), the other from leaf, stem and stem meristem tissue (CASL). Sequencing generated 706 contigs and 3,430 singletons. These sequences were combined with those from two other EST sequencing initiatives and filtered based on the sequence quality. Quality sequences were aligned using CAP3 and embedded in a Windows browser called HarvEST:Cassava which is made available. HarvEST:Cassava consists of a Unigene set of 22,903 quality sequences. A total of 2,954 putative SNPs were identified. Of these 1,536 SNPs from 1,170 contigs and 53 cassava genotypes were selected for SNP validation using Illumina's GoldenGate assay. As a result 1,190 SNPs were validated technically and biologically. The location of validated SNPs on scaffolds of the cassava genome sequence (v.4.1) is provided. A diversity assessment of 53 cassava varieties reveals some sub-structure based on the geographical origin, greater diversity in the Americas as opposed to Africa, and similar levels of diversity in West Africa and southern, eastern and central Africa. The resources presented allow for improved genetic dissection of economically important traits and the application of modern genomics-based approaches to cassava breeding and conservation.

De Villiers E, Kumuthini J, Bongcam-Rudloff E. 2011. ISCB Africa ASBCB Conference on Bioinformatics and eBioKit Workshop. EMBnet.journal, 17 (2), pp. 7.

Ommeh S, Budd A, Ngara MV, Njaci I, de Villiers EP. 2011. Basic Molecular Evolution Workshop--A trans-African virtual training course: "Virtual Workshops": Is Africa ready to embrace the concept? Bioessays, 33 (4), pp. 243-247.

Fuxelius H, Bongcam E, Jaufeerally Y. 2011. The contribution of the eBioKit to Bioinformatics Education in Southern Africa. EMBnet.journal, 16 (1), pp. 29.

Visendi P, Ng'ang'a W, Bulimo W, Bishop R, Ochanda J, de Villiers EP. 2011. TparvaDB: a database to support Theileria parva vaccine development. Database (Oxford), 2011 pp. bar015.

We describe the development of TparvaDB, a comprehensive resource to facilitate research towards development of an East Coast fever vaccine, by providing an integrated user-friendly database of all genome and related data currently available for Theileria parva. TparvaDB is based on the Generic Model Organism Database (GMOD) platform. It contains a complete reference genome sequence, Expressed Sequence Tags (ESTs), Massively Parallel Signature Sequencing (MPSS) expression tag data and related information from both public and private repositories. The Artemis annotation workbench provides online annotation functionality. TparvaDB represents a resource that will underpin and promote ongoing East Coast fever vaccine development and biological research. Database URL: http://tparvadb.ilri.cgiar.org.

de Villiers EP, Gallardo C, Arias M, da Silva M, Upton C, Martin R, Bishop RP. 2010. Phylogenomic analysis of 11 complete African swine fever virus genome sequences. Virology, 400 (1), pp. 128-136.

Viral molecular epidemiology has traditionally analyzed variation in single genes. Whole genome phylogenetic analysis of 123 concatenated genes from 11 ASFV genomes, including E75, a newly sequenced virulent isolate from Spain, identified two clusters. One contained South African isolates from ticks and warthog, suggesting derivation from a sylvatic transmission cycle. The second contained isolates from West Africa and the Iberian Peninsula. Two isolates, from Kenya and Malawi, were outliers. Of the nine genomes within the clusters, seven were within p72 genotype 1. The 11 genomes sequenced comprised only 5 of the 22 p72 genotypes. Comparison of synonymous and non-synonymous mutations at the genome level identified 20 genes subject to selection pressure for diversification. A novel gene of the E75 virus evolved by the fusion of two genes within the 360 multicopy family. Comparative genomics reveals high diversity within a limited sample of the ASFV viral gene pool.

Gichora NN, Fatumo SA, Ngara MV, Chelbat N, Ramdayal K, Opap KB, Siwo GH, Adebiyi MO, El Gonnouni A, Zofou D et al. 2010. Ten simple rules for organizing a virtual conference--anywhere. PLoS Comput Biol, 6 (2), pp. e1000650.

Weir W, Sunter J, Chaussepied M, Skilton R, Tait A, de Villiers EP, Bishop R, Shiels B, Langsley G. 2009. Highly syntenic and yet divergent: a tale of two Theilerias. Infect Genet Evol, 9 (4), pp. 453-461.

The published genomic sequences of the two major host-transforming Theileria species of cattle represent a rich resource of information that has allowed novel bioinformatic and experimental studies into these important apicomplexan parasites. Since their publication in 2005, the genomes of T. annulata and T. parva have been utilised for a diverse range of applications, ranging from candidate antigen discovery to the identification of genetic markers for population analysis. This has led to advancements in the quest for a sub-unit vaccine, while providing a greater understanding of variation among parasite populations in the field. The unique ability of these Theileria species to induce host cell transformation is the subject of considerable scientific interest and the availability of full genomic sequences has provided new insights into this area of research. This article reviews the data underlying published comparative analyses, focussing on the general features of gene expression, the major Tpr/Tar multi-copy gene family and a re-examination of the predicted macroschizont secretome. Codon usage between the Theileria species is reviewed in detail, as this underpins ongoing comparative studies investigating selection at the intra- and inter-species level. The TashAT/TpshAT family of genes, conserved between T. annulata and T. parva, encodes products targeted to the host nucleus and has been implicated in contributing to the transformed bovine phenotype. Species-specific expansion and diversification at this critical locus is discussed with reference to the availability, in the near future, of genomic datasets which are based on non-transforming Theileria species.

Githui EK, De Villiers EP, McArthur AG. 2009. Plasmodium possesses dynein light chain classes that are unique and conserved across species. Infect Genet Evol, 9 (3), pp. 337-343.

Plasmodium belongs to the phylum Apicomplexa. Within the Apicomplexa, Plasmodium, Toxoplasma and Cryptosporidium are parasites of considerable medical importance while Theileria and Eimeria are animal pathogens. P. falciparum is particularly important as it causes malaria, resulting in more than 1 million deaths each year. The malaria parasite actively invades the host cell in which it propagates and several proteins associated with the apical organelles have been implicated to be crucial in the invasion process. The biogenesis of the apical organelles is not well understood, but several studies indicate that microtubule-based vesicular transport is involved. Vesicular transport proteins are also present in Plasmodium and are presumed to be involved in transcellular transport in infected erythrocytes. Dynein is a multi-subunit motor protein involved in microtubule-based vesicular transport. In this study, we analyzed the cytoplasmic dynein light chains (Dlcs) of P. falciparum since they provide adaptor surface to the cargoes and are likely to be involved in differential transport. Dlcs consist of three different families: TcTex1/2, LC8 and LC7/roadblock. The data presented demonstrate that P. falciparum Dlcs sequences and functional domains show high sequence similarity within the species, but that only the Dlc group 1 (LC8) has a high similarity to human orthologues. TcTex1 and LC7/roadblock have low similarity to human orthologues. This sequence variation could be targeted for vaccine or drug development.

Langsley G, van Noort V, Carret C, Meissner M, de Villiers EP, Bishop R, Pain A. 2008. Comparative genomics of the Rab protein family in Apicomplexan parasites. Microbes Infect, 10 (5), pp. 462-470.

Rab genes encode a subgroup of small GTP-binding proteins within the ras super-family that regulate targeting and fusion of transport vesicles within the secretory and endocytic pathways. These genes are of particular interest in the protozoan phylum Apicomplexa, since a family of Rab GTPases has been described for Plasmodium and most putative secretory pathway proteins in Apicomplexa have conventional predicted signal peptides. Moreover, peptide motifs have now been identified within a large number of secreted Plasmodium proteins that direct their targeting to the red blood cell cytosol, the apicoplast, the food vacuole and Maurer's clefts; in contrast, motifs that direct proteins to secretory organelles (rhoptries, micronemes and microspheres) have yet to be defined. The nature of the vesicle in which these proteins are transported to their destinations remains unknown and morphological structures equivalent to the endoplasmic reticulum and trans-Golgi stacks typical of other eukaryotes cannot be visualised in Apicomplexa. Since Rab GTPases regulate vesicular traffic in all eukaryotes, and this traffic in intracellular parasites could regulate import of nutrient and drugs and export of antigens, host cell modulatory proteins and lactate, we compare and contrast here the Rab families of Apicomplexa.

Graham SP, Pellé R, Yamage M, Mwangi DM, Honda Y, Mwakubambanya RS, de Villiers EP, Abuya E, Awino E, Gachanja J et al. 2008. Characterization of the fine specificity of bovine CD8 T-cell responses to defined antigens from the protozoan parasite Theileria parva. Infect Immun, 76 (2), pp. 685-694.

Immunity against the bovine intracellular protozoan parasite Theileria parva has been shown to be mediated by CD8 T cells. Six antigens targeted by CD8 T cells from T. parva-immune cattle of different major histocompatibility complex (MHC) genotypes have been identified, raising the prospect of developing a subunit vaccine. To facilitate further dissection of the specificity of protective CD8 T-cell responses and to assist in the assessment of responses to vaccination, we set out to identify the epitopes recognized in these T. parva antigens and their MHC restriction elements. Nine epitopes in six T. parva antigens, together with their respective MHC restriction elements, were successfully identified. Five of the cytotoxic-T-lymphocyte epitopes were found to be restricted by products of previously described alleles, and four were restricted by four novel restriction elements. Analyses of CD8 T-cell responses to five of the epitopes in groups of cattle carrying the defined restriction elements and immunized with live parasites demonstrated that, with one exception, the epitopes were consistently recognized by animals of the respective genotypes. The analysis of responses was extended to animals immunized with multiple antigens delivered in separate vaccine constructs. Specific CD8 T-cell responses were detected in 19 of 24 immunized cattle. All responder cattle mounted responses specific for antigens for which they carried an identified restriction element. By contrast, only 8 of 19 responder cattle displayed a response to antigens for which they did not carry an identified restriction element. These data demonstrate that the identified antigens are inherently dominant in animals with the corresponding MHC genotypes.

Graham SP, Honda Y, Pellé R, Mwangi DM, Glew EJ, de Villiers EP, Shah T, Bishop R, van der Bruggen P, Nene V, Taracha ELN. 2007. A novel strategy for the identification of antigens that are recognised by bovine MHC class I restricted cytotoxic T cells in a protozoan infection using reverse vaccinology. Immunome Res, 3 (1), pp. 2.

BACKGROUND: Immunity against the bovine protozoan parasite Theileria parva has previously been shown to be mediated through lysis of parasite-infected cells by MHC class I restricted CD8+ cytotoxic T lymphocytes. It is hypothesized that identification of CTL target schizont antigens will aid the development of a sub-unit vaccine. We exploited the availability of the complete genome sequence data and bioinformatics tools to identify genes encoding secreted or membrane anchored proteins that may be processed and presented by the MHC class I molecules of infected cells to CTL. RESULTS: Of the 986 predicted open reading frames (ORFs) encoded by chromosome 1 of the T. parva genome, 55 were selected based on the presence of a signal peptide and/or a transmembrane helix domain. Thirty six selected ORFs were successfully cloned into a eukaryotic expression vector, transiently transfected into immortalized bovine skin fibroblasts and screened in vitro using T. parva-specific CTL. Recognition of gene products by CTL was assessed using an IFN-gamma ELISpot assay. A 525 base pair ORF encoding a 174 amino acid protein, designated Tp2, was identified by T. parva-specific CTL from 4 animals. These CTL recognized and lysed Tp2 transfected skin fibroblasts and recognized 4 distinct epitopes. Significantly, Tp2 specific CD8+ T cell responses were observed during the protective immune response against sporozoite challenge. CONCLUSION: The identification of an antigen containing multiple CTL epitopes and its apparent immunodominance during a protective anti-parasite response makes Tp2 an attractive candidate for evaluation of its vaccine potential.

Graham SP, Pellé R, Honda Y, Mwangi DM, Tonukari NJ, Yamage M, Glew EJ, de Villiers EP, Shah T, Bishop R et al. 2006. Theileria parva candidate vaccine antigens recognized by immune bovine cytotoxic T lymphocytes. Proc Natl Acad Sci U S A, 103 (9), pp. 3286-3291.

East Coast fever, caused by the tick-borne intracellular apicomplexan parasite Theileria parva, is a highly fatal lymphoproliferative disease of cattle. The pathogenic schizont-induced lymphocyte transformation is a unique cancer-like condition that is reversible with parasite removal. Schizont-infected cell-directed CD8(+) cytotoxic T lymphocytes (CTL) constitute the dominant protective bovine immune response after a single exposure to infection. However, the schizont antigens targeted by T. parva-specific CTL are undefined. Here we show the identification of five candidate vaccine antigens that are the targets of MHC class I-restricted CD8(+) CTL from immune cattle. CD8(+) T cell responses to these antigens were boosted in T. parva-immune cattle resolving a challenge infection and, when used to immunize naïve cattle, induced CTL responses that significantly correlated with survival from a lethal parasite challenge. These data provide a basis for developing a CTL-targeted anti-East Coast fever subunit vaccine. In addition, orthologs of these antigens may be vaccine targets for other apicomplexan parasites.

Gardner MJ, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, Ren Q, Paulsen IT, Pain A, Berriman M et al. 2005. Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science, 309 (5731), pp. 134-137.

We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.

Pain A, Renauld H, Berriman M, Murphy L, Yeats CA, Weir W, Kerhornou A, Aslett M, Bishop R, Bouchier C et al. 2005. Genome of the host-cell transforming parasite Theileria annulata compared with T. parva. Science, 309 (5731), pp. 131-133.

Theileria annulata and T. parva are closely related protozoan parasites that cause lymphoproliferative diseases of cattle. We sequenced the genome of T. annulata and compared it with that of T. parva to understand the mechanisms underlying transformation and tropism. Despite high conservation of gene sequences and synteny, the analysis reveals unequally expanded gene families and species-specific genes. We also identify divergent families of putative secreted polypeptides that may reduce immune recognition, candidate regulators of host-cell transformation, and a Theileria-specific protein domain [frequently associated in Theileria (FAINT)] present in a large number of secreted proteins.

Collins NE, Liebenberg J, de Villiers EP, Brayton KA, Louw E, Pretorius A, Faber FE, van Heerden H, Josemans A, van Kleef M et al. 2005. The genome of the heartwater agent Ehrlichia ruminantium contains multiple tandem repeats of actively variable copy number. Proc Natl Acad Sci U S A, 102 (3), pp. 838-843.

Heartwater, a tick-borne disease of domestic and wild ruminants, is caused by the intracellular rickettsia Ehrlichia ruminantium (previously known as Cowdria ruminantium). It is a major constraint to livestock production throughout sub-Saharan Africa, and it threatens to invade the Americas, yet there is no immediate prospect of an effective vaccine. A shotgun genome sequencing project was undertaken in the expectation that access to the complete protein coding repertoire of the organism will facilitate the search for vaccine candidate genes. We report here the complete 1,516,355-bp sequence of the type strain, the stock derived from the South African Welgevonden isolate. Only 62% of the genome is predicted to be coding sequence, encoding 888 proteins and 41 stable RNA species. The most striking feature is the large number of tandemly repeated and duplicated sequences, some of continuously variable copy number, which contributes to the low proportion of coding sequence. These repeats have mediated numerous translocation and inversion events that have resulted in the duplication and truncation of some genes and have also given rise to new genes. There are 32 predicted pseudogenes, most of which are truncated fragments of genes associated with repeats. Rather than being the result of the reductive evolution seen in other intracellular bacteria, these pseudogenes appear to be the product of ongoing sequence duplication events.

Bishop R, Shah T, Pelle R, Hoyle D, Pearson T, Haines L, Brass A, Hulme H, Graham SP, Taracha ELN et al. 2005. Analysis of the transcriptome of the protozoan Theileria parva using MPSS reveals that the majority of genes are transcriptionally active in the schizont stage. Nucleic Acids Res, 33 (17), pp. 5503-5511.

Massively parallel signature sequencing (MPSS) was used to analyze the transcriptome of the intracellular protozoan Theileria parva. In total 1,095,000, 20 bp sequences representing 4371 different signatures were generated from T.parva schizonts. Reproducible signatures were identified within 73% of potentially detectable predicted genes and 83% had signatures in at least one MPSS cycle. A predicted leader peptide was detected on 405 expressed genes. The quantitative range of signatures was 4-52,256 transcripts per million (t.p.m.). Rare transcripts (<50 t.p.m.) were detected from 36% of genes. Sequence signatures approximated a lognormal distribution, as in microarray. Transcripts were widely distributed throughout the genome, although only 47% of 138 telomere-associated open reading frames exhibited signatures. Antisense signatures comprised 13.8% of the total, comparable with Plasmodium. Eighty five predicted genes with antisense signatures lacked a sense signature. Antisense transcripts were independently amplified from schizont cDNA and verified by sequencing. The MPSS transcripts per million for seven genes encoding schizont antigens recognized by bovine CD8 T cells varied 1000-fold. There was concordance between transcription and protein expression for heat shock proteins that were very highly expressed according to MPSS and proteomics. The data suggests a low level of baseline transcription from the majority of protein-coding genes.

de Villiers EP, Brayton KA, Zweygarth E, Allsopp BA. 2000. Genome size and genetic map of Cowdria ruminantium. Microbiology, 146 (Pt 10), pp. 2627-2634.

Cowdria ruminantium is the cause of a serious tick-borne disease of domestic ruminants, known as heartwater or cowdriosis. The organism belongs to the tribe Ehrlichieae, which contains obligate intracellular pathogens, causing several important animal and human diseases. Although a few C. ruminantium genes have been cloned and sequenced, very little is known about the size, gross structure and organization of the genome. This paper presents a complete physical map and a preliminary genetic map for C. ruminantium. Chromosomal C. ruminantium DNA was examined by PFGE and Southern hybridization. PFGE analysis revealed that C. ruminantium has a circular chromosome approximately 1576 kb in size. A physical map was derived by combining the results of PFGE analysis of DNA fragments resulting from digestion of the whole genome with KspI, RsrII and SmaI and Southern hybridization analysis with a series of gene probes and isolated macrorestriction fragments. A genetic map for C. ruminantium with a mean resolution of 290 kb was established, the first for a member of the Ehrlichieae. A total of nine genes or cloned C. ruminantium DNA fragments were mapped to specific KspI, RsrII and SmaI fragments, including the major antigenic protein gene, map-1.

de Villiers EP, Brayton KA, Zweygarth E, Allsopp BA. 2000. Macrorestriction fragment profiles reveal genetic variation of Cowdria ruminantium isolates. J Clin Microbiol, 38 (5), pp. 1967-1970.

Macrorestriction profile analysis by pulsed-field gel electrophoresis (PFGE) was used to distinguish between seven isolates of Cowdria ruminantium from geographically different areas. Characteristic profiles were generated for each isolate by using the restriction endonucleases KspI, SalI, and SmaI with chromosomal sizes ranging between 1,546 and 1,692 kb. Statistical analysis of the macrorestriction profiles indicated that all the isolates were distinct from each other; these data contribute to a better understanding of the epidemiology of this pathogen and may be exploited for the identification of genotype-specific DNA probes.

Brayton KA, De Villiers EP, Fehrsen J, Nxomani C, Collins NE, Allsopp BA. 1999. Cowdria ruminantium DNA is unstable in a SuperCos1 library. Onderstepoort J Vet Res, 66 (2), pp. 111-117.

A Cowdria ruminantium genomic library was constructed in a cosmid vector to serve as a source of easily accessible and pure C. ruminantium DNA for molecular genetic studies. The cosmid library contained 846 clones which were arrayed into microtitre plates. Restriction enzyme digestion patterns indicated that these clones had an average insert size of 35 kb. Probing of the arrays did not detect any bovine clones and only one of the known C. ruminantium genes, pCS20, was detected. Due to the high AT content and the fact that C. ruminantium genes are active in the Escherichia coli host, the C. ruminantium clones were unstable in the SuperCos1 vector and most clones did not grow reproducibly. The library was contaminated with E. coli clones and these clones were maintained with greater fidelity than the C. ruminantium clones, resulting in a skewed representation over time. We have isolated seven C. ruminantium clones which we were able to serially culture reproducibly; two of these clones overlap. These clones constitute the first large regions of C. ruminantium DNA to be cloned and represent almost 10% of the C. ruminantium genome.

de Villiers EP, Brayton KA, Zweygarth E, Allsopp BA. 1998. Purification of Cowdria ruminantium organisms for use in genome analysis by pulsed-field gel electrophoresis. Ann N Y Acad Sci, 849 (1), pp. 313-320.

Cowdria ruminantium is an obligate intracellular rickettsial pathogen which is responsible for a tick-borne disease of domestic and wild ruminants called heartwater or cowdriosis. Although several genes have been cloned and partially sequenced, the size, gross structure, and organization of the C. ruminantium genome are unknown. Genome analysis of the organism has been hindered because it is difficult to obtain C. ruminantium DNA free from contaminating host cell DNA, and this probably accounts for the lack of genome size data for this organism. In this study we investigated several methods for purifying C. ruminantium from bovine cellular contaminants, and organisms of a relatively high purity were obtained. These were used to prepare Cowdria DNA, which was analyzed by pulsed-field gel electrophoresis (PFGE) and which revealed a genome approximately 1900 kbp in length plus an additional extrachromosomal fragment migrating with an apparent size of 815 kbp. This is the first time that the genome size of C. ruminantium has been determined and the first demonstration of an extrachromosomal element.

Collins NE, De Villiers EP, Brayton KA, Allsopp BA. 1998. DNA sequence of a cosmid clone of Cowdria ruminantium. Ann N Y Acad Sci, 849 (1), pp. 365-368.

Brayton KA, Fehrsen J, de Villiers EP, van Kleef M, Allsopp BA. 1997. Construction and initial analysis of a representative lambda ZAPII expression library of the intracellular rickettsia Cowdria ruminantium: cloning of map1 and three other Cowdria genes. Vet Parasitol, 72 (2), pp. 185-199.

The causative agent of heartwater, the rickettsia Cowdria ruminantium, is very poorly understood at the molecular level owing to a profound lack of suitable tools. We have developed an immunoaffinity chromatographic method to purify C. ruminantium from host cell components and the purified rickettsial cells have been used to prepare substantially pure Cowdria DNA. This DNA has been used to construct what we believe to be the first fully representative C. ruminantium expression library. A clone containing the complete Cowdria map1 gene has been isolated and sequenced. This gene has been expressed in E. coli cells from the native Cowdria promoter, suggesting that the mechanisms for gene transcription and translation are similar between these two organisms. Parts of three other Cowdria genes have also been isolated and sequenced.

Swart P, De Villiers EP, Swart AC, van der Merwe KJ, Todres PC. 1993. The interaction of biogenic amines with adrenal cytochrome P450-dependent enzymes. Biochem Soc Trans, 21 (4), pp. 413S.

Hernández-de-Diego R, de Villiers EP, Klingström T, Gourlé H, Conesa A, Bongcam-Rudloff E. 2017. The eBioKit, a stand-alone educational platform for bioinformatics. PLoS Comput Biol, 13 (9), pp. e1005616.

Bioinformatics skills have become essential for many research areas; however, the availability of qualified researchers is usually lower than the demand and training to increase the number of able bioinformaticians is an important task for the bioinformatics community. When conducting training or hands-on tutorials, the lack of control over the analysis tools and repositories often results in undesirable situations during training, as unavailable online tools or version conflicts may delay, complicate, or even prevent the successful completion of a training event. The eBioKit is a stand-alone educational platform that hosts numerous tools and databases for bioinformatics research and allows training to take place in a controlled environment. A key advantage of the eBioKit over other existing teaching solutions is that all the required software and databases are locally installed on the system, significantly reducing the dependence on the internet. Furthermore, the architecture of the eBioKit has demonstrated itself to be an excellent balance between portability and performance, not only making the eBioKit an exceptional educational tool but also providing small research groups with a platform to incorporate bioinformatics analysis in their research. As a result, the eBioKit has formed an integral part of training and research performed by a wide variety of universities and organizations such as the Pan African Bioinformatics Network (H3ABioNet) as part of the initiative Human Heredity and Health in Africa (H3Africa), the Southern Africa Network for Biosciences (SAnBio) initiative, the Biosciences eastern and central Africa (BecA) hub, and the International Glossina Genome Initiative.

Bishop RP, Fleischauer C, de Villiers EP, Okoth EA, Arias M, Gallardo C, Upton C. 2015. Comparative analysis of the complete genome sequences of Kenyan African swine fever virus isolates within p72 genotypes IX and X. Virus Genes, 50 (2), pp. 303-309.

Twelve complete African swine fever virus (ASFV) genome sequences are currently publicly available and these include only one sequence from East Africa. We describe genome sequencing and annotation of a recent pig-derived p72 genotype IX, and a tick-derived genotype X isolate from Kenya using the Illumina platform and comparison with the Kenya 1950 isolate. The three genomes constitute a cluster that was phylogenetically distinct from other ASFV genomes, but 98-99% conserved within the group. Vector-based compositional analysis of the complete genomes produced a similar topology. Of the 125 previously identified 'core' ASFV genes, two ORFs of unassigned function were absent from the genotype IX sequence, which was 184 kb in size as compared to 191 kb for the genotype X. There were multiple differences among East African genomes in the 360 and 110 multicopy gene families. The gene corresponding to 360-19R has transposed to the 5' variable region in both genotype X isolates. Additionally, there is a 110 ORF in the tick-derived genotype X isolate formed by fusion of 13L and 14L that is unique among ASFV genomes. In future, functional analysis based on the variations in the multicopy families may reveal whether they contribute to the observed differences in virulence between genotype IX and X viruses.

Visendi P, Ng'ang'a W, Bulimo W, Bishop R, Ochanda J, de Villiers EP. 2011. TparvaDB: a database to support Theileria parva vaccine development. Database (Oxford), 2011 pp. bar015.

We describe the development of TparvaDB, a comprehensive resource to facilitate research towards development of an East Coast fever vaccine, by providing an integrated user-friendly database of all genome and related data currently available for Theileria parva. TparvaDB is based on the Generic Model Organism Database (GMOD) platform. It contains a complete reference genome sequence, Expressed Sequence Tags (ESTs), Massively Parallel Signature Sequencing (MPSS) expression tag data and related information from both public and private repositories. The Artemis annotation workbench provides online annotation functionality. TparvaDB represents a resource that will underpin and promote ongoing East Coast fever vaccine development and biological research. Database URL: http://tparvadb.ilri.cgiar.org.

de Villiers EP, Gallardo C, Arias M, da Silva M, Upton C, Martin R, Bishop RP. 2010. Phylogenomic analysis of 11 complete African swine fever virus genome sequences. Virology, 400 (1), pp. 128-136.

Viral molecular epidemiology has traditionally analyzed variation in single genes. Whole genome phylogenetic analysis of 123 concatenated genes from 11 ASFV genomes, including E75, a newly sequenced virulent isolate from Spain, identified two clusters. One contained South African isolates from ticks and warthog, suggesting derivation from a sylvatic transmission cycle. The second contained isolates from West Africa and the Iberian Peninsula. Two isolates, from Kenya and Malawi, were outliers. Of the nine genomes within the clusters, seven were within p72 genotype 1. The 11 genomes sequenced comprised only 5 of the 22 p72 genotypes. Comparison of synonymous and non-synonymous mutations at the genome level identified 20 genes subject to selection pressure for diversification. A novel gene of the E75 virus evolved by the fusion of two genes within the 360 multicopy family. Comparative genomics reveals high diversity within a limited sample of the ASFV viral gene pool.

Langsley G, van Noort V, Carret C, Meissner M, de Villiers EP, Bishop R, Pain A. 2008. Comparative genomics of the Rab protein family in Apicomplexan parasites. Microbes Infect, 10 (5), pp. 462-470.

Rab genes encode a subgroup of small GTP-binding proteins within the ras super-family that regulate targeting and fusion of transport vesicles within the secretory and endocytic pathways. These genes are of particular interest in the protozoan phylum Apicomplexa, since a family of Rab GTPases has been described for Plasmodium and most putative secretory pathway proteins in Apicomplexa have conventional predicted signal peptides. Moreover, peptide motifs have now been identified within a large number of secreted Plasmodium proteins that direct their targeting to the red blood cell cytosol, the apicoplast, the food vacuole and Maurer's clefts; in contrast, motifs that direct proteins to secretory organelles (rhoptries, micronemes and microspheres) have yet to be defined. The nature of the vesicle in which these proteins are transported to their destinations remains unknown, and morphological structures equivalent to the endoplasmic reticulum and trans-Golgi stacks typical of other eukaryotes cannot be visualised in Apicomplexa. Since Rab GTPases regulate vesicular traffic in all eukaryotes, and this traffic in intracellular parasites could regulate import of nutrient and drugs and export of antigens, host cell modulatory proteins and lactate, we compare and contrast here the Rab families of Apicomplexa.

Graham SP, Pellé R, Honda Y, Mwangi DM, Tonukari NJ, Yamage M, Glew EJ, de Villiers EP, Shah T, Bishop R et al. 2006. Theileria parva candidate vaccine antigens recognized by immune bovine cytotoxic T lymphocytes. Proc Natl Acad Sci U S A, 103 (9), pp. 3286-3291.

East Coast fever, caused by the tick-borne intracellular apicomplexan parasite Theileria parva, is a highly fatal lymphoproliferative disease of cattle. The pathogenic schizont-induced lymphocyte transformation is a unique cancer-like condition that is reversible with parasite removal. Schizont-infected cell-directed CD8(+) cytotoxic T lymphocytes (CTL) constitute the dominant protective bovine immune response after a single exposure to infection. However, the schizont antigens targeted by T. parva-specific CTL are undefined. Here we show the identification of five candidate vaccine antigens that are the targets of MHC class I-restricted CD8(+) CTL from immune cattle. CD8(+) T cell responses to these antigens were boosted in T. parva-immune cattle resolving a challenge infection and, when used to immunize naïve cattle, induced CTL responses that significantly correlated with survival from a lethal parasite challenge. These data provide a basis for developing a CTL-targeted anti-East Coast fever subunit vaccine. In addition, orthologs of these antigens may be vaccine targets for other apicomplexan parasites.

Gardner MJ, Bishop R, Shah T, de Villiers EP, Carlton JM, Hall N, Ren Q, Paulsen IT, Pain A, Berriman M et al. 2005. Genome sequence of Theileria parva, a bovine pathogen that transforms lymphocytes. Science, 309 (5731), pp. 134-137.

We report the genome sequence of Theileria parva, an apicomplexan pathogen causing economic losses to smallholder farmers in Africa. The parasite chromosomes exhibit limited conservation of gene synteny with Plasmodium falciparum, and its plastid-like genome represents the first example where all apicoplast genes are encoded on one DNA strand. We tentatively identify proteins that facilitate parasite segregation during host cell cytokinesis and contribute to persistent infection of transformed host cells. Several biosynthetic pathways are incomplete or absent, suggesting substantial metabolic dependence on the host cell. One protein family that may generate parasite antigenic diversity is not telomere-associated.

Pain A, Renauld H, Berriman M, Murphy L, Yeats CA, Weir W, Kerhornou A, Aslett M, Bishop R, Bouchier C et al. 2005. Genome of the host-cell transforming parasite Theileria annulata compared with T. parva. Science, 309 (5731), pp. 131-133.

Theileria annulata and T. parva are closely related protozoan parasites that cause lymphoproliferative diseases of cattle. We sequenced the genome of T. annulata and compared it with that of T. parva to understand the mechanisms underlying transformation and tropism. Despite high conservation of gene sequences and synteny, the analysis reveals unequally expanded gene families and species-specific genes. We also identify divergent families of putative secreted polypeptides that may reduce immune recognition, candidate regulators of host-cell transformation, and a Theileria-specific protein domain [frequently associated in Theileria (FAINT)] present in a large number of secreted proteins.

