Human Papillomavirus (HPV) Study
ApoCom Genomics has designed a set of PCR primer candidates for oncogenic Human Papillomavirus (HPV) types 16 and 18. The motivations for this design were threefold: (1) to create a PCR-based test (as opposed to existing methods, such as hybrid capture), (2) to circumvent the intellectual property of others by choosing sequences outside the L1/L2 regions of the HPV genomes, and (3) to validate the “stability”-based primer design approach used by ApoCom Genomics’ investigative team.
ApoCom Genomics undertook a phylogenetic study of all publicly available HPV sequences in GenBank. The purposes of the study were to gain a better understanding of the evolutionary relationships among the different HPV types, to use the resulting multiple sequence alignments as a foundation for PCR primer design, and to correlate our findings with a paper recently published in the New England Journal of Medicine (“Epidemiologic Classification of Human Papillomavirus Types Associated with Cervical Cancer”, Munoz et al., 2003).
Tobacco Genome Annotation Project
ApoCom was contracted by East Tennessee State University (ETSU) to use its bioinformatics software to annotate the genome of the tobacco plant (Nicotiana tabacum) and to flag areas of significance within it. This plant has a very large and complex genome, and therefore holds great promise as a repository for genes associated with a myriad of regulatory and production activities. The N. tabacum genome is approximately 4.5 billion base pairs long, about 1.5 times the size of the human genome.
ApoCom initiated a two-phased approach to annotating the tobacco genome. The first phase focused on downloading publicly available data on the genome from sources such as GenBank, University of North Carolina, and other tobacco research centers around the world, and applying ApoCom’s bioinformatics tools to locate regions of significance. These genomic regions included the coding (exons) and non-coding (introns) portions as well as promoter regions and other areas of significance. The Company then applied its gene modeling capabilities to predict the nucleotide structure of the plant’s genes.
ApoCom has extracted a great deal of data from GenBank, a worldwide repository of sequence data. To date, ApoCom has downloaded and initiated analysis on the following:
- The complete N. tabacum plastid genome (~150,000 base pairs),
- 11,000 Expressed Sequence Tags (ESTs), messenger RNA (mRNA), and complementary DNA (cDNA) sequences (each EST ~400-1,000 base pairs),
- 1,550 N. tabacum proteins (confirmed and hypothetical),
- 20,000 other ESTs belonging to the Nicotiana genus or listed as being similar to tobacco, and
- 600 other proteins from the Nicotiana genus or listed as being similar to tobacco.
ApoCom also analyzed some of the tobacco cDNA clones to identify genes. This process consists of running ESTs/cDNAs through “BLASTx”, a mode of BLAST (Basic Local Alignment Search Tool) in which a nucleotide query is translated in all six reading frames and compared against databases of known proteins. An example of this BLAST effort is provided in Attachment 1. In this output we see that the unknown tobacco “MAT001” cDNA was identified as a lipoxygenase following a BLAST of the sequence against known proteins. The benefit of this finding is that once portions of the tobacco genome are released by North Carolina State and other sequencing centers, we will know immediately which genes are associated with coding for lipoxygenase proteins.
As part of this work, ApoCom also built a tobacco EST database for specific use with in-house tools such as GrailEXP. The tobacco plastid genome was analyzed by running ApoCom’s GrailEXP program against this EST database.
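The first step of a BLASTx-style search, translating a nucleotide query in all six reading frames before it is compared with protein sequences, can be illustrated with a short Python sketch. This is only an illustration under stated assumptions, not ApoCom's pipeline or the BLAST implementation: the function names are hypothetical, the standard genetic code is assumed, and the toy sequence is made up rather than taken from any tobacco clone.

```python
from itertools import product

# Standard genetic code, listed in the conventional TCAG codon order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")
        for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Hypothetical helper: map frame labels (+1..+3, -1..-3) to protein strings."""
    seq = seq.upper()
    frames = {}
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(strand_seq[offset:])
    return frames


if __name__ == "__main__":
    # Made-up toy query, not a real tobacco EST.
    for frame, protein in six_frame_translations("ATGGCCTAA").items():
        print(frame, protein)
```

In an actual search, each of the six translated frames would then be aligned against a protein database and the best-scoring hits reported; that alignment step is what the BLAST software itself provides and is not reproduced here.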