We develop molecular methods that enable genomic discovery, leveraging advances in next-generation DNA sequencing and computational biology.
Current areas of interest include:
1. Next-generation proteomics and whole proteome sequencing
Unlike DNA sequencing, which has accelerated massively in recent years, polypeptide sequencing remains comparatively slow. Whereas we can now sequence ~1 billion 50 base-pair fragments of DNA per day on a single instrument, a single mass spectrometer (MS) can identify only ~100 thousand unique polypeptide sequences. Even with improvements in upstream sample preparation and liquid chromatography, we are approaching the fundamental speed limit of MS analysis, and further increases in the speed of polypeptide sequencing by MS will likely be incremental. A massively parallel platform for peptide sequencing – analogous to those available for sequencing DNA – could revolutionize the analysis of complex protein mixtures, enabling quantitative, digital measurements of peptide abundance that have not previously been achievable. We are developing methods for peptide sequencing and identification on an unprecedented scale, and are exploring extensions of this new technology to study protein posttranslational modifications, including phosphorylation and glycosylation.
2. Genome-scale molecular tools for analyzing modified DNA nucleobases
In addition to the four primary DNA nucleobases, a number of base modifications expand the chemical diversity of DNA in vivo and have profound effects on genome function. Intrinsic modifications, such as 5-methylcytosine and uracil, are part of normal cellular metabolism and play integral roles in genetic and epigenetic regulation. In contrast, extrinsic modifications, such as pyrimidine dimers and nucleobase oxidation, arise from a variety of environmental exposures and can initiate aberrant cell growth or death. A detailed view of intrinsic and extrinsic nucleobase modification is necessary for a holistic understanding of genetic and epigenetic regulation, but we lack a global picture of how base modifications are created, maintained and repaired, and how their spatial distribution impacts genome function. We developed a method, Excision-seq, for identifying the precise genomic positions of modified DNA nucleobases by coupling in vitro base excision with next-generation DNA sequencing. In our first application of this method, we determined the precise, genome-wide positions of uracil in E. coli and S. cerevisiae strains that accumulate uracil in their genomic DNA, and discovered a conserved positional bias in genome-wide uracil content. We have also extended Excision-seq to identify pyrimidine dimers in UV-irradiated DNA, and are currently using it to characterize "hotspots" of UV-associated mutation that underlie common human skin diseases.
3. Genome-scale molecular tools for identifying novel RNAs
We recently developed a method for the capture and sequencing of RNAs with terminal 2’,3’-cyclic phosphates. Cyclic phosphate-terminated RNAs are generated by endonucleolytic cleavage and self-cleaving ribozymes, and are found as stable modifications on cellular RNAs such as the U6 spliceosomal RNA. We have collaborated with several groups to use this reagent to identify cleavage products of cellular ribonucleases as well as new catalytic RNAs. Because the method employs a ligase with unique specificity, it enables the characterization of RNA populations that have gone undetected by other RNA cloning techniques. We have recently extended the method to identify RNAs with 5’-hydroxyl groups, and have applied it to discover novel RNA cleavage fragments in the budding yeast transcriptome.