We develop new experimental methods for studying the genomics of gene regulation, making extensive use of second-generation DNA sequencing and computational biology.
Active areas of interest include:
Next-generation proteomics and whole proteome sequencing
Unlike DNA sequencing, which has recently undergone a massive acceleration, polypeptide sequencing remains a comparatively slow process. Whereas we can now sequence ~1 billion 50 base-pair fragments of DNA per day on a single instrument, a single mass spectrometer (MS) can identify only ~100 thousand unique polypeptide sequences. Even with improvements in upstream sample preparation and liquid chromatography, we are approaching the fundamental speed limit of MS analysis, and further increases in the speed of polypeptide sequencing by MS will likely be incremental. A massively parallel platform for peptide sequencing, analogous to those available for sequencing DNA, could revolutionize the analysis of complex protein mixtures, enabling quantitative, digital measurements of peptide abundance that have not previously been achievable. We are developing methods for peptide sequencing and identification on an unprecedented scale, and are exploring extensions of this new technology to study protein post-translational modifications, including phosphorylation and glycosylation.
Genome-scale molecular tools for analyzing modified DNA nucleobases
In addition to the four primary DNA nucleobases, a number of base modifications expand the chemical diversity of DNA in vivo and have profound effects on genome function. Intrinsic modifications, such as 5-methylcytosine and deoxyuridine, are part of normal cellular metabolism and play integral roles in genetic and epigenetic regulation. In contrast, extrinsic modifications, such as pyrimidine dimers and nucleobase oxidation, arise from a variety of environmental exposures and can initiate aberrant cell growth or death. A detailed view of intrinsic and extrinsic nucleobase modification is necessary for a holistic understanding of genetic and epigenetic regulation, but we lack a global picture of how base modifications are created, maintained and repaired, and how their spatial distribution affects genome function.
We have developed a method, BE-seq, for identifying the precise genomic positions of modified DNA nucleobases by coupling in vitro base excision (BE) with next-generation DNA sequencing. In our first application of this method, we determined the precise, genome-wide positions of deoxyuridine (dU) in E. coli and S. cerevisiae strains that accumulate dU in their genomic DNA, and discovered a conserved positional bias in genome-wide dU content. We have also extended the BE-seq method to identify pyrimidine dimers in UV-irradiated DNA, and are currently using it to identify "hotspots" of UV-associated mutation that may underlie common human skin diseases.
Quality control in pre-mRNA splicing
Splicing of pre-mRNAs in eukaryotes leads to branched RNA intermediates that are degraded by specialized pathways. Lariat-introns are products of normal splicing, and are degraded by the lariat-debranching enzyme, Dbr1, following their release from the spliceosome. Other branched intermediates derived from suboptimal pre-mRNA substrates are eliminated prior to productive splicing by the “discard” pathway.
We have identified and characterized a novel debranching enzyme, Drn1/Ygr093w, in the budding yeast S. cerevisiae. Drn1 interacts with Dbr1 and displays 2’-5’ phosphodiesterase activity in vitro. Drn1 shares N-terminal sequence similarity with the catalytic domain of Dbr1, but is missing key active site residues found in Dbr1. Instead, the C-terminal CwfJ domains of Drn1, which are not present in Dbr1, are sufficient for debranching, and are inactivated by mutation of a single residue conserved across all CwfJ domains. Using in vitro splicing assays, we have shown that Drn1 and Dbr1 have non-redundant roles in debranching of optimal substrates; in addition, Drn1 is required for the efficient turnover of suboptimal substrates in vitro. This mechanism expands the complexity and scope of splicing regulation. We are currently focused on understanding the molecular role of Drn1 in vitro, and are extending these studies to organisms with more complex mRNA splicing.
Genomic tools for RNA sequencing
Recently, we developed a method for the capture and sequencing of RNAs with terminal 2’,3’-cyclic phosphates (PubMed). Cyclic phosphate-terminated RNAs are generated by endonucleolytic cleavage and by self-cleaving ribozymes, and cyclic phosphates are also found as stable modifications on cellular RNAs such as the U6 spliceosomal RNA. We have collaborated with several groups to use this method to identify, for example, cleavage products of cellular ribonucleases as well as catalytic RNAs. Because the method employs a ligase with unique specificity, we are discovering RNA populations that have gone undetected by other RNA cloning techniques.