Faculty in Biostatistics and Informatics carry out research to develop, evaluate and improve statistical methods for designing and analyzing health care studies. This research is often motivated by work with co-investigators doing clinical or public health research who encounter design or analysis questions that have not been adequately addressed in existing research literature. Here are some of the areas in which we work.
New technologies in the 'omics field have changed the landscape of biomedical research in the last twenty years. Data generated from a variety of these high-throughput assays (microarrays, sequencing, mass spectrometry, etc.) pose interesting analytical challenges due to the high dimensionality, large volume, and complex structure of the data. Many of our faculty are involved with developing methods for all stages of the omics data life-cycle, including study design, data generation, quality control, pre-processing, modeling, analysis, interpretation, archiving, data dissemination and software development. Depending on the question, statistical and computational approaches include machine learning, high-dimensional methods, latent variable models, statistical genetics, genetic risk prediction, sub-typing, causal modeling, data integration and non-parametric methods. Faculty work closely with core facilities generating the data and a variety of biomedical investigators on campus and beyond who provide interesting applications for specific systems and diseases.
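As a toy illustration of one such high-dimensional method, the sketch below applies principal component analysis (via the singular value decomposition) to a simulated expression matrix with far more genes than samples; all names and numbers are invented for the example, not drawn from any study described here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated expression matrix: 40 samples x 1000 genes (p >> n),
# with a latent two-group effect on the first 100 genes.
n, p = 40, 1000
group = np.repeat([0, 1], n // 2)
X = rng.normal(size=(n, p))
X[group == 1, :100] += 2.0

# PCA via SVD of the centered matrix: the leading right singular
# vector defines the first principal component.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
pc1 = Xc @ Vt[0]

# The two sample groups separate along the first component.
sep = abs(pc1[group == 1].mean() - pc1[group == 0].mean())
print(round(sep, 2))
```

The point of the sketch is only that a single low-dimensional projection can reveal dominant structure in data with thousands of variables; real omics pipelines add normalization, batch correction and formal inference on top of this.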
Regarding microbiomics: Until recently, the contribution of the human microbiome (organisms that live in or on our body) to health has been studied using individually cultivable organisms. High-throughput sequencing technology now enables investigators to obtain information on thousands of organisms within a community, thereby providing a global view of microbiota communities. This has opened the possibility of an entirely new, systemic view of how microbes respond to stimuli such as antibiotics and probiotics, and how they are associated with disease. With this new insight comes the desire to translate this knowledge into modifying microbial communities to promote health. Application of high-throughput sequencing technology to characterize microbiota communities produces large volumes of complex data requiring sophisticated statistical analyses to facilitate appropriate interpretation. Typical microbiome data derived from metagenomic sequencing are distinct yet share characteristics with ecological, genomic and geochemical data. One challenge is that these studies demand a combination of computational, statistical, and microbiological expertise to manage, analyze, and provide scientific interpretation of the data.
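As a small illustration of the kind of summary commonly applied to such community data, the sketch below computes Shannon diversity from taxon counts; the counts are hypothetical, invented only for the example.

```python
import numpy as np

# Hypothetical taxon counts for one sequenced sample
# (six taxa; numbers are illustrative only).
counts = np.array([500, 300, 120, 50, 20, 10])

# Convert to relative abundances, then compute Shannon diversity
# H = -sum(p_i * ln(p_i)), a standard summary of how many taxa
# are present and how evenly abundance is spread among them.
p = counts / counts.sum()
shannon = -np.sum(p * np.log(p))
print(round(shannon, 3))
```

Diversity indices like this are only a starting point; the compositional, zero-inflated nature of real microbiome counts is what drives much of the methodological work described above.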
Faculty: Tasha Fingerlin, Debashis Ghosh, Audrey Hendricks, Katerina Kechris, Miranda Kroehl, Sharon Lutz, Dominik Reinhold, Laura Saba, Stephanie Santorico, Brandie Wagner, Weiming Zhang
Links: Statistical Genomics Working Group
Observations in health sciences studies are often made sequentially on subjects, resulting in temporal patterns and/or serial correlation. Studies also commonly involve subjects treated at the same hospital or by the same provider, which gives rise to similarities among those subjects. These situations require special statistical methods to maintain validity with dependent observations and to learn about these sources of variation. In many cases multiple outcomes are also of interest. Faculty members are involved in developing new statistical methods for many such situations, for example where patterns of missing data depend on outcomes, e.g. in studies of viral and immune responses in HIV/AIDS patients; complex temporal patterns in longitudinal data subject to pulses at unknown times, e.g. pulsatile stress hormones of depressed patients; use of surrogate measures when true covariates are not observable, e.g. associations of pollution measures with lung function of asthmatic children; and jointly modeled longitudinal measures and time-to-event outcomes, e.g. viral load trajectories and associated survival in HIV/AIDS.
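A minimal sketch of the clustering idea above: when subjects are treated by the same provider, one standard summary of their similarity is the intraclass correlation, estimated here by one-way ANOVA moments on simulated data (the provider and noise standard deviations are invented for the example).

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulate outcomes for patients clustered within providers:
# a provider-level random effect (sd 1) plus patient-level
# noise (sd 2), so the true ICC is 1 / (1 + 4) = 0.2.
n_providers, n_per = 50, 20
provider_effect = rng.normal(0, 1, size=n_providers)
y = provider_effect[:, None] + rng.normal(0, 2, size=(n_providers, n_per))

# Method-of-moments (one-way ANOVA) estimate of the intraclass
# correlation: the share of total variance due to providers.
grand = y.mean()
msb = n_per * ((y.mean(axis=1) - grand) ** 2).sum() / (n_providers - 1)
msw = ((y - y.mean(axis=1, keepdims=True)) ** 2).sum() / (n_providers * (n_per - 1))
icc = (msb - msw) / (msb + (n_per - 1) * msw)
print(round(icc, 2))
```

Ignoring a nonzero ICC in the analysis is exactly the validity problem the paragraph describes: treating 1,000 clustered patients as 1,000 independent ones overstates the effective sample size.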
Faculty: Anna Baron, Nichole Carlson, James Crooks, Diane Fairclough, Gary Grunwald, Elizabeth Juarez-Colunga, John Kittelson, Sam MaWhinney, Susan Mikulich-Gilbertson, Camille Moore, Matt Strand, Brandie Wagner, Gary Zerbe
Links: Joint and Longitudinal Modeling Working Group
Estimating the number of subjects required to conduct a sensitive, efficient and ethical study is an extremely common and important question in health care research. Faculty are developing methods to carry out such estimation in situations where the study design is complex, for example involving longitudinal measurement of subjects over time, or subjects clustered within sites or clinicians. Some current faculty research can be found at SampleSizeShop.org, which provides all researchers, including behavioral and social scientists, with free, open-source, peer-reviewed power and sample size software, tutorials, and educational materials.
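As a simple baseline for the kind of calculation these methods generalize, the sketch below implements the textbook two-sample z-approximation for per-group sample size; it is a standard formula, not the SampleSizeShop software, and the function name is our own.

```python
import math
from scipy.stats import norm

def n_per_group(delta, sd, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sample z-test
    to detect a mean difference `delta` with common SD `sd`."""
    z_a = norm.ppf(1 - alpha / 2)   # two-sided type I error quantile
    z_b = norm.ppf(power)           # power quantile
    d = delta / sd                  # standardized effect size
    return math.ceil(2 * (z_a + z_b) ** 2 / d ** 2)

# A medium standardized effect (d = 0.5) at 80% power:
print(n_per_group(0.5, 1.0))  # 63 per group
```

Complex designs (repeated measures, clustering within sites) inflate or deflate this baseline, which is precisely where the faculty research described above comes in.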
Faculty: Jud Blatchford, Deb Glueck, John Kittelson, Brandy Ringham
Links: RCTdesign; SampleSizeShop
Outcomes in biomedical and public health studies are often more complex than simple numerical measures, and normality assumptions are not appropriate. When time to an event such as death, admission to hospital, or recurrence of disease is of interest, models are available to account for censoring (when only ranges of event times are known). These models have been extended to events that can occur more than once, a situation that is common in clinical medicine. Faculty members are studying situations when only counts of events in intervals are available, and implications of this limitation on required sample sizes. Other examples of statistical methods for non-normal outcomes being studied by faculty include numbers of bacteria of different species in samples of lung microbiome, and cost of health care in multiple cost categories or when cost may be zero.
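A minimal sketch of how censoring is handled for a time-to-event outcome: the classical Kaplan-Meier product-limit estimate, computed by hand on hypothetical data (the times and censoring indicators are invented for the example).

```python
import numpy as np

# Hypothetical event times (months) with censoring indicators
# (1 = event observed, 0 = censored, i.e. only a lower bound known).
time = np.array([2, 3, 3, 5, 7, 8, 8, 10, 12, 12])
event = np.array([1, 1, 0, 1, 1, 0, 1, 1, 0, 0])

# Kaplan-Meier: at each distinct observed event time t, multiply
# the running survival estimate by (1 - d_t / n_t), where d_t is
# the number of events at t and n_t the number still at risk.
surv = 1.0
km = {}
for t in np.unique(time[event == 1]):
    at_risk = np.sum(time >= t)
    deaths = np.sum((time == t) & (event == 1))
    surv *= 1 - deaths / at_risk
    km[int(t)] = surv
print(km)
```

Censored subjects contribute to the at-risk counts until they drop out, so their partial information is used without pretending their event time is known; extensions to recurrent events and interval counts are among the topics the paragraph describes.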
Faculty: Anna Baron, Debashis Ghosh, Gary Grunwald, Elizabeth Juarez-Colunga, Brandie Wagner, Gary Zerbe
Exploring multiple types of imaging data, data processing algorithms, and new data analysis approaches for imaging data.
Faculty: Nichole Carlson, Debashis Ghosh, Deborah Glueck
Links: Imaging Analysis Working Group
Various aspects of study design and analysis necessary for moving beyond correlation to causation.
Faculty: Debashis Ghosh, Sarah Schmiege, Sharon Lutz, Miranda Kroehl
Links: Causal Inference Working Group
Distributions of variables and relationships among variables in health sciences studies are often more complex than can be captured with traditional parametric approaches. Non-parametric and semi-parametric methods relax some of the assumptions of parametric approaches. For non-parametric approaches, a distribution is not implicitly assumed in order to perform inference. This can be useful when a distribution is not known or is difficult to work with, situations that are especially common with biological data such as microbiome or genomic data. Common non-parametric approaches include permutation, bootstrapping and other resampling based methods. In other situations trends or relationships may be complex and nonlinear, in which case semi-parametric methods such as smoothing or splines are useful. Faculty are developing methods for such situations as drop-out patterns that depend in complex ways on patients’ health status in HIV/AIDS studies.
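As a small worked example of the resampling methods mentioned, the sketch below runs a two-sided permutation test of a difference in means on invented data, making no distributional assumption about the outcome.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two small samples for which a normality assumption may be dubious
# (values are invented for the example).
a = np.array([1.2, 0.8, 3.5, 0.4, 2.1, 0.9])
b = np.array([2.8, 4.1, 1.9, 5.0, 3.3, 2.6])

# Permutation test: under the null of no group difference, group
# labels are exchangeable, so shuffle them many times and count
# how often the shuffled difference is as extreme as the observed.
obs = b.mean() - a.mean()
pooled = np.concatenate([a, b])
n_perm = 10000
count = 0
for _ in range(n_perm):
    perm = rng.permutation(pooled)
    diff = perm[len(a):].mean() - perm[:len(a)].mean()
    if abs(diff) >= abs(obs):
        count += 1
p_value = (count + 1) / (n_perm + 1)  # add-one correction
print(round(p_value, 3))
```

The null distribution here is built from the data themselves rather than assumed, which is what makes such tests attractive for microbiome and genomic outcomes with unknown or awkward distributions.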
Faculty: Nichole Carlson, Miranda Kroehl, Sam MaWhinney, Brandie Wagner, Gary Zerbe