
Make your Research Reproducible


What is Reproducible Research?

"Reproducible Research (RR) is the practice of distributing, along with a research publication, all data, software source code, and tools required to reproduce the results discussed in the publication. As such the RR package not only describes the research and its results, but becomes a complete laboratory in which the research can be reproduced and extended." (Source: CTSPedia)

Strive to make your analysis reproducible, and document your code.

Data Management & Data Dictionary

Prepare your data for efficient analysis by creating a data dictionary before sharing your data. A data dictionary lists each variable's name, its type (categorical, continuous, text), and the interpretation of any codes, e.g. 1="Female", 2="Male".
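
As a rough illustration, a data dictionary can also be kept in machine-readable form so that records can be checked against it. The sketch below is only one possible layout: the variable names (subject_id, sex, age_years), the descriptions, and the check_row helper are hypothetical and not part of any required format.

```python
# Hypothetical data dictionary: variable names, types, and codes are
# illustrative assumptions, not taken from any real study.
DATA_DICTIONARY = {
    "subject_id": {"type": "text", "description": "Unique participant identifier"},
    "sex": {
        "type": "categorical",
        "description": "Sex of participant",
        "codes": {"1": "Female", "2": "Male"},
    },
    "age_years": {"type": "continuous", "description": "Age at enrollment, in years"},
}


def check_row(row, dictionary=DATA_DICTIONARY):
    """Return a list of problems found in one record (a dict of strings)."""
    problems = []
    for name, value in row.items():
        if name not in dictionary:
            problems.append(f"{name}: not in data dictionary")
            continue
        spec = dictionary[name]
        if value == "":
            continue  # a blank is treated as missing; nothing to validate
        if spec["type"] == "categorical" and value not in spec["codes"]:
            problems.append(f"{name}: unknown code {value!r}")
        elif spec["type"] == "continuous":
            try:
                float(value)
            except ValueError:
                problems.append(f"{name}: {value!r} is not numeric")
    return problems
```

Calling check_row on each record before sharing the data gives a quick list of values that do not match the dictionary, such as an undocumented code or text in a numeric variable.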

REDCap Database Development

Use REDCap to efficiently set up your database to make it easily accessible to your research team and usable for data analysis at the end of the study. REDCap is a secure web application for building and managing online surveys and databases.

If you want one of our biostatisticians to develop your database and forms for you, please submit our Request Biostatistics Consulting form and we will work with you to develop a scope of work and timeline for your project.

Analysis File Structure

Analysis datasets are usually organized in one of two ways:

  • Wide format: each row is an individual, and any repeated measurements are additional columns.
  • Long format: each row is a single observation, so an individual has multiple rows if there are repeated observations on that individual.
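
The two layouts above (often called wide and long formats) hold the same information, and one can be converted to the other. The following is a minimal sketch of that conversion using plain Python; the function name wide_to_long, the column names, and the blood pressure values are made up for illustration.

```python
def wide_to_long(wide_rows, id_column, measure_columns):
    """Convert one-row-per-individual records to one-row-per-observation."""
    long_rows = []
    for row in wide_rows:
        # Each repeated-measurement column becomes its own row,
        # numbered by its position in measure_columns.
        for visit, column in enumerate(measure_columns, start=1):
            long_rows.append({
                id_column: row[id_column],
                "visit": visit,
                "value": row[column],
            })
    return long_rows


# Hypothetical wide data: two individuals, one measurement at each of two visits.
wide_data = [
    {"subject_id": "S01", "sbp_visit1": 120, "sbp_visit2": 118},
    {"subject_id": "S02", "sbp_visit1": 135, "sbp_visit2": 130},
]
long_data = wide_to_long(wide_data, "subject_id", ["sbp_visit1", "sbp_visit2"])
```

Here the two rows of wide_data become four rows in long_data, one per individual per visit.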

Good Data

  • Datasets for analysis are best received as comma-delimited text files (CSV)
  • Columns cannot mix characters and numbers
  • Consistent capitalization is important; e.g. "Placebo" is different from "placebo" in data analysis
  • Choose variable names that reflect the measures for easier interpretation
  • Colors, bold, comments, etc. cannot be interpreted by statistical software
  • Each piece of information, such as group designation, must be in a separate column
  • Missing data should be entered consistently for each variable. In comma-delimited format, a blank will be interpreted as a missing value. Other common designations are ".", "NA", or large negative numbers that are outside the range of possible values, e.g. -999.
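
Several of the points above can be checked automatically before a file is shared. The sketch below uses only Python's standard library and is one possible approach, not a prescribed procedure: the set of missing-value codes, the helper names (load_clean, mixed_type_columns, capitalization_variants), and the sample data are all assumptions chosen for illustration.

```python
import csv
import io

# Assumed missing-value codes; adjust to whatever the data dictionary specifies.
MISSING_CODES = {"", ".", "NA", "-999"}


def load_clean(csv_text):
    """Read CSV text, trim whitespace, and map missing-value codes to None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = []
    for raw in reader:
        row = {}
        for name, value in raw.items():
            value = value.strip()
            row[name] = None if value in MISSING_CODES else value
        rows.append(row)
    return rows


def mixed_type_columns(rows):
    """Return names of columns that mix numeric and non-numeric values."""
    def is_number(text):
        try:
            float(text)
            return True
        except ValueError:
            return False

    mixed = []
    for name in rows[0]:
        kinds = {is_number(r[name]) for r in rows if r[name] is not None}
        if len(kinds) > 1:
            mixed.append(name)
    return mixed


def capitalization_variants(rows, column):
    """Group values that differ only by capitalization, e.g. Placebo/placebo."""
    groups = {}
    for r in rows:
        if r[column] is not None:
            groups.setdefault(r[column].lower(), set()).add(r[column])
    return {key: sorted(values) for key, values in groups.items() if len(values) > 1}


# Hypothetical sample file with three deliberate problems.
sample = """subject_id,group,weight_kg
S01,Placebo,70.5
S02,placebo,-999
S03,Treatment,unknown
S04,Treatment,
"""
rows = load_clean(sample)
```

With this sample, mixed_type_columns(rows) flags weight_kg because it contains both a number and the text "unknown", and capitalization_variants(rows, "group") reports that "Placebo" and "placebo" would be treated as separate groups. The blank and -999 weights are both read as missing.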

Colorado Biostatistics Consortium

13001 17th Place | Mail Stop B119 | Room W3129, Building 500 | Aurora, CO 80045
303-724-4370 | cbc.admin@ucdenver.edu


© The Regents of the University of Colorado, a body corporate. All rights reserved.

Accredited by the Higher Learning Commission. All trademarks are registered property of the University. Used by permission only.