What is Reproducible Research?
"Reproducible Research (RR) is the practice of distributing, along with a research publication, all data, software source code, and tools required to reproduce the results discussed in the publication. As such the RR package not only describes the research and its results, but becomes a complete laboratory in which the research can be reproduced and extended." (Source: CTSPedia)
Strive to make your analysis reproducible, and document your code.
Data Management & Data Dictionary
Prepare your data for efficient analysis by creating a data dictionary before you share your data. A data dictionary lists each variable's name, its type (categorical, continuous, or text), and the interpretation of any codes, e.g. 1="Female", 2="Male".
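As an illustration, a data dictionary can itself be stored as a small comma-delimited file. The sketch below is only one possible layout: the variable names (subject_id, sex, age_years) and the three column headings are hypothetical, chosen to mirror the elements listed above.

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable in the dataset.
# Variable names and column headings are illustrative assumptions.
DATA_DICTIONARY = [
    {"variable": "subject_id", "type": "text", "codes": ""},
    {"variable": "sex", "type": "categorical", "codes": '1="Female"; 2="Male"'},
    {"variable": "age_years", "type": "continuous", "codes": ""},
]


def dictionary_to_csv(entries):
    """Return the data dictionary as comma-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["variable", "type", "codes"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


dictionary_csv = dictionary_to_csv(DATA_DICTIONARY)
print(dictionary_csv)
```

Keeping the dictionary in a plain text file alongside the data means it can be shared and versioned together with the dataset.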
REDCap Database Development
Use REDCap to efficiently set up your database to make it easily accessible to your research team and usable for data analysis at the end of the study. REDCap is a secure web application for building and managing online surveys and databases.
If you want one of our biostatisticians to develop your database and forms for you, please submit our Request Biostatistics Consulting form and we will work with you to develop a scope of work and timeline for your project.
Analysis File Structure
- Wide format: each row is an individual, and any repeated measurements are additional columns.
- Long format: each row is a single observation, so an individual has multiple rows when there are repeated observations on that individual.
- Datasets for analysis are best received as comma-delimited text files
- Columns cannot mix characters and numbers
- Consistent capitalization is important; e.g. "Placebo" and "placebo" are treated as different values in data analysis
- Choose variable names that reflect the measures for easier interpretation
- Colors, bold, comments, etc. cannot be interpreted by statistical software
- Each piece of information, such as group designation, must be in a separate column
- Missing data should be entered consistently for each variable. In comma-delimited format, a blank will be interpreted as a missing value. Other common designations are ".", "NA", or large negative numbers that are outside the range of possible values, e.g. -999.
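The points above can be sketched in a few lines of Python. This is a minimal example under stated assumptions, not a prescribed workflow: the file contents, the column names (subject_id, group, sbp_visit1, sbp_visit2), and the set of missing-data codes are all hypothetical. It reads a comma-delimited file with one row per individual, normalizes capitalization, converts missing-data codes to a single missing marker, and reshapes the data so that each row is a single observation.

```python
import csv
import io

# Hypothetical file with one row per individual and repeated measurements
# in separate columns. Blanks and -999 stand for missing values here.
WIDE_CSV = """subject_id,group,sbp_visit1,sbp_visit2
001,Placebo,120,118
002,placebo,135,-999
003,Treatment,,128
"""

# Assumed missing-data codes; in practice, take these from the data dictionary.
MISSING_CODES = {"", ".", "NA", "-999"}
PREFIX = "sbp_visit"


def clean_value(raw):
    """Return a float, or None when the cell holds a missing-data code."""
    raw = raw.strip()
    return None if raw in MISSING_CODES else float(raw)


def wide_to_long(text):
    """Reshape one-row-per-individual data to one row per observation."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        # Normalize capitalization so "Placebo" and "placebo" match.
        group = record["group"].strip().capitalize()
        for column, raw in record.items():
            if column.startswith(PREFIX):
                rows.append({
                    "subject_id": record["subject_id"],
                    "group": group,
                    "visit": int(column[len(PREFIX):]),
                    "sbp": clean_value(raw),
                })
    return rows


long_rows = wide_to_long(WIDE_CSV)
for row in long_rows:
    print(row)
```

Normalizing capitalization in code is a fallback; entering values consistently in the first place avoids having to guess which spellings were meant to be the same group.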