When to contact us
Contact us as early as possible, usually more than 6 weeks before any deadline. We can usually start on the data cleaning and analysis within a month. However, the amount of data cleaning and the complexity of the analysis will determine the necessary timeline. Deadlines and start dates will be set as part of the scope of work.
A data analysis project involves managing the data, performing the analysis, and writing a report that can be trimmed down for publication. We will also create summary tables and figures. We strive for the analysis to be reproducible and will provide documented code on request.
The size and scope of the analysis will be defined as part of the scope of work. The analysis chosen depends on the study design and other data complexities, and the difficulty of an analysis is often unrelated to the amount of data collected. Thus, a small clinical dataset can often take as much time as the analysis of a large clinical trial with a single outcome.
How to prepare your data for efficient analysis will be discussed as part of your scope of work. If data cleaning and significant data reorganization are needed, they will be written up as a separate item in the scope of work.
It is often useful to create a data dictionary before sharing your data. A data dictionary lists each variable's name, its type (categorical, continuous, or text), and the interpretation of any codes, e.g. 1="Female", 2="Male".
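As a rough illustration, a data dictionary can itself be stored as a small comma-delimited file. The sketch below uses only the Python standard library; the variable names (`subject_id`, `sex`, `age_years`) and the three-column layout are hypothetical choices for this example, not a required format.

```python
import csv
import io

# Hypothetical data dictionary: one entry per variable in the dataset.
data_dictionary = [
    {"variable": "subject_id", "type": "text", "codes": ""},
    {"variable": "sex", "type": "categorical", "codes": '1="Female"; 2="Male"'},
    {"variable": "age_years", "type": "continuous", "codes": ""},
]


def dictionary_to_csv(entries):
    """Return the data dictionary as comma-delimited text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["variable", "type", "codes"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(entries)
    return buffer.getvalue()


dictionary_csv = dictionary_to_csv(data_dictionary)
print(dictionary_csv)
```

Keeping the dictionary in the same plain-text format as the data makes it easy to share alongside the dataset.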
Datasets for analysis are best received as comma-delimited text files. There are two common formats for analysis files:
- each row is an individual and any repeated measurements are additional columns
- each row is a single observation; thus, an individual will have multiple rows if there are repeated observations on that individual.
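The two layouts hold the same information and can be converted into one another. The following is a minimal sketch of going from one row per individual to one row per observation, using plain Python dictionaries. The column names (`subject_id`, `weight_visit1`, `weight_visit2`) and the values are made up for illustration.

```python
# Hypothetical data with one row per individual; repeated
# measurements are stored in separate columns.
wide_rows = [
    {"subject_id": "S01", "weight_visit1": 70.2, "weight_visit2": 69.8},
    {"subject_id": "S02", "weight_visit1": 82.5, "weight_visit2": 81.9},
]


def wide_to_long(rows, id_column, measure_columns, value_name):
    """Convert one-row-per-individual data to one-row-per-observation data."""
    long_rows = []
    for row in rows:
        # Each measurement column becomes its own row, numbered by visit.
        for visit, column in enumerate(measure_columns, start=1):
            long_rows.append(
                {id_column: row[id_column], "visit": visit, value_name: row[column]}
            )
    return long_rows


long_rows = wide_to_long(
    wide_rows, "subject_id", ["weight_visit1", "weight_visit2"], "weight"
)
for row in long_rows:
    print(row)
```

Here the two individuals with two visits each become four rows, one per observation.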
Columns cannot mix characters and numbers. Consistent capitalization is important; e.g. "Placebo" is different from "placebo" in data analysis.
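Both problems can be screened for before sharing a file. The sketch below is one possible check, assuming each column has been read in as a list of text values; the function names and the sample columns are hypothetical.

```python
def is_number(value):
    """Return True if the text can be read as a number."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def mixes_text_and_numbers(values):
    """Flag a column whose non-blank entries are partly numeric and partly text."""
    non_blank = [v for v in values if v.strip() != ""]
    numeric = [is_number(v) for v in non_blank]
    return any(numeric) and not all(numeric)


def capitalization_variants(values):
    """Group distinct spellings that differ only by capitalization."""
    groups = {}
    for v in values:
        groups.setdefault(v.strip().lower(), set()).add(v.strip())
    return {key: sorted(found) for key, found in groups.items() if len(found) > 1}


# Made-up example columns.
dose_column = ["10", "20", "high", "15"]
group_column = ["Placebo", "placebo", "Treatment", "Placebo"]

print(mixes_text_and_numbers(dose_column))
print(capitalization_variants(group_column))
```

Note that a text missing-data code such as "NA" in an otherwise numeric column would also be flagged by this simple check.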
Choose variable names that reflect the measures for easier interpretation.
Colors, bold text, comments, etc. cannot be interpreted by statistical software. Each piece of information, such as group designation, must be in a separate column.
Missing data should be entered consistently for each variable. In comma-delimited format, a blank will be interpreted as a missing value. Other common designations are ".", "NA", or large negative numbers that are outside the range of possible values, e.g. -999.
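To illustrate why consistency matters, here is a minimal sketch of how an analyst might map missing-data codes to a single missing value when reading a comma-delimited file. The set of codes and the sample data are assumptions for this example; statistical packages each have their own conventions.

```python
import csv
import io

# Assumed missing-data codes; adjust to match what the dataset uses.
MISSING_CODES = {"", ".", "NA", "-999"}


def read_with_missing(csv_text):
    """Read comma-delimited text, replacing missing-data codes with None."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            name: (None if value.strip() in MISSING_CODES else value.strip())
            for name, value in row.items()
        }
        for row in reader
    ]


# Made-up sample file contents.
sample_csv = "subject_id,age_years,sex\nS01,34,1\nS02,,2\nS03,-999,NA\n"
cleaned_rows = read_with_missing(sample_csv)
for row in cleaned_rows:
    print(row)
```

If a code such as -999 could also be a legitimate value for some variable, it should not be used as a missing-data code for that variable.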
The data analysis approach is chosen to align with the study design. We will work with you to identify the primary and secondary analyses. During this process a primary outcome measure is identified. This process is important to reduce false positive findings.
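A short calculation shows why the number of outcomes tested matters. The sketch below makes the simplifying assumption that the tests are independent and that each uses a significance level of 0.05; real outcomes are usually correlated, so the numbers are only illustrative.

```python
def chance_of_any_false_positive(n_tests, alpha=0.05):
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


for n_tests in (1, 5, 10, 20):
    print(n_tests, round(chance_of_any_false_positive(n_tests), 3))
```

Under these assumptions, testing 10 outcomes gives roughly a 40% chance of at least one false positive, compared with 5% for a single pre-specified primary outcome.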
We strive to adopt new methods when appropriate, especially for emerging data types such as microbiome, genetics/genomics, and longitudinal data. Analyses are conducted in the statistical software of the analyst's choosing.