## Tables

#### Objective of Tables

To clearly identify the measures and variables being used in a study and determine levels of confidence in reporting results.

To begin, let us look at an example of a Table 1. Studies typically begin with a summary of participant characteristics that describes the sample being studied. This table is often referred to as "Table 1" and includes both continuous and categorical variables.

Suppose there is a drug treatment (Drug X) designed to reduce the risk of stroke among people aged 60 years or older with isolated systolic hypertension.

#### Table 1: Characteristics of Hypertension Participants

| Characteristic | Baseline: Active (N=2365) | Baseline: Placebo (N=2371) | Baseline: Total (N=4736) | After 6 Months: Active (N=2330) | After 6 Months: Placebo (N=2350) | After 6 Months: Total (N=4680) |
|---|---|---|---|---|---|---|
| Age, mean (SD), y | 71.6 (6.7) | 71.5 (6.7) | 71.6 (6.7) | 72 (6.7) | 72 (6.7) | 72 (6.7) |
| Systolic Blood Pressure, mean (SD), mmHg | 170.5 (9.5) | 170.1 (9.2) | 170.3 (9.4) | 160.5 (11) | 170.1 (9.2) | 165.3 (10.1) |
| Current Smokers (%) | 12.6 | 12.9 | 12.7 | 12.5 | 12.2 | 12.3 |
| Past Smokers (%) | 36.6 | 37.6 | 37.1 | 36.2 | 36.3 | 36.2 |
| Never Smokers (%) | 50.8 | 49.6 | 50.2 | 51.3 | 51.5 | 51.5 |

##### General Notes
• Label the table: Give each table a number and a title that concisely describes what the table represents.
• Footnote when necessary: Use footnotes at the bottom of the table to explain table information.
• Refer to the table in the text: Refer to the table by number ("As Table 1 indicates…").
• Test statistics: Report test statistics (t, F, χ², p) to 2 decimal places.
• Numerical precision: Keep numerical precision consistent throughout the paper.
  • Summary statistics (such as means) should not be given to more than one extra decimal place over the raw data.
  • Standard deviations or standard errors may warrant more precise values.
  • Regression analysis results also warrant precise values.
• Continuous variables: Summarize with means and report the standard deviation.
• Categorical variables: Summarize with frequencies and percentages.
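The summary conventions above (mean (SD) for continuous variables, n (%) for categorical ones) can be sketched in a few lines. This is a minimal illustration, not a standard tool: the helper names `continuous_row` and `categorical_rows` and the participant data are hypothetical.

```python
from collections import Counter
from statistics import mean, stdev

def continuous_row(values, decimals=1):
    """Summarize a continuous variable as 'mean (SD)' (hypothetical helper)."""
    return f"{mean(values):.{decimals}f} ({stdev(values):.{decimals}f})"

def categorical_rows(values):
    """Summarize a categorical variable as 'n (%)' for each category (hypothetical helper)."""
    counts = Counter(values)
    total = len(values)
    return {category: f"{count} ({100 * count / total:.1f})"
            for category, count in counts.items()}

# Hypothetical participants (not the Drug X data from Table 1).
ages = [62, 70, 75, 68, 81]
smoking = ["never", "past", "never", "current", "never"]

print("Age, mean (SD), y:", continuous_row(ages))
for category, cell in categorical_rows(smoking).items():
    print(f"{category.capitalize()} smokers, n (%):", cell)
```

Keeping the number of decimals as a parameter makes it easy to apply the same precision to every row, as the notes recommend.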

#### Simple Graphs and Correlation/Association

##### Study Type

Simple graphs are a good way to start exploring a set of data. For example, the set of histograms above, which shows the distribution of SBP by gender, raises the question of why the two groups look slightly different. Similarly, plotting two continuous variables against each other can reveal a positive or negative linear relationship (correlation) between them and prompt further ideas about why that relationship exists.

##### Confidence Intervals

A confidence interval is a range of values that we are confident contains the true population parameter. Confidence intervals should be distinguished from probabilities: a single computed interval either contains the true parameter or it does not, so we cannot attach a probability to the true value based on a single sample of data. Confidence intervals are calculated in different ways depending on the type of statistic being evaluated. In the example above, Systolic Blood Pressure (SBP) is a continuous variable for which a mean and standard deviation are given for all participants in the study.

###### To calculate a confidence interval: Unknown mean (μ) and known standard deviation (σ)

$$\bar{x} \pm Z \frac{\sigma}{\sqrt{n}}$$

Note: $\bar{x}$ is the sample mean, $n$ is the sample size, and the critical value ($Z$) for a 95% confidence interval is 1.96.
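As a worked example, the formula can be applied to the baseline SBP total from Table 1. This sketch assumes we treat the reported SD (9.4 mmHg) as the known σ; with a sample this large the t-based interval would be almost identical. The function name `z_confidence_interval` is illustrative.

```python
import math
from statistics import NormalDist

def z_confidence_interval(mean, sd, n, level=0.95):
    """CI for a mean with known SD: mean ± Z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for a 95% interval
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

# Baseline SBP, all participants (Table 1): mean 170.3, SD 9.4, N = 4736.
low, high = z_confidence_interval(170.3, 9.4, 4736)
print(f"95% CI for mean SBP: ({low:.2f}, {high:.2f}) mmHg")
```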

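###### To calculate a confidence interval: Unknown mean (μ) and unknown standard deviation (σ)

$$\bar{x} \pm t_{n-1} \frac{s}{\sqrt{n}}$$

Note: $s$ is the sample standard deviation and $t_{n-1}$ is the critical value from the t distribution with $n-1$ degrees of freedom. When a proportion is the statistic being examined, confidence intervals are generated in a different way.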
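For a proportion, one common approach (among several, such as the Wilson interval) is the normal-approximation or Wald interval, $\hat{p} \pm Z\sqrt{\hat{p}(1-\hat{p})/n}$. The sketch below applies it to the baseline percentage of current smokers in Table 1; the choice of the Wald method and the function name are assumptions for illustration.

```python
import math
from statistics import NormalDist

def wald_proportion_ci(p_hat, n, level=0.95):
    """Normal-approximation (Wald) CI for a proportion: p ± Z * sqrt(p(1 - p) / n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin

# Baseline current smokers, all participants (Table 1): 12.7% of N = 4736.
low, high = wald_proportion_ci(0.127, 4736)
print(f"95% CI: {100 * low:.1f}% to {100 * high:.1f}%")
```

The Wald interval can behave poorly when the proportion is near 0 or 1 or the sample is small, which is one reason other methods exist.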

###### Benefit of Confidence Intervals
• A confidence interval shows the imprecision of a sample statistic (for example, a mean) that results from variability in the factor being investigated and from the limited size of the study.
• The width of a confidence interval depends on the standard error (which shrinks as the sample size grows) and on the chosen confidence level.
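The effect of sample size and confidence level on width can be seen directly. This sketch reuses the Z-based formula with the baseline SBP SD from Table 1 and hypothetical sample sizes; quadrupling n halves the width, and a 99% interval is wider than a 95% one.

```python
import math
from statistics import NormalDist

def ci_width(sd, n, level=0.95):
    """Full width (upper minus lower) of a Z-based CI for a mean: 2 * Z * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z * sd / math.sqrt(n)

# Hypothetical study sizes; SD of 9.4 mmHg taken from the baseline SBP total in Table 1.
for n in (100, 400, 1600):
    print(f"n = {n:5d}: 95% CI width = {ci_width(9.4, n):.2f} mmHg")

# A 99% interval is wider than a 95% interval for the same data.
print(f"n =   400: 99% CI width = {ci_width(9.4, 400, level=0.99):.2f} mmHg")
```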

#### Measures of Difference and Simple Graphical Presentation of Results

##### Categorical vs. Continuous data
• Categorical data is data that takes on a limited number of values. As the name implies, the data fit into discrete categories. For example, procedure type = "dental" and hospital = "Children’s Hospital" would be categorical data. Categorical data can also be recorded as numbers. For example, ASA (an anesthesiology health score) can take on the values 1-5, and each patient is put into a category based on their health.
• Continuous data is data that can take on many values, too many to create specific categories. For example, pediatric doctors measure each patient’s height in centimeters, which can take on many different values up to 150 cm, so height is continuous data. Other examples include volume, cost, and time.
##### Analyzing Categorical Data

Let’s go back to our ASA example, which can take on the values 1-5. Suppose you want to compare the ASA values of patients at Children’s Hospital with those of patients at University of Colorado Hospital. Your table would look something like this:

| ASA | Children’s Hospital | University of Colorado Hospital |
|---|---|---|
| 1 | 20 | 36 |
| 2 | 34 | 49 |
| 3 | 23 | 19 |
| 4 | 15 | 14 |
| 5 | 0 | 2 |

This table shows how many patients fall into each category based on ASA value (the outcome) and hospital (the exposure). There are many ways to analyze these data, depending on your research question.

###### Chi-square test

The chi-square test is used for categorical variables; here it tests whether ASA level is associated with hospital, that is, whether the distribution of ASA levels differs between the two hospitals. Pay attention to the cell counts: the test assumes an expected count of at least 5 in each cell. Because the ASA 5 row contains only 2 patients in total (0 and 2 observed), its expected counts fall well below 5, so the chi-square test is not appropriate here. When the chi-square test is not an option, Fisher’s exact test can be used instead.
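The expected counts behind this check are (row total × column total) / grand total. The sketch below computes them for the ASA table, flags cells below 5, and also computes the Pearson chi-square statistic; it is an illustration with hypothetical helper names, and a p-value (or Fisher’s exact test) would normally come from statistical software.

```python
# ASA level -> (Children's Hospital, University of Colorado Hospital), from the table above.
ASA_COUNTS = {1: (20, 36), 2: (34, 49), 3: (23, 19), 4: (15, 14), 5: (0, 2)}

def expected_counts(table):
    """Expected count for each cell: row total * column total / grand total."""
    rows = list(table.values())
    row_totals = [sum(r) for r in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand = sum(row_totals)
    return {level: tuple(rt * ct / grand for ct in col_totals)
            for level, rt in zip(table, row_totals)}

def chi_square_statistic(table):
    """Pearson chi-square: sum over cells of (observed - expected)^2 / expected."""
    exp = expected_counts(table)
    return sum((o - e) ** 2 / e
               for level, observed in table.items()
               for o, e in zip(observed, exp[level]))

expected = expected_counts(ASA_COUNTS)
small = [(level, round(e, 2)) for level, cells in expected.items() for e in cells if e < 5]
print("Cells with expected count < 5:", small)
print("Chi-square statistic:", round(chi_square_statistic(ASA_COUNTS), 2))
```

Both ASA 5 cells come out with expected counts near 1, which is why the text recommends Fisher’s exact test for these data.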

###### Risk Ratio (RR)

The risk ratio compares the risk of a certain event in one group with the risk of that event in another group.

• Risk 1 = P(Disease | Exposed to causal factor)
• Risk 2 = P(Disease | Not exposed to causal factor)
• Risk Ratio (RR), also called relative risk = Risk 1 / Risk 2
###### Interpretation

The risk of disease for a person exposed to the causal factor is RR times the risk for a person who was not exposed.
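The risk ratio calculation can be sketched directly from the definitions of Risk 1 and Risk 2. The cohort counts below are hypothetical and chosen only to make the arithmetic easy to follow.

```python
def risk_ratio(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk ratio = P(disease | exposed) / P(disease | not exposed)."""
    risk1 = exposed_cases / exposed_total      # Risk 1
    risk2 = unexposed_cases / unexposed_total  # Risk 2
    return risk1 / risk2

# Hypothetical cohort: 30 of 200 exposed and 15 of 300 unexposed people develop the disease.
rr = risk_ratio(30, 200, 15, 300)
print(f"RR = {rr:.2f}")  # prints "RR = 3.00": the exposed group's risk is 3 times that of the unexposed group
```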

• Risk Difference = Risk 1 - Risk 2 (interpreted as the excess risk in group one vs. group two)
• Odds Ratio (OR): the odds of an event in one group divided by the odds of that event in another group, where the odds of an event are the probability that it happens divided by the probability that it does not.

Example:

• Odds of treatment among those with cancer: those with cancer who got the treatment / those with cancer who did not get the treatment (A/B)
• Odds of treatment among those without cancer: those without cancer who got the treatment / those without cancer who did not get the treatment (C/D)
• OR = (A/B)/(C/D) = AD/BC

Interpretation: the odds of cancer in the group that got the treatment are OR times the odds in the group that did not get the treatment; an OR below 1 means the treated group has lower odds of cancer.
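Using the A, B, C, D labels from the example, the odds ratio can be computed as a ratio of odds or as the cross-product AD/BC; the two forms give the same value. The counts below are hypothetical.

```python
def odds_ratio(a, b, c, d):
    """OR = (A/B) / (C/D), which equals (A*D) / (B*C)."""
    return (a / b) / (c / d)

# Hypothetical counts:
# A = cancer & treated, B = cancer & not treated,
# C = no cancer & treated, D = no cancer & not treated.
a, b, c, d = 20, 40, 80, 60
or_value = odds_ratio(a, b, c, d)
print(f"OR = {or_value:.3f}")              # 0.375: lower odds of cancer with treatment
print(f"AD/BC = {(a * d) / (b * c):.3f}")  # same value as a cross-product ratio
```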

##### Analyzing Continuous Data
###### T-test

There are two common types of t-test. 1) If you want to see whether your sample has a different mean than a population value, use a one-sample t-test. 2) If you want to see whether two populations have different means, use a two-sample t-test. If your data are dependent (e.g., two measurements on the same person), use a paired t-test. Note: To compare the means of three or more groups, use an ANOVA (analysis of variance), which extends this comparison beyond two groups.
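A two-sample t statistic can be computed from summary statistics alone. This sketch compares SBP after 6 months between the Active and Placebo groups in Table 1 (which are independent groups, so a paired test does not apply). It uses Welch’s unequal-variance version, which is our choice rather than something the text specifies; the p-value would then be read from the t distribution with the reported degrees of freedom, typically via statistical software.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's two-sample t statistic and approximate degrees of freedom from summary statistics."""
    v1 = sd1 ** 2 / n1  # squared standard error of mean 1
    v2 = sd2 ** 2 / n2  # squared standard error of mean 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    # Welch-Satterthwaite approximation to the degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df

# SBP after 6 months (Table 1): Active 160.5 (SD 11), N=2330; Placebo 170.1 (SD 9.2), N=2350.
t, df = welch_t(160.5, 11, 2330, 170.1, 9.2, 2350)
print(f"t = {t:.2f} on about {df:.0f} degrees of freedom")
```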

###### Regression techniques

Simple regression is used when you are interested in the relationship between an outcome and a single predictor. Ex: does age predict ASA score? Multiple regression is used when you are interested in the effect of more than one predictor variable. Ex: do age and hospital predict ASA score?
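Simple linear regression by ordinary least squares can be sketched with the textbook formulas slope = Sxy / Sxx and intercept = ȳ - slope · x̄. The age and SBP values are hypothetical (a continuous outcome is used here for clarity); with several predictors, the same least-squares idea is usually fit with statistical software.

```python
from statistics import mean

def simple_linear_regression(x, y):
    """Ordinary least squares fit of y = intercept + slope * x for one predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical data: age (years) as predictor, SBP (mmHg) as outcome.
ages = [60, 65, 70, 75, 80]
sbp = [150, 156, 161, 168, 175]
intercept, slope = simple_linear_regression(ages, sbp)
print(f"SBP ≈ {intercept:.1f} + {slope:.2f} × age")
```

Here the slope means that, in these made-up data, each additional year of age is associated with about 1.24 mmHg higher SBP.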

##### Confidence Intervals

Once you calculate an estimated mean or a measure of association (such as a risk ratio or odds ratio) for categorical data, you can also calculate a confidence interval around that estimate. This is important because it gives a measure of uncertainty about where the true value lies. A calculated confidence interval is read as "We have a certain level (e.g., 95%) of confidence that the true population parameter is within this interval."
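For a measure of association such as the odds ratio, one widely used approach (Woolf’s method, named here as our choice) builds the interval on the log scale, using SE(ln OR) = √(1/A + 1/B + 1/C + 1/D), and then exponentiates the limits. The 2×2 counts below are hypothetical.

```python
import math
from statistics import NormalDist

def odds_ratio_ci(a, b, c, d, level=0.95):
    """Odds ratio AD/BC with a Woolf (log-scale) confidence interval."""
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    low = math.exp(math.log(or_hat) - z * se_log_or)
    high = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, low, high

# Hypothetical 2x2 counts: a = cancer & treated, b = cancer & untreated,
# c = no cancer & treated, d = no cancer & untreated.
or_hat, low, high = odds_ratio_ci(20, 40, 80, 60)
print(f"OR = {or_hat:.3f}, 95% CI ({low:.2f}, {high:.2f})")
```

Because this interval lies entirely below 1, these made-up data would be consistent with lower odds of cancer in the treated group.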

##### Graphing Data
• Scatterplot: used when graphing two continuous variables
• Bar Chart: visually compare groups
• Histogram: displays the distribution
• Box Plot: displays the median, interquartile range, and outliers
• Line Graphs: primarily used for longitudinal data (tracking over time)
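The quantities a box plot draws can be computed before any plotting. This sketch uses Python’s `statistics.quantiles` (default "exclusive" method) and the common 1.5 × IQR rule for outliers; other software may use slightly different quartile definitions. The SBP readings are hypothetical.

```python
from statistics import quantiles

def box_plot_summary(data):
    """Quartiles, IQR, and 1.5 x IQR outlier fences, as commonly drawn in a box plot."""
    q1, median, q3 = quantiles(data, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outliers = [x for x in data if x < low_fence or x > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr, "outliers": outliers}

# Hypothetical SBP readings (mmHg).
sbp = [150, 152, 155, 158, 160, 161, 163, 165, 170, 200]
print(box_plot_summary(sbp))
```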

Know your assumptions. In general, we assume that the data were collected without bias, that continuous variables are normally distributed, and that there are no unmeasured variables that actually explain the difference between the two means.


#### Center for Innovative Design & Analysis (CIDA)

Formerly known as the Colorado Biostatistics Consortium (CBC)
13001 17th Place | Mail Stop B119 | Room 100, Building 406 | Aurora, CO 80045