## General Statistics Terminology

To begin, we must distinguish between what statistics defines as population data and sample data. A population is the entire set of people or things in a specified group. Characteristics of a population are called parameters. A sample is a subset of a population. Characteristics of a sample are called statistics.
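
The distinction between a parameter and a statistic can be sketched in a few lines of Python. The population below is simulated and entirely hypothetical (heights drawn from a normal distribution with an assumed mean of 170 cm); the sketch only shows that the parameter comes from every member of the population, while the statistic comes from a subset.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: simulated heights (cm) of 10,000 people.
population = [random.gauss(170, 10) for _ in range(10_000)]

# Parameter: computed from every member of the population.
population_mean = statistics.fmean(population)

# Statistic: computed from a subset (a simple random sample of 100).
sample = random.sample(population, 100)
sample_mean = statistics.fmean(sample)

print(f"parameter (population mean): {population_mean:.2f}")
print(f"statistic (sample mean):     {sample_mean:.2f}")
```

The two values will usually be close but not identical; that gap is what inferential statistics tries to quantify.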

Biostatistics is the use of statistics for public health, biological, or medical applications, and it is applied to a variety of research topics and fields. The main goal is to use appropriate statistical methods to understand the factors that affect human health.

#### Variable Types

##### Qualitative (Categorical)
• Ordinal: Ordered categorical variables (ex. never, sometimes, frequently, always)
• Nominal: Unordered categorical variables (ex. hair color, gender)
##### Quantitative
• Continuous: Numerical variables that can take any value within a range (ex. height)
• Discrete: Numerical variables that take distinct, countable values (ex. number of bacteria)

#### Displaying Data

• Tables: Numerical summaries of frequencies, percentages, summary statistics, etc.
• Graphs: Visual representations of data:
  • Histogram: Bar graph of the frequencies of a numerical variable, grouped into bins
  • Scatterplot: Plot of two numerical variables against each other
  • Boxplot: Visual representation of the median, quartiles, and range (some variants also mark the mean)

### Study Design

#### Study Type

##### Observational study

The researcher observes an existing situation and makes inferences from it.

• Case-control: Study of existing groups that differ on the outcome (ex. patients with the disease vs. without it)
• Cross-sectional: Study observing patients at a single point in time (used to estimate prevalence)
• Cohort: Study that follows a group of similar individuals who differ with respect to certain factors, to determine how those factors affect an outcome of interest
##### Experimental

Researcher randomly assigns individuals to treatment groups.

• Randomization: Technique that assigns individuals to groups by chance, so that other variables are balanced across groups on average and the true treatment effect can be observed
• Placebo: Inactive treatment with no therapeutic effect, given to a comparison group
• Blinding: Treatment assignment is unknown to the patient, the doctor, or both
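
As an illustration of random assignment, here is a minimal sketch of simple randomization into two arms of equal size. The function name `randomize`, the arm labels, and the participant IDs are all hypothetical; real trials typically use more elaborate schemes (for example, block or stratified randomization).

```python
import random

def randomize(participants, arms=("treatment", "placebo"), seed=None):
    """Randomly assign participants to arms of (nearly) equal size."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants out to the arms in turn.
    return {arm: shuffled[i::len(arms)] for i, arm in enumerate(arms)}

participant_ids = [f"P{n:02d}" for n in range(1, 21)]  # hypothetical IDs
assignment = randomize(participant_ids, seed=1)

for arm, members in assignment.items():
    print(arm, members)
```

Because assignment depends only on the shuffle, neither the researcher nor the participant influences who ends up in which arm.
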
##### Hypothesis

A detailed, testable prediction that addresses a scientific question.

• Null hypothesis: There is no relationship or difference among the groups
• Alternative hypothesis: There is a relationship or difference among the groups
• P-value: Probability of observing a difference at least as extreme as the one in the data, assuming the null hypothesis is true
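
One way to make the p-value concrete is a permutation test: if the null hypothesis is true, group labels are interchangeable, so shuffling them shows how often chance alone produces a difference as large as the observed one. The sketch below is only an illustration, not a recommended analysis; the function name `permutation_p_value` and the two groups of measurements are hypothetical.

```python
import random
import statistics

def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(statistics.fmean(group_a) - statistics.fmean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels carry no information,
        # so reshuffle them and recompute the difference.
        rng.shuffle(pooled)
        diff = abs(statistics.fmean(pooled[:n_a]) - statistics.fmean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    return extreme / n_permutations

# Hypothetical measurements for two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.4]
group_b = [4.2, 4.8, 4.5, 4.1, 4.9, 4.4]

p_value = permutation_p_value(group_a, group_b)
print(f"p-value: {p_value:.4f}")
```

A small p-value means that a difference this large would rarely arise if the null hypothesis were true; it is not the probability that the null hypothesis is true.
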
##### Sample Size Justification

Method to ensure there are enough observations to detect a statistically significant difference between groups when they are, in fact, biologically different.

Significance level (α): Threshold for the p-value at or below which the null hypothesis is rejected. Standard values for α include 0.05, 0.01, and 0.001.

• If the p-value is less than or equal to α, the null hypothesis is rejected
• If the p-value is greater than α, we fail to reject the null hypothesis

Power: Probability of detecting a difference when a difference truly exists

Effect size: Magnitude of the difference between the groups being compared; for sample size planning, the smallest clinically meaningful difference
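
Significance level, power, and effect size combine into a sample size estimate. As a rough sketch, the widely used normal-approximation formula for comparing two group means is n = 2(z₁₋α/₂ + z_power)² σ² / δ² per group, where δ is the effect size and σ the standard deviation. The function name and the numbers below are hypothetical, and the formula assumes equal group sizes, a common standard deviation, and a two-sided test.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided comparison of two means."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_power = NormalDist().inv_cdf(power)          # quantile for desired power
    n = 2 * ((z_alpha + z_power) * sd / effect_size) ** 2
    return math.ceil(n)  # round up: observations cannot be fractional

# Hypothetical planning values: detect a 5-unit difference when sd is 10.
n = sample_size_per_group(effect_size=5, sd=10)
print(f"observations needed per group: {n}")
```

With these hypothetical inputs the function returns 63 per group. Lowering α or raising power increases the required sample size, as does a smaller effect size.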

### Study Analysis

#### Bias

Any systematic error, which can arise in multiple areas of a study (e.g., study design, measurement technique, or analysis), that causes a parameter to be over- or underestimated and leads to false conclusions.

#### Descriptive Statistics

Characterizing data using graphs, tables, and numerical summaries.

Measures of Location
• Mean: Average of the data
• Median: Middle point of the ordered data
• Mode: Most frequently occurring data point

Measures of Spread
• Standard deviation: Typical distance of the data points from the mean
• Interquartile range: Difference between the 75th percentile and the 25th percentile
• Range: Difference between the largest and smallest values

Outliers: Data points that lie far from the rest of the data

Frequency: The count or proportion of observations taking each value of a single variable
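
The summaries above can be computed with Python's standard library. This is a minimal sketch on a small, hypothetical data set; the 1.5 × IQR rule used to flag outliers is one common convention and an assumption here, not a definition from this text.

```python
import statistics
from collections import Counter

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 30]  # hypothetical measurements

# Measures of location
mean = statistics.fmean(data)     # 7.0
median = statistics.median(data)  # 5.0
mode = statistics.mode(data)      # 5

# Measures of spread
sd = statistics.stdev(data)                  # sample standard deviation
q1, _, q3 = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1
value_range = max(data) - min(data)          # 28

# Outliers, flagged with the 1.5 x IQR convention (an assumption)
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

# Frequency: counts and proportions for each distinct value
frequency = Counter(data)
proportions = {value: count / len(data) for value, count in frequency.items()}

print(mean, median, mode, round(sd, 2), iqr, value_range, outliers)
```

Note how the single extreme value pulls the mean well above the median, which is why the median and interquartile range are often preferred for skewed data.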

#### Inferential Statistics

Drawing conclusions about populations based on samples.

• Confidence intervals: Ranges that combine sample statistics and standard errors to estimate population parameters
• Standard error: Uncertainty of a sample statistic, such as the sample mean, as an estimate of the population parameter
• Statistical tests: Procedures that assess whether an observed difference or association is larger than chance alone would explain
  • The test performed depends on the variable type, the number of comparisons, and the underlying distribution of the population
  • Comparisons can be between two groups or more than two, and the groups can be independent or paired
  • Tests can be parametric (assuming a distribution, usually normal) or non-parametric (no assumed distribution)
  • Types of statistical tests: t-tests, z-tests, F-tests, chi-squared, ANOVA, regression, correlation, etc.
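
To show how a sample statistic and its standard error combine into a confidence interval, here is a minimal sketch for a population mean. It uses a normal (z) critical value, which is a large-sample approximation; for a small sample like this one, a t critical value would normally be used and would give a somewhat wider interval. The function name and the blood pressure readings are hypothetical.

```python
import math
import statistics
from statistics import NormalDist

def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for a mean (normal approximation)."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return mean - z * standard_error, mean + z * standard_error

# Hypothetical systolic blood pressure readings (mmHg).
blood_pressure = [118, 125, 130, 122, 135, 128, 121, 127, 133, 124]

lower, upper = mean_confidence_interval(blood_pressure)
print(f"95% CI for the mean: ({lower:.1f}, {upper:.1f})")
```

Raising the confidence level widens the interval, and a larger sample narrows it because the standard error shrinks with the square root of n.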
