To begin, we must first identify the differences between what statistics defines as population data and sample data. A **population** is the entire set of people or things in a specified group. Characteristics of a population are called **parameters**. A **sample** is a subset of a population. Characteristics of a sample are called **statistics**.
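As a minimal sketch of the distinction, the snippet below simulates a population and draws a sample from it. The population (10,000 heights in cm), its mean and spread, and the sample size of 50 are all hypothetical values chosen for illustration.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: heights (cm) of 10,000 people.
population = [random.gauss(170, 10) for _ in range(10_000)]

# A sample is a subset of the population, drawn here without replacement.
sample = random.sample(population, 50)

population_mean = statistics.fmean(population)  # a parameter
sample_mean = statistics.fmean(sample)          # a statistic

print(f"Population mean (parameter): {population_mean:.1f}")
print(f"Sample mean (statistic):     {sample_mean:.1f}")
```

In practice the parameter is usually unknown, and the statistic computed from the sample is used to estimate it.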

**Biostatistics** is the use of statistics for public health, biological, or medical applications, and it is applied to a variety of research topics and fields. The main goal is to use appropriate statistical methods to understand the factors that affect human health.

- **Ordinal:** Ordered categorical variables (e.g., never, sometimes, frequently, always)
- **Nominal:** Unordered categorical variables (e.g., hair color, gender)

- **Continuous:** Numerical variables that can take any value within a range (e.g., height)
- **Discrete:** Numerical variables that can be counted (e.g., number of bacteria)

- **Tables:** Numerical summary of frequencies, percentages, summary statistics, etc.
- **Graphs:** Visual representations of data:
  - **Histogram:** Bar graph of frequencies
  - **Scatterplot:** Plot of two numerical variables
  - **Boxplot:** Visual representation of the median, quartiles, and range (some variants also mark the mean)

In an observational study, the researcher observes an existing situation and makes inferences.

- **Case-control:** Study of existing groups differing on outcome (e.g., patients with a disease vs. patients without it)
- **Cross-sectional:** Study observing patients at a single point in time (used to measure prevalence)
- **Cohort:** Study that follows a group of similar individuals who differ with respect to certain factors, to determine how those factors affect an outcome of interest

In an experimental study, the researcher randomly assigns individuals to treatment groups.

- **Randomization:** Technique that assigns individuals to groups by chance, so that other variables are balanced across groups on average and the true effect of the treatment can be observed
- **Placebo:** Treatment with no therapeutic effect, given to a comparison group
- **Blinding:** Treatment assignment is unknown to the patient, the doctor, or both
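Random assignment can be sketched in a few lines. The function name `randomize`, the participant IDs, and the two-arm equal split are illustrative assumptions; real trials often use more elaborate schemes (for example blocked or stratified randomization).

```python
import random

def randomize(participants, seed=None):
    """Randomly split participants into two arms (treatment, placebo)."""
    rng = random.Random(seed)
    shuffled = list(participants)  # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"treatment": shuffled[:half], "placebo": shuffled[half:]}

# Hypothetical participant IDs P01 .. P20.
participants = [f"P{i:02d}" for i in range(1, 21)]
groups = randomize(participants, seed=1)
print(len(groups["treatment"]), len(groups["placebo"]))  # 10 10
```

Because chance alone decides who lands in each arm, characteristics such as age or disease severity tend to balance out between arms as the number of participants grows.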

A hypothesis is a detailed, testable prediction about a scientific question.

- **Null hypothesis:** There is no relationship among the groups
- **Alternative hypothesis:** There is a relationship among the groups
- **P-value:** Probability of observing a difference among the comparisons at least as extreme as the one found, assuming the null hypothesis is true
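One way to make the p-value concrete is an exact permutation test, shown here as a sketch rather than as the method any particular study would use. If the null hypothesis is true, group labels are interchangeable, so the p-value is the share of all relabelings whose difference in means is at least as large as the observed one. The function name and the data are hypothetical, and full enumeration is only practical for small groups.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in group means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(fmean(group_a) - fmean(group_b))

    at_least_as_extreme = 0
    total = 0
    # Enumerate every way of labeling n_a of the pooled values as "group A".
    for idx_a in combinations(range(len(pooled)), n_a):
        chosen = set(idx_a)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        diff = abs(fmean(a) - fmean(b))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            at_least_as_extreme += 1
        total += 1
    return at_least_as_extreme / total

# Hypothetical measurements for two small groups.
treated = [1, 2, 3]
control = [4, 5, 6]
print(permutation_p_value(treated, control))  # 0.1 (2 of 20 relabelings)
```

With only three observations per group, even complete separation cannot produce a p-value below 0.1, which illustrates why the number of observations matters.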

A sample size calculation is a method to ensure there are enough observations to find a statistical difference between groups when they are, in fact, biologically different.

**Significance level (α):** Threshold at or below which the null hypothesis is rejected. Standard values for α include 0.05, 0.01, and 0.001

- If the p-value is less than or equal to α, the null hypothesis is rejected
- If the p-value is greater than α, we fail to reject the null hypothesis
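The decision rule above can be written as a small function. The name `decide` and the default α of 0.05 are illustrative choices, not part of any library.

```python
def decide(p_value, alpha=0.05):
    """Apply the rejection rule for a given significance level alpha."""
    if not 0 <= p_value <= 1:
        raise ValueError("p_value must be between 0 and 1")
    if p_value <= alpha:
        return "reject the null hypothesis"
    return "fail to reject the null hypothesis"

print(decide(0.03))        # reject the null hypothesis
print(decide(0.03, 0.01))  # fail to reject the null hypothesis
print(decide(0.05))        # reject the null hypothesis (p equals alpha)
```

The same p-value can lead to different decisions under different α values, which is why α should be fixed before the data are analyzed.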

**Power:** Probability of detecting a difference when a difference truly exists

**Effect size:** Clinically meaningful difference between comparisons
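Significance level, power, and effect size together determine how many observations are needed. The sketch below uses a common textbook approximation for comparing the means of two equal-sized groups, n = 2((z₁₋α/₂ + z_power)·σ/δ)² per group. It assumes a two-sided test, a known common standard deviation σ, and normally distributed data; the function name and example numbers are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def sample_size_per_group(effect_size, sd, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size: smallest clinically meaningful difference in means
    sd: assumed common standard deviation in both groups
    """
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = standard_normal.inv_cdf(power)
    n = 2 * ((z_alpha + z_power) * sd / effect_size) ** 2
    return ceil(n)  # round up to a whole participant

# Detect a 5-unit difference when the standard deviation is 10.
print(sample_size_per_group(effect_size=5, sd=10))  # 63
```

Smaller effect sizes, lower α, or higher power all increase the required sample size.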

Bias is any systematic error that can occur in multiple areas of a study (e.g., study design, measurement technique, or analyses) and that will over- or underestimate a parameter and lead to false conclusions.

Descriptive statistics characterize data using graphs, tables, and numerical summaries.

Measure of Location | Description
---|---
Mean | Average of the data
Median | Middle point of the data
Mode | Most frequently occurring data point

Measure of Spread | Description
---|---
Standard deviation | Typical deviation of the data from the sample mean
Interquartile range | Difference between the 75th percentile and the 25th percentile
Range | Difference between the largest and smallest values

**Outliers:** Very extreme data points

**Frequency:** The proportions of values within a single variable
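The measures above can be computed with Python's standard library, as in the sketch below. The data are hypothetical, and the 1.5 × IQR rule used to flag outliers is one common convention, assumed here rather than taken from the text. Quartile values depend on the interpolation method; this uses the default of `statistics.quantiles`.

```python
import statistics
from collections import Counter

data = [2, 3, 3, 4, 5, 5, 5, 6, 7, 20]  # hypothetical measurements

# Measures of location
mean = statistics.fmean(data)
median = statistics.median(data)
mode = statistics.mode(data)

# Measures of spread
sd = statistics.stdev(data)                  # sample standard deviation
q1, _, q3 = statistics.quantiles(data, n=4)  # 25th, 50th, 75th percentiles
iqr = q3 - q1
data_range = max(data) - min(data)

# Outliers, flagged with the (assumed) 1.5 * IQR rule
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

# Frequency: proportion of observations taking each value
frequency = {value: count / len(data) for value, count in Counter(data).items()}

print(f"mean={mean}, median={median}, mode={mode}")
print(f"sd={sd:.2f}, IQR={iqr}, range={data_range}")
print(f"outliers={outliers}")
```

Here the single extreme value pulls the mean above the median, which is why the median is often preferred for skewed data.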

Inferential statistics draw conclusions about populations based on samples.

- **Confidence intervals:** Combining the sample statistics and standard errors to estimate population parameters
- **Standard error:** Uncertainty of the sample mean
- **Statistical tests:** Tests used to quantify the similarity between comparisons
  - The statistical test performed depends on the variable type, the number of comparisons, and the underlying distribution of the population
  - The number of comparisons can be between two or more groups, independent or paired
  - The distribution of the population can be parametric (normally distributed) or non-parametric (no assumed distribution)
  - Types of statistical tests: t-tests, z-tests, F-tests, chi-squared, ANOVA, regression, correlation, etc.
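A confidence interval for a mean combines the sample mean with its standard error. The sketch below uses the normal (z) approximation, which is an assumption better suited to large samples; for small samples a t-based interval is usually preferred and would be somewhat wider. The function name and the data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for a population mean."""
    n = len(sample)
    mean = fmean(sample)
    standard_error = stdev(sample) / sqrt(n)  # uncertainty of the sample mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    margin = z * standard_error
    return mean - margin, mean + margin

# Hypothetical sample of 12 measurements.
sample = [98, 102, 101, 97, 105, 99, 100, 103, 96, 104, 100, 95]
low, high = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({low:.1f}, {high:.1f})")
```

Raising the confidence level widens the interval, while a larger sample shrinks the standard error and narrows it.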
