Statistical Thinking

Sophie Lee

2024-08-14

Chapter 1:

What is statistical thinking?

What is statistics?

Statistics = science of data
  • Collection and storage of data
  • Data visualisation
  • Analysis of data
  • Interpretation of results
  • Communication of results

What is statistical thinking?

Statistics does not require complex analysis methods

Simplest approaches often the most effective

Statistical thinking = data-driven critical thinking

Why statistical thinking?

In 1990, 58% of the world’s population lived in low-income countries. What is the percentage today?

  1. Around 9%
  2. Around 37%
  3. Around 61%

Why statistical thinking?

In low-income countries across the world in 2022, what share of girls went to school until at least age 11?

  1. Around 20%
  2. Around 40%
  3. Around 60%

Why statistical thinking?

What share of babies in the UK were vaccinated against some disease in 2019 (before the coronavirus pandemic)?

  1. Around 40%
  2. Around 60%
  3. Around 90%

Why statistical thinking?

Necessary, not just in work but in personal life

Claims by news/social media often exaggerated or skewed

Human brains have a tendency to catastrophise and are not good at assessing risk

Course content

This course will not introduce complex analysis.

Focus on thinking critically about data, identifying patterns, describing data in simple terms.

Course content

Topics covered in this course will include:

  • Research questions: what they are, why they are important, how to formulate them
  • Biases: common biases and how to recognise them
  • Data visualisation: how to use graphs to explore data, investigate trends, and convey important messages
  • Summarising data and quantifying differences and trends
  • Inferential statistics: what they are and how to interpret them

Chapter 2:

Research questions and biases

Research questions

One of the most important parts of statistical analysis

Should be formulated before any data collection or analysis carried out

Must be clear, answerable, and concise

Often not formally documented but helps develop an analysis plan

Research questions

All research questions must contain a target population and outcome

Questions often contain comparison groups; these must also be fully defined

Can be helpful to use PICO approach

PICO approach

Population

Intervention

Comparison

Outcome

Target population

Target population that we wish to make inferences about

Described fully, all characteristics clearly defined

Example: young male adult offenders → male offenders aged between 18 and 20 at time of sentencing

Intervention/comparison

Optional comparison groups

Can be intervention, treatment, or just a characteristic

Research question must contain both I and C, or neither

Outcome

All research questions must define an outcome of interest

Must be measurable, specific, and relevant to the question

Type of variable should be defined as this determines appropriate visualisations, summaries, and analyses that can be used

Types of variables: numeric

Continuous: can take any value on a number scale, include decimal places

Examples: height, blood pressure, temperature

Types of variables: numeric

Discrete: can only take whole numbers or rounded numbers, e.g. counts

Types of variables: categorical

Categorical variables classified based on the number of groups/categories

Binary: two categories (yes/no, positive/negative)

Types of variables: categorical

Ordinal: more than 2 ordered categories (e.g. low/medium/high)

Types of variables: categorical

Nominal: more than 2 categories, no ordering

Example research question

Does a plant-based diet reduce cholesterol levels in obese adults?

Population: Obese adults → people aged 18 or over with a BMI over 30

Intervention: Plant-based diet

Comparison: Standard diet (control group)

Outcome: Difference in cholesterol level

Biases

Almost all data and analyses will have some kind of bias included

Important to consider before analysis plan decided

Can arise at data collection, analysis, interpretation, and communication stages

Selection bias

Some individuals more likely to be included in sample than others

Sample no longer random, cannot make inferences about target population

Recall bias

Participants asked to recall past events or experiences

Accuracy and completeness will differ

Not always trustworthy

Confirmation bias

Choosing to analyse or interpret data based on pre-conceived ideas

Inherent to human brains

Identify potential expectations before looking at data

Missing data

Missing data = holes in the dataset

Something we intended to collect but have not

Very common, not always obvious

Potential source of bias

Examples of missing data

  • Probation practitioners not adding data to administrative system as they are too busy
  • Questionnaires not complete as some questions are considered too personal by participants
  • Blood samples are dropped in a lab, losing the results, leaving holes in the data

Missing data

Impossible to truly know the reason for and impact of missing data

Best way to overcome missing data is to not have any!

Important to consider potential biases introduced by missing data and account for them in analysis

Be transparent when reporting missing data

Exercise 1:

research questions and missing data

Research questions: PICO

Does the introduction of a 4-day working week increase productivity in government departments?

P: Government departments

I: 4-day working week

C: Standard working week

O: Productivity

Research questions: PICO

What is the average time between an offence being committed and case completion for defendants dealt with at magistrates’ courts in the North West of England?

P: Defendants dealt with at magistrates’ courts in the North West

I

C

O: Time between offence committed and case completion

Research questions: PICO

How has the prison population in England and Wales changed compared to pre-pandemic levels?

P: Prisons in England and Wales

I: Pre-pandemic, i.e. prior to 2020

C: Post-pandemic

O: Prison population

Missing data

Probation practitioners were asked to record details in an administrative system for analytical purposes not directly related to their work.

Some practitioners were busy and forgot to record the data on the system, leading to holes in the data.

Give a scenario where this would introduce bias into the analysis, and another where this would not cause bias.

Chapter 3:

Data visualisation

Why data visualisation?

Powerful tool with multiple uses

Data exploration: identifying outliers, checking distributions

Analysis tool: generating hypotheses, identifying trends

Communication tool: conveying messages, sharing results

Choosing the most appropriate visualisation

  • Number of variables to display
  • Type of variable(s)
  • Intention of the visualisation
    • Explorative
    • Communicating results
    • Generating hypotheses

Data visualisations

  • Visualising a single, numeric variable
  • Visualisations to compare a numeric variable between groups
  • Visualising a single, categorical variable
  • Visualisations to compare a categorical variable between groups
  • Visualisations for two or more numeric variables
  • Visualising temporal data

Visualising a single, numeric variable

Histograms: identify outliers, check shape/distribution, identify peaks in data.

Histograms

Use the data to check outliers, see if they should be included

ons_code    authority                  sfa_2015
E08000025   Birmingham                 611.9105
-           Greater London Authority   1163.4927

Greater London Authority duplicates local authorities already in the data, so should be removed

Birmingham is not an error, but an outlier

Outliers should only be removed if they should not have been included in the data (e.g. errors or duplicates)

Comparing numeric variables between groups

Depends on the intention of the graph:

  • To show individuals
  • To compare distributions
  • To compare summary statistics

Bar chart of averages

Simple graph comparing average between groups

  • x-axis: Grouping variable
  • y-axis: Group average

Comparing numeric variables between groups

Bar chart of averages easily interpretable

Removes most of the information about a variable

No information about spread of distribution or overlap between groups

Boxplot adds information about spread to average

Boxplots

Add extra information compared to bar chart

Useful to compare summaries between groups

Easy to identify potential outliers and investigate

Still losing a lot of information

Dot plots

Show every observation as a dot

  • x-axis: Grouping variable
  • y-axis: Numeric outcome

May need to ‘jitter’ points if lots of overlap

Visualising a single categorical variable

Want to show distribution of observations between groups

Choice between visualising counts or proportions

Frequency tables give full information and can provide both counts and proportions/percentages

Area name          Total recorded crimes   Percentage of total crimes
Derbyshire         90,181                  21.51%
Leicestershire     103,806                 24.76%
Lincolnshire       57,234                  13.65%
Northamptonshire   62,116                  14.82%
Nottinghamshire    105,899                 25.26%

Bar chart

Categorical version of histogram

Length of bars = number of observations in each group

Simple, effective, easy to interpret

Pie charts

Each ‘slice’ of the pie represents the proportion of sample in a group

Compares groups in context of whole sample

Research shows people’s perceptions of dots, lines and bars are more accurate than angles and proportions

Comparing categorical variable between groups

Extensions of bar charts

  • Stacked bar chart
  • Side-by-side bar chart

Choice depends on whether comparing the counts or proportions between groups

Stacked bar chart

Length of bar is total number in each category

Bars are made up of multiple smaller bars, total in each category in each group

Useful when overall count in category is important as well as comparison

Side-by-side bar chart

Smaller bars are displayed side-by-side, clustered by category

Does not compare overall total in category but easier to compare counts in groups

Not useful when there are many categories and groups

Bar charts

All three charts show same information in different formats

Choice of visualisation depends on motivation:

  • Compare overall crime total between groups: stacked bar chart
  • Compare distribution of groups between categories: proportion bar chart
  • Compare group totals across all categories: side-by-side bar chart

Visualising 2 or more numeric variables

Scatterplots are used to visualise 2 numeric variables

Each observation represented by a point on the graph

  • y-axis: outcome, or dependent variable (if appropriate)
  • x-axis: explanatory variable

Scatterplots

Scatterplots have multiple purposes:

  • Exploring data: some outliers and errors only visible when plotting multiple variables
  • Analysis: check for trends/relationships and generate hypotheses
  • Check assumptions: linear or nonlinear relationship

Scatterplots

Scatterplots can be extended by changing the colour, size or shape of the points based on another variable

Additional variables should only be added if they do not overload the plot

Multiple, simpler graphs are better than one confusing plot

Visualising temporal data

Visualisations must make it clear repeated measures are related

Line graph most common temporal visualisation

  • x-axis: time variable
  • y-axis: numeric variable

Exercise 2:

Data visualisations

Chapter 4:

Summarising data

Summarising data

Allows us to explore and quantify aspects of the sample

Cannot be used to answer research question unless all information on target population is collected

Choice of summary depends on type of variable, distribution of data, and property we wish to quantify

Summarising categorical variables

Describe the distribution of observations between categories:

  • Proportion (0 → 1)
  • Percentage (0 → 100%)
  • Rate (0 → ∞)

Can use count but does not account for overall sample size

Summarising categorical variables

Example: number of recorded crimes in East Midlands, 2023

Total crimes in East Midlands: 419,236

Total crimes in Nottinghamshire: 105,899

Proportion of crimes in Nottinghamshire: 105,899 ÷ 419,236
= 0.2526

Percentage of crimes: 0.2526 × 100% = 25.26%

Rate of crimes: 0.2526 × 10,000 = 2,526 per 10,000 crimes recorded in the East Midlands
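The same arithmetic can be sketched in Python for every area at once. This is only an illustration using the counts from the slides; the function name `summarise` and the default denominator of 10,000 are choices made here, not part of the course material.

```python
# Recorded crime counts from the slides (East Midlands, 2023)
crimes = {
    "Derbyshire": 90_181,
    "Leicestershire": 103_806,
    "Lincolnshire": 57_234,
    "Northamptonshire": 62_116,
    "Nottinghamshire": 105_899,
}

total = sum(crimes.values())  # total recorded crimes across the five areas


def summarise(count, total, per=10_000):
    """Return (proportion, percentage, rate per `per`) for one category."""
    proportion = count / total
    return proportion, proportion * 100, proportion * per


for area, count in crimes.items():
    prop, pct, rate = summarise(count, total)
    print(f"{area:<18}{count:>9,}{prop:>9.4f}{pct:>8.2f}%{rate:>8.0f} per 10,000")
```

The count alone would not show that Nottinghamshire accounts for roughly a quarter of the region's recorded crimes; dividing by the total does.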

Area name          Total recorded crimes   Proportion of total crimes   Percentage of total crimes
Derbyshire         90,181                  0.2151                       21.51%
Leicestershire     103,806                 0.2476                       24.76%
Lincolnshire       57,234                  0.1365                       13.65%
Northamptonshire   62,116                  0.1482                       14.82%
Nottinghamshire    105,899                 0.2526                       25.26%

Summarising numeric variables

Summarised using the centre (average) and spread of sample

Choice of summary depends on distribution of variable

Measure of centre

When data are normally distributed, centre is given using the mean

Represents the peak of a normal distribution

Sum of the sample values, divided by the sample size

Measure of centre

Measure 10 high school students’ heights in centimetres (cm):

142.23, 149.58, 146.06, 160.42, 174.64, 172.54, 148.67, 143.00, 173.11, 168.72

Measure of centre

Find the mean height:

142.23 + 149.58 + 146.06 + 160.42 + 174.64 + 172.54 + 148.67 + 143.00 + 173.11 + 168.72 = 1578.97cm

Measure of centre

Find the mean height:

1578.97 ÷ 10 = 157.897cm
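As a quick check, the mean calculation can be reproduced in a few lines of Python. This is a minimal sketch using the ten heights from the slides; the variable names are illustrative.

```python
# Heights (cm) of the 10 high school students from the slides
heights = [142.23, 149.58, 146.06, 160.42, 174.64,
           172.54, 148.67, 143.00, 173.11, 168.72]

total = sum(heights)          # sum of the sample values
mean = total / len(heights)   # divided by the sample size

print(f"sum = {total:.2f}cm, mean = {mean:.3f}cm")
```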

Measure of centre

Where data are not normal, use median instead

Order sample from smallest to largest, select middle value

Uses less information than mean (less powerful) but always valid

Measure of centre

To find the median, first order heights smallest to largest:

142.23, 143.00, 146.06, 148.67, 149.58, 160.42, 168.72, 172.54, 173.11, 174.64

Measure of centre

Median is the middle value

142.23, 143.00, 146.06, 148.67, 149.58, 160.42, 168.72, 172.54, 173.11, 174.64

Measure of centre

Median is between 149.58cm and 160.42cm

Middle value: (149.58 + 160.42) ÷ 2 = 155cm
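The ordering and middle-value steps can be sketched in Python as follows. With an even sample size there is no single middle value, so the two central values are averaged. The variable names are illustrative.

```python
# Heights (cm) of the 10 high school students from the slides
heights = [142.23, 149.58, 146.06, 160.42, 174.64,
           172.54, 148.67, 143.00, 173.11, 168.72]

ordered = sorted(heights)  # smallest to largest
n = len(ordered)

if n % 2 == 0:
    # Even sample size: average the two central values
    median = (ordered[n // 2 - 1] + ordered[n // 2]) / 2
else:
    # Odd sample size: take the single middle value
    median = ordered[n // 2]

print(f"median = {median:.2f}cm")
```

Python's standard library gives the same result with `statistics.median(heights)`.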

Measure of centre

Before choosing summary, use a histogram to check distribution

When data are normally distributed, mean uses more of the data and gives centre of the sample

When data are skewed, mean is influenced by extreme values and longer tail

When data are normal, mean and median will be approximately equal

Measure of spread

Measures how wide or narrow a sample is

Simplest measure is the range

Either given as the smallest and largest values or the difference between these

Measure of spread

Find the range of the 10 high schoolers’ heights:

142.23, 149.58, 146.06, 160.42, 174.64, 172.54, 148.67, 143.00, 173.11, 168.72

Measure of spread

Find the range of the 10 high schoolers’ heights:

142.23, 143.00, 146.06, 148.67, 149.58, 160.42, 168.72, 172.54, 173.11, 174.64

Measure of spread

Range is either given as two values: 142.23cm, 174.64cm

Or as the difference between these: 174.64 - 142.23 = 32.41cm

Measure of spread

Range only uses most extreme values, loses lots of information

Interquartile range (IQR): the range of the middle 50%

Lower quartile: 25th percentile, 25% of sample lies below

Upper quartile: 75th percentile, 75% of sample lies below

Measure of spread

Find the IQR of the 10 high schoolers’ heights:

142.23, 149.58, 146.06, 160.42, 174.64, 172.54, 148.67, 143.00, 173.11, 168.72

Measure of spread

Find the upper and lower quartile:

142.23, 143.00, 146.06, 148.67, 149.58, 160.42, 168.72, 172.54, 173.11, 174.64

Measure of spread

IQR is either given as 2 values: 146.06cm, 172.54cm

Or as the difference between these: 172.54 - 146.06 = 26.48cm
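A short Python sketch of the range and IQR for the same heights is given below. It assumes the quartiles are found as the medians of the lower and upper halves of the ordered sample, which reproduces the values on the slides. Other quartile definitions exist (for example, `statistics.quantiles` interpolates between values), so software may report slightly different quartiles.

```python
import statistics

# Heights (cm) of the 10 high school students from the slides
heights = [142.23, 149.58, 146.06, 160.42, 174.64,
           172.54, 148.67, 143.00, 173.11, 168.72]

ordered = sorted(heights)

# Range: difference between the largest and smallest values
data_range = ordered[-1] - ordered[0]

# Quartiles as medians of each half (assumed method, see above)
half = len(ordered) // 2
lower_quartile = statistics.median(ordered[:half])
upper_quartile = statistics.median(ordered[-half:])
iqr = upper_quartile - lower_quartile

print(f"range = {data_range:.2f}cm")
print(f"IQR   = {lower_quartile:.2f}cm to {upper_quartile:.2f}cm ({iqr:.2f}cm)")
```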

Measure of spread

IQR is still discarding most of the sample

If sample is normally distributed, can use the standard deviation (SD)

Average difference between observations and the mean

Bigger SD → wider, flatter curve

Smaller SD → narrower, taller curve
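For reference, the sample SD is usually calculated as the square root of the average squared difference from the mean, dividing by n - 1 rather than n. A minimal Python sketch for the ten heights, written out step by step:

```python
import math
import statistics

# Heights (cm) of the 10 high school students from the slides
heights = [142.23, 149.58, 146.06, 160.42, 174.64,
           172.54, 148.67, 143.00, 173.11, 168.72]

mean = statistics.fmean(heights)

# Squared difference between each observation and the mean
squared_diffs = [(x - mean) ** 2 for x in heights]

# Sample SD: divide by n - 1, then take the square root
sd = math.sqrt(sum(squared_diffs) / (len(heights) - 1))

print(f"mean = {mean:.2f}cm, SD = {sd:.2f}cm")
```

The standard library function `statistics.stdev(heights)` performs the same calculation.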

Normal distribution

Normal distribution completely defined by the mean (peak) and SD (spread)

Normal distribution

Approximately 68% of the sample will lie within 1 standard deviation of the mean

Normal distribution

Approximately 95% of the sample will lie within 2 standard deviations of the mean

Normal distribution

Mean and SD can be used to check normal distribution assumption

Settlement funding assessment in England:

Mean = £50.85 million

SD = £75.06 million

If normal, 95% of the sample would lie between
£50.85 ± (75.06 × 2) million.


-£99.27 million and £200.97 million

Funding cannot be negative, so the sample is unlikely to be normally distributed
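This check can be sketched in Python using only the reported mean and SD. The multiplier of 2 follows the approximation on the slides, and the rule that funding cannot fall below zero is an assumption about the variable rather than something calculated from the data.

```python
# Reported summaries for the settlement funding assessment (£ million)
mean = 50.85
sd = 75.06

# Range that would contain roughly 95% of a normal sample
lower = mean - 2 * sd
upper = mean + 2 * sd
print(f"Expected 95% range if normal: {lower:.2f} to {upper:.2f} (£ million)")

# Assumption: funding amounts cannot be negative
if lower < 0:
    print("Range includes impossible negative values: normality is doubtful")
```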

Summarising numeric variables

Most appropriate summary depends on distribution (normal or not)

If normally distributed, use mean and SD

If not, these are invalid, use median and IQR

Mean and SD can be used to check normal distribution even without the full sample

Exercise 3:

Summarising data

Year   Mean wait (weeks)   SD wait (weeks)
2020   13.5                10.2
2021   19.1                15.6
2022   20.9                13.9
2023   22.8                14.7

Chapter 5:

Comparing groups

Comparing groups

Most appropriate choice depends on intention, type of outcome, nature of the relationship

  • Comparison of variable between groups
  • Investigating trends over time
  • Relationship between numeric variables

Comparing categorical outcomes

Compare summary statistics (proportions, percentages, or rates) between groups

Absolute difference: Subtract values

Relative difference: Divide values

Region          Recorded crimes   Violent crimes     Nonviolent crimes
East Midlands   419,236           161,865 (38.61%)   257,371 (61.39%)
West Midlands   572,937           233,851 (40.82%)   339,086 (59.18%)

Absolute difference: 40.8% - 38.6% = 2.2 percentage points

The percentage of recorded crimes that were violent in 2023 was 2.2 percentage points higher in the West Midlands than in the East Midlands

Relative difference: 40.8% ÷ 38.6% = 1.057

The percentage of recorded crimes that were violent in 2023 in the West Midlands was 1.057 times the percentage in the East Midlands

No difference = 1

Relative difference: 38.6% ÷ 40.8% = 0.946

The percentage of recorded crimes that were violent in 2023 in the East Midlands was 0.946 times the percentage in the West Midlands

Less than 1: reduction
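The absolute and relative differences above can be sketched in Python from the raw counts. The dictionary layout and function name are illustrative. Results calculated from unrounded percentages may differ very slightly from those based on the rounded values shown on the slides.

```python
# Recorded and violent crime counts from the slides (2023)
east = {"recorded": 419_236, "violent": 161_865}
west = {"recorded": 572_937, "violent": 233_851}


def percent_violent(region):
    """Percentage of recorded crimes that were violent."""
    return 100 * region["violent"] / region["recorded"]


east_pct = percent_violent(east)
west_pct = percent_violent(west)

absolute = west_pct - east_pct      # in percentage points; 0 = no difference
relative = west_pct / east_pct      # ratio; 1 = no difference
relative_inverse = east_pct / west_pct

print(f"Absolute difference: {absolute:.1f} percentage points")
print(f"Relative difference (West vs East): {relative:.3f}")
print(f"Relative difference (East vs West): {relative_inverse:.3f}")
```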

Comparing numeric outcomes

Compare measures of centre/average (mean or median) between groups

Most appropriate depends on distribution of sample in each group

Requires a histogram per group

Comparing numeric outcomes

Median most appropriate measure for comparison

Median NW: £57.17 million

Median YH: £58.49 million

Difference in medians: £57.17 million - £58.49 million

= -£1.32 million

Comparing variables over time

Visualised using line graph

Common comparisons: absolute difference, relative difference, or percentage change

Choice depends on intention, interpretation differs

Comparing variables over time

Absolute difference: 1,841,000 - 1,239,000 = 602,000

There were 602,000 fewer violent crimes reported in 2020 compared to 2010

Relative difference: 1,841,000 ÷ 1,239,000 = 1.486

There were 1.486 times as many violent crimes reported in 2010 compared to 2020

Comparing variables over time

The percentage change can also be found by converting the relative difference

Compare relative difference to no difference: 1.486 - 1 = 0.486

Convert the proportion change to percentage: 0.486 × 100%

= 48.6%

There were 48.6% more violent crimes reported in 2010 than 2020

Comparing variables over time

Percentage reduction found in similar way

Relative difference: 1,239,000 ÷ 1,841,000 = 0.673

Compare to no difference: 1 - 0.673 = 0.327 (32.7%)

There were 32.7% fewer violent crimes reported in 2020 than 2010
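The conversion from relative difference to percentage change can be sketched as a small Python function. The sketch assumes, as in the percentage statements above, that 1,841,000 is the 2010 figure and 1,239,000 is the 2020 figure; the function name is illustrative.

```python
def percentage_change(reference, value):
    """Percentage change of `value` compared to `reference`.

    Positive results are increases, negative results are reductions.
    """
    relative = value / reference   # relative difference (1 = no difference)
    return (relative - 1) * 100    # compare to no difference, then convert


crimes_2010 = 1_841_000  # assumed to be the 2010 figure
crimes_2020 = 1_239_000  # assumed to be the 2020 figure

# 2020 compared to 2010: a reduction
print(f"{percentage_change(crimes_2010, crimes_2020):.1f}%")

# 2010 compared to 2020: an increase
print(f"{percentage_change(crimes_2020, crimes_2010):.1f}%")
```

The two results differ in size because each uses a different year as the reference, which is why the direction of the comparison must always be stated.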

Relationship between numeric variables

Scatterplot used to visualise trends

Strength of relationship quantified using correlation coefficients

Choice of coefficients depends on if trend is linear or not

  • Linear trend: Pearson’s correlation coefficient
  • Nonlinear trend: Spearman’s correlation coefficient

Correlation coefficients

Take value between -1 and 1

Correlation of 0 means no association

Closer coefficient is to +1/-1, the stronger the positive/negative association is
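To illustrate the difference between the two coefficients, the sketch below calculates both by hand for a small made-up dataset (not from the course data) where y always increases with x, but along a curve. Spearman's coefficient is taken here as Pearson's coefficient applied to the ranks of the data, and the ranking helper assumes there are no tied values.

```python
import math


def pearson(x, y):
    """Pearson's correlation coefficient for two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def ranks(values):
    """Rank each value from 1 (smallest). Assumes no tied values."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(x, y):
    """Spearman's coefficient: Pearson's coefficient of the ranks."""
    return pearson(ranks(x), ranks(y))


# Made-up illustrative data: y rises with x, but not in a straight line
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]

print(f"Pearson:  {pearson(x, y):.3f}")
print(f"Spearman: {spearman(x, y):.3f}")
```

Pearson's coefficient falls below 1 because the trend is curved, while Spearman's is 1 because the ordering of the two variables matches exactly.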

Chapter 6:

Inferential statistics

What are inferential statistics?

Inferential statistics make inferences about target population based on a random, representative sample

Combine sample estimates with sample size and level of precision

Most common inferential statistics: p-values and confidence intervals

Measures of precision

Precision of an estimate quantified by standard error (SE)

Based on sample size and sample variability

Different formula for each type of estimate (e.g. mean, percentage, difference between means)

\(SE(\bar{x}) = \frac{SD}{\sqrt{n}}\)

For every parameter of interest:

  • Larger sample, higher precision → lower standard error
  • More variability, lower precision → higher standard error
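Both points can be seen by evaluating the standard error of the mean for a few combinations of SD and sample size. The sketch below starts from SD = 16.37 and n = 130, the values used in the confidence interval example in these slides; the other combinations are hypothetical.

```python
import math


def standard_error_of_mean(sd, n):
    """Standard error of a sample mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)


baseline = standard_error_of_mean(sd=16.37, n=130)
larger_sample = standard_error_of_mean(sd=16.37, n=520)      # hypothetical: 4x the sample
more_variable = standard_error_of_mean(sd=2 * 16.37, n=130)  # hypothetical: 2x the SD

print(f"baseline:       {baseline:.3f}")
print(f"larger sample:  {larger_sample:.3f}")
print(f"more variable:  {more_variable:.3f}")
```

Because of the square root, quadrupling the sample size only halves the standard error, whereas doubling the SD doubles it.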

Inferential statistics work based on the central limit theorem

Central limit theorem
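In brief, the central limit theorem says that if many random samples of the same size are taken from a population, their sample means will be approximately normally distributed around the population mean, with a spread equal to the standard error, provided the samples are large enough. The simulation below is a hedged illustration: the skewed exponential population, the sample size of 50, and the 2,000 repeats are all arbitrary choices made here.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the simulation is repeatable


def simulate_sample_means(n, repeats=2000):
    """Means of `repeats` random samples of size n from a skewed population."""
    return [
        statistics.fmean([random.expovariate(1.0) for _ in range(n)])
        for _ in range(repeats)
    ]


# Population: exponential distribution with mean 1 and SD 1 (heavily skewed)
n = 50
means = simulate_sample_means(n)

print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean = 1)")
print(f"SD of sample means:   {statistics.stdev(means):.3f}")
print(f"population SD / sqrt(n): {1 / math.sqrt(n):.3f}")
```

Even though the population is far from normal, the spread of the sample means should sit close to SD ÷ √n, which is the standard error formula used above.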

Confidence intervals

  • A range of values the true population parameter is compatible with
  • Based on sample estimate, precision, and confidence level

Confidence level   Number of SEs
80%                1.282
90%                1.645
95%                1.960
99%                2.576
99.9%              3.291

  • Based on the central limit theorem, can capture the range in which we would expect a given percentage of sample estimates to lie:

\(\bar{x} \pm 1.96 \times SE(\bar{x})\)
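The number of SEs for each confidence level comes from the standard normal distribution. As a sketch, the values in the table can be reproduced with `statistics.NormalDist` from Python's standard library; the helper name `se_multiplier` is illustrative.

```python
from statistics import NormalDist


def se_multiplier(confidence):
    """Number of SEs either side of the estimate for a two-sided interval."""
    tail = (1 - confidence) / 2          # probability left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.80, 0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%}: {se_multiplier(level):.3f}")
```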

Confidence interval example

First, we calculate the standard error of the mean:

\(SE = SD \div \sqrt{n}\)

\(= 16.37 \div \sqrt{130}\) \(= 1.436\)

Confidence interval example

mean = 266.55 days

Confidence level = 95%

Standard error = 1.436

95% confidence interval = [\(266.55 \pm 1.96 \times 1.436\)]

= [\(263.74, 269.37\)]
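The whole example can be sketched in Python from the three reported values (mean, SD, and sample size). Because these inputs are already rounded, the limits may differ from the slide in the final decimal place.

```python
import math

# Reported sample summaries from the example
mean = 266.55   # days
sd = 16.37      # days
n = 130

# Standard error of the mean
se = sd / math.sqrt(n)

# 95% confidence interval: estimate ± 1.96 standard errors
margin = 1.96 * se
lower, upper = mean - margin, mean + margin

print(f"SE = {se:.3f}")
print(f"95% CI = [{lower:.2f}, {upper:.2f}]")
```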

p-values

  • Probability of obtaining a result as extreme as, or more extreme than, the sample if the null hypothesis is true
  • Null hypothesis (H0): no difference/association

  • Low p-value: less evidence to support the null hypothesis
    • Very low p-value is known as statistically significant

Statistical significance

Often significance is defined by arbitrary cut-off (usually 0.05)

Be careful with these arbitrary definitions, it is not how probability behaves!

p < 0.05 is significant at the 5% level

We never accept or reject a null hypothesis

p-value example

Null hypothesis: population mean is 40 weeks (280 days)

Sample mean: 266.55 days

standard error of the mean: 1.436 days

Begin assuming the null hypothesis is true
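One way to carry this through is sketched below. It assumes a two-sided test based on the normal distribution, which the slides do not state explicitly: the distance between the sample mean and the null value is measured in standard errors, then converted to the probability of a result at least that far away in either direction.

```python
import math

null_mean = 280.0      # null hypothesis: 40 weeks (280 days)
sample_mean = 266.55   # days
se = 1.436             # standard error of the mean (days)

# Number of standard errors between the sample mean and the null value
z = (sample_mean - null_mean) / se

# Two-sided p-value under a normal distribution (assumed test)
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"z = {z:.2f} standard errors")
print(f"p-value = {p_value:.2e}")
```

The sample mean is more than 9 standard errors below 280 days, so the p-value is far below 0.001: a result this extreme would be very unlikely if the null hypothesis were true.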

Relationship between p-values and confidence intervals

Confidence intervals and p-values are based on the same information and so agree with one another

If a p-value is above 0.05, the null hypothesis value is less than 1.96 SEs away from the sample estimate. This means it will be within the 95% confidence interval

If the null hypothesis is outside the 99% confidence interval, it is over 2.576 SEs away from the sample estimate so p < 0.01

Exercise 4:

Inferential statistics

Final thoughts

Statistical thinking

Statistical analysis does not need to be complicated

Think critically about the data, the research question and any biases

Use visualisations to explore the data and convey messages in a clear, concise way

Quantify findings with summary statistics

Statistical thinking

None of the approaches here make causal inferences about the target population

Do not make causal statements unless using causal methods

Question findings, check past data, investigate unusual findings

Thank you for attending!