Alan Reifman, Texas Tech University
Dr. Reifman's Basic Intro Stat Lecture Notes
Education scholars David Berliner and Bruce Biddle, in their 1995 book, The Manufactured Crisis: Myths, Fraud, and the Attack on America's Public Schools, make the following argument (p. 316):
...we cannot understand why a person who dislikes mathematics and does not want to work in a science field should be forced to take calculus... If we had to nominate a topic in mathematics that is needed today by all informed citizens, it would be statistics.
Similarly, Arthur Benjamin, the "Mathemagician," endorses the study of statistics and probability as the most useful aspect of mathematics for most people.
Descriptive Statistics
Measures of Central (Typical) Tendency
MEAN (average): Add up all scores and divide by number of people (most commonly used).
MEDIAN: The score that has the same number of people falling above it as below it.
MODE: Most frequently occurring score.
For ordinal and ratio variables, which have quantitative meaning (e.g., a 3 signifies something more than a 2, a 2 signifies something more than a 1), all three descriptors -- mean, median, and mode -- can be used. For nominal variables, where people fall into non-quantitative categories such as gender (male/female) or religious affiliation (Protestant, Catholic, Jewish, Muslim, etc.), only the mode is applicable. Click for review of variable types.
Distribution of Annual Income from a Small High School Class 5 Years After Graduation
(total = 20 people; each little green square = one person)
*Anywhere between $20,000 and $30,000 would meet the definition of the median (10 people above and 10 below), but $25,000 is probably the most convenient value to use as the median.
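As a rough illustration of the three measures, here is a minimal Python sketch. The 20 incomes are invented for this example (they are not the values from the income distribution described above); they are chosen so that 10 people fall at or below $20,000 and 10 at or above $30,000, with one very high earner thrown in.

```python
import statistics

# Hypothetical incomes for 20 people (made up for illustration only).
incomes = (
    [10_000] * 4 + [15_000] * 3 + [20_000] * 3
    + [30_000] * 5 + [40_000] * 2 + [50_000] * 2
    + [1_000_000]  # a single outlier
)

mean_income = statistics.mean(incomes)      # add up all scores, divide by N
median_income = statistics.median(incomes)  # midpoint of the two middle scores
mode_income = statistics.mode(incomes)      # most frequently occurring score

print(mean_income, median_income, mode_income)
```

With these made-up numbers, the lone outlier pulls the mean up to $73,750, far above what most of the class earns, while the median stays at $25,000, halfway between the two middle incomes.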
(Graduate students: See "Descriptive Statistics" section of links on my intro statistics blog for further discussion of mean, median, and mode).
Another example of the distinction between mathematical accuracy and "common sense accuracy" stems from when Wilt Chamberlain scored 100 points in a single NBA basketball game, all by himself. Suppose, as a purely hypothetical example, Wilt's teammate York Larese went around telling people that he and Wilt combined for 109 points in a single game. The statement would be mathematically accurate (100 + 9 = 109), yet it would give a badly misleading impression of how the scoring was divided.
A couple of songs to reinforce these ideas...
The Mean
Lyrics by Alan Reifman
(May be sung to the tune of “Jolene,” Dolly Parton)
The mean, the mean, the mean, the mean,
You add up all the scores, and divide by N,
The mean, the mean, the mean, the mean,
The average you’ve computed, time and again,
You typically make sense to heed,
But with outliers, you mislead,
Because all exact values, it must know,
The median, won’t show much swing,
How far away, is not its thing,
Just so you’ve half above, and half below,
What comes up most, is called the mode,
For nominal, the only road,
Sometimes, multiple peaks, are in your frame,
When data shapes, follow the bell,
There's only one thing, left to tell,
Mean, median, and mode, are all the same,
The mean, the mean, the mean, the mean,
A useful stat to know, but it’s not all,
The mean, the mean, the mean, the mean,
Into a trap, you do not want to fall...
Why Do You Have to Be an Outlier?
Lyrics by Alan Reifman
(May be sung to "Heartbreaker," Gibb/Gibb/Gibb)
(WORK IN PROGRESS)
Why do you have to be an outlier?
Why must you so overwhelm the mean?
Fortunately, there is still the median,
So central trends can be seen
Examples of Variables That Tend to Be...
Normally Distributed | Non-Normally Distributed
Americans' Political Ideology | People's Numbers of Sexual Partners
People's Height | People's Alcohol Consumption
Students in my graduate statistics class should also see here.
To get an idea of the amount of spread or dispersion in a distribution, it can be helpful to take the ratio of the SD to the mean. For example, an SD of 5 with a mean of 10 (ratio of .5) suggests more spread than an SD of 5 with a mean of 1,000 (.005). This ratio is also known as the relative standard deviation.
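The ratio is simple enough to compute by hand, but a small Python sketch may help make it concrete. The function name `relative_sd` and the raw scores at the bottom are made up for this example; the first two calls use the numbers from the paragraph above.

```python
import statistics

def relative_sd(sd, mean):
    """Ratio of the standard deviation to the mean."""
    return sd / mean

# The two cases from the text above.
print(relative_sd(5, 10))     # 0.5
print(relative_sd(5, 1000))   # 0.005

# Same idea starting from raw scores (hypothetical values).
scores = [8, 9, 10, 11, 12]
ratio = relative_sd(statistics.stdev(scores), statistics.mean(scores))
print(round(ratio, 3))
```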
For my graduate students, here's the calculation of the top SD (3.27). Thanks to HB for the photo!
Explanation of why we divide by "n - 1" rather than just "n." Quoting from the linked document: "When examining the variance, it is very clear that the sample variance calculated with n is biased — it is systematically too low, while the variance calculated with n – 1 has a mean very close to [the true value]." That is, when a sample statistic is systematically too high or too low in estimating a population parameter, it is known as a biased estimator. An unbiased estimator "indicates that the values given by the estimator [statistic] from sample to sample will tend to be centered around the true value of the population parameter, rather than being consistently too high or too low." In other words, for an estimator to be unbiased, the statistics from repeated samplings should average out to the true population parameter, with no systematic tendency to overshoot or undershoot it.
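The bias can be seen in a quick simulation. The sketch below is only an illustration under assumptions of my choosing (a normal population with mean 0 and SD 1, samples of 5 people, 20,000 repeated samplings); it is not taken from the linked document.

```python
import random
import statistics

random.seed(1)  # fixed seed so the run is repeatable

TRUE_VARIANCE = 1.0   # population: normal with mean 0 and SD 1 (assumed)
SAMPLE_SIZE = 5
NUM_SAMPLES = 20_000

divide_by_n = []          # variance computed with n in the denominator
divide_by_n_minus_1 = []  # variance computed with n - 1 in the denominator

for _ in range(NUM_SAMPLES):
    sample = [random.gauss(0, 1) for _ in range(SAMPLE_SIZE)]
    divide_by_n.append(statistics.pvariance(sample))
    divide_by_n_minus_1.append(statistics.variance(sample))

avg_n = statistics.mean(divide_by_n)
avg_n_minus_1 = statistics.mean(divide_by_n_minus_1)
print(f"true variance:      {TRUE_VARIANCE:.3f}")
print(f"average with n:     {avg_n:.3f}")
print(f"average with n - 1: {avg_n_minus_1:.3f}")
```

With samples of 5, the "n" version should average out to roughly 0.8 (that is, (n - 1)/n of the true variance), while the "n - 1" version should average out close to 1.0.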
Here's a concrete example of why the standard deviation can sometimes be important. Ian Ayres writes in his book Super Crunchers, which is about "number-crunching" statistical analyses, as follows:
When I taught at Stanford Law School, professors were required to award grades that had a 3.2 mean. ...[S]tudents would ask if a professor was a "spreader" [wide range, high SD] or "clumper" [narrow range, low SD]. Good students would want to avoid clumpers so that they would have a better chance at getting an A, while bad students hated the spreaders who handed out more As but also more Fs (p. 201; segments in red inserted by Dr. Reifman).
Dr. Reifman wonders how this grading requirement is enforced. Perhaps non-compliant professors are made to teach 8:00 a.m. courses, stripped of their parking privileges, or made to dress up as the Stanford tree mascot!
Yet another perspective on what the standard deviation represents comes from the physicist James Kakalios in his book The Amazing Story of Quantum Mechanics (pp. 78-79):
The standard deviation is an indication of how much we can trust the average value resulting from this bell-shaped curve. The bigger the standard deviation, the greater the possible uncertainty in the average value...
Regarding the hypothetical scenario of a mean of 50 resulting from a distribution with some people scoring as low as 1 and others as high as 100, Kakalios writes that:
...the large size of the standard deviation would indicate that the average grade was not a particularly meaningful or insightful indicator of any given student's performance.
The Standard Deviation (and z)
Lyrics by Alan Reifman
(May be sung to the tune of “Born to Run,” Bruce Springsteen)
When we view our data, we must be sure, to absorb everything that we can glean,
How spread out are the, values of data, from what you’ve determined to be the mean?
The dispersion, of your data set,
Serves important purposes,
For other statistics you can get,
Whoa, distributions can have the same mean,
But they’re different – one is fat, one is lean,
Thus, they vary in a second way,
We’re going to learn the standard deviation today,
(Interlude)
When we want to find, something that’s called the z,
We start out with each person’s value,
From each of these, we subtract the mean,
And divide the difference by SD,
Relative to a person’s group,
A z-score shows where that individual stands,
Whoa, when the data come in normal style,
A z-score will also, tell you the percentiles,
So you must look beyond the mean,
You must examine the data’s spread,
You must notice the data’s SD…
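The z-score recipe in the lyrics (subtract the mean, divide by the SD) can be sketched in a few lines of Python. The score of 65, mean of 50, and SD of 10 are hypothetical values picked for the example, and the percentile step only makes sense if the data really are normally distributed.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """Subtract the mean from the value, then divide the difference by the SD."""
    return (value - mean) / sd

# Hypothetical example: a score of 65 in a group with mean 50 and SD 10.
z = z_score(65, 50, 10)

# If the data are normally distributed, z maps onto a percentile.
percentile = NormalDist().cdf(z) * 100

print(f"z = {z}, percentile = {percentile:.1f}")
```

Here the person stands 1.5 SDs above the group mean, which under a normal curve puts them at about the 93rd percentile.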
t-test
("t for two")
Compares two means, e.g., experimental vs. control group; men vs. women
t is based on the following (actual formula a bit more involved):
(Mean_{Group 1} - Mean_{Group 2}) / (Spread [SD's] of the groups' data points)
t is increased (difference more likely to be significant, see below) when:
Two groups' means are very different.
The groups' spreads (SD's) are small.
EXAMPLE: Students are randomly assigned to one of two groups (each group with 10 students), an experimental group (E) that receives new math teaching techniques or a control group (C) that receives the "usual" math instruction (independent variable = instruction type). At the end of the term, all students are given a 25-point test (dependent variable). The black rows (below) represent scores on the test, with the red and blue blocks indicating how many students got each score.
Scenario 1
Mean_{C} = 12 | Mean_{E} = 18
SD_{C} = 0.82 | SD_{E} = 0.82
t = 16.36, p < .000001 (Result SIGNIFICANT: Extremely unlikely to obtain such dramatic difference between E&C means by chance IF the null hypothesis of no mean difference in the population were true; REJECT null hypothesis)
Clear gap between groups
Score | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22
Number of C students | 0 | 0 | 0 | 3 | 4 | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Number of E students | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 3 | 4 | 3 | 0 | 0 | 0
Mean and standard deviation can be obtained from the raw data (separately for each group) at this site. The resulting numbers can then be plugged in at this site to obtain the t-test. Significance (probability) level can be obtained at this site (df = combined sample size [20] minus 2 = 18).
Scenario 2
Mean_{C} = 12 | Mean_{E} = 18
SD_{C} = 2.45 | SD_{E} = 2.45
t = 5.48, p < .00005 (also SIGNIFICANT, but not quite as dramatically as above)
Groups Overlap (difference not quite as clear as above)
[Plot of the 10 C and 10 E scores along the same 8-22 scale; each group is more spread out than in Scenario 1, and the two groups' scores overlap.]
The respective means for the two groups are the same in both scenarios, but the standard deviations (spreads) are different, thus making the t-test results different.
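The two t values can be checked from the summary numbers alone. The Python sketch below assumes the standard pooled-variance formula for an independent-samples t-test (these notes only say the actual formula is "a bit more involved," so treating it as the pooled version is my assumption), and the function name `t_from_summary` is made up for this example.

```python
import math

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t from group means, SDs, and sizes (pooled variance)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Same means in both scenarios; only the SDs differ.
t1, df1 = t_from_summary(18, 0.82, 10, 12, 0.82, 10)
t2, df2 = t_from_summary(18, 2.45, 10, 12, 2.45, 10)
print(f"Scenario 1: t = {t1:.2f}, df = {df1}")
print(f"Scenario 2: t = {t2:.2f}, df = {df2}")
```

Plugging in the rounded SDs gives t values of 16.36 and 5.48 with 18 degrees of freedom, so the same 6-point difference in means produces a much smaller t once the spread within each group grows.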
Statistical significance
Just because the means of two groups on a DV are different, it doesn't guarantee that the difference is authentic, substantial, or appreciable. The difference could be due to chance. As Westfall and Henning (2013) state, "When a difference is not easy to explain by chance alone, it is called a statistically significant difference" (p. 401). Statistical significance (determined through various statistical tests) tells us whether a difference is so large that it is extremely unlikely to have resulted from chance. A common standard in the social sciences is whether a finding would come up by chance 5 times out of 100 or less often (p < .05).
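One way to get a feel for what "due to chance" means is to shuffle the group labels many times and count how often a difference as large as the observed one turns up. The Python sketch below does this; note that it is a permutation test, a different technique from the t-test used elsewhere in these notes, and that the scores, function name, and number of shuffles are all invented for illustration.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b, num_shuffles=10_000, seed=0):
    """Share of random relabelings whose mean difference is at least as large
    (in absolute value) as the difference actually observed."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_large = 0
    for _ in range(num_shuffles):
        rng.shuffle(pooled)  # pretend group membership was assigned by chance
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_large += 1
    return at_least_as_large / num_shuffles

# Hypothetical test scores, invented for illustration.
control = [11, 11, 11, 12, 12, 12, 12, 13, 13, 13]
experimental = [17, 17, 17, 18, 18, 18, 18, 19, 19, 19]

p = permutation_p_value(control, experimental)
print("significant at .05" if p < 0.05 else "not significant at .05")
```

If chance alone rarely (less than 5 times in 100) produces a gap as big as the real one, the difference meets the p < .05 standard.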
Study of "College Cuteness" compares the mean attractiveness of female students at different universities (t test is used to compare one school to another). I apologize in advance for the unbalanced nature of the study (only women are evaluated on attractiveness, not men), which many could consider sexist. However, the study presents significance testing in a very down-to-earth manner. Update: The website (www.collegecuteness.com) no longer appears to be operative, as of May 2007. However, it appears the site can be recovered from the "Internet Archive."
A little song on statistical significance...
p less than oh-five
Lyrics by Alan Reifman
(May be sung to the tune of “Up, Up, and Away,” Jimmy Webb, popularized by the Fifth Dimension)
(Back-up vocals in parentheses)
When you analyze, the data you’ve collected,
Should H-oh be kept, or should it be rejected?
You run the test, and see whatever your output will show,
About H-oh (H-oh)…
p less than oh-five,
On my dutiful, immutable, technique,
Below point-oh-five, on your significance test,
Tells you that H-oh must, of course, be put to rest,
The result that you’ve obtained, probably, is not from chance,
That is your stance (your stance)…
p less than oh-five,
On my dutiful, immutable, technique,
(Bridge)
p level doesn’t tell you the magnitude,
Of the relationship that you’re studying,
If the sample happens to be large enough,
A small result will survive, to get below the oh-five,
When you realize, that you will reject H-oh,
All you can claim is, the mu difference ain’t zero*,
You must remember what you can learn from the oh-five test,
On goes your quest (your quest)…
p less than oh-five,
On my dutiful, immutable, technique (technique)…
(Fade out: “p less than oh-five…”)
Note. *The version listed above is for a t-test. For a correlational analysis, you can substitute “rho value ain’t zero.” For a chi-square test, you can substitute “cell percents don’t all follow.”
Hypothesis Testing
We learned earlier that a hypothesis is "a relatively specific prediction of how two or more variables should be related;" today we look at scientific hypothesis testing more formally.
Important new concept: Null Hypothesis (H_{o}):
Statement that there is no relationship, no difference, no effect, etc.
Treatment will have no effect, experimental and control groups will not differ on DV, etc.
Researcher does not necessarily believe null hypothesis; it’s just a standard procedure.
Three Steps of Scientific Hypothesis Testing
1. State the null hypothesis (H_{o}).
2. Do the study.
3. If the groups significantly differ with p < .05, REJECT the null hypothesis. (If the difference is not significant, the null hypothesis must be kept alive.)
Analogy to jurors in a criminal trial:
Analogy to a sporting event:
Expected frequencies represent the null hypothesis, H_{o} (e.g., what frequencies would be expected if there were no male-female differences).
Exp. Freq. = (Total for Row X Total for Column)/ Grand Total
Example:
Men's and Women's Preferences for Coca-Cola vs. Pepsi
Chi-Square
Lyrics by Alan Reifman
(May be sung to the tune of “Hey Jude,” Lennon/McCartney)
Chi-square, yes you are there,
So we can test, association,
Of nominal, variables, that you’ve got,
Give it a shot, and see what happens,
Chi-square, yes you compare,
Each cell’s observed counts, with the expected ones,
The null hypothesis, determines your E’s,
If you will, please, apply the formula,
Cell differences, each one you square, so that they’re all,
Displayed in a, positive direction,
Each cell’s squared difference, divide by its expected,
Summing these yields, the overall chi-square,
Nah nah nah, nah nah, nah nah nah nah,
Chi-square, columns and rows,
Give the table’s, degrees of freedom,
Remember to use these, when you test the,
Significance, of all your findings,
So is your test, significant, or is it not?
You must consult, the critical value,
So go ahead, with your df, can you reject,
The null that counts, are independent?
Nah nah nah, nah nah, nah nah nah nah,
Chi-square, yes you compare,
Each cell’s observed counts, with the expected ones,
The null hypothesis, determines your E’s,
If you will, please, have the computer...
Run it, run it, run it, run it, run it, run it, yeah...
O minus E, difference squared, divided by E, for each cell,
Then, add the cell-based chi-squares, into a sum, that will tell…
O minus E, difference squared, divided by E, for each cell,
Then, add the cell-based chi-squares, into a sum, that will tell…
O minus E, difference squared, divided by E, for each cell,
Then, add the cell-based chi-squares, into a sum, that will tell…
O minus E, difference squared, divided by E, for each cell,
Then, add the cell-based chi-squares, into a sum, that will tell…
O minus E, difference squared, divided by E, for each cell,
Then, add the cell-based chi-squares, into a sum, that will tell…
O minus E, difference squared, divided by E, for each cell,
Then, add the cell-based chi-squares, into a sum, that will tell…
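The steps in the lyrics (expected frequencies from the row and column totals, then O minus E, squared, divided by E, summed over cells) can be sketched in Python as follows. The Coca-Cola vs. Pepsi counts below are hypothetical, made up for this example, and the function name `chi_square` is my own.

```python
def chi_square(observed):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            # Expected frequency under the null hypothesis:
            # (total for row x total for column) / grand total
            e = row_totals[i] * col_totals[j] / grand_total
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

# Hypothetical counts (rows: men, women; columns: Coca-Cola, Pepsi).
counts = [[30, 20],
          [20, 30]]
chi2, df = chi_square(counts)
print(chi2, df)
```

For this made-up table every expected frequency is 25, the chi-square comes to 4.0 with 1 degree of freedom, and that exceeds the .05 critical value for 1 df (3.84), so the null hypothesis of independence would be rejected.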
Expressing Data Through Relative Risk
(Using Babbie Church-Attendance Example, p. 459)
Women’s absolute probability (observed) of going to church is .83 (50/60)
Men’s absolute probability (observed) of going to church is .50 (20/40)
Relative risk compares one group’s probability to another group’s:
Relative risk = Women / Men = .83 / .50 ≈ 1.67 (using the unrounded 50/60)
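Here is the same church-attendance calculation as a minimal Python sketch. The counts (50 of 60 women, 20 of 40 men) come from the example above; the function name `relative_risk` is made up for illustration.

```python
def relative_risk(events_a, total_a, events_b, total_b):
    """Probability in group A divided by probability in group B."""
    return (events_a / total_a) / (events_b / total_b)

# Counts from the church-attendance example above.
women_prob = 50 / 60
men_prob = 20 / 40
rr = relative_risk(50, 60, 20, 40)
print(f"women: {women_prob:.2f}, men: {men_prob:.2f}, relative risk: {rr:.2f}")
```

A relative risk above 1 means the first group's probability is higher; here women are about 1.67 times as likely as men to go to church.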
Excerpt from NBC Today Show interview (May 28, 2003) in which the distinction between absolute and relative risks comes up
Dr. Steven Goldstein, professor, OB/GYN, New York University Medical Center, discusses a new study which finds hormone replacement therapy may double the risk for dementia in women who are 65 and older.
[midway through the interview]
Dr. GOLDSTEIN: ...So if you look at the numbers, although the relative risk doubled, what it really meant was your risk of developing dementia, if you were on no hormones, was one woman in 500 per year. If you went on hormones, your risk went to two women in 500 per year. That's a doubling of risk. That's still 498 out of 500 women who did just fine. So it's the difference between a relative risk and an absolute risk.
KATIE COURIC: As one doctor said, 'A small number doubled is still...
Dr. GOLDSTEIN: A small number.
COURIC: ...a small number.'
Dr. GOLDSTEIN: Still, if three million women in this country take hormone therapy, when you look at that increased risk, that could be as many as 6000 women.
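Dr. Goldstein's numbers can be worked through in a few lines of Python. The figures (1 in 500, 2 in 500, three million women) are the ones quoted in the interview; the variable names are my own, and exact fractions are used just to avoid rounding noise.

```python
from fractions import Fraction

risk_no_hormones = Fraction(1, 500)   # one woman in 500 per year
risk_hormones = Fraction(2, 500)      # two women in 500 per year
women_on_therapy = 3_000_000

relative = risk_hormones / risk_no_hormones           # relative risk
absolute_increase = risk_hormones - risk_no_hormones  # added risk per woman per year
extra_cases = absolute_increase * women_on_therapy    # added cases per year

print(relative, absolute_increase, extra_cases)
```

The relative risk is 2 (a doubling), the absolute increase is only 1 in 500 per year, and yet across three million women that small increase works out to 6000 additional cases.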
Websites Useful for Statistics
Survey Documentation and Analysis (SDA) project (University of California-Berkeley) -- Excellent website for conducting statistical analyses on real data (surveys in the public domain)
Statistical Thinking (University of Baltimore) -- Has a "ton" of information on virtually any statistical topic that you'd encounter
Stat Pages -- Compendium of virtually any kind of statistical calculator one would need
Brief checklist for assessing one's proficiency in basic statistics
CAUSEweb (Consortium for the Advancement of Undergraduate Statistics Education) collection of fun resources (songs, jokes, etc.).
NOTE: Introductory information on the correlation coefficient is available here, within my lecture notes on reliability and validity.
A growing and exciting field, for those who are interested, is the statistical analysis of sports, going beyond simple averages. Probably the most prominent organization in this area is SABR, the Society for American Baseball Research. Although it originates in baseball, the term "sabermetrics" has come to represent statistical analysis of any sport. Here are some websites:
Phil Birnbaum's Sabermetric Research Blog (all sports).
Economist J.C. Bradbury's SaberNomics blog.
Two pro basketball sites: 82 Games and Courtside Times.
Cyril Morong's Clutch Hitting Links (yes, there are players who seem to hit well in the clutch, but they're often great hitters in general, not just in the clutch).
My (Dr. Reifman's) sabermetric websites: