Alan Reifman, Texas Tech University
Dr. Reifman's Basic Intro Stat Lecture Notes

Education scholars David Berliner and Bruce Biddle, in their 1995 book, The Manufactured Crisis: Myths, Fraud, and the Attack on America's Public Schools, make the following argument (p. 316):

...we cannot understand why a person who dislikes mathematics and does not want to work in a science field should be forced to take calculus...  If we had to nominate a topic in mathematics that is needed today by all informed citizens, it would be statistics.

Similarly, Arthur Benjamin, the "Mathemagician," endorses the study of statistics and probability as the most useful aspect of mathematics for most people.

Descriptive Statistics
Measures of Central (Typical) Tendency

MEAN (average): Add up all scores and divide by number of people (most commonly used).

MEDIAN: Score that the same number of people fall above as fall below.

MODE: Most frequently occurring score.

For ordinal and ratio variables, which have quantitative meaning (e.g., a 3 signifies something more than a 2, a 2 signifies something more than a 1), all three descriptors -- mean, median, and mode -- can be used. For nominal variables, where people fall into non-quantitative categories such as gender (male/female) or religious affiliation (Protestant, Catholic, Jewish, Muslim, etc.), only the mode is applicable. Click for review of variable types.

Distribution of Annual Income from a Small High School Class 5 Years After Graduation

(total = 20 people; each little green square = one person)

*Anywhere between \$20,000 and \$30,000 would meet the definition of the median (10 above and 10 below), but \$25,000 is probably the most convenient value to use as the median.
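As a rough illustration of the three measures, here is a minimal Python sketch. The 20 income values (in thousands of dollars) are hypothetical, made up for this example rather than taken from the figure; they are chosen so that 10 people fall at or below 20 and 10 fall at or above 30, with one very high earner.

```python
import statistics

# Hypothetical annual incomes, in thousands of dollars (NOT the data from the figure).
incomes = [10, 10, 15, 15, 15, 20, 20, 20, 20, 20,
           30, 30, 30, 35, 35, 40, 40, 45, 50, 500]

mean_income = statistics.mean(incomes)      # add up all scores, divide by number of people
median_income = statistics.median(incomes)  # midpoint of the two middle scores (20 and 30)
mode_income = statistics.mode(incomes)      # most frequently occurring score

print(mean_income, median_income, mode_income)  # 50 25.0 20
```

With an even number of people, Python's `statistics.median` averages the two middle scores, which matches the "most convenient" choice described above. Note how the single income of 500 pulls the mean well above the median.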

(Graduate students: See "Descriptive Statistics" section of links on my intro statistics blog for further discussion of mean, median, and mode).

Another example of the distinction between mathematical accuracy and "common sense accuracy" stems from when Wilt Chamberlain scored 100 points in a single NBA basketball game, all by himself.  Suppose, as a purely hypothetical example, Wilt's teammate York Larese went around telling people that he and Wilt combined for 109 points in a single game.

• Would this be mathematically accurate?
• Might this statement lead people to think that Chamberlain and Larese had made relatively equal contributions to the combined 109 points?

A couple of songs to reinforce these ideas...

The Mean
Lyrics by Alan Reifman
(May be sung to the tune of “Jolene,” Dolly Parton)

The mean, the mean, the mean, the mean,
You add up all the scores, and divide by N,
The mean, the mean, the mean, the mean,
The average you’ve computed, time and again,

You typically make sense to heed,
Because all exact values, it must know,

The median, won’t show much swing,
How far away, is not its thing,
Just so you’ve half above, and half below,

What comes up most, is called the mode,
Sometimes, multiple peaks, are in your frame,

When data shapes, follow the bell,
There's only one thing, left to tell,
Mean, median, and mode, are all the same,

The mean, the mean, the mean, the mean,
A useful stat to know, but it’s not all,
The mean, the mean, the mean, the mean,
Into a trap, you do not want to fall...

Why Do You Have to Be an Outlier?
Lyrics by Alan Reifman
(May be sung to "Heartbreaker," Gibb/Gibb/Gibb)

(WORK IN PROGRESS)

Why do you have to be an outlier?
Why must you so overwhelm the mean?
Fortunately, there is still the median,
So central trends can be seen

Examples of Variables That Tend to Be...

Normally Distributed              Non-Normally Distributed
Americans' Political Ideology     People's Numbers of Sexual Partners
People's Height                   People's Alcohol Consumption

Students in my graduate statistics class should also see here.

To get an idea of the amount of spread or dispersion in a distribution, it can be helpful to take the ratio of the SD to the mean. For example, an SD of 5 with a mean of 10 (ratio of .5) suggests more spread than an SD of 5 with mean of 1,000 (.005). This ratio is also known as the relative standard deviation.
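The ratio just described can be sketched in a couple of lines of Python. The function name `relative_sd` is only an illustrative choice; the numbers are the ones used in the example above.

```python
def relative_sd(sd, mean):
    """Ratio of the standard deviation to the mean (relative standard deviation)."""
    return sd / mean

print(relative_sd(5, 10))    # 0.5
print(relative_sd(5, 1000))  # 0.005
```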

For my graduate students, here's the calculation of the top SD (3.27). Thanks to HB for the photo!

Explanation of why we divide by "n - 1" rather than just "n." Quoting from the linked document: "When examining the variance, it is very clear that the sample variance calculated with n is biased — it is systematically too low, while the variance calculated with n – 1 has a mean very close to [the true value]." That is, when a sample statistic is systematically too high or too low in estimating a population parameter, it is known as a biased estimator. An unbiased estimator "indicates that the values given by the estimator [statistic] from sample to sample will tend to be centered around the true value of the population parameter, rather than being consistently too high or too low." In other words, for an estimator to be unbiased, the statistics obtained from repeated samplings should average out to the true population parameter.
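For readers who like to see this for themselves, here is a small simulation sketch. Everything in it (a normal population with mean 50 and SD 10, samples of size 5, 10,000 repetitions, the random seed) is an assumption chosen for illustration, not something taken from the linked document. It draws many small samples and averages the variance computed each way.

```python
import random
import statistics

random.seed(0)                 # fixed seed so the run is repeatable
population_sd = 10.0           # assumed population SD, so the true variance is 100
true_variance = population_sd ** 2
n = 5                          # assumed (small) sample size
trials = 10_000                # number of repeated samples

divide_by_n = []
divide_by_n_minus_1 = []
for _ in range(trials):
    sample = [random.gauss(50, population_sd) for _ in range(n)]
    divide_by_n.append(statistics.pvariance(sample))         # divides by n
    divide_by_n_minus_1.append(statistics.variance(sample))  # divides by n - 1

mean_biased = statistics.fmean(divide_by_n)
mean_unbiased = statistics.fmean(divide_by_n_minus_1)
print(round(mean_biased, 1), round(mean_unbiased, 1), true_variance)
```

In a run like this, the "divide by n" average should land near 80 (systematically too low), while the "divide by n - 1" average should land near the true variance of 100.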

Here's a concrete example of why the standard deviation can sometimes be important.  Ian Ayres writes in his book Super Crunchers, which is about "number-crunching" statistical analyses, as follows:

When I taught at Stanford Law School, professors were required to award grades that had a 3.2 mean.  ...[S]tudents would ask if a professor was a "spreader" [wide range, high SD] or "clumper" [narrow range, low SD].  Good students would want to avoid clumpers so that they would have a better chance at getting an A, while bad students hated the spreaders who handed out more As but also more Fs (p. 201; segments in red inserted by Dr. Reifman).

Dr. Reifman wonders how this grading requirement is enforced. Perhaps non-compliant professors are made to teach 8:00 a.m. courses, stripped of their parking privileges, or made to dress up as the Stanford tree mascot!

Yet another perspective on what the standard deviation represents comes from the physicist James Kakalios in his book The Amazing Story of Quantum Mechanics (pp. 78-79):

The standard deviation is an indication of how much we can trust the average value resulting from this bell-shaped curve. The bigger the standard deviation, the greater the possible uncertainty in the average value...

Regarding the hypothetical scenario of a mean of 50 resulting from a distribution with some people scoring as low as 1 and others as high as 100, Kakalios writes that:

...the large size of the standard deviation would indicate that the average grade was not a particularly meaningful or insightful indicator of any given student's performance.

The Standard Deviation (and z)
Lyrics by Alan Reifman

(May be sung to the tune of “Born to Run,” Bruce Springsteen)

When we view our data, we must be sure, to absorb everything that we can glean,

How spread out are the, values of data, from what you’ve determined to be the mean?

The dispersion, of your data set,

Serves important purposes,

For other statistics you can get,

Whoa, distributions can have the same mean,

But they’re different – one is fat, one is lean,

Thus, they vary in a second way,

We’re going to learn the standard deviation today,

(Interlude)

When we want to find, something that’s called the z,

We start out with each person’s value,

From each of these, we subtract the mean,

And divide the difference by SD,

Relative to a person’s group,

A z-score shows where that individual stands,

Whoa, when the data come in normal style,

A z-score will also, tell you the percentiles,

So you must look beyond the mean,

You must examine the data’s spread,

You must notice the data’s SD…
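The z-score recipe in the lyrics (subtract the mean, divide by the SD) can be sketched as follows. The test score of 115 in a group with mean 100 and SD 15 is a hypothetical example, and the percentile step assumes the data are normally distributed, as the song notes.

```python
from statistics import NormalDist

def z_score(value, mean, sd):
    """How many SDs a person's value lies above (+) or below (-) the group mean."""
    return (value - mean) / sd

# Hypothetical example: a score of 115 in a group with mean 100 and SD 15.
z = z_score(115, 100, 15)

# Only meaningful if the data are (approximately) normal.
percentile = NormalDist().cdf(z) * 100

print(z, round(percentile, 1))  # 1.0 84.1
```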

t-test ("t for two")
Compares two means, e.g., experimental vs. control group; men vs. women

t is based on the following (actual formula a bit more involved):

t ≈ (Mean of Group 1 - Mean of Group 2) / (Spread (SD's) of the groups' data points)

t is increased (difference more likely to be significant, see below) when:

• Two groups' means are very different.

EXAMPLE:  Students are randomly assigned to one of two groups (each group with 10 students), an experimental group (E) that receives new math teaching techniques or a control group (C) that receives the "usual" math instruction (independent variable = instruction type).  At the end of the term, all students are given a 25-point test (dependent variable).  The black rows (below) represent scores on the test, with the red and blue blocks indicating how many students got each score.

Scenario 1

MeanC = 12, SDC = 0.82
MeanE = 18, SDE = 0.82
t = 16.36, p < .000001 (Result SIGNIFICANT: Extremely unlikely to obtain such dramatic difference between E&C means by chance IF the null hypothesis of no mean difference in the population were true; REJECT null hypothesis)

[Figure: blocks for the C and E students stacked above a score axis running from 8 to 22. Clear gap between groups.]

Mean and standard deviation can be obtained from the raw data (separately for each group) at this site.  The resulting numbers can then be plugged in at this site to obtain the t-test.  Significance (probability) level can be obtained at this site  (df = combined sample size  minus 2 = 18).

Scenario 2

MeanC = 12, SDC = 2.45
MeanE = 18, SDE = 2.45
t = 5.48, p < .00005 (also SIGNIFICANT, but not quite as dramatically as above)

[Figure: blocks for the C and E students stacked above a score axis running from 8 to 22. Groups overlap (difference not quite as clear as above).]

The respective means for the two groups are the same in both scenarios,
but the standard deviations (spreads) are different, thus making the t-test results different.
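To see how the spread drives the result, here is a sketch that recomputes t for both scenarios from the summary numbers given above (means, SDs, and 10 students per group). The formula used, the mean difference divided by the square root of SD1²/n1 + SD2²/n2, is one standard version of the "more involved" formula; these notes do not spell it out, so treat it as an assumption. With equal group sizes, as here, the pooled-variance version gives the same t.

```python
import math

def t_from_summary(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic computed from each group's mean, SD, and size."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

# Scenario 1: small spread within each group.
t_scenario_1 = t_from_summary(18, 0.82, 10, 12, 0.82, 10)
# Scenario 2: same means, larger spread within each group.
t_scenario_2 = t_from_summary(18, 2.45, 10, 12, 2.45, 10)

print(round(t_scenario_1, 2))  # 16.36
print(round(t_scenario_2, 2))  # 5.48
```

The 6-point mean difference is identical in both calls; only the SDs change, and t shrinks from 16.36 to 5.48.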

Statistical significance

Just because the means of two groups on a DV are different, it doesn't guarantee that the difference is authentic, substantial, or appreciable. The difference could be due to chance. As Westfall and Henning (2013) state, "When a difference is not easy to explain by chance alone, it is called a statistically significant difference" (p. 401). Statistical significance (determined through various statistical tests) tells us whether a difference is so large that it would be extremely unlikely to result from chance alone. A common standard in the social sciences is whether a finding would come up by chance 5 times out of 100 or less often (p < .05).
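One way to make "due to chance" concrete is a shuffling (permutation) test, sketched below. This is a different technique from the t-test used in these notes, shown only because the logic is easy to follow: if group labels did not matter, reshuffling them should often produce a mean difference as big as the observed one. The scores, the function name, the number of shuffles, and the seed are all hypothetical.

```python
import random

def permutation_p_value(group_a, group_b, shuffles=5000, seed=1):
    """Share of random relabelings whose mean difference is at least as large as observed."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    n_b = len(pooled) - n_a
    at_least_as_large = 0
    for _ in range(shuffles):
        rng.shuffle(pooled)  # reassign scores to groups at random
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed - 1e-12:  # small tolerance for floating-point error
            at_least_as_large += 1
    return at_least_as_large / shuffles

# Hypothetical test scores for an experimental and a control group.
experimental = [17, 19, 18, 20, 16, 18, 21, 17, 19, 18]
control = [12, 14, 11, 13, 12, 15, 10, 12, 13, 11]

p = permutation_p_value(experimental, control)
print(p < 0.05)  # True
```

Here the two hypothetical groups do not overlap at all, so almost no random relabeling matches the observed gap and the estimated p falls far below .05.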

Study of "College Cuteness" compares the mean attractiveness of female students at different universities (t test is used to compare one school to another).  I apologize in advance for the unbalanced nature of the study (only women are evaluated on attractiveness, not men), which many could consider sexist.  However, the study presents significance testing in a very down-to-earth manner.  Update:  The website (www.collegecuteness.com) no longer appears to be operative, as of May 2007.  However, it appears the site can be recovered from the "Internet Archive."

A little song on statistical significance...

p less than oh-five

Lyrics by Alan Reifman

(May be sung to the tune of “Up, Up, and Away,” Jimmy Webb, popularized by the Fifth Dimension)

(Back-up vocals in parentheses)

When you analyze, the data you’ve collected,

Should H-oh be kept, or should it be rejected?

You run the test, and see whatever your output will show,

p less than oh-five,

On my dutiful, immutable, technique,

Below point-oh-five, on your significance test,

Tells you that H-oh must, of course, be put to rest,

The result that you’ve obtained, probably, is not from chance,

p less than oh-five,

On my dutiful, immutable, technique,

(Bridge)

p level doesn’t tell you the magnitude,

Of the relationship that you’re studying,

If the sample happens to be large enough,

A small result will survive, to get below the oh-five,

When you realize, that you will reject H-oh,

All you can claim is, the mu difference ain’t zero*,

You must remember what you can learn from the oh-five test,

p less than oh-five,

On my dutiful, immutable, technique (technique)…

(Fade out:  “p less than oh-five…”)

Note.  *The version listed above is for a t-test.  For a correlational analysis, you can substitute “rho value ain’t zero.”  For a chi-square test, you can substitute “cell percents don’t all follow.”

Hypothesis Testing

We learned earlier that a hypothesis is "a relatively specific prediction of how two or more variables should be related"; today we look at scientific hypothesis testing more formally.

Important new concept: Null Hypothesis (Ho):  Statement that there is no relationship, no difference, no effect, etc.  Treatment will have no effect, experimental and control groups will not differ on DV, etc.

Researcher does not necessarily believe null hypothesis; it’s just a standard procedure.

Three Steps of Scientific Hypothesis Testing

1. State the null hypothesis (Ho)

2. Do the study.

3. If the groups significantly differ with p < .05, REJECT the null hypothesis.

(If the difference is not significant, null hypothesis must be kept alive.)

Analogy to jurors in a criminal trial:

• Jurors receive evidence (witnesses, physical evidence).
• If evidence overwhelmingly indicative of guilt ("beyond a reasonable doubt"), REJECT null hypothesis of innocence and vote "guilty."

Analogy to a sporting event:

• Start with "null hypothesis" -- Both teams are equal in ability.
• Play the game.
• For whichever team wins (especially if by a large margin), you can REJECT null hypothesis and say the winning team is better.

Expected frequencies represent the null hypothesis, Ho (e.g., what frequencies would be expected if there were no male-female differences).

Exp. Freq. = (Total for Row × Total for Column) / Grand Total
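The expected-frequency formula above, and the chi-square sum built from it, can be sketched as follows. The counts come from the church-attendance figures quoted later in these notes (50 of 60 women and 20 of 40 men attending); arranging them as a 2 x 2 table with "attend" and "do not attend" columns is my own reading of those figures.

```python
# Observed counts: rows = women, men; columns = attend, do not attend.
observed = [[50, 10],
            [20, 20]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Exp. Freq. = (Total for Row x Total for Column) / Grand Total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# (O - E) squared, divided by E, for each cell, then summed across cells.
chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom = (rows - 1) x (columns - 1).
df = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)                  # [[42.0, 18.0], [28.0, 12.0]]
print(round(chi_square, 2), df)  # 12.7 1
```

For df = 1, the critical value at p < .05 is 3.84, so a chi-square of about 12.7 would lead us to reject the null hypothesis of no male-female difference.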

Chi-Square
Lyrics by Alan Reifman

(May be sung to the tune of “Hey Jude,” Lennon/McCartney)

Chi-square, yes you are there,
So we can test, association,
Of nominal, variables, that you’ve got,
Give it a shot, and see what happens,

Chi-square, yes you compare,

Each cell’s observed counts, with the expected ones,

The null hypothesis, determines your E’s,

If you will, please, apply the formula,

Cell differences, each one you square, so that they’re all,

Displayed in a, positive direction,

Each cell’s squared difference, divide by its expected,

Summing these yields, the overall chi-square,

Nah nah nah, nah nah, nah nah nah nah,

Chi-square, columns and rows,

Give the table’s, degrees of freedom,

Remember to use these, when you test the,

So is your test, significant, or is it not?

You must consult, the critical value,

The null that counts, are independent?

Nah nah nah, nah nah, nah nah nah nah,

Chi-square, yes you compare,

Each cell’s observed counts, with the expected ones,

The null hypothesis, determines your E’s,

If you will, please, have the computer...

Run it, run it, run it, run it, run it, run it, yeah...

O minus E, difference squared, divided by E, for each cell,

Then, add the cell-based chi-squares, into a sum, that will tell…

O minus E, difference squared, divided by E, for each cell,

Then, add the cell-based chi-squares, into a sum, that will tell…

O minus E, difference squared, divided by E, for each cell,

Then, add the cell-based chi-squares, into a sum, that will tell…

O minus E, difference squared, divided by E, for each cell,

Then, add the cell-based chi-squares, into a sum, that will tell…

O minus E, difference squared, divided by E, for each cell,

Then, add the cell-based chi-squares, into a sum, that will tell…

O minus E, difference squared, divided by E, for each cell,

Then, add the cell-based chi-squares, into a sum, that will tell…

Expressing Data Through Relative Risk

(Using Babbie Church-Attendance Example, p. 459)

Women’s absolute probability (observed) of going to church is .83 (50/60)

Men’s absolute probability (observed) of going to church is .50 (20/40)

Relative risk compares one group’s probability to another group’s (i.e., one group relative to another):

Women       .83
-----   =   ---   =   1.67   =   Relative Risk
Men         .50
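The same arithmetic as a short Python sketch, using the counts given above (variable names are illustrative only):

```python
women_attend, women_total = 50, 60
men_attend, men_total = 20, 40

p_women = women_attend / women_total  # women's absolute probability
p_men = men_attend / men_total        # men's absolute probability
relative_risk = p_women / p_men       # one group's probability relative to the other's

print(round(p_women, 2), round(p_men, 2), round(relative_risk, 2))  # 0.83 0.5 1.67
```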

Excerpt from NBC Today Show interview (May 28, 2003)
in which the distinction between absolute and relative risks comes up

Dr. Steven Goldstein, professor, OB/GYN, New York University Medical Center, discusses a new study which finds hormone replacement therapy may double the risk for dementia in women who are 65 and older.

[midway through the interview]

Dr. GOLDSTEIN:  ...So if you look at the numbers, although the relative risk doubled, what it really meant was your risk of developing dementia, if you were on no hormones, was one woman in 500 per year. If you went on hormones, your risk went to two women in 500 per year. That's a doubling of risk. That's still 498 out of 500 women who did just fine. So it's the difference between a relative risk and an absolute risk.

KATIE COURIC: As one doctor said, 'A small number doubled is still...

Dr. GOLDSTEIN: A small number.

COURIC: ...a small number.'

Dr. GOLDSTEIN: Still, if three million women in this country take hormone therapy, when you look at that increased risk, that could be as many as 6000 women.
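The numbers in the interview can be worked through in a few lines. This is just a sketch of the arithmetic as quoted (1 in 500 versus 2 in 500 per year, and three million women); the variable names are mine.

```python
baseline_risk = 1 / 500   # yearly risk with no hormones
hormone_risk = 2 / 500    # yearly risk on hormones

relative_risk = hormone_risk / baseline_risk      # the risk "doubled"
absolute_increase = hormone_risk - baseline_risk  # one extra woman per 500

women_on_hormones = 3_000_000
additional_cases = women_on_hormones * absolute_increase

print(relative_risk)            # 2.0
print(round(additional_cases))  # 6000
```

The relative risk of 2.0 sounds alarming, while the absolute increase of 1 in 500 sounds small; multiplied across three million women, though, it still comes to about 6,000 additional cases per year.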

Websites Useful for Statistics

Survey Documentation and Analysis (SDA) project (University of California-Berkeley)
Excellent website for conducting statistical analyses on real data (surveys in the public domain)

Statistical Thinking (University of Baltimore)
Has a "ton" of information on virtually any statistical topic that you'd encounter

Stat Pages -- Compendium of virtually any kind of statistical calculator one would need

Brief checklist for assessing one's proficiency in basic statistics

CAUSEweb (Consortium for the Advancement of Undergraduate Statistics Education) collection of fun resources (songs, jokes, etc.).

NOTE:  Introductory information on the correlation coefficient is available here, within my lecture notes on reliability and validity.

A growing and exciting field, for those who are interested, is the statistical analysis of sports, going beyond simple averages.  Probably the most prominent organization in this area is SABR, the Society for American Baseball Research.  Although it originates in baseball, the term "sabermetrics" has come to represent statistical analysis of any sport.  Here are some websites:

Phil Birnbaum's Sabermetric Research Blog (all sports).