We encounter data and conclusions based on data every day. Statistics is the scientific discipline that provides methods to help us make sense of data. Some people
are suspicious of conclusions based on statistical analyses. Extreme skeptics, usually
speaking out of ignorance, characterize the discipline as a subcategory of lying—
something used for deception rather than for positive ends. However, we believe that
statistical methods, used intelligently, offer a set of powerful tools for gaining insight
into the world around us. Statistical methods are used in business, medicine, agriculture, social sciences, natural sciences, and applied sciences, such as engineering. The
widespread use of statistical analyses in diverse fields has led to increased recognition
that statistical literacy—a familiarity with the goals and methods of statistics—should
be a basic component of a well-rounded educational program.
The field of statistics teaches us how to make intelligent judgments and informed
decisions in the presence of uncertainty and variation. In this chapter, we consider the
nature and role of variability in statistical settings, introduce some basic terminology,
and look at some simple graphical displays for summarizing data.
Three Reasons to Study Statistics
Because statistical methods are used to organize, summarize, and draw conclusions
from data, familiarity with statistical techniques and statistical literacy are vital in today’s society. Everyone needs a basic understanding of statistics, and many
college majors require at least one course in statistics. Statistical literacy matters for three reasons: (1) to be informed, (2) to understand issues
and make sound decisions based on data, and (3) to evaluate decisions that affect your life. Let’s explore each reason in detail.
■ The First Reason: Being Informed
How do we decide whether claims based on numerical information are reasonable?
We are bombarded daily with numerical information in news, in advertisements, and
even in conversation. For example, here are a few of the items employing statistical
methods that were part of just two weeks’ news.
■ The increasing popularity of online shopping has many consumers using Internet
access at work to browse and shop online. In fact, the Monday after Thanksgiving
has been nicknamed “Cyber Monday” because of the large increase in online purchases that occurs on that day. Data from a large-scale survey conducted in early
November 2005 by a market research firm were used to compute estimates of the
percent of men and women who shop online while at work. The resulting estimates probably won’t make most employers happy: 42% of the men and 32% of
the women in the sample were shopping online at work! (Detroit Free Press and
San Luis Obispo Tribune, November 26, 2005)
■ A story in the New York Times titled “Students Ace State Tests, but Earn D’s From
U.S.” investigated discrepancies between state and federal standardized test results. When researchers compared state test results to the most recent results on
the National Assessment of Educational Progress (NAEP), they found that large
differences were common. For example, one state reported 89% of fourth graders
were proficient in reading based on the state test, while only 18% of fourth graders
in that state were considered proficient in reading on the federal test! The article also discussed an explanation for these large discrepancies and their potential consequences. (New
York Times, November 26, 2005)
■ Can dogs help patients with heart failure by reducing stress and anxiety? One of
the first scientific studies of the effect of therapeutic dogs found that a measure of
anxiety decreased by 24% for heart patients visited by a volunteer and dog, but
only by 10% for patients visited by just the volunteer. Decreases were also noted
in measures of stress and heart and lung pressure, leading researchers to conclude
that the use of therapeutic dogs is beneficial in the treatment of heart patients. (San
Luis Obispo Tribune, November 16, 2005)
■ Late in 2005, those eligible for Medicare had to decide which, if any, of the many
complex new prescription medication plans was right for them. To assist with this
decision, a program called PlanFinder that compares available options was made
available online. But are seniors online? Based on a survey conducted by the Los
Angeles Times, it was estimated that the percentage of senior citizens who go online is only between 23% and 30%, causing concern over whether providing only
an online comparison is an effective way to assist seniors with this important decision. (Los Angeles Times, November 27, 2005)
■ Are kids ruder today than in the past? An article titled “Kids Gone Wild” summarized data from a survey conducted by the Associated Press. Nearly 70% of
those who participated in the survey said that people were ruder now than 20 years
ago, with kids being the biggest offenders. As evidence that this is a serious problem, the author of the article also referenced a 2004 study conducted by Public
Agenda, a public opinion research group. That study indicated that more than one third of teachers had either seriously considered leaving teaching or knew a
colleague who left because of intolerable student behavior. (New York Times,
November 27, 2005)
■ When people take a vacation, do they really leave work behind? Data from a poll
conducted by Travelocity led to the following estimates: Approximately 40% of
travelers check work email while on vacation, about 33% take cell phones on vacation in order to stay connected with work, and about 25% bring a laptop computer on vacation. The travel industry is paying attention—hotels, resorts, and
even cruise ships are now making it easier for “vacationers” to stay connected to
work. (San Luis Obispo Tribune, December 1, 2005)
■ How common is domestic violence? Based on interviews with 24,000 women in
10 different countries, a study conducted by the World Health Organization found
that the percentage of women who have been abused by a partner varied widely—
from 15% of women in Japan to 71% of women in Ethiopia. Even though the domestic violence rate differed dramatically from country to country, in all of the
countries studied, women who were victims of domestic violence were about
twice as likely as other women to be in poor health, even long after the violence
had stopped. (San Francisco Chronicle, November 25, 2005)
■ Does it matter how long children are bottle-fed? Based on a study of 2121 children between the ages of 1 and 4, researchers at the Medical College of Wisconsin concluded that there was an association between iron deficiency and the length
of time that a child is bottle-fed. They found that children who were bottle-fed
between the ages of 2 and 4 were three times more likely to be iron deficient than
those who stopped by the time they were 1 year old. (Milwaukee Journal Sentinel
and San Luis Obispo Tribune, November 26, 2005)
■ Parental involvement in schools is often regarded as an important factor in student
achievement. However, data from a study of low-income public schools in California led researchers to conclude that other factors, such as prioritizing student
achievement, encouraging teacher collaboration and professional development,
and using assessment data to improve instruction, had a much greater impact on
the schools’ Academic Performance Index. (Washington Post and San Francisco
Chronicle, November 26, 2005)
To be an informed consumer of reports such as those described above, you must be
able to do the following:
1. Extract information from tables, charts, and graphs.
2. Follow numerical arguments.
3. Understand the basics of how data should be gathered, summarized, and analyzed
to draw statistical conclusions.
Your statistics course will help prepare you to perform these tasks.
■ The Second Reason: Making Informed Judgments
Throughout your personal and professional life, you will need to understand statistical information and make informed decisions using this information. To make these
decisions, you must be able to do the following:
1. Decide whether existing information is adequate or whether additional information
is required.
2. If necessary, collect more information in a reasonable and thoughtful way.
3. Summarize the available data.
4. Analyze the available data.
5. Draw conclusions, make decisions, and assess the risk of an incorrect decision.
People informally use these steps to make everyday decisions. Should you go out for
a sport that involves the risk of injury? Will your college club do better by trying to
raise funds with a benefit concert or with a direct appeal for donations? If you choose
a particular major, what are your chances of finding a job when you graduate? How
should you select a graduate program based on guidebook ratings that include information on percentage of applicants accepted, time to obtain a degree, and so on? The
study of statistics formalizes the process of making decisions based on data and provides the tools for accomplishing the steps listed.
■ The Third Reason: Evaluating Decisions That Affect Your Life
You will need to make informed decisions based on data, but other people will also
use statistical methods to make decisions that affect you as an individual. An understanding of statistical techniques will allow you to question and evaluate decisions that affect your well-being. Some examples are:
■ Many companies now require drug screening as a condition of employment. With
these screening tests, there is a risk of a false-positive reading (incorrectly indicating drug use) or a false-negative reading (failure to detect drug use). What are the
consequences of a false result? Given the consequences, is the risk of a false result acceptable?
■ Medical researchers use statistical methods to make recommendations regarding
the choice between surgical and nonsurgical treatment of such diseases as coronary heart disease and cancer. How do they weigh the risks and benefits to reach
such a recommendation?
■ University financial aid offices survey students on the cost of going to school and
collect data on family income, savings, and expenses. The resulting data are used
to set criteria for deciding who receives financial aid. Are the estimates they use
accurate?
■ Insurance companies use statistical techniques to set auto insurance rates, although some states restrict the use of these techniques. Data suggest that young
drivers have more accidents than older ones. Should laws or regulations limit how
much more young drivers pay for insurance? What about the common practice of
charging higher rates for people who live in urban areas?
An understanding of elementary statistical methods can help you to evaluate
whether important decisions such as the ones just mentioned are being made in a reasonable way.
We hope that this textbook will help you to understand the logic behind statistical
reasoning, prepare you to apply statistical methods appropriately, and enable you to
recognize when statistical arguments are faulty.
The Nature and Role of Variability
Statistics is a science whose focus is on collecting, analyzing, and drawing conclusions
from data. If we lived in a world where all measurements were identical for every individual, all three of these tasks would be simple. Imagine a population consisting of all students at a particular university. Suppose that every student took the same number of units, spent exactly the same amount of money on textbooks this semester, and
favored increasing student fees to support expanding library services. For this population, there is no variability in the number of units, amount spent on books, or student
opinion on the fee increase. A researcher studying a sample from this population to
draw conclusions about these three variables would have a particularly easy task. It
would not matter how many students the researcher included in the sample or how the
sampled students were selected. In fact, the researcher could collect information on
number of units, amount spent on books, and opinion on the fee increase by just stopping the next student who happened to walk by the library. Because there is no variability in the population, this one individual would provide complete and accurate information about the population, and the researcher could draw conclusions based on
the sample with no risk of error.
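The thought experiment above can be made concrete with a short simulation. The Python sketch below is illustrative only: both populations are hypothetical lists of textbook spending (in dollars) invented for this example, not data from any study. Each population has a mean of 300 dollars. The sketch repeatedly "stops one student" (a sample of size 1) and shows that, with no variability, every such sample reproduces the population value exactly, while with variability the answer depends on which student happens to walk by.

```python
import random
import statistics

# Hypothetical populations of textbook spending in dollars (invented for illustration).
# Population with no variability: every student spent exactly the same amount.
no_variability = [300] * 1000
# Population with variability: spending ranges from 150 to 450 dollars.
with_variability = list(range(150, 451))

def single_student_estimates(population, trials=5, seed=1):
    """Estimate the population mean several times, each time from one randomly chosen student."""
    rng = random.Random(seed)
    return [rng.choice(population) for _ in range(trials)]

for name, pop in [("no variability", no_variability),
                  ("with variability", with_variability)]:
    estimates = single_student_estimates(pop)
    print(name, "| population mean:", statistics.mean(pop), "| one-student estimates:", estimates)
```

In the first population every estimate equals the population mean; in the second, the estimates scatter around it, which is why sample size and the method of selecting students start to matter.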
The situation just described is obviously unrealistic. Populations with no variability are exceedingly rare, and they are of little statistical interest because they present
no challenge! In fact, variability is almost universal. It is variability that makes life
(and the life of a statistician, in particular) interesting. We need to understand variability to be able to collect, analyze, and draw conclusions from data in a sensible way.
One of the primary uses of descriptive statistical methods is to increase our understanding of the nature of variability in a population.
Examples 1.1 and 1.2 illustrate how an understanding of variability is necessary
to draw conclusions based on data.
Example 1.1 If the Shoe Fits
The graphs in Figure 1.1 are examples of a type of graph called a histogram. (The
construction and interpretation of such graphs are discussed in Chapter 3.) Figure
1.1(a) shows the distribution of the heights of female basketball players who played
at a particular university between 1990 and 1998. The height of each bar in the
histogram indicates how many players had heights in the corresponding interval.
Figure 1.1(b) shows the distribution of the heights of female gymnasts.
Figure 1.1 Histograms of heights (in inches) of female athletes: (a) basketball players; (b) gymnasts. (Each panel plots frequency against height, 58 in. to 78 in.)
The first histogram shows that the heights of female basketball players varied,
with most heights falling between 68 in. and 76 in. In the second histogram we see
that the heights of female gymnasts also varied, with most heights in the range of
60 in. to 72 in. It is also clear that there is more variation in the heights of the gymnasts than in the heights of the basketball players, because the gymnast histogram
spreads out more about its center than does the basketball histogram.
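The comparison of spread above can also be expressed numerically. The Python sketch below is an illustration only: the two lists of heights are hypothetical values invented for this example (chosen to roughly match the ranges described in the text), not the data behind Figure 1.1. It counts heights in 2-inch intervals, the way the bars of a histogram are built, and reports each group's sample standard deviation. Standard deviation is one common numerical measure of spread; using it here is our choice rather than something this example specifies.

```python
import statistics
from collections import Counter

# Hypothetical heights in inches (invented for illustration; not the Figure 1.1 data).
basketball = [68, 69, 70, 71, 71, 72, 72, 72, 73, 73, 74, 74, 75, 76]
gymnasts = [60, 61, 62, 63, 64, 64, 65, 66, 67, 68, 69, 70, 71, 72]

def bin_counts(heights, start=58, width=2):
    """Count heights in intervals [start, start + width), [start + width, ...), like histogram bars.

    Keys are the lower ends of the intervals; values are the frequencies (bar heights).
    """
    counts = Counter(start + width * ((h - start) // width) for h in heights)
    return dict(sorted(counts.items()))

for name, heights in [("basketball", basketball), ("gymnasts", gymnasts)]:
    print(name, bin_counts(heights), "std dev:", round(statistics.stdev(heights), 2))
```

With these made-up values, the gymnast heights occupy more intervals and have the larger standard deviation, mirroring the wider spread of the gymnast histogram.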
Now suppose that a tall woman (5 ft 11 in.) tells you she is looking for her sister
who is practicing with her team at the gym. Would you direct her to where the basketball team is practicing or to where the gymnastics team is practicing? What reasoning would you use to decide? If you found a pair of size 6 shoes left in the locker
room, would you first try to return them by checking with members of the basketball
team or the gymnastics team?
You probably answered that you would send the woman looking for her sister
to the basketball practice and that you would try to return the shoes to a gymnastics
team member. To reach these conclusions, you informally used statistical reasoning
that combined your own knowledge of the relationship between heights of siblings
and between shoe size and height with the information about the distributions of
heights presented in Figure 1.1. You might have reasoned that heights of siblings
tend to be similar and that a height as great as 5 ft 11 in., although not impossible,
would be unusual for a gymnast. On the other hand, a height as tall as 5 ft 11 in.
would be a common occurrence for a basketball player. Similarly, you might have
reasoned that tall people tend to have bigger feet and that short people tend to have
smaller feet. The shoes found were a small size, so it is more likely that they belong
to a gymnast than to a basketball player, because small heights and small feet are
usual for gymnasts and unusual for basketball players.
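The informal judgment that a height of 5 ft 11 in. is "unusual" for a gymnast but "common" for a basketball player amounts to comparing relative frequencies. The Python sketch below shows that comparison under stated assumptions: the height lists are hypothetical values invented for illustration (not the Figure 1.1 data), and `share_at_least` is a helper name introduced here for the example.

```python
from fractions import Fraction

# Hypothetical heights in inches (invented for illustration; not the Figure 1.1 data).
basketball = [68, 69, 70, 71, 71, 72, 72, 72, 73, 73, 74, 74, 75, 76]
gymnasts = [60, 61, 62, 63, 64, 64, 65, 66, 67, 68, 69, 70, 71, 72]

def share_at_least(heights, threshold):
    """Relative frequency of heights greater than or equal to threshold (hypothetical helper)."""
    return Fraction(sum(h >= threshold for h in heights), len(heights))

TALL = 71  # 5 ft 11 in. expressed in inches
for name, heights in [("basketball", basketball), ("gymnasts", gymnasts)]:
    share = share_at_least(heights, TALL)
    print(f"{name}: {share} ({float(share):.0%}) at least {TALL} in. tall")
```

Exact fractions keep the proportions free of rounding; with these made-up lists, most basketball players but only a small share of gymnasts are at least 71 in. tall, which is the pattern the reasoning above relies on.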