The Role of Statistics and the Data Analysis Process

We encounter data and conclusions based on data every day. Statistics is the scientific discipline that provides methods to help us make sense of data. Some people are suspicious of conclusions based on statistical analyses. Extreme skeptics, usually speaking out of ignorance, characterize the discipline as a subcategory of lying, something used for deception rather than for positive ends. However, we believe that statistical methods, used intelligently, offer a set of powerful tools for gaining insight into the world around us. Statistical methods are used in business, medicine, agriculture, the social sciences, the natural sciences, and applied sciences such as engineering. The widespread use of statistical analyses in diverse fields has led to increased recognition that statistical literacy (a familiarity with the goals and methods of statistics) should be a basic component of a well-rounded educational program. The field of statistics teaches us how to make intelligent judgments and informed decisions in the presence of uncertainty and variation. In this chapter, we consider the nature and role of variability in statistical settings, introduce some basic terminology, and look at some simple graphical displays for summarizing data.

Three Reasons to Study Statistics

Because statistical methods are used to organize, summarize, and draw conclusions from data, familiarity with statistical techniques and statistical literacy are vital in today’s society. Everyone needs a basic understanding of statistics, and many college majors require at least one course in statistics. Statistical literacy is important for three reasons: (1) to be informed, (2) to understand issues and make sound decisions based on data, and (3) to evaluate decisions that affect your life. Let’s explore each reason in detail.

The First Reason: Being Informed

How do we decide whether claims based on numerical information are reasonable? We are bombarded daily with numerical information in news, in advertisements, and even in conversation. For example, here are a few of the items employing statistical methods that appeared in just two weeks’ news.
■ The increasing popularity of online shopping has many consumers using Internet access at work to browse and shop online. In fact, the Monday after Thanksgiving has been nicknamed “Cyber Monday” because of the large increase in online purchases that occurs on that day. Data from a large-scale survey conducted in early November 2005 by a market research firm were used to estimate the percentages of men and women who shop online while at work. The resulting estimates probably won’t make most employers happy: 42% of the men and 32% of the women in the sample were shopping online at work! (Detroit Free Press and San Luis Obispo Tribune, November 26, 2005)
■ A story in the New York Times titled “Students Ace State Tests, but Earn D’s From U.S.” investigated discrepancies between state and federal standardized test results. When researchers compared state test results to the most recent results on the National Assessment of Educational Progress (NAEP), they found that large differences were common. For example, one state reported that 89% of its fourth graders were proficient in reading based on the state test, while only 18% of fourth graders in that state were considered proficient in reading on the federal test!
The article also discussed possible explanations for these large discrepancies and their potential consequences. (New York Times, November 26, 2005)
■ Can dogs help patients with heart failure by reducing stress and anxiety? One of the first scientific studies of the effect of therapeutic dogs found that a measure of anxiety decreased by 24% for heart patients visited by a volunteer and a dog, but by only 10% for patients visited by the volunteer alone. Decreases were also noted in measures of stress and of heart and lung pressure, leading researchers to conclude that the use of therapeutic dogs is beneficial in the treatment of heart patients. (San Luis Obispo Tribune, November 16, 2005)
■ Late in 2005, people eligible for Medicare had to decide which, if any, of the many complex new prescription medication plans was right for them. To assist with this decision, a program called PlanFinder that compares the available options was made available online. But are seniors online? Based on a survey conducted by the Los Angeles Times, the percentage of senior citizens who go online was estimated to be only between 23% and 30%, raising concern over whether offering only an online comparison tool is an effective way to assist seniors with this important decision. (Los Angeles Times, November 27, 2005)
■ Are kids ruder today than in the past? An article titled “Kids Gone Wild” summarized data from a survey conducted by the Associated Press. Nearly 70% of those who participated in the survey said that people were ruder now than 20 years ago, with kids being the biggest offenders. As evidence that this is a serious problem, the author of the article also referenced a 2004 study conducted by Public Agenda, a public opinion research group. That study indicated that more than one third of teachers had either seriously considered leaving teaching or knew a colleague who had left because of intolerable student behavior. (New York Times, November 27, 2005)
■ When people take a vacation, do they really leave work behind?
Data from a poll conducted by Travelocity led to the following estimates: approximately 40% of travelers check work email while on vacation, about 33% take cell phones on vacation in order to stay connected with work, and about 25% bring a laptop computer on vacation. The travel industry is paying attention: hotels, resorts, and even cruise ships are now making it easier for “vacationers” to stay connected to work. (San Luis Obispo Tribune, December 1, 2005)
■ How common is domestic violence? Based on interviews with 24,000 women in 10 different countries, a study conducted by the World Health Organization found that the percentage of women who have been abused by a partner varied widely, from 15% of women in Japan to 71% of women in Ethiopia. Even though the domestic violence rate differed dramatically from country to country, in all of the countries studied, women who were victims of domestic violence were about twice as likely as other women to be in poor health, even long after the violence had stopped. (San Francisco Chronicle, November 25, 2005)
■ Does it matter how long children are bottle-fed? Based on a study of 2121 children between the ages of 1 and 4, researchers at the Medical College of Wisconsin concluded that there was an association between iron deficiency and the length of time that a child is bottle-fed. They found that children who were still bottle-fed between the ages of 2 and 4 were three times more likely to be iron deficient than those who stopped by the time they were 1 year old. (Milwaukee Journal Sentinel and San Luis Obispo Tribune, November 26, 2005)
■ Parental involvement in schools is often regarded as an important factor in student achievement.
However, data from a study of low-income public schools in California led researchers to conclude that other factors, such as prioritizing student achievement, encouraging teacher collaboration and professional development, and using assessment data to improve instruction, had a much greater impact on the schools’ Academic Performance Index. (Washington Post and San Francisco Chronicle, November 26, 2005)

To be an informed consumer of reports such as those described above, you must be able to do the following:
1. Extract information from tables, charts, and graphs.
2. Follow numerical arguments.
3. Understand the basics of how data should be gathered, summarized, and analyzed to draw statistical conclusions.

Your statistics course will help prepare you to perform these tasks.

The Second Reason: Making Informed Judgments

Throughout your personal and professional life, you will need to understand statistical information and make informed decisions using this information. To make these decisions, you must be able to do the following:
1. Decide whether existing information is adequate or whether additional information is required.
2. If necessary, collect more information in a reasonable and thoughtful way.
3. Analyze the available data.
4. Draw conclusions, make decisions, and assess the risk of an incorrect decision.

People informally use these steps to make everyday decisions. Should you go out for a sport that involves the risk of injury? Will your college club do better by trying to raise funds with a benefit concert or with a direct appeal for donations? If you choose a particular major, what are your chances of finding a job when you graduate? How should you select a graduate program based on guidebook ratings that include information on percentage of applicants accepted, time to obtain a degree, and so on? The study of statistics formalizes the process of making decisions based on data and provides the tools for accomplishing the steps listed.

The Third Reason: Evaluating Decisions That Affect Your Life

While you will need to make informed decisions based on data, other people will also use statistical methods to make decisions that affect you as an individual. An understanding of statistical techniques will allow you to question and evaluate decisions that affect your well-being. Some examples are:
■ Many companies now require drug screening as a condition of employment. With these screening tests there is a risk of a false-positive reading (incorrectly indicating drug use) or a false-negative reading (failing to detect drug use). What are the consequences of a false result? Given the consequences, is the risk of a false result acceptable?
■ Medical researchers use statistical methods to make recommendations regarding the choice between surgical and nonsurgical treatment of such diseases as coronary heart disease and cancer. How do they weigh the risks and benefits to reach such a recommendation?
■ University financial aid offices survey students on the cost of going to school and collect data on family income, savings, and expenses. The resulting data are used to set criteria for deciding who receives financial aid. Are the estimates they use accurate?
■ Insurance companies use statistical techniques to set auto insurance rates, although some states restrict the use of these techniques. Data suggest that young drivers have more accidents than older ones. Should laws or regulations limit how much more young drivers pay for insurance? What about the common practice of charging higher rates for people who live in urban areas?

An understanding of elementary statistical methods can help you to evaluate whether important decisions such as the ones just mentioned are being made in a reasonable way. We hope that this textbook will help you to understand the logic behind statistical reasoning, prepare you to apply statistical methods appropriately, and enable you to recognize when statistical arguments are faulty.


The Nature and Role of Variability

Statistics is a science whose focus is on collecting, analyzing, and drawing conclusions from data. If we lived in a world where all measurements were identical for every individual, all three of these tasks would be simple. Imagine a population consisting of all students at a particular university. Suppose that every student took the same number of units, spent exactly the same amount of money on textbooks this semester, and favored increasing student fees to support expanding library services. For this population, there is no variability in the number of units, amount spent on books, or student opinion on the fee increase. A researcher studying a sample from this population to draw conclusions about these three variables would have a particularly easy task. It would not matter how many students the researcher included in the sample or how the sampled students were selected. In fact, the researcher could collect information on number of units, amount spent on books, and opinion on the fee increase by just stopping the next student who happened to walk by the library. Because there is no variability in the population, this one individual would provide complete and accurate information about the population, and the researcher could draw conclusions based on the sample with no risk of error.

The situation just described is obviously unrealistic. Populations with no variability are exceedingly rare, and they are of little statistical interest because they present no challenge! In fact, variability is almost universal. It is variability that makes life (and the life of a statistician, in particular) interesting. We need to understand variability to be able to collect, analyze, and draw conclusions from data in a sensible way. One of the primary uses of descriptive statistical methods is to increase our understanding of the nature of variability in a population.
Examples 1.1 and 1.2 illustrate how an understanding of variability is necessary to draw conclusions based on data.
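The contrast between a population with no variability and one with variability can be sketched with a short simulation. This is a minimal Python sketch under stated assumptions: both populations of textbook spending are hypothetical (one in which every student spent exactly $250, one with amounts drawn at random between $50 and $450), and the helper `sample_means` is an illustrative name, not something from the text. It draws repeated random samples and records the mean of each sample.

```python
import random
import statistics

def sample_means(population, sample_size, num_samples, seed=0):
    """Draw repeated simple random samples (without replacement) and return each sample's mean."""
    rng = random.Random(seed)
    return [statistics.mean(rng.sample(population, sample_size))
            for _ in range(num_samples)]

# Hypothetical population of 1,000 students who all spent exactly $250 on textbooks:
# no variability at all.
uniform_population = [250] * 1000

# Hypothetical population whose spending varies from student to student
# (amounts drawn at random between $50 and $450, invented for illustration).
gen = random.Random(42)
varied_population = [gen.randint(50, 450) for _ in range(1000)]

# No variability: a "sample" of one student always recovers the population value.
print("no variability, n=1:", set(sample_means(uniform_population, 1, 20)))

# With variability: single students give very different answers...
singles = sample_means(varied_population, 1, 200)
print("with variability, n=1: min", min(singles), "max", max(singles))

# ...and means of larger samples vary less from sample to sample.
print("spread of sample means, n=1: ", round(statistics.stdev(singles), 1))
print("spread of sample means, n=50:",
      round(statistics.stdev(sample_means(varied_population, 50, 200)), 1))
print("population mean:", round(statistics.mean(varied_population), 1))
```

With no variability, every sample of one student returns exactly $250. Once the population varies, single-student samples disagree with one another, and means of larger samples cluster more tightly, which is why sample size and the way individuals are selected start to matter.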

Example 1.1 If the Shoe Fits

The graphs in Figure 1.1 are examples of a type of graph called a histogram. (The construction and interpretation of such graphs are discussed in Chapter 3.) Figure 1.1(a) shows the distribution of the heights of female basketball players who played at a particular university between 1990 and 1998. The height of each bar in the histogram indicates how many players had heights in the corresponding interval. Figure 1.1(b) shows the distribution of the heights of female gymnasts.

[Figure 1.1 Histograms of heights (in inches) of female athletes: (a) basketball players; (b) gymnasts. Horizontal axis: Height; vertical axis: Frequency.]
The first histogram shows that the heights of female basketball players varied, with most heights falling between 68 in. and 76 in. In the second histogram we see that the heights of female gymnasts also varied, with most heights in the range of 60 in. to 72 in. It is also clear that there is more variation in the heights of the gymnasts than in the heights of the basketball players, because the gymnast histogram spreads out more about its center than does the basketball histogram.

Now suppose that a tall woman (5 ft 11 in.) tells you she is looking for her sister, who is practicing with her team at the gym. Would you direct her to where the basketball team is practicing or to where the gymnastics team is practicing? What reasoning would you use to decide? If you found a pair of size 6 shoes left in the locker room, would you first try to return them by checking with members of the basketball team or the gymnastics team?

You probably answered that you would send the woman looking for her sister to the basketball practice and that you would try to return the shoes to a gymnastics team member. To reach these conclusions, you informally used statistical reasoning that combined your own knowledge of how siblings’ heights are related and how shoe size is related to height with the information about the distributions of heights presented in Figure 1.1. You might have reasoned that heights of siblings tend to be similar and that a height as great as 5 ft 11 in., although not impossible, would be unusual for a gymnast. On the other hand, a height of 5 ft 11 in. would be a common occurrence for a basketball player. Similarly, you might have reasoned that tall people tend to have bigger feet and that short people tend to have smaller feet. The shoes found were a small size, so it is more likely that they belong to a gymnast than to a basketball player, because small heights and small feet are usual for gymnasts and unusual for basketball players.
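The informal reasoning in Example 1.1 can be made concrete with a few lines of code. The sketch below is a minimal Python illustration that assumes two small lists of hypothetical heights, invented only to resemble the ranges described above; they are not the data behind Figure 1.1. The helper `share_at_least` is likewise an illustrative name. For each group it reports two measures of spread (the range and the standard deviation) and the fraction of athletes who are at least 71 in. (5 ft 11 in.) tall.

```python
import statistics

# Hypothetical heights in inches, invented for illustration only.
# These are NOT the data plotted in Figure 1.1.
basketball = [68, 69, 70, 70, 71, 71, 72, 72, 72, 73, 73, 74, 74, 75, 76]
gymnasts = [60, 61, 62, 63, 63, 64, 65, 65, 66, 67, 68, 69, 70, 71, 72]

def share_at_least(heights, threshold):
    """Return the fraction of heights that are greater than or equal to threshold."""
    return sum(h >= threshold for h in heights) / len(heights)

TALL = 71  # 5 ft 11 in. expressed in inches

for name, heights in [("basketball", basketball), ("gymnasts", gymnasts)]:
    height_range = max(heights) - min(heights)  # simplest measure of spread
    sd = statistics.stdev(heights)              # spread about the center (the mean)
    print(f"{name:>10}: range = {height_range} in., sd = {sd:.2f} in., "
          f"share at least {TALL} in. = {share_at_least(heights, TALL):.2f}")
```

With these made-up values, the gymnast list shows the larger spread, and a height of 71 in. or more is common among the basketball players but rare among the gymnasts, mirroring the reasoning about where to send the woman looking for her sister.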