

Elementary Statistics 10th Ed.

Published by Junix Kaalim, 2022-09-12 13:26:53

Description: Triola, Mario F.


6-3 Applications of Normal Distributions

15. Designing Doorways The standard doorway height is 80 in.
a. What percentage of men are too tall to fit through a standard doorway without bending, and what percentage of women are too tall to fit through a standard doorway without bending? Based on those results, does it appear that the current doorway design is adequate?
b. If a statistician designs a house so that all of the doorways have heights that are sufficient for all men except the tallest 5%, what doorway height would be used?

16. Designing Caskets The standard casket has an inside length of 78 in.
a. What percentage of men are too tall to fit in a standard casket, and what percentage of women are too tall to fit in a standard casket? Based on those results, does it appear that the standard casket size is adequate?
b. A manufacturer of caskets wants to reduce production costs by making smaller caskets. What inside length would fit all men except the tallest 1%?

17. Birth Weights Birth weights in the United States are normally distributed with a mean of 3420 g and a standard deviation of 495 g. If a hospital plans to set up special observation conditions for the lightest 2% of babies, what weight is used for the cutoff separating the lightest 2% from the others?

18. Birth Weights Birth weights in Norway are normally distributed with a mean of 3570 g and a standard deviation of 500 g. Repeat Exercise 17 for babies born in Norway. Is the result very different from the result found in Exercise 17?

19. Eye Contact In a study of facial behavior, people in a control group are timed for eye contact in a 5-minute period. Their times are normally distributed with a mean of 184.0 s and a standard deviation of 55.0 s (based on data from “Ethological Study of Facial Behavior in Nonparanoid and Paranoid Schizophrenic Patients,” by Pittman, Olk, Orr, and Singh, Psychiatry, Vol. 144, No. 1). For a randomly selected person from the control group, find the probability that the eye contact time is greater than 230.0 s, which is the mean for paranoid schizophrenics.

20. Body Temperatures Based on the sample results in Data Set 2 of Appendix B, assume that human body temperatures are normally distributed with a mean of 98.20°F and a standard deviation of 0.62°F.
a. Bellevue Hospital in New York City uses 100.6°F as the lowest temperature considered to be a fever. What percentage of normal and healthy persons would be considered to have a fever? Does this percentage suggest that a cutoff of 100.6°F is appropriate?
b. Physicians want to select a minimum temperature for requiring further medical tests. What should that temperature be, if we want only 5.0% of healthy people to exceed it? (Such a result is a false positive, meaning that the test result is positive, but the subject is not really sick.)

21. Lengths of Pregnancies The lengths of pregnancies are normally distributed with a mean of 268 days and a standard deviation of 15 days.
a. One classical use of the normal distribution is inspired by a letter to “Dear Abby” in which a wife claimed to have given birth 308 days after a brief visit from her husband, who was serving in the Navy. Given this information, find the probability of a pregnancy lasting 308 days or longer. What does the result suggest?
b. If we stipulate that a baby is premature if the length of pregnancy is in the lowest 4%, find the length that separates premature babies from those who are not premature. Premature babies often require special care, and this result could be helpful to hospital administrators in planning for that care.
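The two kinds of calculation these exercises ask for (an area in a tail, and a value at a given percentile) can be sketched in a few lines. This is only an illustrative sketch, assuming Python's standard statistics module is available in place of Table A-2 or a calculator; the variable names are hypothetical, and the parameters are taken from Exercises 21(a) and 17.

```python
from statistics import NormalDist

# Exercise 21(a): pregnancy lengths, mean 268 days, standard deviation 15 days.
pregnancy = NormalDist(mu=268, sigma=15)
# Area to the right of 308 days = 1 - (area to the left of 308 days).
p_308_or_longer = 1 - pregnancy.cdf(308)

# Exercise 17: U.S. birth weights, mean 3420 g, standard deviation 495 g.
birth_weight = NormalDist(mu=3420, sigma=495)
# The cutoff for the lightest 2% is the 2nd percentile (area 0.02 to its left).
lightest_2_percent_cutoff = birth_weight.inv_cdf(0.02)

print(f"P(length >= 308 days) = {p_308_or_longer:.4f}")
print(f"Cutoff for lightest 2% = {lightest_2_percent_cutoff:.0f} g")
```

The same two calls (cdf for a probability, inv_cdf for a percentile) cover the other exercises in this set once the appropriate mean and standard deviation are substituted.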

Chapter 6 Normal Probability Distributions

22. Hip Breadths and Aircraft Seats Engineers want to design seats in commercial aircraft so that they are wide enough to fit 98% of all males. (Accommodating 100% of males would require very wide seats that would be much too expensive.) Men have hip breadths that are normally distributed with a mean of 14.4 in. and a standard deviation of 1.0 in. (based on anthropometric survey data from Gordon, Clauser, et al.). Find P98. That is, find the hip breadth for men that separates the smallest 98% from the largest 2%.

23. Designing Helmets Engineers must consider the breadths of male heads when designing motorcycle helmets. Men have head breadths that are normally distributed with a mean of 6.0 in. and a standard deviation of 1.0 in. (based on anthropometric survey data from Gordon, Churchill, et al.). Due to financial constraints, the helmets will be designed to fit all men except those with head breadths that are in the smallest 2.5% or largest 2.5%. Find the minimum and maximum head breadths that the helmets will fit.

24. Sitting Distance A common design requirement is that an item (such as an aircraft or theater seat) must fit the range of people who fall between the 5th percentile for women and the 95th percentile for men. If this requirement is adopted, what is the minimum sitting distance and what is the maximum sitting distance? For the sitting distance, use the buttock-to-knee length. Men have buttock-to-knee lengths that are normally distributed with a mean of 23.5 in. and a standard deviation of 1.1 in. Women have buttock-to-knee lengths that are normally distributed with a mean of 22.7 in. and a standard deviation of 1.0 in.

25. Appendix B Data Set: Systolic Blood Pressure Refer to Data Set 1 in Appendix B and use the systolic blood pressure levels for males.
a. Using the systolic blood pressure levels for males, find the mean, standard deviation, and verify that the data have a distribution that is roughly normal.
b. Assuming that systolic blood pressure levels of males are normally distributed, find the 5th percentile and the 95th percentile. [Treat the statistics from part (a) as if they were population parameters.] Such percentiles could be helpful when physicians try to determine whether blood pressure levels are too low or too high.

26. Appendix B Data Set: Systolic Blood Pressure Repeat Exercise 25 for females.

6-3 BEYOND THE BASICS

27. Units of Measurement Heights of women are normally distributed.
a. If heights of individual women are expressed in units of inches, what are the units used for the z scores that correspond to individual heights?
b. If heights of all women are converted to z scores, what are the mean, standard deviation, and distribution of these z scores?

28. Using Continuity Correction There are many situations in which a normal distribution can be used as a good approximation to a random variable that has only discrete values. In such cases, we can use this continuity correction: Represent each whole number by the interval extending from 0.5 below the number to 0.5 above it. Assume that IQ scores are all whole numbers having a distribution that is approximately normal with a mean of 100 and a standard deviation of 15.
a. Without using any correction for continuity, find the probability of randomly selecting someone with an IQ score greater than 105.
b. Using the correction for continuity, find the probability of randomly selecting someone with an IQ score greater than 105.
c. Compare the results from parts (a) and (b).
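The continuity correction described in Exercise 28 can be sketched as follows. This is a hedged illustration, assuming Python's standard statistics module; the variable names are hypothetical. Because each whole number is represented by the interval from 0.5 below it to 0.5 above it, "greater than 105" begins at the lower boundary of the interval for 106, which is 105.5.

```python
from statistics import NormalDist

# IQ scores: approximately normal, mean 100, standard deviation 15 (from Exercise 28).
iq = NormalDist(mu=100, sigma=15)

# (a) No continuity correction: area to the right of 105.
p_no_correction = 1 - iq.cdf(105)

# (b) With continuity correction: the first whole number above 105 is 106,
# represented by the interval 105.5 to 106.5, so use the area to the right of 105.5.
p_with_correction = 1 - iq.cdf(105.5)

print(f"Without correction: {p_no_correction:.4f}")
print(f"With correction:    {p_with_correction:.4f}")
```

The corrected probability is slightly smaller because the region between 105 and 105.5 is assigned to the score 105 rather than to scores above it.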

29. Curving Test Scores A statistics professor gives a test and finds that the scores are normally distributed with a mean of 25 and a standard deviation of 5. She plans to curve the scores.
a. If she curves by adding 50 to each grade, what is the new mean? What is the new standard deviation?
b. Is it fair to curve by adding 50 to each grade? Why or why not?
c. If the grades are curved according to the following scheme (instead of adding 50), find the numerical limits for each letter grade.
A: Top 10%
B: Scores above the bottom 70% and below the top 10%
C: Scores above the bottom 30% and below the top 30%
D: Scores above the bottom 10% and below the top 70%
F: Bottom 10%
d. Which method of curving the grades is fairer: Adding 50 to each grade or using the scheme given in part (c)? Explain.

30. SAT and ACT Tests Scores by women on the SAT-I test are normally distributed with a mean of 998 and a standard deviation of 202. Scores by women on the ACT test are normally distributed with a mean of 20.9 and a standard deviation of 4.6. Assume that the two tests use different scales to measure the same aptitude.
a. If a woman gets a SAT score that is the 67th percentile, find her actual SAT score and her equivalent ACT score.
b. If a woman gets a SAT score of 1220, find her equivalent ACT score.

6-4 Sampling Distributions and Estimators

Key Concept The main objective of this section is to understand the concept of a sampling distribution of a statistic, which is the distribution of all values of that statistic when all possible samples of the same size are taken from the same population. Specifically, we discuss the sampling distribution of the proportion and the sampling distribution of the mean. We will also see that some statistics (such as the proportion and mean) are good for estimating values of population parameters, whereas other statistics (such as the median) don’t make good estimators of population parameters.
In a survey conducted by the Education and Resources Institute, 750 college students were randomly selected, and 64% (or 0.64) of them said that they had at least one credit card. This survey involves only 750 respondents from a population of roughly 15 million college students. We know that sample statistics naturally vary from sample to sample, so another sample of 750 college students will probably yield a sample proportion different from the 64% found in the first survey. In this context, we might think of the sample proportion as a random variable.

Given that the sample of 750 college students is such a tiny percentage (0.005%) of the population, can we really expect the sample proportion to be a reasonable estimate of the actual proportion of all college students who have credit cards? Yes! Statisticians, being the clever creatures that they are, have devised methods for using sample results to estimate population parameters with fairly good accuracy. How do they do it? Their approach is based on an understanding of how statistics behave. By understanding the distribution of a sample proportion, statisticians can determine how accurate an individual sample proportion is likely to be. They also understand the distribution of sample means. In general, they understand the concept of a sampling distribution of a statistic.

Do Boys or Girls Run in the Family?
The author of this book, his siblings, and his siblings’ children consist of 11 males and only one female. Is this an example of a phenomenon whereby one particular gender runs in a family? This issue was studied by examining a random sample of 8770 households in the United States. The results were reported in the Chance magazine article “Does Having Boys or Girls Run in the Family?” by Joseph Rodgers and Debby Doughty. Part of their analysis involves use of the binomial probability. Their conclusion is that “We found no compelling evidence that sex bias runs in the family.”

Definition
The sampling distribution of a statistic (such as a sample proportion or sample mean) is the distribution of all values of the statistic when all possible samples of the same size n are taken from the same population. (The sampling distribution of a statistic is typically represented as a probability distribution in the format of a table, probability histogram, or formula.)

Sampling Distribution of Proportion

The preceding definition can be applied to the specific statistic of a sample proportion.

Definition
The sampling distribution of the proportion is the distribution of sample proportions, with all samples having the same sample size n taken from the same population.

We will better understand the important concept of a sampling distribution of the proportion if we consider some specific examples.

EXAMPLE Sampling Distribution of the Proportion of Girls from Two Births When two births are randomly selected, the sample space is bb, bg, gb, gg. Those four equally likely outcomes suggest that the probability of 0 girls is 0.25, the probability of 1 girl is 0.50, and the probability of 2 girls is 0.25. The accompanying display shows the probability distribution for the number of girls, followed by two different formats (table and graph) describing the sampling distribution for the proportion of girls.

In addition to a table or graph, a sampling distribution can also be expressed as a formula (see Exercise 15), or it might be described some other way, such as this: “The sampling distribution of the sample mean is a normal distribution with μ = 100 and σ = 15.” In this section, we usually describe a sampling distribution using a table that lists values of the sample statistic along with their corresponding probabilities. In later chapters we will use some of the other descriptions. Although typical surveys involve sample sizes around 1000 to 2000 and population sizes often in the millions, the next example involves a population with only three values so that we can easily list every possible sample.
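The two-births example can be reproduced by listing the sample space and tallying the proportion of girls in each outcome. The following is a minimal sketch, assuming Python's standard library and equally likely outcomes as stated in the example; the names outcomes and distribution are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for two births: bb, bg, gb, gg (equally likely, as in the example).
outcomes = ["".join(pair) for pair in product("bg", repeat=2)]

# Tally how many outcomes give each proportion of girls (0, 1/2, or 1).
counts = Counter(Fraction(outcome.count("g"), 2) for outcome in outcomes)

# Convert the tallies to probabilities: the sampling distribution of the proportion.
distribution = {p: Fraction(c, len(outcomes)) for p, c in sorted(counts.items())}

for proportion, probability in distribution.items():
    print(f"proportion of girls = {float(proportion):.1f}, "
          f"probability = {float(probability):.2f}")
```

Exact fractions are used so that the probabilities 1/4, 1/2, and 1/4 are not affected by floating-point rounding.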

Probability distribution of the number of girls from 2 births

x    P(x)
0    0.25
1    0.50
2    0.25

Sampling distribution of the proportion of girls from 2 births

Table
Proportion of girls from 2 births    Probability
0                                    0.25
0.5                                  0.50
1                                    0.25

[Probability histogram: probability versus proportion of girls from 2 births]

Prophets for Profits
Many books and computer programs claim to be helpful in predicting winning lottery numbers. Some use the theory that particular numbers are “due” (and should be selected) because they haven’t been coming up often; others use the theory that some numbers are “cold” (and should be avoided) because they haven’t been coming up often; and still others use astrology, numerology, or dreams. Because selections of winning lottery number combinations are independent events, such theories are worthless. A valid approach is to choose numbers that are “rare” in the sense that they are not selected by other people, so that if you win, you will not need to share your jackpot with many others. For this reason, the combination of 1, 2, 3, 4, 5, and 6 is a bad choice because many people use it, whereas 12, 17, 18, 33, 40, 46 is a much better choice, at least until it was published in this book.

EXAMPLE Sampling Distribution of Proportions A quarterback threw 1 interception in his first game, 2 interceptions in his second game, 5 interceptions in his third game, and he then retired. Consider the population consisting of the values 1, 2, 5. Note that two of the values (1 and 5) are odd, so the proportion of odd numbers in the population is 2/3.
a. List all of the different possible samples of size n = 2 selected with replacement. (Later, we will explain why sampling with replacement is so important.) For each sample, find the proportion of numbers that are odd. Use a table to represent the sampling distribution for the proportion of odd numbers.
b. Find the mean of the sampling distribution for the proportion of odd numbers.
c. For the population of 1, 2, 5, the proportion of odd numbers is 2/3. Is the mean of the sampling distribution for the proportion of odd numbers also equal to 2/3? Do sample proportions target the value of the population proportion? That is, do the sample proportions have a mean that is equal to the population proportion?

Table 6-2 Sampling Distribution of Proportions of Odd Numbers

Sample    Proportion of Odd Numbers    Probability
1, 1      1                            1/9
1, 2      0.5                          1/9
1, 5      1                            1/9
2, 1      0.5                          1/9
2, 2      0                            1/9
2, 5      0.5                          1/9
5, 1      1                            1/9
5, 2      0.5                          1/9
5, 5      1                            1/9

SOLUTION
a. In Table 6-2 we list the nine different possible samples of size n = 2 taken with replacement from the population of 1, 2, 5. Table 6-2 also shows the proportion of sample values that are odd numbers, and it includes their probabilities. (Because there are 9 equally likely samples, each sample has probability 1/9.) Table 6-3, which is simply a condensed version of Table 6-2, concisely represents the sampling distribution of the proportion of odd numbers.
b. Table 6-3 is a probability distribution, so we can find its mean by using Formula 5-1 from Section 5-2. We get the mean of 2/3 as follows:

μ = Σ[x · P(x)] = (0 · 1/9) + (0.5 · 4/9) + (1 · 4/9) = 6/9 = 2/3

c. The mean of the sampling distribution of proportions is 2/3, and 2/3 of the numbers in the population are odd. This is no coincidence. In general, the sampling distribution of proportions will have a mean that is equal to the population proportion. It is in this sense that we say that sample proportions “target” the population proportion.

INTERPRETATION For the case of selecting two values (with replacement) from the population of 1, 2, 5, we have identified the sampling distribution (Table 6-3). We also found that the mean of the sampling distribution is 2/3, which is equal to the proportion of odd numbers in the population. Sample proportions therefore tend to target the population proportion, instead of systematically tending to underestimate or overestimate that value.

Table 6-3 Condensed Version of Table 6-2

Proportion of Odd Numbers    Probability
0                            1/9
0.5                          4/9
1                            4/9
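The construction of Tables 6-2 and 6-3 and the mean found in part (b) can be checked by brute-force enumeration. The sketch below is illustrative only and assumes Python's standard library; the function name proportion_odd and the other variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 5]
n = 2

# All ordered samples of size 2 drawn with replacement (9 of them), as in Table 6-2.
samples = list(product(population, repeat=n))

def proportion_odd(sample):
    """Proportion of values in the sample that are odd."""
    return Fraction(sum(v % 2 for v in sample), len(sample))

# Condense to the distinct proportions and their probabilities, as in Table 6-3.
counts = Counter(proportion_odd(s) for s in samples)
sampling_distribution = {p: Fraction(c, len(samples)) for p, c in sorted(counts.items())}

# Mean of the sampling distribution: sum of x * P(x).
mean_of_distribution = sum(p * prob for p, prob in sampling_distribution.items())

print(sampling_distribution)
print("mean =", mean_of_distribution)
```

Changing population or n gives the corresponding tables for other small populations, such as those in the exercises at the end of this section.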

The preceding example involves a fairly small population, so let’s now consider the genders of the Senators in the 107th Congress. Because there are only 100 members [13 females (F) and 87 males (M)], we can list the entire population:

M F MM F MMMMMMM F MMMMMMM MMMMMMMMMMMM F F MMMMMM MMM F MMMMM F MMMMMMMMMM F MMMMMMMMMMMMMMMM F F F MMM F M F MMMMMMMMMMMMMM

The population proportion of female Senators is p = 13/100 = 0.13. Usually, we don’t know all of the members of the population, so we must estimate it from a sample. For the purpose of studying the behavior of sample proportions, we list a few samples of size n = 10, and we show the corresponding proportion of females.

Sample 1: M F M M F M M M M M → sample proportion is 0.2
Sample 2: M F M M M M M M M M → sample proportion is 0.1
Sample 3: M M M M M M F M M M → sample proportion is 0.1
Sample 4: M M M M M M M M M M → sample proportion is 0
Sample 5: M M M M M M M M F M → sample proportion is 0.1

We prefer not to list all of the 100,000,000,000,000,000,000 different possible samples. Instead, the author randomly selected just 95 additional samples before stopping to rotate his car tires. Combining these additional 95 samples with the 5 listed here, we get 100 samples summarized in Table 6-4.

Table 6-4 Results from 100 Samples

Proportion of Female Senators    Frequency
0.0                              26
0.1                              41
0.2                              24
0.3                              7
0.4                              1
0.5                              1

Mean: 0.119
Standard deviation: 0.100

[Figure 6-17: 100 Sample Proportions with n = 10 in Each Sample. Histogram of frequency versus sample proportion.]

We can see from Table 6-4 that the mean of the 100 sample proportions is 0.119, but if we were to include all other possible samples of size 10, the mean of the sample proportions would equal 0.13, which is the value of the population proportion. Figure 6-17 shows the distribution of the 100 sample proportions summarized in Table 6-4. The shape of that distribution is reasonably close to the shape that would have been obtained with all possible samples of size 10. We can see that the distribution depicted in Figure 6-17 is somewhat skewed to the right, but with a bit of a stretch, it might be approximated very roughly by a normal distribution. In Figure 6-18 we show the results obtained from 10,000 samples of size 50 randomly selected with replacement from the above list of 100 genders. Figure 6-18 very strongly suggests that the distribution is approaching the characteristic bell shape of a normal distribution. The results from Table 6-4 and Figure 6-18 therefore suggest the following:
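Before those properties are stated, the Senate example can be checked by simulation. The sketch below is an illustration under stated assumptions, not the author's procedure: it assumes Python's standard random module, a fixed seed of 0 chosen only for repeatability, and 10,000 simulated samples for each sample size (the text used 100 samples for n = 10). The names population and sample_proportion are hypothetical.

```python
import random
from statistics import mean, pstdev

random.seed(0)  # assumption: fixed seed so that repeated runs give the same samples

# Population of 100 Senators: 13 females (F) and 87 males (M), so p = 0.13.
population = ["F"] * 13 + ["M"] * 87

def sample_proportion(n):
    """Proportion of females in one sample of size n drawn with replacement."""
    sample = random.choices(population, k=n)
    return sample.count("F") / n

# Simulated sample proportions for the two sample sizes discussed in the text.
props_n10 = [sample_proportion(10) for _ in range(10_000)]
props_n50 = [sample_proportion(50) for _ in range(10_000)]

print(f"n = 10: mean = {mean(props_n10):.3f}, st. dev. = {pstdev(props_n10):.3f}")
print(f"n = 50: mean = {mean(props_n50):.3f}, st. dev. = {pstdev(props_n50):.3f}")
```

With this many samples, both means should land close to the population proportion of 0.13, and the sample proportions for n = 50 should vary less than those for n = 10.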

[Figure 6-18: 10,000 Sample Proportions with n = 50 in Each Sample. Histogram of relative frequency versus sample proportion.]

Properties of the Sampling Distribution of the Sample Proportion
● Sample proportions tend to target the value of the population proportion. (That is, all possible sample proportions have a mean equal to the population proportion.)
● Under certain conditions, the distribution of the sample proportion can be approximated by a normal distribution.

Sampling Distribution of the Mean

Now let’s consider the sampling distribution of the mean.

Definition
The sampling distribution of the mean is the distribution of sample means, with all samples having the same sample size n taken from the same population. (The sampling distribution of the mean is typically represented as a probability distribution in the format of a table, probability histogram, or formula.)

Again, instead of getting too abstract, we use a small population to illustrate the important properties of this distribution.

EXAMPLE Sampling Distribution of the Mean A population consists of the values 1, 2, 5. Note that the mean of this population is μ = 8/3.
a. List all of the possible samples (with replacement) of size n = 2 along with the sample means and their individual probabilities.
b. Find the mean of the sampling distribution.
c. The population mean is 8/3. Do the sample means target the value of the population mean?

SOLUTION
a. In Table 6-5 we list the nine different possible samples of size n = 2 taken with replacement from the population of 1, 2, 5. Table 6-5 also shows the sample means, and it includes their probabilities. (Because there are 9 equally likely samples, each sample has probability 1/9.) Table 6-6, which is simply a condensed version of Table 6-5, concisely represents the sampling distribution of the sample means.

Table 6-5 Sampling Distribution of x̄

Sample    Mean x̄    Probability
1, 1      1.0       1/9
1, 2      1.5       1/9
1, 5      3.0       1/9
2, 1      1.5       1/9
2, 2      2.0       1/9
2, 5      3.5       1/9
5, 1      3.0       1/9
5, 2      3.5       1/9
5, 5      5.0       1/9

Table 6-6 Condensed Version of Table 6-5

x̄      Probability
1.0    1/9
1.5    2/9
2.0    1/9
3.0    2/9
3.5    2/9
5.0    1/9

b. Table 6-6 is a probability distribution, so we can find its mean by using Formula 5-1 from Section 5-2. We get the mean of 8/3 as follows:

μ = Σ[x · P(x)] = (1.0 · 1/9) + (1.5 · 2/9) + ... + (5.0 · 1/9) = 24/9 = 8/3

c. The mean of the sampling distribution of sample means is 8/3, and the population mean is also 8/3. Again, this is not a coincidence. In general, the distribution of sample means will have a mean equal to the population mean. The sample means therefore tend to target the population mean instead of systematically being too low or too high.

INTERPRETATION For the case of selecting two values (with replacement) from the population of 1, 2, 5, we have identified the sampling distribution (Table 6-6). We also found that the mean of the sampling distribution is 8/3, which is equal to the population mean. Sample means therefore tend to target the population mean.

From the preceding example we see that the mean of all of the different possible sample means is equal to the mean of the original population, which is μ = 8/3. We can generalize this as a property of sample means: For a fixed sample size, the mean of all possible sample means is equal to the mean of the population. We will revisit this important property in the next section.

Let’s now make an obvious but important observation: Sample means vary. See Table 6-5 and observe how the sample means are different. The first sample mean is 1.0, the second sample mean is 1.5, and so on. This leads to the following definition.

Definition
The value of a statistic, such as the sample mean x̄, depends on the particular values included in the sample, and it generally varies from sample to sample. This variability of a statistic is called sampling variability.

In Chapter 2 we introduced the important characteristics of a data set: center, variation, distribution, outliers, and pattern over time. In examining the samples in Table 6-5, we have already identified a property describing the behavior of sample means: The mean of sample means is equal to the mean of the population. This property addresses the characteristic of center, and we will investigate other characteristics in the next section. We will see that as the sample size increases, the sampling distribution of sample means tends to become a normal distribution. Consequently, the normal distribution assumes an importance that goes far beyond the applications illustrated in Section 6-3. The normal distribution will be used for many cases in which we want to use a sample mean x̄ for the purpose of making some inference about a population mean μ.
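The sampling distribution of the mean for the population 1, 2, 5 (Tables 6-5 and 6-6) can be rebuilt by enumeration, which also confirms that the mean of the sample means equals the population mean. This is a minimal sketch assuming Python's standard library; the name table_6_6 and the other variable names are hypothetical labels chosen to mirror the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [1, 2, 5]

# The nine ordered samples of size 2 drawn with replacement (Table 6-5).
samples = list(product(population, repeat=2))
sample_means = [Fraction(sum(s), len(s)) for s in samples]

# Condensed sampling distribution: each distinct mean with its probability (Table 6-6).
counts = Counter(sample_means)
table_6_6 = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

# Mean of the sampling distribution versus the population mean.
mean_of_sample_means = sum(m * p for m, p in table_6_6.items())
population_mean = Fraction(sum(population), len(population))

for m, p in table_6_6.items():
    print(f"x-bar = {float(m):.1f}, probability = {p}")
print("mean of sample means =", mean_of_sample_means)
print("population mean      =", population_mean)
```

Both printed means are the exact fraction 8/3, which is the targeting property described above.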

Which Statistics Make Good Estimators of Parameters?

Chapter 7 will introduce formal methods for using sample statistics to make estimates of the values of population parameters. Some statistics work much better than others, and we can judge their value by examining their sampling distributions, as in the following example.

EXAMPLE Sampling Distributions A population consists of the values 1, 2, 5. If we randomly select samples of size 2 with replacement, there are nine different possible samples, and they are listed in Table 6-7. Because the nine different samples are equally likely, each sample has probability 1/9.
a. For each sample, find the mean, median, range, variance, standard deviation, and the proportion of sample values that are odd. (For each statistic, this will generate nine values which, when associated with nine probabilities of 1/9 each, will combine to form a sampling distribution for the statistic.)
b. For each statistic, find the mean of the results from part (a).

Table 6-7 Sampling Distributions of Statistics (for Samples of Size 2 Drawn with Replacement from the Population 1, 2, 5)

                                                          Standard      Proportion
                            Mean              Variance    Deviation     of Odd
Sample                      x̄     Median  Range  s²       s             Numbers      Probability
1, 1                        1.0   1.0     0      0.0      0.000         1            1/9
1, 2                        1.5   1.5     1      0.5      0.707         0.5          1/9
1, 5                        3.0   3.0     4      8.0      2.828         1            1/9
2, 1                        1.5   1.5     1      0.5      0.707         0.5          1/9
2, 2                        2.0   2.0     0      0.0      0.000         0            1/9
2, 5                        3.5   3.5     3      4.5      2.121         0.5          1/9
5, 1                        3.0   3.0     4      8.0      2.828         1            1/9
5, 2                        3.5   3.5     3      4.5      2.121         0.5          1/9
5, 5                        5.0   5.0     0      0.0      0.000         1            1/9

Mean of Statistic Values    8/3   8/3     16/9   26/9     1.3           2/3
Population Parameter        8/3   2       4      26/9     1.7           2/3
Does the sample statistic
target the population
parameter?                  Yes   No      No     Yes      No            Yes
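The bottom rows of Table 6-7 can be reproduced by computing each statistic for all nine samples and comparing the average of those values with the corresponding population parameter. The sketch below is illustrative and assumes Python's standard statistics module, in which variance and stdev divide by n - 1 (sample versions) while pvariance and pstdev divide by N (population versions); the helper names value_range and proportion_odd and the dictionary names are hypothetical.

```python
from itertools import product
from math import isclose
from statistics import mean, median, pstdev, pvariance, stdev, variance

population = [1, 2, 5]
samples = list(product(population, repeat=2))  # nine samples, with replacement

def value_range(values):
    return max(values) - min(values)

def proportion_odd(values):
    return sum(v % 2 for v in values) / len(values)

# For each statistic: (function applied to a sample, function applied to the population).
statistics_to_check = {
    "mean": (mean, mean),
    "median": (median, median),
    "range": (value_range, value_range),
    "variance": (variance, pvariance),
    "standard deviation": (stdev, pstdev),
    "proportion odd": (proportion_odd, proportion_odd),
}

targets = {}
for name, (sample_stat, parameter) in statistics_to_check.items():
    # Each sample has probability 1/9, so the mean of the sampling
    # distribution is the plain average of the nine statistic values.
    mean_of_statistic = mean(sample_stat(s) for s in samples)
    population_value = parameter(population)
    targets[name] = isclose(mean_of_statistic, population_value, abs_tol=1e-9)
    print(f"{name:20s} mean of statistic = {mean_of_statistic:.3f}, "
          f"parameter = {population_value:.3f}, targets: {targets[name]}")
```

The comparison uses a small tolerance because the values are floating-point numbers; an exact check with fractions would work for every column except the standard deviation, which is irrational for some samples.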

c. Compare the means from part (b) to the corresponding population parameters, then determine whether each statistic targets the value of the population parameter. For example, the sample means tend to center about the value of the population mean, which is 8/3, so the sample means target the value of the population mean.

SOLUTION
a. See Table 6-7. The individual statistics are listed for each sample.
b. The means of the sample statistics are shown near the bottom of Table 6-7. The mean of the sample means is 8/3, the mean of the sample medians is 8/3, and so on.
c. The bottom row of Table 6-7 is based on a comparison of the population parameter and results from the sample statistics. For example, the population mean of 1, 2, 5 is μ = 8/3, and the sample means “target” that value of 8/3 in the sense that the mean of the sample means is also 8/3.

INTERPRETATION Based on the results in Table 6-7, we can see that when using a sample statistic to estimate a population parameter, some statistics are good in the sense that they target the population parameter and are therefore likely to yield good results. Such statistics are called unbiased estimators. Other statistics are not so good (because they are biased estimators). Here is a summary.
● Statistics that target population parameters: Mean, Variance, Proportion
● Statistics that do not target population parameters: Median, Range, Standard Deviation
Although the sample standard deviation does not target the population standard deviation, the bias is relatively small in large samples, so s is often used to estimate σ. Consequently, means, proportions, variances, and standard deviations will all be considered as major topics in following chapters, but the median and range will rarely be used.

Why sample with replacement? For small samples of the type that we have considered so far in this section, sampling without replacement would have the very practical advantage of avoiding wasteful duplication whenever the same item is selected more than once. However, we are particularly interested in sampling with replacement for these reasons: (1) When selecting a relatively small sample from a large population, it makes no significant difference whether we sample with replacement or without replacement. (2) Sampling with replacement results in independent events that are unaffected by previous outcomes, and independent events are easier to analyze and they result in simpler formulas. We therefore focus on the behavior of samples that are randomly selected with replacement. Many of the statistical procedures discussed in the following chapters are based on the assumption that sampling is conducted with replacement.

The key point of this section is to introduce the concept of a sampling distribution of a statistic. Consider the goal of trying to find the mean body temperature of all adults. Because that population is so large, it is not practical to measure the temperature of every adult. Instead, we obtain a sample of body temperatures and use it to estimate the population mean. Data Set 2 in Appendix B includes a sample of 106 such body temperatures, and the mean for that sample is x̄ = 98.20°F. Conclusions that we make about the population mean temperature of all adults require that we understand the behavior of the sampling distribution of all such sample means. Even though it is not practical to obtain every possible sample and we are stuck with just one sample, we can form some very meaningful and important conclusions about the population of all body temperatures. A major goal of the following sections and chapters is to learn how we can effectively use a sample to form conclusions about a population. In Section 6-5 we consider more details about the sampling distribution of sample means, and in Section 6-6 we consider more details about the sampling distribution of sample proportions.

6-4 BASIC SKILLS AND CONCEPTS

Statistical Literacy and Critical Thinking

1. Sampling Distribution Use your own words to answer this question: “What is a sampling distribution?”

2. Unbiased Estimator What does it mean when we say that sample means “target” the population mean, or that the sample mean is an unbiased estimator of the population mean?

3. Unbiased Estimators Which of the following statistics are unbiased estimators?
a. Sample mean used to estimate a population mean
b. Sample median used to estimate a population median
c. Sample proportion used to estimate a population proportion
d. Sample variance used to estimate a population variance
e. Sample standard deviation used to estimate a population standard deviation
f. Sample range used to estimate a population range

4. Sampling with Replacement Give at least one reason why statistical methods tend to be based on the assumption that sampling is conducted with replacement, instead of without replacement.

5. Survey of Voters Based on a random sample of n = 400 voters, the NBC news division predicts that the Democratic candidate for the presidency will get 49% of the vote, but she actually gets 51%. Should we conclude that the survey was done incorrectly? Why or why not?

6. Sampling Distribution of Body Temperatures Data Set 2 in Appendix B includes a sample of 106 body temperatures of adults. If we were to construct a histogram to depict the shape of the distribution of that sample, would that histogram show the shape of a sampling distribution of sample means? Why or why not?

In Exercises 7–14, represent sampling distributions in the format of a table that lists the different values of the sample statistic along with their corresponding probabilities.

7. Phone Center The Nome Ice Company was in business for only three days (guess why). Here are the numbers of phone calls received on each of those days: 10, 6, 5. Assume that samples of size 2 are randomly selected with replacement from this population of three values.
a. List the 9 different possible samples and find the mean of each of them.
b. Identify the probability of each sample and describe the sampling distribution of sample means. (Hint: See Table 6-6.)

6-4 Sampling Distributions and Estimators 279

c. Find the mean of the sampling distribution.
d. Is the mean of the sampling distribution [from part (c)] equal to the mean of the population of the three listed values? Are those means always equal?

8. Telemarketing Here are the numbers of sales per day that were made by Kim Ryan, a courteous telemarketer who worked four days before being fired: 1, 11, 9, 3. Assume that samples of size 2 are randomly selected with replacement from this population of four values.
a. List the 16 different possible samples and find the mean of each of them.
b. Identify the probability of each sample, then describe the sampling distribution of sample means. (Hint: See Table 6-6.)
c. Find the mean of the sampling distribution.
d. Is the mean of the sampling distribution [from part (c)] equal to the mean of the population of the four listed values? Are those means always equal?

9. Wealthiest People The assets (in billions of dollars) of the five wealthiest people in the United States are 47 (Bill Gates), 43 (Warren Buffett), 21 (Paul Allen), 20 (Alice Walton), and 20 (Helen Walton). Assume that samples of size 2 are randomly selected with replacement from this population of five values.
a. After listing the 25 different possible samples and finding the mean of each sample, use a table to describe the sampling distribution of the sample means. (Hint: See Table 6-6.)
b. Find the mean of the sampling distribution.
c. Is the mean of the sampling distribution [from part (b)] equal to the mean of the population of the five listed values? Are those means always equal?

10. Military Presidents Here is the population of all five U.S. presidents who had professions in the military, along with their ages at inauguration: Eisenhower (62), Grant (46), Harrison (68), Taylor (64), and Washington (57). Assume that samples of size 2 are randomly selected with replacement from the population of five ages.
a. After listing the 25 different possible samples and finding the mean of each sample, use a table to describe the sampling distribution of the sample means. (Hint: See Table 6-6.)
b. Find the mean of the sampling distribution.
c. Is the mean of the sampling distribution [from part (b)] equal to the mean of the population of the five listed values? Are those means always equal?

11. Genetics A genetics experiment involves a population of fruit flies consisting of 1 male named Mike and 3 females named Anna, Barbara, and Chris. Assume that two fruit flies are randomly selected with replacement.
a. After listing the 16 different possible samples, find the proportion of females in each sample, then use a table to describe the sampling distribution of the proportions of females. (Hint: See Table 6-3.)
b. Find the mean of the sampling distribution.
c. Is the mean of the sampling distribution [from part (b)] equal to the population proportion of females? Does the mean of the sampling distribution of proportions always equal the population proportion?

12. Quality Control After constructing a new manufacturing machine, 5 prototype car headlights are produced and it is found that 2 are defective (D) and 3 are acceptable (A). Assume that two headlights are randomly selected with replacement from this population.
a. After identifying the 25 different possible samples, find the proportion of defects in each of them, then use a table to describe the sampling distribution of the proportions of defects. (Hint: See Table 6-3.)

b. Find the mean of the sampling distribution.
c. Is the mean of the sampling distribution [from part (b)] equal to the population proportion of defects? Does the mean of the sampling distribution of proportions always equal the population proportion?

13. Ranks of Olympic Triathlon Competitors U.S. women competed in the triathlon in the Olympic games held in Athens, and their final rankings were 3, 9, and 23. Assume that samples of size 2 are randomly selected with replacement.
a. Use a table to describe the sampling distribution of the sample means.
b. Given that the data consist of ranks, does it really make sense to identify the sampling distribution of the sample means?

14. Median and Moons of Jupiter Jupiter has 4 large moons and 12 small moons. The 4 large moons have these orbit times (in days): 1.8 (Io), 3.6 (Europa), 7.2 (Ganymede), and 16.7 (Callisto). Assume that two of these values are randomly selected with replacement.
a. After identifying the 16 different possible samples, find the median in each of them, then use a table to describe the sampling distribution of the medians.
b. Find the mean of the sampling distribution.
c. Is the mean of the sampling distribution [from part (b)] equal to the population median? Is the median an unbiased estimator of the population median?

6-4 BEYOND THE BASICS

15. Using a Formula to Describe a Sampling Distribution See the first example in this section, which includes a table and graph to describe the sampling distribution of the proportions of girls from two births. Consider the formula shown below, and evaluate that formula using sample proportions $x$ of 0, 0.5, and 1. Based on the results, does the formula describe the sampling distribution? Why or why not?

$$P(x) = \frac{1}{2\,(2 - 2x)!\,(2x)!} \quad \text{where } x = 0,\ 0.5,\ 1$$

16. Mean Absolute Deviation The population of 1, 2, 5 was used to develop Table 6-7. Identify the sampling distribution of the mean absolute deviation (defined in Section 3-3), then determine whether the mean absolute deviation of a sample is a good statistic for estimating the mean absolute deviation of the population.

6-5 The Central Limit Theorem

Key Concept Section 6-4 included some discussion of the sampling distribution of $\bar{x}$, and this section describes procedures for using that sampling distribution in very real and practical applications. The procedures of this section form the foundation for estimating population parameters and hypothesis testing, topics discussed at length in the following chapters. When selecting a simple random sample from a population with mean $\mu$ and standard deviation $\sigma$, it is essential to know these principles:

1. If $n > 30$, then the sample means have a distribution that can be approximated by a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$. (This guideline is commonly used, regardless of the distribution of the original population.)
2. If $n \le 30$ and the original population has a normal distribution, then the sample means have a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

3. If $n \le 30$ but the original population does not have a normal distribution, then the methods of this section do not apply.

Try to keep this big picture in mind: As we sample from a population, we want to know the behavior of the sample means. The central limit theorem tells us that if the sample size is large enough, the distribution of sample means can be approximated by a normal distribution, even if the original population is not normally distributed. Although we discuss a “theorem,” we do not include rigorous proofs. Instead, we focus on the concepts and how to apply them. Here are the key points that form an important foundation for the following chapters.

The Central Limit Theorem and the Sampling Distribution of $\bar{x}$

Given:
1. The random variable $x$ has a distribution (which may or may not be normal) with mean $\mu$ and standard deviation $\sigma$.
2. Simple random samples all of the same size $n$ are selected from the population. (The samples are selected so that all possible samples of size $n$ have the same chance of being selected.)

Conclusions:
1. The distribution of sample means $\bar{x}$ will, as the sample size increases, approach a normal distribution.
2. The mean of all sample means is the population mean $\mu$. (That is, the normal distribution from Conclusion 1 has mean $\mu$.)
3. The standard deviation of all sample means is $\sigma/\sqrt{n}$. (That is, the normal distribution from Conclusion 1 has standard deviation $\sigma/\sqrt{n}$.)

Practical Rules Commonly Used
1. If the original population is not itself normally distributed, here is a common guideline: For samples of size $n$ greater than 30, the distribution of the sample means can be approximated reasonably well by a normal distribution. (There are exceptions, such as populations with very nonnormal distributions requiring sample sizes larger than 30, but such exceptions are relatively rare.) The approximation gets better as the sample size $n$ becomes larger.
2. If the original population is itself normally distributed, then the sample means will be normally distributed for any sample size $n$ (not just the values of $n$ larger than 30).

The central limit theorem involves two different distributions: the distribution of the original population and the distribution of the sample means. As in previous chapters, we use the symbols $\mu$ and $\sigma$ to denote the mean and standard deviation of the original population, but we use the following new notation for the mean and standard deviation of the distribution of sample means.
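Conclusions 2 and 3 above can be checked empirically. The following is a minimal sketch, not part of the text's own method: it assumes an illustrative population of the equally likely digits 0 through 9, and the function name `sample_means`, the seed, and the sample counts are all hypothetical choices made for the illustration.

```python
import random
import statistics

def sample_means(population, n, num_samples, seed=0):
    """Draw num_samples samples of size n (with replacement) and return their means."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    return [statistics.fmean(rng.choices(population, k=n))
            for _ in range(num_samples)]

# Assumed population: the digits 0-9, each equally likely (uniform, not normal).
digits = list(range(10))
mu = statistics.fmean(digits)      # population mean, 4.5
sigma = statistics.pstdev(digits)  # population standard deviation

n = 100
means = sample_means(digits, n, num_samples=5000)

mean_of_means = statistics.fmean(means)  # should be close to mu
sd_of_means = statistics.pstdev(means)   # should be close to sigma / sqrt(n)
predicted_sd = sigma / n ** 0.5

print(f"population mean {mu:.3f}, mean of sample means {mean_of_means:.3f}")
print(f"sigma/sqrt(n) {predicted_sd:.4f}, SD of sample means {sd_of_means:.4f}")
```

The printed pairs should agree closely but not exactly, because the 5000 simulated samples are only a subset of all possible samples of size 100.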

Notation for the Sampling Distribution of $\bar{x}$

If all possible random samples of size $n$ are selected from a population with mean $\mu$ and standard deviation $\sigma$, the mean of the sample means is denoted by $\mu_{\bar{x}}$, so

$$\mu_{\bar{x}} = \mu$$

Also, the standard deviation of the sample means is denoted by $\sigma_{\bar{x}}$, so

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

$\sigma_{\bar{x}}$ is often called the standard error of the mean.

EXAMPLE Simulation with Random Digits Computers are often used to randomly generate digits of telephone numbers to be called for polling purposes. (For example, the Pew Research Center randomly generates the last two digits of telephone numbers so that a “listing bias” can be avoided.) The digits 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 are generated in such a way that they are all equally likely. The accompanying Minitab display shows the histogram of 500,000 generated digits. Observe that the distribution appears to be a uniform distribution, as we expect.

[Minitab display: histogram of the 500,000 generated digits]

Now group the 500,000 digits into 5000 samples, with each sample having $n = 100$ values. We find the mean for each sample and show the histogram of the 5000 sample means. See this absolutely astounding effect: Even though the original 500,000 digits have a uniform distribution, the distribution of the 5000 sample means is approximately a normal distribution! It’s a truly fascinating and intriguing phenomenon in statistics that by sampling from any distribution, we can create a distribution of sample means that is normal or at least approximately normal.

[Minitab display: histogram of the 5000 sample means]

Applying the Central Limit Theorem

Many important and practical problems can be solved with the central limit theorem. When working with such problems, remember that if the sample size is greater than 30, or if the original population is normally distributed, treat the

distribution of sample means as if it were a normal distribution with mean $\mu$ and standard deviation $\sigma/\sqrt{n}$.

In the following example, part (a) involves an individual value, but part (b) involves the mean for a sample of 20 men, so we must use the central limit theorem in working with the random variable $\bar{x}$. Study this example carefully to understand the significant difference between the procedures used in parts (a) and (b). See how this example illustrates the following working procedure:

• When working with an individual value from a normally distributed population, use the methods of Section 6-3. Use

$$z = \frac{x - \mu}{\sigma}$$

• When working with a mean for some sample (or group), be sure to use the value of $\sigma/\sqrt{n}$ for the standard deviation of the sample means. Use

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$$

EXAMPLE Water Taxi Safety In the Chapter Problem we noted that some passengers died when a water taxi sank in Baltimore’s Inner Harbor. Men are typically heavier than women and children, so when loading a water taxi, let’s assume a worst-case scenario in which all passengers are men. Based on data from the National Health and Nutrition Examination Survey, assume that weights of men are normally distributed with a mean of 172 lb and a standard deviation of 29 lb.
a. Find the probability that if an individual man is randomly selected, his weight will be greater than 175 lb.
b. Find the probability that 20 randomly selected men will have a mean that is greater than 175 lb (so that their total weight exceeds the safe capacity of 3500 lb).

SOLUTION
a. Approach: Use the methods presented in Section 6-3 (because we are dealing with an individual value from a normally distributed population). We seek the area of the green-shaded region in Figure 6-19(a). If using Table A-2, we convert the weight of 175 to the corresponding z score:

$$z = \frac{x - \mu}{\sigma} = \frac{175 - 172}{29} = 0.10$$

Refer to Table A-2 using $z = 0.10$ and find that the cumulative area to the left of 175 lb is 0.5398. The green-shaded region is therefore $1 - 0.5398 = 0.4602$. The probability of a randomly selected man weighing more than 175 lb is 0.4602. (If using a calculator or software instead of Table A-2, the more accurate result is 0.4588 instead of 0.4602.)
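The two bullets of the working procedure can be sketched in a few lines of code. This is an illustration under stated assumptions rather than the text's method: the helper names `z_score` and `normal_upper_tail` are hypothetical, and the tail area is computed from the complementary error function instead of Table A-2, so the results correspond to the "calculator or software" values rather than the rounded table values.

```python
import math

def z_score(value, mu, sigma, n=1):
    """z score for an individual value (n = 1) or for the mean of a sample of size n.

    The standard deviation used is sigma / sqrt(n), which reduces to sigma when n = 1.
    """
    return (value - mu) / (sigma / math.sqrt(n))

def normal_upper_tail(z):
    """Area to the right of z under the standard normal curve."""
    return 0.5 * math.erfc(z / math.sqrt(2))

mu, sigma = 172, 29  # men's weights in lb, as given in the example

# Part (a): one randomly selected man weighs more than 175 lb.
p_individual = normal_upper_tail(z_score(175, mu, sigma))

# Part (b): the mean weight of 20 randomly selected men exceeds 175 lb.
p_mean_of_20 = normal_upper_tail(z_score(175, mu, sigma, n=20))

print(f"P(x > 175)            = {p_individual:.4f}")  # about 0.4588
print(f"P(mean of 20 > 175)   = {p_mean_of_20:.4f}")  # about 0.3218
```

The only difference between the two calls is the value of `n`, which mirrors the point of the example: the sample mean is compared against $\sigma/\sqrt{n}$, not against $\sigma$.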

b. Approach: Use the central limit theorem (because we are dealing with the mean for a sample of 20 men, not an individual man). Even though the sample size is not greater than 30, we use a normal distribution for this reason: The original population of men has a normal distribution, so samples of any size will yield means that are normally distributed. Because we are now dealing with a distribution of sample means, we must use the parameters $\mu_{\bar{x}}$ and $\sigma_{\bar{x}}$, which are evaluated as follows:

$$\mu_{\bar{x}} = \mu = 172$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{29}{\sqrt{20}} = 6.4845971$$

Here is a really important point: We must use the computed standard deviation of 6.4845971, not the original standard deviation of 29 (because we are working with the distribution of sample means for which the standard deviation is 6.4845971, not the distribution of individual weights for which the standard deviation is 29). We want to find the green-shaded area shown in Figure 6-19(b). If using Table A-2, we find the relevant z score, which is calculated as follows:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{175 - 172}{29/\sqrt{20}} = \frac{3}{6.4845971} = 0.46$$

Referring to Table A-2, we find that $z = 0.46$ corresponds to a cumulative left area of 0.6772, so the green-shaded region is $1 - 0.6772 = 0.3228$. The probability that the 20 men have a mean weight greater than 175 lb is 0.3228. (If using a calculator or software, the result is 0.3218 instead of 0.3228.)

[Figure 6-19 Men’s Weights: (a) Distribution of Individual Men’s Weights, with $\mu = 172$, $\sigma = 29$, and areas 0.5398 and 0.4602 to the left and right of $x = 175$; (b) Distribution of Sample Means, with $\mu_{\bar{x}} = 172$, $\sigma_{\bar{x}} = 29/\sqrt{20} = 6.4845971$, and areas 0.6772 and 0.3228 to the left and right of $\bar{x} = 175$]

INTERPRETATION There is a 0.4602 probability that an individual man will weigh more than 175 lb, and there is a 0.3228 probability that 20 men will

have a mean weight of more than 175 lb. Given that the safe capacity of the water taxi is 3500 lb, there is a fairly good chance (with probability 0.3228) that it will be overweight if it is filled with 20 randomly selected men. Given that 21 people have already died, and given the high chance of overloading, it would be wise to limit the number of passengers to some level below 20. The capacity of 20 passengers is just not safe enough.

The calculations used here are exactly the type of calculations used by engineers when they design ski lifts, elevators, escalators, airplanes, and other devices that carry people.

Interpreting Results

The next example illustrates another application of the central limit theorem, but carefully examine the conclusion that is reached. This example shows the type of thinking that is the basis for the important procedure of hypothesis testing (discussed in Chapter 8). This example illustrates the rare event rule for inferential statistics, first presented in Section 4-1.

Rare Event Rule
If, under a given assumption, the probability of a particular observed event is exceptionally small, we conclude that the assumption is probably not correct.

The Placebo Effect
It has long been believed that placebos actually help some patients. In fact, some formal studies have shown that when given a placebo (a treatment with no medicinal value), many test subjects show some improvement. Estimates of improvement rates have typically ranged between one-third and two-thirds of the patients. However, a more recent study suggests that placebos have no real effect. An article in the New England Journal of Medicine (Vol. 334, No. 21) was based on research of 114 medical studies over 50 years. The authors of the article concluded that placebos appear to have some effect only for relieving pain, but not for other physical conditions. They concluded that apart from clinical trials, the use of placebos “cannot be recommended.”

EXAMPLE Body Temperatures Assume that the population of human body temperatures has a mean of 98.6°F, as is commonly believed. Also assume that the population standard deviation is 0.62°F (based on data from University of Maryland researchers). If a sample of size $n = 106$ is randomly selected, find the probability of getting a mean of 98.2°F or lower. (The value of 98.2°F was actually obtained; see the midnight temperatures for Day 2 in Data Set 2 of Appendix B.)

SOLUTION
We weren’t given the distribution of the population, but because the sample size $n = 106$ exceeds 30, we use the central limit theorem and conclude that the distribution of sample means is a normal distribution with these parameters:

$$\mu_{\bar{x}} = \mu = 98.6 \quad \text{(by assumption)}$$

$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = \frac{0.62}{\sqrt{106}} = 0.0602197$$

Figure 6-20 shows the shaded area (see the tiny left tail of the graph) corresponding to the probability we seek. Having already found the parameters that apply to the distribution shown in Figure 6-20, we can now find the shaded

area by using the same procedures developed in Section 6-3. Using Table A-2, we first find the z score:

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{98.20 - 98.6}{0.0602197} = -6.64$$

Referring to Table A-2, we find that $z = -6.64$ is off the chart, but for values of $z$ below $-3.49$, we use an area of 0.0001 for the cumulative left area up to $z = -3.49$. We therefore conclude that the shaded region in Figure 6-20 is 0.0001. (If using a TI-83/84 Plus calculator or software, the area of the shaded region is closer to 0.00000000002, but even those results are only approximations. We can safely report that the probability is quite small, such as less than 0.001.)

[Figure 6-20 Distribution of Mean Body Temperatures for Samples of Size $n = 106$: shaded left-tail area 0.0001 below $\bar{x} = 98.2$ ($z = -6.64$), with $\mu_{\bar{x}} = 98.6$ at $z = 0$]

INTERPRETATION The result shows that if the mean of our body temperatures is really 98.6°F, then there is an extremely small probability of getting a sample mean of 98.2°F or lower when 106 subjects are randomly selected. University of Maryland researchers did obtain such a sample mean, and there are two possible explanations: Either the population mean really is 98.6°F and their sample represents a chance event that is extremely rare, or the population mean is actually lower than 98.6°F and so their sample is typical. Because the probability is so low, it seems more reasonable to conclude that the population mean is lower than 98.6°F. This is the type of reasoning used in hypothesis testing, to be introduced in Chapter 8. For now, we should focus on the use of the central limit theorem for finding the probability of 0.0001, but we should also observe that this theorem will be used later in developing some very important concepts in statistics.

The Fuzzy Central Limit Theorem
In The Cartoon Guide to Statistics, by Gonick and Smith, the authors describe the Fuzzy Central Limit Theorem as follows: “Data that are influenced by many small and unrelated random effects are approximately normally distributed. This explains why the normal is everywhere: stock market fluctuations, student weights, yearly temperature averages, SAT scores: All are the result of many different effects.” People’s heights, for example, are the results of hereditary factors, environmental factors, nutrition, health care, geographic region, and other influences which, when combined, produce normally distributed values.

Correction for a Finite Population

In applying the central limit theorem, our use of $\sigma_{\bar{x}} = \sigma/\sqrt{n}$ assumes that the population has infinitely many members. When we sample with replacement (that is, put back each selected item before making the next selection), the population is effectively infinite. Yet many realistic applications involve sampling without replacement, so successive samples depend on previous outcomes. In manufacturing, quality-control inspectors typically sample items from a finite production run

without replacing them. For such a finite population, we may need to adjust $\sigma_{\bar{x}}$. Here is a common rule of thumb:

When sampling without replacement and the sample size $n$ is greater than 5% of the finite population size $N$ (that is, $n > 0.05N$), adjust the standard deviation of sample means $\sigma_{\bar{x}}$ by multiplying it by the finite population correction factor:

$$\sqrt{\frac{N - n}{N - 1}}$$

Except for Exercises 22 and 23, the examples and exercises in this section assume that the finite population correction factor does not apply, because we are sampling with replacement or the population is infinite or the sample size doesn’t exceed 5% of the population size.

The central limit theorem is so important because it allows us to use the basic normal distribution methods in a wide variety of different circumstances. In Chapter 7, for example, we will apply the theorem when we use sample data to estimate means of populations. In Chapter 8 we will apply it when we use sample data to test claims made about population means. Such applications of estimating population parameters and testing claims are extremely important uses of statistics, and the central limit theorem makes them possible.

6-5 BASIC SKILLS AND CONCEPTS

Statistical Literacy and Critical Thinking

1. Standard Error of the Mean What is the standard error of the mean?

2. Small Sample If selecting samples of size $n = 2$ from a population with a known mean and standard deviation, what requirement must be satisfied in order to assume that the distribution of the sample means is a normal distribution?

3. Notation Large ($n > 30$) samples are randomly selected from a population with mean $\mu$ and standard deviation $\sigma$. What notation is used for the mean of the population consisting of all sample means? What notation is used for the standard deviation of the population consisting of all sample means?

4. Convenience Sample Because a statistics student waited until the last minute to do a project, she has only enough time to collect heights from female friends and female relatives. She then calculates the mean height of the females in her sample. Assuming that females have heights that are normally distributed with a mean of 63.6 in. and a standard deviation of 2.5 in., can she use the central limit theorem when analyzing the mean height of her sample?

Using the Central Limit Theorem. In Exercises 5–8, assume that women’s heights are normally distributed with a mean given by $\mu = 63.6$ in. and a standard deviation given by $\sigma = 2.5$ in. (based on data from the National Health Survey).

5. a. If 1 woman is randomly selected, find the probability that her height is less than 64 in.
b. If 36 women are randomly selected, find the probability that they have a mean height less than 64 in.

6. a. If 1 woman is randomly selected, find the probability that her height is greater than 63 in.
b. If 100 women are randomly selected, find the probability that they have a mean height greater than 63 in.

7. a. If 1 woman is randomly selected, find the probability that her height is between 63.5 in. and 64.5 in.
b. If 9 women are randomly selected, find the probability that they have a mean height between 63.5 in. and 64.5 in.
c. Why can the central limit theorem be used in part (b), even though the sample size does not exceed 30?

8. a. If 1 woman is randomly selected, find the probability that her height is between 60 in. and 65 in.
b. If 16 women are randomly selected, find the probability that they have a mean height between 60 in. and 65 in.
c. Why can the central limit theorem be used in part (b), even though the sample size does not exceed 30?

9. Gondola Safety A ski gondola in Vail, Colorado, carries skiers to the top of a mountain. It bears a plaque stating that the maximum capacity is 12 people or 2004 pounds. That capacity will be exceeded if 12 people have weights with a mean greater than $2004/12 = 167$ pounds. Because men tend to weigh more than women, a “worst case” scenario involves 12 passengers who are all men. Men have weights that are normally distributed with a mean of 172 lb and a standard deviation of 29 lb (based on data from the National Health Survey).
a. Find the probability that if an individual man is randomly selected, his weight will be greater than 167 pounds.
b. Find the probability that 12 randomly selected men will have a mean that is greater than 167 pounds (so that their total weight is greater than the gondola maximum capacity of 2004 lb).
c. Does the gondola appear to have the correct weight limit? Why or why not?

10. Casino Buses The new Lucky Lady Casino wants to increase revenue by providing buses that can transport gamblers from other cities. Research shows that these gamblers tend to be older, they tend to play slot machines only, and they have losses with a mean of $182 and a standard deviation of $105. The buses carry 35 gamblers per trip. The casino gives each bus passenger $50 worth of vouchers that can be converted to cash, so the casino needs to recover that cost in order to make a profit. Find the probability that if a bus is filled with 35 passengers, the mean amount lost by a passenger will exceed $50. Based on the result, does the casino gamble when it provides such buses?

11. Amounts of Coke Assume that cans of Coke are filled so that the actual amounts have a mean of 12.00 oz and a standard deviation of 0.11 oz.
a. Find the probability that a sample of 36 cans will have a mean amount of at least 12.19 oz, as in Data Set 12 in Appendix B.
b. Based on the result from part (a), is it reasonable to believe that the cans are actually filled with a mean of 12.00 oz? If the mean is not 12.00 oz, are consumers being cheated?

12. Coaching for the SAT Scores for men on the verbal portion of the SAT-I test are normally distributed with a mean of 509 and a standard deviation of 112 (based on data

from the College Board). Randomly selected men are given the Columbia Review Course before taking the SAT test. Assume that the course has no effect.
a. If 1 of the men is randomly selected, find the probability that his score is at least 590.
b. If 16 of the men are randomly selected, find the probability that their mean score is at least 590.
c. In finding the probability for part (b), why can the central limit theorem be used even though the sample size does not exceed 30?
d. If the random sample of 16 men does result in a mean score of 590, is there strong evidence to support the claim that the course is actually effective? Why or why not?

13. Designing Strobe Lights An aircraft strobe light is designed so that the times between flashes are normally distributed with a mean of 3.00 s and a standard deviation of 0.40 s.
a. Find the probability that an individual time is greater than 4.00 s.
b. Find the probability that the mean for 60 randomly selected times is greater than 4.00 s.
c. Given that the strobe light is intended to help other pilots see an aircraft, which result is more relevant for assessing the safety of the strobe light: The result from part (a) or part (b)? Why?

14. Designing Motorcycle Helmets Engineers must consider the breadths of male heads when designing motorcycle helmets. Men have head breadths that are normally distributed with a mean of 6.0 in. and a standard deviation of 1.0 in. (based on anthropometric survey data from Gordon, Churchill, et al.).
a. If one male is randomly selected, find the probability that his head breadth is less than 6.2 in.
b. The Safeguard Helmet company plans an initial production run of 100 helmets. Find the probability that 100 randomly selected men have a mean head breadth less than 6.2 in.
c. The production manager sees the result from part (b) and reasons that all helmets should be made for men with head breadths less than 6.2 in., because they would fit all but a few men. What is wrong with that reasoning?

15. Blood Pressure For women aged 18–24, systolic blood pressures (in mm Hg) are normally distributed with a mean of 114.8 and a standard deviation of 13.1 (based on data from the National Health Survey). Hypertension is commonly defined as a systolic blood pressure above 140.
a. If a woman between the ages of 18 and 24 is randomly selected, find the probability that her systolic blood pressure is greater than 140.
b. If 4 women in that age bracket are randomly selected, find the probability that their mean systolic blood pressure is greater than 140.
c. Given that part (b) involves a sample size that is not larger than 30, why can the central limit theorem be used?
d. If a physician is given a report stating that 4 women have a mean systolic blood pressure below 140, can she conclude that none of the women have hypertension (with a blood pressure greater than 140)?

16. Staying Out of Hot Water In planning for hot water requirements, the manager of the Luxurion Hotel finds that guests spend a mean of 11.4 min each day in the shower (based on data from the Opinion Research Corporation). Assume that the shower times are normally distributed with a standard deviation of 2.6 min.
a. Find the percentage of guests who shower more than 12 min.

b. The hotel has installed a system that can provide enough hot water provided that the mean shower time for 84 guests is less than 12 min. If the hotel currently has 84 guests, find the probability that there will not be enough hot water. Does the current system appear to be effective?

17. Redesign of Ejection Seats When women were allowed to become pilots of fighter jets, engineers needed to redesign the ejection seats because they had been designed for men only. The ACES-II ejection seats were designed for men weighing between 140 lb and 211 lb. The population of women has normally distributed weights with a mean of 143 lb and a standard deviation of 29 lb (based on data from the National Health Survey).
a. If 1 woman is randomly selected, find the probability that her weight is between 140 lb and 211 lb.
b. If 36 different women are randomly selected, find the probability that their mean weight is between 140 lb and 211 lb.
c. When redesigning the fighter jet ejection seats to better accommodate women, which probability is more relevant: The result from part (a) or the result from part (b)? Why?

18. Labeling of M&M Packages M&M plain candies have a mean weight of 0.8565 g and a standard deviation of 0.0518 g (based on Data Set 13 in Appendix B). The M&M candies used in Data Set 13 came from a package containing 465 candies, and the package label stated that the net weight is 396.9 g. (If every package has 465 candies, the mean weight of the candies must exceed $396.9/465 = 0.8535$ for the net contents to weigh at least 396.9 g.)
a. If 1 M&M plain candy is randomly selected, find the probability that it weighs more than 0.8535 g.
b. If 465 M&M plain candies are randomly selected, find the probability that their mean weight is at least 0.8535 g.
c. Given these results, does it seem that the Mars Company is providing M&M consumers with the amount claimed on the label?

19. Vending Machines Currently, quarters have weights that are normally distributed with a mean of 5.670 g and a standard deviation of 0.062 g. A vending machine is configured to accept only those quarters with weights between 5.550 g and 5.790 g.
a. If 280 different quarters are inserted into the vending machine, what is the expected number of rejected quarters?
b. If 280 different quarters are inserted into the vending machine, what is the probability that the mean falls between the limits of 5.550 g and 5.790 g?
c. If you own the vending machine, which result would concern you more? The result from part (a) or the result from part (b)? Why?

20. Aircraft Safety Standards Under older Federal Aviation Administration rules, airlines had to estimate the weight of a passenger as 185 pounds. (That amount is for an adult traveling in winter, and it includes 20 pounds of carry-on baggage.) Current rules require an estimate of 195 pounds. Men have weights that are normally distributed with a mean of 172 pounds and a standard deviation of 29 pounds.
a. If 1 adult male is randomly selected and is assumed to have 20 pounds of carry-on baggage, find the probability that his total is greater than 195 pounds.
b. If a Boeing 767-300 aircraft is full of 213 adult male passengers and each is assumed to have 20 pounds of carry-on baggage, find the probability that the mean passenger weight (including carry-on baggage) is greater than 195 pounds. Does a pilot have to be concerned about exceeding this weight limit?

6-5 BEYOND THE BASICS
21. Seating Design You need to build a bench that will seat 18 male college football players, and you must first determine the length of the bench. Men have hip breadths that are normally distributed with a mean of 14.4 in. and a standard deviation of 1.0 in.
a. What is the minimum length of the bench if you want a 0.975 probability that it will fit the combined hip breadths of 18 randomly selected men?
b. What would be wrong with actually using the result from part (a) as the bench length?
22. Correcting for a Finite Population The Boston Women’s Club needs an elevator limited to 8 passengers. The club has 120 women members with weights that approximate a normal distribution with a mean of 143 lb and a standard deviation of 29 lb. (Hint: See the discussion of the finite population correction factor.)
a. If 8 different members are randomly selected, find the probability that their total weight will not exceed the maximum capacity of 1300 lb.
b. If we want a 0.99 probability that the elevator will not be overloaded whenever 8 members are randomly selected as passengers, what should be the maximum allowable weight?
23. Population Parameters A population consists of these values: 2, 3, 6, 8, 11, 18.
a. Find μ and σ.
b. List all samples of size n = 2 that can be obtained without replacement.
c. Find the population of all values of x̄ by finding the mean of each sample from part (b).
d. Find the mean μ_x̄ and standard deviation σ_x̄ for the population of sample means found in part (c).
e. Verify that μ_x̄ = μ and σ_x̄ = (σ/√n) · √((N − n)/(N − 1)).

6-6 Normal as Approximation to Binomial
Key Concept This section presents a method for using a normal distribution as an approximation to a binomial probability distribution.
If the conditions np ≥ 5 and nq ≥ 5 are both satisfied, then probabilities from a binomial probability distribution can be approximated reasonably well by using a normal distribution with mean μ = np and standard deviation σ = √(npq). Because a binomial probability distribution typically uses only whole numbers for the random variable x, while the normal approximation is continuous, we must use a “continuity correction” with a whole number x represented by the interval from x − 0.5 to x + 0.5.
Important note: Instead of using a normal distribution as an approximation to a binomial probability distribution, most practical applications of the binomial distribution can be handled with software or a calculator, but this section introduces the important principle that a binomial distribution can be approximated by a normal distribution, and that principle will be used in later chapters.
Consider the loading of an American Airlines Boeing 767-300, which carries 213 passengers. In analyzing the safe load that can be carried, we must consider

the weight of the passengers. We know that a typical man weighs about 30 lb more than a typical woman, so the number of male passengers becomes an important issue. We can use the binomial probability distribution with n = 213, p = 0.5, and q = 0.5 (assuming that men and women are equally likely). See the accompanying Minitab display showing a graph of the probability for each number of male passengers from 0 through 213, and notice how the graph appears to be a normal distribution, even though the plotted points are from a binomial distribution. This graph suggests that we can use a normal distribution to approximate the binomial distribution.
[Minitab display: probability for each number of male passengers from 0 through 213]

Sidebar: Voltaire Beats Lottery
In 1729, the philosopher Voltaire became rich by devising a scheme to beat the Paris lottery. The government ran a lottery to repay municipal bonds that had lost some value. The city added large amounts of money with the net effect that the prize values totaled more than the cost of all tickets. Voltaire formed a group that bought all the tickets in the monthly lottery and won for more than a year. A bettor in the New York State Lottery tried to win a share of an exceptionally large prize that grew from a lack of previous winners. He wanted to write a $6,135,756 check that would cover all combinations, but the state declined and said that the nature of the lottery would have been changed.

Normal Distribution as Approximation to Binomial Distribution
If a binomial probability distribution satisfies the requirements that np ≥ 5 and nq ≥ 5, then that binomial probability distribution can be approximated by a normal distribution with mean μ = np and standard deviation σ = √(npq), and with the discrete whole number x adjusted with a continuity correction, so that x is represented by the interval from x − 0.5 to x + 0.5.

When using a normal approximation to a binomial distribution, follow this procedure:

Procedure for Using a Normal Distribution to Approximate a Binomial Distribution
1. Establish that the normal distribution is a suitable approximation to the binomial distribution by checking the requirement that np ≥ 5 and nq ≥ 5. (If these conditions are not both satisfied, then you must use software, or a calculator, or Table A-1, or calculations using the binomial probability formula.)
2. Find the values of the parameters μ and σ by calculating μ = np and σ = √(npq).
3. Identify the discrete value x (the number of successes). Change the discrete value x by replacing it with the interval from x − 0.5 to x + 0.5. (For further clarification, see the discussion under the subheading “Continuity Corrections” found later in this section.) Draw a normal curve and enter the values of μ, σ, and either x − 0.5 or x + 0.5, as appropriate.
4. Change x by replacing it with x − 0.5 or x + 0.5, as appropriate.
5. Using x − 0.5 or x + 0.5 (as appropriate) in place of x, find the area corresponding to the desired probability by first finding the z score: z = (x − μ)/σ. Now use that z score to find the area to the left of the adjusted value of x. That area can now be used to identify the area corresponding to the desired probability.

Sidebar: Is Parachuting Safe?
About 30 people die each year as more than 100,000 people make about 2.25 million parachute jumps. In comparison, a typical year includes about 200 scuba diving fatalities, 7000 drownings, 900 bicycle deaths, 800 lightning deaths, and 1150 deaths from bee stings. Of course, these figures don’t necessarily mean that parachuting is safer than bike riding or swimming. A fair comparison should involve fatality rates, not just the total number of deaths. The author, with much trepidation, made two parachute jumps but quit after missing the spacious drop zone both times. He has also flown in a hang glider, hot air balloon, ultralight, sailplane, and Goodyear blimp.

We will illustrate this normal approximation procedure with the following example.

EXAMPLE Passenger Load on a Boeing 767-300 An American Airlines Boeing 767-300 aircraft has 213 seats. When fully loaded with passengers, baggage, cargo, and fuel, the pilot must verify that the gross weight is below the maximum allowable limit, and the weight must be properly distributed so that the balance of the aircraft is within safe acceptable limits. Instead of weighing each passenger, their weights are estimated according to Federal Aviation Administration rules. In reality, we know that men have a mean weight of 172 pounds, whereas women have a mean weight of 143 pounds, so disproportionately more male passengers might result in an unsafe overweight situation. Assume that if there are at least 122 men in a roster of 213 passengers, the load must be somehow adjusted. Assuming that passengers are booked randomly, male passengers and female passengers are equally likely, and the aircraft is full of adults, find the probability that a Boeing 767-300 with 213 passengers has at least 122 men.

SOLUTION The given problem does involve a binomial distribution with a fixed number of trials (n = 213), which are presumably independent, two categories (man, woman) of outcome for each trial, and a probability of a male (p = 0.5) that presumably remains constant from trial to trial. Calculations with the binomial probability formula are not practical, because we would have to apply it 92 times (once for each value of x from 122 to 213 inclusive). Instead, we proceed with the five-step approach of using a normal distribution to approximate the binomial distribution.
Step 1: Requirement check: We must first verify that it is reasonable to approximate the binomial distribution by the normal distribution because np ≥ 5 and nq ≥ 5. With n = 213, p = 0.5, and q = 1 − p = 0.5, we verify the required conditions as follows:
np = 213 · 0.5 = 106.5 (Therefore np ≥ 5.)
nq = 213 · 0.5 = 106.5 (Therefore nq ≥ 5.)
Step 2: We now proceed to find the values for μ and σ that are needed for the normal distribution. We get the following:
μ = np = 213 · 0.5 = 106.5
σ = √(npq) = √(213 · 0.5 · 0.5) = 7.2972598

Step 3: We want the probability of “at least 122 males,” and the discrete value of 122 is adjusted by using the continuity correction as follows: Represent x = 122 by the vertical strip bounded by 121.5 and 122.5.
Step 4: Because we want the probability of at least 122 men, we want the area representing the discrete whole number of 122 (the region bounded by 121.5 and 122.5), as well as the area to the right, as shown in Figure 6-21.
[Figure 6-21 Finding the Probability of “At Least 122 Men” Among 213 Passengers: normal curve with μ = 106.5 (z = 0); the interval from 121.5 to 122.5 represents 122 men; z = 2.06 on the z scale]
Step 5: We can now proceed to find the shaded area of Figure 6-21 by using the same methods used in Section 6-3. If we use Table A-2 for the standard normal distribution, we must first convert 121.5 to a z score, then use the table to find the area to the left of 121.5, which is then subtracted from 1. The z score is found as follows:
z = (x − μ)/σ = (121.5 − 106.5)/7.2972598 = 2.06
Using Table A-2, we find that z = 2.06 corresponds to an area of 0.9803, so the shaded region is 1 − 0.9803 = 0.0197. If using a TI-83/84 Plus calculator or software, we get a more accurate result of 0.0199.

INTERPRETATION There is a 0.0197 probability of getting at least 122 men among 213 passengers. Because that probability is so small, we know that a roster of 213 passengers will rarely include at least 122 men, so the adjustments to the aircraft loading will not have to be made very often.

Continuity Corrections
The procedure for using a normal distribution to approximate a binomial distribution includes an adjustment in which we change a discrete whole number to an interval that is 0.5 below and 0.5 above. This particular step, called a continuity correction, is usually difficult to understand, so we now consider it in more detail.
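The five steps of the Boeing 767-300 example can be checked numerically. The following is a minimal Python sketch, not part of the original text, using only the standard library; the variable names are illustrative assumptions. It computes the continuity-corrected normal approximation without rounding z, and, for comparison, the exact binomial sum over x = 122 through 213 that the text describes as impractical by hand.

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist

# Values assumed in the example: 213 passengers, P(man) = 0.5.
n = 213
p = Fraction(1, 2)
q = 1 - p

# Step 1: requirement check (np >= 5 and nq >= 5).
assert n * p >= 5 and n * q >= 5

# Step 2: parameters of the approximating normal distribution.
mu = float(n * p)               # np = 106.5
sigma = sqrt(float(n * p * q))  # sqrt(npq), about 7.2972598

# Steps 3-4: "at least 122" includes 122, so use the left edge of its strip.
boundary = 122 - 0.5            # 121.5

# Step 5: z score, then the area to the right of the boundary.
z = (boundary - mu) / sigma
approx = 1 - NormalDist().cdf(z)

# Exact binomial probability, summed with exact rational arithmetic.
exact = float(sum(comb(n, k) * p**k * q**(n - k) for k in range(122, n + 1)))

print(f"z = {z:.4f}")
print(f"normal approximation = {approx:.4f}")
print(f"exact binomial       = {exact:.4f}")
```

Because z is not rounded to 2.06 here, the approximation should land near the 0.0199 that the text reports for a calculator or software rather than the table-based 0.0197.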

Definition
When we use the normal distribution (which is a continuous probability distribution) as an approximation to the binomial distribution (which is discrete), a continuity correction is made to a discrete whole number x in the binomial distribution by representing the single value x by the interval from x − 0.5 to x + 0.5 (that is, adding and subtracting 0.5).

The following practical suggestions should help you use continuity corrections properly.

Procedure for Continuity Corrections
1. When using the normal distribution as an approximation to the binomial distribution, always use the continuity correction. (It is required because we are using the continuous normal distribution to approximate the discrete binomial distribution.)
2. In using the continuity correction, first identify the discrete whole number x that is relevant to the binomial probability problem. For example, if you’re trying to find the probability of getting at least 122 men among 213 randomly selected people, the discrete whole number of concern is x = 122. First focus on the x value itself, and temporarily ignore whether you want at least x, more than x, fewer than x, or whatever.
3. Draw a normal distribution centered about μ, then draw a vertical strip area centered over x. Mark the left side of the strip with the number equal to x − 0.5, and mark the right side with the number equal to x + 0.5. With x = 122, for example, draw a strip from 121.5 to 122.5. Consider the entire area of the entire strip to represent the probability of the discrete whole number x itself.
4. Now determine whether the value of x itself should be included in the probability you want. (For example, “at least x” does include x itself, but “more than x” does not include x itself.) Next, determine whether you want the probability of at least x, at most x, more than x, fewer than x, or exactly x. Shade the area to the right or left of the strip, as appropriate; also shade the interior of the strip itself if and only if x itself is to be included. This total shaded region corresponds to the probability being sought.

To see how this procedure results in continuity corrections, see the common cases illustrated in Figure 6-22. Those cases correspond to the statements in the following list.

Statement                                    Area
At least 122 (includes 122 and above)        To the right of 121.5
More than 122 (doesn’t include 122)          To the right of 122.5
At most 122 (includes 122 and below)         To the left of 122.5
Fewer than 122 (doesn’t include 122)         To the left of 121.5
Exactly 122                                  Between 121.5 and 122.5

[Figure 6-22 Using Continuity Corrections: five normal curves showing the shaded regions for at least 122, more than 122, at most 122, fewer than 122, and exactly 122, with boundaries at 121.5 and 122.5]
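The five cases in the list above can be collected into one small function. This is only a sketch under stated assumptions: the function name normal_approx and the statement strings are hypothetical, chosen for illustration, and the code uses nothing beyond the Python standard library.

```python
from math import sqrt
from statistics import NormalDist


def normal_approx(n, p, x, statement):
    """Approximate a binomial probability by a continuity-corrected normal area.

    `statement` is one of "at least", "more than", "at most",
    "fewer than", "exactly" (labels assumed for this sketch).
    """
    q = 1 - p
    if n * p < 5 or n * q < 5:
        raise ValueError("normal approximation requires np >= 5 and nq >= 5")
    dist = NormalDist(mu=n * p, sigma=sqrt(n * p * q))
    left, right = x - 0.5, x + 0.5  # edges of the strip that represents x
    if statement == "at least":     # includes x: area to the right of x - 0.5
        return 1 - dist.cdf(left)
    if statement == "more than":    # excludes x: area to the right of x + 0.5
        return 1 - dist.cdf(right)
    if statement == "at most":      # includes x: area to the left of x + 0.5
        return dist.cdf(right)
    if statement == "fewer than":   # excludes x: area to the left of x - 0.5
        return dist.cdf(left)
    if statement == "exactly":      # the strip itself
        return dist.cdf(right) - dist.cdf(left)
    raise ValueError(f"unknown statement: {statement!r}")


# The x = 122 cases from the list, with n = 213 and p = 0.5.
for s in ("at least", "more than", "at most", "fewer than", "exactly"):
    print(f"{s:>10} 122: {normal_approx(213, 0.5, 122, s):.4f}")
```

A quick consistency check on the design: "at least" and "fewer than" share the boundary x − 0.5, so their areas sum to 1, and the same holds for "more than" and "at most" at x + 0.5.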

EXAMPLE Internet Use A recent survey showed that among 2013 randomly selected adults, 1358 (or 67.5%) stated that they are Internet users (based on data from Pew Research Center). If the proportion of all adults using the Internet is actually 2/3, find the probability that a random sample of 2013 adults will result in exactly 1358 Internet users.

SOLUTION We have n = 2013 independent survey subjects, x = 1358 of them are Internet users, and we assume that the population proportion is p = 2/3, so it follows that q = 1/3. We will use a normal distribution to approximate the binomial distribution.
Step 1: We begin by checking the requirements to determine whether the normal approximation is suitable:
np = 2013 · 2/3 = 1342 (Therefore np ≥ 5.)
nq = 2013 · 1/3 = 671 (Therefore nq ≥ 5.)
Step 2: We now proceed to find the values for μ and σ that are needed for the normal distribution. We get the following:
μ = np = 2013 · 2/3 = 1342
σ = √(npq) = √(2013 · (2/3) · (1/3)) = 21.150256
Step 3: We draw the normal curve shown in Figure 6-23. The shaded region of the figure represents the probability of getting exactly 1358 Internet users. Using the continuity correction, we represent x = 1358 by the region between 1357.5 and 1358.5.
[Figure 6-23 Using the Continuity Correction: normal curve with μ = 1342 and a strip from 1357.5 to 1358.5 centered above 1358. Area of shaded region is 0.0150, which is found by using the normal distribution as an approximation. Area of striped rectangle is 0.0142, which is the exact value found by using the binomial probability formula.]
Step 4: Here is the approach used to find the shaded region in Figure 6-23: First find the total area to the left of 1358.5, then find the total area to the left of 1357.5, then find the difference between those two areas. Let’s begin with the total area to the left of 1358.5. If we want to use Table A-2 we must first find the z score corresponding to 1358.5. We get
z = (1358.5 − 1342)/21.150256 = 0.78

We use Table A-2 to find that z = 0.78 corresponds to a probability of 0.7823, which is the total area to the left of 1358.5. Now we proceed to find the area to the left of 1357.5 by first finding the z score corresponding to 1357.5:
z = (1357.5 − 1342)/21.150256 = 0.73
We use Table A-2 to find that z = 0.73 corresponds to a probability of 0.7673, which is the total area to the left of 1357.5. The shaded area is 0.7823 − 0.7673 = 0.0150. (Using technology, we get 0.0142.)

INTERPRETATION If we assume that 2/3 of all adults use the Internet, the probability of getting exactly 1358 Internet users among 2013 randomly selected people is 0.0150. This probability tells us that if the proportion of Internet users in the population is 2/3, then it is highly unlikely that we will get exactly 1358 Internet users when we survey 2013 people. Actually, when surveying 2013 people, the probability of any single number of Internet users will be very small.

If we solve the preceding example using STATDISK, Minitab, or a calculator, we get a result of 0.0142, but the normal approximation method resulted in a value of 0.0150. The discrepancy of 0.0008 results from two factors: (1) The use of the normal distribution results in an approximate value that is the area of the shaded region in Figure 6-23, whereas the exact correct area is a rectangle centered above 1358 (Figure 6-23 illustrates this discrepancy); (2) the use of Table A-2 forced us to find one of a limited number of table values based on a rounded z score. The area of the rectangle is 0.0142, but the area of the approximating shaded region is 0.0150.

Interpreting Results
In reality, when we use a normal distribution as an approximation to a binomial distribution, our ultimate goal is not simply to find a probability number. We often need to make some judgment based on the probability value.
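The numbers in the Internet-use example, and the two sources of discrepancy just described, can be reproduced with a short Python sketch. It is offered as an illustration only, not as the method used by STATDISK, Minitab, or a calculator; it relies on the standard library, and the variable names are assumptions. Rounding each z score to two decimals imitates a Table A-2 lookup.

```python
from fractions import Fraction
from math import comb, sqrt
from statistics import NormalDist

# Values from the example: n = 2013 adults, x = 1358 users, assumed p = 2/3.
n, x = 2013, 1358
p = Fraction(2, 3)
q = 1 - p

mu = float(n * p)               # np = 1342
sigma = sqrt(float(n * p * q))  # sqrt(npq), about 21.150256
std = NormalDist()              # standard normal distribution

# z scores for the two edges of the strip from 1357.5 to 1358.5.
z_hi = (x + 0.5 - mu) / sigma
z_lo = (x - 0.5 - mu) / sigma

# Table-style result: z rounded to two decimals before finding each area.
table_style = std.cdf(round(z_hi, 2)) - std.cdf(round(z_lo, 2))

# Same normal approximation, but without rounding z.
unrounded = std.cdf(z_hi) - std.cdf(z_lo)

# Exact binomial probability P(x = 1358), using exact rational arithmetic
# so that the very small powers of p and q do not underflow.
exact = float(comb(n, x) * p**x * q**(n - x))

print(f"table-style normal approximation: {table_style:.4f}")
print(f"unrounded normal approximation:   {unrounded:.4f}")
print(f"exact binomial probability:       {exact:.4f}")
```

Comparing the first two printed values isolates the effect of rounding z, and comparing the last two isolates the effect of replacing the rectangle with the area under the curve.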
For example, suppose a newspaper reporter sees the sample data in the preceding example and, after observing that 67.5% of the surveyed adults used the Internet, she writes the headline that “More Than 2/3 of Adults Use the Internet.” That headline is not justified by the sample data, for the following reason: If the true population proportion equals 2/3 (instead of being greater than 2/3), there is a high likelihood (0.2327) of getting at least 1358 Internet users among the 2013 adults surveyed. (The area to the left of 1357.5 is 0.7673, so the probability of getting at least 1358 Internet users is 1 − 0.7673 = 0.2327.) That is, with a population proportion equal to 2/3, the result of 1358 Internet users is not unusually high. Such conclusions might seem a bit confusing at this point, but following chapters will introduce systematic methods that will make them much easier. For now, we should understand that low probabilities correspond to events that are very unlikely, whereas large probabilities correspond to likely events. The probability value of

0.05 is often used as a cutoff to distinguish between unlikely events and likely events. The following criterion (from Section 5-2) describes the use of probabilities for distinguishing between results that could easily occur by chance and those results that are highly unusual.

Using Probabilities to Determine When Results Are Unusual
● Unusually high: x successes among n trials is an unusually high number of successes if P(x or more) is very small (such as 0.05 or less).
● Unusually low: x successes among n trials is an unusually low number of successes if P(x or fewer) is very small (such as 0.05 or less).

The Role of the Normal Approximation
In reality, almost all practical applications of the binomial probability distribution can now be handled well with software or a TI-83/84 Plus calculator. This section presents methods for dealing with cases in which software cannot be used, but, more importantly, it also illustrates the principle that under appropriate circumstances, the binomial probability distribution can be approximated by a normal distribution. Later chapters will include procedures based on the use of a normal distribution as an approximation to a binomial distribution, so this section forms a foundation for those important procedures.

6-6 BASIC SKILLS AND CONCEPTS
Statistical Literacy and Critical Thinking
1. Distribution of Sample Proportions Consider a study in which we obtain records from the next 50 babies that are born, then compute the proportion of girls in this sample. Assume that this study is repeated many times, and the sample proportions are used to construct a histogram. What is the shape of the histogram?
2. Continuity Correction The Wechsler test is used to measure IQ scores. It is designed so that the mean is 100 and the standard deviation is 16. It is known that IQ scores have a normal distribution. Assume that we want to find the probability that a randomly selected person has an IQ equal to 107. What is the continuity correction, and how would it be applied in finding that probability?
3. Distribution of Sample Proportions The Newport Bottling plant produces bottles of cola that are packaged in six-packs. The probability of a defective bottle is 0.001. Can we approximate the distribution of defects in six-packs as a normal distribution? Why or why not?
4. Interpreting Binomial Probability In a test of a method of gender selection, 80 couples are given a treatment designed to increase the likelihood that a baby will be a girl. There were 47 girls among the 80 babies born. If the gender-selection method has no effect, the probability of getting exactly 47 girls is 0.0264, and the probability of getting 47 or more girls is 0.0728. Which probability should be used to assess the effectiveness of the gender-selection method? Does it appear that the method is effective?

Applying Continuity Correction. In Exercises 5–12, the given values are discrete. Use the continuity correction and describe the region of the normal distribution that corresponds to the indicated probability. For example, the probability of “more than 20 defective items” corresponds to the area of the normal curve described with this answer: “the area to the right of 20.5.”
5. Probability of more than 15 people in prison for removing caution labels from pillows
6. Probability of at least 12 adult males on an elevator in the Empire State Building
7. Probability of fewer than 12 crying babies on an American Airlines Flight
8. Probability that the number of working vending machines in the United States is exactly 27
9. Probability of no more than 4 absent students in a statistics class
10. Probability that the number of incorrect statistical procedures in Excel is between 15 and 20 inclusive
11. Probability that the number of truly honest politicians is between 8 and 10 inclusive
12. Probability that exactly 3 employees were fired for inappropriate use of the Internet
Using Normal Approximation. In Exercises 13–16, do the following: (a) Find the indicated binomial probability by using Table A-1 in Appendix A. (b) If np ≥ 5 and nq ≥ 5, also estimate the indicated probability by using the normal distribution as an approximation to the binomial distribution; if np < 5 or nq < 5, then state that the normal approximation is not suitable.
13. With n = 12 and p = 0.6, find P(7).
14. With n = 14 and p = 0.4, find P(6).
15. With n = 11 and p = 0.5, find P(at least 4).
16. With n = 13 and p = 0.3, find P(fewer than 5).
17. Probability of More Than 36 Girls Assuming that boys and girls are equally likely, estimate the probability of getting more than 36 girls in 64 births. Is it unusual to get more than 36 girls in 64 births?
18. Probability of at Least 42 Girls Assuming that boys and girls are equally likely, estimate the probability of getting at least 42 girls in 64 births. Is it unusual to get at least 42 girls in 64 births?
19. Voters Lying? In a survey of 1002 people, 701 said that they voted in a recent presidential election (based on data from ICR Research Group). Voting records show that 61% of eligible voters actually did vote. Given that 61% of eligible voters actually did vote, find the probability that among 1002 randomly selected eligible voters, at least 701 actually did vote. What does the result suggest?
20. TV Advertising Charges for advertising on a TV show are based on the number of viewers, which is measured by the rating. The rating is a percentage of the population of 110 million TV households. The CBS television show 60 Minutes recently had a rating of 7.8, indicating that 7.8% of the households were tuned to that show. An advertiser conducts an independent survey of 100 households and finds that only 4 were tuned to 60 Minutes. Assuming that the 7.8 rating is correct, find the probability of surveying 100 randomly selected households and getting 4 or fewer tuned to 60 Minutes. Does the

result suggest that the rating of 7.8 is too high? Does the advertiser have grounds for claiming a refund on the basis that the size of the audience was exaggerated?
21. Mendel’s Hybridization Experiment When Mendel conducted his famous hybridization experiments, he used peas with green pods and yellow pods. One experiment involved crossing peas in such a way that 25% (or 145) of the 580 offspring peas were expected to have yellow pods. Instead of getting 145 peas with yellow pods, he obtained 152. Assuming that Mendel’s 25% rate is correct, estimate the probability of getting at least 152 peas with yellow pods among the 580 offspring peas. Is there strong evidence to suggest that Mendel’s rate of 25% is wrong?
22. Cholesterol Reducing Drug The probability of flu symptoms for a person not receiving any treatment is 0.019. In a clinical trial of Lipitor, a common drug used to lower cholesterol, 863 patients were given a treatment of 10-mg atorvastatin tablets, and 19 of those patients experienced flu symptoms (based on data from Pfizer, Inc.). Assuming that these tablets have no effect on flu symptoms, estimate the probability that at least 19 of 863 people experience flu symptoms. What do these results suggest about flu symptoms as an adverse reaction to the drug?
23. Cell Phones and Brain Cancer In a study of 420,095 cell phone users in Denmark, it was found that 135 developed cancer of the brain or nervous system. Assuming that cell phones have no effect, there is a 0.000340 probability of a person developing cancer of the brain or nervous system. We therefore expect about 143 cases of such cancer in a group of 420,095 randomly selected people. Estimate the probability of 135 or fewer cases of such cancer in a group of 420,095 people. What do these results suggest about media reports that cell phones cause cancer of the brain or nervous system?
24. Overbooking Flights Air America is considering a new policy of booking as many as 400 persons on an airplane that can seat only 350. (Past studies have revealed that only 85% of the booked passengers actually arrive for the flight.) Estimate the probability that if Air America books 400 persons, not enough seats will be available. Is that probability low enough to be workable, or should the policy be changed?
25. Identifying Gender Discrimination After being rejected for employment, Kim Kelly learns that the Bellevue Advertising Company has hired only 21 women among its last 62 new employees. She also learns that the pool of applicants is very large, with an equal number of qualified men and women. Help her in her charge of gender discrimination by estimating the probability of getting 21 or fewer women when 62 people are hired, assuming no discrimination based on gender. Does the resulting probability really support such a charge?
26. M&M Candies: Are 20% Orange? According to Mars (the candy company, not the planet), 20% of M&M plain candies are orange. Data Set 13 in Appendix B shows that among 100 M&Ms chosen, 25 are orange. Assuming that the claimed orange M&Ms rate of 20% is correct, estimate the probability of randomly selecting 100 M&Ms and getting 25 or more that are orange. Based on the result, is it unusual to get 25 or more orange M&Ms when 100 are randomly selected?
27. Blood Group Forty-five percent of us have Group O blood, according to data provided by the Greater New York Blood Program. Providence Memorial Hospital is conducting a blood drive because its supply of Group O blood is low, and it needs 177 donors of Group O blood. If 400 volunteers donate blood, estimate the probability that the number with Group O blood is at least 177. Is the pool of 400 volunteers likely to be sufficient?
28. Acceptance Sampling Some companies monitor quality by using a method of acceptance sampling, whereby an entire batch of items is rejected if a random sample of a

6-6 Normal as Approximation to Binomial 301 particular size includes more than some specified number of defects. The Dayton Machine Company buys machine bolts in batches of 5000 and rejects a batch if, when 50 of them are sampled, at least 2 defects are found. Estimate the probability of re- jecting a batch if the supplier is manufacturing the bolts with a defect rate of 10%. Is this monitoring plan likely to identify the unacceptable rate of defects? 29. Cloning Survey A recent Gallup poll consisted of 1012 randomly selected adults who were asked whether “cloning of humans should or should not be allowed.” Results showed that 89% of those surveyed indicated that cloning should not be allowed. A news reporter wants to determine whether these survey results constitute strong evi- dence that the majority (more than 50%) of people are opposed to such cloning. Assum- ing that 50% of all people are opposed, estimate the probability of getting at least 89% opposed in a survey of 1012 randomly selected people. Based on that result, is there strong evidence supporting the claim that the majority is opposed to such cloning? 30. Bias in Jury Selection In Orange County, 12% of those eligible for jury duty are left- handed. Among 250 people selected for jury duty, 25 (or 10%) are lefties. Find the probability of getting at most 25 lefties asssuming that they are chosen with a process designed to yield a 12% rate of lefties. Can we conclude that this process of selecting jurors discriminates against lefties? 31. Detecting Credit Card Fraud The Dynamic Credit company issues credit cards and uses software to detect fraud. After tracking the spending habits of one consumer, it is found that charges over $100 constitute 35.8% of the credit transactions. Among 30 charges made this month, 18 involve totals that exceed $100. Does this constitute an unusual spending pattern that should be verified? Explain. 32. 
Detecting Fraud When working for the Brooklyn District Attorney, investigator Robert Burton analyzed the leading digits of amounts on checks from companies that were suspected of fraud. Among 784 checks, 479 had amounts with leading digits of 5, but checks issued in the normal course of honest transactions were expected to have 7.9% of the checks with amounts having leading digits of 5. Is there strong evidence to indicate that the check amounts are significantly different from amounts that are normally expected? Explain.
6-6 BEYOND THE BASICS
33. Winning at Roulette Marc Taylor plans to place 200 bets of $1 each on the number 7 at roulette. A win pays off with odds of 35:1 and, on any one spin, there is a probability of 1/38 that 7 will be the winning number. Among the 200 bets, what is the minimum number of wins needed for Marc to make a profit? Estimate the probability that Marc will make a profit.
34. Replacement of TVs Replacement times for TV sets are normally distributed with a mean of 8.2 years and a standard deviation of 1.1 years (based on data from “Getting Things Fixed,” Consumer Reports). Estimate the probability that for 250 randomly selected TV sets, at least 15 of them have replacement times greater than 10.0 years.
35. Joltin’ Joe Assume that a baseball player hits .350, so his probability of a hit is 0.350. (Ignore the complications caused by walks.) Also assume that his hitting attempts are independent of each other.
a. Find the probability of at least 1 hit in 4 tries in 1 game.
b. Assuming that this batter gets up to bat 4 times each game, estimate the probability of getting a total of at least 56 hits in 56 games.

c. Assuming that this batter gets up to bat 4 times each game, find the probability of at least 1 hit in each of 56 consecutive games (Joe DiMaggio’s 1941 record).
d. What minimum batting average would be required for the probability in part (c) to be greater than 0.1?
36. Overbooking Flights Vertigo Airlines works only with advance reservations and experiences a 7% rate of no-shows. How many reservations could be accepted for an airliner with a capacity of 250 if there is at least a 0.95 probability that all reservation holders who show will be accommodated?
37. Normal Approximation Required This section included the statement that “In reality, almost all practical applications of the binomial probability distribution can now be handled well with software or a TI-83/84 Plus calculator.” Using a specific software package or a TI-83/84 Plus calculator, identify a case in which the technology fails so that a normal approximation to a binomial distribution is required.

6-7 Assessing Normality
Key Concept The following chapters include some very important statistical methods requiring sample data randomly selected from a population with a normal distribution. This section provides criteria for determining whether the requirement of a normal distribution is satisfied. The criteria involve visual inspection of a histogram to see if it is roughly bell-shaped, identifying any outliers, and constructing a new graph called a normal quantile plot.
Definition A normal quantile plot (or normal probability plot) is a graph of points (x, y) where each x value is from the original set of sample data, and each y value is the corresponding z score that is a quantile value expected from the standard normal distribution. (See Step 3 in the following procedure for details on finding these z scores.)
Procedure for Determining Whether Data Have a Normal Distribution
1. Histogram: Construct a histogram. Reject normality if the histogram departs dramatically from a bell shape.
2. Outliers: Identify outliers. Reject normality if there is more than one outlier present. (Just one outlier could be an error or the result of chance variation, but be careful, because even a single outlier can have a dramatic effect on results.)
3. Normal quantile plot: If the histogram is basically symmetric and there is at most one outlier, construct a normal quantile plot. The following steps describe one relatively simple procedure for constructing a normal quantile plot, but different statistical packages use various other approaches. (STATDISK and the TI-83/84 Plus calculator use the procedure given here.) This procedure is messy enough that we usually use software or a calculator to generate the graph, and the end of this section includes instructions for using STATDISK, Minitab, Excel, and a TI-83/84 Plus calculator.

a. First sort the data by arranging the values in order from lowest to highest.
b. With a sample of size n, each value represents a proportion of 1/n of the sample. Using the known sample size n, identify the areas of 1/(2n), 3/(2n), 5/(2n), 7/(2n), and so on. These are the cumulative areas to the left of the corresponding sample values.
c. Use the standard normal distribution (Table A-2 or software or a calculator) to find the z scores corresponding to the cumulative left areas found in Step (b). (These are the z scores that are expected from a normally distributed sample.)
d. Match the original sorted data values with their corresponding z scores found in Step (c), then plot the points (x, y), where each x is an original sample value and y is the corresponding z score.
e. Examine the normal quantile plot using these criteria: If the points do not lie close to a straight line, or if the points exhibit some systematic pattern that is not a straight-line pattern, then the data appear to come from a population that does not have a normal distribution. If the pattern of the points is reasonably close to a straight line, then the data appear to come from a population that has a normal distribution. (These criteria can be used loosely for small samples, but they should be used more strictly for large samples.)

States Rig Lottery Selections
Many states run a lottery in which players select four digits, such as 1127 (the author’s birthday). If a player pays $1 and selects the winning sequence in the correct order, a prize of $5000 is won. States monitor the number selections and, if one particular sequence is selected too often, players are prohibited from placing any more bets on it. The lottery machines are rigged so that once a popular sequence reaches a certain level of sales, the machine will no longer accept that particular sequence. This prevents states from paying out more than they take in. Critics say that this practice is unfair. According to William Thompson, a gambling expert at the University of Nevada in Las Vegas, “They’re saying that they (the states) want to be in the gambling business, but they don’t want to be gamblers. It just makes a sham out of the whole numbers game.”

Steps 1 and 2 are straightforward, but we illustrate the construction of a normal quantile plot (Step 3) in the following example.

EXAMPLE Heights of Men Data Set 1 in Appendix B includes heights (in inches) of randomly selected men. Let’s consider only the first five heights listed for men: 70.8, 66.2, 71.7, 68.7, 67.6. With only five values, a histogram will not be very helpful in revealing the distribution of the data. Instead, construct a normal quantile plot for these five values and determine whether they appear to come from a population that is normally distributed.
SOLUTION The following steps correspond to those listed in the above procedure for constructing a normal quantile plot.
a. First, sort the data by arranging them in order. We get 66.2, 67.6, 68.7, 70.8, 71.7.
b. With a sample of size n = 5, each value represents a proportion of 1/5 of the sample, so we proceed to identify the cumulative areas to the left of the corresponding sample values. Those cumulative left areas, which are expressed in general as 1/(2n), 3/(2n), 5/(2n), 7/(2n), and so on, become these specific areas for this example with n = 5: 1/10, 3/10, 5/10, 7/10, and 9/10. Those same cumulative left areas expressed in decimal form are 0.1, 0.3, 0.5, 0.7, and 0.9.
c. We now search the body of Table A-2 for the cumulative left areas of 0.1000, 0.3000, 0.5000, 0.7000, and 0.9000. We find these corresponding z scores: −1.28, −0.52, 0, 0.52, and 1.28.
d. We now pair the original sorted heights with their corresponding z scores, and we get these (x, y) coordinates, which are plotted as in the accompanying STATDISK display: (66.2, −1.28), (67.6, −0.52), (68.7, 0), (70.8, 0.52), and (71.7, 1.28).
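Steps (a) through (d) of the solution can also be carried out with software instead of Table A-2. The following is a minimal Python sketch, not the method used by STATDISK or any other package: the function name `normal_quantile_points` is hypothetical, and the standard library's `statistics.NormalDist().inv_cdf` stands in for the table lookup, with the z scores rounded to two decimal places to match the table.

```python
from statistics import NormalDist

def normal_quantile_points(sample):
    """Return (x, z) pairs for a normal quantile plot.

    Uses the cumulative left areas 1/(2n), 3/(2n), 5/(2n), ... described
    in the procedure. The function name is hypothetical (illustration only).
    """
    ordered = sorted(sample)                  # Step (a): sort lowest to highest
    n = len(ordered)
    std_normal = NormalDist()                 # standard normal: mean 0, sd 1
    points = []
    for i, x in enumerate(ordered, start=1):
        area = (2 * i - 1) / (2 * n)          # Step (b): cumulative left area
        z = std_normal.inv_cdf(area)          # Step (c): z score for that area
        points.append((x, round(z, 2)))       # Step (d): pair x with its z score
    return points

heights = [70.8, 66.2, 71.7, 68.7, 67.6]      # the five heights from the example
points = normal_quantile_points(heights)
for x, z in points:
    print(x, z)
```

The printed pairs should agree with the coordinates listed in Step (d). Plotting them is left to whatever graphing tool is at hand.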

[STATDISK display: normal quantile plot of the five heights]

INTERPRETATION We examine the normal quantile plot in the STATDISK display. Because the points appear to lie reasonably close to a straight line and there does not appear to be a systematic pattern that is not a straight-line pattern, we conclude that the sample of five heights appears to come from a normally distributed population.

The following STATDISK display shows the normal quantile plot for the same data from the preceding example, with one change: The largest value of 71.7 is changed to 717, so it becomes an outlier. Note how the change in that one value affects the graph. This normal quantile plot does not result in points that reasonably approximate a straight-line pattern. The following STATDISK display suggests that the values of 66.2, 67.6, 68.7, 70.8, 717 are from a population with a distribution that is not a normal distribution.

[STATDISK display: normal quantile plot of 66.2, 67.6, 68.7, 70.8, 717]

The following example illustrates the use of a histogram and quantile plot with a larger data set. Such larger data sets typically require the use of software.
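As a small illustration of such software, the effect of the single outlier on the five-height sample can be summarized numerically. The sketch below computes the linear correlation between the sorted values and their expected z scores. This summary is an added assumption and is not part of the procedure in this section, which relies on visual inspection; the function name `quantile_plot_correlation` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean

def quantile_plot_correlation(sample):
    """Pearson correlation between sorted values and their expected z scores.

    A value near 1 means the normal quantile plot points lie close to a
    straight line. This numeric summary is an assumption added for
    illustration; it is not one of the criteria stated in the text.
    """
    ordered = sorted(sample)
    n = len(ordered)
    std_normal = NormalDist()
    z = [std_normal.inv_cdf((2 * i - 1) / (2 * n)) for i in range(1, n + 1)]
    x_bar, z_bar = mean(ordered), mean(z)
    sxz = sum((x - x_bar) * (zi - z_bar) for x, zi in zip(ordered, z))
    sxx = sum((x - x_bar) ** 2 for x in ordered)
    szz = sum((zi - z_bar) ** 2 for zi in z)
    return sxz / sqrt(sxx * szz)

r_original = quantile_plot_correlation([66.2, 67.6, 68.7, 70.8, 71.7])
r_outlier = quantile_plot_correlation([66.2, 67.6, 68.7, 70.8, 717])
print(round(r_original, 2), round(r_outlier, 2))
```

The first value should be close to 1 and the second noticeably lower, which matches what the two plots show visually. With only five points, a number like this is a rough guide at best.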

EXAMPLE Heights of Men The preceding two normal quantile plots referred to heights of men, but they involved only five sample values. Consider the following 100 heights of men supplied by a researcher who was instructed to randomly select 100 men and measure each of their heights.
63.3 63.4 63.5 63.6 63.7 63.8 63.9 64.0 64.1 64.2
64.3 64.4 64.5 64.6 64.7 64.8 64.9 65.0 65.1 65.2
65.3 65.4 65.5 65.6 65.7 65.8 65.9 66.0 66.1 66.2
66.3 66.4 66.5 66.6 66.7 66.8 66.9 67.0 67.1 67.2
67.3 67.4 67.5 67.6 67.7 67.8 67.9 68.0 68.1 68.2
68.3 68.4 68.5 68.6 68.7 68.8 68.9 69.0 69.1 69.2
69.3 69.4 69.5 69.6 69.7 69.8 69.9 70.0 70.1 70.2
70.3 70.4 70.5 70.6 70.7 70.8 70.9 71.0 71.1 71.2
71.3 71.4 71.5 71.6 71.7 71.8 71.9 72.0 72.1 72.2
72.3 72.4 72.5 72.6 72.7 72.8 72.9 73.0 73.1 73.2
SOLUTION
Step 1: Construct a histogram. The accompanying STATDISK display shows the histogram of the 100 heights, and that histogram suggests that those heights are not normally distributed.
[STATDISK display: histogram of the 100 heights]
Step 2: Identify outliers. Examining the list of 100 heights, we find that no values appear to be outliers.
Step 3: Construct a normal quantile plot. The accompanying STATDISK display shows the normal quantile plot. Examination of the normal quantile plot reveals a systematic pattern that is not a straight-line pattern, suggesting that the data are not from a normally distributed population.
[STATDISK display: normal quantile plot of the 100 heights]
INTERPRETATION Because the histogram does not appear to be bell-shaped, and because the normal quantile plot reveals a pattern of points that is not a straight-line pattern, we conclude that the heights do not appear to be normally distributed. Some of the statistical procedures in later chapters require that sample data be normally distributed, but that requirement is not satisfied for this data set. We expect that 100 randomly selected heights of men should have a distribution that is approximately normal, so the researcher should be investigated. We could also examine the data more closely. Note that the values, when arranged in order, increase consistently by 0.1. This is further evidence that the heights are not the result of measurements obtained from randomly selected men.

Data Transformations Many data sets have a distribution that is not normal, but we can transform the data so that the modified values have a normal distribution. One common transformation is to replace each value of x with log(x + 1). [Instead of using log(x + 1), we could use the more direct transformation of replacing each value x with log x, but the use of log(x + 1) has some advantages, including the property that if x = 0, then log(x + 1) can be evaluated, whereas log x is undefined.] If the distribution of the log(x + 1) values is a normal distribution, the distribution of the x values is referred to as a lognormal distribution. (See Exercise 22.) In addition to replacing each x value with log(x + 1), there are other transformations, such as replacing each x value with √x, or 1/x, or x². In addition to getting a required normal distribution when the original data values are not normally distributed, such transformations can be used to correct other deficiencies, such as a requirement (found in later chapters) that different data sets have the same variance.

Here are a few final comments about procedures for determining whether data are from a normally distributed population:
● If the requirement of a normal distribution is not too strict, examination of a histogram and consideration of outliers may be all that you need to assess normality.
● Normal quantile plots can be difficult to construct on your own, but they can be generated with a TI-83/84 Plus calculator or suitable software, such as STATDISK, SPSS, SAS, Minitab, and Excel.
● In addition to the procedures discussed in this section, there are other more advanced procedures for assessing normality, such as the chi-square goodness-of-fit test, the Kolmogorov-Smirnov test, and the Lilliefors test.
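The log(x + 1) transformation described under Data Transformations above is easy to apply with software. The following Python sketch is only an illustration under stated assumptions: the text does not say which base "log" uses, so base 10 is assumed here, the function name `log_plus_one` is hypothetical, and the sample values are made up purely to show that x = 0 poses no problem.

```python
import math

def log_plus_one(values):
    """Replace each x with log10(x + 1).

    Base 10 is an assumption; the text writes only "log".
    Unlike log10(x), this is defined when x = 0.
    """
    return [math.log10(x + 1) for x in values]

# Hypothetical values, made up for illustration only (not from any data set).
sample = [0, 9, 99, 999]
transformed = log_plus_one(sample)
print(transformed)

# By contrast, the direct transformation fails at x = 0:
try:
    math.log10(0)
except ValueError as err:
    print("log10(0) is undefined:", err)
```

After transforming real data this way, the transformed values would be assessed with the same histogram, outlier, and normal quantile plot criteria given in this section.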

Using Technology

STATDISK STATDISK can be used to generate a normal quantile plot, and the result is consistent with the procedure described in this section. Enter the data in a column of the Sample Editor window. Next, select Data from the main menu bar at the top, then select Normal Quantile Plot. Proceed to enter the column number for the data, then click Evaluate.

MINITAB Minitab can generate a graph similar to the normal quantile plot described in this section. Minitab’s procedure is somewhat different, but the graph can be interpreted by using the same criteria given in this section. That is, normally distributed data should lie reasonably close to a straight line, and points should not reveal a pattern that is not a straight-line pattern. First enter the values in column C1, then select Stat, Basic Statistics, and Normality Test. Enter C1 for the variable, then click on OK.

EXCEL The Data Desk XL add-in can generate a graph similar to the normal quantile plot described in this section. First enter the sample values in column A, then click on DDXL. (If DDXL does not appear on the Menu bar, install the Data Desk XL add-in.) Select Charts and Plots, then select the function type of Normal Probability Plot. Click on the pencil icon for “Quantitative Variable,” then enter the range of values, such as A1:A36. Click OK.

TI-83/84 PLUS The TI-83/84 Plus calculator can be used to generate a normal quantile plot, and the result is consistent with the procedure described in this section. First enter the sample data in list L1, press 2nd and the Y= key (for STAT PLOT), then press ENTER. Select ON, select the “type” item, which is the last item in the second row of options, and enter L1 for the data list. After making all selections, press ZOOM, then 9.

6-7 BASIC SKILLS AND CONCEPTS
Statistical Literacy and Critical Thinking
1. Normal Quantile Plot What is the purpose of constructing a normal quantile plot?
2. Rejecting Normality Identify two different characteristics of a normal quantile plot, where each characteristic would lead to the conclusion that the data are not from a normally distributed population.
3. Normal Quantile Plot If you randomly select 100 women aged 21–30 and then construct a normal quantile plot of their heights, describe the normal quantile plot that you would expect.
4. Outlier If you have 49 data values randomly selected from a normally distributed population, and there is also a 50th value that is an outlier, will that outlier somehow stand out in the graph of the normal quantile plot, or will it seem to fit in because it is only one value among 50?

Interpreting Normal Quantile Plots. In Exercises 5–8, examine the normal quantile plot and determine whether it depicts sample data from a population with a normal distribution.
5. [STATDISK display: normal quantile plot]
6. [STATDISK display: normal quantile plot]
7. [STATDISK display: normal quantile plot]
8. [STATDISK display: normal quantile plot]
Determining Normality. In Exercises 9–12, refer to the indicated data set and determine whether the requirement of a normal distribution is satisfied. Assume that this requirement is loose in the sense that the population distribution need not be exactly normal, but it must be a distribution that is basically symmetric with only one mode.
9. BMI The measured BMI (body mass index) values of a sample of men, as listed in Data Set 1 in Appendix B.
10. Weights of Pennies The weights of the wheat pennies, as listed in Data Set 14 in Appendix B.
11. Precipitation The precipitation amounts, as listed in Data Set 8 in Appendix B.
12. Temperatures The average daily temperatures, as listed in Data Set 9 in Appendix B.
Using Technology to Generate Normal Quantile Plots. In Exercises 13–16, use the data from the indicated exercise in this section. Use a TI-83/84 Plus calculator or software (such as STATDISK, Minitab, or Excel) capable of generating normal quantile plots (or normal probability plots). Generate the graph, then determine whether the data come from a normally distributed population.
13. Exercise 9
14. Exercise 10
15. Exercise 11
16. Exercise 12
17. Comparing Data Sets Using the heights of women and the cholesterol levels of women, as listed in Data Set 1 in Appendix B, analyze each of the two data sets and determine whether each appears to come from a normally distributed population. Compare the results and give a possible explanation for any notable differences between the two distributions.
18. Comparing Data Sets Using the systolic blood pressure levels and the elbow breadths of women, as listed in Data Set 1 in Appendix B, analyze each of the two data sets and determine whether each appears to come from a normally distributed population. Compare the results and give a possible explanation for any notable differences between the two distributions.

Constructing Normal Quantile Plots. In Exercises 19 and 20, use the given data values and identify the corresponding z scores that are used for a normal quantile plot, then construct the normal quantile plot and determine whether the data appear to be from a population with a normal distribution.
19. Heights of LA Lakers Use this sample of heights (in inches) of the players in the starting lineup for the LA Lakers professional basketball team: 85, 79, 82, 73, 78.
20. Monitoring Lead in Air On the days immediately following the destruction caused by the terrorist attacks on September 11, 2001, lead amounts (in micrograms per cubic meter) in the air were recorded at Building 5 of the World Trade Center site, and these values were obtained: 5.40, 1.10, 0.42, 0.73, 0.48, 1.10.
6-7 BEYOND THE BASICS
21. Using Standard Scores When constructing a normal quantile plot, suppose that instead of finding z scores using the procedure described in this section, each value in a sample is converted to its corresponding standard score using z = (x − x̄)/s. If the (x, z) points are plotted in a graph, can this graph be used to determine whether the sample comes from a normally distributed population? Explain.
22. Lognormal Distribution Test the following phone call times (in seconds) for normality, then replace each x value with log(x + 1) and test the transformed values for normality. What do you conclude?
31.5 75.9 31.8 87.4 54.1 72.2 138.1 47.9 210.6 127.7
160.8 51.9 57.4 130.3 21.3 403.4 75.9 93.7 454.9 55.1

Review
We introduced the concept of probability distributions in Chapter 5, but included only discrete distributions. In this chapter we introduced continuous probability distributions and focused on the most important category: normal distributions. Normal distributions will be used extensively in the following chapters.
In Section 6-2 we observed that normal distributions are approximately bell-shaped when graphed. The total area under the density curve of a normal distribution is 1, so there is a convenient correspondence between areas and probabilities. Specific areas can be found using Table A-2 or a TI-83/84 Plus calculator or software. (We do not use Formula 6-1, the equation that is used to define the normal distribution.)
In Section 6-3 we presented important methods for working with normal distributions, including those that use the standard score z = (x − μ)/σ for solving problems such as these:
● Given that IQ scores are normally distributed with μ = 100 and σ = 15, find the probability of randomly selecting someone with an IQ above 90.
● Given that IQ scores are normally distributed with μ = 100 and σ = 15, find the IQ score separating the bottom 85% from the top 15%.
In Section 6-4 we introduced the concept of a sampling distribution. The sampling distribution of the mean is the probability distribution of sample means, with all samples having the same sample size n. The sampling distribution of the proportion is the probability distribution of sample proportions, with all samples having the same sample size n. In general, the sampling distribution of any statistic is the probability distribution of that statistic.
In Section 6-5 we presented the following important points associated with the central limit theorem:
1. The distribution of sample means will, as the sample size n increases, approach a normal distribution.
2. The mean of the sample means is the population mean μ.
3. The standard deviation of the sample means is σ/√n.
In Section 6-6 we noted that we can sometimes approximate a binomial probability distribution by a normal distribution. If both np ≥ 5 and nq ≥ 5, the binomial random variable x is approximately normally distributed with the mean and standard deviation given as μ = np and σ = √(npq). Because the binomial probability distribution deals with discrete data and the normal distribution deals with continuous data, we apply the continuity correction, which should be used in normal approximations to binomial distributions.
Finally, in Section 6-7 we presented a procedure for determining whether sample data appear to come from a population that has a normal distribution. Some of the statistical methods covered later in this book have a loose requirement of a normally distributed population. In such cases, examination of a histogram and outliers might be all that is needed. In other cases, normal quantile plots might be necessary because of a very strict requirement that the population must have a normal distribution.

Statistical Literacy and Critical Thinking
1. Normal Distribution What is a normal distribution? What is a standard normal distribution?
2. Distribution of Sample Means A process consists of randomly selecting 250 adults, measuring the grip strength (right hand only), then finding the sample mean. Assuming that this process is repeated many times, what important fact do we know about the distribution of the resulting sample means?
3. Simple Random Sample A researcher has collected a large sample of IQ scores from friends and relatives. He claims that because his sample is large and the distribution of his sample scores is very close to the bell shape of a normal distribution, his sample is representative of the population. Is this reasoning correct?
4. Central Limit Theorem What does the central limit theorem tell us?

Review Exercises
1. Weighing Errors A scale is designed so that when items are weighed, the errors in the indicated weights are normally distributed with a mean of 0 g and a standard deviation of 1 g. (If the scale reading is too low, the error is negative. If the scale reading is too high, the error is positive.)
a. If an item is randomly selected and weighed, what is the probability that it has an error between −0.5 g and 0.5 g?
b. If 16 items are randomly selected and weighed, what is the probability that the mean of the errors is between −0.5 g and 0.5 g?
c. What is the 90th percentile for the errors?

2. Boston Beanstalk Club The Boston Beanstalk Club has a minimum height requirement of 74 in. for men. Heights of men are normally distributed with a mean of 69.0 in. and a standard deviation of 2.8 in. (based on data from the National Health Survey).
a. What percentage of men meet the minimum height requirement?
b. If four men are randomly selected, what is the probability that their mean height is at least 74 in.?
c. If the minimum height requirement for men is to be changed so that only the tallest 10% of men are eligible, what is the new height requirement?
3. High Cholesterol Levels The serum cholesterol levels in men aged 18–24 are normally distributed with a mean of 178.1 and a standard deviation of 40.7. Units are in mg/100 mL, and the data are based on the National Health Survey.
a. If one man aged 18–24 is randomly selected, find the probability that his serum cholesterol level is greater than 260, a value considered to be “moderately high.”
b. If one man aged 18–24 is randomly selected, find the probability that his serum cholesterol level is between 170 and 200.
c. If 9 men aged 18–24 are randomly selected, find the probability that their mean serum cholesterol level is between 170 and 200.
d. The Providence Health Maintenance Organization wants to establish a criterion for recommending dietary changes if cholesterol levels are in the top 3%. What is the cutoff for men aged 18–24?
4. Detecting Gender Bias in a Test Question When analyzing one particular question on an IQ test, it is found that among the 20 people who answered incorrectly, 18 are women. Given that the test was given to the same number of men and women, all carefully chosen so that they have the same intellectual ability, find the probability that among 20 wrong answers, at least 18 are made by women. Does the result provide strong evidence that the question is biased in favor of men?
5. Gender Discrimination When several women are not hired at the Telektronics Company, they do some research and find that among the many people who applied, 30% were women. However, the 20 people who were hired consist of only 3 women and 17 men. Find the probability of randomly selecting 20 people from a large pool of applicants (30% of whom are women) and getting 3 or fewer women. Based on the result, does it appear that the company is discriminating based on gender?
6. Testing for Normality Listed below are the lengths of time (in days) that the New York State budget has been late for each of 19 consecutive and recent years. Do those lengths of time appear to come from a population that has a normal distribution? Why or why not?
4 4 10 19 18 48 64 1 4 68 67 103 125 13 125 34 124 45 44

Cumulative Review Exercises
1. Carbohydrates in Food Some standard food items are randomly selected and the carbohydrate contents (in grams) are measured with the results listed below (based on data from the U.S. Department of Agriculture). (The items are 12 oz of regular coffee, one cup of whole milk with 3.3% fat, one egg, one banana, one plain doughnut, one tablespoon of peanut butter, one carrot, and 10 potato chips.)
0 12 1 27 24 3 7 10

a. Find the mean x̄.
b. Find the median.
c. Find the standard deviation s.
d. Find the variance s².
e. Convert the value of 3 g to a z score.
f. Find the actual percentage of these sample values that exceeds 3 g.
g. Assuming a normal distribution, find the percentage of population amounts that exceeds 3 g. Use the sample values of x̄ and s as estimates of μ and σ.
h. What level of measurement (nominal, ordinal, interval, ratio) describes this data set?
i. The listed measurements appear to be rounded to the nearest gram, but are the exact unrounded amounts discrete data or continuous data?
j. Does it make sense to use the sample mean as an estimate of the carbohydrate content of the food items consumed by the average adult American? Why or why not?
2. Left-Handedness According to data from the American Medical Association, 10% of us are left-handed.
a. If three people are randomly selected, find the probability that they are all left-handed.
b. If three people are randomly selected, find the probability that at least one of them is left-handed.
c. Why can’t we solve the problem in part (b) by using the normal approximation to the binomial distribution?
d. If groups of 50 people are randomly selected, what is the mean number of left-handed people in such groups?
e. If groups of 50 people are randomly selected, what is the standard deviation for the numbers of left-handed people in such groups?
f. Would it be unusual to get 8 left-handed people in a randomly selected group of 50 people? Why or why not?

Cooperative Group Activities
1. Out-of-class activity Divide into groups of three or four students. In each group, develop an original procedure to illustrate the central limit theorem. The main objective is to show that when you randomly select samples from a population, the means of those samples tend to be normally distributed, regardless of the nature of the population distribution.
2. In-class activity Divide into groups of three or four students. Using a coin to simulate births, each individual group member should simulate 25 births and record the number of simulated girls. Combine all the results in the group and record n = total number of births and x = number of girls. Given batches of n births, compute the mean and standard deviation for the number of girls. Is the simulated result usual or unusual? Why?
3. In-class activity Divide into groups of three or four students. Select a set of data from Appendix B (excluding Data Sets 1, 8, 9, and 14, which were used in examples or exercises in Section 6-7). Use the methods of Section 6-7 and construct a histogram and normal quantile plot, then determine whether the data set appears to come from a normally distributed population.

Technology Project

One of the best technology projects for this chapter is to verify important properties of the central limit theorem described in Section 6-5. Proceed as follows:

1. Use a calculator or software to simulate 100 rolls of a die. Select a random generator that produces the whole numbers 1, 2, 3, 4, 5, 6, all randomly selected. (See the instructions below.)
2. Find and record the mean of the 100 results.
3. Repeat the first two steps until 50 sample means have been obtained.
4. Enter the 50 sample means, then generate a histogram and descriptive statistics for those sample means.
5. Without actually generating a histogram, what is the approximate shape of the histogram for the 5000 simulated rolls of a die? How does it compare to the histogram found in Step 4?
6. What is the mean of the 50 sample means? How does it compare to the mean of many rolls of a fair die?
7. What is the standard deviation of the 50 sample means? How does it compare to the standard deviation of outcomes when a single die is rolled a large number of times?
8. In your own words, describe how the preceding results demonstrate the central limit theorem.

Minitab: Select the options of Calc, Random Data, then Integer. Proceed to enter 100 for the number of rows, and enter C1–C50 for the columns in which to store the data. (This will allow you to generate all 50 samples in one step.) Also enter a minimum of 1 and a maximum of 6, then click OK. Find the mean of each column by using Stat, Basic Statistics, Display Descriptive Statistics.

Excel: First enter the values of 1, 2, 3, 4, 5, 6 in column A, then enter the expression =1/6 in each of the first six cells in column B. (This defines a probability distribution in columns A and B.) Select Tools from the main menu bar, then select Data Analysis and Random Number Generation. After clicking OK, use the dialog box to enter 50 for the number of variables and 100 for the number of random numbers, then select "discrete" for the type of distribution. Enter A1:B6 for the "Value and Probability Input Range." The data should be displayed. Find the mean of each column by using Tools, Data Analysis, Descriptive Statistics. Enter an input range of A1:AX100, and click on the box labeled Summary Statistics. After clicking the OK button, all of the sample means should be displayed.

STATDISK: Select Data from the main menu bar, then choose the option of Uniform Generator. Enter 100 for the sample size, enter 1 for the minimum, enter 6 for the maximum, and use 0 decimal places. Hint: Use the same window by clicking on the Evaluate button. After each sample has been generated, find the sample mean by using Copy/Paste to copy the data to the Sample Editor window.

TI-83/84 Plus: Press MATH, then select PRB, then use randInt to enter randInt(1, 6, 100), which will generate 100 integers between 1 and 6 inclusive. Enter STO→L1 to store the data in list L1. Now press STAT, select Calc, then select 1-Var Stats and proceed to obtain the descriptive statistics (including the sample mean) for the generated sample.
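Readers without access to the packages listed above could carry out the same simulation in a general-purpose language. The following is a minimal sketch using only the Python standard library. The function names, the seed, and the 0.1-wide histogram bins are illustrative assumptions. The comparison values come from a fair die, which has mean 3.5 and standard deviation sqrt(35/12), about 1.708, so sample means of 100 rolls should have a standard deviation near 1.708/10, about 0.171.

```python
import math
import random
import statistics
from collections import Counter


def sample_means(n_samples=50, n_rolls=100, seed=0):
    """Steps 1-3: roll a die n_rolls times, record the mean, repeat n_samples times."""
    rng = random.Random(seed)  # seed is an assumption, used for repeatability
    means = []
    for _ in range(n_samples):
        rolls = [rng.randint(1, 6) for _ in range(n_rolls)]  # whole numbers 1..6
        means.append(statistics.mean(rolls))
    return means


def text_histogram(values, width=0.1):
    """Step 4: crude text histogram; the bin width is an illustrative choice."""
    bins = Counter(math.floor(v / width) for v in values)
    lines = []
    for b in sorted(bins):
        lines.append(f"{b * width:4.1f} | " + "*" * bins[b])
    return "\n".join(lines)


if __name__ == "__main__":
    means = sample_means()
    print(text_histogram(means))
    # Steps 6 and 7: compare with the values for a single fair die.
    print("mean of sample means:", round(statistics.mean(means), 3), "(fair die: 3.5)")
    print("SD of sample means:  ", round(statistics.stdev(means), 3))
    print("single-die SD / sqrt(100):", round(math.sqrt(35 / 12) / 10, 3))
```

The 5000 individual rolls in Step 5 could be examined the same way by collecting the rolls instead of their means.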

From Data to Decision

Critical Thinking: Designing aircraft seating

In this project we consider the issue of determining the "sitting distance" shown in Figure 6-24(a). We define the sitting distance to be the length between the back of the seat cushion and the seat in front. Determining the sitting distance must take into account human body measurements. Specifically, we must consider the "buttock-to-knee length," as shown in Figure 6-24(b). Determining the sitting distance for an aircraft is extremely important. If the sitting distance is unnecessarily large, rows of seats might be eliminated. It has been estimated that removing a single row of six seats can cost around $8 million over the life of the aircraft. If the sitting distance is too small, passengers might be uncomfortable and might prefer to fly other aircraft, or their safety might be compromised because of their limited mobility.

In determining the sitting distance in our aircraft, we will use previously collected data from measurements of large numbers of people. Results from those measurements are summarized in the given table. We can use the data in the table to determine the required sitting distance, but we must make some hard choices. If we are to accommodate everyone in the population, we will have a sitting distance that is so costly in terms of reduced seating that it might not be economically feasible. One of the hard decisions that we must make is this: What percentage of the population are we willing to exclude? Another necessary decision is this: How much extra room do we want to provide for passenger comfort and safety? Use the available information to determine the sitting distance. Identify the choices and decisions that were made in that determination.

Buttock-to-Knee Length (inches)

            Mean        Standard Deviation    Distribution
Males       23.5 in.    1.1 in.               Normal
Females     22.7 in.    1.0 in.               Normal

Figure 6-24 Sitting Distance and Buttock-to-Knee Length
(a) Sitting distance:
• Distance from the seat back cushion to the seat in front
• Buttock-to-knee length plus any additional distance to provide comfort
(b) Buttock-to-knee length
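One way to explore the two decisions is to compute percentiles of the buttock-to-knee distributions from the table. The sketch below is not a recommended answer. It assumes, purely for illustration, that the design is based on the male distribution (which has the larger mean and standard deviation), that the longest 5% of males are excluded, and that the comfort allowance is supplied by the designer. The function names are hypothetical.

```python
from statistics import NormalDist

# Parameters taken from the table: buttock-to-knee length in inches.
MALES = NormalDist(mu=23.5, sigma=1.1)
FEMALES = NormalDist(mu=22.7, sigma=1.0)


def sitting_distance(excluded_fraction=0.05, comfort_allowance=0.0):
    """Length fitting all males except the longest `excluded_fraction`,
    plus an optional comfort allowance (both values are design assumptions)."""
    return MALES.inv_cdf(1 - excluded_fraction) + comfort_allowance


def fraction_excluded(dist, length):
    """Fraction of a population whose buttock-to-knee length exceeds `length`."""
    return 1 - dist.cdf(length)


if __name__ == "__main__":
    d = sitting_distance(excluded_fraction=0.05)
    print(f"95th percentile for males: {d:.2f} in.")
    print(f"females exceeding that length: {fraction_excluded(FEMALES, d):.4f}")
```

Rerunning with other values of `excluded_fraction` and `comfort_allowance` shows how sensitive the resulting distance is to each choice.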

Internet Project

Exploring the Central Limit Theorem

The central limit theorem is one of the most important results in statistics. It also may be one of the most surprising. Informally, the central limit theorem says that the normal distribution is everywhere. No matter what probability distribution underlies an experiment, there is a corresponding distribution of means that will be approximately normal in shape.

The best way to both understand and appreciate the central limit theorem is to see it in action. The Internet Project for this chapter, found at the Elementary Statistics Web site

http://www.aw.com/triola

will allow you to do just that. You will be asked to view, interpret, and discuss a demonstration of the central limit theorem as part of a dice rolling experiment. In addition, you will be guided in a search through the Internet for other such demonstrations.
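The informal claim can also be checked offline with a strongly skewed population. The sketch below is an illustration only. It draws samples from an exponential distribution with mean 1 and standard deviation 1 and counts how often the sample mean falls within 1.96 standard errors of the population mean. If the sample means are approximately normal, that proportion should be close to 0.95. The sample size of 50, the 2000 repetitions, the seed, and the function name are all assumptions. The informal statement also presumes a population with finite variance.

```python
import math
import random
import statistics


def coverage_of_sample_means(n=50, n_samples=2000, seed=0):
    """Proportion of sample means within 1.96 standard errors of the true mean.

    Population: exponential with rate 1 (mean 1, standard deviation 1),
    chosen here only because it is far from normal in shape.
    """
    rng = random.Random(seed)
    mu, sigma = 1.0, 1.0
    half_width = 1.96 * sigma / math.sqrt(n)
    hits = 0
    for _ in range(n_samples):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if abs(statistics.mean(sample) - mu) <= half_width:
            hits += 1
    return hits / n_samples


if __name__ == "__main__":
    print("proportion within 1.96 standard errors:", coverage_of_sample_means())
```

Trying smaller values of `n` shows how the approximation degrades when samples are small and the population is skewed.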

