
Elementary Statistics 10th Ed.

Published by Junix Kaalim, 2022-09-12 13:26:53

Description: Triola, Mario F.


9-5 Comparing Variation in Two Samples

P-Value Test and Confidence Intervals
We have described the traditional method of using the F test for claims made about two population variances. The P-value method is easy to use with software capable of providing P-values. For the preceding example, STATDISK, Excel, Minitab, and the TI-83/84 Plus calculator all provide a P-value of 0.1081. Exercise 24 deals with the construction of confidence intervals.

Alternative Methods
We have presented the F test for comparing variances, but that test is very sensitive to departures from normality. Here we briefly describe some alternatives that are more robust.

Count Five
The count five method is a relatively simple alternative to the F test, and it does not require normally distributed populations. (See “A Quick, Compact, Two-Sample Dispersion Test: Count Five,” by McGrath and Yeh, American Statistician, Vol. 59, No. 1.) If the two sample sizes are equal, and if one sample has at least five of the largest mean absolute deviations (MAD), then we conclude that its population has a larger variance. See Exercise 21 for the specific procedure.

Levene-Brown-Forsythe Test
The Levene-Brown-Forsythe test (or modified Levene’s test) is another alternative to the F test, and it is much more robust. This test begins with a transformation of each set of sample values. Within the first sample, replace each x value with |x − median|, and do the same for the second sample. Using the transformed values, conduct a t test of equality of means from independent samples, as described in Part 1 of Section 9-3. Because the transformed values are now deviations, the t test for equality of means is actually a test comparing variation in the two samples. See Exercise 22.

In addition to the count five test and the Levene-Brown-Forsythe test, there are other alternatives to the F test, as well as adjustments that improve the performance of the F test.
See “Fixing the F Test for Equal Variances,” by Shoemaker, American Statistician, Vol. 57, No. 2.

Using Technology

STATDISK Select Analysis from the main menu, then select either Hypothesis Testing or Confidence Intervals, then StDev-Two Samples. Enter the required items in the dialog box and click on the Evaluate button.

MINITAB Either obtain the summary statistics for both samples, or enter the individual sample values in two columns. (If using Minitab Release 13 or earlier, you must use the original lists of raw data.) Select Stat, then Basic Statistics, then 2 Variances. A dialog box will appear: Either select the option of “Samples in different columns” and proceed to enter the column names, or select “Summarized data” and proceed to enter the summary statistics (if using Minitab Release 14 or later). Click on the Options button and enter the confidence level. (Enter 0.95 for a hypothesis test with a 0.05 significance level.) Click OK, then click OK in the main dialog box.

EXCEL First enter the data from the first sample in the first column A, then enter the values of the second sample in column B. Select Tools, Data Analysis, and then F-Test Two-Sample for Variances. In the dialog box, enter the range of values for the first sample (such as A1:A36) and the range of values for the second sample. Enter the value of the significance level in the “Alpha” box. Excel will provide the F test statistic, the P-value for the one-tailed case, and the critical F value for the one-tailed case. For a two-tailed test, make two adjustments: Enter α/2 instead of α, and double the P-value given by Excel.

TI-83/84 PLUS Press the STAT key, then select TESTS, then 2-SampFTEST. You can use the summary statistics or you can use the data that are entered as lists.
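The Levene-Brown-Forsythe transformation described in this section can also be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the two small data sets are made up for the example, and the t statistic uses the unpooled form (separate sample variances), which is assumed here to be the method of Part 1 of Section 9-3. The sketch stops at the test statistic; a critical value or P-value still has to come from a t table or statistical software.

```python
from math import sqrt
from statistics import mean, median, variance


def abs_dev_from_median(sample):
    """Replace each x with |x - median|, the transformation used by the test."""
    m = median(sample)
    return [abs(x - m) for x in sample]


def unpooled_t(sample1, sample2):
    """t statistic for two independent samples, without pooling the variances."""
    n1, n2 = len(sample1), len(sample2)
    standard_error = sqrt(variance(sample1) / n1 + variance(sample2) / n2)
    return (mean(sample1) - mean(sample2)) / standard_error


def levene_brown_forsythe_t(sample1, sample2):
    """Apply the t statistic to the transformed values of both samples."""
    return unpooled_t(abs_dev_from_median(sample1), abs_dev_from_median(sample2))


# Hypothetical data, chosen only to show the mechanics.
first = [1, 2, 3, 4, 5]
second = [2, 4, 6, 8, 10]
print(abs_dev_from_median(first))   # transformed values of the first sample
print(levene_brown_forsythe_t(first, second))
```

A negative statistic here means the first sample has the smaller average deviation from its median, which is the sense in which the t test on transformed values compares variation rather than location.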

Chapter 9 Inferences from Two Samples

9-5 BASIC SKILLS AND CONCEPTS

Statistical Literacy and Critical Thinking
1. Weakness of F Test What is a major weakness of the F test described in this section? Name two alternatives that do a better job of overcoming that weakness.
2. Effects of Nonnormal Distributions What is a consequence of using the F test with samples from populations with distributions that are not normal? That is, in what way does the test perform poorly?
3. F Distribution In the context of this section, what is the F distribution?
4. F Distribution Identify two different properties of the F distribution.

Hypothesis Test of Equal Variances. In Exercises 5 and 6, test the given claim. Use a significance level of α = 0.05 and assume that all populations are normally distributed.
5. Claim: The treatment population and the placebo population have different variances.
Treatment group: n = 16, x̄ = 21.33, s = 0.80
Placebo group: n = 41, x̄ = 25.34, s = 0.40
6. Claim: IQ scores of statistics students vary less than IQ scores of other students.
Statistics students: n = 28, x̄ = 118, s = 10.0
Other students: n = 25, x̄ = 112, s = 12.0
7. Interpreting Display from Weights of Regular Coke and Diet Coke This section included an example about a hypothesis test of the claim that weights of regular Coke and regular Pepsi have the same standard deviation. Use a 0.05 significance level to test the claim that regular Coke and diet Coke have weights with different standard deviations. Sample weights are found in Data Set 12 in Appendix B, but see the accompanying displayed results of the F test from the TI-83/84 Plus calculator. If the results were to show that the standard deviations are significantly different, what would be an important factor that might explain the difference?
[TI-83/84 Plus display not reproduced]
8. 
Interpreting Display for Test of Echinacea In a randomized, double-blind, placebo-controlled trial of children, echinacea was tested as a treatment for upper respiratory infections in children. “Days of fever” was one criterion used to measure effects. Among 337 children treated with echinacea, the mean number of days with fever was 0.81, with a standard deviation of 1.50 days. Among 370 children given a placebo, the mean number of days with fever was 0.64 with a standard deviation of 1.16 days (based on data from “Efficacy and Safety of Echinacea in Treating Upper Respiratory Tract Infections in Children,” by Taylor et al., Journal of the American Medical Association, Vol. 290, No. 21). Use a 0.05 significance level to test the claim that children treated with echinacea have a larger standard deviation than those given a placebo. See the accompanying displayed results from Excel.
[Excel display not reproduced]

Table for Exercise 13:
        Treatment Group    Placebo Group
n       n1 = 22            n2 = 22
x̄       x̄1 = 4.20          x̄2 = 1.71
s       s1 = 2.20          s2 = 0.72

9. Weights of Diet Coke and Diet Pepsi This section included an example about a hypothesis test of the claim that weights of regular Coke and regular Pepsi have the same standard deviation. Use a 0.05 significance level to test the claim that Diet Coke and Diet Pepsi have weights with different standard deviations. Sample weights are found in Data Set 12 in Appendix B, but here are the summary statistics:
Diet Coke: n = 36, x̄ = 0.784794 lb, s = 0.004391 lb
Diet Pepsi: n = 36, x̄ = 0.783858 lb, s = 0.004362 lb
10. Bipolar Depression Treatment In clinical experiments involving different groups of independent samples, it is important that the groups be similar in the important ways that affect the experiment. In an experiment designed to test the effectiveness of paroxetine for treating bipolar depression, subjects were measured using the Hamilton depression scale with the results given below (based on data from “Double-Blind, Placebo-Controlled Comparison of Imipramine and Paroxetine in the Treatment of Bipolar Depression,” by Nemeroff et al., American Journal of Psychiatry, Vol. 158, No. 6). Using a 0.05 significance level, test the claim that both populations have the same standard deviation. Based on the results, does it appear that the two populations have different standard deviations?
Placebo group: n = 43, x̄ = 21.57, s = 3.87
Paroxetine treatment group: n = 33, x̄ = 20.38, s = 3.91
11. Hypothesis Test for Magnet Treatment of Pain Researchers conducted a study to determine whether magnets are effective in treating back pain, with results given below (based on data from “Bipolar Permanent Magnets for the Treatment of Chronic Lower Back Pain: A Pilot Study,” by Collacott, Zimmerman, White, and Rindone, Journal of the American Medical Association, Vol. 283, No. 10). 
The values represent measurements of pain using the visual analog scale. Use a 0.05 significance level to test the claim that those given a sham treatment (similar to a placebo) have pain reductions that vary more than the pain reductions for those treated with magnets.
Reduction in pain level after sham treatment: n = 20, x̄ = 0.44, s = 1.4
Reduction in pain level after magnet treatment: n = 20, x̄ = 0.49, s = 0.96
12. Hypothesis Test for Effect of Marijuana Use on College Students In a study of the effects of marijuana use, light and heavy users of marijuana in college were tested for memory recall, with the results given below (based on data from “The Residual Cognitive Effects of Heavy Marijuana Use in College Students” by Pope and Yurgelun-Todd, Journal of the American Medical Association, Vol. 275, No. 7). Use a 0.05 significance level to test the claim that the population of heavy marijuana users has a standard deviation different from that of light users.
Items sorted correctly by light marijuana users: n = 64, x̄ = 53.3, s = 3.6
Items sorted correctly by heavy marijuana users: n = 65, x̄ = 51.3, s = 4.5
13. Effects of Alcohol An experiment was conducted to test the effects of alcohol. The errors were recorded in a test of visual and motor skills for a treatment group of people who drank ethanol and another group given a placebo. The results are shown in the accompanying table (based on data from “Effects of Alcohol Intoxication on Risk Taking, Strategy, and Error Rate in Visuomotor Performance,” by Streufert et al., Journal of Applied Psychology, Vol. 77, No. 4). Use a 0.05 significance level to test the claim that the treatment group has scores that vary more than the scores of the placebo group.

14. Ages of Faculty and Student Cars Students at the author’s college randomly selected 217 student cars and found that they had ages with a mean of 7.89 years and a standard deviation of 3.67 years. They also randomly selected 152 faculty cars and found that they had ages with a mean of 5.99 years and a standard deviation of 3.65 years. Is there sufficient evidence to support the claim that the ages of faculty cars vary less than the ages of student cars?
15. Weights of Quarters Weights of quarters are used by vending machines as one way to detect counterfeit coins. Data Set 14 in Appendix B includes weights of pre-1964 silver quarters and post-1964 quarters. Here are the summary statistics:
pre-1964: n = 40, x̄ = 6.19267 g, s = 0.08700 g
post-1964: n = 40, x̄ = 5.63930 g, s = 0.06194 g
Use a 0.05 significance level to test the claim that the two populations of quarters have the same standard deviation. If the amounts of variation are different, vending machines might need more complicated adjustments. Does it appear that such adjustments are necessary?
16. Weights of Pennies and Quarters Data Set 14 in Appendix B includes weights of post-1983 pennies and post-1964 quarters. Here are the summary statistics:
post-1983 pennies: n = 37, x̄ = 2.49910 g, s = 0.01648 g
post-1964 quarters: n = 40, x̄ = 5.63930 g, s = 0.06194 g
Test the claim that post-1983 pennies and post-1964 quarters have the same amount of variation. Should they have the same amount of variation?
17. Blanking out on Tests Many students have had the unpleasant experience of panicking on a test because the first question was exceptionally difficult. The arrangement of test items was studied for its effect on anxiety. 
Sample values consisting of measures of “debilitating test anxiety” (which most of us call panic or blanking out) are obtained for a group of subjects with test questions arranged from easy to difficult, and another group with test questions arranged from difficult to easy. Here are the summary statistics:
Easy-to-difficult group: n = 25, x̄ = 27.115, s² = 47.020
Difficult-to-easy group: n = 16, x̄ = 31.728, s² = 18.150
(based on data from “Item Arrangement, Cognitive Entry Characteristics, Sex and Test Anxiety as Predictors of Achievement in Examination Performance,” by Klimko, Journal of Experimental Education, Vol. 52, No. 4). Test the claim that the two samples come from populations with the same variance.
18. Effect of Birth Weight on IQ Score When investigating a relationship between birth weight and IQ, researchers found that 258 subjects with extremely low birth weights (less than 1000 g) had Wechsler IQ scores at age 8 with a mean of 95.5 and a variance of 256.0. For 220 subjects with normal birth weights, the mean at age 8 is 104.9 and the variance is 198.8 (based on data from “Neurobehavioral Outcomes of School-age Children Born Extremely Low Birth Weight or Very Preterm in the 1990s,” by Anderson et al., Journal of the American Medical Association, Vol. 289, No. 24). Using a 0.05 significance level, test the claim that babies with extremely low birth weights and babies with normal birth weights have different amounts of variation. (Hint: The conclusion is not clear from Table A-5, so use this upper critical value: F = 1.2928.)
19. Appendix B Data Set: Rainfall on Weekends USA Today and other newspapers reported on a study that supposedly showed that it rains more on weekends. The study referred to areas on the East Coast near the ocean. Data Set 10 in Appendix B lists the rainfall amounts in Boston for one year.
a. 
Assuming that we want to use the methods of this section to test the claim that Wednesday and Sunday rainfall amounts have the same standard deviation, identify the F test statistic, critical value, and conclusion. Use a 0.05 significance level.
b. Consider the prerequisite of normally distributed populations. Instead of constructing histograms or normal quantile plots, simply examine the numbers of days with

no rainfall. Are Wednesday rainfall amounts normally distributed? Are Sunday rainfall amounts normally distributed?
c. What can be concluded from the results of parts (a) and (b)?
20. Appendix B Data Set: Tobacco and Alcohol Use in Animated Children’s Movies Data Set 5 in Appendix B lists times (in seconds) that animated children’s movies show tobacco use and alcohol use.
a. Assuming that we want to use the methods of this section to test the claim that the times of tobacco use and the times of alcohol use have different standard deviations, identify the F test statistic, critical value, and conclusion. Use a 0.05 significance level.
b. Do the data appear to satisfy the requirement of independent populations? Explain.
c. Do the data appear to satisfy the requirement of normally distributed populations? Instead of constructing histograms or normal quantile plots, simply examine the numbers of movies showing no tobacco or alcohol use. Are the times for tobacco use normally distributed? Are the times for alcohol use normally distributed?
d. What can be concluded from the preceding results?

9-5 BEYOND THE BASICS

21. Count Five Test for Comparing Variation in Two Populations See the example in this section and, instead of using the F test, use the following procedure for a “count five” test of equal variation. What do you conclude?
a. For the first sample, find the mean absolute deviation (MAD) of each value. (Recall from Section 3-3 that the MAD of a sample value x is |x − x̄|.) Sort the MAD values. Do the same for the second sample.
b. Let c1 be the count of the number of MAD values in the first sample that are greater than the largest MAD value in the other sample. Also, let c2 be the count of the number of MAD values in the second sample that are greater than the largest MAD in the other sample. (One of these counts will always be zero.)
c. If the sample sizes are equal (n1 = n2), use a critical value of 5. 
If n1 ≠ n2, calculate the critical value shown below.
$$\frac{\log(\alpha/2)}{\log\left(\dfrac{n_1}{n_1 + n_2}\right)}$$
d. If c1 ≥ critical value, then conclude that $\sigma_1^2 > \sigma_2^2$. If c2 ≥ critical value, then conclude that $\sigma_2^2 > \sigma_1^2$. Otherwise, fail to reject the null hypothesis of $\sigma_1^2 = \sigma_2^2$.
22. Levene-Brown-Forsythe Test for Comparing Variation in Two Populations See the example in this section and, instead of using the F test, use the Levene-Brown-Forsythe test described near the end of this section. What do you conclude?
23. Finding Lower Critical F Values In this section, for hypothesis tests that were two-tailed, we need to find only the upper critical value. Let’s denote that value by $F_R$, where the subscript indicates the critical value for the right tail. The lower critical value $F_L$ (for the left tail) can be found as follows: First interchange the degrees of freedom, and then take the reciprocal of the resulting F value found in Table A-5. Find the critical values $F_L$ and $F_R$ for two-tailed hypothesis tests based on the following values.
a. n1 = 10, n2 = 10, α = 0.05

b. n1 = 10, n2 = 7, α = 0.05
c. n1 = 7, n2 = 10, α = 0.05
24. Constructing Confidence Intervals In addition to testing claims involving $\sigma_1^2$ and $\sigma_2^2$, we can also construct confidence interval estimates of the ratio $\sigma_1^2 / \sigma_2^2$ using the following expression:
$$\left(\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_R}\right) < \frac{\sigma_1^2}{\sigma_2^2} < \left(\frac{s_1^2}{s_2^2} \cdot \frac{1}{F_L}\right)$$
Here $F_L$ and $F_R$ are as described in Exercise 23. Refer to the sample data given in Exercise 5 and construct a 95% confidence interval estimate for the ratio of the treatment group variance to the placebo group variance.

Review

We use methods of inferential statistics when we use sample data to form conclusions about populations. Two major activities of inferential statistics are (1) constructing confidence interval estimates of population parameters (such as p, μ, σ), and (2) testing hypotheses or claims made about population parameters. In Chapters 7 and 8 we discussed the estimation of population parameters and methods of testing hypotheses made about population parameters, but Chapters 7 and 8 considered only cases involving a single population. In this chapter we considered two samples drawn from two populations.
● Section 9-2 considered inferences made about two population proportions. Given conditions in which the listed requirements are satisfied, we use the normal distribution for constructing confidence interval estimates of the difference p1 − p2 and for testing claims, such as the claim that p1 = p2.
● Section 9-3 considered inferences made about the means of two independent populations. Section 9-3 included three different methods, but one method is rarely used because it requires that the two population standard deviations be known. Another method involves pooling the two sample standard deviations to develop an estimate of the standard error, but this method is based on the assumption that the two population standard deviations are known to be equal, and that assumption is often risky. 
See Figure 9-3 for help in determining which method to apply. The procedure generally recommended is the t test that does not assume equal population variances.
● Section 9-4 considered inferences made about the mean difference for a population consisting of matched pairs.
● Section 9-5 presented the F test for testing claims about the equality of two population standard deviations or variances. It is important to know that the F test is not robust, meaning that it performs poorly with populations not having normal distributions. Alternatives to the F test were briefly described.

Statistical Literacy and Critical Thinking
1. Which Method? A candidate for political office is concerned about reports of a “gender gap” claiming that he is preferred more by male voters than by women voters. You have been hired to investigate the gender gap. What methods of this chapter would you use?
2. Simple Random Sample You have been hired to compare the mean credit debt of men in your state to the mean credit debt of women in your state. You have been

given samples that were obtained by this process: First, a complete list of all creditors was obtained, then a computer was used to arrange the list in a random order, then a random sample of 200 creditors was selected. Does this selection process satisfy the requirement of being a simple random sample? Explain.
3. Comparing Incomes Using data from the Bureau of Labor Statistics, a researcher obtains the mean income of men and the mean income of women for each of the 50 states. She then conducts a t test of the null hypothesis that men and women in the United States have equal mean incomes. Is this procedure okay? Why or why not?
4. Independent Samples What is the difference between two samples that are independent and two samples that are not independent?

Review Exercises
1. Racial Profiling Racial profiling is the controversial practice of targeting someone for suspicion of criminal behavior on the basis of race, national origin, or ethnicity. The table below includes data from randomly selected drivers stopped by police in a recent year (based on data from the U.S. Department of Justice, Bureau of Justice Statistics).
a. Use a 0.05 significance level to test the claim that the proportion of blacks stopped by police is significantly greater than the proportion of whites.
b. Construct a confidence interval that could be used to test the claim in part (a). Be sure to use the correct level of significance. What do you conclude based on the confidence interval?

Race and Ethnicity                  Black and Non-Hispanic    White and Non-Hispanic
Drivers stopped by police           24                        147
Total number of observed drivers    200                       1400

2. Self-Reported and Measured Heights of Male Statistics Students Eleven male statistics students were given a survey that included a question asking them to report their height in inches. They weren’t told that their height would be measured, but heights were accurately measured after the survey was completed. 
Anonymity was maintained through the use of code numbers instead of names, and the results are shown below. Is there sufficient evidence to support a claim that male statistics students exaggerate their heights?
Reported height: 68 74 66.5 69 68 71 70 70 67 68 70
Measured height: 66.8 73.9 66.1 67.2 67.9 69.4 69.9 68.6 67.9 67.6 68.8
3. Comparing Readability of J. K. Rowling and Leo Tolstoy Listed below are Flesch Reading Ease scores taken from randomly selected pages in J. K. Rowling’s Harry Potter and the Sorcerer’s Stone and Leo Tolstoy’s War and Peace. (Higher Flesch Reading Ease scores indicate writing that is easier to read.) Use a 0.05 significance level to test the claim that Harry Potter and the Sorcerer’s Stone is easier to read than War and Peace. Is the result as expected?
Rowling: 85.3 84.3 79.5 82.5 80.2 84.6 79.2 70.9 78.6 86.2 74.0 83.7
Tolstoy: 69.4 64.2 71.4 71.6 68.5 51.9 72.2 74.4 52.8 58.4 65.4 73.6
4. Variation in J. K. Rowling and Leo Tolstoy Refer to the same data used in Exercise 3 and use a 0.05 significance level to test the claim that pages from Harry Potter and

the Sorcerer’s Stone and War and Peace have Flesch Reading Ease scores with the same standard deviation.
5. Warmer Surgical Patients Recover Better? An article published in USA Today stated that “in a study of 200 colorectal surgery patients, 104 were kept warm with blankets and intravenous fluids; 96 were kept cool. The results show: Only 6 of those warmed developed wound infections vs. 18 who were kept cool.”
a. Use a 0.05 significance level to test the claim of the article’s headline: “Warmer surgical patients recover better.” If these results are verified, should surgical patients be routinely warmed?
b. If a confidence interval is to be used for testing the claim in part (a), what confidence level should be used?
c. Using the confidence level from part (b), construct a confidence interval estimate of the difference between the two population proportions.
d. In general, if a confidence interval estimate of the difference between two population proportions is used to test some claim about the proportions, will the conclusion based on the confidence interval always be the same as the conclusion from a standard hypothesis test?
6. Brain Volume and Psychiatric Disorders A study used x-ray computed tomography (CT) to collect data on brain volumes for a group of patients with obsessive-compulsive disorders and a control group of healthy persons. Sample results (in mL) are given below for total brain volumes (based on data from “Neuroanatomical Abnormalities in Obsessive-Compulsive Disorder Detected with Quantitative X-Ray Computed Tomography,” by Luxenberg et al., American Journal of Psychiatry, Vol. 145, No. 9).
a. Construct a 95% confidence interval for the difference between the mean brain volume of obsessive-compulsive patients and the mean brain volume of healthy persons. Do not assume that the two populations have equal variances.
b. 
Use a 0.05 significance level to test the claim that there is no difference between the mean for obsessive-compulsive patients and the mean for healthy persons. Do not assume that the two populations have equal variances.
c. Based on the results from parts (a) and (b), does it appear that the total brain volume can be used as an indicator of obsessive-compulsive disorders?
Obsessive-compulsive patients: n = 10, x̄ = 1390.03, s = 156.84
Control group: n = 10, x̄ = 1268.41, s = 137.97
7. Variation of Brain Volumes Use the same sample data given in Exercise 6 with a 0.05 significance level to test the claim that the populations of total brain volumes for obsessive-compulsive patients and the control group have different amounts of variation.
8. Historical Data Set In 1908, “Student” (William Gosset) published the article “The Probable Error of a Mean” (Biometrika, Vol. 6, No. 1). He included the data listed below for two different types of straw seed (regular and kiln dried) that were used on adjacent plots of land. The listed values are the yields of straw in cwt per acre.
a. Using a 0.05 significance level, test the claim that there is no difference between the yields from the two types of seed.
b. Construct a 95% confidence interval estimate of the mean difference between the yields from the two types of seed.
c. Does it appear that either type of seed is better?
Regular: 19.25 22.75 23 23 22.5 19.75 24.5 15.5 18 14.25 17
Kiln dried: 25 24 24 28 22.5 19.5 22.25 16 17.25 15.75 17.25

Cumulative Review Exercises
1. Highway Speeds A section of Highway 405 in Los Angeles has a speed limit of 65 mi/h, and recorded speeds are listed below for randomly selected cars traveling on northbound and southbound lanes (based on data from Sigalert.com).
a. Using the speeds for the northbound lanes, find the mean, median, standard deviation, variance, and range.
b. Using all of the speeds combined, test the claim that the mean is greater than the posted speed limit of 65 mi/h.
c. Do the northbound speeds appear to come from a normally distributed population? Explain.
d. Assuming that the speeds are from normally distributed populations, test the claim that the mean speed on the northbound lanes is equal to the mean speed on the southbound lanes. Based on the result from part (c), does it appear that this hypothesis test is likely to be valid?
Highway 405 North: 68 68 72 73 65 74 73 72 68 65 65 73 66 71 68 74 66 71 65 73
Highway 405 South: 59 75 70 56 66 75 68 75 62 72 60 73 61 75 58 74 60 73 58 75
2. Tossing Coins An illusionist claims that she has the ability to toss a coin so that it turns up heads. Listed below are results from a test of her abilities.
a. Consider only the results from the tosses of a quarter. What is the probability of getting nine heads in nine tosses if the outcomes are determined only by chance? What does that result suggest about the claim that a coin can be tossed so that it turns up heads? Explain.
b. Are the results from the quarter independent of the results from the penny, or are the sample data matched pairs? Explain.
c. Using all of the results combined with a 0.01 significance level, test the claim that a coin can be tossed so that heads turns up more often than can be expected by chance.
Quarter: H H H H H H H H H
Penny: H H T H T H H T T
3. 
Cell Phones and Crashes: Analyzing Newspaper Report In an article from the Associated Press, it was reported that researchers “randomly selected 100 New York motorists who had been in an accident and 100 who had not. Of those in accidents, 13.7 percent owned a cellular phone, while just 10.6 percent of the accident-free drivers had a phone in the car.” Analyze these results.

Cooperative Group Activities

1. Out-of-class activity Are estimates influenced by anchoring numbers? Refer to the related Chapter 3 Cooperative Group Activity. In Chapter 3 we noted that, according to author John Rubin, when people must estimate a value, their estimate is often “anchored” to (or influenced by) a preceding number. In that Chapter 3 activity, some subjects were asked to quickly estimate the value of 8 × 7 × 6 × 5 × 4 × 3 × 2 × 1, and others were asked to quickly estimate the value of 1 × 2 × 3 × 4 × 5 × 6 × 7 × 8. In Chapter 3, we could compare the two sets of results by using statistics (such as the mean) and graphs (such as boxplots). The methods of Chapter 9 now allow us to compare the results with a formal hypothesis test. Specifically, collect your own sample data and test the claim that when we begin with larger numbers (as in 8 × 7 × 6), our estimates tend to be larger.

2. In-class activity Divide into groups according to gender, with about 10 or 12 students in each group. Each group member should record his or her pulse rate by counting the number of heartbeats in 1 minute, and the group statistics (n, x̄, s) should be calculated. The groups should test the null hypothesis of no difference between their mean pulse rate and the mean of the pulse rates for the population from which subjects of the same gender were selected for Data Set 1 in Appendix B.

3. Out-of-class activity Randomly select a sample of male students and a sample of female students and ask each selected person whether they support a death penalty for people convicted of murder. Use a formal hypothesis test to determine whether there is a gender gap on this issue. Also, keep a record of the responses according to the gender of the person asking the question. Does the response appear to be influenced by the gender of the interviewer?

4. Out-of-class activity Use a watch to record the waiting times of a sample of McDonald’s customers and the waiting times of a sample of Burger King customers. Use a hypothesis test to determine whether there is a significant difference.

5. Out-of-class activity Construct a short survey of just a few questions, including a question asking the subject to report his or her height. After the subject has completed the survey, measure the subject’s height (without shoes) using an accurate measuring system. Record the gender, reported height, and measured height of each subject. (See Review Exercise 2.) Do male subjects appear to exaggerate their heights? Do female subjects appear to exaggerate their heights? Do the errors for males appear to have the same mean as the errors for females?

6. In-class activity Without using any measuring device, each student should draw a line believed to be 3 in. long. Then use rulers to measure and record the lengths of the lines drawn. Test for a difference between the mean length of lines drawn by males and the mean length of lines drawn by females.

7. In-class activity Use a ruler as a device for measuring reaction time. One person should suspend the ruler by holding it at the top while the subject holds his or her thumb and forefinger at the bottom edge, ready to catch the ruler when it is released. Record the distance that the ruler falls before it is caught. Convert that distance to the time (in seconds) that it took the subject to react and catch the ruler. (If the distance is measured in inches, use $t = \sqrt{d/192}$. If the distance is measured in centimeters, use $t = \sqrt{d/487.68}$.) Test each subject once with the dominant hand and once with the other hand, and record the paired data. Does there appear to be a difference between the mean of the reaction times using the dominant hand and the mean from the other hand? Do males and females appear to have different mean reaction times?

Technology Project

STATDISK, Minitab, Excel, and the TI-83/84 Plus calculator are all capable of generating normally distributed data drawn from a population with a specified mean and standard deviation. Generate two sets of sample data that represent simulated IQ scores, as shown below.

IQ Scores of Treatment Group: Generate 10 sample values from a normally distributed population with mean 100 and standard deviation 15.

IQ Scores of Placebo Group: Generate 12 sample values from a normally distributed population with mean 100 and standard deviation 15.

STATDISK: Select Data, then select Normal Generator.
Minitab: Select Calc, Random Data, Normal.
Excel: Select Tools, Data Analysis, Random Number Generator, and be sure to select Normal for the distribution.
TI-83/84 Plus: Press MATH, select PRB, then select 6:randNorm( and proceed to enter the mean, the standard deviation, and the number of scores (such as 100, 15, 10).

You can see from the way the data are generated that both data sets really come from the same population, so there should be no difference between the two sample means.

a. After generating the two data sets, use a 0.10 significance level to test the claim that the two samples come from populations with the same mean.

b. If this experiment is repeated many times, what is the expected percentage of trials leading to the conclusion that the two population means are different? How does this relate to a type I error?

c. If your generated data should lead to the conclusion that the two population means are different, would this conclusion be correct or incorrect in reality? How do you know?

d. If part (a) is repeated 20 times, what is the probability that none of the hypothesis tests leads to rejection of the null hypothesis?

e. Repeat part (a) 20 times. How often was the null hypothesis of equal means rejected? Is this the result you expected?
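For readers without the listed tools, the project can also be sketched in plain Python. This is an illustration under stated assumptions, not one of the book's procedures: it uses the unequal-variance (Welch) form of the two-sample t test with the Welch-Satterthwaite degrees of freedom, and it obtains the P-value by numerically integrating the Student t density with Simpson's rule. All function names are hypothetical.

```python
import math
import random


def t_two_sided_p(t, df, steps=400):
    """Two-sided P-value for a t statistic via Simpson's rule on the t density."""
    if t == 0:
        return 1.0
    # Log of the normalizing constant of the Student t density.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    b = abs(t)
    h = b / steps                       # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3                # P(0 < T < |t|)
    return max(0.0, 1.0 - 2.0 * area)


def welch_t_test(x, y):
    """Return (t, df, P-value) for H0: equal population means."""
    n1, n2 = len(x), len(y)
    m1, m2 = sum(x) / n1, sum(y) / n2
    v1 = sum((v - m1) ** 2 for v in x) / (n1 - 1)
    v2 = sum((v - m2) ** 2 for v in y) / (n2 - 1)
    a, b = v1 / n1, v2 / n2
    t = (m1 - m2) / math.sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df, t_two_sided_p(t, df)


def rejection_rate(trials, alpha=0.10, seed=1):
    """Fraction of simulated trials in which H0 is rejected (a type I error here)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        treatment = [rng.gauss(100, 15) for _ in range(10)]
        placebo = [rng.gauss(100, 15) for _ in range(12)]
        if welch_t_test(treatment, placebo)[2] < alpha:
            rejections += 1
    return rejections / trials


if __name__ == "__main__":
    print("rejections in 20 trials:", round(20 * rejection_rate(20)))
    print("P(no rejection in 20 trials):", round(0.90 ** 20, 4))
```

Because both samples are drawn from the same population, every rejection is a type I error, so the long-run rejection rate should be close to the 0.10 significance level. If the 20 tests are independent, the probability that none rejects is 0.90^20, or about 0.12.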

From Data to Decision

Critical Thinking: The fear of flying

The lives of many people are affected by a fear that prevents them from flying. Sports announcer John Madden gained notoriety as he crossed the country by rail or motor home, traveling from one football stadium to another. The Marist Institute for Public Opinion conducted a poll of 1014 adults, 48% of whom were men. The results reported in USA Today showed that 12% of the men and 33% of the women fear flying.

Analyzing the Results

1. How many men were surveyed? How many women were surveyed? How many of the surveyed men fear flying? How many of the surveyed women fear flying?

2. Is there sufficient evidence to conclude that there is a significant difference between the percentage of men and the percentage of women who fear flying?

3. Construct a 95% confidence interval estimate of the difference between the percentage of men and the percentage of women who fear flying. Do the confidence interval limits contain 0, and what is the significance of whether they do or do not?

4. Construct a 95% confidence interval for the percentage of men who fear flying.

5. Based on the result from the confidence interval obtained in Exercise 4, complete the following statement, which is typical of the statement that would be reported in a newspaper or magazine: “Based on the Marist Institute for Public Opinion poll, the percentage of men who fear flying is 12% with a margin of error of _____.”

6. Examine the completed statement in Exercise 5. What important piece of information should be included, but is not included?

7. In a separate Gallup poll, 1001 randomly selected adults were asked this question: “If you had to fly on an airplane tomorrow, how would you describe your feelings about flying? Would you be very afraid, somewhat afraid, not very afraid, or not afraid at all?” Here are the responses: very afraid (18%), somewhat afraid (26%), not very afraid (17%), not afraid at all (38%), and no opinion (1%). Are these Gallup poll results consistent with those obtained by the survey conducted by the Marist Institute for Public Opinion? Explain. Can discrepancies be explained by the fact that the Gallup survey was conducted after the terrorist attacks of September 11, 2001, whereas the other survey was conducted before that date?

8. Construct a graph which would make the results understandable to typical newspaper readers.

Internet Project

Comparing Populations

The previous chapter showed you methods for testing hypotheses about a single population. This chapter expanded on those ideas, allowing you to test hypotheses about the relationships between two populations. In a similar fashion, the Internet Project for this chapter differs from that of the previous chapter in that you will need data for two populations or groups to conduct investigations. Go to the Internet Project for this chapter at http://www.aw.com/triola. There you will find several hypothesis-testing problems involving multiple populations. In these problems, you will analyze salary fairness, population demographics, and a traditional superstition. In each case you will formulate the problem as a hypothesis test, collect relevant data, then conduct and summarize the appropriate test.
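As one way to set up the calculations for the fear-of-flying poll, the sketch below recovers the group counts from the reported percentages and then computes a pooled two-proportion z statistic and an unpooled 95% confidence interval for the difference. Rounding the counts to whole people is an assumption (the poll reports only rounded percentages), the normal approximation is assumed, and the function name is hypothetical.

```python
import math


def fear_of_flying_summary(n_total=1014, frac_men=0.48,
                           p_men=0.12, p_women=0.33, z_crit=1.96):
    """Counts, two-proportion z test, and 95% CI for p_men - p_women."""
    # Assumption: counts are the reported percentages rounded to whole people.
    n1 = round(n_total * frac_men)          # men surveyed
    n2 = n_total - n1                       # women surveyed
    x1 = round(p_men * n1)                  # men who fear flying
    x2 = round(p_women * n2)                # women who fear flying

    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)

    # Test statistic uses the pooled proportion under H0: p1 = p2.
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal P-value

    # Confidence interval uses the unpooled standard error.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z_crit * se
    return {
        "n_men": n1, "n_women": n2, "x_men": x1, "x_women": x2,
        "z": z, "p_value": p_value,
        "ci": (p1 - p2 - margin, p1 - p2 + margin),
    }


if __name__ == "__main__":
    for key, value in fear_of_flying_summary().items():
        print(key, value)
```

If the interval for the difference lies entirely on one side of 0, that agrees with rejecting the hypothesis of equal proportions at the corresponding significance level.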

Statistics @ Work

“It would be impossible to conduct archaeological research without at least a working knowledge of basic statistics.”

Mark T. Lycett
Kathleen Morrison

Mark T. Lycett and Kathleen Morrison are both on the faculty of the Department of Anthropology at the University of Chicago. Dr. Lycett’s research deals with issues of economic, social, and political transformation associated with Spanish colonialism in the southwestern United States, and Dr. Morrison’s research in southern India deals with problems of agricultural change, imperialism, and regional economic organization.

How important is the use of statistics in archaeology?

It would be impossible to conduct archaeological research without at least a working knowledge of basic statistics.

What concepts of statistics do you use?

Archaeologists make extensive use of both descriptive and inferential statistics on a daily basis. Exploratory data analysis using a variety of graphical and numerical summaries is increasingly common in modern archaeology. Archaeological problems routinely include studies of association for categorical variables, hypothesis testing for both 2-sample and k-sample data, correlation and regression problems, and a suite of nonparametric approaches.

Please give a specific example illustrating the use of statistics in your work.

We have explored the size distribution of ancient grass pollen grains to investigate changes in agriculture in both the New and Old Worlds during the first centuries of European colonial expansion. Although almost all important crops are grasses with morphologically similar pollen, New World staple crops (corn) have much larger pollen grains than wild grasses, and Old World crops (principally wheat, barley, and rice) are intermediate in size. By studying the size distribution of reference samples of these staple crops as well as fossil grass pollen from archaeological contexts, we have been able to specify the range of crops introduced and grown at colonial period sites in New Mexico and India.

Our data have been used to make inferences about the number and kind of archaeological sites that existed in our study areas; to reconstruct ancient patterns of vegetation, agriculture, and economy; and to study the effects of colonialism and imperialism on local social, economic, and religious practices.

Is your use of probability and statistics increasing, decreasing, or remaining stable?

Both the number and variety of statistical applications in archaeology is increasing, particularly as more sophisticated spatial databases become available through the widespread use of Geographic Information Systems technology.

In terms of statistics, what would you recommend for prospective employees?

When we were college students, we understood that statistics would be a part of our professional lives, but we never imagined the degree to which we would use it on a daily basis. Undergraduates interested in archaeology should begin with an introductory course in probability and statistics. Those with professional or academic goals should consider more advanced undergraduate or graduate level course work in quantitative data analysis.

10 Correlation and Regression

10-1 Overview
10-2 Correlation
10-3 Regression
10-4 Variation and Prediction Intervals
10-5 Multiple Regression
10-6 Modeling

CHAPTER PROBLEM

Can we predict the time of the next eruption of the Old Faithful geyser?

The Old Faithful geyser is the most popular attraction in Yellowstone National Park. It is located near the Old Faithful Inn, which is very possibly Yellowstone’s second most popular attraction. Tourists enjoy the food, drink, lodging, and shopping facilities of the inn, but they want to be sure to see at least one eruption of the famous Old Faithful geyser. Park rangers help tourists by posting the predicted time to the next eruption. How do they make those predictions?

When Old Faithful erupts, these measurements are recorded: duration (in seconds) of the eruption, the time interval (in minutes) between the preceding eruption and the current eruption, the time interval (in minutes) between the current eruption and the following eruption, and the height (in feet) of the eruption. Table 10-1 includes measurements from eight eruptions. (The measurements in Table 10-1 are from eight of the 40 eruptions included in Data Set 11 from Appendix B. Table 10-1 includes a small sample so that calculations will be easier when the data are used in discussing the methods of the following sections.)

Table 10-1 Eruptions of the Old Faithful Geyser

Duration          240  120  178  234  235  269  255  220
Interval Before    98   90   92   98   93  105   81  108
Interval After     92   65   72   94   83   94  101   87
Height            140  110  125  120  140  120  125  150

Once an eruption has occurred, we want to predict the time to the next eruption, which is the time “interval after” the eruption. To see which variables affect the “interval after” times, we might begin by constructing scatterplots such as those generated by Minitab and shown in Figure 10-1. (Scatterplots were first introduced in Section 2-4.) By simply examining the patterns of the points in the three scatterplots, we can make these subjective conclusions:

1. There appears to be a relationship between the time interval after an eruption and the duration of the eruption. [See Figure 10-1(a).]

2. There does not appear to be a relationship between the time interval after an eruption and the height of the eruption. [See Figure 10-1(b).]

3. There does not appear to be a relationship between the time interval after an eruption and the time interval before the eruption. [See Figure 10-1(c).]

Such conclusions based on scatterplots are largely subjective, so this chapter presents tools for addressing issues such as these:

● How can methods of statistics be used to objectively determine whether there is a relationship between two variables, such as the time intervals after eruptions and the durations of eruptions?

● If there is a relationship between two variables, how can it be described? Is there an equation that can be used to predict the time to the next geyser eruption, given the duration of the current eruption?

● If we can predict the time to the next Old Faithful eruption, how accurate is that prediction likely to be?

Figure 10-1 Scatterplots from Old Faithful Eruption Measurements (three Minitab scatterplots, panels (a), (b), and (c))
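As a preview of the objective tools this chapter develops, the sketch below computes the linear correlation coefficient between the "interval after" times and each of the other three variables in Table 10-1, along with a least-squares line for predicting "interval after" from duration. It is a minimal illustration using the standard textbook formulas; the function and variable names are hypothetical, and the formal treatment (including whether a correlation is significant) belongs to the sections that follow.

```python
import math

# Table 10-1: eight Old Faithful eruptions.
duration = [240, 120, 178, 234, 235, 269, 255, 220]         # seconds
interval_before = [98, 90, 92, 98, 93, 105, 81, 108]        # minutes
interval_after = [92, 65, 72, 94, 83, 94, 101, 87]          # minutes
height = [140, 110, 125, 120, 140, 120, 125, 150]           # feet


def pearson_r(x, y):
    """Linear correlation coefficient r for paired data."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt(
        (n * sxx - sx ** 2) * (n * syy - sy ** 2))


def least_squares_line(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    b1 = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    b0 = sy / n - b1 * sx / n
    return b0, b1


if __name__ == "__main__":
    for name, data in [("duration", duration),
                       ("height", height),
                       ("interval before", interval_before)]:
        r = pearson_r(data, interval_after)
        print(f"r(interval after, {name}) = {r:.3f}")
    b0, b1 = least_squares_line(duration, interval_after)
    print(f"predicted interval after = {b0:.1f} + {b1:.3f} * duration")
```

For these eight eruptions, the correlation with duration is much stronger than the correlations with height or with the preceding interval, which is in line with the three subjective conclusions drawn from the scatterplots.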

