Test

The following questions and computer output consider data from a cross-sectional study carried out at Grady Hospital in Atlanta, Georgia, involving 289 adult patients seen in an emergency department whose blood cultures taken within 24 hours of admission were found to have Staph aureus infection (Rezende et al., 2002). Information was obtained on several variables, some of which were considered risk factors for methicillin resistance (MRSA). The outcome variable is MRSA status (1 = yes, 0 = no), and covariates of interest included the following variables: PREVHOSP (1 = previous hospitalization, 0 = no previous hospitalization), AGE (continuous), GENDER (1 = male, 0 = female), and PAMU (1 = antimicrobial drug use in the previous 3 months, 0 = no previous antimicrobial drug use).

The SAS output provided below was obtained for the following logistic model:

  Logit P(X) = α + β1 PREVHOSP + β2 AGE + β3 GENDER + β4 PAMU

Deviance and Pearson Goodness-of-Fit Statistics

  Criterion   Value      DF    Value/DF   Pr > ChiSq
  Deviance    159.2017   181   0.8796     0.8769
  Pearson     167.0810   181   0.9231     0.7630

  Number of unique profiles: 186

Model Fit Statistics

  Criterion   Intercept Only   Intercept and Covariates
  -2 Log L    387.666          279.317

Analysis of Maximum Likelihood Estimates

  Parameter   DF   Estimate   Std Error   Wald Chi-Sq   Pr > ChiSq
  Intercept    1   -5.0583    0.7643      43.8059       <.0001
  PREVHOSP     1    1.4855    0.4032      13.5745       0.0002
  AGE          1    0.0353    0.00920     14.7004       0.0001
  GENDER       1    0.9329    0.3418       7.4513       0.0063
  PAMU         1    1.7819    0.3707      23.1113       <.0001
Partition for the Hosmer and Lemeshow Test

                    mrsa = 1                mrsa = 0
  Group   Total   Observed   Expected   Observed   Expected
    1       29        1        0.99        28        28.01
    2       31        5        1.95        26        29.05
    3       29        2        2.85        27        26.15
    4       29        5        5.73        24        23.27
    5       30       10        9.98        20        20.02
    6       31       12       14.93        19        16.07
    7       29       16       17.23        13        11.77
    8       29       20       19.42         9         9.58
    9       29       22       21.57         7         7.43
   10       23       21       19.36         2         3.64

Hosmer and Lemeshow Goodness-of-Fit Test

  Chi-Square   DF   Pr > ChiSq
  7.7793        8   0.4553

Questions about the above output begin below.

1. Is the data listing used for the above analysis in events-trials (ET) format or in subject-specific format? Explain briefly.
2. How many covariate patterns are there for the model being fitted? Why are there so many?
3. Is the model being fitted a fully parameterized model? Explain briefly.
4. Is the model being fitted a saturated model? Explain briefly.
5. a. Is the deviance value of 159.2017 shown in the above output calculated using the deviance formula
      Dev(β̂) = −2 ln(L̂c / L̂max),
      where L̂c = ML for the current model and L̂max = ML for the saturated model? Explain briefly.
   b. The deviance value of 159.2017 is obtained by comparing log likelihood values from two logistic models, one of which is the (no-interaction) model being fitted. Describe the other logistic model, called, say, Model 2. (Hint: You should answer this question without explicitly stating the independent variables contained in Model 2.)
   c. How can the deviance value of 159.2017 be calculated using the difference between two log likelihood values obtained from the two models described in part b? What are the values of these two log likelihood functions?
   d. Why is the deviance value of 159.2017 not distributed approximately as a chi-square variable under the null hypothesis that the no-interaction model provides adequate fit?

6. a. What can you conclude from the Hosmer–Lemeshow statistic provided in the above output about whether the model has lack of fit to the data? Explain briefly.
   b. What two models are actually being compared by the Hosmer–Lemeshow statistic of 7.7793? Explain briefly.
   c. How can you choose between the two models described in part b?
   d. Does either of the two models described in part c perfectly fit the data? Explain briefly.

7. Consider the information shown in the output under the heading "Partition for the Hosmer and Lemeshow Test."
   a. Briefly describe how the 10 groups shown in the output under "Partition for the Hosmer and Lemeshow Test" are formed.
   b. Why do the 10 groups not all have the same total number of subjects?
   c. For group 5, describe how the expected number of cases (i.e., mrsa = 1) and the expected number of noncases (i.e., mrsa = 0) are computed.
   d. For group 5, compute the two values that are included as two of the terms in the summation formula for the Hosmer–Lemeshow statistic.
   e. How many terms are involved in the summation formula for the Hosmer–Lemeshow statistic?

Additional questions consider SAS output provided below for the following logistic model:

  Logit P(X) = α + β1 PREVHOSP + γ1 AGE + γ2 GENDER + γ3 PAMU + δ1 PRHAGE + δ2 PRHGEN + δ3 PRHPAMU

Deviance and Pearson Goodness-of-Fit Statistics

  Criterion   Value      DF    Value/DF   Pr > ChiSq
  Deviance    157.1050   178   0.8826     0.8683
  Pearson     159.8340   178   0.8979     0.8320

Model Fit Statistics

  Criterion   Intercept Only   Intercept and Covariates
  -2 Log L    387.666          277.221
Partition for the Hosmer and Lemeshow Test

                    mrsa = 1                mrsa = 0
  Group   Total   Observed   Expected   Observed   Expected
    1       29        1        1.50        28        27.50
    2       30        2        2.44        28        27.56
    3       29        4        3.01        25        25.99
    4       29        5        4.76        24        24.24
    5       29       10        7.87        19        21.13
    6       29       11       12.96        18        16.04
    7       31       17       18.27        14        12.73
    8       32       22       21.93        10        10.07
    9       31       24       23.85         7         7.15
   10       20       18       17.40         2         2.60

Hosmer and Lemeshow Goodness-of-Fit Test

  Chi-Square   DF   Pr > ChiSq
  2.3442        8   0.9686

8. Is the model being fitted a fully parameterized model? Explain briefly.
9. Is the model being fitted a saturated model? Explain briefly.
10. a. Is the deviance value of 157.1050 shown in the above output calculated using the deviance formula
       Dev(β̂) = −2 ln(L̂c / L̂max),
       where L̂c = ML for the current model and L̂max = ML for the saturated model? Explain briefly.
    b. Why can you not use this deviance statistic to test whether the interaction model provides adequate fit to the data? Explain briefly.
11. a. What can you conclude from the Hosmer–Lemeshow statistic provided in the above output about whether the interaction model has lack of fit to the data? Explain briefly.
    b. Based on the Hosmer–Lemeshow test results for both the no-interaction and interaction models, can you determine which of these two models is the better model? Explain briefly.
    c. How can you use the deviance values from the output for both the interaction and no-interaction models to carry out an LR test that compares these two models? In your answer, state the null hypothesis being tested and the formula for the LR statistic using deviances, carry out the computation of the LR test, and draw a conclusion about which model is more appropriate.
Answers to Practice Exercises

1. The data listing is in events-trials (ET) format. There are eight lines of data corresponding to the distinct covariate patterns defined by the model; each line contains the number of cases (i.e., events) and the number of subjects (i.e., trials) for each covariate pattern.

2. There are eight covariate patterns:
   Pattern 1: X = (CAT = 0, AGE = 0, ECG = 0)
   Pattern 2: X = (CAT = 0, AGE = 1, ECG = 0)
   Pattern 3: X = (CAT = 0, AGE = 0, ECG = 1)
   Pattern 4: X = (CAT = 0, AGE = 1, ECG = 1)
   Pattern 5: X = (CAT = 1, AGE = 0, ECG = 0)
   Pattern 6: X = (CAT = 1, AGE = 1, ECG = 0)
   Pattern 7: X = (CAT = 1, AGE = 0, ECG = 1)
   Pattern 8: X = (CAT = 1, AGE = 1, ECG = 1)

3. No. The model contains four parameters, whereas there are eight covariate patterns.

4. No. The model does not perfectly predict the case/noncase status of each of the 609 subjects in the data.

5. a. No. The deviance value of 0.9544 is not calculated using the deviance formula Dev(β̂) = −2 ln(L̂c / L̂max). In particular, −2 ln L̂c = 418.181 and −2 ln L̂max = 0, so Dev(β̂) = 418.181.
   b. Model 1: Logit P(X) = α + β CAT + γ1 AGE + γ2 ECG
      Model 2: Logit P(X) = α + β CAT + γ1 AGE + γ2 ECG + γ3 AGE × ECG + δ1 CAT × AGE + δ2 CAT × ECG + δ3 CAT × AGE × ECG
   c. 0.9544 = −2 ln L̂Model 1 − (−2 ln L̂Model 2), where −2 ln L̂Model 1 = 418.1810 and −2 ln L̂Model 2 = 418.1810 − 0.9544 = 417.2266.
   d. H0: δ1 = δ2 = δ3 = 0, i.e., the deviance is used to test whether the coefficients of all the product terms in Model 2 are collectively nonsignificant.
   e. G = no. of covariate patterns = 8 << n = 609.

6. a. The HL test has a P-value of 0.9177, which is highly nonsignificant. Therefore, the HL test indicates that the model does not have lack of fit.
   b. The model contains only eight covariate patterns, so it is not possible to obtain more than eight distinct predicted risk values from the data. The degrees of freedom is 4 because it is calculated as the number of groups (i.e., 6) minus 2.
   c. Models 1 and 2 as stated in the answer to Question 5b.
   d. The deviance of 0.9544 is equivalent to the LR test that compares Model 1 with Model 2. Since this test statistic (df = 3) is highly nonsignificant, we would choose Model 1 over Model 2.
   e. Neither of the two models of part c perfectly fits the data for each subject. However, since Model 2 is fully parameterized, it perfectly predicts the group proportions.

7. Yes, the interaction model is fully parameterized, since the model contains eight parameters and there are eight distinct covariate patterns.

8. No. As with the no-interaction model, the interaction model does not perfectly predict the case/noncase status of each of the 609 subjects in the data.

9. a. No. The deviance value of 0.0000 is not calculated using the deviance formula Dev(β̂) = −2 ln(L̂c / L̂max). In particular, −2 ln L̂c = 417.226 and −2 ln L̂max = 0, so Dev(β̂) = 417.226.
   b. 0.0000 = −2 ln L̂Model 2 − (−2 ln L̂Model 2). The two log likelihood functions are identical, since the deviance statistic is comparing the current model (i.e., Model 2) to the fully parameterized model (i.e., Model 2).
   c. What is actually being tested is whether or not Model 2 is a fully parameterized model.

10. a. The HL statistic of 0.0000 indicates that the interaction model is a fully parameterized model and therefore perfectly predicts the group proportion for each covariate pattern.
    b. The same two models are being compared by the HL statistic of 0.0000, i.e., Model 2.
    c. No and yes. The interaction model does not perfectly predict each subject's response, but it does perfectly predict the group proportion for each covariate pattern.
10  Assessing Discriminatory Performance of a Binary Logistic Model: ROC Curves

Contents
  Introduction 346
  Abbreviated Outline 346
  Objectives 347
  Presentation 348
  Detailed Outline 373
  Practice Exercises 377
  Test 380
  Answers to Practice Exercises 386

D.G. Kleinbaum and M. Klein, Logistic Regression, Statistics for Biology and Health, DOI 10.1007/978-1-4419-1742-3_10, © Springer Science+Business Media, LLC 2010
Introduction

In this chapter, we describe and illustrate methods for assessing the extent to which a fitted binary logistic model can be used to distinguish the observed cases (Y = 1) from the observed noncases (Y = 0).

One approach for assessing such discriminatory performance involves using the fitted model to predict which study subjects will be cases and which will not be cases, and then determining the proportions of observed cases and noncases that are correctly predicted. These proportions are generally referred to as sensitivity and specificity parameters.

Another approach involves plotting a receiver operating characteristic (ROC) curve for the fitted model and computing the area under the curve as a measure of discriminatory performance. The use of ROCs has become popular in recent years because of the availability of computer software to conveniently produce such a curve as well as compute the area under the curve.

Abbreviated Outline

The outline below gives the user a preview of the material to be covered by the presentation. A detailed outline for review purposes follows the presentation.

  I. Overview (pages 348–350)
  II. Assessing discriminatory performance using sensitivity and specificity parameters (pages 350–354)
  III. Receiver operating characteristic (ROC) curves (pages 354–358)
  IV. Computing the area under the ROC: AUC (pages 358–365)
  V. Example from study on screening for knee fracture (pages 365–370)
  VI. Summary (page 371)
Objectives

Upon completing this chapter, the learner should be able to:

1. Given a fitted binary logistic model, describe or illustrate how a cut-point can be used to classify subjects as predicted cases (Y = 1) and predicted noncases (Y = 0).
2. Given a fitted binary logistic model, describe or illustrate how a cut-point can be used to form a misclassification (or diagnostic) table.
3. Define and illustrate what is meant by true positives, false positives, true negatives, and false negatives.
4. Define and illustrate what is meant by sensitivity and specificity.
5. Define and illustrate "perfect discrimination."
6. Describe what happens to the sensitivity and specificity parameters when a cut-point used for discrimination of a fitted logistic model decreases from 1 to 0.
7. Describe what happens to (1 − specificity) when a cut-point used for discrimination decreases from 1 to 0.
8. State one or more uses of an ROC curve.
9. State and/or describe briefly how an ROC curve is constructed.
10. State and/or describe briefly how the area under an ROC curve is calculated.
11. Describe briefly how to interpret a calculated area under an ROC curve in terms of the discriminatory performance of a fitted logistic model.
12. Given a printout of a fitted binary logistic model, evaluate how well the model discriminates cases from noncases.
Presentation

I. Overview

FOCUS: Assessing the discriminatory performance (DP) of a binary logistic model.

This presentation describes how to assess the discriminatory performance (DP) of a binary logistic model. We say that a model provides good DP if the covariates in the model help to predict (i.e., discriminate) which subjects will develop the outcome (Y = 1, the cases) and which will not develop the outcome (Y = 0, the noncases).

EXAMPLE: Blunt knee trauma ⇒ X-ray?
  Predictor variables: ability to flex knee, ability to put weight on knee, patient's age, injury to knee head, injury to patella
  Outcome variable: knee fracture status

For example, we may wish to determine whether or not a subject with blunt knee trauma should be sent for an X-ray based on a physical exam that measures ability to flex the knee, ability to put weight on the knee, injury to the knee head, injury to the patella, and age. The outcome here is whether or not the person has a knee fracture.

Approach 1: Use the fitted model to predict which subjects will be cases and which will be noncases.

One way to measure DP involves using the fitted model to decide how to predict which subjects will be cases or noncases. For example, one may decide that if the predicted probability for subject X (i.e., P̂(X)) is greater than 0.2, we will predict that subject X will be a case, whereas otherwise, a noncase. The value of 0.2 used here is called a cut-point. Note that for a very rare health outcome, a predicted probability of 0.2, or even 0.02, could be considered a high "risk."

Classification/Diagnostic Table

                           True (Observed) Outcome
                            Y = 1         Y = 0
  Predicted     Y = 1     nTP = 70          20
  Outcome       Y = 0        30         nTN = 80
                          n1 = 100      n0 = 100

The observed and predicted outcomes are combined into a classification or diagnostic table, an example of which is shown above. In this table, we focus on two quantities: the number of true cases (i.e., we are assuming that the observed cases are the true cases) that are predicted to be cases (true positives, or TP), and the number of true noncases that are predicted to be noncases (true negatives, or TN). Here nTP = number of true positives and nTN = number of true negatives.
  Se = nTP / n1 = 70/100 = 0.7
  Sp = nTN / n0 = 80/100 = 0.8

The proportion of true positives among all cases is called the sensitivity (Se), and the proportion of true negatives among all noncases is called the specificity (Sp). Ideally, perfect discrimination would occur if both sensitivity and specificity are equal to 1.

Thus, for a given cut-point, the closer both the sensitivity and specificity are to 1, the better the discriminatory performance. For example, at cut-point 0.2, Model 1 with Se = 0.7 and Sp = 0.8 has better DP than Model 2 with Se = 0.6 and Sp = 0.5.

A drawback to measuring discrimination as described above is that the sensitivity and specificity that result from a given cut-point may vary with the cut-point chosen. An alternative approach involves obtaining a summary measure based on a range of cut-points chosen for a given model. Such a measure is available from an ROC curve, which considers Se and Sp for a range of cut-points.

[Example A: plot of sensitivity vs. 1 − specificity, with each plotted point corresponding to a cut-point.]

ROC stands for receiver operating characteristic, which was originally developed in the context of electronic signal detection. When applied to a logistic model, an ROC is a plot of sensitivity (Se) vs. 1 − specificity (1 − Sp) derived from several cut-points for the predicted value.

Note that 1 − Sp gives the proportion of observed noncases that are (falsely) predicted to be cases, i.e., 1 − Sp gives the proportion of false positives (FPs):

  1 − Sp = (falsely predicted cases) / (observed noncases) = nFP / n0

Since we want both Se and Sp close to 1, we would like 1 − Sp to be close to zero, and moreover, we would expect Se to be larger than 1 − Sp, as in the above graph.

[Examples A and B: ROC curves for two different models based on the same data; the curve in Example A encloses a larger area than the curve in Example B.]

ROC curves for two different models based on the same data are shown in Examples A and B. These graphs may be compared according to the following criterion: the larger the area under the curve, the better the discrimination. In our example, the area in Example A is larger than the area in Example B, indicating that the model used in Example A discriminates better than the model in Example B.
Why does the area under the ROC measure discriminatory performance (DP)? We discuss this question and other characteristics of ROC curves in Section III of this chapter.

II. Assessing Discriminatory Performance Using Sensitivity and Specificity Parameters

In the previous section, we illustrated how a cut-point could be used with a fitted logistic model to assign a subject X, based on the predicted value P̂(X), to be a "predicted" case or noncase. Denoting the general cut-point as cp, we typically predict a subject to be a case if P̂(X) exceeds cp vs. a noncase if P̂(X) does not exceed cp:

  If P̂(X) > cp, predict subject X to be a case.
  If P̂(X) ≤ cp, predict subject X to be a noncase.

Given a cut-point cp, the observed and predicted outcomes can then be combined into a classification (diagnostic) table, the general form of which is shown here.

Table 10.1  General Classification/Diagnostic Table

                           True (Observed) Outcome
                           Y = 1 (case)   Y = 0 (noncase)
  Predicted     Y = 1          nTP              nFP
  Outcome       Y = 0          nFN              nTN
                                n1               n0

The cell frequencies within this table give the number of true positives (nTP) and false negatives (nFN) out of the number of true cases (n1), and the number of false positives (nFP) and true negatives (nTN) out of the number of true noncases (n0).

From the classification table, we can compute the sensitivity (Se) and the specificity (Sp):

  Se = Pr(true positive | true case) = nTP / n1
  Sp = Pr(true negative | true noncase) = nTN / n0

Ideally, perfect discrimination (Se = Sp = 1) would occur if both sensitivity and specificity are equal to 1, which would occur if there were no false negatives (nFN = 0) and no false positives (nFP = 0).
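To make the arithmetic concrete, here is a minimal sketch (ours, not from the text, which relies on SAS output) showing how Se, Sp, and 1 − Sp follow from the four cell counts of a classification table. The counts are the hypothetical ones from the overview example at cut-point 0.2.

```python
# Hypothetical counts laid out as in Table 10.1 (overview example, cut-point 0.2)
n_tp, n_fn = 70, 30   # true positives and false negatives among n1 = 100 true cases
n_fp, n_tn = 20, 80   # false positives and true negatives among n0 = 100 true noncases

se = n_tp / (n_tp + n_fn)          # sensitivity = nTP / n1 = 0.7
sp = n_tn / (n_tn + n_fp)          # specificity = nTN / n0 = 0.8
print(se, sp, round(1 - sp, 2))    # 0.7  0.8  0.2  (1 - Sp = false positive proportion)
```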
In our overview, we pointed out that the sensitivity and specificity values that result from a given cut-point may vary with the cut-point chosen. As a simple illustration, suppose the following two extreme cut-points are used: cp = 1 and cp = 0. The corresponding classification tables for each of these cut-points are shown below.

cp = 1: Se = 0, Sp = 1

                         Observed Y
                       Y = 1    Y = 0
  Predicted   Y = 1      0        0
  Y           Y = 0     nFN      nTN
                         n1       n0

If the cut-point is cp = 1, then assuming that P̂(X) = 1 is not attained for any subject, there will be no predicted cases among either the n1 true cases or the n0 true noncases. For this situation, then, the sensitivity is 0 and the specificity is 1.

cp = 0: Se = 1, Sp = 0

                         Observed Y
                       Y = 1    Y = 0
  Predicted   Y = 1     nTP      nFP
  Y           Y = 0      0        0
                         n1       n0

On the other hand, if the cut-point is cp = 0, then assuming that P̂(X) = 0 is not attained for any subject, there will be no predicted noncases among either the n1 true cases or the n0 true noncases. For this situation, then, the sensitivity is 1 and the specificity is 0.

Let us now consider what would happen if cp decreases from 1 to 0. As we will show by example, as cp decreases from 1 to 0, the sensitivity will increase from 0 to 1, whereas the specificity will decrease from 1 to 0.
EXAMPLE

Table 10.2  Classification Tables for Two Models by Varying the Classification Cut-Point (cp)

Each 2 × 2 table gives counts of predicted Y (rows: Y = 1, Y = 0) by observed Y (columns: Y = 1, Y = 0), with 100 true cases and 100 true noncases.

  cp = 1.00   Model 1: Se = 0.00, Sp = 1.00        Model 2: Se = 0.00, Sp = 1.00
                        0    0                               0    0
                      100  100                             100  100

  cp = 0.75   Model 1: Se = 0.10, Sp = 1.00        Model 2: Se = 0.10, Sp = 0.90
                       10    0                              10   10
                       90  100                              90   90

  cp = 0.50   Model 1: Se = 0.60, Sp = 1.00        Model 2: Se = 0.60, Sp = 0.40
                       60    0                              60   60
                       40  100                              40   40

  cp = 0.25   Model 1: Se = 1.00, Sp = 1.00        Model 2: Se = 0.80, Sp = 0.20
                      100    0                              80   80
                        0  100                              20   20

  cp = 0.10   Model 1: Se = 1.00, Sp = 0.40        Model 2: Se = 0.90, Sp = 0.10
                      100   60                              90   90
                        0   40                              10   10

  cp = 0.00   Model 1: Se = 1.00, Sp = 0.00        Model 2: Se = 1.00, Sp = 0.00
                      100  100                             100  100
                        0    0                               0    0

We illustrate above the classification tables and corresponding sensitivity and specificity values obtained from varying the cut-point for two hypothetical logistic regression models. Based on this information, what can you conclude for each model separately as to how the sensitivity changes as the cut-point cp decreases from 1.00 to 0.75 to 0.50 to 0.25 to 0.10 to 0.00? Similarly, what can you conclude for each model as to how the specificity changes as the cut-point decreases from 1.00 to 0.00?

The answers to the above two questions are that for both models, as the cut-point cp decreases from 1.00 to 0.00, the sensitivity increases from 0.00 to 1.00 and the specificity decreases from 1.00 to 0.00. Note that this result will always be the case for any binary logistic model.

Next question: For each model separately, as the cut-point decreases, does the sensitivity increase at a faster rate than the specificity decreases?

The answer to the latter question depends on which model we consider. For Model 1, the answer is yes, since the sensitivity starts to change immediately as the cut-point changes, whereas the specificity remains at 1 until the cut-point reaches 0.10.

For Model 2, however, the answer is no, because the sensitivity increases at the same rate that the specificity decreases. In particular, the sensitivity increases by 0.10 (from 0.00 to 0.10) while the specificity decreases by 0.10 (from 1.00 to 0.90), followed by correspondingly equal changes of 0.50, 0.20, 0.10, and 0.10 as the cut-point decreases to 0.

So, even though the sensitivity increases and the specificity decreases as the cut-point decreases, the specificity may change at a different rate than the sensitivity, depending on the model being considered.
An alternative way to evaluate the discrimination performance exhibited in a classification table is to consider "1 − specificity" (1 − Sp) instead of "specificity," in addition to the sensitivity. The tables below summarize the results of the previous classification tables, and they include 1 − Sp values as additional summary information.

Table 10.3  Summary of Classification Information for Models 1 and 2 (incl. 1 − Specificity)

MODEL 1:
  cp       1.00   0.75   0.50   0.25   0.10   0.00
  Se       0.00   0.10   0.60   1.00   1.00   1.00
  Sp       1.00   1.00   1.00   1.00   0.40   0.00
  1 − Sp   0.00   0.00   0.00   0.00   0.60   1.00

MODEL 2:
  cp       1.00   0.75   0.50   0.25   0.10   0.00
  Se       0.00   0.10   0.60   0.80   0.90   1.00
  Sp       1.00   0.90   0.40   0.20   0.10   0.00
  1 − Sp   0.00   0.10   0.60   0.80   0.90   1.00

For Model 1, when we compare Se to 1 − Sp values as the cut-point decreases, we see that the Se values increase at a faster rate than the values of 1 − Sp. For Model 2, however, we find that both Se and 1 − Sp values increase at exactly the same rate.

Using 1 − Sp instead of Sp is descriptively appealing for the following reason: Se and 1 − Sp both focus on the probability of being either correctly or falsely predicted to be a case.

  Se = proportion of true positives (TP) = nTP / n1, where nTP = correctly predicted cases
  1 − Sp = proportion of false positives (FP) = nFP / n0, where nFP = falsely predicted cases

Among the observed (i.e., true) cases, Se considers the proportion of subjects who are "true positives" (TP), that is, correctly predicted as cases. Among the observed (i.e., true) noncases, 1 − Sp considers the proportion of subjects who are "false positives" (FP), that is, falsely predicted as cases.

One would expect for a model that has good discrimination that the proportion of true cases that are (correctly) predicted as cases (i.e., Se) would be higher than the proportion of true noncases that are (falsely) predicted as cases (i.e., 1 − Sp). Thus, to evaluate discrimination performance, it makes sense to compare Se (involving correctly predicted cases) with 1 − Sp (involving falsely predicted cases).
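The same bookkeeping can be automated over a whole range of cut-points. The sketch below is ours, not the text's (which works from SAS classification tables); the subject-level data are invented purely for illustration. It builds a Table 10.3-style summary of Se, Sp, and 1 − Sp from each subject's observed outcome and fitted probability.

```python
def se_sp(y_obs, p_hat, cp):
    """Sensitivity and specificity for cut-point cp (predict a case when P(X) > cp)."""
    n1 = sum(y_obs)                # number of true cases
    n0 = len(y_obs) - n1           # number of true noncases
    n_tp = sum(1 for y, p in zip(y_obs, p_hat) if y == 1 and p > cp)   # true positives
    n_tn = sum(1 for y, p in zip(y_obs, p_hat) if y == 0 and p <= cp)  # true negatives
    return n_tp / n1, n_tn / n0

# Invented data: 4 true cases and 4 true noncases with hypothetical fitted P(X) values
y_obs = [1, 1, 1, 1, 0, 0, 0, 0]
p_hat = [0.90, 0.70, 0.40, 0.20, 0.60, 0.30, 0.20, 0.10]

for cp in [1.00, 0.75, 0.50, 0.25, 0.10, 0.00]:
    se, sp = se_sp(y_obs, p_hat, cp)
    print(f"cp={cp:.2f}  Se={se:.2f}  Sp={sp:.2f}  1-Sp={1 - sp:.2f}")
```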
Now suppose we randomly select a case and a noncase from the subjects analyzed in each model. Is the case or the noncase more likely to have a higher predicted probability, i.e., is P̂(Xcase) > P̂(Xnoncase)?

Using Table 10.3, we can address this question by "collectively" comparing, for each model, the proportion of true positives (Se) with the corresponding proportion of false positives (1 − Sp) over all cut-points considered.

MODEL 1:
  cp            1.00   0.75   0.50   0.25   0.10   0.00
  Se            0.00   0.10   0.60   1.00   1.00   1.00
  Se > 1−Sp?     No    Yes    Yes    Yes    Yes     No
  1 − Sp        0.00   0.00   0.00   0.00   0.60   1.00
  Good discrimination: Se > 1 − Sp

For Model 1, we find that the proportion of true positives is larger than the proportion of false positives at each cut-point, except when cp = 1.00 or 0.00, at which both proportions are equal. These results suggest that Model 1 provides good discrimination overall, since, overall, Se values are greater than 1 − Sp values.

MODEL 2:
  cp            1.00   0.75   0.50   0.25   0.10   0.00
  Se            0.00   0.10   0.60   0.80   0.90   1.00
  Se > 1−Sp?     No     No     No     No     No     No
  1 − Sp        0.00   0.10   0.60   0.80   0.90   1.00
  Poor discrimination: Se never > 1 − Sp (here, Se = 1 − Sp always)

For Model 2, however, we find that the proportion of true positives is identical to the proportion of false positives at each cut-point. These results suggest that Model 2 does not provide good discrimination, since Se is never greater (although also never less) than 1 − Sp.

Nevertheless, the use of information from Table 10.3 is not the best way to compare predicted probabilities obtained from randomly selecting a case and a noncase from the data. The reason: sensitivity and 1 − specificity values are summary statistics for several subjects based on a specific cut-point; what is needed instead is to compute and compare predicted probabilities for specific pairs of subjects. The use of ROC curves, which we describe in the next section, provides an appropriate way to quantify and compare such predicted probabilities.
III. Receiver Operating Characteristic (ROC) Curves

A receiver operating characteristic (ROC) curve is a plot of sensitivity (Se) by 1 − specificity (1 − Sp) values derived from several classification tables corresponding to different cut-points used to classify subjects into one of two groups, e.g., predicted cases and noncases of a disease. Equivalently, the ROC is a plot of the true positive rate (TPR = Se) by the false positive rate (FPR = 1 − Sp).

[ROC example: plot of Se (= TPR) vs. 1 − Sp (= FPR), with each plotted point denoting a cut-point for classification.]

As described in Wikipedia (a free Web-based encyclopedia), "the ROC was first developed by electrical engineers and radar engineers during World War II for detecting enemy objects in battle fields, also known as the signal detection theory; in this situation, a signal represents the predicted probability that a given object is an enemy weapon." ROC analysis is now widely used in medicine, radiology, psychology and, more recently, in the areas of machine learning and data mining.

When using an ROC derived from a logistic model used to predict a binary outcome, the ROC allows for an overall assessment of how well the model predicts who will have the outcome and who will not have the outcome. Stated another way in the context of epidemiologic research, the ROC provides a measure of how well the fitted model distinguishes true cases (i.e., those observed to have the outcome) from true noncases (i.e., those observed not to have the outcome).

More specifically, an ROC provides an appropriate answer to the question we previously asked when we compared classification tables for two models: if Xtrue case and Xtrue noncase are the covariate values for a randomly chosen case/noncase pair, how often will P̂(Xtrue case) > P̂(Xtrue noncase), i.e., how often will a randomly chosen (true) case have a higher predicted probability of being a case than a randomly chosen true noncase?
Moreover, we will see that the answer to this question can be quantified by obtaining the area under the ROC curve (AUC): the larger the area, the better the discrimination.

[Figure: shaded area under an ROC curve, labeled "Area under ROC (AUC)."]

EXAMPLE

First, we provide the two ROCs derived from the hypothetical Models 1 and 2 that we considered in the previous section. Notice that the ROC for each model is determined by connecting the dots that plot the pairs of Se and 1 − Sp values obtained for the several classification cut-points.

[Model 1 ROC: the curve rises along the vertical axis to Se = 1 at 1 − Sp = 0 (the cut-point giving perfect prediction, Se = Sp = 1) and then runs along Se = 1; AUC = 1.0.]

For Model 1, the area under the ROC is 1.0.

[Model 2 ROC: the plotted points all lie on the 45-degree diagonal line Se = 1 − Sp; AUC = 0.5.]

In contrast, for Model 2, the area under the ROC is 0.5. Since the area under the ROC for Model 1 is twice that for Model 2, we would conclude that Model 1 has better discriminatory performance than Model 2.

So why is Model 1 a better discriminator than Model 2? How can we explain this conceptually?

Our explanation: The AUC measures discrimination, that is, the ability of the model to correctly classify those with and without the disease. We would expect a model that provides good discrimination to have the property that true cases have a higher predicted probability (of being classified as a case) than true noncases. In other words, we would expect the true positive rate (TPR = Se) to be higher than the false positive rate (FPR = 1 − Sp) for all cut-points.
Observing the above ROCs, we see that for Model 1, the TPR (i.e., Se) is consistently higher than its corresponding FPR (i.e., 1 − Sp); this indicates that Model 1 does well in differentiating the true cases from the true noncases, i.e., excellent discrimination. In contrast, for Model 2, the corresponding true positive and false positive rates are always equal, which indicates that Model 2 fails to differentiate true cases from true noncases, i.e., no discrimination.

The two ROCs we have shown actually represent two extremes of what typically results for such plots: Model 1 gives perfect discrimination, whereas Model 2 gives no discrimination.

[Figure: several types of ROC curves plotted on the Se (= TPR) by 1 − Sp (= FPR) axes. Legend: perfect discrimination (Area = 1.0); positive discrimination (0.5 < Area ≤ 1.0); negative discrimination (0.0 ≤ Area < 0.5); no discrimination (Area = 0.5).]

We show in the figure several different types of ROCs that may occur. Typically, as shown by the two dashed curves, the ROC plot will lie above the central diagonal (45°) line that corresponds to Se = 1 − Sp; for such curves, the AUC is at least 0.5.

It is also possible that the ROC may lie completely below the diagonal line, as shown by the dotted curve near the bottom of the figure, in which case the AUC is less than 0.5. This situation indicates negative discrimination, i.e., the model predicts true noncases better (i.e., with higher predicted probability) than it predicts true cases.

An AUC of exactly 0.5 indicates that the model provides no discrimination, i.e., predicting the case/noncase status of a randomly selected subject is equivalent to flipping a fair coin.

Grading guidelines for AUC values: a rough guide for grading the discriminatory performance indicated by the AUC follows the traditional academic point system:

  0.90–1.00 = excellent discrimination (A)
  0.80–0.90 = good discrimination (B)
  0.70–0.80 = fair discrimination (C)
  0.60–0.70 = poor discrimination (D)
  0.50–0.60 = failed discrimination (F)
Note, however, that it is typically unusual to obtain an AUC as high as 0.90. If this occurs, almost all exposed subjects are cases and almost all unexposed subjects are noncases (i.e., there is nearly complete separation of the data points), as in the following 2 × 2 table, for which the odds ratio is undefined:

              E      Not E
  D           n1       0
  Not D        0       n0
              n1       n0

When there is such "complete separation," it is impossible as well as unnecessary to fit a logistic model to the data.

IV. Computing the Area Under the ROC (AUC)

In this section, we return to the previously asked question: suppose we pick a case and a noncase at random from the subjects analyzed using a logistic regression model. Is the case or the noncase more likely to have a higher predicted probability, i.e., is P̂(Xcase) > P̂(Xnoncase)?

To answer this question precisely, we must use the fitted model to compute the proportion of total case/noncase pairs for which the predicted value for the case is at least as large as the predicted value for the noncase:

  pd = (no. of pairs in which P̂(Xcase) ≥ P̂(Xnoncase)) / (total no. of case/noncase pairs)

If this proportion is larger than 0.5, then the answer is that the randomly chosen case will likely have a higher predicted probability than the randomly chosen noncase. Note that this is what we would expect to occur if the model provides at least minimal predictive power to discriminate cases from noncases.

More importantly, the actual value of this proportion tells us much more: this proportion gives the "area under the ROC" (i.e., the AUC), which, as discussed in the previous section, provides an overall measure of the model's ability to discriminate cases from noncases.

EXAMPLE

To illustrate the calculation of this proportion, suppose there are 300 (i.e., n) subjects in the entire study, of which 100 (i.e., n1) are true cases and 200 (i.e., n0) are true noncases.
We then fit a logistic model P(X) to this data set and compute the predicted probability of being a case, i.e., P̂(Xi), for each of the 300 subjects. For this dataset, the total number of possible case/noncase pairs is

  np = n1 × n0 = 100 × 200 = 20,000.

We now let w denote the number of these pairs for which P̂(X) for the case is larger than P̂(X) for the corresponding noncase. Suppose, for example, that w = 11,480, which means that in 57.4% of the 20,000 pairs, the case had a higher predicted probability than its noncase pair.

Now let z denote the number of case/noncase pairs in which the case and noncase had exactly the same predicted probability. Continuing our example, we suppose z = 5,420, so that this result occurred for 27.1% of the 20,000 pairs.

Then, for our example, the proportion of the 20,000 case/noncase pairs for which the case has at least as large a predicted probability as the noncase is

  pd = (w + z) / np = (11,480 + 5,420) / 20,000 = 16,900 / 20,000 = 0.8450.

A modification of this formula (called "c") involves weighting any pair with equal predicted probabilities by 0.5; that is, the numerator becomes "w + 0.5z," so that

  c = (w + 0.5z) / np = (11,480 + 0.5(5,420)) / 20,000 = 14,190 / 20,000 = 0.7095.

It is this latter modified formula that is equivalent to the area under the ROC, i.e., the AUC.

Based on the grading guidelines for the AUC provided in the previous section, the AUC of 0.7095 computed for this hypothetical example would be considered to provide fair discrimination (i.e., grade C).

In our presentation of the above AUC formula, we have not explicitly demonstrated why this formula actually works to provide the area under the ROC curve; an illustrative example follows.
We now illustrate how this numerical formula translates into the geometrical area under the curve. The method we illustrate is often referred to as the trapezoid method, because the area directly under the curve requires the computation and summation of several trapezoidal sub-areas, each defined by two consecutive cut-points.

EXAMPLE

As in our previous example, we consider 100 cases and 200 noncases and the fitted logistic regression model

  P̂(X) = 1 / (1 + exp[−(α̂ + β̂1 X1 + β̂2 X2)]),   with β̂1 > 0 and β̂2 > 0,

involving two binary predictors. A classification table for these data shows four covariate patterns of the predictors, which define exactly four cut-points for classifying a subject as positive or negative in the construction of an ROC curve. A fifth cut-point (c0 = 1) is included, for which nobody tests positive since P̂(X) ≤ 1 always. The ROC curve will be determined from a plot of these five points.

Classification information for the different cut-points (cp):

  X1  X2   P(X) = cp   Cases   Noncases   Cum. cases +   Cum. noncases +   Se%   1 − Sp%
  --  --   c0 = 1        0         0            0                0            0        0
   1   1   c1           10         2           10                2           10        1
   1   0   c2           50        48           60               50           60       25
   0   1   c3           20        50           80              100           80       50
   0   0   c4           20       100          100              200          100      100

(At cut-point c0 = 1, 0 cases (C) and 0 noncases (NC) test positive.)

Note that the cut-points are listed in decreasing order: c1 = P̂(Xc1) > c2 = P̂(Xc2) > ··· > c4 = P̂(Xc4). Also, as the cut-point lowers, both the sensitivity and 1 − specificity increase.

More specifically, at cut-point c1, 10 of 100 cases (10%) test positive and 2 of 200 noncases (1%) test positive. At cut-point c2, 60% of the cases and 25% of the noncases test positive. At cut-point c3, 80% of the cases and 50% of the noncases test positive. At cut-point c4, all 100 cases and 200 noncases test positive, because P̂(X) is equal to the cut-point even for subjects without any risk factor (X1 = 0 and X2 = 0).
[Figure: the resulting ROC curve (AUC = 0.7095), plotted through the points (1 − Sp, Se) = (0%, 0%), (1%, 10%), (25%, 60%), (50%, 80%), and (100%, 100%).]

The resulting ROC curve is shown above; the AUC for this curve is 0.7095. We now show the calculation of this area, first using the AUC formula given earlier and then using the trapezoid approach.

We first apply the AUC formula. In our example, there are 100 cases and 200 noncases, yielding np = 100 × 200 = 20,000 total pairs.

When X1 = 1 and X2 = 1: The 10 cases with X1 = 1 and X2 = 1 have the same predicted probability (tied) as the 2 noncases who also have X1 = 1 and X2 = 1, giving 10 × 2 = 20 tied pairs. But those same 10 cases have a higher predicted probability (concordant) than the other 48 + 50 + 100 = 198 noncases, giving 10 × 198 = 1,980 concordant pairs.

When X1 = 1 and X2 = 0: Similarly, the 50 cases with X1 = 1 and X2 = 0 are discordant with the 2 noncases that have a higher predicted probability (50 × 2 = 100 discordant pairs), tied with the 48 noncases having the same covariate pattern (50 × 48 = 2,400 tied pairs), and concordant with the 50 + 100 = 150 noncases having lower predicted probabilities (50 × 150 = 7,500 concordant pairs).

When X1 = 0 and X2 = 1: The 20 cases with X1 = 0 and X2 = 1 are discordant with the 2 + 48 = 50 noncases having higher predicted probabilities (20 × 50 = 1,000 discordant pairs), tied with the 50 noncases having the same covariate pattern (20 × 50 = 1,000 tied pairs), and concordant with the 100 noncases having lower predicted probabilities (20 × 100 = 2,000 concordant pairs).
When X1 = 0 and X2 = 0: Finally, the 20 cases that have neither risk factor (X1 = 0 and X2 = 0) are discordant with the 2 + 48 + 50 = 100 noncases having higher predicted probabilities (20 × 100 = 2,000 discordant pairs) and tied with the 100 noncases having the same covariate pattern (20 × 100 = 2,000 tied pairs).

We now sum up all the above concordant and tied pairs, respectively, to obtain

  total no. of concordant pairs: w = 1,980 + 7,500 + 2,000 = 11,480
  total no. of tied pairs: z = 20 + 2,400 + 1,000 + 2,000 = 5,420.

We then use w, z, and np to calculate the area under the ROC curve using the AUC formula:

  AUC = (w + 0.5z) / np = (11,480 + 0.5(5,420)) / 20,000 = 14,190 / 20,000 = 0.7095.

Geometrical Approach for Calculating the AUC

To describe how to obtain this result geometrically, we first point out that with 100 cases and 200 noncases, the total number of case/noncase pairs (i.e., 100 × 200) can be geometrically represented by a rectangular area with height 100 (cases) and width 200 (noncases).

[Figure: a 100 × 200 rectangle (cases by noncases) with the scaled-up ROC curve superimposed; the region above the curve corresponds to discordant pairs plus half the ties, and the region below the curve corresponds to concordant pairs plus half the ties.]

A scaled-up version of the ROC curve (from 100% × 100% axes to 100 × 200 axes) is superimposed within this area. The values listed on the Y-axis (i.e., for cases) correspond to the number of cases testing positive at the cut-points used to plot the ROC curve. Similarly, the values listed on the X-axis (i.e., for noncases) correspond to the number of noncases testing positive at these same cut-points.
Within the above rectangle, the concordant pairs are represented by the area under the ROC curve, while the discordant pairs are represented by the area over the ROC curve. The tied pairs are split equally over and under the ROC curve (using the trapezoid rule).

[Figure: the area under the scaled-up ROC curve partitioned into rectangles (C, concordant pairs) and triangles (T, half of the tied pairs); the sub-area values are 10, 480, 500, 500, 1,000, 1,000, 1,200, 2,000, 2,500, and 5,000.]

To compute the actual area within the rectangle under the ROC curve, we can partition this area using sub-areas of rectangles and triangles, as shown in the figure. The areas denoted by C represent concordant pairs; the triangular areas denoted by T represent one-half of the tied pairs.

Using the grids provided for the Y- and X-axes, the actual sub-areas can be calculated as shown in the figure. Note that an area labeled as T is calculated as one-half of the corresponding rectangular area above and below the hypotenuse of a triangle that connects two consecutive cut-points.

The sum of all the sub-areas under the curve is

  10 + 480 + 500 + 1,000 + 1,200 + ··· + 1,000 = 14,190,

whereas the total area in the rectangle of width 200 and height 100 is 200 × 100 = 20,000 (= np). Therefore, the proportion of the total area taken up by the area under the ROC curve is 14,190 / 20,000 = 0.7095, which is the value calculated using the AUC formula.
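The entire worked example can be checked numerically. The sketch below is ours (not part of the text) and starts only from the per-pattern case/noncase counts given in the classification information above; it counts concordant and tied pairs to apply c = (w + 0.5z)/np, and also integrates the ROC curve with the trapezoid rule. Both routes give 0.7095. Only the ordering of the four predicted probabilities matters, so the covariate patterns are listed from highest to lowest P̂(X).

```python
# Per-pattern (cases, noncases), ordered from highest to lowest predicted probability
patterns = [(10, 2), (50, 48), (20, 50), (20, 100)]

n1 = sum(c for c, _ in patterns)      # 100 true cases
n0 = sum(nc for _, nc in patterns)    # 200 true noncases
np_pairs = n1 * n0                    # 20,000 case/noncase pairs

# Pair-counting formula: c = (w + 0.5*z) / np
w = z = 0
for i, (ci, nci) in enumerate(patterns):
    z += ci * nci                                      # tied pairs (same P(X))
    w += ci * sum(ncj for _, ncj in patterns[i + 1:])  # concordant pairs (case ranked higher)
auc_pairs = (w + 0.5 * z) / np_pairs                   # (11,480 + 2,710) / 20,000

# Trapezoid rule over the ROC points (1 - Sp, Se) generated by the five cut-points
pts = [(0.0, 0.0)]
fpr = tpr = 0.0
for ci, nci in patterns:
    tpr += ci / n1
    fpr += nci / n0
    pts.append((fpr, tpr))
auc_trap = sum(0.5 * (y0 + y1) * (x1 - x0)
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

print(auc_pairs, auc_trap)   # both 0.7095
```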
An alternative way to obtain the sub-area values, without having to calculate each sub-area geometrically, is to rewrite the product formula for the total number of case/noncase pairs as

  np = 100 × 200 = (10 + 50 + 20 + 20) × (2 + 48 + 50 + 100).

Each term in the sum on the left side of this product gives the number of cases with the same predicted risk (i.e., P̂(X)) at one of the cut-points used to form the ROC; similarly, each term in the sum on the right side gives the number of noncases with the same P̂(X) at each cut-point:

  X1  X2   P(X) = cp   Cases   Noncases
   1   1   c1            10        2
   1   0   c2            50       48
   0   1   c3            20       50
   0   0   c4            20      100

We then multiply the two partitioned terms in the product formula to obtain 16 different terms:

  (10 + 50 + 20 + 20) × (2 + 48 + 50 + 100)
    = 20t + 480c + 500c + 1,000c
    + 100d + 2,400t + 2,500c + 5,000c
    + 40d + 960d + 1,000t + 2,000c
    + 40d + 960d + 1,000d + 2,000t,

where terms identified with the subscript "t" denote tied pairs, terms with the subscript "c" denote concordant pairs, and terms with the subscript "d" denote discordant pairs. These values are the same as those in the geometrical diagram.

The six values with the subscript "c" are exactly the six concordant areas shown in the geometrical diagram given earlier. The sum of these six values therefore gives the total area under the ROC curve for concordant pairs:

  480c + 500c + 1,000c + 2,500c + 5,000c + 2,000c = 11,480 concordant pairs (= w).

The four values with the subscript "t" are exactly twice the four triangular areas under the ROC curve; their sum gives the total number of tied pairs, which is twice the tied-pair area under the curve:

  20t + 2,400t + 1,000t + 2,000t = 5,420 ties (= z).

The remaining six terms identify portions of the area above the ROC curve corresponding to discordant pairs; these are not used to compute the AUC:

  100d + 40d + 960d + 40d + 960d + 1,000d = 3,100 discordant pairs.

Note that we can rescale the height and width of the rectangle to 100% × 100%, which portrays the dimensions of the rectangular area in (Se × 1 − Sp) percent mode. To do this, the value in each sub-area under the curve needs to be halved.

[Figure: the rescaled (100% × 100%) area, with sub-area values 5, 240, 250, 250, 500, 500, 600, 1,000, 1,250, and 2,500 under the curve.]
The combined area under the rescaled ROC curve is then 5 + 240 + 250 + 500 + 600 + ··· + 500 = 7,095, which represents a proportion of 7,095 / 10,000 = 0.7095 of the total rectangular area, i.e., the AUC once again.

V. Example from Study on Screening for Knee Fracture

A logistic regression model was used in the analysis of a dataset containing information from 348 patients who entered an emergency room (ER) complaining of blunt knee trauma and who subsequently were X-rayed for possible knee fracture (Tigges et al., 1999). The purpose of the analysis was to assess whether a patient's pattern of covariates could be used as a screening test before performing the X-ray. Since 1.3 million people visit North American ER departments annually complaining of blunt knee trauma, the total cost associated with even a relatively inexpensive test such as a knee X-ray (about $200 for each X-ray) may be substantial.

The variables considered in this analysis are listed below. The outcome variable is called FRACTURE, which represents a binary variable for knee fracture status (1 = yes, 0 = no). The five predictor variables are defined as follows:

  FLEX = ability to flex knee (0 = yes, 1 = no)
  WEIGHT = ability to put weight on knee (0 = yes, 1 = no)
  AGECAT = patient's age (0 = age < 55, 1 = age ≥ 55)
  HEAD = injury to knee head (0 = no, 1 = yes)
  PATELLAR = injury to patella (0 = no, 1 = yes)
The logistic model used in the analysis, which includes all five predictor variables, is

  logit P(X) = β0 + β1 FLEX + β2 WEIGHT + β3 AGECAT + β4 HEAD + β5 PATELLAR.

Although some of these predictors could have been evaluated for significance, we report here only on the ability of the 5-variable model to discriminate cases (FRACTURE = 1) from noncases (FRACTURE = 0). We summarize the results of this analysis based on SAS's LOGISTIC procedure, although the analysis could alternatively have been carried out using either STATA or SPSS (see the Computer Appendix for computer code and output).

Fitted Logistic Regression Model:

  Parameter   DF   Estimate   Std Err   Wald ChiSq   Pr > ChiSq
  Intercept    1   -3.4657    0.4118     70.8372      <.0001
  FLEX         1    0.5277    0.3743      1.9877      0.1586
  WEIGHT       1    1.5056    0.4093     13.5320      0.0002
  AGECAT       1    0.5560    0.3994      1.9376      0.1639
  HEAD         1    0.2183    0.3761      0.3367      0.5617
  PATELLAR     1    0.6268    0.3518      3.1746      0.0748

Notice that three of the variables (FLEX, AGECAT, and HEAD) in the model have nonsignificant Wald tests, indicating that backward elimination would result in removal of one or more of these variables; e.g., HEAD would be eliminated first, since it has the highest Wald P-value (0.5617). Nevertheless, we focus on the full model for now, assuming that we wish to use all five predictors to carry out the screening.

We now show the classification table that uses the patients' predicted outcome probabilities obtained from the fitted logistic model to screen each patient. The probability levels (first column) are prespecified cut-points (in increments of 0.05) requested in the model statement. For example, in the third row, the cut-point is 0.100. If this cut-point is used for screening, then any patient whose predicted probability is greater than 0.100 will test positive for knee fracture on the screening test and therefore will receive an X-ray.

Classification Table

                  Correct              Incorrect              Percentages
  Prob Level   Event   Nonevent     Event   Nonevent     Correct    Se      Sp     1 − Sp†
    0.000        45        0          303       0          12.9   100.0     0.0   100.0
    0.050        39       93          210       6          37.9    86.7    30.7    69.3
    0.100        36      184          119       9          63.2    80.0    60.7    39.3
    0.150        31      200          103      14          66.4    68.9    66.0    34.0
    0.200        22      235           68      23          73.9    48.9    77.6    22.4
    0.250        16      266           37      29          81.0    35.6    87.8    12.2
    0.300         6      271           32      39          79.6    13.3    89.4    10.6
    0.350         3      297            6      42          86.2     6.7    98.0     2.0
    0.400         3      301            2      42          87.4     6.7    99.3     0.7
    0.450         2      301            2      43          87.1     4.4    99.3     0.7
    0.500         0      303            0      45          87.1     0.0   100.0     0.0
  (The rows for probability levels 0.550 through 1.000 are identical to the 0.500 row.)

  † 1 − Sp is not automatically output in SAS's LOGISTIC procedure.
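For readers working outside SAS, the following hedged sketch shows how this kind of fitted model and classification table could be produced in Python with pandas and statsmodels. The knee-fracture data themselves are not reproduced in this text, so the sketch simulates a stand-in dataset of 348 records; the variable names match those defined above, but the simulated data (and hence any printed numbers) are not the study data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated stand-in for the 348 ER records (NOT the actual study data)
rng = np.random.default_rng(0)
df = pd.DataFrame({v: rng.integers(0, 2, 348)
                   for v in ["FLEX", "WEIGHT", "AGECAT", "HEAD", "PATELLAR"]})
lin = (-3.5 + 0.5 * df.FLEX + 1.5 * df.WEIGHT + 0.6 * df.AGECAT
       + 0.2 * df.HEAD + 0.6 * df.PATELLAR)          # loosely based on the fitted values above
df["FRACTURE"] = rng.binomial(1, 1 / (1 + np.exp(-lin)))

# Fit the 5-predictor logistic model and get each patient's predicted probability
fit = smf.logit("FRACTURE ~ FLEX + WEIGHT + AGECAT + HEAD + PATELLAR", data=df).fit(disp=0)
p_hat = fit.predict(df)
y = df["FRACTURE"].to_numpy()

# Classification table in the same 0.05 increments as the SAS output
for cp in np.arange(0.0, 1.0001, 0.05):
    pos = (p_hat > cp).to_numpy()                    # screen positive -> send for X-ray
    se = pos[y == 1].mean()                          # sensitivity among simulated fractures
    sp = (~pos)[y == 0].mean()                       # specificity among simulated nonfractures
    print(f"cp={cp:.2f}  Se={se:.3f}  Sp={sp:.3f}  1-Sp={1 - sp:.3f}")
```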
Notice that at cut-point 0.100, 36 of the 45 true events were correctly classified as events, and 9 of the 45 were incorrectly classified as nonevents; also, 184 of the 303 true nonevents were correctly classified as nonevents, and 119 of the 303 were incorrectly classified as events. The sensitivity (Se) for this row is 36/45, or 80%, and the specificity (Sp) is 184/303, or 60.7%, so that 1 − Sp is 39.3%. Thus, in this row (cut-point 0.100), Se = 0.80 is larger than 1 − Sp = 0.393, which indicates good discrimination (for this cut-point).

We can also see from this table that Se is at least as large as 1 − Sp for all cut-points. Notice, further, that once the cut-point reaches 0.5 (and higher), none of the 45 true cases is correctly classified as a case (Se = 0), whereas all 303 true noncases are correctly classified as noncases (Sp = 1 and 1 − Sp = 0).

Additional output obtained from SAS's LOGISTIC procedure is shown below. This output contains information and statistical measures related to the ROC curve for the fitted model.

Association of Predicted Probabilities and Observed Responses

  Percent Concordant   71.8      Somers' D   0.489
  Percent Discordant   22.9      Gamma       0.517
  Percent Tied          5.3      Tau-a       0.111
  Pairs              13,635      c           0.745

The "c" statistic of 0.745 in this output gives the area under the ROC curve, i.e., the AUC, that we described earlier. Somers' D, Gamma, and Tau-a are other measures of discrimination computed for the fitted model. Each of these measures involves a different way to compute a correlation between ranked (i.e., ordered) observed outcomes (Yi = 0 or 1) and ranked predicted probabilities (P̂(Xi)). A high correlation indicates that higher predicted probabilities obtained from fitting the model correspond to true cases (Yi = 1), whereas lower predicted probabilities correspond to true noncases (Yi = 0), hence good discrimination.

The formulae for each measure are derived from the quantities on the left side of the above output, where w, z, and np are as defined in the previous section for the AUC formula (i.e., c):

  Percent Concordant = 100 w / np
  Percent Discordant = 100 d / np
  Percent Tied = 100 z / np
  Pairs: np = n1 × n0,
where
w = no. of case/noncase pairs for which P̂(X_case) > P̂(X_noncase)
d = no. of case/noncase pairs for which P̂(X_noncase) > P̂(X_case)
z = no. of case/noncase pairs for which P̂(X_case) = P̂(X_noncase)
Note that w, z, and np were defined in the previous section for the formula for the AUC (i.e., c).

Formulae for the discrimination measures:

Using the notation just described, the formulae for these measures are as follows, with the first of them (for the AUC) provided in the previous section:

c = (w + 0.5z) / np  (= AUC)
Somers' D = (w - d) / np
Gamma = (w - d) / (w + d)
Tau-a = (w - d) / [0.5 n(n - 1)], where n is the total number of subjects

EXAMPLE

The calculation of the AUC for the fitted model is

c = (w + 0.5z) / np = [13,635(0.718) + 0.5(13,635)(0.053)] / 13,635 = 0.745.

The value for w in this formula is 13,635(0.718), or 9,789.93, and the value for z is 13,635(0.053), or 722.655. Based on the AUC result of 0.745 for these data, there is evidence of fair (grade C) discrimination using the fitted model.

A plot of the ROC curve for these data can also be obtained and is shown below. Notice that the points on the plot, which represent the coordinates of Se by 1 - Sp at the different cut-points, have not been connected by the program. Nevertheless, it is possible to fit a cubic regression to the plotted points of sensitivity by 1 - specificity (not shown, but see the Computer Appendix).

[ROC plot for the full model: sensitivity plotted against 1 - specificity; AUC = 0.745]
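To make these formulae concrete, here is a small Python sketch that computes w, d, and z by brute force over all case/noncase pairs and then forms c, Somers' D, Gamma, and Tau-a. The array names y and p_hat are assumptions (numpy arrays of observed outcomes and fitted probabilities), and the Tau-a denominator uses 0.5 n(n - 1) with n the total number of subjects, as in the formula above; this is an illustrative sketch, not the SAS implementation.

```python
import numpy as np

def rank_correlation_measures(y, p_hat):
    """Brute-force computation of w, d, z over all case/noncase pairs, then
    c (= AUC), Somers' D, Gamma, and Tau-a as defined above."""
    p_case = p_hat[y == 1]
    p_noncase = p_hat[y == 0]
    w = d = z = 0
    for pc in p_case:                      # pair every case with every noncase
        w += np.sum(pc > p_noncase)        # concordant pairs
        d += np.sum(pc < p_noncase)        # discordant pairs
        z += np.sum(pc == p_noncase)       # tied pairs
    n_pairs = len(p_case) * len(p_noncase) # np = n1 * n0
    n = len(y)                             # total number of subjects
    c = (w + 0.5 * z) / n_pairs
    somers_d = (w - d) / n_pairs
    gamma = (w - d) / (w + d)
    tau_a = (w - d) / (0.5 * n * (n - 1))
    return c, somers_d, gamma, tau_a
```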
EXAMPLE (continued)

Recall that the previously shown output gave Wald statistics for HEAD, AGECAT, and FLEX that were nonsignificant. A backward elimination approach that begins by dropping the least significant of these variables (i.e., HEAD), refitting the model, and then dropping additional nonsignificant variables results in a model that contains only two predictor variables, WEIGHT and PATELLAR. (Note that we are treating all predictor variables as exposure variables.)

Backward elimination:
Step 1: Drop HEAD (highest P-value, 0.5617)
Step 2: Drop AGECAT (highest P-value, 0.2219)
Step 3: Drop FLEX (highest P-value, 0.1207)
Step 4: Keep WEIGHT and PATELLAR (highest P-value, 0.0563)

The fitted logistic model that involves only WEIGHT and PATELLAR is shown below, together with the discrimination measures that result for this model, including the c (= AUC) statistic.

Reduced Model After BW Elimination
Parameter   DF  Estimate  Std Err  Wald ChiSq  Pr > ChiSq
Intercept    1   -3.1790   0.3553     80.0692      <.0001
WEIGHT       1    1.7743   0.3781     22.0214      <.0001
PATELLAR     1    0.6504   0.3407      3.6437      0.0563

Association of Predicted Probabilities and Observed Responses
Percent Concordant   61.4    Somers' D   0.463
Percent Discordant   15.2    Gamma       0.604
Percent Tied         23.4    Tau-a       0.105
Pairs             13,635     c           0.731

The c statistic here is 0.731, which is slightly smaller than the c statistic of 0.745 obtained for the full model. The reduced model therefore has slightly less discriminatory power than the full model. (See Hanley (1983) for a statistical test of the difference between two or more AUCs.)

The ROC plot for the reduced model is shown below.

[ROC plot for the reduced model: sensitivity plotted against 1 - specificity; AUC = 0.731]

Notice that there are fewer cut-points plotted on this graph than on the ROC plot for the full model (previous page). The reason is that the number of possible cut-points for a given model is always less than or equal to the number of covariate patterns (i.e., distinct combinations of the predictors) defined by the model. The reduced model (with only two binary predictors) contains four (= 2^2) covariate patterns, whereas the full model (with five binary predictors) contains 32 (= 2^5) covariate patterns.

Moreover, because the reduced model is nested within the full model, the AUC for the reduced model will be no larger than the AUC for the full model, i.e., similar to the characteristics of R-squared in linear regression. That is the case here, since the AUC is 0.731 for the reduced model compared with 0.745 for the full model.

Reduced model (2 Xs): 2^2 = 4 covariate patterns, 4 cut-points, AUC(reduced) = 0.731
Full model (5 Xs): 2^5 = 32 covariate patterns, 28 cut-points, AUC(full) = 0.745

In general: if Model 1 is nested within Model 2, then AUC(Model 1) <= AUC(Model 2).
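If you want to reproduce this kind of full-versus-reduced comparison outside of SAS, a sketch along the following lines could be used. The data frame and column names are assumptions, and penalty=None requires a recent version of scikit-learn (older versions use penalty='none'); the sketch reports each model's in-sample AUC, which corresponds to the c statistic shown above.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

def in_sample_auc(X, y):
    """Fit an (unpenalized) logistic model and return its AUC on the same data,
    i.e., the analogue of the c statistic in the SAS output."""
    model = LogisticRegression(penalty=None, max_iter=1000).fit(X, y)
    return roc_auc_score(y, model.predict_proba(X)[:, 1])

# Hypothetical usage with assumed column names:
# auc_full = in_sample_auc(df[["FLEX", "WEIGHT", "AGECAT", "HEAD", "PATELLAR"]], df["FRACTURE"])
# auc_reduced = in_sample_auc(df[["WEIGHT", "PATELLAR"]], df["FRACTURE"])
```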
EXAMPLE (continued)

Note, however, that if two models that are not nested are compared, there is no guarantee as to which model will have the larger AUC. For example, the model that contains only the three variables that were dropped, namely HEAD, AGECAT, and FLEX (call it Model 3), has an AUC of 0.660, which is smaller than the AUC of 0.731 obtained for the two-variable (reduced) model involving WEIGHT and PATELLAR (Model 2).

Model 3 (HEAD, AGECAT, FLEX): AUC = 0.660
Model 2 (reduced model: WEIGHT, PATELLAR): AUC = 0.731

Thus, Model 2 (fewer variables) discriminates better than Model 3 (more variables).
VI. SUMMARY

This presentation is now complete. We have described how to assess the discriminatory performance (DP) of a binary logistic model. A model provides good DP if the covariates in the model help to predict (i.e., discriminate) which subjects will develop the outcome (Y = 1, the cases) and which will not develop the outcome (Y = 0, the noncases).

One way to measure DP is to consider the sensitivity (Se) and specificity (Sp) from a classification table that combines observed and predicted outcomes over all subjects. The closer both the sensitivity and the specificity are to 1, the better is the discrimination.

Classification/Diagnostic Table (for a given cut-point cp used to classify cases vs. noncases):

                              True (Observed) Outcome
                              Y = 1        Y = 0
  Predicted      Y = 1        nTP          nFP
  Outcome        Y = 0        nFN          nTN
                              n1           n0

Se = Pr(true positive | true case) = nTP / n1
Sp = Pr(true negative | true noncase) = nTN / n0

An alternative way to measure DP involves a plot and/or a summary measure based on a range of cut-points chosen for a given model. A widely used plot is the ROC curve, which graphs the sensitivity against 1 minus the specificity over a range of cut-points. Equivalently, the ROC is a plot of the true positive rate (TPR = Se) against the false positive rate (FPR = 1 - Sp).

[Sketch of an ROC curve: Se (= TPR) plotted against 1 - Sp (= FPR), with the plotted points corresponding to the cut-points used for classification]

A popular summary measure based on the ROC plot is the area under the ROC curve, or AUC. The larger the AUC, the better is the DP. An AUC of 1 indicates perfect DP, whereas an AUC of 0.5 indicates no DP (i.e., Se = 1 - Sp).
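Although the analyses in this chapter were run in SAS, the ROC plot and AUC summarized above can be sketched in a few lines of Python; the inputs y and p_hat (observed outcomes and fitted probabilities) are assumed, and scikit-learn and matplotlib are used purely for illustration.

```python
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve, auc

def plot_roc(y, p_hat):
    """Plot Se (TPR) against 1 - Sp (FPR) over all cut-points and report the AUC."""
    fpr, tpr, _ = roc_curve(y, p_hat)
    plt.plot(fpr, tpr, marker="o")
    plt.plot([0, 1], [0, 1], linestyle="--")   # diagonal: Se = 1 - Sp, i.e., no DP (AUC = 0.5)
    plt.xlabel("1 - Specificity (FPR)")
    plt.ylabel("Sensitivity (TPR)")
    plt.title(f"ROC curve, AUC = {auc(fpr, tpr):.3f}")
    plt.show()
```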
We suggest that you review the material covered in this chapter by reading the detailed outline that follows. Then do the practice exercises and the test.

Up to this point, we have considered binary outcomes only. In the next two chapters, the standard logistic model is extended to handle outcomes with three or more categories.
Detailed Outline

I. Overview (pages 348-350)
   A. Focus: how to assess the discriminatory performance (DP) of a binary logistic model.
   B. Considers how well the covariates in a given model help to predict (i.e., discriminate) which subjects will develop the outcome (Y = 1, or the cases) and which will not develop the outcome (Y = 0, or the noncases).
   C. One way to measure DP: consider the sensitivity (Se) and specificity (Sp) from a classification table that combines true and predicted outcomes over all subjects.
   D. An alternative way to measure DP: involves a plot (i.e., the ROC curve) and/or a summary measure (the AUC) based on a range of cut-points chosen for a given model.

II. Assessing Discriminatory Performance Using Sensitivity and Specificity Parameters (pages 350-354)
   A. Classification table
      i. One way to assess DP.
      ii. Combines true and predicted outcomes over all subjects.
      iii. A cut-point (cp) can be used with P̂(X) to predict whether a subject is a case or a noncase: if P̂(X) > cp, then predict subject X to be a case; otherwise, predict subject X to be a noncase.
   B. Sensitivity (Se) and specificity (Sp)
      i. Computed from the classification table for a fixed cut-point.
      ii. Se = proportion of truly diagnosed cases = Pr(true positive | true case) = nTP/n1.
      iii. Sp = proportion of truly diagnosed noncases = Pr(true negative | true noncase) = nTN/n0.
      iv. The closer both Se and Sp are to 1, the better is the discrimination.
      v. Se and Sp values vary with cp: as cp decreases from 1 to 0, Se increases from 0 to 1 and Sp decreases from 1 to 0; Sp may change at a different rate than Se, depending on the model considered.
      vi. 1 - Sp is more appealing than Sp:
         Se and 1 - Sp both focus on predicted cases; if there is good discrimination, we would expect Se > 1 - Sp for all cp.
   C. Pick a case and a noncase at random: what is the probability that P̂(X_case) > P̂(X_noncase)?
      i. One approach: "collectively" determine whether Se exceeds 1 - Sp over several cut-points ranging between 0 and 1.
      ii. Drawback: Se and Sp values are "summary statistics" computed over several subjects.
      iii. Instead: use the proportion of case/noncase pairs for which P̂(X_case) >= P̂(X_noncase).

III. Receiver Operating Characteristic (ROC) Curves (pages 354-358)
   A. The ROC plots the sensitivity (Se) against 1 - specificity (1 - Sp) over all cut-points.
      i. Equivalently, the ROC plots the true positive rate (TPR) for cases against the false positive rate (FPR) for noncases.
   B. The ROC measures how well the model predicts who will and who will not have the outcome.
   C. The ROC provides a numerical answer to the question: for a randomly chosen case/noncase pair, what is the probability that P̂(X_case) >= P̂(X_noncase)?
      i. The answer: AUC = the area under the ROC.
      ii. The larger the area, the better is the discrimination.
      iii. Two extremes: AUC = 1 implies perfect discrimination; AUC = 0.5 implies no discrimination.
   D. Grading guidelines for AUC values:
      0.90-1.00 = excellent discrimination (A); rarely observed
      0.80-0.90 = good discrimination (B)
      0.70-0.80 = fair discrimination (C)
      0.60-0.70 = poor discrimination (D)
      0.50-0.60 = failed discrimination (F)
   E. Complete separation of points (CSP)
      i. Occurs if all exposed subjects are cases and almost all unexposed subjects are noncases.
      ii. CSP is often found when AUC >= 0.90.
      iii. CSP implies that it is impossible, as well as unnecessary, to fit a logistic model to the data.
IV. Computing the Area Under the ROC (pages 358-365)
   A. General formula:
      AUC = (no. of case/noncase pairs in which P̂(X_case) >= P̂(X_noncase)) / (total no. of case/noncase pairs)
   B. Calculation formula: c (= AUC) = (w + 0.5z) / np, where
      w = no. of case/noncase pairs for which P̂(X_case) > P̂(X_noncase)
      z = no. of case/noncase pairs for which P̂(X_case) = P̂(X_noncase)
      np = total no. of case/noncase pairs
   C. Example of the AUC calculation (a coded version of this calculation is sketched after this outline):
      i. n = 300 subjects, n1 = 100 true cases, n0 = 200 true noncases
      ii. Fit a logistic model P(X) and compute P̂(Xi) for i = 1, ..., 300
      iii. np = n1 x n0 = 100 x 200 = 20,000
      iv. Suppose w = 11,480 (i.e., 57.4% of 20,000) and z = 5,420 (i.e., 27.1% of 20,000)
      v. c (= AUC) = (w + 0.5z)/np = [11,480 + 0.5(5,420)]/20,000 = 0.7095, i.e., grade C (fair) discrimination

V. Example from Study on Screening for Knee Fracture (pages 365-370)
   A. Scenario: n = 348 ER patients; complaint: blunt knee trauma; X-rayed for knee fracture.
   B. Study purpose: use the covariates to screen for the decision to perform an X-ray.
   C. Outcome variable:
      FRACTURE = knee fracture status (1 = yes, 0 = no)
      Predictor variables:
      FLEX = ability to flex knee (0 = yes, 1 = no)
      WEIGHT = ability to put weight on knee (0 = yes, 1 = no)
      AGECAT = patient's age (0 = age < 55, 1 = age >= 55)
      HEAD = injury to knee head (0 = no, 1 = yes)
      PATELLAR = injury to patella (0 = no, 1 = yes)
   D. Logistic model:
      logit P(X) = b0 + b1 FLEX + b2 WEIGHT + b3 AGECAT + b4 HEAD + b5 PATELLAR
   E. Results based on SAS's LOGISTIC procedure (but STATA or SPSS can also be used).
   F. ROC plot: [plot of sensitivity against 1 - specificity for the fitted model]
   G. AUC = 0.745, i.e., fair discrimination (grade C).
   H. Reduced model
      i. Why? Some nonsignificant regression coefficients in the full model.
      ii. Use backward elimination to obtain the following reduced model:
          logit P(X) = b0 + b2 WEIGHT + b5 PATELLAR
      iii. AUC (reduced model) = 0.731; AUC (full model) = 0.745.
      iv. In general, for nested models, AUC(smaller model) <= AUC(larger model).
      v. However, if the models are not nested, it is possible that AUC(model with fewer variables) > AUC(model with more variables).

VI. Summary (page 371)
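The AUC example in part IV.C of the outline can be checked with a few lines of Python; the values of w and z below are simply the ones assumed in that example.

```python
# Checking the AUC example in part IV.C of the outline (n1 = 100 cases, n0 = 200 noncases)
n_pairs = 100 * 200          # np = total number of case/noncase pairs = 20,000
w = 11_480                   # pairs with P_hat(case) > P_hat(noncase)
z = 5_420                    # tied pairs
c = (w + 0.5 * z) / n_pairs
print(c)                     # 0.7095 -> grade C (fair) discrimination
```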
Practice Exercises

The following questions and computer output consider the Evans County dataset on 609 white males that has been previously discussed and illustrated in earlier chapters of this text. Recall that the outcome variable is CHD status (1 = case, 0 = noncase), the exposure variable of interest is CAT status (1 = high CAT, 0 = low CAT), and the five control variables considered are AGE (continuous), CHL (continuous), ECG (0,1), SMK (0,1), and HPT (0,1).

The SAS output provided below was obtained for the following logistic model:

Logit P(X) = a + b1 CAT + g1 AGE + g2 CHL + g3 ECG + g4 SMK + g5 HPT + d1 CC + d2 CH,
where CC = CAT x CHL and CH = CAT x HPT

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -4.0497       1.2550           10.4125          0.0013
cat          1   -12.6894       3.1047           16.7055          <.0001
age          1     0.0350       0.0161            4.6936          0.0303
chl          1    -0.00545      0.00418           1.7000          0.1923
ecg          1     0.3671       0.3278            1.2543          0.2627
smk          1     0.7732       0.3273            5.5821          0.0181
hpt          1     1.0466       0.3316            9.9605          0.0016
cc           1     0.0692       0.0144           23.2020          <.0001
ch           1    -2.3318       0.7427            9.8579          0.0017

Association of Predicted Probabilities and Observed Responses
Percent Concordant   78.6    Somers' D   0.578
Percent Discordant   20.9    Gamma       0.580
Percent Tied          0.5    Tau-a       0.119
Pairs             38,198     c           0.789

Classification Table
           Correct            Incorrect                     Percentages
Prob             Non-               Non-                  Sensi-  Speci-  False  False
Level   Event   Event     Event    Event    Correct      tivity  ficity    POS    NEG
0.000      71       0       538        0       11.7       100.0     0.0   88.3      .
0.020      68      35       503        3       16.9        95.8     6.5   88.1    7.9
0.040      67     127       411        4       31.9        94.4    23.6   86.0    3.1
0.060      60     226       312       11       47.0        84.5    42.0   83.9    4.6
0.080      54     326       212       17       62.4        76.1    60.6   79.7    5.0
0.100      50     393       145       21       72.7        70.4    73.0   74.4    5.1
0.120      41     425       113       30       76.5        57.7    79.0   73.4    6.6
0.140      37     445        93       34       79.1        52.1    82.7   71.5    7.1
0.160      34     463        75       37       81.6        47.9    86.1   68.8    7.4
0.180      34     477        61       37       83.9        47.9    88.7   64.2    7.2
0.200      31     495        43       40       86.4        43.7    92.0   58.1    7.5
0.220      29     504        34       42       87.5        40.8    93.7   54.0    7.7
0.240      28     509        29       43       88.2        39.4    94.6   50.9    7.8
0.260      27     514        24       44       88.8        38.0    95.5   47.1    7.9
0.280      25     519        19       46       89.3        35.2    96.5   43.2    8.1
0.300      23     525        13       48       90.0        32.4    97.6   36.1    8.4
Classification Table (continued)
           Correct            Incorrect                     Percentages
Prob             Non-               Non-                  Sensi-  Speci-  False  False
Level   Event   Event     Event    Event    Correct      tivity  ficity    POS    NEG
0.320      23     526        12       48       90.1        32.4    97.8   34.3    8.4
0.340      22     528        10       49       90.3        31.0    98.1   31.3    8.5
0.360      21     529         9       50       90.3        29.6    98.3   30.0    8.6
0.380      21     529         9       50       90.3        29.6    98.3   30.0    8.6
0.400      18     529         9       53       89.8        25.4    98.3   33.3    9.1
0.420      18     531         7       53       90.1        25.4    98.7   28.0    9.1
0.440      18     531         7       53       90.1        25.4    98.7   28.0    9.1
0.460      18     531         7       53       90.1        25.4    98.7   28.0    9.1
0.480      18     531         7       53       90.1        25.4    98.7   28.0    9.1
0.500      18     532         6       53       90.3        25.4    98.9   25.0    9.1
0.520      18     532         6       53       90.3        25.4    98.9   25.0    9.1
0.540      16     532         6       55       90.0        22.5    98.9   27.3    9.4
0.560      16     532         6       55       90.0        22.5    98.9   27.3    9.4
0.580      15     532         6       56       89.8        21.1    98.9   28.6    9.5
0.600      13     533         5       58       89.7        18.3    99.1   27.8    9.8
0.620      11     534         4       60       89.5        15.5    99.3   26.7   10.1
0.640      10     535         3       61       89.5        14.1    99.4   23.1   10.2
0.660      10     535         3       61       89.5        14.1    99.4   23.1   10.2
0.680      10     535         3       61       89.5        14.1    99.4   23.1   10.2
0.700      10     536         2       61       89.7        14.1    99.6   16.7   10.2
0.720       9     536         2       62       89.5        12.7    99.6   18.2   10.4
0.740       8     536         2       63       89.3        11.3    99.6   20.0   10.5
0.760       8     536         2       63       89.3        11.3    99.6   20.0   10.5
0.780       8     536         2       63       89.3        11.3    99.6   20.0   10.5
0.800       8     536         2       63       89.3        11.3    99.6   20.0   10.5
0.820       6     536         2       65       89.0         8.5    99.6   25.0   10.8
0.840       6     537         1       65       89.2         8.5    99.8   14.3   10.8
0.860       5     537         1       66       89.0         7.0    99.8   16.7   10.9
0.880       5     537         1       66       89.0         7.0    99.8   16.7   10.9
0.900       5     537         1       66       89.0         7.0    99.8   16.7   10.9
0.920       5     537         1       66       89.0         7.0    99.8   16.7   10.9
0.940       4     538         0       67       89.0         5.6   100.0    0.0   11.1
0.960       3     538         0       68       88.8         4.2   100.0    0.0   11.2
0.980       3     538         0       68       88.8         4.2   100.0    0.0   11.2
1.000       0     538         0       71       88.3         0.0   100.0      .   11.7

1. Using the above output:
   a. Give a formula for calculating the estimated probability P̂(X*) of being a case (i.e., CHD = 1) for a subject (X*) with the following covariate values: CAT = 1, AGE = 50, CHL = 200, ECG = 0, SMK = 0, HPT = 0. [Hint: P̂(X*) = 1/{1 + exp[-logit P̂(X*)]}, where logit P̂(X*) is calculated using the estimated regression coefficients of the fitted model.]
   b. Compute the value of P̂(X*) using your answer to question 1a.
   c. If a discrimination cut-point of 0.200 is used to classify a subject as either a case or a noncase, how would you classify subject X* based on your answer to question 1b?
   d. With a cut-point of 0.000, the sensitivity of the screening test is 1.0, or 100% (see the first row of the table). Why must the sensitivity of a test be 100% if the cut-point is 0? (Assume there is at least one true event.)
   e. Notice for these data that, as the cut-point gets larger, the specificity also gets larger (or stays the same). For example, a cut-point of 0.200 yields a specificity of 92.0%, whereas a cut-point of 0.300 yields a specificity of 97.6%. Is it possible (using different data) that an increase in the cut-point could actually decrease the specificity? Explain.
   f. In the classification table provided above, a cut-point of 0.200 yields a false positive percentage of 58.1%, whereas 1 minus the specificity at this cut-point is 8.0%. Since the 1 minus specificity percentage is defined as 100 times the proportion of true noncases that are falsely classified as cases (i.e., the numerator in this proportion is the number of false-positive noncases), why is the false positive percentage (58.1%) shown in the output not equal to 1 minus the specificity (8.0%)? Is the computer program in error?

2. Based on the output:
   a. What is the area under the ROC curve? How would you grade this area in terms of the discriminatory power of the model being fitted?
   b. In the output provided under the heading "Association of Predicted Probabilities and Observed Responses," the number of pairs is 38,198. How is this number computed?
   c. In the output provided under the same heading as in question 2b, how are the Percent Concordant and the Percent Tied computed?
   d. Using the number of pairs, the Percent Concordant, and the Percent Tied described in parts (b) and (c), compute the area under the ROC curve (AUC) and verify that it is equal to your answer to part 2a.
   e. The ROC curves for the interaction model described above and for the no-interaction model that does not contain the CC or CH (interaction) variables are shown below. The area under the ROC curve for the no-interaction model is 0.705. Why is the latter AUC less than the AUC for the interaction model?
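If you would like to check your hand calculation for questions 1a and 1b, the following Python sketch evaluates the fitted interaction model at an arbitrary covariate pattern, using the coefficient estimates printed in the output above; the function and variable names are ours, not part of the exercises.

```python
import numpy as np

# Coefficient estimates from the fitted interaction model above, in the order
# (Intercept, CAT, AGE, CHL, ECG, SMK, HPT, CC, CH):
beta = np.array([-4.0497, -12.6894, 0.0350, -0.00545, 0.3671, 0.7732, 1.0466, 0.0692, -2.3318])

def p_hat(cat, age, chl, ecg, smk, hpt):
    """Predicted probability for one subject; CC = CAT*CHL and CH = CAT*HPT are formed internally."""
    x = np.array([1.0, cat, age, chl, ecg, smk, hpt, cat * chl, cat * hpt])
    logit = float(beta @ x)
    return 1.0 / (1.0 + np.exp(-logit))

# Subject X* of question 1a: CAT = 1, AGE = 50, CHL = 200, ECG = 0, SMK = 0, HPT = 0
# p_hat(1, 50, 200, 0, 0, 0)   # compare with your hand calculation for question 1b
```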
Test

The following questions and computer output consider data from a cross-sectional study carried out at Grady Hospital in Atlanta, Georgia, involving 289 adult patients seen in an emergency department whose blood cultures taken within 24 hours of admission were found to have Staph aureus infection (Rezende et al., 2002). Information was obtained on several variables, some of which were considered risk factors for methicillin resistance (MRSA). The outcome variable is MRSA status (1 = yes, 0 = no), and the covariates of interest included the following variables: PREVHOSP (1 = previous hospitalization, 0 = no previous hospitalization), AGE (continuous), GENDER (1 = male, 0 = female), and PAMU (1 = antimicrobial drug use in the previous 3 months, 0 = no previous antimicrobial drug use).

The SAS output provided below was obtained for the following logistic model:

Logit P(X) = a + b1 PREVHOSP + b2 AGE + b3 GENDER + b4 PAMU

Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   Wald Chi-Square   Pr > ChiSq
Intercept    1    -5.0583       0.7643           43.8059          <.0001
PREVHOSP     1     1.4855       0.4032           13.5745           0.0002
AGE          1     0.0353       0.00920          14.7004           0.0001
gender       1     0.9329       0.3418            7.4513           0.0063
pamu         1     1.7819       0.3707           23.1113          <.0001

Odds Ratio Estimates
Effect       Point Estimate    95% Wald Confidence Limits
PREVHOSP          4.417            2.004     9.734
AGE               1.036            1.017     1.055
gender            2.542            1.301     4.967
pamu              5.941            2.873    12.285

Association of Predicted Probabilities and Observed Responses
Percent Concordant   83.8    Somers' D   0.681
Percent Discordant   15.8    Gamma       0.684
Percent Tied          0.4    Tau-a       0.326
Pairs             19,950     c           0.840
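As a side note on how the Odds Ratio Estimates above relate to the Maximum Likelihood Estimates, the following short sketch computes an odds ratio and its 95% Wald limits from a coefficient and its standard error; the example values are taken from the PREVHOSP row, and the function name is ours.

```python
import numpy as np

def odds_ratio_ci(estimate, std_err, z=1.96):
    """Odds ratio exp(beta) with its 95% Wald confidence limits exp(beta +/- 1.96*SE)."""
    return np.exp(estimate), np.exp(estimate - z * std_err), np.exp(estimate + z * std_err)

# Example, using the PREVHOSP row of the output above (estimate 1.4855, standard error 0.4032):
# odds_ratio_ci(1.4855, 0.4032)   # roughly (4.42, 2.00, 9.73), matching the SAS table
```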
Classification Table
           Correct            Incorrect                     Percentages
Prob             Non-               Non-                  Sensi-  Speci-  False  False
Level   Event   Event     Event    Event    Correct      tivity  ficity    POS    NEG
0.000     114       0       175        0       39.4       100.0     0.0   60.6      .
0.020     114       2       173        0       40.1       100.0     1.1   60.3    0.0
0.040     113      19       156        1       45.7        99.1    10.9   58.0    5.0
0.060     110      38       137        4       51.2        96.5    21.7   55.5    9.5
0.080     108      58       117        6       57.4        94.7    33.1   52.0    9.4
0.100     107      73       102        7       62.3        93.9    41.7   48.8    8.8
0.120     107      75       100        7       63.0        93.9    42.9   48.3    8.5
0.140     106      81        94        8       64.7        93.0    46.3   47.0    9.0
0.160     106      89        86        8       67.5        93.0    50.9   44.8    8.2
0.180     106      91        84        8       68.2        93.0    52.0   44.2    8.1
0.200     106      93        82        8       68.9        93.0    53.1   43.6    7.9
0.220     102      99        76       12       69.6        89.5    56.6   42.7   10.8
0.240     101     100        75       13       69.6        88.6    57.1   42.6   11.5
0.260      99     105        70       15       70.6        86.8    60.0   41.4   12.5
0.280      98     106        69       16       70.6        86.0    60.6   41.3   13.1
0.300      98     107        68       16       70.9        86.0    61.1   41.0   13.0
0.320      96     115        60       18       73.0        84.2    65.7   38.5   13.5
0.340      96     117        58       18       73.7        84.2    66.9   37.7   13.3
0.360      95     119        56       19       74.0        83.3    68.0   37.1   13.8
0.380      92     120        55       22       73.4        80.7    68.6   37.4   15.5
0.400      91     121        54       23       73.4        79.8    69.1   37.2   16.0
0.420      91     125        50       23       74.7        79.8    71.4   35.5   15.5
0.440      91     127        48       23       75.4        79.8    72.6   34.5   15.3
0.460      89     129        46       25       75.4        78.1    73.7   34.1   16.2
0.480      85     134        41       29       75.8        74.6    76.6   32.5   17.8
0.500      82     138        37       32       76.1        71.9    78.9   31.1   18.8
0.520      81     140        35       33       76.5        71.1    80.0   30.2   19.1
0.540      79     141        34       35       76.1        69.3    80.6   30.1   19.9
0.560      75     145        30       39       76.1        65.8    82.9   28.6   21.2
0.580      73     147        28       41       76.1        64.0    84.0   27.7   21.8
0.600      70     151        24       44       76.5        61.4    86.3   25.5   22.6
0.620      66     153        22       48       75.8        57.9    87.4   25.0   23.9
0.640      61     156        19       53       75.1        53.5    89.1   23.8   25.4
0.660      55     160        15       59       74.4        48.2    91.4   21.4   26.9
0.680      48     163        12       66       73.0        42.1    93.1   20.0   28.8
0.700      42     165        10       72       71.6        36.8    94.3   19.2   30.4
0.720      34     167         8       80       69.6        29.8    95.4   19.0   32.4
0.740      32     171         4       82       70.2        28.1    97.7   11.1   32.4
0.760      29     171         4       85       69.2        25.4    97.7   12.1   33.2
0.780      25     171         4       89       67.8        21.9    97.7   13.8   34.2
0.800      17     172         3       97       65.4        14.9    98.3   15.0   36.1
0.820      12     173         2      102       64.0        10.5    98.9   14.3   37.1
0.840      11     174         1      103       64.0         9.6    99.4    8.3   37.2
0.860       6     174         1      108       62.3         5.3    99.4   14.3   38.3
0.880       5     174         1      109       61.9         4.4    99.4   16.7   38.5
0.900       0     175         0      114       60.6         0.0   100.0      .   39.4
0.920       0     175         0      114       60.6         0.0   100.0      .   39.4
0.940       0     175         0      114       60.6         0.0   100.0      .   39.4
0.960       0     175         0      114       60.6         0.0   100.0      .   39.4
0.980       0     175         0      114       60.6         0.0   100.0      .   39.4
1.000       0     175         0      114       60.6         0.0   100.0      .   39.4

Questions based on the above information begin on the next page.
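Before turning to the questions, note how the percentage columns of such a classification table are obtained from the four cell counts at a given cut-point. The small sketch below mirrors the definitions used in the table above: the False POS and False NEG percentages are computed among subjects classified as events and nonevents, respectively, whereas 1 - specificity would divide the same false positives by the number of true noncases. The function name and argument order are ours.

```python
def classification_percentages(n_tp, n_tn, n_fp, n_fn):
    """Percentages as defined in the classification table above."""
    n = n_tp + n_tn + n_fp + n_fn
    correct     = 100 * (n_tp + n_tn) / n          # overall percent correctly classified
    sensitivity = 100 * n_tp / (n_tp + n_fn)       # denominator n1 = number of true cases
    specificity = 100 * n_tn / (n_tn + n_fp)       # denominator n0 = number of true noncases
    false_pos   = 100 * n_fp / (n_tp + n_fp)       # among subjects classified as events
    false_neg   = 100 * n_fn / (n_fn + n_tn)       # among subjects classified as nonevents
    return correct, sensitivity, specificity, false_pos, false_neg
```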
1. For a discrimination cut-point of 0.300 in the Classification Table provided above,
   a. Fill in the table below to show the cell frequencies for the number of true positives (nTP), false positives (nFP), true negatives (nTN), and false negatives (nFN):

                                    True (Observed) Outcome
         cp = 0.300                 Y = 1            Y = 0
         Predicted    Y = 1         nTP =            nFP =
         Outcome      Y = 0         nFN =            nTN =
                                    n1 = 114         n0 = 175

   b. Using the cell frequencies in the table of part 1a, compute (in percentages) the sensitivity, specificity, 1 - specificity, false positive, and false negative values, and verify that these results are identical to the results shown in the Classification Table for the cut-point 0.300:
      Sensitivity % =
      Specificity % =
      1 - specificity % =
      False positive % =
      False negative % =
   c. Why are the 1 - specificity and false positive percentages not identical, even though they both use the (same) number of false-positive subjects in their calculation?
   d. How is the value of 70.9 in the column labeled "Correct" computed, and how can this value be interpreted?
   e. How do you interpret the values for sensitivity and specificity obtained for the cut-point of 0.300 in terms of how well the model discriminates cases from noncases?
   f. What is the drawback to (exclusively) using the results for the cut-point of 0.300 to determine how well the model discriminates cases from noncases?

2. Using the following graph, plot the points on the graph that would give the portion of the ROC curve corresponding to the following cut-points: 0.000, 0.200, 0.400, 0.600, 0.800, and 1.000.
[Blank grid for plotting question 2: sensitivity (vertical axis, 0.0 to 1.0) against 1 - specificity (horizontal axis, 0.0 to 1.0)]

3. The ROC curve obtained for the model fitted to these data is shown below.
   a. Verify that the points you produced to answer question 2 correspond to the appropriate points on the ROC curve shown here.
   b. Based on the output provided, what is the area under the ROC curve? How would you grade this area in terms of the discriminatory power of the model being fitted?
   c. In the output provided under the heading "Association of Predicted Probabilities and Observed Responses," the number of pairs is 19,950. How is this number computed?
   d. Using the number of pairs, the Percent Concordant, and the Percent Tied given in the output under the heading "Association of Predicted Probabilities and Observed Responses," compute the area under the ROC curve (AUC), and verify that it is equal to your answer to part 3b.

4. Consider the following figure, which superimposes the ROC curve within the rectangular area whose height is equal to the number of MRSA cases (114) and whose width is equal to the number of MRSA noncases (175).
   a. What is the area of the entire rectangle, and what does it have in common with the formula for the area under the ROC curve?
   b. Using the AUC calculation formula, what is the area under the ROC curve superimposed on the above graph? How do you interpret this area?

5. Below is additional output providing goodness-of-fit information and the Hosmer-Lemeshow test for the model fitted to the MRSA dataset. The column labeled "Group" lists the deciles of risk, ordered from smallest to largest; e.g., decile 10 contains the 23 patients who had the highest 10% of predicted probabilities.

                          mrsa = 1                 mrsa = 0
Group   Total      Observed   Expected      Observed   Expected
  1       29           1        0.99           28        28.01
  2       31           5        1.95           26        29.05
  3       29           2        2.85           27        26.15
  4       29           5        5.73           24        23.27
  5       30          10        9.98           20        20.02
  6       31          12       14.93           19        16.07
  7       29          16       17.23           13        11.77
  8       29          20       19.42            9         9.58
  9       29          22       21.57            7         7.43
 10       23          21       19.36            2         3.64

Hosmer and Lemeshow Goodness-of-Fit Test
Chi-Square    DF    Pr > ChiSq
  7.7793       8      0.4553

   a. Based on the above output, does the model fit the data? Explain briefly.
   b. What does the distribution of the numbers of observed cases and observed noncases over the 10 deciles indicate about how well the model discriminates cases from noncases? Does your answer
coincide with your answer to question 3b in terms of the discriminatory power of the fitted model?
   c. Suppose the distribution of observed and expected cases and noncases was instead given by the following table:

Partition for the Hosmer and Lemeshow Test
                          mrsa = 1                 mrsa = 0
Group   Total      Observed   Expected      Observed   Expected
  1       29          10        0.99           19        28.01
  2       31          11        1.95           20        29.05
  3       29          11        2.85           18        26.15
  4       29          11        5.73           18        23.27
  5       30          12        9.98           18        20.02
  6       31          12       14.93           19        16.07
  7       29          12       17.23           17        11.77
  8       29          13       19.42           16         9.58
  9       29          13       21.57           16         7.43
 10       23           9       19.36           14         3.64

      What does this information indicate about how well the model discriminates cases from noncases and about how well the model fits the data? Explain briefly.
   d. Suppose the distribution of observed and expected cases and noncases was instead given by the following table:

Partition for the Hosmer and Lemeshow Test
                          mrsa = 1                 mrsa = 0
Group   Total      Observed   Expected      Observed   Expected
  1       29          10       10.99           19        18.01
  2       31          11       10.95           20        20.05
  3       29          11       10.85           18        18.15
  4       29          11       11.73           18        17.27
  5       30          12       11.98           18        18.02
  6       31          12       11.93           19        19.07
  7       29          12       11.23           17        17.77
  8       29          13       11.42           16        17.58
  9       29          13       11.57           16        17.43
 10       23           9       11.36           14        11.64

      What does this information indicate about how well the model discriminates cases from noncases and about how well the model fits the data? Explain briefly.
   e. Do you think it is possible that a model might provide good discrimination between cases and noncases, yet poorly fit the data? Explain briefly, perhaps with a numerical example (e.g., using hypothetical data) or by generally describing a situation in which this might happen.
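As a reminder of how a Hosmer-Lemeshow statistic such as the one quoted in question 5 is assembled from a partition table, here is a small Python sketch that sums (observed - expected)^2 / expected over the case and noncase columns of the ten decile groups; the lists below are simply the values from the first partition table in question 5.

```python
# Hosmer-Lemeshow statistic recomputed from the first partition table in question 5:
obs_cases = [1, 5, 2, 5, 10, 12, 16, 20, 22, 21]
exp_cases = [0.99, 1.95, 2.85, 5.73, 9.98, 14.93, 17.23, 19.42, 21.57, 19.36]
obs_non   = [28, 26, 27, 24, 20, 19, 13, 9, 7, 2]
exp_non   = [28.01, 29.05, 26.15, 23.27, 20.02, 16.07, 11.77, 9.58, 7.43, 3.64]

hl = sum((o - e) ** 2 / e for o, e in zip(obs_cases + obs_non, exp_cases + exp_non))
print(hl)   # approximately 7.78, close to the reported chi-square of 7.7793 (df = 10 - 2 = 8)
```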
Answers to Practice Exercises

1. a. X* = (CAT = 1, AGE = 50, CHL = 200, ECG = 0, SMK = 0, HPT = 0)
      P̂(X*) = 1/{1 + exp[-logit P̂(X*)]}, where
      logit P̂(X*) = -4.0497 + (-12.6894)(1) + 0.0350(50) + (-0.00545)(200) + 0.3671(0) + 0.7732(0) + 1.0466(0) + 0.0692(1)(200) + (-2.3318)(1)(0)
   b. logit P̂(X*) = -2.2391
      P̂(X*) = 1/{1 + exp[-logit P̂(X*)]} = 1/{1 + exp[2.2391]} = 0.096
   c. Cut-point = 0.200. Since P̂(X*) = 0.096 < 0.200, we would predict subject X* to be a noncase.
   d. If the cut-point is 0 and there is at least one true case, then every case in the dataset will have P̂(X) > 0; i.e., all 71 true cases will exceed the cut-point and therefore will be predicted to be cases. Thus, the sensitivity percentage is 100(71/71) = 100.
   e. It is not possible that an increase in the cut-point could result in a decrease in the specificity.
   f. The denominator for computing 1 minus the specificity is the number of true noncases (538), whereas the denominator for the false positive percentage in the SAS output is the number of persons classified as positive (74). Thus, we obtain different results as follows:
      1 minus specificity percentage = (100)(1 - Sp) = (100)43/538 = 8%, whereas
      False positive percentage = (100)43/74 = 58.1%.

2. a. AUC = c = 0.789; grade C, i.e., fair discrimination.
   b. 38,198 = 71 x 538, where 71 is the number of true cases and 538 is the number of true noncases. Thus, 38,198 is the number of distinct case/noncase pairs in the dataset.
   c. Percent Concordant = 100w/np, where w is the number of case/noncase pairs for which the case has a higher predicted probability than the noncase, and np is the total number of case/noncase pairs (38,198).
      Percent Tied = 100z/np, where z is the number of case/noncase pairs for which the case has the same predicted probability as the noncase.
   d. AUC = c = (w + 0.5z)/np = [38,198(0.786) + 0.5(38,198)(0.005)]/38,198 = 0.789
   e. The AUC for the no-interaction model is smaller than the AUC for the interaction model because the former model is nested within the latter model.
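Answer 2d can be verified in a line or two of Python; the only inputs are the Percent Concordant, Percent Tied, and Pairs values from the output.

```python
# Verifying the arithmetic in answer 2d from the reported output values:
n_pairs = 38_198
w = 0.786 * n_pairs      # Percent Concordant = 78.6
z = 0.005 * n_pairs      # Percent Tied = 0.5
print((w + 0.5 * z) / n_pairs)   # 0.7885, which rounds to the reported c of 0.789
```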
11 Analysis of Matched Data Using Logistic Regression

Contents
   Introduction 390
   Abbreviated Outline 390
   Objectives 391
   Presentation 392
   Detailed Outline 415
   Practice Exercises 420
   Test 424
   Answers to Practice Exercises 426

D.G. Kleinbaum and M. Klein, Logistic Regression, Statistics for Biology and Health,
DOI 10.1007/978-1-4419-1742-3_11, (c) Springer Science+Business Media, LLC 2010