Medical Statistics at a Glance
Flow charts indicating appropriate techniques in different circumstances*

Flow chart for hypothesis tests

Numerical data:
• 1 group: one-sample t-test (19); sign test (19)
• 2 groups, paired: paired t-test (20); Wilcoxon signed ranks test (20); sign test (19)
• 2 groups, independent: unpaired t-test (21); Wilcoxon rank sum test (21)
• >2 groups, independent: one-way ANOVA (22); Kruskal-Wallis test (22)

Categorical data (2 categories, investigating proportions):
• 1 group: z test for a proportion (23); sign test (23)
• 2 groups, paired: McNemar's test (24)
• 2 groups, independent: chi-squared test (24)
• >2 groups: chi-squared test (25); chi-squared trend test (25)

Flow chart for further analyses
• Correlation coefficients: Pearson's (26); Spearman's (26)
• Regression: multiple (29); logistic (30); modelling (31)
• Longitudinal studies: survival analysis (41)
• Assessing agreement: kappa (36)
• Bayesian methods (42)
• Systematic reviews and meta-analyses (38)

*Relevant topic numbers shown in parentheses
Medical Statistics at a Glance

AVIVA PETRIE
Senior Lecturer in Statistics, Biostatistics Unit, Eastman Dental Institute for Oral Health Care Sciences, University College London, 256 Grays Inn Road, London WC1X 8LD
and Honorary Lecturer in Medical Statistics, Medical Statistics Unit, London School of Hygiene and Tropical Medicine, Keppel Street, London WC1E 7HT

CAROLINE SABIN
Senior Lecturer in Medical Statistics and Epidemiology, Department of Primary Care and Population Sciences, The Royal Free and University College Medical School, Royal Free Campus, Rowland Hill Street, London NW3 2PF

Blackwell Science
© 2000 by Blackwell Science Ltd

Editorial Offices:
Osney Mead, Oxford OX2 0EL
25 John Street, London WC1N 2BL
23 Ainslie Place, Edinburgh EH3 6AJ
350 Main Street, Malden, MA 02148-5018, USA
54 University Street, Carlton, Victoria 3053, Australia
10, rue Casimir Delavigne, 75006 Paris, France

Other Editorial Offices:
Blackwell Wissenschafts-Verlag GmbH, Kurfürstendamm 57, 10707 Berlin, Germany
Blackwell Science KK, MG Kodenmacho Building, 7-10 Kodenmacho Nihombashi, Chuo-ku, Tokyo 104, Japan

The right of the Author to be identified as the Author of this Work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the copyright owner.

First published 2000

Set by Excel Typesetters Co., Hong Kong
Printed and bound in Great Britain at the Alden Press, Oxford and Northampton

The Blackwell Science logo is a trade mark of Blackwell Science Ltd, registered at the United Kingdom Trade Marks Registry

DISTRIBUTORS
Marston Book Services Ltd, PO Box 269, Abingdon, Oxon OX14 4YN (Orders: Tel: 01235 465500; Fax: 01235 465555)
USA: Blackwell Science, Inc., Commerce Place, 350 Main Street, Malden, MA 02148-5018 (Orders: Tel: 800 759 6102 or 781 388 8250; Fax: 781 388 8255)
Canada: Login Brothers Book Company, 324 Saulteaux Crescent, Winnipeg, Manitoba R3J 3T2 (Orders: Tel: 204 837 2987)
Australia: Blackwell Science Pty Ltd, 54 University Street, Carlton, Victoria 3053 (Orders: Tel: 3 9347 0300; Fax: 3 9347 5001)

A catalogue record for this title is available from the British Library

ISBN 0-632-05075-6

Library of Congress Cataloging-in-Publication Data
Petrie, Aviva.
Medical statistics at a glance / Aviva Petrie, Caroline Sabin.
p. cm.
Includes index.
ISBN 0-632-05075-6
1. Medical statistics. 2. Medicine - Statistical methods. I. Sabin, Caroline. II. Title.
R853.S7 P476 2000
610'.7'27 - dc21
99-045806

For further information on Blackwell Science, visit our website: www.blackwell-science.com
Contents

Preface, 6

Handling data
1 Types of data, 8
2 Data entry, 10
3 Error checking and outliers, 12
4 Displaying data graphically, 14
5 Describing data (1): the 'average', 16
6 Describing data (2): the 'spread', 18
7 Theoretical distributions (1): the Normal distribution, 20
8 Theoretical distributions (2): other distributions, 22
9 Transformations, 24

Sampling and estimation
10 Sampling and sampling distributions, 26
11 Confidence intervals, 28

Study design
12 Study design I, 30
13 Study design II, 32
14 Clinical trials, 34
15 Cohort studies, 37
16 Case-control studies, 40

Hypothesis testing
17 Hypothesis testing, 42
18 Errors in hypothesis testing, 44

Basic techniques for analysing data
Numerical data:
19 A single group, 46
20 Two related groups, 49
21 Two unrelated groups, 52
22 More than two groups, 55
Categorical data:
23 A single proportion, 58
24 Two proportions, 61
25 More than two categories, 64

Regression and correlation
26 Correlation, 67
27 The theory of linear regression, 70
28 Performing a linear regression analysis, 72
29 Multiple linear regression, 75
30 Polynomial and logistic regression, 78
31 Statistical modelling, 80

Important considerations
32 Checking assumptions, 82
33 Sample size calculations, 84
34 Presenting results, 87

Additional topics
35 Diagnostic tools, 90
36 Assessing agreement, 93
37 Evidence-based medicine, 96
38 Systematic reviews and meta-analysis, 98
39 Methods for repeated measures, 101
40 Time series, 104
41 Survival analysis, 106
42 Bayesian methods, 109

Appendices
A Statistical tables, 112
B Altman's nomogram for sample size calculations, 119
C Typical computer output, 120
D Glossary of terms, 127

Index, 135
Preface

Medical Statistics at a Glance is directed at undergraduate medical students, medical researchers, postgraduates in the biomedical disciplines and at pharmaceutical industry personnel. All of these individuals will, at some time in their professional lives, be faced with quantitative results (their own or those of others) that will need to be critically evaluated and interpreted, and some, of course, will have to pass that dreaded statistics exam! A proper understanding of statistical concepts and methodology is invaluable for these needs. Much as we should like to fire the reader with an enthusiasm for the subject of statistics, we are pragmatic. Our aim is to provide the student and the researcher, as well as the clinician encountering statistical concepts in the medical literature, with a book that is sound, easy to read, comprehensive, relevant, and of useful practical application.

In line with other books in the At a Glance series, we lead the reader through a number of self-contained, two- and three-page topics, each covering a different aspect of medical statistics. We have learned from our own teaching experiences, and have taken account of the difficulties that our students have encountered when studying medical statistics. For this reason, we have chosen to limit the theoretical content of the book to a level that is sufficient for understanding the procedures involved, yet which does not overshadow the practicalities of their execution.

Medical statistics is a wide-ranging subject covering a large number of topics. We have provided a basic introduction to the underlying concepts of medical statistics and a guide to the most commonly used statistical procedures. Epidemiology is closely allied to medical statistics. Hence some of the main issues in epidemiology, relating to study design and interpretation, are discussed. Also included are topics that the reader may find useful only occasionally, but which are, nevertheless, fundamental to many areas of medical research; for example, evidence-based medicine, systematic reviews and meta-analysis, time series, survival analysis and Bayesian methods. We have explained the principles underlying these topics so that the reader will be able to understand and interpret the results from them when they are presented in the literature. More detailed discussions may be obtained from the references listed on our Web site.

There is extensive cross-referencing throughout the text to help the reader link the various procedures. The Glossary of terms (Appendix D) provides readily accessible explanations of commonly used terminology. A basic set of statistical tables is contained in Appendix A. Neave, H.R. (1981) Elementary Statistical Tables, Routledge, and Geigy Scientific Tables Vol. 2, 8th edn (1990), Ciba-Geigy Ltd., amongst others, provide fuller versions if the reader requires more precise results for hand calculations.

We know that one of the greatest difficulties facing non-statisticians is choosing the appropriate technique. We have therefore produced two flow charts which can be used both to aid the decision as to what method to use in a given situation and to locate a particular technique in the book easily. They are displayed prominently on the inside cover for easy access.

Every topic describing a statistical technique is accompanied by an example illustrating its use. We have generally obtained the data for these examples from collaborative studies in which we or colleagues have been involved; in some instances, we have used real data from published papers. Where possible, we have utilized the same data set in more than one topic to reflect the reality of data analysis, which is rarely restricted to a single technique or approach. Although we believe that formulae should be provided and the logic of the approach explained as an aid to understanding, we have avoided showing the details of complex calculations; most readers will have access to computers and are unlikely to perform any but the simplest calculations by hand.

We consider that it is particularly important for the reader to be able to interpret output from a computer package. We have therefore chosen, where applicable, to show results using extracts from computer output. In some instances, when we believe individuals may have difficulty with its interpretation, we have included (Appendix C) and annotated the complete computer output from an analysis of a data set. There are many statistical packages in common use; to give the reader an indication of how output can vary, we have not restricted the output to a particular package and have, instead, used three well known ones: SAS, SPSS and STATA.

We believe Medical Statistics at a Glance will be particularly helpful as an adjunct to statistics lectures and as a reference guide. In addition, the reader can assess his/her progress in self-directed learning by attempting the exercises on our Web site (www.medstatsaag.com), which can be accessed from the Internet. This Web site also contains a full set of references (some of which are linked directly to Medline) to supplement the references quoted in the text and provide useful background information for the examples. For those readers who wish to gain a greater insight into particular areas of medical statistics, we can recommend the following books:

Altman, D.G. (1991) Practical Statistics for Medical Research. Chapman and Hall, London.
Armitage, P., Berry, G. (1994) Statistical Methods in Medical Research, 3rd edn. Blackwell Scientific Publications, Oxford.
Pocock, S.J. (1983) Clinical Trials: A Practical Approach. Wiley, Chichester.

We wish to thank everyone who has helped us by providing data for the examples. We are particularly grateful to Richard Morris, Fiona Lampe and Shak Hajat, who read the entire book, and Abul Basar who read a substantial portion of it, all of whom made invaluable comments and suggestions. Naturally, we take full responsibility for any remaining errors in the text or examples.

It remains only to thank those who have lived and worked with us and our commitment to this project: Mike, Gerald, Nina, Andrew, Karen, and Diane. They have shown tolerance and understanding, particularly in the months leading to its completion, and have given us the opportunity to concentrate on this venture and bring it to fruition.
1 Types of data

Data and statistics
The purpose of most studies is to collect data to obtain information about a particular area of research. Our data comprise observations on one or more variables; any quantity that varies is termed a variable. For example, we may collect basic clinical and demographic information on patients with a particular illness. The variables of interest may include the sex, age and height of the patients.

Our data are usually obtained from a sample of individuals which represents the population of interest. Our aim is to condense these data in a meaningful way and extract useful information from them. Statistics encompasses the methods of collecting, summarizing, analysing and drawing conclusions from the data: we use statistical techniques to achieve our aim.

Data may take many different forms. We need to know what form every variable takes before we can make a decision regarding the most appropriate statistical methods to use. Each variable and the resulting data will be one of two types: categorical or numerical (Fig. 1.1).

Categorical (qualitative) data
These occur when each individual can only belong to one of a number of distinct categories of the variable.
• Nominal data: the categories are not ordered but simply have names. Examples include blood group (A, B, AB, and O) and marital status (married/widowed/single etc.). In this case there is no reason to suspect that being married is any better (or worse) than being single!
• Ordinal data: the categories are ordered in some way. Examples include disease staging systems (advanced, moderate, mild, none) and degree of pain (severe, moderate, mild, none).
A categorical variable is binary or dichotomous when there are only two possible categories. Examples include 'Yes/No', 'Dead/Alive' or 'Patient has disease/patient does not have disease'.

Numerical (quantitative) data
These occur when the variable takes some numerical value. We can subdivide numerical data into two types.
• Discrete data: occur when the variable can only take certain whole numerical values. These are often counts of numbers of events, such as the number of visits to a GP in a year or the number of episodes of illness in an individual over the last five years.
• Continuous data: occur when there is no limitation on the values that the variable can take, e.g. weight or height, other than that which restricts us when we make the measurement.

Fig. 1.1 Diagram showing the different types of variable. (Categorical: nominal, with mutually exclusive unordered categories, e.g. sex (male/female), blood group (A/B/AB/O); ordinal, with mutually exclusive ordered categories, e.g. disease stage (mild/moderate/severe). Numerical: discrete, integer values, typically counts, e.g. days sick per year; continuous, taking any value in a range, e.g. weight in kg, height in cm.)

Distinguishing between data types
We often use very different statistical methods depending on whether the data are categorical or numerical. Although the distinction between categorical and numerical data is usually clear, in some situations it may become blurred. For example, when we have a variable with a large number of ordered categories (e.g. a pain scale with seven categories), it may be difficult to distinguish it from a discrete numerical variable. The distinction between discrete and continuous numerical data may be even less clear, although in general this will have little impact on the results of most analyses. Age is an example of a variable that is often treated as discrete even though it is truly continuous. We usually refer to 'age at last birthday' rather than 'age', and therefore, a woman who reports being 30 may have just had her 30th birthday, or may be just about to have her 31st birthday.

Do not be tempted to record numerical data as categorical at the outset (e.g. by recording only the range within which each patient's age falls rather than his/her actual age) as important information is often lost. It is simple to convert numerical data to categorical data once they have been collected.

Derived data
We may encounter a number of other types of data in the medical field. These include:
• Percentages: These may arise when considering improvements in patients following treatment, e.g. a patient's lung function (forced expiratory volume in 1 second, FEV1) may increase by 24% following treatment with a new drug. In this case, it is the level of improvement, rather than the absolute value, which is of interest.
• Ratios or quotients: Occasionally you may encounter the ratio or quotient of two variables. For example, body mass index (BMI), calculated as an individual's weight (kg) divided by his/her height squared (m2), is often used to assess whether he/she is over- or under-weight.
• Rates: Disease rates, in which the number of disease events is divided by the time period under consideration, are common in epidemiological studies (Topic 12).
• Scores: We sometimes use an arbitrary value, i.e. a score, when we cannot measure a quantity. For example, a series of responses to questions on quality of life may be summed to give some overall quality of life score on each individual.
All these variables can be treated as continuous variables for most analyses. Where the variable is derived using more than one value (e.g. the numerator and denominator of a percentage), it is important to record all of the values used. For example, a 10% improvement in a marker following treatment may have different clinical relevance depending on the level of the marker before treatment.

Censored data
We may come across censored data in situations illustrated by the following examples.
• If we measure laboratory values using a tool that can only detect levels above a certain cut-off value, then any values below this cut-off will not be detected. For example, when measuring virus levels, those below the limit of detectability will often be reported as 'undetectable' even though there may be some virus in the sample.
• We may encounter censored data when following patients in a trial in which, for example, some patients withdraw from the trial before the trial has ended. This type of data is discussed in more detail in Topic 41.
2 Data entry

When you carry out any study you will almost always need to enter the data into a computer package. Computers are invaluable for improving the accuracy and speed of data collection and analysis, making it easy to check for errors, producing graphical summaries of the data and generating new variables. It is worth spending some time planning data entry; this may save considerable effort at later stages.

Formats for data entry
There are a number of ways in which data can be entered and stored on a computer. Most statistical packages allow you to enter data directly. However, the limitation of this approach is that often you cannot move the data to another package. A simple alternative is to store the data in either a spreadsheet or database package. Unfortunately, their statistical procedures are often limited, and it will usually be necessary to output the data into a specialist statistical package to carry out analyses.

A more flexible approach is to have your data available as an ASCII or text file. Once in an ASCII format, the data can be read by most packages. ASCII format simply consists of rows of text that you can view on a computer screen. Usually, each variable in the file is separated from the next by some delimiter, often a space or a comma. This is known as free format.

The simplest way of entering data in ASCII format is to type the data directly in this format using either a word processing or editing package. Alternatively, data stored in spreadsheet packages can be saved in ASCII format. Using either approach, it is customary for each row of data to correspond to a different individual in the study, and each column to correspond to a different variable, although it may be necessary to go on to subsequent rows if a large number of variables is collected on each individual.

Planning data entry
When collecting data in a study you will often need to use a form or questionnaire for recording the data. If these are designed carefully, they can reduce the amount of work that has to be done when entering the data. Generally, these forms/questionnaires include a series of boxes in which the data are recorded; it is usual to have a separate box for each possible digit of the response.

Categorical data
Some statistical packages have problems dealing with non-numerical data. Therefore, you may need to assign numerical codes to categorical data before entering the data on to the computer. For example, you may choose to assign the codes of 1, 2, 3 and 4 to categories of 'no pain', 'mild pain', 'moderate pain' and 'severe pain', respectively. These codes can be added to the forms when collecting the data. For binary data, e.g. yes/no answers, it is often convenient to assign the codes 1 (e.g. for 'yes') and 0 (for 'no').
• Single-coded variables: there is only one possible answer to a question, e.g. 'is the patient dead?' It is not possible to answer both 'yes' and 'no' to this question.
• Multi-coded variables: more than one answer is possible for each respondent. For example, 'what symptoms has this patient experienced?' In this case, an individual may have experienced any of a number of symptoms. There are two ways to deal with this type of data, depending upon which of the two following situations applies.
• There are only a few possible symptoms, and individuals may have experienced many of them. A number of different binary variables can be created, which correspond to whether the patient has answered yes or no to the presence of each possible symptom. For example, 'did the patient have a cough?' 'Did the patient have a sore throat?'
• There are a very large number of possible symptoms but each patient is expected to suffer from only a few of them. A number of different nominal variables can be created; each successive variable allows you to name a symptom suffered by the patient. For example, 'what was the first symptom the patient suffered?' 'What was the second symptom?' You will need to decide in advance the maximum number of symptoms you think a patient is likely to have suffered.

Numerical data
Numerical data should be entered with the same precision as they are measured, and the unit of measurement should be consistent for all observations on a variable. For example, weight should be recorded in kilograms or in pounds, but not both interchangeably.

Multiple forms per patient
Sometimes, information is collected on the same patient on more than one occasion. It is important that there is some unique identifier (e.g. a serial number) relating to the individual that will enable you to link all of the data from an individual in the study.

Problems with dates and times
Dates and times should be entered in a consistent manner, e.g. either as day/month/year or month/day/year, but not interchangeably. It is important to find out what format the statistical package can read.

Coding missing values
You should consider what you will do with missing values before you enter the data. In most cases you will need to use some symbol to represent a missing value. Statistical packages deal with missing values in different ways. Some use special characters (e.g. a full stop or asterisk) to indicate missing values, whereas others require you to define your own code for a missing value (commonly used values are 9, 999 or -99). The value that is chosen should be one that is not possible for that variable. For example, when entering a categorical variable with four categories (coded 1, 2, 3 and 4), you may choose the value 9 to represent missing values. However, if the variable is 'age of child' then a different code should be chosen. Missing data are discussed in more detail in Topic 3.

Example
Fig. 2.1 Portion of a spreadsheet showing data collected on a sample of 64 women with inherited bleeding disorders.

As part of a study on the effect of inherited bleeding disorders on pregnancy and childbirth, data were collected on a sample of 64 women registered at a single haemophilia centre in London. The women were asked questions relating to their bleeding disorder and their first pregnancy (or their current pregnancy if they were pregnant for the first time on the date of interview). Fig. 2.1 shows the data from a small selection of the women after the data have been entered onto a spreadsheet, but before they have been checked for errors. The coding schemes for the categorical variables are shown at the bottom of Fig. 2.1. Each row of the spreadsheet represents a separate individual in the study; each column represents a different variable. Where the woman is still pregnant, the age of the woman at the time of birth has been calculated from the estimated date of the baby's delivery. Data relating to the live births are shown in Topic 34.

Data kindly provided by Dr R.A. Kadir, University Department of Obstetrics and Gynaecology, and Professor C.A. Lee, Haemophilia Centre and Haemostasis Unit, Royal Free Hospital, London.
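The coding scheme described above can be sketched as a small program. This is an illustration only (the answers and code assignments are hypothetical, not taken from the book's data set), using the book's suggested codes 1 to 4 for the pain categories and the commonly used value 9 for missing data:

```python
# Hypothetical coding sketch: codes 1-4 for an ordinal pain variable,
# with 9 (impossible as a real code) marking missing values.

PAIN_CODES = {"no pain": 1, "mild pain": 2, "moderate pain": 3, "severe pain": 4}
MISSING = 9

def code_pain(answer):
    """Translate a questionnaire answer into its numerical code."""
    return PAIN_CODES.get(answer, MISSING)

# An empty string stands in for an unanswered question.
responses = ["mild pain", "severe pain", "", "no pain"]
coded = [code_pain(r) for r in responses]
print(coded)  # [2, 4, 9, 1]
```

For a binary variable such as a yes/no answer, the same idea applies with codes 1 and 0.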
3 Error checking and outliers

In any study there is always the potential for errors to occur in a data set, either at the outset when taking measurements, or when collecting, transcribing and entering the data onto a computer. It is hard to eliminate all of these errors. However, you can reduce the number of typing and transcribing errors by checking the data carefully once they have been entered. Simply scanning the data by eye will often identify values that are obviously wrong. In this topic we suggest a number of other approaches that you can use when checking data.

Typing errors
Typing mistakes are the most frequent source of errors when entering data. If the amount of data is small, then you can check the typed data set against the original forms/questionnaires to see whether there are any typing mistakes. However, this is time-consuming if the amount of data is large. It is possible to type the data in twice and compare the two data sets using a computer program. Any differences between the two data sets will reveal typing mistakes. Although this approach does not rule out the possibility that the same error has been incorrectly entered on both occasions, or that the value on the form/questionnaire is incorrect, it does at least minimize the number of errors. The disadvantage of this method is that it takes twice as long to enter the data, which may have major cost or time implications.

Error checking
• Categorical data: It is relatively easy to check categorical data, as the responses for each variable can only take one of a number of limited values. Therefore, values that are not allowable must be errors.
• Numerical data: Numerical data are often difficult to check but are prone to errors. For example, it is simple to transpose digits or to misplace a decimal point when entering numerical data. Numerical data can be range checked; that is, upper and lower limits can be specified for each variable. If a value lies outside this range then it is flagged up for further investigation.
• Dates: It is often difficult to check the accuracy of dates, although sometimes you may know that dates must fall within certain time periods. Dates can be checked to make sure that they are valid. For example, 30th February must be incorrect, as must any day of the month greater than 31, and any month greater than 12. Certain logical checks can also be applied. For example, a patient's date of birth should correspond to his/her age, and patients should usually have been born before entering the study (at least in most studies). In addition, patients who have died should not appear for subsequent follow-up visits!

With all error checks, a value should only be corrected if there is evidence that a mistake has been made. You should not change values simply because they look unusual.

Handling missing data
There is always a chance that some data will be missing. If a very large proportion of the data is missing, then the results are unlikely to be reliable. The reasons why data are missing should always be investigated; if missing data tend to cluster on a particular variable and/or in a particular sub-group of individuals, then it may indicate that the variable is not applicable or has never been measured for that group of individuals. In the latter case, the group of individuals should be excluded from any analysis on that variable. It may be that the data are simply sitting on a piece of paper in someone's drawer and are yet to be entered!

Outliers
What are outliers?
Outliers are observations that are distinct from the main body of the data, and are incompatible with the rest of the data. These values may be genuine observations from individuals with very extreme levels of the variable. However, they may also result from typing errors, and so any suspicious values should be checked. It is important to detect whether there are outliers in the data set, as they may have a considerable impact on the results from some types of analyses.

For example, a woman who is 7 feet tall would probably appear as an outlier in most data sets. However, although this value is clearly very high, compared with the usual heights of women, it may be genuine and the woman may simply be very tall. In this case, you should investigate this value further, possibly checking other variables such as her age and weight, before making any decisions about the validity of the result. The value should only be changed if there really is evidence that it is incorrect.

Checking for outliers
A simple approach is to print the data and visually check them by eye. This is suitable if the number of observations is not too large and if the potential outlier is much lower or higher than the rest of the data. Range checking should also identify possible outliers. Alternatively, the data can be plotted in some way (Topic 4); outliers can be clearly identified on histograms and scatter plots.

Handling outliers
It is important not to remove an individual from an analysis simply because his/her values are higher or lower than might be expected. However, the inclusion of outliers may affect the results when some statistical techniques are used. A simple approach is to repeat the analysis both including and excluding the value. If the results are similar, then the outlier does not have a great influence on the result. However, if the results change drastically, it is important to use appropriate methods that are not affected by outliers to analyse the data. These include the use of transformations (Topic 9) and non-parametric tests (Topic 17).

Example
Fig. 3.1 Checking for errors in a data set.

After entering the data described in Topic 2, the data set is checked for errors. Some of the inconsistencies highlighted are simple data entry errors. For example, the code of '41' in the 'sex of baby' column is incorrect: all of the data for patient 20 had been entered in the incorrect columns, so the sex information is missing for this patient. Others (e.g. unusual values in the gestational age and weight columns) are likely to be errors, but the notes should be checked before any decision is made, as these may reflect outliers. In this case, the gestational age of patient 27 was confirmed as 41 weeks, but it was decided that the recorded weight was incorrect. As it was not possible to find the correct weight for this baby, the value was entered as missing.
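The range checking and double data entry described above can be sketched in a few lines. This is a hedged illustration only: the limits and values are invented and are not taken from the Fig. 3.1 data set.

```python
# Illustrative error-checking sketch with hypothetical data and limits.

def range_check(values, low, high):
    """Return indices of values outside [low, high], flagged for investigation."""
    return [i for i, v in enumerate(values) if not low <= v <= high]

def double_entry_diff(first, second):
    """Return positions where two independent typings of the data disagree."""
    return [i for i, (a, b) in enumerate(zip(first, second)) if a != b]

# Hypothetical birthweights (kg); 41.0 is suspect, perhaps a misplaced decimal.
weights_kg = [3.2, 2.9, 41.0, 3.6]
print(range_check(weights_kg, 0.5, 6.0))  # [2]

# Double data entry: the two typings disagree at position 2.
entry1 = [3.2, 2.9, 4.1, 3.6]
entry2 = [3.2, 2.9, 41.0, 3.6]
print(double_entry_diff(entry1, entry2))  # [2]
```

Note that a flagged value should only be corrected if there is evidence of a mistake; the check identifies values to investigate, not values to change.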
4 Displaying data graphically
One of the first things that you may wish to do when you have entered your data onto a computer is to summarize them in some way so that you can get a 'feel' for the data. This can be done by producing diagrams, tables or summary statistics (Topics 5 and 6). Diagrams are often powerful tools for conveying information about the data, for providing simple summary pictures, and for spotting outliers and trends before any formal analyses are performed.

One variable
Frequency distributions
An empirical frequency distribution of a variable relates each possible observation, class of observations (i.e. range of values) or category, as appropriate, to its observed frequency of occurrence. If we replace each frequency by a relative frequency (the percentage of the total frequency), we can compare frequency distributions in two or more groups of individuals.

Displaying frequency distributions
Once the frequencies (or relative frequencies) have been obtained for categorical or some discrete numerical data, these can be displayed visually.

Bar or column chart - a separate horizontal or vertical bar is drawn for each category, its length being proportional to the frequency in that category. The bars are separated by small gaps to indicate that the data are categorical or discrete (Fig. 4.1a).

Pie chart - a circular 'pie' is split into sections, one for each category, so that the area of each section is proportional to the frequency in that category (Fig. 4.1b).

It is often more difficult to display continuous numerical data, as the data may need to be summarized before being drawn. Commonly used diagrams include the following examples.

Histogram - this is similar to a bar chart, but there should be no gaps between the bars as the data are continuous (Fig. 4.1d). The width of each bar of the histogram relates to a range of values for the variable. For example, the baby's weight (Fig. 4.1d) may be categorized into 1.75-1.99 kg, 2.00-2.24 kg, ..., 4.25-4.49 kg. The area of the bar is proportional to the frequency in that range. Therefore, if one of the groups covers a wider range than the others, its base will be wider and height shorter to compensate. Usually, between five and 20 groups are chosen; the ranges should be narrow enough to illustrate patterns in the data, but should not be so narrow that they are the raw data. The histogram should be labelled carefully, to make it clear where the boundaries lie.

Dot plot - each observation is represented by one dot on a horizontal (or vertical) line (Fig. 4.1e). This type of plot is very simple to draw, but can be cumbersome with large data sets. Often a summary measure of the data, such as the mean or median (Topic 5), is shown on the diagram. This plot may also be used for discrete data.

Stem-and-leaf plot - this is a mixture of a diagram and a table; it looks similar to a histogram turned on its side, and is effectively the data values written in increasing order of size. It is usually drawn with a vertical stem, consisting of the first few digits of the values, arranged in order. Protruding from this stem are the leaves - i.e. the final digit of each of the ordered values, which are written horizontally (Fig. 4.2) in increasing numerical order.

Box plot (often called a box-and-whisker plot) - this is a vertical or horizontal rectangle, with the ends of the rectangle corresponding to the upper and lower quartiles of the data values (Topic 6). A line drawn through the rectangle corresponds to the median value (Topic 5). Whiskers, starting at the ends of the rectangle, usually indicate minimum and maximum values but sometimes relate to particular percentiles, e.g. the 5th and 95th percentiles (Topic 6, Fig. 6.1). Outliers may be marked.

The 'shape' of the frequency distribution
The choice of the most appropriate statistical method will often depend on the shape of the distribution. The distribution of the data is usually unimodal in that it has a single 'peak'. Sometimes the distribution is bimodal (two peaks) or uniform (each value is equally likely and there are no peaks). When the distribution is unimodal, the main aim is to see where the majority of the data values lie, relative to the maximum and minimum values. In particular, it is important to assess whether the distribution is:
symmetrical - centred around some mid-point, with one side being a mirror-image of the other (Fig. 5.1);
skewed to the right (positively skewed) - a long tail to the right with one or a few high values. Such data are common in medical research (Fig. 5.2);
skewed to the left (negatively skewed) - a long tail to the left with one or a few low values.

Two variables
If one variable is categorical, then separate diagrams showing the distribution of the second variable can be drawn for each of the categories. Other plots suitable for such data include clustered or segmented bar or column charts (Fig. 4.1c).

If both of the variables are continuous or ordinal, then the relationship between the two can be illustrated using a scatter diagram (Fig. 4.1f). This plots one variable against the other in a two-way diagram. One variable is usually termed the x variable and is represented on the horizontal axis. The second variable, known as the y variable, is plotted on the vertical axis.

Identifying outliers using graphical methods
We can often use single variable data displays to identify outliers. For example, a very long tail on one side of a histogram may indicate an outlying value. However, outliers may sometimes only become apparent when considering the relationship between two variables. For example, a weight of 55 kg would not be unusual for a woman who was 1.6 m tall, but would be unusually low if the woman's height was 1.9 m.

Fig. 4.1 A selection of graphical output which may be produced when summarizing the obstetric data in women with bleeding disorders (Topic 2). (a) Bar chart showing the percentage of women in the study who required pain relief from any of the listed interventions during labour (based on 48 women with pregnancies). (b) Pie chart showing the percentage of women in the study with each bleeding disorder. (c) Segmented column chart showing the frequency with which women with different bleeding disorders experience bleeding gums. (d) Histogram showing the weight of the baby at birth. (e) Dot plot showing the mother's age at the time of the baby's birth, with the median age marked as a horizontal line. (f) Scatter diagram showing the relationship between the mother's age at delivery (on the horizontal or x-axis) and the weight of the baby (on the vertical or y-axis).

Fig. 4.2 Stem-and-leaf plot showing the FEV1 (litres) in children receiving inhaled beclomethasone dipropionate or placebo (Topic 21).
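Of the displays described above, the stem-and-leaf plot is simple enough to build directly from the data. The sketch below is not from the book: the function and the FEV1 values are invented for illustration, using a leaf unit of 0.1 litres so that the stem is the whole-litre digit and each leaf is the final digit.

```python
from collections import defaultdict

def stem_and_leaf(values, leaf_unit=0.1):
    """Minimal text stem-and-leaf plot: stem = leading digit(s), leaf = final digit."""
    groups = defaultdict(list)
    for v in sorted(values):
        n = round(v / leaf_unit)          # express each value in leaf units
        groups[n // 10].append(n % 10)    # split into stem and a single leaf digit
    return "\n".join(f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
                     for stem, leaves in sorted(groups.items()))

# Hypothetical FEV1 values (litres), in the spirit of Fig. 4.2
fev1 = [1.4, 1.6, 1.6, 1.7, 1.9, 2.0, 2.1, 2.1, 2.3, 2.6]
print(stem_and_leaf(fev1))
# 1 | 46679
# 2 | 01136
```

Because the leaves are written in increasing order, the plot doubles as an ordered list of the raw data, which makes reading off the median straightforward.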
5 Describing data (1): the 'average'
Summarizing data
It is very difficult to have any 'feeling' for a set of numerical measurements unless we can summarize the data in a meaningful way. A diagram (Topic 4) is often a useful starting point. We can also condense the information by providing measures that describe the important characteristics of the data. In particular, if we have some perception of what constitutes a representative value, and if we know how widely scattered the observations are around it, then we can formulate an image of the data. The average is a general term for a measure of location; it describes a typical measurement. We devote this topic to averages, the most common being the mean and median (Table 5.1). We introduce you to measures that describe the scatter or spread of the observations in Topic 6.

The arithmetic mean
The arithmetic mean, often simply called the mean, of a set of values is calculated by adding up all the values and dividing this sum by the number of values in the set.

It is useful to be able to summarize this verbal description by an algebraic formula. Using mathematical notation, we write our set of n observations of a variable, x, as x1, x2, x3, ..., xn. For example, x might represent an individual's height (cm), so that x1 represents the height of the first individual, and xi the height of the ith individual, etc. We can write the formula for the arithmetic mean of the observations, written x̄ and pronounced 'x bar', as:

x̄ = (x1 + x2 + x3 + ... + xn)/n

Using mathematical notation, we can shorten this to:

x̄ = (Σ xi)/n, summing from i = 1 to n

where Σ (the Greek uppercase 'sigma') means 'the sum of', and the sub- and super-scripts on the Σ indicate that we sum the values from i = 1 to n. This is often further abbreviated to x̄ = Σxi/n or x̄ = Σx/n.

The median
If we arrange our data in order of magnitude, starting with the smallest value and ending with the largest value, then the median is the middle value of this ordered set. The median divides the ordered values into two halves, with an equal number of values both above and below it.

It is easy to calculate the median if the number of observations, n, is odd. It is the (n + 1)/2th observation in the ordered set. So, for example, if n = 11, then the median is the (11 + 1)/2 = 12/2 = 6th observation in the ordered set. If n is even then, strictly, there is no median. However, we usually calculate it as the arithmetic mean of the two middle observations in the ordered set [i.e. the n/2th and the (n/2 + 1)th]. So, for example, if n = 20, the median is the arithmetic mean of the 20/2 = 10th and the (20/2 + 1) = (10 + 1) = 11th observations in the ordered set.

The median is similar to the mean if the data are symmetrical (Fig. 5.1), less than the mean if the data are skewed to the right (Fig. 5.2), and greater than the mean if the data are skewed to the left.

The mode
The mode is the value that occurs most frequently in a data set; if the data are continuous, we usually group the data and calculate the modal group. Some data sets do not have a mode because each value only occurs once. Sometimes, there is more than one mode; this is when two or more values occur the same number of times, and the frequency of occurrence of each of these values is greater than that of any other value. We rarely use the mode as a summary measure.

The geometric mean
The arithmetic mean is an inappropriate summary measure of location if our data are skewed. If the data are skewed to the right, we can produce a distribution that is more symmetrical if we take the logarithm (to base 10 or to base e) of each value of the variable in this data set (Topic 9). The arithmetic mean of the log values is a measure of location for the transformed data. To obtain a measure that has the same units as the original observations, we have to back-transform (i.e. take the antilog of) the mean of the log data; we call this the geometric mean. Provided the distribution of the log data is approximately symmetrical, the geometric mean is similar to the median and less than the mean of the raw data (Fig. 5.2).

The weighted mean
We use a weighted mean when certain values of the variable of interest, x, are more important than others. We attach a weight, wi, to each of the values, xi, in our sample, to reflect this importance. If the values x1, x2, x3, ..., xn have corresponding weights w1, w2, w3, ..., wn, the weighted arithmetic mean is:

(w1x1 + w2x2 + ... + wnxn)/(w1 + w2 + ... + wn) = Σwixi/Σwi

For example, suppose we are interested in determining the average length of stay of hospitalized patients in a district, and we know the average discharge time for patients in every hospital. To take account of the amount of information provided, one approach might be to take each weight as the number of patients in the associated hospital.

The weighted mean and the arithmetic mean are identical if each weight is equal to one.

Table 5.1 Advantages and disadvantages of averages.

Mean
  Advantages: uses all the data values; algebraically defined and so mathematically manageable; known sampling distribution (Topic 9).
  Disadvantages: distorted by outliers; distorted by skewed data.
Median
  Advantages: not distorted by outliers; not distorted by skewed data.
  Disadvantages: ignores most of the information; not algebraically defined; complicated sampling distribution.
Mode
  Advantages: easily determined for categorical data.
  Disadvantages: ignores most of the information; not algebraically defined; unknown sampling distribution.
Geometric mean
  Advantages: before back-transformation, it has the same advantages as the mean; appropriate for right-skewed data.
  Disadvantages: only appropriate if the log transformation produces a symmetrical distribution.
Weighted mean
  Advantages: same advantages as the mean; ascribes relative importance to each observation; algebraically defined.
  Disadvantages: weights must be known or estimated.

Fig. 5.1 The mean, median and geometric mean age of the women in the study described in Topic 2 at the time of the baby's birth (mean = 27.0 years, median = 27.0 years, geometric mean = 26.5 years). As the distribution of age appears reasonably symmetrical, the three measures of the 'average' all give similar values, as indicated by the dotted line.

Fig. 5.2 The mean, median and geometric mean triglyceride level in a sample of 232 men who developed heart disease (Topic 19) (median = 1.94 mmol/L, geometric mean = 2.04 mmol/L, mean = 2.39 mmol/L). As the distribution of triglyceride is skewed to the right, the mean gives a higher 'average' than either the median or geometric mean.
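The formulas above translate directly into code. The following sketch is not from the book: the triglyceride-like values and the hospital figures are invented, but the geometric mean is computed exactly as described (back-transforming the mean of the logged data) and the weighted mean follows the length-of-stay example in the text.

```python
from statistics import mean, median, mode
from math import log, exp

# Hypothetical right-skewed measurements (e.g. triglyceride levels, mmol/L)
x = [0.9, 1.2, 1.4, 1.4, 1.9, 2.1, 2.6, 3.8, 6.1]

arithmetic_mean = mean(x)
med = median(x)
modal = mode(x)  # most frequent value (1.4 occurs twice)

# Geometric mean: take logs, average them, then back-transform (antilog)
geometric_mean = exp(mean(log(v) for v in x))

# Weighted mean: average length of stay, weighted by hospital size
stays = [5.0, 7.5, 4.0]       # mean stay (days) in each of three hospitals
patients = [200, 50, 150]     # weights: number of patients per hospital
weighted_mean = sum(w * v for w, v in zip(patients, stays)) / sum(patients)

print(f"mean={arithmetic_mean:.2f}, median={med}, mode={modal}, "
      f"geometric mean={geometric_mean:.2f}")
print(f"weighted mean stay={weighted_mean:.4f} days")
```

For these right-skewed data the mean exceeds the median, as the text predicts; with every weight equal to one, the weighted mean would reduce to the arithmetic mean.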
6 Describing data (2): the 'spread'
Summarizing data
If we are able to provide two summary measures of a continuous variable, one that gives an indication of the 'average' value and the other that describes the 'spread' of the observations, then we have condensed the data in a meaningful way. We explained how to choose an appropriate average in Topic 5. We devote this topic to a discussion of the most common measures of spread (dispersion or variability), which are compared in Table 6.1.

The range
The range is the difference between the largest and smallest observations in the data set; you may find these two values quoted instead of their difference. Note that the range provides a misleading measure of spread if there are outliers (Topic 3).

Ranges derived from percentiles
What are percentiles?
Suppose we arrange our data in order of magnitude, starting with the smallest value of the variable, x, and ending with the largest value. The value of x that has 1% of the observations in the ordered set lying below it (and 99% of the observations lying above it) is called the first percentile. The value of x that has 2% of the observations lying below it is called the second percentile, and so on. The values of x that divide the ordered set into 10 equally sized groups, that is the 10th, 20th, 30th, ..., 90th percentiles, are called deciles. The values of x that divide the ordered set into four equally sized groups, that is the 25th, 50th and 75th percentiles, are called quartiles. The 50th percentile is the median (Topic 5).

Using percentiles
We can obtain a measure of spread that is not influenced by outliers by excluding the extreme values in the data set, and determining the range of the remaining observations. The interquartile range is the difference between the first and the third quartiles, i.e. between the 25th and 75th percentiles (Fig. 6.1). It contains the central 50% of the observations in the ordered set, with 25% of the observations lying below its lower limit, and 25% of them lying above its upper limit. The interdecile range contains the central 80% of the observations, i.e. those lying between the 10th and 90th percentiles. Often we use the range that contains the central 95% of the observations, i.e. it excludes 2.5% of the observations above its upper limit and 2.5% below its lower limit (Fig. 6.1). We may use this interval, provided it is calculated from enough values of the variable in healthy individuals, to diagnose disease. It is then called the reference interval, reference range or normal range (Topic 35).

The variance
One way of measuring the spread of the data is to determine the extent to which each observation deviates from the arithmetic mean. Clearly, the larger the deviations, the greater the variability of the observations. However, we cannot use the mean of these deviations as a measure of spread because the positive differences exactly cancel out the negative differences. We overcome this problem by squaring each deviation, and finding the mean of these squared deviations (Fig. 6.2); we call this the variance. If we have a sample of n observations, x1, x2, x3, ..., xn, whose mean is x̄ = (Σxi)/n, we calculate the variance, usually denoted by s², of these observations as:

s² = Σ(xi - x̄)²/(n - 1)

We can see that this is not quite the same as the arithmetic mean of the squared deviations because we have divided by n - 1 instead of n. The reason for this is that we almost always rely on sample data in our investigations (Topic 10). It can be shown theoretically that we obtain a better sample estimate of the population variance if we divide by n - 1.

The units of the variance are the square of the units of the original observations, e.g. if the variable is weight measured in kg, the units of the variance are kg².

The standard deviation
The standard deviation is the square root of the variance. In a sample of n observations, it is:

s = √[Σ(xi - x̄)²/(n - 1)]

We can think of the standard deviation as a sort of average of the deviations of the observations from the mean. It is evaluated in the same units as the raw data.

If we divide the standard deviation by the mean and express this quotient as a percentage, we obtain the coefficient of variation. It is a measure of spread that is independent of the units of measurement, but it has theoretical disadvantages so is not favoured by statisticians.

Variation within- and between-subjects
If we take repeated measurements of a continuous variable on an individual, then we expect to observe some variation (intra- or within-subject variability) in the responses on that individual. This may be because a given individual does not always respond in exactly the same way and/or because of measurement error. However, the variation within an individual is usually less than the variation obtained when we take a single measurement on every individual in a group (inter- or between-subject variability). For example, a 17-year-old boy has a lung vital capacity that ranges between 3.60 and 3.87 litres when the measurement is repeated 10 times; the values for single measurements on 10 boys of the same age lie between 2.98 and 4.33 litres. These concepts are important in study design (Topic 13).

Table 6.1 Advantages and disadvantages of measures of spread.

Range
  Advantages: easily determined.
  Disadvantages: uses only two observations; distorted by outliers; tends to increase with increasing sample size.
Ranges based on percentiles
  Advantages: unaffected by outliers; independent of sample size; appropriate for skewed data.
  Disadvantages: clumsy to calculate; cannot be calculated for small samples; uses only two observations; not algebraically defined.
Variance
  Advantages: uses every observation; algebraically defined.
  Disadvantages: units of measurement are the square of the units of the raw data; sensitive to outliers; inappropriate for skewed data.
Standard deviation
  Advantages: same advantages as the variance; units of measurement are the same as those of the raw data; easily interpreted.
  Disadvantages: sensitive to outliers; inappropriate for skewed data.

Fig. 6.1 A box-and-whisker plot of the baby's weight at birth (Topic 2) (median = 3.64 kg; interquartile range: 3.15 to 3.87 kg; maximum = 4.46 kg). This figure illustrates the median, the interquartile range, the range that contains the central 95% of the observations and the maximum and minimum values.

Fig. 6.2 Diagram showing the spread of selected values of the mother's age at the time of the baby's birth (Topic 2) around the mean value. The variance is calculated by adding up the squared distances between each point and the mean, and dividing by (n - 1).
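The quantities in this topic can be computed as follows. This sketch is not from the book and the ages are invented; note also that software packages use slightly different interpolation conventions for percentiles, so quartile values may differ a little between programs (Python's statistics.quantiles uses the 'exclusive' method by default).

```python
from statistics import mean, stdev, variance, quantiles

# Hypothetical ages of ten mothers (years)
ages = [23, 25, 27, 27, 28, 30, 31, 33, 35, 41]

s2 = variance(ages)          # sample variance: sum of squared deviations / (n - 1)
s = stdev(ages)              # standard deviation: square root of the variance
cv = 100 * s / mean(ages)    # coefficient of variation, as a percentage

# Quartiles (25th, 50th and 75th percentiles) and the interquartile range
q1, q2, q3 = quantiles(ages, n=4)
iqr = q3 - q1

print(f"variance={s2:.2f} years^2, sd={s:.2f} years, CV={cv:.1f}%")
print(f"quartiles: {q1}, {q2}, {q3}; interquartile range={iqr}")
```

The variance here carries units of years², while the standard deviation and interquartile range are back in years, illustrating why the latter two are easier to interpret.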
7 Theoretical distributions (1): the Normal distribution
In Topic 4 we showed how to create an empirical frequency distribution of the observed data. This contrasts with a theoretical probability distribution, which is described by a mathematical model. When our empirical distribution approximates a particular probability distribution, we can use our theoretical knowledge of that distribution to answer questions about the data. This often requires the evaluation of probabilities.

Understanding probability
Probability measures uncertainty; it lies at the heart of statistical theory. A probability measures the chance of a given event occurring. It is a positive number that lies between zero and one. If it is equal to zero, then the event cannot occur. If it is equal to one, then the event must occur. The probability of the complementary event (the event not occurring) is one minus the probability of the event occurring. We discuss conditional probability, the probability of an event, given that another event has occurred, in Topic 42.

We can calculate a probability using various approaches.

Subjective - our personal degree of belief that the event will occur (e.g. that the world will come to an end in the year 2050).

Frequentist - the proportion of times the event would occur if we were to repeat the experiment a large number of times (e.g. the number of times we would get a 'head' if we tossed a fair coin 1000 times).

A priori - this requires knowledge of the theoretical model, called the probability distribution, which describes the probabilities of all possible outcomes of the 'experiment'. For example, genetic theory allows us to describe the probability distribution for eye colour in a baby born to a blue-eyed woman and brown-eyed man by initially specifying all possible genotypes of eye colour in the baby and their probabilities.

The rules of probability
We can use the rules of probability to add and multiply probabilities.

The addition rule - if two events, A and B, are mutually exclusive (i.e. each event precludes the other), then the probability that either one or the other occurs is equal to the sum of their probabilities:

Prob(A or B) = Prob(A) + Prob(B)

e.g. if the probabilities that an adult patient in a particular dental practice has no missing teeth, some missing teeth or is edentulous (i.e. has no teeth) are 0.67, 0.24 and 0.09, respectively, then the probability that a patient has some teeth is 0.67 + 0.24 = 0.91.

The multiplication rule - if two events, A and B, are independent (i.e. the occurrence of one event is not contingent on the other), then the probability that both events occur is equal to the product of the probability of each:

Prob(A and B) = Prob(A) × Prob(B)

e.g. if two unrelated patients are waiting in the dentist's surgery, the probability that both of them have no missing teeth is 0.67 × 0.67 = 0.45.

Probability distributions: the theory
A random variable is a quantity that can take any one of a set of mutually exclusive values with a given probability. A probability distribution shows the probabilities of all possible values of the random variable. It is a theoretical distribution that is expressed mathematically, and has a mean and variance that are analogous to those of an empirical distribution. Each probability distribution is defined by certain parameters, which are summary measures (e.g. mean, variance) characterizing that distribution (i.e. knowledge of them allows the distribution to be fully described). These parameters are estimated in the sample by relevant statistics. Depending on whether the random variable is discrete or continuous, the probability distribution can be either discrete or continuous.

Discrete (e.g. Binomial, Poisson) - we can derive probabilities corresponding to every possible value of the random variable. The sum of all such probabilities is one.

Continuous (e.g. Normal, Chi-squared, t and F) - we can only derive the probability of the random variable, x, taking values in certain ranges (because there are infinitely many values of x). If the horizontal axis represents the values of x, we can draw a curve from the equation of the distribution (the probability density function); it resembles an empirical relative frequency distribution (Topic 4). The total area under the curve is one; this area represents the probability of all possible events. The probability that x lies between two limits is equal to the area under the curve between these values (Fig. 7.1). For convenience, tables (Appendix A) have been produced to enable us to evaluate probabilities of interest for commonly used continuous probability distributions. These are particularly useful in the context of confidence intervals (Topic 11) and hypothesis testing (Topic 17).

The Normal (Gaussian) distribution
One of the most important distributions in statistics is the Normal distribution. Its probability density function (Fig. 7.2) is:
completely described by two parameters, the mean (μ) and the variance (σ²);
bell-shaped (unimodal);
symmetrical about its mean;
shifted to the right if the mean is increased and to the left if the mean is decreased (assuming constant variance);
flattened as the variance is increased but becomes more peaked as the variance is decreased (for a fixed mean).

Additional properties are that:
the mean and median of a Normal distribution are equal;
the probability (Fig. 7.3a) that a Normally distributed random variable, x, with mean, μ, and standard deviation, σ, lies between:
(μ - σ) and (μ + σ) is 0.68
(μ - 1.96σ) and (μ + 1.96σ) is 0.95
(μ - 2.58σ) and (μ + 2.58σ) is 0.99

These intervals may be used to define reference intervals (Topics 6 and 35). We show how to assess Normality in Topic 32.

The Standard Normal distribution
There are infinitely many Normal distributions depending on the values of μ and σ. The Standard Normal distribution (Fig. 7.3b) is a particular Normal distribution for which probabilities have been tabulated (Appendix A1, A4).

The Standard Normal distribution has a mean of zero and a variance of one.

If the random variable, x, has a Normal distribution with mean, μ, and variance, σ², then the Standardized Normal Deviate (SND), z = (x - μ)/σ, is a random variable that has a Standard Normal distribution.

Fig. 7.1 The probability density function, pdf, of x. The total area under the curve is one (or 100%); shaded areas represent Prob{x0 < x < x1} and Prob{x > x2}.

Fig. 7.2 The probability density function of the Normal distribution of the variable, x. (a) Symmetrical about the mean, μ; variance = σ². (b) Effect of changing the mean (μ2 > μ1). (c) Effect of changing the variance (σ1² < σ2²).

Fig. 7.3 Areas (percentages of total probability) under the curve for (a) the Normal distribution of x, with mean μ and variance σ², and (b) the Standard Normal distribution of z.
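The Normal probabilities quoted above can be checked numerically. This sketch is not from the book: it uses the standard identity between the Normal cumulative distribution function and the error function (available as math.erf), and the mean and standard deviation of 27 and 5 years are invented values.

```python
from math import erf, sqrt

def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a Normal(mu, sigma^2) variable, via the error function."""
    z = (x - mu) / sigma          # the Standardized Normal Deviate (SND)
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 27.0, 5.0  # hypothetical mean and SD of mothers' ages (years)

# Probability that x lies between mu - 1.96*sigma and mu + 1.96*sigma
p95 = normal_cdf(mu + 1.96 * sigma, mu, sigma) - normal_cdf(mu - 1.96 * sigma, mu, sigma)
print(f"P(mu - 1.96*sigma < x < mu + 1.96*sigma) = {p95:.3f}")  # ~0.950
```

Replacing 1.96 by 1 or 2.58 reproduces the 0.68 and 0.99 figures; because everything is routed through the SND z, one tabulated (Standard Normal) distribution serves all values of μ and σ.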
8 Theoretical distributions (2): other distributions
Some words of comfort
Do not worry if you find the theory underlying probability distributions complex. Our experience demonstrates that you want to know only when and how to use these distributions. We have therefore outlined the essentials, and omitted the equations that define the probability distributions. You will find that you only need to be familiar with the basic ideas, the terminology and, perhaps (although infrequently in this computer age), know how to refer to the tables.

More continuous probability distributions
These distributions are based on continuous random variables. Often it is not a measurable variable that follows such a distribution, but a statistic derived from the variable. The total area under the probability density function represents the probability of all possible outcomes, and is equal to one (Topic 7). We discussed the Normal distribution in Topic 7; other common distributions are described in this topic.

The t-distribution (Appendix A2, Fig. 8.1)
Derived by W.S. Gossett, who published under the pseudonym 'Student', it is often called Student's t-distribution.
The parameter that characterizes the t-distribution is the degrees of freedom, so we can draw the probability density function if we know the equation of the t-distribution and its degrees of freedom. We discuss degrees of freedom in Topic 11; note that they are often closely affiliated to sample size.
Its shape is similar to that of the Standard Normal distribution, but it is more spread out with longer tails. Its shape approaches Normality as the degrees of freedom increase.
It is particularly useful for calculating confidence intervals for and testing hypotheses about one or two means (Topics 19-21).

The Chi-squared (χ²) distribution (Appendix A3, Fig. 8.2)
It is a right-skewed distribution taking positive values.
It is characterized by its degrees of freedom (Topic 11).
Its shape depends on the degrees of freedom; it becomes more symmetrical and approaches Normality as they increase.
It is particularly useful for analysing categorical data (Topics 23-25).

The F-distribution (Appendix A5)
It is skewed to the right.
It is defined by a ratio. The distribution of a ratio of two estimated variances calculated from Normal data approximates the F-distribution.
The two parameters which characterize it are the degrees of freedom (Topic 11) of the numerator and the denominator of the ratio.
The F-distribution is particularly useful for comparing two variances (Topic 18), and more than two means using the analysis of variance (ANOVA) (Topic 22).

The Lognormal distribution
It is the probability distribution of a random variable whose log (to base 10 or e) follows the Normal distribution.
It is highly skewed to the right (Fig. 8.3a).
If, when we take logs of our raw data that are skewed to the right, we produce an empirical distribution that is nearly Normal (Fig. 8.3b), our data approximate the Lognormal distribution.
Many variables in medicine follow a Lognormal distribution. We can use the properties of the Normal distribution (Topic 7) to make inferences about these variables after transforming the data by taking logs.
If a data set has a Lognormal distribution, we use the geometric mean (Topic 5) as a summary measure of location.

Discrete probability distributions
The random variable that defines the probability distribution is discrete. The sum of the probabilities of all possible mutually exclusive events is one.

The Binomial distribution
Suppose, in a given situation, there are only two outcomes, 'success' and 'failure'. For example, we may be interested in whether a woman conceives (a success) or does not conceive (a failure) after in-vitro fertilization (IVF). If we look at n = 100 unrelated women undergoing IVF (each with the same probability of conceiving), the Binomial random variable is the observed number of conceptions (successes). Often this concept is explained in terms of n independent repetitions of a trial (e.g. 100 tosses of a coin) in which the outcome is either success (e.g. head) or failure.
The two parameters that describe the Binomial distribution are n, the number of individuals in the sample (or repetitions of a trial), and π, the true probability of success for each individual (or in each trial).
Its mean (the value for the random variable that we expect if we look at n individuals, or repeat the trial n times) is nπ. Its variance is nπ(1 - π).
When n is small, the distribution is skewed to the right if π < 0.5 and to the left if π > 0.5. The distribution becomes more symmetrical as the sample size increases (Fig. 8.4) and approximates the Normal distribution if both nπ and n(1 - π) are greater than 5.
We can use the properties of the Binomial distribution when making inferences about proportions. In particular, we often use the Normal approximation to the Binomial distribution when analysing proportions.

The Poisson distribution
The Poisson random variable is the count of the number of events that occur independently and randomly in time or space at some average rate, μ. For example, the number of hospital admissions per day typically follows the Poisson distribution. We can use our knowledge of the Poisson distribution to calculate the probability of a certain number of admissions on any particular day.
The parameter that describes the Poisson distribution is the mean, i.e. the average rate, μ.
The mean equals the variance in the Poisson distribution.
It is a right-skewed distribution if the mean is small, but becomes more symmetrical as the mean increases, when it approximates a Normal distribution.

Fig. 8.1 t-distributions with degrees of freedom (df) = 1, 5, 50, and 500.

Fig. 8.2 Chi-squared distributions with degrees of freedom (df) = 1, 2, 5, and 10.

Fig. 8.3 (a) The distribution of triglyceride levels in 232 men who developed heart disease (Topic 19). (b) The approximately Normal distribution of log (triglyceride level).

Fig. 8.4 Binomial distribution showing the number of successes, r, when the probability of success is π = 0.20 for sample sizes (a) n = 5, (b) n = 10, and (c) n = 50. (N.B. in Topic 23, the observed seroprevalence of HHV-8 was p = 0.187 ≈ 0.2, and the sample size was 271; the proportion was assumed to follow a Normal distribution.)
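The Binomial quantities above are easy to verify directly. This sketch is not from the book; it uses the textbook Binomial probability formula (via math.comb) with the n = 50, π = 0.20 case of Fig. 8.4.

```python
from math import comb

def binomial_pmf(r, n, pi):
    """P(r successes in n trials), each trial with success probability pi."""
    return comb(n, r) * pi**r * (1 - pi)**(n - r)

n, pi = 50, 0.20
mean_ = n * pi                 # Binomial mean: n*pi
var_ = n * pi * (1 - pi)       # Binomial variance: n*pi*(1 - pi)
print(f"mean={mean_}, variance={var_}")

# The probabilities over all possible values of r sum to one
total = sum(binomial_pmf(r, n, pi) for r in range(n + 1))
print(f"total probability = {total:.6f}")
# The Normal approximation is reasonable here, since n*pi = 10 and
# n*(1 - pi) = 40 are both greater than 5.
```

Summing the pmf over a range of r values also gives tail probabilities, which is exactly what the Normal approximation estimates for large n.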
9 Transformations

Why transform?
The observations in our investigation may not comply with the requirements of the intended statistical analysis (Topic 32).
- A variable may not be Normally distributed, a distributional requirement for many different analyses.
- The spread of the observations in each of a number of groups may be different (constant variance is an assumption about a parameter in the comparison of means using the t-test and analysis of variance - Topics 21-22).
- Two variables may not be linearly related (linearity is an assumption in many regression analyses - Topics 27-31).
It is often helpful to transform our data to satisfy the assumptions underlying the proposed statistical techniques.

How do we transform?
We convert our raw data into transformed data by taking the same mathematical transformation of each observation. Suppose we have n observations (y1, y2, ..., yn) on a variable, y, and we decide that the log transformation is suitable. We take the log of each observation to produce (log y1, log y2, ..., log yn). If we call the transformed variable z, then zi = log yi for each i (i = 1, 2, ..., n), and our transformed data may be written (z1, z2, ..., zn).

We check that the transformation has achieved its purpose of producing a data set that satisfies the assumptions of the planned statistical analysis, and proceed to analyse the transformed data (z1, z2, ..., zn). We often back-transform any summary measures (such as the mean) to the original scale of measurement; the conclusions we draw from hypothesis tests (Topic 17) on the transformed data are applicable to the raw data.

Typical transformations

The logarithmic transformation, z = log y
When log transforming data, we can choose to take logs either to base 10 (log10 y, the 'common' log) or to base e (loge y = ln y, the 'natural' or Naperian log), but must be consistent for a particular variable in a data set. Note that we cannot take the log of a negative number or of zero. The back-transformation of a log is called the antilog; the antilog of a Naperian log is the exponential function.
- If y is skewed to the right, z = log y is often approximately Normally distributed (Fig. 9.1a). Then y has a Lognormal distribution (Topic 8).
- If there is an exponential relationship between y and another variable, x, so that the resulting curve bends upwards when y (on the vertical axis) is plotted against x (on the horizontal axis), then the relationship between z = log y and x is approximately linear (Fig. 9.1b).
- Suppose we have different groups of observations, each comprising measurements of a continuous variable, y. We may find that the groups that have the higher values of y also have larger variances. In particular, if the coefficient of variation (the standard deviation divided by the mean) of y is constant for all the groups, the log transformation, z = log y, produces groups that have the same variance (Fig. 9.1c).
In medicine, the log transformation is frequently used because of its logical interpretation and because many variables have right-skewed distributions.

[Fig. 9.1 The effects of the logarithmic transformation. (a) Normalizing. (b) Linearizing. (c) Variance stabilizing.]

The square root transformation, z = √y
This transformation has properties that are similar to those of the log transformation, although the results after they have been back-transformed are more complicated to interpret. In addition to its Normalizing and linearizing abilities, it is effective at stabilizing variance if the variance increases with increasing values of y, i.e. if the variance divided by the mean is constant. We apply the square root transformation if y is the count of a rare event occurring in time or space, i.e. it is a Poisson variable (Topic 8). Remember, we cannot take the square root of a negative number.

The reciprocal transformation, z = 1/y
We often apply the reciprocal transformation to survival times unless we are using special techniques for survival analysis (Topic 41). The reciprocal transformation has properties that are similar to those of the log transformation. In addition to its Normalizing and linearizing abilities, it is more effective at stabilizing variance than the log transformation if the variance increases very markedly with increasing values of y, i.e. if the variance divided by the (mean)^4 is constant. Note that we cannot take the reciprocal of zero.

The square transformation, z = y^2
The square transformation achieves the reverse of the log transformation.
- If y is skewed to the left, the distribution of z = y^2 is often approximately Normal (Fig. 9.2a).
- If the relationship between two variables, x and y, is such that a line curving downwards is produced when we plot y against x, then the relationship between z = y^2 and x is approximately linear (Fig. 9.2b).
- If the variance of a continuous variable, y, tends to decrease as the value of y increases, then the square transformation, z = y^2, stabilizes the variance (Fig. 9.2c).

[Fig. 9.2 The effects of the square transformation. (a) Normalizing. (b) Linearizing. (c) Variance stabilizing.]

The logit (logistic) transformation, z = ln{p/(1 - p)}
This is the transformation we apply most often to each proportion, p, in a set of proportions. We cannot take the logit transformation if either p = 0 or p = 1 because the corresponding logit values are -∞ and +∞. One solution is to take p as 1/(2n) instead of 0, and as {1 - 1/(2n)} instead of 1. It linearizes a sigmoid curve (Fig. 9.3).

[Fig. 9.3 The effect of the logit transformation on a sigmoid curve.]
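As an illustration of the log transformation and its back-transformation (the data below are invented for illustration, not from the book): taking natural logs of a right-skewed sample, averaging, and applying the exponential to the result gives the geometric mean, which lies below the arithmetic mean for right-skewed data:

```python
import math
import statistics

# Hypothetical right-skewed measurements (illustrative values only)
y = [0.6, 0.8, 1.0, 1.1, 1.3, 1.6, 2.0, 2.8, 4.5, 7.9]

z = [math.log(v) for v in y]       # the transformed data, z_i = ln(y_i)
mean_z = statistics.mean(z)        # summary measure on the log scale

geometric_mean = math.exp(mean_z)  # back-transform (antilog) to original scale
arithmetic_mean = statistics.mean(y)

print(round(geometric_mean, 3), round(arithmetic_mean, 3))
```

The back-transformed mean of the logs is the geometric mean, a commonly quoted summary for log-transformed data because it is less influenced by the long right tail.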
10 Sampling and sampling distributions

Why do we sample?
In statistics, a population represents the entire group of individuals in whom we are interested. Generally it is costly and labour-intensive to study the entire population and, in some cases, may be impossible because the population may be hypothetical (e.g. patients who may receive a treatment in the future). Therefore we collect data on a sample of individuals who we believe are representative of this population, and use them to draw conclusions (i.e. make inferences) about the population.

When we take a sample of the population, we have to recognize that the information in the sample may not fully reflect what is true in the population. We have introduced sampling error by studying only some of the population. In this topic we show how to use theoretical probability distributions (Topics 7 and 8) to quantify this error.

Obtaining a representative sample
Ideally, we aim for a random sample. A list of all individuals from the population is drawn up (the sampling frame), and individuals are selected randomly from this list, i.e. every possible sample of a given size in the population has an equal probability of being chosen. Sometimes, we may have difficulty in constructing this list or the costs involved may be prohibitive, and then we take a convenience sample. For example, when studying patients with a particular clinical condition, we may choose a single hospital, and investigate some or all of the patients with the condition in that hospital. Very occasionally, non-random schemes, such as quota sampling or systematic sampling, may be used. Although the statistical tests described in this book assume that individuals are selected for the sample randomly, the methods are generally reasonable as long as the sample is representative of the population.

Point estimates
We are often interested in the value of a parameter in the population (Topic 7), e.g. a mean or a proportion. Parameters are usually denoted by letters of the Greek alphabet. For example, we usually refer to the population mean as μ and the population standard deviation as σ. We estimate the value of the parameter using the data collected from the sample. This estimate is referred to as the sample statistic and is a point estimate of the parameter (i.e. it takes a single value) as distinct from an interval estimate (Topic 11) which takes a range of values.

Sampling variation
If we take repeated samples of the same size from a population, it is unlikely that the estimates of the population parameter would be exactly the same in each sample. However, our estimates should all be close to the true value of the parameter in the population, and the estimates themselves should be similar to each other. By quantifying the variability of these estimates, we obtain information on the precision of our estimate and can thereby assess the sampling error. In reality, we usually only take one sample from the population. However, we still make use of our knowledge of the theoretical distribution of sample estimates to draw inferences about the population parameter.

Sampling distribution of the mean
Suppose we are interested in estimating the population mean; we could take many repeated samples of size n from the population, and estimate the mean in each sample. A histogram of the estimates of these means would show their distribution (Fig. 10.1); this is the sampling distribution of the mean. We can show that:
- If the sample size is reasonably large, the estimates of the mean follow a Normal distribution, whatever the distribution of the original data in the population (this comes from a theorem known as the Central Limit Theorem).
- If the sample size is small, the estimates of the mean follow a Normal distribution provided the data in the population follow a Normal distribution.
- The mean of the estimates is an unbiased estimate of the true mean in the population, i.e. the mean of the estimates equals the true population mean.
- The variability of the distribution is measured by the standard deviation of the estimates; this is known as the standard error of the mean (often denoted by SEM). If we know the population standard deviation (σ), then the standard error of the mean is given by:

SEM = σ/√n

When we only have one sample, as is customary, our best estimate of the population mean is the sample mean, and because we rarely know the standard deviation in the population, we estimate the standard error of the mean by:

SEM = s/√n

where s is the standard deviation of the observations in the sample (Topic 6). The SEM provides a measure of the precision of our estimate.

Interpreting standard errors
- A large standard error indicates that the estimate is imprecise.
- A small standard error indicates that the estimate is precise.
The standard error is reduced, i.e. we obtain a more precise estimate, if:
- the size of the sample is increased (Fig. 10.1);
- the data are less variable.

SD or SEM?
Although these two parameters seem to be similar, they are used for different purposes. The standard deviation describes the variation in the data values and should be quoted if you wish to illustrate variability in the data. In contrast, the standard error describes the precision of the sample mean, and should be quoted if you are interested in the mean of a set of data values.

Sampling distribution of a proportion
We may be interested in the proportion of individuals in a population who possess some characteristic. Having taken a sample of size n from the population, our best estimate, p, of the population proportion, π, is given by:

p = r/n

where r is the number of individuals in the sample with the characteristic. If we were to take repeated samples of size n from our population and plot the estimates of the proportion as a histogram, the resulting sampling distribution of the proportion would approximate a Normal distribution with mean value, π. The standard deviation of this distribution of estimated proportions is the standard error of the proportion. When we take only a single sample, it is estimated by:

SE(p) = √{p(1 - p)/n}

This provides a measure of the precision of our estimate of π; a small standard error indicates a precise estimate.

Example
[Fig. 10.1 (histograms not reproduced): (a) theoretical Normal distribution of log10 (triglyceride) levels with mean = 0.31 log10 (mmol/L) and standard deviation = 0.21 log10 (mmol/L), and the observed distributions of the means of 100 random samples of size (b) 10, (c) 20 and (d) 50 taken from this theoretical distribution.]
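The behaviour shown in Fig. 10.1 can be sketched by simulation (illustrative code, not from the book): repeated samples drawn from a Normal population with mean 0.31 and standard deviation 0.21 give sample means whose spread is close to the theoretical standard error σ/√n:

```python
import math
import random
import statistics

random.seed(1)

# Population assumed Normal with the parameters quoted for Fig. 10.1
mu, sigma, n = 0.31, 0.21, 20

# Take 1000 repeated samples of size n and record each sample mean
sample_means = [
    statistics.mean(random.gauss(mu, sigma) for _ in range(n))
    for _ in range(1000)
]

observed_se = statistics.stdev(sample_means)  # empirical SD of the sample means
theoretical_se = sigma / math.sqrt(n)         # SEM = sigma / sqrt(n)

print(round(observed_se, 3), round(theoretical_se, 3))
```

Increasing n shrinks both values, which is the "more precise estimate with a larger sample" point made above.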
11 Confidence intervals

Once we have taken a sample from our population, we obtain a point estimate (Topic 10) of the parameter of interest, and calculate its standard error to indicate the precision of the estimate. However, to most people the standard error is not, by itself, particularly useful. It is more helpful to incorporate this measure of precision into an interval estimate for the population parameter. We do this by using our knowledge of the theoretical probability distribution of the sample statistic to calculate a confidence interval (CI) for the parameter. Generally, the confidence interval extends either side of the estimate by some multiple of the standard error; the two values (the confidence limits) defining the interval are generally separated by a comma and contained in brackets.

Confidence interval for the mean

Using the Normal distribution
The sample mean, x̄, follows a Normal distribution if the sample size is large (Topic 10). Therefore we can make use of our knowledge of the Normal distribution when considering the sample mean. In particular, 95% of the distribution of sample means lies within 1.96 standard deviations (SD) of the population mean. When we have a single sample, we call this SD the standard error of the mean (SEM), and calculate the 95% confidence interval for the mean as:

(x̄ - (1.96 × SEM), x̄ + (1.96 × SEM))

If we were to repeat the experiment many times, the interval would contain the true population mean on 95% of occasions. We usually interpret this confidence interval as the range of values within which we are 95% confident that the true population mean lies. Although not strictly correct (the population mean is a fixed value and therefore cannot have a probability attached to it), we will interpret the confidence interval in this way as it is conceptually easier to understand.

Using the t-distribution
We can only use the Normal distribution if we know the value of the variance in the population. Furthermore, if the sample size is small, the sample mean only follows a Normal distribution if the underlying population data are Normally distributed. Where the underlying data are not Normally distributed, and/or we do not know the population variance, the sample mean follows a t-distribution (Topic 8). We calculate the 95% confidence interval for the mean as:

(x̄ - (t0.05 × s/√n), x̄ + (t0.05 × s/√n))

where t0.05 is the percentage point (percentile) of the t-distribution (Appendix A2) with (n - 1) degrees of freedom which gives a two-tailed probability (Topic 17) of 0.05. This generally provides a slightly wider confidence interval than that using the Normal distribution to allow for the extra uncertainty that we have introduced by estimating the population standard deviation and/or because of the small sample size. When the sample size is large, the difference between the two distributions is negligible. Therefore, we always use the t-distribution when calculating confidence intervals even if the sample size is large.

By convention we usually quote 95% confidence intervals. We could calculate other confidence intervals, e.g. a 99% CI for the mean. Instead of multiplying the standard error by the tabulated value of the t-distribution corresponding to a two-tailed probability of 0.05, we multiply it by that corresponding to a two-tailed probability of 0.01. This is wider than a 95% confidence interval, to reflect our increased confidence that the range includes the population mean.

Confidence interval for the proportion
The sampling distribution of a proportion follows a Binomial distribution (Topic 8). However, if the sample size, n, is reasonably large, then the sampling distribution of the proportion is approximately Normal with mean, π. We estimate π by the proportion in the sample, p = r/n (where r is the number of individuals in the sample with the characteristic of interest), and its standard error is estimated by √{p(1 - p)/n} (Topic 10).

The 95% confidence interval for the proportion is estimated by:

(p - 1.96 × √{p(1 - p)/n}, p + 1.96 × √{p(1 - p)/n})

If the sample size is small (usually when np or n(1 - p) is less than 5) then we have to use the Binomial distribution to calculate exact confidence intervals1. Note that if p is expressed as a percentage, we replace (1 - p) by (100 - p).

1 Ciba-Geigy Ltd. (1990) Geigy Scientific Tables, Vol. 2, 8th edn. Ciba-Geigy Ltd., Basle.

Interpretation of confidence intervals
When interpreting a confidence interval we are interested in a number of issues.
- How wide is it? A wide confidence interval indicates that the estimate is imprecise; a narrow one indicates a precise estimate. The width of the confidence interval depends on the size of the standard error, which in turn depends on the sample size and, when considering a numerical variable, the variability of the data. Therefore, small studies on variable data give wider confidence intervals than larger studies on less variable data.
- What clinical implications can be derived from it? The upper and lower limits provide a means of assessing whether the results are clinically important (see Example).
- Does it include any values of particular interest? We can check whether a hypothesized value for the population parameter falls within the confidence interval. If so, then our results are consistent with this hypothesized value. If not, then it is unlikely (for a 95% confidence interval, the chance is at most 5%) that the parameter has this value.

Degrees of freedom
You will come across the term 'degrees of freedom' in statistics. In general they can be calculated as the sample size minus the number of constraints in a particular calculation; these constraints may be the parameters that have to be estimated. As a simple illustration, consider a set of three numbers which add up to a particular total (T). Two of the numbers are 'free' to take any value but the remaining number is fixed by the single constraint imposed by T. Therefore the numbers have two degrees of freedom. Similarly, the degrees of freedom of the sample variance, s² = Σ(x - x̄)²/(n - 1) (Topic 6), are the sample size minus one, because we have to calculate the sample mean (x̄), an estimate of the population mean, in order to evaluate s².

Example

Confidence interval for the mean
We are interested in determining the mean age at first birth in women who have bleeding disorders. In a sample of 49 such women (Topic 2):

Mean age at first birth of child, x̄ = 27.01 years
Standard deviation, s = 5.1282 years
Standard error, SEM = 5.1282/√49 = 0.7326 years

The variable is approximately Normally distributed but, because the population variance is unknown, we use the t-distribution to calculate the confidence interval. The 95% confidence interval for the mean is:

27.01 ± (2.011 × 0.7326) = (25.54, 28.48) years

where 2.011 is the percentage point of the t-distribution with (49 - 1) = 48 degrees of freedom giving a two-tailed probability of 0.05 (Appendix A2).

We are 95% certain that the true mean age at first birth in women with bleeding disorders in the population ranges from 25.54 to 28.48 years. This range is fairly narrow, reflecting a precise estimate. In the general population, the mean age at first birth in 1997 was 26.8 years. As 26.8 falls into our confidence interval, there is little evidence that women with bleeding disorders tend to give birth at an older age than other women.

Note that the 99% confidence interval (25.05, 28.97 years) is slightly wider than the 95% CI, reflecting our increased confidence that the population mean lies in the interval.

Confidence interval for the proportion
Of the 64 women included in the study, 27 (42.2%) reported that they experienced bleeding gums at least once a week. This is a relatively high percentage, and may provide a way of identifying undiagnosed women with bleeding disorders in the general population. We calculate a 95% confidence interval for the proportion with bleeding gums in the population.

Standard error of proportion = √{0.422(1 - 0.422)/64} = 0.0617

95% confidence interval = 0.422 ± (1.96 × 0.0617) = (0.301, 0.543)

We are 95% certain that the true percentage of women with bleeding disorders in the population who experience bleeding gums this frequently ranges from 30.1% to 54.3%. This is a fairly wide confidence interval, suggesting poor precision; a larger sample size would enable us to obtain a more precise estimate. However, the upper and lower limits of this confidence interval both indicate that a substantial percentage of these women are likely to experience bleeding gums. We would need to obtain an estimate of the frequency of this complaint in the general population before drawing any conclusions about its value for identifying undiagnosed women with bleeding disorders.
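Both worked examples can be sketched in a few lines (using the same figures as above; the t percentage point 2.011 is taken from tables, since the Python standard library does not provide the t-distribution):

```python
import math
from statistics import NormalDist

# 95% CI for the mean: n = 49, xbar = 27.01 years, s = 5.1282 years
n, xbar, s = 49, 27.01, 5.1282
sem = s / math.sqrt(n)
t_05 = 2.011                         # t percentage point, 48 df (from tables)
ci_mean = (xbar - t_05 * sem, xbar + t_05 * sem)

# 95% CI for the proportion: r = 27 of 64 women reported bleeding gums
r, m = 27, 64
p = r / m
se_p = math.sqrt(p * (1 - p) / m)
z_05 = NormalDist().inv_cdf(0.975)   # 1.96, the Normal percentage point
ci_prop = (p - z_05 * se_p, p + z_05 * se_p)

print(tuple(round(v, 2) for v in ci_mean))   # approximately (25.54, 28.48)
print(tuple(round(v, 3) for v in ci_prop))   # approximately (0.301, 0.543)
```

Note the design choice mirrored from the text: the mean uses the t-distribution (population variance unknown), while the proportion uses the Normal approximation, which is adequate here since np and n(1 - p) both exceed 5.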
12 Study design I

Study design is vitally important as poorly designed studies may give misleading results. Large amounts of data from a poor study will not compensate for problems in its design. In this topic and in Topic 13 we discuss some of the main aspects of study design. In Topics 14-16 we discuss specific types of study: clinical trials, cohort studies and case-control studies.

The aims of any study should be clearly stated at the outset. We may wish to estimate a parameter in the population (such as the risk of some event), to consider associations between a particular aetiological factor and an outcome of interest, or to evaluate the effect of an intervention (such as a new treatment). There may be a number of possible designs for any such study. The ultimate choice of design will depend not only on the aims, but on the resources available and ethical considerations (see Table 12.1).

Experimental or observational studies
- Experimental studies involve the investigator intervening in some way to affect the outcome. The clinical trial (Topic 14) is an example of an experimental study in which the investigator introduces some form of treatment. Other examples include animal studies or laboratory studies that are carried out under experimental conditions. Experimental studies provide the most convincing evidence for any hypothesis as it is generally possible to control for factors that may affect the outcome. However, these studies are not always feasible or, if they involve humans or animals, may be unethical.
- Observational studies, for example cohort (Topic 15) or case-control (Topic 16) studies, are those in which the investigator does nothing to affect the outcome, but simply observes what happens. These studies may provide poorer information than experimental studies because it is often impossible to control for all factors that affect the outcome. However, in some situations, they may be the only types of study that are helpful or possible. Epidemiological studies, which assess the relationship between factors of interest and disease in the population, are observational.

Assessing causality in observational studies
Although the most convincing evidence for the causal role of a factor in disease usually comes from experimental studies, information from observational studies may be used provided it meets a number of criteria. The most well known criteria for assessing causation were proposed by Hill1:
- The cause must precede the effect.
- The association should be plausible, i.e. the results should be biologically sensible.
- There should be consistent results from a number of studies.
- The association between the cause and the effect should be strong.
- There should be a dose-response relationship with the effect, i.e. higher levels of the effect should lead to more severe disease or more rapid disease onset.
- Removing the factor of interest should reduce the risk of disease.

1 Hill, AB. (1965) The environment and disease: association or causation? Proceedings of the Royal Society of Medicine, 58, 295.

Cross-sectional or longitudinal studies
Cross-sectional studies are carried out at a single point in time. Examples include surveys and censuses of the population. They are particularly suitable for estimating the point prevalence of a condition in the population.

Point prevalence = (Number with the disease at a single time point) / (Total number studied at the same time point)

As we do not know when the events occurred prior to the study, we can only say that there is an association between the factor of interest and disease, and not that the factor is likely to have caused disease. Furthermore, we cannot estimate the incidence of the disease, i.e. the rate of new events in a particular period. In addition, because cross-sectional studies are only carried out at one point in time, we cannot consider trends over time. However, these studies are generally quick and cheap to perform.

Repeated cross-sectional studies may be carried out at different time points to assess trends over time. However, as these studies involve different groups of individuals at each time point, it can be difficult to assess whether apparent changes over time simply reflect differences in the groups of individuals studied.

Longitudinal studies follow a sample of individuals over time. They are usually prospective in that individuals are followed forwards from some point in time (Topic 15). Sometimes retrospective studies, in which individuals are selected and factors that have occurred in their past are identified (Topic 16), are also perceived as longitudinal. Longitudinal studies generally take longer to carry out than cross-sectional studies, thus requiring more resources, and, if they rely on patient memory or medical records, may be subject to bias (explained at the end of this topic).
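The point prevalence formula from the cross-sectional section can be illustrated numerically (the survey counts below are made up for illustration):

```python
# Hypothetical cross-sectional survey: 5000 people examined at one time point,
# of whom 185 are found to have the condition (illustrative numbers only)
n_with_disease = 185
n_studied = 5000

point_prevalence = n_with_disease / n_studied
print(point_prevalence)   # 0.037, i.e. 3.7% of those examined
```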
Experimental studies are generally prospective as they consider the impact of an intervention on an outcome that will happen in the future. However, observational studies may be either prospective or retrospective.

Controls
The use of a comparison group, or control group, is essential when designing a study and interpreting any research findings. For example, when assessing the causal role of a particular factor for a disease, the risk of disease should be considered both in those who are exposed and in those who are unexposed to the factor of interest (Topics 15 and 16). See also 'Treatment comparisons' in Topic 14.

Bias
When there is a systematic difference between the results from a study and the true state of affairs, bias is said to have occurred. Types of bias include:
- Observer bias: one observer consistently under- or over-reports a particular variable;
- Confounding bias: where a spurious association arises due to a failure to adjust fully for factors related to both the risk factor and outcome;
- Selection bias: patients selected for inclusion into a study are not representative of the population to which the results will be applied;
- Information bias: measurements are incorrectly recorded in a systematic manner; and
- Publication bias: a tendency to publish only those papers that report positive or topical results.
Other biases may, for example, be due to recall (Topic 16), healthy entrant effect (Topic 15), assessment (Topic 14) and allocation (Topic 14).

Table 12.1 Study designs.
Type of study             | Timing                       | Form          | Typical uses
--------------------------|------------------------------|---------------|-------------------------------------------------------------
Cross-sectional           | Cross-sectional              | Observational | Prevalence estimates; reference ranges and diagnostic tests; current health status of a group
Repeated cross-sectional  | Cross-sectional              | Observational | Assessing trends over time
Cohort (Topic 15)         | Longitudinal (prospective)   | Observational | Prognosis and natural history (what will happen to someone with disease); aetiology
Case-control (Topic 16)   | Longitudinal (retrospective) | Observational | Aetiology (particularly for rare diseases)
Experiment                | Longitudinal (prospective)   | Experimental  | Clinical trial to assess therapy (Topic 14); trial to assess a preventative measure, e.g. large-scale vaccine trial; laboratory experiment

[The original table also includes a timeline diagram showing, for each design, the actions taken in past, present and future time; this cannot be reproduced from the scan.]
13 Study design II

Variation
Variation in data may be caused by known factors, measurement 'errors', or may be unexplainable random variation. We measure the impact of variation in the data on the estimation of a population parameter by using the standard error (Topic 10). When the measurement of a variable is subject to considerable variation, estimates relating to that variable will be imprecise, with large standard errors. Clearly, it is desirable to reduce the impact of variation as far as possible, and thereby increase the precision of our estimates. There are various ways in which we can do this.

Replication
Our estimates are more precise if we take replicates (e.g. two or three measurements of a given variable for every individual on each occasion). However, as replicate measurements are not independent, we must take care when analysing these data. A simple approach is to use the mean of each set of replicates in the analysis in place of the original measurements. Alternatively, we can use methods that specifically deal with replicated measurements.

Sample size
The choice of an appropriate size for a study is a crucial aspect of study design. With an increased sample size, the standard error of an estimate will be reduced, leading to increased precision and study power (Topic 18). Sample size calculations (Topic 33) should be carried out before starting the study.

Particular study designs
Modifications of simple study designs can lead to more precise estimates. Essentially we are comparing the effect of one or more 'treatments' on experimental units. The experimental unit is the smallest group of 'individuals' who can be regarded as independent for the purposes of analysis, for example, an individual patient, volume of blood or skin patch. If experimental units are assigned randomly (i.e. by chance) to treatments (Topic 14) and there are no other refinements to the design, then we have a complete randomized design. Although this design is straightforward to analyse, it is inefficient if there is substantial variation between the experimental units. In this situation, we can incorporate blocking and/or use a cross-over design to reduce the impact of this variation.

Blocking
It is often possible to group experimental units that share similar characteristics into a homogeneous block or stratum (e.g. the blocks may represent different age groups). The variation between units in a block is less than that between units in different blocks. The individuals within each block are randomly assigned to treatments; we compare treatments within each block rather than making an overall comparison between the individuals in different blocks. We can therefore assess the effects of treatment more precisely than if there was no blocking.

Parallel versus cross-over designs (Fig. 13.1)
Generally, we make comparisons between individuals in different groups. For example, most clinical trials (Topic 14) are parallel trials, in which each patient receives one of the two (or occasionally more) treatments that are being compared, i.e. they result in between-individual comparisons.

Because there is usually less variation in a measurement within an individual than between different individuals (Topic 6), in some situations it may be preferable to consider using each individual as his/her own control. These within-individual comparisons provide more precise comparisons than those from between-individual designs, and fewer individuals are required for the study to achieve the same level of precision. In a clinical trial setting, the cross-over design1 is an example of a within-individual comparison; if there are two treatments, every individual gets each treatment, one after the other in a random order to eliminate any effect of calendar time. The treatment periods are separated by a washout period, which allows any residual effects (carry-over) of the previous treatment to dissipate. We analyse the difference in the responses on the two treatments for each individual. This design can only be used when the treatment temporarily alleviates symptoms rather than provides a cure, and the response time is not prolonged.

Factorial experiments
When we are interested in more than one factor, separate studies that assess the effect of varying one factor at a time may be inefficient and costly. Factorial designs allow the simultaneous analysis of any number of factors of interest. The simplest design, a 2 × 2 factorial experiment, considers two factors (for example, two different treatments), each at two levels (e.g. either active or inactive treatment). As an example, consider the US Physicians Health Study2, designed to assess the importance of aspirin and beta carotene in preventing heart disease. A 2 × 2 factorial design was used with the two factors being the different compounds and the two levels being whether or not the physician received each compound. Table 13.1 shows the possible treatment combinations.

We assess the effect of the level of beta carotene by comparing patients in the left-hand column to those in the right-hand column. Similarly, we assess the effect of the level of aspirin by comparing patients in the top row with those in the bottom row. In addition, we can test whether the two factors are interactive, i.e. when the effect of the level of beta carotene is different for the two levels of aspirin. We then say that there is an interaction between the two factors. In this example, an interaction would suggest that the combination of aspirin and beta carotene together is more (or less) effective than would be expected by simply adding the separate effects of each drug. This design, therefore, provides additional information to two separate studies and is a more efficient use of resources, requiring a smaller sample size to obtain estimates with a given degree of precision.

Table 13.1 Possible treatment combinations.

                 Beta carotene
Aspirin    No            Yes
No         Nothing       Beta carotene
Yes        Aspirin       Aspirin + beta carotene

[Fig. 13.1 (a) Parallel, and (b) cross-over designs.]

1 Senn, S. (1993) Cross-over Trials in Clinical Research. Wiley, Chichester.
2 Steering Committee of the Physicians' Health Study Research Group. (1989) Final report of the aspirin component of the ongoing Physicians' Health Study. New England Journal of Medicine, 321, 129-135.
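The row/column comparisons and the interaction check for a 2 × 2 factorial design can be sketched as follows (the cell means below are invented purely for illustration; the Physicians' Health Study itself analysed event rates, not these numbers):

```python
# Hypothetical mean responses for the four cells of Table 13.1
means = {
    ("no_aspirin", "no_beta"): 10.0,   # nothing
    ("no_aspirin", "beta"):    12.0,   # beta carotene only
    ("aspirin",    "no_beta"): 13.0,   # aspirin only
    ("aspirin",    "beta"):    18.0,   # both compounds
}

# Main effect of aspirin: bottom row minus top row, averaged over beta carotene
aspirin_effect = (
    (means[("aspirin", "no_beta")] + means[("aspirin", "beta")]) / 2
    - (means[("no_aspirin", "no_beta")] + means[("no_aspirin", "beta")]) / 2
)

# Interaction: does beta carotene's effect differ between the aspirin levels?
interaction = (
    (means[("aspirin", "beta")] - means[("aspirin", "no_beta")])
    - (means[("no_aspirin", "beta")] - means[("no_aspirin", "no_beta")])
)

print(aspirin_effect)   # 4.5
print(interaction)      # 3.0: the combination exceeds the sum of separate effects
```

A non-zero interaction is exactly the situation described in the text: the joint effect of the two compounds differs from what adding their separate effects would predict.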
14 Clinical trials

A clinical trial1 is any form of planned experimental study designed, in general, to evaluate a new treatment on a clinical outcome in humans. Clinical trials may either be pre-clinical studies, small clinical studies to investigate effect and safety (Phase I/II trials), or full evaluations of the new treatment (Phase III trials). In this topic we discuss the main aspects of Phase III trials, all of which should be reported in any publication (see CONSORT statement, Table 14.1, and see Figs 14.1 & 14.2).

Treatment comparisons

Clinical trials are prospective studies, in that we are interested in measuring the impact of a treatment given now on a future possible outcome. In general, clinical trials evaluate a new intervention (e.g. type or dose of drug, or surgical procedure). Throughout this topic we assume, for simplicity, that a single new treatment is being evaluated.

An important feature of a clinical trial is that it should be comparative (Topic 12). Without a control treatment, it is impossible to be sure that any response is solely due to the effect of the treatment, and the importance of the new treatment can be over-stated. The control may be the standard treatment (a positive control) or, if one does not exist, may be a negative control, which can be a placebo (a treatment which looks and tastes like the new drug but which does not contain any active compound) or the absence of treatment if ethical considerations permit.

Endpoints

We must decide in advance which outcome most accurately reflects the benefit of the new therapy. This is known as the primary endpoint of the study and usually relates to treatment efficacy. Secondary endpoints, which often relate to toxicity, are of interest and should also be considered at the outset. Generally, all these endpoints are analysed at the end of the study. However, we may wish to carry out some preplanned interim analyses (for example, to ensure that no major toxicities have occurred requiring the trial to be stopped). Care should be taken when comparing treatments at these times due to the problems of multiple hypothesis testing (Topic 18).

Treatment allocation

Once a patient has been formally entered into a clinical trial, he/she is allocated to a treatment group. In general, patients are allocated in a random manner (i.e. based on chance), using a process known as random allocation or randomization. This is often performed using a computer-generated list of random numbers or by using a table of random numbers (Appendix A12). For example, to allocate patients to two treatments, we might follow a sequence of random numbers, and allocate the patient to treatment A if the number is even and to treatment B if it is odd. This process promotes similarity between the treatment groups in terms of baseline characteristics at entry to the trial (i.e. it avoids allocation bias), maximizing the efficiency of the trial. Trials in which patients are randomized to receive either the new treatment or a control treatment are known as randomized controlled trials (often referred to as RCTs), and are regarded as optimal.

Further refinements of randomization, including stratified randomization (which controls for the effects of important factors) and blocked randomization (which ensures roughly equal sized treatment groups), exist. Systematic allocation, whereby patients are allocated to treatment groups systematically, possibly by day of visit or date of birth, should be avoided where possible; the clinician may be able to determine the proposed treatment for a particular patient before he/she is entered into the trial, and this may influence his/her decision as to whether to include the patient in the trial. Sometimes we use a process known as cluster randomization, whereby we randomly allocate groups of individuals (e.g. all people registered at a single general practice) to treatments rather than each individual. We should take care when planning the size of the study and analysing the data in such designs2.

Blinding

There may be assessment bias when patients and/or clinicians are aware of the treatment allocation, particularly if the response is subjective. An awareness of the treatment allocation may influence the recording of signs of improvement, or adverse events. Therefore, where possible, all participants (clinicians, patients, assessors) in a trial should be blinded to the treatment allocation. A trial in which both the patient and clinician/assessor are unaware of the treatment allocation is a double-blind trial. Trials in which it is impossible to blind the patient may be single-blind providing the clinician and/or assessor is blind to the treatment allocation.

1 Pocock, S.J. (1983) Clinical Trials: A Practical Approach. Wiley, Chichester.
2 Kerry, S.M. & Bland, J.M. (1998) Sample size in cluster randomisation. British Medical Journal, 316, 549.
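The allocation schemes described in this topic can be sketched in a few lines. This is an illustrative sketch only (the book does not give code): simple randomization by random number, with even numbers giving treatment A and odd numbers treatment B, and blocked randomization in blocks of four, which keeps the group sizes roughly equal.

```python
# Sketch of random allocation to two treatments, as described in the
# text: even random number -> treatment A, odd -> treatment B.
# blocked_allocation is a simple block-randomization variant.
import random

def simple_allocation(n_patients, seed=1):
    """Allocate each patient to A (even random number) or B (odd)."""
    rng = random.Random(seed)  # fixed seed so the list is reproducible
    return ["A" if rng.randint(0, 99) % 2 == 0 else "B"
            for _ in range(n_patients)]

def blocked_allocation(n_patients, block_size=4, seed=1):
    """Shuffle equal numbers of A and B within each block of patients."""
    rng = random.Random(seed)
    allocations = []
    while len(allocations) < n_patients:
        block = ["A"] * (block_size // 2) + ["B"] * (block_size // 2)
        rng.shuffle(block)
        allocations.extend(block)
    return allocations[:n_patients]

groups = blocked_allocation(20)
print(groups.count("A"), groups.count("B"))  # 10 10
```

With simple randomization the two groups may differ in size by chance; with complete blocks of four, 20 patients always split 10 and 10, illustrating why blocking is used to keep the groups balanced.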
Patient issues

As clinical trials involve humans, patient issues are of importance. In particular, any clinical trial must be passed by an ethical committee who judge that the trial does not contravene the Declaration of Helsinki. Informed patient consent must be obtained from all patients before they are entered into a trial.

The protocol

Before any clinical trial is carried out, a written description of all aspects of the trial, known as the protocol, should be prepared. This includes information on the aims and objectives of the trial, along with a definition of which patients are to be recruited (inclusion and exclusion criteria), treatment schedules, data collection and analysis, contingency plans should problems arise, and study personnel. It is important to recruit enough patients into a trial so that the chance of correctly detecting a true treatment effect is sufficiently high. Therefore, before carrying out any clinical trial, the optimal trial size should be calculated (Topic 33).

Protocol deviations are patients who enter the trial but do not fulfil the protocol criteria, e.g. patients who were incorrectly recruited into or who withdrew from the study, and patients who switched treatments. To avoid bias, the study should be analysed on an intention-to-treat basis, in which all patients on whom we have information are analysed in the groups to which they were originally allocated, irrespective of whether they followed the treatment regime. Where possible, attempts should be made to collect information on patients who withdraw from the trial. On-treatment analyses, in which patients are only included in the analysis if they complete a full course of treatment, are not recommended as they often lead to biased treatment comparisons.

Table 14.1 A summary of the CONSORT (Consolidation of Standards for Reporting Trials) statement's format for an optimally reported randomized controlled trial.
Heading            Descriptor

Title              Identify the study as a randomized trial
Abstract           Use a structured format
Introduction       State aims and specific objectives, and planned subgroup analyses
Methods
  Protocol         Describe:
                   Planned interventions (e.g. treatments) and their timing
                   Primary and secondary outcome measure(s)
                   Basis of sample size calculations (Topic 33)
                   Rationale and methods for statistical analyses, and whether they were completed on an intention-to-treat basis
  Assignment       Describe:
                   Unit of randomization (e.g. individual, cluster)
                   Method used to generate the randomization schedule
                   Method of allocation concealment (e.g. sealed envelopes) and timing of assignment
  Masking          Describe:
  (blinding)       Similarity of treatments (e.g. appearance, taste of capsules/tablets)
                   Mechanisms of blinding patients/clinicians/assessors
                   Process of unblinding, if required
Results
  Participant      Provide a trial profile (Fig. 14.1)
  flow             State estimated effect of intervention on primary and secondary outcome measures, including a point estimate and measure of precision (confidence interval)
                   State results in absolute numbers when feasible (e.g. 10/20, not just 50%)
                   Present summary data and appropriate descriptive and inferential statistics
                   Describe factors influencing response by treatment group, and any attempt to adjust for them
                   Describe protocol deviations (with reasons)
Comment            State specific interpretation of study findings, including sources of bias and imprecision, and comparability with other studies
                   State general interpretation of the data in light of all the available evidence

Adapted from: Begg, C., Cho, M., Eastwood, S., et al. (1996) Improving the quality of reporting of randomized controlled trials. The CONSORT statement. Journal of the American Medical Association, 276, 637-639. (Copyrighted 1996, American Medical Association.)
Fig. 14.1 The CONSORT statement's trial profile of a randomized controlled trial's progress, adapted from Begg et al. (1996), showing at each stage the numbers (n = ...) of patients registered or eligible, not randomized (with reasons), randomized ('R'), receiving (or not receiving) the test or control intervention as allocated, withdrawn (intervention ineffective, lost to follow-up, other) and completing the trial. (Copyrighted 1996, American Medical Association.)

Fig. 14.2 Trial profile example (adapted from the trial described in Topic 37, with permission), showing the numbers of women with data available from midwives' questionnaires, from mothers' questionnaires at discharge home, and from mothers' questionnaires at 6 weeks post partum.
15 Cohort studies

A cohort study takes a group of individuals and usually follows them forward in time, the aim being to study whether exposure to a particular aetiological factor will affect the incidence of a disease outcome in the future (Fig. 15.1). If so, the factor is known as a risk factor for the disease outcome. For example, a number of cohort studies have investigated the relationship between dietary factors and cancer. Although most cohort studies are prospective, historical cohorts can be investigated, the information being obtained retrospectively. However, the quality of historical studies is often dependent on medical records and memory, and they may therefore be subject to bias.

Cohort studies can either be fixed or dynamic. If individuals leave a fixed cohort, they are not replaced. In dynamic cohorts, individuals may drop out of the cohort, and new individuals may join as they become eligible.

Selection of cohort

The cohort should be representative of the population to which the results will be generalized. It is often advantageous if the individuals can be recruited from a similar source, such as a particular occupational group (e.g. civil servants, medical practitioners), as information on mortality and morbidity can be easily obtained from records held at the place of work, and individuals can be re-contacted when necessary. However, such a cohort may not be truly representative of the general population, and may be healthier. Cohorts can also be recruited from GP lists, ensuring that a group of individuals with different health states is included in the study. However, these patients tend to be of similar social backgrounds because they live in the same area.

When trying to assess the aetiological effect of a risk factor, individuals recruited to cohorts should be disease-free at the start of the study. This is to ensure that any exposure to the risk factor occurs before the outcome, thus enabling a causal role for the factor to be postulated. Because individuals are disease-free at the start of the study, we often see a healthy entrant effect. Mortality rates in the first period of the study are then often lower than would be expected in the general population. This will be apparent when mortality rates start to increase suddenly a few years into the study.

Follow-up of individuals

When following individuals over time, there is always the problem that they may be lost to follow-up. Individuals may move without leaving a forwarding address, or they may decide that they wish to leave the study. The benefits of cohort studies are reduced if a large number of individuals is lost to follow-up. We should thus find ways to minimize these drop-outs, e.g. by maintaining regular contact with the individuals.

Fig. 15.1 Diagrammatic representation of a cohort study (frequencies in parentheses, see Table 15.1): starting with a disease-free sample, those exposed to the factor either develop disease (a) or remain disease-free (c), and those unexposed either develop disease (b) or remain disease-free (d).
Information on outcomes and exposures

It is important to obtain full and accurate information on disease outcomes, e.g. mortality and illness from different causes. This may entail searching through disease registries, mortality statistics, and GP and hospital records.

Exposure to the risks of interest may change over the study period. For example, when assessing the relationship between alcohol consumption and heart disease, an individual's typical alcohol consumption is likely to change over time. Therefore it is important to re-interview individuals in the study on repeated occasions to study changes in exposure over time.

Analysis of cohort studies

Table 15.1 contains observed frequencies.

Table 15.1 Observed frequencies (see Fig. 15.1).

                        Exposed to factor
                        Yes        No         Total
Disease of interest
  Yes                   a          b          a + b
  No                    c          d          c + d
Total                   a + c      b + d      n = a + b + c + d

Because patients are followed longitudinally over time, it is possible to estimate the risk of developing the disease in the population, by calculating the risk in the sample studied.

Estimated risk of disease
  = (number developing disease over study period)/(total number in the cohort)
  = (a + b)/n

The risk of disease in the individuals exposed and unexposed to the factor of interest in the population can be estimated in the same way.

Estimated risk of disease in the exposed group, risk_exp = a/(a + c)
Estimated risk of disease in the unexposed group, risk_unexp = b/(b + d)

Then, estimated relative risk = risk_exp/risk_unexp

The relative risk (RR) measures the increased (or decreased) risk of disease associated with exposure to the factor of interest. A relative risk of one indicates that the risk is the same in the exposed and unexposed groups. A relative risk greater than one indicates that there is an increased risk in the exposed group compared with the unexposed group; a relative risk less than one indicates a reduction in the risk of disease in the exposed group. For example, a relative risk of 2 would indicate that individuals in the exposed group had twice the risk of disease of those in the unexposed group.

Confidence intervals for the relative risk should be calculated, and we can test whether the relative risk is equal to one. These are easily performed on a computer and we therefore omit details.

Advantages of cohort studies
• The time sequence of events can be assessed.
• They can provide information on a wide range of outcomes.
• It is possible to measure the incidence/risk of disease directly.
• It is possible to collect very detailed information on exposure to a wide range of factors.
• It is possible to study exposure to factors that are rare.
• Exposure can be measured at a number of time points, so that changes in exposure over time can be studied.
• There is reduced recall and selection bias compared with case-control studies (Topic 16).

Disadvantages of cohort studies
• In general, cohort studies follow individuals for long periods of time, and are therefore costly to perform.
• Where the outcome of interest is rare, a very large sample size is needed.
• As follow-up increases, there is often increased loss of patients as they migrate or leave the study, leading to biased results.
• As a consequence of the long time-scale, it is often difficult to maintain consistency of measurements and outcomes over time. Furthermore, individuals may modify their behaviour after an initial interview.
• It is possible that disease outcomes and their probabilities, or the aetiology of disease itself, may change over time.
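The risk and relative risk calculations above can be sketched directly from the a, b, c, d notation of Table 15.1. The frequencies used in the example call are hypothetical, chosen only to illustrate a relative risk of 2.

```python
# Sketch of the relative risk calculation from Table 15.1's notation:
# a = exposed & diseased, b = unexposed & diseased,
# c = exposed & disease-free, d = unexposed & disease-free.

def relative_risk(a, b, c, d):
    risk_exposed = a / (a + c)      # risk_exp = a/(a + c)
    risk_unexposed = b / (b + d)    # risk_unexp = b/(b + d)
    return risk_exposed / risk_unexposed

# Hypothetical frequencies for illustration only:
# 50 of 500 exposed and 25 of 500 unexposed develop disease.
rr = relative_risk(a=50, b=25, c=450, d=475)
print(round(rr, 1))  # 2.0
```

Here the exposed group's risk (50/500 = 0.10) is twice the unexposed group's (25/500 = 0.05), matching the interpretation of a relative risk of 2 given above.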
Example

The British Regional Heart Study is a large cohort study of 7735 men aged 40-59 years randomly selected from general practices in 24 British towns, with the aim of identifying risk factors for ischaemic heart disease. At recruitment to the study, the men were asked about a number of demographic and lifestyle factors, including information on cigarette smoking habits. Of the 7718 men who provided information on smoking status, 5899 (76.4%) had smoked at some stage during their lives (including those who were current smokers and those who were ex-smokers). Over the subsequent 10 years, 650 of these 7718 men (8.4%) had a myocardial infarction (MI). The results, displayed in the table, show the number (and percentage) of smokers and non-smokers who developed and did not develop a MI over the 10 year period.

Smoking status       MI in subsequent 10 years
at baseline          Yes            No              Total
Ever smoked          563 (9.5%)     5336 (90.5%)    5899
Never smoked         87 (4.8%)      1732 (95.2%)    1819
Total                650 (8.4%)     7068 (91.6%)    7718

The estimated relative risk = (563/5899)/(87/1819) = 2.00.

It can be shown that the 95% confidence interval for the true relative risk is (1.60, 2.49).

We can interpret the relative risk to mean that a middle-aged man who has ever smoked is twice as likely to suffer a MI over the next 10 year period as a man who has never smoked. Alternatively, the risk of suffering a MI for a man who has ever smoked is 100% greater than that of a man who has never smoked.

Data kindly provided by Ms F.C. Lampe, Ms M. Walker and Dr P. Whincup, Department of Primary Care and Population Sciences, Royal Free and University College Medical School, Royal Free Campus, London, UK.
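The example's relative risk and confidence interval can be reproduced from the table's frequencies. The book computes the interval by software and omits details; the log-scale standard error formula used below is an assumption on my part, shown because it is a standard method that agrees with the quoted interval.

```python
# Reproducing the British Regional Heart Study example: relative risk
# and a 95% CI on the log scale (a standard cohort-study formula,
# assumed here; the book itself leaves the computation to software).
import math

a, b = 563, 87               # MIs among ever-smokers, never-smokers
n_exp, n_unexp = 5899, 1819  # group totals from the table

rr = (a / n_exp) / (b / n_unexp)

# Standard error of log(RR), then exponentiate the log-scale limits.
se_log_rr = math.sqrt(1/a - 1/n_exp + 1/b - 1/n_unexp)
lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
upper = math.exp(math.log(rr) + 1.96 * se_log_rr)

print(round(rr, 2), round(lower, 2), round(upper, 2))  # 2.0 1.6 2.49
```

The output matches the example: an estimated relative risk of 2.00 with 95% confidence interval (1.60, 2.49).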
16 Case-control studies

A case-control study compares the characteristics of a group of patients with a particular disease outcome (the cases) to a group of individuals without a disease outcome (the controls), to see whether any factors occurred more or less frequently in the cases than the controls (Fig. 16.1). Such retrospective studies do not provide information on the prevalence or incidence of disease but may give clues as to which factors elevate or reduce the risk of disease.

Selection of cases

It is important to define whether incident cases (patients who are recruited at the time of diagnosis) or prevalent cases (patients who were already diagnosed before entering the study) should be recruited. Prevalent cases may have had time to reflect on their history of exposure to risk factors, especially if the disease is a well-publicized one such as cancer, and may have altered their behaviour after diagnosis. It is important to identify as many cases as possible so that the results carry more weight and the conclusions can be generalized to future populations. To this end, it may be necessary to access hospital lists and disease registries, and to include cases who died during the time period when cases and controls were defined, because their exclusion may lead to a biased sample of cases.

Selection of controls

Controls should be screened at entry to the study to ensure that they do not have the disease of interest. Sometimes there may be more than one control for each case. Where possible, controls should be selected from the same source as cases. Controls are often selected from hospitals. However, as risk factors related to one disease outcome may also be related to other disease outcomes, the selection of hospital-based controls may over-select individuals who have been exposed to the risk factor of interest, and may, therefore, not always be appropriate. It is often acceptable to select controls from the general population, although they may not be as motivated to take part in such a study, and response rates may therefore be poorer in controls than cases. The use of neighbourhood controls may ensure that cases and controls are from similar social backgrounds.

Matching

Many case-control studies are matched in order to select cases and controls who are as similar as possible. In general, it is useful to sex-match individuals (i.e. if the case is male, the control should also be male), and, sometimes, patients will be age-matched. However, it is important not to match on the basis of the risk factor of interest, or on any factor that falls within the causal pathway of the disease, as this will remove the ability of the study to assess any relationship between the risk factor and the disease. Unfortunately, matching does mean that the effect on disease of the variables that have been used for matching cannot be studied.

Analysis of unmatched case-control studies

Table 16.1 shows observed frequencies. Because patients are selected on the basis of their disease status, it is not possible to estimate the absolute risk of disease. We can calculate the odds ratio, which is given by:

  Odds ratio = (odds of being a case in the exposed group)/(odds of being a case in the unexposed group)

Table 16.1 Observed frequencies (see Fig. 16.1).

                  Exposed to factor
Disease status    Yes        No         Total
Case              a          b          a + b
Control           c          d          c + d
Total             a + c      b + d      n = a + b + c + d

Fig. 16.1 Diagrammatic representation of a case-control study: starting with a group of diseased individuals (cases) and a group of disease-free individuals (controls), each individual is classified as exposed or unexposed to the factor.
where, for example, the odds of being a case in the exposed group is equal to

  (probability of being a case in the exposed group)/(probability of not being a case in the exposed group)

The odds of being a case in the exposed and unexposed samples are a/c and b/d, respectively, and therefore the estimated odds ratio = (a/c)/(b/d) = (a x d)/(b x c).

When the incidence of disease is rare, the odds ratio is an estimate of the relative risk, and is interpreted in a similar way, i.e. it indicates the increased (or decreased) risk associated with exposure to the factor of interest. An odds ratio of one indicates that there is the same risk in the exposed and unexposed groups; an odds ratio greater than one indicates that there is an increased risk in the exposed group compared with the unexposed group, etc. Confidence intervals and hypothesis tests can also be generated for the odds ratio.

Analysis of matched case-control studies

Where possible, the analysis of matched case-control studies should allow for the fact that cases and controls are linked to each other as a result of the matching. Further details of methods of analysis for matched studies can be found in Breslow and Day1.

Advantages of case-control studies
• They are generally relatively quick, cheap and easy to perform.
• They are particularly suitable for rare diseases.
• A wide range of risk factors can be investigated.
• There is no loss to follow-up.

Disadvantages of case-control studies
• Recall bias, when cases have a differential ability to remember certain details about their histories, is a potential problem. For example, a lung cancer patient may well remember the occasional period when he/she smoked, whereas a control may not remember a similar period.
• If the onset of disease preceded exposure to the risk factor, causation cannot be inferred.
• Case-control studies are not suitable when exposures to the risk factor are rare.

Example

A total of 1327 women aged 50-81 years with hip fractures,
who lived in a largely urban area in Sweden, were investigated in this unmatched case-control study. They were compared with 3262 controls within the same age range randomly selected from the national register. Interest was centred on determining whether postmenopausal hormone replacement therapy (HRT) substantially reduced the risk of hip fracture. The results in the table show the number of women who were current users of HRT and those who had never used or formerly used HRT in the case and control groups.

                            Current user    Never/former
                            of HRT          user of HRT     Total
Women with hip fracture
(cases)                     ...             ...             1327
Controls                    ...             ...             3262

The observed odds ratio = 0.39.

It can be shown that the 95% confidence interval for the odds ratio is (0.28, 0.56).

A postmenopausal woman in this age range in Sweden who was a current user of HRT thus had 39% of the risk of a hip fracture of a woman who had never used or formerly used HRT, i.e. being a current user of HRT reduced the risk of hip fracture by 61%.

1 Breslow, N.E. & Day, N.E. (1980) Statistical Methods in Cancer Research. Volume I - The Analysis of Case-control Studies. International Agency for Research on Cancer, Lyon.
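The odds ratio calculation from Table 16.1 can be sketched in a few lines. The frequencies in the example call below are hypothetical (they are not the HRT study's counts), chosen only to illustrate the formula.

```python
# Sketch of the odds ratio from Table 16.1's notation:
# a = exposed cases, b = unexposed cases,
# c = exposed controls, d = unexposed controls.

def odds_ratio(a, b, c, d):
    """Estimated odds ratio = (a/c)/(b/d) = (a*d)/(b*c)."""
    return (a * d) / (b * c)

# Hypothetical frequencies for illustration only:
# 30 of 100 cases exposed vs 15 of 100 controls exposed.
or_est = odds_ratio(a=30, b=70, c=15, d=85)
print(round(or_est, 2))  # 2.43
```

An estimate above one, as here, suggests increased risk in the exposed group; an estimate below one, as in the HRT example (0.39), suggests a protective effect.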
17 Hypothesis testing

We often gather sample data in order to assess how much evidence there is against a specific hypothesis about the population. We use a process known as hypothesis testing (or significance testing) to quantify our belief against a particular hypothesis.

This topic describes the format of hypothesis testing in general (Box 17.1); details of specific hypothesis tests are given in subsequent topics. For easy reference, each hypothesis test is contained in a similarly formatted box.

Box 17.1 Hypothesis testing - a general overview

We define five stages when carrying out a hypothesis test:
1 Define the null and alternative hypotheses under study
2 Collect relevant data from a sample of individuals
3 Calculate the value of the test statistic specific to H0
4 Compare the value of the test statistic to values from a known probability distribution
5 Interpret the P-value and results

Defining the null and alternative hypotheses

We always test the null hypothesis (H0), which assumes no effect (e.g. the difference in means equals zero) in the population. For example, if we are interested in comparing smoking rates in men and women in the population, the null hypothesis would be:

H0: smoking rates are the same in men and women in the population

We then define the alternative hypothesis (H1), which holds if the null hypothesis is not true. The alternative hypothesis relates more directly to the theory we wish to investigate. So, in the example, we might have:

H1: the smoking rates are different in men and women in the population.

We have not specified any direction for the difference in smoking rates, i.e. we have not stated whether men have higher or lower rates than women in the population. This leads to what is known as a two-tailed test, because we allow for either eventuality, and is recommended as we are rarely certain, in advance, of the direction of any difference, if one exists. In some, very rare, circumstances, we may carry out a one-tailed test in which a direction of effect is specified in H1. This might apply if we are considering a disease from which all untreated individuals die; a new drug cannot make things worse.

Obtaining the test statistic

After collecting the data, we substitute values from our sample into a formula, specific to the test we are using, to determine a value for the test statistic. This reflects the amount of evidence in the data against the null hypothesis; usually, the larger the value, ignoring its sign, the greater the evidence.

Obtaining the P-value

All test statistics follow known theoretical probability distributions (Topics 7 and 8). We relate the value of the test statistic obtained from the sample to the known distribution to obtain the P-value, the area in both (or occasionally one) tails of the probability distribution. Most computer packages provide the two-tailed P-value automatically. The P-value is the probability of obtaining our results, or something more extreme, if the null hypothesis is true. The null hypothesis relates to the population of interest, rather than the sample. Therefore, the null hypothesis is either true or false and we cannot interpret the P-value as the probability that the null hypothesis is true.

Using the P-value

We must make a decision about how much evidence we require to enable us to decide to reject the null hypothesis in favour of the alternative. The smaller the P-value, the greater the evidence against the null hypothesis.

Conventionally, we consider that if the P-value is less than 0.05, there is sufficient evidence to reject the null hypothesis, as there is only a small chance of the results occurring if the null hypothesis were true. We then reject the null hypothesis and say that the results are significant at the 5% level (Fig. 17.1).

In contrast, if the P-value is greater than 0.05, we usually conclude that there is insufficient evidence to reject the null hypothesis. We do not reject the null hypothesis, and we say that the results are not significant at the 5% level (Fig. 17.1). This does not mean that the null hypothesis is true; simply that we do not have enough evidence to reject it.

The choice of 5% is arbitrary. On 5% of occasions we will incorrectly reject the null hypothesis when it is true. In situations in which the clinical implications of incorrectly rejecting the null hypothesis are severe, we may require stronger evidence before rejecting the null hypothesis (e.g.
we may choose a P-value of 0.01, or 0.001). The chosen cut-off (e.g. 0.05 or 0.01) is called the significance level of the test.

Quoting a result only as significant at a certain cut-off level (e.g. 0.05) can be misleading. For example, if P = 0.04 we would reject H0; however, if P = 0.06 we would not reject it. Are these really different? Therefore, we recommend quoting the exact P-value, often obtained from the computer output.

Fig. 17.1 Probability distribution of the test statistic showing a two-tailed probability, P = 0.05: values of the test statistic further into either tail give P < 0.05, while values nearer the centre give P > 0.05.

Non-parametric tests

Hypothesis tests which are based on knowledge of the probability distributions that the data follow are known as parametric tests. Often data do not conform to the assumptions that underlie these methods (Topic 32). In these instances we can use non-parametric tests (sometimes referred to as distribution-free tests, or rank methods). These tests generally replace the data with their ranks (i.e. the numbers 1, 2, 3 etc., describing their position in the ordered data set) and make no assumptions about the probability distribution that the data follow.

Non-parametric tests are particularly useful when the sample size is small (so that it is impossible to assess the distribution of the data), and when the data are measured on a categorical scale. However, non-parametric tests are generally wasteful of information; consequently they have less power (Topic 18) of detecting a real effect than the equivalent parametric test if all the assumptions underlying the parametric test are satisfied. Furthermore, they are primarily significance tests that often do not provide estimates of the effects of interest; they lead to decisions rather than an appreciation or understanding of the data.

Which test?

Deciding which statistical test to use depends on the design of the study, the type of variable and the distribution that the data being studied follow. The flow chart on the inside front cover will aid your decision.

Hypothesis tests versus confidence intervals

Confidence intervals (Topic 11) and hypothesis tests are closely linked. The primary aim of a hypothesis test is to make a decision and provide an exact P-value. Confidence intervals quantify the effect of interest (e.g. the difference in means), and enable us to assess the clinical implications of the results. However, because they provide a range of plausible values for the true effect, they can also be used to make a decision although exact P-values are not provided. For example, if the hypothesized value for the effect (e.g. zero) lies outside the 95% confidence interval then we believe the hypothesized value is implausible and would reject H0. In this instance we know that the P-value is less than 0.05 but do not know its exact value.
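Step 4 of Box 17.1 (relating the test statistic to a known probability distribution) can be sketched for a statistic that follows the standard Normal distribution. This is an illustrative sketch using only the standard library; the Normal cumulative distribution function is built from math.erf.

```python
# Sketch: converting a standard-Normal test statistic into a
# two-tailed P-value, the area in both tails beyond |statistic|.
import math

def normal_cdf(z):
    """Standard Normal cumulative distribution function via erf."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def two_tailed_p(test_statistic):
    """Area in both tails of the distribution beyond |test statistic|."""
    return 2 * (1 - normal_cdf(abs(test_statistic)))

print(round(two_tailed_p(1.96), 3))  # 0.05
print(round(two_tailed_p(1.00), 3))  # 0.317
```

A statistic of 1.96 gives P = 0.05, the conventional borderline of significance; a smaller statistic of 1.00 gives P = 0.317, providing little evidence against the null hypothesis.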
18 Errors in hypothesis testing

Making a decision

Most hypothesis tests in medical statistics compare groups of people who are exposed to a variety of experiences. We may, for example, be interested in comparing the effectiveness of two forms of treatment for reducing 5 year mortality from breast cancer. For a given outcome (e.g. death), we call the comparison of interest (e.g. the difference in 5 year mortality rates) the effect of interest or, if relevant, the treatment effect. We express the null hypothesis in terms of no effect (e.g. the 5 year mortality from breast cancer is the same in two treatment groups); the two-sided alternative hypothesis is that the effect is not zero. We perform a hypothesis test that enables us to decide whether we have enough evidence to reject the null hypothesis (Topic 17). We can make one of two decisions; either we reject the null hypothesis, or we do not reject it.

Making the wrong decision

Although we hope we will draw the correct conclusion about the null hypothesis, we have to recognize that, because we only have a sample of information, we may make the wrong decision when we reject/do not reject the null hypothesis. The possible mistakes we can make are shown in Table 18.1.

Type I error: we reject the null hypothesis when it is true, and conclude that there is an effect when, in reality, there is none. The maximum chance (probability) of making a Type I error is denoted by α (alpha). This is the significance level of the test (Topic 17); we reject the null hypothesis if our P-value is less than the significance level, i.e. if P < α.

We must decide on the value of α before we collect our data; we usually assign a conventional value of 0.05 to it, although we might choose a more restrictive value such as 0.01. Our chance of making a Type I error will never exceed our chosen significance level, say α = 0.05, because we will only reject the null hypothesis if P < 0.05. If we find that P > 0.05, we will not reject the null hypothesis, and, consequently, do not make a Type I error.

Type II error: we do not reject the null hypothesis when it is false, and conclude that there is no effect when one really exists. The chance of making a Type II error is denoted by β (beta); its complement, (1 − β), is the power of the test. The power, therefore, is the probability of rejecting the null hypothesis when it is false; i.e. it is the chance (usually expressed as a percentage) of detecting, as statistically significant, a real treatment effect of a given size.

Ideally, we should like the power of our test to be 100%; we must recognize, however, that this is impossible because there is always a chance, albeit slim, that we could make a Type II error. Fortunately, however, we know which factors affect power, and thus we can control the power of a test by giving consideration to them.

Power and related factors

It is essential that we know the power of a proposed test at the planning stage of our investigation. Clearly, we should only embark on a study if we believe that it has a 'good' chance of detecting a clinically relevant effect, if one exists (by 'good' we mean that the power should be at least about 70-80%). It is ethically irresponsible, and wasteful of time and resources, to undertake a clinical trial that has, say, only a 40% chance of detecting a real treatment effect.

A number of factors have a direct bearing on power for a given test.

The sample size: power increases with increasing sample size. This means that a large sample has a greater ability than a small sample to detect a clinically important effect if it exists. When the sample size is very small, the test may have an inadequate power to detect a particular effect. We explain how to choose sample size, with power considerations, in Topic 33. The methods can also be used to evaluate the power of the test for a specified sample size.

The variability of the observations: power increases as the variability of the observations decreases (Fig. 18.1).

The effect of interest: the power of the test is greater for larger effects. A hypothesis test thus has a greater chance of detecting a large real effect than a small one.

The significance level: the power is greater if the significance level is larger (this is equivalent to the probability of the Type I error (α) increasing as the probability of the Type II error (β) decreases). So, we are more likely to detect a real effect if we decide at the planning stage that we will regard our P-value as significant if it is less than 0.05 rather than less than 0.01. We can see this relationship between power and the significance level in Fig. 18.2.

Table 18.1 The consequences of hypothesis testing.

              Reject H0       Do not reject H0
H0 true       Type I error    No error
H0 false      No error        Type II error

Note that an inspection of the confidence interval (Topic 11) for the effect of interest gives an indication of whether the power of the test was adequate. A wide confidence interval results from a small sample and/or data with substantial variability, and is a suggestion of poor power.
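The effect of each of these factors can be seen numerically with a rough power calculation. The function below is not the book's method; it is a sketch using the Normal approximation to the test statistic for a two-sided comparison of two means (ignoring the negligible second tail), and all argument values are hypothetical.

```python
from scipy.stats import norm

def approx_power(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided comparison of two means,
    using the Normal approximation and ignoring the negligible
    second tail. All arguments are hypothetical inputs."""
    se = sd * (2.0 / n_per_group) ** 0.5   # SE of the difference in means
    z_crit = norm.ppf(1 - alpha / 2)
    return float(norm.cdf(abs(effect) / se - z_crit))

# Power rises with sample size and effect size, falls with variability,
# and falls when a stricter significance level is chosen:
base = approx_power(effect=2.5, sd=4, n_per_group=20)
print(base < approx_power(2.5, 4, 80))              # larger n: more power
print(base > approx_power(2.5, 8, 20))              # more variability: less power
print(base < approx_power(5.0, 4, 20))              # larger effect: more power
print(base > approx_power(2.5, 4, 20, alpha=0.01))  # stricter alpha: less power
```

In practice, exact power routines (such as those behind the curves in Figs 18.1 and 18.2) use the t rather than the Normal distribution, but the qualitative relationships are the same.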
Multiple hypothesis testing

Often, we want to carry out a number of significance tests on a data set, e.g. when it comprises many variables or there are more than two treatments. The Type I error rate increases dramatically as the number of comparisons increases, leading to spurious conclusions. Therefore, we should only perform a small number of tests, chosen to relate to the primary aims of the study and specified a priori. It is possible to use some form of post-hoc adjustment to the P-value to take account of the number of tests performed (Topic 22). For example, the Bonferroni approach (often regarded as rather conservative) multiplies each P-value by the number of tests carried out; any decisions about significance are then based on this adjusted P-value.

Fig. 18.1 Power curves showing the relationship between power and the sample size in each of two groups for the comparison of two means using the unpaired t-test. Each power curve relates to a two-sided test for which the significance level is 0.05, and the effect of interest (e.g. the difference between the treatment means) is 2.5. The assumed equal standard deviation of the measurements in the two groups is different for each power curve (see Example, Topic 33).

Fig. 18.2 Power curves showing the relationship between power and the sample size in each of two groups for the comparison of two proportions using the Chi-squared test. Curves are drawn when the effect of interest (e.g. the difference in the proportions with the characteristic of interest in the two treatment groups) is either 0.25 (i.e. 0.65 − 0.40) or 0.10 (i.e. 0.50 − 0.40); the significance level of the two-sided test is either 0.05 or 0.01 (see Example, Topic 33).
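The Bonferroni approach described above is simple enough to sketch directly; the P-values below are hypothetical.

```python
def bonferroni(p_values):
    """Bonferroni adjustment: multiply each P-value by the number
    of tests performed, capping the result at 1."""
    k = len(p_values)
    return [min(1.0, p * k) for p in p_values]

raw = [0.001, 0.020, 0.049, 0.300]        # hypothetical P-values from 4 tests
adjusted = bonferroni(raw)
# A raw P of 0.049 would look significant, but after adjustment it is not
print([p < 0.05 for p in adjusted])       # [True, False, False, False]
```

This illustrates why the method is conservative: with many tests, only very small raw P-values survive the adjustment.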
19 Numerical data: a single group

The problem

We have a sample from a single group of individuals and one numerical or ordinal variable of interest. We are interested in whether the average of this variable takes a particular value. For example, we may have a sample of patients with a specific medical condition. We have been monitoring triglyceride levels in the blood of healthy individuals and know that they have a geometric mean of 1.74 mmol/L. We wish to know whether the average level in our patients is the same as this value.

The one-sample t-test

Assumptions
In the population, the variable is Normally distributed with a given (usually unknown) variance. In addition, we have taken a reasonable sample size so that we can check the assumption of Normality (Topic 32).

Rationale
We are interested in whether the mean, μ, of the variable in the population of interest differs from some hypothesized value, μ1. We use a test statistic that is based on the difference between the sample mean, x̄, and μ1. Assuming that we do not know the population variance, then this test statistic, often referred to as t, follows the t-distribution. If we do know the population variance, or the sample size is very large, then an alternative test (often called a z-test), based on the Normal distribution, may be used. However, in these situations, results from either test are virtually identical.

Additional notation
Our sample is of size n and the estimated standard deviation is s.

1 Define the null and alternative hypotheses under study
H0: the mean in the population, μ, equals μ1
H1: the mean in the population does not equal μ1.

2 Collect relevant data from a sample of individuals

3 Calculate the value of the test statistic specific to H0

t = (x̄ − μ1)/(s/√n)

which follows the t-distribution with (n − 1) degrees of freedom.

4 Compare the value of the test statistic to values from a known probability distribution
Refer t to Appendix A2.

5 Interpret the P-value and results
Interpret the P-value and calculate a confidence interval for the true mean in the population (Topic 11). The 95% confidence interval is given by:

x̄ ± t0.05 × (s/√n)

where t0.05 is the percentage point of the t-distribution with (n − 1) degrees of freedom which gives a two-tailed probability of 0.05.

Interpretation of the confidence interval
The 95% confidence interval provides a range of values in which we are 95% certain that the true population mean lies. If the 95% confidence interval does not include the hypothesized value for the mean, μ1, we reject the null hypothesis at the 5% level. If, however, the confidence interval includes μ1, then we fail to reject the null hypothesis at that level.

If the assumptions are not satisfied
We may be concerned that the variable does not follow a Normal distribution in the population. Whereas the t-test is relatively robust (Topic 32) to some degree of non-Normality, extreme skewness may be a concern. We can either transform the data, so that the variable is Normally distributed (Topic 9), or use a non-parametric test such as the sign test or Wilcoxon signed ranks test (Topic 20).

The sign test

Rationale
The sign test is a simple test based on the median of the distribution. We have some hypothesized value, λ, for the median in the population. If our sample comes from this population, then approximately half of the values in our sample should be greater than λ and half should be less than λ (after excluding any values which equal λ). The sign test considers the number of values in our sample that are greater (or less) than λ.
The sign test is a simple test; we can use a more powerful test, the Wilcoxon signed ranks test (Topic 20), which takes into account the ranks of the data as well as their signs when carrying out such an analysis.

1 Define the null and alternative hypotheses under study
H0: the median in the population equals λ
H1: the median in the population does not equal λ.

2 Collect relevant data from a sample of individuals

3 Calculate the value of the test statistic specific to H0
Ignore all values that are equal to λ, leaving n' values. Count the values that are greater than λ. Similarly, count the values that are less than λ. (In practice this will often involve calculating the difference between each value in the sample and λ, and noting its sign.) Consider r, the smaller of these two counts.

If n' ≤ 10, the test statistic is r.

If n' > 10, calculate

z = (|r − n'/2| − 1/2) / ((√n')/2)

where n'/2 is the number of values above (or below) the median that we would expect if the null hypothesis were true. The vertical bars indicate that we take the absolute (i.e. positive) value of the number inside the bars. The distribution of z is approximately Normal. The subtraction of 1/2 in the formula for z is a continuity correction, which we have to include to allow for the fact that we are relating a discrete value (r) to a continuous distribution (the Normal distribution).

4 Compare the value of the test statistic to values from a known probability distribution
If n' ≤ 10, refer r to Appendix A6.
If n' > 10, refer z to Appendix A1.

5 Interpret the P-value and results
Interpret the P-value and calculate a confidence interval for the median; some statistical packages provide this automatically. If not, we can rank the values in order of size and refer to Appendix A7 to identify the ranks of the values that are to be used to define the limits of the confidence interval. In general, confidence intervals for the median will be larger than those for the mean.
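The large-sample version of these steps (n' > 10) is straightforward to program. The function below is a direct sketch of steps 1-5, implemented here by hand rather than taken from a library; the stand-in data are constructed only to reproduce the counts used later in the worked example.

```python
import math
from scipy.stats import norm

def sign_test(values, median0):
    """Sign test for H0: population median = median0, using the
    large-sample Normal approximation (n' > 10) with the
    continuity correction described in step 3."""
    diffs = [v - median0 for v in values if v != median0]  # ignore ties with median0
    n_prime = len(diffs)
    n_pos = sum(d > 0 for d in diffs)
    r = min(n_pos, n_prime - n_pos)                        # the smaller count
    z = (abs(r - n_prime / 2) - 0.5) / (math.sqrt(n_prime) / 2)
    p = 2 * (1 - norm.cdf(z))                              # two-tailed P-value
    return z, p

# Stand-in data with 231 non-zero differences, of which 96 are negative (r = 96)
values = [1.0] * 135 + [-1.0] * 96
z, p = sign_test(values, 0.0)
print(round(z, 2), round(p, 3))                            # 2.5 0.012
```

Because only the signs of the differences are used, the test makes no distributional assumptions, which is exactly why it discards information relative to the Wilcoxon signed ranks test.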
Example

There is some evidence that high blood triglyceride levels are associated with heart disease. As part of a large cohort study on heart disease, triglyceride levels were available in 232 men who developed heart disease over the 5 years after recruitment. We are interested in whether the average triglyceride level in the population of men from which this sample is chosen is the same as that in the general population. A one-sample t-test was performed to investigate this. Triglyceride levels are skewed to the right; log triglyceride levels are approximately Normally distributed, so we performed our analysis on the log values. In the men in the general population, the mean of the log values equals 0.24 log10(mmol/L), equivalent to a geometric mean of 1.74 mmol/L.

1 H0: the mean log10(triglyceride level) in the population of men who develop heart disease equals 0.24 log10(mmol/L)
H1: the mean log10(triglyceride level) in the population of men who develop heart disease does not equal 0.24 log10(mmol/L).

2 Sample size, n = 232; mean of log values, x̄ = 0.31 log10(mmol/L); standard deviation of log values, s = 0.23 log10(mmol/L).

3 Test statistic, t = (0.31 − 0.24)/(0.23/√232) = 4.6

4 We refer t to Appendix A2: P < 0.001.

5 There is strong evidence to reject the null hypothesis that the geometric mean triglyceride level in the population of men who develop heart disease equals 1.74 mmol/L. The geometric mean triglyceride level in this population is estimated as antilog(0.31) = 10^0.31 = 2.04 mmol/L, and the 95% confidence interval for the geometric mean ranges from 1.91 to 2.19 mmol/L (i.e. antilog[0.31 ± 1.96 × 0.23/√232]). Therefore, in this population of patients, the geometric mean triglyceride level is significantly higher than that in the general population.
We can use the sign test to carry out a similar analysis on the untransformed triglyceride levels as this does not make any distributional assumptions. We assume that the median and geometric mean triglyceride level in the male population are similar.

1 H0: the median triglyceride level in the population of men who develop heart disease equals 1.74 mmol/L.
H1: the median triglyceride level in the population of men who develop heart disease does not equal 1.74 mmol/L.

2 In this data set, the median value equals 1.94 mmol/L.

3 We investigate the differences between each value and 1.74. There are 231 non-zero differences, of which 135 are positive and 96 are negative. Therefore, r = 96. As the number of non-zero differences is greater than 10, we calculate:

z = (|96 − 231/2| − 1/2)/((√231)/2) = (|96 − 115.5| − 0.5)/7.60 = 2.50

4 We refer z to Appendix A1: P = 0.012.

5 There is evidence to reject the null hypothesis that the median triglyceride level in the population of men who develop heart disease equals 1.74 mmol/L. The formula in Appendix A7 indicates that the 95% confidence interval for the population median is given by the 101st and 132nd ranked values; these are 1.77 and 2.16 mmol/L. Therefore, in this population of patients, the median triglyceride level is significantly higher than that in the general population.

Data kindly provided by Ms F.C. Lampe, Mr M. Walker and Dr P. Whincup, Department of Primary Care and Population Sciences, Royal Free and University College Medical School, Royal Free Campus, London, UK.
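The t-test arithmetic in the Example can be reproduced directly from the quoted summary statistics (n = 232, mean 0.31 and standard deviation 0.23 on the log10 scale, hypothesized mean 0.24); this sketch works from the summaries alone, since the raw data are not given.

```python
import math
from scipy import stats

# Summary statistics quoted in the Example (log10 triglyceride values)
n, xbar, s, mu1 = 232, 0.31, 0.23, 0.24

# One-sample t-test computed from the summary statistics
t = (xbar - mu1) / (s / math.sqrt(n))
p = 2 * stats.t.sf(abs(t), df=n - 1)          # two-tailed P-value

# 95% CI for the geometric mean, back-transformed from the log scale
half = 1.96 * s / math.sqrt(n)
ci = (10 ** (xbar - half), 10 ** (xbar + half))
print(f"t = {t:.2f}, P = {p:.2g}, geometric mean = {10 ** xbar:.2f}, "
      f"95% CI ({ci[0]:.2f}, {ci[1]:.2f})")
```

Note that the confidence interval is calculated on the log scale and only then antilogged, which is why it is not symmetric about the geometric mean.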
20 Numerical data: two related groups

The problem

We have two samples that are related to each other and one numerical or ordinal variable of interest.
The variable may be measured on each individual in two circumstances. For example, in a cross-over trial (Topic 13), each patient has two measurements on the variable, one while taking active treatment and one while taking placebo.
The individuals in each sample may be different, but are linked to each other in some way. For example, patients in one group may be individually matched to patients in the other group in a case-control study (Topic 16).

Such data are known as paired data. It is important to take account of the dependence between the two samples when analysing the data, otherwise the advantages of pairing (Topic 13) are lost. We do this by considering the differences in the values for each pair, thereby reducing our two samples to a single sample of differences.

The paired t-test

Assumptions
In the population of interest, the individual differences are Normally distributed with a given (usually unknown) variance. We have a reasonable sample size so that we can check the assumption of Normality.

Rationale
If the two sets of measurements were the same, then we would expect the mean of the differences between each pair of measurements to be zero in the population of interest. Therefore, our test statistic simplifies to a one-sample t-test (Topic 19) on the differences, where the hypothesized value for the mean difference in the population is zero.

Additional notation
Because of the paired nature of the data, our two samples must be of the same size, n. We have n differences, with sample mean, d̄, and estimated standard deviation sd.

1 Define the null and alternative hypotheses under study
H0: the mean difference in the population equals zero
H1: the mean difference in the population does not equal zero.

2 Collect relevant data from two related samples

3 Calculate the value of the test statistic specific to H0

t = (d̄ − 0)/SE(d̄) = d̄/(sd/√n)

which follows the t-distribution with (n − 1) degrees of freedom.

4 Compare the value of the test statistic to values from a known probability distribution
Refer t to Appendix A2.

5 Interpret the P-value and results
Interpret the P-value and calculate a confidence interval for the true mean difference in the population. The 95% confidence interval is given by:

d̄ ± t0.05 × (sd/√n)

where t0.05 is the percentage point of the t-distribution with (n − 1) degrees of freedom which gives a two-tailed probability of 0.05.

If the assumptions are not satisfied
If the differences do not follow a Normal distribution, the assumption underlying the t-test is not satisfied. We can either transform the data (Topic 9), or use a non-parametric test such as the sign test (Topic 19) or Wilcoxon signed ranks test to assess whether the differences are centred around zero.

The Wilcoxon signed ranks test

Rationale
In Topic 19, we explained how to use the sign test on a single sample of numerical measurements to test the null hypothesis that the population median equals a particular value. We can also use the sign test when we have paired observations, the pair representing matched individuals (e.g. in a case-control study, Topic 16) or measurements made on the same individual in different circumstances (as in a cross-over trial of two treatments, A and B, Topic 13). For each pair, we evaluate the difference in the measurements. The sign test can be used to assess whether the median difference in the population equals zero by considering the differences in the sample and observing how many are greater (or less) than zero.
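Both the paired t-test and the Wilcoxon signed ranks test are available in scipy; the cross-over data below are simulated for illustration only. The sketch also confirms that the paired t-test is literally a one-sample t-test on the within-pair differences.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
# Hypothetical cross-over trial: the same 15 patients measured once on
# placebo and once on active treatment (paired data)
placebo = rng.normal(loc=100, scale=10, size=15)
active = placebo - rng.normal(loc=5, scale=4, size=15)

# The paired t-test equals a one-sample t-test on the differences
t_paired, p_paired = stats.ttest_rel(placebo, active)
t_diff, p_diff = stats.ttest_1samp(placebo - active, popmean=0)

# Non-parametric alternative when the differences are not Normal
w_stat, p_wilcoxon = stats.wilcoxon(placebo, active)

print(f"paired t: P = {p_paired:.3g}; Wilcoxon signed ranks: P = {p_wilcoxon:.3g}")
```

Analysing the same data with an unpaired test would ignore the within-patient dependence and waste the benefit of pairing.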