Important Announcement
PubHTML5 Scheduled Server Maintenance on (GMT) Sunday, June 26th, 2:00 am - 8:00 am.
PubHTML5 site will be inoperative during the times indicated!

Home Explore Survival analysis and Cox regression model

Survival analysis and Cox regression model

Published by chalard boonchan, 2018-07-17 05:17:52

Description: Survival analysis and Cox regression model 28-Jun-18_2 by nontiya

Search

Read the Text Version

Survival analysis and Cox regression model อาจารย์ ดร. นนท์ธิยา หอมขาํ 1

Introduction• Survival analysis methods are used to analyze the delay for a specific event to be observed:- death,- being cured after treatment, …• Survival analysis methods take into account that the event of interest may not be observed in some individuals.• Therefore, survival analysis methods are required to:- describe the delay to the event of interest,- compare the delay to the event in different groups,- estimate some factors’ effect on the delay to the event. 2

Introduction• Let’s denote T, the random variable representing the survival time.• The survival function, denoted S(t), is given by: S(t) = Proba[T > t] Representation of the survival function S(t)Median survival time 3

Overview1. Survival data2. Declaring survival data using STATA3. Non parametric survival analysis - Mortality rate - Kaplan-Meier estimates4. Semi-parametric survival analysis: Cox model 4

Introduction• To be able to perform a survival analysis, some information is required: - whether the event is observed or not, - the duration under observation (follow-up).• Data must be organized in a specific way.• Then it is possible to perform: - non parametric survival analysis (Kaplan-Meier estimates), - semi parametric survival analysis (Cox model), - parametric survival analysis. 5

1. Survival data• Individuals enter the study at different dates. For each individual, the inclusion date must be known.• A date corresponding to study end date has to be chosen. The same for all individuals.• At the study end date, the event can be: → observed, → not yet observed, and the individual is s$ll followed, → we do not know, as the individual was lost to follow-up earlier. 6

1. Survival data• Representation of the follow-up in 4 individuals• At the end of the study : 7 - the event is observed for the individual 1, - the event is not yet observed for individuals 2 and 3, - we do not know if the event is observed for the individual 4.

1. Survival data• If the event occurred before the study end date - the survival time for the individual is observed.• otherwise - the survival time for the individual is said to be censured.• Survival time is censored if the follow-up time was not long enough for the event to be observed.• The survival time is right-censored both in: 8 - individuals who reached the end of study without event,• and - lost to follow-up individuals.

1. Survival data• To summarize, information required to perform a survival analysis are: - the date of inclusion, - the date of end of the study, - the date at which the event occurred, ⇒ this information enables us to define the survival time. - binary variable indicating whether the event occurred. 9

2. Declaring survival data with StataIn the database: 1 line per individual (horizontal database).Example : Undernutrition after start received vitamin in children 9-15 year(cohort study)ID undernut startdate undernut_d LTFU_d last_date We must define :176 no 1-Apr-05 31-Dec-13 1- the survival time for each individual.177 yes 17-Sep-05 21-Jan-06 21-Jan-06 2- whether an survival time is178 no 11-Jul-05 7-Dec-06 7-Dec-06 censored or not at the end of the follow-up.179 no 10-Oct-06 31-Dec-13180 no 4-Feb-03 31-Dec-13181 no 7-Jun-07 31-Dec-13182 no 19-Jan-09 19-Nov-09 19-Nov-09183 yes 14-Jul-01 21-Oct-01 21-Oct-01184 yes 20-Oct-06 17-Apr-10 17-Apr-10185 yes 2-Apr-03 25-Oct-03 25-Oct-03186 yes 20-Dec-08 1-Oct-09 1-Oct-09187 yes 6-May-03 20-Aug-06 20-Aug-06 10

2. Declaring survival data with StataCreation of a new variable containing the survival time: gen dur_undernut = (last_date -startdate)/(365.25)Then, the variables defining the survival time must be declaredusing the instruction stset : stset dur_undernut , failure(undernut=1 ) 11

2. Declaring survival data with Stata• To validate and describe the survival data: stdesstdes failure _d: undernut == 1 analysis time _t: dur_undernut |-------------- per subject --------------|Category total mean min median max------------------------------------------------------------------------------no. of subjects 6474no. of records 6474 1 1 1 1(first) entry time 0 000(final) exit time 4.887166 .0821355 3.557837 12.93087subjects with gap 0 4.887166 .0821355 3.557837 12.93087time on gap if gap 0time at risk 31639.515failures 1215 .1876738 001------------------------------------------------------------------------------ 12

3. Non parametric survival analysis: Mortality rate• The mortality rate measures the frequency of death (or event), with respect to the duration of observation : rate = total number of death − up number of years of follow• The mortality rate is expressed in number of deaths per person-years.• This is often the first quantity estimated in a survival analysis.Strate → mortality rate (+ 95% confidence interval)strate name_var → mortality rate for each category of the covariate name_varstmh name_var → compares mortality rates to 1 category (=reference) of name_var 13

3. Non parametric survival analysis: Mortality rate• Example : Incidence of undernutrition.strate failure _d: undernut == 1analysis time _t: dur_undernutEstimated rates and lower/upper bounds of 95% confidenceintervals(6474 records included in the analysis)+-------------------------------------------------+|D Y Rate Lower Upper ||-------------------------------------------------|| 1215 3.2e+04 0.038401 0.036302 0.040622 |+-------------------------------------------------+ How to interpret this result? 14

3. Non parametric survival analysis: Mortality rate• Example : Incidence of undernutrition.strate belief+----------------------------------------------------------+| belief D Y Rate Lower Upper ||----------------------------------------------------------|| high 1068 2.9e+04 0.036714 0.034577 0.038983 || low 103 2.0e+03 0.051848 0.042743 0.062893 |+----------------------------------------------------------+stmh beliefMaximum likelihood estimate of the rate ratio comparing belief==1 vs. belief==0RR estimate, and lower and upper 95% confidence limits---------------------------------------------------------- RR chi2 P>chi2 [95% Conf. Interval]----------------------------------------------------------1.412 11.30 0.0008 1.154 1.729---------------------------------------------------------- 15

3. Non parametric survival analysis: Mortality rate• The Kaplan-Meier estimate is a graphical representation of the survival function S(t). Because it does not assume a specific distribution: ⇒ non-parametric method.• Principle of the Kaplan-Meier estimates: ⇒ being alive at time t is ⇒ being alive just before time t, and not dying at time t.• At each survival time ti, the conditional probability Qi is estimated: Qi = probability of a survival time >ti, conditional on being alive just before tiQi = Ni − Di With Ni = number of individuals at risk just before ti Ni Di = number of death at ti 16

3. Non parametric survival analysis: Mortality rate• At t0, Q0 = 1.• The estimation of the survival function is thus obtained by:∏ ∏S(ˆt) = Ni - Di Q= Nti <tti<t i i• The conditional probability is only modified when a death (or event) occurs ⇒ the Kaplan-Meier survival function is a series of horizontal steps. 17

3. Non parametric survival analysis: Mortality rate sts graph → survival function sts graph, failure → mortality function = 1-S(t)Example : Incidence of undernutrition (in year)Kaplan-Meier survival estimate Kaplan-Meier survival estimate0.00 0.25 0.50 0.75 1.00 0.00 0.25 0.50 0.75 1.00012345 012345 Time (years)Time (years) 18

3. Non parametric survival analysis: Mortality rate• The Kaplan-Meier estimate is a graphical representation of the survival function ⇒ it could also be interesting to estimate its precision.• It is possible to estimate the confidence interval for the survival function.Greenwood’s formulae : ∑Var[Sˆ (ti )] = Sˆ (ti )2 Dj jN - Dj|t j ≤ti jthe variance only changes when a death occurs.⇒ 95% confidence interval : Sˆ (ti ) ± 1.96 var Sˆ (ti ) 19

3. Non parametric survival analysis: Mortality rate sts graph, gwood → survival curve + 95% confidence interval Kaplan-Meier survival estimate0 .25 .5 .75 1 0 5 10 15 analysis time 95% CI Survivor function 20

3. Non parametric survival analysis: Mortality rate• Often, we want to know the probability of living at least until a certain time. sts list, at(#) → Estimates the probability to survive at least until the specified time #.sts list, at(0 1 5) failure _d: undernut == 1analysis time _t: dur_undernut Beg. Survivor Std.Time Total Fail Function Error [95% Conf. Int.]-------------------------------------------------------------------------------0 00 1.0000 . ..1 5651 264 0.9574 0.0026 0.9520 0.96215 2570 951 0.7523 0.0064 0.7396 0.7645-------------------------------------------------------------------------------Note: survivor function is calculated over full data and evaluated atindicated times; it is not calculated from aggregates shown at left. 21

3. Non parametric survival analysis: Mortality rate• It is also possible to represent the survival curves in different categories of a covariate. sts graph [if exp] [in range] , by(varlist) failureExample : 0.00 0.25 0.50 0.75 1.00 Kaplan-Meier survival estimate, by educaIncidence of undernutritionby maternal education level 01234 5 Time (years) 22 educa = high educa = low• How to decide if the survival curves are different? → statistical test.

3. Non parametric survival analysis: Mortality rate• Let’s assume we want to compare the survival curves in 2 groups.Assumptions : H0 : SA(t) = SB(t) HA : SA(t) ≠ SB(t)• The most common statistical test = logrank test. sts test varname , logrank 23

3. Non parametric survival analysis: Mortality rate• Principle of the logrank test:• At each time of death ti, we compare the repartition of death between the 2 groups.• At the time ti, we have: dead Not dead Group A d Ai n Ai - dAi n Ai Group B d Bi n Bi - dBi n Bi Total di ni - di ni• The number of death at time ti in group A is a random variable, denoted DAi, distributed according to a hypergeometrical law.• Under the assumption H0, the expectation and the variance of DAi are:E[DAi ] = di n Ai var[DAi ] = di (ni - di ) (n Ain Bi ) ni n 2 (n i -1) 24 i

3. Non parametric survival analysis: Mortality rate• The total number of deaths in group A, denoted DA, is also a random variable. Under the assumption H0, the expectation and variance of DA are:∑E[DA ] = iE[DAi ] ∑V[DA ] =V[D Ai ] i• Then we compare the observed number of deaths and the expected number of deaths under the assumption H0:(OA - EA )2 ~ χ12 ⇒ With this test, we can conclude whether the VA survival curves are statistically different or not. 25

3. Non parametric survival analysis: Mortality rateExample : sts test educa, logrankIncidence of undernutritionby maternal education level0.00 0.25 0.50 0.75 1.00 Kaplan-Meier survival estimate, by educa Log-rank test for equality of survivor functions | Events Events educa | observed expected ------+------------------------- high | 945 1003.12 low | 202 143.88 ------+------------------------- Total | 1147 1147.00 012345 chi2(1) = 26.85 Time (years) Pr>chi2 = 0.0000 educa = high educa = low 26

3. Non parametric survival analysis: Mortality rate• The logrank test can be generalized to the comparison of >2 groups.Assumptions: H0 : the k survival curves are identical. H1 : at least 1 survival curve is different.• Again, the logrank test is based on the comparison of observed and estimated repartition of deaths within the k categoriesNotations :ni = total number of individuals at risk at time of death tinij = number of individuals at risk at time ti, in category jdi = total number of deaths at time tidij = number of deaths at time ti, in category j 27

3. Non parametric survival analysis: Mortality rate• Let’s denote v = (v1, v2, …, vk) the vector in which all element correspond to the difference between the expected and observed number of deaths, in category j.∑Vj = dij - di n ij nii• The covariance matrix, denoted V, is constituted of k×k elements:∑Vj1 = di (ni - di ) (n i n i1δ - n ijn i1 ) With: δj1 = 1 if j =1, δj1 = 0 otherwise n 2 (n i -1) j1 ii• The chi-square statistic is: v' V -1v ~ χ2 k -1 28

3. Non parametric survival analysis: Mortality rate sts test provinces, logrankExample : Log-rank test for equality of survivor functionsIncidence of undernutritionby provinces | Events Events Kaplan-Meier survival estimate, by provinces provinces | observed expected -----------+-------------------------0.00 0.25 0.50 0.75 1.00 Chiang Mai | 83 142.57 Chiang Rai | 77 79.18 Lampang | 103 125.08 Lamphun | 43 119.01 Loei | 91 50.15 Nan | 54 101.32 Phrae | 259 223.52 Ranong | 334 202.10 Phuket | 61 42.54 012345 Tak | 50 58.65 Time (years) Yala | 60 70.87 provinces = Chiang Mai provinces = Chiang Rai -----------+------------------------- provinces = Lampang provinces = Lamphun provinces = Loei provinces = Nan Total | 1215 1215.00 provinces = Phrae provinces = Ranong provinces = Phuket provinces = Tak chi2(10) = 235.69 provinces = Yala Pr>chi2 = 0.0000 29

3. Non parametric survival analysis: Mortality rate• The logrank test is the: - the most commonly used test, - more powerful if the instantaneous hazard rate is constant.• For example, in the comparison of 2 survival curves : ( ) = constant• If the survival curves cross ⇒ the ratio of instantaneous hazard rate is not constant ⇒ the logrank test can not be used. 30





4- Semi parametric survival analysis: Cox mode• The Cox model is a proportional hazard model ⇒ The effect of a covariate on survival remains the same in time.• Let’s consider a Cox model including 1 binary covariate (sex: 0=man / 1=woman).• At any time t, the ratio between the hazard of death in men and women is given by: h(t, X = 1) = eβ h(t, X = 0) the risk of death in women is always eβ fold higher than in men. ⇒ This quantity is known as the hazard ratio (HR). 33

4- Semi parametric survival analysis: Cox mode• Nevertheless, proportional hazard does not mean constant hazard.h0(t) = hazard function in a patient with all covariates X=0. 34 this hazard function can have any form.

4- Semi parametric survival analysis: Cox mode• Moreover, proportional hazards does not mean that survival curves are parallels. Kaplan-Meier survival estimate, by educa0.00 0.25 0.50 0.75 1.00 012345 Time (years) educa = high educa = low 35

4- Semi parametric survival analysis: Cox modeCategorized covariates:• Binary covariate → may be directly introduced into a Cox model h(t, X = 1) = eβ h(t, X = 0) • Example : incidence of undernutrition Effect of the covariate maternal education on the incidence of undernutrition xi: stcox i.sex (you must have declared the survival data using stset before being able to perform a Cox model) 36

4- Semi parametric survival analysis: Cox modexi: stcox i.educa failure _d: undernut == 1analysis time _t: dur_undernutIteration 0: log likelihood = -9586.4376Iteration 1: log likelihood = -9574.6388Iteration 2: log likelihood = -9574.3052Iteration 3: log likelihood = -9574.305Refining estimates:Iteration 0: log likelihood = -9574.305Cox regression -- Breslow method for tiesNo. of subjects = 6260 Number of obs = 6260No. of failures = 1147Time at risk = 30603.92061Log likelihood = -9574.305 LR chi2(1) = 24.27 Prob > chi2 = 0.0000------------------------------------------------------------------------------_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------_Ieduca_2 | 1.490285 .1155236 5.15 0.000 1.280225 1.734813------------------------------------------------------------------------------ 37

4- Semi parametric survival analysis: Cox mode• Covariate with k (k>2) categories (k-1) binary covariates must be included in the Cox model• Example for a covariate with 3 categories: creation of 2 dummy variablesCategories X1 X21 0 0 Reference category2 103 01The Cox model is: h(t, X1, X2 ) = h0 (t)eX1β1+X2β2HR are given by: h(t, X1 = 1, X2 = 0) = eβ1 h(t, X1 = 0 , X2 = 0) h(t, X1 = 0 , X2 =1) = eβ2 38 h(t, X1 = 0 , X2 = 0)

4- Semi parametric survival analysis: Cox mode• Stata can automatically create the dummy variables when categorical covariates are involved: xi: stcox i.covariate• By default, the reference category is the one with the lowest value.• It may be more relevant to chose another category as the reference. One can define the category to be used as reference using the following instruction before using the Cox model : char covariate[omit] value • Example : incidence of undernutrition Effect of the age (in categories) on the incidence of undernutrition 39

4- Semi parametric survival analysis: Cox modegen age_cat= 1 if age <= 29replace age_cat= 2 if age > 29 & age <= 39replace age_cat= 3 if age > 39 & age != .char age_cat[omit] 2xi: stcox i.age_cati.age_cat _Iage_cat_1-3 (naturally coded; _Iage_cat_2 omitted) failure _d: undernut == 1analysis time _t: dur_undernutNo. of subjects = 6430 Number of obs = 6430No. of failures = 1206Time at risk = 31406.76524Log likelihood = -10089.937 LR chi2(2) = 44.97 Prob > chi2 = 0.0000------------------------------------------------------------------------------ _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------_Iage_cat_1 | .8669935 .0519188 -2.38 0.017 .7709793 .9749648_Iage_cat_3 | 2.189806 .2780528 6.17 0.000 1.707354 2.808587------------------------------------------------------------------------------ 40

4- Semi parametric survival analysis: Cox modeContinuous covariates:• Can be directly introduced into a Cox model.h(t, Xi = xi ) = e(xi -xi' )β Each 1-unit increase of the covariate has theh(t, Xi' = xi' ) same effect on survival. Stata gives the HR corresponding to an increase of 1 unit of the covariate X.• If a continuous covariate is directly entered in a Cox model ⇒ assumes a linear relationship with survival. 41

4- Semi parametric survival analysis: Cox mode• Example : effect of the age on the incidence of undernutritionIncrease in the risk of death:For an individual aged 21, compared to an individual aged 20: eβ(21-20) = eβFor an individual aged 37, compared to an individual aged 36: eβ(37-36) = eβ 42

4- Semi parametric survival analysis: Cox modestcox age failure _d: undernut == 1 analysis time _t: dur_undernutNo. of subjects = 6430 Number of obs = 6430No. of failures = 1206Time at risk = 31406.76524Log likelihood = -10099.083 LR chi2(1) = 26.68 Prob > chi2 = 0.0000------------------------------------------------------------------------------_t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------age | 1.026531 .0051889 5.18 0.000 1.016411 1.036752------------------------------------------------------------------------------ 43

4- Semi parametric survival analysis: Cox mode• Compare between categories and continue variablestcox ageLog likelihood = -10099.083 Prob > chi2 = 0.0000------------------------------------------------------------------------------ _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------age | 1.026531 .0051889 5.18 0.000 1.016411 1.036752------------------------------------------------------------------------------xi: stcox i.age_cati.age_cat _Iage_cat_1-3 (naturally coded; _Iage_cat_2 omitted)Log likelihood = -10089.937 Prob > chi2 = 0.0000------------------------------------------------------------------------------ _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------_Iage_cat_1 | .8669935 .0519188 -2.38 0.017 .7709793 .9749648_Iage_cat_3 | 2.189806 .2780528 6.17 0.000 1.707354 2.808587------------------------------------------------------------------------------ 44

4- Semi parametric survival analysis: Cox mode• The Cox model makes the assumption of proportional hazards ⇒ at any time t, the ratio of hazards is constant between the differentcategories of a covariate.• This is a very strong assumption 45 ⇒ gives its meaning to the HR, ⇒ that must hold.

4- Semi parametric survival analysis: Cox modeMethod 0 – If the survival curves for different categories of a covariate cross, wecan be sure that the proportional hazard assumption does not hold. ⇒ the covariate can not be entered in a Cox model. 46

4- Semi parametric survival analysis: Cox mode• Method 1 – Test based on Schoenfeld’s residuals.• Schoenfeld’s residuals: - are estimated at each time when an event occurred, - correspond to the difference between the profile of the individual with the event, and the profiles of the individuals who were at risk. R i,j = Xi,j − X j (ti ,βˆ)value for covariate j Weighted Mean for covariate j in for individual i individuals at risk at ti• Weights are proportional to estimated risks : 47

4- Semi parametric survival analysis: Cox mode• If the proportional hazards assumption holds ⇒ residuals are linear with time.• To check the proportional hazards assumption: - linear regression on Schoenfeld’s residuals, ⇒ test if the slope in the linear regression = 0. 48

4- Semi parametric survival analysis: Cox mode• Before being able to test Schoenfeld’s residuals, a Cox model has to be applied, and th residuals have to be stored.stcox covariate, schoenfeld(new_var1*) scaledsch(new_var2*)stphtest, detail → test on Schoenfeld’s residuals 49

4- Semi parametric survival analysis: Cox mode• Example: Incidence of undernutritionDoes the proportional hazards assumption holds for the covariate education?xi: stcox i.educa, scaledsch(residual1*) schoenfeld(residual2*)------------------------------------------------------------------------------ _t | Haz. Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------_Ieduca_2 | 1.490285 .1155236 5.15 0.000 1.280225 1.734813------------------------------------------------------------------------------stphtestTest of proportional-hazards assumptionTime: Time---------------------------------------------------------------- | chi2 df Prob>chi2------------+---------------------------------------------------global test | 0.20 1 0.6581----------------------------------------------------------------drop residual* 50


Like this book? You can publish your book online for free in a few minutes!
Create your own flipbook