Appraisal
Critically Appraised Papers

Targeted physiotherapy treatment for low back pain based on clinical risk can improve clinical and economic outcomes when compared with current best practice

Synopsis
Summary of: Hill JC et al (2011) Comparison of stratified primary care management for low back pain with current best practice (STarT Back): a randomised controlled trial. Lancet 378: 1560–1571. Published Online September 29, 2011 DOI:10.1016/S0140-6736(11)60937-9 [Prepared by Margreth Grotle and Kåre Birger Hagen, CAP Editors.]

Question: Does a stratified primary care approach for patients with low back pain result in clinical and economic benefits when compared with current best practice? Design: A randomised, controlled trial with stratification into three risk groups and treatment targeted according to the risk profile. Group allocation was carried out by computer-generated block randomisation in a 2:1 ratio. Setting: Ten general practices in England. Participants: Men and women at least 18 years old with low back pain of any duration, with or without associated radiculopathy. Exclusion criteria were potentially serious disorders, serious illness or comorbidity, spinal surgery in the past 6 months, pregnancy, and receiving back treatments (except primary care). Interventions: In the intervention group, patients were allocated to a risk group using the STarT Back Screening Tool, and decisions about referral were based on that risk group. The 30-min assessment and initial treatment focused on promotion of appropriate levels of activity, including return to work, and participants received a pamphlet about local exercise venues and self-help groups, the Back Book, and a 15-min educational video, Get Back Active. Low-risk patients were only given this clinic session. Medium-risk patients were referred for standardised physiotherapy to address symptoms and function. High-risk patients were referred for psychologically informed physiotherapy to address physical symptoms and function, and psychosocial obstacles to recovery. In the control group, a 30-min physiotherapy assessment and initial treatment including advice and exercises was provided, with the option of onward referral to further physiotherapy based on the physiotherapist's clinical judgement. Outcome measures: The primary outcome was the Roland and Morris Disability Questionnaire (RMDQ) score at 12 months. Secondary measures were referral for further physiotherapy, back pain intensity, pain catastrophising, fear-avoidance beliefs, anxiety, depression, health-related quality of life, reduction of risk subgroup, global change of pain, number of physiotherapy treatment sessions, adverse events, health-care resource use and costs over 12 months, number of days off work because of back pain, and satisfaction with care. Results: Of 851 patients assigned to the intervention (n = 568) and control (n = 283) groups, a total of 649 completed the 12 months follow-up. Adjusted mean changes in RMDQ scores were significantly higher in the intervention group than in the control group at 4 months (4.7 [SD 5.9] vs 3.0 [5.9], between-group difference 1.8 [95% CI 1.6 to 2.6]) and at 12 months (4.3 [6.4] vs 3.3 [6.2], 1.1 [0.6 to 1.9]). At 12 months, stratified care was associated with a mean increase in generic health benefit (0.039 additional QALYs) and cost savings (£240.01 vs £274.40) compared with the control group. There were significant differences in favour of the intervention group in many of the secondary outcomes. Conclusion: A stratified management approach, including prognostic screening and treatment targeting, showed clinical and economic benefits when compared with current best practice.

Commentary
This trial represents a new and promising approach for the physiotherapy management of low back pain in primary care. By using a previously validated and simple-to-use prognostic screening tool developed in a primary care physician setting, Hill and colleagues found that a stratified management approach, in which prognostic screening and treatment targeting were combined, resulted in improved primary care efficiency of physiotherapy. The potential for targeting treatment has been emphasised as a research priority (Borkan and Cherkin 1996, Bouter et al 1998). The study is well-conducted, powered to detect differences between subgroups, and satisfies the recommendations for studying subgroups of responders to physiotherapy interventions (Hancock et al 2009). The results are consistent and in favour of the intervention group across most outcome variables, including the cost-effectiveness analysis.

It should however be noted that the difference between groups in the main outcome variable (Roland Morris Disability Questionnaire) reached the pre-specified level of 2.5 at only one time point (2.5 (95% CI 0.9 to 4.2) in the high-risk group at 4 months follow-up) and ranged from 0.1 (95% CI –1.1 to 1.4) to 2.0 (95% CI 0.8 to 3.2) for all other comparisons. This effect is similar to that in other primary care trials. Further, drop-out was substantial (almost 25% at 12 months follow-up), and a co-intervention consisting of a 15-minute educational video and the Back Book, given to all participants in the intervention group, may have influenced the results of the prognostic screening and targeted treatment. The study is nevertheless much needed and shows that physiotherapy management of low back pain can be improved. The promising approach by Hill and colleagues, together with other recent literature indicating that patients with low back pain are heterogeneous and benefit from targeted treatment, should be implemented by physiotherapists and further developed to find the best treatment strategy for this large and costly patient group.

Kjersti Storheim
Communication and Research Unit for Musculoskeletal Disorders (FORMI) and Orthopaedic Department, Oslo University Hospital and University of Oslo, Norway

References
Borkan JM, Cherkin DC (1996) Spine 15: 2880–2884.
Bouter LM et al (1998) Spine 15: 2014–2020.
Hancock M et al (2009) Phys Ther 89: 698–704.

Journal of Physiotherapy 2012 Vol. 58 – © Australian Physiotherapy Association 2012 57
Dalton et al: Reliability of the Assessment of Physiotherapy Practice

The Assessment of Physiotherapy Practice (APP) is a reliable measure of professional competence of physiotherapy students: a reliability study

Megan Dalton1,3, Megan Davidson2 and Jennifer L Keating1
1Department of Physiotherapy, Monash University, 2School of Allied Health, La Trobe University, 3Griffith University, Australia

Question: What is the inter-rater reliability of the Assessment of Physiotherapy Practice (APP) instrument, and what is the error associated with individual scores? Design: Cross-sectional inter-rater reliability study. Thirty pairs of clinical educators each assessed one student after observing student practice over a 5-week clinical placement. Participants: Sixty clinical educators from five Australian universities formed 30 independent pairs of assessors. Outcome measures: Each pair completed two independent assessments of one student, providing 60 completed APP assessments and an associated Global Rating Scale score for analysis. Analysis: Correlation coefficients and measurement error expressed in APP scale units were computed to provide a comprehensive analysis of the likely utility of APP scores and to enable score and change score interpretation. Results: Percentage of agreement between assessors for each item ranged from 56% (Item 19, evidence-based practice) to 83% (Item 20, risk management) and averaged 70% (SD 7) across all items. The ICC(2,1) was 0.92 (95% CI 0.84 to 0.96) for the total APP score and 0.72 (95% CI 0.50 to 0.86) for the Global Rating Scale. The standard error of measurement for the total score (scale width 0–80) was 3.2 APP points and the MDC90 was 7.86, representing 9% of the scale width. Bland-Altman analyses identified no systematic differences between raters.
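The SEM, the 95% band around a single score and the MDC90 reported here are linked by simple arithmetic. The Python sketch below reproduces it from the reported SEM of 3.2 points and the t-values stated in the Results (2.045 and 1.699, df = 29). It assumes the common formulations band = t × SEM and MDC = t × √2 × SEM; the paper gives the t-values but not the formula, so treat that as an assumption. The function names (`band`, `mdc`, `sem_implied_by_mdc`, `classify`) are hypothetical, and `classify` encodes the borderline rule discussed later in the paper (pass mark 40 ± 6.5) with inclusive boundaries, which is also an assumption.

```python
import math

# Values reported for the total APP score (scale width 0-80, 30 rater pairs).
SEM_TOTAL = 3.2      # standard error of measurement, APP points (as rounded in the text)
SCALE_WIDTH = 80
T_95 = 2.045         # two-tailed t, alpha = 0.05, df = 29
T_90 = 1.699         # two-tailed t, alpha = 0.10, df = 29


def band(sem: float, t_crit: float) -> float:
    """Half-width of the confidence band around a single observed score."""
    return t_crit * sem


def mdc(sem: float, t_crit: float) -> float:
    """Minimal detectable change; sqrt(2) allows for error in both scores."""
    return t_crit * math.sqrt(2) * sem


def sem_implied_by_mdc(mdc_value: float, t_crit: float) -> float:
    """Invert mdc() to recover the unrounded SEM behind a reported MDC."""
    return mdc_value / (t_crit * math.sqrt(2))


def classify(score: float, pass_mark: float = 40.0, margin: float = 6.5) -> str:
    """Hypothetical rule: outright fail or pass only outside pass_mark +/- margin."""
    if score <= pass_mark - margin:
        return "fail"
    if score >= pass_mark + margin:
        return "pass"
    return "borderline"


if __name__ == "__main__":
    band95 = band(SEM_TOTAL, T_95)
    mdc90 = mdc(SEM_TOTAL, T_90)
    print(f"95% band around one score: +/-{band95:.1f} points")
    print(f"MDC90 from rounded SEM: {mdc90:.2f} points ({mdc90 / SCALE_WIDTH:.1%} of scale)")
    print(f"SEM implied by reported MDC90 of 7.86: {sem_implied_by_mdc(7.86, T_90):.2f}")
    for score in (33, 38, 47):
        print(score, classify(score))
```

With the rounded SEM of 3.2 the sketch gives an MDC90 of about 7.69; the reported 7.86 corresponds to an unrounded SEM of about 3.27. The reported MDC90 is 7.86/80 ≈ 9.8% of the scale width, which the text gives as 9%.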
Conclusion: Clinical educators demonstrated a high level of reliability when using the APP instrument to assess physiotherapy students' level of professional competence in workplace-based practice.

[Dalton M, Davidson M, Keating JL. The Assessment of Physiotherapy Practice (APP) is a reliable measure of professional competence of physiotherapy students: a reliability study. Journal of Physiotherapy]

Key words: Educational measurement, Professional competence, Clinical competence, Physical therapy (Specialty), inter-rater reliability, intraclass correlation coefficient, Physiotherapy

Journal of Physiotherapy 2012 Vol. 58 – © Australian Physiotherapy Association 2012. Open access under CC BY-NC-ND license. 49

What is already known on this topic: The Assessment of Physiotherapy Practice (APP) is a valid measure of the clinical competence of physiotherapy students. It covers professional behaviour, communication, assessment, analysis, planning, intervention, evidence-based practice and risk management.
What this study adds: Clinical educators demonstrate a high level of reliability using the APP to assess students in workplace-based practice.

Introduction
The Assessment of Physiotherapy Practice (APP) is a 20-item instrument covering professional behaviour, communication, assessment, analysis and planning, intervention, evidence-based practice, and risk management. Each item is assessed on a 5-level scale from 0 (Infrequently/rarely demonstrates performance indicators) to 4 (Demonstrates most performance indicators to an excellent standard). A rating of 2 (Demonstrates most performance indicators to an adequate standard) indicates that the minimum standard for an entry-level physiotherapist has been met. The total APP score ranges from 0 to 80. Rasch analysis of APP scores indicated that the data had adequate fit to the chosen measurement model (Rasch Partial Credit Model), that the scale was internally consistent (the Person Separation Index showed that it discriminated between four groups of students with different levels of professional competence), that the items targeted the intended construct (professional competence), and that the instrument was unidimensional (Dalton et al 2011). The APP has been widely adopted by entry-level physiotherapy programs in Australia and New Zealand.

Given the high stakes of summative assessments of clinical performance, assessment procedures should not only be feasible and practical within the clinical environment, but also demonstrate sufficient reliability and validity for the purpose (Baartman et al 2007, Epstein and Hundert 2002, Roberts et al 2006). An instrument that yields scores with inadequate consistency in different circumstances, when the underlying construct (in this case, professional competence) is unchanged, would be of limited value no matter how sound other arguments are for its validity. In the context of assessment of workplace performance, reliability is the extent to which assessment yields relatively consistent results across occasions, contexts and assessors (Baartman et al 2007). Reliability is dependent on the characteristics of the test, the conditions of administration, the group of examinees and the interaction between these factors (Streiner and Norman 2003, Wolfe and Smith 2007). While repeated, blinded testing of the same student under the same conditions in the authentic practice environment by the same assessor is not feasible in performance-based assessment, the consistency with which different assessors rate the performance of different students (inter-rater reliability) is achievable. Since inter-rater reliability
contains all the sources of error contributing to intra-rater reliability, plus differences that arise in decisions made by different observers, demonstration of adequate inter-rater reliability is sufficient evidence of adequate intra-rater reliability, which is typically higher (Streiner and Norman 2003, Wilson 2005).

Assuming that there is a true value for professional competence, two sources of error in ratings are of interest. One is the random variation in scores when the same underlying professional competence is assessed by independent assessors; the other is the systematic variation in scores. The latter may result, for example, from assessors with different expectations of entry-level competence for individual items on the APP, or from different circumstances within which the student is assessed that enable or restrict a view of student competence. Systematic variation is of interest because it may be possible to trace the source of errors of this nature and address them with methods such as standardised training of assessors, or adjustment of grades for areas of practice where higher level skills are typically expected (eg, critical care wards). Random errors are, by their nature, unpredictable. They need to be estimated and allowed for in score interpretation (Rankin and Stokes 1998).

The research question was therefore: What is the inter-rater reliability of the APP instrument, and what is the error around individual scores?

Method
This reliability study was conducted in the authentic practice environment to investigate the error in APP measurements in the typical application of the instrument (Baartman et al 2006).

Design
The inter-rater reliability trial was a cross-sectional study designed to replicate authentic assessment procedures. Sixty clinical educators formed 30 independent pairs of assessors. Since not all physiotherapy education programs routinely used shared supervision (ie, two supervisors sharing supervision of a student), the five programs where this routinely occurred were identified from the twelve entry-level physiotherapy programs in Australia, and their clinical educators were invited to participate in the trial. Replication of authentic practice meant that the assessors provided educational supervision to the students during the clinical placement, and each student (n = 30) was then assessed independently by their unique pair of educators using the APP at the end of a five-week clinical placement block. The blocks were scheduled across one university semester. Educators completed the APP and also gave students a rating of overall performance on a Global Rating Scale of not adequate, adequate, good, or excellent. Students, working with supervision, provided physiotherapy services during this placement on a full-time basis (32–40 hours/week). Approval for the study was obtained from the human ethics committees of each of the five participating universities.

Participants
Students enrolled in entry-level physiotherapy programs at five universities in Australia were assessed by educators using the APP on completion of a five-week full-time clinical placement block. Recruitment procedures optimised representation of physiotherapy clinical educators by location (metropolitan, regional/rural, and remote), clinical area of practice, years of experience as a clinical educator, and organisation (private, public, hospital based, community based, and non-government). The placements occurred during the last 18 months of the students' physiotherapy program and represented diverse areas of physiotherapy practice including musculoskeletal, cardiorespiratory, neurological, paediatric, and gerontological physiotherapy.

Table 1. Participant and placement characteristics.
Characteristic | University 1 | University 2 | University 3 | University 4 | University 5
Program | 4-year bachelor degree | 4-year bachelor degree | 4-year bachelor degree | 4-year bachelor degree | 5-year double degree
Year of study | 3 | 3 | 3/4 | 3 | 5
Students, n male:female | 1:3 | 3:3 | 2:4 | 3:2 | 3:6
Student age (yr), mean (SD) | 22 (3) | 22 (3) | 22 (3) | 23 (3) | 23 (3)
Clinical educators, n male:female | 3:5 | 4:8 | 5:7 | 4:6 | 6:12
Clinical educator age (yr), mean (SD) | 39 (9) | 37 (8) | 33 (9) | 36 (9) | 35 (9)
Facility type | Hospital | Hospital | Hospital | Hospital | Hospital
Clinical area/s (across the five universities): orthopaedics (inpatients), musculoskeletal (outpatients), cardiorespiratory, neurological rehabilitation, gerontology rehabilitation, paediatrics, community health
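Because each of the 30 students was scored independently by a unique pair of educators, the data form a students × raters table. The analyses planned for that table (ICC(2,1), exact and close percentage agreement, and Bland-Altman limits of agreement) can be sketched in plain Python as below. This is a minimal illustration, not the authors' analysis code: it assumes the Shrout and Fleiss two-way random-effects, single-rater, absolute-agreement form of ICC(2,1), and the function names and example ratings are hypothetical rather than study data.

```python
from statistics import mean, stdev


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, single rater, absolute agreement
    (Shrout and Fleiss form). `ratings` holds one row per student and one
    column per rater."""
    n, k = len(ratings), len(ratings[0])
    grand = mean(x for row in ratings for x in row)
    row_means = [mean(row) for row in ratings]
    col_means = [mean(row[j] for row in ratings) for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between students
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between raters (systematic)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual (random)

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def percent_agreement(a, b, tolerance=0):
    """Percentage of paired ratings differing by at most `tolerance` points
    (0 = exact agreement, 1 = close agreement on the 0-4 item scale)."""
    hits = sum(abs(x - y) <= tolerance for x, y in zip(a, b))
    return 100.0 * hits / len(a)


def limits_of_agreement(a, b):
    """Bland-Altman bias (mean of rater 1 minus rater 2) and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(a, b)]
    bias, sd = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


if __name__ == "__main__":
    # Hypothetical total APP scores (rater 1, rater 2) for five students.
    totals = [[52, 55], [61, 60], [45, 49], [70, 71], [58, 57]]
    r1 = [row[0] for row in totals]
    r2 = [row[1] for row in totals]
    print(f"ICC(2,1) = {icc_2_1(totals):.2f}")
    print("Bland-Altman bias and limits:", limits_of_agreement(r1, r2))
    # Hypothetical ratings on one APP item (0-4) from the two raters.
    item_r1, item_r2 = [2, 3, 2, 4, 1], [2, 2, 2, 4, 3]
    print(percent_agreement(item_r1, item_r2), percent_agreement(item_r1, item_r2, 1))
```

The absolute-agreement form penalises systematic rater differences: two raters who differ by a constant 1 point on three students give an ICC of about 0.67 rather than 1, which mirrors the distinction drawn above between systematic and random error.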
[Figure 1. Percentage agreement between raters for 20 items on APP, showing percentage exact agreement and percentage close agreement for each item (axis 0 to 75%). Percent close agreement is within 1 point on the 5-point scale. Items are ordered from Item 20 (identifies adverse events/near misses) to Item 19 (applies evidence-based practice in patient care).]

Inter-rater reliability trial procedure
Information on the reliability trial was provided in writing to the educators and students, and their written consent to participation was obtained. All clinical educators received training in the use of the APP through workshop attendance and/or access to the APP resource manual. During the trial, a member of the research group was available to answer questions by phone or email. Students were educated in the assessment process and use of the APP instrument using a standardised presentation before placements commenced, and information about the APP was included in each university's student clinical education manual. To be eligible to participate, each pair of educators had to be able to observe enough of the student's performance to complete the APP confidently at the end of the five-week placement. In addition, each participant had to be able to complete an APP assessment independently and remain blind to the scores awarded by the partner educator. Assessment data were excluded from analysis if either the student or their clinical educator did not consent to participate in the research, or if a pair of assessors did not complete the APP instrument as instructed, that is, independently and within 12 hours of each other. Participants were advised that all data would be permanently de-identified prior to data analysis.

Data management and analysis
On completion of each placement the completed APP forms were returned by mail; data were entered into a spreadsheet, matched to the paired report, and de-identified prior to analysis. Planned data analysis included: descriptive statistics; calculation of Pearson's r and the Intraclass Correlation Coefficient (ICC 2,1) (two-way random-effects model) and their confidence intervals, the standard error of
measurement (SEM) and the minimum detectable change at 90% confidence (MDC90), a Bland and Altman analysis for total and individual item scores, and a plot of the mean of scores for the two raters against the difference between the rater scores (Bland and Altman 1986) to examine consistency in error across the spectrum of obtained scores. In addition, percentage agreement between raters was calculated for total scores, item scores, and Global Rating Scale scores.

No previous data were available with which to conduct a power analysis of the numbers required to achieve significance for the obtained inter-rater score correlation. A minimum of 30 pairs of educators was set as the desirable recruitment target, as this sample size typically produces data that conform to a normal distribution (Gravetter and Wallnau 2005). The research team considered that if adequate evidence of reliability was not identified with this sample size, it would be unlikely that APP scores had the properties required for confident interpretation of scores for an individual student.

Results
Thirty-three pairs of clinical educators (66 independent educators) and 33 independent third and fourth year physiotherapy students consented to participate in the reliability trial. Three pairs were subsequently excluded because the educators completed the APP instrument a week apart, allowing for errors due to real changes in student performance over that time. Table 1 presents a summary of participant characteristics.

Percentage agreement between raters
Ratings by the two assessors for 14 of the 20 APP items were identical in 70% or more of the 30 pairs. Figure 1 shows the percent exact agreement and the percent close agreement (ie, within 1 point on the 5-point scale) for each of the 20 items. There was complete agreement between 24 pairs of raters (80%) for the overall global rating of student performance. The remaining six pairs of raters all scored within one point of each other on the 4-point Global Rating Scale.

Pearson's product-moment correlation coefficient
A scatterplot was visually assessed for violation of the assumptions of linearity and homoscedasticity. Figure 2 shows the positive, strong (Cohen 1988), linear, significant relationship between Rater 1 and Rater 2 total APP scores [r = 0.92 (95% CI 0.87 to 0.95), p < 0.0005]. The coefficient of determination (r² = 0.85) indicates that 85% (95% CI 75% to 90%) of the variance in a rater's scores was explained by variance in the other rater's scores.

[Figure 2. Scatterplot of APP scores for Rater 1 (x-axis) and Rater 2 (y-axis); both axes span 30 to 80 APP points.]

Intraclass Correlation Coefficient (ICC)
The ICC(2,1) (two-way random effects model) for total APP scores for the two raters was 0.92 (95% CI 0.84 to 0.96). The ICC(2,1) for the Global Rating Scale scores was 0.72 (95% CI 0.50 to 0.86). Table 2 presents the ICC(2,1) results for the total score, each of the 20 APP items, and the Global Rating Scale.

Standard Error of Measurement (SEM)
The SEM for the total score was 3.2 APP points (scale width 0–80), indicating that a student's true score will typically fall within the obtained score plus or minus 3.2 (at 68% confidence). The 95% confidence band around a single score was 6.5 APP points (given t(0.05, df = 29) = 2.045). This implies that in 95% of cases a student's true APP total score will fall within the obtained score plus or minus 6.5 points.

Minimal Detectable Change (MDC)
Minimal detectable change scores were calculated for the total and individual item score data at the 90% confidence level. The MDC90 for the APP total scores was 7.86
(given t(0.1, df = 29) = 1.699). This implies that a change score of around 8 APP total score units is required to be confident that, for 90% of students demonstrating changes of this magnitude, real change in professional competence has occurred. As the APP scale width is 0–80, the MDC90 represents 9% of the scale. For each item the MDC90 ranges from 0.60 to 0.85. Therefore, on the 5-point rating scale used to score each item, a change in rating of around 1 point (the minimal observable change) indicates that real change in performance on that item has occurred beyond random variability.

Bland-Altman analyses
A Bland and Altman plot was constructed to display errors in estimates of total APP scores (Figure 3). In this plot, differences between raters' marks were plotted against the mean of the two raters' marks, and the 95% limits of agreement were defined. The Bland-Altman plot shows that the disagreement between raters was not greater among high scores than among low scores, or vice versa. Errors appear to be similar regardless of the magnitude of averaged scores, indicating that it is valid to apply a single error estimate in the interpretation of scores across the width of the scale.

Table 2. Intraclass correlation coefficient (ICC), standard error of the measurement (SEM) and minimum detectable change (MDC90) for the total APP score, global rating scale, and individual APP items.
Measure | ICC(2,1)a | 95% CI
Total APP score | 0.92 | 0.84 to 0.96
Global Rating Scale | 0.72 | 0.50 to 0.86
Professional behaviour
Item 1: Demonstrates an understanding of patient/client rights and consent | 0.81 | 0.64 to 0.90
Item 2: Demonstrates commitment to learning | 0.70 | 0.46 to 0.85
Item 3: Demonstrates ethical, legal and culturally sensitive practice | 0.77 | 0.57 to 0.88
Item 4: Demonstrates teamwork | 0.65 | 0.37 to 0.81
Communication
Item 5: Communicates effectively and appropriately – verbal/non-verbal | 0.82 | 0.66 to 0.91
Item 6: Demonstrates clear and accurate documentation | 0.79 | 0.56 to 0.89
Assessment
Item 7: Conducts an appropriate patient/client interview | 0.80 | 0.62 to 0.90
Item 8: Selects and measures relevant health indicators and outcomes | 0.60 | 0.29 to 0.77
Item 9: Performs appropriate physical assessment procedures | 0.71 | 0.48 to 0.85
Analysis and planning
Item 10: Appropriately interprets assessment findings | 0.63 | 0.35 to 0.80
Item 11: Identifies and prioritises patient's/client's problems | 0.75 | 0.53 to 0.87
Item 12: Sets realistic short and long term goals with the patient/client | 0.76 | 0.55 to 0.87
Item 13: Selects appropriate intervention in collaboration with patient/client | 0.73 | 0.50 to 0.86
Intervention
Item 14: Performs interventions appropriately | 0.82 | 0.66 to 0.91
Item 15: Is an effective educator | 0.82 | 0.65 to 0.90
Item 16: Monitors the effect of intervention | 0.60 | 0.32 to 0.79
Item 17: Progresses intervention appropriately | 0.76 | 0.57 to 0.88
Item 18: Undertakes discharge planning | 0.71 | 0.49 to 0.85
Evidence-based practice
Item 19: Applies evidence based practice in patient care | 0.70 | 0.43 to 0.83
Risk management
Item 20: Identifies adverse events/near misses and minimises risk associated with assessment and interventions | 0.74 | 0.52 to 0.86
a = all ICC p < 0.0005

Discussion
In this inter-rater reliability study of APP scores, the percentage agreement for individual items was high, with 70% or more absolute agreement on 14 of the 20 items. Similarly, there was complete agreement between raters for the overall global rating of student performance on 80% of occasions. Where there was a lack of agreement, all raters were within one point of agreement on both the 5-point item rating scale and the Global Rating Scale.

Individual item ICCs ranged from 0.60 for Item 8 (selecting relevant health indicators and outcomes) and Item 16 (monitoring the effect of intervention), to 0.82 for Item 5
(verbal communication), Item 14 (performing interventions), and Item 15 (being an effective educator). The ICC(2,1) for total APP scores for the two raters was 0.92 (95% CI 0.84 to 0.96), while the SEM of 3.2 and MDC90 of 7.86 allow scores for individual students to be interpreted relative to error in the measurement.

[Figure 3. Plot of the differences between raters' marks (Rater 1 – Rater 2) against the means of raters' marks for the total score out of 80 (n = 60 assessments). The middle line marks the mean difference between raters (–1.4); the upper and lower lines mark the 95% limits of agreement (+1.96 SD = 7.7; –1.96 SD = –10.4).]

It should be noted that while 85% of the variance in the second rater's scores is explained by variance in the first rater's scores, the remaining 15% of variance is unexplained error. It has been proposed that raters are the primary source of measurement error (Alexander 1996, Landy and Farr 1980). Other studies suggest that rater behaviour may contribute less to error variance than other factors such as student knowledge, tasks sampled, and case specificity (Govaerts et al 2002, Keen et al 2003, Shavelson et al 1993).

A limitation of the current study is that, while the paired assessors were instructed not to discuss the grading of student performance during the five-week clinical placements, adherence to these instructions was not assessed. Similarly, discussion between educators about strategies to facilitate a student's learning may have inadvertently communicated the level of ability being demonstrated by the student from one educator to the other. This may have reduced the independence of the ratings given by the paired raters and inflated the correlation coefficient. Mitigating this was that, in all 30 pairs of raters, the education of students was shared with little, if any, overlap of work time between raters. While this trial design limited opportunities for discussion between raters, educators who regularly work together or job share a position may be more likely to agree even if there is little, if any, overlap in their work time. Further research investigating the influence a regular working relationship may have on assessment outcomes is required.

The comprehensive nature of the training of raters in use of the APP instrument may have enabled informal norming to occur (a desirable outcome), positively influencing the level of agreement between raters. While the possibility of inadvertent communication between raters may be seen as a limitation of the inter-rater reliability study, independent replication of the assessment process as it occurs in practice was given priority, and the possible limitations relating to this method were considered acceptable.

Four studies have investigated inter-rater reliability of physiotherapy clinical performance assessment instruments. Intraclass correlations (2,1) of 0.87 for the total Clinical Performance Instrument (CPI) score were found for joint evaluators of physiotherapy students and 0.77 for joint assessments of physiotherapy assistants (Task Force for the Development of Student Clinical Performance Instruments 2002). Coote et al (2007) reported an ICC of 0.84 for the Common Assessment Form (CAF), and Meldrum et al (2008) reported an ICC of 0.84 for a predecessor to the CAF. Loomis (1985) reported ICCs of 0.62 and 0.59 for third and fourth year total scores respectively on the Evaluation of Clinical Competence form.

A range of expressions of test reliability have been provided in this study. Although the ICC and SEM are related, they do not convey the same information. The ICC provides information on the level of agreement, whereas the SEM provides information on the magnitude of error expressed in the scale units of measurement. The SEM for the APP (3.2) represents 4% of the 0–80 scale width. The reliability of the APP compares favourably with reliability estimates reported by others who have developed instruments for assessing competency to practise physiotherapy. Coote et al (2007) and Meldrum et al (2008) reported data that enabled calculation of the SEM, and it appears that for the Common Assessment Form and its predecessor this was also 3% to
4% on a 0–80 scale. The evidence suggests that clinicians are reasonably consistent in their judgements of student ability to practise and that this consistency is evident across different scales, countries, and practice conditions.

The 95% confidence band around a single score for these data was 6.5 APP points. The high inter-rater correlations shown in this study provide evidence that educators using the APP are consistent in rating the relative ability of students. This is important for conferral of academic awards and for monitoring improvement in performance relative to peers. With a scale width of 0–80, an error margin of 6.5 (95% CI) is acceptable. This error enables a high level of accuracy in ranking student performance, as evidenced by the inter-rater correlation of 0.92. Additionally, in other data that we have collected (Dalton 2011), students commencing workplace-based education typically obtain mean scores of approximately 45 APP points; by the end of their clinical training, average scores are in the order of 60 APP points. Hence an error margin of 6.5 allows a clear view of average student progress across the workplace practice period. Across the practice period 77% of students change by more than the MDC90 of 8 points. Of the 23% of students with scores that remain unchanged across 6 placement blocks, approximately 70% were relatively low performing students across all blocks, while the others were consistently average (23%) to high (7%) performing students.

However, this error margin has implications for students whose score is within the borderline pass/fail range. If the pass mark is 40 out of the total 80 marks on the 20 items, then 40 minus 6.5 (33.5) might be considered an outright fail, while 40 plus 6.5 (46.5) might be considered an outright pass. The values in between would require a process for deciding on further assessment for confidence that the student has an adequate level of professional competence. There are many possible sources of error in assessment scores, and these are likely to be related to circumstances, educator, student, and the interaction of these factors. If other indicators of student ability indicated competency, a mark as low as 34 may be acceptable. Alternatively, if other assessments indicate that a student consistently performs in the borderline range, further practice and assessment (or tailored remediation) may be triggered even by grades as high as 47.

Norman et al (2003) reported that for health-related quality of life outcome measures, the change in measures of health outcomes that people typically consider to be important (minimal important difference) is approximately half a standard deviation of raw scores for a representative cohort. If the APP scores behaved as quality of life scores do, then an estimate of the possible minimally important difference would be 6–8 points, a proposal that warrants investigation.

There will always be some lack of agreement between raters, and defining the limits of tolerable disagreement is challenging. Some variability would be expected due to the unpredictable challenges of a complex health services environment combined with variable opportunities for educators to observe student ability across the spectrum of clinical skills. Despite these challenges, in this inter-rater reliability trial the physiotherapy clinical educators demonstrated a high level of consistency in the assessment and marking of physiotherapy students' performance

Ethics: Approval for the study was provided by the Human Ethics Committees of Monash University and by the Human Ethics Committees of each of the participating universities. All participants gave written informed consent before data collection began.

Competing interests: Nil.

Support: Funding from the Australian Learning and Teaching Council (ALTC) enabled employment of a research assistant and travel to conduct focus groups and training workshops.

Acknowledgements: The authors acknowledge the assistance of Curtin, James Cook, La Trobe, Griffith, Monash, and Sydney Universities and thank the clinical educators and students who participated.

Correspondence: Dr Megan Dalton, Department of Physiotherapy, School of Primary Health Care, Monash University, Australia. Email: [email protected]

References
Alexander HA (1996) Physiotherapy student clinical education: the influence of subjective judgements on observational assessment. Assessment & Evaluation in Higher Education 21: 357.
Baartman LKJ, Bastiaens TJ, Kirschner PA, van der Vleuten CPM (2006) The wheel of competency assessment: presenting quality criteria for competency assessment programs. Studies in Educational Evaluation 32: 153–170.
Baartman LKJ, Bastiaens TJ, Kirschner PA, van der Vleuten CPM (2007) Evaluating assessment quality in competence-based education: a qualitative comparison of two frameworks. Educational Research Review 2: 114–129.
Bland JM, Altman DG (1986) Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1: 307–310.
Cohen JW (1988) Statistical power analysis for the behavioral sciences (2nd ed). Hillsdale: Lawrence Erlbaum.
Coote S, Alpine L, Cassidy C, Loughnane M, McMahon S, Meldrum D, et al (2007) The development and evaluation of a Common Assessment Form for physiotherapy practice education in Ireland. Physiotherapy Ireland 28: 6–10.
Dalton M, Davidson M, Keating JL (2011) The Assessment of Physiotherapy Practice (APP) is a valid measure of professional competence of physiotherapy students: a cross-sectional study with Rasch analysis. Journal of Physiotherapy 57: 239–246.
Epstein RM, Hundert EM (2002) Defining and assessing professional competence. Journal of the American Medical Association 287: 226–235.
Govaerts MJ, van der Vleuten CP, Schuwirth LW (2002) Optimising the reproducibility of a performance-based assessment test in midwifery education.
Advances in Health on clinical placements when using the Assessment of Science Education 7: 133–145. Physiotherapy Practice. Q Gravetter F, Wallnau L (2005) Essentials of statistics for the behavioural sciences. Pacific Grove: Wadsworth. Keen AJ, Klein S, Alexander DA (2003) Assessing the communication skills of doctors in training: reliability and sources of error. Advances in Health Sciences Education 8: 5–16. Landy FJ, Farr JL (1980) Performance rating. Psychological Bulletin 87: 72–107. Loomis J (1985) Evaluating clinical competence of physical therapy students. Part 2: assessing the reliability, validity and usability of a new instrument. Physiotherapy Canada 37: 91–98. Journal of Physiotherapy 2012 Vol. 58 – © Australian Physiotherapy Association 2012. Open access under CC BY-NC-ND license. 55
Research Meldrum D, Lydon A, Loughnane M, Geary F, Shanley L, Sayers Streiner DL, Norman GR (2003) Health Measurement Scales. K, et al (2008) Assessment of undergraduate physiotherapist A practical guide to their development and use (3rd ed). New clinical performance: investigation of educator inter-rater York: Oxford University Press. reliability. Physiotherapy 94: 212–219. Task Force for the Development of Student Clinical Performance Norman GR, Sloan JA, Wyrwich KW (2003) Interpretation Instruments (2002) The development and testing of APTA of changes in health-related quality of life: the remarkable clinical performance instruments. Physical Therapy 82: 329– universality of half a standard deviation. Medical Care 41: 353. 582–592. Wilson M (2005) Constructing Measures: An item response Rankin G, Stokes M (1998) Reliability of assessment tools modeling approach. Mahwah, New Jersey: Lawrence in rehabilitation: an illustration of appropriate statistical Erlbaum. analyses. Clinical Rehabilitation 12: 187–199. Wolfe EW, Smith EV Jr (2007) Instrument development tools Roberts C, Newble D, Jolly B, Reed M, Hampton K (2006) and activities for measure validation using Rasch models: Assuring the quality of high-stakes undergraduate Part I – instrument development tools. Journal of Applied assessments of clinical competence. Medical Teacher 28: Measurement 8: 97–123. 535–543. Website Shavelson RJ, Gao X, Baxter G (1993) Sampling variability in performance assessments. CSE Technical report number Dalton MB (2011) Development of the Assessment of 361. Santa Barbara: National Center for Research on Physiotherapy Practice – A standardised and validated Evaluation, Standards and Student Testing, University of approach to assessment of professional competence California. in physiotherapy. 
Doctor of Philosophy Thesis, Monash Kd_l[hi_jo\"C[bXekhd[$^jjf0%%Whhem$cedWi^$[Zk$Wk%^Zb%'/+/$' %*-/'*& Statement regarding registration of clinical trials from the Editorial Board of Journal of Physiotherapy All clinical trials submitted to Journal of Physiotherapy for publication must have been registered in a publicly-accessible trials register. We will accept any register that satisfies the International Committee of Medical Journal Editors requirements. Authors must provide the name and address of the register and the trial registration number on submission. Trials that have been registered prospectively will be given higher priority. From 2013 the journal will only accept trials that have been registered prospectively unless data collection began before 2006, in which case retrospective registration is acceptable. 56 Journal of Physiotherapy 2012 Vol. 58 – © Australian Physiotherapy Association 2012. Open access under CC BY-NC-ND license.
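The measurement-error figures in the Dalton et al discussion of the APP above (a 95% band of 6.5 points around a single score, an MDC90 of 8 points, a pass mark of 40 out of 80, and a half-SD estimate of the minimal important difference) can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on stated assumptions: that the 6.5-point band is the half-width 1.96 × SEM, and that MDC90 follows the common formula 1.645 × √2 × SEM. The source does not spell out either formula. All function names are hypothetical, and the cohort SDs of 12 and 16 points are back-calculated from the 6–8 point estimate, not reported values.

```python
import math

# Assumption: the 6.5-point "95% confidence band" is the half-width
# 1.96 * SEM around a single observed APP total score (0-80 scale).
Z_95 = 1.96   # two-sided 95% normal quantile
Z_90 = 1.645  # two-sided 90% normal quantile, used for MDC90


def sem_from_half_width(half_width: float, z: float = Z_95) -> float:
    """Back-calculate the standard error of measurement from a CI half-width."""
    return half_width / z


def minimal_detectable_change(sem: float, z: float = Z_90) -> float:
    """Assumed formula MDC = z * sqrt(2) * SEM; sqrt(2) as both scores carry error."""
    return z * math.sqrt(2) * sem


def half_sd_mid(sd: float) -> float:
    """Norman et al's rule of thumb: an important change is about 0.5 SD."""
    return 0.5 * sd


def classify_score(score: float, pass_mark: float = 40.0,
                   margin: float = 6.5) -> str:
    """Place an APP total into the fail / borderline / pass zones in the text."""
    if score < pass_mark - margin:   # below 33.5 with the defaults
        return "outright fail"
    if score > pass_mark + margin:   # above 46.5 with the defaults
        return "outright pass"
    return "borderline: seek further evidence"


if __name__ == "__main__":
    sem = sem_from_half_width(6.5)
    print(f"SEM = {sem:.2f} points ({100 * sem / 80:.1f}% of the 0-80 scale)")
    print(f"MDC90 = {minimal_detectable_change(sem):.1f} points")
    # Hypothetical cohort SDs back-calculated from the 6-8 point MID estimate.
    for sd in (12, 16):
        print(f"SD {sd} -> half-SD MID {half_sd_mid(sd):.0f} points")
    for total in (33, 34, 46, 47):
        print(total, classify_score(total))
```

Under these assumptions the script gives an SEM of about 3.3 points (roughly 4% of the scale width) and an MDC90 of about 7.7 points, consistent with the MDC90 of 8 points quoted in the discussion. The strict inequalities are a design choice. If APP totals are whole numbers (20 items, 80 marks), integer scores from 34 to 46 fall in the borderline zone.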
Appraisal Clinical Practice Guidelines

Palliation in aged care
A palliative approach to aged care in the community
Latest update: 2011. Next update: Within 5 years. Patient group: Adults aged over 65 years who have a progressive, life-limiting illness or frailty and who reside in their own, friends', or relatives' homes or in retirement villages. Intended audience: Health care professionals providing care for older people in the community. Additional versions: Companion documents include a booklet for older people and their families, a booklet for care workers, and a document outlining the processes underpinning these recommendations. Expert working group: A guideline development group of seven Australian experts in cancer, palliative care, or aged care authored the guidelines. A further 20 individuals wrote specific sections of the guidelines, and a reference group of 19 individuals from varied professional, government, and societal backgrounds also provided input. Funded by: Australian Government Department of Health and Ageing. Consultation with: National public consultation occurred in addition to focus groups and interviews with key stakeholders. Approved by: The National Health and Medical Research Council of Australia. Location: The guidelines and companion documents are available at: www.palliativecare.org.au.
Description: These guidelines present evidence for how to deliver a palliative approach to care of the older adult in the community setting. They outline different models of care, the effectiveness of postacute transitional care programs or crisis care programs, and tools to improve palliative care such as technology and staff education. They provide information and evidence regarding family carers, advance health care planning and directives, psychosocial care, and spiritual support. Evidence for the assessment and management of physical symptoms is provided, including issues such as pain, fatigue, respiratory symptoms, and falls. More detailed information is provided for older people with special needs, such as those living with a mental illness or those experiencing advanced Parkinson's disease, motor neurone disease, or dementia. Information regarding how to provide a palliative approach to care for Aboriginal and Torres Strait Islander people and those from diverse cultural and language groups is also provided. The guidelines are supported by 75 references.
Sandra Brauer
The University of Queensland, Australia

Type 2 diabetes
Prevention of type 2 diabetes
Latest update: 2009. Next update: Within 5 years. Patient group: Adults at risk of developing type 2 diabetes. Intended audience: Clinicians, health promotion and public health practitioners, planners, and policy makers. Additional versions: Nil. Expert working group: Nine health professionals and a consumer representative comprised the working group. The guidelines were developed by a consortium comprising Diabetes Australia; the Australian Diabetes Society; the Australian Diabetes Educators' Association; the Royal Australian College of General Practitioners; and The Diabetes Unit, Menzies Centre for Health Policy, and The University of Sydney. Funded by: Australian Government Department of Health and Ageing. Consultation with: Expert advisory groups, stakeholder groups, and consumers, via a targeted approach and a formal public consultation process. Approved by: The National Health and Medical Research Council of Australia. Location: The guidelines are available at: www.diabetesaustralia.com.au/For-Health-Professionals/Diabetes-National-Guidelines/
Description: These guidelines present evidence about the prevention of type 2 diabetes at both an individual and a population level, addressing three questions: Can type 2 diabetes be prevented? How can it be prevented in high-risk individuals? How can high-risk individuals be identified? This 213-page document provides underpinning evidence regarding the effectiveness of lifestyle modification (including increasing physical activity, improving diet, and weight loss), pharmacotherapy, and bariatric surgery to prevent type 2 diabetes. Evidence for modifiable and non-modifiable risk factors for type 2 diabetes is presented. Risk assessment tools are evaluated and recommendations made. Population strategies effective in reducing risk factors are detailed, and the cost effectiveness and socio-economic implications of preventing type 2 diabetes are discussed. A summary of recommendations and practice points is provided on pp 6–7.
Sandra Brauer
The University of Queensland, Australia