Abstract
Objective. Magnetic resonance imaging (MRI) is increasingly used to measure articular inflammation and damage in patients with psoriatic arthritis (PsA). We evaluated the reliability of a new OMERACT PsA MRI scoring system, PsAMRIS, in PsA fingers.
Methods. In 2 separate studies, MRI scans were obtained from patients with clinical evidence of synovitis or dactylitis of the fingers. For the first cross-sectional study, images were obtained at one timepoint. For the second longitudinal study, images were obtained at 2 timepoints, 6 weeks apart. Scans were scored using PsAMRIS in an international multireader setting, for synovitis, tenosynovitis, periarticular inflammation, bone edema, bone erosions, and bone proliferation.
Results. Global status scores from both datasets revealed moderate to high reliability for scoring most features, although reliability was poor for periarticular inflammation in the cross-sectional study. Change scores that reflected inflammatory activity also exhibited moderate to good reliability in the longitudinal exercise, despite there being very little absolute change in MRI synovitis or tenosynovitis observed in this dataset. At the distal interphalangeal joints, reliability for change scores was acceptable only for synovitis and tenosynovitis.
Conclusion. Further development and testing of the PsAMRIS is planned to improve its performance as a clinical and research tool to identify and measure pathology in peripheral joint PsA.
Psoriatic arthritis (PsA) is characterized by a diverse array of musculoskeletal pathology involving the joints and periarticular structures of the peripheral and axial skeleton [1]. Disease activity and damage at these sites can be imaged using a variety of modalities including conventional radiography (CR), ultrasonography (US), and magnetic resonance imaging (MRI). MRI has a number of advantages over CR and US in that it can produce complex, high resolution, 3-dimensional images, depicting synovitis, tenosynovitis, and extracapsular inflammation, as well as bone inflammation (as bone edema) and damage (as erosion, ankylosis, and ultimately joint subluxation and deformity) [2]. These changes refer to PsA as it affects the peripheral joints, but MRI can also reveal axial pathology such as sacroiliitis and spondyloarthritis [3]. The development of effective therapies including biological disease-modifying antirheumatic drugs (bDMARD) for PsA has increased the requirement for reliable measurement of the response to therapy [4]. MRI is particularly suitable for this role, as the images produced are digitized (and can be stored and later retrieved for comparison) and are not particularly operator-dependent as long as sequences and acquisitions are standardized, and the resolution is high enough for detection of inflammatory change that could be influenced by therapy.
The development of an instrument for scoring the peripheral arthritis of PsA was begun in 2004 under the auspices of the Outcome Measures in Rheumatology Clinical Trials (OMERACT) MRI in inflammatory arthritis group. The psoriatic arthritis MRI scoring system (PsAMRIS) was developed using the rheumatoid arthritis MRI scoring system (RAMRIS) [5] as a template and formulated specifically for imaging the fingers, as this is a region where typical psoriatic pathology such as dactylitis is observed [6]. Initial testing revealed high interobserver reliability for scoring bone erosion and edema, but moderate to low reliability for scoring soft tissue inflammation (OMERACT PsA MRI exercise 1) [7]. In June 2007 in Barcelona, a working party set out to refine MRI definitions of key pathologies and to revise the scoring system. The aim of our study was to evaluate the interobserver reliability of the revised PsAMRIS.
MATERIALS AND METHODS
Two multicenter studies (exercises) were performed using MRI scans from a total of 20 PsA patients and 2 healthy controls acquired in Copenhagen (C. Wiell). The first exercise involved reading 12 MRI scans taken at one timepoint (OMERACT PsA MRI cross-sectional exercise 2). Scans were from 10 patients with PsA and 2 healthy controls. The second exercise involved reading scans from 10 PsA patients, taken at 2 timepoints, 6 weeks apart (OMERACT PsA MRI longitudinal exercise 3). All patients met diagnostic criteria for PsA. They were rheumatoid factor-negative, had swelling of at least one finger joint (2nd–5th), and had at least 3 out of 78 tender and 3 out of 76 swollen joints. Their demographics were as follows for the cross-sectional exercise: median age 56 years, M:F = 7:3, disease duration 4.5 years; and for the longitudinal exercise: median age 53 years, M:F = 1:1, disease duration 8.5 years. For the longitudinal exercise, patients were treated with a tumor necrosis factor (TNF) inhibitor (adalimumab 40 mg every other week subcutaneously). This was begun at timepoint A, and scans were repeated after 6 weeks of therapy (timepoint B).
MRI scans: acquisitions
MRI scans of the 2nd–5th fingers were performed, imaging the metacarpophalangeal (MCP), proximal interphalangeal (PIP), and distal interphalangeal (DIP) joints using a 0.6 T Philips Panorama MRI unit (Philips Medical Systems, Helsinki, Finland). The acquired images included a coronal T1-weighted 3-dimensional fast field echo [repetition time (TR) 20 ms, echo time (TE) 8 ms, flip angle 25°, field of view (FOV) 120 mm, matrix 240 x 240, slice thickness (ST) 0.8 mm, number of acquisitions (Acq) 1, acquisition time (TA) 4.31 min], allowing axial and sagittal reconstructions, and axial fat-saturated T1w sequences (TR 31 ms, TE 11 ms, flip angle 25°, FOV 150 mm, matrix 256 x 256, ST 4 mm, Acq 1, TA 4.57 min), before and after intravenous administration of the contrast agent Omniscan (0.1 mmol/kg; Amersham Health AS, Oslo, Norway). Additionally, sagittal [TR 4000 ms, TE 17 ms, inversion time (TI) 80 ms, flip angle 90°, FOV 160 mm, matrix 256 x 256, ST 3 mm, Acq 1, TA 6.56 min] and axial (TR 3000 ms, TE 17 ms, TI 80 ms, flip angle 90°, FOV 160 mm, matrix 256 x 256, ST 3 mm, Acq 1, TA 7.01 min) short-tau inversion recovery (STIR) sequences were performed before contrast administration. In exercise 3 axial STIR images were also available.
Scoring using the PsAMRIS system
Scans were anonymized, copied onto DVD, and circulated to 8 readers: AD, CW, FG, KGH, PBi, PBø, PC (both exercises), and FM (exercise 2) or MØ (exercise 3). Images were scored separately in different centers throughout Europe and Australasia. Scans were read using the commercial software package Merge eFilm Workstation™ (eFilm Lite, version 2.1.0; Merge Healthcare, Milwaukee, WI, USA). For the longitudinal exercise, images were read paired, but blind to chronological order. Definitions according to the revised PsAMRIS system are described in detail elsewhere [8]. Briefly, synovitis was scored 0–3 at MCP, PIP, and DIP joints of the fingers. Bone erosions (0–10) and bone edema (0–3) were scored at proximal and distal regions of each joint (M1/M2 for bone proximal and distal to MCP joints, P1/P2 for bone proximal and distal to PIP joints, D1/D2 for bone proximal and distal to DIP joints). Periarticular inflammation was graded as 0 or 1 at the dorsal and volar aspects of each MCP, PIP, and DIP joint, and bone proliferation was scored 0 or 1 at each joint. Finally, flexor tenosynovitis was scored 0–3 at each joint.
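The per-site score ranges described above can be written down as a small score sheet. The following Python sketch is illustrative only: the names (`SCORE_RANGES`, `SITES_PER_JOINT`, `feature_total`), the treatment of unscored sites as 0, and the aggregation of a feature by simple summation over the 12 joints are assumptions made for this example, not part of the published PsAMRIS definitions.

```python
# Score range per scored site, as described in the text (feature -> (min, max)).
SCORE_RANGES = {
    "synovitis": (0, 3),
    "flexor_tenosynovitis": (0, 3),
    "periarticular_inflammation": (0, 1),
    "bone_edema": (0, 3),
    "bone_erosion": (0, 10),
    "bone_proliferation": (0, 1),
}

# Number of sites scored per joint: bone edema and erosion are scored at the
# proximal and distal regions, periarticular inflammation at the dorsal and
# volar aspects; the other features are scored once per joint.
SITES_PER_JOINT = {
    "synovitis": 1,
    "flexor_tenosynovitis": 1,
    "periarticular_inflammation": 2,
    "bone_edema": 2,
    "bone_erosion": 2,
    "bone_proliferation": 1,
}

# MCP, PIP, and DIP joints of fingers 2-5: 12 joints in total.
JOINTS = [f"{level}{finger}" for level in ("MCP", "PIP", "DIP") for finger in range(2, 6)]


def check_score(feature, value):
    """Return value if it is an integer inside the allowed range for feature."""
    low, high = SCORE_RANGES[feature]
    if not (isinstance(value, int) and low <= value <= high):
        raise ValueError(f"{feature} score {value!r} outside {low}-{high}")
    return value


def max_possible(feature):
    """Largest attainable sum for one feature over all 12 joints (assumed sum score)."""
    return SCORE_RANGES[feature][1] * SITES_PER_JOINT[feature] * len(JOINTS)


def feature_total(sheet, feature):
    """Sum one feature over all joints.

    sheet maps (joint, feature) -> list of site scores. Missing entries are
    counted as 0, which is an assumption of this sketch.
    """
    total = 0
    for joint in JOINTS:
        sites = sheet.get((joint, feature), [])
        if len(sites) > SITES_PER_JOINT[feature]:
            raise ValueError(f"too many site scores for {feature} at {joint}")
        total += sum(check_score(feature, v) for v in sites)
    return total


# Hypothetical reader sheet, for illustration only.
example_sheet = {
    ("MCP2", "synovitis"): [2],
    ("PIP3", "bone_erosion"): [4, 1],          # proximal, distal
    ("DIP5", "periarticular_inflammation"): [1, 0],  # dorsal, volar
}
```

Under these assumptions, `feature_total(example_sheet, "bone_erosion")` sums the proximal and distal erosion scores entered for PIP3, and `max_possible` gives the theoretical ceiling against which an observed total could be compared.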
Statistics
Three different reliability statistics were calculated to evaluate the reliability of PsAMRIS: the intraclass correlation coefficient (ICC), the smallest detectable difference (SDD), and the smallest detectable difference percentage (%SDD). Specifically, we used the random-effects average-measures ICC. The SDD was calculated by multiplying the square root of the mean residual error (MRE) of repeated-measures analysis of variance by √2 and by 1.96 (√MRE × √2 × 1.96), then dividing the result by the square root of the number of readers (√k) [9]. The %SDDmax and %SDDmean were calculated by dividing the SDD for the change score by the maximum and mean value, respectively, and expressing the result as a percentage [10]. The %SDD is a relative statistic. It facilitates a comparison of SDD (an absolute statistic) across the different parameters, in a manner similar to the ICC. The %SDDmean is a more robust statistic as it is less influenced by outliers. In the cross-sectional study the ICC, SDD, and %SDD were calculated for status scores. In the longitudinal study, these statistics were calculated for both status (timepoint A, timepoint B) and change (i.e., difference between timepoint A and timepoint B scores), for each parameter. The statistical programs used were Stata v10 and SPSS v15.
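As a rough illustration of these statistics, the sketch below computes them from a table of scores with one row per scan and one column per reader. It rests on several assumptions that the text does not spell out: the mean residual error is taken as the residual mean square of a two-way (scans × readers) analysis of variance without replication, the ICC is taken as the two-way random-effects average-measures form, often written ICC(2,k), and the %SDD denominators are the maximum and mean of the observed scores. All function names are hypothetical, and the code is not the Stata or SPSS procedure used in the study.

```python
import math


def anova_mean_squares(scores):
    """Two-way ANOVA without replication on a scans x readers table.

    Returns (MS_scans, MS_readers, MS_residual). Needs at least 2 scans
    and 2 readers.
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    # Residual computed directly so that it can never become negative.
    ss_res = sum(
        (scores[i][j] - row_means[i] - col_means[j] + grand) ** 2
        for i in range(n)
        for j in range(k)
    )
    return ss_rows / (n - 1), ss_cols / (k - 1), ss_res / ((n - 1) * (k - 1))


def icc_average_measures(scores):
    """Two-way random-effects, average-measures ICC (assumed form: ICC(2,k))."""
    n = len(scores)
    ms_scans, ms_readers, ms_res = anova_mean_squares(scores)
    return (ms_scans - ms_res) / (ms_scans + (ms_readers - ms_res) / n)


def sdd(scores):
    """SDD = 1.96 * sqrt(2) * sqrt(MRE) / sqrt(k), with k readers."""
    k = len(scores[0])
    _, _, ms_res = anova_mean_squares(scores)
    return 1.96 * math.sqrt(2) * math.sqrt(ms_res) / math.sqrt(k)


def percent_sdd(scores):
    """Return (%SDDmax, %SDDmean) relative to the observed scores (assumption)."""
    values = [v for row in scores for v in row]
    s = sdd(scores)
    return 100 * s / max(values), 100 * s / (sum(values) / len(values))
```

For a change-score analysis, each cell of the table would hold a reader's timepoint B score minus the timepoint A score for that scan pair. Dividing by √k reflects that the SDD here refers to the mean score of k readers rather than to a single reader.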
RESULTS
Part 1: OMERACT PsA MRI Exercise 2: Testing PsAMRIS in a cross-sectional setting
Table 1 shows average-measures interreader ICC for all components of the score. Reliability was high for all components (ICC 0.84–0.91) apart from periarticular inflammation, where it was low (ICC = 0.25). The SDD are presented as a percentage of the mean and of the maximum of the score range, where the maximum was taken from the readers' actual scores rather than being the potential maximum for each feature.
Part 2: OMERACT PsA MRI Exercise 3: Testing PsAMRIS in a longitudinal setting
A total of 10 paired sets of MRI scans from patients with PsA were scored by 8 readers at the 2nd–5th MCP, 2nd–5th PIP, and 2nd–5th DIP joints. Status scores were obtained at timepoint A (Table 2) and again at timepoint B after 6 weeks of anti-TNF therapy (see Materials and Methods). There was no significant difference between readers in scoring any of the parameters at either timepoint. There was a fall in the number of tender and swollen finger joints on the MRI-scanned side (MCP, PIP, DIP of fingers 2–5, maximum possible score of 12) from 1.9 and 4.9, respectively, at timepoint A to 0.2 and 1.0, respectively, at timepoint B. Interestingly, despite the changes in clinical markers of inflammation, MRI scores for all components of PsAMRIS did not change significantly, and there was no difference between readers in terms of assessing MRI change.
In this exercise, ICC for global status scores were moderate (> 0.7 at timepoint A) for all parameters (Table 3). ICC for global change scores were also moderate (> 0.6) for synovitis, tenosynovitis, periarticular inflammation, and bone erosion; and low/not measurable for bone edema and bone proliferation. For synovitis and tenosynovitis, ICC were comparable at MCP, PIP, and DIP joint levels (0.58–0.81). However, for periarticular inflammation, bone edema, bone erosions, and bone proliferation, ICC were markedly higher at the MCP joints (0.72, 0.64, 0.65, and 0.89, respectively) versus PIP joints (−0.77 to 0.52) and DIP joints, where scores were often unobtainable (Table 4).
DISCUSSION
This article summarizes results from 2 multireader exercises (OMERACT PsA MRI exercises 2 and 3) undertaken by the OMERACT MRI inflammatory arthritis working group, testing the performance characteristics of the recently developed PsAMRIS system in patients with peripheral PsA. The first version of this system was tested in exercise 1 [7] and was modified following a review of definitions and elimination of features where reliability was low [8].
Cross-sectional testing of the new version of PsAMRIS was performed in exercise 2 by 8 readers, and subsequently the score was tested longitudinally in exercise 3 over 6 weeks in a cohort of patients receiving anti-TNF therapy. Analysis of the cross-sectional study showed that reliability in terms of ICC ranged from moderate to high for all parameters measured, except periarticular inflammation. These results were encouraging given that readers had variable levels of experience, and were scoring scans separately in different institutions, using a variety of platforms without specialized radiology imaging software. The %SDDmean scores were also relatively good at 17%–32% for all readers, and compare very favorably with reliability studies of swollen joint count, tender joint count, and patient self-report pain and function, where %SDDmax of 65%, 56%, 53%, and 22% have been reported in patients with rheumatoid arthritis (RA) [10].
These data would seem to predict that PsAMRIS would also perform well in measuring change in a longitudinal setting, and the results of the longitudinal exercise did show satisfactory performance in terms of reliability and sensitivity to change, as moderate ICC were observed for synovitis, tenosynovitis, and periarticular inflammation (0.77, 0.75, and 0.70, respectively). Further, the %SDDmean and %SDDmax were also satisfactory for all features except bone edema and bone proliferation. Indeed, the %SDDmax almost approached values seen in reliability studies of radiographic progression in RA using the van der Heijde modified Sharp score, a well established measure of radiographic damage (%SDDmax 15%–21%) [10].
However, cross-study comparisons of reliability need to take into account the many factors that may influence results, including the expertise of readers as well as the spectrum of disease activity and joint damage, and so can be used only as an approximate guide. Further, it should be noted that there was very little absolute change in MRI synovitis scores between timepoints A and B, with 3 readers noting a slight improvement, 3 readers noting no change at all, and 2 readers noting slight worsening. The picture for tenosynovitis was very similar. Thus, further testing of PsAMRIS is required in a setting where MRI synovitis and tenosynovitis do change substantially, before it can be concluded that there is adequate reliability for detecting change in these disease activity measures.
The agreement between readers for change in bone erosion was only moderate, and agreement for bone proliferation was poor/untestable. These features are unlikely to change over a 6-week period; therefore further testing of PsAMRIS is required using a different dataset where there is a longer interval between scans, to determine whether a change in the damage score can be identified using this instrument. Similarly, reliability for bone edema change could not be quantified using the available data. This feature is also important to follow in PsA trials of bDMARD, as it was reported by one group to fall in all patients treated with infliximab/methotrexate [11], while another group found a much more variable bone edema response to adalimumab that was not concordant with clinical improvement [12].
An important finding was that some aspects of the score performed differently depending on which joints were being assessed. Measuring bone erosion, bone edema, bone proliferation, and periarticular inflammation was difficult at the PIP and especially the DIP joints, where axial images were often uninterpretable because of the small size of the region being imaged. If confirmed in future studies, consideration may need to be given to modifying PsAMRIS so that only synovitis and tenosynovitis are included for these joints, just as the Sharp-van der Heijde scoring system for scoring rheumatoid bone damage at the wrist omitted measuring joint space narrowing at some of the intercarpal joints because of poor visualization on plain radiographs [13]. It was encouraging that synovitis and tenosynovitis performed relatively well at all the joint regions tested, as inclusion of DIP synovitis is important given that this is a disease-specific feature [14].
Scoring reliability was low for periarticular inflammation using this version of PsAMRIS in the cross-sectional exercise 2 (reported here), and the previously reported exercise 1 [7], but improved markedly in the longitudinal exercise 3. Readers in exercise 1 reported difficulty recognizing this feature, and the problem was addressed by redefining its characteristics during the group’s Barcelona workshop in June 2007 [8], and further discussions/reader training at a group meeting in November 2007, between exercises 2 and 3. However, it is important to note that some variability may have been introduced into the data because readers used different personal computers/workstations and software packages for scoring the images.
Further, readers sometimes used different MRI sequences to assess the individual pathologies. For example, all readers except 2, in exercise 3, used the 2-D fat-suppressed T1w sequence for assessing synovitis, whereas the remaining 2 readers used the 3-D non-fat-suppressed T1w sequence. Standardization of sequences for assessment of the individual parameters and more reader training would be expected to improve scoring reliability further for this, and other, components of PsAMRIS. Further, both exercises had many more raters (n = 8) than most published interobserver reliability studies and, although we accounted for this statistically by using the average ICC and dividing the SDD by the square root of the number of raters as recommended by Bruynesteyn, et al [15], our next exercise will involve a smaller number of more highly trained readers. Standardization will also be facilitated by conducting a further training exercise prior to scoring, and the use of identical workstations with high-resolution monitors.
In summary, PsAMRIS is emerging as a valid instrument for the measurement of PsA inflammation and damage. Thus far it has been tested only in patients with relatively mild disease and clinical involvement of the fingers. Testing in patients with more advanced disease is planned, as is testing in a longitudinal setting with a greater interval between scans. Inclusion of other regions such as the feet or large-joint entheses may also be required to fully determine the burden of this diverse disease.
Acknowledgments
Radiographer Jakob Møller, Department of Radiology, Copenhagen University Hospital at Herlev, Copenhagen, Denmark, is acknowledged for acquiring the MR images used in this article.