Pearson correlation, the Bland-Altman method and linear regression were used. Results: A positive correlation was found between the two methods of assessing disease activity in rheumatoid arthritis. A single mismatch was found (Pearson correlation and linear regression) at the level of the second metacarpophalangeal joint on the ventral side of the right hand when the Doppler ultrasound score was compared with the Grey scale score; we presumed that this was due to the particularity of the individual outcome. The Bland-Altman method showed that the Grey scale overrated the scores relative to the Color Doppler ultrasound scale. Conclusions: The semi-quantitative Color Doppler method, which assesses intra-joint activity in rheumatoid arthritis patients, was validated in our research for further use. Even though the statistical methods used to validate the Doppler score in our study did not show significant differences between them, it is best to apply all of them when validating any method intended for use in a study.
Rheumatoid arthritis (RA) is a complex disease with multiple facets, one of them being its activity. At present there are multiple tools for assessing the activity of the disease, none of which has gained the full status of "gold standard". In the evaluation of disease activity, biological and non-biological methods are used together. The methods are more or less subjective, so the validation of measurements in RA is highly relevant. The assessment of validity is divided into three categories: the
content validity, the criterion validity and the construct validity. All categories are mutually related, and their definitions are not entirely strict. Content validity, also known as face validity, requires the measurement to capture all the important aspects of the disease assessment. In RA, one of the basic and important aspects is synovial inflammation, in which blood flow is increased. Taking these data together with the presence of angiogenesis, one can easily speculate
on the usefulness of Color or Power Doppler ultrasound assessment: Doppler ultrasound reflects the activity of the disease. Grey-scale (B-mode) ultrasound emphasizes synovial hypertrophy and erosions, so content validity is present when the two methods are combined. Criterion validity refers to the correlation of a measurement with an established test or a "gold standard" test for the assessment of a certain condition. An important part of criterion validity is testing the accuracy of the method. Accuracy is defined as the precision of the assay, that is, the extent to which it is affected by random errors or systematic
errors. Criterion validity represents the objective, practical side of validity. In rheumatology, construct validity is often conflated with criterion validity. It estimates the capacity of the assay to measure what it is theoretically supposed to measure.
The inflammatory and destructive changes in RA can be visualised by ultrasound (US). Two US modes are used: grey scale and Doppler US. Grey scale depicts morphological changes, whereas Doppler US displays blood flow in the tissues. Increased blood flow is part of the inflammatory process, so the amount of Doppler activity can be an indirect measure of inflammation [1,2,3]. Grey scale is able to show synovial hypertrophy and erosions. There are a number of definitions for US pathology, including the OMERACT one. The scale used for B-mode imaging was semi-quantitative, graded from 0 to 3 as follows: 0 - normal or no synovial thickening/effusion, 1 - mild thickening/effusion, 2 - moderate thickening/effusion, 3 - intense thickening/effusion [4-9]. On the monitor, grey scale shows the tissues in different grey tones, the shades of grey representing the reflective ability of the tissue. Bones are good reflectors and their surfaces are shown in white, whereas synovial fluid, which does not reflect, is shown in black. An echo is generated when a US beam crosses the interface between two tissues with different acoustic impedance. The acoustic impedance (Z) is the product of the density of the tissue and the speed of sound in it, and the reflective ability depends on it. Doppler US is able to differentiate between a thickened synovium with inflammation and a synovium thickened by previous inflammatory attacks (no Doppler activity) [10,11]. In the semi-quantitative US Doppler score, the assessment is made on a subjective basis from the percentage of colour pixels in the region of interest (ROI). 
The 4-grade semi-quantitative scoring system is the following: 0 - no Doppler signals/no blood flow, 1 - single Doppler signals/mild blood flow, 2 - various, confluent Doppler signals/moderate blood flow, 3 - confluent Doppler signals with more than half of the visible synovium showing Doppler signals/intense blood flow.
A Philips HD7 (High Definition) ultrasound machine was used. The ultrasound exam was performed with a linear array transducer with a variable frequency from 3 to 12 MHz (L12-3/38mm, HD7, Philips, Bothell, WA, USA). The longitudinal scan was used in accordance with the standards set out in the musculoskeletal ultrasound guidelines for rheumatology. The patient was positioned so as to make the examination comfortable for both the subject and the operator, regardless of the anatomical region examined, and to ensure as little pressure as possible from the transducer (e.g. the joints were assessed in an extended position with the scanned extremity resting comfortably). The ultrasound beam must be perpendicular to the examined structure, because artefacts are easily produced in musculoskeletal ultrasound. The gain was set to a colour per unit (CPU) of 82%, which was selected by manually raising the Doppler US gain level until the colour box was almost uniformly filled with the first indication of colour and with only the minimum of the next highest signal just beginning to appear.
Our prospective study included 66 patients diagnosed with rheumatoid arthritis. The inclusion criteria were: subjects diagnosed with RA according to the new 2010 EULAR/ACR criteria for RA, clinically medium-high activity (DAS28 >3.2) RA patients with nonresponse or low synovial activity measured by Power Doppler ultrasound (OMERACT criteria), subjects without known or recorded cardiovascular or metabolic diseases, subjects who consented to the study, and subjects aged 18 to 75 years. 
The patients were assessed with the B-mode and subsequently the Color Doppler ultrasound scoring systems. The joints examined were the second and third metacarpophalangeal (MCP) and proximal interphalangeal (PIP) joints on the dorsal and ventral sides. A total of 2,012 images were stored and evaluated on both scales. The statistical analysis of the data was performed with GraphPad Prism 5.0 software (San Diego, California). In order to assess the validity of the ultrasound methods used in the study as non-biological markers of rheumatoid arthritis activity, the Pearson correlation, the Bland-Altman method and linear regression were used. A p value <0.05 was considered statistically significant. Grey scale images of the MCP and PIP joints in the ventral and dorsal approaches are shown in figures 1-4. The corresponding Color Doppler images are presented in figures 5 and 6.
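As a minimal sketch of the Pearson analysis named above, the following Python code computes the correlation coefficient for paired scores and an approximate 95% confidence interval. The interval uses the Fisher z-transformation, which is an assumption here: the study does not state how its intervals were computed. The function names and the example scores are hypothetical and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient for two paired samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must be paired and contain at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, z_crit=1.959964):
    """Approximate confidence interval for r (assumption: Fisher z-transformation)."""
    if n < 4 or abs(r) >= 1:
        raise ValueError("need n >= 4 and -1 < r < 1")
    z = math.atanh(r)              # transform r to the z scale
    se = 1.0 / math.sqrt(n - 3)    # standard error on the z scale
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


if __name__ == "__main__":
    # Hypothetical semi-quantitative scores (0-3) for ten joints, for illustration only.
    grey_scale = [0, 1, 1, 2, 2, 3, 3, 1, 2, 0]
    doppler = [0, 0, 1, 1, 2, 2, 3, 1, 1, 0]
    r = pearson_r(grey_scale, doppler)
    low, high = fisher_ci(r, len(grey_scale))
    print(f"r = {r:.4f}, 95% CI: {low:.4f} to {high:.4f}")
```

An interval that includes zero corresponds to a correlation that is not significant at the 0.05 level, which is how the single non-significant joint in a study of this kind would appear.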
The ultrasound measurements on the Grey scale and the Color Doppler scale were analyzed with Pearson's correlation in order to identify any differences between the two non-biological methods chosen to measure the activity of the disease (rheumatoid arthritis) and to validate the use of one method over the other (in our case the Color Doppler method). A positive correlation was found at the second MCP joint on the dorsal side between the semi-quantitative methods chosen for assessment (the Grey scale vs. the Color Doppler scale) (p = 0.0252, 95% CI: 0.03581 to 0.4852) (Figure 7). This correlation was found in all the joints (p <0.05) on both sides examined, with one exception: it did not apply to the ventral side of the second MCP joint on the right limb (p = 0.1144, 95% CI: -0.04484 to 0.4054) (Figure 8). This was the only mismatch found. We can only speculate that this result is due to the activity of the disease and should not be taken as bias.
Many published agreement studies have shown that using the t test or Pearson's correlation is flawed for measuring agreement or detecting bias. Bland and Altman therefore proposed graphical techniques for analyzing method comparison and validation studies. In medical research it is often necessary to compare two methods of measurement (usually a new one versus the established method, the so-called "gold standard") to determine whether the methods can be used interchangeably or whether the new method can replace the "gold standard". Each of the methods compared is known to introduce some error into its measurements [12-17]. Bland and Altman first described the method in the 1980s as a way of plotting data to analyze the agreement between two different assays. 
The original Bland-Altman method was developed for two sets of measurements made on one occasion (independent data), so this approach is not suitable for repeated-measures data. In recent years, the medical literature has seen an increasing number of studies using the Bland-Altman method (the method had been cited in more than 11,500 articles by 2007) or similar ones (from 8% in 1986 and 14% in 1995 to 31-36% in 2007) [12]. A graphical method is used to plot, for each subject, the difference between the two measurements against their mean; if the new method agrees with the established one within accepted limits, it may replace the established one. The Bland-Altman plot (popularised in medical statistics) is also known as the Tukey mean-difference plot. The x-axis is the mean of the two measurements (the best guess of the "right" result) and the y-axis is the difference between the two measurements. The type of plot is chosen according to the values of the difference: it is possible to plot the difference, the ratio or the percent difference. If the difference gets larger as the average gets larger, it can make more sense to plot the ratio or the percent difference. Bland-Altman plots are generally interpreted informally, without further analyses [10]. The chart highlights certain types of anomalies in the assays, for example that one of the methods over-estimates high values and under-estimates low values. If the points are scattered above and below zero, there is no consistent bias of one approach versus the other, although hidden bias cannot be completely excluded. Still, the Bland-Altman method remains a good first step for exploring the data [18,19,20]. The key word in the Bland-Altman method and in other method comparisons is agreement. In the Bland-Altman method it is common to compute the limits of agreement, specified as bias ± 1.96 SD (mean difference ± 1.96 standard deviations of the differences), and to estimate confidence intervals for the bias. 
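The quantities described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the procedure implemented in the statistical software used in the study: the function name `bland_altman` and the example scores are hypothetical, and the SD is assumed to be the sample standard deviation (n - 1 denominator) of the paired differences.

```python
import math


def bland_altman(method_a, method_b, z_crit=1.96):
    """Return plot coordinates, bias, SD of the differences and limits of agreement."""
    if len(method_a) != len(method_b) or len(method_a) < 2:
        raise ValueError("need at least 2 paired measurements")
    # x-axis of the plot: mean of the two measurements for each subject
    means = [(a + b) / 2 for a, b in zip(method_a, method_b)]
    # y-axis of the plot: difference between the two measurements
    diffs = [a - b for a, b in zip(method_a, method_b)]
    n = len(diffs)
    bias = sum(diffs) / n
    # Assumption: sample standard deviation with an n - 1 denominator
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (n - 1))
    return {
        "means": means,
        "diffs": diffs,
        "bias": bias,
        "sd": sd,
        "loa_lower": bias - z_crit * sd,
        "loa_upper": bias + z_crit * sd,
    }


if __name__ == "__main__":
    # Hypothetical semi-quantitative scores (0-3), for illustration only.
    grey_scale = [1, 2, 3, 2, 1, 3, 2, 1]
    doppler = [1, 1, 2, 2, 0, 3, 1, 1]
    result = bland_altman(grey_scale, doppler)
    print(f"bias = {result['bias']:.4f}, SD = {result['sd']:.4f}")
    print(f"95% limits of agreement: {result['loa_lower']:.4f} to {result['loa_upper']:.4f}")
```

A positive bias in this sketch means that the first method scores higher on average than the second. Plotting `diffs` against `means` gives the difference form of the plot; the ratio or percent-difference forms would replace `a - b` with the corresponding quantity.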
The limits of agreement and the confidence interval for the bias are often omitted in research papers. The 95% limits of agreement are used to judge visually how well the two methods agree: the narrower the range, the better the agreement. The size of the average discrepancy between methods (the bias) must be interpreted clinically. Whether the observed discrepancy is large enough to matter is a clinical question, not a statistical one. The next step is to look for a trend (e.g. do the differences between the two methods tend to get smaller or larger as the average increases?). The final step is to verify whether the variability is consistent across the graph. The Bland-Altman method was applied at the start of the study in order to decide on the best method for assessing disease activity by ultrasound in rheumatoid arthritis patients. The methods compared were Grey-scale ultrasound and Doppler ultrasound in small joints on the dorsal and ventral sides. For the second MCP joint on the dorsal side, the Bland-Altman analysis of the Grey scale versus the Doppler scale showed that the Grey scale over-estimated the scores relative to the Doppler scale, even though the bias was not statistically high (Bias: 0.3306, SD of bias: 0.5327, 95% limits of agreement: -0.7135 to 1.375). The percent difference and ratio plots were also used (Bias: -131.8, SD: 103.8, 95% limits of agreement: -335.3 to 71.65) (Figures 9, 10). For the second MCP joint on the ventral side, the Bland-Altman analysis likewise showed that the Grey scale over-estimated the scores relative to the Doppler scale, even though the bias was not statistically high (Bias: 0.2077, SD of bias: 0.4621, 95% limits of agreement: -0.6981 to 1.13). The percent difference and ratio plots were also used (Bias: -148.8, SD: 101.1, 95% limits of agreement: -346.9 to 49.37) (Figures 11, 12). 
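As a worked check, the reported limits of agreement can be recomputed from the reported bias and SD using bias ± 1.96 SD. The arithmetic below uses only the rounded values quoted above, so small discrepancies in the last digit are to be expected.

```latex
\begin{align*}
\text{Dorsal, difference:}  \quad & 0.3306 \pm 1.96 \times 0.5327 = 0.3306 \pm 1.0441 \;\Rightarrow\; [-0.7135,\ 1.3747] \\
\text{Dorsal, percent:}     \quad & -131.8 \pm 1.96 \times 103.8 = -131.8 \pm 203.45 \;\Rightarrow\; [-335.25,\ 71.65] \\
\text{Ventral, difference:} \quad & 0.2077 \pm 1.96 \times 0.4621 = 0.2077 \pm 0.9057 \;\Rightarrow\; [-0.6980,\ 1.1134] \\
\text{Ventral, percent:}    \quad & -148.8 \pm 1.96 \times 101.1 = -148.8 \pm 198.16 \;\Rightarrow\; [-346.96,\ 49.36]
\end{align*}
```

The recomputed intervals agree with the reported ones to within rounding of the bias and SD.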
Not only were the ventral data similar to the dorsal measurements at this level, but the larger the average became, the smaller the difference was. The general trend observed was that the differences became smaller as the averages became larger. The second general observation was the tendency of the Grey-scale score to over-estimate values. These trends could be applied generally. All the biases obtained were close to zero, which supports the idea that there is no large difference between the two semi-quantitative methods studied and applied. In 2000, Hopkins discussed the flaws (the hidden bias) of the Bland-Altman graphical method and proposed the use of simple linear regression between the two measurements. He emphasized the nature of the random error in the measurements provided by the instruments. The Bland and Altman group recommended the use of a log transformation of the data points [21]. Because validating a method is important, linear regression was applied to the measurements after the Bland-Altman method, in order to avoid bias as far as possible. The measurements at the second MCP joint on the dorsal side of the right hand showed a positive correlation (p <0.05). The results obtained from evaluating the same joint on the ventral side did not show a positive correlation (p >0.05) (Figures 13, 14). The final results of the linear regression were similar to those of the two methods applied before (the Pearson correlation and the Bland-Altman method).
The semi-quantitative Color Doppler method for assessing intra-joint activity in rheumatoid arthritis patients was validated in our research and can be used further. Even though the three statistical methods used to validate the Doppler score in our study did not show significant differences between them, it is best to apply all of them in order to validate any method intended for use in another study.