Rasch Validation of the LVQOL Scale

Aim: This study examines and validates the Low Vision Quality-of-Life Questionnaire (LVQOL) in a Greek population of ophthalmic patients using Rasch measurement techniques. Methods: This is a prospective observational study of 150 cataract patients and 150 patients with other ophthalmic diseases, all followed longitudinally for two months pending surgical or other corrective therapy, after which they were administered the LVQOL a second time. Results: The original 25-item LVQOL demonstrated high reliability and validity, excellent measurement precision and ordered response category thresholds. A small number of items carried an acceptable level of measurement error, while three items showed some differential functioning for gender, age and underlying disorder that did not exceed the established thresholds. Conclusions: This validation study is the first to employ Rasch measurement to examine the validity of the LVQOL in a Greek population, and it supports the use of the questionnaire with no changes to its original structure. The LVQOL can be employed in a wide range of ophthalmic diseases and can reliably assess improvements in quality of life following phacoemulsification surgery or any other intervention.


INTRODUCTION
The Low Vision Quality-of-Life Questionnaire (LVQOL) was constructed by Wolffsohn et al (1) as an index of functional impairment, designed as a vision-specific quality-of-life assessment tool that can reliably be used to measure the outcome of low-vision rehabilitation and to help evaluate current rehabilitation strategy, eventually leading to the improvement of the offered services while securing and enhancing funding within managed care plans. It has been translated into several languages including Dutch (2), Spanish (3), Chinese (4), Thai (5), Korean (6) and Tamil (7), demonstrating excellent reliability and validity when examined with classical test theory. A systematic review of vision-related quality-of-life questionnaires concluded that the LVQOL was one of the two best questionnaires for use in low vision patients (8).
More modern approaches to assessing the psychometric properties of a quality-of-life instrument involve item response theory and particularly the Rasch model. Item response theory relates to the measurement of a latent construct from a number of items in a questionnaire, includes parameters describing characteristics of the individual items, and is built around the premise that respondents and items can be placed on the same quantitative latent continuum (9).
The item response theory "Rasch" model is a probabilistic mathematical method that has been employed to assess the psychometric properties of an instrument and its measurement quality against an established framework of precision criteria (10). It also transforms ordinal test responses into interval-level scores, thus reducing measurement noise and increasing precision and the statistical power to test hypotheses with a smaller sample size. Rasch models have become a method of choice for examining the validity of an assessment instrument (11).
The only version of the LVQOL that has been tested with an item response theory approach is the Dutch one (12). It was deemed satisfactory, although two items (items 1 and 24) were removed because of differential item functioning, meaning that the interpretation of the original questionnaire differed between subgroups of the research population and that it was influenced by some confounding factors.
The principal aim of this study is to assess a Greek version of the LVQOL using Rasch analysis and test its applicability in patients suffering from ophthalmic disease including cataract and other causes.

Selection and Description of Participants
This is a prospective observational study of 300 patients who were treated for vision problems in the outpatient services of the 2nd Department of Ophthalmology, Aristotle University of Thessaloniki. The patients were longitudinally followed for two months, during which they received appropriate treatment depending on their underlying disease. A full list of the underlying disorders for the full sample is presented in

TECHNICAL INFORMATION
All patients were initially given a brief demographics questionnaire that included information on their gender, age, marital status, living arrangements and comorbid health issues that necessitated continuous medical care. The patients were then required to fill in the 25-item Low Vision Quality-of-Life Questionnaire (LVQOL), a vision-specific quality-of-life assessment tool designed to be used in a clinical setting in order to evaluate low-vision rehabilitation strategy and management (1). The LVQOL's 25 items are graded on a Likert-type scale ranging from five (no difficulty with the item because of their vision) to one (great difficulty with the item because of their vision), or scored as zero (the item can no longer be performed because of their vision). A further "not relevant" option is available for items that do not apply to a patient's daily life; these answers are given an average score so that individuals who marked more items as not relevant do not have a lower summed questionnaire score and therefore an apparently worse quality of life. The total summed score ranges between zero (a low quality of life) and 125 (a high quality of life). The LVQOL was administered twice, the second time two months after the first. In addition to the questionnaires, the patients' best corrected visual acuity was also measured pre- and post-surgery with the Early Treatment Diabetic Retinopathy Study (ETDRS) charts.
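The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the scoring procedure used in the study: the function name `lvqol_total` is hypothetical, and the sketch assumes that a "not relevant" answer (represented as `None`) receives the mean of the respondent's own answered items, which is one plausible reading of "an average score".

```python
from typing import Optional, Sequence

N_ITEMS = 25        # number of LVQOL items
MAX_ITEM_SCORE = 5  # 5 = no difficulty, 0 = item can no longer be performed


def lvqol_total(responses: Sequence[Optional[int]]) -> float:
    """Summed LVQOL score, with 'not relevant' items (None) imputed.

    Assumption (not confirmed by the source): each 'not relevant' item
    is given the mean of the respondent's own answered items.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} responses, got {len(responses)}")
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("no scorable responses")
    if any(r < 0 or r > MAX_ITEM_SCORE for r in answered):
        raise ValueError("item scores must lie between 0 and 5")
    mean_answered = sum(answered) / len(answered)
    # Every skipped item contributes the respondent's mean item score,
    # so skipping items does not pull the total downwards.
    return sum(answered) + mean_answered * (N_ITEMS - len(answered))
```

Under this assumption a respondent who answers 20 items with a score of four and marks five items as not relevant obtains a total of 100, the same total as a respondent who scores four on all 25 items.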

STATISTICS
The validation process started with the translation of the items from English to Greek by two medical doctors who are fluent in both languages; the initial draft was revised by a panel of experts for clarity. The draft was then translated back into English and compared with the original English version to identify any discrepancies between the two versions, which were then revised again by the panel. The final draft was tested in twenty patients for comprehension before being handed out to the patient groups. All subsequent Rasch measurements were carried out with the aid of the statistical package Winsteps (13). Five assumptions and properties of the model were examined to assess the validity of the Greek version of the LVQOL with Rasch modeling (14,15), including:

Measurement Precision
Measurement precision refers to how the questionnaire performs as an instrument of measurement. It is estimated with the person and item separation statistics. Separation is the signal-to-noise ratio in the data. Person separation indicates how efficiently a set of items is able to separate the persons measured, while item separation indicates how well a sample of people is able to separate the items used in the questionnaire. A low Person Separation Index (PSI) implies that the instrument may not be sensitive enough to distinguish between high and low performers and that more items may be needed, while a low Item Separation Index (ISI) implies that the person sample is not large enough to confirm the construct validity of the instrument (15). A PSI of 1.5 represents an acceptable level of separation, an index of 2.0 a good level of separation, and an index of 3.0 an excellent level of separation (16). A PSI > 2.0 and a person reliability (PR) score > 0.8 are generally considered to be the minimum requirements for satisfactory discrimination of at least three strata of participants' level of the trait (i.e., vision functioning) (14,15).
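The link between separation, reliability and strata can be made explicit with a short sketch. The relations used here are the standard ones from the Rasch literature and are not stated in the text above, so they are labeled as assumptions: reliability R = G² / (1 + G²) for a separation index G, and number of statistically distinct strata H = (4G + 1) / 3. The function names are illustrative.

```python
import math


def reliability_from_separation(g: float) -> float:
    """Reliability implied by a separation index G (assumed: R = G^2 / (1 + G^2))."""
    if g < 0:
        raise ValueError("separation must be non-negative")
    return g ** 2 / (1 + g ** 2)


def separation_from_reliability(r: float) -> float:
    """Inverse relation: G = sqrt(R / (1 - R)), defined for 0 <= R < 1."""
    if not 0 <= r < 1:
        raise ValueError("reliability must lie in [0, 1)")
    return math.sqrt(r / (1 - r))


def strata(g: float) -> float:
    """Number of statistically distinct levels (assumed: H = (4G + 1) / 3)."""
    return (4 * g + 1) / 3


if __name__ == "__main__":
    for g in (1.5, 2.0, 3.0):
        print(f"G = {g:.2f}  reliability = {reliability_from_separation(g):.2f}"
              f"  strata = {strata(g):.2f}")
```

Under these relations a separation of 2.0 corresponds to a reliability of 0.8 and exactly three strata, which is why the two cut-offs quoted above appear together.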

Unidimensionality
Unidimensionality is a prerequisite for construct validity, since it refers to whether a questionnaire measures only a single underlying trait (i.e., visual functioning). It is assessed in Rasch measurement by examining the item fit statistics and with a principal component analysis (PCA) of the residuals. Item fit relates to how well the responses meet the test requirements and ultimately how well the items fit the construct. The item fit statistics are expressed as mean square statistics, of which there are two types, infit and outfit (15). According to established criteria (17), mean square values between 0.5 and 1.5 are productive for measurement; values over 1.5 are unproductive for the construction of measurement, but not degrading; values under 0.5 are less productive for measurement, but not degrading; and values over 2 denote an item that distorts or degrades the measurement system. To test for local independence, the method of choice is a PCA of the residuals, a process in which we scan for patterns in the part of the data that does not accord with the Rasch measures. If such patterns are found, a secondary dimension may be present that distorts measurement, and the unidimensionality assumption is violated. When at least 60% of the raw variance is explained by the Rasch measures, this is an indication of unidimensionality, since there is little noise left to form a pattern (14). Residuals in the PCA are grouped in contrasts, and if the first contrast has an eigenvalue > 2.0 this is considered evidence that a second dimension is being measured by the questionnaire (14).
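As an illustration of how the fit statistics are built, the sketch below computes infit and outfit mean squares for a single item from observed responses, model-expected scores and model variances, and then applies the interpretation bands quoted above. It assumes the usual definitions (outfit as the unweighted mean of squared standardized residuals, infit as the information-weighted version); the expected scores and variances would come from the fitted Rasch model, which is not reproduced here, and the function names are hypothetical.

```python
from typing import Sequence, Tuple


def mean_squares(observed: Sequence[float],
                 expected: Sequence[float],
                 variance: Sequence[float]) -> Tuple[float, float]:
    """Return (infit, outfit) mean squares for one item.

    observed: responses of each person to the item
    expected: model-expected score for each person
    variance: model variance of each response (must be positive)
    """
    if not observed or not (len(observed) == len(expected) == len(variance)):
        raise ValueError("inputs must be non-empty and of equal length")
    if any(v <= 0 for v in variance):
        raise ValueError("model variances must be positive")
    sq_resid = [(x - e) ** 2 for x, e in zip(observed, expected)]
    # Infit: squared residuals weighted by information (sum over sum).
    infit = sum(sq_resid) / sum(variance)
    # Outfit: plain average of squared standardized residuals.
    outfit = sum(r / v for r, v in zip(sq_resid, variance)) / len(sq_resid)
    return infit, outfit


def classify_mnsq(mnsq: float) -> str:
    """Interpretation bands for a mean square value, as quoted in the text."""
    if mnsq > 2.0:
        return "distorts or degrades measurement"
    if mnsq > 1.5:
        return "unproductive but not degrading"
    if mnsq >= 0.5:
        return "productive for measurement"
    return "less productive but not degrading"
```

An item with a mean square of 1.7, for example, falls in the "unproductive but not degrading" band and would be flagged for inspection rather than removed outright.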

Category Threshold Order
The response categories for the items in a questionnaire should ideally be used in an orderly fashion; this requires that the category definitions are clear and distinct from one another and that the number of categories neither exceeds the range that the respondents can distinguish nor falls short of the nuances that we are trying to capture (18). If the thresholds are disordered, some response categories are much more likely to be chosen than others, and some may never be the most likely choice at any level of the trait.
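The effect of threshold disordering can be shown with a small sketch. It assumes an Andrich rating scale formulation of the polytomous Rasch model, which is one common choice and is not confirmed by the text as the model fitted in this study; the threshold values used in the usage note are hypothetical and chosen only for illustration.

```python
import math
from typing import Iterable, List, Sequence, Set


def category_probabilities(theta: float, delta: float,
                           thresholds: Sequence[float]) -> List[float]:
    """Category probabilities under a rating scale model (an assumption here).

    theta: person measure, delta: item difficulty, thresholds: tau_1..tau_m,
    giving m + 1 response categories numbered 0..m.
    """
    log_num = [0.0]  # category 0 has log-numerator 0
    for tau in thresholds:
        log_num.append(log_num[-1] + (theta - delta - tau))
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]


def thresholds_ordered(thresholds: Sequence[float]) -> bool:
    """True when every threshold is higher than the one before it."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


def modal_categories(delta: float, thresholds: Sequence[float],
                     grid: Iterable[float]) -> Set[int]:
    """Categories that are the most probable answer somewhere on the grid."""
    modal = set()
    for theta in grid:
        probs = category_probabilities(theta, delta, thresholds)
        modal.add(probs.index(max(probs)))
    return modal
```

With the hypothetical ordered thresholds `[-2, -1, 0, 1, 2]` every one of the six categories is the most probable response over some stretch of the continuum. Swapping the second and third thresholds to give `[-2, 0, -1, 1, 2]` leaves category 2 never the most probable response, which is the pattern that disordered thresholds produce.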

Targeting
Targeting refers to how far the average or modal person measure is from the center of the item calibrations, denoting how well persons of higher or lower ability (i.e., visual functioning) will be able to relate to the items that are offered and respond meaningfully (19). Perfect targeting corresponds to a difference in means of zero logits, a value between 0.5 and 1 logit indicates very good targeting, and a difference of over two logits indicates poor targeting (20). A person-item map visualizes targeting by placing on two sides of the same continuous line the participant scores on the Rasch-calibrated questionnaire and the relative difficulty of each of the questionnaire items, showing whether the items adequately cover the range of person ability and whether there is any overlap between questions.
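The targeting check amounts to a difference between two means, as the following sketch shows. The function names are illustrative, and the verbal labels are a simplification: the text names only the "very good" and "poor" bands, so treating any gap of at most one logit as "very good or better" and the range between one and two logits as "intermediate" is an assumption of this sketch.

```python
from statistics import mean
from typing import Sequence


def targeting_gap(person_measures: Sequence[float],
                  item_measures: Sequence[float]) -> float:
    """Mean person measure minus mean item measure, in logits."""
    return mean(person_measures) - mean(item_measures)


def describe_targeting(gap_logits: float) -> str:
    """Rough verbal label for a targeting gap (bands partly assumed)."""
    gap = abs(gap_logits)
    if gap > 2.0:
        return "poor"
    if gap <= 1.0:
        return "very good or better"
    return "intermediate"  # label not given in the text; assumed here
```

Because item calibrations are conventionally centered at zero logits, the gap is often simply the mean person measure.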

Differential Item Functioning
Differential item functioning (DIF) indicates whether subgroups are responding in a different pattern than the rest of the sample despite having equal levels of the assessed trait (15). In order to ascertain clinically important differential item functioning, two conditions had to be satisfied at the same time: a statistically significant Welch's test (P < .05) and a contrast value of > 0.64 logits. If both conditions were satisfied, it would indicate that the interpretation of the questionnaire differs by group and that it is influenced by confounding factor(s).
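The two-condition rule can be written down directly. In the sketch below the P value is taken as an input, since it would normally be read from the Rasch software output; the helper `welch_t` shows how a DIF contrast is commonly converted to a t statistic from the two group-specific item measures and their standard errors, which is an assumption about the computation rather than a description of this study's procedure. Comparing the absolute value of the contrast with the threshold is also an assumption, made so that the direction of the difference does not matter.

```python
import math

DIF_CONTRAST_THRESHOLD = 0.64  # logits, as stated in the text
ALPHA = 0.05                   # significance level, as stated in the text


def dif_contrast(measure_a: float, measure_b: float) -> float:
    """Difference between the item's difficulty in group A and group B."""
    return measure_a - measure_b


def welch_t(measure_a: float, se_a: float,
            measure_b: float, se_b: float) -> float:
    """t statistic for the contrast, using the joint standard error (assumed form)."""
    return dif_contrast(measure_a, measure_b) / math.sqrt(se_a ** 2 + se_b ** 2)


def is_notable_dif(p_value: float, contrast: float) -> bool:
    """True only when BOTH conditions hold: P < .05 and |contrast| > 0.64 logits."""
    return p_value < ALPHA and abs(contrast) > DIF_CONTRAST_THRESHOLD
```

An item with P = 0.01 but a contrast of 0.30 logits is therefore not flagged: the difference is statistically detectable but too small to matter.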

Comparative statistics
Gender differences in age and in the LVQOL score were assessed with Mann-Whitney tests. The difference in LVQOL scores before and after surgery was assessed with a paired-samples t-test. All comparative statistics were calculated using the SPSS statistical package, version 25.

Measurement Precision and Unidimensionality
In our sample the LVQOL questionnaire had a PSI = 4.43 and a PR = 0.95, which are excellent values. There were, however, four items exhibiting a mean square statistic (MNSQ) higher than 1.5, but none higher than 2. Those four items were item 16, item 10, item 9 and item 11. In the PCA, 62.1% of the raw variance was explained by the measures, and the unexplained variance in the first contrast of the residuals was 1.82 eigenvalue units. The reliability of the LVQOL was assessed with two measurements: Cronbach's alpha coefficient equals 0.959, while the more accurate Rasch measurement methodology offers a model reliability upper estimate of 0.99 and a 'real' reliability lower estimate of 0.95. In every case, the reliability of the LVQOL is excellent. Figure 1 presents the probability of selecting a specific response category plotted against the person-minus-item measure in logits, that is, the overall measure of the respondent relative to the item being answered. Each LVQOL response category is most probable for some combination of person measures and item measures, with an increased probability for the first and last response categories, depending on the person-minus-item measure.

TARGETING
The LVQOL had excellent targeting, with a difference between the person and item means on the person-item map equal to 0.06 logits.

PERSON-ITEM MAP
The person-item map in Figure 2 displays the participant scores on the Rasch-calibrated questionnaire and the relative difficulty of each of the questionnaire items. On the left side of the Wright map, the mean (M) and two standard deviation points (S = one SD and T = two SD) are shown for the patients' vision functioning. Participants with the highest level of vision-related quality of life are located at the top of the figure, while those with the lowest are found at the bottom. On the right side of the map, the mean difficulty of the items (M) and two standard deviation points (S = one SD and T = two SD) for the items are shown, where 'mean difficulty' refers to the mean probability of answering the item positively, an item being 'more difficult' when fewer participants answer it positively. Items are grouped within the range of +1 to −1 SD from mean ability, denoting that there is less discriminating ability for persons of high or low visual functioning, although there is spacing between the items, indicating little redundancy. Item 16 appears to break the pattern, being situated more than 2 SD below mean patient ability. The mean ability of the patients (M on the left side of the scale) is almost identical to the mean difficulty of the items (M on the right side of the scale), denoting an excellent level of item understanding.

DIFFERENTIAL ITEM FUNCTIONING
Differential item functioning for gender, age and underlying disorder was examined for the LVQOL. Gender was included because the genders differ in the usual activities that they perform and value the most; hence it is possible that they would place a differential emphasis on the items of the questionnaire that are more closely related to their everyday needs. Age has a direct impact on visual functioning but also on the activities that the patients are expected to perform, since the higher the age, the higher the chance of comorbid disease that limits general functionality. We divided the sample into two subsamples for this DIF analysis: patients up to and including 70 years of age, who comprised one third of the total sample, and those aged over 70. Table 3 presents the summary of the examination of the LVQOL items for differential item functioning by gender, age and disorder (cataract or other). Results indicated a number of instances where items met the statistical significance criterion for differential functioning (Welch's test, P < 0.05), especially when examining differential item functioning by disorder, but in every case the contrast effect size was lower than 0.64 logits, denoting that the difference in functioning between the subgroups was negligible. These items were items 11 and 13 for gender, items 5, 16 and 24 for age, and items 4, 6, 7, 10, 16 and 23 for the underlying disorder.
Table 4 presents the results from the application of the LVQOL per disease and per gender, pre and post treatment. Results indicate that the cataract group had a statistically significantly lower quality of life than the combined group of other diseases before treatment (Mann-Whitney Z = 2.096, P = 0.036), while the gain in quality of life after treatment left the cataract group with a significantly higher score post treatment (Mann-Whitney Z = 3.55, P < 0.001).
There was no difference in quality of life between the genders or between those older than 70 and younger than 70 years of age (P > 0.05). When comparing the cataract group to the other diseases group, the cataract group showed a significantly larger increase in visual functioning post treatment (Mann-Whitney Z = 5.5, P < 0.001), which coincided with a larger relative increase in quality of life (Mann-Whitney Z = 6.479, P < 0.001). Apparently, the larger gain from treatment for cataract patients leads to a direct increase in their vision-related quality of life that surpasses that of patients with other ophthalmic diseases.
We examined the difference in LVQOL scores pre- and post-surgery in the cataract patients' group, assuming that corrective surgery would have a positive effect on the visual functioning of the patient, so as to test content validity. A paired-samples t-test returned a statistically significant difference between quality of life pre- and post-cataract surgery as assessed with the LVQOL, t(149) = 13.238, P < 0.001. In order to ascertain concurrent validity, we examined the correlation between the scores on the LVQOL and the visual acuity pre- and post-surgery. Results indicated that the LVQOL score after surgery correlated with the improvement in visual acuity between the pre- and post-surgery measurements, Spearman's rho = 0.681, P < 0.001.

DISCUSSION
The examination of the LVQOL questionnaire with Rasch measurement demonstrated high reliability and validity, with a small number of items that carry an acceptable level of measurement error. Those were items 16, 10, 9 and 11. Item 16 in particular appeared to be separate from the grouping of items in the Wright map; this item queries the subjects as to how well their eye condition has been explained to them. This is unrelated to the visual impairment per se, and it is detached from the other group of items in the questionnaire itself. However, it does provide useful input for the researcher and the clinician as to a possible source of vision-related anxiety, and skipping it would impoverish the trove of information that the LVQOL provides. Since the item's MNSQ did not exceed the two-unit threshold, the decision was made to retain it in the Greek version of the LVQOL. This item had statistically significant differential item functioning for age and underlying disorder, denoting that older patients with more complex diseases may require extra assistance in understanding their predicament. This information would not be available to the clinician if the item were omitted from the questionnaire. Items 9, 10 and 11 also had an MNSQ higher than 1.5 but lower than 2; these questions relate to depth of vision (item 9) and moving outside on the street unaided without being hindered by small obstacles (items 10 and 11). These items could alternatively be consolidated into a single item in future research; however, changing the structure of a questionnaire that has been widely employed worldwide in this specific form has the drawback of reducing comparability between different studies.
A limitation of this study is the non-stratified sample, which cannot be considered representative of the general Greek patient population; however, there is no reason to assume that the study population differs to a significant extent from the average population examined in the outpatient services of an ophthalmic department. Also, the differences between diagnoses have been examined with cataract as the main diagnosis and the other diagnoses treated as a single group, due to their lower numbers. Future studies can replicate the findings in other diagnoses with larger sample sizes for each diagnosis.
These limitations, however, are offset by the validation process of this study. The examination of differential item functioning by gender, age and disorder was essential in order to demonstrate that the LVQOL is reliable and valid across genders, age ranges and underlying eye disorders. Our sample had two large subpopulations; the first was indicative of the demographics of cataract patients who are referred for phacoemulsification, since these patients were consecutive and not selected with bias. The second subpopulation comprised patients with twelve distinct eye disorders that can lead to low vision; their relative frequency is indicative of how common they are compared to one another among patients who are referred to outpatient services. There was no difference in the validity of the LVQOL among these patient groups; this finding has not been tested in other cultural validations so far in the literature and is unique to this study.
Cataract phacoemulsification surgery offers immediate positive results to the patient, and improvements in quality of life were therefore expected, as was the case here. The magnitude of improvement in visual acuity determined the improvement in quality of life as well. Successful cataract surgery leads to beneficial results in the patient's quality of life, and the documentation of this fact with a valid questionnaire such as the LVQOL can be very important for the funding of ophthalmological departments in the new era of managed care. The introduction of modern surgical equipment and related procedures can thus be justified in practical terms of overall patient improvement and satisfaction. The importance of cataract surgery in particular is demonstrated in a recent study where the current COVID-19 pandemic did not affect patients' decision to visit a hospital for cataract surgery (21). Any decisions to limit the provision of these important patient services should therefore be weighed against the considerable benefits that they confer on the patients' lives.

CONCLUSION
In conclusion, our validation study supports the acceptance of the Greek version of the LVQOL as a valid research tool. The Rasch measurement model identified a number of items that are not ideally suited to the questionnaire yet do not degrade the measurement system; these have been retained for their clinical value and for compatibility with other versions of the questionnaire that have been employed worldwide. The LVQOL can be employed in a wide range of ophthalmic diseases and can reliably assess improvements in quality of life following phacoemulsification surgery.