In item response theory (IRT), the scaling constant D = 1.7 is used to scale a discrimination coefficient a estimated with the logistic model to the normal metric. Empirical verification is provided that Savalei's [1] proposed scaling constant of D = 1.749, based on Kullback-Leibler divergence, appears to give the best empirical approximation. However, understanding this issue as one of the accuracy of the approximation is incorrect for two reasons. First, scaling does not affect the fit of the logistic model to the data. Second, the best scaling constant to the normal metric varies with item difficulty, and the constant D = 1.749 is best thought of as the average of scaling transformations across items. The traditional scaling with D = 1.7 is used simply because it preserves the historical interpretation of the metric of item discrimination parameters.
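As a rough illustration of what the scaling constant does, the sketch below compares the normal ogive Φ(x) with the logistic curve evaluated at D·x for a few values of D. The function names and the grid are hypothetical choices made for this example. It measures the largest absolute gap between the two curves, which is a different criterion from the Kullback-Leibler divergence the abstract refers to, so it should not be read as reproducing the paper's analysis.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, written in terms of the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def logistic(x: float) -> float:
    """Standard logistic CDF."""
    return 1.0 / (1.0 + math.exp(-x))


def max_abs_gap(d: float, lo: float = -6.0, hi: float = 6.0,
                steps: int = 12000) -> float:
    """Largest |Phi(x) - logistic(d * x)| over an evenly spaced grid.

    The grid bounds and resolution are arbitrary illustration choices.
    """
    gap = 0.0
    for i in range(steps + 1):
        x = lo + (hi - lo) * i / steps
        gap = max(gap, abs(normal_cdf(x) - logistic(d * x)))
    return gap


if __name__ == "__main__":
    # D = 1.0 is the unscaled logistic; 1.7 and 1.749 are the constants
    # discussed in the abstract.
    for d in (1.0, 1.7, 1.749):
        print(f"D = {d}: max |Phi(x) - logistic(D x)| = {max_abs_gap(d):.4f}")
```

Which constant looks "best" depends on the criterion used to compare the curves, which is consistent with the abstract's argument that the choice of D is a matter of metric convention and not of model fit.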
Background: Many clinical trials include multiple patient-reported outcomes (PROs) to measure fatigue as secondary or exploratory endpoints of treatment effectiveness. Often, these instruments have overlapping content. The objective of this study was to compare the combined measurement properties of two fatigue scales, the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-Fatigue) and the SF-36 vitality (VT) scale, using item response theory (IRT). Methods: The FACIT-Fatigue and SF-36v2 were administered at baseline and weeks 2, 4, 7, 12, and 16 to rheumatoid arthritis (RA) patients (n = 237) enrolled in a 52-week multicenter, randomized, double-blind, placebo-controlled, parallel-group, dose-finding study to evaluate the efficacy and safety of subcutaneous secukinumab administered to patients with active RA. Confirmatory factor analysis (CFA) was used to investigate unidimensionality among FACIT-Fatigue and VT items. A generalized partial credit IRT model was used to cross-calibrate the FACIT-Fatigue and VT items, and weighted maximum-likelihood estimation was used to score a composite fatigue index. Analysis of variance was used to compare the composite fatigue index with the original scales in responding to ACR improvement and treatment effects. Results: CFA found less than adequate fit to a unidimensional model. However, specifications of alternative multidimensional models were insufficient in explaining the common variance among items. An IRT model was successfully fitted, and the composite fatigue index score was found to be more responsive than the original scales to ACR improvement and treatment effects. Effect sizes and significance tests for changes in scores on the composite index were generally larger than those observed with the original scales. Conclusion: IRT methods offer a promising approach to combining items from different scales measuring the same concept that could improve the detection of treatment effects in clinical studies of RA.
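For readers unfamiliar with the generalized partial credit model mentioned in the Methods, the following is a minimal sketch of its category response probabilities for a single item, in the standard textbook parameterization. The function name, the argument names, and the example parameter values are hypothetical; they are not the calibrated FACIT-Fatigue or VT item parameters, and the study's estimation and scoring steps are not shown.

```python
import math
from typing import List


def gpcm_probs(theta: float, a: float, thresholds: List[float]) -> List[float]:
    """Category probabilities for one item under the generalized partial
    credit model.

    theta      -- latent trait level of the respondent
    a          -- item discrimination (slope)
    thresholds -- step parameters b_1..b_K; the item has K + 1 categories
    """
    # Cumulative logits: category 0 has logit 0, and category k has
    # sum_{j <= k} a * (theta - b_j).
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(logits)
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    # Hypothetical five-category item, for illustration only.
    probs = gpcm_probs(theta=0.3, a=1.2, thresholds=[-1.5, -0.5, 0.4, 1.6])
    print([round(p, 3) for p in probs])
```

With a single threshold the model reduces to a two-parameter logistic item, which offers a quick sanity check on the implementation.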
This study attempted to interpret differential item discriminations between the individual and cluster levels by focusing on the patterns and magnitudes of item discriminations under the 2PL multilevel IRT model across a variety of simulation conditions. The consistency between the mean of the individual-level ability estimates and the cluster-level ability estimates was evaluated by the correlation between them. The two were highly correlated when the patterns of item discriminations were the same at both the individual and cluster levels. The magnitudes of the item discriminations themselves had little effect on the correlations as long as the patterns were the same at the two levels. However, the correlation became lower when the patterns of item discriminations differed between the individual and cluster levels. In that case, the mean of the estimated individual-level abilities was not necessarily a good representation of the cluster-level ability.
Problem: Several approaches to analyzing survey data have been proposed in the literature. One method that is not widely used in survey research methodology is item response theory (IRT). Since accurate methods for predicting behavior are based on observed data, the design model must not only overcome computational challenges but also account for calibration and proficiency estimation. The IRT model appears to offer the latter. We review that model and apply it to observational survey data. We then compare the findings with those of the more popular weighted logistic regression. Method: The IRT model is applied to observed data from 136 sites within the Commonwealth of Virginia, collected over five years under a two-stage systematic stratified proportional-to-size sampling plan. Results: A relationship within the data is found and is confirmed using weighted logistic regression model selection. Practical Application: The IRT method may allow simplicity and better fit in prediction within complex methodology; the model provides tools for survey analysis.