Background: Many clinical trials include multiple patient-reported outcomes (PROs) to measure fatigue as secondary or exploratory endpoints of treatment effectiveness. Often, these instruments have overlapping content. The objective of this study was to compare the combined measurement properties of two fatigue scales, the Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-Fatigue) and the SF-36 vitality (VT) scale, using item response theory (IRT). Methods: The FACIT-Fatigue and SF-36v2 were administered at baseline and at weeks 2, 4, 7, 12, and 16 to rheumatoid arthritis (RA) patients (n = 237) enrolled in a 52-week multicenter, randomized, double-blind, placebo-controlled, parallel-group, dose-finding study evaluating the efficacy and safety of subcutaneous secukinumab in patients with active RA. Confirmatory factor analysis (CFA) was used to investigate the unidimensionality of the FACIT-Fatigue and VT items. A generalized partial credit IRT model was used to cross-calibrate the FACIT-Fatigue and VT items, and weighted maximum-likelihood estimation was used to score a composite fatigue index. Analysis of variance was used to compare the composite fatigue index with the original scales in their responsiveness to ACR improvement and treatment effects. Results: CFA found less-than-adequate fit to a unidimensional model. However, alternative multidimensional model specifications did not adequately explain the common variance among the items. An IRT model was successfully fitted, and the composite fatigue index score was more responsive than the original scales to ACR improvement and treatment effects. Effect sizes and significance tests for changes in scores on the composite index were generally larger than those observed with the original scales. Conclusion: IRT methods offer a promising approach to combining items from different scales that measure the same concept, which could improve the detection of treatment effects in clinical studies of RA.
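The sketch below illustrates two ingredients named in the Methods, using hypothetical item parameters. The first is the set of category probabilities under the generalized partial credit model (GPCM). The second is a score computed on the common scale once the items are cross-calibrated. For simplicity the sketch uses plain maximum likelihood on a grid, not the weighted maximum-likelihood estimator the study used. Plain ML drifts to the grid boundary for all-lowest or all-highest response patterns, which is one reason weighted estimators are preferred. `gpcm_probs` and `ml_score` are illustrative names, not functions from any package.

```python
import math


def gpcm_probs(theta, a, thresholds):
    """Category probabilities for one generalized partial credit (GPCM) item.

    a: item discrimination; thresholds: step parameters b_1..b_m.
    Category 0 uses an empty sum, so its unnormalized log-weight is 0.
    """
    cum = [0.0]
    for b in thresholds:
        cum.append(cum[-1] + a * (theta - b))
    top = max(cum)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(z - top) for z in cum]
    total = sum(weights)
    return [w / total for w in weights]


def ml_score(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Plain maximum-likelihood theta by grid search over [lo, hi].

    responses: observed category per item; items: list of (a, thresholds)
    pairs already placed on one common (cross-calibrated) scale.
    """
    best_theta, best_ll = lo, -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        ll = sum(math.log(gpcm_probs(theta, a, bs)[x])
                 for x, (a, bs) in zip(responses, items))
        if ll > best_ll:
            best_theta, best_ll = theta, ll
    return best_theta


# Hypothetical parameters: one FACIT-like 4-category item, one VT-like 3-category item.
items = [(1.0, [-1.0, 0.0, 1.0]), (1.5, [-0.5, 0.5])]
print(ml_score([0, 1], items), ml_score([2, 1], items))
```

With a single threshold, `gpcm_probs` reduces to the ordinary 2PL curve. This is a quick sanity check when adapting the sketch.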
This study attempted to interpret differential item discriminations between the individual and cluster levels by focusing on the patterns and magnitudes of item discriminations under a two-parameter logistic (2PL) multilevel IRT model across a variety of simulation conditions. The consistency between the mean of the individual-level ability estimates and the cluster-level ability estimates was evaluated by the correlation between them. The two were highly correlated when the patterns of item discriminations were the same at the individual and cluster levels. The magnitudes of the item discriminations themselves had little effect on the correlations, as long as the patterns were the same at the two levels. However, the correlation was lower when the patterns of item discriminations differed between the individual and cluster levels. In that case, the mean of the estimated individual-level abilities is not necessarily a good representation of the cluster-level ability.
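As an illustration only, a two-level 2PL item can be written in slope-intercept form with separate individual-level and cluster-level discriminations. Three things here are assumptions, not the study's specification: the parameterization, the function names, and the reading of "same pattern" as proportional discriminations. When the two discriminations are equal, the item reduces to an ordinary 2PL in the total ability theta_w + theta_b.

```python
import math


def two_level_2pl(theta_w, theta_b, a_w, a_b, d):
    """P(correct) under an assumed two-level 2PL in slope-intercept form.

    theta_w: examinee's deviation from the cluster mean (individual level)
    theta_b: cluster-level ability; a_w, a_b: individual- and cluster-level
    discriminations for this item; d: item intercept.
    """
    z = a_w * theta_w + a_b * theta_b + d
    return 1.0 / (1.0 + math.exp(-z))


def same_pattern(a_w, a_b, tol=1e-9):
    """Hypothetical check that cluster-level discriminations are proportional
    to individual-level ones (one possible reading of 'same pattern')."""
    ratios = [b / w for w, b in zip(a_w, a_b)]
    return max(ratios) - min(ratios) <= tol


# Doubling every discrimination keeps the pattern; reordering them does not.
print(same_pattern([1.0, 2.0, 0.5], [2.0, 4.0, 1.0]))
print(same_pattern([1.0, 2.0, 0.5], [2.0, 1.0, 1.0]))
```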
Problem: Several approaches for analyzing survey data have been proposed in the literature. One method that is not widely used in survey research methodology is item response theory (IRT). Because accurate methods for predicting behavior are based on observed data, the model design must not only overcome computational challenges but also account for calibration and proficiency estimation. The IRT model appears to offer the latter. We review that model and apply it to observational survey data. We then compare the findings with those of the more popular weighted logistic regression. Method: The IRT model was applied to observational data from 136 sites in the Commonwealth of Virginia, collected over five years under a two-stage, systematic, stratified, probability-proportional-to-size sampling plan. Results: A relationship within the data was found and confirmed through weighted logistic regression model selection. Practical Application: The IRT method may offer simplicity and better predictive fit within a complex survey methodology, and it provides tools for survey analysis.
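The comparison model, weighted logistic regression, can be sketched as Newton-Raphson on the weighted log-likelihood. The sketch assumes survey weights such as inverse inclusion probabilities. It is a generic illustration, not the study's code. It yields point estimates only; design-based standard errors for a stratified two-stage sample would need separate variance estimation.

```python
import numpy as np


def weighted_logistic_regression(X, y, w, n_iter=25):
    """Maximize sum_i w_i * [y_i log p_i + (1 - y_i) log(1 - p_i)] by Newton-Raphson.

    X: (n, p) design matrix including a column of ones for the intercept
    y: (n,) 0/1 outcomes; w: (n,) survey weights (assumed, e.g. 1 / inclusion prob.)
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (w * (y - p))                         # weighted score vector
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])    # weighted information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta


# Hypothetical data: intercept plus one site-level covariate.
X = np.array([[1, 0], [1, 0], [1, 1], [1, 1], [1, 2], [1, 2]], dtype=float)
y = np.array([0, 1, 0, 1, 1, 1], dtype=float)
print(weighted_logistic_regression(X, y, np.ones(len(y))))
```

A weight of 2 on a row gives the same estimates as duplicating that row. This is the sense in which survey weights let each sampled site "stand in" for unsampled ones.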
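Assessing early childhood mathematical ability is important for research on the development of mathematical ability. This study revised the Research-Based Early Math Assessment (REMA) and examined its reliability and validity. Participants were 313 children from two kindergartens in Shanghai, and the Rasch model from item response theory was used to examine the reliability and validity of the REMA. The REMA showed good reliability and an essentially unidimensional ability structure. The Wright map indicated that the scale as a whole is best suited to children of medium to high ability. The infit and outfit indices of all items lay between 0.5 and 1.5, consistent with the Rasch model. Early mathematical ability correlated moderately to highly with approaches to mathematics learning (correlation coefficients between 0.34 and 0.61). The study indicates that the REMA has good reliability and validity and is a suitable, valid tool for assessing the mathematical ability of preschool children aged 3 to 6.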
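The 0.5-1.5 criterion applies to the Rasch infit and outfit mean-square statistics. The sketch below covers one dichotomous item. It assumes the children's ability estimates and the item difficulty are already available in logits, since the REMA scoring details are not given here. Outfit is the unweighted mean of squared standardized residuals. Infit weights each squared residual by its model variance, so it is less sensitive to surprising responses from children far from the item's difficulty.

```python
import math


def rasch_item_fit(responses, thetas, b):
    """Infit and outfit mean squares for one dichotomous Rasch item.

    responses: 0/1 scores of each child on the item; thetas: their ability
    estimates (logits); b: the item's difficulty estimate (logits).
    """
    sum_sq_resid = 0.0   # sum of squared residuals (x - P)^2
    sum_var = 0.0        # sum of model variances P(1 - P)
    sum_z2 = 0.0         # sum of squared standardized residuals
    for x, theta in zip(responses, thetas):
        p = 1.0 / (1.0 + math.exp(-(theta - b)))
        var = p * (1.0 - p)
        sum_sq_resid += (x - p) ** 2
        sum_var += var
        sum_z2 += (x - p) ** 2 / var
    infit = sum_sq_resid / sum_var        # information-weighted mean square
    outfit = sum_z2 / len(responses)      # unweighted mean square
    return infit, outfit


def acceptable(stat, lo=0.5, hi=1.5):
    """The 0.5-1.5 acceptance band reported in the abstract."""
    return lo <= stat <= hi


# Hypothetical abilities and responses for one item of difficulty 0.2 logits.
infit, outfit = rasch_item_fit([1, 0, 1, 1, 0], [0.5, -0.8, 1.2, -0.1, 0.3], 0.2)
print(infit, outfit, acceptable(infit) and acceptable(outfit))
```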
This paper studies techniques for reducing item exposure using automatic item generation. The conventional test item calibration method estimates item parameters from statistical data collected during prior testing of examinees. The disadvantage of this calibration method is item exposure: test items become familiar to examinees. To reduce item exposure, automatic item generation is used, in which item models are constructed from already calibrated test items without losing their estimated item parameters. The paper describes a technique for extracting item models from already calibrated, and therefore exposed, test items. Test item development specialists can use it to integrate automatic item generation principles into existing testing applications.
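The sketch below assumes an item model is a stem template with named slots. It also assumes that every generated clone inherits the parent item's calibrated parameters. That inheritance holds only if varying the slots does not change difficulty, which in practice must be checked. The `ItemModel` class and its fields are hypothetical, not the paper's data structure. The already exposed parent stem can be excluded from the output.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class ItemModel:
    """Hypothetical item model extracted from a calibrated (exposed) parent item."""
    stem: str     # template with named slots, e.g. "{x} + {y} = ?"
    slots: dict   # slot name -> list of admissible values
    a: float      # discrimination carried over from the parent item
    b: float      # difficulty carried over from the parent item

    def generate(self, exclude=()):
        """Yield every slot combination as a new item, skipping excluded stems."""
        names = sorted(self.slots)
        for values in product(*(self.slots[n] for n in names)):
            stem = self.stem.format(**dict(zip(names, values)))
            if stem in exclude:
                continue
            yield {"stem": stem, "a": self.a, "b": self.b}


# Parent item "2 + 5 = ?" is exposed; its model produces fresh clones.
model = ItemModel("{x} + {y} = ?", {"x": [2, 3], "y": [5, 7]}, a=1.1, b=-0.3)
clones = list(model.generate(exclude={"2 + 5 = ?"}))
for item in clones:
    print(item)
```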
In this article, we propose a novel probabilistic framework to improve the accuracy of a weighted majority voting algorithm. To assign higher weights to classifiers that can correctly classify hard-to-classify instances, we introduce the item response theory (IRT) framework to evaluate the samples' difficulty and the classifiers' ability simultaneously. We then assign weights to the classifiers based on their abilities. Three models are created under different assumptions, each suited to different cases. When making inferences, we balance accuracy against complexity. In our experiments, all base models are single trees trained on bootstrap samples. To explain the models, we illustrate how the IRT ensemble model constructs the classification boundary. We also compare their performance with that of other widely used methods and show that our model performs well on 19 datasets.
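The general idea can be sketched under a few assumptions. First, fit a Rasch model to a classifier-by-instance correctness matrix from validation data, so each classifier gets an ability and each instance a difficulty. Then weight each vote by exp(ability). The exponential weight mapping, the joint maximum-likelihood gradient fit, and the function names are all assumptions for illustration; the article's three models are not reproduced here. Classifiers or instances with all-correct or all-wrong records have no finite joint ML estimate, so their values keep growing with more iterations.

```python
import numpy as np


def fit_rasch(correct, n_iter=500, lr=0.1):
    """Joint maximum-likelihood Rasch fit by gradient ascent.

    correct[j, i] = 1 if classifier j labels validation instance i correctly.
    Returns (ability per classifier, difficulty per instance).
    """
    n_clf, n_inst = correct.shape
    theta = np.zeros(n_clf)
    b = np.zeros(n_inst)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(theta[:, None] - b[None, :])))
        resid = correct - p
        theta += lr * resid.sum(axis=1) / n_inst   # abler classifiers move up
        b -= lr * resid.sum(axis=0) / n_clf        # often-missed instances move up
        b -= b.mean()  # fix the scale origin: mean difficulty is 0
    return theta, b


def irt_weighted_vote(predictions, theta):
    """predictions[j, k] = label from classifier j for test instance k."""
    weights = np.exp(theta)  # assumed mapping from ability to vote weight
    out = np.empty(predictions.shape[1], dtype=predictions.dtype)
    for k in range(predictions.shape[1]):
        labels = np.unique(predictions[:, k])
        scores = [weights[predictions[:, k] == c].sum() for c in labels]
        out[k] = labels[int(np.argmax(scores))]
    return out


# Hypothetical validation record: classifier 0 is always right.
correct = np.array([[1, 1, 1, 1], [1, 0, 1, 0], [0, 1, 0, 1]], dtype=float)
theta, b = fit_rasch(correct)
print(irt_weighted_vote(np.array([[1, 0], [1, 1], [1, 1]]), theta))
```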
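Objective: To explore the order in which impairments in different cognitive domains appear before death in older adults, their rates of change, and the factors influencing them, so as to provide evidence for early intervention in cognitive impairment among older adults. Methods: Mini-Mental State Examination (MMSE) data were drawn from 17,538 older adults in the Chinese Longitudinal Healthy Longevity Survey (CLHLS), 1998-2018. A longitudinal item response theory model with covariates was used to estimate the discrimination and difficulty parameters of each MMSE item, the rates of change in impairment scores for five cognitive domains, and the regression coefficients of the covariates. The domain containing the item with the smallest difficulty parameter was taken as the cognitive domain in which older adults first show impairment. Results: The highest discrimination parameters in the MMSE were for attention and calculation (0.938 to 1.537). The smallest difficulty parameters in the attention and calculation, recall, language, memory, and orientation domains were -0.918, 0.896, 1.482, 1.722, and 2.241, respectively. The corresponding rates of change in cognitive impairment were 0.028, 0.011, 0.007, 0.004, and 0.001. Cognitive impairment progressed faster in older adults who were female, older, or less educated. Rural older adults showed faster change than urban older adults in attention and calculation, recall, and memory. Conclusion: Among older adults in China, impairments in attention and calculation and in recall appear earlier and progress faster than those in other domains. This is especially true for women, the oldest, the less educated, and rural residents.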
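Once the parameters are estimated, the ordering rule in the Methods reduces to a sort. The snippet below only reuses the values reported in the Results; it does not fit the longitudinal IRT model. It confirms that ordering the domains by smallest difficulty gives the same sequence as ordering them from fastest to slowest rate of change.

```python
# Smallest item difficulty and rate of change per MMSE domain, copied from
# the results reported in the abstract above.
min_difficulty = {
    "attention and calculation": -0.918,
    "recall": 0.896,
    "language": 1.482,
    "memory": 1.722,
    "orientation": 2.241,
}
rate_of_change = {
    "attention and calculation": 0.028,
    "recall": 0.011,
    "language": 0.007,
    "memory": 0.004,
    "orientation": 0.001,
}

# Rule stated in the Methods: the domain holding the least difficult item is
# the one in which impairment appears first, so sort domains by that value.
onset_order = sorted(min_difficulty, key=min_difficulty.get)
# Order domains from fastest to slowest progression.
speed_order = sorted(rate_of_change, key=rate_of_change.get, reverse=True)

print(onset_order[0])             # prints: attention and calculation
print(onset_order == speed_order)  # prints: True
```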
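Objective: To analyze and evaluate the items of the quality of life scale for chronic pulmonary heart disease [QLICD-CPHD (V2.0)] using classical test theory (CTT) and item response theory (IRT). Methods: The QLICD-CPHD (V2.0) was administered to 184 patients with chronic pulmonary heart disease. The items were evaluated with four CTT methods: the correlation coefficient method, the variability method, factor analysis, and Cronbach's alpha. In parallel, Samejima's graded response model from IRT was used to compute each item's difficulty, information, and discrimination coefficients. Results: CTT indicated that 7 items failed to meet at least three of the statistical criteria, 6 in the general module and 1 in the specific module. IRT showed that item discriminations ranged from 1.18 to 1.44, which is appropriate. Difficulty coefficients increased monotonically with difficulty level (B1→B4), although some items had difficulty coefficients b outside the standard range. The mean information of the items ranged from 0.185 to 0.576. Conclusion: Based on the CTT and IRT analyses, most items of the QLICD-CPHD (V2.0) are of high quality with good discrimination, but a few items require further analysis and revision.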
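Samejima's graded response model and its item information can be sketched as follows. The thresholds play the role of the B1 to B4 difficulty levels. The item parameters in the example and the theta grid used to average information are assumptions, since the abstract does not state how "mean information" was computed. Information is the sum over categories of the squared derivative of each category probability divided by that probability.

```python
import math


def _boundaries(theta, a, bs):
    """Cumulative probabilities P*(X >= k), padded with 1 (k = 0) and 0 (k = m+1)."""
    return [1.0] + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in bs] + [0.0]


def grm_probs(theta, a, bs):
    """Category probabilities for one graded-response item.

    bs: strictly increasing thresholds, e.g. B1..B4 for a five-category item.
    """
    star = _boundaries(theta, a, bs)
    return [star[k] - star[k + 1] for k in range(len(bs) + 1)]


def grm_information(theta, a, bs):
    """Fisher information: sum over categories of (dP_k/dtheta)^2 / P_k."""
    star = _boundaries(theta, a, bs)
    # Derivatives of the boundary curves; the padded ends are constants.
    dstar = [0.0] + [a * s * (1.0 - s) for s in star[1:-1]] + [0.0]
    info = 0.0
    for k in range(len(bs) + 1):
        p = star[k] - star[k + 1]
        dp = dstar[k] - dstar[k + 1]
        info += dp * dp / p
    return info


def mean_information(a, bs, lo=-3.0, hi=3.0, n=61):
    """Average information over an evenly spaced theta grid (assumed summary)."""
    grid = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return sum(grm_information(t, a, bs) for t in grid) / n


# Hypothetical five-category item with discrimination inside the reported range.
print(mean_information(1.3, [-1.5, -0.5, 0.5, 1.5]))
```

With a single threshold, the information reduces to the familiar 2PL value a^2 P(1 - P). This gives a quick check on the implementation.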