Abstract: Objective: This study investigated the inter- and intra-rater reliability of the Australian Spasticity Assessment Scale (ASAS) in adults with unilateral hypertonia following acquired brain injury. The ASAS has been shown to be superior to other clinical tools for the assessment of spasticity in children with cerebral palsy, but its reliability has not previously been examined in adults. Method: Four muscle groups were rated on one occasion by four assessors using the ASAS in sixteen adults with unilateral hypertonia following acquired brain injury. Twelve participants returned one week later for reassessment by the same assessors. Results: Overall inter-rater reliability of the ASAS using a quadratic weighted Kappa was moderate (Kqw 0.58), with ranges from moderate to good (Kqw 0.42 - 0.70). Agreement between raters was greatest for the soleus muscle and least for the wrist flexors. Overall intra-rater reliability of each of the four raters was moderate to good (Kqw 0.48 - 0.79). Agreement within raters was greatest for the soleus muscle and least for the biceps muscle. Conclusions: The ASAS may represent an appropriate alternative to the clinical scales currently used to assess spasticity; however, the inter- and intra-rater reliability data from this investigation are lower than those previously reported by experienced users of the ASAS in children with cerebral palsy. Further investigation with a larger sample size is warranted before any firm conclusions may be drawn about the reliability and validity of this tool to assess spasticity in adults with acquired brain injury.
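The quadratic weighted Kappa reported above can be illustrated with a minimal sketch for two raters. The abstract does not say how agreement was pooled across the four assessors, so this covers the pairwise case only. The function name, the example scores, and the five-point 0 to 4 ordinal scale are assumptions made for illustration, not details taken from the study.

```python
def quadratic_weighted_kappa(rater_a, rater_b, categories):
    """Cohen's kappa with quadratic disagreement weights for two raters.

    `categories` lists the ordinal scale in order, e.g. [0, 1, 2, 3, 4]
    (an assumed scale, used here only for illustration).
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must score the same non-empty set of items")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are required")

    index = {c: i for i, c in enumerate(categories)}
    n = len(rater_a)

    # Observed proportion of items in each (rater_a, rater_b) cell.
    observed = [[0.0] * k for _ in range(k)]
    for a, b in zip(rater_a, rater_b):
        observed[index[a]][index[b]] += 1.0 / n

    # Marginal proportions give the agreement expected by chance.
    row = [sum(observed[i]) for i in range(k)]
    col = [sum(observed[i][j] for i in range(k)) for j in range(k)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(k):
        for j in range(k):
            # Disagreement weight grows with the squared distance between categories.
            w = ((i - j) / (k - 1)) ** 2
            weighted_observed += w * observed[i][j]
            weighted_expected += w * row[i] * col[j]

    if weighted_expected == 0:
        raise ValueError("kappa is undefined when both raters use a single category")
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores from two assessors on eight muscle-group ratings.
assessor_1 = [0, 1, 2, 2, 3, 4, 1, 0]
assessor_2 = [0, 1, 2, 3, 3, 4, 2, 0]
print(round(quadratic_weighted_kappa(assessor_1, assessor_2, [0, 1, 2, 3, 4]), 2))
```

Because the weights are quadratic, a one-step disagreement on the scale is penalized far less than a disagreement spanning several steps, which is why this statistic suits ordinal clinical scales.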
Funding: Supported by the Excellent Young Science and Technology Talent Cultivation Special Project of China Academy of Chinese Medical Sciences (CI2023D006), the National Natural Science Foundation of China (82121003 and 82022076), the Beijing Natural Science Foundation (2190023), the Shenzhen Fundamental Research Program (JCYJ20220818103207015), and the Guangdong Provincial Key Laboratory of Human Digital Twin (2022B1212010004).
Abstract: In medical image segmentation, it is often necessary to collect opinions from multiple experts to make the final decision. This clinical routine helps to mitigate individual bias. However, when data is annotated by multiple experts, standard deep learning models are often not applicable. In this paper, we propose a novel neural network framework called Multi-rater Prism (MrPrism) to learn medical image segmentation from multiple labels. Inspired by iterative half-quadratic optimization, MrPrism combines the tasks of assigning multi-rater confidences and producing a calibrated segmentation in a recurrent manner. During this process, MrPrism learns inter-observer variability while taking into account the image's semantic properties, and finally converges to a self-calibrated segmentation result reflecting inter-observer agreement. Specifically, we propose Converging Prism (ConP) and Diverging Prism (DivP) to iteratively process the two tasks. ConP learns calibrated segmentation based on multi-rater confidence maps estimated by DivP, and DivP generates multi-rater confidence maps based on segmentation masks estimated by ConP. Experimental results show that the two tasks can mutually improve each other through this recurrent process. The final converged segmentation result of MrPrism outperforms state-of-the-art (SOTA) methods for a wide range of medical image segmentation tasks. The code is available at https://github.com/WuJunde/MrPrism.
Funding: Funded by the China National Planning Office of Philosophy and Social Science (No. 08XYY007).
Abstract: This exploratory study investigates 1) whether and how quantitative measures of writing can be applied to identify raters' specific tendencies in their scoring of EFL writing; 2) how knowledge of raters' tendencies and scoring results would help verify the best way of combining raters' scores; and 3) how well the EFL writing scores predicted by quantitative writing performance measures match the real scores given by raters. Based on a tentative CAF framework of writing measures, raters' performance or tendency in their scoring was observed, and certain patterns of similarities as well as differences were found among the raters. The results of multiple linear regressions indicate that all raters give priority to accuracy in their scoring. Differences among raters are also obvious. As for combining different raters' scores, the study finds that the weighted average is the best of the three ways of combining scores for this group of raters, because it yielded better predicted scores than the "pure average". It is even slightly better than the results obtained by facet analysis in terms of some important indices such as R square and the Durbin-Watson value. The match between the predicted scores and the real scores is well over 50 percent. The results of the study are further discussed in relation to the application of wpm and the possible improvement of the wpm framework. The methodological, theoretical and practical implications of the study are also addressed in the relevant part of the article.
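The difference between the "pure average" and a weighted average of raters' scores can be sketched as follows. The abstract does not state how the weights were derived, so the weights, the scores, and the function name below are hypothetical and serve only to show the arithmetic.

```python
def combine_scores(scores, weights=None):
    """Combine several raters' scores for one essay.

    With no weights this is the plain ("pure") average; with weights it is
    a weighted average. The weights are hypothetical inputs here.
    """
    if not scores:
        raise ValueError("at least one score is required")
    if weights is None:
        weights = [1.0] * len(scores)
    if len(weights) != len(scores):
        raise ValueError("one weight per score is required")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * s for w, s in zip(weights, scores)) / total_weight


# Hypothetical scores from three raters for the same essay.
essay_scores = [70, 80, 90]
pure = combine_scores(essay_scores)                 # each rater counts equally
weighted = combine_scores(essay_scores, [1, 1, 2])  # third rater counts double
print(pure, weighted)
```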
Funding: Supported by the Youth Foundation of Ministry of Education of China for Humanity and Social Science Research (15YJC740004), the Fundamental Research Funds for the Central Universities in China (16LZUJBWZY032, LZUJBWZY069), and the Fund of School of Foreign Languages of LZU (16LZUWYXSTD002).
Abstract: This study consists of two questionnaire surveys conducted in two stages to investigate factors that high-stakes exam essay raters believe affect their rating behavior. All raters were Chinese university teachers of English majors. Seventy-three participants in stage one and 75 in stage two responded to the same questionnaire. Both exploratory factor analysis and confirmatory factor analysis were used in data analysis. Results showed that six broad factors generally interfered with the rating process: rating scale, rater training, rating supervision, rater characteristics, eye-catching text features and rating condition. The interaction of those factors reflected the tension between the constraints imposed by the test institution and raters' own knowledge and understanding of essay rating. This study may shed light on measures that can be taken to improve essay rating quality.
Funding: This study is one part of the TEM-4 Oral Test project funded by the Institute of Oral English Studies of Nanjing University. The author expresses her gratitude to the institute and the TEM-4 Oral Test Center for both financial and data support.
Abstract: A survey was carried out in this study to find out the factors that raters perceived as affecting the rating of the TEM-4 Oral Test, a large-scale tape-mediated oral English testing system in China. The findings show that the factors raters perceived as affecting the rating included training, raters' interaction with the rating criteria, raters' physical and emotional conditions, raters' attitudes towards the rating work, raters' oral English proficiency level, and the recording quality. Raters' educational and research backgrounds were perceived not to affect the rating.