Funding: This work was supported by JSPS KAKENHI (Nos. 18K11589, 17K00432).
Abstract: We are developing a Moodle plug-in that serves as an automated essay scoring (AES) support system for the basic education of university students. Our system evaluates essays based on a rubric with five evaluation viewpoints: "Contents, Structure, Evidence, Style, and Skill". Vocabulary level is one of the scoring items of Skill. It is calculated using the Japanese Language Learners' Dictionaries constructed by Sunakawa et al. Because these dictionaries do not fully cover the words used in student-level essays, we found a problem with the accuracy of the vocabulary level scoring. In this paper, we propose constructing comprehensive Japanese vocabulary difficulty level dictionaries using Japanese Wikipedia as the corpus. We apply Latent Dirichlet Allocation (LDA) to the Wikipedia corpus and use the word appearance probability as one index of word difficulty. For words that rarely appear, we use the TF-IDF value instead of the LDA value. As a result, we constructed highly comprehensive Japanese vocabulary difficulty level dictionaries. We confirmed that, using the constructed dictionaries, the vocabulary level can be scored for all words in the test dataset.
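The fallback between the two indexes described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the LDA word appearance probabilities have already been computed elsewhere and are passed in as a plain dictionary (`lda_prob`), that "rarely appears" means a corpus count below a hypothetical threshold `min_count`, and that a word's TF-IDF value is its maximum over all documents. The function names and the toy tokens are illustrative only.

```python
import math
from collections import Counter


def tfidf_scores(docs):
    """Return, for each word, its maximum TF-IDF over tokenized documents.

    Taking the maximum over documents is an assumption of this sketch.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each word once per document

    scores = {}
    for doc in docs:
        if not doc:
            continue
        term_freq = Counter(doc)
        for word, count in term_freq.items():
            tf = count / len(doc)
            idf = math.log(n_docs / doc_freq[word])
            scores[word] = max(scores.get(word, 0.0), tf * idf)
    return scores


def build_index(docs, lda_prob, min_count=2):
    """Choose a difficulty index for every word in the corpus.

    lda_prob: hypothetical precomputed mapping word -> LDA appearance probability.
    min_count: hypothetical threshold below which a word counts as rare.
    Returns word -> (source, value), where source is "lda" or "tfidf".
    """
    counts = Counter(word for doc in docs for word in doc)
    tfidf = tfidf_scores(docs)
    index = {}
    for word, count in counts.items():
        if count >= min_count and word in lda_prob:
            index[word] = ("lda", lda_prob[word])
        else:
            # Rare word, or no LDA value available: fall back to TF-IDF.
            index[word] = ("tfidf", tfidf[word])
    return index


if __name__ == "__main__":
    # Toy corpus of already-tokenized documents (illustrative only).
    corpus = [["neko", "inu", "neko"], ["inu", "tori"], ["neko", "sakana"]]
    lda = {"neko": 0.3, "inu": 0.2}  # made-up probabilities for the sketch
    for word, (source, value) in sorted(build_index(corpus, lda).items()):
        print(word, source, round(value, 4))
```

Every word in the corpus receives a value from one of the two sources, which mirrors the coverage goal stated in the abstract. How such values are binned into difficulty levels is not specified here, so the sketch leaves that step out.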