Abstract
To detect duplicate questions entering the item bank of a paperless examination system, this paper proposes a duplicate-question detection algorithm based on the chi-square test and word-sense analysis. First, the feature terms of a question are extracted automatically, and an improved chi-square formula is used to analyze them and remove redundant terms. Second, the senses of the feature terms are analyzed with the Chinese WordNet dictionary, and TF-IDF weighting is used to compute the cosine similarity between the feature-term vector of the incoming question and the feature terms of each question type. Finally, the resulting similarity value determines whether the question duplicates one already in the bank. Experimental results show that, with the duplication threshold set to 0.8, the algorithm runs quickly, with high accuracy and good robustness.
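The pipeline summarized in the abstract can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: it uses the standard chi-square statistic (the paper's improved formula is not given here), skips the WordNet word-sense step, compares the incoming question directly against each stored question rather than against per-type feature terms, and assumes questions are already tokenized. The function names `chi_square`, `tfidf_vectors`, `cosine`, and `is_duplicate` are hypothetical.

```python
import math
from collections import Counter


def chi_square(a, b, c, d):
    """Standard chi-square statistic for a term/category 2x2 table.

    a: questions in the category that contain the term
    b: questions outside the category that contain the term
    c: questions in the category that lack the term
    d: questions outside the category that lack the term
    (Assumption: the paper uses a modified formula that is not shown here.)
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def tfidf_vectors(docs):
    """Turn token lists into sparse TF-IDF vectors (dict: term -> weight)."""
    n = len(docs)
    # Document frequency: number of questions containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF keeps the weight positive for every term.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return 0.0 if norm_u == 0 or norm_v == 0 else dot / (norm_u * norm_v)


def is_duplicate(new_tokens, bank, threshold=0.8):
    """Flag the incoming question if any stored question is similar enough."""
    vectors = tfidf_vectors(bank + [new_tokens])
    new_vec = vectors[-1]
    return any(cosine(new_vec, v) >= threshold for v in vectors[:-1])
```

In a fuller version, `chi_square` scores would be used to drop low-scoring terms from each token list before `tfidf_vectors` is called; the default of 0.8 mirrors the threshold reported in the abstract.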
Source
《电子设计工程》 (Electronic Design Engineering)
2016, No. 13, pp. 26-29 (4 pages)
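Funding
Key Project of Higher Education Teaching Reform Research of Shaanxi Province (13BZ69)
Special Scientific Research Project of the Shaanxi Provincial Department of Education (16JK2078)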
Keywords
chi-square test
feature terms
semantics
cosine similarity
question redundancy