Abstract
With the rise of online education platforms, duplicate-question detection has emerged as a way to reduce the storage cost of large question banks. This paper proposes applying an improved Simhash algorithm to duplicate detection for test questions. First, the question text is segmented with the Jieba word segmenter. Next, keyword weights are computed from TF-IDF combined with each word's part of speech and word length, with the aim of computing the Simhash signature more accurately. Finally, similar questions are detected using a Hamming distance search backed by an index. Experimental results verify the feasibility of the scheme.
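The last two stages named in the abstract, computing a weighted Simhash signature and finding near-duplicates by Hamming distance through an index, can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: the tokens and their weights are taken as already computed (the abstract does not give the formula that combines TF-IDF, part of speech, and word length), the 64-bit token hash is an arbitrary choice (first 8 bytes of MD5), and the index is the common band-partition scheme, which the abstract does not confirm is the one used. The names `token_hash`, `simhash`, `hamming`, and `BandIndex` are hypothetical.

```python
import hashlib
from collections import defaultdict

BITS = 64  # signature length; an assumption, the abstract does not state it


def token_hash(token):
    """Stable 64-bit hash of a token (first 8 bytes of MD5, an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(weighted_tokens):
    """Weighted Simhash: each token votes +weight or -weight on every bit."""
    acc = [0.0] * BITS
    for token, weight in weighted_tokens.items():
        h = token_hash(token)
        for i in range(BITS):
            acc[i] += weight if (h >> i) & 1 else -weight
    sig = 0
    for i, total in enumerate(acc):
        if total > 0:  # a positive total sets bit i of the signature
            sig |= 1 << i
    return sig


def hamming(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


class BandIndex:
    """Band-partition index (assumed scheme, not confirmed by the source).

    The signature is split into `bands` equal slices. By the pigeonhole
    principle, two signatures that differ in fewer than `bands` bits share
    at least one identical slice, so only those candidates are compared.
    """

    def __init__(self, bands=4):
        self.bands = bands
        self.width = BITS // bands
        self.tables = [defaultdict(list) for _ in range(bands)]

    def _keys(self, sig):
        mask = (1 << self.width) - 1
        return [(sig >> (b * self.width)) & mask for b in range(self.bands)]

    def add(self, doc_id, sig):
        for table, key in zip(self.tables, self._keys(sig)):
            table[key].append((doc_id, sig))

    def query(self, sig, max_distance=3):
        """Return {doc_id: distance}; complete only if max_distance < bands."""
        found = {}
        for table, key in zip(self.tables, self._keys(sig)):
            for doc_id, other in table.get(key, ()):
                d = hamming(sig, other)
                if d <= max_distance:
                    found[doc_id] = d
        return found


# Usage with made-up weights standing in for the paper's keyword weights.
index = BandIndex()
index.add("q1", simhash({"solve": 2.0, "quadratic": 3.5, "equation": 3.0}))
candidates = index.query(simhash({"solve": 2.0, "quadratic": 3.5, "equation": 3.0}))
```

With 4 bands and a threshold of 3 bits, every true near-duplicate is guaranteed to appear among the candidates. A larger threshold would need more bands to keep that guarantee.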
Source
《软件导刊》 (Software Guide)
2018, No. 2, pp. 151-153, 157 (4 pages)