Abstract
Machine learning algorithms generally require their input to be represented as fixed-length feature vectors. To address the characteristics of compilation error messages in programming, this paper proposes a feature extraction method for compilation error messages based on the word2vec model. Words are selected with a sliding window to build a one-hot dictionary, which is combined with the Skip-gram model of word2vec and a Huffman tree to learn fixed-length feature representations from variable-length text. Finally, an SVM classification algorithm is used to validate the experimental results. The results show that the proposed feature extraction method is notably effective on compilation error messages.
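The preprocessing steps named in the abstract (sliding-window word selection, a one-hot dictionary, Skip-gram training pairs, and a Huffman tree over word frequencies) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, and the window radius are all assumptions made for illustration, and the training of the embedding vectors and the SVM classification stage are omitted.

```python
import heapq
from collections import Counter
from itertools import count


def tokenize(message):
    """Split a compiler message into lowercase tokens (assumed whitespace tokenizer)."""
    return message.lower().split()


def build_vocab(messages):
    """Map each distinct token to an index, i.e. the position of the 1 in its one-hot vector."""
    counts = Counter(tok for m in messages for tok in tokenize(m))
    # Order by descending frequency, then alphabetically, so the mapping is stable.
    ordered = sorted(counts, key=lambda t: (-counts[t], t))
    return {tok: i for i, tok in enumerate(ordered)}, counts


def skipgram_pairs(tokens, window=2):
    """Yield (center, context) pairs from a sliding window of the given radius."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]


def huffman_codes(counts):
    """Assign each token a binary Huffman code; frequent tokens get shorter codes."""
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(freq, next(tie), {tok: ""}) for tok, freq in counts.items()]
    if not heap:
        return {}
    heapq.heapify(heap)
    if len(heap) == 1:
        return {tok: "0" for tok in heap[0][2]}
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes with 0 and 1.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {t: "0" + c for t, c in left.items()}
        merged.update({t: "1" + c for t, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]
```

In word2vec, Huffman codes of this kind define the paths used by hierarchical softmax, so that frequent tokens need fewer binary decisions during training. A full pipeline would then train the embeddings on the (center, context) pairs and pass a fixed-length representation of each message to the classifier.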
Authors
何烨辛
谷林
孙晨
HE Yexin; GU Lin; SUN Chen (School of Computer Science, Xi'an Polytechnic University, Xi'an 710048; College of Management, Xi'an University of Science and Technology, Xi'an 710054; New Rural Cooperative Medical Service Operation Center, Yanliang District, Xi'an 710089)
Source
《计算机与数字工程》
2022, Issue 6, pp. 1317-1322 (6 pages)
Computer & Digital Engineering