Abstract
Dimensionality reduction is an indispensable step in text representation. An effective reduction method not only lowers the computational cost but also helps improve the accuracy of text processing. Traditional methods reduce dimensionality using statistical information about words. In contrast, this paper proposes a dimensionality reduction method for text representation based on lexical semantic similarity. Drawing on natural language processing knowledge, the method considers both the semantic information and the part-of-speech (POS) information of feature terms during reduction. Experimental results show that the method effectively reduces the dimensionality of the text representation and achieves relatively high text processing accuracy in the reduced space. This indicates that semantic-similarity-based reduction is well suited to text processing.
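The abstract does not spell out how similar feature terms are combined. The sketch below illustrates the general idea under stated assumptions: feature terms with the same POS tag and a similarity at or above a threshold are merged into one dimension, and their weights are summed. The toy similarity table (`TOY_SIM`), the threshold of 0.8, and the greedy first-match grouping are all hypothetical choices. The table stands in for the HowNet-based similarity measure named in the keywords. None of this should be read as the paper's actual algorithm.

```python
from typing import Callable, Dict, List, Tuple

# A feature term is a (word, part-of-speech tag) pair.
Term = Tuple[str, str]

# Hypothetical toy similarity table standing in for a HowNet-based measure.
TOY_SIM = {
    frozenset({"computer", "PC"}): 0.9,
    frozenset({"run", "execute"}): 0.85,
}


def toy_similarity(a: str, b: str) -> float:
    """Return a similarity in [0, 1]; identical words score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIM.get(frozenset({a, b}), 0.0)


def merge_similar_terms(
    terms: List[Term],
    similarity: Callable[[str, str], float],
    threshold: float = 0.8,  # assumed value, not taken from the paper
) -> Dict[Term, int]:
    """Greedily assign each term to a reduced dimension.

    A term joins the first existing group whose representative has the
    same POS tag and a similarity >= threshold; otherwise it opens a new group.
    """
    reps: List[Term] = []  # representative term of each reduced dimension
    mapping: Dict[Term, int] = {}
    for word, pos in terms:
        for idx, (rep_word, rep_pos) in enumerate(reps):
            if pos == rep_pos and similarity(word, rep_word) >= threshold:
                mapping[(word, pos)] = idx
                break
        else:
            # No compatible group found: this term becomes a new dimension.
            reps.append((word, pos))
            mapping[(word, pos)] = len(reps) - 1
    return mapping


def reduce_vector(weights: Dict[Term, float], mapping: Dict[Term, int]) -> List[float]:
    """Sum the weights of terms that were merged into the same dimension."""
    reduced = [0.0] * (max(mapping.values()) + 1)
    for term, weight in weights.items():
        reduced[mapping[term]] += weight
    return reduced


terms = [("computer", "n"), ("PC", "n"), ("run", "v"),
         ("execute", "v"), ("run", "n"), ("fast", "a")]
mapping = merge_similar_terms(terms, toy_similarity)
weights = {("computer", "n"): 2, ("PC", "n"): 1, ("run", "v"): 3,
           ("execute", "v"): 1, ("run", "n"): 1, ("fast", "a"): 2}
print(reduce_vector(weights, mapping))  # [3.0, 4.0, 1.0, 2.0]
```

Here six feature terms collapse into four dimensions. The POS check keeps the noun "run" separate from the verb "run" even though the words are identical. A best-match grouping, or clustering over a full similarity matrix, would be equally plausible alternatives to this greedy pass.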
Source
Journal of Henan University of Science and Technology: Natural Science (《河南科技大学学报(自然科学版)》)
CAS
2008, No. 5, pp. 36-39 (4 pages)
Funding
Foundation Project of the Education Department of Henan Province (200510464031)
Keywords
Semantic similarity
HowNet
Feature selection