Research of Uyghur Language Text Categorization Based on SVM (cited by: 11)

Abstract Automatic text categorization has important practical value and broad application prospects for making the use of text information more effective and accurate. With the rapid growth of Uyghur-language information on the Internet, Uyghur text categorization has become a key technique for processing and organizing this large volume of text data. This paper studies techniques and methods for Uyghur text categorization. To address the high dimensionality of Uyghur text under the vector space model representation, it combines stemming with the χ² statistic to reduce the dimensionality of the representation space, and it builds a Uyghur text classifier using the SVM algorithm. Experiments on a Uyghur text categorization corpus show that the SVM classifier reaches a MacroF1 of 84.6%, clearly better than the kNN method.
Source Computer Engineering & Science (《计算机工程与科学》), CSCD, Peking University Core Journal, 2012, No. 12, pp. 150-154 (5 pages)
Funding National Natural Science Foundation of China (61063026, 61163028)
Keywords text categorization; SVM; kNN; Uyghur language
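
The abstract names two quantities without defining them: the χ² statistic used for feature selection and the MacroF1 score used for evaluation. The sketch below illustrates both in plain Python, assuming the standard 2x2 contingency-table form of χ² and the usual unweighted mean of per-class F1. It is not the authors' implementation. The function names and the toy tokens are hypothetical, and the tokens stand in for the stems a Uyghur stemmer would produce.

```python
from collections import Counter


def chi_square(a, b, c, d):
    """Chi-square statistic for one term and one class.

    a: documents in the class that contain the term
    b: documents outside the class that contain the term
    c: documents in the class that lack the term
    d: documents outside the class that lack the term
    """
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    if denom == 0:
        return 0.0  # term or class is degenerate; no evidence either way
    return n * (a * d - c * b) ** 2 / denom


def chi_square_scores(docs, labels, target):
    """Score every term against the class `target`.

    docs is a list of token lists (assumed to be already stemmed).
    """
    n_docs = len(docs)
    n_pos = sum(1 for y in labels if y == target)
    df_all, df_pos = Counter(), Counter()
    for tokens, y in zip(docs, labels):
        for term in set(tokens):  # document frequency, not term frequency
            df_all[term] += 1
            if y == target:
                df_pos[term] += 1
    scores = {}
    for term, df in df_all.items():
        a = df_pos[term]
        b = df - a
        c = n_pos - a
        d = n_docs - n_pos - b
        scores[term] = chi_square(a, b, c, d)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over the classes seen in y_true."""
    f1_values = []
    for cls in sorted(set(y_true)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        f1_values.append(2 * precision * recall / total if total else 0.0)
    return sum(f1_values) / len(f1_values)


# Toy data: hypothetical placeholder tokens, two classes.
docs = [["sport", "ball"], ["sport", "goal"], ["bank", "loan"], ["bank", "ball"]]
labels = ["s", "s", "f", "f"]
scores = chi_square_scores(docs, labels, target="s")
```

Keeping only the highest-scoring terms per class is one common way to turn such scores into a reduced feature space; the selection threshold used in the paper is not stated in this record.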