Research on a Text Categorization Algorithm Based on Title Category Semantic Recognition (times cited: 6)

Applying Title Category Semantic Recognition for Text Categorization
Abstract: This paper presents a text categorization algorithm based on title category semantic recognition. The algorithm first builds the classification feature space with a feature selection strategy that uses category information. It then predicts candidate categories for a text by recognizing the category semantics of the feature words in its title, and finally applies a classifier within the resulting candidate category space. Experiments show that the algorithm substantially reduces the number of candidate categories while significantly improving categorization precision. An evaluation based on a category space representation efficiency measure further indicates that the algorithm improves the performance of the text representation space.
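The three stages in the abstract (category-aware feature selection, title-based candidate prediction, classification restricted to the candidates) can be illustrated with a minimal sketch. The abstract does not give the paper's actual selection criterion, thresholds, or classifier, so everything below is an assumption: a term counts as a "category-semantic word" if at least `min_share` of the training documents containing it belong to one category, and multinomial naive Bayes stands in for the classifier. All function names and parameters (`train`, `category_semantic_words`, `candidate_categories`, `classify`, `min_share`, `min_df`) are hypothetical.

```python
"""Hypothetical sketch of a title-driven, two-stage text categorizer."""
import math
from collections import Counter, defaultdict


def train(docs):
    """docs: list of (title_tokens, body_tokens, label) triples."""
    term_cat = defaultdict(Counter)   # term -> {category: document frequency}
    cat_terms = defaultdict(Counter)  # category -> {term: token count}
    cat_docs = Counter()              # category -> number of training docs
    for title, body, label in docs:
        cat_docs[label] += 1
        for term in set(title) | set(body):
            term_cat[term][label] += 1
        cat_terms[label].update(title + body)
    return term_cat, cat_terms, cat_docs


def category_semantic_words(term_cat, min_share=0.8, min_df=2):
    """Assumed selection rule: keep a term when at least min_share of the
    documents containing it belong to a single category."""
    words = {}
    for term, dist in term_cat.items():
        total = sum(dist.values())
        cat, count = dist.most_common(1)[0]
        if total >= min_df and count / total >= min_share:
            words[term] = cat
    return words


def candidate_categories(title, words, all_cats):
    """Stage 1: categories signalled by title words; fall back to all."""
    cands = {words[t] for t in title if t in words}
    return cands or set(all_cats)


def classify(title, body, model, words):
    """Stage 2: multinomial naive Bayes restricted to the candidate set."""
    term_cat, cat_terms, cat_docs = model
    vocab_size = len(term_cat)
    n_docs = sum(cat_docs.values())
    best, best_score = None, -math.inf
    for c in sorted(candidate_categories(title, words, cat_docs)):
        total = sum(cat_terms[c].values())
        score = math.log(cat_docs[c] / n_docs)  # log prior
        for t in title + body:
            if t in term_cat:  # ignore out-of-vocabulary tokens
                # Laplace-smoothed log likelihood of the token under category c
                score += math.log((cat_terms[c][t] + 1) / (total + vocab_size))
        if score > best_score:
            best, best_score = c, score
    return best


if __name__ == "__main__":
    docs = [
        (["stock", "market"], ["shares", "rise", "investors"], "finance"),
        (["bank", "rates"], ["interest", "shares", "loans"], "finance"),
        (["football", "match"], ["goal", "team", "win"], "sports"),
        (["team", "wins"], ["goal", "league", "match"], "sports"),
    ]
    model = train(docs)
    words = category_semantic_words(model[0])
    print(classify(["team", "news"], ["goal", "match"], model, words))  # sports
```

When no title word is a category-semantic word, this sketch falls back to the full category space, so it degrades to an ordinary naive Bayes classifier over all categories; the reduction in candidates only applies when the title carries a category signal.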
Source: Journal of Electronics & Information Technology (《电子与信息学报》), 2007, No. 12, pp. 2885-2890 (6 pages). Indexed in EI, CSCD, and the Peking University core journal list.
Funding: Supported by the National Natural Science Foundation of China (grants 60435020 and 60504021).
Keywords: Title Category Semantic Recognition (TCSR); candidate category; category space representation efficiency