
Short text expansion and classification based on pseudo-relevance feedback
Abstract: A classification method based on pseudo-relevance feedback (PFR) is proposed to address the sparseness problem in short text classification. Each short text is expanded, with its meaning preserved, using semantically similar content from the web. Existing feature extraction methods for expanded corpora use only local features; the proposed method adds global feature extraction and combines global features with local features to obtain a better feature vector. This alleviates the highly sparse feature matrix that the limited length of short texts causes during classification. Tests on an open dataset, together with a comparison against results reported in other literature, indicate that the method achieves good results on short text classification.
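Source: Journal of Zhejiang University (Engineering Science), 2014, No. 10, pp. 1835-1842 (8 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: Specialized Research Fund for the Doctoral Program of Higher Education (20110101110065); National Science and Technology Support Program of the 12th Five-Year Plan (2012BAD35B01-3, 2013BAF02B10)
Keywords: pseudo-relevance feedback; short text classification; feature extraction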
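
The abstract describes two steps: expanding a short text with similar documents, then building a feature vector from both local and global features. The sketch below illustrates that general idea only; it is not the paper's algorithm, which this record does not spell out. Everything here is an assumption made for illustration: a small in-memory `corpus` stands in for web search results, cosine similarity over raw term counts picks the feedback documents, "local" is taken to mean term frequency within the expanded text, "global" is taken to mean an IDF-style weight over the corpus, and the mixing weight `alpha` and all function names are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Whitespace tokenizer; a real system would use a proper segmenter.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def expand(short_text, corpus, k=2):
    # Pseudo-relevance feedback: treat the top-k most similar documents
    # as relevant without any human judgment (assumed retrieval model).
    query = Counter(tokenize(short_text))
    ranked = sorted(
        corpus,
        key=lambda doc: cosine(query, Counter(tokenize(doc))),
        reverse=True,
    )
    return ranked[:k]


def feature_vector(short_text, feedback_docs, corpus, alpha=0.5):
    # Local feature (assumption): term frequency in the expanded text.
    counts = Counter(tokenize(short_text))
    for doc in feedback_docs:
        counts.update(tokenize(doc))
    total = sum(counts.values())

    # Global feature (assumption): smoothed IDF over the whole corpus.
    doc_freq = Counter()
    for doc in corpus:
        doc_freq.update(set(tokenize(doc)))
    n_docs = len(corpus)

    vector = {}
    for term, count in counts.items():
        tf = count / total
        idf = math.log((1 + n_docs) / (1 + doc_freq[term])) + 1.0
        # Hypothetical combination: blend the local and global weights.
        vector[term] = alpha * tf + (1 - alpha) * tf * idf
    return vector


if __name__ == "__main__":
    corpus = [
        "apple releases new iphone with faster chip",
        "stock market falls as investors worry",
        "new iphone camera review and chip benchmarks",
        "football team wins the championship final",
    ]
    short_text = "iphone chip"
    feedback = expand(short_text, corpus, k=2)
    vector = feature_vector(short_text, feedback, corpus)
    for term, weight in sorted(vector.items(), key=lambda kv: -kv[1]):
        print(f"{term}\t{weight:.4f}")
```

In this toy setup the two-word input gains terms such as "camera" and "benchmarks" from the feedback documents, so its vector has more non-zero entries than the original text alone would give, which is the sparseness effect the abstract refers to.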
