Automatic Abstracting Generating Based on Mobile Short Message Text Information Flow (cited by: 4)
Abstract: This paper designs an automatic abstract generation model tailored to the characteristics of mobile short message (SMS) text information flows. The model defines semantic similarity through word co-occurrence and uses TF-IDF to define the weights of feature words and of candidate abstract sentences. The algorithm removes isolated points, selects abstract sentences by weight, and orders the selected sentences, producing an abstract of the short message text information flow with low redundancy and good readability. Experiments on relevant data show that the model achieves both high abstract quality and high algorithmic efficiency.
Source: New Technology of Library and Information Service (《现代图书情报技术》), CSSCI, Peking University Core Journal, 2013, Issue 2, pp. 43-49 (7 pages)
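Funding: One of the research outputs of the Hebei Province Science and Technology Support Program project "Semantic Recognition and Classification of Mobile Spam Short Messages" (project no. 10213581) and the Huai'an Social Support Fund project "Research on Human Resources and Employment in Huai'an Based on Data Mining" (project no. HASZ2012046).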
Keywords: mobile short message text; information flow; abstracts; weights
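
The pipeline outlined in the abstract (co-occurrence similarity, TF-IDF weights, isolated-point removal, weight-based selection, ordering) can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract gives no formulas, so the Jaccard overlap used as the co-occurrence similarity, the smoothed IDF, the two thresholds, the whitespace tokenizer, and all function names here are assumptions chosen only for illustration.

```python
import math
from collections import Counter


def tokenize(message):
    # Whitespace tokenization is a stand-in (assumption);
    # Chinese SMS text would need a proper word segmenter.
    return message.lower().split()


def cooccurrence_similarity(words_a, words_b):
    # Jaccard overlap of the two word sets, an assumed stand-in
    # for the paper's co-occurrence-based semantic similarity.
    a, b = set(words_a), set(words_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def sentence_weights(token_lists):
    # Weight of a message = sum of TF-IDF values of its distinct words,
    # with TF normalised by message length and a smoothed IDF (assumption).
    n = len(token_lists)
    df = Counter(word for tokens in token_lists for word in set(tokens))
    weights = []
    for tokens in token_lists:
        if not tokens:
            weights.append(0.0)
            continue
        tf = Counter(tokens)
        weights.append(sum(
            (count / len(tokens)) * (math.log((1 + n) / (1 + df[word])) + 1.0)
            for word, count in tf.items()
        ))
    return weights


def summarize(messages, max_sentences=2,
              isolation_threshold=0.1, redundancy_threshold=0.5):
    token_lists = [tokenize(m) for m in messages]
    weights = sentence_weights(token_lists)

    # Step 1: drop isolated points, i.e. messages that are not
    # similar enough to any other message in the flow.
    candidates = [
        i for i in range(len(messages))
        if any(
            cooccurrence_similarity(token_lists[i], token_lists[j])
            >= isolation_threshold
            for j in range(len(messages)) if j != i
        )
    ]

    # Step 2: pick candidates by descending weight, skipping any that
    # is too similar to an already selected message (redundancy control).
    selected = []
    for i in sorted(candidates, key=lambda k: weights[k], reverse=True):
        if len(selected) >= max_sentences:
            break
        if all(
            cooccurrence_similarity(token_lists[i], token_lists[j])
            < redundancy_threshold
            for j in selected
        ):
            selected.append(i)

    # Step 3: restore arrival order so the abstract reads naturally.
    return [messages[i] for i in sorted(selected)]


if __name__ == "__main__":
    flow = [
        "meeting moved to friday at noon",
        "the meeting is moved to friday noon",
        "lunch after the friday meeting",
        "happy birthday grandma",
    ]
    for line in summarize(flow):
        print(line)
```

In this sketch the selected messages are returned in arrival order, which is one simple way to keep the abstract readable; the paper's own ordering rule may differ.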