Abstract
Based on the characteristics of mobile short message (SMS) text information flows in practical applications, an automatic summarization model is designed. The model defines semantic similarity through word co-occurrence and uses TF-IDF to define the weights of feature words and of candidate summary sentences. The algorithm removes isolated points, screens summary sentences by weight, and orders the selected sentences. This produces SMS information-flow summaries with low redundancy and good readability. Experiments on relevant data show that both the quality of the generated summary sentences and the efficiency of the algorithm are relatively high.
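The pipeline described in the abstract can be illustrated with a minimal Python sketch. It covers co-occurrence similarity, TF-IDF weights, isolated-point removal, weight-based screening and sentence ordering. It is not the authors' implementation, because the exact formulas are not given here. Several choices are hypothetical: the Jaccard word-overlap similarity, the mean-TF-IDF sentence weight, the thresholds `min_sim` and `max_redundancy`, and all function names. Each message is treated as one candidate sentence and is assumed to be already segmented into words.

```python
import math
from collections import Counter

# Hypothetical sketch: each SMS message is one candidate sentence and is
# assumed to be already word-segmented into a list of tokens.


def tfidf_word_weights(messages):
    """Assumed feature-word weight: stream term frequency * log(N / document frequency)."""
    n = len(messages)
    tf, df = Counter(), Counter()
    for words in messages:
        tf.update(words)        # occurrences of each word in the whole stream
        df.update(set(words))   # number of messages containing each word
    return {w: tf[w] * math.log(n / df[w]) for w in tf}


def cooccurrence_similarity(a, b):
    """Stand-in similarity: share of words co-occurring in both messages (Jaccard)."""
    sa, sb = set(a), set(b)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def remove_isolated(messages, min_sim):
    """Drop 'isolated points': messages not similar enough to any other message."""
    keep = []
    for i, words in enumerate(messages):
        best = max(
            (cooccurrence_similarity(words, other)
             for j, other in enumerate(messages) if j != i),
            default=0.0,
        )
        if best >= min_sim:
            keep.append(i)
    return keep


def summarize(messages, k=3, min_sim=0.1, max_redundancy=0.6):
    """Return indices (in stream order) of up to k selected summary sentences."""
    weights = tfidf_word_weights(messages)

    def sentence_weight(i):
        # Assumed candidate-sentence weight: mean TF-IDF weight of its words.
        words = messages[i]
        return sum(weights[w] for w in words) / len(words) if words else 0.0

    candidates = remove_isolated(messages, min_sim)
    ranked = sorted(candidates, key=sentence_weight, reverse=True)

    chosen = []
    for i in ranked:
        if len(chosen) == k:
            break
        # Skip candidates too similar to an already chosen sentence (redundancy).
        if all(cooccurrence_similarity(messages[i], messages[j]) <= max_redundancy
               for j in chosen):
            chosen.append(i)
    return sorted(chosen)  # order summary sentences by position in the stream


if __name__ == "__main__":
    stream = [
        ["train", "delayed", "station"],
        ["train", "delayed", "two", "hours"],
        ["lottery", "prize", "click"],      # shares no words: isolated point
        ["station", "crowded", "train"],
        ["train", "delayed", "station"],    # duplicate of message 0
    ]
    print(summarize(stream, k=4))  # → [0, 1, 3]
```

With these toy messages, the spam-like message 2 is removed as an isolated point before ranking. Message 4 is skipped as redundant with message 0 even though `k=4` would allow it, so only three sentences are returned.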
Source
New Technology of Library and Information Service (《现代图书情报技术》)
CSSCI
Peking University Core Journal
2013, No. 2, pp. 43-49 (7 pages)
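Funding
This paper is one of the research outcomes of the following projects:
Hebei Province Science and Technology Support Program project "Semantic Recognition and Classification of Mobile Spam Short Messages" (Project No. 10213581)
Huai'an Social Support Fund project "Research on Human Resources and Employment Status in Huai'an Based on Data Mining" (Project No. HASZ2012046)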
Keywords
Mobile short message text
Information flow
Abstracts
Weights