Abstract
Feature representation of news topics is the foundation for building topic models and for topic clustering (merging). Traditional feature construction generally uses a keyword-based vector representation model and does not fully address feature selection, feature classification, or feature quality. To address this gap, this paper carries out a systematic study of feature extraction, feature construction, and topic-clustering quality analysis for Internet news documents. It clarifies how the selection and construction of topic features affect text topic research, and aims to provide a better-founded feature model for downstream applications such as Topic Detection and Tracking (TDT). Experimental results show that clustering with the selected and reconstructed topic features improves the accuracy of topic clustering and avoids the topic ambiguity caused by noisy features.
Source
《软件》
2015, No. 7, pp. 17-20 (4 pages)
Software
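Funding
National Natural Science Foundation of China (61202044)
Scientific Research Fund of the Sichuan Provincial Education Department (12ZB326)
Open Project of the Mianyang Network Convergence Engineering Laboratory (12ZXWK04)
Doctoral Research Fund of Southwest University of Science and Technology (12zx7116)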
Keywords
Topic feature
Topic model
Topic clustering
Feature selection