Abstract
Text mining is the process of discovering the content and meaning contained in text. The vector space model is a mature text representation model in text mining, and its performance depends heavily on how feature terms are selected. Previous studies, however, have focused on the feature terms that appear within a document itself and ignored the relevance between documents, so these terms cannot supply rich semantic information. The Web 2.0 wave that began in 2005 swept across the Internet, and the social annotation that emerged with it has become a semantic bridge between related documents, bringing new vitality to text mining. On this basis, this paper uses the IRF (Iterative Reinforcement Framework) model to generate rich feature terms for documents, which greatly improves the document retrieval rate.
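The idea of enriching a document's vector space representation with social annotations can be illustrated with a minimal sketch. This is not the IRF model used in the paper, whose details the abstract does not give. It assumes raw term-frequency vectors, cosine similarity, and a hypothetical `enrich` helper that simply adds each annotation tag as an extra feature term. The documents, tags, and the `tag_weight` parameter are invented for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def enrich(tokens, tags, tag_weight=1.0):
    """Hypothetical helper: term-frequency vector plus annotation tags.

    Each tag is added as an extra feature term with weight `tag_weight`.
    This is an illustrative assumption, not the paper's IRF procedure.
    """
    vec = Counter(tokens)
    for tag in tags:
        vec[tag] += tag_weight
    return vec


# Two made-up documents that share no words in their own text...
doc_a = ["vector", "space", "model"]
doc_b = ["term", "weighting", "scheme"]

# ...but that users have annotated with the same (made-up) tag.
tags_a = ["text-mining"]
tags_b = ["text-mining"]

plain_similarity = cosine(Counter(doc_a), Counter(doc_b))
enriched_similarity = cosine(enrich(doc_a, tags_a), enrich(doc_b, tags_b))

print(plain_similarity)     # 0.0: no shared in-text feature terms
print(enriched_similarity)  # greater than 0: the shared tag links the documents
```

With in-text terms alone the two documents look unrelated. Once the shared annotation is treated as a feature term, their similarity becomes nonzero, which is the kind of cross-document semantic link the abstract attributes to social annotation.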
Source
《网络安全技术与应用》
2010, Issue 9, pp. 47-49 (3 pages)
Network Security Technology & Application
Keywords
document representation
Web 2.0
feature terms
annotation