Abstract
Traditional text information extraction algorithms used in knowledge discovery in databases (KDD) systems often fail to meet users' business requirements. To address this, this paper proposes a text feature extraction model based on a description of user requirements. The user's business requirement model is first given a feature-based description. The raw text stored in the database is then preprocessed, and word frequencies and weights are calculated to make an initial selection of text features. Finally, feature similarity is calculated against the user requirement description and unrelated "noise" words are filtered out, so that the retained features describe the text accurately and match the user's requirements.
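The pipeline summarized in the abstract (preprocess, weight, select, filter by similarity) can be illustrated with a minimal Python sketch. The abstract does not specify the weighting scheme or the similarity measure, so everything concrete here is an assumption made for illustration: whitespace tokenization stands in for preprocessing, TF-IDF stands in for the "frequency and weight" step, and similarity is taken as the cosine between a word's per-document weight profile and the combined profile of the requirement terms. The names `extract_features`, `top_k`, and `min_sim` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter

PUNCT = '.,;:!?"()'

def tokenize(text):
    # Assumption: lowercased whitespace tokens stand in for real preprocessing.
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]

def tf_idf(docs):
    """Return one {term: weight} dict per document (tf times smoothed idf)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({
            t: (c / len(doc)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        })
    return weights

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def extract_features(texts, requirement, top_k=5, min_sim=0.3):
    """Hypothetical pipeline: weight words, pick candidates, drop 'noise'."""
    docs = [tokenize(t) for t in texts]
    weights = tf_idf(docs)

    # Profile of each word: its weight in every document where it occurs.
    profiles = {}
    for i, doc_weights in enumerate(weights):
        for term, w in doc_weights.items():
            profiles.setdefault(term, {})[i] = w

    # Assumption: the requirement description is a short text whose terms'
    # profiles are summed into a single requirement profile.
    req_profile = Counter()
    for term in set(tokenize(requirement)):
        for i, w in profiles.get(term, {}).items():
            req_profile[i] += w

    selected = []
    for doc_weights in weights:
        # Initial selection: the top_k highest-weighted words of the document.
        candidates = sorted(doc_weights, key=lambda t: (-doc_weights[t], t))[:top_k]
        # Filtering: keep only candidates similar enough to the requirement.
        selected.append(
            [t for t in candidates if cosine(profiles[t], req_profile) >= min_sim]
        )
    return selected
```

With a requirement such as "invoice payment customer", a document that shares no terms with the requirement (for example one about the weather) has zero similarity for all of its words and ends up with an empty feature list, which is the "noise" filtering behaviour the abstract describes. The threshold `min_sim` is an arbitrary illustrative value, not one reported in the paper.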
Source
Shanxi Electronic Technology (《山西电子技术》)
2013, No. 1, pp. 65-67 (3 pages)
Keywords
user requirement model
text information
text feature
similarity