Abstract
This paper studies the process of clustering semantically similar attributes across heterogeneous databases and the key techniques it relies on. Building on term frequency-inverse document frequency (TF-IDF), it proposes a bin frequency-inverse document bin frequency (BF-IDBF) method for processing numerical attribute information. TF-IDF is applied to text information and BF-IDBF to numerical information in the clustering process. The results show that combining TF-IDF with BF-IDBF is an effective way to cluster semantically similar attributes in heterogeneous databases.
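The idea of treating numerical values like terms can be illustrated with a minimal sketch. The abstract does not give the BF-IDBF formula, so the code below rests on assumptions: each numerical attribute is discretized into equal-width bins over a shared range, each bin label is treated as a "term", and the weights are computed exactly as in standard TF-IDF. The function names (`tf_idf`, `to_bins`, `bf_idbf`, `cosine`) and the bin count are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Standard TF-IDF. docs is a list of token lists, one per attribute.
    Returns one sparse vector (dict token -> weight) per attribute."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each token once per attribute
    vectors = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        vectors.append(
            {t: (c / total) * math.log(n / df[t]) for t, c in counts.items()}
        )
    return vectors


def to_bins(values, lo, hi, n_bins):
    """Map numbers to equal-width bin labels (assumed binning scheme)."""
    width = (hi - lo) / n_bins if hi > lo else 1.0
    return [f"bin{min(int((v - lo) / width), n_bins - 1)}" for v in values]


def bf_idbf(columns, n_bins=4):
    """Assumed BF-IDBF analogue: bin every numerical attribute over a shared
    range, then weight the bin labels the same way TF-IDF weights terms."""
    lo = min(min(c) for c in columns)
    hi = max(max(c) for c in columns)
    return tf_idf([to_bins(c, lo, hi, n_bins) for c in columns])


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


# Illustrative data only: two age-like attributes and one salary-like attribute.
age_a = [21, 25, 30, 34]
age_b = [22, 27, 31, 35]
salary = [80, 90, 95, 100]
vec_a, vec_b, vec_s = bf_idbf([age_a, age_b, salary])
sim_ages = cosine(vec_a, vec_b)
sim_age_salary = cosine(vec_a, vec_s)
```

Under these assumptions, the two age-like attributes fall into the same bins and receive a high similarity, while the salary-like attribute shares no bins with them. The resulting vectors could then be passed to a clustering step such as the self-organizing map named in the keywords.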
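Source
Journal of Railway Science and Engineering (《铁道科学与工程学报》)
CAS
CSCD
Peking University Core Journals (北大核心)
2012, Issue 2, pp. 119-124 (6 pages)
Funding
Supported by the Natural Science Foundation of Gansu Province (1014RJZA042)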
Keywords
heterogeneous database
similar semantics
attribute clustering
unified vectorization (UV)
term frequency-inverse document frequency (TF-IDF)
bin frequency-inverse document bin frequency (BF-IDBF)
self-organizing map network (SOM)