Abstract
Finding corresponding attributes across databases is the key problem in sharing data among heterogeneous databases. Current approaches mainly evaluate attribute similarity by comparing all attributes with one another. When the same attribute is represented with different data types, however, the metadata and value information describing it differ so greatly that these approaches fail to identify the correspondence. Matching attributes of different data types together also causes interference among the attribute data and reduces the accuracy of the results. This paper therefore proposes a two-step checking attribute matching algorithm based on BP neural networks. Attributes are first categorized by data type; each categorized attribute set is then used to train a neural network several times, and the intersection of the matching results from all training runs is taken as the final matching result. These two stages of checking give the method its name. The algorithm effectively eliminates the interference of inconsistent information, reduces the size of the neural networks, and allows the matching of attribute sets of different data types to be computed in parallel. Experimental results show that the proposed method clearly improves system running efficiency as well as the precision and recall of attribute matching.
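The control flow described in the abstract (categorize by data type, match several times per category, keep only the intersection) can be sketched as follows. This is a minimal illustration, not the paper's implementation: the BP neural network is not implemented here, and the `match_once` callable is a hypothetical stand-in for one train-and-match run. The names `group_by_type`, `two_step_match`, and `noisy_name_matcher` are likewise assumptions made for this sketch, as is the reading that matching happens only within a shared data type.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Set, Tuple

Attribute = Tuple[str, str]  # (attribute name, data type)
Match = Tuple[str, str]      # (name in database A, name in database B)
# Hypothetical stand-in for "train one BP network and return its matches".
Matcher = Callable[[List[str], List[str], int], Set[Match]]


def group_by_type(attributes: Iterable[Attribute]) -> Dict[str, List[str]]:
    """Step 1: categorize attribute names by their data type."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, data_type in attributes:
        groups[data_type].append(name)
    return dict(groups)


def two_step_match(attrs_a: Iterable[Attribute],
                   attrs_b: Iterable[Attribute],
                   match_once: Matcher,
                   runs: int = 3) -> Set[Match]:
    """Step 2: match each category several times and intersect the results."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    groups_a = group_by_type(attrs_a)
    groups_b = group_by_type(attrs_b)
    final: Set[Match] = set()
    # Categories are independent of each other, so this loop could run in parallel.
    for data_type in sorted(groups_a.keys() & groups_b.keys()):
        per_run = [match_once(groups_a[data_type], groups_b[data_type], seed)
                   for seed in range(runs)]
        # Only pairs reported by every run survive.
        final |= set.intersection(*per_run)
    return final


def noisy_name_matcher(names_a: List[str], names_b: List[str], seed: int) -> Set[Match]:
    """Toy matcher (assumption): case-insensitive name equality, plus one
    spurious pair on odd seeds to imitate an unstable training run."""
    matches = {(a, b) for a in names_a for b in names_b if a.lower() == b.lower()}
    if seed % 2 == 1 and names_a and names_b:
        matches.add((names_a[0], names_b[-1]))
    return matches
```

With the toy matcher, a spurious pair that appears in only some runs is dropped by the intersection, which is the effect the repeated-training step is meant to have. A real matcher would replace `noisy_name_matcher` with a network trained on the metadata and value features of each category.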
Source
《计算机科学》
CSCD
PKU Core (Peking University Chinese Core Journals)
2006, No. 1, pp. 249-251, 259 (4 pages)
Computer Science
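Funding
National Natural Science Foundation of China (Grant No. 70371030)
Chongqing Municipal Education Commission Foundation (Grant No. 040212).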