Abstract
XML has been widely studied as a data exchange standard in e-government applications. With the arrival of the big data era, managing XML data in e-government has become increasingly important. In XML data management, the similarity of XML documents is key to XML data integration and XML data classification. To study XML document similarity, XML documents are transformed into trees and the corresponding features of the tree nodes are extracted. Each feature is then used to compute a separate node similarity, and the resulting similarities are fitted with the ELM (Extreme Learning Machine) algorithm to obtain the final node similarity. Based on node similarity, a similarity comparison algorithm for XML document trees is proposed, from which the similarity of XML documents is computed. In the experiments, after specific evaluation metrics are defined, the precision, recall, F-measure, and corresponding running time of the proposed method are compared on two different data sets, and the results verify the performance advantages of the proposed method.
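The pipeline described in the abstract (tree transformation, per-feature node similarity, ELM fitting) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the node features or the similarity measures, so the four features (tag, depth, child count, text), the `difflib`-based string similarity, the helper names `node_features`, `feature_similarities`, `solve` and `ELM`, and the synthetic training labels are all assumptions chosen for illustration. The ELM part follows the standard scheme of fixed random hidden weights with output weights obtained by regularized least squares.

```python
import math
import random
import xml.etree.ElementTree as ET
from difflib import SequenceMatcher
from itertools import product


def node_features(elem, depth=0):
    """Hypothetical feature set for one tree node (not taken from the paper)."""
    return {"tag": elem.tag, "depth": depth,
            "children": len(elem), "text": (elem.text or "").strip()}


def feature_similarities(a, b):
    """Return one similarity in [0, 1] per feature; all measures are assumptions."""
    def ratio(x, y):
        return min(x, y) / max(x, y)
    return [
        SequenceMatcher(None, a["tag"], b["tag"]).ratio(),
        ratio(a["depth"] + 1, b["depth"] + 1),
        ratio(a["children"] + 1, b["children"] + 1),
        SequenceMatcher(None, a["text"], b["text"]).ratio(),
    ]


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


class ELM:
    """Single-hidden-layer ELM: random fixed hidden layer, fitted output weights."""

    def __init__(self, n_inputs, n_hidden=10, ridge=1e-4, seed=0):
        rng = random.Random(seed)
        self.W = [[rng.uniform(-1, 1) for _ in range(n_inputs)]
                  for _ in range(n_hidden)]
        self.b = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.ridge = ridge
        self.beta = [0.0] * n_hidden

    def _hidden(self, x):
        # Sigmoid activation of each hidden unit.
        return [1 / (1 + math.exp(-(sum(w * v for w, v in zip(row, x)) + b)))
                for row, b in zip(self.W, self.b)]

    def fit(self, X, y):
        H = [self._hidden(x) for x in X]
        k = len(self.beta)
        # Regularized normal equations: (H^T H + ridge * I) beta = H^T y
        A = [[sum(h[i] * h[j] for h in H) + (self.ridge if i == j else 0.0)
              for j in range(k)] for i in range(k)]
        rhs = [sum(h[i] * t for h, t in zip(H, y)) for i in range(k)]
        self.beta = solve(A, rhs)
        return self

    def predict(self, x):
        return sum(b * h for b, h in zip(self.beta, self._hidden(x)))


# Synthetic training set (an assumption): corners of the unit hypercube, labeled
# with an arbitrary weighted average standing in for human-judged similarity.
weights = [0.4, 0.2, 0.1, 0.3]
X = [list(p) for p in product([0.0, 1.0], repeat=4)]
y = [sum(w * v for w, v in zip(weights, x)) for x in X]
model = ELM(n_inputs=4).fit(X, y)

a = ET.fromstring("<book><title>XML</title></book>")
b = ET.fromstring("<book><title>XML data</title></book>")
sims = feature_similarities(node_features(a[0], 1), node_features(b[0], 1))
score = model.predict(sims)  # fitted node similarity for the two <title> nodes
```

The sketch stops at node similarity. The tree-level comparison algorithm proposed in the paper is not specified in the abstract, so it is not reproduced here.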
Source
Computer Technology and Development (《计算机技术与发展》)
2017, Issue 1, pp. 186-189, 194 (5 pages)
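Funding
Youth Fund Project of Humanities and Social Sciences Research, Ministry of Education (15YJC870028)
Natural Science Foundation of Liaoning Province (2015020009)
Liaoning Province Philosophy and Social Science Planning Fund Project (L15BTQ002)
Liaoning Federation of Social Sciences 2015 Liaoning Economic and Social Development Project (2015lslktglx-01)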
Keywords
XML documents
similarity
feature extraction
fitting
data integration