Data citation popularity prediction method by data lineage (cited by: 1)
Abstract: Data items reference one another. During data development, some data usually have high citation popularity: they are heavily referenced by other data, and defects in them often have a major impact on the results produced by the whole big data platform. It is therefore crucial to predict which data will have high citation popularity and to protect them accordingly. To meet the demand for hierarchical data governance based on data citation popularity, a data citation popularity prediction method by data lineage was proposed. Firstly, the reference relationships between data nodes were captured by constructing the data lineage of the data system. Then, the temporal and structural features of the data lineage were extracted, and a Graph Convolutional Network (GCN) was used to learn the features of the data lineage graph. Finally, a method was proposed that reads out the graph features hierarchically along the data lineage propagation trend in order to predict data citation popularity. Experimental results on ZJZY-SL, a dataset from the Zhejiang tobacco marketing system, and on HEP-PH, a citation dataset of High Energy Physics Phenomenology papers, show that, compared with methods such as DeepCCP (an end-to-end Deep learning neural network for paper Citation Counts Prediction), the proposed method increased recognition accuracy by 7.64 and 2.88 percentage points respectively, and the average F1 score by 4.7 and 4.34 percentage points respectively. The proposed method can fully exploit the data lineage features of data in the early stage of being referenced and predict the future citation popularity of data nodes.
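The three steps named in the abstract (lineage graph, graph convolution, layered readout) can be illustrated with a toy sketch. This is not the authors' implementation: the feature choice, the fixed untrained weights, the symmetric treatment of edges in the convolution, and the hop-by-hop mean pooling used for the readout are all assumptions made for illustration, and every function name is hypothetical.

```python
import math
from collections import deque


def normalized_adjacency(n, edges):
    """Return D^{-1/2} (A + I) D^{-1/2}, treating lineage edges as undirected."""
    a = [[0.0] * n for _ in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    for i in range(n):
        a[i][i] = 1.0  # self-loop so each node keeps its own features
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(x, y):
    """Plain dense matrix product of two lists of rows."""
    return [[sum(row[k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for row in x]


def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


def hop_levels(n, edges, root):
    """BFS distance from root along edges (referenced data -> referencing data)."""
    out = [[] for _ in range(n)]
    for u, v in edges:
        out[u].append(v)
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in out[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def layered_readout(h, dist):
    """Mean-pool node embeddings per hop level, then concatenate the levels."""
    dim = len(h[0])
    pooled = []
    for hop in range(max(dist.values()) + 1):
        members = [i for i, d in dist.items() if d == hop]
        pooled.extend(sum(h[i][k] for i in members) / len(members)
                      for k in range(dim))
    return pooled


# Toy lineage (assumed): data node 0 is referenced by 1 and 2; node 1 by 3.
edges = [(0, 1), (0, 2), (1, 3)]
n = 4
# Assumed features per node: [references received so far, creation time step].
features = [[2.0, 0.0], [1.0, 1.0], [0.0, 2.0], [0.0, 3.0]]
weights = [[0.5, 0.1], [0.2, 0.3]]  # fixed and untrained, for illustration only

a_hat = normalized_adjacency(n, edges)
hidden = gcn_layer(a_hat, features, weights)
dist = hop_levels(n, edges, root=0)
graph_vector = layered_readout(hidden, dist)
```

In a real pipeline, `graph_vector` would feed a trained classifier that outputs a popularity level; here the weights are constants, so the vector only shows the shape of the computation (one pooled embedding per hop level around the target node).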
Authors: JIN Yong; GAO Yanghua; PAN Xiaohua; SHEN Shijing; ZHU Xinzhou (Information Center, China Tobacco Zhejiang Industrial Company Limited, Hangzhou Zhejiang 310007, China; Binjiang Institute of Zhejiang University, Hangzhou Zhejiang 310053, China; School of Software Technology, Zhejiang University, Hangzhou Zhejiang 310013, China)
Source: Journal of Computer Applications (《计算机应用》), indexed in CSCD and the Peking University Core Journals list, 2023, Issue S01, pp. 119-125 (7 pages)
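Funding: Zhejiang University-China Tobacco Zhejiang Joint Laboratory Science and Technology Project ZJZY2021E006 (ZD-ZJZY20211001); China National Tobacco Corporation Key Research and Development Project (110202102030); China Tobacco Zhejiang Industrial Company Limited Science and Technology Project (ZJZY2021E006)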
Keywords: data lineage; Graph Convolutional Network (GCN); data citation popularity; propagation trend; data governance
