
Discovery of functional dependencies in university data based on affinity propagation clustering and TANE algorithms (cited by: 3)
Abstract: To address the problems that datasets contain missing values and that the functional dependencies discovered are few and inaccurate in the practical data quality inspection of universities, a functional dependency discovery method for university data combining the Affinity Propagation (AP) clustering algorithm and the TANE algorithm (APTANE) was proposed. Firstly, column profiling was performed on the Chinese-language fields in the dataset, and the Chinese field values were represented by corresponding numerical values. Secondly, the AP clustering algorithm was used to fill the missing values in the dataset. Finally, the TANE algorithm was used to automatically discover functional dependencies satisfying the non-trivial and minimal requirements from the processed dataset. The experimental results show that after a real university dataset was repaired with the AP clustering algorithm, the number of functional dependencies discovered increased to 80 compared with applying the automatic functional dependency discovery algorithm directly. The functional dependencies discovered after missing-value filling also represent the relationships between fields more accurately, reducing the workload of domain experts and improving the quality of the data held by universities.
Authors: HUANG Yongxin (黄永鑫); TANG Xuefei (唐雪飞) (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan 610054, China)
Source: Journal of Computer Applications (《计算机应用》), CSCD, Peking University Core Journal (北大核心), 2020, No. 1, pp. 90-95 (6 pages)
Funding: National Key Research and Development Program of China (2017YFB1401303); Sichuan Science and Technology Program (2017GZ0192)
Keywords: university informatization; data quality; Affinity Propagation (AP) clustering algorithm; functional dependency; TANE
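
The first and last steps outlined in the abstract (representing text field values as numbers, then discovering non-trivial, minimal functional dependencies) can be illustrated with a small sketch. This is not the authors' APTANE implementation, and it does not reproduce TANE's partition-refinement machinery; it is a brute-force, level-wise search that should return the same kind of result on a tiny table. It assumes missing values have already been filled, so the AP clustering step is omitted. The function names `encode_columns`, `holds`, and `discover_fds`, and the sample rows, are hypothetical.

```python
from itertools import combinations


def encode_columns(rows):
    """Replace every value by an integer code, assigned per column."""
    n_cols = len(rows[0])
    codebooks = [{} for _ in range(n_cols)]  # one value -> code map per column
    encoded = []
    for row in rows:
        encoded.append(tuple(
            codebooks[j].setdefault(value, len(codebooks[j]))
            for j, value in enumerate(row)
        ))
    return encoded, codebooks


def holds(rows, lhs, rhs):
    """True if rows that agree on the lhs columns always agree on column rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[i] for i in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True


def discover_fds(rows):
    """Brute-force level-wise search for minimal, non-trivial FDs lhs -> rhs.

    Assumption: rows contain no missing values (imputation happened earlier).
    """
    n_cols = len(rows[0])
    found = []  # (lhs column indices, rhs column index)
    for size in range(n_cols):  # candidate left-hand sides, smallest first
        for lhs in combinations(range(n_cols), size):
            for rhs in range(n_cols):
                if rhs in lhs:
                    continue  # trivial dependency, skip
                # Not minimal if a subset of lhs already determines rhs.
                if any(r == rhs and set(l) <= set(lhs) for l, r in found):
                    continue
                if holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


# Hypothetical sample table: (student_id, major, school)
names = ["student_id", "major", "school"]
rows = [
    ("s1", "software", "SISE"),
    ("s2", "software", "SISE"),
    ("s3", "networks", "SISE"),
    ("s4", "physics", "Science"),
]
encoded, _ = encode_columns(rows)
for lhs, rhs in discover_fds(encoded):
    print(", ".join(names[i] for i in lhs) or "{}", "->", names[rhs])
# prints:
# student_id -> major
# student_id -> school
# major -> school
```

The search is exponential in the number of columns, which is why a real discovery run would rely on TANE's pruning rather than this exhaustive check.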
