
Knowledge Clustering and Statistics Based on MapReduce

Cited by: 1
Abstract: The sheer volume of resources in online literature knowledge bases, combined with their coarse-grained classification, makes learners prone to disorientation and knowledge overload when retrieving and reading papers. This paper proposes a MapReduce-based mechanism for knowledge clustering and statistics. First, it presents MR-CoMatrix, a MapReduce-based algorithm for building a co-occurrence matrix. Second, it combines the co-occurrence matrix with a similarity coefficient to build a similarity matrix. Third, the similarity matrix is standardized with Z-scores. Finally, the standardized similarity matrix is clustered with Ward's method (minimum sum of squared deviations), producing a tree-shaped dendrogram of knowledge clusters. Building on the clustering result, the paper introduces MR-Statistics, a MapReduce-based algorithm that computes statistics over the knowledge attributes of each cluster. Experimental results show that applying MR-CoMatrix and MR-Statistics to an online literature knowledge base achieves satisfactory clustering accuracy and computational efficiency, enabling fine-grained knowledge clustering and multi-dimensional statistics while reducing time cost.
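The first three steps of the pipeline described in the abstract can be illustrated with a small single-process sketch. This is not the paper's MR-CoMatrix implementation: the map and reduce steps are plain Python functions rather than Hadoop jobs, the abstract does not say which similarity coefficient is used (the Ochiai, or cosine, coefficient is assumed here), and every function name and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt
from statistics import mean, pstdev


def map_pairs(keywords):
    """Map step: emit ((a, b), 1) for every keyword pair in one document."""
    terms = sorted(set(keywords))
    for t in terms:
        yield (t, t), 1  # diagonal entry: number of documents containing t
    for a, b in combinations(terms, 2):
        yield (a, b), 1  # off-diagonal entry, key stored with a < b


def reduce_counts(emitted):
    """Reduce step: sum the values emitted for each key."""
    counts = Counter()
    for key, value in emitted:
        counts[key] += value
    return counts


def cooccurrence(docs):
    """Run the map step over all documents, then reduce."""
    return reduce_counts(kv for doc in docs for kv in map_pairs(doc))


def ochiai_matrix(counts, terms):
    """Similarity matrix under the ASSUMED Ochiai coefficient."""
    def c(a, b):
        return counts[(a, b) if a <= b else (b, a)]
    return [[c(a, b) / sqrt(c(a, a) * c(b, b)) for b in terms] for a in terms]


def zscore_columns(matrix):
    """Standardize each column to zero mean and unit (population) variance."""
    cols = list(zip(*matrix))
    mus = [mean(col) for col in cols]
    sigmas = [pstdev(col) for col in cols]
    return [[(x - m) / s if s else 0.0 for x, m, s in zip(row, mus, sigmas)]
            for row in matrix]


# Toy input (hypothetical): one keyword list per paper.
docs = [
    ["clustering", "mapreduce", "statistics"],
    ["clustering", "mapreduce"],
    ["clustering", "knowledge"],
]
counts = cooccurrence(docs)
terms = sorted({t for doc in docs for t in doc})
sim = ochiai_matrix(counts, terms)
z = zscore_columns(sim)
```

The rows of `z` could then be passed to a Ward-linkage routine, for example `scipy.cluster.hierarchy.linkage(z, method="ward")`, to obtain the dendrogram; that step is left out so the sketch needs only the standard library.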
Source: Journal of Electronics & Information Technology (《电子与信息学报》), indexed in EI, CSCD, and the PKU Core list; 2016, No. 1, pp. 202-208 (7 pages).
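Funding: National Natural Science Foundation of China (61202004, 61472192); Special Research Project on Rapid Sharing of Scientific Papers in the Network Era, Science and Technology Development Center of the Ministry of Education (2013116); Natural Science Research Program of Jiangsu Higher Education Institutions (14KJB520014).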
Keywords: data mining; clustering; knowledge; co-occurrence matrix; statistics; MapReduce
