Abstract
Network data are ubiquitous in real-world applications for representing complex relationships among objects, e.g., social networks, citation networks, and web networks. However, due to the large scale and high-dimensional sparse representation of network datasets, it is hard to apply off-the-shelf machine learning methods to them directly. Network representation learning (NRL) can generate succinct node representations for large-scale networks and serves as a bridge between machine learning methods and network data; it has attracted great research interest from both academia and industry. Despite the wide adoption of NRL algorithms, the setting of their hyperparameters remains a key factor in the success of their applications, as hyperparameters can influence the algorithms' performance to a great extent. How to generate a task-aware set of hyperparameters for different NRL algorithms so as to obtain their best performance, compare their performance fairly, and select the most suitable NRL algorithm for analyzing the network data are fundamental questions that must be answered before NRL algorithms are applied. In addition, hyperparameter tuning is time-consuming, and the massive scale of network datasets further complicates the problem by incurring a high memory footprint; how to tune the hyperparameters of NRL algorithms within given resource constraints, such as a time budget or a memory limit, is therefore another problem. To address these two problems, we propose an easy-to-use framework named JITNREv that compares NRL algorithms fairly within resource constraints based on hyperparameter tuning. The framework has four loosely coupled components and adopts a sample-test-optimize process in a closed loop: a hyperparameter sampler, an NRL algorithm manipulator, a performance evaluator, and a hyperparameter sampling-space optimizer. All components interact with one another only through data flow. We use a divide-and-diverge sampling method based on Latin Hypercube Sampling to sample sets of hyperparameters, and trim the sample space around the previous best configuration according to the assumption that “around the point with the best performance in the sample set, we are more likely to find other points with similar or better performance”. The massive scale of network data also poses great challenges for hyperparameter tuning, since the computational cost of NRL algorithms grows in proportion to the network scale, so we use a graph coarsening model to reduce the data size while preserving graph structural information. JITNREv can therefore easily meet the resource constraints set by users. The framework also integrates representative algorithms, common evaluation datasets, widely used evaluation metrics, and data analysis applications for ease of use. Extensive experiments demonstrate that JITNREv can stably improve the performance of general NRL algorithms by hyperparameter tuning alone, thus enabling fair comparison of NRL algorithms at their best performance. For example, for the node classification task with GCN, JITNREv increases accuracy by up to 31% compared with the default hyperparameter settings.
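The abstract does not give implementation details, but the Latin-Hypercube-based sampling step it describes can be illustrated with a minimal Python sketch. The hyperparameter names and ranges below are hypothetical examples, not taken from the paper:

```python
import numpy as np

def latin_hypercube_sample(bounds, n_samples, rng=None):
    """Latin Hypercube Sampling: split each hyperparameter range into
    n_samples equal strata, draw one value per stratum, and permute the
    strata independently per dimension so samples spread evenly."""
    rng = rng or np.random.default_rng()
    names = list(bounds)
    dim = len(names)
    # One uniform draw inside each of the n_samples strata, per dimension.
    u = (rng.random((n_samples, dim)) + np.arange(n_samples)[:, None]) / n_samples
    for j in range(dim):
        u[:, j] = rng.permutation(u[:, j])  # decouple strata across dimensions
    lows = np.array([bounds[n][0] for n in names])
    highs = np.array([bounds[n][1] for n in names])
    points = lows + u * (highs - lows)  # scale unit cube to actual ranges
    return [dict(zip(names, p)) for p in points]

# Hypothetical hyperparameter space for an NRL algorithm.
space = {"learning_rate": (1e-4, 1e-1), "walk_length": (10, 80)}
configs = latin_hypercube_sample(space, n_samples=8)
```

Integer-valued hyperparameters would additionally need rounding; the sketch keeps all dimensions continuous for brevity.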
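The trimming heuristic and the closed-loop sample-test-optimize process might look like the following sketch, reusing latin_hypercube_sample from above. The shrink factor and the evaluate callback are placeholders, not the paper's actual choices, and a real run would also enforce the time and memory budgets the framework supports:

```python
def trim_space(bounds, best_cfg, shrink=0.5):
    """Shrink every hyperparameter range around the best configuration
    seen so far, clipped to the original bounds, following the assumption
    that better points are likely to lie near the current best."""
    trimmed = {}
    for name, (lo, hi) in bounds.items():
        half = (hi - lo) * shrink / 2.0
        center = best_cfg[name]
        trimmed[name] = (max(lo, center - half), min(hi, center + half))
    return trimmed

def tune(bounds, evaluate, n_rounds=5, n_samples=8):
    """Closed loop: sample a batch, evaluate each configuration, trim the
    space around the best one, and repeat. `evaluate` maps a configuration
    to a performance score (higher is better)."""
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_rounds):
        for cfg in latin_hypercube_sample(bounds, n_samples):
            score = evaluate(cfg)
            if score > best_score:
                best_cfg, best_score = cfg, score
        bounds = trim_space(bounds, best_cfg)
    return best_cfg, best_score
```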
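The abstract names graph coarsening but not a specific scheme; a common approach is greedy edge matching, where each node is merged with an unmatched neighbor into a supernode, roughly halving the graph per level while preserving coarse connectivity. A sketch under that assumption, using networkx:

```python
import networkx as nx

def coarsen_once(G):
    """One coarsening level via greedy matching: merge each node with an
    unmatched neighbor into a supernode, keeping edges between distinct
    supernodes so the coarse graph retains the original structure."""
    matched, merge_into = set(), {}
    for u in G.nodes():
        if u in matched:
            continue
        matched.add(u)
        merge_into[u] = u
        partner = next((v for v in G.neighbors(u) if v not in matched), None)
        if partner is not None:
            matched.add(partner)
            merge_into[partner] = u
    H = nx.Graph()
    H.add_nodes_from(set(merge_into.values()))
    for a, b in G.edges():
        ca, cb = merge_into[a], merge_into[b]
        if ca != cb:
            H.add_edge(ca, cb)
    return H, merge_into
```

Presumably, applying coarsen_once repeatedly until the graph fits the memory budget would let hyperparameters be tuned on the smaller graph before the chosen configuration is applied to the original network.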
Authors
GUO Meng-Ying; SUN Zhen-Yu; ZHU Yu-Qing; BAO Yun-Gang
(Center for Advanced Computer Systems, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; University of Chinese Academy of Sciences, Beijing 100049; Beijing National Research Center of Information Science and Technology (Tsinghua University), Beijing 100084; National Engineering Laboratory of Big Data System Software, Beijing 100084)
Source
Chinese Journal of Computers (《计算机学报》), 2022, No. 5, pp. 897-917 (21 pages)
Indexed in: EI, CAS, CSCD, Peking University Core Journal List (北大核心)
Funding
Supported by the National Key Research and Development Program of China (2016YFB1000201) and the National Natural Science Foundation of China (61420106013).
Keywords
network representation learning
network embedding
graph convolutional network
automated machine learning
hyperparameter tuning