The KST Index Based Querying Algorithm for Steiner Maximum-Connected Components

Original title: 基于KST索引的最大连通Steiner分量查询算法
Cited by: 1


Abstract: Given a graph G and a set of query nodes q, the Steiner Maximum-Connected Component (SMCC) is the connected subgraph of G containing q that has the maximum connectivity and, among such subgraphs, the maximum number of nodes. SMCC_L is the SMCC of G subject to a constraint L on the number of nodes. Finding the SMCC or SMCC_L is a fundamental operation in graph data processing; it has attracted considerable research attention and has applications in areas such as bioinformatics.

Existing methods for SMCC and SMCC_L queries fall into two categories.

(1) Methods based on existing k-edge connected component algorithms. These decrease k from |V| (the number of nodes in G) to 1 in turn, computing all k-edge connected components of G for each k. The first k-edge connected component that contains q is the SMCC of q, and the first one that contains q and has at least L nodes is the SMCC_L of q. The drawback is that the search must traverse G many times, so the computation cost is high when the graph is large.

(2) Methods based on a dedicated index. These build an MST (Maximum Spanning Tree) index of G offline. To answer a query, they first compute the connectivity of q, then traverse the MST starting from any node in q, visiting only the nodes reached through edges that satisfy specific conditions, until all nodes in q are covered; the visited nodes are the nodes of the SMCC of q. The drawback is that the traversal is expensive: each step yields at most one useful node (one-node-a-step), so the number of index-tree nodes visited grows with the size of the SMCC, and system load rises quickly when queries are executed frequently.

To address this inefficiency, we first propose the KST index, a hierarchical index over node sets built from the containment relationship between k-edge connected components and (k+1)-edge connected components. Compared with the existing dedicated index, the KST index is smaller. Based on the KST index, we propose an SMCC query algorithm (SMCC-KST) and an algorithm for SMCC queries with a size constraint (SMCC_L-KST). Whereas the existing index maintains relationships between individual nodes of the original graph, each KST index node represents a set of nodes. Traversal therefore proceeds one-set-a-step rather than one-node-a-step, which significantly reduces the number of index nodes visited and improves query efficiency. Experiments on 15 real datasets show that our approaches perform much better than existing ones for both SMCC and SMCC_L queries.
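The containment idea in the abstract can be illustrated with a small sketch. The abstract does not give the KST layout, so everything below is an assumption made for illustration: the `IndexNode` class, the `home` map, and the functions `smcc` and `smcc_l` are hypothetical names, the tree is built by hand for a toy graph rather than computed, and the graph is assumed to be connected so that the tree has a single root. The sketch relies only on the fact that each (k+1)-edge connected component lies inside exactly one k-edge connected component, so the components form a tree and the answer for q is the deepest tree node whose component contains all of q.

```python
class IndexNode:
    """Node of a hypothetical containment tree; NOT the paper's exact KST layout."""

    def __init__(self, k, own, parent=None):
        self.k = k               # edge connectivity of the component rooted here
        self.own = set(own)      # vertices of this component that lie in no denser child
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def depth(node):
    """Number of parent links between node and the root."""
    d = 0
    while node.parent is not None:
        node, d = node.parent, d + 1
    return d


def collect(node):
    """Gather a component's vertices, taking one whole vertex set per index node."""
    vertices = set(node.own)
    for child in node.children:
        vertices |= collect(child)
    return vertices


def lowest_common_node(home, query):
    """Deepest index node whose component contains every query vertex."""
    nodes = [home[v] for v in query]
    cur = nodes[0]
    for other in nodes[1:]:
        a, b = cur, other
        da, db = depth(a), depth(b)
        while da > db:
            a, da = a.parent, da - 1
        while db > da:
            b, db = b.parent, db - 1
        while a is not b:
            a, b = a.parent, b.parent
        cur = a
    return cur


def smcc(home, query):
    """Return (connectivity, vertex set) of the component answering the query."""
    node = lowest_common_node(home, query)
    return node.k, collect(node)


def smcc_l(home, query, min_size):
    """Climb toward the root until the component has at least min_size vertices."""
    node = lowest_common_node(home, query)
    while node is not None and len(collect(node)) < min_size:
        node = node.parent
    return None if node is None else (node.k, collect(node))


# Toy graph (assumed): K4 on a,b,c,d; e adjacent to a and b; triangle f,g,h;
# a single bridge e-f. Its component hierarchy, written out by hand:
root = IndexNode(1, [])                              # whole graph, 1-edge connected
two_a = IndexNode(2, ["e"], root)                    # {a,b,c,d,e}, 2-edge connected
three = IndexNode(3, ["a", "b", "c", "d"], two_a)    # K4, 3-edge connected
two_b = IndexNode(2, ["f", "g", "h"], root)          # triangle, 2-edge connected

# Map each vertex to the deepest index node that holds it.
home = {v: n for n in (root, two_a, three, two_b) for v in n.own}

k, component = smcc(home, {"a", "e"})   # connectivity 2, vertices a through e
```

In this toy tree, answering the query {a, e} touches two index nodes to return five vertices, which is the one-set-a-step behaviour the abstract contrasts with one-node-a-step traversal. Recomputing `collect` during the climb in `smcc_l` keeps the sketch short; a real index would presumably store component sizes.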

Authors: CHEN Zi-Yang (陈子阳); CHEN Wei (陈伟); JIA Yong (贾勇); ZHOU Jun-Feng (周军锋)

Affiliations: School of Information and Management, Shanghai Lixin University of Accounting and Finance, Shanghai 201620; School of Information Science and Engineering, Yanshan University, Qinhuangdao, Hebei 066004; Department of Information Engineering, Hebei University of Environmental Engineering, Qinhuangdao, Hebei 066102; School of Computer Science and Technology, Donghua University, Shanghai 201620

Source: Chinese Journal of Computers (《计算机学报》), 2020, No. 7, pp. 1215-1229 (15 pages). Indexed in EI, CSCD, and the Peking University Core Journals list.

Funding: Supported by the National Natural Science Foundation of China (61472339, 61572421, 61873337).

Keywords: undirected graph; k-edge connected components; Steiner maximum-connected component; index; maximum spanning tree

Classification: TP18 [Automation and Computer Technology: Control Theory and Control Engineering]
