Funding: Supported by the National Natural Science Foundation of China (Grant Nos. 61773091 and 62173065), the Industry-University-Research Innovation Fund for Chinese Universities (Grant No. 2021ALA03016), the Fund for University Innovation Research Group of Chongqing (Grant No. CXQT21005), the National Social Science Foundation of China (Grant No. 20CTQ029), and the Fundamental Research Funds for the Central Universities (Grant No. SWU119062).
Abstract: Network information mining is the study of network topology, which can answer many application-based questions about the structural evolution and function of a real system. Such questions may concern how the real system evolves or how individuals interact with each other in social networks. Although the evolution of a real system may appear regular, capturing patterns over the whole process of evolution is not trivial. Link prediction is one of the most important techniques in network information mining and can help us understand the evolution mechanisms of real-life networks. It aims to uncover missing links, or to quantify the likelihood that currently nonexistent links will emerge, from the known network structure. Currently, most existing link prediction methods focus on short-path networks, which usually contain a myriad of closed triangular structures. However, these algorithms perform poorly on highly sparse or long-path networks. Here, we propose a new index based on the principles of structural equivalence and shortest path length (SESPL) to estimate the likelihood of link existence in long-path networks. In tests on 548 real networks, we find that SESPL is more effective and efficient than other similarity-based predictors in long-path networks. We also examine the performance of the SESPL predictor and of embedding-based approaches via machine learning techniques. The results show that SESPL achieves a gain of 44.09% over GraphWave and 7.93% over Node2vec. Finally, according to the matrix of maximal information coefficients (MIC) between all the similarity-based predictors, SESPL is a new independent feature in the space of traditional similarity features.
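As a rough illustration of how a similarity-based link predictor is typically scored, the sketch below evaluates the classical common-neighbors index with an exact AUC on a toy graph. It is not the SESPL index, whose definition the abstract does not give; the toy graph, the held-out link, and the helper names (`build_adj`, `common_neighbors`, `auc`) are all assumptions made for this example.

```python
from itertools import combinations


def build_adj(nodes, edges):
    """Build an undirected adjacency map {node: set of neighbors}."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return adj


def common_neighbors(adj, u, v):
    """Classical similarity score: number of neighbors shared by u and v."""
    return len(adj[u] & adj[v])


def auc(adj, held_out, non_links, score):
    """Exact AUC: chance that a held-out link outscores a non-link (ties count 0.5)."""
    total = 0.0
    for m in held_out:
        for n in non_links:
            s_m, s_n = score(adj, *m), score(adj, *n)
            total += 1.0 if s_m > s_n else 0.5 if s_m == s_n else 0.0
    return total / (len(held_out) * len(non_links))


# Hypothetical toy network: one observed link is hidden and must be recovered.
nodes = range(6)
all_edges = {(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (4, 5)}
held_out = [(1, 2)]                      # the "missing" link
train_edges = all_edges - set(held_out)  # the known network structure
non_links = [p for p in combinations(nodes, 2) if p not in all_edges]

adj = build_adj(nodes, train_edges)
auc_value = auc(adj, held_out, non_links, common_neighbors)
print(f"AUC of common neighbors on the toy graph: {auc_value}")
```

Common neighbors scores every pair of nodes more than two hops apart as zero, which is one way to see why triangle-based indices lose resolution on sparse, long-path networks.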
Funding: Project supported by the National Natural Science Foundation of China (Grant Nos. 61773091 and 61603073), the LiaoNing Revitalization Talents Program (Grant No. XLYC1807106), and the Natural Science Foundation of Liaoning Province, China (Grant No. 2020-MZLH-22).
Abstract: Effective null models provide reference networks against which the statistical properties of real-life signed networks can be accurately described. At present, the two classical null models of signed networks (i.e., the sign-randomized and full-edge-randomized models) shuffle the positive and negative topologies at the same time, so it is difficult to distinguish the effects on network topology of positive edges, negative edges, and the correlation between them. In this study, we construct three refined edge-randomized null models by randomizing only link relationships, without changing the positive and negative degree distributions. The results for nontrivial statistical indicators of signed networks, such as average degree connectivity and clustering coefficient, show that the position of positive edges has a stronger effect on positive-edge topology, while the signs of negative edges have a greater influence on negative-edge topology. For some specific statistics (e.g., embeddedness), the results indicate that the proposed null models describe real-life networks more accurately than the two existing ones, and they can be selected to facilitate a better understanding of complex structures, functions, and dynamical behaviors on signed networks.
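One way to picture an edge-randomized null model that keeps the signed degree distributions fixed is a double-edge swap restricted to edges of one sign. The sketch below rewires only positive edges and leaves negative edges untouched. It is a minimal illustration under that assumption, not a reconstruction of the three models proposed in the paper; the function names and the toy network are hypothetical.

```python
import random
from collections import Counter


def degrees(edges):
    """Degree of every node in an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


def rewire_positive_edges(pos_edges, neg_edges, n_swaps, seed=0):
    """Shuffle positive edges by double-edge swaps; negative edges stay fixed.

    Each swap turns (a, b), (c, d) into (a, d), (c, b), so every node keeps
    its positive degree. Swaps that would create a self-loop, a duplicate
    edge, or an edge already occupied by a negative link are rejected.
    """
    rng = random.Random(seed)
    pos = [tuple(sorted(e)) for e in pos_edges]
    occupied = set(pos) | {tuple(sorted(e)) for e in neg_edges}
    if len(pos) < 2:
        return pos
    done, attempts = 0, 0
    while done < n_swaps and attempts < 100 * n_swaps:
        attempts += 1
        i, j = rng.sample(range(len(pos)), 2)
        a, b = pos[i]
        c, d = pos[j]
        if rng.random() < 0.5:           # pick one of the two swap orientations
            c, d = d, c
        if len({a, b, c, d}) < 4:        # a shared node would give a self-loop
            continue
        e1, e2 = tuple(sorted((a, d))), tuple(sorted((c, b)))
        if e1 in occupied or e2 in occupied:
            continue
        occupied -= {pos[i], pos[j]}
        occupied |= {e1, e2}
        pos[i], pos[j] = e1, e2
        done += 1
    return pos


# Hypothetical toy signed network.
pos_edges = [(0, 1), (2, 3), (4, 5), (0, 2), (1, 3)]
neg_edges = [(0, 3), (1, 2)]
shuffled = rewire_positive_edges(pos_edges, neg_edges, n_swaps=10, seed=1)
print(shuffled)
```

Running the same procedure on the negative edges instead, or on both sign classes in turn, would give other members of this family of null models, which is the kind of separation between positive and negative topology that the abstract describes.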
Funding: Supported by the National Natural Science Foundation of China [61773091 and 62173065 to X.-K.X., 11975025 to L.W., 11875005 to Y.W., 72025405 and 82041020 to X.L., 71974029 to X.W.]; the Grand Challenges ICODA pilot initiative, delivered by Health Data Research UK and funded by the Bill & Melinda Gates Foundation and the Minderoo Foundation [to X.F.L.]; US CDC Grant 20U01CK000592 [to S.P.]; and US CDC and CSTE Grant NU38OT00297 [to S.P.].
Abstract: The spatial spread of COVID-19 during early 2020 in China was primarily driven by outbound travelers leaving the epicenter, Wuhan, Hubei province. Existing studies focus on the influence of aggregated outbound population flows originating from Wuhan; however, the impacts of different modes of transportation and of the network structure of transportation systems on the early spread of COVID-19 in China are not well understood. Here, we assess the roles of the road, railway, and air transportation networks in driving the spatial spread of COVID-19 in China. We find that short-range spread within Hubei province was dominated by ground traffic, notably railway transportation. In contrast, long-range spread to cities in other provinces was mediated by multiple factors, including a higher risk of case importation associated with air transportation and a larger outbreak size in hub cities located at the center of transportation networks. We further show that, although the dissemination of SARS-CoV-2 across countries and continents is determined by the worldwide air transportation network, the early geographic dispersal of COVID-19 within China is better predicted by railway traffic. Given the recent emergence of multiple more transmissible variants of SARS-CoV-2, our findings can support a better assessment of the spread risk of those variants and improve future pandemic preparedness and responses.