Background: A task assigned to space exploration satellites is detecting the physical environment within a certain region of space. However, space detection data are complex and abstract, and are not conducive to researchers' visual perception of the evolution and interaction of events in the space environment. Methods: A time-series dynamic data sampling method for large-scale space was proposed to sample detection data in space and time, and the correspondences between data location features and other attribute features were established. A tone-mapping method based on statistical histogram equalization was proposed and applied to the final attribute feature data. The visualization process was optimized for rendering by merging materials, reducing the number of patches, and performing other operations. Results: Detection data of complex types, long time spans, and uneven spatial distributions were sampled, feature-extracted, and uniformly visualized. Real-time visualization of large-scale spatial structures on augmented reality devices, particularly low-performance devices, was also investigated. Conclusions: The proposed visualization system can reconstruct the three-dimensional structure of a large-scale space, express the structure and changes of the spatial environment using augmented reality, and help researchers intuitively discover spatial environmental events and evolutionary rules.
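As a concrete illustration of the histogram-equalization tone-mapping step, the following minimal NumPy sketch maps a skewed attribute onto a uniform display range via the empirical CDF; the synthetic flux values and bin count are hypothetical, not detection data or parameters from the paper.

```python
import numpy as np

def equalize_tone_map(values, n_bins=256):
    """Map raw attribute values to [0, 1] using histogram equalization.

    A minimal sketch: the empirical CDF of the data becomes the tone curve,
    so equally populated value ranges receive equal display range.
    """
    hist, bin_edges = np.histogram(values, bins=n_bins)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf /= cdf[-1]                                   # normalize CDF to [0, 1]
    bin_centers = 0.5 * (bin_edges[:-1] + bin_edges[1:])
    # Interpolate each value's position on the CDF to get its display tone.
    return np.interp(values, bin_centers, cdf)

# Example: heavily skewed detection attribute (e.g., a particle-flux channel).
rng = np.random.default_rng(0)
flux = rng.lognormal(mean=0.0, sigma=2.0, size=10_000)
tones = equalize_tone_map(flux)
print(tones.min(), tones.max())   # spread across ~[0, 1] despite the skew
```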
Major interactions are known to trigger star formation in galaxies and alter their color. We study major interactions in filaments and sheets using SDSS data to understand the influence of large-scale environments on galaxy interactions. We identify the galaxies in filaments and sheets using the local dimension and also find the major pairs residing in these environments. The star formation rate (SFR) and color of the interacting galaxies as a function of pair separation are analyzed separately in filaments and sheets. The analysis is repeated for three volume-limited samples covering different magnitude ranges. The major pairs residing in filaments show a significantly higher SFR and bluer color than those residing in sheets up to a projected pair separation of ~50 kpc. We observe a complete reversal of this behavior for both the SFR and color of galaxy pairs with projected separations larger than 50 kpc. Some earlier studies report that galaxy pairs align with the filament axis. Such alignment inside filaments indicates anisotropic accretion that may cause these differences. We do not observe these trends in the brighter galaxy samples. The pairs in filaments and sheets from the brighter galaxy samples trace relatively denser regions in these environments. The absence of these trends in the brighter samples may be explained by the dominant effect of the local density over the effects of the large-scale environment.
Processing large-scale 3-D gravity data is an important topic in geophysics. Many existing inversion methods cannot handle massive data and lack practical applicability. This study applies GPU parallel processing technology to the focusing inversion method, aiming to improve inversion accuracy while speeding up calculation and reducing memory consumption, thereby obtaining fast and reliable inversion results for large, complex models. In this paper, equivalent storage of the geometric trellis is used to calculate the sensitivity matrix, and the inversion is based on GPU parallel computing technology. The parallel computing program, optimized by reducing data transfer, access restrictions, and instruction restrictions as well as by latency hiding, greatly reduces memory usage, speeds up calculation, and makes fast inversion of large models possible. By comparing the computing speed of the traditional single-thread CPU method and CUDA-based GPU parallel technology, the excellent acceleration performance of GPU parallel computing is verified, which provides ideas for practical application of theoretical inversion methods that are otherwise restricted by computing speed and computer memory. The model test verifies that the focusing inversion method can overcome the problems of a severe skin effect and ambiguity of geological body boundaries. Moreover, increasing the number of model cells and inversion data more clearly depicts the boundary position of the anomalous body and delineates its specific shape.
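The acceleration idea can be illustrated with a small, hedged sketch that moves the sensitivity-matrix forward product onto the GPU. It uses CuPy as a stand-in for the authors' hand-written CUDA kernels, and the matrix sizes, random data, and timing harness are illustrative assumptions only.

```python
import time
import numpy as np
import cupy as cp   # assumes a CUDA-capable GPU with CuPy installed

# Hypothetical sizes: n_data gravity observations, n_cells model cells.
n_data, n_cells = 1_000, 10_000
rng = np.random.default_rng(1)
G = rng.standard_normal((n_data, n_cells)).astype(np.float32)   # sensitivity matrix
m = rng.standard_normal(n_cells).astype(np.float32)             # model vector

# CPU forward modeling: d = G m
t0 = time.perf_counter()
d_cpu = G @ m
t_cpu = time.perf_counter() - t0

# GPU forward modeling: copy once, multiply on the device.
G_gpu, m_gpu = cp.asarray(G), cp.asarray(m)
cp.cuda.Device().synchronize()
t0 = time.perf_counter()
d_gpu = G_gpu @ m_gpu
cp.cuda.Device().synchronize()     # wait for the kernel to finish before timing
t_gpu = time.perf_counter() - t0

print(f"CPU {t_cpu:.4f}s  GPU {t_gpu:.4f}s, results agree:",
      np.allclose(d_cpu, cp.asnumpy(d_gpu), atol=1e-2))
```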
Social media data have created a paradigm shift in assessing situational awareness during natural disasters or emergencies such as wildfires, hurricanes, and tropical storms. Twitter, as an emerging data source, is an effective and innovative digital platform for observing trends from the perspective of social media users who are direct or indirect witnesses of a calamitous event. This paper collects and analyzes Twitter data related to the recent wildfire in California to perform a trend analysis by classifying firsthand and credible information from Twitter users. This work investigates tweets on the recent California wildfire and classifies them by witness type: 1) direct witnesses and 2) indirect witnesses. The collected and analyzed information can be useful for law enforcement agencies and humanitarian organizations in communicating and verifying situational awareness during wildfire hazards. Trend analysis is an aggregated approach that includes sentiment analysis and topic modeling performed through domain-expert manual annotation and machine learning. The trend analysis ultimately builds a fine-grained picture to assess evacuation routes and provide valuable information to firsthand emergency responders.
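A minimal sketch of the witness-classification step is shown below, assuming a TF-IDF plus logistic-regression baseline in scikit-learn; the example tweets and labels are invented for illustration and are not from the study's annotated corpus.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical annotated tweets: 1 = direct witness, 0 = indirect witness.
tweets = [
    "Flames are right behind my house, we are evacuating now",
    "Smoke everywhere on my street, ash falling on the cars",
    "Praying for everyone affected by the California wildfire",
    "News says the fire has burned 10,000 acres so far",
]
labels = [1, 1, 0, 0]

# TF-IDF features + logistic regression as a simple witness classifier.
clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2), min_df=1),
                    LogisticRegression(max_iter=1000))
clf.fit(tweets, labels)

print(clf.predict(["I can see the fire from my backyard"]))  # expect direct witness
```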
For the growing number of large-scale data sets, the affinity propagation clustering algorithm must build a full similarity matrix, which imposes huge storage and computation costs. This paper therefore proposes an improved affinity propagation clustering algorithm. First, subtractive clustering is added: the density value of each data point is used to obtain initial cluster points. Then, the similarity distances between the initial cluster points are calculated and, borrowing the idea of semi-supervised clustering, pairwise constraint information is added to construct a sparse similarity matrix. Finally, AP clustering is performed on the cluster representative points until a suitable cluster division is reached. Experimental results show that the algorithm greatly reduces computation, reduces the storage required for the similarity matrix, and outperforms the original algorithm in clustering quality and processing speed.
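The pipeline can be sketched as follows, with a simple kernel-density heuristic standing in for subtractive clustering and scikit-learn's AffinityPropagation run only on the selected representatives; the data, density radius, and representative count are illustrative assumptions rather than the paper's settings.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation
from sklearn.metrics import pairwise_distances

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.3, size=(300, 2)) for c in (0.0, 3.0, 6.0)])

# Step 1 (stand-in for subtractive clustering): score each point by local
# density and keep only the densest points as initial representatives.
D = pairwise_distances(X)
density = np.exp(-(D / 0.5) ** 2).sum(axis=1)   # radius 0.5 is an assumption
reps = np.argsort(density)[-60:]                # keep 60 representatives

# Steps 2-3: run AP only on the representatives, not on all points.
ap = AffinityPropagation(damping=0.9, random_state=0).fit(X[reps])

# Assign every original point to its nearest AP exemplar.
labels = pairwise_distances(X, ap.cluster_centers_).argmin(axis=1)
print("number of clusters:", len(ap.cluster_centers_))
```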
In this paper, we propose a correlation-aware probabilistic data summarization technique to efficiently analyze and visualize large-scale multi-block volume data generated by massively parallel scientific simulations. The core of our technique is correlation modeling of distribution representations of adjacent data blocks using copula functions, and accurate data value estimation by combining numerical information, spatial location, and correlation distribution using Bayes' rule. This effectively preserves statistical properties without merging data blocks in different parallel computing nodes and repartitioning them, thus significantly reducing the computational cost. Furthermore, this enables reconstruction of the original data more accurately than existing methods. We demonstrate the effectiveness of our technique using six datasets, with the largest having one billion grid points. The experimental results show that our approach reduces the data storage cost by approximately one order of magnitude compared to state-of-the-art methods while providing a higher reconstruction accuracy at a lower computational cost.
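A hedged sketch of the core idea, modeling the dependence between two adjacent blocks with a Gaussian copula and resampling through the empirical marginals, is given below; the block values are synthetic and the copula family is an assumption, not necessarily the one used in the paper.

```python
import numpy as np
from scipy.stats import norm, rankdata

rng = np.random.default_rng(3)
# Hypothetical value samples drawn from two adjacent data blocks.
block_a = rng.gamma(2.0, 1.0, size=5_000)
block_b = 0.6 * block_a + rng.normal(0, 0.5, size=5_000)   # correlated neighbor

# Gaussian copula fit: map each margin to uniforms by rank, then to normals.
def to_normal_scores(x):
    u = rankdata(x) / (len(x) + 1)        # empirical CDF values in (0, 1)
    return norm.ppf(u)

rho = np.corrcoef(to_normal_scores(block_a), to_normal_scores(block_b))[0, 1]

# Sample the copula and push back through the empirical quantiles of each block.
z = rng.multivariate_normal([0, 0], [[1, rho], [rho, 1]], size=5_000)
u = norm.cdf(z)
synth_a = np.quantile(block_a, u[:, 0])
synth_b = np.quantile(block_b, u[:, 1])
print("fitted rho:", round(rho, 3),
      "reproduced correlation:", round(np.corrcoef(synth_a, synth_b)[0, 1], 3))
```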
Both computer science and archival science are concerned with archiving large-scale data, but they have different focuses. Large-scale data archiving in computer science focuses on technical aspects that can reduce the cost of data storage and improve the reliability and efficiency of Big Data management; its weaknesses lie in inadequate and non-standardized management. Archiving in archival science focuses on the management aspects and neglects the necessary technical considerations, resulting in high storage and retention costs and a poor ability to manage Big Data. Therefore, the integration of large-scale data archiving and archival theory can balance the existing research limitations of the two fields, and suggests two topics for related research: archival management of Big Data and large-scale management of archived Big Data.
Household electricity demand has substantial impacts on local grid operation, energy storage, and the energy performance of buildings. Hourly demand data at the district or urban level help stakeholders understand demand patterns at a granular time scale and provide robust evidence for energy management. However, such data are often expensive and time-consuming to collect, process, and integrate. Decisions built upon smart meter data have to deal with challenges of privacy and security throughout the whole process. Incomplete data due to confidentiality concerns or system failure can further increase the difficulty of modeling and optimization. In addition, methods that use historical data to make predictions can vary greatly depending on data quality, the local building environment, and dynamic factors. Considering these challenges, this paper proposes a statistical method to generate hourly electricity demand data for large-scale single-family buildings by decomposing time-series data and recombining them into synthetic profiles. The proposed method uses public data to capture seasonality and the distribution of residuals that fulfill statistical characteristics. A reference building was used to provide empirical parameter settings and validation for the studied buildings. An illustrative case in a Swedish city, using only the annual total demand, is presented to deploy the proposed method. The results show that the proposed method can mimic reality well and exhibits a high level of similarity to the real data. The average monthly error reached 15.9%, and the error for the best month was below 10% among the 11 tested months. Less than 0.6% of the synthetic values in the studied region were improper.
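A minimal sketch of the decompose-and-recombine idea is shown below, assuming a simple monthly-by-hour profile plus resampled residuals; this is a simplification of the paper's statistical model, and the reference demand series, scales, and annual total are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
hours = np.arange(24 * 365)

# Hypothetical measured hourly demand of a reference single-family building (kWh).
daily = 0.4 + 0.3 * np.sin(2 * np.pi * (hours % 24 - 18) / 24)     # evening peak
seasonal = 0.5 + 0.4 * np.cos(2 * np.pi * hours / (24 * 365))      # winter heating
reference = daily + seasonal + rng.normal(0, 0.05, hours.size)

# Decompose: average daily profile per month plus a residual distribution.
month = (hours // (24 * 30)).clip(max=11)
profile = np.zeros((12, 24))
for m in range(12):
    profile[m] = reference[month == m].reshape(-1, 24).mean(axis=0)
residuals = reference - profile[month, hours % 24]

# Synthesize a new building: rescale the profile to its annual total and
# resample residuals to keep the statistical character of the reference.
annual_total = 5_000.0                               # only input assumed known
scale = annual_total / reference.sum()
synthetic = (profile[month, hours % 24] + rng.choice(residuals, hours.size)) * scale
print("synthetic annual total (kWh):", round(synthetic.sum(), 1))
```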
Global Positioning System (GPS) meteorology data variational assimilation can be reduced to a large-scale unconstrained optimization problem. Because the dimension of this problem is very large, most optimization algorithms cannot be applied. To make GPS/MET data assimilation satisfy the demands of numerical weather prediction, finding an algorithm with a high convergence rate is the most important task. A new method is presented that dynamically combines the limited-memory BFGS (L-BFGS) method with the Hessian-free Newton (HFN) method and achieves a good iteration convergence rate. Numerical tests indicate that the computational efficiency of the method is better than that of the L-BFGS and HFN methods.
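For orientation, a hedged sketch of minimizing a small 3D-Var-style cost with the standard L-BFGS implementation in SciPy is shown below; the problem sizes, identity error covariances, and synthetic observations are illustrative assumptions, and the sketch does not implement the paper's hybrid L-BFGS/HFN method itself.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, m = 500, 200                       # state size, number of observations (toy scale)
x_b = rng.standard_normal(n)          # background state
H = rng.standard_normal((m, n)) / np.sqrt(n)    # linearized observation operator
y = H @ (x_b + 0.1 * rng.standard_normal(n))    # synthetic observations

def cost_and_grad(x):
    """3D-Var style cost J(x) = 1/2||x - x_b||^2 + 1/2||Hx - y||^2, assuming
    identity background/observation error covariances (a simplification)."""
    dx, dy = x - x_b, H @ x - y
    J = 0.5 * dx @ dx + 0.5 * dy @ dy
    grad = dx + H.T @ dy
    return J, grad

res = minimize(cost_and_grad, x0=np.zeros(n), jac=True, method="L-BFGS-B",
               options={"maxcor": 10, "maxiter": 200})
print("converged:", res.success, "final cost:", round(res.fun, 4))
```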
We analyze the galaxy pairs in a set of volume-limited samples from the Sloan Digital Sky Survey to study the effects of minor interactions on the star formation rate (SFR) and color of galaxies. We carefully design control samples of isolated galaxies by matching the stellar mass and redshift of the minor pairs. The SFR distributions and color distributions in the minor pairs differ from their controls at a >99% significance level. We also simultaneously match the control galaxies in stellar mass, redshift, and local density to assess the role of the environment. The null hypothesis can be rejected at a >99% confidence level even after matching the environment. Our analysis shows quenching in the minor pairs, where the degree of quenching decreases with increasing pair separation and plateaus beyond 50 kpc. We also prepare a sample of minor pairs with Hα line information, calculate the SFR of these galaxies using the Hα line, and repeat our analysis. We observe quenching in the Hα sample too. We find that the majority of the minor pairs are quiescent systems that could have been quenched by minor interactions. Combining data from the Galaxy Zoo and Galaxy Zoo 2, we find that only ~1% of galaxies have a dominant bulge, 4%–7% of galaxies host a bar, and 5%–10% of galaxies show active galactic nucleus (AGN) activity in minor pairs. This indicates that the presence of a bulge, a bar, or AGN activity plays an insignificant role in quenching the galaxies in minor pairs. The more massive companion satisfies the criteria for mass quenching in most of the minor pairs. We propose that stripping and starvation likely caused the quenching of the less massive companion at a later stage of evolution.
Real-time traffic state (e.g., speed) prediction is an essential component of traffic control and management in an urban road network. How to build an effective large-scale traffic state prediction system is a challenging but highly valuable problem. This study focuses on constructing an effective solution designed for spatiotemporal data to predict the traffic state of large-scale traffic systems. We first summarize the three challenges faced by large-scale traffic state prediction, i.e., scale, granularity, and sparsity. Based on the domain knowledge of traffic engineering, the propagation of traffic states along the road network is theoretically analyzed and elaborated in terms of the temporal and spatial propagation of traffic state, traffic state experience replay, and multi-source data fusion. A deep learning architecture, termed Deep Traffic State Prediction (DeepTSP), is then proposed to address the current challenges in traffic state prediction. Experiments demonstrate that the proposed DeepTSP model can effectively predict large-scale traffic states.
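As a hedged, much-simplified stand-in for DeepTSP, the sketch below shows a minimal GRU baseline in PyTorch that predicts next-step speeds for a set of sensors; the tensor shapes and data are invented, and none of the paper's spatial propagation, experience replay, or data-fusion components are included.

```python
import torch
import torch.nn as nn

class SpeedPredictor(nn.Module):
    """Minimal GRU baseline: predict next-step speed for every sensor from the
    past hour of readings. An illustrative stand-in only, not DeepTSP."""
    def __init__(self, n_sensors: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(input_size=n_sensors, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_sensors)

    def forward(self, x):                 # x: (batch, time, n_sensors)
        out, _ = self.gru(x)
        return self.head(out[:, -1])      # speeds at the next time step

# Hypothetical data: 32 samples, 12 past steps (1 h at 5-min resolution), 200 sensors.
x = torch.randn(32, 12, 200)
y = torch.randn(32, 200)
model = SpeedPredictor(n_sensors=200)
loss = nn.functional.mse_loss(model(x), y)
loss.backward()
print("one training-step loss:", float(loss))
```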
We study the color and star formation rates of paired galaxies in filaments and sheets using the EAGLE simulations. We find that major pairs with pair separations <50 kpc are bluer and more star-forming in filamentary environments than those hosted in sheet-like environments. This trend reverses beyond a pair separation of ~50 kpc: interacting pairs with larger separations (>50 kpc) in filaments are on average redder and less star-forming than those embedded in sheets. The galaxies in filaments and sheets may have different stellar mass and cold gas mass distributions. Using a KS test, we find that for paired galaxies with pair separations <50 kpc, there are no significant differences in these properties between sheets and filaments. Filaments transport gas toward clusters of galaxies. Some earlier studies find preferential alignment of galaxy pairs with the filament axis. Such alignment of galaxy pairs may lead to different gas accretion efficiencies in galaxies residing in filaments and sheets. We propose that the enhancement of the star formation rate at smaller pair separations in filaments is caused by the alignment of galaxy pairs. A recent study with SDSS data reports the same findings. The confirmation of these results by the EAGLE simulations suggests that hydrodynamical simulations are powerful theoretical tools for studying galaxy formation and evolution in the cosmic web.
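The KS-test comparison can be sketched as follows with SciPy, using invented stellar-mass samples in place of the EAGLE pair catalogs.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(6)
# Hypothetical stellar masses (log M*) of close pairs in filaments and in sheets.
logm_filament = rng.normal(10.2, 0.4, size=400)
logm_sheet = rng.normal(10.2, 0.4, size=350)

# Two-sample KS test: a large p-value means we cannot reject the hypothesis
# that both environments draw from the same stellar-mass distribution.
stat, pvalue = ks_2samp(logm_filament, logm_sheet)
print(f"KS statistic = {stat:.3f}, p-value = {pvalue:.3f}")
```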
Computational psychiatry is an emerging field that not only explores the biological basis of mental illness but also considers diagnoses and identifies the underlying mechanisms. One of the key strengths of computational psychiatry is that it may identify patterns in large datasets that are not otherwise easily identifiable. This may help researchers develop more effective treatments and interventions for mental health problems. This paper is a narrative review of the literature that develops an artificial intelligence ecosystem for computational psychiatry. The ecosystem includes data acquisition, preparation, modeling, application, and evaluation. This approach allows researchers to integrate data from a variety of sources, such as brain imaging, genetics, and behavioral experiments, to obtain a more complete understanding of mental health conditions. Through the process of data preprocessing, training, and testing, the data required for model building can be prepared. By using machine learning, neural networks, artificial intelligence, and other methods, researchers have been able to develop diagnostic tools that can accurately identify mental health conditions based on a patient's symptoms and other factors. Despite the continuous development of and breakthroughs in computational psychiatry, it has not yet influenced routine clinical practice and still faces many challenges, such as data availability and quality, biological risks, equity, and data protection. As we make progress in this field, it is vital to ensure that computational psychiatry remains accessible and inclusive so that all researchers may contribute to this significant and exciting field.
Massive neutrinos are expected to affect large-scale structure formation, including the major component of solid substances, dark matter halos. How halos are influenced by neutrinos is a vital and interesting question, and angular momentum (AM), as a significant feature, provides a statistical perspective on this issue. Exploring halos from the TianNu N-body cosmological simulation with co-evolving neutrino particles, we obtain some concrete conclusions. First, by comparing the same halos with and without neutrinos, we find that, in contrast to the neutrino-free case, over 89.71% of halos have smaller AM moduli, over 71.06% have smaller particle-mass-reduced (PMR) AM moduli, and over 95.44% change their orientations by less than 0.65°. Moreover, the relative variation of the PMR modulus is more visible for low-mass halos. Second, to explore the PMR moduli of halos in dense or sparse areas, we divide the whole box into big cubes and search for halos within a small spherical cell in each cube. From this two-level division, we discover that in denser cubes, the variation of PMR moduli with massive neutrinos decreases more significantly. This distinction suggests that neutrinos exert a heavier influence on halos' moduli in compact regions. With massive neutrinos, most halos (86.60%) have lower masses than without neutrinos.
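A minimal sketch of the halo angular-momentum measurement is shown below, with invented particle data and a slightly perturbed velocity field standing in for the neutrino-free counterpart run.

```python
import numpy as np

def halo_angular_momentum(pos, vel, mass):
    """Particle-mass-reduced angular momentum of one halo:
    J = sum_i m_i (r_i - r_cm) x (v_i - v_cm), divided by the total mass."""
    r_cm = np.average(pos, axis=0, weights=mass)
    v_cm = np.average(vel, axis=0, weights=mass)
    J = np.sum(mass[:, None] * np.cross(pos - r_cm, vel - v_cm), axis=0)
    return J / mass.sum()

rng = np.random.default_rng(7)
n = 1_000
pos = rng.normal(0, 0.2, (n, 3))        # Mpc/h, hypothetical halo particles
vel = rng.normal(0, 100.0, (n, 3))      # km/s
mass = np.full(n, 1.0)                  # equal particle masses

j_with = halo_angular_momentum(pos, vel, mass)
j_without = halo_angular_momentum(pos, vel * 1.02, mass)  # stand-in for the neutrino-free run

# Compare moduli and the angle between the two orientations, as in the abstract.
cosang = j_with @ j_without / (np.linalg.norm(j_with) * np.linalg.norm(j_without))
print("relative modulus change:",
      np.linalg.norm(j_with) / np.linalg.norm(j_without) - 1,
      "orientation change (deg):", np.degrees(np.arccos(np.clip(cosang, -1, 1))))
```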
This paper offers preliminary work on system dynamics and data mining tools. It tries to understand the dynamics of carrying out large-scale events, such as the Hajj. The study looks at a large, recurring problem and the variables to consider, such as how the flow of people changes over time and how location interacts with placement. The predicted data are analyzed using Vensim PLE 32 modeling software, GIS ArcMap 10.2.1, and AnyLogic 7.3.1 regarding the potential placement of temporal service points, taking into consideration three dynamic constraints and behavioral aspects: a large population, limited time, and limited space. This research proposes appropriate data analyses to ensure the optimal positioning of service points under limited time and space for large-scale events. The conceptual framework is the output of this study, and knowledge may be added to the insights based on the technique.
The investigation of the interplay between genes, proteins, metabolites, and diseases plays a central role in molecular and cellular biology. Whole genome sequencing has made it possible to examine the behavior of all the genes in a genome by high-throughput experimental techniques and to pinpoint molecular interactions on a genome-wide scale, which form the backbone of systems biology. In particular, the Bayesian network (BN) is a powerful tool for the ab initio identification of causal and non-causal relationships between biological factors directly from experimental data. However, scalability is a crucial issue when we try to apply BNs to infer such interactions. In this paper, we not only introduce the Bayesian network formalism and its applications in systems biology, but also review recent technical developments for scaling up or speeding up the structural learning of BNs, which is important for the discovery of causal knowledge from large-scale biological datasets. Specifically, we highlight the basic idea and relative pros and cons of each technique, and discuss possible ways to combine different algorithms toward making BN learning more accurate and much faster.
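As a small, hedged example of score-based structural learning, the sketch below runs greedy hill climbing with a BIC score on synthetic discretized data; it assumes the pgmpy library (HillClimbSearch and BicScore), whose exact call signatures may differ between versions, and the three-variable chain is invented for illustration.

```python
import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore   # assumes pgmpy is installed

rng = np.random.default_rng(8)
n = 2_000
# Hypothetical discretized expression levels with a known chain G1 -> G2 -> G3.
g1 = rng.integers(0, 2, n)
g2 = (g1 ^ (rng.random(n) < 0.1)).astype(int)   # noisy copy of G1
g3 = (g2 ^ (rng.random(n) < 0.1)).astype(int)   # noisy copy of G2
data = pd.DataFrame({"G1": g1, "G2": g2, "G3": g3})

# Score-based structural learning: greedy hill climbing guided by the BIC score.
search = HillClimbSearch(data)
best_model = search.estimate(scoring_method=BicScore(data))
print("learned edges:", list(best_model.edges()))   # expect a chain-like DAG
```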
The tidal current duration (TCD), tidal current velocity (TCV), and suspended sediment concentration (SSC) were measured in the dry season in December 2011 and in the flood season in June 2012 at the upper part of the North Channel of the Changjiang Estuary. They were assimilated with data measured in 2003, 2004, 2006, and 2007 using the tidal range's proportion conversion. Variations in TCD and TCV, preferential flow, and SSC were calculated. The influences of typical engineering projects, such as the Qingcaosha freshwater reservoir, the Yangtze River Bridge, and land reclamation, on the ebb and flood TCD, TCV, and SSC in the North Channel over the last 10 years are discussed. The results show that: (1) currently, in the upper part of the North Channel, the ebb tide dominates; after construction of the typical projects, ebb TCD and TCV tend to be larger, and the vertically averaged ebb and flood SSC decreases during the flood season while SSC increases during the dry season; (2) changes in the vertically averaged TCV are mainly contributed by seasonal runoff variation during the flood season, and are larger in the flood season than in the dry season; the controlling factors for the increasing ebb TCD and TCV are the large-scale engineering projects in the North Channel; variations in SSC may result mainly from the reduction of basin annual sediment loads, large-scale nearshore projects, and so on.
The power spectrum of the Two-degree Field Galaxy Redshift Survey (2dFGRS) sample is estimated with the discrete wavelet transform (DWT) method. The DWT power spectra within 0.035 < k < 2.2 h Mpc^-1 are measured for three volume-limited samples defined in consecutive absolute magnitude bins -19 to -18, -20 to -19, and -21 to -20. We show that the DWT power spectrum can effectively distinguish ΛCDM models with σ_8 = 0.84 and σ_8 = 0.74. We adopt the maximum likelihood method to perform a three-parameter fit of the bias parameter b, the pairwise velocity dispersion σ_pv, and the redshift distortion parameter β = Ω_m^0.6/b to the measured DWT power spectrum. The fitting results show that in a σ_8 = 0.84 universe the best-fit values of Ω_m given by the three samples are mutually consistent within the range 0.28–0.36, and the best-fit values of σ_pv are 398^{+35}_{-27}, 475^{+37}_{-29}, and 550 ± 20 km s^-1 for the three samples, respectively. In the model with σ_8 = 0.74, the three samples give very different values of Ω_m. We repeated the fitting using the empirical formula of redshift distortion. The result for the low-σ_8 model is still poor; in particular, one of the best-fit values of σ_pv is as large as 10^3 km s^-1. We also repeated the fitting incorporating a scale-dependent galaxy bias, which gave a slightly lower value of Ω_m. Differences between the σ_8 = 0.84 and σ_8 = 0.74 models still exist in the fitting results. The power spectrum of the 2dFGRS seems to disfavor models with a low amplitude of density fluctuations if the bias parameter is assumed to be scale independent. For the fitted value of Ω_m to be consistent with that given by WMAP3, strong scale dependence of the bias parameter is needed.
The challenge of enabling syntactic and semantic interoperability for comprehensive and reproducible online processing of big Earth observation (EO) data is still unsolved. Supporting both types of interoperability is one of the requirements for efficiently extracting valuable information from the large amount of available multi-temporal gridded data sets. The proposed system wraps world models (semantic interoperability) into OGC Web Processing Services (syntactic interoperability) for semantic online analyses. World models describe spatio-temporal entities and their relationships in a formal way. The proposed system serves as an enabler for (1) technical interoperability, using a standardised interface to be used by all types of clients, and (2) allowing experts from different domains to develop complex analyses together as a collaborative effort. Users connect the world models online to the data, which are maintained in centralised storage as 3D spatio-temporal data cubes. This also allows non-experts to extract valuable information from EO data because data management, low-level interactions, and specific software issues can be ignored. We discuss the concept of the proposed system, provide a technical implementation example, and describe three use cases for extracting changes from EO images, demonstrating the usability also for non-EO, gridded, multi-temporal data sets (CORINE land cover).
In forest dynamics models, the intensive computation involved in simulating seed dispersal can become unbearably large for large-scale forest analysis. To solve this problem, we propose a multi-resolution algorithm to compute seed dispersal on the GPU. By exploiting the computational parallelism of seed dispersal, the computation for the whole forest plot is divided into multiple small plot cells, which are computed independently by parallel threads on the GPU. To further improve calculation efficiency under the limited thread scale of GPU computation, we propose a hierarchical method that clusters the plot cells into a multi-resolution form according to the biological curves of tree seed dispersal. Experimental results show that our algorithm not only greatly reduces computation time but also obtains comparably correct results relative to the naive GPU algorithm, which makes it especially suitable for large-scale forest modeling.
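The per-cell decomposition can be sketched in NumPy as follows; the exponential dispersal curve, plot size, and tree data are illustrative assumptions, and each cell's independent sum is the unit of work that the paper assigns to a GPU thread.

```python
import numpy as np

rng = np.random.default_rng(9)
# Hypothetical plot: 100 parent trees on a 512 m x 512 m plot, 8 m grid cells.
tree_xy = rng.uniform(0, 512, size=(100, 2))
fecundity = rng.lognormal(3.0, 0.5, size=100)          # seeds produced per tree
cell = 8.0
centers = np.arange(0, 512, cell) + cell / 2
gx, gy = np.meshgrid(centers, centers)                  # cell centres, 64 x 64 grid

def dispersal_kernel(dist, alpha=20.0):
    """2D exponential dispersal curve (an assumption standing in for the
    species-specific biological curves mentioned in the abstract)."""
    return np.exp(-dist / alpha) / (2 * np.pi * alpha ** 2)

# Each cell's seed rain is an independent sum over all parent trees; this
# independence is exactly what maps onto one GPU thread per plot cell.
dx = gx[..., None] - tree_xy[:, 0]
dy = gy[..., None] - tree_xy[:, 1]
seed_rain = (fecundity * dispersal_kernel(np.hypot(dx, dy))).sum(axis=-1) * cell ** 2
print("total seeds landing on the plot:", round(seed_rain.sum(), 1))
```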