Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions and voice to detect people's attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to poor DNN performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been done in the machine since 1994, and many satisfactory results have been achieved in the fusion research field on the HT-7 tokamak [1]. With the development of fusion research, remote control of experiments becomes more and more important to improve experimental efficiency and expand research results. This paper describes an RCS (Remote Control System) for the HT-7 distributed data acquisition system (HT7DAS), based on the Internet and a combined Browser/Server and Client/Server model. By means of the RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, openness, reliability and efficiency of HT7DAS. In the paper, the whole process of design and implementation of the system and some key items are discussed in detail. The system was successfully operated during the HT-7 experiment in the 2002 campaign period.
It is crucial, while using healthcare data, to assess the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristic (e.g., greedy search) and metaheuristic-based techniques (e.g., evolutionary algorithms) have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that during negative association rule mining, a large number of uninteresting rules are formed, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users' privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, one that is based on the Tabu-genetic optimization paradigm. Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets. Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the indicator of hiding failure.
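As a hedged illustration of the negative-rule measures discussed above (the toy transactions and item names are invented for this sketch, and this is not the paper's Tabu-genetic algorithm), the confidence of a negative rule A → ¬B can be computed as supp(A ∧ ¬B) / supp(A):

```python
def support(transactions, items):
    # Fraction of transactions containing every item in `items`.
    return sum(items <= t for t in transactions) / len(transactions)

def negative_rule_confidence(transactions, antecedent, absent_item):
    # conf(A -> not-B) = supp(A and not-B) / supp(A)
    supp_a = support(transactions, antecedent)
    supp_a_not_b = sum(
        antecedent <= t and absent_item not in t for t in transactions
    ) / len(transactions)
    return supp_a_not_b / supp_a if supp_a else 0.0

# Hypothetical prescription records (illustrative only).
transactions = [
    {"aspirin", "warfarin"},
    {"aspirin"},
    {"aspirin", "statin"},
    {"statin"},
]
conf = negative_rule_confidence(transactions, {"aspirin"}, "warfarin")
print(conf)  # aspirin occurs in 3 transactions, 2 of them without warfarin -> 2/3
```

Enumerating all such rules over every itemset pair is what produces the flood of uninteresting candidates the abstract refers to, which is why pruning strategies such as Tabu search matter here.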
Recently, research on distributed data mining using grids has become a trend. This paper introduces a data mining algorithm based on distributed decision trees, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification data mining on a grid.
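A minimal sketch of the idea behind grid-distributed decision-tree induction, assuming each site shares only aggregated class counts for a candidate split (the site data and function names are illustrative, not the paper's algorithm): the coordinator can compute the information gain of a split from merged counts alone, without moving raw records.

```python
from collections import Counter
from math import log2

def entropy(counts):
    # Shannon entropy of a class-count dictionary.
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values() if c)

def merged_split_gain(site_partitions):
    # Each site reports class counts for the left/right branches of one
    # candidate split; the coordinator merges counts, never raw records.
    left, right = Counter(), Counter()
    for site_left, site_right in site_partitions:
        left.update(site_left)
        right.update(site_right)
    parent = left + right
    n, nl, nr = sum(parent.values()), sum(left.values()), sum(right.values())
    return entropy(parent) - (nl / n) * entropy(left) - (nr / n) * entropy(right)

# Two hypothetical grid sites report counts for the same candidate split.
sites = [
    ({"yes": 3, "no": 0}, {"yes": 0, "no": 3}),  # site 1: split is pure
    ({"yes": 3, "no": 0}, {"yes": 0, "no": 3}),  # site 2: split is pure
]
print(merged_split_gain(sites))  # 1.0: the split removes all impurity
```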
Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and more research has been conducted in the development of PDM. The research areas include system design, integration of object-oriented technology, data distribution, collaborative and distributed manufacturing working environments, security, and web-based integration. However, there are limitations in this research. In particular, it cannot cater for PDM in a distributed manufacturing environment. This is especially true in South China, where many Hong Kong (HK) manufacturers have moved their production plants to different locations in the Pearl River Delta for cost reduction. However, they retain their main offices in HK. Development of a PDM system is inherently complex. Product-related data cover product name, product part number (product identification), drawings, material specifications, dimension requirements, quality specifications, test results, lot size, production schedules, product data version and date of release, special tooling (e.g. jigs and fixtures), mould design, project engineer in charge, and cost spreadsheets, while process data include engineering release, engineering change information management, and other workflow related to the process information. According to Cornelissen et al., the contemporary PDM system should contain management functions in structure, retrieval, release, change, and workflow. In system design, development and implementation, a formal specification is necessary. However, there is no formal representation model for PDM systems. Therefore a graphical representation model is constructed to express the various scenarios of interactions between users and the PDM system.
Statechart is then used to model the operations of the PDM system, Fig.1. The statechart model bridges the current gap between requirements, scenarios, and the initial design specifications of the PDM system. After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed for the new DPDM system, Fig.2. New product data of DPDM and new system functions are then investigated to support product information flow in the new distributed environment. It is found that statecharts allow formal representations to capture the information and control flows of both PDM and DPDM. In particular, the statechart offers additional expressive power, when compared to the conventional state transition diagram, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
We analyze the co-seismic displacement field of the 26 December 2004 giant Sumatra–Andaman earthquake derived from Global Positioning System observations, geological vertical measurements of coral heads, and the pivot line observed through remote sensing. Using the co-seismic displacement field and the AK135 spherical layered Earth model, we invert the co-seismic slip distribution along the seismic fault. We also search for the best fault geometry model to fit the observed data. Assuming that the dip angle increases linearly in the downward direction, the postfit residual variations of the inverted geometry models with dip angles changing linearly along fault strike are plotted. The geometry model with the local minimum misfit is the one with the dip angle increasing linearly along strike from 4.3° in the top southernmost patch to 4.5° in the top northernmost patch. By using the fault shape and geodetic co-seismic data, we estimate the slip distribution on the curved fault. Our result shows that the earthquake ruptured a ~200-km width down to a depth of about 60 km. Thrust slip of 0.5–12.5 m is resolved, with the largest slip centered around the central section of the rupture zone, 7°N–10°N in latitude. The estimated seismic moment is 8.2 × 10^22 N m, which is larger than the estimate from the centroid moment magnitude (4.0 × 10^22 N m) and smaller than the estimate from normal-mode oscillation data modeling (1.0 × 10^23 N m).
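The three seismic moment estimates can be compared on the moment magnitude scale using the standard Hanks-Kanamori relation Mw = (2/3)(log10 M0 − 9.1), with M0 in N m (a textbook conversion, not a formula taken from this paper):

```python
from math import log10

def moment_magnitude(m0_newton_meters):
    # Hanks-Kanamori relation: Mw = (2/3) * (log10(M0) - 9.1), M0 in N m.
    return (2.0 / 3.0) * (log10(m0_newton_meters) - 9.1)

for label, m0 in [("geodetic inversion", 8.2e22),
                  ("centroid moment tensor", 4.0e22),
                  ("normal-mode modeling", 1.0e23)]:
    print(f"{label}: Mw = {moment_magnitude(m0):.2f}")
```

The factor-of-two spread in moment translates to only about 0.2-0.3 magnitude units, which is why all three estimates round to roughly Mw 9.0-9.3.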
1 Introduction Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information on
An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. In order to address the above issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase, and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
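A small sketch shows why negative spans dominate span-based training (the sentence and annotation are toy examples; the span enumeration below is the standard candidate-generation step, not this paper's model): even a short sentence yields far more candidate spans than gold entities, which is the imbalance the first classification phase is meant to filter.

```python
def enumerate_spans(n_tokens, max_len):
    # All candidate spans up to max_len tokens, as (start, end) inclusive.
    return [(i, j) for i in range(n_tokens)
            for j in range(i, min(i + max_len, n_tokens))]

tokens = "Barack Obama visited Paris last June".split()
gold_entities = {(0, 1), (3, 3)}  # "Barack Obama", "Paris" (toy annotation)

candidates = enumerate_spans(len(tokens), max_len=4)
negatives = [s for s in candidates if s not in gold_entities]
print(len(candidates), len(negatives))  # 18 candidates, 16 of them negative
```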
As a fundamental operation in ad hoc networks, broadcast can achieve efficient message propagation. Particularly in cognitive radio ad hoc networks, where unlicensed users have different sets of available channels, broadcasts are carried out on multiple channels. Accordingly, channel selection and collision avoidance are challenging issues in balancing the efficiency against the reliability of broadcasting. In this paper, an anticollision selective broadcast protocol, called acSB, is proposed. A channel selection algorithm based on limited neighbor information is considered to maximize the success rate of transmissions once the sender and receiver have the same channel. Moreover, an anticollision scheme is adopted to avoid simultaneous rebroadcasts. Consequently, the proposed broadcast protocol acSB outperforms other approaches in terms of smaller transmission delay, higher message reach rate and fewer broadcast collisions, as evaluated by simulations under different scenarios.
Graph data publication has been considered an important step for data analysis and mining. Graph data, which provide knowledge on interactions among entities, can be locally generated and held by distributed data owners. These data are usually sensitive and private, because they may be related to owners' personal activities and can be hijacked by adversaries to conduct inference attacks. Current solutions either consider private graph data as centralized contents or disregard the overlapping of graphs in distributed manners. Therefore, this work proposes a novel framework for distributed graph publication. In this framework, differential privacy is applied to justify the safety of the published contents. It includes four phases, i.e., graph combination, plan construction sharing, data perturbation, and graph reconstruction. The published graph selection is guided by one data coordinator, and each graph is perturbed carefully with the Laplace mechanism. The problem of graph selection is formulated and proven to be NP-complete. Then, a heuristic algorithm is proposed for selection. The correctness of the combined graph and the differential privacy on all edges are analyzed. This study also discusses a scenario without a data coordinator and proposes some insights into graph publication.
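The Laplace mechanism mentioned above can be sketched for a simple edge-count query (a generic differential-privacy illustration, not the paper's full perturbation scheme; the sampler uses inverse-CDF sampling so only the standard library is needed):

```python
import random
from math import log

def laplace_noise(scale, rng):
    # Inverse-CDF sampling: if U ~ Uniform(-1/2, 1/2), then
    # -scale * sign(U) * ln(1 - 2|U|) follows Laplace(0, scale).
    u = rng.random() - 0.5
    sign = 1.0 if u >= 0 else -1.0
    return -scale * sign * log(1.0 - 2.0 * abs(u))

def private_edge_count(true_count, epsilon, rng):
    # An edge-count query has sensitivity 1 (adding or removing one edge
    # changes the answer by at most 1), so the noise scale is 1/epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(42)
print(private_edge_count(1000, epsilon=0.5, rng=rng))  # close to 1000
```

Smaller epsilon means a larger noise scale and stronger privacy, at the cost of a less accurate published count.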
As more and more data are produced, finding a secure and efficient data access structure has become a major research issue. The centralized systems used by medical institutions for the management and transfer of Electronic Medical Records (EMRs) can be vulnerable to security and privacy threats, often lack interoperability, and give patients limited or no access to their own EMRs. In this paper, we first propose a privilege-based data access structure and incorporate it into an attribute-based encryption mechanism to handle the management and sharing of big data sets. Our proposed privilege-based data access structure makes managing healthcare records using mobile healthcare devices efficient and feasible for large numbers of users. We then propose a novel distributed multilevel EMR (d-EMR) management scheme, which uses blockchain to address security concerns and enables selective sharing of medical records among staff members who belong to different levels of a hierarchical institution. We deploy smart contracts on the Ethereum blockchain and utilize a distributed storage system to alleviate the dependence on the record-generating institutions to manage and share patient records. To preserve the privacy of patient records, our smart contract is designed to allow patients to verify attributes prior to granting access rights. We provide extensive security, privacy, and evaluation analyses to show that our proposed scheme is both efficient and practical.
Various application domains require the integration of distributed real-time or near-real-time systems with non-real-time systems. Smart cities, smart homes, ambient intelligent systems, and network-centric defense systems are among these application domains. Data Distribution Service (DDS) is a communication mechanism based on the Data-Centric Publish-Subscribe (DCPS) model. It is used for distributed systems with real-time operational constraints. Java Message Service (JMS) is a messaging standard for enterprise systems using Service Oriented Architecture (SOA) for non-real-time operations. JMS allows Java programs to exchange messages in a loosely coupled fashion. JMS also supports sending and receiving messages using a messaging queue and a publish-subscribe interface. In this article, we propose an architecture enabling the automated integration of distributed real-time and non-real-time systems. We test our proposed architecture using a distributed Command, Control, Communications, Computers, and Intelligence (C4I) system. The system has DDS-based real-time Combat Management System components deployed to naval warships, and SOA-based non-real-time Command and Control components used at headquarters. The proposed solution enables the exchange of data between these two systems efficiently. We compare the proposed solution with a similar study. Our solution is superior in terms of automation support, ease of implementation, scalability, and performance.
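The bridging idea can be caricatured with a tiny in-memory publish-subscribe broker (the class, topic, and message names are purely hypothetical; real DDS and JMS APIs are far richer than this minimal sketch): a message published on the real-time side is re-delivered to callbacks registered on the enterprise side.

```python
class Bridge:
    # Minimal topic-based pub-sub relay between two messaging domains.
    def __init__(self):
        self.subscribers = {}

    def subscribe(self, topic, callback):
        self.subscribers.setdefault(topic, []).append(callback)

    def publish(self, topic, message):
        for cb in self.subscribers.get(topic, []):
            cb(message)

received = []
bridge = Bridge()
bridge.subscribe("track.updates", received.append)        # SOA-side consumer
bridge.publish("track.updates", {"id": 7, "lat": 36.8})   # DDS-side producer
print(received)  # [{'id': 7, 'lat': 36.8}]
```

An actual bridge must additionally translate type systems and quality-of-service settings between the two standards, which is where the automation described in the abstract comes in.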
The hypocentral depths of more than 200 Chinese earthquakes, of magnitudes from M 8.6 to M 3.0, are calculated from macroseismic data carried in earthquake catalogs, by using the formula for macroseismic hypocentral depths and the formula for the general solution of macroseismic hypocentral depths. The results are plotted on maps to show their geographical distribution. It can be seen that most Chinese earthquakes are shallow ones. Of the 200 earthquakes calculated, 162 (81.0%) hypocenters are shallower than 9 km, of which 111 (55.5%) hypocenters are shallower than 5 km. Such shallow earthquakes are mostly distributed in the provinces near the North-South Earthquake Belt, while the rest are scattered in the other provinces (except Zhejiang province). Earthquakes of medium depth (between 10 and 20 km) are relatively few (32 in number, 15.0%); they are distributed along the North-South Earthquake Belt, the western part of the Xinjiang Uygur Autonomous Region, and in the provinces Shaanxi, Shanxi and Shandong (along the Tanlu Fracture Zone, crossing the sea to northeast China). Deep earthquakes are rare, being scattered in south Yunnan and the east end of the Inner Mongolia Autonomous Region.
Spectral clustering is a well-regarded subspace clustering algorithm that exhibits outstanding performance in hyperspectral image classification through eigenvalue decomposition of the Laplacian matrix. However, its classification accuracy is severely limited by the selected eigenvectors, and the commonly used eigenvectors not only fail to guarantee the inclusion of detailed discriminative information, but also have high computational complexity. To address these challenges, we propose an intuitive eigenvector selection method based on the coincidence degree of data distribution (CDES). First, the clustering result of an improved k-means, which can well reflect the spatial distribution of the various classes, is used as the reference map. Then, the adjusted Rand index and adjusted mutual information are calculated to assess the data distribution consistency between each eigenvector and the reference map. Finally, the eigenvectors with high coincidence degrees are selected for clustering. A case study on hyperspectral mineral mapping demonstrated that the mapping accuracies of CDES are approximately 56.3%, 15.5%, and 10.5% higher than those of the commonly used top, high-entropy, and high-relevance eigenvectors, and CDES can save more than 99% of the eigenvector selection time. In particular, due to the unsupervised nature of k-means, CDES provides a novel solution for autonomous feature selection of hyperspectral images.
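The adjusted Rand index used to score coincidence degree can be computed with the standard pair-counting formula (a generic implementation with toy labelings, not code from the paper): it is 1 for identical partitions regardless of label names, and near 0 for unrelated ones.

```python
from collections import Counter

def comb2(n):
    # Number of unordered pairs among n items.
    return n * (n - 1) // 2

def adjusted_rand_index(labels_a, labels_b):
    # Pair-counting ARI over two clusterings of the same items.
    n = len(labels_a)
    contingency = Counter(zip(labels_a, labels_b))
    sum_comb = sum(comb2(c) for c in contingency.values())
    sum_a = sum(comb2(c) for c in Counter(labels_a).values())
    sum_b = sum(comb2(c) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb2(n)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0
    return (sum_comb - expected) / (max_index - expected)

ref = [0, 0, 1, 1, 2, 2]   # reference map (e.g., a k-means result)
vec = [1, 1, 0, 0, 2, 2]   # candidate clustering: same partition, relabeled
print(adjusted_rand_index(ref, vec))  # 1.0: identical partitions
```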
Federated learning has emerged as a distributed learning paradigm by training at each client and aggregating at a parameter server. System heterogeneity hinders stragglers from responding to the server in time with huge communication costs. Although client grouping in federated learning can solve the straggler problem, the stochastic selection strategy in client grouping neglects the impact of data distribution within each group. Besides, current client grouping approaches make clients suffer unfair participation, leading to biased performances for different clients. In order to guarantee the fairness of client participation and mitigate biased local performances, we propose a federated dynamic client selection method based on data representativity (FedSDR). FedSDR clusters clients into groups correlated with their own local computational efficiency. To estimate the significance of client datasets, we design a novel data representativity evaluation scheme based on local data distribution. Furthermore, the two most representative clients in each group are selected to optimize the global model. Finally, the DYNAMIC-SELECT algorithm updates local computational efficiency and data representativity states to regroup clients after periodic average aggregation. Evaluations on real datasets show that FedSDR improves client participation by 27.4%, 37.9%, and 23.3% compared with FedAvg, TiFL, and FedSS, respectively, taking fairness into account in federated learning. In addition, FedSDR surpasses FedAvg, FedGS, and FedMS by 21.32%, 20.4%, and 6.90%, respectively, in local test accuracy variance, balancing the performance bias of the global model across clients.
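The "periodic average aggregation" step builds on FedAvg-style weighted averaging, which can be sketched as follows (toy parameter vectors; this is the generic FedAvg update, not FedSDR's grouping or selection logic): the server averages client parameters weighted by local dataset size.

```python
def fedavg(client_updates):
    # client_updates: list of (num_samples, weights) pairs; the server
    # averages parameters weighted by each client's local dataset size.
    total = sum(n for n, _ in client_updates)
    dim = len(client_updates[0][1])
    return [
        sum(n * w[i] for n, w in client_updates) / total
        for i in range(dim)
    ]

updates = [
    (100, [1.0, 2.0]),  # client with 100 local samples
    (300, [3.0, 4.0]),  # client with 300 samples dominates the average
]
print(fedavg(updates))  # [2.5, 3.5]
```

Because the weighting follows dataset size, selection schemes such as FedSDR must choose *which* clients contribute each round so that this average stays representative of the overall data distribution.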
Geothermal data are published using different IT services, formats and content representations, and can refer to both regional and global scale information. Geothermal stakeholders search for information with different aims. E-Infrastructures are collaborative platforms that address this diversity of aims and data representations. In this paper, we present a prototype for a European Geothermal Information Platform that uses INSPIRE recommendations and an e-Infrastructure (D4Science) to collect, aggregate and share data sets from different European data contributors, thus enabling stakeholders to retrieve and process a large amount of data. Our system merges segmented and national realities into one common framework. We demonstrate our approach by describing a platform that collects data from Italian, French, Hungarian, Swiss and Icelandic geothermal data providers.
There has been a worldwide revolution in geoscientific data availability and access. Effectively infinite and instantaneous free access to geoscientific data is available from the World Wide System of Geoscience Data Centers and Virtual Observatories. In addition, national databanks and commercially available large exploration data sets also exist. These distributed data resources impose challenges for the future: to move toward their objective integration and visualization in order to discover new knowledge. Such advancements can facilitate meaningful interpretations and decision-making for the benefit of society at global and local scales. This article presents the Digital Earth initiative at a national level to address multiple domains, such as effective management of natural resources, interactive planning of exploration activities, and monitoring, mapping and mitigation of natural hazards. It discusses a distributed geospatial data infrastructure and its importance in geoscientific data integration for efficient and interactive data retrieval, analysis and visualization. Some examples are presented to demonstrate the advantages of integrated visualization in geoscientific analysis.
The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union's General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR-compliant health information management, as implemented in the VODAN architecture.
Leading-edge supercomputers, such as the K computer, have generated a vast amount of simulation results, and most of these datasets were stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing some operational log data files. We verified a tendency to generate large numbers of distributed files as simulation outputs, and in most cases, the number of files has been proportional to the number of utilized computational nodes, that is, each computational node produces one or more files. Considering that the computational cost of visualization tasks is usually much smaller than that required for large-scale numerical simulations, a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. In this work, we focused on the xDMlib data management library and its flexible data I/O mechanism in order to enable flexible loading of big computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate lightweight metadata necessary for the elaboration of the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach by using a 32-node visualization cluster and the K computer. Besides the inevitable performance penalty of longer data loading time when using a smaller number of processes, there is the benefit of avoiding any data replication via copy, conversion, or extraction. In addition, users are able to freely select any number of nodes, without caring about the number of distributed files, for post-hoc visualization and analysis purposes.
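The data assignment mapping from distributed output files to a freely chosen number of visualization processes can be sketched with a simple round-robin scheme (illustrative only; xDMlib's actual mapping, driven by the pre-computed metadata, may differ, and the file names below are invented):

```python
def assign_files(files, n_procs):
    # Round-robin assignment of distributed simulation output files to
    # visualization processes, independent of how many files exist.
    mapping = {rank: [] for rank in range(n_procs)}
    for i, f in enumerate(files):
        mapping[i % n_procs].append(f)
    return mapping

# 10 per-node output files read by 4 visualization processes.
files = [f"out_{node:04d}.nc" for node in range(10)]
print(assign_files(files, 4))
```

With fewer processes than files, each process simply reads several files in turn, which is the source of the longer loading time the abstract mentions.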
Comparing the city-size distribution at the urban agglomeration (UA) scale is important for understanding the processes of urban development. However, comparative studies of city-size distribution among China's three largest UAs, the Beijing-Tianjin-Hebei agglomeration (BTHA), the Yangtze River Delta agglomeration (YRDA), and the Pearl River Delta agglomeration (PRDA), remain inadequate due to the limitation of data availability. Therefore, using urban data derived from time-series nighttime light data, the common characteristics and distinctive features of city-size distribution among the three UAs from 1992 to 2015 were compared by Pareto regression and the rank clock method. We identified two common features. First, the city-size distribution became more even. The Pareto exponents increased by 0.17, 0.12, and 0.01 in the YRDA, BTHA, and PRDA, respectively. Second, the average ranks of small cities ascended, being 0.55, 0.08 and 0.04 in the three UAs, respectively. However, the average ranks of large and medium cities in the three UAs experienced different trajectories, which are closely related to the similarities and differences in the driving forces for the development of UAs. Place-based measures are encouraged to promote coordinated development among cities of differing sizes in the three UAs.
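A Pareto (rank-size) exponent of the kind compared above can be estimated by an ordinary least-squares fit of log(rank) against log(size) (a generic sketch with synthetic city sizes, not the study's data): a larger exponent indicates a more even size distribution.

```python
from math import log

def pareto_exponent(city_sizes):
    # OLS fit of log(rank) = c - q * log(size); q is the Pareto exponent.
    sizes = sorted(city_sizes, reverse=True)
    xs = [log(s) for s in sizes]
    ys = [log(rank) for rank in range(1, len(sizes) + 1)]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return -slope

# Synthetic city system obeying the rank-size rule size = C / rank (q = 1).
sizes = [1200 / r for r in range(1, 21)]
print(round(pareto_exponent(sizes), 6))  # 1.0
```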
Funding: supported by STI 2030-Major Projects 2021ZD0200400, the National Natural Science Foundation of China (62276233 and 62072405), and the Key Research Project of Zhejiang Province (2023C01048).
Funding: The project was supported by the Mega-science Engineering Project of the Chinese Academy of Sciences.
文摘HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been done in the machine since 1994, and lots of satisfactory results have been achieved in the fusion research field on HT-7 tokamak [1]. With the development of fusion research, remote control of experiment becomes more and more important to improve experimental efficiency and expand research results. This paper will describe a RCS (Remote Control System), the combined model of Browser/Server and Client/Server, based on Internet of HT-7 distributed data acquisition system (HT7DAS). By means of RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, opening, reliability and efficiency of HT7DAS. In the paper, the whole process of design along with implementation of the system and some key items are discussed in detail. The System has been successfully operated during HT-7 experiment in 2002 campaign period.
Abstract: It is crucial, while using healthcare data, to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristic (e.g., greedy search) and metaheuristic-based techniques (e.g., evolutionary algorithms) have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that during negative association rule mining, a large number of uninteresting rules are formed, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users' privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, based on the Tabu-genetic optimization paradigm. Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets. Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the hiding-failure indicator.
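As a hedged illustration of the negative-rule notion mentioned above (not the paper's Tabu-genetic hiding procedure), the support and confidence of a rule A ⇒ ¬B can be derived from the positive supports. All item names below are hypothetical:

```python
# Sketch: support and confidence of a negative association rule A => NOT B,
# computed from a list of transactions. Illustrative only.

def negative_rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => NOT consequent."""
    n = len(transactions)
    a = sum(1 for t in transactions if antecedent.issubset(t))
    a_and_b = sum(1 for t in transactions
                  if antecedent.issubset(t) and consequent.issubset(t))
    a_not_b = a - a_and_b  # transactions containing A but not B
    support = a_not_b / n
    confidence = a_not_b / a if a else 0.0
    return support, confidence

# Toy "healthcare" transactions (hypothetical items).
tx = [{"drugX", "diseaseY"}, {"drugX"}, {"drugX"}, {"diseaseY"}]
print(negative_rule_metrics(tx, {"drugX"}, {"diseaseY"}))  # → (0.5, 0.6666666666666666)
```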
Abstract: Recently, research on distributed data mining using grids has become a trend. This paper introduces a data mining algorithm based on distributed decision trees, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification data mining on a grid.
Abstract: Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and more research has been conducted in the development of PDM, in areas including system design, integration of object-oriented technology, data distribution, collaborative and distributed manufacturing working environments, security, and web-based integration. However, there are limitations to this research. In particular, it cannot cater for PDM in a distributed manufacturing environment. This is especially true in South China, where many Hong Kong (HK) manufacturers have moved their production plants to different locations in the Pearl River Delta for cost reduction while retaining their main offices in HK. Development of a PDM system is inherently complex. Product-related data cover product name, product part number (product identification), drawings, material specifications, dimension requirements, quality specifications, test results, lot size, production schedules, product data version and date of release, special tooling (e.g. jigs and fixtures), mould design, project engineer in charge, and cost spreadsheets, while process data include engineering release, engineering change information management, and other workflow related to process information. According to Cornelissen et al., a contemporary PDM system should contain management functions for structure, retrieval, release, change, and workflow. In system design, development and implementation, a formal specification is necessary. However, there is no formal representation model for PDM systems. Therefore a graphical representation model is constructed to express the various scenarios of interaction between users and the PDM system. Statecharts are then used to model the operations of the PDM system, Fig.1. The statechart model bridges the current gap between requirements, scenarios, and the initial design specifications of the PDM system.
After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed for the new DPDM system, Fig.2. New product data of DPDM and new system functions are then investigated to support product information flow in the new distributed environment. It is found that statecharts allow formal representations to capture the information and control flows of both PDM and DPDM. In particular, statecharts offer additional expressive power, compared with conventional state transition diagrams, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
Funding: Supported by the Special Fund of Fundamental Scientific Research Business Expense for Higher Schools of the Central Government (Projects for creation teams ZY20110101), NSFC 41090294, and the talent selection and training plan project of Hebei University.
Abstract: We analyze the co-seismic displacement field of the 26 December 2004 giant Sumatra–Andaman earthquake derived from Global Positioning System observations, geological vertical measurements of coral heads, and the pivot line observed through remote sensing. Using the co-seismic displacement field and the AK135 spherical layered Earth model, we invert the co-seismic slip distribution along the seismic fault. We also search for the best fault geometry model to fit the observed data. Assuming that the dip angle increases linearly in the downward direction, we plot the postfit residual variation of the inverted geometry models with dip angles varying linearly along the fault strike. The geometry model with the local minimum misfit is the one whose dip angle increases linearly along strike from 4.3° in the top southernmost patch to 4.5° in the top northernmost patch, with the dip angle increasing linearly downward. Using the fault shape and geodetic co-seismic data, we estimate the slip distribution on the curved fault. Our result shows that the earthquake ruptured a ~200-km width down to a depth of about 60 km. Thrust slip of 0.5–12.5 m is resolved, with the largest slip centered around the central section of the rupture zone, 7°N–10°N in latitude. The estimated seismic moment is 8.2 × 10^22 N·m, which is larger than the estimate from the centroid moment magnitude (4.0 × 10^22 N·m) and smaller than the estimate from normal-mode oscillation data modeling (1.0 × 10^23 N·m).
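As a side note to the moment estimates above, the standard Kanamori relation converts a seismic moment M0 (in N·m) to moment magnitude Mw. This sketch is not from the paper itself, only a way to put the quoted moments on the familiar magnitude scale:

```python
import math

# Standard moment-to-magnitude conversion: Mw = (2/3) * (log10(M0) - 9.1),
# with the seismic moment M0 expressed in newton-meters.

def moment_magnitude(m0_newton_meters: float) -> float:
    return (2.0 / 3.0) * (math.log10(m0_newton_meters) - 9.1)

print(round(moment_magnitude(8.2e22), 2))  # geodetic estimate above → 9.21
print(round(moment_magnitude(4.0e22), 2))  # centroid moment estimate → 9.0
```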
Funding: Supported by the Special Scientific Research Fund of the Public Welfare Profession of the Ministry of Land and Resources of the People's Republic of China (No. 201011057).
Abstract: 1 Introduction Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information on
Funding: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase, and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Abstract: As a fundamental operation in ad hoc networks, broadcast can achieve efficient message propagation. Particularly in cognitive radio ad hoc networks, where unlicensed users have different sets of available channels, broadcasts are carried out on multiple channels. Accordingly, channel selection and collision avoidance are challenging issues in balancing the efficiency of broadcasting against its reliability. In this paper, an anti-collision selective broadcast protocol, called acSB, is proposed. A channel selection algorithm based on limited neighbor information is considered to maximize the success rate of transmissions when the sender and receiver share a channel. Moreover, an anti-collision scheme is adopted to avoid simultaneous rebroadcasts. Consequently, the proposed broadcast protocol acSB outperforms other approaches in terms of smaller transmission delay, higher message reach rate and fewer broadcast collisions, as evaluated by simulations under different scenarios.
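The channel-selection idea described above can be sketched greedily: among the sender's available channels, pick the one shared with the most one-hop neighbors, so a single transmission reaches the largest set. This is an illustrative simplification under assumed data shapes, not acSB's actual algorithm:

```python
# Hypothetical greedy channel selection from limited neighbor information.
# Names and data structures are assumptions for illustration only.

def select_broadcast_channel(my_channels, neighbor_channels):
    """neighbor_channels: dict of neighbor_id -> set of that neighbor's channels."""
    def coverage(ch):
        # number of neighbors that could receive a broadcast on channel ch
        return sum(1 for chans in neighbor_channels.values() if ch in chans)
    return max(my_channels, key=coverage)

neighbors = {"n1": {1, 2}, "n2": {2, 3}, "n3": {2}}
print(select_broadcast_channel({1, 2, 3}, neighbors))  # → 2 (reaches all three)
```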
Funding: Supported by the National Natural Science Foundation of China (Nos. U19A2059 and 61802050) and the Ministry of Science and Technology of Sichuan Province Program (Nos. 2021YFG0018 and 20ZDYF0343).
Abstract: Graph data publication has been considered an important step for data analysis and mining. Graph data, which provide knowledge on interactions among entities, can be locally generated and held by distributed data owners. These data are usually sensitive and private, because they may be related to owners' personal activities and can be hijacked by adversaries to conduct inference attacks. Current solutions either consider private graph data as centralized contents or disregard the overlapping of graphs in distributed settings. Therefore, this work proposes a novel framework for distributed graph publication. In this framework, differential privacy is applied to justify the safety of the published contents. It includes four phases, i.e., graph combination, plan construction sharing, data perturbation, and graph reconstruction. The published graph selection is guided by one data coordinator, and each graph is perturbed carefully with the Laplace mechanism. The problem of graph selection is formulated and proven to be NP-complete. Then, a heuristic algorithm is proposed for selection. The correctness of the combined graph and the differential privacy on all edges are analyzed. This study also discusses a scenario without a data coordinator and proposes some insights into graph publication.
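The Laplace-mechanism perturbation mentioned above can be illustrated on a single graph statistic. This is a minimal sketch of the standard mechanism only, not the paper's full four-phase framework; the function name is hypothetical:

```python
import numpy as np

# Laplace mechanism on an edge count. Adding or removing one edge changes
# the count by at most 1 (sensitivity 1), so noise drawn from
# Laplace(0, 1/epsilon) yields epsilon-differential privacy for this query.

def perturb_edge_count(true_count: int, epsilon: float,
                       rng: np.random.Generator) -> float:
    sensitivity = 1.0
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return true_count + noise

rng = np.random.default_rng(0)
noisy = [perturb_edge_count(1000, epsilon=0.5, rng=rng) for _ in range(10000)]
# The noise is zero-mean, so the average of many noisy answers stays close
# to the true count even though each individual answer is perturbed.
print(abs(sum(noisy) / len(noisy) - 1000) < 1.0)
```

Smaller epsilon means a larger noise scale and stronger privacy, at the cost of less accurate published statistics.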
Funding: This work was supported in part by the National Natural Science Foundation of China (CCF1919154, ECCS-1923409).
Abstract: As more and more data are produced, finding a secure and efficient data access structure has become a major research issue. The centralized systems used by medical institutions for the management and transfer of Electronic Medical Records (EMRs) can be vulnerable to security and privacy threats, often lack interoperability, and give patients limited or no access to their own EMRs. In this paper, we first propose a privilege-based data access structure and incorporate it into an attribute-based encryption mechanism to handle the management and sharing of big data sets. Our proposed privilege-based data access structure makes managing healthcare records using mobile healthcare devices efficient and feasible for large numbers of users. We then propose a novel distributed multilevel EMR (d-EMR) management scheme, which uses blockchain to address security concerns and enables selective sharing of medical records among staff members who belong to different levels of a hierarchical institution. We deploy smart contracts on the Ethereum blockchain and utilize a distributed storage system to alleviate the dependence on the record-generating institutions to manage and share patient records. To preserve the privacy of patient records, our smart contract is designed to allow patients to verify attributes prior to granting access rights. We provide extensive security, privacy, and evaluation analyses to show that our proposed scheme is both efficient and practical.
Abstract: Various application domains require the integration of distributed real-time or near-real-time systems with non-real-time systems. Smart cities, smart homes, ambient intelligence systems, and network-centric defense systems are among these application domains. Data Distribution Service (DDS) is a communication mechanism based on the Data-Centric Publish-Subscribe (DCPS) model. It is used for distributed systems with real-time operational constraints. Java Message Service (JMS) is a messaging standard for enterprise systems using Service-Oriented Architecture (SOA) for non-real-time operations. JMS allows Java programs to exchange messages in a loosely coupled fashion, and supports sending and receiving messages using a messaging queue and a publish-subscribe interface. In this article, we propose an architecture enabling the automated integration of distributed real-time and non-real-time systems. We test our proposed architecture using a distributed Command, Control, Communications, Computers, and Intelligence (C4I) system. The system has DDS-based real-time Combat Management System components deployed on naval warships, and SOA-based non-real-time Command and Control components used at headquarters. The proposed solution enables the efficient exchange of data between these two systems. We compare the proposed solution with a similar study; our solution is superior in terms of automation support, ease of implementation, scalability, and performance.
Abstract: The hypocentral depths of more than 200 Chinese earthquakes, with magnitudes from M 8.6 to M 3.0, are calculated from macroseismic data carried in earthquake catalogs, using the formula for macroseismic hypocentral depths and the formula for the general solution of macroseismic hypocentral depths. The results are plotted on maps to show their geographical distribution. It can be seen that most Chinese earthquakes are shallow ones. Of the 200 earthquakes calculated, 162 (81.0%) hypocenters are shallower than 9 km, of which 111 (55.5% of the total) are shallower than 5 km. Such shallow earthquakes are mostly distributed in the provinces near the North-South Earthquake Belt, while the rest are scattered in the other provinces (except Zhejiang Province). Earthquakes of medium depth (between 10 and 20 km) are relatively few (32 in number, 15.0%); they are distributed along the North-South Earthquake Belt, in the western part of the Xinjiang Uygur Autonomous Region, and in the provinces of Shaanxi, Shanxi and Shandong (along the Tanlu Fracture Zone, crossing the sea to northeast China). Deep earthquakes are rare, being scattered in southern Yunnan and at the east end of the Inner Mongolia Autonomous Region.
Funding: Supported by the National Key Research and Development Program under Grant number 2019YFE0126700 and the Shandong Provincial Natural Science Foundation under Grant number ZR2020QD018.
Abstract: Spectral clustering is a well-regarded subspace clustering algorithm that exhibits outstanding performance in hyperspectral image classification through eigenvalue decomposition of the Laplacian matrix. However, its classification accuracy is severely limited by the selected eigenvectors; the commonly used eigenvectors not only fail to guarantee the inclusion of detailed discriminative information, but also have high computational complexity. To address these challenges, we propose an intuitive eigenvector selection method based on the coincidence degree of data distribution (CDES). First, the clustering result of an improved k-means, which reflects the spatial distribution of the various classes well, is used as the reference map. Then, the adjusted Rand index and adjusted mutual information are calculated to assess the data distribution consistency between each eigenvector and the reference map. Finally, the eigenvectors with high coincidence degrees are selected for clustering. A case study on hyperspectral mineral mapping demonstrated that the mapping accuracies of CDES are approximately 56.3%, 15.5%, and 10.5% higher than those of the commonly used top, high-entropy, and high-relevance eigenvectors, respectively, and that CDES can save more than 99% of the eigenvector selection time. In particular, owing to the unsupervised nature of k-means, CDES provides a novel solution for autonomous feature selection in hyperspectral images.
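A minimal sketch of the CDES scoring step described above: each candidate eigenvector is scored by how well a clustering of that single eigenvector agrees (adjusted Rand index) with the k-means reference map. The synthetic vectors below stand in for real hyperspectral eigenvectors, and this illustrates only the consistency measure, not the paper's full pipeline:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def cdes_scores(eigenvectors, reference_labels, k):
    """Score each eigenvector (column) by agreement with the reference map."""
    scores = []
    for v in eigenvectors.T:  # one column = one candidate eigenvector
        labels = KMeans(n_clusters=k, n_init=10,
                        random_state=0).fit_predict(v.reshape(-1, 1))
        scores.append(adjusted_rand_score(reference_labels, labels))
    return scores

rng = np.random.default_rng(1)
ref = np.repeat([0, 1], 50)  # reference map with two classes
# One informative vector (separates the classes) and one pure-noise vector.
good = np.concatenate([rng.normal(0, 0.1, 50), rng.normal(3, 0.1, 50)])
noise = rng.normal(size=100)
scores = cdes_scores(np.column_stack([good, noise]), ref, k=2)
print(scores[0] > scores[1])  # the informative eigenvector scores higher
```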
Funding: This work is supported by the National Key Research and Development Program of China under Grant No. 2022YFC3005401, the Key Research and Development Program of Yunnan Province of China under Grant No. 202203AA080009, the Transformation Program of Scientific and Technological Achievements of Jiangsu Province of China under Grant No. BA2021002, and the Key Research and Development Program of Jiangsu Province of China under Grant No. BE2020729.
Abstract: Federated learning has emerged as a distributed learning paradigm in which training happens at each client and aggregation at a parameter server. System heterogeneity hinders stragglers from responding to the server in time and incurs huge communication costs. Although client grouping in federated learning can solve the straggler problem, the stochastic selection strategy in client grouping neglects the impact of data distribution within each group. Besides, current client grouping approaches subject clients to unfair participation, leading to biased performance for different clients. In order to guarantee the fairness of client participation and mitigate biased local performance, we propose a federated dynamic client selection method based on data representativity (FedSDR). FedSDR clusters clients into groups correlated with their local computational efficiency. To estimate the significance of client datasets, we design a novel data representativity evaluation scheme based on local data distribution. Furthermore, the two most representative clients in each group are selected to optimize the global model. Finally, the DYNAMIC-SELECT algorithm updates local computational efficiency and data representativity states to regroup clients after periodic average aggregation. Evaluations on real datasets show that FedSDR improves client participation by 27.4%, 37.9%, and 23.3% compared with FedAvg, TiFL, and FedSS, respectively, taking fairness into account in federated learning. In addition, FedSDR surpasses FedAvg, FedGS, and FedMS by 21.32%, 20.4%, and 6.90%, respectively, in local test accuracy variance, balancing the performance bias of the global model across clients.
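The grouping-and-selection step can be sketched as below, using the L1 distance between local and global label distributions as a stand-in for the paper's data representativity score. All names, the grouping, and the scoring rule are illustrative assumptions, not FedSDR's actual definitions:

```python
from collections import Counter

# Hypothetical sketch: within each efficiency group, select the clients
# whose local label distribution is closest to the global one.

def distribution(labels):
    total = len(labels)
    return {k: v / total for k, v in Counter(labels).items()}

def representativity(local, global_dist):
    # Negative L1 distance between label distributions: higher is better.
    keys = set(local) | set(global_dist)
    return -sum(abs(local.get(k, 0) - global_dist.get(k, 0)) for k in keys)

def select_clients(groups, client_labels, global_labels, per_group=2):
    g = distribution(global_labels)
    selected = []
    for members in groups.values():
        ranked = sorted(
            members,
            key=lambda c: representativity(distribution(client_labels[c]), g),
            reverse=True)
        selected.extend(ranked[:per_group])
    return selected

# Client "a" and "c" mirror the balanced global distribution; "b" does not.
client_labels = {"a": [0, 0, 1, 1], "b": [0, 0, 0, 0], "c": [1, 0, 1, 0]}
groups = {"fast": ["a", "b", "c"]}  # one efficiency group for simplicity
print(select_clients(groups, client_labels, [0, 1] * 10))  # → ['a', 'c']
```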
Abstract: Geothermal data are published using different IT services, formats and content representations, and can refer to both regional- and global-scale information. Geothermal stakeholders search for information with different aims. E-Infrastructures are collaborative platforms that address this diversity of aims and data representations. In this paper, we present a prototype for a European Geothermal Information Platform that uses INSPIRE recommendations and an e-Infrastructure (D4Science) to collect, aggregate and share data sets from different European data contributors, thus enabling stakeholders to retrieve and process a large amount of data. Our system merges segmented national realities into one common framework. We demonstrate our approach by describing a platform that collects data from Italian, French, Hungarian, Swiss and Icelandic geothermal data providers.
Abstract: There has been a worldwide revolution in geoscientific data availability and access. Effectively infinite and instantaneous free access to geoscientific data is available from the World Wide System of Geoscience Data Centers and Virtual Observatories. In addition, national databanks and commercially available large exploration datasets also exist. These distributed data resources pose a challenge for the future: to move toward their objective integration and visualization in order to discover new knowledge. Such advancements can facilitate meaningful interpretation and decision-making for the benefit of society at global and local scales. This article presents the Digital Earth initiative at a national level to address multiple domains, such as effective management of natural resources, interactive planning of exploration activities, and monitoring, mapping and mitigation of natural hazards. It discusses a distributed geospatial data infrastructure and its importance in geoscientific data integration for efficient and interactive data retrieval, analysis and visualization. Some examples are presented to demonstrate the advantages of integrated visualization in geoscientific analysis.
Funding: The authors thank VODAN-Africa, the Philips Foundation, the Dutch Development Bank FMO, CORDAID, and the GO FAIR Foundation for supporting this research.
Abstract: The field of health data management poses unique challenges in relation to data ownership, the privacy of data subjects, and the reusability of data. The FAIR Guidelines have been developed to address these challenges. The Virus Outbreak Data Network (VODAN) architecture builds on these principles, using the European Union's General Data Protection Regulation (GDPR) framework to ensure compliance with local data regulations, while using information knowledge management concepts to further improve data provenance and interoperability. In this article we provide an overview of the terminology used in the field of FAIR data management, with a specific focus on FAIR-compliant health information management as implemented in the VODAN architecture.
Funding: Supported by the "Joint Usage/Research Center for Interdisciplinary Large-scale Information Infrastructures" in Japan (Project IDs: jh170043, jh170051).
Abstract: Leading-edge supercomputers, such as the K computer, have generated a vast amount of simulation results, and most of these datasets were stored on the file system for post-hoc analysis such as visualization. In this work, we first investigated the data generation trends of the K computer by analyzing some operational log files. We verified a tendency to generate large numbers of distributed files as simulation outputs; in most cases, the number of files has been proportional to the number of utilized computational nodes, that is, each computational node produces one or more files. Considering that the computational cost of visualization tasks is usually much smaller than that required for large-scale numerical simulations, a flexible data input/output (I/O) management mechanism becomes highly useful for post-hoc visualization and analysis. In this work, we focused on the xDMlib data management library and its flexible data I/O mechanism in order to enable flexible loading of big computational climate simulation results. In the proposed approach, a pre-processing step is executed on the target distributed files to generate the lightweight metadata necessary for elaborating the data assignment mapping used in the subsequent data loading process. We evaluated the proposed approach by using a 32-node visualization cluster and the K computer. Besides the inevitable performance penalty of longer data loading times when using a smaller number of processes, there is the benefit of avoiding any data replication via copy, conversion, or extraction. In addition, users are able to freely select any number of nodes, without caring about the number of distributed files, for post-hoc visualization and analysis purposes.
Funding: National Natural Science Foundation of China, Nos. 41621061 and 41501092; Talents Training Program from the Beijing Municipal Commission of Education, No. 201500002012G058.
Abstract: Comparing city-size distributions at the urban agglomeration (UA) scale is important for understanding the processes of urban development. However, comparative studies of city-size distribution among China's three largest UAs, the Beijing-Tianjin-Hebei agglomeration (BTHA), the Yangtze River Delta agglomeration (YRDA), and the Pearl River Delta agglomeration (PRDA), remain inadequate due to limited data availability. Therefore, using urban data derived from time-series nighttime light data, the common characteristics and distinctive features of city-size distribution among the three UAs from 1992 to 2015 were compared by Pareto regression and the rank clock method. We identified two common features. First, the city-size distribution became more even: the Pareto exponents increased by 0.17, 0.12, and 0.01 in the YRDA, BTHA, and PRDA, respectively. Second, the average ranks of small cities ascended, by 0.55, 0.08 and 0.04 in the three UAs, respectively. However, the average ranks of large and medium cities in the three UAs followed different trajectories, which are closely related to the similarities and differences in the driving forces of UA development. Place-based measures are encouraged to promote coordinated development among cities of differing sizes in the three UAs.
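The Pareto regression used above can be sketched as an ordinary least-squares fit of ln(rank) against ln(size); the Pareto exponent q is the negated slope, and a larger q means a more even city-size distribution. This is a generic illustration on synthetic data, not the paper's actual estimation on nighttime-light-derived city sizes:

```python
import numpy as np

# Rank-size (Pareto/Zipf) regression: ln(rank) = ln(C) - q * ln(size).

def pareto_exponent(sizes):
    sizes = np.sort(np.asarray(sizes, dtype=float))[::-1]  # largest first
    ranks = np.arange(1, len(sizes) + 1)
    slope, _ = np.polyfit(np.log(sizes), np.log(ranks), 1)
    return -slope  # the fitted slope is -q, so negate it

# Ideal Zipf distribution (q = 1): size proportional to 1 / rank.
sizes = [1000 / r for r in range(1, 51)]
print(round(pareto_exponent(sizes), 2))  # → 1.0
```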