Multimodal sentiment analysis uses multimodal data such as text, facial expressions, and voice to detect people's attitudes. With the advent of distributed data collection and annotation, such multimodal data can be easily obtained and shared. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. In addition, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
Due to the limited scenes that synthetic aperture radar (SAR) satellites can detect, the full-track utilization rate is not high. Because of the computing and storage limitations of a single satellite, it is difficult to process large amounts of spaceborne SAR data. A new method of networked satellite data processing is proposed to improve the efficiency of data processing. A multi-satellite distributed SAR real-time processing method based on the Chirp Scaling (CS) imaging algorithm is studied in this paper, and a distributed data processing system is built with field programmable gate array (FPGA) chips as the kernel. Unlike traditional CS algorithm processing, the system divides data processing into three stages. In each stage, the computing tasks are allocated to different data processing units (i.e., satellites). The method effectively saves computing and storage resources of satellites, improves the utilization rate of a single satellite, and shortens the data processing time. Gaofen-3 (GF-3) satellite SAR raw data are processed by the system to verify the performance of the method.
Distributed Data Mining is expected to discover previously unknown, implicit, and valuable information from massive data sets inherently distributed over a network. In recent years, several approaches to distributed data mining have been developed, but only a few of them make use of intelligent agents. This paper gives the rationale for applying Multi-Agent Technology in Distributed Data Mining and presents a Distributed Data Mining System based on Multi-Agent Technology that deals with heterogeneity in such an environment. Building on the advantages of both the CS model and the agent-based model, the system is able to address the specific concerns of increasing scalability and enhancing performance.
HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been carried out on the machine since 1994, and many satisfactory results have been achieved in the fusion research field on the HT-7 tokamak [1]. With the development of fusion research, remote control of experiments becomes more and more important for improving experimental efficiency and extending research results. This paper describes an Internet-based Remote Control System (RCS) for the HT-7 distributed data acquisition system (HT7DAS), which combines the Browser/Server and Client/Server models. By means of the RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, openness, reliability, and efficiency of HT7DAS. The whole process of design and implementation of the system, along with some key items, is discussed in detail. The system was operated successfully during the HT-7 experimental campaign in 2002.
Integrating heterogeneous data sources is a precondition for enterprises to share data. Highly efficient data updating can both reduce system costs and provide real-time data. Modifying data rapidly is one of the hot issues in the pre-processing area of the data warehouse. An extract-transform-load (ETL) design is proposed based on a new data algorithm called Diff-Match, which is developed by utilizing mode matching and data-filtering technology. It can accelerate data renewal, filter the heterogeneous data, and identify the sets of data that differ. Its efficiency has been demonstrated by its successful application in an electrical apparatus enterprise group.
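The abstract does not give the internals of Diff-Match, so the following is only a generic sketch of the underlying idea of seeking out the records that differ between a source and a warehouse copy. It assumes each record is a dictionary with a unique key field; the names `row_digest`, `diff_records`, and the `id` key are hypothetical and are not taken from the paper.

```python
import hashlib


def row_digest(row):
    """Return a stable digest of a record, independent of field order."""
    canonical = "|".join(f"{k}={row[k]}" for k in sorted(row))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def diff_records(source, target, key="id"):
    """Compare two record lists by key and digest.

    Returns three sorted key lists: records only in the source (to insert),
    records whose content changed (to update), and records only in the
    target (to delete).
    """
    src = {r[key]: row_digest(r) for r in source}
    tgt = {r[key]: row_digest(r) for r in target}
    inserted = sorted(k for k in src if k not in tgt)
    deleted = sorted(k for k in tgt if k not in src)
    updated = sorted(k for k in src if k in tgt and src[k] != tgt[k])
    return inserted, updated, deleted


source = [{"id": 1, "v": "a"}, {"id": 2, "v": "b"}, {"id": 3, "v": "c"}]
target = [{"id": 1, "v": "a"}, {"id": 2, "v": "x"}, {"id": 4, "v": "d"}]
inserted, updated, deleted = diff_records(source, target)
```

Only the changed keys then need to be loaded, which is the kind of saving an incremental update aims for; unchanged records are filtered out by the digest comparison.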
When using healthcare data, it is crucial to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. Medical practitioners may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristic (e.g., greedy search) and metaheuristic (e.g., evolutionary algorithm) techniques have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that negative association rule mining produces a large number of uninteresting rules, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users' privacy. The approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, based on the Tabu-genetic optimization paradigm. Tabu search is advantageous because it removes a huge number of unnecessary rules and item sets. Experiments on benchmark healthcare datasets show that the scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the hiding failure indicator.
Research on distributed data mining using grids has recently become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification mining on the grid.
It is difficult to parallelize an existing sequential algorithm. By analyzing the sequential algorithm of a Global Atmospheric Data Objective Analysis System, this article puts forward a distributed parallel algorithm that statically distributes data on a massively parallel processing (MPP) computer. The algorithm realizes distributed parallelization by extracting the analysis boxes and model grid point latitude rows with leaped steps, and by distributing the data to different processors. The parallel algorithm achieves good load balancing, high parallel efficiency, and low parallel cost. Performance experiments on an MPP computer are also presented.
Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and more research has been conducted on the development of PDM. The research areas include system design, integration of object-oriented technology, data distribution, collaborative and distributed manufacturing working environments, security, and web-based integration. However, this research has limitations. In particular, it cannot cater for PDM in a distributed manufacturing environment. This is especially true in South China, where many Hong Kong (HK) manufacturers have moved their production plants to different locations in the Pearl River Delta for cost reduction while retaining their main offices in HK. Development of a PDM system is inherently complex. Product-related data cover product name, product part number (product identification), drawings, material specifications, dimension requirements, quality specifications, test results, log size, production schedules, product data version and date of release, special tooling (e.g., jigs and fixtures), mould design, project engineer in charge, and cost spreadsheets, while process data include engineering release, engineering change information management, and other workflow related to the process information. According to Cornelissen et al., a contemporary PDM system should contain management functions for structure, retrieval, release, change, and workflow. In system design, development, and implementation, a formal specification is necessary. However, there is no formal representation model for PDM systems. Therefore, a graphical representation model is constructed to express the various scenarios of interactions between users and the PDM system. Statecharts are then used to model the operations of the PDM system (Fig. 1). The statechart model bridges the current gap between requirements, scenarios, and the initial design specifications of the PDM system. After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed for the new DPDM system (Fig. 2). New product data of DPDM and new system functions are then investigated to support product information flow in the new distributed environment. It is found that statecharts allow formal representations to capture the information and control flows of both PDM and DPDM. In particular, statecharts offer additional expressive power, compared with conventional state transition diagrams, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
Currently, China has 32 Earth observation satellites in orbit. The satellites can provide various data such as optical, multispectral, infrared, and radar. The spatial resolution of China's Earth observation satellites ranges from low to medium to high. The satellites can observe across multiple spectral bands, under all weather conditions, and at all times. Data from China's Earth observation satellites have been widely used in fields such as natural resource detection, environmental monitoring and protection, disaster prevention and reduction, urban planning and mapping, agricultural and forestry surveys, land survey and geological prospecting, and ocean forecasting, achieving huge social benefits. This article introduces the progress of Earth observation satellites in China since 2022, especially satellite operation, data archiving, data distribution, and data coverage.
To improve data distribution efficiency, a load-balancing data distribution (LBDD) method is proposed in publish/subscribe mode. In the LBDD method, subscribers are involved in distribution tasks and data transfers while receiving data themselves. A dissemination tree is constructed among the subscribers based on MD5, where the publisher acts as the root. The proposed method provides bucket construction, target selection, and path updates; furthermore, the property of one-way dissemination is proven. The proposed LBDD guarantees that the average out-going degree of a node is 2. Experiments on data distribution delay, data distribution rate, and load distribution are conducted. Experimental results show that the LBDD method aids in shaping the task load between the publisher and subscribers and outperforms the point-to-point approach.
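The abstract does not specify the bucket construction or target selection rules, so the sketch below only illustrates the general shape of an MD5-ordered dissemination tree rooted at the publisher. It makes a simplifying assumption that is not from the paper: subscribers are sorted by the MD5 digest of their identifier and laid out as a complete binary tree, so no node forwards to more than two others. The function name `build_dissemination_tree` and the identifiers are hypothetical.

```python
import hashlib


def build_dissemination_tree(publisher, subscribers, fanout=2):
    """Map each node to the list of nodes it forwards data to.

    Subscribers are ordered by the MD5 digest of their identifier, then
    attached level by level below the publisher (heap-style layout).
    Identifiers are assumed to be unique strings.
    """
    ordered = sorted(
        subscribers,
        key=lambda s: hashlib.md5(s.encode("utf-8")).hexdigest(),
    )
    nodes = [publisher] + ordered
    children = {node: [] for node in nodes}
    for i in range(1, len(nodes)):
        parent = nodes[(i - 1) // fanout]  # parent index in a heap layout
        children[parent].append(nodes[i])
    return children


subs = [f"sub{i}" for i in range(6)]
tree = build_dissemination_tree("pub", subs)
```

With this layout the publisher sends each item only twice, and the remaining transfers are carried by subscribers, which is the load-shifting effect the abstract describes in contrast to point-to-point delivery.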
In the wastewater treatment process (WWTP), accurate and real-time monitoring values of key variables are crucial for the operational strategies. However, most existing methods have difficulty obtaining the real-time values of some key variables in the process. To handle this issue, a data-driven intelligent monitoring system, using the soft sensor technique and data distribution service, is developed to monitor the concentrations of effluent total phosphorus (TP) and ammonia nitrogen (NH4-N). In this intelligent monitoring system, a fuzzy neural network (FNN) is applied to design the soft sensor model, and a principal component analysis (PCA) method is used to select the input variables of the soft sensor model. Moreover, data transfer software is used to integrate the soft sensor technique into the supervisory control and data acquisition (SCADA) system. Finally, the proposed intelligent monitoring system is tested in several real plants to demonstrate the reliability and effectiveness of the monitoring performance.
We analyze the co-seismic displacement field of the giant Sumatra–Andaman earthquake of 26 December 2004, derived from Global Positioning System observations, geological vertical measurements of coral heads, and the pivot line observed through remote sensing. Using the co-seismic displacement field and the AK135 spherical layered Earth model, we invert the co-seismic slip distribution along the seismic fault. We also search for the fault geometry model that best fits the observed data. Assuming that the dip angle linearly increases in the downward direction, the postfit residual variation of the inverted geometry model with dip angles changing linearly along fault strike is plotted. The geometry model with local minimum misfits is the one with dip angle linearly increasing along strike from 4.3° in the top southernmost patch to 4.5° in the top northernmost patch, and linearly increasing in the downward direction. Using the fault shape and geodetic co-seismic data, we estimate the slip distribution on the curved fault. Our result shows that the earthquake ruptured a width of ~200 km down to a depth of about 60 km. Thrust slip of 0.5–12.5 m is resolved, with the largest slip centered around the central section of the rupture zone at 7°N–10°N in latitude. The estimated seismic moment is 8.2 × 10^22 N m, which is larger than the estimate from the centroid moment magnitude (4.0 × 10^22 N m) and smaller than the estimate from normal-mode oscillation data modeling (1.0 × 10^23 N m).
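To put the three seismic moment estimates on a more familiar scale, they can be converted to moment magnitude with the standard relation Mw = (2/3)(log10 M0 − 9.1), with M0 in N m. This conversion is not part of the abstract; the sketch below reads the moment values as 8.2 × 10^22, 4.0 × 10^22, and 1.0 × 10^23 N m, and the function name is illustrative.

```python
import math


def moment_magnitude(m0_newton_metres):
    """Moment magnitude Mw from seismic moment M0 given in N m."""
    return (2.0 / 3.0) * (math.log10(m0_newton_metres) - 9.1)


estimates = {
    "geodetic inversion": 8.2e22,
    "centroid moment": 4.0e22,
    "normal-mode modeling": 1.0e23,
}
magnitudes = {name: round(moment_magnitude(m0), 1) for name, m0 in estimates.items()}
# magnitudes == {"geodetic inversion": 9.2, "centroid moment": 9.0,
#                "normal-mode modeling": 9.3}
```

Because the relation is logarithmic, the factor of about 2.5 between the smallest and largest moment estimates corresponds to a difference of only about 0.3 magnitude units.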
1 Introduction Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information on
An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
The advent of Big Data has led to rapid growth in the use of parallel clustering algorithms that work over distributed computing frameworks such as MPI, MapReduce, and Spark. An important step for any parallel clustering algorithm is the distribution of data amongst the cluster nodes. This step governs the methodology and performance of the entire algorithm. Researchers typically use a random strategy or a spatial/geometric distribution strategy such as kd-tree-based partitioning or grid-based partitioning, as per the requirements of the algorithm. However, these strategies are generic and are not tailor-made for any specific parallel clustering algorithm. In this paper, we give a comprehensive literature survey of MPI-based parallel clustering algorithms with special reference to the specific data distribution strategies they employ. We also propose three new data distribution strategies: Parameterized Dimensional Split for parallel density-based clustering algorithms like DBSCAN and OPTICS; Cell-Based Dimensional Split for dGridSLINK, a grid-based hierarchical clustering algorithm that exhibits efficiency for disjoint spatial distribution; and Projection-Based Split, a generic distribution strategy. All of these preserve spatial locality, achieve disjoint partitioning, and ensure good data load balancing. The experimental analysis shows the benefits of using the proposed data distribution strategies for the algorithms they are designed for, based on which we give appropriate recommendations for their usage.
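As a point of reference for the generic strategies mentioned above, here is a minimal sketch of grid-based partitioning. It is not one of the three strategies proposed in the paper. It assumes points are tuples of coordinates, buckets them into axis-aligned cells of side `cell_size`, and hands whole cells to the currently least-loaded worker; the function name and parameters are hypothetical.

```python
import math
from collections import defaultdict


def grid_partition(points, cell_size, n_workers):
    """Split points into n_workers disjoint parts, keeping each grid cell whole."""
    cells = defaultdict(list)
    for p in points:
        cell = tuple(math.floor(c / cell_size) for c in p)
        cells[cell].append(p)

    parts = [[] for _ in range(n_workers)]
    # Place the most populated cells first, each on the least-loaded worker.
    for cell in sorted(cells, key=lambda c: (-len(cells[c]), c)):
        target = min(range(n_workers), key=lambda w: len(parts[w]))
        parts[target].extend(cells[cell])
    return parts


points = [(0.1, 0.2), (0.3, 0.4), (1.5, 0.2), (1.6, 0.1), (2.2, 2.2), (0.9, 0.9)]
parts = grid_partition(points, cell_size=1.0, n_workers=2)
```

Keeping cells whole gives disjoint partitions and locality within a cell, while the greedy placement gives only approximate load balance; neighbouring cells can still land on different workers, which is one reason algorithm-specific strategies are of interest.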
Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach to privacy preservation. However, it performs poorly in large-scale distributed systems. Meanwhile, data perturbation techniques are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a lightweight anonymous data perturbation method for efficient privacy preservation in distributed data mining. We first define the privacy constraints for data-perturbation-based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and protect data statistics and the randomization process against collusion attacks: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experimental results show that our approach achieves a high security level and is very efficient in a large-scale distributed environment.
As a fundamental operation in ad hoc networks, broadcast can achieve efficient message propagation. In particular, in a cognitive radio ad hoc network, where unlicensed users have different sets of available channels, broadcasts are carried out on multiple channels. Accordingly, channel selection and collision avoidance are challenging issues in balancing the efficiency against the reliability of broadcasting. In this paper, an anticollision selective broadcast protocol, called acSB, is proposed. A channel selection algorithm based on limited neighbor information is used to maximize the success rate of transmissions once the sender and receiver have the same channel. Moreover, an anticollision scheme is adopted to avoid simultaneous rebroadcasts. Simulations under different scenarios show that the proposed acSB protocol outperforms other approaches, with smaller transmission delay, higher message reach rate, and fewer broadcast collisions.
Graph data publication has been considered an important step for data analysis and mining. Graph data, which provide knowledge on interactions among entities, can be locally generated and held by distributed data owners. These data are usually sensitive and private, because they may be related to owners' personal activities and can be hijacked by adversaries to conduct inference attacks. Current solutions either treat private graph data as centralized content or disregard the overlapping of graphs in distributed settings. Therefore, this work proposes a novel framework for distributed graph publication. In this framework, differential privacy is applied to justify the safety of the published content. It includes four phases, i.e., graph combination, plan construction sharing, data perturbation, and graph reconstruction. The published graph selection is guided by one data coordinator, and each graph is perturbed carefully with the Laplace mechanism. The problem of graph selection is formulated and proven to be NP-complete. Then, a heuristic algorithm is proposed for selection. The correctness of the combined graph and the differential privacy on all edges are analyzed. This study also discusses a scenario without a data coordinator and offers some insights into graph publication.
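The data perturbation phase relies on the Laplace mechanism, which adds noise drawn from a Laplace distribution with scale equal to sensitivity divided by the privacy budget ε. The sketch below shows that mechanism on its own, applied to a dictionary of edge values; how the paper represents graphs, allocates ε, or sets the sensitivity is not stated in the abstract, so the edge-weight representation, the default sensitivity of 1, and the function names are assumptions.

```python
import random


def laplace_noise(scale, rng):
    """Draw one Laplace(0, scale) sample.

    The difference of two independent exponential variables with mean
    `scale` follows a Laplace distribution with that scale.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_edge_weights(weights, epsilon, sensitivity=1.0, seed=None):
    """Return a copy of `weights` with Laplace noise added to every edge value."""
    rng = random.Random(seed)
    scale = sensitivity / epsilon  # smaller epsilon means more noise
    return {edge: w + laplace_noise(scale, rng) for edge, w in weights.items()}


edges = {("a", "b"): 1.0, ("b", "c"): 0.0, ("a", "c"): 1.0}
noisy = perturb_edge_weights(edges, epsilon=0.5, seed=42)
```

A receiver of the noisy values would typically threshold or otherwise post-process them to rebuild a graph, which corresponds to the reconstruction phase named in the abstract.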
As more and more data is produced, finding a secure and efficient data access structure has become a major research issue. The centralized systems used by medical institutions for the management and transfer of Electronic Medical Records (EMRs) can be vulnerable to security and privacy threats, often lack interoperability, and give patients limited or no access to their own EMRs. In this paper, we first propose a privilege-based data access structure and incorporate it into an attribute-based encryption mechanism to handle the management and sharing of big data sets. Our proposed privilege-based data access structure makes managing healthcare records using mobile healthcare devices efficient and feasible for large numbers of users. We then propose a novel distributed multilevel EMR (d-EMR) management scheme, which uses blockchain to address security concerns and enables selective sharing of medical records among staff members who belong to different levels of a hierarchical institution. We deploy smart contracts on the Ethereum blockchain and utilize a distributed storage system to reduce the dependence on the record-generating institutions to manage and share patient records. To preserve the privacy of patient records, our smart contract is designed to allow patients to verify attributes prior to granting access rights. We provide extensive security, privacy, and evaluation analyses to show that our proposed scheme is both efficient and practical.
Funding (multimodal sentiment analysis, MRML): Supported by STI 2030-Major Projects (2021ZD0200400), the National Natural Science Foundation of China (62276233 and 62072405), and the Key Research Project of Zhejiang Province (2023C01048).
Funding (multi-satellite distributed SAR processing): Project (2017YFC1405600) supported by the National Key R&D Program of China; Project (18JK05032) supported by the Scientific Research Project of Education Department of Shaanxi Province, China.
Funding (HT-7 remote control system): Supported by the Meg-science Engineering Project of the Chinese Academy of Sciences.
Funding (Diff-Match ETL design): Supported by the National Natural Science Foundation of China (No. 50475117) and the Tianjin Natural Science Foundation (No. 06YFJMJC03700).
文摘It is crucial,while using healthcare data,to assess the advantages of data privacy against the possible drawbacks.Data from several sources must be combined for use in many data mining applications.The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures.Historically,numerous heuristics(e.g.,greedy search)and metaheuristics-based techniques(e.g.,evolutionary algorithm)have been created for the positive association rule in privacy preserving data mining(PPDM).When it comes to connecting seemingly unrelated diseases and drugs,negative association rules may be more informative than their positive counterparts.It is well-known that during negative association rules mining,a large number of uninteresting rules are formed,making this a difficult problem to tackle.In this research,we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users’privacy.The applied approach dynamically determines the transactions to be interrupted for information hiding,as opposed to predefining them.This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining,one that is based on the Tabu-genetic optimization paradigm.Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets.Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions,as measured by the indicator of hiding failure.
Abstract: Research on distributed data mining using grids has recently become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification mining on a grid.
Abstract: It is difficult to parallelize an existing sequential algorithm. By analyzing the sequential algorithm of a Global Atmospheric Data Objective Analysis System, this article puts forward a distributed parallel algorithm that statically distributes data on a massively parallel processing (MPP) computer. The algorithm realizes distributed parallelization by extracting the analysis boxes and model grid point latitude rows with leaped steps, and by distributing the data to different processors. The parallel algorithm achieves good load balancing, high parallel efficiency, and low parallel cost. Performance experiments on an MPP computer are also presented.
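One reading of "leaped steps" is a cyclic (strided) assignment, in which processor p takes latitude rows p, p + P, p + 2P, and so on for P processors. The sketch below illustrates only that reading; the function name `cyclic_distribute` and the use of plain row indices are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence


def cyclic_distribute(rows: Sequence[int], num_procs: int) -> List[List[int]]:
    """Assign rows to processors with a stride of num_procs (hypothetical helper)."""
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    # Processor p receives rows p, p + num_procs, p + 2 * num_procs, ...
    return [list(rows[p::num_procs]) for p in range(num_procs)]


if __name__ == "__main__":
    # Ten latitude rows spread over three processors.
    for proc, owned in enumerate(cyclic_distribute(range(10), 3)):
        print(proc, owned)
```

Interleaving neighbouring rows in this way tends to even out the load when the cost per row varies smoothly with latitude, which would be consistent with the load balancing the abstract reports.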
Abstract: Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and more research has been conducted on the development of PDM. The research areas include system design, integration of object-oriented technology, data distribution, collaborative and distributed manufacturing working environments, security, and web-based integration. However, this research has limitations. In particular, it cannot cater for PDM in a distributed manufacturing environment. This is especially true in South China, where many Hong Kong (HK) manufacturers have moved their production plants to different locations in the Pearl River Delta for cost reduction while retaining their main offices in HK. Development of a PDM system is inherently complex. Product-related data cover product name, product part number (product identification), drawings, material specifications, dimension requirements, quality specification, test results, log size, production schedules, product data version and date of release, special tooling (e.g. jig and fixture), mould design, project engineer in charge, and cost spreadsheets, while process data include engineering release, engineering change information management, and other workflow related to the process information. According to Cornelissen et al., a contemporary PDM system should contain management functions for structure, retrieval, release, change, and workflow. In system design, development, and implementation, a formal specification is necessary. However, there is no formal representation model for PDM systems. Therefore, a graphical representation model is constructed to express the various scenarios of interactions between users and the PDM system. A statechart is then used to model the operations of the PDM system (Fig. 1). The statechart model bridges the current gap between requirements, scenarios, and the initial design specifications of the PDM system.
After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed for the new DPDM system (Fig. 2). New product data of DPDM and new system functions are then investigated to support product information flow in the new distributed environment. It is found that statecharts allow formal representations to capture the information and control flows of both PDM and DPDM. In particular, a statechart offers additional expressive power, compared with a conventional state transition diagram, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
Abstract: Currently, China has 32 Earth observation satellites in orbit. The satellites can provide various data such as optical, multispectral, infrared, and radar. The spatial resolution of China's Earth observation satellites ranges from low to medium to high. The satellites possess the capability to observe across multiple spectral bands, under all weather conditions, and at all times. The data from China's Earth observation satellites have been widely used in fields such as natural resource detection, environmental monitoring and protection, disaster prevention and reduction, urban planning and mapping, agricultural and forestry surveys, land survey and geological prospecting, and ocean forecasting, achieving huge social benefits. This article introduces the recent progress of Earth observation satellites in China since 2022, especially the satellite operation, data archiving, data distribution, and data coverage.
Funding: The National Key Basic Research Program of China (973 Program)
Abstract: To improve data distribution efficiency, a load-balancing data distribution (LBDD) method is proposed for the publish/subscribe mode. In the LBDD method, subscribers are involved in distribution tasks and data transfers while receiving data themselves. A dissemination tree is constructed among the subscribers based on MD5, where the publisher acts as the root. The proposed method provides bucket construction, target selection, and path updates; furthermore, the property of one-way dissemination is proven. The proposed LBDD guarantees that the average out-going degree of a node is 2. Experiments on data distribution delay, data distribution rate, and load distribution are conducted. Experimental results show that the LBDD method aids in sharing the task load between the publisher and subscribers and outperforms the point-to-point approach.
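A minimal sketch of one way such a tree could be laid out, assuming (this is not stated in the abstract) that subscribers are ordered by the MD5 digest of their identifiers and then placed level by level so that every node forwards to at most two children. The names `md5_key`, `build_tree`, and `publisher` are hypothetical, identifiers are assumed to be unique, and the paper's bucket construction, target selection, and path updates are not reproduced.

```python
import hashlib
from typing import Dict, List


def md5_key(node_id: str) -> str:
    """Hex MD5 digest of a node identifier, used only as an ordering key."""
    return hashlib.md5(node_id.encode("utf-8")).hexdigest()


def build_tree(publisher: str, subscribers: List[str]) -> Dict[str, List[str]]:
    """Return a parent -> children map with the publisher as the root."""
    # Order subscribers by digest so placement does not depend on join order.
    order = [publisher] + sorted(subscribers, key=md5_key)
    children: Dict[str, List[str]] = {node: [] for node in order}
    # Heap-style layout: the node at position i forwards to positions 2i+1 and 2i+2.
    for i in range(1, len(order)):
        parent = order[(i - 1) // 2]
        children[parent].append(order[i])
    return children


if __name__ == "__main__":
    for parent, kids in build_tree("pub", ["s0", "s1", "s2", "s3", "s4"]).items():
        print(parent, "->", kids)
```

In this layout the publisher sends each item only twice, and the remaining transfers are carried by subscribers, which is the load-sharing effect the abstract describes.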
Funding: Supported by the National Natural Science Foundation of China (61622301, 61533002), Beijing Natural Science Foundation (4172005), and Major National Science and Technology Project (2017ZX07104)
Abstract: In the wastewater treatment process (WWTP), accurate and real-time monitoring values of key variables are crucial for the operational strategies. However, most existing methods have difficulty obtaining the real-time values of some key variables in the process. To handle this issue, a data-driven intelligent monitoring system, using the soft sensor technique and data distribution service, is developed to monitor the concentrations of effluent total phosphorus (TP) and ammonia nitrogen (NH4-N). In this intelligent monitoring system, a fuzzy neural network (FNN) is applied to design the soft sensor model, and a principal component analysis (PCA) method is used to select the input variables of the soft sensor model. Moreover, data transfer software is used to integrate the soft sensor technique into the supervisory control and data acquisition (SCADA) system. Finally, the proposed intelligent monitoring system is tested in several real plants to demonstrate the reliability and effectiveness of the monitoring performance.
Funding: Supported by the Special Fund of Fundamental Scientific Research Business Expense for Higher School of Central Government (Projects for creation teams ZY20110101), NSFC 41090294, and the talent selection and training plan project of Hebei University
Abstract: We analyze the co-seismic displacement field of the 26 December 2004 giant Sumatra–Andaman earthquake derived from Global Positioning System observations, geological vertical measurements of coral heads, and the pivot line observed through remote sensing. Using the co-seismic displacement field and the AK135 spherical layered Earth model, we invert the co-seismic slip distribution along the seismic fault. We also search for the fault geometry model that best fits the observed data. Assuming that the dip angle increases linearly in the downward direction, the postfit residual variations of the inverted geometry models with dip angles changing linearly along fault strike are plotted. The geometry model with local minimum misfits is the one with the dip angle increasing linearly along strike from 4.3° in the top southernmost patch to 4.5° in the top northernmost patch, and with the dip angle increasing linearly downward. Using the fault shape and geodetic co-seismic data, we estimate the slip distribution on the curved fault. Our result shows that the earthquake ruptured a ~200-km-wide zone down to a depth of about 60 km. A thrust slip of 0.5–12.5 m is resolved, with the largest slip centered around the central section of the rupture zone, 7°N–10°N in latitude. The estimated seismic moment is 8.2 × 10^22 N m, which is larger than the estimate from the centroid moment magnitude (4.0 × 10^22 N m) and smaller than the estimate from normal-mode oscillation data modeling (1.0 × 10^23 N m).
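For comparison with catalogue magnitudes, a scalar seismic moment M0 in N m converts to moment magnitude through the standard relation Mw = (2/3)(log10 M0 − 9.1). This relation is not quoted in the abstract; the sketch below applies it to the three moment estimates above, read as 8.2 × 10^22, 4.0 × 10^22, and 1.0 × 10^23 N m. The function name is chosen here for illustration.

```python
import math


def moment_magnitude(m0_newton_metres: float) -> float:
    """Moment magnitude Mw from scalar seismic moment M0 given in N m."""
    if m0_newton_metres <= 0:
        raise ValueError("seismic moment must be positive")
    return (2.0 / 3.0) * (math.log10(m0_newton_metres) - 9.1)


if __name__ == "__main__":
    estimates = {
        "geodetic inversion (this study)": 8.2e22,
        "centroid moment solution": 4.0e22,
        "normal-mode modeling": 1.0e23,
    }
    for label, m0 in estimates.items():
        print(f"{label}: Mw = {moment_magnitude(m0):.1f}")
```

Under this relation the three moments correspond to Mw of about 9.2, 9.0, and 9.3 respectively, so the geodetic estimate sits between the other two in magnitude as well as in moment.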
Funding: Supported by the Special Scientific Research Fund of Public Welfare Profession of Ministry of Land and Resources of the People’s Republic of China (No. 201011057)
Abstract: 1 Introduction. Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information on
Funding: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Abstract: The advent of Big Data has led to rapid growth in the usage of parallel clustering algorithms that work over distributed computing frameworks such as MPI, MapReduce, and Spark. An important step for any parallel clustering algorithm is the distribution of data amongst the cluster nodes. This step governs the methodology and performance of the entire algorithm. Researchers typically use a random or a spatial/geometric distribution strategy, such as kd-tree based partitioning or grid-based partitioning, as per the requirements of the algorithm. However, these strategies are generic and are not tailor-made for any specific parallel clustering algorithm. In this paper, we give a comprehensive literature survey of MPI-based parallel clustering algorithms with special reference to the specific data distribution strategies they employ. We also propose three new data distribution strategies: Parameterized Dimensional Split, for parallel density-based clustering algorithms like DBSCAN and OPTICS; Cell-Based Dimensional Split, for dGridSLINK, which is a grid-based hierarchical clustering algorithm that exhibits efficiency for disjoint spatial distribution; and Projection-Based Split, which is a generic distribution strategy. All of these preserve spatial locality, achieve disjoint partitioning, and ensure good data load balancing. The experimental analysis shows the benefits of using the proposed data distribution strategies for the algorithms they are designed for, based on which we give appropriate recommendations for their usage.
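As a point of reference for the generic grid-based partitioning mentioned above, the following is a minimal sketch under stated assumptions: points are two-dimensional, cells are squares of side `cell_size`, and whole cells are handed to the currently lightest node. It is not one of the three strategies proposed in the paper, and the function and parameter names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def grid_partition(points: List[Point], cell_size: float, num_nodes: int) -> List[List[Point]]:
    """Bin 2-D points into square cells, then spread whole cells over nodes."""
    cells: Dict[Tuple[int, int], List[Point]] = defaultdict(list)
    for x, y in points:
        # Floor division keeps nearby points in the same cell.
        cells[(int(x // cell_size), int(y // cell_size))].append((x, y))

    parts: List[List[Point]] = [[] for _ in range(num_nodes)]
    # Largest cells first, each to the currently lightest node (greedy balancing).
    for _, members in sorted(cells.items(), key=lambda kv: (-len(kv[1]), kv[0])):
        lightest = min(range(num_nodes), key=lambda n: len(parts[n]))
        parts[lightest].extend(members)
    return parts


if __name__ == "__main__":
    sample = [(0.1, 0.2), (0.3, 0.4), (1.5, 0.5), (2.5, 2.5), (2.6, 2.7), (2.9, 2.1)]
    for node, part in enumerate(grid_partition(sample, 1.0, 2)):
        print(node, part)
```

Because a cell is never split, the partitions are disjoint and points within a cell stay together; locality across neighbouring cells is not preserved here, which is one reason algorithm-specific strategies can do better.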
Funding: Project supported by the National Natural Science Foundation of China (Nos. 60772098 and 60672068) and the New Century Excellent Talents in University of China (No. NCET-06-0393)
Abstract: Privacy is a critical requirement in distributed data mining. Cryptography-based secure multiparty computation is a main approach to privacy preservation. However, it performs poorly in large-scale distributed systems. Meanwhile, data perturbation techniques are comparatively efficient but are mainly used in centralized privacy-preserving data mining (PPDM). In this paper, we propose a lightweight anonymous data perturbation method for efficient privacy preservation in distributed data mining. We first define the privacy constraints for data perturbation based PPDM in a semi-honest distributed environment. Two protocols are proposed to address these constraints and protect data statistics and the randomization process against collusion attacks: the adaptive privacy-preserving summary protocol and the anonymous exchange protocol. Finally, a distributed data perturbation framework based on these protocols is proposed to realize distributed PPDM. Experimental results show that our approach achieves a high security level and is very efficient in a large-scale distributed environment.
Abstract: As a fundamental operation in ad hoc networks, broadcast can achieve efficient message propagation. Particularly in the cognitive radio ad hoc network, where unlicensed users have different sets of available channels, broadcasts are carried out on multiple channels. Accordingly, channel selection and collision avoidance are challenging issues in balancing the efficiency against the reliability of broadcasting. In this paper, an anticollision selective broadcast protocol, called acSB, is proposed. A channel selection algorithm based on limited neighbor information is considered to maximize success rates of transmissions once the sender and receiver have the same channel. Moreover, an anticollision scheme is adopted to avoid simultaneous rebroadcasts. Consequently, the proposed broadcast protocol acSB outperforms other approaches in terms of smaller transmission delay, higher message reach rate, and fewer broadcast collisions, as evaluated by simulations under different scenarios.
Funding: Supported by the National Natural Science Foundation of China (Nos. U19A2059 and 61802050) and the Ministry of Science and Technology of Sichuan Province Program (Nos. 2021YFG0018 and 20ZDYF0343).
Abstract: Graph data publication has been considered an important step for data analysis and mining. Graph data, which provide knowledge on interactions among entities, can be locally generated and held by distributed data owners. These data are usually sensitive and private, because they may be related to owners’ personal activities and can be hijacked by adversaries to conduct inference attacks. Current solutions either consider private graph data as centralized contents or disregard the overlapping of graphs in distributed settings. Therefore, this work proposes a novel framework for distributed graph publication. In this framework, differential privacy is applied to justify the safety of the published contents. It includes four phases, i.e., graph combination, plan construction sharing, data perturbation, and graph reconstruction. The published graph selection is guided by one data coordinator, and each graph is perturbed carefully with the Laplace mechanism. The problem of graph selection is formulated and proven to be NP-complete. Then, a heuristic algorithm is proposed for selection. The correctness of the combined graph and the differential privacy on all edges are analyzed. This study also discusses a scenario without a data coordinator and proposes some insights into graph publication.
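The Laplace mechanism named above adds noise drawn from a Laplace distribution with scale sensitivity/epsilon to each released value. The sketch below shows that step alone, assuming a graph stored as a dictionary of edge weights and a sensitivity of 1; both are illustrative assumptions, the names `laplace_noise` and `perturb_edges` are hypothetical, and the paper's graph combination, plan construction, and reconstruction phases are not modeled.

```python
import random
from typing import Dict, Optional, Tuple

Edge = Tuple[str, str]


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def perturb_edges(weights: Dict[Edge, float], epsilon: float,
                  sensitivity: float = 1.0,
                  rng: Optional[random.Random] = None) -> Dict[Edge, float]:
    """Add independent Laplace noise of scale sensitivity / epsilon to each edge weight."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    scale = sensitivity / epsilon
    return {edge: w + laplace_noise(scale, rng) for edge, w in weights.items()}


if __name__ == "__main__":
    graph = {("alice", "bob"): 1.0, ("bob", "carol"): 0.0}
    print(perturb_edges(graph, epsilon=0.5, rng=random.Random(42)))
```

A smaller epsilon gives a larger noise scale and therefore stronger privacy at the cost of accuracy; how the privacy budget is divided among overlapping graphs is a design question that the abstract leaves to the full paper.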
Funding: This work was supported in part by the National Natural Science Foundation of China (CCF1919154, ECCS-1923409).
Abstract: As more and more data is produced, finding a secure and efficient data access structure has become a major research issue. The centralized systems used by medical institutions for the management and transfer of Electronic Medical Records (EMRs) can be vulnerable to security and privacy threats, often lack interoperability, and give patients limited or no access to their own EMRs. In this paper, we first propose a privilege-based data access structure and incorporate it into an attribute-based encryption mechanism to handle the management and sharing of big data sets. Our proposed privilege-based data access structure makes managing healthcare records using mobile healthcare devices efficient and feasible for large numbers of users. We then propose a novel distributed multilevel EMR (d-EMR) management scheme, which uses blockchain to address security concerns and enables selective sharing of medical records among staff members who belong to different levels of a hierarchical institution. We deploy smart contracts on the Ethereum blockchain and utilize a distributed storage system to alleviate the dependence on the record-generating institutions to manage and share patient records. To preserve the privacy of patient records, our smart contract is designed to allow patients to verify attributes prior to granting access rights. We provide extensive security, privacy, and evaluation analyses to show that our proposed scheme is both efficient and practical.