Multimodal sentiment analysis utilizes multimodal data such as text, facial expressions, and voice to detect people’s attitudes. With the advent of distributed data collection and annotation, we can easily obtain and share such multimodal data. However, due to professional discrepancies among annotators and lax quality control, noisy labels might be introduced. Recent research suggests that deep neural networks (DNNs) will overfit noisy labels, leading to poor performance. To address this challenging problem, we present a Multimodal Robust Meta Learning framework (MRML) for multimodal sentiment analysis to resist noisy labels and correlate distinct modalities simultaneously. Specifically, we propose a two-layer fusion net to deeply fuse different modalities and improve the quality of the multimodal data features for label correction and network training. Besides, a multiple meta-learner (label corrector) strategy is proposed to enhance the label correction approach and prevent models from overfitting to noisy labels. We conducted experiments on three popular multimodal datasets to verify the superiority of our method by comparing it with four baselines.
When using healthcare data, it is crucial to weigh the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristics (e.g., greedy search) and metaheuristics-based techniques (e.g., evolutionary algorithms) have been created for positive association rules in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that negative association rule mining produces a large number of uninteresting rules, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users’ privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, one that is based on the Tabu-genetic optimization paradigm. Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets. Experiments using benchmark healthcare datasets show that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the indicator of hiding failure.
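As a concrete illustration of what a negative association rule measures, the following minimal sketch computes support and confidence for a rule of the form X ⇒ ¬Y over a toy transaction list. It is not the Tabu-genetic method of the abstract; the function names and the toy data are hypothetical, and the definitions used are the common textbook ones, supp(X ∧ ¬Y) = supp(X) − supp(X ∪ Y) and conf = supp(X ∧ ¬Y) / supp(X).

```python
from typing import FrozenSet, Iterable, List, Tuple


def support(transactions: List[FrozenSet[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = frozenset(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def negative_rule_stats(
    transactions: List[FrozenSet[str]],
    antecedent: Iterable[str],
    consequent: Iterable[str],
) -> Tuple[float, float]:
    """Support and confidence of the negative rule antecedent => NOT consequent."""
    x = frozenset(antecedent)
    y = frozenset(consequent)
    supp_x = support(transactions, x)
    supp_xy = support(transactions, x | y)
    # Transactions that contain X but do not contain all of Y.
    supp_x_not_y = supp_x - supp_xy
    confidence = supp_x_not_y / supp_x if supp_x > 0 else 0.0
    return supp_x_not_y, confidence


# Hypothetical toy data: each transaction is a set of recorded items.
TOY_TRANSACTIONS = [
    frozenset({"drugA", "rash"}),
    frozenset({"drugA"}),
    frozenset({"drugA"}),
    frozenset({"drugB", "rash"}),
]
```

On the toy data, `negative_rule_stats(TOY_TRANSACTIONS, {"drugA"}, {"rash"})` measures how often drugA appears without rash. A hiding method such as the one in the abstract would then alter selected transactions so that sensitive rules fall below the mining thresholds.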
Currently, China has 32 Earth observation satellites in orbit. The satellites can provide various data such as optical, multispectral, infrared, and radar. The spatial resolution of China's Earth observation satellites ranges from low to medium to high. The satellites possess the capability to observe across multiple spectral bands, under all weather conditions, and at all times. The data of China's Earth observation satellites has been widely used in fields such as natural resource detection, environmental monitoring and protection, disaster prevention and reduction, urban planning and mapping, agricultural and forestry surveys, land survey and geological prospecting, and ocean forecasting, achieving huge social benefits. This article introduces the recent progress of Earth observation satellites in China since 2022, especially satellite operation, data archiving, data distribution, and data coverage.
To improve data distribution efficiency, a load-balancing data distribution (LBDD) method is proposed in publish/subscribe mode. In the LBDD method, subscribers are involved in distribution tasks and data transfers while receiving data themselves. A dissemination tree is constructed among the subscribers based on MD5, where the publisher acts as the root. The proposed method provides bucket construction, target selection, and path updates; furthermore, the property of one-way dissemination is proven. The proposed LBDD guarantees that the average out-going degree of a node is 2. Experiments on data distribution delay, data distribution rate, and load distribution are conducted. Experimental results show that the LBDD method aids in shaping the task load between the publisher and subscribers and outperforms the point-to-point approach.
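The idea of an MD5-ordered dissemination tree rooted at the publisher can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the bucket construction, target selection, or path update rules, so the sketch simply orders subscribers by the MD5 digest of a hypothetical node ID and links them in a binary layout, which caps each node's out-going degree at 2. The function names are hypothetical, and subscriber IDs are assumed to be unique and distinct from the publisher ID.

```python
import hashlib
from typing import Dict, List


def md5_key(node_id: str) -> str:
    """Hex MD5 digest of a node ID, used only as a stable ordering key."""
    return hashlib.md5(node_id.encode("utf-8")).hexdigest()


def build_dissemination_tree(publisher: str, subscribers: List[str]) -> Dict[str, List[str]]:
    """Map each node to the (at most two) nodes it forwards data to.

    Assumption: the publisher is the root and subscribers are placed in
    MD5 order; node i forwards to nodes 2i+1 and 2i+2 in that order.
    """
    ordered = [publisher] + sorted(subscribers, key=md5_key)
    tree: Dict[str, List[str]] = {node: [] for node in ordered}
    for i, node in enumerate(ordered):
        for child_index in (2 * i + 1, 2 * i + 2):
            if child_index < len(ordered):
                tree[node].append(ordered[child_index])
    return tree
```

Because every subscriber has exactly one parent and forwards to at most two others, data flows one way from the publisher, and the forwarding load is spread over the subscribers instead of resting on the publisher alone.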
In the wastewater treatment process (WWTP), accurate, real-time monitoring of key variables is crucial for operational strategies. However, most existing methods have difficulty obtaining real-time values of some key variables in the process. To handle this issue, a data-driven intelligent monitoring system, using the soft sensor technique and data distribution service, is developed to monitor the concentrations of effluent total phosphorus (TP) and ammonia nitrogen (NH4-N). In this intelligent monitoring system, a fuzzy neural network (FNN) is applied to design the soft sensor model, and a principal component analysis (PCA) method is used to select the input variables of the soft sensor model. Moreover, data transfer software is used to integrate the soft sensor technique into the supervisory control and data acquisition (SCADA) system. Finally, the proposed intelligent monitoring system is tested in several real plants to demonstrate the reliability and effectiveness of the monitoring performance.
It is difficult to parallelize an existing sequential algorithm. Through analyzing the sequential algorithm of a Global Atmospheric Data Objective Analysis System, this article puts forward a distributed parallel algorithm that statically distributes data on a massively parallel processing (MPP) computer. The algorithm realizes distributed parallelization by extracting the analysis boxes and model grid point latitude rows with leaped steps, and by distributing the data to different processors. The parallel algorithm achieves good load balancing, high parallel efficiency, and low parallel cost. Performance experiments on an MPP computer are also presented.
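One plausible reading of "extracting latitude rows with leaped steps" is a cyclic (strided) static distribution, in which processor p takes rows p, p + P, p + 2P, and so on. The sketch below illustrates that reading only; it is an assumption about the abstract's wording, not the paper's algorithm, and the function name is hypothetical.

```python
from typing import List


def cyclic_rows(num_rows: int, num_procs: int, rank: int) -> List[int]:
    """Row indices assigned to processor `rank` under a cyclic distribution.

    Rows are dealt out with a stride of `num_procs`, so neighbouring rows
    (which often have similar cost) land on different processors.
    """
    if not 0 <= rank < num_procs:
        raise ValueError("rank must be in [0, num_procs)")
    return list(range(rank, num_rows, num_procs))
```

With 10 rows and 4 processors, rank 1 receives rows 1, 5, and 9. Row counts differ by at most one across processors, which is the load-balancing property a static strided distribution is usually chosen for.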
Due to the limited scenes that synthetic aperture radar (SAR) satellites can detect, the full-track utilization rate is not high. Because of the computing and storage limitations of a single satellite, it is difficult to process large amounts of spaceborne SAR data. A new method of networked satellite data processing is proposed to improve the efficiency of data processing. A multi-satellite distributed SAR real-time processing method based on the Chirp Scaling (CS) imaging algorithm is studied in this paper, and a distributed data processing system is built with field programmable gate array (FPGA) chips as the kernel. Different from traditional CS algorithm processing, the system divides data processing into three stages. The computing tasks are allocated to different data processing units (i.e., satellites) in each stage. The method effectively saves computing and storage resources of satellites, improves the utilization rate of a single satellite, and shortens the data processing time. Gaofen-3 (GF-3) satellite SAR raw data is processed by the system to verify the performance of the method.
Distributed Data Mining is expected to discover previously unknown, implicit, and valuable information from massive data sets inherently distributed over a network. In recent years, several approaches to distributed data mining have been developed, but only a few of them make use of intelligent agents. This paper provides the reason for applying Multi-Agent Technology in Distributed Data Mining and presents a Distributed Data Mining System based on Multi-Agent Technology that deals with heterogeneity in such an environment. Based on the advantages of both the CS model and the agent-based model, the system is able to address the specific concerns of increasing scalability and enhancing performance.
We analyze the co-seismic displacement field of the giant Sumatra–Andaman earthquake of 26 December 2004, derived from Global Positioning System observations, geological vertical measurements of coral heads, and the pivot line observed through remote sensing. Using the co-seismic displacement field and the AK135 spherical layered Earth model, we invert the co-seismic slip distribution along the seismic fault. We also search for the fault geometry model that best fits the observed data. Assuming that the dip angle increases linearly in the downward direction, the postfit residual variation of the inverted geometry model with dip angles changing linearly along fault strike is plotted. The geometry model with local minimum misfit is the one with dip angle increasing linearly along strike from 4.3° in the top southernmost patch to 4.5° in the top northernmost patch, and with dip angle increasing linearly downward. Using the fault shape and geodetic co-seismic data, we estimate the slip distribution on the curved fault. Our result shows that the earthquake ruptured a width of ~200 km down to a depth of about 60 km. Thrust slip of 0.5–12.5 m is resolved, with the largest slip centered around the central section of the rupture zone at 7°N–10°N in latitude. The estimated seismic moment is 8.2 × 10^22 N m, which is larger than the estimate from the centroid moment magnitude (4.0 × 10^22 N m) and smaller than the estimate from normal-mode oscillation data modeling (1.0 × 10^23 N m).
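To put the moment estimates on a more familiar scale, the worked conversion below uses the standard moment magnitude relation for M_0 in N m. It takes the three seismic moment estimates quoted above as 8.2 × 10^22, 4.0 × 10^22, and 1.0 × 10^23 N m; the resulting magnitudes are computed here for illustration and are not stated in the abstract.

```latex
M_w = \tfrac{2}{3}\left(\log_{10} M_0 - 9.1\right), \qquad M_0 \text{ in N\,m}

\begin{aligned}
M_0 = 8.2\times10^{22}: \quad & M_w = \tfrac{2}{3}\,(22.914 - 9.1) \approx 9.21 \\
M_0 = 4.0\times10^{22}: \quad & M_w = \tfrac{2}{3}\,(22.602 - 9.1) \approx 9.00 \\
M_0 = 1.0\times10^{23}: \quad & M_w = \tfrac{2}{3}\,(23.000 - 9.1) \approx 9.27
\end{aligned}
```

Because the scale is logarithmic, the factor of 2.5 between the smallest and largest moment estimates corresponds to a difference of only about 0.27 magnitude units.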
HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been done on the machine since 1994, and many satisfactory results in fusion research have been achieved on the HT-7 tokamak [1]. With the development of fusion research, remote control of experiments becomes more and more important for improving experimental efficiency and expanding research results. This paper describes a Remote Control System (RCS) for the HT-7 distributed data acquisition system (HT7DAS), an Internet-based system that combines the Browser/Server and Client/Server models. By means of the RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, openness, reliability, and efficiency of HT7DAS. In the paper, the whole design process, the implementation of the system, and some key items are discussed in detail. The system was operated successfully during the HT-7 experiments in the 2002 campaign.
Integrating heterogeneous data sources is a precondition for sharing data within enterprises. Highly efficient data updating can both save system expenses and offer real-time data. Modifying data rapidly is one of the hot issues in the pre-processing area of the data warehouse. An extract-transform-load (ETL) design is proposed based on a new data algorithm called Diff-Match, which is developed by utilizing mode matching and data-filtering technology. It can accelerate data renewal, filter the heterogeneous data, and seek out different sets of data. Its efficiency has been proved by its successful application in an electric apparatus enterprise group.
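The abstract does not describe Diff-Match itself, but the general task it targets, finding which records differ between two versions of a data set so that only those are reloaded, can be sketched as below. This is a generic snapshot diff keyed by a primary key, offered as an assumption-laden illustration rather than the proposed algorithm; the function name and record layout are hypothetical.

```python
from typing import Any, Dict, Hashable, Tuple

Snapshot = Dict[Hashable, Any]


def diff_snapshots(old: Snapshot, new: Snapshot) -> Tuple[Snapshot, Snapshot, Snapshot]:
    """Split the changes between two keyed snapshots into three sets.

    Returns (inserted, deleted, updated), where `updated` holds the new
    value of every key present in both snapshots with a changed record.
    """
    inserted = {k: new[k] for k in new.keys() - old.keys()}
    deleted = {k: old[k] for k in old.keys() - new.keys()}
    updated = {k: new[k] for k in old.keys() & new.keys() if old[k] != new[k]}
    return inserted, deleted, updated
```

An incremental load step would then apply only the three returned sets to the warehouse instead of rewriting the full table, which is where the savings in update time come from.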
Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and more research has been conducted on the development of PDM. The research areas include system design, integration of object-oriented technology, data distribution, collaborative and distributed manufacturing working environments, security, and web-based integration. However, this research has limitations. In particular, it cannot cater for PDM in a distributed manufacturing environment. This is especially true in South China, where many Hong Kong (HK) manufacturers have moved their production plants to different locations in the Pearl River Delta for cost reduction while retaining their main offices in HK. Development of a PDM system is inherently complex. Product-related data cover product name, product part number (product identification), drawings, material specifications, dimension requirements, quality specifications, test results, log size, production schedules, product data version and date of release, special tooling (e.g., jigs and fixtures), mould design, project engineer in charge, and cost spreadsheets, while process data include engineering release, engineering change information management, and other workflow related to the process information. According to Cornelissen et al., a contemporary PDM system should contain management functions for structure, retrieval, release, change, and workflow. In system design, development, and implementation, a formal specification is necessary. However, there is no formal representation model for PDM systems. Therefore, a graphical representation model is constructed to express the various scenarios of interactions between users and the PDM system. Statechart is then used to model the operations of the PDM system (Fig. 1). The statechart model bridges the current gap between requirements, scenarios, and the initial design specifications of the PDM system. After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed for the new DPDM system (Fig. 2). New product data of DPDM and new system functions are then investigated to support product information flow in the new distributed environment. It is found that statecharts allow formal representations to capture the information and control flows of both PDM and DPDM. In particular, statecharts offer additional expressive power, compared to conventional state transition diagrams, in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
Recently, research on distributed data mining using grids has become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification mining on the grid.
1 Introduction Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information on
An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
The advent of Big Data has led to rapid growth in the usage of parallel clustering algorithms that work over distributed computing frameworks such as MPI, MapReduce, and Spark. An important step for any parallel clustering algorithm is the distribution of data amongst the cluster nodes. This step governs the methodology and performance of the entire algorithm. Researchers typically use a random or a spatial/geometric distribution strategy, such as kd-tree based partitioning and grid-based partitioning, as per the requirements of the algorithm. However, these strategies are generic and are not tailor-made for any specific parallel clustering algorithm. In this paper, we give a comprehensive literature survey of MPI-based parallel clustering algorithms with special reference to the specific data distribution strategies they employ. We also propose three new data distribution strategies: Parameterized Dimensional Split for parallel density-based clustering algorithms like DBSCAN and OPTICS; Cell-Based Dimensional Split for dGridSLINK, which is a grid-based hierarchical clustering algorithm that exhibits efficiency for disjoint spatial distribution; and Projection-Based Split, which is a generic distribution strategy. All of these preserve spatial locality, achieve disjoint partitioning, and ensure good data load balancing. The experimental analysis shows the benefits of using the proposed data distribution strategies for the algorithms they are designed for, based on which we give appropriate recommendations for their usage.
Complex industrial processes often have multiple operating modes and present time-varying behavior. The data in one mode may follow specific Gaussian or non-Gaussian distributions. In this paper, a numerically efficient moving-window local outlier probability algorithm is proposed. Its key feature is the capability to handle complex data distributions and incursive operating condition changes, including slow dynamic variations and instant mode shifts. First, a two-step adaption approach is introduced, and some designed updating rules are applied to keep the monitoring model up-to-date. Then, a semi-supervised monitoring strategy is developed with an updating switch rule to deal with mode changes. Based on local probability models, the algorithm has a superior ability to detect faulty conditions and to adapt quickly to slow variations and new operating modes. Finally, the utility of the proposed method is demonstrated with a numerical example and a non-isothermal continuous stirred tank reactor.
Various application domains require the integration of distributed real-time or near-real-time systems with non-real-time systems. Smart cities, smart homes, ambient intelligent systems, and network-centric defense systems are among these application domains. Data Distribution Service (DDS) is a communication mechanism based on the Data-Centric Publish-Subscribe (DCPS) model. It is used for distributed systems with real-time operational constraints. Java Message Service (JMS) is a messaging standard for enterprise systems using Service Oriented Architecture (SOA) for non-real-time operations. JMS allows Java programs to exchange messages in a loosely coupled fashion. JMS also supports sending and receiving messages using a messaging queue and a publish-subscribe interface. In this article, we propose an architecture enabling the automated integration of distributed real-time and non-real-time systems. We test our proposed architecture using a distributed Command, Control, Communications, Computers, and Intelligence (C4I) system. The system has DDS-based real-time Combat Management System components deployed to naval warships, and SOA-based non-real-time Command and Control components used at headquarters. The proposed solution enables the efficient exchange of data between these two systems. We compare the proposed solution with a similar study. Our solution is superior in terms of automation support, ease of implementation, scalability, and performance.
As a fundamental operation in ad hoc networks, broadcast can achieve efficient message propagation. Particularly in the cognitive radio ad hoc network, where unlicensed users have different sets of available channels, broadcasts are carried out on multiple channels. Accordingly, channel selection and collision avoidance are challenging issues in balancing the efficiency against the reliability of broadcasting. In this paper, an anticollision selective broadcast protocol, called acSB, is proposed. A channel selection algorithm based on limited neighbor information is considered to maximize the success rate of transmissions once the sender and receiver have the same channel. Moreover, an anticollision scheme is adopted to avoid simultaneous rebroadcasts. Simulations under different scenarios show that acSB outperforms other approaches, with smaller transmission delay, higher message reach rate, and fewer broadcast collisions.
Spectral clustering is a well-regarded subspace clustering algorithm that exhibits outstanding performance in hyperspectral image classification through eigenvalue decomposition of the Laplacian matrix. However, its classification accuracy is severely limited by the selected eigenvectors, and the commonly used eigenvectors not only fail to guarantee the inclusion of detailed discriminative information but also have high computational complexity. To address these challenges, we proposed an intuitive eigenvector selection method based on the coincidence degree of data distribution (CDES). First, the clustering result of improved k-means, which can well reflect the spatial distribution of various types, was used as the reference map. Then, the adjusted Rand index and adjusted mutual information were calculated to assess the data distribution consistency between each eigenvector and the reference map. Finally, the eigenvectors with high coincidence degrees were selected for clustering. A case study on hyperspectral mineral mapping demonstrated that the mapping accuracies of CDES are approximately 56.3%, 15.5%, and 10.5% higher than those of the commonly used top, high entropy, and high relevance eigenvectors, and that CDES can save more than 99% of the eigenvector selection time. In particular, due to the unsupervised nature of k-means, CDES provides a novel solution for autonomous feature selection of hyperspectral images.
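The scoring step described above can be illustrated with a small pure-Python adjusted Rand index (ARI) and a helper that ranks candidate labelings against a reference map. This is a sketch under assumptions, not the CDES implementation: it assumes each eigenvector has already been discretized into cluster labels, it omits the adjusted mutual information score, and the names `adjusted_rand_index` and `rank_candidates` are hypothetical.

```python
from collections import Counter
from math import comb
from typing import Dict, Hashable, List, Sequence


def adjusted_rand_index(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Adjusted Rand index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two samples")
    # Pair counts from the contingency table and its row/column sums.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:
        return 1.0  # degenerate case: both labelings are trivial and agree
    return (sum_cells - expected) / (max_index - expected)


def rank_candidates(
    reference: Sequence[Hashable], candidates: Dict[str, Sequence[Hashable]]
) -> List[str]:
    """Candidate names sorted from most to least consistent with the reference."""
    return sorted(
        candidates,
        key=lambda name: adjusted_rand_index(reference, candidates[name]),
        reverse=True,
    )
```

ARI is invariant to label permutation, so a candidate that recovers the same grouping as the reference scores 1.0 even if its cluster numbers differ; candidates at the top of the ranking would be the ones kept for clustering.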
Funding (MRML multimodal sentiment analysis paper): supported by STI 2030-Major Projects (2021ZD0200400), the National Natural Science Foundation of China (62276233 and 62072405), and the Key Research Project of Zhejiang Province (2023C01048).
Funding (LBDD data distribution paper): The National Key Basic Research Program of China (973 Program).
Funding (wastewater treatment monitoring paper): supported by the National Natural Science Foundation of China (61622301, 61533002), the Beijing Natural Science Foundation (4172005), and the Major National Science and Technology Project (2017ZX07104).
文摘In wastewater treatment process(WWTP), the accurate and real-time monitoring values of key variables are crucial for the operational strategies. However, most of the existing methods have difficulty in obtaining the real-time values of some key variables in the process. In order to handle this issue, a data-driven intelligent monitoring system, using the soft sensor technique and data distribution service, is developed to monitor the concentrations of effluent total phosphorous(TP) and ammonia nitrogen(NH_4-N). In this intelligent monitoring system, a fuzzy neural network(FNN) is applied for designing the soft sensor model, and a principal component analysis(PCA) method is used to select the input variables of the soft sensor model. Moreover, data transfer software is exploited to insert the soft sensor technique to the supervisory control and data acquisition(SCADA) system. Finally, this proposed intelligent monitoring system is tested in several real plants to demonstrate the reliability and effectiveness of the monitoring performance.
Abstract: It is difficult to parallelize an existing sequential algorithm. By analyzing the sequential algorithm of a Global Atmospheric Data Objective Analysis System, this article puts forward a distributed parallel algorithm that statically distributes data on a massively parallel processing (MPP) computer. The algorithm realizes distributed parallelization by extracting the analysis boxes and model grid point latitude rows with leaped steps and distributing the data to different processors. The parallel algorithm achieves good load balancing, high parallel efficiency, and low parallel cost. Performance experiments on an MPP computer are also presented.
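One plausible reading of "leaped steps" is a cyclic (strided) static distribution, in which processor `rank` takes every `n_procs`-th latitude row. That reading, and the function name `cyclic_rows`, are assumptions made for illustration; the abstract does not give the exact scheme.

```python
def cyclic_rows(n_rows: int, n_procs: int, rank: int) -> list[int]:
    """Return the latitude-row indices owned by processor `rank`.

    Rows are dealt out round-robin: rank, rank + n_procs, rank + 2*n_procs, ...
    so the row counts of any two processors differ by at most one.
    """
    return list(range(rank, n_rows, n_procs))
```

A strided assignment of this kind spreads rows from all latitudes over all processors, which tends to balance load when the cost per row varies with latitude.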
Funding: Project (2017YFC1405600) supported by the National Key R&D Program of China; Project (18JK05032) supported by the Scientific Research Project of the Education Department of Shaanxi Province, China.
Abstract: Due to the limited scenes that synthetic aperture radar (SAR) satellites can detect, the full-track utilization rate is not high. Because of the computing and storage limitations of a single satellite, it is difficult to process the large amounts of data produced by spaceborne synthetic aperture radars. A new method of networked satellite data processing is proposed to improve the efficiency of data processing. A multi-satellite distributed SAR real-time processing method based on the Chirp Scaling (CS) imaging algorithm is studied in this paper, and a distributed data processing system is built with field programmable gate array (FPGA) chips as the kernel. Unlike traditional CS algorithm processing, the system divides data processing into three stages. In each stage, the computing tasks are allocated to different data processing units (i.e., satellites). The method effectively saves computing and storage resources of satellites, improves the utilization rate of a single satellite, and shortens the data processing time. Gaofen-3 (GF-3) satellite SAR raw data is processed by the system to verify the performance of the method.
Abstract: Distributed Data Mining is expected to discover previously unknown, implicit, and valuable information from massive data sets inherently distributed over a network. In recent years, several approaches to distributed data mining have been developed, but only a few of them make use of intelligent agents. This paper gives the rationale for applying Multi-Agent Technology in Distributed Data Mining and presents a Distributed Data Mining System based on Multi-Agent Technology that deals with heterogeneity in such an environment. Drawing on the advantages of both the CS model and the agent-based model, the system is able to address the specific concerns of increasing scalability and enhancing performance.
Funding: Supported by the Special Fund of Fundamental Scientific Research Business Expense for Higher School of Central Government (Projects for creation teams ZY20110101), NSFC 41090294, and the talent selection and training plan project of Hebei University.
Abstract: We analyze the co-seismic displacement field of the 26 December 2004 giant Sumatra–Andaman earthquake, derived from Global Positioning System observations, geological vertical measurements of coral heads, and the pivot line observed through remote sensing. Using the co-seismic displacement field and the AK135 spherical layered Earth model, we invert the co-seismic slip distribution along the seismic fault. We also search for the fault geometry model that best fits the observed data. Assuming that the dip angle increases linearly in the downward direction, we plot the postfit residual variation of the inverted geometry model with dip angles changing linearly along fault strike. The geometry model with local minimum misfits is the one with dip angle increasing linearly along strike from 4.3° in the top southernmost patch to 4.5° in the top northernmost patch, and with dip angle increasing linearly downward. Using the fault shape and geodetic co-seismic data, we estimate the slip distribution on the curved fault. Our result shows that the earthquake ruptured a zone ~200 km wide down to a depth of about 60 km. Thrust slip of 0.5–12.5 m is resolved, with the largest slip centered around the central section of the rupture zone, 7°N–10°N in latitude. The estimated seismic moment is 8.2 × 10^22 N m, which is larger than the estimate from the centroid moment magnitude (4.0 × 10^22 N m) and smaller than the estimate from normal-mode oscillation data modeling (1.0 × 10^23 N m).
Funding: Supported by the Mega-science Engineering Project of the Chinese Academy of Sciences.
Abstract: HT-7 is the first superconducting tokamak device for fusion research in China. Many experiments have been carried out on the machine since 1994, and many satisfactory results have been achieved in fusion research on the HT-7 tokamak [1]. As fusion research develops, remote control of experiments becomes increasingly important for improving experimental efficiency and extending research results. This paper describes a Remote Control System (RCS), based on the Internet and combining the Browser/Server and Client/Server models, for the HT-7 distributed data acquisition system (HT7DAS). By means of the RCS, authorized users all over the world can control and configure HT7DAS remotely. The RCS is designed to improve the flexibility, openness, reliability, and efficiency of HT7DAS. The whole design process, the implementation of the system, and some key items are discussed in detail. The system was operated successfully during the 2002 HT-7 experimental campaign.
Funding: Supported by the National Natural Science Foundation of China (No. 50475117) and the Tianjin Natural Science Foundation (No. 06YFJMJC03700).
Abstract: Integrating heterogeneous data sources is a precondition for enterprises to share data. Highly efficient data updating can both save system expenses and offer real-time data. Rapidly modifying data is one of the hot issues in the pre-processing area of the data warehouse. An extract-transform-load design is proposed based on a new data algorithm called Diff-Match, which is developed using mode matching and data-filtering technology. It can accelerate data renewal, filter the heterogeneous data, and seek out different sets of data. Its efficiency has been proved by its successful application in an enterprise of electric apparatus groups.
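The abstract does not specify how Diff-Match works internally. As a hedged illustration of the general idea of finding the differing records before loading a warehouse, the Python sketch below compares two keyed snapshots by row digest. The names `row_digest` and `diff_snapshots` and the use of MD5 digests are assumptions, not the published algorithm.

```python
import hashlib


def row_digest(row: dict) -> str:
    """Digest of a row that does not depend on column order."""
    canonical = "|".join(f"{key}={row[key]!r}" for key in sorted(row))
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def diff_snapshots(old: dict, new: dict) -> tuple[list, list, list]:
    """Compare two {primary_key: row} snapshots.

    Returns (inserted, updated, deleted) key lists, so that only the
    changed rows need to pass through the load step.
    """
    inserted = [k for k in new if k not in old]
    deleted = [k for k in old if k not in new]
    updated = [
        k for k in new
        if k in old and row_digest(new[k]) != row_digest(old[k])
    ]
    return inserted, updated, deleted
```

Comparing digests instead of full rows keeps the comparison cheap when rows are wide, at the cost of computing one hash per row.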
Abstract: Product data management (PDM) has been accepted as an important tool for the manufacturing industries. In recent years, more and more research has been conducted on the development of PDM. The research areas include system design, integration of object-oriented technology, data distribution, collaborative and distributed manufacturing working environments, security, and web-based integration. However, this research has limitations. In particular, it cannot cater for PDM in a distributed manufacturing environment. This is especially true in South China, where many Hong Kong (HK) manufacturers have moved their production plants to different locations in the Pearl River Delta for cost reduction while retaining their main offices in HK. Development of a PDM system is inherently complex. Product-related data cover product name, product part number (product identification), drawings, material specifications, dimension requirements, quality specifications, test results, lot size, production schedules, product data version and date of release, special tooling (e.g., jigs and fixtures), mould design, project engineer in charge, and cost spreadsheets, while process data include engineering release, engineering change information management, and other workflow related to the process information. According to Cornelissen et al., a contemporary PDM system should contain management functions for structure, retrieval, release, change, and workflow. In system design, development, and implementation, a formal specification is necessary. However, there is no formal representation model for PDM systems. Therefore, a graphical representation model is constructed to express the various scenarios of interaction between users and the PDM system. A statechart is then used to model the operations of the PDM system (Fig. 1). The statechart model bridges the current gap between requirements, scenarios, and the initial design specifications of the PDM system.
After properly analyzing the PDM system, a new distributed PDM (DPDM) system is proposed. Both graphical representation and statechart models are constructed for the new DPDM system (Fig. 2). New product data of DPDM and new system functions are then investigated to support product information flow in the new distributed environment. It is found that statecharts allow formal representations to capture the information and control flows of both PDM and DPDM. In particular, compared with the conventional state transition diagram, the statechart offers additional expressive power in terms of hierarchy, concurrency, history, and timing for DPDM behavioral modeling.
Abstract: Research on distributed data mining that makes use of the grid has recently become a trend. This paper introduces a data mining algorithm based on a distributed decision tree, which takes advantage of the conveniences and services supplied by the grid computing platform and can perform distributed classification mining on the grid.
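The abstract gives no detail on how the distributed decision tree is built. One common approach for data split across sites, shown here purely as an assumed sketch, is for each site to send only its local (attribute value, class) counts, which a coordinator sums before computing information gain for a candidate split. The names `local_counts` and `information_gain` and the `label` column are hypothetical.

```python
from collections import Counter
from math import log2


def local_counts(rows: list[dict], attr: str, label: str = "label") -> Counter:
    """Per-site statistic: count of each (attribute value, class) pair."""
    return Counter((row[attr], row[label]) for row in rows)


def entropy(counts) -> float:
    """Shannon entropy in bits of a collection of class counts."""
    counts = [c for c in counts if c]
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts)


def information_gain(merged: Counter) -> float:
    """Information gain of splitting on the attribute, from merged counts."""
    class_totals, value_totals = Counter(), Counter()
    for (value, cls), count in merged.items():
        class_totals[cls] += count
        value_totals[value] += count
    n = sum(class_totals.values())
    conditional = sum(
        value_totals[v] / n
        * entropy(c for (v2, _), c in merged.items() if v2 == v)
        for v in value_totals
    )
    return entropy(class_totals.values()) - conditional
```

Because `Counter` objects add element-wise, the coordinator can merge site statistics with `local_counts(site_a, "w") + local_counts(site_b, "w")` without ever seeing the raw rows.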
Funding: Supported by the Special Scientific Research Fund of Public Welfare Profession of the Ministry of Land and Resources of the People’s Republic of China (No. 201011057).
Abstract: 1 Introduction. Geochemical mapping at national and continental scales continues to present challenges worldwide due to variations in geologic and geotectonic units. Use of the proper sampling media can provide rich information on
Funding: Supported by the National Key Research and Development Program [2020YFB1006302].
Abstract: An exhaustive study has been conducted to investigate span-based models for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during model training, which are essential but result in grossly imbalanced data distributions and in turn cause suboptimal model performance. To address these issues, we propose a two-phase paradigm for span-based joint entity and relation extraction, which involves classifying the entities and relations in the first phase and predicting the types of these entities and relations in the second phase. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt at combining entity type and entity distance as global features, which has proven effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models for the joint extraction task, establishing a new standard benchmark. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.
Abstract: The advent of Big Data has led to rapid growth in the usage of parallel clustering algorithms that work over distributed computing frameworks such as MPI, MapReduce, and Spark. An important step for any parallel clustering algorithm is the distribution of data amongst the cluster nodes. This step governs the methodology and performance of the entire algorithm. Researchers typically use random distribution, or a spatial/geometric distribution strategy such as kd-tree based partitioning or grid-based partitioning, as per the requirements of the algorithm. However, these strategies are generic and are not tailor-made for any specific parallel clustering algorithm. In this paper, we give a comprehensive literature survey of MPI-based parallel clustering algorithms with special reference to the specific data distribution strategies they employ. We also propose three new data distribution strategies: Parameterized Dimensional Split for parallel density-based clustering algorithms like DBSCAN and OPTICS; Cell-Based Dimensional Split for dGridSLINK, a grid-based hierarchical clustering algorithm that exhibits efficiency for disjoint spatial distribution; and Projection-Based Split, a generic distribution strategy. All of these preserve spatial locality, achieve disjoint partitioning, and ensure good data load balancing. The experimental analysis shows the benefits of using the proposed data distribution strategies for the algorithms they are designed for, based on which we give appropriate recommendations for their usage.
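To make the three stated goals concrete (spatial locality, disjoint partitions, balanced load), here is a minimal Python sketch of a plain split along one dimension. It is a generic baseline written for illustration, not the paper's Parameterized Dimensional Split, Cell-Based Dimensional Split, or Projection-Based Split; the name `dimensional_split` and its parameters are assumptions.

```python
def dimensional_split(points: list[tuple], n_parts: int, dim: int = 0) -> list[list[tuple]]:
    """Split points into n_parts contiguous slabs along coordinate `dim`.

    Sorting keeps spatially close points together (locality), slicing makes
    the partitions disjoint, and the sizes differ by at most one point
    (load balance).
    """
    ordered = sorted(points, key=lambda p: p[dim])
    base, extra = divmod(len(ordered), n_parts)
    parts, start = [], 0
    for k in range(n_parts):
        size = base + (1 if k < extra else 0)  # first `extra` parts get one more
        parts.append(ordered[start:start + size])
        start += size
    return parts
```

Each returned slab could then be sent to one cluster node, for example one MPI rank per slab.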
Funding: Supported by the National Natural Science Foundation of China (61374140), the Shanghai Postdoctoral Sustentation Fund (12R21412600), the Fundamental Research Funds for the Central Universities (WH1214039), and the Shanghai Pujiang Program (12PJ1402200).
Abstract: Complex industrial processes often have multiple operating modes and present time-varying behavior. The data in one mode may follow specific Gaussian or non-Gaussian distributions. In this paper, a numerically efficient moving-window local outlier probability algorithm is proposed. Its key feature is the capability to handle complex data distributions and incursive operating condition changes, including slow dynamic variations and instant mode shifts. First, a two-step adaptation approach is introduced, and designed updating rules are applied to keep the monitoring model up to date. Then, a semi-supervised monitoring strategy is developed with an updating switch rule to deal with mode changes. Based on local probability models, the algorithm has a superior ability to detect faulty conditions and to adapt quickly to slow variations and new operating modes. Finally, the utility of the proposed method is demonstrated with a numerical example and a non-isothermal continuous stirred tank reactor.
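The following Python sketch shows one way a moving window could be combined with the standard local outlier probability (LoOP) score. The scoring formulas follow the usual LoOP definition; the window update rule (append a sample only when it is not flagged), the parameter values, and the names `loop_score` and `monitor` are assumptions, and the paper's two-step adaptation and switch rule are not reproduced. The sketch also assumes the window does not consist of duplicate points.

```python
from collections import deque
from math import dist, erf, sqrt

LAMBDA = 3.0  # significance factor; 3 is a common choice, assumed here


def knn_idx(x, data, k, skip=None):
    """Indices of the k points in data nearest to x, ignoring index skip."""
    candidates = (i for i in range(len(data)) if i != skip)
    return sorted(candidates, key=lambda i: dist(x, data[i]))[:k]


def prob_dist(x, data, idx):
    """Probabilistic set distance of x to the neighbours data[idx]."""
    return LAMBDA * sqrt(sum(dist(x, data[i]) ** 2 for i in idx) / len(idx))


def loop_score(x, window, k=3):
    """Local outlier probability of x with respect to the window samples."""
    data = list(window)
    nn = [knn_idx(p, data, k, skip=i) for i, p in enumerate(data)]
    pd = [prob_dist(p, data, nn[i]) for i, p in enumerate(data)]
    plof = [pd[i] / (sum(pd[j] for j in nn[i]) / len(nn[i])) - 1
            for i in range(len(data))]
    nplof = LAMBDA * sqrt(sum(v * v for v in plof) / len(plof))
    q_nn = knn_idx(x, data, k)
    q_plof = prob_dist(x, data, q_nn) / (sum(pd[j] for j in q_nn) / len(q_nn)) - 1
    if nplof == 0:
        return 1.0 if q_plof > 0 else 0.0
    return max(0.0, erf(q_plof / (nplof * sqrt(2))))


def monitor(stream, window_size=20, k=3, threshold=0.9):
    """Flag each sample; only unflagged samples update the moving window."""
    window = deque(maxlen=window_size)
    flags = []
    for x in stream:
        if len(window) < window_size:  # warm-up: fill the window first
            window.append(x)
            flags.append(False)
            continue
        faulty = loop_score(x, window, k) > threshold
        flags.append(faulty)
        if not faulty:
            window.append(x)  # oldest sample drops out automatically
    return flags
```

Samples are tuples of floats, for example `(temperature, concentration)`. Keeping flagged samples out of the window prevents a fault from being absorbed into the normal model, but it also means a genuine mode shift would need an extra rule to be accepted.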
Abstract: Various application domains require the integration of distributed real-time or near-real-time systems with non-real-time systems. Smart cities, smart homes, ambient intelligent systems, and network-centric defense systems are among these application domains. Data Distribution Service (DDS) is a communication mechanism based on the Data-Centric Publish-Subscribe (DCPS) model. It is used for distributed systems with real-time operational constraints. Java Message Service (JMS) is a messaging standard for enterprise systems using Service Oriented Architecture (SOA) for non-real-time operations. JMS allows Java programs to exchange messages in a loosely coupled fashion. JMS also supports sending and receiving messages using a messaging queue and a publish-subscribe interface. In this article, we propose an architecture enabling the automated integration of distributed real-time and non-real-time systems. We test our proposed architecture using a distributed Command, Control, Communications, Computers, and Intelligence (C4I) system. The system has DDS-based real-time Combat Management System components deployed to naval warships and SOA-based non-real-time Command and Control components used at headquarters. The proposed solution enables the efficient exchange of data between these two systems. We compare the proposed solution with a similar study. Our solution is superior in terms of automation support, ease of implementation, scalability, and performance.
Abstract: As a fundamental operation in ad hoc networks, broadcast can achieve efficient message propagation. Particularly in the cognitive radio ad hoc network, where unlicensed users have different sets of available channels, broadcasts are carried out on multiple channels. Accordingly, channel selection and collision avoidance are challenging issues in balancing the efficiency against the reliability of broadcasting. In this paper, an anticollision selective broadcast protocol, called acSB, is proposed. A channel selection algorithm based on limited neighbor information is considered to maximize the success rates of transmissions once the sender and receiver have the same channel. Moreover, an anticollision scheme is adopted to avoid simultaneous rebroadcasts. Simulations under different scenarios show that the proposed acSB outperforms other approaches in terms of smaller transmission delay, higher message reach rate, and fewer broadcast collisions.
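The abstract does not give the acSB selection rule. As a simple assumed illustration of channel selection from neighbor information, the Python sketch below picks the sender's channel that the largest number of known neighbors can also use, breaking ties by the lowest channel number. The function name `select_channel` and the data layout are hypothetical.

```python
def select_channel(sender_channels: set[int],
                   neighbor_channels: dict[str, set[int]]) -> int:
    """Pick the sender's channel that reaches the most neighbours.

    neighbor_channels maps a neighbour id to its set of available channels.
    Ties go to the lowest channel number. Raises ValueError if the sender
    has no available channel.
    """
    def coverage(channel: int) -> int:
        return sum(channel in chans for chans in neighbor_channels.values())

    # max() keeps the first maximum, and sorted() puts low channels first.
    return max(sorted(sender_channels), key=coverage)
```

A real protocol would also need the anticollision step described in the abstract, for example by staggering rebroadcast times, which is outside this sketch.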
Funding: Supported by the National Key Research and Development Program under Grant number 2019YFE0126700 and the Shandong Provincial Natural Science Foundation under Grant number ZR2020QD018.
Abstract: Spectral clustering is a well-regarded subspace clustering algorithm that exhibits outstanding performance in hyperspectral image classification through eigenvalue decomposition of the Laplacian matrix. However, its classification accuracy is severely limited by the selected eigenvectors, and the commonly used eigenvectors not only fail to guarantee the inclusion of detailed discriminative information but also have high computational complexity. To address these challenges, we propose an intuitive eigenvector selection method based on the coincidence degree of data distribution (CDES). First, the clustering result of improved k-means, which reflects the spatial distribution of the various types well, was used as the reference map. Then, the adjusted Rand index and adjusted mutual information were calculated to assess the data distribution consistency between each eigenvector and the reference map. Finally, the eigenvectors with high coincidence degrees were selected for clustering. A case study on hyperspectral mineral mapping demonstrated that the mapping accuracies of CDES are approximately 56.3%, 15.5%, and 10.5% higher than those of the commonly used top, high entropy, and high relevance eigenvectors, and that CDES can save more than 99% of the eigenvector selection time. In particular, owing to the unsupervised nature of k-means, CDES provides a novel solution for autonomous feature selection of hyperspectral images.
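The selection step can be sketched in Python as ranking eigenvectors by their agreement with the reference map. This sketch uses only the adjusted Rand index (the abstract also uses adjusted mutual information), and it assumes each eigenvector is turned into labels by thresholding at its median; that discretization and the names `adjusted_rand_index` and `select_eigenvectors` are assumptions, not details given in the abstract.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(a: list, b: list) -> float:
    """Adjusted Rand index between two labelings of the same samples."""
    total = comb(len(a), 2)
    if total == 0:
        return 1.0
    sum_ij = sum(comb(c, 2) for c in Counter(zip(a, b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(a).values())
    sum_b = sum(comb(c, 2) for c in Counter(b).values())
    expected = sum_a * sum_b / total
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # both labelings are trivial and identical
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def select_eigenvectors(eigvecs: list[list[float]], reference: list, k: int) -> list[int]:
    """Indices of the k eigenvectors that agree best with the reference map."""
    def binarize(vec):
        median = sorted(vec)[len(vec) // 2]  # assumed discretization rule
        return [int(x >= median) for x in vec]

    ranked = sorted(
        range(len(eigvecs)),
        key=lambda i: adjusted_rand_index(binarize(eigvecs[i]), reference),
        reverse=True,
    )
    return ranked[:k]
```

Here `reference` plays the role of the k-means reference map, flattened to one label per pixel, and each entry of `eigvecs` holds one eigenvector's value per pixel in the same order.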