An attempt has been made to develop a distributed software infrastructure model for onboard data fusion system simulation, which also applies to netted radar systems, onboard distributed detection systems and advanced C3I systems. Two architectures are provided and verified: one is based on the pure TCP/IP protocol and the C/S model and implemented with Winsock; the other is based on CORBA (common object request broker architecture). Both models improve the performance of the data fusion simulation system, i.e. its reliability, flexibility and scalability. Their study is a valuable exploration of incorporating distributed computation concepts into radar system simulation techniques.
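To make the first architecture concrete, here is a minimal sketch of the TCP/IP client/server (C/S) exchange it describes, using Python's standard socket module in place of Winsock; the port number, message format, and function names are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of a TCP/IP client/server (C/S) exchange with a toy
# message format; Python's socket API stands in for Winsock here.
import socket
import threading
import time

FUSION_PORT = 5005  # hypothetical port for the fusion server

def fusion_server():
    """Accept one sensor connection and return a fused acknowledgement."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind(("127.0.0.1", FUSION_PORT))
        srv.listen(1)
        conn, _ = srv.accept()
        with conn:
            track = conn.recv(1024).decode()          # one sensor report
            conn.sendall(f"FUSED:{track}".encode())   # fused result back

def sensor_client(report: str) -> str:
    """Send one track report to the fusion server and return its reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as cli:
        cli.connect(("127.0.0.1", FUSION_PORT))
        cli.sendall(report.encode())
        return cli.recv(1024).decode()

threading.Thread(target=fusion_server, daemon=True).start()
time.sleep(0.2)  # crude start-up synchronisation, enough for a sketch
print(sensor_client("radar1:az=37.2,rng=12.4km"))
```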
The rapid development of the Internet of Things imposes new requirements on the data mining system, due to the weak capability of traditional distributed networking data mining. To meet the needs of the Internet of Things, this paper proposes a novel distributed data-mining model that realizes seamless access between cloud computing and distributed data mining. The model is based on the cloud computing architecture and belongs to the type built on untrusted (non-credible) nodes.
Objective: To discuss the clinical and imaging diagnostic rules of peripheral lung cancer by data mining techniques, to explore new ideas in the diagnosis of peripheral lung cancer, and to obtain early-stage technology and knowledge support for computer-aided detection (CAD). Methods: 58 cases of peripheral lung cancer confirmed by clinical pathology were collected. The data were imported into the database after the clinical and CT findings attributes were standardized. The data were studied comparatively based on Association Rules (AR) of the knowledge discovery process and on the Rough Set (RS) reduction algorithm and Genetic Algorithm (GA) of the generic data analysis tool ROSETTA, respectively. Results: The genetic classification algorithm of ROSETTA generates about 5 000 diagnosis rules. The RS reduction with Johnson's algorithm generates 51 diagnosis rules, and the AR algorithm generates 123 diagnosis rules. All three data mining methods essentially identify gender, age, cough, location, lobulation sign, shape and ground-glass density attributes as the main basis for the diagnosis of peripheral lung cancer. Conclusion: The diagnosis rules for peripheral lung cancer obtained with the three data mining techniques match clinical diagnostic rules, and they can also be used to build the knowledge base of an expert system. This study demonstrates the potential value of data mining technology in clinical imaging diagnosis and differential diagnosis.
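As a small illustration of the AR step, the following sketch screens candidate symptom-to-diagnosis rules by support and confidence over toy records; the attribute names and thresholds are invented for the example, not taken from the study.

```python
# Minimal sketch of association-rule screening: compute support and
# confidence for candidate attribute -> diagnosis rules over toy records.
from itertools import combinations

records = [
    {"cough", "lobulation_sign", "malignant"},
    {"cough", "ground_glass", "malignant"},
    {"cough", "smooth_margin", "benign"},
    {"lobulation_sign", "ground_glass", "malignant"},
]

def support(itemset):
    """Fraction of records containing every item of the itemset."""
    return sum(itemset <= r for r in records) / len(records)

def rules(target="malignant", min_conf=0.7):
    items = set().union(*records) - {target}
    for k in (1, 2):
        for lhs in combinations(sorted(items), k):
            s_lhs = support(set(lhs))
            if s_lhs == 0:
                continue
            conf = support(set(lhs) | {target}) / s_lhs
            if conf >= min_conf:
                yield lhs, round(conf, 2)

for lhs, conf in rules():
    print(f"{lhs} -> malignant  (conf={conf})")
```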
Numerical mechanical models used for the design of structures and processes are very complex and high-dimensionally parametrised. Understanding the model characteristics is of interest for engineering tasks and subsequently for an efficient design. Multiple analysis methods are known and available to gain insight into existing models. In this contribution, selected methods from various fields are applied to a real-world mechanical engineering example of a currently developed clinching process. The selection comprises techniques of machine learning and data mining whose use aims at a decreased numerical effort. The chosen methods are briefly discussed, references are given, and challenges in the context of meta-modelling and sensitivities are shown. An incremental knowledge gain is provided by a step-by-step application of the numerical methods, and the resulting consequences for further applications are highlighted. Furthermore, a visualisation method aiming at an easy design guideline is proposed. These visual decision maps incorporate the uncertainty coming from the reduction of dimensionality and can be applied in an early stage of design.
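To illustrate the meta-modelling idea in the simplest terms, the sketch below fits a cheap linear surrogate to a few samples of an expensive model and reads off a first-cut sensitivity ranking; the "simulation" is a stand-in function, not the clinching model, and the whole setup is an assumption for illustration.

```python
# Minimal sketch of meta-modelling: fit a cheap surrogate to a few
# expensive samples, then rank parameter sensitivities from it.
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 3))          # 3 design parameters
# Stand-in "simulation": parameter 0 dominates, parameter 2 barely matters.
y = 2.0 * X[:, 0] + 0.3 * X[:, 1] ** 2 + 0.01 * X[:, 2] + rng.normal(0, 0.01, 50)

# Linear surrogate via least squares: y ~ X @ w + b
A = np.hstack([X, np.ones((50, 1))])
w, *_ = np.linalg.lstsq(A, y, rcond=None)

# Coefficient magnitude as a crude sensitivity ranking of the parameters
for i, wi in enumerate(w[:3]):
    print(f"parameter {i}: sensitivity ~ {abs(wi):.3f}")
```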
In recent years, with the explosive development of the Internet and of data storage and data processing technologies, privacy preservation has become one of the major concerns in data mining. A number of methods and techniques have been developed for privacy-preserving data mining. This paper provides a broad survey of different privacy-preserving data mining algorithms and analyzes the representative techniques for privacy preservation. Existing problems and directions for future research are also discussed.
The huge volume of structured and unstructured data known as big data nowadays provides opportunities for companies, especially those that use electronic commerce (e-commerce). The data are collected from customers' internal processes, vendors, markets and the business environment. This paper presents a data mining (DM) process for e-commerce including the three common algorithms: association, clustering and prediction. It also highlights some of the benefits of DM to e-commerce companies in terms of merchandise planning, sales forecasting, basket analysis, customer relationship management and market segmentation, which can be achieved with these three data mining algorithms. The main aim of this paper is to review the application of data mining in e-commerce, focusing on structured and unstructured data collected through various resources and cloud computing services, in order to justify the importance of data mining. Moreover, this study evaluates certain challenges of data mining such as spider identification, data transformations and making the data model comprehensible to business users; further challenges, such as supporting slowly changing dimensions of data and making data transformation and model building accessible to business users, are also evaluated. The paper also provides a clear guide for e-commerce companies sitting on huge volumes of data, helping them exploit those data for business improvement and thereby become highly competitive among their competitors.
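As one concrete instance of the three algorithm families the paper reviews, here is a minimal hand-rolled k-means sketch for customer segmentation; the two features and the two customer groups are illustrative, not a real e-commerce schema.

```python
# Minimal sketch of clustering for market segmentation: k-means over
# toy per-customer aggregates.
import numpy as np

rng = np.random.default_rng(1)
# columns: [orders per month, average basket value]
customers = np.vstack([
    rng.normal([2, 20], [0.5, 5], (30, 2)),    # occasional small-basket buyers
    rng.normal([10, 120], [2, 20], (30, 2)),   # frequent high-value buyers
])

def kmeans(X, k=2, iters=20):
    """Plain Lloyd iterations: assign points, then recompute centers."""
    centers = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        centers = np.array([X[labels == j].mean(0) for j in range(k)])
    return labels, centers

labels, centers = kmeans(customers)
print("segment centers (orders/month, basket value):\n", centers.round(1))
```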
Big data cloud computing is a new computing mode that integrates distributed processing, parallel processing, network computing, virtualization technology, load balancing and other network technologies. Under the operation of a big data cloud computing system, computing resources can be distributed in a resource pool composed of a large number of computers, allowing users to connect to remote computer systems according to their own data and information needs.
Lung cancer is a deadly disease, but there is a good chance for the patient to be cured if he or she is correctly diagnosed at an early stage. Although chest X-ray films are considered the most reliable method for the early detection of lung cancer, serious mistakes in some diagnosed cases give bad results and cause death, so computer-aided diagnosis systems are necessary to support the medical staff in achieving high capability and effectiveness. Using data mining techniques, clinicians can predict a patient's future condition and improve treatment programs; they can better manage the health of patients today so that these patients do not become the problems of tomorrow. The lung cancer biological database, which contains the medical images (chest X-rays), classifies the digital X-ray chest films into three categories: normal, benign and malignant. The normal ones are those characterizing a healthy patient (no nodules); lung nodules can be either benign (non-cancerous) or malignant (cancer). Computer-aided diagnosis systems rest on a pattern recognition approach combining two major steps: a feature extraction process and a classification process using a neural network classifier.
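The two CAD steps named above can be sketched end to end in a few lines: crude features extracted from an image patch feed a small neural-network classifier. The patches below are synthetic arrays and the features are toy statistics, assumptions made only so the sketch runs; a real system would extract nodule descriptors from X-ray films.

```python
# Minimal sketch of feature extraction + neural-network classification
# on synthetic stand-ins for X-ray patches.
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(2)

def extract_features(img):
    """Toy features: mean intensity, variance, fraction of bright pixels."""
    return [img.mean(), img.var(), (img > 0.8).mean()]

# Synthetic 'normal' vs 'nodule' 32x32 patches
normal = [rng.uniform(0, 0.6, (32, 32)) for _ in range(40)]
nodule = [np.clip(rng.uniform(0, 0.6, (32, 32))
                  + (rng.uniform(0, 1, (32, 32)) > 0.95), 0, 1)
          for _ in range(40)]

X = np.array([extract_features(i) for i in normal + nodule])
y = np.array([0] * 40 + [1] * 40)   # 0 = normal, 1 = nodule

clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))
```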
It is crucial, while using healthcare data, to assess the advantages of data privacy against the possible drawbacks. Data from several sources must be combined for use in many data mining applications. The medical practitioner may use the results of association rule mining performed on this aggregated data to better personalize patient care and implement preventive measures. Historically, numerous heuristics (e.g., greedy search) and metaheuristics-based techniques (e.g., evolutionary algorithms) have been created for the positive association rule in privacy-preserving data mining (PPDM). When it comes to connecting seemingly unrelated diseases and drugs, negative association rules may be more informative than their positive counterparts. It is well known that during negative association rule mining a large number of uninteresting rules are formed, making this a difficult problem to tackle. In this research, we offer an adaptive method for negative association rule mining in vertically partitioned healthcare datasets that respects users' privacy. The applied approach dynamically determines the transactions to be interrupted for information hiding, as opposed to predefining them. This study introduces a novel method for addressing the problem of negative association rules in healthcare data mining, one that is based on the Tabu-genetic optimization paradigm. Tabu search is advantageous since it removes a huge number of unnecessary rules and item sets. Experiments using benchmark healthcare datasets prove that the discussed scheme outperforms state-of-the-art solutions in terms of decreasing side effects and data distortions, as measured by the indicator of hiding failure.
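For readers unfamiliar with what makes a rule "negative", this sketch estimates the confidence of a rule X → NOT Y over toy transactions; the items are invented for the example and have no connection to the paper's datasets.

```python
# Minimal sketch of a negative association rule: confidence of X -> NOT Y
# estimated from transaction supports.
transactions = [
    {"diabetes", "drugA"},
    {"diabetes", "drugB"},
    {"hypertension", "drugA"},
    {"diabetes", "drugB"},
]

def support(items):
    """Fraction of transactions containing all the given items."""
    return sum(items <= t for t in transactions) / len(transactions)

def negative_confidence(x, y):
    """Confidence of the negative rule {x} -> NOT {y}."""
    s_x = support({x})
    return (s_x - support({x, y})) / s_x if s_x else 0.0

# "patients with diabetes tend NOT to receive drugA"
print(negative_confidence("diabetes", "drugA"))  # -> 0.666...
```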
In recent years, with the development of the social Internet of Things (IoT), all kinds of data have accumulated on the network. These data contain a lot of social information and opinions, yet they are rarely fully analyzed, which is a major obstacle to the intelligent development of the social IoT. In this paper, we propose a sentence similarity analysis model to analyze the similarity in people's opinions on hot topics in social media and news pages. Most of these data are unstructured or semi-structured sentences, so the accuracy of sentence similarity analysis largely determines the model's performance. To improve accuracy, we propose a novel method of sentence similarity computation that extracts the syntactic and semantic information of semi-structured and unstructured sentences. We mainly consider the subjects, predicates and objects of sentence pairs and use the Stanford Parser to classify the dependency relation triples to calculate the syntactic and semantic similarity between two sentences. Finally, we verify the performance of the model with the Microsoft Research Paraphrase Corpus (MRPC), which consists of 4076 pairs of training sentences and 1725 pairs of test sentences, most of which come from news and social data. Extensive simulations demonstrate that our method outperforms other state-of-the-art methods regarding the correlation coefficient and the mean deviation.
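The subject/predicate/object comparison at the heart of the method can be sketched crudely as a weighted match over triples. Here the triples are written by hand and matched exactly, whereas the paper derives them with the Stanford Parser and uses semantic similarity; the weights are illustrative assumptions.

```python
# Minimal sketch of scoring two sentences by overlap of their
# (subject, predicate, object) triples.
def triple_similarity(t1, t2, weights=(0.4, 0.3, 0.3)):
    """Weighted exact-match score over (subject, predicate, object)."""
    return sum(w for w, a, b in zip(weights, t1, t2) if a == b)

s1 = ("government", "raise", "tax")     # "The government raises the tax."
s2 = ("government", "increase", "tax")  # "The government increases taxes."

print(triple_similarity(s1, s2))  # -> 0.7 (subject and object match)
```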
Multicomputer systems (distributed memory computer systems) are becoming more and more popular and will be widely used in scientific research. In this paper, we present a parallel algorithm for the Fourier transform of a vector of complex numbers on a multicomputer system and give its computing times and speedup in a parallel environment supported by the EXPRESS system on a multicomputer consisting of four SGI workstations. Our analysis shows that the results are close to ideal and that this scheme is suitable for multicomputer systems.
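In the same spirit, a Fourier transform can be parallelised by splitting the output frequencies across workers. The sketch below uses Python's multiprocessing pool on one machine rather than the EXPRESS system on separate workstations, and computes each DFT bin directly; the problem size and worker count are arbitrary assumptions.

```python
# Minimal sketch of a parallel DFT: each worker computes a slice of the
# output frequencies; the result is checked against numpy's FFT.
import numpy as np
from multiprocessing import Pool

x = np.random.default_rng(3).normal(size=256) + 0j
N = len(x)

def dft_bin(k):
    """Compute a single output frequency of the DFT directly."""
    n = np.arange(N)
    return (x * np.exp(-2j * np.pi * k * n / N)).sum()

if __name__ == "__main__":
    with Pool(4) as pool:                  # 4 workers ~ 4 workstations
        X = np.array(pool.map(dft_bin, range(N)))
    print(np.allclose(X, np.fft.fft(x)))   # -> True
```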
A new era of data access and management has begun with the use of cloud computing in the healthcare industry. Despite the efficiency and scalability that the cloud provides, the security of private patient data is still a major concern. Encryption, network security, and adherence to data protection laws are key to ensuring the confidentiality and integrity of healthcare data in the cloud. The computational overhead of encryption technologies could lead to delays in data access and processing rates. To address these challenges, we introduced the Enhanced Parallel Multi-Key Encryption Algorithm (EPM-KEA), aiming to bolster healthcare data security and facilitate the secure storage of critical patient records in the cloud. The data were gathered from two categories: Authorization for Hospital Admission (AIH) and Authorization for High Complexity Operations. We use Z-score normalization for preprocessing. The primary goal of implementing encryption techniques is to secure and store massive amounts of data on the cloud. It is feasible that cloud storage alternatives for protecting healthcare data will become more widely available if security issues can be successfully fixed. As a result of our analysis using specific parameters including execution time (42%), encryption time (45%), decryption time (40%), security level (97%), and energy consumption (53%), the system demonstrated favorable performance when compared to the traditional method. This suggests that by addressing these security concerns, there is the potential for broader accessibility to cloud storage solutions for safeguarding healthcare data.
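The Z-score normalization mentioned as the preprocessing step is simple enough to show directly; the numeric fields below are invented, not from the AIH records.

```python
# Minimal sketch of Z-score normalization: each feature is rescaled to
# zero mean and unit standard deviation.
import numpy as np

records = np.array([[120.0, 35.2],    # illustrative numeric fields
                    [ 80.0, 40.1],
                    [100.0, 36.8]])

z = (records - records.mean(axis=0)) / records.std(axis=0)
print(z.round(2))          # each column now has mean 0 and std 1
```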
The performance of an optimized aerodynamic shape is further improved by a second-step optimization using the design knowledge discovered by a data mining technique based on Proper Orthogonal Decomposition (POD) in the present study. Data generated in the first-step optimization by using evolutionary algorithms are saved as the source data, among which the superior data with improved objectives and maintained constraints are chosen. Only the geometry components of the superior data are picked out and used for constructing the snapshots of POD. The geometry characteristics of the superior data illustrated by the POD bases are the design knowledge, by which the second-step optimization can be rapidly achieved. The optimization methods are demonstrated by redesigning a transonic compressor rotor blade, NASA Rotor 37, to maximize the peak adiabatic efficiency while maintaining the total pressure ratio and mass flow rate. Firstly, the blade is redesigned by using a particle swarm optimization method, and the adiabatic efficiency is increased by 1.29%. Then, the second-step optimization is performed by using the design knowledge, and a 0.25% gain in adiabatic efficiency is obtained. The results are presented and addressed in detail, demonstrating that the geometry variations significantly change the pattern and strength of the shock wave in the blade passage. The former reduces the separation loss, while the latter reduces the shock loss, and both favor an increase of the adiabatic efficiency.
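The POD step itself reduces to an SVD of the centered snapshot matrix. The sketch below extracts dominant modes from random stand-in data (not Rotor 37 geometry); the snapshot sizes and the 99% energy threshold are illustrative assumptions.

```python
# Minimal sketch of extracting POD bases from geometry snapshots via SVD.
import numpy as np

rng = np.random.default_rng(4)
snapshots = rng.normal(size=(200, 30))   # 30 superior designs, 200 geometry DOFs

mean = snapshots.mean(axis=1, keepdims=True)     # mean design shape
U, s, _ = np.linalg.svd(snapshots - mean, full_matrices=False)

energy = np.cumsum(s**2) / np.sum(s**2)
r = int(np.searchsorted(energy, 0.99)) + 1   # modes capturing 99% variance
pod_basis = U[:, :r]                          # design knowledge: dominant modes
print(f"{r} POD modes capture 99% of the geometric variance")
```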
The ability to recognise mobile devices accurately and at scale is critically important for mobile network operators and ISPs to understand their customers' behaviours and enhance their user experience. In this paper, we propose a novel method for mobile device model recognition by using statistical information derived from large amounts of mobile network traffic data. Specifically, we create a Jaccard-based coefficient measure method to identify a proper keyword representing each mobile device model from massive unstructured textual HTTP access logs. To handle the large amount of traffic data generated by large mobile networks, this method is designed as a set of parallel algorithms and is implemented through the MapReduce framework, a distributed parallel programming model with proven low-cost and high-efficiency features. Evaluations using real data sets show that our method can accurately recognise mobile client models while meeting the scalability and producer-independency requirements of large mobile network operators. Results show that a 91.5% accuracy rate is achieved for recognising mobile client models from 2 billion records, which is dramatically higher than existing solutions.
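The Jaccard coefficient underlying the keyword-scoring idea is shown below on toy token sets; the tokens and model names are invented, and the paper's actual coefficient measure over HTTP logs may differ in detail.

```python
# Minimal sketch of Jaccard-based keyword scoring: a candidate keyword set
# that is close to one model's log tokens and far from another's is a good
# discriminating identifier.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 0.0

logs_model_a = {"mozilla", "galaxy-s4", "android", "samsung"}
logs_model_b = {"mozilla", "android", "iphone", "cfnetwork"}

candidate = {"galaxy-s4", "samsung"}
print(jaccard(candidate, logs_model_a))  # high -> representative of A
print(jaccard(candidate, logs_model_b))  # low  -> discriminative vs B
```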
The gravity gradient is a second derivative of the gravity potential, containing more high-frequency information of the Earth's gravity field. Gravity gradient observation data require deducting the prior and intrinsic parts to obtain more variational information. A model generated from a topographic surface database is more appropriate for representing gradiometric effects derived from near-surface mass, as other kinds of data can hardly reach the required spatial resolution. The rectangle prism method, namely an analytic integration of Newtonian potential integrals, is a reliable and commonly used approach to modeling the gravity gradient, whereas its computing efficiency is extremely low. A modified rectangle prism method and a graphics processing unit (GPU) parallel algorithm are proposed to speed up the modeling process. The modified method avoids massive redundant computations by deforming the formulas according to the symmetries of the prisms' integral regions, and the proposed algorithm parallelizes this method's computing process. The parallel algorithm was compared with a conventional serial algorithm using 100 elevation data in two topographic areas (rough and moderate terrain). Modeling differences between the two algorithms were less than 0.1 E, which is attributed to precision differences between single-precision and double-precision floating-point numbers. In experiments the parallel algorithm showed computational efficiency approximately 200 times higher than the serial algorithm, demonstrating that it effectively speeds up the modeling process. Further analysis indicates that both the modified method and the computational parallelism through the GPU contributed to the proposed algorithm's performance.
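For a sense of the per-prism analytic evaluation being accelerated, the sketch below computes the vertical gradient component Tzz of one prism in a Nagy-style closed form. This is a generic textbook form, not the paper's modified method; sign conventions vary between references, and the prism bounds and density are invented for the example.

```python
# Minimal sketch of the analytic rectangle-prism evaluation: Tzz at the
# origin from one prism, via corner sums with alternating signs
# (up to an overall sign convention).
import numpy as np

G = 6.674e-11  # gravitational constant, m^3 kg^-1 s^-2

def prism_tzz(x, y, z, rho):
    """Tzz at the origin from a prism spanning x=(x1,x2), y=(y1,y2), z=(z1,z2)."""
    total = 0.0
    for i in (0, 1):
        for j in (0, 1):
            for k in (0, 1):
                r = np.sqrt(x[i]**2 + y[j]**2 + z[k]**2)
                sign = (-1) ** (i + j + k)
                total += sign * np.arctan2(x[i] * y[j], z[k] * r)
    return G * rho * total   # in 1/s^2; 1 Eotvos = 1e-9 / s^2

tzz = prism_tzz((100.0, 200.0), (100.0, 200.0), (50.0, 150.0), rho=2670.0)
print(f"Tzz = {tzz / 1e-9:.3f} E")
```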
With the explosive increase in mobile apps, more and more threats are migrating from the traditional PC client to the mobile device. Compared with the traditional Win+Intel alliance on the PC, the Android+ARM alliance dominates the Mobile Internet, and apps replace PC client software as the major target of malicious usage. In this paper, to improve the security status of current mobile apps, we propose a methodology to evaluate mobile apps based on a cloud computing platform and data mining. We also present a prototype system named MobSafe to identify a mobile app's virulence or benignancy. Compared with traditional methods, such as permission-pattern-based methods, MobSafe combines dynamic and static analysis methods to comprehensively evaluate an Android app. In the implementation, we adopt the Android Security Evaluation Framework (ASEF) and the Static Android Analysis Framework (SAAF), two representative dynamic and static analysis methods, to evaluate Android apps and estimate the total time needed to evaluate all the apps stored in one mobile app market. Based on a real trace from a commercial mobile app market called AppChina, we collect statistics on the number of active Android apps, the average number of apps installed on one Android device, and the expansion ratio of mobile apps. As the mobile app market serves as the main line of defence against mobile malware, our evaluation results show that it is practical to use a cloud computing platform and data mining to verify all stored apps routinely and filter out malware apps from mobile app markets. As future work, MobSafe can extensively use machine learning to conduct automatic forensic analysis of mobile apps based on the multifaceted data generated in this stage.
Due to the increasing availability and sophistication of data recording techniques, multiple information sources and distributed computing are becoming important trends of modern information systems. Many applications such as security informatics and social computing require a ubiquitous data analysis platform so that decisions can be made rapidly under distributed and dynamic system environments. Although data mining has now been popularly used to achieve such goals, building a data mining system is, however, a nontrivial task, which may require a complete understanding of numerous data mining techniques as well as solid programming skills. Employing agent techniques for data analysis thus becomes increasingly important, especially for users not familiar with engineering and computational sciences, to implement an effective ubiquitous mining platform. Such data mining agents should, in practice, be intelligent, complete, and compact. In this paper, we present an interactive data mining agent, OIDM (online interactive data mining), which provides three categories (classification, association analysis, and clustering) of data mining tools and interacts with the user to facilitate the mining process. The interactive mining is accomplished by interviewing the user about the data mining task to gain efficient and intelligent data mining control. OIDM can help users find appropriate mining algorithms, refine and compare the mining process, and finally achieve the best mining results. Such interactive data mining agent techniques provide alternative solutions to rapidly deploy data mining techniques to broader areas of data intelligence and knowledge informatics.
Evolutionary computation (EC) has received significant attention in China during the last two decades. In this paper, we present an overview of the current state of this rapidly growing field in China. Chinese research in the theoretical foundations of EC, EC-based optimization, EC-based data mining, and EC-based real-world applications is summarized.
The Spectral Statistical Interpolation (SSI) analysis system of NCEP is used to assimilate meteorological data from the Global Positioning Satellite System (GPS/MET) refraction angles with the variational technique. Verification against radiosonde data shows that including GPS/MET observations in the analysis makes an overall improvement to the analysis variables of temperature, winds, and water vapor. However, the variational model with the ray-tracing method is quite expensive for numerical weather prediction and climate research. For example, about 4 000 GPS/MET refraction angles need to be assimilated to produce an ideal global analysis, and just one iteration of minimization takes more than 24 hours of CPU time on NCEP's Cray C90 computer. Although efforts have been made to reduce the computational cost, it is still prohibitive for operational data assimilation. In this paper, a parallel version of the three-dimensional variational data assimilation model of GPS/MET occultation measurements suitable for massively parallel processor architectures is developed. The divide-and-conquer strategy is used to achieve parallelism and is implemented by message passing. The authors present the principles of the code's design and examine the performance on state-of-the-art parallel computers in China. The results show that this parallel model scales favorably as the number of processors is increased. With the Memory-IO technique implemented by the authors, the wall clock time per iteration used for assimilating 1420 refraction angles is reduced from 45 s to 12 s using 1420 processors. This suggests that the new parallelized code has the potential to be useful in numerical weather prediction (NWP) and climate studies.
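The divide-and-conquer structure can be sketched in miniature: partition the observations across workers, evaluate each block's contribution to the observation term of the cost function independently, and sum the partial results. The cost function below is a stand-in quadratic, and Python's multiprocessing replaces the message-passing layer; both are assumptions for illustration.

```python
# Minimal sketch of divide-and-conquer over observations: partial cost
# sums computed in parallel equal the serial total.
import numpy as np
from multiprocessing import Pool

rng = np.random.default_rng(5)
obs = rng.normal(size=1420)            # 1420 refraction-angle residuals

def block_cost(block):
    """Each worker's share of the observation term of the 3D-Var cost."""
    return 0.5 * float(np.sum(block ** 2))

if __name__ == "__main__":
    blocks = np.array_split(obs, 8)    # divide among 8 workers
    with Pool(8) as pool:
        partial = pool.map(block_cost, blocks)
    print(np.isclose(sum(partial), 0.5 * np.sum(obs ** 2)))  # -> True
```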
To realize remote mining of nonferrous metal deposits with a mobile robot under an unknown environment, parallel evolutionary computing and a 3-tier load balance scheme are proposed to overcome the efficiency problem of online evolutionary computing. A polar coordinate system can be established on the remote mining robot, with the pole at the current position and the polar axis pointing from the current position to the goal point. With this polar coordinate system, path planning for the remote mining robot can be computed in parallel. The results of simulations and analysis based on agent techniques show that good computing quality, such as efficiency, optimization and robustness, can be guaranteed for the remote mining robot.
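One way to picture the goal-directed polar encoding is as a genome of (step length, bearing offset from the goal direction) genes that an evolutionary search would score; the sketch below decodes and evaluates two such candidate paths. The obstacle, the penalty weights, and the gene values are all invented for the example.

```python
# Minimal sketch of a goal-directed polar path encoding with a fitness
# function an evolutionary planner could optimise.
import numpy as np

GOAL = np.array([10.0, 0.0])     # goal lies on the polar axis by construction
OBSTACLE, R_OBS = np.array([5.0, 0.3]), 1.0

def decode(genome):
    """Turn (step length, bearing offset) genes into waypoints toward the goal."""
    pos, pts = np.zeros(2), []
    for step, ang in genome:
        heading = np.arctan2(*(GOAL - pos)[::-1]) + ang  # offset from goal bearing
        pos = pos + step * np.array([np.cos(heading), np.sin(heading)])
        pts.append(pos)
    return pts

def fitness(genome):
    """Shorter is better; colliding with the obstacle is heavily penalised."""
    pts = decode(genome)
    length = sum(np.linalg.norm(b - a) for a, b in zip([np.zeros(2)] + pts, pts))
    penalty = sum(100.0 for p in pts if np.linalg.norm(p - OBSTACLE) < R_OBS)
    return length + np.linalg.norm(pts[-1] - GOAL) + penalty

straight = [(2.5, 0.0)] * 4
detour = [(2.6, -0.5), (2.6, -0.3), (2.6, 0.0), (2.6, 0.0)]
print(fitness(straight), fitness(detour))   # the detour avoids the obstacle
```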