To improve question answering (QA) performance on real-world web data sets, a new set of question classes and a general answer re-ranking model are defined. Using a pre-defined dictionary and grammatical analysis, the question classifier draws both semantic and grammatical information into information retrieval and machine learning methods in the form of various training features, including the question word, the main verb of the question, the dependency structure, the position of the main auxiliary verb, the main noun of the question, and the top hypernym of the main noun. The QA query results are then re-ranked by question class information. Experiments show that questions in real-world web data sets can be accurately classified by the classifier, and that the re-ranked QA results are clearly improved. This demonstrates that, with both semantic and grammatical information, applications such as QA built upon real-world web data sets show better performance.
Funding: Microsoft Research Asia Internet Services in Academic Research Fund (No. FY07-RES-OPP-116); Science and Technology Development Program of Tianjin (No. 06YFGZGX05900).
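As an illustration of the kind of feature extraction such a classifier relies on, the sketch below pulls the question word, main verb, auxiliary position, main noun, and top hypernym out of a question. spaCy and WordNet are stand-ins chosen here; the paper's own dictionary and parser are not specified, so every library, model, and function name in the sketch is an assumption.

```python
# Sketch of question-feature extraction with spaCy (dependency parse) and
# WordNet (hypernyms). Requires: python -m spacy download en_core_web_sm
# and nltk.download("wordnet").
import spacy
from nltk.corpus import wordnet as wn

nlp = spacy.load("en_core_web_sm")

def question_features(question: str) -> dict:
    doc = nlp(question)
    root = next(tok for tok in doc if tok.dep_ == "ROOT")   # main verb
    wh = next((t.text.lower() for t in doc
               if t.tag_ in ("WDT", "WP", "WP$", "WRB")), None)
    aux = next((t for t in doc if t.dep_ in ("aux", "auxpass")), None)
    main_noun = next((t for t in root.children
                      if t.dep_ in ("dobj", "nsubj", "attr")
                      and t.pos_ == "NOUN"), None)
    hypernym = None
    if main_noun is not None:
        synsets = wn.synsets(main_noun.lemma_, pos=wn.NOUN)
        if synsets and synsets[0].hypernyms():
            hypernym = synsets[0].hypernyms()[0].name()
    return {
        "question_word": wh,
        "main_verb": root.lemma_,
        "aux_position": aux.i if aux is not None else -1,
        "main_noun": main_noun.lemma_ if main_noun is not None else None,
        "top_hypernym": hypernym,
        "dependency_path": [t.dep_ for t in doc],
    }

print(question_features("What river flows through Paris?"))
```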
For satellite remote sensing data acquired in the visible and infrared bands, cloud coverage over the ocean often causes large-scale gaps in inversion products, and thin clouds that are difficult to detect introduce abnormal values into those products. Alvera et al. (2005) proposed a method for reconstructing missing data based on an Empirical Orthogonal Functions (EOF) decomposition, but that method cannot process images with extreme cloud coverage (more than 95%), requires a long reconstruction time, and is strongly affected by abnormal data in the images. This paper improves on that work by reconstructing the missing data through two applications of the EOF decomposition. First, abnormal times are detected by analyzing the temporal modes of the EOF decomposition, and the abnormal data are eliminated. Second, the data sets, excluding the abnormal data, are analyzed by EOF decomposition, and the temporal modes are filtered to enhance the ability to reconstruct images that contain little or no data. Finally, the method is applied to a large data set, namely 43 Sea Surface Temperature (SST) satellite images of the Changjiang (Yangtze) River estuary and its adjacent areas, with a total reconstruction root mean square error (RMSE) of 0.82°C. The results show that this improved EOF reconstruction method is robust for reconstructing missing and unreliable satellite data.
Funding: National Natural Science Foundation of China (contract Nos 40576080 and 40506036); National "863" Project of China (contract No. 2007AA12Z182).
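The core of the gap-filling step, iterating a truncated SVD so that only the missing entries are overwritten by the low-rank EOF reconstruction, can be sketched in a few lines. This is a minimal sketch on synthetic data; the paper's abnormality screening and temporal-mode filtering are not reproduced, and the mode count and iteration count are arbitrary choices.

```python
# Minimal EOF gap filling: iterate a truncated SVD, each time replacing only
# the missing entries with the low-rank reconstruction. Synthetic data stand
# in for the SST image series (space x time matrix).
import numpy as np

def eof_fill(X, mask, n_modes=3, n_iter=50):
    """X: space x time matrix; mask: True where data are missing."""
    Xf = np.where(mask, 0.0, X)              # first guess for the gaps
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(Xf, full_matrices=False)
        recon = U[:, :n_modes] * s[:n_modes] @ Vt[:n_modes]
        Xf = np.where(mask, recon, X)        # observed values stay fixed
    return Xf

rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 43)                    # 43 "images"
field = np.outer(rng.normal(size=200), np.sin(t))    # low-rank truth
field += 0.05 * rng.normal(size=field.shape)         # plus noise
mask = rng.random(field.shape) < 0.4                 # 40% "clouds"
filled = eof_fill(field, mask)
rmse = np.sqrt(np.mean((filled[mask] - field[mask]) ** 2))
print(f"reconstruction RMSE on gaps: {rmse:.3f}")
```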
A novel genetic programming (GP) technique, a new member of the evolutionary algorithm family, was applied to predict the water storage of the Wolonghu wetland in northeastern China in response to climate change using a small data set. Fourteen years (1993-2006) of annual water storage and climatic data for the wetland were used for model training and testing. The simulations and predictions show a good fit between calculated and observed water storage (MAPE = 9.47, r = 0.99). For comparison, a multilayer perceptron (MLP, a popular artificial neural network model) and a grey model (GM) were applied to the same data set. The GP technique outperformed the other two methods in both the simulation and prediction phases, and the results are analyzed and discussed. The case study confirms that GP is a promising way for wetland managers to quickly estimate fluctuations of water storage in wetlands when only a small data set is available.
Funding: National Basic Research Program of China (Grant No. 2006CB403302); National Education Ministry Foundation of China (Grant No. 705011); National Special Science and Technology Program for Water Pollution Control and Treatment (Grant Nos 2009ZX07526-006 and 2008AX07208-001).
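The paper does not publish its GP implementation; as a hedged illustration, a symbolic regressor from the gplearn library can evolve a formula from a comparably small series. The drivers, hidden target relation, and train/test split below are synthetic assumptions.

```python
# Illustrative GP regression on a small annual series using gplearn
# (a stand-in for the paper's GP implementation; data are synthetic).
import numpy as np
from gplearn.genetic import SymbolicRegressor

rng = np.random.default_rng(1)
X = rng.uniform(size=(14, 3))                      # 14 years of toy drivers
y = 2.0 + 2.0 * X[:, 0] - X[:, 1] * X[:, 2]        # hidden "true" relation

gp = SymbolicRegressor(population_size=500, generations=20,
                       function_set=("add", "sub", "mul", "div"),
                       parsimony_coefficient=0.01, random_state=0)
gp.fit(X[:10], y[:10])                             # train on first 10 years
pred = gp.predict(X[10:])                          # predict the last 4
mape = 100 * np.mean(np.abs((pred - y[10:]) / y[10:]))
print(gp._program, f"  MAPE={mape:.2f}%")
```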
The main goal of this research is to assess the impact of race, age at diagnosis, sex, and phenotype on the incidence and survivability of acute lymphocytic leukemia (ALL) among patients in the United States. By taking these factors into account, the study explores how existing cancer registry data can aid in the early detection and effective treatment of ALL. Our hypothesis was that statistically significant correlations exist between the race, age at diagnosis, sex, and phenotype of ALL patients and their incidence and survivability. The data were evaluated using SEER*Stat statistical software from the National Cancer Institute. Analysis of the incidence data revealed a higher prevalence of ALL among the Caucasian population. The majority of ALL cases (59%) occurred in patients aged 0 to 19 years at diagnosis, and 56% of affected individuals were male. The B-cell phenotype was predominantly associated with ALL cases (73%). In the survivability data, the 5-year survival rates slightly exceeded the 10-year survival rates for the respective demographics. Survivability rates of African American patients were the lowest compared with Caucasians, Asians, Pacific Islanders, Alaskan Natives, Native Americans, and others, and survivability progressively decreased for older patients. The study also investigated the typical treatment methods applied to ALL patients, mainly chemotherapy with occasional supplementation of radiation therapy as required. Chemotherapy considerably improved patients' chances of survival, while those who remained untreated faced a less favorable prognosis. Although a significant amount of data and information already exists, this study can help doctors diagnose patients with certain characteristics, assist health care professionals in screening potential patients and detecting cases early, and could thereby save the lives of elderly patients, who have a higher mortality rate from this disease.
A co-location pattern is a set of spatial features whose instances frequently appear in a spatial neighborhood. This paper efficiently mines the top-k probabilistic prevalent co-locations over spatially uncertain data sets and makes the following contributions: 1) the concept of top-k probabilistic prevalent co-locations based on a possible-world model is defined; 2) a framework for discovering the top-k probabilistic prevalent co-locations is set up; 3) a matrix method is proposed to improve the computation of the prevalence probability of a top-k candidate, and two pruning rules on the matrix blocks are given to accelerate the search for exact solutions; 4) a polynomial matrix is developed to further speed up the top-k candidate refinement process; 5) an approximate algorithm with a compensation factor is introduced so that relatively large quantities of data can be processed quickly. The efficiency of the proposed algorithms, as well as the accuracy of the approximation algorithms, is evaluated with an extensive set of experiments using both synthetic and real uncertain data sets.
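One ingredient of computing a prevalence probability over uncertain data can be illustrated under a simplifying independence assumption: the probability that at least m of n independently existing instances appear is a Poisson-binomial tail, obtained exactly by a small dynamic program. This is a stand-in illustration, not the paper's matrix method.

```python
# Under an independent-existence model, the probability that at least `m`
# of `n` uncertain instances exist is a Poisson-binomial tail, computed
# exactly by dynamic programming over the instance count.
def prob_at_least(probs, m):
    # dp[k] = probability that exactly k instances exist so far
    dp = [1.0] + [0.0] * len(probs)
    for p in probs:
        for k in range(len(dp) - 1, 0, -1):
            dp[k] = dp[k] * (1 - p) + dp[k - 1] * p
        dp[0] *= (1 - p)
    return sum(dp[m:])

# Existence probabilities of five instances of one feature in a neighborhood:
print(prob_at_least([0.9, 0.8, 0.7, 0.6, 0.5], m=3))   # ~0.85
```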
Creating and rendering intermediate geometric primitives is one approach to visualizing data sets in 3D space. Several algorithms have been developed to construct isosurfaces from uniformly distributed 3D data sets. These algorithms assume that the function value varies linearly along the edges of each cell. For irregular 3D data sets, however, this assumption does not hold. Moreover, depth sorting of cells is more complicated for irregular data sets, yet it is indispensable for generating isosurface images or semitransparent isosurface images if the Z-buffer method is not adopted. In this paper, isosurface models based on the assumption that the function value has a nonlinear distribution within a tetrahedron are proposed. A depth sorting algorithm and data structures are developed for irregular data sets in which cells may be subdivided into tetrahedra. Implementation issues of this algorithm are discussed, and experimental results are shown to illustrate the potential of the technique.
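The difference between the linear assumption and a nonlinear model can be seen on a single cell edge. The sketch below finds the isovalue crossing by inverse interpolation in the linear case and by bisection for a quadratic model fitted through an extra mid-edge sample; the quadratic model is an illustrative choice, not the paper's exact formulation.

```python
# Locating an isosurface crossing along a tetrahedron edge. With linear
# variation the crossing is a single inverse interpolation; under a
# nonlinear model (here quadratic, using a mid-edge sample) bisection
# finds the crossing parameter t in [0, 1].
import numpy as np

def crossing_linear(p0, p1, f0, f1, iso):
    t = (iso - f0) / (f1 - f0)
    return p0 + t * (p1 - p0)

def crossing_quadratic(p0, p1, f0, fm, f1, iso, tol=1e-9):
    # Lagrange quadratic through f(0)=f0, f(0.5)=fm, f(1)=f1
    f = lambda t: f0*(2*t - 1)*(t - 1) + fm*4*t*(1 - t) + f1*t*(2*t - 1)
    a, b = 0.0, 1.0
    assert (f(a) - iso) * (f(b) - iso) <= 0, "isovalue not bracketed"
    while b - a > tol:
        m = 0.5 * (a + b)
        if (f(a) - iso) * (f(m) - iso) <= 0:
            b = m
        else:
            a = m
    return p0 + 0.5 * (a + b) * (p1 - p0)

p0, p1 = np.array([0., 0., 0.]), np.array([1., 0., 0.])
print(crossing_linear(p0, p1, f0=0.0, f1=2.0, iso=1.0))             # midpoint
print(crossing_quadratic(p0, p1, f0=0.0, fm=1.4, f1=2.0, iso=1.0))  # skewed
```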
In this paper, we consider the problem of evaluating system reliability using statistical data obtained from reliability tests of the system's elements, where the lifetimes of elements are described by an exponential distribution. We assume that this lifetime data may be reported imprecisely and that this lack of precision may be described using fuzzy sets. As the direct application of the fuzzy sets methodology leads in this case to very complicated and time-consuming calculations, we propose simple approximations of fuzzy numbers using the shadowed sets introduced by Pedrycz (1998). The proposed methodology may be simply extended to the case of general lifetime probability distributions.
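The flavor of the calculation can be shown with alpha-cut interval propagation: for a series system of exponential components, R(t) = exp(-(λ1 + ... + λn)t) is monotone decreasing in each rate, so the endpoints of every alpha-cut map directly through R. The triangular fuzzy rates below are toy values, and the shadowed-set reduction step itself is not reproduced here.

```python
# Alpha-cut propagation of fuzzy failure rates through the series-system
# reliability R(t) = exp(-sum(lambda_i) * t). R is monotone decreasing in
# each rate, so interval endpoints map directly (high rates -> low R).
import math

def tri_alpha_cut(a, b, c, alpha):
    """Alpha-cut [lo, hi] of a triangular fuzzy number (a, b, c)."""
    return a + alpha * (b - a), c - alpha * (c - b)

def fuzzy_series_reliability(rates, t, alpha):
    lo = sum(tri_alpha_cut(*r, alpha)[0] for r in rates)
    hi = sum(tri_alpha_cut(*r, alpha)[1] for r in rates)
    return math.exp(-hi * t), math.exp(-lo * t)   # [R_lo, R_hi]

rates = [(0.8e-3, 1.0e-3, 1.3e-3), (0.4e-3, 0.5e-3, 0.7e-3)]  # per hour
for alpha in (0.0, 0.5, 1.0):
    print(alpha, fuzzy_series_reliability(rates, t=1000.0, alpha=alpha))
```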
Many classical clustering algorithms perform well when their prerequisites are met but do not scale when applied to very large data sets (VLDS). In this work, a novel division and partition clustering method (DP) is proposed to solve the problem. DP cuts the source data set into data blocks and extracts an eigenvector for each block to form the local feature set. The local feature set is then used in a second round of characteristics polymerization to find the global eigenvector of the source data. Finally, according to the global eigenvector, the data set is assigned by the criterion of minimum distance. Experimental results show that DP is more robust than conventional clustering algorithms. Its insensitivity to data dimensionality, distribution, and the number of natural clusters gives it a wide range of applications in clustering VLDS.
Funding: National Natural Science Foundation of China (Projects 60903082 and 60975042); Research Fund for the Doctoral Program of Higher Education of China (Project 20070217043).
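The divide-and-aggregate idea reads roughly as the sketch below: cluster each block locally, pool the block centroids as the local feature set, cluster those into global centers, and assign every point by minimum distance. k-means is used here as a stand-in for the paper's unspecified per-block eigenvector extraction.

```python
# Sketch of divide-and-aggregate clustering: per-block k-means produces
# local centroids, a second k-means over those centroids yields the global
# centers, and all points are assigned by minimum distance.
import numpy as np
from sklearn.cluster import KMeans

def dp_cluster(X, n_blocks=10, k_local=5, k_global=3, seed=0):
    local_feats = []
    for block in np.array_split(X, n_blocks):
        km = KMeans(n_clusters=k_local, n_init=10, random_state=seed).fit(block)
        local_feats.append(km.cluster_centers_)
    feats = np.vstack(local_feats)                       # local feature set
    km_g = KMeans(n_clusters=k_global, n_init=10, random_state=seed).fit(feats)
    centers = km_g.cluster_centers_                      # global centers
    d = np.linalg.norm(X[:, None, :] - centers[None], axis=2)
    return d.argmin(axis=1), centers                     # min-distance assign

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 0.3, size=(2000, 2)) for m in (0, 3, 6)])
labels, centers = dp_cluster(X)
print(centers.round(2))
```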
Recently, numerous studies have demonstrated that the physics-informed neural network (PINN) can effectively and accurately resolve hyperelastic finite deformation problems. In this paper, a PINN framework for tackling hyperelastic-magnetic coupling problems is proposed. Since the solution space consists of two-phase domains, two separate networks are constructed to independently predict the solution for each phase region. In addition, a deliberate point-allocation strategy is incorporated to enhance the prediction precision of the PINN in regions characterized by sharp gradients. With the developed framework, the magnetic fields and deformation fields of magnetorheological elastomers (MREs) are solved under the control of the hyperelastic-magnetic coupling equations. Illustrative examples are provided and contrasted with reference results to validate the predictive accuracy of the proposed framework. Moreover, the advantages of the framework in solving hyperelastic-magnetic coupling problems are validated, particularly its handling of small data sets and its ability to swiftly and precisely forecast magnetostrictive motion.
Funding: National Natural Science Foundation of China (Nos. 12072105 and 11932006).
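A minimal sketch of the two-network idea, on a 1D two-phase toy problem rather than the MRE coupling equations: one MLP per phase region, with PDE residual, boundary, and interface-continuity terms summed into one loss. All equations, coefficients, and names below are illustrative assumptions.

```python
# Two-network PINN sketch for a 1D two-phase toy problem: (k(x) u')' = 0
# with k = k1 on [0, 0.5], k = k2 on [0.5, 1], u(0)=0, u(1)=1, and
# continuity of u and of the flux k*u' at the interface x = 0.5.
import torch

def mlp():
    return torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                               torch.nn.Linear(32, 32), torch.nn.Tanh(),
                               torch.nn.Linear(32, 1))

u1, u2 = mlp(), mlp()                     # one network per phase region
k1, k2 = 1.0, 5.0
opt = torch.optim.Adam(list(u1.parameters()) + list(u2.parameters()), lr=1e-3)

def du(net, x):
    x = x.requires_grad_(True)
    u = net(x)
    return u, torch.autograd.grad(u.sum(), x, create_graph=True)[0]

for step in range(2000):
    x1 = torch.rand(64, 1) * 0.5          # collocation points, phase 1
    x2 = 0.5 + torch.rand(64, 1) * 0.5    # collocation points, phase 2
    _, g1 = du(u1, x1); _, g2 = du(u2, x2)
    r1 = torch.autograd.grad((k1 * g1).sum(), x1, create_graph=True)[0]
    r2 = torch.autograd.grad((k2 * g2).sum(), x2, create_graph=True)[0]
    xi = torch.tensor([[0.5]])
    ui1, gi1 = du(u1, xi.clone()); ui2, gi2 = du(u2, xi.clone())
    loss = (r1.pow(2).mean() + r2.pow(2).mean()
            + u1(torch.zeros(1, 1)).pow(2).mean()          # u(0) = 0
            + (u2(torch.ones(1, 1)) - 1).pow(2).mean()     # u(1) = 1
            + (ui1 - ui2).pow(2).mean()                    # continuity of u
            + (k1 * gi1 - k2 * gi2).pow(2).mean())         # flux continuity
    opt.zero_grad(); loss.backward(); opt.step()

print(float(loss))
```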
With an increasing number of scientific achievements being published, literature-based knowledge discovery and data mining are becoming particularly important. Flood, one of the most destructive natural disasters, has been the subject of numerous scientific publications. On January 1, 2018, we collected and processed literature data on flood research and categorized the retrieved paper records into a Whole SCI Dataset (WS) and a High-Citation SCI Dataset (HCS). These data sets serve as basic data for bibliometric analysis of the status of global flood research during 1990-2017. Our study shows that while the Chinese Academy of Sciences was the most productive institution during this period, the United States was the most productive country. In addition, our keyword analysis reveals potential popular issues and future trends in flood research.
Funding: National Key Research and Development Program of China (2016YFE0122600).
To evaluate the influence of data set noise, the network in network (NIN) model is introduced and the negative effects of different types and proportions of noise on deep convolutional models are studied. Different types and proportions of data noise are added to two reference data sets, Cifar-10 and Cifar-100. The noisy data are then used to train deep convolutional models, which classify the validation data set. The experimental results show that noise in the data set has obvious adverse effects on deep convolutional classification models: the adverse effects of random noise are small, but cross-category noise can significantly reduce the recognition ability of the model. Therefore, a solution is proposed to improve the quality of data sets contaminated with a single category of noise. The model trained on the noisy data set is used to evaluate the current training data and reclassify the categories of the anomalies, forming a new data set. Repeating these steps greatly reduces the noise ratio, so the influence of cross-category noise can be effectively avoided.
Funding: Science and Technology R&D Fund Project of Shenzhen (No. JCYJ2017081765149850).
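The cleaning loop described above can be sketched as: train on the noisy labels, re-predict the training set, reassign the labels of points where the model confidently disagrees, and repeat. Logistic regression on synthetic blobs stands in for the NIN model on Cifar, and the 0.9 confidence cutoff is an arbitrary choice.

```python
# Iterative label cleaning: train on noisy labels, re-predict the training
# set, relabel points where the model confidently disagrees, repeat.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression

X, y_true = make_blobs(n_samples=3000, centers=3, cluster_std=1.0,
                       random_state=0)
rng = np.random.default_rng(0)
y_noisy = y_true.copy()
flip = rng.random(len(y_noisy)) < 0.2            # 20% cross-category noise
y_noisy[flip] = (y_noisy[flip] + 1) % 3

y = y_noisy.copy()
for round_ in range(3):
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    proba = clf.predict_proba(X)
    pred = proba.argmax(axis=1)
    relabel = (proba.max(axis=1) > 0.9) & (pred != y)   # confident disagreement
    y[relabel] = pred[relabel]
    print(f"round {round_}: noise ratio = {(y != y_true).mean():.3f}")
```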
The Chaoshan depression, a Mesozoic basin in the Dongsha sea area of the northern South China Sea, is characterized by well-preserved Mesozoic strata, providing good conditions for oil and gas preservation and promising prospects for exploration. However, no breakthrough in oil and gas exploration of the Mesozoic strata has been achieved, owing to the scarcity of seismic surveys. New long-offset seismic data, acquired on a dense grid with a single source and a single cable, were processed with a 3D imaging method, and finer processing was performed to highlight the target strata. Combining the new imaging results with other geological information, we conducted an integrated interpretation and proposed exploratory well A-1-1 targeting potential hydrocarbons. The result provides a reliable basis for achieving breakthroughs in oil and gas exploration of the Mesozoic strata in the northern South China Sea.
Funding: Key Special Project for Introduced Talents Team of Southern Marine Science and Engineering Guangdong Laboratory (Guangzhou) (No. GML2019ZD0208); National Natural Science Foundation of China (No. 41606030); Science and Technology Program of Guangzhou (No. 202102080363); China Geological Survey projects (Nos. DD20190212 and DD20190216).
Rapid developments in telecommunication, sensor data, financial applications, and the analysis of data streams have increased the rate of data arrival, making data mining a vital process. The data analysis process consists of different tasks, among which data stream classification faces more challenges than other commonly used techniques. Even though classification is a continuous process, it requires a design that can adapt the classification model to concept change or boundary change between the classes. Hence, we design a novel fuzzy classifier, THRFuzzy, to classify new incoming data streams. Rough set theory, along with a tangential holoentropy function, helps in designing the dynamic classification model. The approach uses kernel fuzzy c-means (FCM) clustering to generate the rules and the tangential holoentropy function to update the membership function. The performance of THRFuzzy is verified on three data sets, namely the skin segmentation, localization, and breast cancer data sets, using accuracy and time as evaluation metrics and comparing against HRFuzzy and adaptive k-NN classifiers. The experimental results show that the THRFuzzy classifier achieves better classification results, providing the highest accuracy in the least time among the compared classifiers.
Funding: Board of Colleges and University Development, Savitribai Phule Pune University, Pune (proposal No. OSD/BCUD/392/197).
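The rule-generation ingredient, kernel fuzzy c-means, can be sketched with the standard Gaussian-kernel FCM updates; the tangential holoentropy membership update that distinguishes THRFuzzy is not reproduced here, and the data are synthetic.

```python
# Minimal Gaussian-kernel fuzzy c-means (standard KFCM updates): kernel
# distance d^2 = 2(1 - K), inverse-distance memberships, kernel-weighted
# center update. A stand-in for the clustering step only.
import numpy as np

def kfcm(X, c=2, m=2.0, sigma=1.0, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    V = X[rng.choice(len(X), c, replace=False)]          # initial centers
    for _ in range(n_iter):
        K = np.exp(-((X[:, None] - V[None]) ** 2).sum(-1) / (2 * sigma**2))
        d2 = np.maximum(2 * (1 - K), 1e-12)              # kernel distance^2
        U = d2 ** (-1 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)                # memberships (n x c)
        W = (U ** m) * K
        V = (W.T @ X) / W.sum(axis=0)[:, None]           # center update
    return U, V

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, .5, (200, 2)), rng.normal(3, .5, (200, 2))])
U, V = kfcm(X)
print(V.round(2))
```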
Vendor lock-in can occur at any layer of the cloud stack: Infrastructure-, Platform-, and Software-as-a-Service. This paper covers the vendor lock-in issue at the Platform as a Service (PaaS) level, where applications can be created, deployed, and managed without worrying about the underlying infrastructure. These applications and their persisted data on one PaaS provider are not easy to port to another provider. To overcome this issue, we propose a middleware that abstracts database services and makes them cloud-agnostic. The middleware supports several SQL and NoSQL data stores that can be hosted and ported among disparate PaaS providers, and it provides developers with data portability and data migration among relational and NoSQL-based cloud databases. NoSQL databases are fundamental to supporting Big Data applications, as they handle enormous volumes of highly variable data while assuring fault tolerance, availability, and scalability. The implementation shows that the middleware alleviates the effort of rewriting application code when changing the backend database system. A working prototype of a migration tool has been developed on this middleware to move existing data from a database on one cloud to a new database, even on a different cloud. Although the middleware adds some overhead compared with native code for the cloud services being used, an experimental evaluation on a Twitter data set (a Big Data application) shows that this overhead is negligible.
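The abstraction at the heart of such a middleware can be sketched as an adapter interface: application code targets one store interface, concrete adapters wrap a relational and a NoSQL-style backend, and a migrate helper moves records between any two. SQLite and an in-memory dict below are deliberately simple stand-ins for the real cloud data stores; all names are illustrative.

```python
# Adapter-pattern sketch of a cloud-agnostic data layer: one abstract
# interface, two concrete backends, and a migrate() that works across them.
import sqlite3
from abc import ABC, abstractmethod

class DataStore(ABC):
    @abstractmethod
    def put(self, key: str, value: str): ...
    @abstractmethod
    def get(self, key: str) -> str: ...
    @abstractmethod
    def keys(self): ...

class SQLStore(DataStore):
    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute("CREATE TABLE IF NOT EXISTS kv (k TEXT PRIMARY KEY, v TEXT)")
    def put(self, key, value):
        self.db.execute("INSERT OR REPLACE INTO kv VALUES (?, ?)", (key, value))
    def get(self, key):
        return self.db.execute("SELECT v FROM kv WHERE k=?", (key,)).fetchone()[0]
    def keys(self):
        return [r[0] for r in self.db.execute("SELECT k FROM kv")]

class DocumentStore(DataStore):     # in-memory stand-in for a NoSQL store
    def __init__(self):
        self.docs = {}
    def put(self, key, value): self.docs[key] = value
    def get(self, key): return self.docs[key]
    def keys(self): return list(self.docs)

def migrate(src: DataStore, dst: DataStore):
    for k in src.keys():
        dst.put(k, src.get(k))      # move records backend-to-backend

old = SQLStore(); old.put("user:1", '{"name": "a"}')
new = DocumentStore(); migrate(old, new)
print(new.get("user:1"))
```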
In this paper, we build a remote-sensing satellite imagery priori-information data set and propose an approach to evaluate the robustness of remote-sensing image feature detectors. The TH Priori-Information (TPI) data set, with 2,297 remote sensing images, serves as a standardized high-resolution data set for studies of remote-sensing image features. The TPI contains 1) raw and calibrated remote-sensing images with high spatial and temporal resolutions (up to 2 m and 7 days, respectively), and 2) a built-in 3-D target area model that supports view position, view angle, lighting, shadowing, and other transformations. Based on the TPI, we further present a quantized approach, comprising the feature recurrence rate, the feature match score, and the weighted feature robustness score, to evaluate the robustness of remote-sensing image feature detectors. The approach gives general and objective assessments of detector robustness under complex remote-sensing circumstances. Three feature detectors, scale-invariant feature transform (SIFT), speeded-up robust features (SURF), and priori information based robust features (PIRF), are evaluated with the proposed approach on the TPI data set. Experimental results show that the robustness of PIRF outperforms the others by over 6.2%.
Funding: National Key Research and Development Program of China (Grant 2018YFF0301205); National Natural Science Foundation of China (Grants 61925105 and 61801260).
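One plausible reading of a feature recurrence rate, the fraction of keypoints from a reference image that are matched again in a transformed view, can be computed with OpenCV as below. The synthetic image, rotation, and ratio-test threshold are assumptions; the paper's exact metric definitions are not reproduced.

```python
# A plausible recurrence-rate computation: detect SIFT keypoints in a
# reference image and a rotated view, match with a ratio test, and report
# the fraction of reference keypoints that recur.
import cv2
import numpy as np

# Synthetic textured image stands in for a satellite frame.
rng = np.random.default_rng(0)
img = cv2.GaussianBlur((rng.random((480, 640)) * 255).astype(np.uint8),
                       (0, 0), 3)
h, w = img.shape
M = cv2.getRotationMatrix2D((w / 2, h / 2), 15, 1.0)   # 15-degree view change
warped = cv2.warpAffine(img, M, (w, h))

sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img, None)
kp2, des2 = sift.detectAndCompute(warped, None)

matcher = cv2.BFMatcher(cv2.NORM_L2)
good = [m for m, n in matcher.knnMatch(des1, des2, k=2)
        if m.distance < 0.75 * n.distance]             # Lowe ratio test
print(f"recurrence rate: {len(good) / max(len(kp1), 1):.2%}")
```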
Recently, due to the rapid growth of data sensors, a massive volume of data is generated from many different sources. Storing, managing, analyzing, and extracting insightful information from such massive volumes of data is a challenging task. Big data analytics is becoming a vital research area in domains such as climate data analysis, which demand fast access to data. MapReduce, an open-source distributed computing framework, is now widely used in many domains of big data analysis. In our work, we have developed a conceptual data-modeling framework for implementing a hybrid data warehouse model that stores the features of National Climatic Data Center (NCDC) climate data. The hybrid data warehouse model for climate big data enables the identification of weather patterns applicable to agricultural and other climate-change-related studies, and it can play a major role in recommending actions for domain experts and making contingency plans for extreme cases of weather variability.
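The MapReduce access pattern over NCDC records can be illustrated with the classic per-year maximum-temperature job, here simulated in plain Python. The fixed-width field offsets follow the commonly taught NCDC record layout and are an assumption; the records themselves are generated synthetically.

```python
# Plain-Python simulation of the MapReduce flow on NCDC-style records:
# map emits (year, temperature), shuffle groups by key, reduce takes the
# per-year maximum. Field offsets follow the common NCDC teaching layout.
from collections import defaultdict

def make_record(year, temp_tenths, quality="1"):
    """Build a synthetic fixed-width NCDC-style record (layout assumed)."""
    line = [" "] * 94
    line[15:19] = year
    line[87:92] = f"{'+' if temp_tenths >= 0 else '-'}{abs(temp_tenths):04d}"
    line[92] = quality
    return "".join(line)

def map_fn(line):
    year, temp, q = line[15:19], line[87:92], line[92]
    if temp != "+9999" and q in "01459":          # skip missing/bad readings
        yield year, int(temp)

def reduce_fn(year, temps):
    yield year, max(temps)

records = [make_record("1950", 22), make_record("1950", -11),
           make_record("1949", 111)]
groups = defaultdict(list)
for line in records:                               # map + shuffle
    for k, v in map_fn(line):
        groups[k].append(v)
for year, temps in sorted(groups.items()):         # reduce
    for k, v in reduce_fn(year, temps):
        print(k, v)                                # 1949 111 / 1950 22
```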
The substantial vision loss due to Diabetic Retinopathy (DR) mainly stems from damage to the blood vessels of the retina. These vessel changes show no manifestation in the eye at the initial stage, and if the problem is not caught early it leads to permanent blindness; this type of disorder can therefore only be screened and identified through the processing of fundus images. The lesion stages in DR are Microaneurysms (Ma), Hemorrhages (HE), and Exudates, and the lesion stage indicates the likelihood of DR. To advance early detection of DR in the eye, we developed a CNN-based identification approach for fundus blood-lesion images. The approach uses a novel graph-cut-based background and foreground superpixel segmentation technique, and classification of the fundus image features is performed with hybrid classifiers: a K-Nearest Neighbor (KNN) classifier, a Support Vector Machine (SVM) classifier, and a Cascaded Rotation Forest (CRF) classifier. Cross-validation of the features makes the classification more accurate, and comparison with previous works on specificity, sensitivity, and accuracy shows that the hybrid classifier attains excellent performance, achieving an overall accuracy of 98%; among the three, the Cascaded Rotation Forest (CRF) classifier is the most accurate.
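The hybrid-classifier stage can be sketched with a soft-voting ensemble: KNN and SVM as named in the paper, plus a random forest standing in for the Cascaded Rotation Forest, which scikit-learn does not provide. The synthetic features below stand in for the extracted lesion features.

```python
# Hybrid-classifier sketch: soft-voting ensemble of KNN, SVM, and a random
# forest (stand-in for the Cascaded Rotation Forest), scored with 5-fold
# cross-validation on synthetic feature vectors.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=20, n_informative=8,
                           random_state=0)
hybrid = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=5)),
                ("svm", SVC(probability=True, random_state=0)),
                ("rf", RandomForestClassifier(random_state=0))],
    voting="soft")
scores = cross_val_score(hybrid, X, y, cv=5)
print(f"cross-validated accuracy: {scores.mean():.3f}")
```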
The high energetic particle package (HEPP) on board the China Seismo-Electromagnetic Satellite (CSES) was launched on February 2, 2018. The package includes three independent detectors: HEPP-H, HEPP-L, and HEPP-X. HEPP-H and HEPP-L detect energetic electrons from 100 keV to approximately 50 MeV and protons from 2 MeV to approximately 200 MeV. HEPP-X measures solar X-rays in the energy range from 1 keV to approximately 20 keV. The objective of the HEPP payload is to provide a survey of energetic particles with high energy, pitch angle, and time resolutions in order to gain new insight into the space radiation environment of the near-Earth system. In particular, the HEPP can provide new measurements of magnetic-storm-related precipitation of electrons in the slot region and of the dynamics of the radiation belts. In this paper, the HEPP scientific data sets are described and initial results are provided. The scientific data show variations in the flux of energetic particles during magnetic storms.
Funding: Institute of Crustal Dynamics, China Earthquake Administration (research grant No. ZDJ2017-20).
To improve the detection accuracy of the balise uplink signal transmitted in a strong noise environment, we use a chaotic oscillator to detect the uplink signal, exploiting the chaotic system's sensitivity to initial conditions and immunity to noise. Combining the principle of the Duffing oscillator system used in weak-signal detection with the features of the uplink signal, the method and steps for detecting the balise signal with a Duffing oscillator are presented. Furthermore, the Lyapunov exponent algorithm is used to calculate the critical threshold of the Duffing oscillator detection system, so that the output states of the system can be judged quantitatively to achieve demodulation of the balise signal. Simulation results show that this chaotic-oscillator detection method based on the Lyapunov exponent algorithm not only improves the accuracy and efficiency of threshold setting but also ensures the reliability of balise signal detection.
Funding: National Natural Science Foundation of China (No. 61763025).
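The detection principle can be sketched numerically: integrate a Holmes-type Duffing oscillator driven just below its chaotic-to-periodic transition, and watch the stroboscopic samples collapse when a weak in-band component is added to the drive. The drive amplitudes and the variance indicator below are illustrative toy values; the paper instead derives the critical threshold rigorously with a Lyapunov exponent algorithm.

```python
# Toy Duffing-oscillator detector: drive just below the chaotic-to-periodic
# transition; a weak in-band signal pushes the system across, and the
# stroboscopic (once-per-period) samples go from scattered to fixed.
import numpy as np

def duffing_indicator(drive_amp, weak_amp=0.0, delta=0.5,
                      steps_per_period=600, periods=300):
    dt = 2 * np.pi / steps_per_period
    x, v = 0.1, 0.0
    strobe = []
    def acc(x, v, t):
        # Holmes-type Duffing: x'' + delta*x' - x + x^3 = forcing;
        # the weak signal shares the drive frequency, so amplitudes add.
        return (drive_amp + weak_amp) * np.cos(t) - delta * v + x - x**3
    for i in range(steps_per_period * periods):
        t = i * dt
        # classical RK4 step for the (x, v) system
        k1x, k1v = v, acc(x, v, t)
        k2x, k2v = v + dt/2*k1v, acc(x + dt/2*k1x, v + dt/2*k1v, t + dt/2)
        k3x, k3v = v + dt/2*k2v, acc(x + dt/2*k2x, v + dt/2*k2v, t + dt/2)
        k4x, k4v = v + dt*k3v, acc(x + dt*k3x, v + dt*k3v, t + dt)
        x += dt/6 * (k1x + 2*k2x + 2*k3x + k4x)
        v += dt/6 * (k1v + 2*k2v + 2*k3v + k4v)
        if i % steps_per_period == 0 and i > steps_per_period * periods // 2:
            strobe.append(x)       # Poincare sample, once per drive period
    return np.std(strobe)          # ~0 for periodic motion, large for chaos

print("no signal:  ", duffing_indicator(0.824))        # chaotic regime
print("weak signal:", duffing_indicator(0.824, 0.01))  # pushed periodic
```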