Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time-series data into images for analysis have been studied. This paper proposes a fault detection model that uses time-series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. Data augmentation is performed by adding Gaussian noise, with the noise level set to 0.002, to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to convert the time-series data into images while effectively visualizing the dynamic transitions of the data. This enables the identification of patterns in the time series and assists in capturing their sequential dependencies. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomalous areas are represented as heat maps. By applying an anomaly map to the original image, it is possible to localize the areas where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when the time-series data are converted to images. Additionally, when processed as images rather than as raw time series, both the data size and the training time are significantly reduced. The proposed method can provide an important springboard for research in anomaly detection using time-series data, and it helps keep the analysis of complex patterns in the data lightweight.
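As an illustration of the two preprocessing steps described above, the following is a minimal Python sketch, not the authors' implementation: it adds Gaussian noise at the stated level of 0.002 and builds a Markov Transition Field image from scratch with NumPy. The bin count, function names and toy signal are assumptions made for the example.

```python
import numpy as np

def augment_with_noise(x, noise_level=0.002, rng=None):
    """Add zero-mean Gaussian noise to a 1-D series (augmentation step)."""
    rng = np.random.default_rng() if rng is None else rng
    return x + rng.normal(0.0, noise_level, size=x.shape)

def markov_transition_field(x, n_bins=8):
    """Convert a 1-D series into an MTF image of shape (len(x), len(x))."""
    # Assign each sample to a quantile bin.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)                          # bin index of every sample
    # First-order Markov transition matrix between bins.
    W = np.zeros((n_bins, n_bins))
    for a, b in zip(q[:-1], q[1:]):
        W[a, b] += 1
    W /= np.maximum(W.sum(axis=1, keepdims=True), 1)  # row-normalise
    # MTF entry (i, j) is the transition probability between the bins of steps i and j.
    return W[np.ix_(q, q)]

# Toy example: augment a signal and convert it to an image for a PatchCore-style detector.
signal = np.sin(np.linspace(0, 8 * np.pi, 256))
image = markov_transition_field(augment_with_noise(signal), n_bins=8)
print(image.shape)   # (256, 256)
```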
To compare finite element analysis (FEA) predictions and stereovision digital image correlation (StereoDIC) strain measurements at the same spatial positions throughout a region of interest, a field comparison procedure is developed. The procedure includes (a) conversion of the finite element data into a triangular mesh, (b) selection of a common coordinate system, (c) determination of the rigid body transformation that places both the measurements and the FEA data in the same system, and (d) interpolation of the FEA nodal information to the same spatial locations as the StereoDIC measurements using barycentric coordinates. For an aluminum Al-6061 double edge notched tensile specimen, FEA results are obtained using both the von Mises isotropic yield criterion and Hill's quadratic anisotropic yield criterion, with the unknown Hill model parameters determined using full-field specimen strain measurements for the nominally plane stress specimen. Using Hill's quadratic anisotropic yield criterion, the point-by-point comparisons of experimentally based full-field strains and stresses to the finite element predictions are shown to be in excellent agreement, confirming the effectiveness of the field comparison process.
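Step (d) can be illustrated with a small sketch of barycentric interpolation within a single triangular element; the vertex coordinates, nodal strain values and query point below are made-up placeholders, not data from the study.

```python
import numpy as np

def barycentric_interpolate(tri, nodal_vals, p):
    """Interpolate nodal values of one triangle (3x2 vertex array) at point p."""
    a, b, c = tri
    # Solve for barycentric coordinates (l1, l2, l3) with l1 + l2 + l3 = 1.
    T = np.array([[b[0] - a[0], c[0] - a[0]],
                  [b[1] - a[1], c[1] - a[1]]])
    l2, l3 = np.linalg.solve(T, np.asarray(p) - a)
    l1 = 1.0 - l2 - l3
    return l1 * nodal_vals[0] + l2 * nodal_vals[1] + l3 * nodal_vals[2]

# One triangular element with nodal strains, evaluated at a DIC measurement point.
triangle = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
strains  = np.array([0.0010, 0.0014, 0.0012])     # e.g. exx at the three nodes
print(barycentric_interpolate(triangle, strains, (0.25, 0.25)))
```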
This paper focuses on the integration and data transformation between GPS and total station. It emphasizes the way to transform WGS84 Cartesian coordinates into local two-dimensional plane coordinates and orthometric heights. A GPS receiver, total station, radio, notebook computer and the corresponding software work together to form a new surveying system, the super-total station positioning system (SPS), and a new surveying model for terrestrial surveying. With the help of this system, the positions of detail points can be measured.
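As a rough illustration of one part of that coordinate transfer, the sketch below rotates a WGS84 Cartesian (ECEF) point into a local east-north-up frame around a reference station. The full SPS workflow also involves a map projection and a geoid model for orthometric heights, which are not reproduced here, and the coordinates used are hypothetical.

```python
import numpy as np

def ecef_to_enu(xyz, xyz_ref, lat_deg, lon_deg):
    """Rotate a WGS84 Cartesian (ECEF) point into a local east-north-up frame
    centred on a reference station at geodetic latitude/longitude lat, lon."""
    lat, lon = np.radians([lat_deg, lon_deg])
    # Rotation from ECEF axes to the local tangent-plane axes.
    R = np.array([
        [-np.sin(lon),               np.cos(lon),              0.0],
        [-np.sin(lat) * np.cos(lon), -np.sin(lat) * np.sin(lon), np.cos(lat)],
        [ np.cos(lat) * np.cos(lon),  np.cos(lat) * np.sin(lon), np.sin(lat)],
    ])
    return R @ (np.asarray(xyz) - np.asarray(xyz_ref))

# Hypothetical GPS point and reference station (metres in ECEF).
enu = ecef_to_enu([-2694045.0, 4293642.0, 3857878.0],
                  [-2694044.0, 4293641.0, 3857877.0], 37.4, 122.0)
print(enu)   # east, north, up offsets; E/N act as local plane coordinates
```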
In order to use data information on the Internet, it is necessary to extract data from web pages. An HTT tree model representing HTML pages is presented. Based on the HTT model, a wrapper generation algorithm, AGW, is proposed. The AGW algorithm utilizes a comparing-and-correcting technique to generate the wrapper, exploiting the native characteristics of the HTT tree structure. The AGW algorithm can not only generate the wrapper automatically, but also rebuild the data schema easily and reduce the complexity of the computation.
Geo-data is a foundation for the prediction and assessment of ore resources, so managing and making full use of those data, including the geography, geology, mineral deposits, aeromagnetics, gravity, geochemistry and remote sensing databases, is very significant. We developed the national important mining zone database (NIMZDB) to manage 14 national important mining zone databases to support a new round of ore deposit prediction. We found that attention should be paid to the following issues: ① data accuracy: integrity, logic consistency, and attribute, spatial and temporal accuracy; ② management of both attribute and spatial data in the same system; ③ transforming data between MapGIS and ArcGIS; ④ data sharing and security; ⑤ data searches that can query both attribute and spatial data. The accuracy of input data is guaranteed, and the search, analysis and translation of data between MapGIS and ArcGIS have been made convenient via the development of a data-checking module and a data-managing module based on MapGIS and ArcGIS. Using ArcSDE, we based data sharing on a client/server system, and attribute and spatial data are also managed in the same system.
Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. Considering their applications to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM), based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performance of these three methods for different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the average proportion of non-overlap (APN) index. The results indicate that the three methods tend to be inconsistent in the optimal number of clusters. EM performed relatively better at avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log-transformation, had substantial influences on the clustering results, especially for KM. Moreover, transformation also influenced clustering stability, wherein scaling tended to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size, and the effect leveled off above 70 samples in general, and most quickly for EM. We conclude that the best clustering method can be chosen depending on the aim of the study and the number of clusters; in general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps ensure the credibility of the application and interpretation of clustering methods.
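A minimal scikit-learn sketch of how such a comparison might be organized is given below; the synthetic abundance matrix, cluster count and transformations are placeholders, and the APN consistency index itself is not computed here.

```python
import numpy as np
from sklearn.preprocessing import scale
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
abundance = rng.lognormal(mean=1.0, sigma=1.0, size=(80, 12))  # stations x species

# The transformations compared in the study: scaling, square-root, log.
variants = {
    "scaled": scale(abundance),
    "sqrt":   np.sqrt(abundance),
    "log":    np.log(abundance + 1.0),
}

k = 4  # number of clusters under comparison
for name, X in variants.items():
    hc = AgglomerativeClustering(n_clusters=k).fit_predict(X)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    em = GaussianMixture(n_components=k, random_state=0).fit_predict(X)
    # Inspect how balanced each partition is under this transformation.
    print(name, np.bincount(hc), np.bincount(km), np.bincount(em))
```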
Data transformation is the core process in migrating a database from a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from a relational database to a NoSQL database. A number of schema transformation techniques have been proposed to improve the data transformation process, and they result in better query processing time compared to the relational database. However, these approaches produce redundant tables in the resulting schema, which in turn consume large amounts of unnecessary storage and yield high query processing times due to the redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from a relational database to a column-oriented database is proposed. The proposed schema transformation technique is based on the combination of a denormalization approach, data access patterns and a multiple-nested schema. In order to validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, against which the query processing time and the storage size are compared. Based on the experimental results, the proposed transformation technique shows significant improvement in terms of query processing time and storage space usage due to the reduced number of column families in the column-oriented database.
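The following is a small, hypothetical sketch of the denormalization/nesting idea: parent and child relational rows are merged into one nested document per parent before loading into MongoDB. The table and field names are invented, and the paper's access-pattern-driven rules are not reproduced.

```python
# Denormalization: join the parent row with its child rows and embed them as a
# nested array, so one document replaces two relational tables.
order_rows = [{"order_id": 1, "customer": "alice", "total": 42.0}]
item_rows  = [{"order_id": 1, "sku": "A-100", "qty": 2},
              {"order_id": 1, "sku": "B-200", "qty": 1}]

def to_documents(orders, items):
    """Produce one nested document per order (multiple-nested schema)."""
    by_order = {}
    for it in items:
        by_order.setdefault(it["order_id"], []).append(
            {k: v for k, v in it.items() if k != "order_id"})
    return [dict(o, items=by_order.get(o["order_id"], [])) for o in orders]

docs = to_documents(order_rows, item_rows)
# docs could then be written with pymongo, e.g. db.orders.insert_many(docs)
print(docs[0])
```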
Metabolomics, as a research field and a set of techniques, studies the entire set of small molecules in biological samples. Metabolomics is emerging as a powerful tool for precision medicine in general. In particular, integration of the microbiome and the metabolome has revealed the mechanisms and functionality of the microbiome in human health and disease. However, metabolomics data are very complicated. Preprocessing/pretreatment and normalization procedures are usually required before statistical analysis of metabolomics data. In this review article, we comprehensively review the various methods used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing, dealing with zero and/or missing values and detecting outliers, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical data analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
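As one common recipe among the many methods reviewed, the sketch below chains half-minimum imputation of zeros and missing values, a log transformation, mean centering and autoscaling. The intensity matrix is made up, and the particular combination of steps is illustrative, not a recommendation from the article.

```python
import numpy as np

def preprocess(X):
    """Toy preprocessing chain: impute zeros/missing values, log-transform,
    then centre and autoscale each metabolite (column)."""
    X = np.array(X, dtype=float)
    # Replace zeros / NaNs with half of the smallest observed positive value.
    positive_min = np.nanmin(np.where(X > 0, X, np.nan))
    X = np.where((X <= 0) | np.isnan(X), positive_min / 2.0, X)
    X = np.log2(X)                        # variance-stabilising transformation
    X = X - X.mean(axis=0)                # mean centring
    return X / X.std(axis=0, ddof=1)      # unit-variance (auto) scaling

intensities = [[1200.0,  0.0, 35.0],
               [ 980.0, 15.0, np.nan],
               [1110.0, 22.0, 40.0]]
print(preprocess(intensities))
```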
Data standardization is an important part of data preprocessing, and it directly affects the feasibility and discriminative power of an indicator system. This study proposes a data standardization framework that achieves flexible data standardization through data feature identification, cluster analysis, and weighted data transformation. The proposed method can handle locally inflated distributions with long tails. The results of this study enrich the method library of data standardization, giving researchers more targeted data differentiation capabilities when establishing indicator systems.
The multivariate AZTI's Marine Biotic Index (M-AMBI) was designed to indicate the ecological status of European coastal areas. Based upon samples collected from 2009 to 2012 in the Bohai Bay, we tested the response of variations of M-AMBI, using biomass in the calculations (M-BAMBI), with different transformations of the raw data. The results showed that the ecological quality of most areas in the study, as indicated by M-AMBI, ranged from moderate to bad, with worse status in the coastal areas, especially around the estuaries, harbors and outfalls, and better status in the offshore areas, except for the areas close to oil platforms or disposal sites. Despite large variations in the nature of the input data, all variations of M-AMBI gave similar spatial and temporal distribution patterns of the ecological status within the bay and showed high correlation between them. The agreement of the new ecological status obtained from all M-AMBI variations, calculated according to linear regression, was almost perfect. The benthic quality, assessed using different input data, could be related to human pressures in the bay, such as water discharges, land reclamation, and dredged sediment and drill cuttings disposal sites. It seems that M-BAMBI was more effective than M-NABMI (M-AMBI calculated using abundance data) in indicating human pressures in the bay. Finally, indices calculated with more severe transformations, such as presence/absence data, could not indicate the higher density of human pressures in the coastal areas of the northern part of our study area, but those calculated using a mild transformation (i.e., square root) did.
This paper describes a new type of transformed Landsat image (LBV images) and its application in discriminating soil gleization in the subtropical region of China. The LBV transformation was worked out by the present author for extracting useful information from original Landsat images. Using this method, three black-and-white images, the L image, B image and V image, were computer-generated from the original bands of a Landsat scene covering a large area of 34,528 km2 in Hubei and Hunan provinces in south China. A color composite was then produced from these three images. These black-and-white and color images contained rich and definite geographic information. Through field work, the relationship between the colors on the composite and the land use/cover categories on the ground was established; 37 composite colors and 70 ground feature categories could be discriminated altogether. Finally, 17 land use/cover categories and 10 subregions suffering from soil gleization were determined, and the gleization area for the study area was estimated to be 731.3 km2.
The 2011 Tohoku-oki earthquake, which occurred on 11 March 2011, was a great earthquake with a seismic magnitude of Mw 9.1, and it was preceded by an Mw 7.5 earthquake. Focusing on this great earthquake event, we applied the Hilbert-Huang transform (HHT) analysis method to one-second-interval records at seven superconducting gravimeter (SG) stations and seven broadband seismic (BS) stations to carry out spectrum analysis and compute the energy-frequency-time distribution. Tidal effects were removed from the SG data by the T-soft software before the data series were transformed by the HHT method. Based on the HHT spectra and the marginal spectra from the records at the selected seven SG stations and seven BS stations, we found anomalous signals in terms of energy. The dominant frequencies of the anomalous signals are about 0.13 Hz in the SG records and 0.2 Hz in the seismic data, and the anomalous signals occurred one week, or two to three days, prior to the event. Taking into account that no typhoon event occurred in this period, we may conclude that these anomalous signals might be related to the great earthquake event.
Because of cloudy and rainy weather in south China, optical remote sensing images often cannot be obtained easily. Using the regional trial results in Baoying, Jiangsu province, this paper explored the fusion model and effect of ENVISAT/SAR and HJ-1A satellite multispectral remote sensing images. Based on the ARSIS strategy, using the wavelet transform and the Interaction between the Band Structure Model (IBSM), the research performed wavelet decomposition of the ENVISAT satellite SAR and HJ-1A satellite CCD images and low/high frequency coefficient reconstruction, and obtained the fusion images through the inverse wavelet transform. Because the low- and high-frequency images have different characteristics in different areas, different fusion rules that can self-adaptively enhance the integration process were adopted, with comparisons against the PCA transformation, IHS transformation and other traditional methods by subjective and corresponding quantitative evaluation. Furthermore, the research extracted the bands and NDVI values around the fusion with GPS samples, and analyzed and explained the fusion effect. The results showed that the spectral distortion of the wavelet fusion, IHS transform and PCA transform images was 0.1016, 0.3261 and 1.2772, respectively, and the entropy was 14.7015, 11.8993 and 13.2293, respectively; the wavelet fusion scored the best. The wavelet method maintained good spectral capability and visual effects while improving the spatial resolution, and the information interpretation effect was much better than that of the other two methods.
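A toy sketch of wavelet-based fusion in the same spirit (approximation coefficients from the multispectral band, detail coefficients from the SAR image) is shown below, assuming the PyWavelets package is available; it does not implement the paper's IBSM-based self-adaptive fusion rules.

```python
import numpy as np
import pywt

def wavelet_fuse(sar, ms_band, wavelet="db2", level=2):
    """Toy ARSIS-style fusion: keep the multispectral approximation (spectral
    content) and inject the SAR detail coefficients (spatial content)."""
    c_sar = pywt.wavedec2(sar, wavelet, level=level)
    c_ms  = pywt.wavedec2(ms_band, wavelet, level=level)
    fused = [c_ms[0]] + c_sar[1:]          # low-freq from MS, high-freq from SAR
    return pywt.waverec2(fused, wavelet)

sar_img = np.random.rand(128, 128)         # placeholder ENVISAT SAR patch
ms_img  = np.random.rand(128, 128)         # placeholder HJ-1A CCD band
print(wavelet_fuse(sar_img, ms_img).shape)
```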
Automatic visualization generates meaningful visualizations to support data analysis and pattern finding for novice or casual users who are not familiar with visualization design. Current automatic visualization approaches rely mainly on aggregation and filtering to extract patterns from the original data. However, these limited data transformations fail to capture complex patterns such as clusters and correlations. Although recent advances in feature engineering provide the potential for more kinds of automatic data transformations, the auto-generated transformations lack explainability concerning how patterns are connected with the original features. To tackle these challenges, we propose a novel explainable recommendation approach for extended kinds of data transformations in automatic visualization. We summarize the space of feasible data transformations and measures of the explainability of transformation operations with a literature review and a pilot study, respectively. A recommendation algorithm is designed to compute optimal transformations, which can reveal specified types of patterns and maintain explainability. We demonstrate the effectiveness of our approach through two cases and a user study.
VOFilter is an XML-based filter developed by the Chinese Virtual Observatory project to transform tabular data files from VOTable format into OpenDocument format. VOTable is an XML format defined for the exchange of tabular data in the context of the Virtual Observatory (VO). It is the first Proposed Recommendation defined by the International Virtual Observatory Alliance and has obtained wide support from both the VO community and many astronomy projects. OpenOffice.org is a mature, open-source office application suite with the advantage of native support for the industry-standard OpenDocument XML file format. Using the VOFilter, VOTable files can be loaded in OpenOffice.org Calc, a spreadsheet application, and then displayed and analyzed like other spreadsheet files. Here, the VOFilter acts as a connector, bridging the coming VO with current industrial office applications. We introduce the Virtual Observatory and the technical background of the VOFilter. Its workflow, installation and usage are presented. Existing problems and limitations are also discussed, together with future development plans.
Ontologies are increasingly deployed as a computer-accessible representation of key semantics in various parts of a data life cycle and, thus, ontology dynamics may pose challenges to data management and re-use. Using examples from the field of geosciences, we analyze challenges raised by ontology dynamics, such as heavy reworking of data, semantic heterogeneity among data providers and users, and error propagation in cross-discipline data discovery and re-use. We also make recommendations to address these challenges: (1) build communities of practice on ontologies to reduce inconsistency and duplicated effort; (2) use ontologies in the procedure of data collection and make them accessible to data users; and (3) seek methods to speed up the reworking of data in a Semantic Web context.
The change in the scattering data for solutions of su(2) soliton systems that are related by a classical Darboux transformation (CDT) is obtained. It is shown how a CDT creates and erases a soliton.
In view of the current difficulty in predicting the cost data of power transmission and transformation projects, a method based on the Pearson correlation coefficient, improved particle swarm optimization (IPSO) and the extreme learning machine (ELM) is proposed. In this paper, the Pearson correlation coefficient is used to screen out the main influencing factors as the input-independent variables of the ELM algorithm, and IPSO based on a ladder-structure coding method is used to optimize the number of hidden-layer nodes, input weights and bias values of the ELM. A prediction model for the cost data of power transmission and transformation projects based on the Pearson correlation coefficient-IPSO-ELM algorithm is thereby constructed. Through the analysis of calculation examples, it is shown that the prediction accuracy of the proposed method is higher than that of other algorithms, which verifies the effectiveness of the model.
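A simplified sketch of the two ingredients is given below: Pearson-based screening of influencing factors and a basic ELM fitted by least squares. The IPSO optimization of the hidden layer is replaced here by fixed random weights, and the data, threshold and network size are arbitrary assumptions.

```python
import numpy as np

def pearson_screen(X, y, threshold=0.3):
    """Keep factor columns whose |Pearson r| with the cost exceeds a threshold."""
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return np.where(np.abs(r) >= threshold)[0]

def elm_fit(X, y, n_hidden=20, rng=np.random.default_rng(0)):
    """Basic extreme learning machine: random hidden layer, least-squares output weights."""
    W = rng.normal(size=(X.shape[1], n_hidden))      # input weights (IPSO would tune these)
    b = rng.normal(size=n_hidden)                    # biases
    H = np.tanh(X @ W + b)
    beta = np.linalg.pinv(H) @ y                     # output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

X = np.random.rand(100, 8)                           # candidate influencing factors (placeholder)
y = X[:, 0] * 3 + X[:, 2] + np.random.rand(100) * 0.1   # synthetic project cost
cols = pearson_screen(X, y)
W, b, beta = elm_fit(X[:, cols], y)
print(cols, np.abs(elm_predict(X[:, cols], W, b, beta) - y).mean())
```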
For the transformation F(a(s)) = a(s)/d^s, this paper brings out two ways to determine the parameter d. In the first, d is the arithmetic average of the inverses of the class ratios, namely d1 = (1/(n-1))·(y^(0)(2)/y^(0)(1) + y^(0)(3)/y^(0)(2) + y^(0)(4)/y^(0)(3) + … + y^(0)(n)/y^(0)(n-1)). In the second, d is the geometric average of the inverses of the class ratios, namely d2 = (y^(0)(n)/y^(0)(1))^(1/(n-1)). Through theoretical proof, we find that the model given by this paper has the white exponential law coincidence property; that is to say, if the model is built for a purely exponential sequence, the result has no model error. At the same time, this paper points out that the requirement d > 1 is not appropriate for a monotonically decreasing series, and the precision of the model will even decrease.
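A small numerical illustration of the two choices of d, computed on a made-up series, might look like this:

```python
import numpy as np

y = np.array([10.0, 12.0, 14.5, 17.3, 20.9])    # hypothetical raw series y^(0)

ratios = y[1:] / y[:-1]                          # inverses of the class ratios
d1 = ratios.mean()                               # arithmetic-average choice for d
d2 = (y[-1] / y[0]) ** (1.0 / (len(y) - 1))      # geometric-average choice for d

s = np.arange(1, len(y) + 1)
print(d1, d2, y / d1**s)                         # transformed series F(a(s)) = a(s)/d^s
```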