Journal Articles
Found 18 articles
1. Performances of Clustering Methods Considering Data Transformation and Sample Size: An Evaluation with Fisheries Survey Data
Authors: WO Jia, ZHANG Chongliang, XU Binduo, XUE Ying, REN Yiping. Journal of Ocean University of China (SCIE, CAS, CSCD), 2020, No. 3, pp. 659-668 (10 pages)
Clustering is a group of unsupervised statistical techniques commonly used in many disciplines. Considering their applications to fish abundance data, many technical details need to be considered to ensure reasonable interpretation. However, the reliability and stability of clustering methods have rarely been studied in the context of fisheries. This study presents an intensive evaluation of three common clustering methods, including hierarchical clustering (HC), K-means (KM), and expectation-maximization (EM) methods, based on fish community surveys in the coastal waters of Shandong, China. We evaluated the performances of these three methods considering different numbers of clusters, data sizes, and data transformation approaches, focusing on consistency validation using the index of average proportion of non-overlap (APN). The results indicate that the three methods tend to be inconsistent in the optimal number of clusters. EM showed relatively better performance in avoiding unbalanced classification, whereas HC and KM provided more stable clustering results. Data transformations, including scaling, square-root, and log-transformation, had substantial influences on the clustering results, especially for KM. Moreover, transformation also influenced clustering stability, wherein scaling tended to provide a stable solution at the same number of clusters. The APN values indicated improved stability with increasing data size, and the effect leveled off over 70 samples in general and most quickly in EM. We conclude that the best clustering method can be chosen depending on the aim of the study and the number of clusters. In general, KM was relatively robust in our tests. We also provide recommendations for future applications of clustering analyses. This study helps ensure the credibility of the application and interpretation of clustering methods.
Keywords: hierarchical clustering, K-means clustering, expectation-maximization clustering, optimal number of clusters, stability, data transformation
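The three transformations this study compares (scaling, square-root, and log) can be sketched in a few lines. The function names and the toy abundance vector below are illustrative, not from the paper:

```python
import math

def sqrt_transform(x):
    """Square-root transform: compresses large abundance values."""
    return [math.sqrt(v) for v in x]

def log_transform(x, offset=1.0):
    """log(x + offset) transform; the offset avoids log(0) for absent species."""
    return [math.log(v + offset) for v in x]

def zscore_scale(x):
    """Scaling (standardization): zero mean, unit variance."""
    mean = sum(x) / len(x)
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    return [(v - mean) / sd for v in x]

abundance = [0.0, 4.0, 9.0, 100.0]  # hypothetical catch counts for one species
print(sqrt_transform(abundance))    # [0.0, 2.0, 3.0, 10.0]
print(zscore_scale(abundance))      # centered values, dominated by the 100.0 outlier
```

Each choice changes the distances a clustering algorithm sees, which is why the paper finds KM especially sensitive to the transformation applied.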
2. Defect Detection Model Using Time Series Data Augmentation and Transformation
Authors: Gyu-Il Kim, Hyun Yoo, Han-Jin Cho, Kyungyong Chung. Computers, Materials & Continua (SCIE, EI), 2024, No. 2, pp. 1713-1730 (18 pages)
Time-series data provide important information in many fields, and their processing and analysis have been the focus of much research. However, detecting anomalies is very difficult due to data imbalance, temporal dependence, and noise. Therefore, methodologies for data augmentation and for converting time series data into images for analysis have been studied. This paper proposes a fault detection model that uses time series data augmentation and transformation to address the problems of data imbalance, temporal dependence, and robustness to noise. The data augmentation method is the addition of noise: Gaussian noise, with the noise level set to 0.002, is added to maximize the generalization performance of the model. In addition, we use the Markov Transition Field (MTF) method to effectively visualize the dynamic transitions of the data while converting the time series data into images. It enables the identification of patterns in time series data and assists in capturing their sequential dependencies. For anomaly detection, the PatchCore model is applied and shows excellent performance, and the detected anomaly areas are represented as heat maps. By applying an anomaly map to the original image, it is possible to capture the areas where anomalies occur. The performance evaluation shows that both F1-score and accuracy are high when time series data are converted to images. Additionally, when processed as images rather than as time series, both the size of the data and the training time were significantly reduced. The proposed method can provide an important springboard for research in the field of anomaly detection using time series data, and it helps solve problems such as analyzing complex patterns in data in a lightweight manner.
Keywords: defect detection, time series, deep learning, data augmentation, data transformation
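The noise-addition augmentation the abstract describes is straightforward to sketch. The 0.002 noise level is from the paper; the seed and the sample series are ours:

```python
import random

def augment_with_gaussian_noise(series, sigma=0.002, seed=42):
    """Return a noisy copy of a time series; sigma is the noise level
    (0.002 in the paper). A fixed seed keeps the sketch reproducible."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in series]

signal = [0.1, 0.2, 0.15, 0.3]
noisy = augment_with_gaussian_noise(signal)
# The noisy copy is a new training sample close to, but not equal to, the original.
```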
3. Explainable data transformation recommendation for automatic visualization (Cited: 1)
Authors: Ziliang WU, Wei CHEN, Yuxin MA, Tong XU, Fan YAN, Lei LV, Zhonghao QIAN, Jiazhi XIA. Frontiers of Information Technology & Electronic Engineering (SCIE, EI, CSCD), 2023, No. 7, pp. 1007-1027 (21 pages)
Automatic visualization generates meaningful visualizations to support data analysis and pattern finding for novice or casual users who are not familiar with visualization design. Current automatic visualization approaches rely mainly on aggregation and filtering to extract patterns from the original data. However, these limited data transformations fail to capture complex patterns such as clusters and correlations. Although recent advances in feature engineering provide the potential for more kinds of automatic data transformations, the auto-generated transformations lack explainability concerning how patterns are connected with the original features. To tackle these challenges, we propose a novel explainable recommendation approach for extended kinds of data transformations in automatic visualization. We summarize the space of feasible data transformations and measures of the explainability of transformation operations with a literature review and a pilot study, respectively. A recommendation algorithm is designed to compute optimal transformations, which can reveal specified types of patterns and maintain explainability. We demonstrate the effectiveness of our approach through two cases and a user study.
Keywords: data transformation, data transformation recommendation, automatic visualization, explainability
4. An Efficient Schema Transformation Technique for Data Migration from Relational to Column-Oriented Databases
Authors: Norwini Zaidi, Iskandar Ishak, Fatimah Sidi, Lilly Suriani Affendey. Computer Systems Science & Engineering (SCIE, EI), 2022, No. 12, pp. 1175-1188 (14 pages)
Data transformation is the core process in migrating a database from a relational database to a NoSQL database such as a column-oriented database. However, there is no standard guideline for data transformation from a relational database to a NoSQL database. A number of schema transformation techniques have been proposed to improve the data transformation process, and they have achieved better query processing times than the relational database. However, these approaches produce redundant tables in the resulting schema, which in turn consume large, unnecessary storage and yield high query processing times due to the redundant column families in the transformed column-oriented database. In this paper, an efficient data transformation technique from relational databases to column-oriented databases is proposed. The proposed schema transformation technique is based on the combination of a denormalization approach, data access patterns, and a multiple-nested schema. To validate the proposed work, the technique is implemented by transforming data from a MySQL database to a MongoDB database. A benchmark transformation technique is also performed, and the query processing time and storage size are compared. Based on the experimental results, the proposed transformation technique shows significant improvement in query processing time and storage space usage due to the reduced number of column families in the column-oriented database.
Keywords: data migration, data transformation, column-oriented database, relational database, big data
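The denormalization step such techniques build on can be illustrated by embedding child rows inside parent documents, trading join cost for read-optimized nesting. The collection and field names below are invented, not the paper's schema:

```python
def denormalize(customers, orders):
    """Embed each customer's orders as a nested array, removing the need
    for a join at query time (one common relational-to-document mapping)."""
    by_customer = {}
    for o in orders:
        by_customer.setdefault(o["customer_id"], []).append(
            {"order_id": o["order_id"], "total": o["total"]})
    return [
        {"_id": c["id"], "name": c["name"],
         "orders": by_customer.get(c["id"], [])}
        for c in customers
    ]

customers = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]
orders = [{"order_id": 10, "customer_id": 1, "total": 5.0},
          {"order_id": 11, "customer_id": 1, "total": 7.5}]
docs = denormalize(customers, orders)
# docs[0] bundles Ada with both of her orders; docs[1] has an empty order list
```

The paper's contribution is choosing *which* tables to nest, guided by data access patterns, so that the resulting schema avoids redundant column families; this sketch shows only the mechanical nesting.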
5. Direct Pointwise Comparison of FE Predictions to StereoDIC Measurements: Developments and Validation Using Double Edge-Notched Tensile Specimen
Authors: Troy Myers, Michael A Sutton, Hubert Schreier, Alistair Tofts, Sreehari Rajan Kattil. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, No. 8, pp. 1263-1298 (36 pages)
To compare finite element analysis (FEA) predictions and stereovision digital image correlation (StereoDIC) strain measurements at the same spatial positions throughout a region of interest, a field comparison procedure is developed. The procedure includes (a) conversion of the finite element data into a triangular mesh, (b) selection of a common coordinate system, (c) determination of the rigid body transformation to place both measurements and FEA data in the same system, and (d) interpolation of the FEA nodal information to the same spatial locations as the StereoDIC measurements using barycentric coordinates. For an aluminum Al-6061 double edge-notched tensile specimen, FEA results are obtained using both the von Mises isotropic yield criterion and Hill's quadratic anisotropic yield criterion, with the unknown Hill model parameters determined using full-field specimen strain measurements for the nominally plane stress specimen. Using Hill's quadratic anisotropic yield criterion, the point-by-point comparisons of experimentally based full-field strains and stresses to finite element predictions are shown to be in excellent agreement, confirming the effectiveness of the field comparison process.
Keywords: StereoDIC, spatial co-registration, data transformation, finite element simulations, point-wise comparison of measurements and FEA predictions, double edge notch specimen, model validation
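Step (d), barycentric interpolation of FEA nodal values to a measurement point inside a mesh triangle, is compact enough to sketch. This is a generic 2-D implementation under our own variable names, not the authors' code:

```python
def barycentric_coords(p, a, b, c):
    """Barycentric coordinates (l1, l2, l3) of point p in triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    det = (by - cy) * (ax - cx) + (cx - bx) * (ay - cy)
    l1 = ((by - cy) * (px - cx) + (cx - bx) * (py - cy)) / det
    l2 = ((cy - ay) * (px - cx) + (ax - cx) * (py - cy)) / det
    return l1, l2, 1.0 - l1 - l2

def interpolate(p, tri, nodal_values):
    """Interpolate FEA nodal values to measurement location p."""
    l1, l2, l3 = barycentric_coords(p, *tri)
    v1, v2, v3 = nodal_values
    return l1 * v1 + l2 * v2 + l3 * v3

tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
# A linear field f = x + 2*y is reproduced exactly by linear interpolation:
vals = [0.0, 1.0, 2.0]  # f sampled at the three nodes
print(interpolate((0.25, 0.25), tri, vals))  # 0.75
```

Because barycentric interpolation is linear within each element, any linear field is recovered exactly, which makes it a natural check when validating a co-registration pipeline.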
6. Research of Extracting Data from HTML Web Pages Automatically (Cited: 1)
Authors: 王茹, 宋瀚涛, 陆玉昌. Journal of Beijing Institute of Technology (EI, CAS), 2003, Supplement S1, pp. 104-108 (5 pages)
In order to use data information on the Internet, it is necessary to extract data from web pages. An HTT tree model representing HTML pages is presented. Based on the HTT model, a wrapper generation algorithm, AGW, is proposed. The AGW algorithm uses comparing and correcting techniques to generate the wrapper from the native characteristics of the HTT tree structure. The AGW algorithm can not only generate the wrapper automatically, but also rebuild the data schema easily and reduce the complexity of the computation.
Keywords: information extraction, data transformation, wrapper, HTML page
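The abstract does not reproduce the HTT/AGW details, but the general idea of a wrapper walking an HTML tree and collecting structured records can be sketched with the standard library. The class name and the sample page below are ours:

```python
from html.parser import HTMLParser

class CellExtractor(HTMLParser):
    """Generic wrapper sketch: collect the text of <td> cells row by row.
    This is NOT the paper's AGW algorithm, only an illustration of
    tree-driven extraction."""
    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "tr" and self._row:
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_td and data.strip():
            self._row.append(data.strip())

page = ("<table><tr><td>WO Jia</td><td>2020</td></tr>"
        "<tr><td>Kim</td><td>2024</td></tr></table>")
p = CellExtractor()
p.feed(page)
print(p.rows)  # [['WO Jia', '2020'], ['Kim', '2024']]
```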
7. Developing a Geological Management Information System: National Important Mining Zone Database (Cited: 1)
Authors: 左仁广, 汪新庆, 夏庆霖. Journal of China University of Geosciences (SCIE, CSCD), 2006, No. 1, pp. 79-83, 94 (6 pages)
Geo-data is a foundation for the prediction and assessment of ore resources, so managing and making full use of those data, including the geography, geology, mineral deposits, aeromagnetics, gravity, geochemistry, and remote sensing databases, is very significant. We developed the national important mining zone database (NIMZDB) to manage 14 national important mining zone databases to support a new round of ore deposit prediction. We found that attention should be paid to the following issues: ① data accuracy: integrity, logic consistency, and attribute, spatial, and temporal accuracy; ② management of both attribute and spatial data in the same system; ③ transforming data between MapGIS and ArcGIS; ④ data sharing and security; ⑤ data searches that can query both attribute and spatial data. The accuracy of input data is guaranteed, and the search, analysis, and translation of data between MapGIS and ArcGIS have been made convenient via the development of a data-checking module and a data-managing module based on MapGIS and ArcGIS. Using ArcSDE, we based data sharing on a client/server system, and attribute and spatial data are also managed in the same system.
Keywords: geological management information system, checking data, ArcSDE, transforming data format, data sharing, data security
8. EMPIRICAL LIKELIHOOD-BASED INFERENCE IN LINEAR MODELS WITH INTERVAL CENSORED DATA (Cited: 3)
Authors: He Qixiang, Zheng Ming. Applied Mathematics (A Journal of Chinese Universities) (SCIE, CSCD), 2005, No. 3, pp. 338-346 (9 pages)
An empirical likelihood approach to estimating the coefficients in a linear model with interval-censored responses is developed in this paper. By constructing an unbiased transformation of the interval-censored data, an empirical log-likelihood function with an asymptotic chi-squared distribution is derived. Confidence regions for the coefficients are constructed. Simulation results indicate that the method performs better than the normal approximation method in terms of coverage accuracy.
Keywords: interval censored data, linear model, empirical likelihood, unbiased transformation
9. A Flexible Data Standardization Method for Establishing Indicator Systems
Authors: 张冰, 李相禛. China Standardization, 2023, Supplement S01, pp. 35-39 (5 pages)
Data standardization is an important part of data preprocessing, which directly affects the feasibility and discriminating power of the indicator system. This study proposes a data standardization framework that achieves flexible standardization through data feature identification, cluster analysis, and weighted data transformation. The proposed method can handle locally inflated distributions with long tails. The results of this study enrich the method library of data standardization, allowing researchers to differentiate data in a more targeted way when establishing indicator systems.
Keywords: data standardization, indicator system, data transformation
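Two standardization baselines that frameworks like this one choose between, plus a toy "data feature identification" rule for picking one, can be sketched as follows. The skewness threshold and the selection rule are our illustration, not the paper's method:

```python
import math

def min_max(x):
    """Min-max standardization to [0, 1]."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

def z_score(x):
    """Z-score standardization: zero mean, unit variance."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    return [(v - mu) / sd for v in x]

def standardize(x, skewness_threshold=1.0):
    """Pick a method from a simple data-feature check (moment skewness).
    Illustrative only: the paper uses cluster analysis, not this rule."""
    mu = sum(x) / len(x)
    sd = math.sqrt(sum((v - mu) ** 2 for v in x) / len(x))
    skew = sum(((v - mu) / sd) ** 3 for v in x) / len(x)
    return z_score(x) if abs(skew) < skewness_threshold else min_max(x)

print(min_max([2.0, 4.0, 6.0]))  # [0.0, 0.5, 1.0]
```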
10. Pretreating and normalizing metabolomics data for statistical analysis
Authors: Jun Sun, Yinglin Xia. Genes & Diseases (SCIE, CSCD), 2024, No. 3, pp. 188-205 (18 pages)
Metabolomics, as a research field and a set of techniques, studies the entire set of small molecules in biological samples. Metabolomics is emerging as a powerful tool for precision medicine in general. In particular, the integration of the microbiome and metabolome has revealed the mechanisms and functionality of the microbiome in human health and disease. However, metabolomics data are very complicated, and preprocessing/pretreating and normalizing procedures are usually required before statistical analysis. In this review article, we comprehensively review the methods used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing, dealing with zero and/or missing values and detecting outliers, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
Keywords: data centering and scaling, data normalization, data transformation, missing values, MS-based data preprocessing, NMR data preprocessing, outliers, preprocessing/pretreatment
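One pretreatment pipeline of the kind this review covers, handling zeros/missing values and then transforming, can be sketched briefly. Half-minimum imputation is one common choice among the several the review discusses; the sample intensities are invented:

```python
import math

def half_min_impute(values):
    """Replace zeros and missing values (None) with half the smallest
    observed positive value, a common pretreatment for metabolite
    intensities below the detection limit."""
    observed = [v for v in values if v not in (None, 0)]
    fill = min(observed) / 2.0
    return [fill if v in (None, 0) else v for v in values]

def log_pretreat(values):
    """Impute, then log10-transform to tame right-skewed intensities."""
    return [math.log10(v) for v in half_min_impute(values)]

raw = [100.0, 0, 1000.0, None, 10.0]
print(half_min_impute(raw))  # [100.0, 5.0, 1000.0, 5.0, 10.0]
print(log_pretreat(raw)[0])  # 2.0
```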
11. Assessing the benthic quality status of the Bohai Bay (China) with proposed modifications of M-AMBI (Cited: 5)
Authors: CAI Wenqian, BORJA Angel, LIN Kuixuan, ZHU Yanzhong, ZHOU Juan, LIU Lusan. Acta Oceanologica Sinica (SCIE, CAS, CSCD), 2015, No. 10, pp. 111-121 (11 pages)
Multivariate AZTI's Marine Biotic Index (M-AMBI) was designed to indicate the ecological status of European coastal areas. Based upon samples collected from 2009 to 2012 in the Bohai Bay, we tested the response of variants of M-AMBI, using biomass (M-BAMBI) in the calculations, with different transformations of the raw data. The results showed that the ecological quality of most areas in the study, as indicated by M-AMBI, ranged from moderate to bad status, with worse status in the coastal areas, especially around the estuaries, harbors, and outfalls, and better status in the offshore areas, except those close to oil platforms or disposal sites. Despite large variations in the nature of the input data, all variants of M-AMBI gave similar spatial and temporal distribution patterns of ecological status within the bay and were highly correlated with one another. The agreement between the new ecological statuses obtained from all M-AMBI variants, calculated by linear regression, was almost perfect. The benthic quality, assessed using different input data, could be related to human pressures in the bay, such as water discharges, land reclamation, and dredged sediment and drilling cuttings disposal sites. It seems that M-BAMBI was more effective than M-NABMI (M-AMBI calculated using abundance data) in indicating human pressures in the bay. Finally, indices calculated with more severe transformations, such as presence/absence data, could not indicate the higher density of human pressures in the coastal areas of the northern part of our study area, but those calculated using mild transformations (i.e., square root) did.
Keywords: macrozoobenthos, M-AMBI, abundance, biomass, data transformation, ecological status
12. Background contents of heavy metals in sediments of the Yangtze River system and their calculation methods (Cited: 4)
Authors: Zhang Chaosheng, Zhang Shen, Zhang Licheng, Wang Lijun (Institute of Geography, Chinese Academy of Sciences, Beijing 100101, China). Journal of Environmental Sciences (SCIE, EI, CAS, CSCD), 1995, No. 4, pp. 422-429 (8 pages)
Keywords: Yangtze River, sediments, heavy metals, robust statistics, data transformation
13. Anomalous signals before the 2011 Tohoku-oki Mw 9.1 earthquake, detected by superconducting gravimeters and broadband seismometers (Cited: 3)
Authors: Gu Xiang, Jiang Tianxing, Zhang Wenqiang, Huang Weihang, Chang Zhiqiang, Shen Wenbin. Geodesy and Geodynamics, 2014, No. 2, pp. 24-31 (8 pages)
The 2011 Tohoku-oki earthquake, which occurred on 11 March 2011, was a great earthquake with a seismic magnitude of Mw 9.1, preceded by an Mw 7.5 earthquake. Focusing on this great earthquake event, we applied the Hilbert-Huang transform (HHT) analysis method to the one-second-interval records at seven superconducting gravimeter (SG) stations and seven broadband seismic (BS) stations to carry out spectrum analysis and compute the energy-frequency-time distribution. Tidal effects were removed from the SG data with the TSoft software before the data series were transformed by the HHT method. Based on the HHT spectra and the marginal spectra from the records at the selected seven SG stations and seven BS stations, we found anomalous signals in terms of energy. The dominant frequencies of the anomalous signals are about 0.13 Hz in the SG records and 0.2 Hz in the seismic data, and the anomalous signals occurred one week or two to three days prior to the event. Taking into account that no typhoon occurred in this period, we may conclude that these anomalous signals might be related to the great earthquake event.
Keywords: earthquake, anomalous signals, superconducting gravity data, broadband seismic data, Hilbert-Huang transform
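The paper's analysis relies on the HHT, which is too involved to reproduce here. As a much simpler stand-in for "energy at a target frequency", the Goertzel algorithm evaluates the spectral power of 1-s samples at a single frequency such as the reported 0.13 Hz. The signal below is synthetic, not station data:

```python
import math

def goertzel_power(samples, fs, f_target):
    """Power of `samples` (sampling rate fs, in Hz) at frequency f_target,
    via the Goertzel algorithm -- a single-bin DFT evaluation."""
    n = len(samples)
    k = int(round(n * f_target / fs))
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

fs = 1.0  # one-second sampling, as in the SG records
sig = [math.sin(2 * math.pi * 0.13 * t) for t in range(1000)]
# A 0.13 Hz tone carries far more power at 0.13 Hz than at 0.2 Hz:
print(goertzel_power(sig, fs, 0.13) > goertzel_power(sig, fs, 0.2))  # True
```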
14. VOFilter: Bridging Virtual Observatory and Industrial Office Applications
Authors: Chen-Zhou Cui, Markus Dolensky, Peter Quinn, Yong-Heng Zhao, Francoise Genova. Chinese Journal of Astronomy and Astrophysics (CSCD), 2006, No. 3, pp. 379-386 (8 pages)
VOFilter is an XML-based filter developed by the Chinese Virtual Observatory project to transform tabular data files from VOTable format into OpenDocument format. VOTable is an XML format defined for the exchange of tabular data in the context of the Virtual Observatory (VO). It is the first Proposed Recommendation defined by the International Virtual Observatory Alliance, and it has obtained wide support from both the VO community and many astronomy projects. OpenOffice.org is a mature, open-source front office application suite with the advantage of native support for the industry-standard OpenDocument XML file format. Using VOFilter, VOTable files can be loaded in OpenOffice.org Calc, a spreadsheet application, and then displayed and analyzed like other spreadsheet files. Here, VOFilter acts as a connector, bridging the emerging VO with current industrial office applications. We introduce the Virtual Observatory and the technical background of VOFilter. Its workflow, installation, and usage are presented. Existing problems and limitations are also discussed, together with future development plans.
Keywords: astronomical data bases: miscellaneous, Virtual Observatory, methods: data transform
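The core of any VOTable-to-spreadsheet conversion is pulling TABLEDATA rows out of the XML tree. The fragment below is a deliberately minimal VOTable-like document; real VOTables carry namespaces and FIELD metadata that a full filter such as VOFilter must honour:

```python
import xml.etree.ElementTree as ET

# Minimal, made-up VOTable-like fragment (no namespaces, no FIELD elements).
votable = """<VOTABLE><RESOURCE><TABLE>
  <DATA><TABLEDATA>
    <TR><TD>M31</TD><TD>0.71</TD></TR>
    <TR><TD>M33</TD><TD>0.85</TD></TR>
  </TABLEDATA></DATA>
</TABLE></RESOURCE></VOTABLE>"""

def votable_rows(xml_text):
    """Transform TABLEDATA rows into Python lists of cell strings."""
    root = ET.fromstring(xml_text)
    return [[td.text for td in tr] for tr in root.iter("TR")]

print(votable_rows(votable))  # [['M31', '0.71'], ['M33', '0.85']]
```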
15. Prediction model for cost data of a power transmission and transformation project based on Pearson correlation coefficient-IPSO-ELM
Authors: Ju Xin, Liu ShangKe, Xiao YanLi, Wan Ye. Clean Energy (EI), 2021, No. 4, pp. 756-764 (9 pages)
In view of the difficulty of predicting the cost data of power transmission and transformation projects at present, a method based on the Pearson correlation coefficient, improved particle swarm optimization (IPSO), and the extreme learning machine (ELM) is proposed. In this paper, the Pearson correlation coefficient is used to screen out the main influencing factors as the input-independent variables of the ELM algorithm, and IPSO based on a ladder-structure coding method is used to optimize the number of hidden-layer nodes, input weights, and bias values of the ELM. A prediction model for the cost data of power transmission and transformation projects based on the Pearson correlation coefficient-IPSO-ELM algorithm is thus constructed. Through the analysis of calculation examples, it is shown that the prediction accuracy of the proposed method is higher than that of other algorithms, which verifies the effectiveness of the model.
Keywords: cost data of power transmission and transformation project, Pearson correlation coefficient, IPSO-ELM algorithm, project-cost prediction
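The first stage of the pipeline, screening influencing factors by Pearson correlation with the cost target, can be sketched directly. The 0.7 cut-off, the feature names, and the toy data are illustrative, not from the paper:

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def screen_features(features, target, threshold=0.7):
    """Keep features whose |r| with the target passes the threshold
    (the 0.7 cut-off is our choice for the sketch)."""
    return {name: pearson_r(col, target)
            for name, col in features.items()
            if abs(pearson_r(col, target)) >= threshold}

features = {"line_length_km": [10.0, 20.0, 30.0, 40.0],
            "noise": [3.0, 1.0, 4.0, 1.0]}
cost = [1.0, 2.1, 2.9, 4.2]
selected = screen_features(features, cost)
# 'line_length_km' survives the screen; 'noise' does not.
```

The surviving features would then become the ELM inputs, with IPSO tuning the hidden-layer size, input weights, and biases.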
16. Ontology Dynamics in a Data Life Cycle: Challenges and Recommendations from a Geoscience Perspective (Cited: 3)
Authors: Xiaogang Ma, Peter Fox, Eric Rozell, Patrick West, Stephan Zednik. Journal of Earth Science (SCIE, CAS, CSCD), 2014, No. 2, pp. 407-412 (6 pages)
Ontologies are increasingly deployed as a computer-accessible representation of key semantics in various parts of a data life cycle, and thus ontology dynamics may pose challenges to data management and re-use. Using examples from the geosciences, we analyze challenges raised by ontology dynamics, such as heavy reworking of data, semantic heterogeneity among data providers and users, and error propagation in cross-discipline data discovery and re-use. We also make recommendations to address these challenges: (1) communities of practice on ontologies to reduce inconsistency and duplicated effort; (2) use of ontologies in the data collection procedure, made accessible to data users; and (3) methods to speed up the reworking of data in a Semantic Web context.
Keywords: semantic web, knowledge evolution, data transformation, geoscience
17. EVOLUTION OF THE SCATTERING DATA UNDER THE CLASSICAL DARBOUX TRANSFORM FOR su(2) SOLITON SYSTEMS
Authors: 林峻岷. Acta Mathematicae Applicatae Sinica (SCIE, CSCD), 1990, No. 4, pp. 308-316 (9 pages)
The change of the scattering data for solutions of su(2) soliton systems that are related by a classical Darboux transformation (CDT) is obtained. It is shown how a CDT creates and erases a soliton.
Keywords: soliton systems, scattering data, classical Darboux transform (CDT), su(2), KdV
18. A novel classifier for multivariate instance using graph class signatures
Authors: Parnika PARANJAPE, Meera DHABU, Parag DESHPANDE. Frontiers of Computer Science (SCIE, EI, CSCD), 2020, No. 4, pp. 79-94 (16 pages)
Applications such as identifying different customers from their unique buying behaviours, or determining the ratings of a product given by users based on different sets of features, require classification using class-specific subsets of features. Most existing state-of-the-art classifiers for multivariate data use the complete feature set for classification regardless of class label. A decision tree classifier can produce class-wise subsets of features; however, none of these classifiers model the relationships between features, which may enhance classification accuracy. We call the class-specific subsets of features and the features' interrelationships class signatures. In this work, we propose to map the original input space of multivariate data to a feature space characterized by connected graphs, as graphs can easily model entities, their attributes, and the relationships among attributes. Entities are usually modeled with graphs in domains where graphs occur naturally, for example, chemical compounds. However, graphs do not occur naturally in multivariate data, so extracting class signatures from it is a challenging task. We propose feature selection heuristics to obtain class-specific prominent subgraph signatures, and two variants of a class-signature-based classifier, namely (1) maximum matching signature (gMM) and (2) score and size of matched signatures (gSM). The effectiveness of the proposed approach on real-world and synthetic datasets has been studied and compared with other established classifiers. Experimental results confirm the ascendancy of the proposed class-signature-based classifier on most of the datasets.
Keywords: multivariate instance, data transformation, subgraph feature selection, class signatures, classification