Journal Articles
15 articles found
1. Improved KNN Imputation for Missing Values in Gene Expression Data (Cited by 3)
Authors: Phimmarin Keerin, Tossapon Boongoen. Computers, Materials & Continua (SCIE, EI), 2022, Issue 2, pp. 4009-4025 (17 pages).
Abstract: The problem of missing values has long been studied by researchers working in data science and bioinformatics, especially in the analysis of gene expression data, which facilitates early detection of cancer. Many attempts show improvements made by excluding samples with missing information from the analysis, while others have tried to fill the gaps with plausible values. While the former is simple, the latter safeguards against information loss. For the latter, a neighbour-based (KNN) approach has proven more effective than global estimators. This paper extends that work by introducing a new summarization method to the KNN model. It is the first study to apply the concept of the ordered weighted averaging (OWA) operator to this problem. In particular, two variations of OWA aggregation are proposed and evaluated against their baseline and other neighbour-based models. Using missing-value ratios from 1% to 20% and a set of six published gene expression datasets, the experimental results suggest that the new methods usually provide more accurate estimates than the compared methods. At missing rates of 5% and 20%, the best NRMSE scores averaged across datasets are 0.65 and 0.69, while the corresponding best scores obtained by the existing techniques included in this study are 0.80 and 0.84, respectively.
Keywords: gene expression; missing value; imputation; KNN; OWA operator
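The abstract's core idea, replacing the plain KNN average with an ordered weighted averaging (OWA) step over the sorted neighbour values, can be illustrated with a short sketch. The snippet below is a minimal, hypothetical implementation assuming Euclidean distance over co-observed genes and linearly decaying OWA weights; the paper's actual weighting variants are not reproduced here.

```python
import numpy as np

def owa_knn_impute(X, k=5):
    """Fill NaN entries with an OWA-weighted KNN estimate (illustrative sketch)."""
    X = X.copy()
    # Linearly decaying OWA weights, attached to the *sorted* neighbour values
    w = np.arange(k, 0, -1, dtype=float)
    w /= w.sum()
    for i, j in zip(*np.where(np.isnan(X))):
        row = X[i]
        donors = np.where(~np.isnan(X[:, j]))[0]          # rows that observe column j
        donors = donors[donors != i]
        dists = []
        for d in donors:                                   # distance over co-observed columns
            mask = ~np.isnan(row) & ~np.isnan(X[d])
            dists.append(np.linalg.norm(row[mask] - X[d, mask]) if mask.any() else np.inf)
        nearest = donors[np.argsort(dists)[:k]]
        vals = np.sort(X[nearest, j])[::-1]                # order the values before weighting
        X[i, j] = np.dot(w[:len(vals)], vals) / w[:len(vals)].sum()
    return X
```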
2. Modelling method with missing values based on clustering and support vector regression (Cited by 2)
Authors: Ling Wang, Dongmei Fu, Qing Li, Zhichun Mu. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2010, Issue 1, pp. 142-147 (6 pages).
Abstract: Most real application processes are complex nonlinear systems with incomplete information. It is difficult to estimate a model by assuming that the data set is governed by a single global model. Moreover, in real processes, the available data set is usually obtained with missing values. To overcome the shortcomings of global modeling and missing data values, a new modeling method is proposed. First, an incomplete data set with missing values is partitioned into several clusters by a K-means with soft constraints (KSC) algorithm, which incorporates soft constraints to enable clustering with missing values. Then a local model for each group is developed using the SVR algorithm, which adopts a missing value insensitive (MVI) kernel to address the missing value estimation problem. The region of validity of each local model is also obtained. Simulation results demonstrate the effectiveness of the local models and the estimation algorithm.
Keywords: modeling; missing value; K-means with soft constraints; clustering; missing value insensitive kernel
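A rough sketch of the cluster-then-local-model workflow follows. It substitutes mean imputation plus standard scikit-learn KMeans for the paper's K-means-with-soft-constraints, and a default RBF SVR for the missing-value-insensitive kernel, so it is an outline of the idea rather than the published method.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.impute import SimpleImputer
from sklearn.svm import SVR

def local_svr_models(X, y, n_clusters=3):
    """Partition the (incomplete) data, then fit one SVR per cluster.

    Simplified stand-ins: mean imputation + plain KMeans replace the paper's
    KSC algorithm, and a default RBF kernel replaces the MVI kernel.
    """
    X_filled = SimpleImputer(strategy="mean").fit_transform(X)
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(X_filled)
    models = {}
    for c in range(n_clusters):
        idx = labels == c
        models[c] = SVR().fit(X_filled[idx], y[idx])   # one local model per cluster
    return models, labels
```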
3. Missing Value Imputation for Radar-Derived Time-Series Tracks of Aerial Targets Based on Improved Self-Attention-Based Network
Authors: Zihao Song, Yan Zhou, Wei Cheng, Futai Liang, Chenhao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, Issue 3, pp. 3349-3376 (28 pages).
Abstract: The frequent missing values in radar-derived time-series tracks of aerial targets (RTT-AT) lead to significant challenges in subsequent data-driven tasks. However, the majority of imputation research focuses on random missing (RM) patterns, which differ significantly from the missing patterns common in RTT-AT, so methods designed for RM may degrade or fail when applied to RTT-AT imputation. Conventional autoregressive deep learning methods are also prone to error accumulation and loss of long-term dependencies. In this paper, a non-autoregressive imputation model is proposed that addresses missing value imputation for two common missing patterns in RTT-AT. The model consists of two probabilistic sparse diagonal masking self-attention (PSDMSA) units and a weight fusion unit. It learns missing values by combining the representations output by the two units, aiming to minimize the difference between the imputed values and the actual values. The PSDMSA units effectively capture temporal dependencies and attribute correlations between time steps, improving imputation quality. The weight fusion unit automatically updates the weights of the output representations from the two units to obtain a more accurate final representation. The experimental results indicate that, across varying missing rates in the two missing patterns, the model consistently outperforms other methods in imputation performance and exhibits a low frequency of large deviations in its estimates for specific missing entries. Compared with the state-of-the-art autoregressive deep learning imputation model Bidirectional Recurrent Imputation for Time Series (BRITS), the proposed model reduces mean absolute error (MAE) by 31%-50%. Additionally, it trains 4 to 8 times faster than both BRITS and a standard Transformer model on the same dataset. Finally, ablation experiments demonstrate that the PSDMSA units, the weight fusion unit, the cascade network design, and the imputation loss each enhance imputation performance, confirming the efficacy of the design.
Keywords: missing value imputation; time-series tracks; probabilistic sparsity; diagonal masking self-attention; weight fusion
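The diagonal-masking idea, forcing each time step to be reconstructed only from other steps, can be shown in a few lines of NumPy. This is a hypothetical single-head sketch; the paper's PSDMSA units additionally use probabilistic sparsity, learned projections, and the weight fusion unit described above.

```python
import numpy as np

def diagonally_masked_attention(X):
    """Self-attention over time steps with the diagonal masked out, so each step
    is reconstructed only from *other* steps (illustrative sketch)."""
    T, d = X.shape
    scores = X @ X.T / np.sqrt(d)              # queries/keys = the (pre-filled) series itself
    np.fill_diagonal(scores, -np.inf)          # a step may not attend to itself
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ X                         # reconstruction used to fill missing entries
```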
4. Missing Value Imputation Model Based on Adversarial Autoencoder Using Spatiotemporal Feature Extraction
Authors: Dong-Hoon Shin, Seo-El Lee, Byeong-Uk Jeon, Kyungyong Chung. Intelligent Automation & Soft Computing (SCIE), 2023, Issue 8, pp. 1925-1940 (16 pages).
Abstract: Recently, the importance of data analysis has increased significantly due to rapid growth in data volume. In particular, vehicle communication data, considered a significant challenge in Intelligent Transportation Systems (ITS), has spatiotemporal characteristics and many missing values. High rates of missing values lead to decreased predictive performance of models. Existing missing value imputation models ignore the topology of the transportation network that arises from the structural connections of the road network, even though physical distances appear close in spatiotemporal image data. Additionally, the learning process of missing value imputation models requires complete data, but complete vehicle communication data are difficult to obtain. This study proposes a missing value imputation model based on an adversarial autoencoder using spatiotemporal feature extraction to address these issues. The proposed method replaces missing values while reflecting the spatiotemporal characteristics of transportation data using temporal convolution and spatial convolution. Experimental results show that the proposed model achieves the lowest error rate, 5.92%, demonstrating excellent predictive accuracy. This makes it possible to alleviate the data sparsity problem and improve traffic safety through superior predictive performance.
Keywords: missing value; adversarial autoencoder; spatiotemporal feature extraction
5. Improving Prediction of Chronic Kidney Disease Using KNN Imputed SMOTE Features and TrioNet Model
Authors: Nazik Alturki, Abdulaziz Altamimi, Muhammad Umer, Oumaima Saidani, Amal Alshardan, Shtwai Alsubai, Marwan Omar, Imran Ashraf. Computer Modeling in Engineering & Sciences (SCIE, EI), 2024, Issue 6, pp. 3513-3534 (22 pages).
Abstract: Chronic kidney disease (CKD) is a major health concern today, requiring early and accurate diagnosis. Machine learning has emerged as a powerful tool for disease detection, and medical professionals are increasingly using ML classifier algorithms to identify CKD early. This study explores the application of advanced machine learning techniques on a CKD dataset obtained from the University of California Irvine (UCI) Machine Learning Repository. The research introduces TrioNet, an ensemble model combining extreme gradient boosting, random forest, and extra trees classifiers, which provides highly accurate predictions for CKD. Furthermore, a K nearest neighbor (KNN) imputer is used to deal with missing values, while synthetic minority oversampling (SMOTE) addresses the class-imbalance problem. To ascertain the efficacy of the proposed model, a comprehensive comparative analysis is conducted against various machine learning models. The proposed TrioNet with KNN imputer and SMOTE outperformed the other models, reaching 98.97% accuracy for detecting CKD. This in-depth analysis demonstrates the model's capabilities and underscores its potential as a valuable tool in the diagnosis of CKD.
Keywords: precision medicine; chronic kidney disease detection; SMOTE; missing values; healthcare; KNN imputer; ensemble learning
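The preprocessing and ensemble recipe described in the abstract maps onto standard library calls. The sketch below is an assumed reconstruction using scikit-learn, imbalanced-learn, and xgboost with placeholder hyperparameters; the actual TrioNet configuration is not given in the abstract.

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier, VotingClassifier
from sklearn.impute import KNNImputer
from xgboost import XGBClassifier

def build_trionet_style_pipeline(X_train, y_train):
    """KNN imputation + SMOTE + soft-voting ensemble of XGBoost, random forest,
    and extra trees (sketch of the described recipe, not the published model)."""
    X_imp = KNNImputer(n_neighbors=5).fit_transform(X_train)           # fill missing values
    X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_imp, y_train)  # fix class imbalance
    ensemble = VotingClassifier(
        estimators=[("xgb", XGBClassifier(eval_metric="logloss")),
                    ("rf", RandomForestClassifier()),
                    ("et", ExtraTreesClassifier())],
        voting="soft",
    )
    return ensemble.fit(X_bal, y_bal)
```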
6. Reconstruction of time series with missing value using 2D representation-based denoising autoencoder (Cited by 1)
Authors: TAO Huamin, DENG Qiuqun, XIAO Shanzhu. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2020, Issue 6, pp. 1087-1096 (10 pages).
Abstract: Time series analysis is a key technology for medical diagnosis, weather forecasting, and financial prediction systems. However, missing data frequently occur during data recording, posing a great challenge to data mining tasks. In this study, we propose a novel time series representation-based denoising autoencoder (DAE) for the reconstruction of missing values. Two data representation methods, the recurrence plot (RP) and the Gramian angular field (GAF), are used to transform the raw time series into a 2D matrix, establishing the temporal correlations between different time intervals and extracting structural patterns from the series. An improved DAE is then proposed to reconstruct the missing values from the 2D representation of the time series. A comprehensive comparison of the different representations is conducted on standard datasets. Results show that the 2D representations yield a lower reconstruction error than the raw time series, with the RP representation providing the best outcome. This work provides useful insights into better reconstruction of missing values in time series analysis, considerably improving the reliability of time-varying systems.
Keywords: time series; missing value; 2D representation; denoising autoencoder (DAE); reconstruction
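The two 2D encodings named in the abstract have standard textbook definitions, sketched below with NumPy. The threshold eps and the min-max scaling are illustrative choices, and the denoising autoencoder itself is omitted.

```python
import numpy as np

def recurrence_plot(x, eps=0.1):
    """Binary recurrence plot of a 1-D series: R[i, j] = 1 if |x_i - x_j| < eps."""
    d = np.abs(x[:, None] - x[None, :])
    return (d < eps).astype(float)

def gramian_angular_field(x):
    """Summation GAF: rescale to [-1, 1], map to angles, take cos of pairwise angle sums."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1
    phi = np.arccos(np.clip(x, -1, 1))
    return np.cos(phi[:, None] + phi[None, :])
```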
7. Cardiac Arrhythmia Disease Classifier Model Based on a Fuzzy Fusion Approach (Cited by 1)
Authors: Fatma Taher, Hamoud Alshammari, Lobna Osman, Mohamed Elhoseny, Abdulaziz Shehab, Eman Elayat. Computers, Materials & Continua (SCIE, EI), 2023, Issue 5, pp. 4485-4499 (15 pages).
Abstract: Cardiac diseases are among the greatest global health challenges. Due to the high annual mortality rates, cardiac diseases have attracted the attention of numerous researchers in recent years. This article proposes a hybrid fuzzy fusion classification model for cardiac arrhythmia diseases. The fusion model is used to optimally select the highest-ranked features generated by a variety of well-known feature-selection algorithms, and an ensemble of classifiers is then applied to the fusion's results. The proposed model classifies the arrhythmia dataset from the University of California, Irvine into normal/abnormal classes as well as 16 classes of arrhythmia. In preprocessing, attributes with missing values were filled with the within-class average for numeric (linear) attributes and the most frequent value for nominal attributes. To ensure model optimality, all attributes with zero or constant values, which might bias the classifiers, were eliminated; the preprocessing step retained 161 of the 279 attributes (features). A fuzzy-based feature-selection fusion method is then applied to fuse the high-ranked features obtained from the different heuristic feature-selection algorithms. In short, the study comprises three main blocks: (1) sensing data and preprocessing; (2) feature queuing, selection, and extraction; and (3) the predictive model. The proposed method improves classification performance in terms of accuracy, F1 measure, recall, and precision when compared to state-of-the-art techniques, achieving 98.5% accuracy in binary-class mode and 98.9% accuracy in categorized (16-class) mode.
Keywords: cardiac arrhythmia; preprocessing; missing values; classification model; fusion
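The class-conditional fill described in the preprocessing step (within-class mean for numeric attributes, within-class mode for nominal ones) can be sketched with pandas. The column and target names below are placeholders, and the rest of the pipeline (constant-attribute removal, fuzzy fusion, ensemble) is not shown.

```python
import pandas as pd

def class_conditional_impute(df, target="class"):
    """Fill numeric gaps with the within-class mean and nominal gaps with the
    within-class mode (sketch of the described preprocessing step)."""
    out = df.copy()
    for col in out.columns.drop(target):
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out.groupby(target)[col].transform(lambda s: s.fillna(s.mean()))
        else:
            out[col] = out.groupby(target)[col].transform(
                lambda s: s.fillna(s.mode().iloc[0]) if not s.mode().empty else s)
    return out
```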
8. Belief Combination of Classifiers for Incomplete Data
Authors: Zuowei Zhang, Songtao Ye, Yiru Zhang, Weiping Ding, Hao Wang. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2022, Issue 4, pp. 652-667 (16 pages).
Abstract: Data with missing values, or incomplete information, bring challenges to classification, as the incompleteness may significantly affect the performance of classifiers. In this paper, we handle missing values in both training and test sets with uncertainty and imprecision reasoning by proposing a new belief combination of classifiers (BCC) method based on evidence theory. The proposed BCC method aims to improve the classification of incomplete data by characterizing the uncertainty and imprecision brought by incompleteness. In BCC, different attributes are regarded as independent sources, and the collection of each attribute is considered a subset. Multiple classifiers are then trained on each subset independently, allowing each observed attribute to provide a sub-classification result for the query pattern. Finally, these sub-classification results, weighted by discounting factors, provide supplementary information that jointly determines the final classes of the query patterns. The weights have two aspects: global and local. The global weight, calculated by an optimization function, represents the reliability of each classifier, and the local weight, obtained by mining attribute distribution characteristics, quantifies the importance of the observed attributes to the pattern classification. Extensive comparative experiments involving seven methods on twelve datasets demonstrate that BCC outperforms all baseline methods in terms of accuracy, precision, recall, and F1 measure, at reasonable computational cost.
Keywords: classifier fusion; classification; evidence theory; incomplete data; missing values
9. Towards Improving Predictive Statistical Learning Model Accuracy by Enhancing Learning Technique
Authors: Ali Algarni, Mahmoud Ragab, Wardah Alamri, Samih M. Mostafa. Computer Systems Science & Engineering (SCIE, EI), 2022, Issue 7, pp. 303-318 (16 pages).
Abstract: The accuracy of a statistical learning model depends on the learning technique used, which in turn depends on the dataset's values. In most research studies, the existence of missing values (MVs) is a vital problem. In addition, a dataset with MVs cannot be used for further analysis or with data-driven tools, especially when the percentage of MVs is high. In this paper, the authors propose a novel algorithm for dealing with MVs based on feature selection (FS) using a similarity classifier with a fuzzy entropy measure. The proposed algorithm imputes MVs in cumulative order. The candidate feature to be imputed is selected using a similarity classifier with Parkash's fuzzy entropy measure. The predictive model used to estimate MVs within the candidate feature is the Bayesian Ridge Regression (BRR) technique, and any already imputed features are incorporated into the BRR equation to impute MVs in the next chosen incomplete feature. The proposed algorithm was compared against practical state-of-the-art imputation methods in an experiment on four medical datasets gathered from several database repositories, with MVs generated under the three missingness mechanisms. The evaluation metrics of mean absolute error (MAE), root mean square error (RMSE), and coefficient of determination (R2 score) were used to measure performance. The results show that performance varies depending on the size of the dataset, the amount of MVs, and the missingness mechanism. Moreover, compared to the other methods, the proposed method gives better accuracy and lower error in most cases.
Keywords: Bayesian ridge regression; fuzzy entropy measure; feature selection; imputation; missing values; missingness mechanisms; similarity classifier; medical dataset
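The cumulative, feature-by-feature BRR imputation resembles scikit-learn's iterative imputer with a Bayesian ridge estimator, sketched below as an assumed stand-in; the paper's fuzzy-entropy-based selection of the next candidate feature is not reproduced.

```python
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import BayesianRidge

def brr_impute(X):
    """Impute features one at a time with Bayesian Ridge Regression, feeding already
    imputed features into later regressions (sketch using IterativeImputer)."""
    imputer = IterativeImputer(estimator=BayesianRidge(),
                               imputation_order="ascending",  # fewest-missing feature first
                               max_iter=10, random_state=0)
    return imputer.fit_transform(X)
```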
10. Pretreating and normalizing metabolomics data for statistical analysis (Cited by 1)
Authors: Jun Sun, Yinglin Xia. Genes & Diseases (SCIE, CSCD), 2024, Issue 3, pp. 188-205 (18 pages).
Abstract: Metabolomics, as a research field and a set of techniques, studies the complete set of small molecules in biological samples. Metabolomics is emerging as a powerful tool for precision medicine. In particular, integration of the microbiome and metabolome has revealed the mechanisms and functionality of the microbiome in human health and disease. However, metabolomics data are very complex, and preprocessing/pretreatment and normalization procedures are usually required before statistical analysis. In this review article, we comprehensively review the methods used to preprocess and pretreat metabolomics data, including MS-based and NMR-based data preprocessing, handling of zero and/or missing values, outlier detection, data normalization, data centering and scaling, and data transformation. We discuss the advantages and limitations of each method. The choice of a suitable preprocessing method is determined by the biological hypothesis, the characteristics of the data set, and the selected statistical analysis method. We then provide a perspective on their applications in microbiome and metabolome research.
Keywords: data centering and scaling; data normalization; data transformation; missing values; MS-based data preprocessing; NMR data preprocessing; outliers; preprocessing/pretreatment
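For reference, common centering, scaling, and transformation steps of the kind the review surveys look like the NumPy sketches below. These are textbook definitions rather than the review's specific recommendations, and the missing-value and outlier steps are omitted.

```python
import numpy as np

def log_transform(X, offset=1.0):
    """Log-transform intensities; the offset avoids log(0)."""
    return np.log(X + offset)

def autoscale(X):
    """Mean-center each metabolite and divide by its standard deviation (unit variance)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pareto_scale(X):
    """Mean-center and divide by the square root of the standard deviation."""
    return (X - X.mean(axis=0)) / np.sqrt(X.std(axis=0, ddof=1))
```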
11. Missing Data Imputation for Traffic Flow Based on Improved Local Least Squares (Cited by 6)
Authors: Gang Chang, Yi Zhang, Danya Yao. Tsinghua Science and Technology (EI, CAS), 2012, Issue 3, pp. 304-309 (6 pages).
Abstract: Complete and reliable field traffic data are vital for the planning, design, and operation of urban traffic management systems. However, traffic data are often very incomplete in many traffic information systems, which hinders effective use of the data. Methods are needed for imputing missing traffic data to minimize the effect of incompleteness on data utilization. This paper presents an improved Local Least Squares (LLS) approach to impute the incomplete data. LLS is an improved version of the K Nearest Neighbor (KNN) method. First, the missing traffic data are replaced by the row average of the known values. Then, the vector angle and Euclidean distance are used to select the nearest neighbors. Finally, a regression step is used to obtain the weights of the nearest neighbors and the imputation results. Traffic flow volumes collected in Beijing were analyzed to compare this approach with the Bayesian Principal Component Analysis (BPCA) imputation approach. Tests show that this approach performs slightly better than BPCA imputation for missing traffic data.
Keywords: Local Least Squares (LLS); vector angle; missing value imputation; traffic flow
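The three steps spelled out in the abstract (row-average pre-fill, neighbour selection, least-squares weights) can be sketched for a single missing entry as follows. This is an illustrative reconstruction that uses cosine similarity for the ranking step, whereas the paper combines the vector angle with the Euclidean distance.

```python
import numpy as np

def lls_impute_entry(X, i, j, k=10):
    """Estimate X[i, j] with an LLS-style procedure (illustrative sketch)."""
    X = np.where(np.isnan(X), np.nanmean(X, axis=1, keepdims=True), X)  # step 1: row averages
    target = X[i]
    others = np.delete(np.arange(len(X)), i)
    # step 2: rank candidate rows by cosine similarity (vector angle)
    sims = [target @ X[r] / (np.linalg.norm(target) * np.linalg.norm(X[r])) for r in others]
    nearest = others[np.argsort(sims)[::-1][:k]]
    # step 3: least-squares weights fitted on the neighbours' other columns
    A = np.delete(X[nearest], j, axis=1).T          # neighbours' known columns
    b = np.delete(target, j)                        # target row's known columns
    w, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(X[nearest, j] @ w)
```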
12. COSSET+: Crowdsourced Missing Value Imputation Optimized by Knowledge Base
Authors: Hong-Zhi Wang, Zhi-Xin Qi, Ruo-Xi Shi, Jian-Zhong Li, Hong Gao. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2017, Issue 5, pp. 845-857 (13 pages).
Abstract: Missing value imputation with crowdsourcing is a novel data-cleaning method for capturing missing values that can hardly be filled by automatic approaches. However, the time cost and overhead of crowdsourcing are high, so the cost must be reduced while the accuracy of crowdsourced imputation is guaranteed. To achieve this optimization goal, we present COSSET+, a crowdsourced framework optimized by a knowledge base. We combine the advantages of a knowledge-based filter and a crowdsourcing platform to capture missing values. Since the number of crowdsourced values affects the cost of COSSET+, we aim to select only a subset of the missing values to be crowdsourced. We prove that the crowd value selection problem is NP-hard and develop an approximation algorithm for it. Extensive experimental results demonstrate the efficiency and effectiveness of the proposed approaches.
Keywords: crowdsourcing; missing value; imputation; knowledge base; optimization
13. Incremental expectation maximization principal component analysis for missing value imputation for coevolving EEG data
Authors: Sun Hee KIM, Hyung Jeong YANG, Kam Swee NG. Journal of Zhejiang University - Science C (Computers and Electronics) (SCIE, EI), 2011, Issue 8, pp. 687-697 (11 pages).
Abstract: Missing values occur in bio-signal processing for various reasons, including technical problems and biological characteristics. These missing values are then either simply excluded or substituted with estimated values for further processing. When missing values are estimated for electroencephalography (EEG) signals, where electrical signals arrive quickly and successively, rapid processing of high-speed data is required for immediate decision making. In this study, we propose an incremental expectation maximization principal component analysis (iEMPCA) method that automatically estimates missing values from multivariable EEG time series data without requiring a whole, complete data set. The proposed method solves the problem of a biased model, which inevitably results from simply removing incomplete data rather than estimating it, and thus reduces the loss of information by incorporating missing values in real time. By using an incremental approach, the proposed method also minimizes the memory usage and processing time of continuously arriving data. Experimental results show that the proposed method estimates missing values more accurately than previous methods.
Keywords: electroencephalography (EEG); missing value imputation; hidden pattern discovery; expectation maximization; principal component analysis
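A batch EM-style PCA imputation loop, shown below, conveys the underlying idea. It is only an assumed, non-incremental simplification: the paper's iEMPCA updates the decomposition incrementally as new EEG samples arrive, and the number of components here is a placeholder.

```python
import numpy as np
from sklearn.decomposition import PCA

def em_pca_impute(X, n_components=5, n_iter=20):
    """Iteratively refine missing entries with a PCA reconstruction (batch sketch)."""
    mask = np.isnan(X)
    X_hat = np.where(mask, np.nanmean(X, axis=0), X)              # init with column means
    for _ in range(n_iter):
        pca = PCA(n_components=n_components).fit(X_hat)           # M-step: refit the subspace
        recon = pca.inverse_transform(pca.transform(X_hat))       # E-step: reconstruct
        X_hat[mask] = recon[mask]                                 # update only missing cells
    return X_hat
```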
14. Multivariate time series imputation for energy data using neural networks
Authors: Christopher Bulte, Max Kleinebrahm, Hasan Umitcan Yilmaz, Juan Gomez-Romero. Energy and AI, 2023, Issue 3, pp. 25-35 (11 pages).
Abstract: Multivariate time series with missing values are common in a wide range of applications, including energy data. Existing imputation methods often fail to capture the temporal dynamics and the cross-dimensional correlations simultaneously. In this paper we propose a two-step method based on an attention model to impute missing values in multivariate energy time series. First, the underlying distribution of the missing values in the data is learned. This information is then used to train an attention-based imputation model. By learning the distribution prior to the imputation process, the model can respond flexibly to the specific characteristics of the underlying data. The developed model is applied to European energy data obtained from the European Network of Transmission System Operators for Electricity. Using different evaluation metrics and benchmarks, the experiments show that the proposed model outperforms the benchmarks and accurately imputes missing values.
Keywords: missing value estimation; multivariate time series; neural networks; attention model; energy data
15. Issues in the Mining of Heart Failure Datasets
Authors: Nongnuch Poolsawad, Lisa Moore, Chandrasekhar Kambhampati, John G. F. Cleland. International Journal of Automation and Computing (EI, CSCD), 2014, Issue 2, pp. 162-179 (18 pages).
Abstract: This paper investigates the characteristics of a clinical dataset using a combination of feature selection and classification methods to handle missing values and understand the underlying statistical characteristics of a typical clinical dataset. Typically, a large clinical dataset presents challenges such as missing values, high dimensionality, and unbalanced classes, which pose inherent problems when implementing feature selection and classification algorithms. With most clinical datasets, an initial exploration is carried out and attributes with more than a certain percentage of missing values are eliminated from the dataset. Then, with the help of missing value imputation, feature selection, and classification algorithms, prognostic and diagnostic models are developed. This paper reaches two main conclusions: 1) Despite the nature of clinical datasets and their large size, the choice of missing value imputation method does not affect the final performance; what is crucial is that the dataset is an accurate representation of the clinical problem, and the method of imputing missing values is not critical for developing classifiers and prognostic/diagnostic models. 2) Supervised learning has proven more suitable for mining clinical data than unsupervised methods, and non-parametric classifiers such as decision trees give better results than parametric classifiers such as radial basis function networks (RBFNs).
Keywords: heart failure; clinical dataset; classification; clustering; missing values; feature selection