Journal articles
102 articles found
Missing Value Imputation for Radar-Derived Time-Series Tracks of Aerial Targets Based on Improved Self-Attention-Based Network
1
Authors: Zihao Song, Yan Zhou, Wei Cheng, Futai Liang, Chenhao Zhang. Computers, Materials & Continua (SCIE, EI), 2024, No. 3, pp. 3349-3376 (28 pages)
Missing values are frequent in radar-derived time-series tracks of aerial targets (RTT-AT) and pose significant challenges for subsequent data-driven tasks. However, most imputation research focuses on random missing (RM) data, which differs significantly from the common missing patterns of RTT-AT. As a result, methods designed for RM may degrade or fail when applied to RTT-AT imputation. Conventional autoregressive deep learning methods are also prone to error accumulation and loss of long-term dependencies. This paper proposes a non-autoregressive imputation model for two common missing patterns in RTT-AT. The model consists of two probabilistic sparse diagonal masking self-attention (PSDMSA) units and a weight fusion unit. It learns missing values by combining the representations output by the two units, aiming to minimize the difference between the imputed values and their actual values. The PSDMSA units effectively capture temporal dependencies and attribute correlations between time steps, which improves imputation quality. The weight fusion unit automatically updates the weights of the output representations from the two units to obtain a more accurate final representation. The experimental results indicate that, across varying missing rates in the two missing patterns, our model consistently outperforms other methods in imputation performance. It also rarely produces large deviations in its estimates for specific missing entries. Compared with the state-of-the-art autoregressive deep learning imputation model Bidirectional Recurrent Imputation for Time Series (BRITS), our model reduces mean absolute error (MAE) by 31%-50%. Trained on the same dataset, it also trains 4 to 8 times faster than both BRITS and a standard Transformer model. Finally, ablation experiments show that the PSDMSA units, the weight fusion unit, the cascade network design, and the imputation loss each improve imputation performance, which confirms the efficacy of the design.
Keywords: Missing value imputation; time-series tracks; probabilistic sparsity; diagonal masking self-attention; weight fusion
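The PSDMSA units build on diagonal masking self-attention. The idea is that each time step is blocked from attending to itself, so its representation has to be reconstructed from the other steps instead of copied from its own (possibly missing) input. The plain-Python sketch below shows only that diagonal mask inside scaled dot-product attention. It does not implement the paper's probabilistic sparsity, learned projections, or weight fusion unit, and the function name is hypothetical.

```python
import math


def diagonal_masked_attention(q, k, v):
    """Scaled dot-product attention in which step i may not attend to step i.

    q, k, v are lists of equal-length row vectors (one row per time step).
    Needs at least two time steps; otherwise every score would be masked.
    Returns (outputs, attention_weights).
    """
    n, d = len(q), len(q[0])
    weights = []
    for i in range(n):
        # Raw scores q_i . k_j / sqrt(d); the diagonal entry is masked to -inf.
        scores = [
            -math.inf if j == i
            else sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            for j in range(n)
        ]
        top = max(scores)  # subtract the row maximum for numerical stability
        exps = [math.exp(s - top) for s in scores]  # exp(-inf) evaluates to 0.0
        total = sum(exps)
        weights.append([e / total for e in exps])
    # Each output row is the attention-weighted sum of the value rows.
    outputs = [
        [sum(weights[i][j] * v[j][c] for j in range(n)) for c in range(len(v[0]))]
        for i in range(n)
    ]
    return outputs, weights


# Toy usage: three time steps with two features, used as q, k and v.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out, w = diagonal_masked_attention(x, x, x)
```

Every row of `w` has a zero on the diagonal and still sums to 1, so each output step is a mixture of the other steps only.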
Missing Data Imputation: A Comprehensive Review
2
Authors: Majed Alwateer, El-Sayed Atlam, Mahmoud Mohammed Abd El-Raouf, Osama A. Ghoneim, Ibrahim Gad. Journal of Computer and Communications, 2024, No. 11, pp. 53-75 (23 pages)
Missing data presents a significant challenge in statistical analysis and machine learning, often resulting in biased outcomes and diminished efficiency. This comprehensive review investigates various imputation techniques, categorizing them into three primary approaches: deterministic methods, probabilistic models, and machine learning algorithms. Traditional techniques, including mean or mode imputation, regression imputation, and last observation carried forward, are evaluated alongside more contemporary methods such as multiple imputation, expectation-maximization, and deep learning strategies. The strengths and limitations of each approach are outlined. Key considerations for selecting appropriate methods, based on data characteristics and research objectives, are discussed. The importance of evaluating imputation’s impact on subsequent analyses is emphasized. This synthesis of recent advancements and best practices provides researchers with a robust framework for effectively handling missing data, thereby improving the reliability of empirical findings across diverse disciplines.
Keywords: Missing Data; Machine Learning; Prediction; Deep Learning; Imputation
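Among the deterministic techniques the review lists, regression imputation goes one step beyond mean imputation. It fits a model on the complete cases and predicts each missing value from observed covariates. Below is a minimal sketch with a single predictor, where `None` marks a missing response. The function name is illustrative and does not come from the review.

```python
def regression_impute(x, y):
    """Fill None entries of y from the least-squares line y = a + b*x.

    The line is fitted on complete cases only (pairs where y is observed);
    at least two complete cases with distinct x values are required.
    """
    pairs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(pairs)
    mx = sum(p[0] for p in pairs) / n
    my = sum(p[1] for p in pairs) / n
    sxx = sum((p[0] - mx) ** 2 for p in pairs)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in pairs)
    b = sxy / sxx          # slope
    a = my - b * mx        # intercept
    # Observed values are kept; missing ones get the fitted prediction.
    return [a + b * xi if yi is None else yi for xi, yi in zip(x, y)]


filled = regression_impute([1, 2, 3, 4, 5], [2.0, 4.0, None, 8.0, None])
```

Here the complete cases lie on y = 2x, so the gaps are filled with about 6 and 10. Mean imputation would fill both gaps with 14/3 (about 4.67) and ignore the relationship with x. Like mean imputation, a single regression imputation still understates variability, which is one motivation for the multiple imputation methods the review also covers.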
A Study of EM Algorithm as an Imputation Method: A Model-Based Simulation Study with Application to a Synthetic Compositional Data
3
Authors: Yisa Adeniyi Abolade, Yichuan Zhao. Open Journal of Modelling and Simulation, 2024, No. 2, pp. 33-42 (10 pages)
Compositional data, which carry relative information, are important in machine learning and related fields. They are typically recorded as closed data that sum to a constant, such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when data are missing. Linear regression is a common statistical modeling technique for finding relationships between variables of interest in many applications. Its parameters are useful for tasks such as prediction and for analyzing the partial effects of independent variables, and maximum likelihood estimation (MLE) is the method of choice for estimating them. However, many datasets contain missing observations, and recovering them can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been proposed for settings with missing data. EM iteratively computes maximum likelihood (ML) or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Given the current parameter estimate, the expectation (E) step constructs the expected log-likelihood, and the maximization (M) step finds the parameters that maximize it. This study examined how well the EM algorithm performs on a synthetic compositional dataset with missing observations, using both ordinary least squares and robust least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) and mean imputation, in terms of Aitchison distance and covariance.
Keywords: Compositional Data; Linear Regression Model; Least Square Method; Robust Least Square Method; Synthetic Data; Aitchison Distance; Maximum Likelihood Estimation; Expectation-Maximization Algorithm; k-Nearest Neighbor and Mean imputation
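The E and M steps described above can be made concrete for the simplest regression case. Assume that (x, y) is bivariate normal, that x is fully observed, and that some y values are missing at random. The E step replaces each missing y and y² by their conditional expectations given x. The M step then re-estimates the means and covariances from these completed sufficient statistics. This is a minimal sketch under those assumptions (the function name is hypothetical). It does not reproduce the paper's compositional setup, Aitchison geometry, or robust least squares.

```python
def em_bivariate_normal(x, y, iters=200, tol=1e-10):
    """EM estimates for a bivariate normal when only y has missing (None) values."""
    n = len(x)
    obs = [yi for yi in y if yi is not None]
    # x is complete, so its mean and variance are fixed MLEs.
    mu_x = sum(x) / n
    s_xx = sum((xi - mu_x) ** 2 for xi in x) / n
    # Start y's parameters from the complete cases, with zero covariance.
    mu_y = sum(obs) / len(obs)
    s_yy = sum((yi - mu_y) ** 2 for yi in obs) / len(obs)
    s_xy = 0.0
    for _ in range(iters):
        # E step: conditional moments of missing y given x.
        beta = s_xy / s_xx
        resid_var = s_yy - beta * s_xy
        ey, ey2 = [], []
        for xi, yi in zip(x, y):
            if yi is None:
                m = mu_y + beta * (xi - mu_x)
                ey.append(m)
                ey2.append(m * m + resid_var)
            else:
                ey.append(yi)
                ey2.append(yi * yi)
        # M step: maximize the expected complete-data log-likelihood.
        new_mu_y = sum(ey) / n
        new_s_xy = sum(xi * e for xi, e in zip(x, ey)) / n - mu_x * new_mu_y
        new_s_yy = sum(ey2) / n - new_mu_y ** 2
        change = abs(new_mu_y - mu_y) + abs(new_s_xy - s_xy) + abs(new_s_yy - s_yy)
        mu_y, s_xy, s_yy = new_mu_y, new_s_xy, new_s_yy
        if change < tol:
            break
    # Impute each missing y by its conditional mean under the final estimates.
    imputed = [mu_y + s_xy / s_xx * (xi - mu_x) if yi is None else yi
               for xi, yi in zip(x, y)]
    return (mu_x, mu_y, s_xx, s_xy, s_yy), imputed


params, filled = em_bivariate_normal([1, 2, 3, 4, 5, 6], [2.0, 4.0, None, 8.0, None, 12.0])
```

The observed pairs here lie exactly on y = 2x, so EM converges to parameters that place the imputed values at about 6 and 10.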
Application of the Hedonic Method to Real Estate Price Indices (Cited by: 6)
4
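Authors: 孙宪华, 刘振惠, 张臣曦. 现代财经 (Modern Finance and Economics: Journal of Tianjin University of Finance and Economics) (CSSCI, PKU Core), 2008, No. 5, pp. 61-65 (5 pages)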
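The hedonic method decomposes the quality characteristics underlying real estate price changes to reveal the implicit price of each characteristic. It then removes the effect of each quality change from the total price change, one characteristic at a time, so that the index reflects only pure price change. This paper uses a double imputation procedure to estimate missing prices and remove the influence of outliers. This resolves the comparability problem and improves the stability of the hedonic model.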
Keywords: real estate price index; quality adjustment; hedonic method; double imputation
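In a double imputation hedonic index, both the current-period and the base-period prices of each current-period dwelling are predicted from the respective period's hedonic regression. Their ratio then measures pure price change at constant quality. The minimal sketch below assumes a log-linear model with a single characteristic `z` (for example floor area). The paper's actual characteristics, functional form, and outlier handling are not described here, and all names and data are hypothetical.

```python
import math


def fit_loglinear(z, p):
    """OLS fit of ln(p) = a + b*z for one quality characteristic z."""
    y = [math.log(v) for v in p]
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    b = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
         / sum((zi - mz) ** 2 for zi in z))
    return my - b * mz, b


def double_imputation_index(z0, p0, z1, p1):
    """Geometric-mean price index for the period-1 sample, both prices imputed."""
    a0, b0 = fit_loglinear(z0, p0)   # base-period hedonic model
    a1, b1 = fit_loglinear(z1, p1)   # current-period hedonic model
    # For each period-1 dwelling, compare its imputed log price under both models.
    diffs = [(a1 + b1 * z) - (a0 + b0 * z) for z in z1]
    return math.exp(sum(diffs) / len(diffs))


# Hypothetical data: prices rise 10% at constant quality, while period 1
# happens to contain larger dwellings.
z0 = [50, 80, 100, 120]
p0 = [1000 * math.exp(0.01 * z) for z in z0]
z1 = [60, 90, 150]
p1 = [1100 * math.exp(0.01 * z) for z in z1]
idx = double_imputation_index(z0, p0, z1, p1)
```

The index recovers the 10% change. A naive ratio of average prices would mix that change with the shift toward larger dwellings, which is the comparability problem the abstract refers to.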
Imputation from SNP chip to sequence: a case study in a Chinese indigenous chicken population (Cited by: 9)
5
Authors: Shaopan Ye, Xiaolong Yuan, Xiran Lin, Ning Gao, Yuanyu Luo, Zanmou Chen, Jiaqi Li, Xiquan Zhang, Zhe Zhang. Journal of Animal Science and Biotechnology (SCIE, CAS, CSCD), 2018, No. 2, pp. 294-305 (12 pages)
Background: Genome-wide association studies and genomic predictions are thought to be optimized by using whole-genome sequence (WGS) data. However, sequencing thousands of individuals of interest is expensive. Imputation from SNP panels to WGS data is an attractive and less expensive way to obtain WGS data. The aims of this study were to investigate the accuracy of imputation and to provide insight into the design and execution of genotype imputation. Results: We genotyped 450 chickens with a 600K SNP array and sequenced 24 key individuals by whole-genome re-sequencing. Imputation accuracy from putative 60K and 600K array data to WGS data was 0.620 and 0.812 for Beagle, and 0.810 and 0.914 for FImpute, respectively. Increasing the sequencing cost from 24X to 144X raised imputation accuracy from 0.525 to 0.698 for Beagle and from 0.654 to 0.823 for FImpute. At a fixed sequencing depth (12X), increasing the number of sequenced animals from 1 to 24 improved accuracy from 0.421 to 0.897 for FImpute and from 0.396 to 0.777 for Beagle. Using optimally selected key individuals as the reference population for re-sequencing gave higher imputation accuracy than using randomly selected individuals. At a fixed reference population size (24), imputation accuracy increased from 0.654 to 0.875 for FImpute and from 0.512 to 0.762 for Beagle as the sequencing depth increased from 1X to 12X. For a given total genotyping cost, accuracy increased with reference population size for FImpute. This pattern did not hold for Beagle, which showed the highest accuracy at sixfold coverage in the scenarios used in this study. Conclusions: We comprehensively investigated the impacts of several key factors on genotype imputation. In general, increasing sequencing cost gave higher imputation accuracy. With a fixed sequencing cost, however, optimal imputation can enhance the performance of WGP and GWAS. An optimal imputation strategy should jointly consider the size of the reference population, the imputation algorithm, marker density, the population structure of the target population, and the method used to select key individuals. This work sheds additional light on how to design and execute genotype imputation for livestock populations.
Keywords: Chickens; Imputation; Re-sequencing; SNP
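The abstract reports imputation accuracies without naming the metric. Two measures commonly used for genotypes coded as 0/1/2 allele counts are genotype concordance and the Pearson correlation between true and imputed genotypes. The sketch below computes both. Treat the choice of metric as an assumption: it may not match the study's exact definition.

```python
import math


def concordance(true, imputed):
    """Fraction of genotypes (0/1/2) imputed exactly right."""
    return sum(t == i for t, i in zip(true, imputed)) / len(true)


def pearson_r(true, imputed):
    """Pearson correlation between true and imputed genotype codes."""
    n = len(true)
    mt, mi = sum(true) / n, sum(imputed) / n
    cov = sum((t - mt) * (i - mi) for t, i in zip(true, imputed))
    vt = sum((t - mt) ** 2 for t in true)
    vi = sum((i - mi) ** 2 for i in imputed)
    return cov / math.sqrt(vt * vi)


true_g = [0, 1, 2, 1, 0, 2]
imp_g = [0, 1, 2, 2, 0, 2]   # one heterozygote imputed as a homozygote
c, r = concordance(true_g, imp_g), pearson_r(true_g, imp_g)
```

Concordance here is 5/6. The correlation (about 0.91) is less sensitive to how common each genotype class is, which is one reason correlation-based accuracy is often preferred for low-frequency variants.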
Genotype Imputation for Genome-Wide Association Studies Based on IMPUTE2
6
Authors: 辛俊逸, 葛雨秋, 邵卫, 杜牧龙, 马高祥, 储海燕, 王美林, 张正东. 科学技术与工程 (Science Technology and Engineering) (PKU Core), 2018, No. 15, pp. 56-60 (5 pages)
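Most genome-wide association studies (GWAS) use different genotyping arrays, so the number of genetic variant sites and the criteria used to select them differ. Genotype imputation uses existing genotype data to fill in sites that were not genotyped. We applied IMPUTE2 to perform genome-wide imputation of gastric cancer GWAS data from the Database of Genotypes and Phenotypes (dbGaP), and use this analysis to describe the principles and procedure of genome-wide imputation in detail. Taking chromosome 9 as an example and using the 1000 Genomes Project reference panel, we describe the workflow: pre-imputation quality control, pre-phasing, imputation, assessment of imputation quality, and post-imputation association analysis. Chromosome 9 had 21,033 sites before imputation and 1,630,406 SNPs after imputation. Of these, 817,494 SNPs had INFO > 0.3, and 584,755 had high imputation quality (INFO > 0.5). IMPUTE2 can impute ungenotyped variants quickly and accurately, so multiple GWAS datasets can be brought to the same set and density of sites. A joint analysis can then increase statistical power to discover new genetic susceptibility loci.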
Keywords: GWAS; genotype imputation; IMPUTE2; imputation quality
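The INFO thresholds above (0.3 and 0.5) filter SNPs by how much information the imputed genotype probabilities carry. The sketch below uses the form of the IMPUTE info measure as it is commonly written. Let e_i be the expected allele dosage and f_i the expected squared dosage for sample i, and let θ be the estimated allele frequency. Then info = 1 − Σ(f_i − e_i²) / (2Nθ(1−θ)). Treat this as an approximation and check the IMPUTE2 documentation for the exact definition and edge-case handling. The function name is hypothetical.

```python
def impute_info(probs):
    """Info score for one SNP from per-sample genotype probabilities.

    probs: list of (p0, p1, p2) = P(genotype = 0, 1, 2 copies of the allele).
    """
    n = len(probs)
    e = [p1 + 2 * p2 for _, p1, p2 in probs]   # expected dosage
    f = [p1 + 4 * p2 for _, p1, p2 in probs]   # expected squared dosage
    theta = sum(e) / (2 * n)                   # estimated allele frequency
    if theta <= 0.0 or theta >= 1.0:
        return 1.0                             # monomorphic: treat as fully informative
    return 1.0 - sum(fi - ei * ei for fi, ei in zip(f, e)) / (2 * n * theta * (1 - theta))


# Certain genotypes carry full information (info = 1).
certain = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]
# Every sample assigned the Hardy-Weinberg prior at theta = 0.3 carries none (info = 0).
prior_only = [(0.49, 0.42, 0.09)] * 4
snps = {"rs_a": certain, "rs_b": prior_only}   # hypothetical SNP IDs
kept = [name for name, pr in snps.items() if impute_info(pr) > 0.3]
```

With a threshold of 0.3, only the SNP whose probabilities carry real information survives.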
New insights into the associations among feed efficiency, metabolizable efficiency traits and related QTL regions in broiler chickens (Cited by: 6)
7
Authors: Wei Li, Ranran Liu, Maiqing Zheng, Furong Feng, Dawei Liu, Yuming Guo, Guiping Zhao, Jie Wen. Journal of Animal Science and Biotechnology (SCIE, CAS, CSCD), 2020, No. 4, pp. 950-964 (15 pages)
Background: Improving feed efficiency would increase profitability for producers while also reducing the environmental footprint of livestock production. This study investigated the relationships among feed efficiency traits and metabolizable efficiency traits in 180 male broilers. Significant loci and genes affecting the metabolizable efficiency traits were explored with an imputation-based genome-wide association study. The traits measured or calculated comprised three growth traits, five feed efficiency-related traits, and nine metabolizable efficiency traits. Results: Residual feed intake (RFI) showed moderate to high positive phenotypic correlations with eight other measured traits: average daily feed intake (ADFI), dry excreta weight (DEW), gross energy excretion (GEE), crude protein excretion (CPE), metabolizable dry matter (MDM), nitrogen-corrected apparent metabolizable energy (AMEn), abdominal fat weight (AbF), and percentage of abdominal fat (AbP). Growth traits were more strongly correlated with the feed conversion ratio (FCR) than with RFI. In addition, RFI, FCR, ADFI, DEW, GEE, CPE, MDM, AMEn, AbF, and AbP were lower in low-RFI birds than in high-RFI birds (P < 0.01 or P < 0.05). In contrast, the coefficients of MDM and MCP were greater in low-RFI birds than in high-RFI birds (P < 0.01). Five narrow QTLs for metabolizable efficiency traits were detected:
- one 82.46-kb region for DEW and GEE on Gallus gallus chromosome (GGA) 26;
- one 120.13-kb region for MDM and AMEn on GGA1;
- one 691.25-kb region for the coefficients of MDM and AMEn on GGA5;
- one region for the coefficients of MDM and MCP on GGA2 (103.45-103.53 Mb);
- one 690.50-kb region for the coefficient of MCP on GGA14.
Linkage disequilibrium (LD) analysis indicated that the five regions contained high-LD blocks. They also contained the genes chromosome 26 C6orf106 homolog (C26H6orf106), LOC396098, SH3 and multiple ankyrin repeat domains 2 (SHANK2), ETS homologous factor (EHF), and histamine receptor H3-like (HRH3L). These genes are known to be involved in regulating neurodevelopment, cell proliferation and differentiation, and food intake. Conclusions: Selection for low RFI significantly decreased chicken feed intake, excreta output, and abdominal fat deposition, and increased nutrient digestibility without changing weight gain. Five novel QTL regions involved in the control of metabolizable efficiency in chickens were identified. These results, obtained by combining nutritional and genetic approaches, should provide new insights into improving feed efficiency in poultry and other species.
Keywords: Broiler; Feed efficiency; Genome-wide association study; Imputation; Metabolizable efficiency
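RFI is commonly computed as the residual from a linear regression of feed intake (here ADFI) on average daily gain and mid-test metabolic body weight (BW^0.75). Birds that eat less than predicted for their growth and size have negative RFI. The abstract does not state the exact regression model used for the 180 broilers, so the covariates below are an assumption. The sketch solves the two-predictor least-squares problem on centered data with Cramer's rule, and the function name is hypothetical.

```python
def residual_feed_intake(adfi, adg, mbw):
    """Residuals of ADFI regressed on ADG and metabolic body weight (with intercept).

    adfi, adg, mbw: equal-length lists; mbw is assumed to be BW ** 0.75.
    """
    n = len(adfi)
    my, m1, m2 = sum(adfi) / n, sum(adg) / n, sum(mbw) / n
    # Centering absorbs the intercept, leaving a 2x2 system for the slopes.
    y = [v - my for v in adfi]
    x1 = [v - m1 for v in adg]
    x2 = [v - m2 for v in mbw]
    s11 = sum(a * a for a in x1)
    s22 = sum(b * b for b in x2)
    s12 = sum(a * b for a, b in zip(x1, x2))
    s1y = sum(a * c for a, c in zip(x1, y))
    s2y = sum(b * c for b, c in zip(x2, y))
    det = s11 * s22 - s12 * s12
    b1 = (s1y * s22 - s2y * s12) / det
    b2 = (s2y * s11 - s1y * s12) / det
    return [yi - b1 * a - b2 * b for yi, a, b in zip(y, x1, x2)]


adg = [50, 55, 60, 65, 70]            # hypothetical g/day
mbw = [20, 22, 21, 25, 24]            # hypothetical BW ** 0.75 values
adfi = [10 + 1.5 * a + 2 * m for a, m in zip(adg, mbw)]
noise = [1, -1, 0.5, -0.5, 0]
rfi = residual_feed_intake([f + e for f, e in zip(adfi, noise)], adg, mbw)
```

By construction the residuals sum to zero. A "low-RFI" group is simply the birds with the most negative residuals.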
Comparisons of improved genomic predictions generated by different imputation methods for genotyping by sequencing data in livestock populations (Cited by: 4)
8
Authors: Xiao Wang, Guosheng Su, Dan Hao, Mogens Sandø Lund, Haja N. Kadarmideen. Journal of Animal Science and Biotechnology (CAS, CSCD), 2020, No. 2, pp. 316-326 (11 pages)
Background: Genotyping by sequencing (GBS) still suffers from missing genotypes. Imputation is important for using GBS in genomic predictions, especially at low sequencing depths, because of the large number of missing genotypes. Minor allele frequency (MAF) is widely used as a marker data editing criterion for genomic predictions. This study investigated three imputation methods (Beagle, IMPUTE2, and FImpute software) combined with four MAF editing criteria, using simulated livestock population data. The methods were compared on the imputation accuracy of missing genotypes and on the accuracy of genomic predictions. Results: Four MAF criteria (no MAF limit, MAF ≥ 0.001, MAF ≥ 0.01, and MAF ≥ 0.03) were used to edit marker data before imputation. Beagle, IMPUTE2, and FImpute were applied to impute the original GBS data. IMPUTE2 was also used to impute the expected genotype dosage after genotype correction (GcIM). The reliability of genomic predictions was calculated using GBS and imputed GBS data. The three imputation methods achieved the same imputation accuracy, except at a sequencing read depth (depth) of 2, where FImpute was slightly less accurate than Beagle and IMPUTE2. GcIM was the best of all the imputations at depths of 4, 5, and 10, but the worst at depth 2. For genomic prediction, retaining more SNPs with no MAF limit resulted in higher reliability. As the depth increased to 10, the prediction reliabilities approached those obtained using true genotypes at the GBS loci. At depth 2, Beagle and IMPUTE2 gave the largest increases in prediction reliability (5 percentage points), and FImpute gained 3 percentage points. The best predictions were obtained with GcIM at depths 4, 5, and 10, but GcIM also gave the worst prediction at depth 2. Conclusions: Imputation accuracies were relatively low for GBS at low depths and high for GBS at high depths. Imputation gave larger gains in the reliability of genomic predictions for GBS at lower depths. These results suggest applying IMPUTE2 to corrected GBS data (GcIM) to improve genomic predictions at higher depths. FImpute could be a good alternative for routine imputation.
Keywords: Genomic prediction; Genotyping by sequencing; Imputation; MAF; Simulation
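The MAF editing step compared in this study keeps only markers whose minor allele frequency reaches a threshold, or all markers when there is no limit. Below is a minimal sketch for 0/1/2-coded genotypes with `None` for missing calls. Computing MAF from called genotypes only is an assumption, and the function names are hypothetical.

```python
def minor_allele_frequency(genotypes):
    """MAF from 0/1/2 allele counts, ignoring missing (None) calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    p = sum(called) / (2 * len(called))   # frequency of the counted allele
    return min(p, 1 - p)


def filter_by_maf(snps, threshold=None):
    """Keep SNPs with MAF >= threshold; threshold=None means no MAF limit."""
    if threshold is None:
        return dict(snps)
    return {name: g for name, g in snps.items()
            if minor_allele_frequency(g) >= threshold}


snps = {
    "s1": [0, 0, 0, 1],        # MAF = 1/8
    "s2": [2, 2, 2, 2],        # monomorphic, MAF = 0
    "s3": [1, None, 2, 0],     # MAF = 0.5 from the three called genotypes
}
kept = {t: sorted(filter_by_maf(snps, t)) for t in (None, 0.001, 0.01, 0.03)}
```

Stricter thresholds drop rare and monomorphic markers. The study found that keeping more SNPs (no MAF limit) gave higher prediction reliability.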
Energy Consumption Prediction of a CNC Machining Process With Incomplete Data (Cited by: 6)
9
Authors: Jian Pan, Congbo Li, Ying Tang, Wei Li, Xiaoou Li. IEEE/CAA Journal of Automatica Sinica (SCIE, EI, CSCD), 2021, No. 5, pp. 987-1000 (14 pages)
Energy consumption prediction for a CNC machining process is important for energy efficiency optimization strategies. To improve generalization, more and more parameters are acquired for energy prediction modeling. However, data collected from workshops may be incomplete because of misoperation, unstable network connections, frequent transfers, and similar causes. This work proposes a framework for energy modeling based on incomplete data to address this issue. First, some necessary preliminary operations are applied to the incomplete data sets. Then, missing values are estimated with generative adversarial imputation nets (GAIN) to generate a new complete data set. Next, the gene expression programming (GEP) algorithm is used to train the energy model on the generated data sets. Finally, the predictive accuracy of the obtained model is tested. Computational experiments investigate the performance of the proposed framework at different rates of missing data. Experimental results demonstrate that even when the missing data rate increases to 30%, the framework still makes efficient predictions, with an RMSE of 0.903 kJ and an MAE of 0.739 kJ.
Keywords: Energy consumption prediction; incomplete data; generative adversarial imputation nets (GAIN); gene expression programming (GEP)
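GAIN trains a generator to fill missing entries and a discriminator to guess which entries were observed. The discriminator also receives a "hint" that reveals part of the true mask. The sketch below shows only the data plumbing around the networks:
- the generator input, where missing slots are replaced by noise;
- a hint matrix of the form H = B·M + 0.5·(1 − B);
- the final combination, which keeps observed values and takes generated ones elsewhere.

Drawing B entry-wise from Bernoulli(hint_rate) and using U(0, 0.01) noise are assumptions of this sketch, not details from the paper. The generator and discriminator networks themselves are not shown, and the function names are hypothetical.

```python
import random


def gain_inputs(x, mask, hint_rate=0.9, rng=None):
    """Generator input and hint matrix for GAIN-style training.

    x: rows of values (entries at missing positions are ignored, e.g. None);
    mask: rows of 1 (observed) / 0 (missing) with the same shape as x.
    """
    rng = rng or random.Random()
    # Observed values are kept; missing slots get small uniform noise.
    z = [[xi if m else rng.uniform(0.0, 0.01) for xi, m in zip(xr, mr)]
         for xr, mr in zip(x, mask)]
    # Hint: where b == 1 the true mask entry is revealed, otherwise 0.5.
    hint = []
    for mr in mask:
        row = []
        for m in mr:
            b = 1 if rng.random() < hint_rate else 0
            row.append(b * m + 0.5 * (1 - b))
        hint.append(row)
    return z, hint


def gain_combine(x, mask, generated):
    """Final imputation: observed entries from x, missing ones from the generator."""
    return [[xi if m else gi for xi, m, gi in zip(xr, mr, gr)]
            for xr, mr, gr in zip(x, mask, generated)]


x = [[1.0, None, 3.0], [None, 5.0, 6.0]]
mask = [[1, 0, 1], [0, 1, 1]]
z, h = gain_inputs(x, mask, rng=random.Random(0))
filled = gain_combine(x, mask, [[9.0, 9.0, 9.0], [8.0, 8.0, 8.0]])
```

Only the combination step touches the final data set. The hint exists purely to make the discriminator's task well posed during training.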
Establishment and verification of a surgical prognostic model for cervical spinal cord injury without radiological abnormality (Cited by: 5)
10
Authors: Jie Wang, Shuai Guo, Xuan Cai, Jia-Wei Xu, Hao-Peng Li. Neural Regeneration Research (SCIE, CAS, CSCD), 2019, No. 4, pp. 713-720 (8 pages)
Some studies have suggested that early surgical treatment can effectively improve the prognosis of cervical spinal cord injury without radiological abnormality. However, no research has focused on developing a prognostic model for this condition. This retrospective analysis included 43 patients with cervical spinal cord injury without radiological abnormality. Seven potential factors were assessed: age, sex, strength of the external force causing the injury, duration of disease, degree of cervical spinal stenosis, Japanese Orthopaedic Association score, and physiological cervical curvature. A model was established using multiple binary logistic regression analysis. The model was evaluated by concordant profiling and the area under the receiver operating characteristic curve, and bootstrapping was used for internal validation. The prognostic model was logit(P) = -25.4545 + 21.2576 × VALUE + 1.2160 × SCORE - 3.4224 × TIME. Here, VALUE is the Pavlov ratio, which indicates the extent of cervical spinal stenosis; SCORE is the Japanese Orthopaedic Association score (0-17) after the operation; and TIME is the disease duration (from injury to operation). The area under the receiver operating characteristic curve for all patients was 0.8941 (95% confidence interval, 0.7930-0.9952). Three factors in the predictive model were associated with worse patient outcomes: a greater extent of cervical stenosis, a poorer preoperative neurological status, and a longer disease duration. The prognosis was considered good when logit(P) ≥ -2.5105. Overall, the model showed some clinical value. This study was approved by the Biomedical Ethics Committee of the Second Affiliated Hospital of Xi'an Jiaotong University, China (approval number: 2018063) on May 8, 2018.
Keywords: nerve regeneration; surgical prognostic model; cervical spinal cord injury; retrospective study; multiple binary logistic regression analysis; bootstrapping internal validation; multiple imputations; cervical spinal stenosis; duration of disease; Pavlov ratio; neural regeneration
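The reported model can be evaluated directly: compute the linear predictor from the three coefficients, compare it with the -2.5105 cutoff, and optionally convert it to a probability with the logistic function. The units of TIME are not given in the abstract, so the input values below are purely hypothetical, as are the function names.

```python
import math


def prognosis_logit(pavlov_ratio, joa_score, duration):
    """Linear predictor logit(P) using the coefficients reported in the abstract."""
    return -25.4545 + 21.2576 * pavlov_ratio + 1.2160 * joa_score - 3.4224 * duration


def predicted_probability(logit):
    """Logistic transform P = 1 / (1 + exp(-logit))."""
    return 1.0 / (1.0 + math.exp(-logit))


def good_prognosis(logit, cutoff=-2.5105):
    """The abstract classifies prognosis as good when logit(P) >= -2.5105."""
    return logit >= cutoff


# Hypothetical patient: Pavlov ratio 0.8, JOA score 14, duration 1 (unit unknown).
lp = prognosis_logit(0.8, 14, 1)
```

For these inputs the linear predictor is about 5.15, well above the cutoff. The cutoff itself corresponds to a predicted probability of roughly 0.075.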
Improved KNN Imputation for Missing Values in Gene Expression Data (Cited by: 3)
11
Authors: Phimmarin Keerin, Tossapon Boongoen. Computers, Materials & Continua (SCIE, EI), 2022, No. 2, pp. 4009-4025 (17 pages)
The problem of missing values has long been studied in data science and bioinformatics, especially in the analysis of gene expression data, which facilitates early detection of cancer. Many studies report improvements from excluding samples with missing information from the analysis, while others fill the gaps with plausible values. The former is simple, but the latter avoids information loss. For filling gaps, a neighbor-based (KNN) approach has proven more effective than other global estimators. This paper extends that approach by introducing a new summarization method into the KNN model. It is the first study to apply the ordered weighted averaging (OWA) operator in this problem context. In particular, two variations of OWA aggregation are proposed and evaluated against their baseline and other neighbor-based models. The experiments use missing-value ratios from 1% to 20% and six published gene expression datasets. The results suggest that the new methods usually provide more accurate estimates than the compared methods. At missing rates of 5% and 20%, the best average NRMSE scores across datasets are 0.65 and 0.69, while the best scores obtained by the existing techniques in this study are 0.80 and 0.84, respectively.
Keywords: Gene expression; missing value; imputation; KNN; OWA operator
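The core change described here is to replace the plain average over the k nearest neighbors with an OWA aggregation. OWA sorts the neighbor values and applies position-based weights. The abstract does not give the paper's two OWA weighting schemes, so the weights below are hypothetical. With uniform weights, the sketch reduces to ordinary KNN mean imputation. Rows are genes, columns are samples, and `None` marks a missing value. Function names are illustrative.

```python
import math


def owa(values, weights):
    """Ordered weighted average: weights[0] applies to the largest value."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    ordered = sorted(values, reverse=True)
    return sum(w * v for w, v in zip(weights, ordered))


def knn_owa_impute(data, row, col, weights):
    """Estimate data[row][col] from the k = len(weights) nearest rows.

    Distance is the RMS difference over columns observed in both rows.
    """
    k = len(weights)
    target = data[row]
    candidates = []
    for j, other in enumerate(data):
        if j == row or other[col] is None:
            continue  # a neighbour must have the value we want to borrow
        shared = [c for c in range(len(target))
                  if c != col and target[c] is not None and other[c] is not None]
        if not shared:
            continue
        dist = math.sqrt(sum((target[c] - other[c]) ** 2 for c in shared) / len(shared))
        candidates.append((dist, other[col]))
    candidates.sort(key=lambda t: t[0])
    return owa([value for _, value in candidates[:k]], weights)


data = [
    [1.0, 2.0, None],
    [1.1, 2.1, 3.0],
    [0.9, 1.9, 5.0],
    [5.0, 6.0, 100.0],   # distant gene, never among the 2 nearest
]
plain = knn_owa_impute(data, 0, 2, [0.5, 0.5])    # ordinary KNN mean
skewed = knn_owa_impute(data, 0, 2, [0.7, 0.3])   # hypothetical OWA weights
```

The uniform weights give 4.0 (the plain KNN mean of 3.0 and 5.0). The skewed weights lean toward the larger neighbor value and give 4.4.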
Incorporating genomic annotation into single-step genomic prediction with imputed whole-genome sequence data (Cited by: 2)
12
Authors: TENG Jin-yan, YE Shao-pan, GAO Ning, CHEN Zi-tao, DIAO Shu-qi, LI Xiu-jin, YUAN Xiao-long, ZHANG Hao, LI Jia-qi, ZHANG Xi-quan, ZHANG Zhe. Journal of Integrative Agriculture (SCIE, CAS, CSCD), 2022, No. 4, pp. 1126-1136 (11 pages)
Single-step genomic best linear unbiased prediction (ssGBLUP) is now intensively investigated and widely used in livestock breeding. Its main benefit is that it combines information from both genotyped and ungenotyped individuals in a single model. As whole-genome sequence (WGS) data become more accessible at the population level, more attention is being paid to using WGS data in ssGBLUP. The predictive ability of ssGBLUP with WGS data might be improved by incorporating biological knowledge from public databases. We therefore extended ssGBLUP to incorporate genomic annotation information and evaluated the extensions in a yellow-feathered chicken population. The population consisted of 1,338 birds with 23 traits, and imputed WGS data including 5,127,612 single nucleotide polymorphisms (SNPs) were available for 895 birds. Considering different combinations of annotation information and models, we evaluated the original ssGBLUP, haplotype-based ssGHBLUP, and four extended ssGBLUP models incorporating genomic annotation. Based on the chicken genomic annotation (GRCg6a), 3,155,524 and 94,837 SNPs were mapped to genic and exonic regions, respectively. Extended ssGBLUP using genic/exonic SNPs had better predictive ability than the other models for 15 of the 23 traits, with advantages of 2.5% to 6.1% over the original ssGBLUP. To further improve genomic prediction with imputed WGS data, we also investigated genotyping strategies for the reference population. Of two strategies for selecting reference individuals to genotype, even selection by family (SBF) performed slightly better than random selection in most situations. Overall, we extended genomic prediction models to make comprehensive use of WGS data and genomic annotation information within the ssGBLUP framework. The results support the idea that properly handling genomic annotation information and WGS data increases the predictive ability of ssGBLUP. Moreover, when WGS data are used, a genotyping strategy that maximizes the expected genetic relationship between the reference and candidate populations could further improve predictive ability. These results shed light on the comprehensive use of genomic annotation information in WGS-based single-step genomic prediction.
Keywords: genomic selection; prior information; sequencing data; genotype imputation; haplotype
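One building block of annotation-aware genomic prediction is a genomic relationship matrix built only from SNPs that fall in annotated (for example genic or exonic) regions. The sketch below selects SNP columns by position and then builds G with VanRaden's first method: G = ZZ′ / (2 Σ p(1 − p)), where Z holds the genotype codes minus 2p. The positions and regions are hypothetical. How the paper's four extended models weight or combine such matrices, and how G is blended with the pedigree matrix in ssGBLUP, is not reproduced here.

```python
def select_annotated(genotypes, positions, regions):
    """Keep SNP columns whose position lies inside any (start, end) region."""
    keep = [j for j, pos in enumerate(positions)
            if any(start <= pos <= end for start, end in regions)]
    return [[row[j] for j in keep] for row in genotypes]


def vanraden_g(genotypes):
    """VanRaden method 1 genomic relationship matrix.

    genotypes: individuals x SNPs, coded 0/1/2, no missing values.
    """
    n, m = len(genotypes), len(genotypes[0])
    p = [sum(row[j] for row in genotypes) / (2 * n) for j in range(m)]
    z = [[row[j] - 2 * p[j] for j in range(m)] for row in genotypes]   # centered codes
    scale = 2 * sum(pj * (1 - pj) for pj in p)
    return [[sum(a * b for a, b in zip(zi, zk)) / scale for zk in z] for zi in z]


geno = [[0, 1, 2, 1], [2, 1, 0, 1], [1, 1, 1, 1]]   # 3 birds x 4 SNPs
positions = [100, 5000, 12000, 20000]               # hypothetical bp positions
genic = [(0, 1000), (11000, 13000)]                 # hypothetical genic regions
sub = select_annotated(geno, positions, genic)
G = vanraden_g(sub)
```

Only SNPs 1 and 3 fall in the hypothetical regions. The first two birds, which carry opposite homozygotes at both SNPs, get a negative relationship.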
A comprehensive evaluation of factors affecting the accuracy of pig genotype imputation using a single or multi-breed reference population (Cited by: 2)
13
Authors: ZHANG Kai-li, PENG Xia, ZHANG Sai-xian, ZHAN Hui-wen, LU Jia-hui, XIE Sheng-song, ZHAO Shu-hong, LI Xin-yun, MA Yun-long. Journal of Integrative Agriculture (SCIE, CAS, CSCD), 2022, No. 2, pp. 486-495 (10 pages)
Genotype imputation has become an indispensable part of genomic data analysis. In recent years, imputation based on a multi-breed reference population has received more attention, but relevant studies in pigs are scarce. In this study, we used the Illumina PorcineSNP50 BeadChip to investigate how imputation accuracy varies with several influencing factors. We also compared the imputation performance of four commonly used imputation software programs. Imputation accuracy increased as the marker density of the validation population, the reference population sample size, or the minor allele frequency (MAF) increased. However, imputation accuracy decreased to some extent when the pig reference population was a mixed group of multiple breeds or lines. Considering both imputation accuracy and running time, Beagle 4.1 and FImpute are excellent choices among the four software packages tested. This work visually presents the impacts of these factors on imputation and provides a reference for formulating reasonable imputation strategies in practical pig breeding.
Keywords: genotype imputation; multi-breed reference population; imputation accuracy
Comparative Study of Four Methods in Missing Value Imputations under Missing Completely at Random Mechanism (Cited by: 3)
14
Authors: Michikazu Nakai, Ding-Geng Chen, Kunihiro Nishimura, Yoshihiro Miyamoto. Open Journal of Statistics, 2014, No. 1, pp. 27-37 (11 pages)
Missing values are a fundamental challenge in analyzing data from clinical trials and longitudinal studies, because missing data can introduce bias and lead to erroneous statistical inferences. Several imputation methods have been developed to deal with this challenge. The most commonly used are the complete case method, mean imputation, last observation carried forward (LOCF), and multiple imputation (MI). In this paper, we conduct a simulation study of the efficiency of these four methods in a longitudinal data setting under missing completely at random (MCAR). We consider three levels of missingness: a lower percentage of 5% and higher percentages of 30% and 50%. The simulation shows that LOCF has more bias than the other three methods in most situations, while MI has the least bias and the best coverage probability. We therefore conclude that MI is the most effective imputation method in our MCAR simulation study.
Keywords: Missing Data; Imputation; MCAR; Complete Case; LOCF
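A compact simulation shows why LOCF can be biased even under MCAR. When the outcome trends over time, carrying an earlier value forward systematically understates later visits. Mean imputation, by contrast, leaves the estimated mean equal to the complete-case mean, although it shrinks the variance. The sketch below assumes a linear upward trend with Gaussian noise and deletes post-baseline visits completely at random. It does not reproduce the paper's data-generating model or its multiple imputation arm, and all names are illustrative.

```python
import random


def locf(series):
    """Last observation carried forward; leading gaps stay None."""
    out, last = [], None
    for v in series:
        if v is not None:
            last = v
        out.append(last)
    return out


def mean_impute(values):
    """Replace None with the mean of the observed values."""
    obs = [v for v in values if v is not None]
    m = sum(obs) / len(obs)
    return [m if v is None else v for v in values]


def simulate_final_visit_bias(rate, n_subjects=500, n_visits=6, seed=1):
    """Bias of LOCF and mean imputation for the final-visit mean under MCAR."""
    rng = random.Random(seed)
    truth, observed = [], []
    for _ in range(n_subjects):
        y = [t + rng.gauss(0.0, 0.5) for t in range(n_visits)]   # rising trend
        # MCAR: each post-baseline visit is deleted with probability `rate`,
        # independently of the values; the baseline is always observed.
        obs = [y[0]] + [None if rng.random() < rate else v for v in y[1:]]
        truth.append(y)
        observed.append(obs)
    true_mean = sum(y[-1] for y in truth) / n_subjects
    locf_final = [locf(obs)[-1] for obs in observed]
    mean_final = mean_impute([obs[-1] for obs in observed])
    return (sum(locf_final) / n_subjects - true_mean,
            sum(mean_final) / n_subjects - true_mean)


locf_bias, mean_bias = simulate_final_visit_bias(0.3)
```

With 30% missingness, LOCF underestimates the final-visit mean by roughly 0.4 units, while the mean-imputation estimate stays close to the truth. This mirrors the direction of the paper's finding that LOCF is the most biased method.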
Genome-wide association study for numbers of vertebrae in Dezhou donkey population reveals new candidate genes (Cited by: 1)
15
Authors: SUN Yan, LI Yu-hua, ZHAO Chang-heng, TENG Jun, WANG Yong-hui, WANG Tian-qi, SHI Xiaoyuan, LIU Zi-wen, LI Hai-jing, WANG Ji-jing, WANG Wen-wen, NING Chao, WANG Chang-fa, ZHANG Qin. Journal of Integrative Agriculture (SCIE, CAS, CSCD), 2023, No. 10, pp. 3159-3169 (11 pages)
The number of vertebrae is an important economic trait associated with body size and meat productivity in animals. However, the genetic basis of vertebra number in donkeys is not yet well understood. This study aimed to identify candidate genes affecting the number of thoracic vertebrae (TVn) and the number of lumbar vertebrae (LVn) in Dezhou donkeys. A genome-wide association study was conducted using whole-genome sequence data imputed from low-coverage genome sequencing. For TVn, we identified 38 genome-wide significant and 64 suggestive SNPs, which relate to 7 genes (NLGN1, DCC, SLC26A7, TOX, WNT7A, LOC123286078, and LOC123280142). For LVn, we identified 9 genome-wide significant and 38 suggestive SNPs, which relate to 8 genes (GABBR2, FBXO4, LOC123277146, LOC123277359, BMP7, B3GAT1, EML2, and LRP5). These genes are involved in the Wnt and TGF-β signaling pathways and may play important roles in embryonic development or bone formation. They could therefore be good candidate genes for TVn and LVn.
Keywords: numbers of vertebrae; GWAS; genotype imputation; Dezhou donkey
Comparative Variance and Multiple Imputation Used for Missing Values in Land Price DataSet (Cited by: 1)
16
Authors: Longqing Zhang, Xinwei Zhang, Liping Bai, Yanghong Zhang, Feng Sun, Changcheng Chen. Computers, Materials & Continua (SCIE, EI), 2019, No. 9, pp. 1175-1187 (13 pages)
Based on a two-dimensional relational table, this paper studies the missing values in sample land price data from Shunde District, Foshan City. GeoDa software was used to eliminate insignificant factors by stepwise regression analysis, and NORM software was used to construct the multiple imputation models. The EM algorithm and the data augmentation algorithm were then applied to fit multiple linear regression equations, producing five different imputed datasets. Statistical analysis of the imputed datasets gave the mean and variance of each, and weights were determined from the differences between them. Finally, the datasets were integrated to produce the imputed values for the missing entries. Three missing-data cases were tested:
- the PRICE variable missing at a 5% deletion rate;
- the PRICE variable missing at a 10% deletion rate;
- both the PRICE and CBD variables missing.
In these cases, the new method and traditional multiple imputation produced the estimate closer to the true value in proportions of 75% to 25%, 62.5% to 37.5%, and 100% to 0%, respectively. The new method is therefore clearly better than traditional multiple imputation, and the missing values it estimates have some reference value.
Keywords: imputation method; multiple imputations; probabilistic model
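The abstract says the five imputed datasets are combined using weights "determined according to the differences" in their means and variances, without giving the formula. For contrast, the sketch below shows the standard way traditional multiple imputation pools m imputed estimates (Rubin's rules), next to a hypothetical inverse-variance weighting. The inverse-variance scheme is an assumption chosen for illustration and is not claimed to be the paper's weighting.

```python
import statistics


def rubin_pool(estimates, variances):
    """Rubin's rules: pooled point estimate and total variance over m imputations."""
    m = len(estimates)
    q_bar = statistics.fmean(estimates)       # pooled point estimate
    u_bar = statistics.fmean(variances)       # mean within-imputation variance
    b = statistics.variance(estimates)        # between-imputation variance (n-1 denominator)
    total = u_bar + (1 + 1 / m) * b
    return q_bar, total


def inverse_variance_pool(estimates, variances):
    """Hypothetical alternative: weight each imputed dataset by 1 / variance."""
    weights = [1.0 / v for v in variances]
    return sum(w * q for w, q in zip(weights, estimates)) / sum(weights)


# Five imputed values for one missing PRICE entry (made-up numbers).
print(rubin_pool([10, 12, 11, 13, 14], [1, 1, 1, 1, 1]))
print(inverse_variance_pool([10, 20], [1, 4]))
```

With equal variances both approaches return the plain average; they differ only when some imputed datasets are noisier than others, which is where a variance-aware weighting can pull the estimate toward the more stable imputations.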
Comparison of Missing Data Imputation Methods in Time Series Forecasting (Cited by: 1)
17
Authors: Hyun Ahn, Kyunghee Sun, Kwanghoon Pio Kim. Computers, Materials & Continua (SCIE, EI), 2022, Issue 1, pp. 767-779, 13 pages.
Time series forecasting has become an important aspect of data analysis and has many real-world applications. However, missing values are often encountered and may adversely affect many forecasting tasks. In this study, we evaluate and compare the effects of imputation methods for estimating missing values in a time series. Rather than simulating pseudo-missing data, our approach performs imputation on actual missing data and measures the performance of the forecasting models built from the result. In the experiments, several time series forecasting models are trained on different training datasets, each prepared with a different imputation method. The imputation methods are then evaluated by comparing the accuracy of the resulting forecasting models. The results from four experimental cases show that the k-nearest neighbor technique is the most effective at reconstructing missing data and contributes positively to time series forecasting compared with the other imputation methods.
Keywords: missing data; imputation method; time series forecasting; LSTM
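The study finds k-nearest neighbor (kNN) imputation most effective but does not describe its configuration. A minimal NumPy sketch of generic kNN imputation follows, assuming each row is one time step of a multivariate series and each column a variable: a missing cell is filled with the mean of that column over the k rows closest in Euclidean distance on the row's observed columns. The function name, the default k, and the column-mean fallback are assumptions; scikit-learn's `KNNImputer` offers a production version of the same idea.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill NaNs with the column mean of the k nearest donor rows (sketch)."""
    X = np.asarray(X, dtype=float)
    filled = X.copy()
    missing = np.isnan(X)
    for i in np.flatnonzero(missing.any(axis=1)):
        obs = ~missing[i]                      # columns observed in row i
        for j in np.flatnonzero(missing[i]):
            # Donors must observe column j and every column row i observes.
            donors = np.flatnonzero(~missing[:, j] & ~missing[:, obs].any(axis=1))
            if donors.size == 0:               # assumption: fall back to column mean
                filled[i, j] = np.nanmean(X[:, j])
                continue
            d = np.sqrt(((X[donors][:, obs] - X[i, obs]) ** 2).sum(axis=1))
            nearest = donors[np.argsort(d, kind="stable")[:k]]
            filled[i, j] = X[nearest, j].mean()
    return filled


series = [[1.0, 10.0], [2.0, 20.0], [3.0, np.nan], [4.0, 40.0], [100.0, 1000.0]]
print(knn_impute(series, k=2))
```

In the example, the gap at time step 3 is filled from its two nearest neighbours (time steps 2 and 4), while the outlying row is ignored; this locality is one plausible reason kNN reconstructs gaps better than global methods such as mean filling.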
A Fast and Effective Multiple Kernel Clustering Method on Incomplete Data (Cited by: 1)
18
Authors: Lingyun Xiang, Guohan Zhao, Qian Li, Gwang-Jun Kim, Osama Alfarraj, Amr Tolba. Computers, Materials & Continua (SCIE, EI), 2021, Issue 4, pp. 267-284, 18 pages.
Multiple kernel clustering is an unsupervised data analysis method that has been used in various scenarios where data is easy to collect but hard to label. However, multiple kernel clustering on incomplete data is a critical yet challenging task. Although existing absent multiple kernel clustering methods have achieved remarkable performance on this task, they may fail when the data has a high missing-value rate, and they can easily fall into a local optimum. To address these problems, we propose an absent multiple kernel clustering (AMKC) method for incomplete data. AMKC first clusters the initialized incomplete data. It then constructs a new multiple-kernel-based data space, referred to as K-space, from multiple sources to learn kernel combination coefficients. Finally, it seamlessly integrates an incomplete-kernel-imputation objective, a multiple-kernel-learning objective, and a kernel-clustering objective to achieve absent multiple kernel clustering. The three stages are carried out simultaneously until the convergence condition is met. Experiments on six datasets with various characteristics demonstrate that the kernel imputation and clustering performance of the proposed method is significantly better than that of state-of-the-art competitors. Meanwhile, the proposed method converges quickly.
Keywords: multiple kernel clustering; absent-kernel imputation; incomplete data; kernel k-means clustering
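The clustering objective that AMKC builds on is kernel k-means over a weighted combination of base kernels. The sketch below shows only that building block, assuming complete data and fixed, hand-picked kernel weights; AMKC additionally learns the weights and imputes missing kernel entries, which this sketch does not attempt. The helper names, the RBF `gamma`, and the farthest-first seeding are illustrative assumptions.

```python
import numpy as np


def rbf_kernel(X, gamma):
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def linear_kernel(X):
    return X @ X.T


def combine_kernels(kernels, weights):
    """Convex combination of base kernels (weights fixed here, learned in AMKC)."""
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))


def kernel_kmeans(K, k, n_iter=100):
    """Lloyd-style kernel k-means using feature-space distances from K."""
    n = K.shape[0]
    diag = np.diag(K)
    # Farthest-first seeding: start at point 0, then add the farthest point.
    centers = [0]
    for _ in range(1, k):
        d = np.min([diag + diag[c] - 2.0 * K[:, c] for c in centers], axis=0)
        centers.append(int(np.argmax(d)))
    d0 = np.stack([diag + diag[c] - 2.0 * K[:, c] for c in centers], axis=1)
    labels = np.argmin(d0, axis=1)
    for _ in range(n_iter):
        dist = np.full((n, k), np.inf)
        for c in range(k):
            members = labels == c
            m = members.sum()
            if m == 0:
                continue                       # empty cluster attracts no points
            # ||phi(x_i) - mu_c||^2 expressed purely through kernel entries
            dist[:, c] = (diag
                          - 2.0 * K[:, members].sum(axis=1) / m
                          + K[np.ix_(members, members)].sum() / m ** 2)
        new_labels = np.argmin(dist, axis=1)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return labels


rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (20, 2)), rng.normal(5.0, 0.3, (20, 2))])
K = combine_kernels([rbf_kernel(X, 0.5), linear_kernel(X)], [0.5, 0.5])
print(kernel_kmeans(K, 2))
```

Because the distance to a cluster centre is written entirely in terms of kernel entries, missing values only need to be handled at the kernel level, which is the motivation for imputing kernels rather than raw features.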
Application of imputation methods to genomic selection in Chinese Holstein cattle (Cited by: 2)
19
Authors: Ziqing Weng, Zhe Zhang, Xiangdong Ding, Weixuan Fu, Peipei Ma, Chonglong Wang, Qin Zhang. Journal of Animal Science and Biotechnology (SCIE), 2012, Issue 1, pp. 16-20, 5 pages.
Keywords: Chinese Holstein; cows; dairy cattle; genomic selection; imputation methods; quality control; SNP
Missing Data Imputations for Upper Air Temperature at 24 Standard Pressure Levels over Pakistan Collected from Aqua Satellite (Cited by: 4)
20
Authors: Muhammad Usman Saleem, Sajid Rashid Ahmed. Journal of Data Analysis and Information Processing, 2016, Issue 3, pp. 132-146, 16 pages.
This research aimed to select the best imputation method for missing upper-air temperature data at 24 standard pressure levels. We implemented four imputation techniques, namely inverse distance weighting, bilinear, natural neighbor, and nearest neighbor interpolation. Their performance was assessed using the root mean square error (RMSE), absolute mean error (AME), correlation coefficient, and coefficient of determination (R²). We randomly selected 30% of the 324 samples and predicted them from the remaining 70%. Although all four interpolation methods performed well (RMSE and AME < 1), the bilinear method was the most accurate, with the smallest errors. RMSE for the bilinear method remained below 0.01 at all pressure levels except 1000 hPa, where it was 0.6. Bilinear imputation also produced a low AME (< 0.1) at all pressure levels, and a very strong correlation (> 0.99) was found between actual and predicted air temperature. The high coefficient of determination (0.99) for bilinear interpolation indicates the best fit to the surface. Natural neighbor interpolation gave similar results, but after inspecting the scatter plots for each month, its imputations appeared slightly less accurate than the bilinear method in certain months.
Keywords: missing data imputations; spatial interpolation; AQUA satellite; upper-level air temperature; AIRX3STML
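Bilinear interpolation and inverse distance weighting (IDW) are the two simplest of the four techniques compared, and RMSE is the main error metric. A minimal pure-Python sketch of all three follows, assuming values on a regular latitude/longitude grid for the bilinear case and scattered points for IDW, with a hypothetical default power of 2. The paper does not state its software or parameters, so the function names and defaults here are illustrative only.

```python
import bisect
import math


def bilinear(xs, ys, grid, x, y):
    """Bilinear interpolation on a regular grid; grid[j][i] is the value at (xs[i], ys[j])."""
    i = min(max(bisect.bisect_right(xs, x) - 1, 0), len(xs) - 2)
    j = min(max(bisect.bisect_right(ys, y) - 1, 0), len(ys) - 2)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])    # fractional position in the cell
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    f00, f10 = grid[j][i], grid[j][i + 1]
    f01, f11 = grid[j + 1][i], grid[j + 1][i + 1]
    return (f00 * (1 - tx) * (1 - ty) + f10 * tx * (1 - ty)
            + f01 * (1 - tx) * ty + f11 * tx * ty)


def idw(points, values, x, y, power=2.0):
    """Inverse distance weighting; returns the exact value at a known point."""
    num = den = 0.0
    for (px, py), v in zip(points, values):
        d = math.hypot(x - px, y - py)
        if d == 0.0:
            return v
        w = 1.0 / d ** power
        num += w * v
        den += w
    return num / den


def rmse(actual, predicted):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


xs = ys = [0.0, 1.0, 2.0]
grid = [[2 * x + 3 * y for x in xs] for y in ys]   # a plane, reproduced exactly
print(bilinear(xs, ys, grid, 0.5, 1.5))             # 5.5
print(idw([(0, 0), (2, 0)], [0.0, 10.0], 1, 0))      # 5.0
```

To mirror the study's protocol, one would hold out a random 30% of grid values, predict each from the remaining 70% with each method, and compare the RMSE of the predictions against the held-out truth.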