Journal articles — 4 results found
1. A Study of EM Algorithm as an Imputation Method: A Model-Based Simulation Study with Application to a Synthetic Compositional Data
Authors: Yisa Adeniyi Abolade, Yichuan Zhao. Open Journal of Modelling and Simulation, 2024, Issue 2, pp. 33-42 (10 pages)
Abstract: Compositional data, such as relative information, is a crucial aspect of machine learning and other related fields. It is typically recorded as closed data, i.e., it sums to a constant such as 100%. The statistical linear model is the most widely used technique for identifying hidden relationships between underlying random variables of interest. However, data quality is a significant challenge in machine learning, especially when missing data is present. The linear regression model is a commonly used statistical modeling technique applied in many settings to find relationships between variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial-effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, many datasets contain missing observations, and recovering the data can be costly and time-consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current parameter estimate, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a synthetic compositional dataset with missing observations, using both ordinary least squares and robust least squares regression techniques. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-nearest neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
Keywords: compositional data; linear regression model; least squares method; robust least squares method; synthetic data; Aitchison distance; maximum likelihood estimation; expectation-maximization algorithm; k-nearest neighbor; mean imputation
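The E-step/M-step loop described in the abstract above can be sketched for the simplest case: a multivariate normal model with values missing at random. This is a generic illustration, not the paper's implementation; the function name `em_impute`, the iteration count, and the small ridge added to the covariance are all assumptions:

```python
import numpy as np

def em_impute(X, n_iter=50):
    """EM-style imputation under a multivariate normal model.

    E-step: fill each missing entry with its conditional mean given the
    observed entries of the same row.  M-step: re-estimate the mean and
    covariance from the completed data.  Repeat until convergence.
    """
    X = np.array(X, dtype=float)
    missing = np.isnan(X)
    # Initialize with column-mean imputation.
    col_means = np.nanmean(X, axis=0)
    X[missing] = np.take(col_means, np.where(missing)[1])
    for _ in range(n_iter):
        mu = X.mean(axis=0)
        # A small ridge keeps the observed-block solve well-conditioned.
        Sigma = np.cov(X, rowvar=False) + 1e-6 * np.eye(X.shape[1])
        for i in range(X.shape[0]):
            m = missing[i]
            if not m.any():
                continue
            o = ~m
            # Conditional mean: mu_m + S_mo S_oo^{-1} (x_o - mu_o)
            S_oo = Sigma[np.ix_(o, o)]
            S_mo = Sigma[np.ix_(m, o)]
            X[i, m] = mu[m] + S_mo @ np.linalg.solve(S_oo, X[i, o] - mu[o])
    return X
```

On a column pair with a near-linear relationship, the loop converges to the regression prediction for the missing entry, which is exactly the behavior the abstract contrasts with plain mean imputation.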
2. Constraint, Dominance, and Same-Rank Solution Screening Strategies in the Decision Space for Interval Multi-Objective Optimization (cited by 14)
Authors: Chen Zhiwang, Bai Xin, Yang Qi, Huang Xingwang, Li Guoqiang. 自动化学报 (Acta Automatica Sinica), EI, CSCD, Peking University Core, 2015, Issue 12, pp. 2115-2124 (10 pages)
Abstract: For expensive interval multi-objective optimization problems whose objective functions are unknown, an NSGA-II algorithm based on the nearest-neighbor method and principal component analysis (PCA) is proposed, building on data mining in the decision space. The algorithm first divides the candidate solution set into feasible and infeasible solutions via the constraints, and uses the nearest-neighbor method to compute the similarity between candidate solutions and sample solutions in order to judge whether a candidate satisfies the constraints. The nearest-neighbor method is likewise used to distinguish dominated from non-dominated relations between pairs of solutions under Pareto dominance. Because the crowding distance in the objective space cannot be computed, PCA is applied in the decision space to reduce the dimensionality of the solution set after K-means clustering, locate the nearest preceding and succeeding solutions of each candidate, and screen same-rank solutions by a decision-space crowding distance, thereby improving the NSGA-II algorithm.
Keywords: multi-objective optimization; interval programming; NSGA-II; nearest-neighbor method; crowding distance
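The decision-space screening step described above, PCA followed by a crowding distance among same-rank solutions, can be illustrated in miniature. In this sketch the function name and the one-component projection are assumptions, and the K-means clustering step is omitted; it only shows how a crowding distance can be computed along the principal axis of the decision vectors:

```python
import numpy as np

def decision_space_crowding(X, n_components=1):
    """Project decision vectors with PCA and accumulate a crowding
    distance along the leading component(s), so that same-rank solutions
    lying in sparse regions of the decision space score higher."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    # PCA via SVD of the centered data; rows of Vt are principal axes.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:n_components].T          # projected coordinates
    n = len(X)
    dist = np.zeros(n)
    for j in range(n_components):
        order = np.argsort(Z[:, j])
        span = max(np.ptp(Z[:, j]), 1e-12)
        dist[order[0]] = dist[order[-1]] = np.inf   # keep boundary solutions
        for k in range(1, n - 1):
            dist[order[k]] += (Z[order[k + 1], j] - Z[order[k - 1], j]) / span
    return dist
```

As in standard NSGA-II, boundary solutions receive infinite distance so they always survive screening; interior solutions are ranked by the gap between their two nearest neighbors along the projected axis.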
3. An Asynchronous Robust Track Association Algorithm Based on the k-Nearest-Neighbor Interval Distance (cited by 3)
Authors: Yi Xiao, Zeng Rui, Cao Xinying. 系统工程与电子技术 (Systems Engineering and Electronics), EI, CSCD, Peking University Core, 2022, Issue 5, pp. 1475-1482 (8 pages)
Abstract: To address track association when asynchrony and systematic errors coexist, an asynchronous robust track association algorithm based on the k-nearest-neighbor interval distance of track sequences is proposed. A k-nearest-neighbor interval distance metric between an interval sequence and an interval point is defined, and a method for converting systematic errors into intervals is proposed. Association decisions are made with the classical assignment method, using the grey relational degree between interval track sequences of unequal length. Compared with traditional algorithms, the method places low demands on prior information about systematic errors. Simulation results show that the algorithm achieves stable association with a high correct-association rate and good robustness. It can also handle asynchronous track association at unequal data rates without temporal registration, which gives it a clear advantage.
Keywords: track association; robust association; k-nearest-neighbor interval distance; grey relational degree; intervalization
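The two definitions in the abstract above, an interval-to-interval distance and a k-nearest-neighbor distance from an interval point to an interval sequence, can be sketched as follows. The Hausdorff-style interval distance used here is an assumption; the paper's exact metric may differ:

```python
def interval_distance(p, q):
    """Distance between intervals p = [a, b] and q = [c, d].
    Hausdorff form for intervals: max(|a - c|, |b - d|)."""
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def knn_interval_distance(point, sequence, k=3):
    """k-nearest-neighbor interval distance: the mean of the k smallest
    distances from one interval point to the intervals of a sequence."""
    d = sorted(interval_distance(point, q) for q in sequence)
    k = min(k, len(d))
    return sum(d[:k]) / k
```

Averaging only the k nearest intervals is what makes the measure usable for sequences of unequal length and rate: the interval point never needs a time-aligned counterpart, only its k closest neighbors in the other track.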
4. An Up-to-Date Comparative Analysis of the KNN Classifier Distance Metrics for Text Categorization
Author: Onder Coban. Data Science and Informetrics, 2023, Issue 2, pp. 67-78 (12 pages)
Abstract: Text categorization (TC) is one of the widely studied branches of text mining and has many applications in different domains. It tries to automatically assign a text document to one of the predefined categories, often by using machine learning (ML) techniques. Choosing the best classifier is the most important step in this task; the k-nearest neighbor (KNN) classifier is widely employed, alongside other well-known ones such as Support Vector Machine, Multinomial Naive Bayes, and Logistic Regression. KNN has been extensively used for TC and is one of the oldest and simplest methods for pattern classification. Its performance crucially relies on the distance metric used to identify the nearest neighbors, such that the most frequently observed label among these neighbors is used to classify an unseen test instance. Hence, in this paper, a comparative analysis of the KNN classifier is performed on a subset (R8) of the Reuters-21578 benchmark dataset for TC. Experimental results are obtained using different distance metrics, as well as recently proposed distance-learning metrics, under different cases where the feature model and term-weighting scheme vary. The comparative evaluation shows that Bray-Curtis and Linear Discriminant Analysis (LDA) are often superior to the other metrics and work well with raw term-frequency weights.
Keywords: text categorization; k-nearest neighbor; distance metric; distance-learning algorithms
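The classification rule the abstract describes, a majority vote among the k nearest training documents under a chosen metric, is easy to state in code. The Bray-Curtis dissimilarity below matches the metric the paper found to work well with raw term frequencies, but the function names and toy interface are assumptions, not the paper's code:

```python
import numpy as np
from collections import Counter

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two raw term-frequency vectors:
    sum(|x_i - y_i|) / sum(x_i + y_i)."""
    denom = np.sum(x + y)
    return float(np.sum(np.abs(x - y)) / denom) if denom else 0.0

def knn_classify(query, docs, labels, k=3, metric=bray_curtis):
    """Label the query by majority vote over its k nearest documents."""
    dists = [metric(query, d) for d in docs]
    nearest = np.argsort(dists)[:k]
    return Counter(labels[i] for i in nearest).most_common(1)[0][0]
```

Swapping `metric` for Euclidean, cosine, or a learned metric reproduces the kind of comparison the paper performs on the R8 subset.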