136 articles found
Active learning accelerated Monte-Carlo simulation based on the modified K-nearest neighbors algorithm and its application to reliability estimations
1
Authors: Zhifeng Xu, Jiyin Cao, Gang Zhang, Xuyong Chen, Yushun Wu. Defence Technology (SCIE, EI, CAS, CSCD), 2023, No. 10, pp. 306-313 (8 pages)
This paper proposes an active learning accelerated Monte-Carlo simulation method based on a modified K-nearest neighbors algorithm. The core idea is to judge whether the output of a random input point can be postulated by a classifier built on the modified K-nearest neighbors algorithm. Unlike other active learning methods that rely on experimental designs, the proposed method samples inputs by Monte-Carlo simulation and, through accurate classification, avoids a large portion of the actual output evaluations, which makes it applicable to most structural reliability estimation problems. The validity, efficiency, and accuracy of the method are demonstrated numerically, and the value of K that maximizes computational efficiency is studied. Finally, the method is applied to the reliability estimation of carbon fiber reinforced silicon carbide composite specimens subjected to random displacements, which further validates its practicability.
Keywords: active learning; Monte-Carlo simulation; k-nearest neighbors; reliability estimation; classification
GHM-FKNN:a generalized Heronian mean based fuzzy k-nearest neighbor classifier for the stock trend prediction
2
Authors: WU Zhenfeng (吴振峰), WANG Mengmeng, LAN Tian, ZHANG Anyuan. High Technology Letters (EI, CAS), 2023, No. 2, pp. 122-129 (8 pages)
Stock trend prediction is a challenging problem because it involves many variables. Existing machine learning techniques such as random forest (RF), probabilistic random forest (PRF), k-nearest neighbor (KNN), and fuzzy KNN (FKNN) have difficulty accurately predicting the stock trend (uptrend or downtrend) for a given date. To address this, a generalized Heronian mean (GHM) based FKNN predictor named GHM-FKNN is proposed. GHM-FKNN combines the GHM aggregation function with the ideas of the classical FKNN approach. In the evaluation, GHM-FKNN outperformed RF, PRF, KNN, and FKNN on independent test datasets for three stocks (AAPL, AMZN, and NFLX), achieving accuracies of 62.37% for AAPL, 58.25% for AMZN, and 64.10% for NFLX.
Keywords: stock trend prediction; Heronian mean; fuzzy k-nearest neighbor (FKNN)
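The abstract above names the generalized Heronian mean but does not give its formula. The sketch below uses the definition commonly cited in the aggregation-operator literature, GHM^{p,q}(a_1..a_n) = (2/(n(n+1)) * sum over i<=j of a_i^p * a_j^q)^(1/(p+q)). The function name and the default parameters are illustrative assumptions, and how GHM-FKNN feeds neighbor memberships into this function is not stated in the abstract.

```python
from itertools import combinations_with_replacement


def generalized_heronian_mean(values, p=1.0, q=1.0):
    """Generalized Heronian mean of non-negative values (commonly cited form).

    Hypothetical helper for illustration; not taken from the paper.
    """
    n = len(values)
    if n == 0:
        raise ValueError("values must be non-empty")
    if p < 0 or q < 0 or p + q == 0:
        raise ValueError("p and q must be non-negative and not both zero")
    # Sum a_i^p * a_j^q over all index pairs with i <= j (including i == j).
    total = sum(a ** p * b ** q
                for a, b in combinations_with_replacement(values, 2))
    return (2.0 * total / (n * (n + 1))) ** (1.0 / (p + q))
```

With p = q = 0.5 and two inputs this reduces to the classical Heronian mean (a + sqrt(ab) + b) / 3, which is a convenient sanity check.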
Diagnosis of Disc Space Variation Fault Degree of Transformer Winding Based on K-Nearest Neighbor Algorithm
3
Authors: Song Wang, Fei Xie, Fengye Yang, Shengxuan Qiu, Chuang Liu, Tong Li. Energy Engineering (EI), 2023, No. 10, pp. 2273-2285 (13 pages)
The winding is one of the most important components in a power transformer, and ensuring its health is of great importance to the stable operation of the power system. To diagnose the disc space variation (DSV) fault degree of transformer windings efficiently and accurately, this paper presents a winding fault diagnosis method based on the K-Nearest Neighbor (KNN) algorithm and the frequency response analysis (FRA) method. First, a laboratory winding model is used, and DSV faults of four different degrees are created by changing the spacing of the discs in the winding. A series of FRA tests is then conducted to obtain the FRA results and build the FRA dataset. Second, ten different numerical indices are used to extract features from the FRA curves of the faulted winding. Third, 10-fold cross-validation is employed to determine the optimal k-value of KNN. In addition, to improve the accuracy of the KNN model, the relationship between accuracy and k-value is compared under four distance functions. After the most appropriate distance metric and k-value are obtained, the fault classification model based on KNN and FRA is constructed and used to classify the degrees of DSV faults. The identification accuracy of the proposed model is up to 98.30%. Finally, the model is compared with the support vector machine (SVM), SVM optimized by particle swarm optimization (PSO-SVM), and random forest (RF). The results show that the proposed model has the highest diagnostic accuracy and can be used to accurately diagnose the DSV fault degrees of the winding.
Keywords: transformer winding; frequency response analysis (FRA) method; k-nearest neighbor (KNN); disc space variation (DSV)
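The selection step described in this abstract (10-fold cross-validation over candidate k-values and distance functions) can be sketched in plain Python as follows. The abstract does not say which four distance functions were compared, so the four below are an illustrative assumption, as are all function names; real inputs would be the FRA index features rather than toy points.

```python
import math
import random
from collections import Counter

# Four candidate distance functions (assumed for illustration only).
DISTANCES = {
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "chebyshev": lambda a, b: max(abs(x - y) for x, y in zip(a, b)),
    "minkowski3": lambda a, b: sum(abs(x - y) ** 3 for x, y in zip(a, b)) ** (1 / 3),
}


def knn_predict(train_X, train_y, x, k, dist):
    # Majority vote among the k nearest training points.
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, dist, n_folds=10, seed=0):
    # Shuffle once, then deal indices into n_folds disjoint folds.
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct += sum(
            knn_predict(train_X, train_y, X[i], k, dist) == y[i] for i in fold
        )
    return correct / len(X)


def select_k_and_metric(X, y, k_values, n_folds=10):
    # Score every (distance name, k) pair and return the best one.
    scores = {
        (name, k): cross_val_accuracy(X, y, k, dist, n_folds)
        for name, dist in DISTANCES.items()
        for k in k_values
    }
    best = max(scores, key=scores.get)
    return best, scores[best]
```

Calling `select_k_and_metric(X, y, [1, 3, 5])` returns the best `(distance_name, k)` pair together with its cross-validated accuracy; ties go to the first pair evaluated.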
A k-Nearest Neighbor Query Algorithm Based on an Irregular Region Partitioning Method (Cited by: 1)
4
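Authors: Zhang Qingqing (张清清), Li Changyun (李长云), Li Xu (李旭), Zhou Lingfang (周玲芳), Hu Shuxin (胡淑新), Zou Haojie (邹豪杰). Computer Systems & Applications (《计算机系统应用》), 2015, No. 9, pp. 186-190 (5 pages)
As more and more data accumulates, the demands on data processing and analysis capability keep rising. The traditional k-Nearest Neighbor (kNN) query algorithm is limited by its regular region partitioning method, which tends to leave the overall computational load unbalanced, and by the low data processing capacity of a single-process or single-machine runtime environment. This paper proposes and describes in detail an improved kNN query algorithm based on an irregular region partitioning method, and implements it with MapReduce, a model for distributed parallel computation over large-scale datasets. Experimental results and analysis show that, under the MapReduce framework, the kNN query algorithm based on irregular region partitioning achieves high data processing efficiency and supports efficient queries in big-data environments.
Keywords: k-nearest neighbor (kNN) query algorithm; irregular region partitioning method; MapReduce; big data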
Mapping aboveground biomass by integrating geospatial and forest inventory data through a k-nearest neighbor strategy in North Central Mexico (Cited by: 3)
5
Authors: Carlos A AGUIRRE-SALADO, Eduardo J TREVIÑO-GARZA, Oscar A AGUIRRE-CALDERÓN, Javier JIMÉNEZ-PÉREZ, Marco A GONZÁLEZ-TAGLE, José R VALDÉZ-LAZALDE, Guillermo SÁNCHEZ-DÍAZ, Reija HAAPANEN, Alejandro I AGUIRRE-SALADO, Liliana MIRANDA-ARAGÓN. Journal of Arid Land (SCIE, CSCD), 2014, No. 1, pp. 80-96 (17 pages)
As climate change negotiations progress, monitoring biomass and carbon stocks is becoming an important part of current forest research, and national governments are therefore interested in developing forest-monitoring strategies using geospatial technology. Among statistical methods for mapping biomass, there is a nonparametric approach called k-nearest neighbor (kNN). We compared four variations of distance metrics of the kNN for the spatially explicit estimation of aboveground biomass in a portion of the Mexican north border of the intertropical zone. Satellite-derived, climatic, and topographic predictor variables were combined with Mexican National Forest Inventory (NFI) data for this purpose. The performance of the distance metrics applied in the kNN algorithm was evaluated using leave-one-out cross-validation. The results indicate that the Most Similar Neighbor (MSN) approach maximizes the correlation between predictor and response variables (r=0.9). Our results agree with those reported in the literature. These findings confirm the predictive potential of the MSN approach for mapping forest variables at the pixel level under the policy of Reducing Emissions from Deforestation and Forest Degradation (REDD+).
Keywords: k-nearest neighbor; Mahalanobis; most similar neighbor; MODIS BRDF-adjusted reflectance; forest inventory; the policy of Reducing Emissions from Deforestation and Forest Degradation
Pruned fuzzy K-nearest neighbor classifier for beat classification (Cited by: 2)
6
Authors: Muhammad Arif, Muhammad Usman Akram, Fayyaz-ul-Afsar Amir Minhas. Journal of Biomedical Science and Engineering, 2010, No. 4, pp. 380-389 (10 pages)
Arrhythmia beat classification is an active area of research in ECG-based clinical decision support systems. In this paper, a Pruned Fuzzy K-nearest neighbor (PFKNN) classifier is proposed to classify six types of beats present in the MIT-BIH Arrhythmia database. We tested the classifier on approximately 103,100 beats covering the six beat types in the database. Fuzzy KNN (FKNN) is easy to implement, but classification with a large number of training examples can be very time consuming and requires large storage space. Hence, we propose a time-efficient Arif-Fayyaz pruning algorithm, especially suitable for FKNN, that maintains good classification accuracy with an appropriate retained ratio of training data. Using the Arif-Fayyaz pruning algorithm with Fuzzy KNN, we achieved a beat classification accuracy of 97% and a geometric mean of sensitivity of 94.5% with only 19% of the total training examples. The accuracy and sensitivity are comparable to those of FKNN when all the training data is used. Principal Component Analysis is used to further reduce the dimension of the feature space from eleven to six without compromising accuracy or sensitivity. PFKNN was found to be robust against noise present in the ECG data.
Keywords: arrhythmia; ECG; k-nearest neighbor; pruning; fuzzy; classification
A Short-Term Traffic Flow Forecasting Method Based on a Three-Layer K-Nearest Neighbor Non-Parametric Regression Algorithm (Cited by: 7)
7
Authors: Xiyu Pang, Cheng Wang, Guolin Huang. Journal of Transportation Technologies, 2016, No. 4, pp. 200-206 (7 pages)
Short-term traffic flow forecasting is one of the core technologies for realizing traffic flow guidance. In this article, in view of the fact that traffic flow changes repeatedly, a short-term traffic flow forecasting method based on a three-layer K-nearest neighbor non-parametric regression algorithm is proposed. Specifically, two screening layers based on shape similarity are introduced into the K-nearest neighbor non-parametric regression method, and the forecasts are produced by weighted averaging with the reciprocals of the shape similarity distances as weights, together with the most-similar-point distance adjustment method. According to the experimental results, the proposed algorithm improves the predictive ability of the traditional K-nearest neighbor non-parametric regression method and greatly enhances the accuracy and real-time performance of short-term traffic flow forecasting.
Keywords: three-layer; traffic flow forecasting; k-nearest neighbor; non-parametric regression
Computational Intelligence Prediction Model Integrating Empirical Mode Decomposition, Principal Component Analysis, and Weighted k-Nearest Neighbor (Cited by: 1)
8
Authors: Li Tang, He-Ping Pan, Yi-Yong Yao. Journal of Electronic Science and Technology (CAS, CSCD), 2020, No. 4, pp. 341-349 (9 pages)
With suitable algorithms, machine learning can support advanced time series analysis. This paper proposes a complex k-nearest neighbor (KNN) model for predicting financial time series. The model uses a complex feature extraction process that integrates forward rolling empirical mode decomposition (EMD) for financial time series signal analysis and principal component analysis (PCA) for dimension reduction. The information-rich features are extracted and then input to a weighted KNN classifier in which the features are weighted with the PCA loadings. Finally, the prediction is generated via regression on the selected nearest neighbors. The structure of the model as a whole is original. Test results on real historical datasets confirm the effectiveness of the model for predicting the Chinese stock index, an individual stock, and the EUR/USD exchange rate.
Keywords: empirical mode decomposition (EMD); k-nearest neighbor (KNN); principal component analysis (PCA); time series
Condition Monitoring of Roller Bearing by K-star Classifier and K-nearest Neighborhood Classifier Using Sound Signal
9
Authors: Rahul Kumar Sharma, V. Sugumaran, Hemantha Kumar, M. Amarnath. Structural Durability & Health Monitoring (EI), 2017, No. 1, pp. 1-17 (17 pages)
Most machinery in small- or large-scale industry has rotating elements supported by bearings for rigid support and accurate movement. For machinery to function properly, condition monitoring of the bearing is very important. In the present study, the sound signal is used to continuously monitor bearing health, since the sound signals of rotating machinery carry dynamic information about its components. Numerous studies in the literature report the superiority of the vibration signal for bearing fault diagnosis, but very few studies use the sound signal. The cost of condition monitoring with a sound signal (microphone) is lower than the cost of the transducer used to acquire a vibration signal (accelerometer). This paper employs the sound signal for condition monitoring of a roller bearing with the K-star classifier and the k-nearest neighborhood classifier. Statistical features are extracted from the acquired sound signals. Two-layer feature selection is then performed using the J48 decision tree algorithm and the random tree algorithm. The selected features are classified using the K-star classifier and the k-nearest neighborhood classifier, and parametric optimization is performed to achieve the maximum classification accuracy. The classification results of the two classifiers for condition monitoring of the roller bearing using sound signals are compared.
Keywords: K-star; k-nearest neighborhood; k-NN; machine learning approach; condition monitoring; fault diagnosis; roller bearing; decision tree algorithm; J-48; random tree algorithm; decision making; two-layer feature selection; sound signal; statistical features
Propagation Path Loss Models at 28 GHz Using K-Nearest Neighbor Algorithm
10
Authors: Vu Thanh Quang, Dinh Van Linh, To Thi Thao. Journal of Communication and Computer (《通讯和计算机(中英文版)》), 2022, No. 1, pp. 1-8 (8 pages)
In this paper, we develop a K-Nearest Neighbor algorithm and apply it to propagation path loss regression. The path loss models express the dependency of the attenuation value on distance, using machine learning algorithms based on experimental data. The algorithm selects the k nearest points and uses the training dataset to find the optimal k value. The proposed method is applied to improve and adjust the path loss model at 28 GHz in the Keangnam area, Hanoi, Vietnam. Experiments in both line-of-sight and non-line-of-sight scenarios, with many combinations of transmit and receive antennas at different transmit antenna heights and random receive antenna locations, were carried out using Wireless InSite software. The results are compared with the 3GPP and NYU Wireless path loss models to verify the performance of the proposed approach.
Keywords: k-nearest neighbor; regression; 5G; millimeter waves; path loss
Wireless Communication Signal Strength Prediction Method Based on the K-nearest Neighbor Algorithm
11
Authors: Zhao Chen, Ning Xiong, Yujue Wang, Yong Ding, Hengkui Xiang, Chenjun Tang, Lingang Liu, Xiuqing Zou, Decun Luo. Proceedings of the International Computer Frontier Conference (《国际计算机前沿大会会议论文集》), 2019, No. 1, pp. 238-240 (3 pages)
Existing interference protection systems lack automatic evaluation methods that provide scientific, objective, and accurate assessment results. To address this issue, this paper develops a layout scheme by geometrically modeling the actual scene, so that a hand-held full-band spectrum analyzer can collect signal field strength values in complex indoor scenes. An improved prediction algorithm based on K-nearest neighbor non-parametric kernel regression is proposed to predict the signal field strengths over the whole plane before and after shielding. The most accurate set of data can then be picked out by comparison. The experimental results show that the improved prediction algorithm can scientifically and objectively predict signal strength in complex indoor scenes and evaluate the interference protection with high accuracy.
Keywords: interference protection; k-nearest neighbor algorithm; non-parametric kernel regression; signal field strength
Efficient Parallel Processing of k-Nearest Neighbor Queries by Using a Centroid-based and Hierarchical Clustering Algorithm
12
Authors: Elaheh Gavagsaz. Artificial Intelligence Advances, 2022, No. 1, pp. 26-41 (16 pages)
The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression. Because of how it operates, its application may be limited to problems with a certain number of instances, particularly when run time is a consideration. However, the classification of large amounts of data has become a fundamental task in many real-world applications, so it is logical to scale the k-Nearest Neighbor method to large datasets. This paper proposes a new k-Nearest Neighbor classification method (KNN-CCL) that uses a parallel centroid-based and hierarchical clustering algorithm to separate the training samples into multiple parts. The clustering algorithm uses four stages of successive refinement and generates high-quality clusters, which the k-Nearest Neighbor approach then uses to predict the test datasets. Finally, sets of experiments are conducted on UCI datasets. The experimental results confirm that the proposed k-Nearest Neighbor classification method performs well with regard to classification accuracy and performance.
Keywords: classification; k-nearest neighbor; big data; clustering; parallel processing
A Link Travel Time Estimation Method Based on k-NN and SCATS Traffic Data (Cited by: 5)
13
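Authors: Jiang Guiyan (姜桂艳), Li Qi (李琦), Dong Shuo (董硕). Journal of Southwest Jiaotong University (《西南交通大学学报》; EI, CSCD, PKU Core), 2013, No. 2, pp. 343-349 (7 pages)
To improve the estimation of link travel time from SCATS traffic data, a virtual time series of SCATS traffic data was constructed after analyzing the inconsistent acquisition intervals of actual SCATS data. The principal factors extracted by factor analysis with a cumulative contribution rate above 85% were used as the components of the traffic pattern feature vector, and the Euclidean distance was used to measure the similarity between the current and historical traffic pattern feature vectors. The number of nearest neighbors of the current traffic pattern was chosen to minimize the link travel time estimation error, and the reciprocals of the distances between traffic patterns were normalized to obtain the travel time weights of the similar traffic patterns. On this basis, a link travel time estimation method based on SCATS traffic data was designed. A case study shows that, compared with multiple linear regression, the proposed method reduced the mean absolute error, mean absolute percentage error, and root mean square error of the estimated link travel time by 9.68 s, 8.07%, and 4.5 s on average, respectively.
Keywords: Sydney Coordinated Adaptive Traffic System (SCATS); link travel time estimation; k-nearest neighbor algorithm; factor analysis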
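The weighting step in this abstract (normalizing the reciprocals of the distances to the k most similar historical patterns) can be illustrated with a minimal sketch. The function name, the data layout, and the handling of an exact match are assumptions made for the example; the factor-analysis feature extraction and the virtual time series are not reproduced here.

```python
import math


def estimate_travel_time(current, history, k):
    """Estimate travel time from the k most similar historical patterns.

    current: feature vector of the current traffic pattern.
    history: list of (feature_vector, travel_time_seconds) pairs.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not 1 <= k <= len(history):
        raise ValueError("k must be between 1 and len(history)")
    # Rank historical patterns by Euclidean distance and keep the k nearest.
    ranked = sorted(
        ((math.dist(current, feats), t) for feats, t in history),
        key=lambda pair: pair[0],
    )[:k]
    # Assumption: an exact match (distance 0) returns its travel time directly,
    # which also avoids dividing by zero below.
    for d, t in ranked:
        if d == 0.0:
            return t
    # Normalized inverse-distance weights: w_i = (1/d_i) / sum_j (1/d_j).
    inverse = [1.0 / d for d, _ in ranked]
    total = sum(inverse)
    return sum((inv / total) * t for inv, (_, t) in zip(inverse, ranked))
```

For neighbors at distances 1 and 2 with travel times 100 s and 200 s, the weights are 2/3 and 1/3, so the estimate is about 133.3 s.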
An Unequal-Length Time Subsequence Query Method Supporting Uniform Scaling
14
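Authors: Xiong Haoran (熊浩然), He Zhenying (何震瀛). Computer Engineering (《计算机工程》; CSCD, PKU Core), 2024, No. 1, pp. 60-67 (8 pages)
As one of the fundamental techniques in time series analysis, subsequence query aims to find subsequences similar to a target sequence. Most existing subsequence query methods only support subsequences of the same length as the target sequence, so uniform scaling is often used to handle the unequal-length problem. However, most existing subsequence query techniques that support uniform scaling do not consider Z-normalization of subsequences, and their query efficiency leaves room for improvement. To address this, an index-based subsequence query method supporting uniform scaling is proposed. Building on the tree data structure provided by the existing index method ULISSE, a lower-bound distance that guarantees no false dismissals is designed, which gives a theoretical guarantee for pruning the index structure, and an exact K-nearest neighbor query algorithm is proposed that uses the metadata stored in the index. The proposed method applies to both non-normalized and normalized scenarios. Experimental results show that, compared with the UCR-US and ULISSE baselines, the method significantly improves query efficiency on the two real datasets CAP and GAP and on a synthetic random-walk dataset; for unequal-length subsequence queries in the non-normalized and normalized scenarios, the average efficiency gains are 2.33 and 2.51 times, respectively.
Keywords: time series; subsequence query; uniform scaling; index; lower-bound distance; K-nearest neighbor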
An Adaptive Fuzzy k-NN Classifier Based on TBM (Cited by: 1)
15
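Authors: Liu Qiuyun (刘邱云), Fu Xuefeng (付雪峰), Wu Genxiu (吴根秀). Computer Engineering (《计算机工程》; CAS, CSCD, PKU Core), 2009, No. 16, pp. 183-185, 188 (4 pages)
To address recognition problems in which the class labels of training patterns are imprecise, an adaptive fuzzy k-NN (k-Nearest Neighbor) classifier based on the transferable belief model is proposed. The transferable belief model is combined with fuzzy set theory and possibility theory, and the pignistic transformation is applied to decide which class the pattern to be recognized truly belongs to. Gradient descent is used to minimize an error function so that the parameters are learned adaptively. Experimental results show that the classifier has a low misclassification rate and strong robustness.
Keywords: transferable belief model; adaptive; k-NN classifier; pignistic probability; gradient descent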
k-NN Queries in Dynamic Network Space (Cited by: 3)
16
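Authors: Yin Xiaolan (殷晓岚). Acta Electronica Sinica (《电子学报》; EI, CAS, CSCD, PKU Core), 2011, No. 2, pp. 389-394 (6 pages)
With the continued growth of wireless communication applications and the development of positioning technology, efficiently answering query requests from large numbers of moving objects and providing location-based services (LBS) is becoming increasingly important, and the k-NN query is a key service function among them. This paper proposes an algorithm for k-NN queries over static objects in dynamic networks. The algorithm first partitions the network around the target objects, then evaluates location-dependent queries by locating the original object's position on the network. The complexity of the algorithm is also analyzed, and an experimental comparison is given.
Keywords: moving objects; spatial network database; distance index; k-NN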
A Multidisciplinary Collaborative Diagnosis and Treatment Decision Support System Based on Improved K-NN and SVM (Cited by: 1)
17
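Authors: Li Xiaofeng (李晓峰), Wang Yanwei (王妍玮), Li Dong (李东). Computer Systems & Applications (《计算机系统应用》), 2020, No. 6, pp. 80-88 (9 pages)
Current diagnosis and treatment decision support systems use single-discipline decision methods, which leads to low diagnostic precision and low accuracy in the resulting data classification. A multidisciplinary collaborative diagnosis and treatment decision support system based on an improved K-NN (K-Nearest Neighbour) classification algorithm and SVM (Support Vector Machine) is therefore proposed and designed. On top of the overall system framework, the database system module, the human-computer interaction module, and the diagnostic reasoning module are designed. The diagnostic reasoning module is the software core of the system: an inference engine built from the improved K-NN classification algorithm and SVM searches, with computer assistance, for medical cases similar to the patient's symptom information and performs similarity matching. A new clinical case is constructed from the matching results and the patient's symptom set, and the CDA (Clinical Document Architecture) concept is introduced to fuse the improved K-NN classification algorithm with the SVM algorithm effectively, completing the multidisciplinary collaborative decision. Experimental results show that, compared with traditional systems, the system achieves high decision precision, with an average evaluation-metric test value of 95.98%, and high classification accuracy; with its assistance, doctors' diagnostic correctness improves and the misdiagnosis rate drops, while computational complexity remains low.
Keywords: improved k-NN classification algorithm; SVM; multidisciplinary collaboration; diagnosis and treatment decision support system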
RecBERT:Semantic recommendation engine with large language model enhanced query segmentation for k-nearest neighbors ranking retrieval
18
Authors: Richard Wu. Intelligent and Converged Networks (EI), 2024, No. 1, pp. 42-52 (11 pages)
The increasing amount of user traffic on Internet discussion forums has produced a huge amount of unstructured natural language data in the form of user comments. Most modern recommendation systems rely on manual tagging, in which administrators label the features of the class, or story, that a user comment corresponds to. Another common approach is to use pre-trained word embeddings to compare class descriptions for textual similarity, then use a distance metric such as cosine similarity or Euclidean distance to find the top k neighbors. However, neither approach fully utilizes this user-generated unstructured natural language data, which reduces the scope of these recommendation systems. This paper studies the application of domain adaptation on a transformer for the set of user comments to be indexed, and the use of simple contrastive learning in the sentence transformer fine-tuning process to generate meaningful semantic embeddings for the user comments that apply to each class. To match a query containing content from multiple user comments belonging to the same class, the construction of a subquery channel for computing class-level similarity is proposed. This channel segments the aggregate query into subqueries and performs a k-nearest neighbors (KNN) search on each individual subquery. RecBERT achieves state-of-the-art performance, outperforming other state-of-the-art models in accuracy, precision, recall, and F1 score for classifying comments between four and eight classes, respectively. RecBERT outperforms the most precise state-of-the-art model (distilRoBERTa) in precision by 6.97% for matching comments between eight classes.
Keywords: sentence transformer; simple contrastive learning; large language models; query segmentation; k-nearest neighbors
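The subquery channel described above can be sketched roughly as follows: each subquery embedding gets its own top-k cosine search, and the per-subquery hits are pooled into class-level scores. The abstract does not say how RecBERT aggregates the hits, so summing the similarities per class is an assumption of this sketch, as are the function names and the toy list-based embeddings standing in for sentence-transformer vectors.

```python
import math
from collections import defaultdict


def cosine(a, b):
    # Cosine similarity between two non-zero vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def rank_classes(subquery_vectors, index, k):
    """Rank classes by pooled similarity over per-subquery k-NN hits.

    index: list of (embedding, class_label) pairs for indexed comments.
    Aggregation by summed similarity is an assumption, not RecBERT's method.
    """
    scores = defaultdict(float)
    for query in subquery_vectors:
        # k nearest indexed comments for this subquery, by cosine similarity.
        top_k = sorted(index, key=lambda item: cosine(query, item[0]),
                       reverse=True)[:k]
        for embedding, label in top_k:
            scores[label] += cosine(query, embedding)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

The first element of the returned list is the class whose comments best match the aggregate query across all of its subqueries.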
A Study of EM Algorithm as an Imputation Method: A Model-Based Simulation Study with Application to a Synthetic Compositional Data
19
Authors: Yisa Adeniyi Abolade, Yichuan Zhao. Open Journal of Modelling and Simulation, 2024, No. 2, pp. 33-42 (10 pages)
Compositional data, such as relative information, is a crucial aspect of machine learning and related fields. It is typically recorded as closed data, that is, data that sums to a constant such as 100%. The linear regression model is the most commonly used statistical technique for identifying hidden relationships between underlying random variables of interest. When estimating linear regression parameters, which are useful for tasks such as future prediction and partial effects analysis of independent variables, maximum likelihood estimation (MLE) is the method of choice. However, data quality is a significant challenge in machine learning, especially when data is missing: many datasets contain missing observations, and recovering them can be costly and time consuming. To address this issue, the expectation-maximization (EM) algorithm has been suggested for situations involving missing data. The EM algorithm iteratively finds maximum likelihood or maximum a posteriori (MAP) estimates of parameters in statistical models that depend on unobserved variables or data. Using the current estimate as input, the expectation (E) step constructs the expected log-likelihood function; the maximization (M) step then finds the parameters that maximize the expected log-likelihood determined in the E step. This study examined how well the EM algorithm performed on a synthetic compositional dataset with missing observations, using both robust least squares and ordinary least squares regression. The efficacy of the EM algorithm was compared with two alternative imputation techniques, k-Nearest Neighbor (k-NN) and mean imputation, in terms of Aitchison distances and covariance.
Keywords: compositional data; linear regression model; least square method; robust least square method; synthetic data; Aitchison distance; maximum likelihood estimation; expectation-maximization algorithm; k-nearest neighbor and mean imputation
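The Aitchison distance used as a comparison metric in this abstract is the Euclidean distance between centered log-ratio (clr) transforms of two compositions. The following is a minimal sketch of that standard definition; the function names are illustrative, and the abstract does not describe how the study itself computed the distance.

```python
import math


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of two compositions."""
    if len(x) != len(y):
        raise ValueError("compositions must have the same number of parts")
    return math.dist(clr(x), clr(y))
```

Because the clr transform subtracts the mean log, rescaling a composition (for example from proportions to percentages) leaves the distance unchanged, which is why the metric suits closed data.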
A Self-Learning k-NN Equipment State Recognition Method with Multi-Color-Model Segmentation (Cited by: 2)
20
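Authors: Guo Xuemei (郭雪梅), Liu Guixiong (刘桂雄). China Measurement & Test (《中国测试》; CAS, PKU Core), 2016, No. 4, pp. 107-110 (4 pages)
In surge testing, the recognition target differs from test to test, so direct feature matching requires retraining on samples for the equipment under test before every test. First, the color model used for image segmentation (L*a*b*, HSL, or HSV) is selected according to the proportion of high-brightness points and white light in the image, which achieves adaptive segmentation. Second, a self-learning k-NN algorithm is proposed: the pixel count n, eccentricity e, solidity ratio r, and Euler number E form the feature vector X of a sample S, a dataset T0 is constructed, and samples are classified by the Euclidean distance D. If a sample's confidence is k, it is added to the candidate dataset Tz′; when Tz′ satisfies the condition, dataset Tz is expanded to form dataset Tz+1. The results show that the algorithm reaches an accuracy of up to 98.65% in recognizing 9 groups of samples of various types (21,600 image frames in total); after self-learning expansion with 5 groups of samples, the distance matrix changes little, which indicates high learning efficiency and learning accuracy.
Keywords: multi-color model; k-nearest neighbor algorithm; self-learning; surge testing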