4 articles found
Sentence Similarity Computation Based on the Levenshtein Distance Algorithm Cited by: 12
1
Author: Ji Shengjun (吉胜军). Computer Knowledge and Technology (《电脑知识与技术》), 2009, No. 3X, pp. 2177-2178 (2 pages)
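Taking sentence similarity as its entry point, this paper uses the Levenshtein distance (LD) algorithm and experimental analysis to study how to compute the similarity of two sentences in natural language processing. The experiments show that the Levenshtein distance (LD) algorithm is an effective algorithm or tool for measuring the similarity of two sentences, and that it helps developers write more efficient code in applications such as spell checking and the detection of near-identical exam papers.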
Keywords: Levenshtein distance (LD); natural language processing; sentence similarity
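The abstract above does not give the similarity formula, so the following is only a minimal sketch of one common way to turn Levenshtein distance into a sentence similarity score. The normalization `1 - distance / max(len)` and the function names `levenshtein` and `sentence_similarity` are assumptions made for illustration, not the method reported in the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between a and b."""
    # Keep the shorter string in the inner loop so the row stays small.
    if len(a) < len(b):
        a, b = b, a
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def sentence_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1.0 for identical strings, 0.0 for maximal distance."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty sentences are treated as identical
    return 1.0 - levenshtein(a, b) / longest


print(levenshtein("kitten", "sitting"))  # 3
```

The sketch compares character by character, which also works for Chinese text; comparing word tokens instead would only require passing lists of tokens and adjusting the type hints.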
Identifying G-protein Coupled Receptors Using Weighted Levenshtein Distance and Nearest Neighbor Method Cited by: 1
2
Author: Jian-Hua Xu. Genomics, Proteomics & Bioinformatics (SCIE, CAS, CSCD), 2005, No. 4, pp. 252-257 (6 pages)
G-protein coupled receptors (GPCRs) are a class of seven-helix transmembrane proteins that have been used in bioinformatics as targets to facilitate drug discovery for human diseases. Although thousands of GPCR sequences have been collected, the ligand specificity of many GPCRs is still unknown and only one crystal structure of the rhodopsin-like family has been solved. Therefore, identifying GPCR types from sequence data alone has become an important research issue. In this study, a novel technique for identifying GPCR types is introduced, based on the weighted Levenshtein distance between two receptor sequences and the nearest neighbor method (NNM); it can handle receptor sequences of different lengths directly. In experiments classifying four classes (acetylcholine, adrenoceptor, dopamine, and serotonin) of the rhodopsin-like family of GPCRs, the error rates from the leave-one-out procedure and the leave-half-out procedure were 0.62% and 1.24%, respectively. These results are superior to those of the covariant discriminant algorithm, the support vector machine method, and the NNM with Euclidean distance.
Keywords: GPCR; weighted Levenshtein distance; nearest neighbor method
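The combination described in the abstract above (a weighted Levenshtein distance feeding a nearest neighbor classifier) can be sketched roughly as follows. The abstract does not state the weights, so the substitution cost `toy_sub_cost`, its residue grouping, the gap cost of 1.0, and the short training sequences are all invented for illustration and are not taken from the paper.

```python
from typing import Callable, Sequence, Tuple


def weighted_levenshtein(a: str, b: str,
                         sub_cost: Callable[[str, str], float],
                         indel_cost: float = 1.0) -> float:
    """Edit distance where a substitution costs sub_cost(x, y) instead of a flat 1."""
    prev = [j * indel_cost for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [i * indel_cost]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + indel_cost,             # delete ca
                            curr[j - 1] + indel_cost,         # insert cb
                            prev[j - 1] + sub_cost(ca, cb)))  # substitute or match
        prev = curr
    return prev[-1]


# Hypothetical grouping: letters in the same group are cheaper to swap.
_TOY_GROUP = {"A": 0, "V": 0, "L": 0, "K": 1, "R": 1, "D": 2, "E": 2}


def toy_sub_cost(x: str, y: str) -> float:
    """Assumed weights: 0 for a match, 0.5 within a group, 1 otherwise."""
    if x == y:
        return 0.0
    gx, gy = _TOY_GROUP.get(x), _TOY_GROUP.get(y)
    return 0.5 if gx is not None and gx == gy else 1.0


def nearest_neighbor_label(query: str,
                           training: Sequence[Tuple[str, str]],
                           sub_cost: Callable[[str, str], float]) -> str:
    """1-NN: return the label of the training sequence closest to the query."""
    _, label = min(training,
                   key=lambda pair: weighted_levenshtein(query, pair[0], sub_cost))
    return label


# Toy data only; these are not real receptor sequences.
training = [("AVLK", "class_x"), ("DEKR", "class_y")]
print(nearest_neighbor_label("AVLR", training, toy_sub_cost))  # class_x
```

Because the distance is defined on the raw strings, sequences of different lengths need no padding or fixed-length feature vectors, which is the property the abstract highlights.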
FG-SMOTE: Fuzzy-based Gaussian synthetic minority oversampling with deep belief networks classifier for skewed class distribution Cited by: 1
3
Authors: Putta Hemalatha, Geetha Mary Amalanathan. International Journal of Intelligent Computing and Cybernetics (EI), 2021, No. 2, pp. 269-286 (18 pages)
Purpose: Adequate resources for learning and training the data are an important constraint in developing an efficient classifier with outstanding performance. The data usually follows a biased distribution of classes, that is, an unequal distribution of classes within a dataset. This issue is known as the imbalance problem, which is one of the most common issues occurring in real-time applications. Learning from imbalanced datasets is a ubiquitous challenge in the field of data mining. Imbalanced data degrades the performance of the classifier by producing inaccurate results.
Design/methodology/approach: In the proposed work, a novel fuzzy-based Gaussian synthetic minority oversampling (FG-SMOTE) algorithm is proposed to process the imbalanced data. The mechanism of the Gaussian SMOTE technique is based on the nearest neighbour concept to balance the ratio between minority and majority class datasets. The ratio of the datasets belonging to the minority and majority class is balanced using a fuzzy-based Levenshtein distance measure technique.
Findings: The performance and the accuracy of the proposed algorithm are evaluated using the deep belief networks classifier, and the results show that the fuzzy-based Gaussian SMOTE technique achieved an AUC of 93.7%, an F1 score of 94.2%, and a geometric mean score of 93.6%, computed from the confusion matrix.
Research limitations/implications: The proposed research still leaves some challenges to be addressed, such as applying FG-SMOTE to multiclass imbalanced datasets and evaluating the dataset imbalance problem in a distributed environment.
Originality/value: The proposed algorithm fundamentally solves the data imbalance issues and challenges involved in handling the imbalanced data. FG-SMOTE has aided in balancing minority and majority class datasets.
Keywords: imbalanced data; Gaussian SMOTE; Levenshtein distance measure technique; skewed class distribution; fuzzy-based Gaussian SMOTE; deep learning; deep belief network classifier
A Multiple Feature Approach for Disorder Normalization in Clinical Notes
4
Authors: Lü Chen, CHEN Bo, Lü Chaozhen, QIU Likun, JI Donghong. Wuhan University Journal of Natural Sciences (CAS, CSCD), 2016, No. 6, pp. 482-490 (9 pages)
In this paper we propose a multiple feature approach for the normalization task, which maps each disorder mention in the text to a unique Unified Medical Language System (UMLS) concept unique identifier (CUI). We develop a two-step method: first, acquire a list of candidate CUIs and their associated preferred names using the UMLS API; second, choose the closest CUI by calculating the similarity between the input disorder mention and each candidate. The similarity calculation step is formulated as a classification problem, and multiple features (string features, ranking features, similarity features, and contextual features) are used to normalize the disorder mentions. The results show that the multiple feature approach improves the accuracy of the normalization task from 32.99% to 67.08% compared with the MetaMap baseline.
Keywords: natural language processing; disorder normalization; Levenshtein distance; semantic composition; multiple features
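The second step of the method summarized above, choosing the candidate closest to the input mention, can be illustrated with a much simpler stand-in. The paper formulates this step as a classification problem over several feature groups; the sketch below instead ranks candidates by a single string similarity score from Python's standard `difflib` module. The candidate list, the placeholder identifiers `ID_1` and `ID_2`, and the function names are hypothetical, and no UMLS API call is made.

```python
from difflib import SequenceMatcher
from typing import List, Tuple


def string_similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] based on matching subsequences."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def choose_closest(mention: str, candidates: List[Tuple[str, str]]) -> str:
    """Return the identifier whose preferred name is most similar to the mention.

    candidates holds (identifier, preferred_name) pairs and must be non-empty.
    """
    best_id, _ = max(candidates,
                     key=lambda cand: string_similarity(mention, cand[1]))
    return best_id


# Placeholder identifiers, not real CUIs.
candidates = [("ID_1", "heart attack"), ("ID_2", "headache")]
print(choose_closest("Heart Attack", candidates))  # ID_1
```

A single string score ignores ranking and contextual evidence, which is presumably why the paper combines several feature types in a classifier rather than relying on string distance alone.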