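Abstract: The semantics of polysemous words is both a key topic and a difficult one in international Chinese language education and the HSK examination. Word sense disambiguation (WSD) aims to determine the specific meaning of a polysemous word in a given context, and is widely applied in human-computer interaction, machine translation, and automatic essay scoring. However, existing WSD methods suffer from low accuracy, scarce corpora, and simplistic features. Designing deep-neural-network classification models for Chinese polysemous word sense disambiguation, targeted at the corpora and evaluation systems of international Chinese education, is a current research focus and an important technical foundation for automatic scoring of HSK essays. Previous studies assume that the senses of a word are mutually independent and pay little attention to how those senses evolve from one another. To address this, the paper first conducts a semantic study of typical Chinese polysemous words and builds a semantic topology graph that distinguishes basic senses from fixed-collocation senses; the graph is used to guide the training of the classification models. On this basis, sample sentences for typical polysemous words are collected by crawling Chinese corpora, and supervised deep neural network models are built, including RNN, LSTM, and GRU. Based on an analysis of the crawled samples, context lengths of 30 and 60 characters were selected, and six neural networks were designed in unidirectional and bidirectional variants; the model parameters were optimized over multiple training runs to obtain the final WSD classification models. The polysemous word "意思" was chosen as a representative for disambiguation experiments in given contexts. The results show that the average accuracy of the RNN-, LSTM-, and GRU-based deep learning models exceeds 75% in every case, and the maximum accuracy of each model exceeds 94%. The area under the ROC curve (AUC) of each model exceeds 0.966, indicating that the models handle class imbalance in the samples well. The unidirectional and bidirectional RNN models achieved the best learning performance under both character-length settings.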
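The AUC figure quoted above can be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The following is a minimal sketch of that rank-based definition for a single sense treated as one-vs-rest. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the paper's evaluation code or data.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: P(positive score > negative score), ties count as 0.5.

    labels: iterable of 0/1 (1 = the sense of interest, an assumed one-vs-rest setup)
    scores: iterable of classifier scores, higher means "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # positive ranked above negative
            elif p == n:
                wins += 0.5      # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy, imbalanced example: 2 positives against 6 negatives (made-up numbers).
labels = [1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.3, 0.2, 0.1, 0.1, 0.05]
auc = roc_auc(labels, scores)
```

Because the value depends only on how positives rank against negatives, it does not change when the proportion of negatives grows, which is why AUC is commonly reported alongside accuracy on imbalanced samples.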
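Abstract: Existing loop closure detection methods in visual SLAM (simultaneous localization and mapping) are prone to false positives. To address this, the YOLOv3 object detection algorithm is used to obtain semantic information from the scene, and the DBSCAN (density-based spatial clustering of applications with noise) algorithm is used to correct false and missed detections. Semantic nodes are then constructed, and a local semantic topology graph is formed for each keyframe. Semantic nodes are matched using image features and object class information, the transformation between corresponding edges in different semantic topology graphs is computed to obtain the similarity between keyframes, and loop closures are determined from how the similarity changes across consecutive keyframes. Experiments on public datasets show that object clustering effectively improves loop closure detection accuracy in indoor scenes. Compared with algorithms that rely solely on traditional visual features, the proposed algorithm obtains more accurate loop closure detection results.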
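The abstract does not say which quantities are clustered or which parameters are used, so the following is only a sketch of the textbook DBSCAN procedure, applied to hypothetical 2D points such as detection centres. The function name `dbscan`, the points, and the values of `eps` and `min_pts` are all assumptions for illustration.

```python
from math import dist


def dbscan(points, eps, min_pts):
    """Textbook DBSCAN. Returns one label per point: 0, 1, ... or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbors(i):
        # All points within eps of point i (including i itself).
        return [j for j, q in enumerate(points) if dist(points[i], q) <= eps]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later be claimed as a border point
            continue
        cluster += 1                   # point i is a core point: start a cluster
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point reachable from a core point
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_pts:   # j is also a core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical detection centres: two tight groups and one isolated point.
centres = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
labels = dbscan(centres, eps=1.5, min_pts=3)
```

In this toy run the isolated point is labelled as noise, which is the kind of outlier a pipeline could treat as a spurious detection; how the paper actually uses the clusters to repair missed detections is not described in the abstract.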
Funding: Supported in part by the National Natural Science Foundation of China (61370024, 61428209, 61232001) and the Program for New Century Excellent Talents in University (NCET-12-0547).
Abstract: Identification of disease-causing genes among a large number of candidates is a fundamental challenge in human disease studies. However, it is still time-consuming and laborious to determine the real disease-causing genes by biological experiments. With advances in high-throughput techniques, a large number of protein-protein interactions have been identified, and several methods based on protein interaction networks have therefore been proposed to address this issue. In this paper, we propose a shortest path-based algorithm, named SPranker, to prioritize disease-causing genes in protein interaction networks. Considering the fact that diseases with similar phenotypes are generally caused by functionally related genes, we further propose an improved algorithm, SPGOranker, by integrating the semantic similarity of gene ontology (GO) annotations. SPGOranker not only considers the topological similarity between protein pairs in a protein interaction network but also takes their functional similarity into account. The proposed algorithms SPranker and SPGOranker were applied to 1598 known orphan disease-causing genes from 172 orphan diseases and compared with three state-of-the-art approaches: ICN, VS, and RWR. The experimental results show that SPranker and SPGOranker outperform ICN, VS, and RWR in the prioritization of orphan disease-causing genes. Importantly, in the case study of severe combined immunodeficiency, SPranker and SPGOranker predict several novel causal genes.
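The abstract names shortest paths as the basis of SPranker but does not give its scoring formula. As a rough illustration of the general idea, the sketch below ranks candidate genes by their closeness to known disease genes in an unweighted interaction graph. The inverse-distance score, the function names, and the toy network are assumptions and should not be read as the SPranker definition.

```python
from collections import deque


def bfs_distances(graph, source):
    """Shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def rank_candidates(graph, seeds, candidates):
    """Rank candidates by summed inverse shortest-path distance to seed genes.

    Assumed scoring rule (not from the paper): closer to more seeds ranks higher;
    unreachable seeds contribute nothing. Ties are broken alphabetically.
    """
    dist_from_seed = {s: bfs_distances(graph, s) for s in seeds}
    scores = {}
    for c in candidates:
        scores[c] = sum(
            1.0 / d[c] for d in dist_from_seed.values() if c in d and d[c] > 0
        )
    return sorted(candidates, key=lambda c: (-scores[c], c))


# Toy undirected interaction network (hypothetical gene names), as adjacency lists.
network = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}
ranking = rank_candidates(network, seeds=["A"], candidates=["D", "C", "B", "E"])
```

A GO-aware variant in the spirit of SPGOranker could weight each edge by a functional similarity value before computing paths, but the weighting scheme is not specified in the abstract, so it is left out here.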