To construct a high efficient text clustering algorithm the multilevel graph model and the refinement algorithm used in the uncoarsening phase is discussed. The model is applied to text clustering. The performance of ...To construct a high efficient text clustering algorithm the multilevel graph model and the refinement algorithm used in the uncoarsening phase is discussed. The model is applied to text clustering. The performance of clustering algorithm has to be improved with the refinement algorithm application. The experiment result demonstrated that the multilevel graph text clustering algorithm is available. Key words text clustering - multilevel coarsen graph model - refinement algorithm - high-dimensional clustering CLC number TP301 Foundation item: Supported by the National Natural Science Foundation of China (60173051)Biography: CHEN Jian-bin(1970-), male, Associate professor, Ph. D., research direction: data mining.展开更多
针对学术论文在学科领域内进行层次标签分类问题,提出了一种基于知识增强的语义表示与图注意力网络的文本层次标签分类(text hierarchical label classification based on enhanced representation through knowledge integration and g...针对学术论文在学科领域内进行层次标签分类问题,提出了一种基于知识增强的语义表示与图注意力网络的文本层次标签分类(text hierarchical label classification based on enhanced representation through knowledge integration and graph attention networks, GETHLC)模型。首先,通过层次标签抽取模块提取学科领域下层次标签的结构特征,并通过预训练模型对学术论文的摘要、标题和抽取后的层次标签结构特征进行嵌入;然后,在分类阶段基于层次标签的结构分层构造层次分类器,将学术论文逐层分类至最符合的类别中。在大规模中文科学文献数据集CSL上进行的实验结果表明,与基准的ERNIE模型相比,GETHLC模型的准确率、召回率和F1值分别提升了5.78、4.31和5.02百分点。展开更多
针对景区手写诗词存在背景纹理复杂、字体尺寸及风格多样等特点导致景区游客难以识别手写诗词的问题,首先,分析研究景区手写诗词的识别场景,设计景区诗词检测网络(detection of poetry in scenic areas-network,DPSA-Net)以提取景区手...针对景区手写诗词存在背景纹理复杂、字体尺寸及风格多样等特点导致景区游客难以识别手写诗词的问题,首先,分析研究景区手写诗词的识别场景,设计景区诗词检测网络(detection of poetry in scenic areas-network,DPSA-Net)以提取景区手写诗词不同尺度的特征,并结合手写诗词字符间的链接依赖关系实现景区手写诗词检测;其次,设计了卷积循环聚合网络(convolution recurrent aggregation network,CRA-Net)以对景区手写诗词进行识别,结合卷积神经网络(convolutional neural networks,CNN)和双向长短期记忆网络提取手写诗词图像的序列特征,并通过聚合交叉熵(aggregation cross-entropy,ACE)实现特征向文本的转换;最后,结合景区知识图谱对CRA-Net的输出进行校正,进而提高景区手写诗词的识别准确率。实验结果表明,通过景区手写诗词矫正技术对CRA-Net的识别结果矫正后,识别准确率达到了79.04%,同时,该技术具有较好的抗干扰能力和良好的应用前景。展开更多
文摘To construct a high efficient text clustering algorithm the multilevel graph model and the refinement algorithm used in the uncoarsening phase is discussed. The model is applied to text clustering. The performance of clustering algorithm has to be improved with the refinement algorithm application. The experiment result demonstrated that the multilevel graph text clustering algorithm is available. Key words text clustering - multilevel coarsen graph model - refinement algorithm - high-dimensional clustering CLC number TP301 Foundation item: Supported by the National Natural Science Foundation of China (60173051)Biography: CHEN Jian-bin(1970-), male, Associate professor, Ph. D., research direction: data mining.
文摘针对学术论文在学科领域内进行层次标签分类问题,提出了一种基于知识增强的语义表示与图注意力网络的文本层次标签分类(text hierarchical label classification based on enhanced representation through knowledge integration and graph attention networks, GETHLC)模型。首先,通过层次标签抽取模块提取学科领域下层次标签的结构特征,并通过预训练模型对学术论文的摘要、标题和抽取后的层次标签结构特征进行嵌入;然后,在分类阶段基于层次标签的结构分层构造层次分类器,将学术论文逐层分类至最符合的类别中。在大规模中文科学文献数据集CSL上进行的实验结果表明,与基准的ERNIE模型相比,GETHLC模型的准确率、召回率和F1值分别提升了5.78、4.31和5.02百分点。