Abstract
As three-dimensional data becomes increasingly common on the web, the advantages of triadic concept analysis (TCA) are becoming apparent. TCA is a relatively new research field with broad prospects. This paper proposes a text classification method based on TCA, which extends TCA to a new application. The method first preprocesses the dataset into a triadic context and extends the binary relation in that context to a fuzzy relation with values between 0 and 1, representing the membership degree of an attribute for an object under a given condition. Triadic concepts are then built on this context to express the ternary relation among texts, feature terms, and categories. Next, drawing on the closeness degree from fuzzy theory, a similarity measure between triadic concepts is derived by analogy and used to compute the similarity between the triadic concepts of the training set and a new text. Experimental results indicate that the proposed model is effective and, on specific datasets, classifies better than the support vector machine (SVM), K-nearest neighbor (KNN), and convolutional neural network (CNN) algorithms, as well as a classification model based on formal concept analysis.
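The pipeline outlined in the abstract can be illustrated with a rough sketch. The abstract does not give the paper's similarity formula or its concept-construction procedure, so everything below is an assumption: the min-max closeness degree is one common choice from fuzzy set theory, the helper names (`closeness`, `build_context`, `classify`) are hypothetical, and the sketch compares a new text directly against rows of the fuzzy triadic context rather than against fully constructed triadic concepts.

```python
from collections import defaultdict


def closeness(a, b):
    """Min-max closeness degree of two fuzzy sets given as {term: membership}.

    Assumption: this particular closeness formula is illustrative; the
    paper's own similarity measure is not stated in the abstract.
    """
    terms = set(a) | set(b)
    num = sum(min(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    den = sum(max(a.get(t, 0.0), b.get(t, 0.0)) for t in terms)
    return num / den if den else 0.0


def build_context(training):
    """Build a fuzzy triadic context from (text_id, {term: membership}, category).

    Texts are the objects, feature terms the attributes, and categories the
    conditions; the stored value is the membership degree in [0, 1].
    """
    context = defaultdict(dict)
    for text_id, terms, category in training:
        context[category][text_id] = terms
    return context


def classify(context, new_terms):
    """Return the category whose training texts are, on average, closest."""
    scores = {
        category: sum(closeness(t, new_terms) for t in texts.values()) / len(texts)
        for category, texts in context.items()
    }
    return max(scores, key=scores.get), scores


# Hypothetical toy data, not taken from the paper.
training = [
    ("d1", {"goal": 0.9, "match": 0.6}, "sports"),
    ("d2", {"stock": 0.8, "market": 0.7}, "finance"),
]
label, scores = classify(build_context(training), {"goal": 0.7, "team": 0.4})
print(label, scores)
```

Averaging closeness per category is only one possible decision rule; a method built on actual triadic concepts would compare extents, intents, and modi rather than single context rows.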
Authors
李贞
张卓
王黎明
LI Zhen, ZHANG Zhuo, WANG Li-ming (School of Information Engineering, Zhengzhou University, Zhengzhou 450001, China)
Source
《计算机科学》
CSCD
Peking University Core Journal
2017, Issue 8, pp. 207-215 (9 pages)
Computer Science
Funding
Supported by the National Science Foundation for Young Scientists of China (61303044)
Keywords
Triadic concept analysis
Triadic concept
Fuzzy theory
Text classification
Triadic concept similarity