Abstract
To address the low classification performance of Uyghur word-grouping algorithms in text classification and the difficulty of processing massive data, an improved Uyghur word-grouping algorithm (DM) is proposed, together with a text classification model based on Hadoop and the improved algorithm. The text is split into segments and the DM word-grouping algorithm is applied to each segment separately. The algorithm is parallelized with the MapReduce programming model and combined with the Mahout Bayesian classification algorithm to classify the text. Experimental results show that the proposed model achieves good classification results.
Authors
Aibibula Abula (艾比布拉·阿不拉)
MA Zhen (马振)
Halidan Abudureyimu (哈力旦·阿布都热依木)
WU Bing-bing (吴冰冰)
School of Electrical Engineering, Xinjiang University, Urumqi 830047, China
Source
Computer Engineering and Design (《计算机工程与设计》)
Peking University Core Journal (北大核心)
2018, Issue 8, pp. 2500-2504 (5 pages)
Funding
Natural Science Foundation of Xinjiang Uygur Autonomous Region (Grant No. 2016D01C048)