Abstract
This paper applies the sequential pattern mining algorithm PrefixSpan to the extraction of new words from Chinese text. Unmodified PrefixSpan produces sequential patterns that are not contiguous, and patterns that contain one another. The paper modifies the algorithm to address these problems and combines semantic features with statistical measures to extract new words effectively from Chinese corpora. Experimental results show that the method identifies new words in specialized domains with high accuracy.
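The two modifications named in the abstract (keeping patterns contiguous and discarding patterns contained in other patterns) can be illustrated with a minimal sketch. It is not the paper's algorithm: it is a simplified prefix-growth miner over characters that only extends a prefix with the immediately following character, and it does not model the semantic features at all. The function names `mine_contiguous` and `drop_contained`, the parameters `min_support` and `max_len`, and the rule "drop a pattern if a longer pattern with equal support contains it" are all assumptions made for illustration.

```python
from collections import defaultdict


def mine_contiguous(sentences, min_support=2, max_len=4):
    """Return {pattern: support} for frequent contiguous character patterns.

    Support is the number of sentences containing the pattern, as in a
    sequence database. This is an illustrative simplification, not the
    modified PrefixSpan described in the paper.
    """
    frequent = {}
    # Length-1 prefixes: each character with its (sentence id, start) positions.
    candidates = defaultdict(set)
    for sid, sentence in enumerate(sentences):
        for start, ch in enumerate(sentence):
            candidates[ch].add((sid, start))

    length = 1
    while candidates and length <= max_len:
        next_candidates = defaultdict(set)
        for pattern, occurrences in candidates.items():
            support = len({sid for sid, _ in occurrences})
            if support < min_support:
                continue  # infrequent prefixes are never extended
            frequent[pattern] = support
            # Grow the prefix only with the adjacent next character,
            # which is what keeps every pattern contiguous.
            for sid, start in occurrences:
                end = start + length
                if end < len(sentences[sid]):
                    next_candidates[pattern + sentences[sid][end]].add((sid, start))
        candidates = next_candidates
        length += 1
    return frequent


def drop_contained(frequent):
    """Drop patterns contained in a longer pattern with the same support."""
    kept = {}
    for pattern, support in frequent.items():
        absorbed = any(
            len(other) > len(pattern)
            and pattern in other
            and other_support == support
            for other, other_support in frequent.items()
        )
        if not absorbed:
            kept[pattern] = support
    return kept


corpus = ["区块链技术发展", "区块链应用", "学习区块链"]
candidates = drop_contained(mine_contiguous(corpus, min_support=2))
print(candidates)
```

On this toy corpus the only surviving candidate is 区块链, because every shorter pattern (区, 区块, 块链 and so on) occurs in exactly the same sentences and is absorbed by it. A real system would still need a lexicon check and further filtering before treating a candidate as a new word.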
Source
《电脑知识与技术》 (Computer Knowledge and Technology)
2018, Issue 3Z, pp. 160-163 (4 pages)
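Funding
National Natural Science Foundation of China (41701537)
Scientific Research Project of the Hubei Provincial Department of Education (B2015448)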