Research on a French Term Extraction Method Based on the FP Sequence Tree (cited by: 1)

Extracting Terms From French Corpora with FP Sequence Tree
Abstract: French is one of the working languages of the United Nations. Its complex grammar and inflection rules undermine the effectiveness of term extraction methods such as N-gram, which in turn reduces the accuracy of French text mining. This paper proposes an efficient French term extraction method that automatically extracts terms, including both single words and phrases, from the French corpus under analysis and builds the lexicon needed for French text mining. First, the word co-occurrence information of the corpus is compressed into an FP (Frequent Pattern) sequence tree so that frequent word sequences can be extracted rapidly; then the termhood of each frequent word sequence is calculated to obtain the term set. The FP sequence tree is a newly designed data structure that reduces the time complexity of word co-occurrence statistics to linear time. Experiments show that the proposed method achieves an accuracy of approximately 90% and a higher recall than existing French term extraction methods, so it can effectively support French text mining applications.
Authors: YU Juan (于娟); WU Xiao-peng (吴晓鹏); LIAO Xiao (廖晓); LIU Jian-guo (刘建国) (School of Economics and Management, Fuzhou University, Fuzhou 350108; School of Internet Finance and Information Engineering, Guangdong University of Finance, Guangzhou 510521; Institute of Finance and Accounting, Shanghai University of Finance and Economics, Yangpu, Shanghai 200433)
Source: Journal of University of Electronic Science and Technology of China (《电子科技大学学报》), 2021, No. 1, pp. 84-90 (7 pages). Indexed in EI, CAS, CSCD, and the Peking University Core Journals list.
Funding: National Natural Science Foundation of China (71771054).
Keywords: FP sequence tree; French text mining; term extraction; termhood; text compression
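
The pipeline summarized in the abstract (compress word co-occurrence into a tree, extract frequent word sequences, then score each one) can be illustrated with a minimal sketch. This is not the authors' FP sequence tree or their termhood formula, neither of which is specified in this record. It assumes a plain prefix tree over word sequences of bounded length, and the names Node, build_tree, frequent_sequences, score, max_len and min_support are all hypothetical. The score function is a simple cohesion ratio used only as a stand-in for termhood.

```python
from collections import defaultdict


class Node:
    """One node of a word-sequence prefix tree; count = occurrences of the path."""

    def __init__(self):
        self.count = 0
        self.children = defaultdict(Node)


def build_tree(sentences, max_len=3):
    """Insert every word sequence of up to max_len words starting at each position."""
    root = Node()
    for sentence in sentences:
        words = sentence.lower().split()
        for start in range(len(words)):
            node = root
            for word in words[start:start + max_len]:
                node = node.children[word]
                node.count += 1
    return root


def frequent_sequences(root, min_support=2):
    """Return {word tuple: count} for every path seen at least min_support times."""
    result = {}
    stack = [((), root)]
    while stack:
        prefix, node = stack.pop()
        for word, child in node.children.items():
            # Counts never increase along a path, so infrequent branches are pruned.
            if child.count >= min_support:
                seq = prefix + (word,)
                result[seq] = child.count
                stack.append((seq, child))
    return result


def score(seq, freq):
    """Stand-in for termhood (an assumption): sequence count / rarest word's count."""
    if len(seq) == 1:
        return 1.0
    return freq[seq] / min(freq[(w,)] for w in seq)


corpus = [
    "la pomme de terre est cuite",
    "une pomme de terre et une carotte",
    "la carotte est cuite",
]
freq = frequent_sequences(build_tree(corpus))
for seq in sorted(freq, key=len, reverse=True)[:3]:
    print(" ".join(seq), freq[seq], score(seq, freq))
```

On this toy corpus the phrase "pomme de terre" survives the support threshold while one-off pairs such as "la pomme" are pruned. The sketch costs time proportional to corpus length times max_len; the linear-time behavior claimed for the FP sequence tree would depend on details of the paper that are not reproduced here.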