Abstract
This paper proposes a multi-strategy method for acquiring Chinese synonyms. First, synonyms are taken from the synonymy relations already encoded in existing semantic dictionaries such as Tongyici Cilin (《同义词词林》, Cilin) and the Chinese Concept Dictionary (《中文概念词典》), both of which serve as dictionary resources in many NLP applications. Second, synonyms are extracted from the feature words of infoboxes in Baidu Baike (Bdbk) and from the HTML tags of web pages in Zdic. In addition, DIPRE (Dual Iterative Pattern Relation Expansion), a method that acquires patterns automatically, is applied to Baidu Baike text to discover high-confidence patterns and synonym instances. Experiments on the NLP&CC 2012 synonym evaluation dataset show that the proposed strategies achieve good results and outperform the NLP&CC 2012 evaluation results. Using this method, a synonym dictionary targeting the noun part of the Grammatical Knowledge-base of Contemporary Chinese (《现代汉语语法信息词典》) was built and manually proofread, as a step toward a more complete system of semantic relations for that knowledge base.
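The DIPRE step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a word-segmented corpus (each sentence is a list of tokens), treats the tokens between two known synonyms as a "pattern", and uses a plain support count (`min_support`) in place of the confidence score, which the abstract does not define. The function names, the toy corpus, and the seed pair are all hypothetical.

```python
from collections import defaultdict

def learn_patterns(corpus, pairs, max_gap=3):
    """Map each middle-context pattern to the set of known pairs it connects."""
    support = defaultdict(set)
    for tokens in corpus:
        for i, left in enumerate(tokens):
            # Allow between 1 and max_gap tokens between the two words.
            for j in range(i + 2, min(i + 2 + max_gap, len(tokens))):
                pair = frozenset((left, tokens[j]))
                if pair in pairs:
                    support[tuple(tokens[i + 1:j])].add(pair)
    return support

def apply_patterns(corpus, patterns):
    """Return every word pair that directly surrounds an accepted pattern."""
    found = set()
    for tokens in corpus:
        for pattern in patterns:
            n = len(pattern)
            for k in range(1, len(tokens) - n):
                if tuple(tokens[k:k + n]) == pattern and tokens[k - 1] != tokens[k + n]:
                    found.add(frozenset((tokens[k - 1], tokens[k + n])))
    return found

def dipre(corpus, seeds, min_support=1, rounds=3):
    """Alternate between learning patterns and harvesting pairs until nothing new appears."""
    pairs = {frozenset(p) for p in seeds}
    patterns = set()
    for _ in range(rounds):
        support = learn_patterns(corpus, pairs)
        # Assumption: support count stands in for the paper's confidence measure.
        patterns = {p for p, s in support.items() if len(s) >= min_support}
        new_pairs = apply_patterns(corpus, patterns) - pairs
        if not new_pairs:
            break
        pairs |= new_pairs
    return pairs, patterns

# Hypothetical toy corpus, already segmented into words.
corpus = [
    ["番茄", "又称", "西红柿"],   # tomato, "also called", tomato
    ["土豆", "又称", "马铃薯"],   # potato, "also called", potato
    ["土豆", "俗称", "马铃薯"],   # potato, "commonly called", potato
    ["玉米", "俗称", "苞谷"],     # corn, "commonly called", corn
    ["番茄", "喜欢", "阳光"],     # tomato, "likes", sunlight (not a synonym context)
]
pairs, patterns = dipre(corpus, seeds=[("番茄", "西红柿")])
```

Starting from a single seed pair, the first round learns the pattern 又称 and harvests a second pair; that pair exposes the pattern 俗称 in the next round, which yields a third. Raising `min_support` makes the sketch more conservative, at the cost of discovering fewer patterns.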
Source
《北京大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core (Peking University core journal list)
2015, No. 2, pp. 301-306 (6 pages)
Acta Scientiarum Naturalium Universitatis Pekinensis
Funding
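National Natural Science Foundation of China (61272221, 61472191)
National Social Science Fund of China (11CYY030, 10CYY021)
Jiangsu Provincial Social Science Fund (12YYA002)
Natural Science Foundation of the Jiangsu Higher Education Institutions (14KJB520022)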
Keywords
synonym
relation extraction
pattern matching (pattern-based method)
online encyclopedia