Abstract
Word-level semantic knowledge bases are important for deepening natural language understanding (NLU). Comparatively mature resources such as WordNet, HowNet, and Tongyici Cilin were all built by hand. Their descriptions of knowledge are fairly accurate, but developing them takes an enormous amount of work, and applying them in practice raises many difficulties. To acquire knowledge of how Chinese words relate to one another in a more automated and empirical way, this paper proposes the concept of word correlation together with a statistics-based method for computing it. On this basis, a word correlation network is built over Chinese words with strong domain characteristics. A hard-disk storage method based on array segmentation is also designed, so that the massive amount of data involved can be analyzed and processed on a current personal computer. The resulting semantic knowledge has the advantages of the empiricist approach, with relatively high accuracy and strong generalization, and can play an important role in text categorization, text retrieval, text filtering, and related fields.
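The abstract names the statistics-based correlation measure and the array-segmentation storage scheme but gives no formulas, so the following is only a minimal Python sketch under stated assumptions, not the paper's method. It assumes document-level co-occurrence, uses the Dice coefficient 2·c(a,b) / (c(a) + c(b)) as a stand-in correlation score, and stores the upper-triangular array of pair counts as fixed-size segments on disk so that only one segment is held in memory at a time. All function names and parameters (`pair_index`, `count_pairs_segmented`, `correlation_net`, `seg_size`, `threshold`) are hypothetical.

```python
import itertools
import os
import tempfile
from array import array
from collections import Counter


def pair_index(i, j, n):
    """Linear position of pair (i, j), i < j, in a row-major upper-triangular array."""
    # Rows 0..i-1 hold (n-1) + (n-2) + ... + (n-i) cells.
    return i * (2 * n - i - 1) // 2 + (j - i - 1)


def count_pairs_segmented(docs, vocab, seg_size, out_dir):
    """Count document-level co-occurrences for all word pairs, one segment at a time.

    Assumption: docs is a re-iterable list of token lists. Only seg_size
    counters are kept in memory; each finished segment goes to its own file.
    """
    index = {w: k for k, w in enumerate(vocab)}
    n = len(vocab)
    total = n * (n - 1) // 2
    segments = []
    for start in range(0, total, seg_size):
        end = min(start + seg_size, total)
        seg = array("q", [0]) * (end - start)
        for doc in docs:  # one pass over the corpus per segment
            ids = sorted({index[w] for w in doc if w in index})
            for i, j in itertools.combinations(ids, 2):
                k = pair_index(i, j, n)
                if start <= k < end:
                    seg[k - start] += 1
        path = os.path.join(out_dir, f"seg_{start:012d}.bin")
        with open(path, "wb") as f:
            seg.tofile(f)
        segments.append((start, end, path))
    return segments


def correlation_net(docs, vocab, seg_size, out_dir, threshold):
    """Build {(word_a, word_b): score} edges whose Dice score reaches threshold."""
    doc_freq = Counter(w for doc in docs for w in set(doc))
    segments = count_pairs_segmented(docs, vocab, seg_size, out_dir)
    # combinations() yields pairs in the same row-major order as pair_index.
    pairs = itertools.combinations(range(len(vocab)), 2)
    edges = {}
    for start, end, path in segments:
        seg = array("q")
        with open(path, "rb") as f:  # load one segment back from disk
            seg.fromfile(f, end - start)
        for count in seg:
            i, j = next(pairs)
            a, b = vocab[i], vocab[j]
            denom = doc_freq[a] + doc_freq[b]
            score = 2 * count / denom if denom else 0.0  # Dice coefficient (stand-in)
            if count and score >= threshold:
                edges[(a, b)] = score
    return edges


if __name__ == "__main__":
    docs = [["navy", "ship", "radar"], ["ship", "radar"], ["stock", "fund"]]
    vocab = ["navy", "ship", "radar", "stock", "fund"]
    with tempfile.TemporaryDirectory() as d:
        print(correlation_net(docs, vocab, seg_size=3, out_dir=d, threshold=0.5))
```

Each segment costs one extra pass over the corpus. Trading repeated passes and disk I/O for bounded memory is the point of segmenting the array when the n(n-1)/2 pair counts do not fit in RAM on a single PC.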
Source
Computer & Digital Engineering (《计算机与数字工程》)
2012, No. 2, pp. 15-18, 86 (5 pages)
Funding
Supported by the Guiding Project of the Natural Science Foundation of Naval University of Engineering (No. HGDYDJJ10008)
Keywords
word correlation
word correlation network
semantic dictionary (semantic knowledge base)