Abstract
Automatic construction of domain concept taxonomies plays an important role in artificial intelligence, natural language processing, and information retrieval. Existing work focuses mainly on general knowledge, and studies targeting specific domains are comparatively few. Current methods also suffer from low accuracy when extracting relationships between domain concepts and from low efficiency in automatic construction. To address these problems, a Hybrid algorithm for Automatic Domain concept Taxonomy construction (HADT) is proposed, consisting of two main modules: relationship extraction between domain concepts and taxonomy construction. The relationship extraction module takes the characteristics of Chinese into account and combines a syntax-tree method with a rule-based method to improve both the precision and the recall of the extracted relationships. The taxonomy construction module uses an improved BRT algorithm to reduce algorithmic complexity while improving the precision of the constructed taxonomy. Experiments on datasets from the communications, finance, and computer domains show that HADT constructs better taxonomies than the BRT algorithm, with a highest precision of 89.3%.
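To make the rule-based side of relationship extraction and the precision/recall measures more concrete, here is a minimal Python sketch. It is only an illustration under stated assumptions: the abstract does not list the paper's actual rules, so the single "X是一种Y" ("X is a kind of Y") pattern, the function names `extract_isa` and `precision_recall`, and the toy sentences and gold pairs are all hypothetical. The syntax-tree component and the improved BRT step are not modeled.

```python
import re

# Hypothetical surface rule for Chinese is-a statements ("X是一种Y").
# The paper's real rule set is not given in the abstract; this is an assumption.
TERM = r"[\u4e00-\u9fffA-Za-z0-9]"
ISA_PATTERN = re.compile(rf"(?P<hypo>{TERM}+?)是一种(?P<hyper>{TERM}+)")


def extract_isa(sentences):
    """Return a set of (hyponym, hypernym) pairs matched by the rule."""
    pairs = set()
    for sentence in sentences:
        for match in ISA_PATTERN.finditer(sentence):
            pairs.add((match.group("hypo"), match.group("hyper")))
    return pairs


def precision_recall(predicted, gold):
    """Compute precision and recall of predicted pairs against gold pairs."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


if __name__ == "__main__":
    # Toy sentences and gold pairs, invented purely for illustration.
    sentences = [
        "路由器是一种网络设备。",
        "股票是一种有价证券。",
        "今天天气很好。",
    ]
    gold = {("路由器", "网络设备"), ("债券", "有价证券")}

    predicted = extract_isa(sentences)
    print(predicted)
    print(precision_recall(predicted, gold))  # (0.5, 0.5)
```

In this toy run the rule finds two pairs, one of which is in the gold set, so precision and recall are both 0.5. A purely pattern-based rule like this tends to miss relations expressed in other constructions, which is one plausible reason for combining it with a syntax-tree method.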
Source
《计算机工程》
CAS
CSCD
2014, Issue 12, pp. 57-62, 67 (7 pages in total)
Computer Engineering
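Funding
National Key Technology R&D Program of China (Grant No. 2012BAH74F02)
Scientific Research Fund of the Science and Technology Commission of Shanghai Municipality (Grant No. 12dz1500205)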