Abstract
Because hyponymy relations between symptoms have strong structural regularities, an automatic hyponymy extraction method based on symptom components is proposed. First, an inspection of symptom entities shows that a symptom can be segmented into eight types of components, such as atomic symptom words and modifiers, and that the component sequences follow certain composition rules. Then, a lexical analysis system and a Conditional Random Field (CRF) model are used to segment symptoms and label their components. Finally, relation extraction between symptoms is treated as a classification problem: symptom composition features, dictionary features and general features are selected as input features, and models trained with several classification algorithms classify each pair of symptoms as hyponymy or non-hyponymy. Experimental results show that the Support Vector Machine (SVM) with all three feature types performs best, reaching a precision, recall and F1-measure of 82.68%, 82.13% and 82.40%, respectively. On this basis, the proposed extraction algorithm was used to extract 20,619 hyponymy relations and to build a symptom knowledge base with hyponymy relations.
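The reported F1-measure can be checked against the reported precision and recall. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the function name `f1_score` is a hypothetical helper introduced only for this illustration and is not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (same unit as the inputs)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both inputs are zero
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract, in percent.
reported_precision = 82.68
reported_recall = 82.13

f1 = f1_score(reported_precision, reported_recall)
print(f"F1 = {f1:.2f}%")  # prints "F1 = 82.40%"
```

Under that assumption, the computed value agrees with the 82.40% stated in the abstract.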
Source
Journal of Computer Applications (《计算机应用》)
CSCD
Peking University Core Journal (北大核心)
2017, Issue 10, pp. 2999-3005 (7 pages)
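Funding
National High Technology Research and Development Program of China (863 Program) (2015AA020107)
National Key Technology R&D Program of China (2015BAH12F01-05)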
Keywords
hyponymy
symptom component
Conditional Random Field (CRF)
relationship classification
Support Vector Machine (SVM)
decision tree
Naive Bayes (NB)