Abstract
To address the problem of understanding large-scale ontologies, this paper studies the structural features and the potentially important terms of a large-scale ontology from the perspective of complex network analysis. The various views of the Gene Ontology are converted into networks and analyzed comprehensively. The results show that the Gene Ontology as a whole exhibits the typical topological features of complex networks, in particular the "small-world" and "scale-free" properties. These features are less pronounced in its sub-ontologies, which are often scale-free but show no small-world effect. Potentially important terms in the ontology are then discovered using several well-known node centrality measures from network analysis. To measure the effectiveness of the discovered terms, an evaluation method based on information retrieval results from MEDLINE is designed. Using the literature relevant to each Gene Ontology term, it quantitatively assesses how well each centrality measure suits the task of discovering important ontology concepts. The experimental results indicate that betweenness centrality is the most appropriate of the evaluated centrality measures.
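Betweenness centrality, the measure the experiments favor, scores a term by how many shortest paths between other terms pass through it. Below is a minimal Python sketch of Brandes' algorithm for computing it. It rests on several assumptions. The is_a links are treated as undirected edges, and the scores are left unnormalized. The term names and edges in `TOY_EDGES` are hypothetical toy data rather than real Gene Ontology content, and the helper names are illustrative only. The paper's own network construction and tooling may differ.

```python
from collections import deque


def undirected_adjacency(edges):
    """Build an undirected adjacency map from (child, parent) pairs."""
    adj = {}
    for child, parent in edges:
        adj.setdefault(child, set()).add(parent)
        adj.setdefault(parent, set()).add(child)
    return adj


def betweenness_centrality(adj):
    """Brandes' algorithm for an unweighted, undirected graph.

    Returns raw (unnormalized) scores; each unordered pair of
    endpoints contributes at most 1 in total to intermediate nodes.
    """
    cb = {v: 0.0 for v in adj}
    for s in adj:
        # Breadth-first search from s, counting shortest paths.
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}   # number of shortest s-v paths
        dist = {v: -1 for v in adj}   # -1 means "not yet reached"
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate dependencies in order of non-increasing distance.
        delta = {v: 0.0 for v in adj}
        while stack:
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                cb[w] += delta[w]
    # Every undirected pair was counted once from each endpoint.
    return {v: c / 2 for v, c in cb.items()}


# Hypothetical (child, parent) is_a pairs; not real GO data.
TOY_EDGES = [
    ("binding", "molecular_function"),
    ("catalytic", "molecular_function"),
    ("protein_binding", "binding"),
    ("dna_binding", "binding"),
    ("kinase", "catalytic"),
]

scores = betweenness_centrality(undirected_adjacency(TOY_EDGES))
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[:3])
```

In this toy hierarchy `binding` (score 7) outranks the root `molecular_function` (score 6). The reason is that `binding` sits on every path from its two child terms to the rest of the graph. This shows how betweenness can surface intermediate terms rather than only the most general ones.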
Funding
The National Basic Research Program of China (973 Program) (No. 2005CB321802)
Program for New Century Excellent Talents in University (No. NCET-06-0926)
The National Natural Science Foundation of China (Nos. 60873097, 90612009)