Abstract
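Based on the concepts of information unit and information feature proposed for XML (eXtensible Markup Language) documents, information units are used as candidate query results and information features are used to characterize tag information. Combining the structural semantics of XML documents, a relevance ranking strategy for query results is designed on top of the TF*IDF model. The strategy takes into account both the structural and the content information of query results, computes the importance of information features, and uses that importance to measure the semantic relevance of keywords that occur under different information features. On this basis, the keyword query algorithm XRIU is designed and implemented. Experimental results show that XRIU outperforms the main existing algorithms in query quality.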
The notions of information unit (IU) and IU feature for XML (eXtensible Markup Language) documents are introduced. Taking IUs as candidate search results and IU features as representations of the tags, a semantic ranking strategy is proposed that builds on the TF*IDF (term frequency * inverse document frequency) model and on the structural semantics of XML documents, and that takes account of both the structure and the content of search results. The ranking strategy estimates the influence of each IU feature in advance, and this influence is used to measure the relatedness of the same keyword located in different IU features. Finally, the proposed technique is implemented in a search engine named XRIU (XML ranking information unit). Extensive experiments demonstrate the effectiveness and efficiency of the approach.
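The abstract does not give the ranking formulas, so the following is only a minimal sketch of the general idea: a TF*IDF score in which each keyword match is scaled by a weight for the feature (tag) it occurs in. It is not the XRIU algorithm. The function names `score_units` and `rank_units`, the dictionary representation of an information unit, the smoothed IDF, and the example weights are all assumptions made for illustration.

```python
import math
from collections import Counter


def score_units(units, query, feature_weight):
    """Score each unit with a feature-weighted TF*IDF sum.

    units: list of dicts mapping a feature (tag) name to its text.
    query: list of keywords.
    feature_weight: dict mapping a feature name to a weight
        (hypothetical stand-in for "feature importance"; default 1.0).
    """
    n = len(units)

    # Document frequency: number of units containing a term in any feature.
    df = Counter()
    for unit in units:
        terms = set()
        for text in unit.values():
            terms.update(text.lower().split())
        for term in terms:
            df[term] += 1

    scores = []
    for unit in units:
        score = 0.0
        for tag, text in unit.items():
            tokens = text.lower().split()
            if not tokens:
                continue
            counts = Counter(tokens)
            for keyword in query:
                keyword = keyword.lower()
                if counts[keyword] == 0:
                    continue
                tf = counts[keyword] / len(tokens)
                # Smoothed IDF (an assumption) so common terms still count.
                idf = math.log(n / df[keyword]) + 1.0
                score += feature_weight.get(tag, 1.0) * tf * idf
        scores.append(score)
    return scores


def rank_units(units, query, feature_weight):
    """Return unit indices ordered from highest to lowest score."""
    scores = score_units(units, query, feature_weight)
    return sorted(range(len(units)), key=lambda i: -scores[i])


# Hypothetical information units, each keyed by feature (tag) name.
units = [
    {"title": "xml keyword search", "abstract": "ranking of results"},
    {"title": "database systems", "abstract": "xml storage"},
]
weighted = rank_units(units, ["xml"], {"title": 2.0, "abstract": 1.0})
unweighted = rank_units(units, ["xml"], {})
```

In this toy data, a match in `title` outranks a match in `abstract` only when `title` is given the larger weight, which illustrates how the same keyword can contribute differently depending on the feature it appears in.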
Source
《华中科技大学学报(自然科学版)》
EI
CAS
CSCD
PKU Core
2011, No. 9, pp. 82-86 (5 pages)
Journal of Huazhong University of Science and Technology(Natural Science Edition)
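Funding
NSFC-JST Major International (Regional) Joint Research Project (60720106001)
National Natural Science Foundation of China (60803043)
National High Technology Research and Development Program of China (009AA1Z134)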