Abstract
The semantic Web is an important class of resources built on Internet technology. At present, user queries on the semantic Web must be written in a formal query language that follows a strict syntax, so only specialists who are familiar with both semantic Web systems and formal languages can query it correctly. To address this limitation, this paper proposes an unsupervised natural language query system that automatically converts natural language sentences into the formal language statements supported by semantic Web queries, so that non-expert (ordinary) users can query the semantic Web in natural language. The system first uses the semantic Web to extract all entities and attributes from a given sentence, then links these entities and attributes into a semantic relationship graph, and finally applies a heuristic search to find an optimal path in the graph, which is converted into a SPARQL statement. The most critical factor in the system is the coverage of entities and attributes in the semantic Web: it directly determines the quality of the intermediate semantic relationship graph and therefore the final performance of the system. To make the system more practical, the paper further uses external semantic knowledge to enrich the human-annotated, domain-specific semantic Web, so that more of the information carried by a natural language sentence is captured. This improves the intermediate semantic relationship graph and yields more accurate SPARQL statements. The system and the proposed improvement are evaluated on a US geography question set, a widely accepted dataset in natural language query research that contains 880 questions with manually annotated SPARQL statements. The experimental results show that the baseline system correctly answers 77.6% of the questions, significantly outperforming the best existing unsupervised system; with external semantic knowledge enrichment, the answer accuracy reaches 78.5%.
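The three stages described in the abstract (extraction, graph construction, path search and conversion to SPARQL) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy `SCHEMA` and `LEXICON`, the function names, and in particular the use of a breadth-first shortest path as a stand-in for the paper's heuristic search, which the abstract does not specify.

```python
import re
from collections import deque

# Hypothetical toy ontology: (subject class, property, object class).
SCHEMA = [
    ("River", "flowsThrough", "State"),
    ("City", "locatedIn", "State"),
    ("State", "hasCapital", "City"),
]
# Hypothetical lexicon: surface word -> (kind, ontology name, class).
LEXICON = {
    "rivers": ("class", "River", "River"),
    "cities": ("class", "City", "City"),
    "texas": ("entity", "texas", "State"),
}

def extract_mentions(question):
    """Stage 1: look up every token of the question in the lexicon."""
    tokens = re.findall(r"[a-z]+", question.lower())
    return [LEXICON[t] for t in tokens if t in LEXICON]

def build_graph(schema):
    """Stage 2: link classes through properties, in both directions."""
    graph = {}
    for subj, prop, obj in schema:
        graph.setdefault(subj, []).append((obj, prop, True))   # forward edge
        graph.setdefault(obj, []).append((subj, prop, False))  # inverse edge
    return graph

def shortest_path(graph, start, goal):
    """Stage 3a: breadth-first search (assumed stand-in for the heuristic)."""
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for nxt, prop, forward in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(prop, forward, nxt)]))
    return None

def to_sparql(question):
    """Stage 3b: turn the path into triple patterns of a SELECT query."""
    mentions = extract_mentions(question)
    target = next((m for m in mentions if m[0] == "class"), None)
    entity = next((m for m in mentions if m[0] == "entity"), None)
    if target is None or entity is None:
        raise ValueError("question needs one known class and one known entity")
    path = shortest_path(build_graph(SCHEMA), target[2], entity[2])
    if path is None:
        raise ValueError("no path links the mentioned class and entity")
    patterns = [f"?x0 a :{target[1]} ."]
    for i, (prop, forward, cls) in enumerate(path):
        last = i == len(path) - 1
        cur = f"?x{i}"
        nxt = f":{entity[1]}" if last else f"?x{i + 1}"
        patterns.append(f"{cur} :{prop} {nxt} ." if forward
                        else f"{nxt} :{prop} {cur} .")
        if not last:
            patterns.append(f"{nxt} a :{cls} .")
    # The PREFIX declaration for ':' is omitted and assumed to exist.
    return "SELECT ?x0 WHERE { " + " ".join(patterns) + " }"

print(to_sparql("What rivers flow through Texas?"))
```

In this toy setting the coverage issue raised in the abstract is easy to see: a word missing from `LEXICON` produces no mention, so the graph search has nothing to connect. Enriching the lexicon from an external source is one plausible reading of the knowledge-completion step.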
Author
冯雪
FENG Xue (Computer School, Beijing Information Science and Technology University, Beijing 100192, China)
Source
《计算机科学》
CSCD
Peking University Core Journals (北大核心)
2019, Issue 8, pp. 272-276 (5 pages)
Computer Science
Funding
Supported by the National Key Research and Development Program of China (2018YFB1004100)