Abstract
The Web contains a considerable amount of untruthful information. This paper proposes a two-step method for determining whether a given Chinese fact statement is truthful. In the first step, the statement is classified according to its doubt unit (the part of the statement whose correctness is uncertain) and expanded into a set of alternative statements that share its topic. In the second step, every alternative statement, together with the original, is submitted as a query to an existing search engine, and the most likely candidate is identified. Both steps rely on various features extracted from the results returned by mainstream search engines. Experiments show that the method achieves a precision of about 85% or higher. With further improvement, the technique could be used to evaluate the reliability of Web pages.
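The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. Several things here are assumptions made only for illustration: the function names (`expand_statement`, `rank_candidates`, `is_truthful`), the caller supplying the alternative values for the doubt unit, and the use of a single hit count as a stand-in for the "various features" the paper extracts from search results. The `hit_count` callable is hypothetical and must be provided by the caller; no real search engine API is used.

```python
from typing import Callable, List


def expand_statement(statement: str, doubt_unit: str,
                     alternatives: List[str]) -> List[str]:
    """Step 1 (sketch): build alternative statements on the same topic by
    swapping the doubt unit for each alternative value. The original
    statement is kept as the first candidate."""
    if doubt_unit not in statement:
        raise ValueError("doubt unit must occur in the statement")
    candidates = [statement]
    for alt in alternatives:
        candidate = statement.replace(doubt_unit, alt, 1)
        if candidate not in candidates:
            candidates.append(candidate)
    return candidates


def rank_candidates(candidates: List[str],
                    hit_count: Callable[[str], int]) -> List[str]:
    """Step 2 (sketch): query each candidate and order the candidates by
    hit count, highest first. Assumption: hit count replaces the richer
    feature set used in the paper. Ties keep their input order."""
    hits = {c: hit_count(c) for c in candidates}
    return sorted(candidates, key=lambda c: hits[c], reverse=True)


def is_truthful(statement: str, doubt_unit: str, alternatives: List[str],
                hit_count: Callable[[str], int]) -> bool:
    """Judge the statement truthful if it is the top-ranked candidate."""
    candidates = expand_statement(statement, doubt_unit, alternatives)
    return rank_candidates(candidates, hit_count)[0] == statement
```

Passing the search function in as a parameter keeps the sketch independent of any particular search engine and makes it easy to exercise with canned counts.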
Source
Computer Engineering and Applications (《计算机工程与应用》)
CSCD
2014, No. 15, pp. 75-81 (7 pages)
Funding
National Natural Science Foundation of China (No. 61170039)
Natural Science Foundation of Hebei Province (No. F2012201006)
Keywords
fact statement
truthfulness
verifying
Web page
reliability