Abstract
Webpages are an important carrier of information, and treating them as the object of study is an inevitable trend in research on information retrieval and information association. Because the sentence is both the basic unit for conveying information and the linguistic unit that expresses a complete meaning, this paper studies the webpage identification problem from the perspective of sentences. Different transformed forms of a sentence can express the same meaning, which makes webpage identification difficult. To address this, we first define four relationships between a sentence and a webpage: the belongs-to relationship, the synonym substitution relationship, the simple word-order transformation relationship, and the complex word-order transformation relationship. We then discuss the recognition problem for each relationship and prove that: (1) recognizing the belongs-to relationship between a sentence and a webpage is decidable and in P; (2) recognizing the synonym substitution relationship is undecidable; (3) recognizing the simple word-order transformation relationship is undecidable; (4) recognizing the complex word-order transformation relationship is unrecognizable. These conclusions outline a spectrum of difficulty for the webpage identification problem.
Source
《小型微型计算机系统》
CSCD
PKU Core (Peking University Chinese Core Journals)
2014, No. 6, pp. 1232-1238 (7 pages)
Journal of Chinese Computer Systems
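Funding
Supported by the National Natural Science Foundation of China (61070032)
Supported by the Natural Science Foundation of Heilongjiang Province (F201204)
Supported by the Qiqihar University Young Teachers Research Start-up Project (2010K-M13)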
Keywords
webpage identification
sentences
decidable problem
undecidable problem
unrecognizable problem