
Webpage Identification Issues Containing Undesirable Sentences
Abstract: Webpages are an important carrier of information, so taking them as the object of study is an inevitable trend in current research on information retrieval and information association. Because the sentence is both the basic unit of information transmission and the linguistic unit that expresses a complete meaning, this paper studies webpage identification from the perspective of sentences. Webpage identification is made difficult by the fact that different transformed forms of a sentence can express the same meaning. To address this problem, we first define four relationships between a sentence and a webpage: the belong-to relationship, the synonym substitution relationship, the simple word-order transformation relationship, and the complex word-order transformation relationship. We then discuss how each relationship can be recognized and prove that: (1) recognizing the belong-to relationship between a sentence and a webpage is decidable and in P; (2) recognizing the synonym substitution relationship is undecidable; (3) recognizing the simple word-order transformation relationship is undecidable; (4) recognizing the complex word-order transformation relationship is unrecognizable (i.e., not Turing-recognizable). Together, these results outline a spectrum of difficulty for the webpage identification problem.
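A minimal sketch of the simplest case, the belong-to relationship, may help show why it is tractable. It assumes (our assumption; the abstract does not give the formal definition) that a sentence belongs to a webpage when, after runs of whitespace are collapsed, it occurs verbatim in the page's visible text. Under that assumption the check reduces to substring search, and the Knuth-Morris-Pratt algorithm decides it in time linear in the combined length of page and sentence, well inside P. The helper names `belongs_to`, `kmp_contains`, `normalize` and `_TextExtractor` are hypothetical; only `html.parser.HTMLParser` is a real standard-library class.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Hypothetical helper: collects page text, skipping <script>/<style>."""

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip = 0  # depth of open <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def normalize(text: str) -> str:
    """Collapse whitespace runs so line breaks in the markup do not matter."""
    return " ".join(text.split())


def kmp_contains(text: str, pattern: str) -> bool:
    """Knuth-Morris-Pratt search in O(len(text) + len(pattern)) time."""
    if not pattern:
        return True
    # failure[i]: length of the longest proper prefix of pattern[:i + 1]
    # that is also a suffix of it
    failure = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = failure[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        failure[i] = k
    # Scan the text once, never moving backwards in it
    k = 0
    for ch in text:
        while k and ch != pattern[k]:
            k = failure[k - 1]
        if ch == pattern[k]:
            k += 1
            if k == len(pattern):
                return True
    return False


def belongs_to(sentence: str, html_page: str) -> bool:
    """Assumed 'belong to' test: the normalized sentence occurs verbatim
    in the normalized visible text of the page."""
    extractor = _TextExtractor()
    extractor.feed(html_page)
    extractor.close()
    # Joining with a space keeps separate text nodes apart; a word split
    # by an inline tag (e.g. "we<b>b</b>") would therefore not match.
    page_text = normalize(" ".join(extractor.parts))
    return kmp_contains(page_text, normalize(sentence))
```

For example, `belongs_to("The quick brown fox jumps.", "<p>The quick <b>brown</b> fox jumps.</p>")` returns `True`, while the reordered sentence "The brown quick fox jumps." does not match. That gap is expected: a verbatim test cannot see synonym substitution or word-order changes, which the paper handles as separate relationships and proves undecidable or unrecognizable under its own definitions.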
Authors: Wang Ning, Liu Guohua
Source: Journal of Chinese Computer Systems (小型微型计算机系统), 2014, No. 6, pp. 1232-1238 (7 pages); indexed in CSCD and the Peking University core journal list.
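Funding: Supported by the National Natural Science Foundation of China (61070032), the Natural Science Foundation of Heilongjiang Province (F201204), and the Qiqihar University Young Teachers' Research Start-up Project (2010K-M13).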
Keywords: webpage identification; sentences; decidable problem; undecidable problem; unrecognizable problem