Webpage Identification Issues Containing Undesirable Sentences (original Chinese title: 含有不希望出现句子的网页鉴别问题)

Abstract: Webpages are an important carrier of information, and taking them as the object of study is a natural trend in research on information retrieval and information association. Because the sentence is both the basic unit for conveying information and the linguistic unit that expresses a complete meaning, this paper studies the webpage identification problem from the standpoint of sentences. Different transformations of a sentence can express the same meaning, which makes webpage identification difficult. To address this, the paper first defines four relations between a sentence and a webpage: the belongs-to relation, the synonym-substitution relation, the simple word-order transformation relation, and the complex word-order transformation relation. It then discusses the problem of recognizing each relation and proves that: (1) recognizing the belongs-to relation between a sentence and a webpage is decidable and in P; (2) recognizing the synonym-substitution relation is undecidable; (3) recognizing the simple word-order transformation relation is undecidable; (4) recognizing the complex word-order transformation relation is unrecognizable. These results outline a spectrum of difficulty for webpage identification problems.
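To make result (1) concrete, the following is a minimal sketch, not the authors' construction. It assumes that a sentence "belongs to" a webpage when it occurs verbatim in the page's text after whitespace normalization. The abstract does not give the formal definition, so this reading, and the names `normalize` and `belongs_to`, are illustrative assumptions. Under that reading the check is a plain substring search, which runs in time polynomial in the lengths of the sentence and the page.

```python
def normalize(text: str) -> str:
    """Collapse runs of whitespace so line breaks in the page do not hide a match."""
    return " ".join(text.split())


def belongs_to(sentence: str, page_text: str) -> bool:
    """Return True if the sentence occurs verbatim in the page text.

    Assumption: 'belongs to' means exact occurrence; the paper's formal
    definition is not given in the abstract and may differ.
    """
    needle = normalize(sentence)
    if not needle:
        return False  # an empty sentence is treated as not belonging to any page
    return needle in normalize(page_text)


# Hypothetical page text, used only for illustration.
page = "Webpages carry information.\nEach sentence   expresses a complete meaning."
print(belongs_to("Each sentence expresses a complete meaning.", page))  # True
print(belongs_to("Each sentence conveys a complete meaning.", page))    # False
```

The second call swaps one word for a synonym and is no longer found by verbatim matching. That is the kind of variation the other three relations are meant to capture, and the abstract reports those recognition problems as undecidable or unrecognizable.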

Authors: Wang Ning (王柠), Liu Guohua (刘国华)

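Affiliations: School of Information Science and Engineering, Yanshan University; College of Computer and Control Engineering, Qiqihar University; School of Computer Science and Technology, Donghua University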

Source: Journal of Chinese Computer Systems (《小型微型计算机系统》), indexed in CSCD and the Peking University Core Journals list, 2014, No. 6, pp. 1232-1238 (7 pages)

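Funding: Supported by the National Natural Science Foundation of China (61070032), the Natural Science Foundation of Heilongjiang Province (F201204), and the Qiqihar University Young Teachers' Research Start-up Project (2010K-M13)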

Keywords: webpage identification; sentences; decidable problem; undecidable problem; unrecognizable problem

Classification: TP309 [Automation and Computer Technology: Computer System Architecture]

