
Research of Specific Information Recognition in Multi-Carrier Data Streams (article in English) Cited by: 1
Abstract: A new method is proposed for identifying specific information contained in multi-carrier data streams. The method matches feature words both directly and by their Pinyin and builds on statistical natural language theory. Through automatic inductive learning on a training corpus, it obtains the transition values between parts of speech and stores them as system knowledge. When a real data stream is evaluated, a knowledge approximation strategy judges the relation between each feature word and its context and produces an evaluation value for that feature word. This value measures how closely the feature words in the stream follow the part-of-speech context patterns learned for the same feature words in the corpus. If the evaluation value of the whole data stream exceeds a threshold, the data stream is blocked. Experimental results show that an experimental system built on this method performs well at identifying harmful information in multi-carrier data streams and at monitoring and controlling its spread.
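Source: Journal of Software (《软件学报》); indexed in EI, CSCD, and the Peking University Core list; 2003, Issue 9, pp. 1538-1543 (6 pages)
Funding: National High Technology Research and Development Program of China (863 Program)
Keywords: information recognition; knowledge approximation; part-of-speech transition; inductive learning; calculations; evaluation; information retrieval; knowledge engineering; statistics; telecommunication networks; text processing; word processing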
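
The abstract describes the pipeline only in outline, so the following is a minimal sketch of one way it could work, not the authors' implementation. It assumes that a "transition value" is the relative frequency of the part-of-speech tag found immediately before or after a feature word in the training corpus, that a feature word's score is the mean of its left and right transition values, and that the stream's evaluation value is the mean over all matched feature words. The function names, the `TOY_PINYIN` table, the tag set, and the threshold of 0.6 are all hypothetical.

```python
from collections import Counter, defaultdict

BOUNDARY = "<s>"  # pseudo-tag for positions outside the sentence (assumption)


def learn_transitions(corpus, feature_words):
    """Estimate how often each POS tag neighbours each feature word."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tags = [BOUNDARY] + [tag for _, tag in sentence] + [BOUNDARY]
        for i, (word, _) in enumerate(sentence):
            if word in feature_words:
                counts[(word, "left")][tags[i]] += 1       # tag before the word
                counts[(word, "right")][tags[i + 2]] += 1  # tag after the word
    # Normalise raw counts into relative frequencies per (word, side).
    return {
        key: {tag: n / sum(c.values()) for tag, n in c.items()}
        for key, c in counts.items()
    }


def evaluate_stream(tokens, transitions, pinyin_index, to_pinyin):
    """Mean context-similarity score over all feature-word hits in a stream."""
    tags = [BOUNDARY] + [tag for _, tag in tokens] + [BOUNDARY]
    scores = []
    for i, (word, _) in enumerate(tokens):
        # Match the surface form first, then fall back to a Pinyin match.
        if (word, "left") in transitions:
            feature = word
        else:
            feature = pinyin_index.get(to_pinyin(word))
        if feature is None:
            continue
        left = transitions[(feature, "left")].get(tags[i], 0.0)
        right = transitions[(feature, "right")].get(tags[i + 2], 0.0)
        scores.append((left + right) / 2)
    return sum(scores) / len(scores) if scores else 0.0


# Toy data: one feature word and a homophone spelling of it (hypothetical).
TOY_PINYIN = {"赌博": "dubo", "堵博": "dubo"}
feature_words = {"赌博"}
corpus = [
    [("参与", "v"), ("赌博", "n"), ("赢", "v")],
    [("参与", "v"), ("赌博", "n"), ("活动", "n")],
]
transitions = learn_transitions(corpus, feature_words)
pinyin_index = {TOY_PINYIN[w]: w for w in feature_words if w in TOY_PINYIN}

THRESHOLD = 0.6  # hypothetical value; the abstract does not state one
similar = [("参与", "v"), ("堵博", "n"), ("赢", "v")]  # homophone, learned context
different = [("赌博", "n"), ("违法", "a")]              # exact word, unseen context

score_similar = evaluate_stream(similar, transitions, pinyin_index, TOY_PINYIN.get)
score_different = evaluate_stream(different, transitions, pinyin_index, TOY_PINYIN.get)
blocked = [s > THRESHOLD for s in (score_similar, score_different)]
```

In this toy run the homophone spelling is still matched through its Pinyin and scores above the threshold because its neighbouring tags resemble the training contexts, while the exact feature word in an unseen context scores zero. A real system would also need a word segmenter, a part-of-speech tagger, and a full Pinyin converter, none of which are shown here.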
