A Highly Stable Term Co-Occurrence Model

Times cited: 2
Abstract: Traditional term co-occurrence models lack a theoretical foundation and have poor stability. To address these problems, a highly stable term co-occurrence model based on the term field is proposed. Borrowing the concept of a field from classical physics, the model defines the term field. A term is the basic unit of language and an abstract description of a concept, and the term field is the region of a document that a term influences. On this basis, the model draws on quantum field theory and treats the correlation between two terms as analogous to a superposition of their term fields. This yields a functional relation between the distance separating two terms and their correlation, which is then used to build the co-occurrence model. Experimental results show that at small distances the term correlation in the proposed model is roughly constant, giving a degree of in-window stability. The correlation amplitude of term pairs from the same category is only 26% of the smallest amplitude among the comparison models, indicating good in-dataset stability.
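The abstract does not state the exact field profile, normalization, or amplitude definition. The Python sketch below therefore only illustrates the general idea of computing a term-pair correlation as a superposition of term fields over token distance. Several elements are assumptions made for illustration and are not the paper's formulas: the Gaussian profile, the hypothetical `radius` and `window` parameters, averaging over the positions of the second term, and measuring amplitude as max minus min across datasets. The Gaussian was chosen because it is flat near distance 0, which loosely mirrors the reported near-constant correlation at small distances.

```python
import math
from typing import Callable, Sequence

# Hypothetical field profile (an assumption, not the paper's function):
# flat near distance 0 and decaying beyond `radius` tokens.
def gaussian_profile(distance: int, radius: float = 5.0) -> float:
    return math.exp(-(distance / radius) ** 2)

def field_strength(doc: Sequence[str], term: str, position: int,
                   window: int = 10,
                   profile: Callable[[int], float] = gaussian_profile) -> float:
    """Superpose the fields of every occurrence of `term` within `window`
    tokens of `position`. The token at `position` itself is skipped."""
    lo = max(0, position - window)
    hi = min(len(doc), position + window + 1)
    return sum(profile(abs(position - i)) for i in range(lo, hi)
               if i != position and doc[i] == term)

def correlation(doc: Sequence[str], a: str, b: str, window: int = 10,
                profile: Callable[[int], float] = gaussian_profile) -> float:
    """Average field of term `a` felt at the positions of term `b`
    (the averaging is an assumed normalization)."""
    positions = [j for j, tok in enumerate(doc) if tok == b]
    if not positions:
        return 0.0
    total = sum(field_strength(doc, a, j, window, profile) for j in positions)
    return total / len(positions)

def amplitude(values: Sequence[float]) -> float:
    """Assumed stability measure: spread (max - min) of one term pair's
    correlation across datasets. A smaller spread means more stable."""
    return max(values) - min(values)

if __name__ == "__main__":
    doc = "the term field of a term covers nearby field words".split()
    print(correlation(doc, "term", "field"))
    per_dataset = [correlation(doc, "term", "field"),
                   correlation(doc[::-1], "term", "field")]
    print(amplitude(per_dataset))
```

Because the profile is passed in as a parameter, other hypothetical profiles (for example, an exponential decay as in decaying co-occurrence models) can be swapped in. Their amplitudes can then be compared across datasets in the same way.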
Source: Journal of Xi'an Jiaotong University (《西安交通大学学报》), 2009, No. 6, pp. 24-27 (4 pages). Indexed in EI, CAS, and CSCD; Peking University core journal.
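Funding: National High Technology Research and Development Program of China (2006AA01Z101); Specialized Research Fund for the Doctoral Program of Higher Education, Ministry of Education of China (20060698018).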
Keywords: term field; term co-occurrence; in-window stability; in-dataset stability