
Study on Automatic Identification of Organizations' Abbreviated and Full Names (组织机构名称简称与全称的自动识别研究初探) Cited by: 2
Abstract  Excessive use of abbreviations, aliases, and colloquial names for organizations makes their short names ambiguous. As a result, computer management systems cannot correctly tally and analyze organization information, independent systems cannot be integrated, and data cannot be exchanged effectively. This ambiguity lowers the efficiency and raises the cost of data mining in the big data era. This paper analyzes the characteristics of organization names and, by modifying the TF-IDF method based on the vector space model, proposes a fairly effective algorithm for automatically identifying aliases of organization names; software implementing the algorithm was also developed. Preliminary experiments show that the accuracy in identifying abbreviated names used in practice can exceed 70%, which would greatly reduce the workload of manual processing.
Source  Standard Science (《标准科学》), 2014, No. 8, pp. 82-86 (5 pages)
Keywords  organization name; abbreviation; automatic identification
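
The abstract does not spell out how the TF-IDF method was modified, so the following is only a minimal sketch of the unmodified baseline idea: represent each name as a TF-IDF vector in a vector space model and match an abbreviation to the full name with the highest cosine similarity. The character unigram and bigram features, the smoothed IDF formula, the function names, and the sample names are all assumptions made for illustration, not the authors' algorithm or data.

```python
import math
from collections import Counter


def char_ngrams(name, n=2):
    """Character unigrams plus n-grams (Chinese names have no word spaces)."""
    grams = list(name)
    grams += [name[i:i + n] for i in range(len(name) - n + 1)]
    return grams


def build_idf(full_names):
    """Smoothed inverse document frequency over the list of full names."""
    df = Counter()
    for name in full_names:
        df.update(set(char_ngrams(name)))
    total = len(full_names)
    return {g: math.log((1 + total) / (1 + c)) + 1.0 for g, c in df.items()}


def tfidf_vector(name, idf, default_idf):
    """Sparse TF-IDF vector; unseen features get the maximum (default) IDF."""
    tf = Counter(char_ngrams(name))
    return {g: count * idf.get(g, default_idf) for g, count in tf.items()}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(short_name, full_names):
    """Return (score, full_name) for the full name most similar to short_name."""
    idf = build_idf(full_names)
    default_idf = math.log(1 + len(full_names)) + 1.0
    query = tfidf_vector(short_name, idf, default_idf)
    scored = [
        (cosine(query, tfidf_vector(full, idf, default_idf)), full)
        for full in full_names
    ]
    return max(scored)


# Illustrative data only (not taken from the paper's experiments).
names = ["北京大学", "清华大学", "北京师范大学"]
score, match = best_match("北大", names)
print(match)
```

A plain baseline like this tends to struggle when an abbreviation skips characters or uses an alias with no shared characters, which is presumably the kind of case a modified method would need to handle.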

