Abstract
Organization names are often written as abbreviations, aliases, or colloquial names, which makes the short form of a name ambiguous. As a result, computer management systems cannot correctly tally and analyze organization information, independent systems cannot be integrated, and data cannot be exchanged effectively. In the big data era, this ambiguity lowers the efficiency and raises the cost of data mining. This paper analyzes the characteristics of organization names, adapts the TF-IDF method based on the vector space model, and proposes a reasonably effective algorithm for automatically identifying aliases of organization names; recognition software implementing the algorithm was also developed. Preliminary experiments show that the accuracy of identifying abbreviated names used in practice can exceed 70%, which would greatly reduce the labor intensity of manual processing.
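The abstract does not describe how the paper modifies TF-IDF, so the following is only a minimal sketch of the unmodified baseline idea, under stated assumptions: each full organization name is treated as a "document", each character as a "term", and an abbreviation is matched to the full name with the highest cosine similarity. The function names (`char_tfidf_vectors`, `cosine`, `best_match`), the smoothed IDF formula, and the sample names are all illustrative choices, not the authors' method or data.

```python
import math
from collections import Counter


def char_tfidf_vectors(names):
    """Build one character-level TF-IDF vector (a dict) per full name."""
    n = len(names)
    # Document frequency: the number of names in which each character occurs.
    df = Counter(ch for name in names for ch in set(name))
    # Smoothed IDF (an assumption): characters shared by every name
    # still receive a small positive weight.
    idf = {ch: math.log((1 + n) / (1 + d)) + 1.0 for ch, d in df.items()}
    vectors = [
        {ch: tf * idf[ch] for ch, tf in Counter(name).items()}
        for name in names
    ]
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(ch, 0.0) for ch, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(short_name, names):
    """Return the full name most similar to short_name, with its score."""
    vectors, idf = char_tfidf_vectors(names)
    # Characters never seen in any full name are ignored in the query.
    query = {
        ch: tf * idf[ch]
        for ch, tf in Counter(short_name).items()
        if ch in idf
    }
    scores = [cosine(query, vec) for vec in vectors]
    best = max(range(len(names)), key=scores.__getitem__)
    return names[best], scores[best]


# Illustrative sample data only; not taken from the paper.
full_names = ["北京大学", "清华大学", "北京师范大学"]
match, score = best_match("北大", full_names)
```

Characters that appear in few names receive a higher weight, so a rare character in an abbreviation pulls the match toward the names that contain it. A bag-of-characters vector discards character order, which is one plausible reason a plain TF-IDF baseline would need adapting for this task.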
Source
Standard Science (《标准科学》)
2014, No. 8, pp. 82-86 (5 pages)
Keywords
organization name, abbreviation, automatic identification