Abstract
In the telecom industry, anomalies during call detail record (CDR) collection can produce duplicate records. If these duplicates are not removed promptly, customers' charges are calculated incorrectly, which leads to complaints and customer loss. Based on an analysis of the massive data volumes handled in the telecom industry, this paper proposes a method that combines file indexing, Swap technology, transaction control, and concurrent processing to detect and remove duplicate call records, offering a reference design for deduplication algorithms.
Authors
XIA Ming-wei (夏明伟)
SHI Rong-hua (施荣华)
School of Information Science and Engineering, Central South University, Changsha 410075, China
Source
Computer Knowledge and Technology (《电脑知识与技术》)
2007, Issue 4, pp. 251-252 (2 pages)
Keywords
deduplication
file index
Swap technology
concurrent processing