Abstract
Using Clementine, a well-known data mining tool, for routine data processing may seem like overkill, but it is easier to use and more efficient than Excel: there is no need to consult complex programming manuals, drag scroll bars through Excel sheets, or pick among numerous functions. Taking the processing of sign-in (attendance) data uploaded for the National Science and Technology Literature Center (NSTL) as a case study, which involves common tasks such as duplicate checking, standardization, filtering, mapping, comparison, and frequency counting, this paper describes how to build Clementine data streams tailored to different processing requirements and discusses the advantages of Clementine in processing massive data.
Source
《中华医学图书情报杂志》
CAS
2011, Issue 4, pp. 59-62 (4 pages)
Chinese Journal of Medical Library and Information Science
Funding
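Supported by the Basic Scientific Research Fund of the Institute of Medical Information, Chinese Academy of Medical Sciences: Reader Behavior Analysis Based on Web Mining (No. R0830)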