Abstract
Splitting large data documents is a routine need, and as such documents grow, choosing how many rows to put in each split file becomes a question worth studying. This paper uses PyCharm Community and Python to split large documents, and compares the number of small documents produced and the time taken under different row counts. The results indicate that a moderate number of splitting rows works best, and that the larger the document, the less stable the time consumed. By examining which row count gives the shortest splitting time for the same document, the paper derives a pattern in splitting time and selects the best number of splitting rows, thereby improving splitting efficiency.
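The comparison described in the abstract can be illustrated with a minimal Python sketch. The paper's own code is not reproduced in this record, so everything here is an assumption made for illustration: the function names `split_file`, `time_split`, and `make_sample`, the synthetic CSV-like sample data, and the candidate row counts are all hypothetical.

```python
import os
import tempfile
import time
from itertools import islice


def split_file(src_path, out_dir, rows_per_file):
    """Split src_path into files of at most rows_per_file lines; return the file count."""
    if rows_per_file < 1:
        raise ValueError("rows_per_file must be at least 1")
    os.makedirs(out_dir, exist_ok=True)
    count = 0
    with open(src_path, "r", encoding="utf-8") as src:
        while True:
            # Read the next block of lines without loading the whole file.
            chunk = list(islice(src, rows_per_file))
            if not chunk:
                break
            count += 1
            out_path = os.path.join(out_dir, f"part_{count:05d}.txt")
            with open(out_path, "w", encoding="utf-8") as out:
                out.writelines(chunk)
    return count


def time_split(src_path, rows_per_file):
    """Return (number of files produced, elapsed seconds) for one split run."""
    with tempfile.TemporaryDirectory() as out_dir:
        start = time.perf_counter()
        n_files = split_file(src_path, out_dir, rows_per_file)
        elapsed = time.perf_counter() - start
    return n_files, elapsed


def make_sample(path, n_rows):
    """Write a synthetic sample file (hypothetical data, not the paper's)."""
    with open(path, "w", encoding="utf-8") as f:
        for i in range(n_rows):
            f.write(f"{i},value_{i}\n")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work_dir:
        sample = os.path.join(work_dir, "sample.csv")
        make_sample(sample, 20_000)
        # Candidate row counts are arbitrary illustrative choices.
        for rows in (100, 1_000, 5_000, 10_000):
            n_files, elapsed = time_split(sample, rows)
            print(f"{rows:>6} rows/file -> {n_files:>4} files in {elapsed:.4f} s")
```

Timings from a single run are noisy, so repeating each measurement several times and comparing the spread would be closer in spirit to the stability observation reported in the abstract.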
Authors
DING Sirong (丁思蓉)
HE Jingru (何静茹)
LI Zhen (李真)
Chengdu Jincheng College, Chengdu 611731, China
Source
Modern Information Technology (《现代信息科技》)
2022, Issue 6, pp. 107-109 (3 pages)
Keywords
splitting big data documents
comparative analysis
number of splitting rows