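Funding: National Natural Science Foundation of China, Excellent Young Scientists Fund (61822202); National Natural Science Foundation of China (61872152, 61872409); Guangdong Natural Science Funds for Distinguished Young Scholars (2014A030306021); Guangdong Special Support Program, Young Top-notch Talents in Science and Technology Innovation (2015TQ01X796); Guangdong Major Project of Basic and Applied Basic Research (2019B030302008). This work was ...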
Abstract: We describe practical improvements for parallel BWT-based lossless compressors frequently used in modern big data applications. We propose a clustering-based data permutation approach that improves the compression ratio for data with significant alphabet variation, along with a faster string sorting approach based on O(n) counting sort with permutation reindexing.
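
To make the idea of counting sort with permutation reindexing concrete, the following is a minimal Python sketch. It is not the authors' implementation: the function names `counting_sort_permutation` and `bwt_via_counting_sort` are hypothetical, the input is assumed to be a byte string, and the BWT here is computed over plain cyclic rotations without a sentinel. Each pass stably reorders an index permutation by one key column in O(n + alphabet size) time and never moves the strings themselves. Repeating the pass for every column is quadratic overall, so this only illustrates the building block, not the speedup described in the abstract.

```python
def counting_sort_permutation(keys, perm, alphabet_size=256):
    """Stably reorder the index list `perm` by keys[i], in O(n + alphabet_size)."""
    # counts[k + 1] first holds the number of indices whose key equals k.
    counts = [0] * (alphabet_size + 1)
    for i in perm:
        counts[keys[i] + 1] += 1
    # After the prefix sum, counts[k] is the first output slot for key k.
    for k in range(alphabet_size):
        counts[k + 1] += counts[k]
    out = [0] * len(perm)
    for i in perm:
        out[counts[keys[i]]] = i
        counts[keys[i]] += 1
    return out


def bwt_via_counting_sort(data: bytes) -> bytes:
    """Illustrative BWT: sort all cyclic rotations by LSD radix passes.

    Assumption: no end-of-string sentinel is used, so the result is the
    last column of the sorted rotation matrix only.
    """
    n = len(data)
    perm = list(range(n))  # perm[r] = start offset of the rotation at rank r
    for pos in range(n - 1, -1, -1):
        # Key column: the character at offset `pos` of each rotation.
        column = [data[(i + pos) % n] for i in range(n)]
        perm = counting_sort_permutation(column, perm)
    # The last character of the rotation starting at i is data[i - 1].
    return bytes(data[(i - 1) % n] for i in perm)


if __name__ == "__main__":
    print(bwt_via_counting_sort(b"banana"))
```

Carrying a permutation instead of copying rotations keeps each pass linear in memory traffic. A practical compressor would replace the per-column loop with a suffix sorting scheme that needs far fewer passes.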