Funding: Supported by the National Key R&D Program of China, "Key technologies for coordination and interoperation of power distribution service resource" [2021YFB1302400], and by the project "Research on Digitization and Intelligent Application of Low-Voltage Power Distribution Equipment" [SGSDDK00PDJS2000375].
Abstract: Disk-resident databases (DRDB) have matured in recent years and are widely used in business and transactional applications. Long-term use, however, has gradually exposed some of their problems: for applications with strict real-time requirements, the performance of a disk database is unsatisfactory. With the rapid growth of the Internet of Things, domestic real-time databases have also developed gradually. Still, most of them support the storage, processing, and analysis of only a few data types, and so cannot fully meet the needs of current industrial process control systems, whose data have varied types, complex sources, and fast update rates. To address the business need for efficient data collection and storage in the Internet of Things, this paper optimizes the transaction processing efficiency and data storage performance of the memory database, constructs and implements a lightweight real-time memory database transaction processing and data storage model, and improves the reliability and efficiency of the database. Simulation shows that the cache replacement algorithm proposed in this paper achieves a higher cache hit rate than the traditional LRU (Least Recently Used) algorithm, so using it can improve the performance of the system cache.
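The abstract compares its algorithm against LRU by cache hit rate but does not describe the proposed algorithm itself. The following is only a minimal sketch of the baseline side of such a comparison: a plain LRU cache that counts hits and misses over an access trace. The names `LRUCache` and `simulate`, and the sample trace, are illustrative assumptions and do not come from the paper.

```python
from collections import OrderedDict


class LRUCache:
    """Baseline LRU cache that records hits and misses (illustrative only)."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._items = OrderedDict()  # least recently used key comes first
        self.hits = 0
        self.misses = 0

    def access(self, key):
        """Touch a key; return True on a hit, False on a miss."""
        if key in self._items:
            self._items.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return True
        self.misses += 1
        if len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # evict the least recently used key
        self._items[key] = None
        return False

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(trace, capacity):
    """Replay an access trace through an LRU cache and return its hit rate."""
    cache = LRUCache(capacity)
    for key in trace:
        cache.access(key)
    return cache.hit_rate()


if __name__ == "__main__":
    # Hypothetical trace; a real evaluation would replay recorded workload keys.
    print(simulate([1, 2, 3, 1, 4, 2, 1], capacity=3))
```

An alternative replacement policy could be evaluated by giving it the same `access` and `hit_rate` interface and replaying an identical trace through both caches.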
Funding: Supported by the National Key R&D Program of China (2018YFB1003404), the National Natural Science Foundation of China (Grant Nos. 62072083, U1811261, 61902366), the Basal Research Fund (N180716010), the LiaoNing Revitalization Talents Program (XLYC1807158), and the China Postdoctoral Science Foundation (2020T130623).
Abstract: The parameter server (PS), as the state-of-the-art distributed framework for large-scale iterative machine learning tasks, has been extensively studied. However, existing PS-based systems often depend on in-memory implementations. Under memory constraints, machine learning (ML) developers cannot train large-scale ML models on their rather small local clusters. Moreover, renting large-scale cloud servers is always economically infeasible for research teams and small companies. In this paper, we propose a disk-resident parameter server system named DRPS, which reduces the hardware requirements of large-scale machine learning tasks by storing high-dimensional models on disk. To further improve the performance of DRPS, we build an efficient index structure for parameters to reduce the disk I/O cost. Based on this index structure, we propose a novel multi-objective partitioning algorithm for the parameters. Finally, a flexible worker-selection parallel model of computation (WSP) is proposed to strike the right balance between the problem of inconsistent parameter versions (staleness) and that of inconsistent execution progress (stragglers). Extensive experiments on many typical machine learning applications with real and synthetic datasets validate the effectiveness of DRPS.