In the world of big data,it’s quite a task to organize different files based on their similarities.Dealing with heterogeneous data and keeping a record of every single file stored in any folder is one of the biggest ...In the world of big data,it’s quite a task to organize different files based on their similarities.Dealing with heterogeneous data and keeping a record of every single file stored in any folder is one of the biggest problems encountered by almost every computer user.Much of file management related tasks will be solved if the files on any operating system are somehow categorized according to their similarities.Then,the browsing process can be performed quickly and easily.This research aims to design a system to automatically organize files based on their similarities in terms of content.The proposed methodology is based on a novel strategy that employs the charactaristics of both supervised and unsupervised machine learning approaches for learning categories of digital files stored on any computer system.The results demonstrate that the proposed architecture can effectively and efficiently address the file organization challenges using real-world user files.The results suggest that the proposed system has great potential to automatically categorize almost all of the user files based on their content.The proposed system is completely automated and does not require any human effort in managing the files and the task of file organization become more efficient as the number of files grows.展开更多
File labeling techniques have a long history in analyzing the anthological trends in computational linguistics.The situation becomes worse in the case of files downloaded into systems from the Internet.Currently,most ...File labeling techniques have a long history in analyzing the anthological trends in computational linguistics.The situation becomes worse in the case of files downloaded into systems from the Internet.Currently,most users either have to change file names manually or leave a meaningless name of the files,which increases the time to search required files and results in redundancy and duplications of user files.Currently,no significant work is done on automated file labeling during the organization of heterogeneous user files.A few attempts have been made in topic modeling.However,one major drawback of current topic modeling approaches is better results.They rely on specific language types and domain similarity of the data.In this research,machine learning approaches have been employed to analyze and extract the information from heterogeneous corpus.A different file labeling technique has also been used to get the meaningful and`cohesive topic of the files.The results show that the proposed methodology can generate relevant and context-sensitive names for heterogeneous data files and provide additional insight into automated file labeling in operating systems.展开更多
Energy consumption has been a critical issue for data storage systems, especially for modern data centers. A recent survey has showed that power costs amount to about 50%of the total cost of ownership in a typical dat...Energy consumption has been a critical issue for data storage systems, especially for modern data centers. A recent survey has showed that power costs amount to about 50%of the total cost of ownership in a typical data center, with about 27% of the system power being consumed by storage systems. This paper aims at providing an effective solution to reducing the energy consumed by disk storage systems, by proposing a new approach to reduce the energy consumption. Differing from previous approaches, we adopt two new designs. 1) We introduce a hotness-aware and group-based system model (HAG) to organize the disks, in which all disks are partitioned into a hot group and a cold group. We only make file migration between the two groups and avoid the migration within a single group, so that we are able to reduce the total cost of file migration. 2) We use an on-demand approach to reorganize files among the disks that is based on workload change as well as the change of data hotness. We conduct trace-driven experiments involving two real and nine synthetic traces and we make detailed comparisons between our method and competitor methods according to different metrics. The results show that our method can dynamically select hot files and disks when the workload changes and that it is able to reduce energy consumption for all the traces. Furthermore, its time performance is comparable to that of the compared algorithms. In general, our method exhibits the best energy e?ciency in all experiments, and it is capable of maintaining an improved trade-off between performance and energy consumption.展开更多
文摘In the world of big data,it’s quite a task to organize different files based on their similarities.Dealing with heterogeneous data and keeping a record of every single file stored in any folder is one of the biggest problems encountered by almost every computer user.Much of file management related tasks will be solved if the files on any operating system are somehow categorized according to their similarities.Then,the browsing process can be performed quickly and easily.This research aims to design a system to automatically organize files based on their similarities in terms of content.The proposed methodology is based on a novel strategy that employs the charactaristics of both supervised and unsupervised machine learning approaches for learning categories of digital files stored on any computer system.The results demonstrate that the proposed architecture can effectively and efficiently address the file organization challenges using real-world user files.The results suggest that the proposed system has great potential to automatically categorize almost all of the user files based on their content.The proposed system is completely automated and does not require any human effort in managing the files and the task of file organization become more efficient as the number of files grows.
文摘File labeling techniques have a long history in analyzing the anthological trends in computational linguistics.The situation becomes worse in the case of files downloaded into systems from the Internet.Currently,most users either have to change file names manually or leave a meaningless name of the files,which increases the time to search required files and results in redundancy and duplications of user files.Currently,no significant work is done on automated file labeling during the organization of heterogeneous user files.A few attempts have been made in topic modeling.However,one major drawback of current topic modeling approaches is better results.They rely on specific language types and domain similarity of the data.In this research,machine learning approaches have been employed to analyze and extract the information from heterogeneous corpus.A different file labeling technique has also been used to get the meaningful and`cohesive topic of the files.The results show that the proposed methodology can generate relevant and context-sensitive names for heterogeneous data files and provide additional insight into automated file labeling in operating systems.
基金The work was partially supported by the National Natural Science Foundation of China under Grant Nos, 61379037 and 61472376, and the Oversea Academic Training Funds (OATF) sponsored by the University of Science and Technology of China. Acknowledgements We would like to thank the anonymous reviewers and editors for their valuable sug- gestions and comments to improve the quality of the paper.
文摘Energy consumption has been a critical issue for data storage systems, especially for modern data centers. A recent survey has showed that power costs amount to about 50%of the total cost of ownership in a typical data center, with about 27% of the system power being consumed by storage systems. This paper aims at providing an effective solution to reducing the energy consumed by disk storage systems, by proposing a new approach to reduce the energy consumption. Differing from previous approaches, we adopt two new designs. 1) We introduce a hotness-aware and group-based system model (HAG) to organize the disks, in which all disks are partitioned into a hot group and a cold group. We only make file migration between the two groups and avoid the migration within a single group, so that we are able to reduce the total cost of file migration. 2) We use an on-demand approach to reorganize files among the disks that is based on workload change as well as the change of data hotness. We conduct trace-driven experiments involving two real and nine synthetic traces and we make detailed comparisons between our method and competitor methods according to different metrics. The results show that our method can dynamically select hot files and disks when the workload changes and that it is able to reduce energy consumption for all the traces. Furthermore, its time performance is comparable to that of the compared algorithms. In general, our method exhibits the best energy e?ciency in all experiments, and it is capable of maintaining an improved trade-off between performance and energy consumption.