File labeling techniques have a long history in analyzing the anthological trends in computational linguistics. The situation becomes worse in the case of files downloaded into systems from the Internet. Currently, most users either have to rename files manually or leave them with meaningless names, which increases the time needed to search for required files and results in redundancy and duplication of user files. No significant work has been done on automated file labeling during the organization of heterogeneous user files. A few attempts have been made using topic modeling. However, one major drawback of current topic modeling approaches is that, to produce good results, they rely on specific language types and on domain similarity of the data. In this research, machine learning approaches are employed to analyze and extract information from a heterogeneous corpus. A distinct file labeling technique is also used to obtain meaningful and cohesive topics for the files. The results show that the proposed methodology can generate relevant and context-sensitive names for heterogeneous data files and provide additional insight into automated file labeling in operating systems.
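To make the idea of automated, content-derived file labels concrete, here is a minimal sketch (not the paper's pipeline, which this abstract does not detail): it proposes a label for each file from its most salient TF-IDF terms. The toy corpus, the use of scikit-learn's TfidfVectorizer, and the three-term label format are all assumptions for illustration.

```python
# A minimal sketch of content-based file labeling: pick the most
# salient TF-IDF terms of each file as a candidate label. The paper's
# actual pipeline is not specified here; this only illustrates the idea.
from sklearn.feature_extraction.text import TfidfVectorizer

def propose_labels(docs: dict, n_terms: int = 3) -> dict:
    """Map each file name to a label built from its top TF-IDF terms."""
    names = list(docs)
    vectorizer = TfidfVectorizer(stop_words="english")
    tfidf = vectorizer.fit_transform(docs[n] for n in names)
    terms = vectorizer.get_feature_names_out()
    labels = {}
    for i, name in enumerate(names):
        row = tfidf.getrow(i).toarray().ravel()
        top = row.argsort()[::-1][:n_terms]          # highest-weight terms
        labels[name] = "_".join(terms[j] for j in top if row[j] > 0)
    return labels

if __name__ == "__main__":
    corpus = {
        "doc1.txt": "invoice payment tax amount invoice total",
        "doc2.txt": "neural network training loss gradient descent epoch",
    }
    print(propose_labels(corpus))   # doc1.txt -> a label led by "invoice"
```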
In the Big Data era, numerous sources and environments generate massive amounts of data. This enormous amount of data necessitates specialized, advanced tools and procedures that effectively evaluate the information and anticipate decisions for future changes. Hadoop is used to process this kind of data, and it is known to handle vast volumes of data far more efficiently than large numbers of tiny files, which cause inefficiency in the framework. This study proposes a novel solution to the problem by applying the Enhanced Best Fit Merging algorithm (EBFM), which merges files depending on predefined parameters (type and size). Implementing this algorithm ensures that the size of each generated file stays in the same range as the maximum block size. Its primary goal is to dynamically merge files that meet the stated criteria, based on file type, to guarantee the efficacy and efficiency of the established system. This procedure takes place before the files are made available to the Hadoop framework. Additionally, the files generated by the system are named with specific keywords to ensure there is no data loss (file overwrite). The proposed approach guarantees the generation of the fewest possible large files, which reduces the input/output memory burden and matches the Hadoop framework's effectiveness. The findings show that the proposed technique enhances the framework's performance by approximately 64% when all other potential performance-impairing variables are taken into account. The proposed approach is implementable in any environment that uses the Hadoop framework, including smart cities, real-time data analysis, etc.
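As an illustration of the merging step, the sketch below gives a simplified reading of the stated EBFM criteria: group small files by type, then best-fit-pack each group so no merged file exceeds the block size. The 128 MB block size and the packing details are assumptions, not the paper's exact algorithm.

```python
# A simplified sketch of best-fit merging in the spirit of EBFM:
# group small files by type, then best-fit-pack each group into
# merged files no larger than the HDFS block size. Block size and
# packing details are illustrative assumptions, not the paper's spec.
import os
from collections import defaultdict

BLOCK_SIZE = 128 * 1024 * 1024  # assumed HDFS block size (128 MB)

def plan_merges(files: list) -> list:
    """files: (path, size) pairs -> groups of paths to concatenate."""
    by_type = defaultdict(list)
    for path, size in files:
        by_type[os.path.splitext(path)[1]].append((path, size))

    merged_groups = []
    for members in by_type.values():
        bins = []  # each bin: [remaining_capacity, [paths]]
        for path, size in sorted(members, key=lambda m: -m[1]):
            # best fit: the bin whose remaining space fits this file tightest
            best = min((b for b in bins if b[0] >= size),
                       key=lambda b: b[0], default=None)
            if best is None:
                # no bin fits: open a new one (oversized files stand alone)
                bins.append([BLOCK_SIZE - size, [path]])
            else:
                best[0] -= size
                best[1].append(path)
        merged_groups += [paths for _, paths in bins]
    return merged_groups

print(plan_merges([("a.log", 70 << 20), ("b.log", 50 << 20),
                   ("c.csv", 90 << 20)]))  # logs packed together, csv alone
```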
Byte-addressable non-volatile memory (NVM), as a new participant in the storage hierarchy, offers extremely high storage performance, which forces changes in current filesystem designs. The page cache, once a significant mechanism for filling the performance gap between Dynamic Random Access Memory (DRAM) and block devices, is now a liability that heavily hinders the write performance of NVM filesystems. Therefore, state-of-the-art NVM filesystems leverage direct access (DAX) technology to bypass the page cache entirely. However, DRAM still provides higher bandwidth than NVM, so skewed read workloads cannot benefit from the higher bandwidth of DRAM, which leads to sub-optimal system performance. In this paper, we propose RCache, a read-intensive workload-aware page cache for NVM filesystems. Unlike traditional caching mechanisms, where all reads go through DRAM, RCache uses a tiered page cache design that assigns DRAM and NVM to hot and cold data respectively and reads data from both tiers. To avoid copying data to DRAM on the critical path, RCache migrates data from NVM to DRAM in a background thread. Additionally, RCache manages data in DRAM in a lock-free manner for better latency and scalability. Evaluations on Intel Optane Data Center (DC) Persistent Memory Modules show that, compared with NOVA, RCache achieves 3 times higher bandwidth for read-intensive workloads while introducing little performance loss for write operations.
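RCache's read path can be pictured as follows: serve hot pages from DRAM, serve misses directly from NVM, and promote pages to DRAM only in a background thread, off the critical path. The toy model below assumes a plain dict and a fixed promotion threshold; the real system uses lock-free structures and DAX access, so `read_nvm` is only a placeholder.

```python
# A toy model of a tiered, read-aware page cache in the spirit of
# RCache: reads hit DRAM if the page is hot, otherwise go straight to
# NVM, and promotion to DRAM happens in a background thread so the
# critical path never copies data. read_nvm() stands in for DAX access.
import queue, threading

class TieredCache:
    def __init__(self, promote_threshold: int = 2):
        self.dram = {}                      # page_id -> bytes (hot tier)
        self.hits = {}                      # page_id -> access count
        self.threshold = promote_threshold
        self.pending = queue.Queue()
        threading.Thread(target=self._promoter, daemon=True).start()

    def read(self, page_id: int) -> bytes:
        if page_id in self.dram:            # hot: served from DRAM
            return self.dram[page_id]
        data = self.read_nvm(page_id)       # cold: served from NVM directly
        self.hits[page_id] = self.hits.get(page_id, 0) + 1
        if self.hits[page_id] >= self.threshold:
            self.pending.put(page_id)       # promote off the critical path
        return data

    def _promoter(self):
        while True:
            page_id = self.pending.get()
            self.dram[page_id] = self.read_nvm(page_id)

    def read_nvm(self, page_id: int) -> bytes:
        # placeholder for byte-addressable NVM (DAX) access; the real
        # design also replaces this dict with lock-free indexing
        return b"page-%d" % page_id
```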
In distributed storage systems, file access efficiency has an important impact on the real-time nature of information forensics. As a popular approach to improving file access efficiency, a prefetching model fetches data before it is needed, according to the file access pattern, which reduces I/O waiting time and increases system concurrency. However, a prefetching model needs to mine the degree of association between files to ensure prefetching accuracy. With massive numbers of small files, the sheer volume of files poses a challenge to the efficiency and accuracy of relevance mining. In this paper, we propose a prefetching model for massive files based on an LSTM neural network with a cache transaction strategy to improve file access efficiency. Firstly, we propose a file clustering algorithm based on temporal locality and spatial locality to reduce the computational complexity. Secondly, we define cache transactions according to the occurrence of files in the cache, instead of using time-offset-distance-based methods, to extract file block features accurately. Lastly, we propose a file access prediction algorithm based on an LSTM neural network that predicts the files most likely to be accessed. Experiments show that, compared with traditional LRU and plain grouping methods, the proposed model notably increases the cache hit rate and effectively reduces the I/O wait time.
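The prediction step amounts to sequence modeling over file-access IDs: given a window of recent accesses, predict the next file and prefetch it. The PyTorch sketch below shows that shape of model only; the layer sizes, the vocabulary of file IDs, and the training step are assumptions, and the paper's cache-transaction features are omitted.

```python
# A minimal next-file-access predictor in the spirit of the paper's
# LSTM model: embed recent file IDs, run an LSTM, and score which file
# is likely to be accessed next (a prefetch candidate). Sizes and the
# training data are illustrative; cache-transaction features are omitted.
import torch
import torch.nn as nn

class AccessPredictor(nn.Module):
    def __init__(self, num_files: int, embed_dim: int = 32, hidden: int = 64):
        super().__init__()
        self.embed = nn.Embedding(num_files, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, num_files)

    def forward(self, seq: torch.Tensor) -> torch.Tensor:
        # seq: (batch, window) of file IDs -> (batch, num_files) scores
        h, _ = self.lstm(self.embed(seq))
        return self.out(h[:, -1])           # use the last step's state

model = AccessPredictor(num_files=1000)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# one illustrative training step on a fake access trace
window = torch.randint(0, 1000, (8, 16))    # 8 sequences of 16 accesses
target = torch.randint(0, 1000, (8,))       # the file accessed next
loss = loss_fn(model(window), target)
opt.zero_grad()
loss.backward()
opt.step()

prefetch_id = model(window)[0].argmax().item()  # candidate file to prefetch
```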
Stereolithographic (STL) files have been used extensively in rapid prototyping industries as well as many other fields, and watermarking algorithms can secure intellectual property and protect three-dimensional models from theft. However, to the best of our knowledge, few studies have looked at how watermarking can resist attacks that involve vertex reordering. Here, we present a lossless and robust watermarking scheme for STL files to protect against vertex-reordering attacks. Specifically, we designed a novel error-correcting code (ECC) that can correct any one-bit error in a bitstream by inserting several check digits. In addition, the ECC is designed to make use of redundant information according to the characteristics of STL files, which introduces further robustness for defense against attacks. No modifications are made to the geometric information of the three-dimensional model, which respects the requirements of a high-precision model. The experimental results show that the proposed watermarking scheme can survive numerous kinds of attack, including rotation, scaling and translation (RST), facet reordering, and vertex-reordering attacks.
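The described ECC guarantee, correcting any single-bit error by inserting check digits, is exactly what a classic Hamming code provides, so the textbook construction below makes the mechanism concrete. It is not the paper's STL-specific code, which additionally exploits redundancy in the file format.

```python
# A classic single-error-correcting Hamming code: check bits are
# inserted at power-of-two positions so any one flipped bit can be
# located and repaired. The paper's ECC follows the same principle but
# is tailored to STL redundancy; this is the textbook construction.
def hamming_encode(bits: list) -> list:
    n, r = len(bits), 0
    while (1 << r) < n + r + 1:
        r += 1
    code, src = [0] * (n + r), iter(bits)
    for pos in range(1, n + r + 1):          # 1-indexed positions
        if pos & (pos - 1):                  # not a power of two: data bit
            code[pos - 1] = next(src)
    for p in range(r):                       # fill each parity position
        mask = 1 << p
        code[mask - 1] = sum(code[i - 1] for i in range(1, n + r + 1)
                             if i & mask) % 2
    return code

def hamming_decode(code: list) -> list:
    syndrome = 0
    for p in range(len(code).bit_length()):
        mask = 1 << p
        if sum(code[i - 1] for i in range(1, len(code) + 1) if i & mask) % 2:
            syndrome |= mask
    if syndrome:                             # nonzero: that position is wrong
        code = code[:]
        code[syndrome - 1] ^= 1
    return [code[i - 1] for i in range(1, len(code) + 1) if i & (i - 1)]

data = [1, 0, 1, 1]
word = hamming_encode(data)
word[2] ^= 1                                 # inject a one-bit error
assert hamming_decode(word) == data          # the error is corrected
```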
In order to improve the performance of peer-to-peer file sharing systems in mobile distributed environments, a novel always-optimally-coordinated (AOC) criterion and a corresponding candidate selection algorithm are proposed in this paper. Compared with the traditional min-hops criterion, the new approach introduces a fuzzy knowledge combination theory to investigate several important factors that influence the file transfer success rate and efficiency. Whereas min-hops-based protocols only ask the nearest candidate peer for desired files, the selection algorithm based on AOC comprehensively considers users' preferences and network requirements with flexible balancing rules. Furthermore, it is independent of any specific resource discovery protocol, which allows for scalability. The simulation results show that, with the AOC-based peer selection algorithm, system performance is much better than with the min-hops scheme: the file transfer success rate is improved by more than 50% and transfer time is reduced by at least 20%.
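The contrast with min-hops fits in a few lines: instead of ranking candidates by hop count alone, combine several normalized factors into one selection score. The factors and weights below are invented for illustration; the paper's fuzzy combination rules are richer.

```python
# Min-hops vs. a multi-factor selection score in the spirit of the AOC
# criterion: each candidate peer is rated on several normalized factors
# and the weighted combination decides who serves the file. The factors
# and weights here are illustrative, not the paper's fuzzy rules.
candidates = [
    {"peer": "A", "hops": 0.2, "bandwidth": 0.9, "battery": 0.8, "success": 0.7},
    {"peer": "B", "hops": 0.1, "bandwidth": 0.3, "battery": 0.2, "success": 0.4},
]
# illustrative preference weights; hop count counts against a peer
weights = {"hops": -0.2, "bandwidth": 0.3, "battery": 0.2, "success": 0.3}

def aoc_score(peer: dict) -> float:
    return sum(w * peer[k] for k, w in weights.items())

best_minhops = min(candidates, key=lambda p: p["hops"])   # nearest only
best_aoc = max(candidates, key=aoc_score)                 # balanced choice
print(best_minhops["peer"], best_aoc["peer"])             # B A
```

Here min-hops picks the nearest peer B even though its link is weak, while the combined score picks A, which is the behavior the AOC criterion is after.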
Building on an analysis and study of HDFS, the File System API is applied in the HDFS distributed file system for file storage and access, and replica selection is optimized through an improved ant colony algorithm. The HDFS API can effectively store and manage massive amounts of data and improve the efficiency of massive data storage. The improved ant colony algorithm increases the efficiency of replica selection when files are read, further improving system efficiency and balancing the load.
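As a rough sketch of ant-colony replica selection (the improved algorithm itself is not specified in this abstract): each read picks a replica with probability proportional to pheromone times a heuristic such as inverse response time, reinforcing good choices while evaporation keeps the trails adaptive. All parameters below are illustrative.

```python
# A compact ant-colony sketch for replica selection: pheromone and a
# heuristic (inverse response time) bias which replica each request
# picks; good picks are reinforced, and evaporation keeps the system
# adaptive so load spreads across replicas. Parameters are illustrative.
import random

replica_latency = {"dn1": 12.0, "dn2": 30.0, "dn3": 18.0}  # ms, assumed
pheromone = {r: 1.0 for r in replica_latency}
ALPHA, BETA, RHO, Q = 1.0, 2.0, 0.1, 10.0  # usual ACO constants

def pick_replica() -> str:
    weights = {r: pheromone[r] ** ALPHA * (1.0 / replica_latency[r]) ** BETA
               for r in replica_latency}
    total = sum(weights.values())
    x = random.uniform(0, total)             # roulette-wheel selection
    for r, w in weights.items():
        x -= w
        if x <= 0:
            return r
    return r

for _ in range(100):                         # simulate 100 read requests
    choice = pick_replica()
    for r in pheromone:                      # evaporation on every trail
        pheromone[r] *= (1 - RHO)
    pheromone[choice] += Q / replica_latency[choice]   # reinforce choice

print(max(pheromone, key=pheromone.get))     # usually the fastest replica
```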
To better understand different users' accessing intentions, a novel clustering and supervising method based on accessing paths is presented. This method partitions the users' interest space to express the distribution of users' interests, and directly guides the construction of web page indexes for better performance.
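One plausible reading of "partitioning the users' interest space", offered only as an assumption: represent each user's accessing path as a visit-count vector over pages and cluster those vectors, so each cluster marks one region of interest that can steer index construction.

```python
# A toy reading of access-path clustering: represent each user's path
# as visit counts over pages, then cluster users so each cluster marks
# one region of the interest space. k-means and the paths are assumed.
from collections import Counter
from sklearn.cluster import KMeans

paths = {
    "u1": ["home", "sports", "sports", "news"],
    "u2": ["home", "tech", "tech", "reviews"],
    "u3": ["sports", "news", "sports"],
}
pages = sorted({p for path in paths.values() for p in path})
vectors = [[Counter(path)[p] for p in pages] for path in paths.values()]

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(vectors)
for user, label in zip(paths, labels):
    print(user, "-> interest cluster", label)   # u1 and u3 group together
```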
In order to improve the management strategy for personnel files in colleges and universities, simplify the complex process of file management, and improve file management security and the preservation of file content, this paper elaborates on the application of Artificial Intelligence (AI) technology in university personnel file management through theoretical analysis grounded in an understanding of AI technology.
The fast-growing market of mobile device adoption and cloud computing has led to exploitation of mobile devices utilizing cloud services. One major challenge facing the usage of mobile devices in the cloud environment is mobile synchronization to the cloud, e.g., synchronizing contacts, text messages, images, and videos. Owing to the expected high volume of traffic and the high time complexity required for synchronization, an appropriate synchronization algorithm needs to be developed. Delta synchronization is one method of synchronizing compressed files, but it requires uploading the whole file even when no changes were made or when the file was only partially changed. In the present study, we propose an algorithm, based on Delta synchronization, to solve the problem of synchronizing compressed files under various forms of modification (e.g., not modified, partially modified, or completely modified). To measure the efficiency of our proposed algorithm, we compared it to the Dropbox application algorithm. The results demonstrate that our algorithm outperforms the regular Dropbox synchronization mechanism by reducing the synchronization time, cost, and traffic load between clients and the cloud service provider.
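The three modification cases can be decided with per-block hashes: identical hash lists mean nothing is uploaded, a few differing blocks mean only those are uploaded, and a full mismatch falls back to whole-file upload. The block size, hash choice, and return format below are assumptions, not the paper's or Dropbox's exact protocol.

```python
# A sketch of the three synchronization cases via per-block hashes:
# unchanged -> upload nothing; partially changed -> upload only the
# changed blocks; fully changed -> upload the whole file. Block size
# and return format are illustrative, not the paper's exact protocol.
import hashlib

BLOCK = 4096  # assumed block size in bytes

def block_hashes(data: bytes) -> list:
    return [hashlib.sha256(data[i:i + BLOCK]).hexdigest()
            for i in range(0, len(data), BLOCK)]

def plan_upload(local: bytes, remote_hashes: list):
    local_hashes = block_hashes(local)
    if local_hashes == remote_hashes:
        return "unchanged", []                       # nothing to send
    changed = [i for i, h in enumerate(local_hashes)
               if i >= len(remote_hashes) or h != remote_hashes[i]]
    if len(changed) == len(local_hashes):
        return "fully_changed", [local]              # send whole file
    return "partially_changed", [local[i * BLOCK:(i + 1) * BLOCK]
                                 for i in changed]   # send deltas only

old = b"a" * 10000
new = old[:4096] + b"b" * 4096 + old[8192:]          # change one block
state, payload = plan_upload(new, block_hashes(old))
print(state, len(payload))                           # partially_changed 1
```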
In order to improve the performance of wireless distributed peer-to-peer (P2P) file sharing systems, a general system architecture and a novel peer selection model based on fuzzy cognitive maps (FCM) are proposed in this paper. The new model provides an effective approach to choosing an optimal peer from several resource discovery results for the best file transfer. Compared with the traditional min-hops scheme, which uses hops as the only selection criterion, the proposed model uses an FCM to investigate the complex relationships among various relevant factors in wireless environments and gives an overall evaluation score for each candidate. It also has strong scalability, being independent of any specific P2P resource discovery protocol. Furthermore, a complete implementation is explained in concrete modules. The simulation results show that the proposed model is effective and feasible compared with the min-hops scheme, with the transfer success rate increased by at least 20% and transfer time improved by as much as 34%.
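A fuzzy cognitive map is a signed, weighted digraph over concepts. In the sketch below, measured concepts seed the state, the "quality" concept is iterated through the weight matrix and a sigmoid until it settles, and its final value ranks the peer. The concepts and weights are invented for illustration only.

```python
# A tiny fuzzy cognitive map for peer scoring: measured concepts (hops,
# bandwidth, battery) are clamped to their readings, and the "quality"
# concept is iterated through the signed weight matrix and a sigmoid
# until it settles; its final value ranks the candidate peer.
import math

CONCEPTS = ["hops", "bandwidth", "battery", "quality"]
# W[i][j]: signed influence of concept i on concept j
W = [
    [0.0, 0.0, 0.0, -0.6],   # more hops      -> lower quality
    [0.0, 0.0, 0.0,  0.7],   # more bandwidth -> higher quality
    [0.0, 0.0, 0.0,  0.4],   # more battery   -> higher quality
    [0.0, 0.0, 0.0,  0.0],
]

def fcm_score(readings: dict, steps: int = 10) -> float:
    state = [readings.get(c, 0.5) for c in CONCEPTS]
    for _ in range(steps):
        for j, c in enumerate(CONCEPTS):
            if c in readings:                # measured concepts stay clamped
                continue
            act = state[j] + sum(state[i] * W[i][j] for i in range(len(W)))
            state[j] = 1 / (1 + math.exp(-act))   # sigmoid squashing
    return state[CONCEPTS.index("quality")]

near_weak = {"hops": 0.1, "bandwidth": 0.3, "battery": 0.2}
far_strong = {"hops": 0.6, "bandwidth": 0.9, "battery": 0.8}
print(fcm_score(near_weak), fcm_score(far_strong))  # the strong link wins
```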
In public health emergencies, the collection of archival information is very important, as it can provide a reference for early warning and response mechanisms. At the present stage, archival users' unlimited demand for resources and the convenience of obtaining them have become the two driving forces promoting the "transformation" and "growth" of archives. In public health emergencies, as archives transform their collection and service modes, social media has become an indispensable platform for users to exchange, share, and transmit information, which requires archives to change how they acquire and store records. Archival users require more interactive, targeted, and cutting-edge forms of access, and archival services are developing toward diversified functions and connotations. Sharing archival information resources is also an important link in this development trend. This paper analyzes the collection methods of archives departments in public health emergencies and then puts forward corresponding measures for archives departments to fulfill their functions, such as flexibly meeting demands for access to archives, strengthening the development of information resources, collecting relevant archives thoroughly, and publicizing archival work in connection with current topics. It also discusses the completeness of archival data collection, the means of archival management, the scientific classification of archival data, and the channels of archival data collection.
Funding (EBFM for Hadoop): This research was supported by the Universiti Sains Malaysia (USM) and the Ministry of Higher Education Malaysia through the Fundamental Research Grant Scheme (FRGS Grant No. FRGS/1/2020/TK0/USM/02/1).
Funding (RCache): Supported by the ZTE Industry-University-Institute Cooperation Funds under Grant No. HC-CN-20181128026.
Funding (LSTM prefetching model): This work is supported by the Fundamental Research Funds for the Central Universities (Grant No. HIT.NSRIF.201714), the Weihai Science and Technology Development Program (2016DXGJMS15), and the Key Research and Development Program of Shandong Province (2017GGX90103).
Funding (STL watermarking): This work was supported in part by the National Natural Science Foundation of China (Nos. 61772539, 6187212, 61972405), STITSX (No. 201705D131025), 1331KITSX, and CiCi3D.
Funding (AOC peer selection): Supported by the National Natural Science Foundation of China (No. 60672124) and the National High Technology Research and Development Program of China (863 Program, No. 2007AA01Z221).
Funding (FCM peer selection): Sponsored by the National Natural Science Foundation of China (Grant Nos. 60672124 and 60832009) and the Hi-Tech Research and Development Program of China (863 Program, Grant No. 2007AA01Z221).