Abstract: With the rapid development of the internet, the internet of things, mobile internet, and cloud computing, the amount of data in circulation has grown rapidly. More social information has contributed to the growth of big data, and data has become a core asset. Big data is challenging in terms of effective storage, efficient computation and analysis, and deep data mining. In this paper, we discuss the significance of big data as well as key technologies and problems in big-data analytics. We also discuss the future prospects of big-data analytics.
Funding: Supported by the National Natural Science Foundation of China (Nos. 61175052, 60975039, 61203297, 60933004, 61035003), the National High-tech R&D Program of China (863 Program) (No. 2012AA011003), and the ZTE research fund of the Parallel Web Mining project.
Abstract: Traditional machine-learning algorithms are struggling to handle the exceedingly large amount of data being generated by the internet. In real-world applications, there is an urgent need for machine-learning algorithms that can handle large-scale, high-dimensional text data. Cloud computing involves the delivery of computing and storage as a service to a heterogeneous community of recipients. Recently, it has aroused much interest in industry and academia. Most previous work on cloud platforms focuses only on parallel algorithms for structured data. In this paper, we focus on the parallel implementation of web-mining algorithms and develop a parallel web-mining system that includes a parallel web crawler; parallel text extract, transform, and load (ETL) and modeling; and parallel text mining and application subsystems. The complete system enables a variety of real-world web-mining applications for mass data.
Funding: Supported by the National Natural Science Foundation of China under Grant Nos. 61373135, 60973140, and 61170276; the Key University Science Research Project of Jiangsu Province under Grant No. 12KJA520003; the Project for Production Study & Research of Jiangsu Province under Grant No. BY2013011; and the Science and Technology Enterprises Innovation Fund Project of Jiangsu Province under Grant No. BC2013027.
Abstract: High performance with low power consumption is essential in wireless sensor networks (WSNs). To address node lifetime and power consumption in WSNs, an improved ad hoc on-demand distance vector routing (IAODV) algorithm is proposed based on the AODV and LAR protocols. This algorithm is a modified on-demand routing algorithm that limits data forwarding to the search domain and then chooses the route on the basis of hop count and power consumption. The simulation results show that the algorithm can effectively reduce power consumption and prolong the network lifetime.
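The abstract does not give the IAODV selection rule, so the following is only a minimal sketch of one way to choose a route from hop count and power consumption together. The `Route` type, the `power_cost` field, the weight `alpha`, and the normalized weighted sum are all assumptions made for illustration; they are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class Route:
    """A candidate route (hypothetical structure, for illustration only)."""
    path: Tuple[int, ...]   # node IDs from source to destination
    power_cost: float       # assumed total transmission energy along the path

    @property
    def hop_count(self) -> int:
        return len(self.path) - 1


def select_route(candidates: Sequence[Route], alpha: float = 0.5) -> Route:
    """Pick the route with the lowest weighted cost.

    alpha weights hop count and (1 - alpha) weights power consumption.
    Both terms are normalized by the largest value among the candidates,
    which is an assumed choice, not the rule used by IAODV.
    """
    if not candidates:
        raise ValueError("no candidate routes")
    max_hops = max(r.hop_count for r in candidates) or 1
    max_power = max(r.power_cost for r in candidates) or 1.0

    def cost(route: Route) -> float:
        return (alpha * route.hop_count / max_hops
                + (1 - alpha) * route.power_cost / max_power)

    return min(candidates, key=cost)


long_cheap = Route(path=(0, 1, 2, 3), power_cost=3.0)   # 3 hops, low power
short_costly = Route(path=(0, 4, 3), power_cost=8.0)    # 2 hops, high power
balanced_choice = select_route([long_cheap, short_costly], alpha=0.5)
hops_only_choice = select_route([long_cheap, short_costly], alpha=1.0)
```

With equal weights the longer but cheaper route wins in this example, while weighting hop count alone picks the shorter route; the trade-off is controlled entirely by the assumed `alpha`.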
Abstract: With user-generated content, anyone can be a content creator. This phenomenon has greatly increased the amount of information circulated online, and it is becoming harder to efficiently obtain required information. In this paper, we describe how natural language processing and text mining can be parallelized using Hadoop and Message Passing Interface. We propose a parallel web text mining platform that processes massive amounts of data quickly and efficiently. Our web knowledge service platform is designed to collect information about the IT and telecommunications industries from the web and process this information using natural language processing and data-mining techniques.
Abstract: User-analysis techniques are mainly used to recommend friends and information. This paper discusses the data characteristics of microblog users and describes a multidimensional user recommendation algorithm that takes into account microblog length, relativity between microblog and users, and familiarity between users. The experimental results show that this multidimensional algorithm is more accurate than a traditional recommendation algorithm.
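The abstract names three dimensions but not how they are combined. As a rough sketch only, one could rank candidate users by a weighted sum of the three signals. The feature values (assumed to be pre-normalized to the range 0 to 1), the weights, and the function names below are hypothetical and are not the paper's algorithm.

```python
from typing import Dict, List, Tuple

# (length_score, relevance, familiarity), each assumed normalized to [0, 1]
Features = Tuple[float, float, float]


def combined_score(features: Features,
                   weights: Tuple[float, float, float] = (0.2, 0.4, 0.4)) -> float:
    """Weighted sum of the three dimensions (the weights are an assumption)."""
    return sum(w * f for w, f in zip(weights, features))


def recommend(candidates: Dict[str, Features], top_k: int = 2) -> List[str]:
    """Return the top_k candidate user names, highest combined score first."""
    ranked = sorted(candidates,
                    key=lambda name: combined_score(candidates[name]),
                    reverse=True)
    return ranked[:top_k]


candidates = {
    "u1": (0.9, 0.1, 0.1),   # long posts, but unrelated and unfamiliar
    "u2": (0.2, 0.8, 0.7),   # short posts, relevant and fairly familiar
    "u3": (0.5, 0.5, 0.9),   # moderate on content, very familiar
}
top_two = recommend(candidates, top_k=2)
```

A single-dimension recommender would correspond to setting two of the three weights to zero, which is one way to compare against a traditional baseline.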
Funding: Supported by the China National Science Foundation (Grant 61223003) and ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Relational database management systems are usually deployed on single-node machines and have strict limitations in terms of data structure. This means they do not work well with big data, and NoSQL has been proposed as a solution. To make data querying more efficient, indexes and memory cache techniques are used in NoSQL databases. In this paper, we propose a hierarchical indexing mechanism and a prototype distributed data storage system, called HMIBase, which has hierarchical indexes for non-primary keys in tables and makes data querying more efficient. HMIBase uses HBase as the lower data storage layer and creates a memory cache for more efficient data transmission. HMIBase uses coprocessors to process update requests. It also provides a client with query and update APIs and a server that supports RPCs from the client and completes jobs. To improve the cache hit ratio, we propose a memory cache replacement strategy for HMIBase, called the Hot Score algorithm. The experimental results show that the Hot Score algorithm performs better than other cache-replacement strategies.
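The abstract does not define how the Hot Score is computed. To illustrate the general idea of score-based cache replacement, the sketch below uses a decayed access-frequency score (a swapped-in technique, not the paper's Hot Score) and evicts the entry with the lowest score. The class name, the `decay` parameter, and the scoring rule are all assumptions.

```python
class ScoredCache:
    """Fixed-capacity cache that evicts the entry with the lowest score.

    Each access adds 1 to an entry's score, and scores decay exponentially
    with the number of cache operations since the entry was last touched.
    This scoring rule is an assumption made for illustration.
    """

    def __init__(self, capacity: int, decay: float = 0.9):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.decay = decay
        self._clock = 0      # logical time, advanced on every touch
        self._data = {}      # key -> cached value
        self._score = {}     # key -> (score, clock value at last update)

    def _current_score(self, key) -> float:
        score, tick = self._score[key]
        return score * self.decay ** (self._clock - tick)

    def _touch(self, key) -> None:
        self._clock += 1
        previous = self._current_score(key) if key in self._score else 0.0
        self._score[key] = (previous + 1.0, self._clock)

    def get(self, key):
        """Return the cached value, or None on a miss."""
        if key not in self._data:
            return None
        self._touch(key)
        return self._data[key]

    def put(self, key, value) -> None:
        """Insert or update a value, evicting the coldest entry if full."""
        if key not in self._data and len(self._data) >= self.capacity:
            coldest = min(self._data, key=self._current_score)
            del self._data[coldest]
            del self._score[coldest]
        self._data[key] = value
        self._touch(key)


cache = ScoredCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")
cache.get("a")       # "a" is now clearly hotter than "b"
cache.put("c", 3)    # cache is full, so the coldest entry ("b") is evicted
```

Unlike plain LRU, a frequently read entry survives a short burst of one-off insertions here, which is the kind of behavior a score-based strategy aims for when improving the hit ratio.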
Funding: The project on key technologies for the integration of the cloud desktop and cloud storage platform is supported by ZTE Industry-Academia-Research Cooperation Funds.
Abstract: Integration of the cloud desktop and cloud storage platform is an urgent need for enterprises. However, current proposals for cloud disks are not satisfactory in terms of decoupling virtual computing from business data storage in the cloud desktop environment. In this paper, we present a new virtual disk mapping method for cloud desktop storage. On Windows, compared with the virtual hard disk method used by popular cloud disks, the proposed client implementation, which is based on a virtual disk driver and a file system filter driver, suits a wide range of desktop environments, especially cloud desktops with limited storage resources. Furthermore, our method supports customizable local cache storage, resulting in a user-friendly experience for thin clients of the cloud desktop. The evaluation results show that our virtual disk mapping method performs well in read-write throughput for files of different sizes.