The Cloud is increasingly being used to store and process big data for its tenants and classical security mechanisms using encryption are neither sufficiently efficient nor suited to the task of protecting big data in...The Cloud is increasingly being used to store and process big data for its tenants and classical security mechanisms using encryption are neither sufficiently efficient nor suited to the task of protecting big data in the Cloud.In this paper,we present an alternative approach which divides big data into sequenced parts and stores them among multiple Cloud storage service providers.Instead of protecting the big data itself,the proposed scheme protects the mapping of the various data elements to each provider using a trapdoor function.Analysis,comparison and simulation prove that the proposed scheme is efficient and secure for the big data of Cloud tenants.展开更多
With the increasing popularity of cloud computing,privacy has become one of the key problem in cloud security.When data is outsourced to the cloud,for data owners,they need to ensure the security of their privacy;for ...With the increasing popularity of cloud computing,privacy has become one of the key problem in cloud security.When data is outsourced to the cloud,for data owners,they need to ensure the security of their privacy;for cloud service providers,they need some information of the data to provide high QoS services;and for authorized users,they need to access to the true value of data.The existing privacy-preserving methods can't meet all the needs of the three parties at the same time.To address this issue,we propose a retrievable data perturbation method and use it in the privacy-preserving in data outsourcing in cloud computing.Our scheme comes in four steps.Firstly,an improved random generator is proposed to generate an accurate "noise".Next,a perturbation algorithm is introduced to add noise to the original data.By doing this,the privacy information is hidden,but the mean and covariance of data which the service providers may need remain unchanged.Then,a retrieval algorithm is proposed to get the original data back from the perturbed data.Finally,we combine the retrievable perturbation with the access control process to ensure only the authorized users can retrieve the original data.The experiments show that our scheme perturbs date correctly,efficiently,and securely.展开更多
Digital evidences can be obtained from computers and various kinds of digital devices, such as telephones, mp3/mp4 players, printers, cameras, etc. Telephone Call Detail Records (CDRs) are one important source of di...Digital evidences can be obtained from computers and various kinds of digital devices, such as telephones, mp3/mp4 players, printers, cameras, etc. Telephone Call Detail Records (CDRs) are one important source of digital evidences that can identify suspects and their partners. Law enforcement authorities may intercept and record specific conversations with a court order and CDRs can be obtained from telephone service providers. However, the CDRs of a suspect for a period of time are often fairly large in volume. To obtain useful information and make appropriate decisions automatically from such large amount of CDRs become more and more difficult. Current analysis tools are designed to present only numerical results rather than help us make useful decisions. In this paper, an algorithm based on Fuzzy Decision Tree (FDT) for analyzing CDRs is proposed. We conducted experimental evaluation to verify the proposed algorithm and the result is very promising.展开更多
Cloud computing is very useful for big data owner who doesn't want to manage IT infrastructure and big data technique details. However, it is hard for big data owner to trust multi-layer outsourced big data system...Cloud computing is very useful for big data owner who doesn't want to manage IT infrastructure and big data technique details. However, it is hard for big data owner to trust multi-layer outsourced big data system in cloud environment and to verify which outsourced service leads to the problem. Similarly, the cloud service provider cannot simply trust the data computation applications. At last,the verification data itself may also leak the sensitive information from the cloud service provider and data owner. We propose a new three-level definition of the verification, threat model, corresponding trusted policies based on different roles for outsourced big data system in cloud. We also provide two policy enforcement methods for building trusted data computation environment by measuring both the Map Reduce application and its behaviors based on trusted computing and aspect-oriented programming. To prevent sensitive information leakage from verification process,we provide a privacy-preserved verification method. Finally, we implement the TPTVer, a Trusted third Party based Trusted Verifier as a proof of concept system. Our evaluation and analysis show that TPTVer can provide trusted verification for multi-layered outsourced big data system in the cloud with low overhead.展开更多
Understanding the dynamic traffic and usage characteristics of data services in cellular networks is important for optimising network resources and improving user experience.Recent studies have illustrated traffic cha...Understanding the dynamic traffic and usage characteristics of data services in cellular networks is important for optimising network resources and improving user experience.Recent studies have illustrated traffic characteristics from specific perspectives,such as user behaviour,device type,and applications.In this paper,we present the results of our study from a different perspective,namely service providers,to reveal the traffic characteristics of cellular data networks.Our study is based on traffic data collected over a five-day period from a leading mobile operator's core network in China.We propose a Zipf-like model to characterise the distributions of the traffic volume,subscribers,and requests among service providers.Nine distinct diurnal traffic patterns of service providers are identified by formulating and solving a time series clustering problem.Our work differs from previous related works in that we perform measurements on a large quantity of data covering 2.2 billion traffic records,and we first explore the traffic patterns of thousands of service providers.Results of our study present mobile Internet participants with a better understanding of the traffic and usage characteristics of service providers,which play a critical role in the mobile Internet era.展开更多
Discovery of service nodes in flows is a challenging task, especially in large ISPs or campus networks where the amount of traffic across net-work is rmssive. We propose an effective data structure called Round-robin ...Discovery of service nodes in flows is a challenging task, especially in large ISPs or campus networks where the amount of traffic across net-work is rmssive. We propose an effective data structure called Round-robin Buddy Bloom Filters (RBBF) to detect duplicate elements in flows. A two-stage approximate algorithm based on RBBF which can be used for detecting service nodes from NetFlow data is also given and the perfonmnce of the algorithm is analyzed. In our case, the proposed algorithm uses about 1% memory of hash table with false positive error rate less than 5%. A proto-type system, which is compatible with both IPv4 and IPv6, using the proposed data structure and al-gorithm is introduced. Some real world case studies based on the prototype system are discussed.展开更多
User generated content,e.g.,from YouTube,the most popular online video sharing site,is one of the major sources of today's big data and it is crucial to understand their inherent characteristics.Recently,YouTube h...User generated content,e.g.,from YouTube,the most popular online video sharing site,is one of the major sources of today's big data and it is crucial to understand their inherent characteristics.Recently,YouTube has started working with content providers(known as YouTube partners) to promote the users' watching and sharing activities.The substantial benefit is to further augment its service and monetize more videos,which is crucial to both YouTube and its partners,as well as to other providers of relevant services.In this paper,our main contribution is to analyze the massive amounts of video data from a YouTube partner's view.We make effective use of Insight,a new analytics service of YouTube that offers simple data analysis for partners.To provide the practical guidance from the raw Insight data,we enable more complex investigations for the inherent features that affect the popularity of the videos.Our findings facilitate YouTube partners to re-design current video publishing strategies,having more opportunities to attract more views.展开更多
基金supported in part by the National Nature Science Foundation of China under Grant No.61402413 and 61340058 the "Six Kinds Peak Talents Plan" project of Jiangsu Province under Grant No.ll-JY-009+2 种基金the Nature Science Foundation of Zhejiang Province under Grant No.LY14F020019, Z14F020006 and Y1101183the China Postdoctoral Science Foundation funded project under Grant No.2012M511732Jiangsu Province Postdoctoral Science Foundation funded project Grant No.1102014C
文摘The Cloud is increasingly being used to store and process big data for its tenants and classical security mechanisms using encryption are neither sufficiently efficient nor suited to the task of protecting big data in the Cloud.In this paper,we present an alternative approach which divides big data into sequenced parts and stores them among multiple Cloud storage service providers.Instead of protecting the big data itself,the proposed scheme protects the mapping of the various data elements to each provider using a trapdoor function.Analysis,comparison and simulation prove that the proposed scheme is efficient and secure for the big data of Cloud tenants.
基金supported in part by NSFC under Grant No.61172090National Science and Technology Major Project under Grant 2012ZX03002001+3 种基金Research Fund for the Doctoral Program of Higher Education of China under Grant No.20120201110013Scientific and Technological Project in Shaanxi Province under Grant(No.2012K06-30, No.2014JQ8322)Basic Science Research Fund in Xi'an Jiaotong University(No. XJJ2014049,No.XKJC2014008)Shaanxi Science and Technology Innovation Project (2013SZS16-Z01/P01/K01)
文摘With the increasing popularity of cloud computing,privacy has become one of the key problem in cloud security.When data is outsourced to the cloud,for data owners,they need to ensure the security of their privacy;for cloud service providers,they need some information of the data to provide high QoS services;and for authorized users,they need to access to the true value of data.The existing privacy-preserving methods can't meet all the needs of the three parties at the same time.To address this issue,we propose a retrievable data perturbation method and use it in the privacy-preserving in data outsourcing in cloud computing.Our scheme comes in four steps.Firstly,an improved random generator is proposed to generate an accurate "noise".Next,a perturbation algorithm is introduced to add noise to the original data.By doing this,the privacy information is hidden,but the mean and covariance of data which the service providers may need remain unchanged.Then,a retrieval algorithm is proposed to get the original data back from the perturbed data.Finally,we combine the retrievable perturbation with the access control process to ensure only the authorized users can retrieve the original data.The experiments show that our scheme perturbs date correctly,efficiently,and securely.
文摘Digital evidences can be obtained from computers and various kinds of digital devices, such as telephones, mp3/mp4 players, printers, cameras, etc. Telephone Call Detail Records (CDRs) are one important source of digital evidences that can identify suspects and their partners. Law enforcement authorities may intercept and record specific conversations with a court order and CDRs can be obtained from telephone service providers. However, the CDRs of a suspect for a period of time are often fairly large in volume. To obtain useful information and make appropriate decisions automatically from such large amount of CDRs become more and more difficult. Current analysis tools are designed to present only numerical results rather than help us make useful decisions. In this paper, an algorithm based on Fuzzy Decision Tree (FDT) for analyzing CDRs is proposed. We conducted experimental evaluation to verify the proposed algorithm and the result is very promising.
基金partially supported by grants from the China 863 High-tech Program (Grant No. 2015AA016002)the Specialized Research Fund for the Doctoral Program of Higher Education (Grant No. 20131103120001)+2 种基金the National Key Research and Development Program of China (Grant No. 2016YFB0800204)the National Science Foundation of China (No. 61502017)the Scientific Research Common Program of Beijing Municipal Commission of Education (KM201710005024)
文摘Cloud computing is very useful for big data owner who doesn't want to manage IT infrastructure and big data technique details. However, it is hard for big data owner to trust multi-layer outsourced big data system in cloud environment and to verify which outsourced service leads to the problem. Similarly, the cloud service provider cannot simply trust the data computation applications. At last,the verification data itself may also leak the sensitive information from the cloud service provider and data owner. We propose a new three-level definition of the verification, threat model, corresponding trusted policies based on different roles for outsourced big data system in cloud. We also provide two policy enforcement methods for building trusted data computation environment by measuring both the Map Reduce application and its behaviors based on trusted computing and aspect-oriented programming. To prevent sensitive information leakage from verification process,we provide a privacy-preserved verification method. Finally, we implement the TPTVer, a Trusted third Party based Trusted Verifier as a proof of concept system. Our evaluation and analysis show that TPTVer can provide trusted verification for multi-layered outsourced big data system in the cloud with low overhead.
基金supported by the National Natural Science Foundation of China under Grant No.61072061the 111 Project of China under Grant No.B08004the Fundamental Research Funds for the Central Universities under Grant No.2013RC0114
文摘Understanding the dynamic traffic and usage characteristics of data services in cellular networks is important for optimising network resources and improving user experience.Recent studies have illustrated traffic characteristics from specific perspectives,such as user behaviour,device type,and applications.In this paper,we present the results of our study from a different perspective,namely service providers,to reveal the traffic characteristics of cellular data networks.Our study is based on traffic data collected over a five-day period from a leading mobile operator's core network in China.We propose a Zipf-like model to characterise the distributions of the traffic volume,subscribers,and requests among service providers.Nine distinct diurnal traffic patterns of service providers are identified by formulating and solving a time series clustering problem.Our work differs from previous related works in that we perform measurements on a large quantity of data covering 2.2 billion traffic records,and we first explore the traffic patterns of thousands of service providers.Results of our study present mobile Internet participants with a better understanding of the traffic and usage characteristics of service providers,which play a critical role in the mobile Internet era.
基金supported by the National Basic Research Program of China under Grant No. 2009CB320505
文摘Discovery of service nodes in flows is a challenging task, especially in large ISPs or campus networks where the amount of traffic across net-work is rmssive. We propose an effective data structure called Round-robin Buddy Bloom Filters (RBBF) to detect duplicate elements in flows. A two-stage approximate algorithm based on RBBF which can be used for detecting service nodes from NetFlow data is also given and the perfonmnce of the algorithm is analyzed. In our case, the proposed algorithm uses about 1% memory of hash table with false positive error rate less than 5%. A proto-type system, which is compatible with both IPv4 and IPv6, using the proposed data structure and al-gorithm is introduced. Some real world case studies based on the prototype system are discussed.
文摘User generated content,e.g.,from YouTube,the most popular online video sharing site,is one of the major sources of today's big data and it is crucial to understand their inherent characteristics.Recently,YouTube has started working with content providers(known as YouTube partners) to promote the users' watching and sharing activities.The substantial benefit is to further augment its service and monetize more videos,which is crucial to both YouTube and its partners,as well as to other providers of relevant services.In this paper,our main contribution is to analyze the massive amounts of video data from a YouTube partner's view.We make effective use of Insight,a new analytics service of YouTube that offers simple data analysis for partners.To provide the practical guidance from the raw Insight data,we enable more complex investigations for the inherent features that affect the popularity of the videos.Our findings facilitate YouTube partners to re-design current video publishing strategies,having more opportunities to attract more views.