Abstract: On interactive platforms, we often want to predict which items are most relevant to users, based either on their previous interactions with the system or on their preferences. Systems that do this are called recommender systems. They fall into three main groups: content-based, collaborative, and hybrid recommenders. In this paper, we focus on collaborative filtering and on improving the accuracy of its techniques. We propose an ensemble learning recommender system model that combines a probabilistic model with an efficient matrix factorization method. Interactions between users and the platform are scored with explicit and implicit scores. At each user session, the implicit scores are used to train a probabilistic model that computes the maximum likelihood estimator of the probability that an item will be recommended in the next session. The explicit scores are used to determine the impact of the user’s vote on an item at the time of the recommendation.
Funding: Supported by the National Key Research and Development Program of China under Grant No. 2020YFC2006600, the National Natural Science Foundation of China under Grant No. 62003291, the National Science and Technology Foundation Project under Grant No. 2019FY100100, and the QingLan Project.
Abstract: Safely mining the core value of Industrial Internet of Things (IIoT) data and reducing the risk of malicious attacks are inherent requirements of industrial data visualization. Visualization technology has become the main tool for aggregating, mining, and analyzing IIoT data through graphical representation. However, it still has two shortcomings in big data computation and analysis scenarios. On the one hand, visual results can disclose sensitive private information. On the other hand, most visualization tools cannot provide an interactive framework that lets users select a suitable solution. To address these problems, we present an openly accessible visual framework based on differential privacy theory (VisDP), which provides a Multi-index Quantitative comprehensive Evaluation technology (MQE) for data mining results. To take advantage of interactive mechanisms, VisDP offers several access options, including a web interface, a callable API, and a downloadable SDK. Finally, we verify the availability and privacy of MQE through mathematical proofs and analyze a hospital medical waste detection system that applies the framework in practice. The experimental results show the effectiveness and practicality of the proposed platform.
Abstract: To address the problem of managing its coastal zone, Côte d’Ivoire has begun pooling data collected on the coast to feed its environmental information management system. The aim of this work was to create an interactive decision-support platform for the development of this coastal zone. To achieve this objective, raster data with spatial resolutions of 15 to 90 m from the Shuttle Radar Topography Mission and land cover vector data from 2017 were collected and processed in WebGIS software (QGIS 3.4, PostgreSQL 10.5, PostGIS), then published and displayed in GeoServer, with the HTML, CSS, and JavaScript code written in Atom. The results first made it possible to visualize the main issues in the interface, in particular the rivers, classified forests, degraded forests, intact forests, housing, and industrial plantations, and then to assess the flood risks in Sassandra and San-Pédro. For overflow hazards extending 100 m beyond the shore, houses, part of the forests, and some bare soil are submerged. For overflow hazards extending 200 to 500 m beyond the shore, a large part of the housing, soils, and intact forests would be flooded. This tool should be made available to its final beneficiaries (users) by putting it online and listing it in the main search engines.
Funding: Supported by the National High-Tech Research and Development (863) Program of China (No. 2012AA012609).
Abstract: The rapid growth of structured data has presented new technological challenges in the research fields of big data and relational databases. In this paper, we present Banian, an efficient system for managing and analyzing petabyte-scale (PB-level) structured data. Banian overcomes the storage structure limitations of relational databases and effectively integrates interactive querying with large-scale storage management. It provides a uniform query interface for cross-platform datasets and thus shows favorable compatibility and scalability. Banian's system architecture consists mainly of three layers: (1) a storage layer that uses HDFS for the distributed storage of massive data; (2) a scheduling and execution layer that employs the splitting and scheduling techniques of parallel databases; and (3) an application layer that provides a cross-platform query interface and supports standard SQL. We evaluate Banian using PB-level Internet data and the TPC-H benchmark. The results show that, compared with Hive, Banian improves query performance by up to 30 times and achieves better scalability and concurrency.