Journal Articles
42 articles found
1. A Conceptual and Computational Framework for Aspect-Based Collaborative Filtering Recommender Systems (Cited by: 1)
Authors: Samin Poudel, Marwan Bikdash. Journal of Computer and Communications, 2023, No. 3, pp. 110-130 (21 pages)
Many datasets in E-commerce have rich information about items and the users who purchase or rate them. This information can enable advanced machine learning algorithms to extract and assign user sentiments to various aspects of the items, leading to more sophisticated and justifiable recommendations. However, most Collaborative Filtering (CF) techniques rely mainly on users' overall preferences toward items, and there is a lack of a conceptual and computational framework that enables an understandable aspect-based AI approach to recommending items to users. In this paper, we propose concepts and computational tools that can sharpen the logic of recommendations and that rely on users' sentiments along various aspects of items. These concepts include: the sentiment of a user towards a specific aspect of a specific item, the emphasis that a given user places on a specific aspect in general, the popularity and controversy of an aspect among groups of users, clusters of users emphasizing a given aspect, clusters of items that are popular among a group of users, and so forth. The framework introduced in this study is developed in terms of user emphasis, aspect popularity, aspect controversy, and user and item similarity. Towards this end, we introduce the Aspect-Based Collaborative Filtering Toolbox (ABCFT), whose tools are all developed based on the three-index sentiment tensor with the indices being the user, item, and aspect. The toolbox computes solutions to the questions alluded to above. We illustrate the methodology using a hotel review dataset with around 6000 users, 400 hotels, and 6 aspects.
Keywords: recommender system; collaborative filtering; aspect-based recommendation; recommendation system framework; aspect sentiments
Download PDF
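As an illustration of the three-index sentiment tensor described in the abstract, the sketch below computes toy versions of aspect popularity, aspect controversy, and user emphasis with NumPy. The tensor shape and the aggregation formulas are this example's assumptions, not ABCFT's exact definitions:

```python
import numpy as np

# Toy sentiment tensor S[user, item, aspect]; NaN marks "no opinion".
# Shapes and aggregation formulas are illustrative assumptions only.
S = np.full((3, 2, 2), np.nan)
S[0, 0, 0], S[0, 0, 1] = 0.9, 0.2   # user 0 on item 0: aspect 0 high, aspect 1 low
S[1, 0, 0], S[1, 0, 1] = 0.8, 0.3
S[2, 1, 0], S[2, 1, 1] = 0.1, 0.7

def aspect_popularity(S):
    """Mean sentiment per aspect over all observed (user, item) pairs."""
    return np.nanmean(S, axis=(0, 1))

def aspect_controversy(S):
    """Spread of sentiment per aspect: a high std-dev means controversial."""
    return np.nanstd(S, axis=(0, 1))

def user_emphasis(S):
    """Fraction of each user's recorded opinions that touch each aspect."""
    counts = (~np.isnan(S)).sum(axis=1)          # users x aspects
    return counts / counts.sum(axis=1, keepdims=True)

print(aspect_popularity(S).round(3), aspect_controversy(S).round(3))
```

In this toy data, aspect 0 averages the three observed values 0.9, 0.8, 0.1, so its popularity is 0.6; every user has opined once on each aspect, so emphasis is uniform.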
2. Assessment of Municipal Solid Waste Management in the Farmgate Area of Dhaka North City Corporation
Authors: Seyedali Mirmotalebi, Shoeb Rahman, Mayida Rubya Tithi, Imran Khan Apu. World Journal of Engineering and Technology, 2024, No. 1, pp. 1-23 (23 pages)
This investigation conducts a thorough analysis of Municipal Solid Waste Management (MSWM). MSWM encompasses a range of interdisciplinary measures that govern the various stages involved in managing unwanted or non-utilizable solid materials, commonly known as rubbish, trash, junk, refuse, and garbage. These stages include generation, storage, collection, recycling, transportation, handling, disposal, and monitoring. The waste materials in question cover a wide range of items, such as organic waste from food and vegetables, paper, plastic, polyethylene, iron, tin cans, deceased animals, byproducts of demolition activities, manure, and various other discarded materials. This study aims to provide insights into the possibilities of enhancing solid waste management in the Farmgate area of Dhaka North City Corporation (DNCC). To accomplish this objective, the research examines the conventional waste management methods employed in this area. It conducts extensive field surveys, collecting valuable data through interviews with local residents and key individuals involved in waste management, such as waste collectors, dealers, intermediate dealers, recyclers, and shopkeepers. The results indicate that significant amounts of distinct waste categories are produced daily: food and vegetable waste, 52.1 tons/day; polythene and plastic, 4.5 tons/day; metal and tin-can waste, 1.4 tons/day; and paper waste, 5.9 tons/day. This study highlights the significance of promoting environmental consciousness to effectively shape the attitudes of urban residents toward waste disposal and management. It emphasizes the need for collaboration between authorities and researchers to improve the current waste management system.
Keywords: solid waste management; Dhaka North City Corporation; sustainable growth; integrated waste management practice; waste recycling
Download PDF
3. Cloud detection from visual band of satellite image based on variance of fractal dimension
Authors: TIAN Pingfang, GUANG Qiang, LIU Xing. Journal of Systems Engineering and Electronics (SCIE, EI, CSCD), 2019, No. 3, pp. 485-491 (7 pages)
The cloud cover ratio is a very important factor affecting the quality of a satellite image; therefore, cloud detection from satellite images is a necessary step in assessing image quality. This paper studies cloud detection from the visual band of a satellite image. Firstly, we consider the differences between cloud and ground, including high grey level, good continuity of grey level, the area of the cloud region, and the variance of local fractal dimension (VLFD) of the cloud region, and propose a single-cloud-region detection method. Secondly, by introducing a reference satellite image and comparing the variance of the dimensions of the reference and tested images, we describe a method that detects multiple cloud regions and determines whether a cloud exists in an image. The performance of the proposed method is demonstrated on several Ikonos images.
Keywords: cloud detection; visual image; satellite image; variance of local fractal dimension (VLFD)
Download PDF
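The VLFD feature can be approximated as follows: estimate a box-counting fractal dimension in each local window, then take the variance across windows. This is a simplified sketch under assumed parameters (patch size, box sizes, binarization threshold), not the authors' exact formulation:

```python
import numpy as np

def box_count_dimension(patch, sizes=(1, 2, 4)):
    """Box-counting dimension of a binary 2-D patch: fit log N(s) against
    log(1/s), where N(s) is the number of s x s boxes containing any
    foreground pixel."""
    counts = []
    for s in sizes:
        h, w = patch.shape
        n = 0
        for i in range(0, h, s):
            for j in range(0, w, s):
                if patch[i:i + s, j:j + s].any():
                    n += 1
        counts.append(max(n, 1))
    slope, _ = np.polyfit(np.log(1.0 / np.array(sizes)), np.log(counts), 1)
    return slope

def vlfd(image, patch=8, thresh=0.5):
    """Variance of the local fractal dimension over non-overlapping patches
    of a grayscale image in [0, 1] (a simplified stand-in for VLFD)."""
    dims = []
    h, w = image.shape
    for i in range(0, h - patch + 1, patch):
        for j in range(0, w - patch + 1, patch):
            dims.append(box_count_dimension(image[i:i + patch, j:j + patch] > thresh))
    return float(np.var(dims))

# A flat bright "cloud" region vs. a noisy "ground" region: the noisy image
# shows a larger spread of local dimensions.
rng = np.random.default_rng(0)
cloud = np.ones((16, 16)) * 0.9
ground = rng.random((16, 16))
print(vlfd(cloud), vlfd(ground))
```

A uniformly bright patch fills every box at every scale, so all local dimensions are identical and the variance collapses to zero, which is the intuition the detector exploits.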
4. Scalable and quantitative contention generation for performance evaluation on OLTP databases
Authors: Chunxi ZHANG, Yuming LI, Rong ZHANG, Weining QIAN, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2023, No. 2, pp. 15-31 (17 pages)
Massive-scale transactions with critical requirements have become popular for emerging businesses, especially in E-commerce. One of the most representative applications is the promotional event running on Alibaba's platform on special dates, widely anticipated by global customers. Although significant progress has been made in improving the scalability of transactional (OLTP) database systems, the presence of contended operations in workloads is still one of the fundamental obstacles to performance improvement. The reason is that the overhead of managing conflicting transactions with concurrency control mechanisms is proportional to the amount of contention. Consequently, generating contended workloads is essential for evaluating the performance of modern OLTP database systems. Although standard benchmarks provide some ways of simulating contention, e.g., skew distribution control of transactions, they cannot control the generation of contention quantitatively; even worse, the simulation effectiveness of these methods is affected by the scale of data. In this paper we design a scalable quantitative contention generation method with fine contention granularity control. We conduct a comprehensive set of experiments on popular open-sourced DBMSs, compared with the latest contention simulation method, to demonstrate the effectiveness of our generation method.
Keywords: high contention; OLTP database; performance evaluation; database benchmarking
Full-text delivery
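The core complaint above, that skew-based simulation cannot fix the contention level independently of data scale, can be illustrated with a minimal generator that targets an exact fraction of accesses to a hot record regardless of table size. Function names and the single-hot-key design are this sketch's assumptions; the paper's generator controls contention at much finer granularity:

```python
import random

def gen_workload(n_txns, n_records, contention, hot_key=0, seed=42):
    """Generate one record access per transaction such that a target
    fraction of transactions hit one hot record, independent of table
    size. (Illustrative sketch, not the paper's method.)"""
    rng = random.Random(seed)
    txns = []
    for _ in range(n_txns):
        if rng.random() < contention:
            txns.append(hot_key)                       # contended access
        else:
            txns.append(rng.randrange(1, n_records))   # uniform, non-hot
    return txns

def measured_contention(txns, hot_key=0):
    return sum(1 for k in txns if k == hot_key) / len(txns)

w = gen_workload(10_000, 1_000_000, contention=0.3)
print(round(measured_contention(w), 2))
```

Unlike a Zipf skew parameter, the `contention` knob here maps directly to the hot-access ratio, whether the table has a hundred rows or a million.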
5. Accurate and efficient follower log repair for Raft-replicated database systems (Cited by: 2)
Authors: Jinwei GUO, Peng CAI, Weining QIAN, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2021, No. 2, pp. 91-103 (13 pages)
State machine replication has been widely used in modern cluster-based database systems. Most commonly deployed configurations adopt a Raft-like consensus protocol, which has a single strong leader that replicates the log to other followers. Since the followers can handle read requests and many real workloads are read-intensive, the recovery speed of a crashed follower may significantly impact throughput. Unlike traditional database recovery, the recovering follower needs to repair its local log first. The original Raft protocol takes many network round trips to compare logs between the leader and the crashed follower. To reduce network round trips, an optimization is to truncate the follower's uncertain log entries after the latest local commit point, and then directly fetch all committed log entries from the leader in one round trip. However, if the commit point is not persisted, the recovering follower has to get the whole log from the leader. In this paper, we propose an accurate and efficient log repair (AELR) algorithm for follower recovery. AELR is more robust and resilient to follower failure, and it needs only one network round trip to fetch the minimum number of log entries for follower recovery. This approach is implemented in the open-source database system OceanBase. We experimentally show that a system adopting AELR performs well in terms of recovery time.
Keywords: Raft; high availability; log replication; log repair
Full-text delivery
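The truncate-then-fetch optimization described in the abstract can be simulated in a few lines. The function name, log layout, and commit-point handling below are illustrative assumptions, not OceanBase's implementation:

```python
def repair_follower(follower_log, commit_point, leader_log, leader_commit):
    """One-round-trip follower log repair (simplified sketch of the idea
    behind AELR): drop uncertain entries past the follower's persisted
    commit point, then copy the leader's committed entries in a single
    fetch. Logs are lists of (term, payload); indices are list positions."""
    # 1. Truncate everything the follower is not sure was committed.
    repaired = follower_log[:commit_point]
    # 2. Single "network round trip": fetch the committed entries we lack.
    repaired.extend(leader_log[commit_point:leader_commit])
    return repaired

leader = [(1, "a"), (1, "b"), (2, "c"), (2, "d")]
follower = [(1, "a"), (1, "b"), (1, "x")]   # diverged entry "x" past the commit point
print(repair_follower(follower, commit_point=2, leader_log=leader, leader_commit=4))
```

The original Raft comparison would instead walk backwards entry by entry over several round trips; persisting the commit point is what makes the single-trip shortcut safe.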
6. A Fast Filling Algorithm for Image Restoration Based on Contour Parity (Cited by: 1)
Authors: Yan Liu, Wenxin Hu, Longzhe Han, Maksymyuk Taras, Zhiyun Chen. Computers, Materials & Continua (SCIE, EI), 2020, No. 4, pp. 509-519 (11 pages)
Filling techniques are often used in the restoration of images. Yet existing filling approaches either have high computational costs or present problems such as filling holes redundantly. This paper proposes a novel algorithm for filling holes and regions of images. The proposed algorithm combines the advantages of both the parity-check filling approach and the region-growing inpainting technique. Paired points of the region's boundary are used to search and fill the region, and the scanning range of the filling method is confined to the target regions. The proposed method does not require additional working memory or assistant colors, and it can correctly fill any complex contour. Experimental results show that, compared to other approaches, the proposed algorithm fills regions faster and with lower computational cost.
Keywords: region filling; image restoration; parity check; region growing
Download PDF
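The parity (even-odd) principle behind contour-based filling can be sketched with the classic ray-crossing test: a pixel is inside when a ray to the right crosses the boundary an odd number of times. The paper's algorithm pairs contour points per scanline to the same effect; this minimal version only shows the parity idea, on a polygonal contour:

```python
def parity_fill(poly, width, height):
    """Even-odd (parity) region fill over a pixel grid. A pixel is filled
    when a horizontal ray from its centre crosses the polygon boundary an
    odd number of times."""
    grid = [[0] * width for _ in range(height)]
    n = len(poly)
    for y in range(height):
        for x in range(width):
            inside = False
            px, py = x + 0.5, y + 0.5          # sample at the pixel centre
            for i in range(n):
                (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
                if (y1 > py) != (y2 > py):     # edge straddles the scanline
                    x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                    if x_cross > px:
                        inside = not inside    # parity toggle
            grid[y][x] = int(inside)
    return grid

g = parity_fill([(1, 1), (5, 1), (5, 4), (1, 4)], width=6, height=5)
print(sum(map(sum, g)))   # number of filled pixels
```

Sampling at pixel centres sidesteps the usual corner cases of crossings exactly on vertices, which is one reason production fill algorithms pair boundary points per scanline instead of ray-casting every pixel.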
7. Efficient and stable quorum-based log replication and replay for modern cluster-databases
Authors: Donghui WANG, Peng CAI, Weining QIAN, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2022, No. 5, pp. 143-158 (16 pages)
The modern in-memory database (IMDB) can support highly concurrent on-line transaction processing (OLTP) workloads and generate massive transactional logs per second. Quorum-based replication protocols such as Paxos or Raft have been widely used in distributed databases to offer higher availability and fault tolerance. However, it is non-trivial to replicate IMDBs because the high transaction rate brings new challenges. First, the leader node in quorum replication should be adaptive, considering various transaction arrival rates and the processing capability of follower nodes. Second, followers are required to replay logs to catch up with the state of the leader in a highly concurrent setting to reduce the visibility gap. Third, modern databases are often built on a cluster of commodity machines connected by low-configuration networks, in which network anomalies often happen. In this case, performance would be significantly affected because the follower node falls into a long-duration exception-handling process (e.g., fetching lost logs from the leader). To this end, we build QuorumX, an efficient and stable quorum-based replication framework for IMDBs under heavy OLTP workloads. QuorumX combines critical-path-based batching and pipeline batching to provide an adaptive log propagation scheme that obtains stable and high performance in various settings. Further, we propose a safe and coordination-free log replay scheme to minimize the visibility gap between the leader and follower IMDBs. We also carefully design the process for the follower node to alleviate the influence of an unreliable network on replication performance. Our evaluation results with YCSB, TPC-C and a realistic micro-benchmark demonstrate that QuorumX achieves performance close to asynchronous primary-backup replication and always provides a stable service with data consistency and a low-level visibility gap.
Keywords: log replication; log replay; consensus protocol; high performance; high availability; quorum; unreliable network; packet loss
Full-text delivery
8. Development and validation of an artificial intelligence model for predicting post-transplant hepatocellular cancer recurrence (Cited by: 1)
Authors: Quirino Lai, Carmine De Stefano, Jean Emond, Prashant Bhangui, Toru Ikegami, Benedikt Schaefer, Maria Hoppe-Lotichius, Anna Mrzljak, Takashi Ito, Marco Vivarelli, Giuseppe Tisone, Salvatore Agnes, Giuseppe Maria Ettorre, Massimo Rossi, Emmanuel Tsochatzis, Chung Mau Lo, Chao-Long Chen, Umberto Cillo, Matteo Ravaioli, Jan Paul Lerut, the EurHeCaLT and the West-East LT Study Group. Cancer Communications (SCIE), 2023, No. 12, pp. 1381-1385 (5 pages)
Dear Editor, in recent years, criteria based on the combination of morphology and biology have been proposed for improving the selection of hepatocellular cancer (HCC) patients waiting for liver transplantation (LT) [1,2]. Since all the proposed models showed suboptimal results in predicting the risk of post-LT recurrence, a prediction model constructed using artificial intelligence (AI) could be an attractive way to surpass this limit [3,4]. Therefore, the Time_Radiological-response_Alpha-fetoproteIN_Artificial-Intelligence (TRAIN-AI) model was developed, combining morphology and biology tumor variables.
Keywords: hepatocellular cancer; artificial intelligence
Full-text delivery
9. Scalable and adaptive log manager in distributed systems
Authors: Huan ZHOU, Weining QIAN, Xuan ZHOU, Qiwen DONG, Aoying ZHOU, Wenrong TAN. Frontiers of Computer Science (SCIE, EI, CSCD), 2023, No. 2, pp. 45-62 (18 pages)
On-line transaction processing (OLTP) systems rely on transaction logging and quorum-based consensus protocols to guarantee durability, high availability and strong consistency. This makes the log manager a key component of distributed database management systems (DDBMSs). The leader of a DDBMS commonly adopts a centralized logging method to write log entries into a stable storage device, and uses a constant log replication strategy to periodically synchronize its state to followers. With the advent of new hardware and high parallelism of transaction processing, the traditional centralized design of logging limits scalability, and a constant replication trigger condition cannot always maintain optimal performance under dynamic workloads. In this paper, we propose a new log manager named Salmo with scalable logging and adaptive replication for distributed database systems. The scalable logging eliminates centralized contention by utilizing a highly concurrent data structure and speedy log-hole tracking. The kernel of adaptive replication is an adaptive log shipping method, which dynamically adjusts the number of log entries transmitted between leader and followers based on the real-time workload. We implemented and evaluated Salmo in the open-sourced transaction processing systems Cedar and DBx1000. Experimental results show that Salmo scales well with an increasing number of working threads, improves peak throughput by 1.56× and reduces latency by more than 4× over Raft's log replication, and maintains efficient and stable performance under dynamic workloads.
Keywords: distributed database systems; transaction logging; log replication; scalable; adaptive
Full-text delivery
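An adaptive log-shipping policy of the kind described above can be caricatured as a small feedback controller that grows the batch when logs queue up and shrinks it when the link is slow and lightly loaded. The thresholds and policy are invented for illustration and are not Salmo's actual rules:

```python
def adapt_batch(batch, queue_len, rtt_ms, lo=16, hi=8192):
    """Adaptive log-shipping batch size (illustrative controller):
    grow the batch when the pending-log queue builds up, shrink it when
    the round trip is slow and the queue is short, clamp to [lo, hi]."""
    if queue_len > 2 * batch:                    # backlog: ship more per trip
        batch *= 2
    elif rtt_ms > 5 and queue_len < batch // 2:  # light load, slow link:
        batch //= 2                              # keep replication latency low
    return max(lo, min(hi, batch))

b = 64
for queue_len, rtt in [(200, 1), (400, 1), (50, 8), (10, 8)]:
    b = adapt_batch(b, queue_len, rtt)
print(b)
```

A fixed batch size would either waste round trips under heavy load or add latency under light load; the point of adaptivity is to track the workload between those extremes.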
10. D-Cubicle: boosting data transfer dynamically for large-scale analytical queries in single-GPU systems
Authors: Jialun WANG, Wenhao PANG, Chuliang WENG, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2023, No. 4, pp. 141-153 (13 pages)
In analytical queries, a number of important operators like JOIN and GROUP BY are suitable for parallelization, and the GPU is an ideal accelerator considering its power of parallel computing. However, when data size increases to hundreds of gigabytes, one GPU card becomes insufficient due to the small capacity of global memory and the slow data transfer between host and device. A straightforward solution is to equip more GPUs linked by high-bandwidth connectors, but this greatly increases cost. We utilize unified memory (UM), provided by NVIDIA CUDA (Compute Unified Device Architecture), to make it possible to accelerate large-scale queries on just one GPU, but we notice that the transfer performance between host and UM, which happens before kernel execution, is often significantly slower than the theoretical bandwidth. An important reason is that, in a single-GPU environment, data processing systems usually invoke only one or a static number of threads for data copy, leading to an inefficient transfer which heavily slows down overall performance. In this paper, we present D-Cubicle, a runtime module to accelerate data transfer between host-managed memory and unified memory. D-Cubicle boosts the actual transfer speed dynamically through a self-adaptive approach. In our experiments, taking data transfer into account, D-Cubicle processes 200 GB of data on a single GPU with 32 GB of global memory, achieving 1.43× on average and 2.09× at maximum the performance of the baseline system.
Keywords: data analytics; GPU; unified memory
Full-text delivery
11. High-availability in-memory key-value store using RDMA and Optane DCPMM
Authors: Xuecheng QI, Huiqi HU, Jinwei GUO, Chenchen HUANG, Xuan ZHOU, Ning XU, Yu FU, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2023, No. 1, pp. 221-223 (3 pages)
1 Introduction and main contributions. Emerging hardware like Remote Direct Memory Access (RDMA)-capable networks and persistent memory (PM) is promising for building fast, highly available in-memory key-value stores. The recent advent of Intel Optane DC Persistent Memory Modules (Optane DCPMM) brings that future closer. However, existing studies that combine the two devices cannot deliver the desired performance due to their two-phase protocols for log shipping, and most of them were based on emulation that performs sub-optimally on real PM hardware.
Keywords: hardware; RDMA; value
Full-text delivery
12. SMEC: Scene Mining for E-Commerce
Authors: 王罡, 李翔, 郭子义, 殷大伟, 马帅. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2024, No. 1, pp. 192-210 (19 pages)
Scene-based recommendation has proven its usefulness in E-commerce by recommending commodities based on a given scene. However, scenes are typically unknown in advance, which necessitates scene discovery for E-commerce. In this article, we study scene discovery for E-commerce systems. We first formalize a scene as a set of commodity categories that occur simultaneously and frequently in real-world situations, and model an E-commerce platform as a heterogeneous information network (HIN), whose nodes and links represent different types of objects and different types of relationships between objects, respectively. We then formulate the scene mining problem for E-commerce as an unsupervised learning problem that finds overlapping clusters of commodity categories in the HIN. To solve the problem, we propose a non-negative matrix factorization based method, SMEC (Scene Mining for E-Commerce), and theoretically prove its convergence. Using six real-world E-commerce datasets, we conduct an extensive experimental study evaluating SMEC against 13 other methods, and show that SMEC consistently outperforms its competitors with regard to various evaluation measures.
Keywords: graph clustering; E-commerce; heterogeneous information network (HIN); scene mining
Full-text delivery
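The non-negative matrix factorization at the heart of the method can be illustrated with the plain Lee-Seung multiplicative updates on a small category co-occurrence matrix; SMEC's actual objective, constraints, and convergence proof differ, so this is only the generic building block:

```python
import numpy as np

def nmf(V, k, iters=500, seed=0):
    """Plain multiplicative-update NMF, V ≈ W @ H with W, H >= 0
    (Lee-Seung updates). A generic sketch, not SMEC itself."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + 0.1
    H = rng.random((k, m)) + 0.1
    eps = 1e-9
    for _ in range(iters):
        H *= (W.T @ V) / (W.T @ W @ H + eps)   # update H, keep non-negativity
        W *= (V @ H.T) / (W @ H @ H.T + eps)   # update W symmetrically
    return W, H

# Co-occurrence of 4 commodity categories forming two latent "scenes":
# categories {0, 1} occur together, and so do {2, 3}.
V = np.array([[4., 4., 0., 0.],
              [4., 4., 0., 0.],
              [0., 0., 3., 3.],
              [0., 0., 3., 3.]])
W, H = nmf(V, k=2)
print(np.round(W @ H, 1))
```

Each column of `H` scores the categories of one latent scene; because the updates are multiplicative, entries that start non-negative stay non-negative, which is what makes the factors interpretable as soft, overlapping clusters.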
13. Fault-tolerant precise data access on distributed log-structured merge-tree (Cited by: 2)
Authors: Tao ZHU, Huiqi HU, Weining QIAN, Huan ZHOU, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2019, No. 4, pp. 760-777 (18 pages)
The log-structured merge tree (LSM-tree) has been adopted by many distributed storage systems. It decomposes a large database into multiple parts: an in-writing part and several read-only ones. Records are first written into a memory-optimized structure and then compacted into on-disk structures periodically. This achieves high write throughput, but brings the side effect that read requests have to go through multiple structures to find the required record. In a distributed database system, different parts of the LSM-tree are stored in a distributed fashion, so a server in the query layer has to issue multiple network communications to pull data items from the underlying storage layer. Coming to its rescue, this work proposes a precise data access strategy which includes: an efficient structure with low maintenance overhead designed to test whether a record exists in the in-writing part of the LSM-tree, and a lease-based synchronization strategy to maintain consistent copies of the structure on remote query servers. We further prove the technique works robustly while the LSM-tree is reorganizing multiple structures in the backend. It is also fault-tolerant, able to recover the structures used in data access after node failures happen. Experiments using the YCSB benchmark show that the solution achieves a 6× throughput improvement over existing methods.
Keywords: distributed data storage; log-structured merge tree; linearizability; fault tolerance
Full-text delivery
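A membership structure for the in-writing part, as motivated above, is commonly realized with a Bloom filter: it answers "definitely absent" or "possibly present", so a query server can skip a network hop when a key cannot be in the memtable. The paper builds its own structure with lease-based synchronization; this sketch shows only the generic idea:

```python
import hashlib

class BloomFilter:
    """Compact, probabilistic membership test. False positives are
    possible, false negatives are not, so 'absent' answers let a query
    server avoid pulling from the in-writing part over the network."""
    def __init__(self, m_bits=1024, k_hashes=3):
        self.m, self.k = m_bits, k_hashes
        self.bits = bytearray(m_bits)

    def _positions(self, key):
        for i in range(self.k):
            # Personalized blake2b gives k independent hash functions.
            h = hashlib.blake2b(key.encode(), person=bytes([i]) * 8).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits[p] = 1

    def may_contain(self, key):
        return all(self.bits[p] for p in self._positions(key))

bf = BloomFilter()
for k in ("user:1", "user:2"):
    bf.add(k)
print(bf.may_contain("user:1"), bf.may_contain("user:999"))
```

The distributed complication the paper actually solves is keeping remote copies of such a structure consistent while the memtable changes, which is where the lease-based synchronization comes in.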
14. EnAli: entity alignment across multiple heterogeneous data sources (Cited by: 2)
Authors: Chao KONG, Ming GAO, Chen XU, Yunbin FU, Weining QIAN, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2019, No. 1, pp. 157-169 (13 pages)
Entity alignment is the problem of identifying which entities in one data source refer to the same real-world entity in the others. Identifying entities across heterogeneous data sources is paramount to many research fields, such as data cleaning, data integration, information retrieval and machine learning. The aligning process is not only overwhelmingly expensive for large data sources, since it involves all tuples from two or more data sources, but also needs to handle heterogeneous entity attributes. In this paper, we propose an unsupervised approach, called EnAli, to match entities across two or more heterogeneous data sources. EnAli employs a generative probabilistic model to incorporate heterogeneous entity attributes via the exponential family, handles missing values, and utilizes a locality-sensitive hashing scheme to reduce the candidate tuples and speed up the aligning process. EnAli is highly accurate and efficient even without any ground-truth tuples. We illustrate the performance of EnAli on re-identifying entities from the same data source, as well as on aligning entities across three real data sources. Our experimental results manifest that our proposed approach outperforms the comparable baselines.
Keywords: entity alignment; exponential family; locality-sensitive hashing; EM algorithm
Full-text delivery
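The locality-sensitive hashing step that reduces candidate tuples can be sketched with MinHash signatures plus banding: records sharing any band of their signature land in the same bucket and become candidate matches, avoiding the all-pairs comparison. EnAli's exact scheme and parameters are not given here, so the banding layout below is an assumption:

```python
import hashlib

def minhash(tokens, n_hashes=32):
    """MinHash signature of a token set: position i keeps the minimum of
    a salted hash over all tokens."""
    sig = []
    for i in range(n_hashes):
        salt = str(i).encode()
        sig.append(min(
            int.from_bytes(hashlib.blake2b(salt + t.encode()).digest()[:8], "big")
            for t in tokens))
    return sig

def lsh_candidates(records, bands=16, rows=2):
    """Banding LSH: records whose signatures agree on any band become
    candidate pairs. (Generic sketch, not EnAli's exact schema.)"""
    buckets, cands = {}, set()
    sigs = {rid: minhash(toks, bands * rows) for rid, toks in records.items()}
    for rid, sig in sigs.items():
        for b in range(bands):
            key = (b, tuple(sig[b * rows:(b + 1) * rows]))
            for other in buckets.setdefault(key, []):
                cands.add(tuple(sorted((rid, other))))
            buckets[key].append(rid)
    return cands

recs = {
    "a": {"john", "smith", "nyc", "engineer"},
    "b": {"john", "smith", "nyc", "engineer", "jr"},   # near-duplicate of "a"
    "c": {"mary", "jones", "paris", "chef"},
}
print(lsh_candidates(recs))
```

Two signatures agree at a position with probability equal to the records' Jaccard similarity, so near-duplicates almost surely share a band while unrelated records almost never do.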
15. Knowledge Representation and Reasoning for Complex Time Expression in Clinical Text (Cited by: 2)
Authors: Danyang Hu, Meng Wang, Feng Gao, Fangfang Xu, Jinguang Gu. Data Intelligence (EI), 2022, No. 3, pp. 573-598 (26 pages)
Temporal information is pervasive and crucial in medical records and other clinical text, as it formulates the development process of medical conditions and is vital for clinical decision making. However, providing a holistic knowledge representation and reasoning framework for the various time expressions in clinical text is challenging. In order to capture complex temporal semantics in clinical text, we propose a novel Clinical Time Ontology (CTO) as an extension of the OWL framework. More specifically, we identified eight time-related problems in clinical text and created 11 core temporal classes to conceptualize fuzzy time, cyclic time, irregular time, negations, and other complex aspects of clinical time. Then, we extended Allen's and TEO's temporal relations and defined the relation concept description between complex and simple time. Simultaneously, we provided a formulaic and graphical presentation of complex time and complex time relationships. We carried out an empirical study on the expressiveness and usability of CTO using real-world healthcare datasets. Experimental results demonstrate that CTO can faithfully represent and reason over 93% of the temporal expressions, and it covers a wide range of time-related classes in the clinical domain.
Keywords: clinical text; temporal ontology; temporal relations; OWL; negation of temporal relation
Full-text delivery
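Allen's interval relations, which the ontology extends, reduce to simple endpoint comparisons for crisp intervals. A compact sketch, assuming half-open `(start, end)` intervals (CTO's fuzzy and cyclic extensions go beyond this crisp core):

```python
def allen_relation(a, b):
    """Return the basic Allen relation (one of the 13 names) between
    intervals a = (start, end) and b, assuming start < end."""
    (as_, ae), (bs, be) = a, b
    if ae < bs:   return "before"
    if be < as_:  return "after"
    if ae == bs:  return "meets"
    if be == as_: return "met-by"
    if a == b:    return "equal"
    if as_ == bs: return "starts" if ae < be else "started-by"
    if ae == be:  return "finishes" if as_ > bs else "finished-by"
    if bs < as_ and ae < be: return "during"
    if as_ < bs and be < ae: return "contains"
    return "overlaps" if as_ < bs else "overlapped-by"

print(allen_relation((1, 3), (3, 5)),    # meets
      allen_relation((2, 4), (1, 5)))    # during
```

Such a decision table is all that is needed for crisp reasoning; the hard cases in clinical text (e.g. "every morning for about two weeks") are exactly the ones that need CTO's extra classes.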
16. Product-oriented review summarization and scoring (Cited by: 1)
Authors: Rong ZHANG, Wenzhe YU, Chaofeng SHA, Xiaofeng HE, Aoying ZHOU. Frontiers of Computer Science (SCIE, EI, CSCD), 2015, No. 2, pp. 210-223 (14 pages)
Currently, there are many online review web sites where consumers can freely write comments about different kinds of products and services. These comments are quite useful for other potential consumers. However, the number of online comments is often large, and it continues to grow as more and more consumers contribute. In addition, one comment may mention more than one product and contain opinions about different products, mentioning something good and something bad, yet sharing only a single overall score. Therefore, it is not easy to know the quality of an individual product from these comments. This paper presents a novel approach to generate review summaries, including scores and description snippets, with respect to each individual product. From the large number of comments, we first extract the contexts (snippets) that include a description of the products and choose those snippets that express consumer opinions on them. We then propose several methods to predict the rating (from 1 to 5 stars) of the snippets. Finally, we derive a generic framework for generating summaries from the snippets. We design a new snippet selection algorithm, based on a standard seat allocation algorithm, to ensure that the returned results preserve the opinion-aspect statistical properties and attribute-aspect coverage. Through experiments we demonstrate empirically that our methods are effective, and we quantitatively evaluate each step of our approach.
Keywords: online transaction; diversification; review summarization; review scoring
Full-text delivery
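The seat-allocation idea behind the snippet selection step can be illustrated with the standard largest-remainder method, here handing out summary slots to aspects in proportion to their opinion counts. The paper's exact allocation rule may differ; aspect names and counts below are invented:

```python
def allocate_slots(counts, seats):
    """Largest-remainder seat allocation: each aspect gets
    floor(proportional share) slots, and leftover slots go to the
    largest fractional remainders. A standard stand-in for the paper's
    seat-allocation-based snippet selection."""
    total = sum(counts.values())
    quotas = {k: seats * v / total for k, v in counts.items()}
    alloc = {k: int(q) for k, q in quotas.items()}
    leftovers = seats - sum(alloc.values())
    for k in sorted(quotas, key=lambda k: quotas[k] - alloc[k], reverse=True)[:leftovers]:
        alloc[k] += 1
    return alloc

opinions = {"battery": 50, "screen": 30, "price": 20}
print(allocate_slots(opinions, seats=5))
```

Allocating slots proportionally is what preserves the opinion-aspect statistics of the full review set in a summary of fixed length, instead of letting one dominant aspect crowd out the rest.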
17. Optimal Dependence of Performance and Efficiency of Collaborative Filtering on Random Stratified Subsampling (Cited by: 1)
Authors: Samin Poudel, Marwan Bikdash. Big Data Mining and Analytics (EI), 2022, No. 3, pp. 192-205 (14 pages)
Dropping fractions of users or items judiciously can reduce the computational cost of Collaborative Filtering (CF) algorithms. The effect of this subsampling on the computing time and accuracy of CF is not fully understood, and clear guidelines for selecting optimal or even appropriate subsampling levels are not available. In this paper, we present a Density-based Random Stratified Subsampling using Clustering (DRSC) algorithm in which the desired Fraction of Users Dropped (FUD) and Fraction of Items Dropped (FID) are specified, and the overall density is maintained during subsampling. Subsequently, we develop simple models of the Training Time Improvement (TTI) and the Accuracy Loss (AL) as functions of FUD and FID, based on extensive simulations of seven standard CF algorithms as applied to various primary matrices from MovieLens, Yahoo Music Rating, and Amazon Automotive data. Simulations show that both TTI and a scaled AL are bi-linear in FID and FUD for all seven methods. The TTI linear regression of a CF method appears to be the same for all datasets. Extensive simulations illustrate that TTI can be estimated reliably from FUD and FID alone, but AL requires considering additional dataset characteristics. The derived models are then used to optimize the levels of subsampling, addressing the tradeoff between TTI and AL. A simple sub-optimal approximation was found, in which the optimal AL is proportional to the optimal Training Time Reduction Factor (TTRF) for higher values of TTRF, and the optimal subsampling levels, like optimal FID/(1-FID), are proportional to the square root of TTRF.
Keywords: collaborative filtering (CF); subsampling; training time improvement (TTI); performance loss; recommendation system (RS); optimal solutions; rating matrix
Full-text delivery
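Plain random subsampling by FUD and FID, the baseline that DRSC refines with density-based stratification and clustering, can be sketched as follows (the dict-of-ratings layout and function names are this example's assumptions):

```python
import random

def subsample(ratings, fud, fid, seed=7):
    """Drop a fraction of users (FUD) and a fraction of items (FID) from
    a rating dict {(user, item): rating}. Plain random subsampling only;
    DRSC additionally stratifies by density using clustering."""
    rng = random.Random(seed)
    users = sorted({u for u, _ in ratings})
    items = sorted({i for _, i in ratings})
    keep_u = set(rng.sample(users, round(len(users) * (1 - fud))))
    keep_i = set(rng.sample(items, round(len(items) * (1 - fid))))
    return {(u, i): r for (u, i), r in ratings.items()
            if u in keep_u and i in keep_i}

# Dense toy matrix: 100 users x 50 items, every cell rated.
ratings = {(u, i): (u + i) % 5 + 1 for u in range(100) for i in range(50)}
sub = subsample(ratings, fud=0.4, fid=0.2)
print(len(sub), len({u for u, _ in sub}))
```

On a fully dense matrix the density is unchanged by construction; on real sparse matrices it is not, which is precisely why DRSC controls density explicitly.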
A parallel data generator for efficiently generating “realistic” social streams
18
作者 Chengcheng YU Fan XIA +1 位作者 Weining QIAN Aoying ZHOU 《Frontiers of Computer Science》 SCIE EI CSCD 2019年第5期1072-1101,共30页
A social stream refers to the data stream that records a series of social entities and the dynamic interac-tions between two entities. It can be employed to model the changes of entity states in numerous applications.... A social stream refers to the data stream that records a series of social entities and the dynamic interac-tions between two entities. It can be employed to model the changes of entity states in numerous applications. The social streams, the combination of graph and streaming data, pose great challenge to efficient analytical query processing, and are key to better understanding users' behavior. Considering of privacy and other related issues, a social stream genera-tor is of great significance. A framework of synthetic social stream generator (SSG) is proposed in this paper. The gener-ated social streams using SSG can be tuned to capture sev-eral kinds of fundamental social stream properties, includ-ing patterns about users' behavior and graph patterns. Ex-tensive empirical studies with several real-life social stream data sets show that SSG can produce data that better fit to real data. It is also confirmed that SSG can generate social stream data continuously with stable throughput and memory consumption. Furthermore, we propose a parallel implemen-tation of SSG with the help of asynchronized parallel pro-cessing model and delayed update strategy. Our experiments verify that the throughput of the parallel implementation can increase linearly by increasing nodes. 展开更多
Keywords: social stream, data generator, SSG, parallel generation
Image copy-move forgery passive detection based on improved PCNN and self-selected sub-images
19
Authors: Guoshuai Zhou, Xiuxia Tian, Aoying Zhou 《Frontiers of Computer Science》 SCIE EI CSCD 2022, No. 4, pp. 131-146 (16 pages)
Image forgery detection remains a challenging problem. For the most common copy-move forgery detection, the robustness and accuracy of existing methods can still be further improved. To the best of our knowledge, we are the first to propose an image copy-move forgery passive detection method that combines an improved pulse coupled neural network (PCNN) with self-selected sub-images. Our method has the following steps: First, contour detection is performed on the input color image, and bounding boxes are drawn to frame the contours, forming suspected forgery sub-images. Second, the PCNN is improved to perform feature extraction on the sub-images, achieving feature invariance to rotation, scaling, noise addition, and so on. Finally, dual feature matching is used to match the features and locate the forgery regions. Moreover, the self-selected sub-images quickly yield suspected forgery sub-images and lessen the workload of feature extraction, and the improved PCNN extracts image features with high robustness. Experiments on the standard image forgery datasets CoMoFoD and CASIA verify that the robustness score and accuracy of the proposed method are much higher than those of the current best method, making it a more efficient image copy-move forgery passive detection method.
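The final matching step rests on comparing feature vectors extracted from pairs of sub-images. As a rough sketch under stated assumptions: the paper's dual feature matching is more involved, and the Lowe-style nearest-neighbour ratio test below merely stands in for the core matching idea.

```python
def match_features(feats_a, feats_b, ratio=0.8):
    """Toy nearest-neighbour matching between two sets of feature
    vectors: a pair matches only when the best candidate in feats_b is
    clearly closer than the second best (ratio test). Returns a list of
    (index_in_a, index_in_b) pairs."""
    def dist(p, q):
        return sum((x - y) ** 2 for x, y in zip(p, q)) ** 0.5
    matches = []
    for i, fa in enumerate(feats_a):
        ds = sorted((dist(fa, fb), j) for j, fb in enumerate(feats_b))
        if len(ds) > 1 and ds[0][0] < ratio * ds[1][0]:
            matches.append((i, ds[0][1]))
    return matches
```

Matched sub-image pairs whose features agree this strongly are the candidate copy-move regions; the localization step then maps matches back to pixel coordinates.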
Keywords: image copy-move forgery, passive detection, self-selected sub-images, pulse coupled neural network (PCNN), dual feature matching
Benchmarking in-memory database
20
Authors: Cheqing JIN, Yangxin KONG, Qiangqiang KANG, Weining QIAN, Aoying ZHOU 《Frontiers of Computer Science》 SCIE EI CSCD 2016, No. 6, pp. 1067-1081 (15 pages)
We have witnessed exciting development of RAM technology in the past decade. Memory sizes have grown rapidly and prices continue to decrease, so it is feasible to deploy large amounts of RAM in a computer system. Several companies and research institutions have devoted substantial resources to developing in-memory databases (IMDBs) that execute queries after loading data into (virtual) memory in advance. The bloom of various in-memory databases pushes us to test and evaluate their performance objectively and fairly. Although existing database benchmarks like the Wisconsin benchmark and the TPC-X series have achieved great success, they are not suitable for in-memory databases because they do not account for the unique characteristics of an IMDB. In this study, we propose MemTest, a novel benchmark that addresses the major characteristics of an in-memory database. This benchmark constructs particular metrics covering processing time, compression ratio, minimal memory space, and column strength of an in-memory database. We design a data model based on inter-bank transaction applications, and a data generator that supports uniform and skewed data distributions. The MemTest workload includes a set of queries and transactions against the metrics and data model. Finally, we illustrate the efficacy of MemTest through implementations on two different in-memory databases.
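Of the metrics listed, compression ratio is the easiest to sketch. This is a hedged illustration only: MemTest's exact metric definitions may differ, and `zlib` here merely stands in for whatever codec an in-memory database's column store uses.

```python
import zlib

def compression_ratio(raw: bytes) -> float:
    """Compression ratio taken as raw size over compressed size, so
    higher means the data compresses better (an assumed convention)."""
    compressed = zlib.compress(raw)
    return len(raw) / len(compressed)

# Low-cardinality, repetitive columns compress far better than
# high-entropy ones, which is one reason column characteristics
# matter when benchmarking a column-oriented IMDB.
repetitive = b"AAAA" * 1000          # one repeated value
varied = bytes(range(256)) * 16      # cycling byte values
```

A benchmark would report this ratio per column alongside processing time and memory footprint, letting the metrics be compared across engines on the same generated data.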
Keywords: benchmark, in-memory database, memory