Journal Articles
9 articles found
1. Improving Image Copy-Move Forgery Detection with Particle Swarm Optimization Techniques (cited 7 times)
Authors: SHI Wenchang, ZHAO Fei, QIN Bo, LIANG Bin. China Communications (SCIE, CSCD), 2016, No. 1, pp. 139-149 (11 pages).
Copy-Move Forgery (CMF) is one of the simplest and most effective operations for creating forged digital images. Recently, techniques based on the Scale Invariant Feature Transform (SIFT) have been widely used to detect CMF. Approaches under the SIFT-based framework are the most accepted ways to detect CMF due to their robust performance. However, for some CMF images, these approaches cannot produce satisfactory detection results. For instance, the number of matched keypoints may be too small to prove an image to be a CMF image or to generate an accurate result. Sometimes these approaches may even produce erroneous results. According to our observations, one reason is that detection results produced by the SIFT-based framework depend highly on parameters whose values are often determined by experience. These values are only applicable to a few images, which limits their application. To solve this problem, a novel approach named CMF Detection with Particle Swarm Optimization (CMFD-PSO) is proposed in this paper. CMFD-PSO integrates the Particle Swarm Optimization (PSO) algorithm into the SIFT-based framework. It utilizes the PSO algorithm to generate customized parameter values for individual images, which are then used for CMF detection under the SIFT-based framework. Experimental results show that CMFD-PSO performs well.
Keywords: copy-move forgery detection, SIFT, region duplication, digital image forensics
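To make the parameter-tuning idea concrete, here is a minimal particle swarm optimizer in Python. The fitness function is a stand-in for the paper's per-image detection-quality score, and the two parameters (a SIFT match-ratio threshold and a minimum keypoint distance) are illustrative assumptions, not the exact parameters CMFD-PSO tunes.

```python
import random

# Minimal particle swarm optimizer: a sketch of how CMFD-PSO could tune
# per-image SIFT parameters instead of using fixed, experience-based values.
def pso(fitness, bounds, n_particles=20, n_iters=50, w=0.7, c1=1.5, c2=1.5):
    dim = len(bounds)
    pos = [[random.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = random.random(), random.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Stand-in fitness: pretend detection quality peaks at ratio=0.6, min_dist=40.
quality = lambda p: -((p[0] - 0.6) ** 2 + ((p[1] - 40) / 100) ** 2)
best, _ = pso(quality, bounds=[(0.3, 0.9), (10, 120)])
print("tuned (match ratio, min keypoint distance):", best)
```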
2. Random Forests Algorithm Based Duplicate Detection in On-Site Programming Big Data Environment (cited 1 time)
Authors: Qianqian Li, Meng Li, Lei Guo, Zhen Zhang. Journal of Information Hiding and Privacy Protection, 2020, No. 4, pp. 199-205 (7 pages).
On-site programming big data refers to the massive data generated in the process of software development, characterized by real-time arrival, complexity, and high processing difficulty. Data cleaning is therefore essential for on-site programming big data. Duplicate data detection is an important step in data cleaning, which can save storage resources and enhance data consistency. Owing to the insufficiency of the traditional Sorted Neighborhood Method (SNM) and the difficulty of detecting duplicates in high-dimensional data, an optimized algorithm based on random forests with a dynamic, adaptive window size is proposed. The efficiency of the algorithm is elevated by improving the key-selection method, reducing the dimensionality of the data set, and using an adaptive variable-size sliding window. Experimental results show that the improved SNM algorithm exhibits better performance and achieves higher accuracy.
Keywords: on-site programming big data, duplicate record detection, random forests, adaptive sliding window
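A minimal sketch of the Sorted Neighborhood idea with an adaptive window, in Python. The string-similarity test and the window-adjustment rule are illustrative stand-ins; the paper instead drives the duplicate decision with a random forest over pair features.

```python
from difflib import SequenceMatcher

# Sorted Neighborhood sketch: records are sorted by a key and only records
# inside a sliding window are compared.  The window widens while neighbours
# still look similar and shrinks back otherwise.  The similarity test and
# threshold below stand in for the paper's random forest classifier.
def sim(a, b):
    return SequenceMatcher(None, a, b).ratio()

def snm_adaptive(records, key=lambda r: r, w_min=2, w_max=10, thresh=0.85):
    recs = sorted(records, key=key)
    dupes, w = [], w_min
    for i in range(len(recs)):
        for j in range(i + 1, min(i + w, len(recs))):
            if sim(key(recs[i]), key(recs[j])) >= thresh:
                dupes.append((recs[i], recs[j]))
                w = min(w + 1, w_max)   # similar neighbourhood: widen window
            else:
                w = max(w - 1, w_min)   # dissimilar: shrink back
    return dupes

rows = ["john smith", "jon smith", "mary jones", "marie jones", "zack lee"]
print(snm_adaptive(rows))
```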
3. Approximate Discovery of Service Nodes by Duplicate Detection in Flows
Authors: Zhou Changling, Xiao Jianguo, Cui Jian, Zhang Bei, Li Feng. China Communications (SCIE, CSCD), 2012, No. 5, pp. 75-89 (15 pages).
Discovery of service nodes in flows is a challenging task, especially in large ISPs or campus networks where the amount of traffic across the network is massive. We propose an effective data structure called Round-robin Buddy Bloom Filters (RBBF) to detect duplicate elements in flows. A two-stage approximate algorithm based on RBBF, which can be used for detecting service nodes from NetFlow data, is also given, and the performance of the algorithm is analyzed. In our case, the proposed algorithm uses about 1% of the memory of a hash table with a false positive error rate of less than 5%. A prototype system, compatible with both IPv4 and IPv6, using the proposed data structure and algorithm is introduced. Some real-world case studies based on the prototype system are discussed.
Keywords: duplicate detection, service node discovery, buddy Bloom filter, round-robin schema, NetFlow
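The core of the data structure can be sketched as a ring of small Bloom filters rotated round-robin, so membership information ages out instead of accumulating. The sizes and hash construction below, and the omission of the "buddy" pairing and the two-stage NetFlow pipeline, are simplifications of this sketch, not the paper's design.

```python
import hashlib

class BloomFilter:
    """Plain Bloom filter over m bits with k hash probes."""
    def __init__(self, m=8192, k=4):
        self.m, self.k, self.bits = m, k, bytearray(m // 8)

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.md5(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def add(self, item):
        for h in self._hashes(item):
            self.bits[h // 8] |= 1 << (h % 8)

    def __contains__(self, item):
        return all(self.bits[h // 8] & (1 << (h % 8)) for h in self._hashes(item))

class RoundRobinBloom:
    """Ring of filters; the oldest slot is cleared and reused over time."""
    def __init__(self, n_filters=4, **kw):
        self.kw = kw
        self.ring = [BloomFilter(**kw) for _ in range(n_filters)]
        self.cur = 0

    def rotate(self):
        # advance one time slot: drop the oldest filter's contents
        self.cur = (self.cur + 1) % len(self.ring)
        self.ring[self.cur] = BloomFilter(**self.kw)

    def seen(self, item):
        # duplicate test against every live slot, then record the element
        dup = any(item in f for f in self.ring)
        self.ring[self.cur].add(item)
        return dup

rrb = RoundRobinBloom()
print(rrb.seen("10.0.0.5:443"), rrb.seen("10.0.0.5:443"))  # False True
```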
4. An Automatic Threshold Selection Using ALO for Healthcare Duplicate Record Detection with Reciprocal Neuro-Fuzzy Inference System
Authors: Ala Saleh Alluhaidan, Pushparaj, Anitha Subbappa, Ved Prakash Mishra, P.V. Chandrika, Anurika Vaish, Sarthak Sengupta. Computers, Materials & Continua (SCIE, EI), 2023, No. 3, pp. 5821-5836 (16 pages).
Systems based on EHRs (Electronic Health Records) have been in use for many years and their amplified realizations have been felt recently. They have been pioneering collections of massive volumes of health data. Duplicate detection involves discovering records referring to the same practical components, indicating tasks that are generally dependent on several input parameters that experts yield. Record linkage specifies the issue of finding identical records across various data sources. The similarity existing between two records is characterized based on domain-based similarity functions over different features. De-duplication of one dataset, or the linkage of multiple data sets, has become a highly significant operation in the data processing stages of different data mining programmes. The objective is to match all the records associated with the same entity. Various measures have been in use for representing the quality and complexity of data linkage algorithms, and many other novel metrics have been introduced. An outline of the problems existing in the measurement of data linkage and de-duplication quality and complexity is presented. This article focuses on the preprocessing of health data that is horizontally divided among data custodians, with the purpose of custodians giving similar features to sets of patients. The first step in this technique is the automatic selection of training examples of superior quality from the compared record pairs, and the second step involves training the reciprocal neuro-fuzzy inference system (RANFIS) classifier. Using the optimal threshold classifier, it is presumed that there is information about the original match status for all compared record pairs, and therefore an optimal threshold can be computed on the respective RANFIS outputs using Ant Lion Optimization. The Febrl, Clinical Decision (CD), and Cork Open Research Archive (CORA) data repositories help analyze the proposed method against evaluated benchmarks and current techniques.
Keywords: duplicate detection, healthcare, record linkage, dataset pre-processing, reciprocal neuro-fuzzy inference system, ant lion optimization, fuzzy system
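The optimal-threshold step can be shown in isolation: given match scores for compared record pairs and their true match status, pick the cut-off that maximizes F1. The exhaustive scan below is a stand-in for the Ant Lion Optimization search, and the scores are toy values rather than RANFIS outputs.

```python
# Threshold selection over pair scores with known match status.
def f1_at(scores, labels, t):
    tp = sum(s >= t and y for s, y in zip(scores, labels))
    fp = sum(s >= t and not y for s, y in zip(scores, labels))
    fn = sum(s < t and y for s, y in zip(scores, labels))
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

def best_threshold(scores, labels):
    # candidate thresholds are the observed scores themselves
    return max(scores, key=lambda t: f1_at(scores, labels, t))

scores = [0.95, 0.90, 0.72, 0.55, 0.40, 0.10]  # stand-in classifier scores
labels = [1,    1,    1,    0,    0,    0]     # true match status per pair
t = best_threshold(scores, labels)
print(f"optimal threshold = {t}, F1 = {f1_at(scores, labels, t):.2f}")
```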
5. An Experimental Simulation of Addressing Auto-Configuration Issues for Wireless Sensor Networks (cited 2 times)
Author: Idrees Sarhan Kocher. Computers, Materials & Continua (SCIE, EI), 2022, No. 5, pp. 3821-3838 (18 pages).
Applications of wireless sensor devices are widely used by various monitoring sections, such as environmental monitoring, industrial sensing, habitat modeling, healthcare, and enemy movement detection systems. Researchers found that a 16-byte packet size (payload) requires Media Access Control (MAC) and globally unique network address overheads as large as the payload itself, which is not reasonable in most situations. The approach of using a globally unique address is not preferable for most Wireless Sensor Network (WSN) applications either. Based on these drawbacks, the current work aims to fill the existing gap in the field by providing two strategies. First, name/address solutions that assign unique addresses locally to sensor devices in clustered topologies, reuse them in a spatial manner, and reduce name/address size by a noticeable factor of 2.9 based on the conducted simulation tests. Second, name/address solutions that assign reusable names/addresses to location-unaware spanning-tree topologies in event-driven WSN cases, providing minimally low latencies and delivering addressing packets in an efficient manner. Also, to avoid needing both addresses (MAC and network) separately, the work shows how the locally unique sensor device name approach can be reused in a spatial manner in both contexts, providing an energy-efficient protocol for location-unaware, cluster-based WSNs. In an experimental simulation test performed for comparison, the proposed addressing solution had less overhead in the header and 62 percent payload efficiency, outperforming globally unique addresses, which were 34 percent less effective. Furthermore, the proposed work provides address uniqueness at the network level without using a network-wide Duplicate Address Detection (DAD) algorithm. Consequently, the current study provides a roadmap for addressing/naming schemes to help researchers in this field of study. Some assumptions were made during the work phases of this study, such as the number of Cluster Head (CH) nodes being 6% of the entire sensor nodes, location unawareness for the entire sensor network, and a 4-bit per-node address space; these are considered the limitations of the study.
Keywords: addressing/naming, MAC address, global address, locally unique address, spanning tree, clustering, duplicate address detection (DAD)
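A minimal sketch of spatially reused, locally unique addressing: each cluster head allocates short local addresses from its own pool, so the same 4-bit address can recur safely in different clusters and no network-wide DAD pass is required. The class names, field sizes, and allocation policy are assumptions for illustration, not the paper's protocol.

```python
class ClusterHead:
    """Hands out short local addresses that are unique only within a cluster."""
    def __init__(self, cluster_id, addr_bits=4):
        self.cluster_id = cluster_id
        self.free = list(range(2 ** addr_bits))  # local address pool (16 slots)

    def join(self, node_name):
        if not self.free:
            raise RuntimeError("cluster full: no local address left")
        local = self.free.pop(0)
        # a node is globally identified by (cluster, local) only when needed
        return {"node": node_name, "cluster": self.cluster_id, "local": local}

ch_a, ch_b = ClusterHead("A"), ClusterHead("B")
print(ch_a.join("sensor-1"))  # local address 0 in cluster A
print(ch_b.join("sensor-2"))  # local address 0 again in cluster B; reuse is safe
```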
6. Improved Approximate Detection of Duplicates for Data Streams Over Sliding Windows (cited 3 times)
Authors: Shen Hong (沈鸿), Zhang Yu (张育). Journal of Computer Science & Technology (SCIE, EI, CSCD), 2008, No. 6, pp. 973-987 (15 pages).
Detecting duplicates in data streams is an important problem that has a wide range of applications. In general, precisely detecting duplicates in an unbounded data stream is not feasible in most streaming scenarios, and, on the other hand, the elements in data streams are always time-sensitive. This makes it particularly significant to approximately detect duplicates among newly arrived elements of a data stream within a fixed time frame. In this paper, we present a novel data structure, the Decaying Bloom Filter (DBF), as an extension of the Counting Bloom Filter that effectively removes stale elements as new elements continuously arrive over sliding windows. On the basis of the DBF we present an efficient algorithm to approximately detect duplicates over sliding windows. Our algorithm may produce false positive errors, but not false negative errors as in many previous results. We analyze the time complexity and detection accuracy, and give a tight upper bound on the false positive rate. For a given space of G bits and sliding window size W, our algorithm has an amortized time complexity of O(√(G/W)). Both analytical and experimental results on synthetic data demonstrate that our algorithm is superior in both execution time and detection accuracy to previous results.
Keywords: data stream, duplicate detection, Bloom filter, approximate query, sliding window
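The DBF idea can be sketched directly: inserting an element sets its k counters to the window size W, and every arrival decays counters, so an element only tests positive while it is within the last W arrivals. The sketch below decays every cell on each arrival for clarity, whereas the paper amortizes the sweep to reach the stated O(√(G/W)) bound.

```python
import hashlib

class DecayingBloomFilter:
    """Counting-filter variant whose cells decay as new elements arrive."""
    def __init__(self, m=4096, k=4, window=100):
        self.m, self.k, self.w = m, k, window
        self.cells = [0] * m

    def _hashes(self, item):
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.m

    def arrive(self, item):
        # duplicate iff all k cells are still alive (set within last W arrivals)
        dup = all(self.cells[h] > 0 for h in self._hashes(item))
        # decay: one tick per arrival, so a cell set to W survives W arrivals
        # (the paper amortises this sweep; here it is O(m) for clarity)
        self.cells = [c - 1 if c > 0 else 0 for c in self.cells]
        for h in self._hashes(item):
            self.cells[h] = self.w  # refresh to the full window value
        return dup

dbf = DecayingBloomFilter()
print(dbf.arrive("pkt-42"), dbf.arrive("pkt-42"))  # False True
```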
7. Leach: an automatic learning cache for inline primary deduplication system (cited 2 times)
Authors: Bin LIN, Shanshan LI, Xiangke LIAO, Jing ZHANG, Xiaodong LIU. Frontiers of Computer Science (SCIE, EI, CSCD), 2014, No. 2, pp. 175-183 (9 pages).
Deduplication technology has been increasingly used to reduce storage costs. Though it has been successfully applied to backup and archival systems, existing techniques can hardly be deployed in primary storage systems due to the associated latency cost of detecting duplicated data, where every unit has to be checked against a substantially large fingerprint index before it is written. In this paper we introduce Leach, a self-learning in-memory fingerprint cache for inline primary storage that reduces the writing cost in deduplication systems. Leach is motivated by a characteristic of real-world I/O workloads: high data skew exists in the access patterns of duplicated data. Leach adopts a splay tree to organize the on-disk fingerprint index, automatically learns the access patterns, and maintains hot working sets in cache memory, with the goal of servicing the majority of duplicated data detection. Leveraging the working set property, Leach provides optimizations to reduce the cost of splay operations on the fingerprint index and cache updates. In comprehensive experiments on several real-world datasets, Leach outperforms the conventional LRU (least recently used) cache policy by reducing the number of cache misses, and significantly improves write performance without greatly impacting cache hits.
Keywords: deduplication, duplicate detection, splay tree, cache
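Leach's working-set idea (hot fingerprints served from memory so most duplicate checks skip the disk index) can be approximated without a full splay tree. The sketch below uses a small frequency-ordered cache in front of a simulated on-disk index as a stand-in for splaying; names and capacities are illustrative assumptions, not the paper's implementation.

```python
from collections import Counter

class FingerprintCache:
    """Keeps hot fingerprints in memory; cold lookups fall through to 'disk'."""
    def __init__(self, capacity=3):
        self.capacity = capacity
        self.cache = set()       # hot fingerprints held in memory
        self.hits = Counter()    # access counts drive promotion/eviction
        self.disk_index = set()  # stands in for the full on-disk index

    def is_duplicate(self, fp):
        self.hits[fp] += 1
        if fp in self.cache:
            return True                  # served from memory, no disk access
        dup = fp in self.disk_index      # "expensive" disk lookup
        self.disk_index.add(fp)
        self._promote(fp)
        return dup

    def _promote(self, fp):
        # promotion by access frequency approximates splaying hot keys upward
        self.cache.add(fp)
        if len(self.cache) > self.capacity:
            coldest = min(self.cache, key=lambda f: self.hits[f])
            self.cache.discard(coldest)

writes = ["a", "b", "a", "a", "c", "d", "a", "b"]
cache = FingerprintCache()
print([cache.is_duplicate(w) for w in writes])
```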
8. Learning to combine multiple string similarity metrics for effective toponym matching (cited 1 time)
Authors: Rui Santos, Patricia Murrieta-Flores, Bruno Martins. International Journal of Digital Earth (SCIE, EI), 2018, No. 9, pp. 913-938 (26 pages).
Several tasks related to geographical information retrieval and the geographical information sciences involve toponym matching, that is, the problem of matching place names that share a common referent. In this article, we present the results of a wide-ranging evaluation of the performance of different string similarity metrics on the toponym matching task. We also report on experiments involving the use of supervised machine learning for combining multiple similarity metrics, which has the natural advantage of avoiding the manual tuning of similarity thresholds. Experiments with a very large dataset show that the performance differences between the individual similarity metrics are relatively small, and that carefully tuning the similarity threshold is important for achieving good results. The methods based on supervised machine learning, particularly when considering ensembles of decision trees, can achieve good results on this task, significantly outperforming the individual similarity metrics.
Keywords: toponym matching, supervised learning, string similarity metrics, duplicate detection, ensemble learning, geographic information retrieval
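A minimal sketch of the supervised combination: each toponym pair becomes a feature vector of several string-similarity scores, and a tree ensemble learns how to weigh them, so no hand-tuned global threshold is needed. The three metrics and the tiny training set are illustrative assumptions, not the paper's feature set or data.

```python
from difflib import SequenceMatcher
from sklearn.ensemble import RandomForestClassifier

def bigrams(s):
    return {s[i:i + 2] for i in range(len(s) - 1)}

def features(a, b):
    # three stand-in similarity metrics per toponym pair
    a, b = a.lower(), b.lower()
    edit = SequenceMatcher(None, a, b).ratio()
    jacc = len(bigrams(a) & bigrams(b)) / max(len(bigrams(a) | bigrams(b)), 1)
    common = 0
    for x, y in zip(a, b):
        if x != y:
            break
        common += 1
    prefix = common / max(len(a), len(b))
    return [edit, jacc, prefix]

pairs = [("Lisboa", "Lisbon", 1), ("Firenze", "Florence", 1),
         ("Porto", "Oporto", 1), ("Paris", "Parma", 0),
         ("Berlin", "Dublin", 0), ("Kyoto", "Tokyo", 0)]
X = [features(a, b) for a, b, _ in pairs]
y = [label for _, _, label in pairs]

# the ensemble learns how to combine the metrics from labelled pairs
clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
print(clf.predict([features("Moskva", "Moscow")]))
```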
9. Detecting Duplicate Contributions in Pull-Based Model Combining Textual and Change Similarities
Authors: Zhi-Xing Li, Yue Yu, Tao Wang, Gang Yin, Xin-Jun Mao, Huai-Min Wang. Journal of Computer Science & Technology (SCIE, EI, CSCD), 2021, No. 1, pp. 191-206 (16 pages).
Communication and coordination between OSS developers who do not work physically in the same location have always been challenging issues. The pull-based development model, as the state-of-the-art collaborative development mechanism, provides high openness and transparency to improve the visibility of contributors' work. However, duplicate contributions may still be submitted by more than one contributor to solve the same problem due to the parallel and uncoordinated nature of this model. If not detected in time, duplicate pull-requests can cause contributors and reviewers to waste time and energy on redundant work. In this paper, we propose an approach combining textual and change similarities to automatically detect duplicate contributions in the pull-based model at submission time. For a newly arriving contribution, we first compute the textual similarity and change similarity between it and other existing contributions. Our method then returns a list of candidate duplicate contributions that are most similar to the new contribution in terms of the combined textual and change similarity. The evaluation shows that 83.4% of the duplicates can be found on average when we use the combined textual and change similarity, compared to 54.8% using only textual similarity and 78.2% using only change similarity.
Keywords: pull-request, duplicate detection, textual similarity, change similarity
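The combined ranking can be sketched directly: a new pull-request is scored against existing ones on textual similarity of its description tokens and change similarity of its touched file sets, then candidates are ranked by a weighted sum. The equal weighting, the Jaccard similarity choice, and the toy data below are illustrative assumptions.

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def combined_score(new, old, w_text=0.5, w_change=0.5):
    # textual similarity over title/description tokens,
    # change similarity over the sets of touched files
    text = jaccard(new["text"].lower().split(), old["text"].lower().split())
    change = jaccard(new["files"], old["files"])
    return w_text * text + w_change * change

existing = [
    {"id": 101, "text": "fix crash when config file missing",
     "files": ["src/config.c", "src/main.c"]},
    {"id": 102, "text": "add dark mode to settings page",
     "files": ["ui/settings.js", "ui/theme.css"]},
]
new_pr = {"text": "fix crash if config file is missing",
          "files": ["src/config.c"]}

# rank existing contributions as duplicate candidates for the new one
ranked = sorted(existing, key=lambda o: combined_score(new_pr, o), reverse=True)
print([(o["id"], round(combined_score(new_pr, o), 2)) for o in ranked])
```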