For rechargeable wireless sensor networks, limited energy storage capacity, dynamic energy supply, and low, dynamic duty cycles make it impractical to maintain a fixed routing path for packet delivery from a source to a destination in a distributed scenario. Therefore, before data delivery, a sensor has to update its waking schedule continuously and share it with its neighbors, which leads to high energy expenditure for frequently re-establishing path links and low energy-utilization efficiency when collecting packets. In this work, we propose a maximum-data-generation-rate routing protocol based on data flow control. A sensor neither shares its waking schedule with its neighbors nor caches the waking schedules of other sensors; hence, the energy consumed on time synchronization, location information, and waking-schedule sharing is reduced significantly, and the saved energy can be used to improve the data collection rate. Simulations show that our scheme efficiently improves the packet generation rate in rechargeable wireless sensor networks.
By analyzing some existing test data generation methods, a new automated test data generation approach was presented. The linear predicate functions on a given path were used directly to construct a linear constraint system for the input variables. Only when a predicate function is nonlinear does its linear arithmetic representation need to be computed. If all predicate functions on the given path are linear, either the desired test data or a guarantee that the path is infeasible can be obtained from the solution of the constraint system. Otherwise, iterative refinement of the input is required to obtain the desired test data. Theoretical analysis and test results show that the approach is simple and effective, and takes little computation. The scheme can also be used to generate path-based test data for programs with arrays and loops.
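As an illustrative sketch of the constraint-based idea described above: each branch predicate on a path contributes one linear constraint over the inputs, and solving (or failing to solve) the resulting system yields either test data or evidence of infeasibility. The predicates, domain, and two-variable restriction below are invented for illustration, not taken from the paper.

```python
# Each branch predicate on the path is a linear constraint a1*x + a2*y <= b,
# encoded as ((a1, a2), b). A solution of the system is a test input that
# drives execution down the path; an empty solution set suggests the path
# may be infeasible.
from itertools import product

def solve_linear_system(constraints, domain=range(-10, 11)):
    """Return the first (x, y) in the domain satisfying every constraint,
    or None if the system has no solution there."""
    for x, y in product(domain, repeat=2):
        if all(a1 * x + a2 * y <= b for (a1, a2), b in constraints):
            return (x, y)
    return None

# Path condition: x + y <= 5, x >= 0 (as -x <= 0), y >= 2 (as -y <= -2)
path = [((1, 1), 5), ((-1, 0), 0), ((0, -1), -2)]
print(solve_linear_system(path))  # → (0, 2), a feasible test input
print(solve_linear_system([((1, 0), 0), ((-1, 0), -1)]))  # x<=0 and x>=1 → None
```

A real implementation would use an LP/SMT solver rather than enumeration; the brute-force search just keeps the sketch self-contained.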
The automatic generation of test data is a key step in realizing automated testing. Most automated testing tools for unit testing only provide test case execution drivers and cannot generate test data that meets coverage requirements. This paper presents an improved whale genetic algorithm for generating the test data required for MC/DC coverage in unit testing. The proposed algorithm introduces an elite retention strategy to keep the genetic algorithm from falling into iterative degradation. At the same time, the mutation threshold of the whale algorithm is introduced to balance the global exploration and local search capabilities of the genetic algorithm. The threshold is dynamically adjusted according to the diversity and evolutionary stage of the current population, which positively guides the evolution of the population. Finally, an improved crossover strategy is proposed to accelerate the convergence of the algorithm. The improved whale genetic algorithm is compared with a genetic algorithm, the whale algorithm, and particle swarm optimization on two benchmark programs. The results show that the proposed algorithm generates test data faster than the comparison methods and provides better coverage with fewer evaluations, giving it clear advantages in test data generation.
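The two ingredients the abstract highlights, elite retention and a diversity-driven mutation threshold, can be sketched in a toy genetic loop. This is not the paper's algorithm: the fitness function (a branch-distance objective, minimize |x − 37|), the population parameters, and the crossover operator are all invented for illustration.

```python
import random

def evolve(pop_size=20, generations=200, seed=1):
    rng = random.Random(seed)
    pop = [rng.randint(0, 100) for _ in range(pop_size)]
    fitness = lambda x: abs(x - 37)           # 0 means the target branch is covered
    for _ in range(generations):
        pop.sort(key=fitness)
        if fitness(pop[0]) == 0:
            return pop[0]
        elite = pop[:2]                       # elite retention: best survive unchanged
        diversity = len(set(pop)) / len(pop)  # low diversity -> raise mutation rate
        mut_rate = 0.1 if diversity > 0.5 else 0.5
        children = []
        while len(children) < pop_size - len(elite):
            a, b = rng.sample(pop[:10], 2)    # parents drawn from the fitter half
            child = (a + b) // 2              # crossover: blend the parents
            if rng.random() < mut_rate:
                child = max(0, min(100, child + rng.randint(-5, 5)))
            children.append(child)
        pop = elite + children
    return min(pop, key=fitness)

print(evolve())  # typically converges to 37, the covering input
```

Elitism makes the best-so-far fitness monotone, while the diversity test plays the role of the paper's dynamically adjusted threshold: once the population collapses, mutation pressure increases to restore exploration.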
Testing is an integral part of software development. Current fast-paced system development has rendered traditional testing techniques obsolete; automated testing techniques are therefore needed to keep up with the speed of system development. Model-based testing (MBT) is a technique that uses system models to generate and execute test cases automatically. It was identified that test data generation (TDG) in many existing model-based test case generation (MB-TCG) approaches was still manual. Automatic and effective TDG can further reduce testing cost while detecting more faults. This study proposes an automated TDG approach for MB-TCG using the extended finite state machine (EFSM) model. The proposed approach integrates MBT with combinatorial testing. The information available in an EFSM model and the boundary value analysis strategy are used to automate the domain input classifications that were done manually in the existing approach. The results showed that the proposed approach detected 6.62 percent more faults than conventional MB-TCG but at the same time generated 43 more tests. The proposed approach detects faults effectively, but further treatment of the generated tests, such as test case prioritization, should be applied to increase the effectiveness and efficiency of testing.
Dynamic numerical simulation of water conditions is useful for reservoir management. In remote semi-arid areas, however, the meteorological and hydrological time-series data needed for computation are not frequently measured and must be obtained from other information. This paper presents a case study of data generation for the computation of thermal conditions in the Joumine Reservoir, Tunisia. Data from the Wind Finder web site and daily sunshine duration at the nearest weather stations were used to generate cloud cover and solar radiation data based on meteorological correlations obtained in Japan, which lies at the same latitude as Tunisia. A time series of inflow water temperature was estimated from air temperature using a numerical filter expressed as a linear second-order differential equation. A numerical simulation using a vertical 2-D (two-dimensional) turbulent flow model for a stratified water body with the generated data successfully reproduced seasonal thermal conditions in the reservoir, which were monitored using a thermistor chain.
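The filtering step above can be illustrated with a linear second-order low-pass filter that smooths an air-temperature series into a water-temperature estimate. The coefficients (omega, zeta), the time step, and the synthetic annual sine input are illustrative stand-ins, not the study's calibrated values or data.

```python
import math

def second_order_filter(air_temp, dt=1.0, omega=0.1, zeta=1.0):
    """Explicitly integrate T'' + 2*zeta*omega*T' + omega^2*T = omega^2*T_air."""
    T, dT = air_temp[0], 0.0          # start in equilibrium with the air
    out = []
    for Ta in air_temp:
        d2T = omega**2 * (Ta - T) - 2 * zeta * omega * dT
        dT += d2T * dt
        T += dT * dt
        out.append(T)
    return out

# Synthetic seasonal air temperature: an annual sine sampled daily
air = [15 + 10 * math.sin(2 * math.pi * d / 365) for d in range(365)]
water = second_order_filter(air)
# The filtered series has a smaller swing than the 20-degree air swing,
# and lags it, as expected of a damped low-pass response.
print(round(max(water) - min(water), 2))
```

With zeta = 1 the response is critically damped, so the estimate tracks the seasonal signal without overshoot while suppressing day-to-day noise.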
This paper addresses a special and imperceptible class of privacy, called implicit privacy. In contrast to traditional (explicit) privacy, implicit privacy has two essential properties: (1) it is not initially defined as a privacy attribute; (2) it is strongly associated with privacy attributes. In other words, attackers could exploit it to infer privacy attributes with a certain probability, indirectly resulting in the disclosure of private information. To deal with the implicit privacy disclosure problem, we give a measurable definition of implicit privacy and propose an ex-ante implicit privacy-preserving framework based on data generation, called IMPOSTER. The framework consists of an implicit privacy detection module and an implicit privacy protection module. The former uses normalized mutual information to detect implicit privacy attributes that are strongly related to traditional privacy attributes. Based on the idea of data generation, the latter equips the generative adversarial network (GAN) framework with an additional discriminator, which is used to eliminate the association between traditional privacy attributes and implicit ones. We elaborate a theoretical analysis of the convergence of the framework. Experiments demonstrate that, with the learned generator, IMPOSTER can alleviate the disclosure of implicit privacy while maintaining good data utility.
This paper explores the data theory of value along the line of reasoning from the epochal characteristics of data, through data-theoretical innovation, to paradigmatic transformation. Through a comparison of hard and soft factors and observation of the peculiar features of data, it concludes that data have the epochal characteristics of non-competitiveness and non-exclusivity, decreasing marginal cost and increasing marginal return, non-physical and intangible form, and non-finiteness and non-scarcity. It is these epochal characteristics of data that undermine the traditional theory of value and innovate the "production-exchange" theory, including data value generation, data value realization, data value rights determination, and data value pricing. From the perspective of data value generation, the levels of data quality, processing, use, and connectivity, data application scenarios, and data openness all influence data value. From the perspective of data value realization, data, as independent factors of production, show a value creation effect, create a value multiplier effect by empowering other factors of production, and substitute for other factors of production to create a zero-price effect. From the perspective of data value rights determination, based on the theory of property, the tragedy of the private outweighs the comedy of the private with respect to data, while based on the theory of the sharing economy, the comedy of the commons outweighs the tragedy of the commons. From the perspective of data pricing, standardized data products can be priced according to physical product attributes, and non-standardized data products according to virtual product attributes. Based on the epochal characteristics of data and this theoretical innovation, the "production-exchange" paradigm has undergone a transformation from "using tangible factors to produce tangible products and exchanging tangible products for tangible products" to "using intangible factors to produce tangible products and exchanging intangible products for tangible products" and ultimately to "using intangible factors to produce intangible products and exchanging intangible products for intangible products".
To improve the efficiency and coverage of stateful network protocol fuzzing, this paper proposes a new method that uses a rule-based state machine and a stateful rule tree to guide the generation of fuzz testing data. The method first builds a rule-based state machine model as a formal description of the states of a network protocol; this removes safe paths and cuts down the scale of the state space. It then uses a stateful rule tree to describe the relationship between states and messages, and removes useless items from it. According to the message sequence obtained by analyzing paths through the stateful rule tree and the protocol specification, an abstract data model for test case generation is defined. The fuzz testing data is produced by various generation algorithms that fill in the fields of the data model. Using the rule-based state machine and the stateful rule tree, the quantity of test data can be reduced. Experimental results indicate that our method discovers the same vulnerabilities as traditional approaches using less test data, optimizing test data generation and improving test efficiency.
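The field-filling step, generating concrete fuzz inputs by populating the fields of an abstract data model, can be sketched as follows. The message layout, keyword set, and boundary-value bias below are invented for illustration; the paper's model is derived from the protocol specification and rule tree.

```python
import random

MODEL = [                      # (field name, length in bytes, generator kind)
    ("command", 4, "keyword"),
    ("length",  2, "number"),
    ("payload", 8, "bytes"),
]
KEYWORDS = [b"USER", b"PASS", b"LIST", b"QUIT"]

def generate_case(rng):
    msg = b""
    for name, size, kind in MODEL:
        if kind == "keyword":
            msg += rng.choice(KEYWORDS)
        elif kind == "number":
            # boundary values are favoured to provoke length-handling bugs
            msg += rng.choice([0, 1, size, 0xFFFF]).to_bytes(2, "big")
        else:
            msg += bytes(rng.randrange(256) for _ in range(size))
    return msg

rng = random.Random(0)
cases = [generate_case(rng) for _ in range(5)]
print(len(cases), all(len(c) == 14 for c in cases))  # → 5 True
```

Because every case respects the model's structure, the fuzzer exercises deeper protocol states instead of being rejected at the parser, which is what lets the method achieve coverage with fewer inputs.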
Software testing is one of the most crucial and analytical aspects of assuring that developed software meets prescribed quality standards. The software development process invests at least 50% of its total cost in testing. Optimal and efficacious test data design is an important and challenging activity due to the nonlinear structure of software. Moreover, test case type and scope determine the quality of test data. To address this issue, software testing tools should employ intelligence-based soft computing techniques such as particle swarm optimization (PSO) and genetic algorithms (GA) to generate smart, efficient test data automatically. This paper presents a hybrid PSO- and GA-based heuristic for the automatic generation of test suites. We describe the design and implementation of the proposed strategy and evaluate our model in experiments with ten container classes from the Java standard library. We analyze our algorithm statistically with branch coverage as the test adequacy criterion. The performance criteria are percentage coverage per unit time and the percentage of faults detected by the generated test data. We compare our work with heuristics based on GA, PSO, existing hybrid GA-PSO strategies, and a memetic algorithm. The results show that test case generation in our approach is efficient.
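The hybrid idea, PSO velocity updates with a GA-style operator injected each iteration, can be sketched on a one-dimensional toy objective. The constants, the blend-crossover operator, and the quadratic objective are illustrative; the paper's branch-coverage fitness and exact operators are not reproduced here.

```python
import random

def hybrid_pso_ga(obj, lo=-10.0, hi=10.0, n=20, iters=300, seed=2):
    rng = random.Random(seed)
    xs = [rng.uniform(lo, hi) for _ in range(n)]
    vs = [0.0] * n
    pbest = xs[:]                          # each particle's personal best
    gbest = min(xs, key=obj)               # swarm-wide best
    for _ in range(iters):
        for i in range(n):
            r1, r2 = rng.random(), rng.random()
            vs[i] = 0.7 * vs[i] + 1.5 * r1 * (pbest[i] - xs[i]) + 1.5 * r2 * (gbest - xs[i])
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))
            if obj(xs[i]) < obj(pbest[i]):
                pbest[i] = xs[i]
        # GA step: replace the worst particle by a crossover of two personal bests
        a, b = rng.sample(pbest, 2)
        worst = max(range(n), key=lambda i: obj(xs[i]))
        xs[worst] = (a + b) / 2.0
        gbest = min(pbest + [gbest], key=obj)
    return gbest

best = hybrid_pso_ga(lambda x: (x - 3.0) ** 2)
print(round(best, 3))  # close to 3.0, the minimizer
```

The crossover injection recombines good personal bests, which is the GA contribution; PSO supplies the momentum-driven local refinement.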
Load modeling is one of the crucial tasks for improving smart grids' energy efficiency. Among many alternatives, machine learning-based load models have become popular in applications and have shown outstanding performance in recent years. The performance of these models highly relies on the quality and quantity of data available for training. However, gathering a sufficient amount of high-quality data is time-consuming and extremely expensive. In the last decade, generative adversarial networks (GANs) have demonstrated their potential to solve the data shortage problem by generating synthetic data learned from recorded/empirical data. Educated synthetic datasets can reduce the prediction error of electricity consumption when combined with empirical data, and they can be used to enhance risk management calculations. In this study we therefore propose RCGAN, TimeGAN, CWGAN, and RCWGAN, which take individual electricity consumption data as input to provide synthetic data. Our work focuses on one-dimensional time series, and numerical experiments on an empirical dataset show that GANs are indeed able to generate synthetic data with a realistic appearance.
Offline handwritten mathematical expression recognition is a challenging optical character recognition (OCR) task due to the various ambiguities of handwritten symbols and complicated two-dimensional structures. Recent work in this area usually constructs ever-deeper neural networks trained end-to-end to improve performance. However, the higher the complexity of the network, the more computing resources and time are required. To improve performance without additional computing requirements, we concentrate on the training data and the training strategy in this paper. We propose a data augmentation method that can generate synthetic samples with new LaTeX notations using only the official CROHME training data. Moreover, we propose a novel training strategy called Shuffled Multi-Round Training (SMRT) to regularize the model. With the generated data and the shuffled multi-round training strategy, we achieve state-of-the-art expression accuracy, 59.74% and 61.57% on CROHME 2014 and 2016, respectively, using attention-based encoder-decoder models for offline handwritten mathematical expression recognition.
The program slicing technique is employed to calculate the current values of variables at points of interest in software test data generation. This paper introduces the concept of statement domination to represent multiple nests and presents a dynamic program slicing algorithm based on forward analysis. In this approach, more attention is given to the statement itself or to its domination node, so computing program slices is easier and more accurate, especially for programs with multiple nests. In addition, a case study is discussed to illustrate the algorithm. Experimental results show that the slicing technique can be used in software test data generation to enhance its effectiveness.
To solve emerging complex optimization problems, multi-objective optimization algorithms are needed. By introducing a surrogate model for approximate fitness calculation, the multi-objective firefly algorithm with surrogate model (MOFA-SM) is proposed in this paper. First, the population is initialized according to a chaotic mapping. Second, the external archive is constructed based on preference sorting, with a lightweight clustering pruning strategy. In the process of evolution, elite solutions selected from the archive are used to guide the movement toward optimal solutions. Simulation results show that the proposed algorithm achieves better performance in terms of convergence iterations and stability.
Many search-based algorithms have been successfully applied in several software engineering activities. Genetic algorithms (GAs), which imitate the theory of natural selection and evolution, are the search algorithms most used by scholars to solve software testing problems. The harmony search algorithm (HSA), one of the most recent search algorithms, imitates the behavior of a musician seeking the best harmony. Scholars have assessed the similarities and differences between genetic algorithms and the harmony search algorithm in diverse research domains. The test data generation process is a critical task in software validation; however, no prior work compares the performance of genetic algorithms and the harmony search algorithm for test data generation. This paper studies the similarities and differences between genetic algorithms and the harmony search algorithm in terms of their ability and speed in finding the required test data. The current research performs an empirical comparison of the HSA and GAs, and the significance of the results is then estimated using the t-test. The study investigates the efficiency of the harmony search algorithm and genetic algorithms according to (1) time performance, (2) the significance of the generated test data, and (3) the adequacy of the generated test data in satisfying a given testing criterion. The results showed that the harmony search algorithm is significantly faster than the genetic algorithms: the t-test showed that the p-value for the time values is 0.026 < α (where α is the significance level, 0.05, at the 95% confidence level). In contrast, there is no significant difference between the two algorithms in generating adequate test data, because the t-test showed that the p-value for the fitness values is 0.25 > α.
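A minimal harmony search loop of the kind compared with GAs above can be sketched as follows. The parameter values (HMCR, PAR, bandwidth) are common textbook defaults, not the study's settings, and the quadratic objective stands in for a test-data fitness function.

```python
import random

def harmony_search(obj, lo=-10.0, hi=10.0, hm_size=10, iters=500, seed=3):
    rng = random.Random(seed)
    memory = [rng.uniform(lo, hi) for _ in range(hm_size)]   # harmony memory
    hmcr, par, bw = 0.9, 0.3, 0.5
    for _ in range(iters):
        if rng.random() < hmcr:                 # recall a stored harmony...
            x = rng.choice(memory)
            if rng.random() < par:              # ...and maybe adjust its pitch
                x = min(hi, max(lo, x + rng.uniform(-bw, bw)))
        else:                                   # ...or improvise a fresh one
            x = rng.uniform(lo, hi)
        worst = max(range(hm_size), key=lambda i: obj(memory[i]))
        if obj(x) < obj(memory[worst]):
            memory[worst] = x                   # better harmony replaces the worst
    return min(memory, key=obj)

best = harmony_search(lambda x: (x - 4.0) ** 2)
print(round(best, 2))  # near 4.0, the minimizer
```

Unlike a GA, each iteration improvises a single candidate from the whole memory rather than breeding a new population, which is one source of the runtime difference the comparison measures.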
ZTE Corporation (ZTE) announced on February 16, 2009 that its complete line of mobile broadband data cards would support Windows 7 and be compliant with the Windows Network Driver Interface Specification 6.20 (NDIS 6.20).
Discovering floating wastes, especially bottles on water, is a crucial research problem in environmental hygiene. Nevertheless, real-world applications often face challenges such as interference from irrelevant objects and the high cost of data collection. Consequently, devising algorithms that can accurately localize specific objects within a scene when annotated data is limited remains a formidable challenge. To solve this problem, this paper proposes an object-discovery-by-request problem setting and a corresponding algorithmic framework comprising pseudo-data generation and an object-discovery-by-request network. Pseudo-data generation produces images resembling natural scenes through various data augmentation rules, using a small number of object samples and scene images. The object-discovery-by-request network uses a pre-trained Vision Transformer (ViT) model as the backbone, employs object-centric methods to learn latent representations of foreground objects, and applies patch-level reconstruction constraints to the model. During the validation phase, we use the generated pseudo datasets as training sets and evaluate the performance of our model on the original test sets. Experiments show that our method achieves state-of-the-art performance on the Unmanned Aerial Vehicles-Bottle Detection (UAV-BD) dataset and the self-constructed dataset Bottle, especially in multi-object scenarios.
Software testing has been attracting a lot of attention in effective software development. In the model-driven approach, the Unified Modelling Language (UML) is a conceptual modelling approach for obligations and other features of the system; specialized tools interpret these models into other software artifacts such as code, test data, and documentation. Test case generation permits the determination of appropriate test data capable of ascertaining the requirements. This paper focuses on optimizing the test data obtained from UML activity and state chart diagrams by using a basic genetic algorithm (BGA). For generating the test cases, both diagrams are converted into their corresponding intermediate graphical forms, the Activity Diagram Graph (ADG) and the State Chart Diagram Graph (SCDG). The two graphs are then combined into a single graph, the Activity State Chart Diagram Graph (ASCDG). Next, the ASCDG is optimized using the BGA to generate the test data. A case study involving a withdrawal from a bank's automated teller machine (ATM) demonstrates the approach, which successfully identified defects in various ATM functions such as messaging and operation.
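The graph-combination step, merging an activity-diagram graph and a state-chart-diagram graph into one ASCDG, can be sketched as a union of node and edge sets. The ATM node names and the shared "dispense" vertex are hypothetical; the paper's construction may join the graphs differently.

```python
def merge_graphs(adg, scdg):
    """Union of the two edge sets over the union of the two node sets."""
    ascdg = {}
    for g in (adg, scdg):
        for node, succs in g.items():
            ascdg.setdefault(node, set()).update(succs)
            for s in succs:                  # successors become nodes too
                ascdg.setdefault(s, set())
    return ascdg

adg  = {"insert_card": {"enter_pin"}, "enter_pin": {"withdraw"}, "withdraw": {"dispense"}}
scdg = {"idle": {"authenticating"}, "authenticating": {"active"}, "active": {"dispense"}}
ascdg = merge_graphs(adg, scdg)
print(len(ascdg), sorted(ascdg["withdraw"]))  # → 7 ['dispense']
```

Paths through the merged graph then serve as chromosomes for the genetic search, with coverage of ASCDG edges as the fitness signal.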
This paper outlines research findings from an investigation into a range of options for generating vehicle data relevant to traffic management systems. Linking data from freight vehicles with traffic management systems stands to provide a number of benefits, including reducing congestion, improving safety, reducing freight vehicle trip times, informing alternative routing for freight vehicles, and informing transport planning and investment decisions. The paper explores a number of different methods to detect, classify, and track vehicles, each with strengths and weaknesses and with different levels of accuracy and associated costs. For freight management applications, the key capability is tracking the position of the vehicle in real time. This can be done with a range of technologies that are either located on the vehicle, such as GPS (global positioning system) trackers and RFID (radio frequency identification) tags, or part of the network infrastructure, such as CCTV (closed-circuit television) cameras, satellites, mobile phone towers, Wi-Fi receivers, and RFID readers. Technology in this space is advancing quickly, having started with a focus on infrastructure-based sensors and communications devices and more recently shifting to GPS and mobile devices. The paper concludes with an overview of considerations for how data from freight vehicles may interact with traffic management systems for mutual benefit. This new area of research and practice, which seeks to balance the needs of traffic management systems to better manage traffic and prevent bottlenecks and congestion while delivering tangible benefits to freight companies, stands to be of great interest in the coming decade. This research was developed with funding and support provided by Australia's SBEnrc (Sustainable Built Environment National Research Centre) and its partners.
At present, deep learning is applied successfully in many fields. However, due to the high complexity of the hypothesis space, numerous training samples are usually required to ensure reliable minimization of empirical risk, so training a classifier with a small number of examples is a challenging task. From a biological point of view, based on the assumption that rich prior knowledge and analogical association enable human beings to quickly distinguish novel things from a few or even one example, we propose a dynamic analogical association algorithm that lets the model classify using only a few labeled samples. Specifically, the algorithm searches prior knowledge for knowledge structures similar to existing tasks based on manifold matching, and combines sampling distributions to generate offsets instead of two sample points, thereby ensuring high confidence and a significant contribution to classification. Comparative results on two common benchmark datasets substantiate the superiority of the proposed method over existing data generation approaches for few-shot learning, and ablation experiments prove the effectiveness of the algorithm.
This work was supported by the National Natural Science Foundation of China (Grant No. 31670554) and the Natural Science Foundation of Jiangsu Province of China (Grant No. BK20161527), as well as by projects funded by the China Postdoctoral Science Foundation (Grant Nos. 2018T110505 and 2017M611828) and the Priority Academic Program Development (PAPD) of Jiangsu Higher Education Institutions. The authors wish to express their appreciation to the reviewers for their helpful suggestions, which greatly improved the presentation of this paper.
Funding: The research was funded by Universiti Teknologi Malaysia (UTM) and the Malaysian Ministry of Higher Education (MOHE) under the Industry-International Incentive Grant Scheme (IIIGS) (Vote Numbers: Q.J130000.3651.02M67 and Q.J130000.3051.01M86) and the Academic Fellowship Scheme (SLAM).
Abstract: Testing is an integral part of software development. Current fast-paced system development has rendered traditional testing techniques obsolete. Therefore, automated testing techniques are needed to keep pace with such development speed. Model-based testing (MBT) is a technique that uses system models to generate and execute test cases automatically. It was identified that test data generation (TDG) in many existing model-based test case generation (MB-TCG) approaches is still manual. An automatic and effective TDG can further reduce testing cost while detecting more faults. This study proposes an automated TDG approach in MB-TCG using the extended finite state machine (EFSM) model. The proposed approach integrates MBT with combinatorial testing. The information available in an EFSM model and the boundary value analysis strategy are used to automate the domain input classifications, which were done manually in the existing approach. The results showed that the proposed approach was able to detect 6.62 percent more faults than conventional MB-TCG, but at the same time generated 43 more tests. The proposed approach detects faults effectively, but further treatment of the generated tests, such as test case prioritization, should be done to increase the effectiveness and efficiency of testing.
Abstract: Dynamic numerical simulation of water conditions is useful for reservoir management. In remote semi-arid areas, however, the meteorological and hydrological time-series data needed for computation are not measured frequently and must be obtained from other information. This paper presents a case study of data generation for the computation of thermal conditions in the Joumine Reservoir, Tunisia. Data from the Wind Finder web site and daily sunshine duration at the nearest weather stations were utilized to generate cloud cover and solar radiation data, based on meteorological correlations obtained in Japan, which is located at the same latitude as Tunisia. A time series of inflow water temperature was estimated from air temperature using a numerical filter expressed as a linear second-order differential equation. A numerical simulation using a vertical two-dimensional (2-D) turbulent flow model for a stratified water body with the generated data successfully reproduced seasonal thermal conditions in the reservoir, which were monitored using a thermistor chain.
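The air-to-water temperature filter mentioned above can be sketched as a discretized linear second-order system, where water temperature tracks air temperature with lag and smoothing. The time constant, damping ratio, and explicit-Euler scheme below are illustrative assumptions, not the calibrated filter from the study.

```python
def second_order_filter(air_temp, dt=1.0, tau=5.0, zeta=1.0):
    """Integrate tau^2 * w'' + 2*zeta*tau * w' + w = T_air with explicit
    Euler steps, so w follows air temperature smoothly with a delay."""
    w, dw = air_temp[0], 0.0   # start in equilibrium with the first reading
    out = []
    for t_air in air_temp:
        ddw = (t_air - w - 2.0 * zeta * tau * dw) / (tau * tau)
        dw += ddw * dt
        w += dw * dt
        out.append(w)
    return out

# A hypothetical step in daily air temperature: the estimated inflow
# water temperature rises gradually rather than jumping.
air = [10.0] * 3 + [20.0] * 7
water = second_order_filter(air)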
Funding: Supported in part by the National Key Research and Development Program of China under Grant 2018YFB2100801, in part by the National Natural Science Foundation of China (NSFC) under Grant 61972287, and in part by the Fundamental Research Funds for the Central Universities under Grant 22120210524.
Abstract: This paper addresses a special and imperceptible class of privacy, called implicit privacy. In contrast to traditional (explicit) privacy, implicit privacy has two essential properties: (1) it is not initially defined as a privacy attribute; (2) it is strongly associated with privacy attributes. In other words, attackers could utilize it to infer privacy attributes with a certain probability, indirectly resulting in the disclosure of private information. To deal with the implicit privacy disclosure problem, we give a measurable definition of implicit privacy and propose an ex-ante implicit privacy-preserving framework based on data generation, called IMPOSTER. The framework consists of an implicit privacy detection module and an implicit privacy protection module. The former uses normalized mutual information to detect implicit privacy attributes that are strongly related to traditional privacy attributes. Based on the idea of data generation, the latter equips the Generative Adversarial Network (GAN) framework with an additional discriminator, which is used to eliminate the association between traditional privacy attributes and implicit ones. We elaborate a theoretical analysis of the convergence of the framework. Experiments demonstrate that, with the learned generator, IMPOSTER can alleviate the disclosure of implicit privacy while maintaining good data utility.
Funding: Funded by the "Management Model Innovation of Chinese Enterprises" Research Project, Institute of Industrial Economics, CASS (Grant No. 2019-gjs-06), and a project under the Graduate Student Scientific and Research Innovation Support Program, University of Chinese Academy of Social Sciences (Graduate School) (Grant No. 2022-KY-118).
Abstract: This paper explores the data theory of value along the line of reasoning from epochal characteristics to data-theoretical innovation to paradigmatic transformation. Through a comparison of hard and soft factors and observation of the peculiar features of data, it concludes that data have the epochal characteristics of non-competitiveness and non-exclusivity, decreasing marginal cost and increasing marginal return, non-physical and intangible form, and non-finiteness and non-scarcity. It is these epochal characteristics of data that undermine the traditional theory of value and innovate the "production-exchange" theory, including data value generation, data value realization, data value rights determination, and data value pricing. From the perspective of data value generation, the levels of data quality, processing, use, and connectivity, data application scenarios, and data openness influence data value. From the perspective of data value realization, data, as independent factors of production, show a value creation effect, create a value multiplier effect by empowering other factors of production, and substitute other factors of production to create a zero-price effect. From the perspective of data value rights determination, based on the theory of property, the tragedy of the private outweighs the comedy of the private with respect to data, while based on the theory of the sharing economy, the comedy of the commons outweighs the tragedy of the commons with respect to data. From the perspective of data pricing, standardized data products can be priced according to physical product attributes, and non-standardized data products can be priced according to virtual product attributes. Based on the epochal characteristics of data and theoretical innovation, the "production-exchange" paradigm has undergone a transformation from "using tangible factors to produce tangible products and exchanging tangible products for tangible products" to "using intangible factors to produce tangible products and exchanging intangible products for tangible products," and ultimately to "using intangible factors to produce intangible products and exchanging intangible products for intangible products."
Funding: Supported by the Key Project of the National Defense Basic Research Program of China (No. B1120132031) and by the Cultivation and Development Program for Technology Innovation Bases of the Beijing Municipal Science and Technology Commission (No. Z151100001615034).
Abstract: To improve the efficiency and coverage of stateful network protocol fuzzing, this paper proposes a new method that uses a rule-based state machine and a stateful rule tree to guide the generation of fuzz testing data. The method first builds a rule-based state machine model as a formal description of the states of a network protocol; this removes safe paths and cuts down the scale of the state space. It then uses a stateful rule tree to describe the relationship between states and messages, and removes useless items from it. According to the message sequence obtained by analyzing paths through the stateful rule tree and the protocol specification, an abstract data model for test case generation is defined. The fuzz testing data are produced by various generation algorithms that fill in the fields of the data model. Using the rule-based state machine and the stateful rule tree, the quantity of test data can be reduced. Experimental results indicate that our method can discover the same vulnerabilities as traditional approaches using less test data, while optimizing test data generation and improving test efficiency.
Abstract: Software testing is one of the most crucial and analytical aspects of assuring that developed software meets prescribed quality standards. The software development process invests at least 50% of the total cost in software testing. Designing optimal and efficacious test data is an important and challenging activity due to the nonlinear structure of software. Moreover, the type and scope of test cases determine the quality of test data. To address this issue, software testing tools should employ intelligence-based soft computing techniques such as particle swarm optimization (PSO) and genetic algorithms (GA) to generate smart and efficient test data automatically. This paper presents a hybrid PSO- and GA-based heuristic for the automatic generation of test suites. We describe the design and implementation of the proposed strategy and evaluate our model by performing experiments with ten container classes from the Java standard library. We analyzed our algorithm statistically with branch coverage as the test adequacy criterion. The performance criteria are percentage coverage per unit time and the percentage of faults detected by the generated test data. We compared our work with heuristics based on GA, PSO, existing hybrid strategies based on GA and PSO, and a memetic algorithm. The results showed that our test case generation is efficient.
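A hybrid PSO/GA loop of the general kind the abstract describes can be sketched as follows: particles move by PSO velocity updates, then a GA-style uniform crossover mixes each particle with the global best. The fitness function, parameter values, and crossover rule are illustrative assumptions only, not the paper's strategy or its branch-coverage fitness.

```python
import random

def hybrid_pso_ga(fitness, dim=2, swarm=20, iters=100, w=0.7, c1=1.4, c2=1.4):
    """Minimize `fitness` over R^dim with a PSO swarm plus a GA crossover step."""
    pos = [[random.uniform(-5, 5) for _ in range(dim)] for _ in range(swarm)]
    vel = [[0.0] * dim for _ in range(swarm)]
    pbest = [p[:] for p in pos]
    gbest = min(pos, key=fitness)[:]
    for _ in range(iters):
        for i in range(swarm):
            for d in range(dim):
                vel[i][d] = (w * vel[i][d]
                             + c1 * random.random() * (pbest[i][d] - pos[i][d])
                             + c2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            # GA step: uniform crossover of the particle with the global best,
            # accepted only if it improves the particle.
            child = [pos[i][d] if random.random() < 0.5 else gbest[d]
                     for d in range(dim)]
            if fitness(child) < fitness(pos[i]):
                pos[i] = child
            if fitness(pos[i]) < fitness(pbest[i]):
                pbest[i] = pos[i][:]
            if fitness(pos[i]) < fitness(gbest):
                gbest = pos[i][:]
    return gbest

# Demo on a simple sphere function (stand-in for a coverage-based fitness).
best = hybrid_pso_ga(lambda x: sum(v * v for v in x))
```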
Abstract: Load modeling is one of the crucial tasks for improving the energy efficiency of smart grids. Among many alternatives, machine learning-based load models have become popular in applications and have shown outstanding performance in recent years. The performance of these models relies heavily on the quality and quantity of data available for training. However, gathering a sufficient amount of high-quality data is time-consuming and extremely expensive. In the last decade, Generative Adversarial Networks (GANs) have demonstrated their potential to solve the data shortage problem by generating synthetic data learned from recorded/empirical data. Educated synthetic datasets can reduce the prediction error of electricity consumption when combined with empirical data, and they can be used to enhance risk management calculations. Therefore, in this study we propose RCGAN, TimeGAN, CWGAN, and RCWGAN, which take individual electricity consumption data as input to provide synthetic data. Our work focuses on one-dimensional time series, and numerical experiments on an empirical dataset show that GANs are indeed able to generate synthetic data with a realistic appearance.
Funding: The National Key Research and Development Program of China (No. 2020YFB1313602).
Abstract: Offline handwritten mathematical expression recognition is a challenging optical character recognition (OCR) task due to the various ambiguities of handwritten symbols and complicated two-dimensional structures. Recent work in this area usually constructs deeper and deeper neural networks trained with end-to-end approaches to improve performance. However, the higher the complexity of the network, the more computing resources and time are required. To improve performance without additional computing requirements, we concentrate on the training data and the training strategy in this paper. We propose a data augmentation method that can generate synthetic samples with new LaTeX notations using only the official training data of CROHME. Moreover, we propose a novel training strategy called Shuffled Multi-Round Training (SMRT) to regularize the model. With the generated data and the shuffled multi-round training strategy, we achieve state-of-the-art results in expression accuracy, i.e., 59.74% and 61.57% on CROHME 2014 and 2016, respectively, using attention-based encoder-decoder models for offline handwritten mathematical expression recognition.
Funding: The National Natural Science Foundation of China (No. 60473032), the Science and Technology Key Project of the Ministry of Education of China (No. 105018), and the Beijing Natural Science Foundation (No. 4072021).
Abstract: The program slicing technique is employed to calculate the current values of variables at points of interest in software test data generation. This paper introduces the concept of statement domination to represent multiple nests, and presents a dynamic program slicing algorithm based on forward analysis to generate dynamic slices. In the approach, more attention is given to the statement itself or to its dominating node, so computing program slices is easier and more accurate, especially for programs with multiple nests. In addition, a case study is discussed to illustrate the algorithm. Experimental results show that the slicing technique can be used in software test data generation to enhance effectiveness.
Abstract: To solve emerging complex optimization problems, multi-objective optimization algorithms are needed. By introducing a surrogate model for approximate fitness calculation, the multi-objective firefly algorithm with surrogate model (MOFA-SM) is proposed in this paper. First, the population is initialized according to a chaotic mapping. Second, the external archive is constructed based on preference sorting, with a lightweight clustering pruning strategy. In the process of evolution, elite solutions selected from the archive are used to guide the movement toward optimal solutions. Simulation results show that the proposed algorithm achieves better performance in terms of convergence iterations and stability.
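The chaotic-mapping initialization step listed above can be sketched with the logistic map, a common choice for this purpose; whether the paper uses the logistic map specifically, and these bounds and parameters, are assumptions for illustration.

```python
def chaotic_init(pop_size, dim, lower, upper, x0=0.7, r=4.0):
    """Generate an initial population by iterating the logistic map
    x <- r * x * (1 - x), which wanders chaotically in (0, 1) for r = 4,
    and scaling each iterate into [lower, upper]."""
    x = x0
    population = []
    for _ in range(pop_size):
        individual = []
        for _ in range(dim):
            x = r * x * (1.0 - x)                      # chaotic iterate
            individual.append(lower + x * (upper - lower))
        population.append(individual)
    return population

pop = chaotic_init(10, 3, -5.0, 5.0)
```

Compared with uniform random initialization, the chaotic sequence is deterministic yet non-repeating, which is often argued to spread the initial population more evenly over the search space.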
Abstract: Many search-based algorithms have been successfully applied to several software engineering activities. Genetic algorithms (GAs) are the most used by scholars in scientific domains to solve software testing problems; they imitate the theory of natural selection and evolution. The harmony search algorithm (HSA) is one of the most recent search algorithms; it imitates the behavior of a musician finding the best harmony. Scholars have estimated the similarities and differences between genetic algorithms and the harmony search algorithm in diverse research domains. The test data generation process represents a critical task in software validation. Unfortunately, no prior work compares the performance of genetic algorithms and the harmony search algorithm in the test data generation process. This paper studies the similarities and differences between genetic algorithms and the harmony search algorithm based on their ability and speed in finding the required test data. The current research performs an empirical comparison of the HSA and the GAs, and the significance of the results is then estimated using the t-test. The study investigates the efficiency of the harmony search algorithm and the genetic algorithms according to (1) time performance, (2) the significance of the generated test data, and (3) the adequacy of the generated test data to satisfy a given testing criterion. The results showed that the harmony search algorithm is significantly faster than the genetic algorithms, because the t-test showed that the p-value of the time values is 0.026 < α (where α is the significance level of 0.05 at the 95% confidence level). In contrast, there is no significant difference between the two algorithms in generating adequate test data, because the t-test showed that the p-value of the fitness values is 0.25 > α.
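The kind of significance check described above can be sketched with a two-sample t statistic over the run times of the two algorithms. The sample values below are made up for illustration, and Welch's variant is an assumption; the paper's data and exact test configuration are not reproduced here.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic for two independent samples with possibly
    unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(
        va / len(a) + vb / len(b))

# Hypothetical run times (seconds) of the two algorithms over 5 runs each.
hsa_times = [1.1, 0.9, 1.0, 1.2, 0.8]
ga_times = [1.6, 1.8, 1.5, 1.7, 1.9]
t = welch_t(hsa_times, ga_times)   # strongly negative: HSA is faster
```

A p-value would then be read off the t distribution with the Welch-Satterthwaite degrees of freedom and compared against α = 0.05, as in the abstract.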
Abstract: ZTE Corporation (ZTE) announced on February 16, 2009 that its complete line of mobile broadband data cards would support Windows 7 and be compliant with the Windows Network Driver Interface Specification 6.20 (NDIS 6.20).
Abstract: Discovering floating waste, especially bottles on water, is a crucial research problem in environmental hygiene. Nevertheless, real-world applications often face challenges such as interference from irrelevant objects and the high cost associated with data collection. Consequently, devising algorithms capable of accurately localizing specific objects within a scene when annotated data is limited remains a formidable challenge. To solve this problem, this paper proposes an object-discovery-by-request problem setting and a corresponding algorithmic framework. The proposed problem setting aims to identify specified objects in scenes, and the associated algorithmic framework comprises pseudo-data generation and an object-discovery-by-request network. Pseudo-data generation produces images resembling natural scenes through various data augmentation rules, using a small number of object samples and scene images. The network utilizes the pre-trained Vision Transformer (ViT) model as the backbone, employs object-centric methods to learn the latent representations of foreground objects, and applies patch-level reconstruction constraints to the model. During the validation phase, we use the generated pseudo datasets as training sets and evaluate the performance of our model on the original test sets. Experiments show that our method achieves state-of-the-art performance on the Unmanned Aerial Vehicles-Bottle Detection (UAV-BD) dataset and our self-constructed Bottle dataset, especially in multi-object scenarios.
基金support from the Deanship of Scientific Research,University of Hail,Saudi Arabia through the project Ref.(RG-191315).
Abstract: Software testing has been attracting a lot of attention for effective software development. In the model-driven approach, the Unified Modelling Language (UML) is a conceptual modelling approach for obligations and other features of the system. Specialized tools interpret these models into other software artifacts such as code, test data, and documentation. The generation of test cases permits the appropriate test data to be determined that have the aptitude to ascertain the requirements. This paper focuses on optimizing the test data obtained from UML activity and state chart diagrams by using a Basic Genetic Algorithm (BGA). For generating the test cases, both diagrams were converted into their corresponding intermediate graphical forms, namely the Activity Diagram Graph (ADG) and the State Chart Diagram Graph (SCDG). The two graphs were then joined to form a single graph called the Activity State Chart Diagram Graph (ASCDG). Next, the ASCDG was optimized using the BGA to generate the test data. A case study involving a withdrawal from an automated teller machine (ATM) of a bank was employed to demonstrate the approach. The approach successfully identified defects in various ATM functions such as messaging and operation.
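The ADG/SCDG merge into a single ASCDG can be sketched as a graph union plus bridge edges, after which each start-to-end path is a candidate test scenario. The node names, the bridge rule, and the toy ATM fragments below are illustrative assumptions, not the paper's actual graphs.

```python
def merge_graphs(g1, g2, bridge):
    """Union two adjacency-dict graphs and add bridge edges linking them."""
    merged = {n: set(e) for n, e in g1.items()}
    for n, e in g2.items():
        merged.setdefault(n, set()).update(e)
    for src, dst in bridge:
        merged.setdefault(src, set()).add(dst)
    return merged

def all_paths(graph, start, end, path=()):
    """Depth-first enumeration of simple start-to-end paths; each path
    corresponds to one test scenario to cover."""
    path = path + (start,)
    if start == end:
        yield path
        return
    for nxt in graph.get(start, ()):
        if nxt not in path:
            yield from all_paths(graph, nxt, end, path)

# Toy activity-diagram and state-chart fragments of an ATM withdrawal.
adg = {"insert_card": {"enter_pin"}, "enter_pin": {"withdraw"}}
scdg = {"idle": {"card_in"}, "card_in": {"dispensing"}}
ascdg = merge_graphs(adg, scdg, [("withdraw", "idle")])
paths = list(all_paths(ascdg, "insert_card", "dispensing"))
```

In the approach above, a BGA would then search for test data exercising each enumerated path; here only the graph construction is sketched.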
Funding: Funding and support provided by Australia's SBEnrc (Sustainable Built Environment National Research Centre) and its partners.
Abstract: This paper outlines research findings from an investigation into a range of options for generating vehicle data relevant to traffic management systems. Linking data from freight vehicles with traffic management systems stands to provide a number of benefits, including reducing congestion, improving safety, reducing freight vehicle trip times, informing alternative routing for freight vehicles, and informing transport planning and investment decisions. This paper explores a number of different methods to detect, classify, and track vehicles, each having strengths and weaknesses, and each with different levels of accuracy and associated costs. For freight management applications, the key feature is the capability to track the position of the vehicle in real time. This can be done using a range of technologies that either are located on the vehicle, such as GPS (Global Positioning System) trackers and RFID (Radio Frequency Identification) tags, or are part of the network infrastructure, such as CCTV (closed-circuit television) cameras, satellites, mobile phone towers, Wi-Fi receivers, and RFID readers. Technology in this space is advancing quickly, having started with a focus on infrastructure-based sensors and communications devices and more recently shifting to GPS and mobile devices. The paper concludes with an overview of considerations for how data from freight vehicles may interact with traffic management systems for mutual benefit. This new area of research and practice, which seeks to balance the needs of traffic management systems to better manage traffic and prevent bottlenecks and congestion while delivering tangible benefits to freight companies, stands to be of great interest in the coming decade.
Funding: This work was supported by the National Natural Science Foundation of China (No. 61402537), the Sichuan Science and Technology Program (Nos. 2019ZDZX0006 and 2020YFQ0056), the West Light Foundation of the Chinese Academy of Sciences (201899), the Talent Program of the Organization Department of the Sichuan Provincial Party Committee, and the Science and Technology Service Network Initiative (KFJ-STS-QYZD-2021-21-001).
Abstract: At present, deep learning has been applied successfully in many fields. However, due to the high complexity of the hypothesis space, numerous training samples are usually required to ensure the reliability of minimizing empirical risk. Therefore, training a classifier with a small number of training examples is a challenging task. From a biological point of view, based on the assumption that rich prior knowledge and analogical association enable human beings to quickly distinguish novel things from a few or even one example, we propose a dynamic analogical association algorithm that lets the model use only a few labeled samples for classification. Specifically, the algorithm searches prior knowledge for knowledge structures similar to those of existing tasks based on manifold matching, and combines sampling distributions to generate offsets instead of two sample points, thereby ensuring high confidence and a significant contribution to classification. Comparative results on two common benchmark datasets substantiate the superiority of the proposed method over existing data generation approaches for few-shot learning, and the effectiveness of the algorithm is proved through ablation experiments.