Abstract: Edge devices, due to their limited computational and storage resources, often rely on compilers for program optimization. Ensuring the security and reliability of these compilers is therefore of paramount importance in the emerging field of edge AI. Fuzz testing, a widely used testing method for this purpose, detects bugs by feeding random test cases into the target program. However, this process consumes significant time and resources. To improve the efficiency of compiler fuzz testing, test case prioritization techniques are commonly used. Some researchers use machine learning to predict the code coverage of test cases, aiming to maximize the testing capability for the target compiler by increasing the overall predicted coverage of the test cases. Nevertheless, these methods can only forecast the compiler's code coverage at a specific optimization level, potentially missing many optimization-related bugs. In this paper, we introduce C-CORE (short for Clustering by Code Representation), the first framework to prioritize test cases according to their code representations, which are derived directly from the source code. This approach is not tied to a specific compiler state and therefore extends to a broader range of compiler bugs. Specifically, we first train a scaled pre-trained programming language model to capture as many common features as possible from the test cases generated by a fuzzer. Using this pre-trained model, we then train two downstream models: one that predicts the likelihood of triggering a bug, and another that identifies code representations associated with bugs. Subsequently, we cluster the test cases according to their code representations and select the highest-scoring test case from each cluster as a high-quality test case. Reducing redundant test cases in this way saves time. Comprehensive evaluation results reveal that code representations are better at distinguishing testing capability, and that C-CORE significantly improves testing efficiency. Across four datasets, C-CORE increases the average percentage of faults detected (APFD) by 0.16 to 0.31 and reduces test time by over 50% in 46% of cases. Compared with the best results from approaches using predicted code coverage, C-CORE improves the APFD value by 1.1% to 12.3% and achieves an overall time saving of 159.1%.
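The abstract describes two concrete steps: keeping the highest-scoring test case per cluster, and scoring the resulting order with APFD. Below is a minimal Python sketch of both. It assumes that cluster labels and bug-likelihood scores already exist, so the pre-trained model and the clustering step are not reproduced. The function names are hypothetical. APFD uses the standard formula APFD = 1 - (TF_1 + ... + TF_m) / (n·m) + 1/(2n). Here n is the number of ordered tests, m is the number of faults, and TF_j is the 1-based position of the first test revealing fault j. As an assumption, this sketch counts only the faults revealed by the ordered tests.

```python
from typing import Dict, List, Sequence, Set


def select_per_cluster(labels: Sequence[int], scores: Sequence[float]) -> List[int]:
    """Keep the highest-scoring test index from each cluster, then order the
    survivors by descending score (a simple prioritized, de-duplicated suite)."""
    best: Dict[int, int] = {}
    for idx, (label, score) in enumerate(zip(labels, scores)):
        if label not in best or score > scores[best[label]]:
            best[label] = idx
    return sorted(best.values(), key=lambda i: scores[i], reverse=True)


def apfd(order: Sequence[int], faults: Dict[int, Set[str]]) -> float:
    """Average Percentage of Faults Detected for a test order.
    `faults` maps a test index to the set of fault ids it reveals."""
    n = len(order)
    first_pos: Dict[str, int] = {}
    for pos, test in enumerate(order, start=1):
        for fault in faults.get(test, set()):
            first_pos.setdefault(fault, pos)  # record only the first detection
    m = len(first_pos)
    if n == 0 or m == 0:
        raise ValueError("APFD needs at least one test and one detected fault")
    return 1 - sum(first_pos.values()) / (n * m) + 1 / (2 * n)


# Usage: three clusters; tests 1, 4 and 2 are the per-cluster winners.
suite = select_per_cluster([0, 0, 1, 1, 2], [0.2, 0.9, 0.5, 0.4, 0.7])
print(suite)  # [1, 4, 2]
print(apfd([0, 1, 2, 3], {1: {"a"}, 2: {"b"}, 3: {"a", "b"}}))  # 0.5
```

Ranking the per-cluster winners by score keeps one representative per region of the representation space. Putting the most bug-prone representatives first is what pushes APFD up.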
Funding: Supported by the National Natural Science Foundation of China (U21A20519).
Abstract: As edge devices such as smart homes, mobile phones, and wearable devices generate increasing amounts of data, it becomes crucial for many applications to deploy machine learning models across edge devices. The execution speed of a deployed model is key to ensuring service quality. In highly heterogeneous edge deployment scenarios, deep learning compilation is a novel approach to this problem. It defines models using domain-specific languages (DSLs) and generates efficient code implementations for different hardware devices. However, two aspects have not yet been thoroughly investigated: the optimization of memory-intensive operations, and the heterogeneity of deployment targets. To address them, in this work we propose a system that optimizes memory-intensive operations, optimizes subgraph distribution, and enables the compilation and deployment of DNN models on multiple targets. The evaluation results demonstrate the performance of the proposed system.
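The abstract does not say how the system optimizes memory-intensive operations. One common technique in deep learning compilers is to fuse chains of elementwise operators, so intermediate tensors are not written to memory and read back. The Python sketch below assumes a simple linear operator chain. It uses hypothetical names (`Op`, `ELEMENTWISE`, `fuse_elementwise`, `memory_traffic`) and a deliberately crude cost model in which each kernel reads one tensor and writes one tensor. It is not the proposed system's algorithm.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of operator kinds treated as elementwise (memory-bound).
ELEMENTWISE = {"add", "mul", "relu", "sigmoid"}


@dataclass
class Op:
    name: str
    kind: str  # e.g. "conv", "add", "relu"


def fuse_elementwise(chain: List[Op]) -> List[List[Op]]:
    """Group maximal runs of consecutive elementwise ops into one kernel each."""
    groups: List[List[Op]] = []
    for op in chain:
        prev_is_elementwise = bool(groups) and groups[-1][-1].kind in ELEMENTWISE
        if op.kind in ELEMENTWISE and prev_is_elementwise:
            groups[-1].append(op)  # extend the current fused kernel
        else:
            groups.append([op])  # start a new kernel
    return groups


def memory_traffic(num_kernels: int, tensor_bytes: int) -> int:
    """Crude cost model: each kernel reads one tensor and writes one tensor."""
    return 2 * tensor_bytes * num_kernels


chain = [
    Op("conv1", "conv"), Op("scale", "mul"), Op("shift", "add"),
    Op("act", "relu"), Op("conv2", "conv"), Op("gate", "sigmoid"),
]
kernels = fuse_elementwise(chain)
print([[op.name for op in k] for k in kernels])
# [['conv1'], ['scale', 'shift', 'act'], ['conv2'], ['gate']]
print(memory_traffic(len(chain), 1024), memory_traffic(len(kernels), 1024))
# 12288 8192
```

Even under this simplified model, fusing the three-operator elementwise run removes two round trips to memory. This is why memory-bound operators are a natural target for graph-level optimization.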
Funding: The author would like to thank the Deanship of Scientific Research at Majmaah University for supporting this work under Project No. R-2022-85.
Abstract: This paper addresses the challenge of transmitting a large number of files stored in a data center (DC), encrypting them with compilers, and sending them through a network within an acceptable time. Given the large number of files, a single compiler may not be sufficient to encrypt the data in an acceptable time. In this paper, we consider the problem with several compilers. The objective is to find an algorithm that produces an efficient schedule for compiling the given files on these compilers. The main objective of the work is to minimize the gap in the total size of the files assigned to different compilers. This minimization ensures a fair distribution of files among compilers. The problem is considered very hard. This paper presents two research axes. The first axis concerns architecture: we propose a novel pre-compiler architecture. The second axis is algorithmic development: we develop six algorithms to solve the problem. These algorithms are based on the dispatching rules method, the decomposition method, and an iterative approach, and they give approximate solutions to the studied problem. Experiments are conducted to show the performance of the algorithms, and several indicators are used to measure it. In addition, five classes of instances, totaling 2350 instances, are proposed to test the algorithms. The proposed algorithms are compared in several tables, which are discussed to show the performance of each algorithm. The results show that the best algorithm is the Iterative-mixed Smallest-Longest Heuristic (ISL), with a percentage of 97.7% and an average running time of 0.148 s. None of the other algorithms exceeded 22%. Excluding ISL, the best algorithm is the Iterative-mixed Longest-Smallest Heuristic (ILS), with a percentage of 21.4% and an average running time of 0.150 s.
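The abstract does not give the internals of the ISL and ILS heuristics. As a point of reference, the classic Longest Processing Time (LPT) dispatching rule illustrates the kind of rule these heuristics build on. LPT assigns files in decreasing size, each to the currently least-loaded compiler, and the gap is then measured as the largest load minus the smallest load. The Python sketch below uses LPT as a swapped-in stand-in, not the authors' algorithm, and `lpt_schedule` is a hypothetical name.

```python
import heapq
from typing import List, Tuple


def lpt_schedule(file_sizes: List[float], num_compilers: int) -> Tuple[List[List[int]], float]:
    """Longest-first dispatching: give each file (largest first) to the
    compiler with the smallest current load. Returns the per-compiler file
    indices and the gap between the most and least loaded compilers."""
    if num_compilers <= 0:
        raise ValueError("need at least one compiler")
    heap = [(0.0, c) for c in range(num_compilers)]  # (current load, compiler)
    heapq.heapify(heap)
    assignment: List[List[int]] = [[] for _ in range(num_compilers)]
    loads = [0.0] * num_compilers
    for i in sorted(range(len(file_sizes)), key=lambda i: file_sizes[i], reverse=True):
        load, c = heapq.heappop(heap)  # least-loaded compiler
        assignment[c].append(i)
        loads[c] = load + file_sizes[i]
        heapq.heappush(heap, (loads[c], c))
    return assignment, max(loads) - min(loads)


plan, gap = lpt_schedule([7, 5, 4, 3, 3, 2], 2)
print(plan, gap)  # [[0, 3, 5], [1, 2, 4]] 0.0
```

The min-heap keeps each assignment at O(log k) for k compilers. The iterative variants in the paper presumably refine such an initial schedule, but the abstract does not specify how.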
Funding: Supported by the Key R&D Project of Zhejiang Province (2018C01005), http://kjt.zj.gov.cn/.
Abstract: Many clothing enterprises plan their assembly lines inefficiently because bottleneck stations are insufficiently optimized. As a result, their production efficiency is low and production organization falls short of expectations. To address flexible process route planning in garment workshops, a multi-objective genetic algorithm (MOGA) is proposed to solve the assembly line balancing problem and minimize the machine adjustment path. The encoding uses an object-oriented path representation, and the initial population is generated by random topological sorting based on an in-degree selection mechanism. The algorithm adapts the mutation and crossover operations to the characteristics of the clothing process so that no invalid offspring are generated. During the iterations, bottleneck stations are optimized by reasonable process splitting. Process allocation respects each station's strict limit on the number of machines, which improves the compilation efficiency. Analysis of clothing cases demonstrates the effectiveness and feasibility of the MOGA. Compared with manual process allocation, MOGA increases the compilation efficiency by more than 15% and minimizes the machine adjustment path. The results are in line with the expected optimization effect.
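The initialization step, random topological sorting with in-degree selection, can be sketched as Kahn's algorithm. At each step, one process is picked at random among those whose in-degree has dropped to zero. Repeating this yields a population of valid, precedence-respecting process sequences. The Python sketch below assumes the precedence constraints are given as a predecessor map, with every process appearing as a key. The process names and `random_topological_order` are hypothetical.

```python
import random
from typing import Dict, List


def random_topological_order(preds: Dict[str, List[str]], rng: random.Random) -> List[str]:
    """Kahn's algorithm with a random choice among zero in-degree processes."""
    indegree = {v: len(ps) for v, ps in preds.items()}
    succs: Dict[str, List[str]] = {v: [] for v in preds}
    for v, ps in preds.items():
        for p in ps:
            succs[p].append(v)
    ready = [v for v, d in indegree.items() if d == 0]
    order: List[str] = []
    while ready:
        v = ready.pop(rng.randrange(len(ready)))  # random in-degree-0 pick
        order.append(v)
        for w in succs[v]:
            indegree[w] -= 1
            if indegree[w] == 0:
                ready.append(w)
    if len(order) != len(preds):
        raise ValueError("precedence graph contains a cycle")
    return order


# Hypothetical garment process graph: each key lists its predecessors.
processes = {
    "cut": [],
    "collar": ["cut"],
    "sleeve": ["cut"],
    "join": ["collar", "sleeve"],
    "hem": ["join"],
}
rng = random.Random(0)
population = [random_topological_order(processes, rng) for _ in range(4)]
```

Because every individual is a valid topological order from the start, the genetic operators only need to preserve precedence, not repair it. This matches the abstract's goal of avoiding invalid offspring.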
Funding: This work was supported in part by the Ministry of Science and Technology of Taiwan, R.O.C., under project grant number 108-2218-E-194-007.
Abstract: The diversity of software and hardware forces programmers to spend a great deal of time optimizing their source code, which often requires platform-specific treatment. The problem becomes critical on embedded devices, where computational and memory resources are strictly constrained. Compilers play an essential role in deploying source code on a target device through their backends. In this work, a novel backend for the Open Neural Network Compiler (ONNC) is proposed, which exploits machine learning to optimize code for ARM Cortex-M devices. The backend requires minimal changes to Open Neural Network Exchange (ONNX) models. Several novel optimization techniques are also incorporated into the backend, such as quantizing the ONNX model's weights and automatically tuning operator dimensions in computations. The performance of the proposed framework is evaluated on two applications. The first is handwritten digit recognition on the Modified National Institute of Standards and Technology (MNIST) dataset and model. The second is image classification on the Canadian Institute for Advanced Research 10 (CIFAR-10) dataset with the AlexNet-Light model. The system achieves 98.90% and 90.55% accuracy for handwritten digit recognition and image classification, respectively. Furthermore, the proposed architecture is significantly more lightweight than other state-of-the-art models in terms of both computation time and generated source code complexity. From a system perspective, this work provides a novel approach to deploying computations directly from available ONNX models to target devices through optimizing compilers while maintaining high accuracy.
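The abstract lists weight quantization among the backend's optimizations but does not specify the scheme. The Python sketch below shows per-tensor symmetric int8 quantization, one common choice for Cortex-M class targets. Each weight is approximated as `scale * q`, with q an integer in [-127, 127]. The scheme and the function names are assumptions for illustration, not ONNC's implementation. Plain Python lists stand in for tensors.

```python
from typing import List, Tuple


def quantize_symmetric_int8(weights: List[float]) -> Tuple[List[int], float]:
    """Per-tensor symmetric quantization: w is approximated by scale * q,
    with q clipped to [-127, 127] so the range is symmetric around zero."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0  # avoid division by zero
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: List[int], scale: float) -> List[float]:
    """Map int8 codes back to approximate float weights."""
    return [v * scale for v in q]


q, scale = quantize_symmetric_int8([-1.0, 0.2, 1.0])
print(q)  # [-127, 25, 127]
```

With round-to-nearest, the reconstruction error per weight is at most `scale / 2`. Storing int8 codes instead of float32 cuts weight memory by roughly 4x, which matters on microcontrollers.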
Funding: Supported by the National Key Research and Development Project of China (No. 2020AAA0104603), the National Natural Science Foundation of China (No. 61834005, 61772417), and the Shaanxi Province Key R&D Plan (No. 2021GY-029).
Abstract: To improve the inference efficiency of convolutional neural networks (CNNs), existing approaches mainly adopt heuristic and dynamic programming algorithms to realize parallel scheduling among operators. Heuristic scheduling algorithms easily become trapped in local optima, while dynamic programming algorithms take a long time to converge on models with complex structures. This paper studies parallel scheduling between operators and proposes an inter-operator parallelism schedule (IOPS) algorithm that guarantees an approximately minimal execution delay. First, a graph partitioning algorithm based on the largest block is designed to split the neural network model into multiple subgraphs. Then, operators that meet the defined conditions are replaced according to operator replacement rules. Finally, an optimal scheduling method based on backtracking is used to schedule the computational graph. Network models such as Inception-v3, ResNet-50, and RandWire are selected for testing. The experimental results show that the proposed algorithm achieves a 1.6× speedup over existing sequential execution methods.
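To illustrate backtracking-based scheduling, the Python sketch below handles one simplified case. It takes a set of mutually independent operators, such as the parallel branches of an Inception block, and assigns them to a fixed number of parallel streams so that the slowest stream is as fast as possible. It performs an exhaustive search with pruning. It does not reproduce the paper's graph partitioning or operator replacement rules. The per-operator latencies, the stream count, and `backtrack_schedule` are assumptions.

```python
from typing import List, Tuple


def backtrack_schedule(latencies: List[float], num_streams: int) -> Tuple[float, List[int]]:
    """Assign independent operators to parallel streams, minimizing the
    latency of the slowest stream via exhaustive backtracking with pruning."""
    if num_streams <= 0:
        raise ValueError("need at least one stream")
    order = sorted(range(len(latencies)), key=lambda i: latencies[i], reverse=True)
    loads = [0.0] * num_streams
    assign = [0] * len(latencies)
    best: list = [float("inf"), []]  # [best makespan, best assignment]

    def dfs(k: int) -> None:
        if k == len(order):
            makespan = max(loads)
            if makespan < best[0]:
                best[0], best[1] = makespan, assign.copy()
            return
        op = order[k]
        tried = set()
        for s in range(num_streams):
            if loads[s] in tried:  # streams with equal load are interchangeable
                continue
            tried.add(loads[s])
            if loads[s] + latencies[op] >= best[0]:
                continue  # prune: this branch cannot beat the best schedule
            loads[s] += latencies[op]
            assign[op] = s
            dfs(k + 1)
            loads[s] -= latencies[op]

    dfs(0)
    return best[0], best[1]


makespan, streams = backtrack_schedule([3.0, 3.0, 2.0, 2.0, 2.0], 2)
print(makespan)  # 6.0
```

Placing the longest operators first and skipping streams with identical loads prune most of the search tree. Even so, exhaustive search grows exponentially, which is consistent with the paper splitting the graph into subgraphs before scheduling.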
Abstract: Many texts depicting heroes have been included in the ministry edition of junior high school Chinese textbooks. They portray figures such as Deng Jiaxian, a scientist with an ardent love for the motherland who feared neither difficulty nor danger; Wen Yiduo, a patriot devoted to study with a strong sense of righteousness; Ye Shengtao, a scholar noted for his meticulous scholarship who attended to everything personally; and Alizer Buffy, an unknown old man devoted to planting trees in obscurity. Analyzing, understanding, and reading these heroic stories, and interpreting the significance and value of including heroic images in Chinese textbooks, serves several purposes:
- grasping the core qualities of Chinese courses and improving the teaching of heroic images;
- inheriting and carrying forward the fine civilization and historical and cultural traditions of humankind, and building national cultural self-confidence and pride;
- perfecting the value system of educators and empowering students to grow healthily;
- guiding teenagers with immature values toward healthy and positive values.
Abstract: A prototype landscape refers to the impressive scenes one has experienced in one's living environment before the age of 20. Based on an analysis of the existing literature, the authors compiled a standard scale-type questionnaire through a field survey on the influence of prototype landscape on one's landscape perception. Built mainly on a Likert scale, the questionnaire analyzes this influence along three dimensions: perception, attitude, and behavior. To further improve its rationality, the authors tested several other aspects of the questionnaire, including logical validity, construct validity, congeniality reliability, and split-half reliability. The results confirmed that the questionnaire has a sound theoretical structure and meets its validity targets. It can therefore evaluate the various influences of prototype landscape on one's landscape perception effectively and reliably. The questionnaire proposed in this study not only enriches research on prototype landscape in landscape design, but also provides an effective tool for quantitatively analyzing "the influences of prototype landscape on one's landscape perception".
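Split-half reliability, one of the checks listed above, is commonly computed by splitting the items into two halves, for example odd and even items. The procedure correlates each respondent's two half-totals and then applies the Spearman-Brown correction r_sb = 2r / (1 + r) to estimate full-length reliability. The Python sketch below shows this calculation. The odd-even split, the function names, and the sample responses are assumptions, since the abstract does not say how the halves were formed.

```python
from math import sqrt
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient (undefined if either input is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half_reliability(responses: List[Sequence[int]]) -> float:
    """Odd-even split of Likert items per respondent, then Spearman-Brown."""
    odd_totals = [sum(r[0::2]) for r in responses]  # items 1, 3, 5, ...
    even_totals = [sum(r[1::2]) for r in responses]  # items 2, 4, 6, ...
    r = pearson(odd_totals, even_totals)
    return 2 * r / (1 + r)


# Hypothetical 5-point Likert answers: one row per respondent.
sample = [[1, 1], [2, 3], [3, 2]]
print(round(split_half_reliability(sample), 4))  # 0.6667
```

The correction is needed because each half contains only half the items, so the raw half-to-half correlation underestimates the reliability of the full questionnaire.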
Abstract: This study examines whether the level of Certified Public Accounting (CPA) firm assurance associated with financial statements affects commercial lending decisions. A between-subjects behavioral experiment is used with three conditions involving different levels of CPA firm assurance: compilations, reviews, and audits. The findings indicate that neither the lenders' risk assessments of loan applicants nor their elicited probabilities of granting credit differed among the three levels of CPA firm assurance.
Abstract: This paper examines lake scenic areas that have abundant tourist resources but a less-developed economy. It analyzes the contradiction between the urgency of developing their tourist resources and the complexity of urban-rural planning compilation, and summarizes the resulting limitations as insufficient time and funds. The statutory planning contents included in the integrated compilation system are elaborated. The integrated plan for the Longhe Lake Scenic Area in the Taihang Mountains is taken as an example to introduce the planning concepts of the compilation technical system. Considering the characteristics of the study area, "regionality" is stressed as the foundation of planning compilation. Concise, convenient, and practical planning contents are advocated and further explained from the perspectives of compiling by layer and by category. It is therefore necessary to apply an integrated compilation mode under certain circumstances. This provides a new approach for planning compilation in other regions of China and promotes the economic and social development of local areas.
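Abstract: With the continuous development of SoC technology and the growing scale and complexity of integrated application designs, the traditional RTL design methodology is becoming increasingly difficult to use. High-level synthesis (HLS) can automatically convert an algorithm-level design described in C into a register-level design described in an HDL. The Synphony C Compiler synthesis tool is used to design Reed-Solomon (RS) encoding and decoding algorithms. The tool's fast architecture exploration and efficient verification methods are used to weigh requirements such as performance, area, and power. The algorithm is then rapidly converted from C to Verilog HDL. This design approach greatly shortens the design cycle.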
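For readers unfamiliar with the algorithm being synthesized, the sketch below shows a systematic RS encoder in the shift-register (LFSR) form that C code for HLS typically takes. It is written in Python. Its structure is one feedback value per input symbol and a fixed chain of constant multiply-XOR stages, which maps naturally to hardware registers. The code parameters are assumptions, since the abstract does not give them: GF(2^8) with primitive polynomial 0x11D, generator roots α^0 through α^(nsym-1), and the number of parity symbols `nsym`. All names are hypothetical.

```python
from typing import List

PRIM = 0x11D  # assumed primitive polynomial x^8 + x^4 + x^3 + x^2 + 1

# Exponent/log tables for GF(2^8) with generator alpha = 2.
EXP = [0] * 512
LOG = [0] * 256
_val = 1
for _i in range(255):
    EXP[_i] = _val
    LOG[_val] = _i
    _val <<= 1
    if _val & 0x100:
        _val ^= PRIM
for _i in range(255, 512):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two GF(2^8) elements via log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def generator_poly(nsym: int) -> List[int]:
    """g(x) = (x + a^0)(x + a^1)...(x + a^(nsym-1)), highest degree first.
    In GF(2^m), subtraction equals addition (XOR), so (x - a^i) = (x + a^i)."""
    g = [1]
    for i in range(nsym):
        nxt = g + [0]  # g(x) * x
        for j, c in enumerate(g):
            nxt[j + 1] ^= gf_mul(c, EXP[i])  # + g(x) * a^i
        g = nxt
    return g


def rs_encode(msg: List[int], nsym: int) -> List[int]:
    """Systematic encoder: append the remainder of msg(x) * x^nsym mod g(x).
    The loop mirrors an LFSR: one feedback multiply-XOR per parity register."""
    gen = generator_poly(nsym)
    parity = [0] * nsym
    for byte in msg:
        feedback = byte ^ parity[0]
        parity = parity[1:] + [0]  # shift the register
        for j in range(nsym):
            parity[j] ^= gf_mul(gen[j + 1], feedback)
    return list(msg) + parity


def poly_eval(poly: List[int], x: int) -> int:
    """Horner evaluation over GF(2^8), coefficients highest degree first."""
    y = 0
    for c in poly:
        y = gf_mul(y, x) ^ c
    return y


codeword = rs_encode(list(b"hello"), 4)
syndromes = [poly_eval(codeword, EXP[i]) for i in range(4)]
print(syndromes)  # [0, 0, 0, 0] for an uncorrupted codeword
```

Checking that all syndromes are zero is the same test a decoder performs first. In an HLS flow, the inner `for j` loop is the natural candidate for full unrolling, which yields one multiply-XOR stage per parity register.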