The massive computational complexity and memory requirements of artificial intelligence models impede their deployability on edge computing devices of the Internet of Things (IoT). While Power-of-Two (PoT) quantization has been proposed to improve the efficiency of edge inference for Deep Neural Networks (DNNs), existing PoT schemes require a huge amount of bit-wise manipulation and have large memory overhead, and their efficiency is bounded by the bottleneck of computation latency and memory footprint. To tackle this challenge, we present an efficient inference approach based on PoT quantization and model compression. An integer-only scalar PoT quantization (IOS-PoT) is designed jointly with a distribution loss regularizer, wherein the regularizer minimizes quantization errors and training disturbances. Additionally, a two-stage model compression is developed to effectively reduce memory requirements and alleviate bandwidth usage in communications of networked heterogeneous learning systems. The product look-up table (P-LUT) inference scheme is leveraged to replace bit-shifting with only indexing and addition operations, achieving low-latency computation and enabling efficient edge accelerators. Finally, comprehensive experiments on Residual Networks (ResNets) and efficient architectures with the Canadian Institute for Advanced Research (CIFAR), ImageNet, and Real-world Affective Faces Database (RAF-DB) datasets indicate that our approach achieves a 2× to 10× improvement in the reduction of both weight size and computation cost in comparison to state-of-the-art methods. A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array (FPGA) platform for accelerating convolution operations, with performance results showing that P-LUT reduces memory footprint by 1.45× and achieves more than 3× power efficiency and 2× resource efficiency compared to the conventional bit-shifting scheme.
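To make the idea of power-of-two weights and table-based products concrete, here is a minimal sketch. It is not the IOS-PoT or P-LUT design from the abstract, whose details are not given here. It assumes weights in [-1, 1], unsigned 4-bit activations, and seven exponent levels, and all function names (`pot_quantize`, `build_product_lut`, `lut_dot`) are hypothetical. Weights smaller than the smallest level are clamped to that level rather than pruned.

```python
import math

def pot_quantize(w, num_levels=7):
    """Map a real weight in [-1, 1] to (sign, exp), meaning sign * 2**-exp."""
    if w == 0.0:
        return 0, 0  # sign 0 encodes an exact zero weight
    sign = 1 if w > 0 else -1
    magnitude = min(abs(w), 1.0)
    # Round the exponent to the nearest integer and clamp it to the code range.
    exp = round(-math.log2(magnitude))
    exp = max(0, min(num_levels - 1, exp))
    return sign, exp

def build_product_lut(act_bits=4, num_levels=7):
    """lut[exp][a] = a * 2**(max_exp - exp), an integer for every pair."""
    max_exp = num_levels - 1
    return [[a << (max_exp - exp) for a in range(1 << act_bits)]
            for exp in range(num_levels)]

def lut_dot(acts, codes, lut):
    """Dot product using only table indexing and integer add/subtract."""
    acc = 0
    for a, (sign, exp) in zip(acts, codes):
        if sign > 0:
            acc += lut[exp][a]
        elif sign < 0:
            acc -= lut[exp][a]
    return acc  # result is scaled by 2**max_exp

weights = [0.5, -0.25, 1.0, 0.1]
acts = [3, 8, 1, 15]  # unsigned 4-bit activations (assumed)
codes = [pot_quantize(w) for w in weights]
lut = build_product_lut()
result = lut_dot(acts, codes, lut) / 2 ** 6  # undo the 2**max_exp scaling
```

In this sketch the shifts happen once, when the table is built; the inner loop of `lut_dot` only indexes and accumulates, which is the property the abstract attributes to table-based inference.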
[Objective] Real-time monitoring of cow ruminant behavior is of paramount importance for promptly obtaining relevant information about cow health and predicting cow diseases. Currently, various strategies have been proposed for monitoring cow ruminant behavior, including video surveillance, sound recognition, and sensor monitoring methods. However, the application of edge devices gives rise to the issue of inadequate real-time performance. To reduce the volume of data transmission and the cloud computing workload while achieving real-time monitoring of dairy cow rumination behavior, a real-time monitoring method for cow ruminant behavior based on edge computing was proposed. [Methods] Autonomously designed edge devices were utilized to collect and process six-axis acceleration signals from cows in real time. Based on these six-axis data, two distinct strategies, federated edge intelligence and split edge intelligence, were investigated for the real-time recognition of cow ruminant behavior. For the recognition method leveraging federated edge intelligence, the CA-MobileNet v3 network was proposed by enhancing the MobileNet v3 network with a collaborative attention mechanism. Additionally, a federated edge intelligence model was designed utilizing the CA-MobileNet v3 network and the FedAvg federated aggregation algorithm. In the study on split edge intelligence, a split edge intelligence model named MobileNet-LSTM was designed by integrating the MobileNet v3 network with a fusion collaborative attention mechanism and the Bi-LSTM network. [Results and Discussions] In comparative experiments with MobileNet v3 and MobileNet-LSTM, the federated edge intelligence model based on CA-MobileNet v3 achieved an average Precision, Recall, F1-Score, Specificity, and Accuracy of 97.1%, 97.9%, 97.5%, 98.3%, and 98.2%, respectively, yielding the best recognition performance. [Conclusions] This study provides a real-time and effective method for monitoring cow ruminant behavior, and the proposed federated edge intelligence model can be applied in practical settings.
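The FedAvg aggregation step named in the abstract can be sketched as a sample-weighted average of client parameters. This minimal example treats each client model as a flat list of floats; the function name `fedavg` and the toy numbers are illustrative assumptions, not taken from the study.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]

# Two hypothetical edge devices: one trained on 1 sample, one on 3 samples.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Weighting by sample count means a device that observed more data pulls the global model further toward its local update; only parameters, not raw sensor data, leave the device.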
Molecular Dynamics (MD) simulation for computing Interatomic Potential (IAP) is a very important High-Performance Computing (HPC) application. MD simulation of particles of experimental relevance takes a huge amount of computation time, even on an expensive high-end server. Heterogeneous computing, a combination of a Field Programmable Gate Array (FPGA) and a computer, is proposed as a solution for computing MD simulations efficiently. In such heterogeneous computation, communication between the FPGA and the computer is necessary. One such MD simulation, explained in the paper, is the Artificial Neural Network (ANN)-based IAP computation of gold (Au₁₄₇ and Au₃₀₉) nanoparticles. MD simulation calculates the forces between atoms and the total energy of the chemical system. This work proposes a novel design and implementation of an ANN IAP-based MD simulation for Au₁₄₇ and Au₃₀₉ using communication protocols, such as Universal Asynchronous Receiver-Transmitter (UART) and Ethernet, for communication between the FPGA and the host computer. To improve the latency of MD simulation through heterogeneous computing, the UART and Ethernet communication protocols were explored to conduct an MD simulation of 50,000 cycles. In this study, computation times of 17.54 and 18.70 h were achieved with UART and Ethernet, respectively, compared to the conventional server time of 29 h for Au₁₄₇ nanoparticles. The results pave the way for the development of a Lab-on-a-chip application.
The flexibility of traditional image processing systems is limited because those systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics, such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The performance of the parallel system is evaluated by practical experiment.
This paper focuses on time efficiency in machine vision and intelligent photogrammetry, in particular on a high-accuracy on-board real-time cloud detection method. With the development of technology, data acquisition capability is growing continuously and the volume of raw data is increasing explosively. Meanwhile, because of higher requirements on data accuracy, the computational load is also becoming heavier. This situation makes time efficiency extremely important. Moreover, the cloud cover rate of optical satellite imagery is up to approximately 50%, which seriously restricts the applications of on-board intelligent photogrammetry services. To meet on-board cloud detection requirements and offer valid input data to subsequent processing, this paper presents a stream-computing solution for high-accuracy on-board real-time cloud detection, which follows the “bottom-up” understanding strategy of machine vision and uses multiple embedded GPUs with significant potential for on-board application. Without external memory, the data-parallel pipeline system of this solution, built on multiple processing modules, supports “stream-in, processing, stream-out” real-time stream computing. In experiments, images from the GF-2 satellite are used to validate the accuracy and performance of this approach, and the results show that the solution not only improves cloud detection accuracy but also meets on-board real-time processing requirements.
Fog computing is an emerging paradigm that has broad applications, including storage, measurement, and control. In this paper, we propose a novel real-time notification protocol called RT-Notification for wireless control in fog computing. RT-Notification provides low-latency TDMA communication between an access point in the fog and a large number of portable monitoring devices equipped with sensors and actuators. RT-Notification differentiates two types of control: urgent downlink actuator-oriented control and normal uplink access and scheduling control. Unlike existing protocols, RT-Notification has two salient features: (i) it supports real-time notification of control frames without interrupting other ongoing transmissions, and (ii) it supports on-demand channel allocation for normal uplink access and scheduling control. RT-Notification can be implemented on commercial off-the-shelf 802.11 hardware. Our extensive simulations verify that RT-Notification is very effective in supporting these two features.
To eliminate the energy wasted by traditional static hardware multithreaded processors in real-time embedded systems operating under low workloads, the energy efficiency of hardware multithreading is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves energy by removing idle threads without modifying the original architecture, and it implements a seamless switching mechanism that protects active threads and avoids pipeline stalls during power mode switching. The synthesis-tool report for a dynamic multithreaded processor implemented in a 45 nm process indicates that the area of the dynamic multithreaded architecture is only 2.27% larger than that of the static one in achieving dynamic power dissipation, and that it consumes 1.3% more power at the same peak performance.
Block-in-matrix-soils (bimsoils) are geological mixtures that have distinct structures consisting of relatively strong rock blocks and weak matrix soils. It is still a challenge to evaluate the mechanical behaviors of bimsoils because of their heterogeneity, chaotic structure, and lithological variability. As a result, only very limited laboratory studies have been reported on the evolution of their internal deformation. In this study, the deformation evolution of bimsoils with a rock block percentage (RBP) of 40% under uniaxial loading is investigated using real-time X-ray computed tomography (CT) and an image correlation algorithm. Three parameters, i.e., the heterogeneity coefficient (K), correlation coefficient (CC), and standard deviation (STD) of displacement fields, are proposed to quantify the heterogeneity of the motion of the rock blocks and the progressive deformation of the bimsoils. Experimental results show that the rock blocks in bimsoils are prone to forming clusters with increasing loading, and the sliding surface goes around only one side of a cluster. Based on the movement of the rock blocks recorded by STD and CC, the progressive deformation of the bimsoils is quantitatively divided into three stages: initialization of the rotation of rock blocks, formation of rock block clusters, and formation of a shear band by rock blocks with significant rotation. Moreover, the experimental results demonstrate that the meso-motion of rock blocks controls the macroscopic mechanical properties of the samples.
Intelligent healthcare networks represent a significant component of digital applications, where the key requirements are quality-of-service (QoS) reliability and privacy protection. This paper addresses these requirements through the integration of enabler paradigms, including federated learning (FL), cloud/edge computing, software-defined/virtualized networking infrastructure, and converged prediction algorithms. The study focuses on achieving reliability and efficiency in real-time prediction models, which depend on the interaction flows and network topology. In response to these challenges, we introduce a modified version of federated logistic regression (FLR) that takes into account convergence latencies and the accuracy of the final FL model within healthcare networks. To establish the FLR framework for mission-critical healthcare applications, we provide a comprehensive workflow in this paper, covering framework setup, iterative round communications, and model evaluation/deployment. Our optimization process addresses the formulation of loss functions and gradients within the domain of federated optimization, and concludes with the generation of service experience batches for model deployment. To assess the practicality of our approach, we conducted experiments using a hypertension prediction model with data sourced from the 2019 annual dataset (Version 2.0.1) of the Korea Medical Panel Survey. Performance metrics, including end-to-end execution delays, model drop/delivery ratios, and final model accuracies, are captured and compared between the proposed FLR framework and other baseline schemes. Our study offers an FLR framework setup for the enhancement of real-time prediction modeling within intelligent healthcare networks, addressing the critical demands of QoS reliability and privacy preservation.
A user-programmable computational/control platform was developed at the University of Toronto that offers real-time hybrid simulation (RTHS) capabilities. The platform was verified previously using several linear physical substructures. The study presented in this paper is focused on further validating the RTHS platform using a nonlinear viscoelastic-plastic damper that has displacement-, frequency-, and temperature-dependent properties. The validation study includes damper component characterization tests, as well as RTHS of a series of single-degree-of-freedom (SDOF) systems equipped with viscoelastic-plastic dampers that represent different structural designs. From the component characterization tests, it was found that for a wide range of excitation frequencies and friction slip loads, the tracking errors are comparable to the errors in RTHS of linear spring systems. The hybrid SDOF results are compared to an independently validated thermal-mechanical viscoelastic model to further validate the ability of the platform to test nonlinear systems. After the validation, as an application study, nonlinear SDOF hybrid tests were used to develop performance spectra to predict the response of structures equipped with damping systems that are more challenging to model analytically. The use of the experimental performance spectra is illustrated by comparing the predicted response to the hybrid test response of 2DOF systems equipped with viscoelastic-plastic dampers.
Humans, as intricate beings driven by a multitude of emotions, possess a remarkable ability to decipher and respond to socio-affective cues. However, many individuals and machines struggle to interpret such nuanced signals, including variations in tone of voice. This paper explores the potential of intelligent technologies to bridge this gap and improve the quality of conversations. In particular, the authors propose a real-time processing method that captures and evaluates emotions in speech, utilizing a terminal device such as the Raspberry Pi computer. Furthermore, the authors provide an overview of the current research landscape surrounding speech emotion recognition and describe their methodology, which involves analyzing audio files from renowned emotional speech databases. To aid comprehension, the authors present visualizations of these audio files in situ, employing dB-scaled Mel spectrograms generated through TensorFlow and Matplotlib. The authors use a support vector machine kernel and a Convolutional Neural Network with transfer learning to classify emotions. Notably, the classification accuracies achieved are 70% and 77%, respectively, demonstrating the efficacy of the approach when executed on an edge device rather than relying on a server. The system can evaluate pure emotion in speech and provide corresponding visualizations to depict the speaker’s emotional state in less than one second on a Raspberry Pi. These findings pave the way for more effective and emotionally intelligent human-machine interactions in various domains.
Model predictive control (MPC) cannot readily be deployed in real-time control systems because its computation time is not well defined. A real-time fault-tolerant implementation algorithm based on imprecise computation is proposed for MPC, according to the solving process of the quadratic programming (QP) problem. In this algorithm, system stability is guaranteed even when the computation resources are not sufficient to finish the optimization completely. Through this kind of graceful degradation, the behavior of real-time control systems remains predictable and deterministic. The algorithm is demonstrated by experiments on a servomotor, and the simulation results show its effectiveness.
With the introduction of software-defined hardware by the DARPA Electronics Resurgence Initiative, software definition will become a basic attribute of information systems. Benefiting from the boundary certainty and algorithm aggregation of domain applications, domain-oriented computing architecture has become the technical direction that takes into account both the high flexibility and the efficiency of information systems. Targeting the characteristics of data-intensive computing in scenarios such as the Internet of Things (IoT), big data, and artificial intelligence (AI), this paper presents a domain-oriented software-defined computing architecture, discusses the hierarchical interconnection structure, the hybrid-granularity computing element, and its computational kernel extraction method, and finally demonstrates the flexibility and high efficiency of this architecture by experimental comparison.
Today, integrated circuit technology is approaching its physical limit. From the perspective of performance and energy consumption, reconfigurable computing is regarded as the most promising technology for future computing systems, with excellent computing and energy efficiency. From the perspective of computing performance, compared with the stagnating single-thread performance of general-purpose processors (GPPs), reconfigurable computing can customize hardware according to application requirements, so as to achieve higher performance and lower energy consumption. From the perspective of economics, a microchip based on reconfigurable computing technology has post-silicon reconfigurability and can be applied in different fields, so as to better share the non-recurring engineering (NRE) cost. High computing and energy efficiency, together with unique reconfigurability, make reconfigurable computing one of the most important technologies for artificial intelligence microchips.
AIM: To evaluate the usefulness of real-time virtual sonography (RVS) in biliary and pancreatic diseases. METHODS: This study included 15 patients with biliary and pancreatic diseases. RVS can be used to observe an ultrasound image in real time by merging the ultrasound image with a multiplanar reconstruction computed tomography (CT) image, using pre-scanned CT volume data. The ultrasound system used was the EUB-8500 with a convex probe EUP-C514. The RVS images were evaluated on 3 levels, namely excellent, good, and poor, according to the displacement in position. RESULTS: By combining the objectivity of CT with free scanning using RVS, it was possible to easily interpret the relationship between lesions and the surrounding organs as well as the position of vascular structures. The resulting evaluation levels of the RVS images were 12 excellent (pancreatic cancer, bile duct cancer, cholecystolithiasis, and cholangiocellular carcinoma) and 3 good (pancreatic cancer and gallbladder cancer). Compared with conventional B-mode ultrasonography and CT, RVS images achieved superior visualization in 80% of cases and better visualization in 20%. CONCLUSION: RVS has potential usefulness in objective visualization and diagnosis in the field of biliary and pancreatic diseases.
The Internet of Things (IoT) is a widely distributed network of devices that require only a small power supply and have limited storage and processing capacity. Cloud computing, on the other hand, has virtually unlimited storage and processing capabilities and is a much more mature technology. Therefore, the combination of Cloud computing and IoT can provide the best performance for users. Cloud computing nowadays supports lifesaving healthcare applications by collecting data from bedside devices, viewing patient information, and diagnosing in real time. There may be some concerns about the security of patients’ data and other issues, but the utilization of IoT and Cloud technologies in the healthcare industry would open a new era in the field of healthcare. To meet the basic healthcare needs of people in rural areas, we propose a Cloud-IoT-based smart healthcare system. In this system, various types of sensors (temperature, heartbeat, ECG, etc.) are installed on the patient side to sense the patient’s physiological data. To secure the data, an RSA-based authentication algorithm and mitigations for several security threats are used. The sensed data is processed and stored in the Cloud server. Stored data can be used by the authorized and/or concerned medical practitioner, upon approval by the user, for patient care.
Foreground moving object detection is an important process in various computer vision applications such as intelligent visual surveillance, HCI, and object-based video compression. One of the most successful moving object detection algorithms is based on the Adaptive Gaussian Mixture Model (AGMM). Although AGMM-based object detection shows very good performance with respect to detection accuracy, AGMM is a very complex model requiring a large amount of floating-point arithmetic, and therefore incurs an expensive computational cost. Thus, a direct implementation of AGMM-based object detection on embedded DSPs without floating-point hardware support cannot satisfy the real-time processing requirement. This paper presents a novel real-time implementation of the adaptive Gaussian mixture model-based moving object detection algorithm for fixed-point DSPs. In the proposed implementation, in addition to changing data types into fixed-point ones, a technique for magnifying the Gaussian distribution is introduced so that integer and fixed-point arithmetic can be easily and consistently used instead of real-number and floating-point arithmetic in processing the AGMM algorithm. Experimental results show that the proposed implementation has high potential in real-time applications.
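As a rough illustration of replacing floating-point arithmetic with fixed-point arithmetic in a Gaussian background model, the sketch below updates a single Gaussian per pixel using only integer operations. It is a generic Q-format example and does not reproduce the paper's magnification technique, which the abstract does not detail. The Q8 format, the learning rate of 2⁻⁴, the 2.5-sigma match threshold, and the simplified update rule are all assumptions, and the names are hypothetical.

```python
FRAC_BITS = 8    # assumed Q8 format: stored integer = real value * 2**8
ALPHA_SHIFT = 4  # assumed learning rate alpha = 2**-4, applied as a right shift

def to_fixed(x):
    """Convert a real number to its Q8 integer representation."""
    return int(round(x * (1 << FRAC_BITS)))

def to_float(q):
    """Convert a Q8 integer back to a real number (for inspection only)."""
    return q / (1 << FRAC_BITS)

def update_gaussian(mean_q, var_q, pixel):
    """Match an 8-bit pixel against one Gaussian and update it, integers only."""
    diff_q = (pixel << FRAC_BITS) - mean_q
    dist2_q = (diff_q * diff_q) >> FRAC_BITS  # squared distance, back in Q8
    # |x - mean| < 2.5 * sigma  is equivalent to  4 * d^2 < 25 * variance,
    # which avoids both the square root and the fractional constant.
    matched = 4 * dist2_q < 25 * var_q
    if matched:
        mean_q += diff_q >> ALPHA_SHIFT
        var_q += (dist2_q - var_q) >> ALPHA_SHIFT
    return mean_q, var_q, matched

mean_q, var_q, matched = update_gaussian(to_fixed(100.0), to_fixed(16.0), 104)
```

The right shift stands in for multiplication by the learning rate, so the per-pixel update needs one integer multiply, a few shifts, and additions, which suits DSPs without floating-point hardware.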
The Monte Carlo (MC) simulation is regarded as the gold standard for dose calculation in brachytherapy, but it consumes a large amount of computing resources. The development of heterogeneous computing makes it possible to substantially accelerate calculations with hardware accelerators. Accordingly, this study develops a fast MC tool, called THUBrachy, which can be accelerated by several types of hardware accelerators. THUBrachy can simulate photons with energy less than 3 MeV and considers all photon interactions in the energy range. It was benchmarked against the American Association of Physicists in Medicine Task Group No. 43 Report using a water phantom and validated with Geant4 using a clinical case. A performance test was conducted using the clinical case, showing that a multicore central processing unit, Intel Xeon Phi, and graphics processing unit (GPU) can efficiently accelerate the simulation. GPU-accelerated THUBrachy is the fastest version, which is 200 times faster than the serial version and approximately 500 times faster than Geant4. The proposed tool shows great potential for fast and accurate dose calculations in clinical applications.
Emerging memristive devices offer enormous advantages for applications such as non-volatile memories and in-memory computing (IMC), and there is rising interest in using memristive technologies for security applications in the era of the Internet of Things (IoT). In this review article, low-power design techniques based on emerging memristive technology for hardware security primitives/systems are presented, with the aim of achieving secure hardware systems in IoT. By reviewing the state of the art in three highlighted memristive application areas, i.e., memristive non-volatile memory, memristive reconfigurable logic computing, and memristive artificial intelligence computing, their application-level impacts on novel implementations of secret key generation, crypto functions, and machine learning attacks are explored, respectively. For low-power security applications in IoT, it is essential to understand how to best realize cryptographic circuitry using memristive circuits, to assess the implications of memristive crypto implementations for security, and to develop novel computing paradigms that will enhance their security. This review article aims to help researchers explore security solutions, analyze new possible threats, and develop corresponding protections for secure hardware systems based on low-cost memristive circuit designs.
The finite element (FE) method is a powerful tool and has been applied by investigators to real-time hybrid simulations (RTHSs). This study focuses on the computational efficiency, including the computational time and accuracy, of numerical integration in solving FE numerical substructures in RTHSs. First, sparse matrix storage schemes are adopted to decrease the computational time of the FE numerical substructure. In this way, the task execution time (TET) decreases, so that the scale of the numerical substructure model can increase. Subsequently, several commonly used explicit numerical integration algorithms, including the central difference method (CDM), the Newmark explicit method, the Chang method, and the Gui-λ method, are comprehensively compared to evaluate their computational time in solving the FE numerical substructure. CDM is better than the other explicit integration algorithms when the damping matrix is diagonal, while the Gui-λ (λ = 4) method is advantageous when the damping matrix is non-diagonal. Finally, the effect of time delay on the computational accuracy of RTHSs is investigated by simulating structure-foundation systems. Simulation results show that the influence of time delay on the displacement response becomes more pronounced as the mass ratio increases, and that delay compensation methods may reduce the relative error of the displacement peak value to less than 5% even under a large time step and large time delay.
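For readers unfamiliar with the central difference method mentioned above, the following sketch applies it to a single-degree-of-freedom system m·u″ + c·u′ + k·u = f(t). It is a scalar textbook form, not the FE substructure solver of the study; the function name and the test problem (undamped free vibration with unit mass and stiffness) are assumptions chosen because the exact solution cos(t) is known.

```python
import math

def central_difference(m, c, k, force, u0, v0, dt, n_steps):
    """Integrate m*u'' + c*u' + k*u = force(t) with the central difference method."""
    a0 = (force(0.0) - c * v0 - k * u0) / m
    u_prev = u0 - dt * v0 + 0.5 * dt * dt * a0  # fictitious displacement at t = -dt
    u_cur = u0
    k_eff = m / dt**2 + c / (2.0 * dt)   # multiplies u_{i+1}
    a_coef = m / dt**2 - c / (2.0 * dt)  # multiplies u_{i-1}
    b_coef = k - 2.0 * m / dt**2         # multiplies u_i
    history = [u0]
    for i in range(n_steps):
        f_i = force(i * dt)
        u_next = (f_i - a_coef * u_prev - b_coef * u_cur) / k_eff
        u_prev, u_cur = u_cur, u_next
        history.append(u_cur)
    return history  # history[i] approximates u(i * dt)

# Undamped free vibration: exact solution is u(t) = cos(t).
u = central_difference(1.0, 0.0, 1.0, lambda t: 0.0, 1.0, 0.0, 0.01, 100)
```

Each step needs only a division by `k_eff`. In the multi-degree-of-freedom case this becomes a solve with M/Δt² + C/(2Δt), which is trivial when the mass and damping matrices are diagonal; that is consistent with the abstract's observation that CDM is favorable for diagonal damping. The method is conditionally stable, so the time step must stay below 2/ω for the highest natural frequency ω.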
Funding: This work was supported by the Open Fund Project of the State Key Laboratory of Intelligent Vehicle Safety Technology (Grant No. IVSTSKL-202311), the Key Projects of the Science and Technology Research Programme of the Chongqing Municipal Education Commission (Grant No. KJZD-K202301505), the Cooperation Project between Chongqing Municipal Undergraduate Universities and Institutes Affiliated to the Chinese Academy of Sciences in 2021 (Grant No. HZ2021015), and the Chongqing Graduate Student Research Innovation Program (Grant No. CYS240801).
文摘Massive computational complexity and memory requirement of artificial intelligence models impede their deploy-ability on edge computing devices of the Internet of Things(IoT).While Power-of-Two(PoT)quantization is pro-posed to improve the efficiency for edge inference of Deep Neural Networks(DNNs),existing PoT schemes require a huge amount of bit-wise manipulation and have large memory overhead,and their efficiency is bounded by the bottleneck of computation latency and memory footprint.To tackle this challenge,we present an efficient inference approach on the basis of PoT quantization and model compression.An integer-only scalar PoT quantization(IOS-PoT)is designed jointly with a distribution loss regularizer,wherein the regularizer minimizes quantization errors and training disturbances.Additionally,two-stage model compression is developed to effectively reduce memory requirement,and alleviate bandwidth usage in communications of networked heterogenous learning systems.The product look-up table(P-LUT)inference scheme is leveraged to replace bit-shifting with only indexing and addition operations for achieving low-latency computation and implementing efficient edge accelerators.Finally,comprehensive experiments on Residual Networks(ResNets)and efficient architectures with Canadian Institute for Advanced Research(CIFAR),ImageNet,and Real-world Affective Faces Database(RAF-DB)datasets,indicate that our approach achieves 2×∼10×improvement in the reduction of both weight size and computation cost in comparison to state-of-the-art methods.A P-LUT accelerator prototype is implemented on the Xilinx KV260 Field Programmable Gate Array(FPGA)platform for accelerating convolution operations,with performance results showing that P-LUT reduces memory footprint by 1.45×,achieves more than 3×power efficiency and 2×resource efficiency,compared to the conventional bit-shifting scheme.
Abstract: [Objective] Real-time monitoring of cow ruminant behavior is of paramount importance for promptly obtaining relevant information about cow health and predicting cow diseases. Currently, various strategies have been proposed for monitoring cow ruminant behavior, including video surveillance, sound recognition, and sensor monitoring methods. However, the application of edge devices gives rise to the issue of inadequate real-time performance. To reduce the volume of data transmission and the cloud computing workload while achieving real-time monitoring of dairy cow rumination behavior, a real-time monitoring method for cow ruminant behavior based on edge computing was proposed. [Methods] Autonomously designed edge devices were utilized to collect and process six-axis acceleration signals from cows in real time. Based on these six-axis data, two distinct strategies, federated edge intelligence and split edge intelligence, were investigated for the real-time recognition of cow ruminant behavior. For the recognition method based on federated edge intelligence, the CA-MobileNet v3 network was proposed by enhancing the MobileNet v3 network with a collaborative attention mechanism. Additionally, a federated edge intelligence model was designed using the CA-MobileNet v3 network and the FedAvg federated aggregation algorithm. In the study on split edge intelligence, a split edge intelligence model named MobileNet-LSTM was designed by integrating the MobileNet v3 network with a fusion collaborative attention mechanism and the Bi-LSTM network. [Results and Discussions] In comparative experiments with MobileNet v3 and MobileNet-LSTM, the federated edge intelligence model based on CA-MobileNet v3 achieved an average Precision, Recall, F1-Score, Specificity, and Accuracy of 97.1%, 97.9%, 97.5%, 98.3%, and 98.2%, respectively, yielding the best recognition performance. [Conclusions] This study provides a real-time and effective method for monitoring cow ruminant behavior, and the proposed federated edge intelligence model can be applied in practical settings.
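The FedAvg aggregation step named in the abstract above can be sketched as a sample-count-weighted average of client parameters. This is a generic illustration only: the function name `fedavg`, the flat-list parameter layout, and the example numbers are assumptions, and nothing here reproduces the CA-MobileNet v3 model or the authors' training setup.

```python
def fedavg(client_params, client_sizes):
    """Average client parameter vectors, weighted by local sample counts.

    client_params: list of equal-length lists of floats (one list per client).
    client_sizes:  number of local training samples held by each client.
    """
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[i] * size for params, size in zip(client_params, client_sizes)) / total
        for i in range(n_params)
    ]


# Two hypothetical edge devices; the second holds three times as much data.
global_params = fedavg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
```

Only parameter vectors and sample counts leave each device in this scheme, which is why federated aggregation reduces the volume of raw sensor data sent to the cloud.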
Abstract: Molecular Dynamics (MD) simulation for computing Interatomic Potential (IAP) is a very important High-Performance Computing (HPC) application. MD simulation of particles of experimental relevance takes a huge amount of computation time, even on an expensive high-end server. Heterogeneous computing, a combination of a Field Programmable Gate Array (FPGA) and a computer, is proposed as a solution for computing MD simulations efficiently. In such heterogeneous computation, communication between the FPGA and the computer is necessary. One such MD simulation, explained in the paper, is the Artificial Neural Network (ANN)-based IAP computation of gold (Au₁₄₇ and Au₃₀₉) nanoparticles. MD simulation calculates the forces between atoms and the total energy of the chemical system. This work proposes a novel design and implementation of an ANN IAP-based MD simulation for Au₁₄₇ and Au₃₀₉ using communication protocols, such as Universal Asynchronous Receiver-Transmitter (UART) and Ethernet, for communication between the FPGA and the host computer. To improve the latency of MD simulation through heterogeneous computing, the UART and Ethernet communication protocols were explored to conduct an MD simulation of 50,000 cycles. In this study, computation times of 17.54 and 18.70 h were achieved with UART and Ethernet, respectively, compared to the conventional server time of 29 h for Au₁₄₇ nanoparticles. The results pave the way for the development of a Lab-on-a-chip application.
Funding: This project was supported by the National Natural Science Foundation of China (60135020).
Abstract: The flexibility of traditional image processing systems is limited because such systems are designed for specific applications. In this paper, a new TMS320C64x-based multi-DSP parallel computing architecture is presented. It has many promising characteristics, such as powerful computing capability, broad I/O bandwidth, topology flexibility, and expansibility. The performance of the parallel system is evaluated by practical experiment.
Funding: The National Natural Science Foundation of China (91438203, 91638301, 91438111, 41601476).
Abstract: This paper focuses on time efficiency in machine vision and intelligent photogrammetry, especially a high-accuracy on-board real-time cloud detection method. With the development of technology, data acquisition ability is growing continuously and the volume of raw data is increasing explosively. Meanwhile, because of higher requirements on data accuracy, the computation load is also becoming heavier. This situation makes time efficiency extremely important. Moreover, the cloud cover rate of optical satellite imagery is up to approximately 50%, which seriously restricts the applications of on-board intelligent photogrammetry services. To meet on-board cloud detection requirements and offer valid input data to subsequent processing, this paper presents a stream-computing solution for high-accuracy on-board real-time cloud detection, which follows the "bottom-up" understanding strategy of machine vision and uses multiple embedded GPUs with significant potential to be applied on board. Without external memory, the data-parallel pipeline system based on the multiple processing modules of this solution supports "stream-in, processing, stream-out" real-time stream computing. In experiments, images from the GF-2 satellite are used to validate the accuracy and performance of this approach, and the experimental results show that this solution not only improves cloud detection accuracy but also meets on-board real-time processing requirements.
Funding: Supported by Macao FDCT-MOST grant 001/2015/AMJ and Macao FDCT grants 005/2016/A1 and 056/2017/A2.
Abstract: Fog computing is an emerging paradigm that has broad applications, including storage, measurement, and control. In this paper, we propose a novel real-time notification protocol called RT-Notification for wireless control in fog computing. RT-Notification provides low-latency TDMA communication between an access point in the fog and a large number of portable monitoring devices equipped with sensors and actuators. RT-Notification differentiates two types of control: urgent downlink actuator-oriented control and normal uplink access and scheduling control. Unlike existing protocols, RT-Notification has two salient features: (i) it supports real-time notification of control frames without interrupting other ongoing transmissions, and (ii) it supports on-demand channel allocation for normal uplink access and scheduling control. RT-Notification can be implemented on commercial off-the-shelf 802.11 hardware. Our extensive simulations verify that RT-Notification is very effective in supporting the above two features.
Funding: Supported partially by the National High Technical Research and Development Program of China (863 Program) under Grants No. 2011AA040101 and No. 2008AA01Z134; the National Natural Science Foundation of China under Grants No. 61003251, No. 61172049, and No. 61173150; the Doctoral Fund of the Ministry of Education of China under Grant No. 20100006110015; the Beijing Municipal Natural Science Foundation under Grant No. Z111100054011078; and the 2012 Ladder Plan Project of the Beijing Key Laboratory of Knowledge Engineering for Materials Science under Grant No. Z121101002812005.
Abstract: To eliminate the energy wasted by traditional static hardware multithreaded processors when real-time embedded systems operate under low workload, the energy efficiency of hardware multithreading is discussed and a novel dynamic multithreaded architecture is proposed. The proposed architecture saves the wasted energy by removing idle threads without modifying the original architecture, and implements a seamless switching mechanism that protects active threads and avoids pipeline stalls during power mode switching. The synthesis report for a dynamic multithreaded processor implemented in a 45 nm process indicates that the area of the dynamic multithreaded architecture is only 2.27% larger than that of the static one while achieving dynamic power dissipation, and that it consumes 1.3% more power at the same peak performance.
Funding: This work was supported by the National Natural Science Foundation of China (Grant Nos. 41972287 and 42090023) and the Second Tibetan Plateau Scientific Expedition and Research Program (STEP) (Grant No. 2019QZKK0904).
Abstract: Block-in-matrix-soils (bimsoils) are geological mixtures that have distinct structures consisting of relatively strong rock blocks and weak matrix soils. It is still a challenge to evaluate the mechanical behaviors of bimsoils because of their heterogeneity, chaotic structure, and lithological variability. As a result, only very limited laboratory studies have been reported on the evolution of their internal deformation. In this study, the deformation evolution of bimsoils with a rock block percentage (RBP) of 40% under uniaxial loading is investigated using real-time X-ray computed tomography (CT) and an image correlation algorithm. Three parameters, i.e., the heterogeneity coefficient (K), correlation coefficient (CC), and standard deviation (STD) of displacement fields, are proposed to quantify the heterogeneity of the motion of the rock blocks and the progressive deformation of the bimsoils. Experimental results show that the rock blocks in bimsoils are prone to forming clusters with increasing loading, and the sliding surface goes around only one side of a cluster. Based on the movement of the rock blocks recorded by STD and CC, the progressive deformation of the bimsoils is quantitatively divided into three stages: initialization of the rotation of rock blocks, formation of rock block clusters, and formation of a shear band by rock blocks with significant rotation. Moreover, the experimental results demonstrate that the meso-motion of rock blocks controls the macroscopic mechanical properties of the samples.
Funding: Supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. RS2022-00167197, Development of Intelligent 5G/6G Infrastructure Technology for the Smart City); in part by the National Research Foundation of Korea (NRF), Ministry of Education, through the Basic Science Research Program under Grant NRF-2020R1I1A3066543; in part by BK21 FOUR (Fostering Outstanding Universities for Research) under Grant 5199990914048; and in part by the Soonchunhyang University Research Fund.
Abstract: Intelligent healthcare networks represent a significant component of digital applications, where the requirements center on quality-of-service (QoS) reliability and safeguarding privacy. This paper addresses these requirements through the integration of enabler paradigms, including federated learning (FL), cloud/edge computing, software-defined/virtualized networking infrastructure, and converged prediction algorithms. The study focuses on achieving reliability and efficiency in real-time prediction models, which depend on the interaction flows and network topology. In response to these challenges, we introduce a modified version of federated logistic regression (FLR) that takes into account convergence latencies and the accuracy of the final FL model within healthcare networks. To establish the FLR framework for mission-critical healthcare applications, we provide a comprehensive workflow in this paper, introducing framework setup, iterative round communications, and model evaluation/deployment. Our optimization process delves into the formulation of loss functions and gradients within the domain of federated optimization, and concludes with the generation of service experience batches for model deployment. To assess the practicality of our approach, we conducted experiments using a hypertension prediction model with data sourced from the 2019 annual dataset (Version 2.0.1) of the Korea Medical Panel Survey. Performance metrics, including end-to-end execution delays, model drop/delivery ratios, and final model accuracies, are captured and compared between the proposed FLR framework and other baseline schemes. Our study offers an FLR framework setup for the enhancement of real-time prediction modeling within intelligent healthcare networks, addressing the critical demands of QoS reliability and privacy preservation.
Funding: NSERC Discovery (Grant 371627-2009) and NSERC RTI (Grant 374707-2009 EQPEQ) programs.
Abstract: A user-programmable computational/control platform that offers real-time hybrid simulation (RTHS) capabilities was developed at the University of Toronto. The platform was verified previously using several linear physical substructures. The study presented in this paper focuses on further validating the RTHS platform using a nonlinear viscoelastic-plastic damper that has displacement-, frequency-, and temperature-dependent properties. The validation study includes damper component characterization tests, as well as RTHS of a series of single-degree-of-freedom (SDOF) systems equipped with viscoelastic-plastic dampers that represent different structural designs. From the component characterization tests, it was found that for a wide range of excitation frequencies and friction slip loads, the tracking errors are comparable to the errors in RTHS of linear spring systems. The hybrid SDOF results are compared to an independently validated thermal-mechanical viscoelastic model to further validate the ability of the platform to test nonlinear systems. After the validation, as an application study, nonlinear SDOF hybrid tests were used to develop performance spectra to predict the response of structures equipped with damping systems that are more challenging to model analytically. The use of the experimental performance spectra is illustrated by comparing the predicted response to the hybrid test response of 2DOF systems equipped with viscoelastic-plastic dampers.
Abstract: Humans, as intricate beings driven by a multitude of emotions, possess a remarkable ability to decipher and respond to socio-affective cues. However, many individuals and machines struggle to interpret such nuanced signals, including variations in tone of voice. This paper explores the potential of intelligent technologies to bridge this gap and improve the quality of conversations. In particular, the authors propose a real-time processing method that captures and evaluates emotions in speech, utilizing a terminal device such as the Raspberry Pi computer. Furthermore, the authors provide an overview of the current research landscape surrounding speech emotion recognition and describe their methodology, which involves analyzing audio files from renowned emotional speech databases. To aid comprehension, the authors present visualizations of these audio files in situ, employing dB-scaled Mel spectrograms generated through TensorFlow and Matplotlib. The authors use a support vector machine kernel and a Convolutional Neural Network with transfer learning to classify emotions. Notably, the classification accuracies achieved are 70% and 77%, respectively, demonstrating the efficacy of the approach when executed on an edge device rather than relying on a server. The system can evaluate pure emotion in speech and provide corresponding visualizations to depict the speaker's emotional state in less than one second on a Raspberry Pi. These findings pave the way for more effective and emotionally intelligent human-machine interactions in various domains.
Abstract: Model predictive control (MPC) is difficult to deploy in real-time control systems because its computation time is not well defined. A real-time fault-tolerant implementation algorithm based on imprecise computation is proposed for MPC, following the solution process of the quadratic programming (QP) problem. In this algorithm, system stability is guaranteed even when computation resources are insufficient to complete the optimization. Through this kind of graceful degradation, the behavior of real-time control systems remains predictable and deterministic. The algorithm is demonstrated by experiments on a servomotor, and the simulation results show its effectiveness.
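The imprecise-computation idea in the abstract above (a mandatory part that always finishes, plus optional refinement that may be cut short) can be sketched with an iteration-budgeted solver for a box-constrained QP. This is not the paper's algorithm and carries no stability proof: the solver choice (projected gradient descent), the names `qp_cost` and `anytime_box_qp`, and the example problem are all assumptions. The property it illustrates is that every iterate is feasible and, for a convex QP with a step size no larger than the reciprocal of the largest eigenvalue of `H`, the cost never increases, so stopping early still yields a usable control input.

```python
def qp_cost(H, g, x):
    """Cost 0.5 * x^T H x + g^T x for a small dense problem."""
    n = len(x)
    quad = sum(x[i] * H[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * quad + sum(g[i] * x[i] for i in range(n))


def anytime_box_qp(H, g, lo, hi, x0, step, max_iters):
    """Projected gradient descent that can be stopped after any iteration.

    Mandatory part: clip the warm start x0 into the box [lo, hi].
    Optional part:  up to max_iters refinement steps (the iteration budget
                    stands in for the time left before the control deadline).
    """
    n = len(x0)
    x = [min(max(v, lo), hi) for v in x0]
    for _ in range(max_iters):
        grad = [sum(H[i][j] * x[j] for j in range(n)) + g[i] for i in range(n)]
        x = [min(max(x[i] - step * grad[i], lo), hi) for i in range(n)]
    return x


# Hypothetical two-input problem with input limits 0 <= u <= 3.
H = [[2.0, 0.0], [0.0, 2.0]]
g = [-2.0, -8.0]
x_few = anytime_box_qp(H, g, 0.0, 3.0, [0.0, 0.0], step=0.25, max_iters=2)
x_many = anytime_box_qp(H, g, 0.0, 3.0, [0.0, 0.0], step=0.25, max_iters=60)
```

With a tight budget the result is less optimal but still within the input limits, which is one way to read the "graceful degradation" the abstract refers to.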
Funding: Supported by the National Science and Technology Major Project under Grant No. 2016ZX01012101.
Abstract: With the introduction of software-defined hardware by the DARPA Electronics Resurgence Initiative, software definition will become a basic attribute of information systems. Benefiting from the boundary certainty and algorithm aggregation of domain applications, domain-oriented computing architecture has become the technical direction that balances the high flexibility and efficiency of information systems. Aiming at the characteristics of data-intensive computing in different scenarios such as the Internet of Things (IoT), big data, and artificial intelligence (AI), this paper presents a domain-oriented software-defined computing architecture, discusses the hierarchical interconnection structure, the hybrid-granularity computing element, and its computational kernel extraction method, and finally demonstrates the flexibility and high efficiency of this architecture by experimental comparison.
Abstract: Today, integrated circuit technology is approaching its physical limits. From the perspective of performance and energy consumption, reconfigurable computing is regarded as the most promising technology for future computing systems, with excellent computing and energy efficiency. From the perspective of computing performance, in contrast to the stagnating single-thread performance of general-purpose processors (GPPs), reconfigurable computing can customize hardware according to application requirements so as to achieve higher performance and lower energy consumption. From the perspective of economics, a microchip based on reconfigurable computing technology has post-silicon reconfigurability and can be applied in different fields, so that the non-recurring engineering (NRE) cost is better shared. High computing and energy efficiency together with unique reconfigurability make reconfigurable computing one of the most important technologies for artificial intelligence microchips.
Abstract: AIM: To evaluate the usefulness of real-time virtual sonography (RVS) in biliary and pancreatic diseases. METHODS: This study included 15 patients with biliary and pancreatic diseases. RVS can be used to observe an ultrasound image in real time by merging the ultrasound image with a multiplanar reconstruction computed tomography (CT) image, using pre-scanned CT volume data. The ultrasound system used was the EUB-8500 with a convex probe EUP-C514. The RVS images were evaluated at 3 levels, namely excellent, good, and poor, according to the displacement in position. RESULTS: By combining the objectivity of CT with free scanning using RVS, it was possible to easily interpret the relationship between lesions and the surrounding organs as well as the position of vascular structures. The resulting evaluation levels of the RVS images were 12 excellent (pancreatic cancer, bile duct cancer, cholecystolithiasis, and cholangiocellular carcinoma) and 3 good (pancreatic cancer and gallbladder cancer). Compared with conventional B-mode ultrasonography and CT, RVS images achieved a rate of 80% superior visualization and 20% better visualization. CONCLUSION: RVS has potential usefulness in objective visualization and diagnosis in the field of biliary and pancreatic diseases.
Abstract: The Internet of Things (IoT) is a widely distributed network of devices that require only a small power supply and have limited storage and processing capacity. Cloud computing, on the other hand, has virtually unlimited storage and processing capabilities and is a much more mature technology. Therefore, a combination of Cloud computing and IoT can provide the best performance for users. Cloud computing nowadays supports lifesaving healthcare applications by collecting data from bedside devices, viewing patient information, and diagnosing in real time. There may be some concerns about the security of patients' data and other issues, but the use of IoT and Cloud technologies in the healthcare industry would open a new era in the field of healthcare. To meet the basic healthcare needs of people in rural areas, we propose a Cloud-IoT-based smart healthcare system. In this system, various types of sensors (temperature, heartbeat, ECG, etc.) are installed on the patient side to sense the patient's physiological data. To secure the data, an RSA-based authentication algorithm and mitigations for several security threats are used. The sensed data are processed and stored on the Cloud server. Stored data can be used for patient care by authorized and/or concerned medical practitioners upon approval by the user.
Funding: Supported by the Soongsil University Research Fund and BK 21 of Korea.
Abstract: Foreground moving object detection is an important process in various computer vision applications such as intelligent visual surveillance, HCI, and object-based video compression. One of the most successful moving object detection algorithms is based on the Adaptive Gaussian Mixture Model (AGMM). Although AGMM-based object detection shows very good performance with respect to detection accuracy, AGMM is a very complex model that requires a large amount of floating-point arithmetic and therefore incurs a high computational cost. Thus, a direct implementation of AGMM-based object detection on embedded DSPs without hardware floating-point support cannot satisfy real-time processing requirements. This paper presents a novel real-time implementation of an adaptive Gaussian mixture model-based moving object detection algorithm for fixed-point DSPs. In the proposed implementation, in addition to changing data types into fixed-point ones, a technique for magnifying the Gaussian distribution is introduced so that integer and fixed-point arithmetic can be easily and consistently used instead of real-number and floating-point arithmetic in processing the AGMM algorithm. Experimental results show that the proposed implementation has high potential in real-time applications.
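To make the fixed-point idea in the abstract above concrete, here is a minimal sketch of one piece of a Gaussian background model, the running-mean update, done entirely in integer arithmetic. It does not reproduce the paper's magnification technique or its full AGMM: the Q-format (`FRAC_BITS = 8`), the restriction of the learning rate to a power of two, and the names `to_fixed`, `to_float`, and `update_mean` are assumptions for illustration.

```python
FRAC_BITS = 8  # assumed Q-format: values are stored as integer * 2**-8


def to_fixed(x):
    """Convert a float to the assumed fixed-point representation."""
    return int(round(x * (1 << FRAC_BITS)))


def to_float(q):
    """Convert a fixed-point value back to a float (for inspection only)."""
    return q / (1 << FRAC_BITS)


def update_mean(mean_q, pixel, lr_shift):
    """Integer-only update: mean <- mean + (pixel - mean) * 2**-lr_shift.

    mean_q:   current mean in fixed point.
    pixel:    new integer pixel intensity (e.g. 0..255).
    lr_shift: learning rate expressed as a right shift (rate = 2**-lr_shift).
    """
    diff_q = (pixel << FRAC_BITS) - mean_q
    return mean_q + (diff_q >> lr_shift)   # arithmetic shift replaces the multiply


mean_q = to_fixed(100.0)
brighter = update_mean(mean_q, 116, lr_shift=4)   # rate 1/16
darker = update_mean(mean_q, 84, lr_shift=4)
```

Scaling every quantity by a fixed power of two keeps fractional precision while letting a DSP without floating-point hardware use only integer adds and shifts.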
Funding: Supported by the National Natural Science Foundation of China (No. 11875036).
Abstract: Monte Carlo (MC) simulation is regarded as the gold standard for dose calculation in brachytherapy, but it consumes a large amount of computing resources. The development of heterogeneous computing makes it possible to substantially accelerate calculations with hardware accelerators. Accordingly, this study develops a fast MC tool, called THUBrachy, which can be accelerated by several types of hardware accelerators. THUBrachy can simulate photons with energies below 3 MeV and considers all photon interactions in this energy range. It was benchmarked against the American Association of Physicists in Medicine Task Group No. 43 Report using a water phantom and validated with Geant4 using a clinical case. A performance test was conducted using the clinical case, showing that a multicore central processing unit, Intel Xeon Phi, and graphics processing unit (GPU) can efficiently accelerate the simulation. GPU-accelerated THUBrachy is the fastest version; it is 200 times faster than the serial version and approximately 500 times faster than Geant4. The proposed tool shows great potential for fast and accurate dose calculations in clinical applications.
Funding: Supported by the DFG (German Research Foundation) Priority Program Nano Security, Project MemCrypto (project number 439827659; funding IDs DU 1896/2–1 and PO 1220/15–1), and by the Fraunhofer Internal Programs under Grant No. Attract 600768.
Abstract: Emerging memristive devices offer enormous advantages for applications such as non-volatile memories and in-memory computing (IMC), and there is rising interest in using memristive technologies for security applications in the era of the Internet of Things (IoT). In this review article, low-power design techniques based on emerging memristive technology for hardware security primitives and systems are presented, with the aim of achieving secure hardware systems in IoT. By reviewing the state of the art in three highlighted memristive application areas, i.e., memristive non-volatile memory, memristive reconfigurable logic computing, and memristive artificial intelligence computing, their application-level impacts on novel implementations of secret key generation, crypto functions, and machine learning attacks are explored, respectively. For low-power security applications in IoT, it is essential to understand how best to realize cryptographic circuitry using memristive circuits, to assess the implications of memristive crypto implementations for security, and to develop novel computing paradigms that will enhance their security. This review article aims to help researchers explore security solutions, analyze new possible threats, and develop corresponding protections for secure hardware systems based on low-cost memristive circuit designs.
Funding: National Natural Science Foundation of China under Grant Nos. 51639006 and 51725901.
Abstract: The finite element (FE) method is a powerful tool and has been applied by investigators to real-time hybrid simulations (RTHSs). This study focuses on the computational efficiency, including the computational time and accuracy, of numerical integration in solving the FE numerical substructure in RTHSs. First, sparse matrix storage schemes are adopted to decrease the computational time of the FE numerical substructure. In this way, the task execution time (TET) decreases, so that the scale of the numerical substructure model can increase. Subsequently, several commonly used explicit numerical integration algorithms, including the central difference method (CDM), the Newmark explicit method, the Chang method, and the Gui-λ method, are comprehensively compared to evaluate their computational time in solving the FE numerical substructure. CDM is better than the other explicit integration algorithms when the damping matrix is diagonal, while the Gui-λ (λ = 4) method is advantageous when the damping matrix is non-diagonal. Finally, the effect of time delay on the computational accuracy of RTHSs is investigated by simulating structure-foundation systems. Simulation results show that the influence of time delay on the displacement response becomes more pronounced as the mass ratio increases, and that delay compensation methods can reduce the relative error of the peak displacement to less than 5% even with a large time step and a large time delay.
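As a small illustration of the central difference method (CDM) mentioned in the abstract above, the sketch below integrates a single-degree-of-freedom system m·u'' + c·u' + k·u = f(t). It is a textbook form of the scheme, not the authors' FE implementation; the function name `central_difference`, the scalar (rather than matrix) setting, and the test problem are assumptions. In the matrix case the quantity corresponding to `k_eff` is M/Δt² + C/(2Δt), which is diagonal when the mass and damping matrices are diagonal, so no linear solve is needed per step.

```python
import math


def central_difference(m, c, k, force, u0, v0, dt, n_steps):
    """Explicit central difference integration of m*u'' + c*u' + k*u = force(t).

    Returns the displacement history [u_0, u_1, ..., u_n_steps].
    """
    a0 = (force(0.0) - c * v0 - k * u0) / m          # initial acceleration
    u_prev = u0 - dt * v0 + 0.5 * dt * dt * a0       # fictitious value u_{-1}
    k_eff = m / dt**2 + c / (2.0 * dt)               # effective "stiffness"
    a = k - 2.0 * m / dt**2
    b = m / dt**2 - c / (2.0 * dt)
    u = [u0]
    for n in range(n_steps):
        rhs = force(n * dt) - a * u[-1] - b * u_prev
        u_next = rhs / k_eff
        u_prev = u[-1]
        u.append(u_next)
    return u


# Hypothetical check: undamped free vibration with a natural period of 1 s.
omega = 2.0 * math.pi
history = central_difference(m=1.0, c=0.0, k=omega**2, force=lambda t: 0.0,
                             u0=1.0, v0=0.0, dt=0.001, n_steps=1000)
```

The scheme is only conditionally stable; for this undamped case the time step must stay below 2/ω, which the example satisfies by a wide margin.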