The Long Term Evolution (LTE) system imposes high requirements for dispatching delay. Moreover, the very high air interface rate of LTE requires strong processing capability from the devices that process the baseband signals. Consequently, a single-core processor cannot meet the requirements of an LTE system. This paper analyzes how to use multi-core processors to achieve parallel processing of uplink demodulation and decoding in LTE systems and designs an approach to parallel processing. The test results show that this approach works well.
The k-Nearest Neighbor method is one of the most popular techniques for both classification and regression. Because of how it operates, its application may be limited to problems with a certain number of instances, particularly when run time is a consideration. However, the classification of large amounts of data has become a fundamental task in many real-world applications, so it is logical to scale the k-Nearest Neighbor method to large datasets. This paper proposes a new k-Nearest Neighbor classification method (KNN-CCL) which uses a parallel centroid-based and hierarchical clustering algorithm to separate the training dataset into multiple parts. The introduced clustering algorithm uses four stages of successive refinement and generates high-quality clusters. The k-Nearest Neighbor approach subsequently makes use of them to predict the test datasets. Finally, sets of experiments are conducted on the UCI datasets. The experimental results confirm that the proposed k-Nearest Neighbor classification method performs well with regard to classification accuracy and performance.
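The idea of splitting the training set into clusters and searching only the relevant part can be illustrated with a minimal sketch. This is not the KNN-CCL algorithm itself: the abstract does not describe its four refinement stages, so the code below assumes a single centroid-refinement pass, and the names `partition`, `knn_predict`, and the toy data are hypothetical.

```python
import math
from collections import Counter

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return tuple(sum(col) / n for col in zip(*vectors))

def nearest(point, centroids):
    """Index of the centroid closest to point (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))

def partition(samples, centroids):
    """Assign every (vector, label) sample to the part of its nearest centroid."""
    parts = [[] for _ in centroids]
    for vec, label in samples:
        parts[nearest(vec, centroids)].append((vec, label))
    return parts

def knn_predict(query, parts, centroids, k=3):
    """Majority vote among the k nearest samples of the closest part only."""
    candidates = parts[nearest(query, centroids)]
    if not candidates:  # fall back to a full search if the part is empty
        candidates = [s for part in parts for s in part]
    neighbours = sorted(candidates, key=lambda s: math.dist(query, s[0]))[:k]
    return Counter(label for _, label in neighbours).most_common(1)[0][0]

# Toy data (assumed): two well-separated classes.
train = [((0.0, 0.0), "a"), ((0.1, 0.2), "a"), ((0.2, 0.1), "a"),
         ((5.0, 5.0), "b"), ((5.1, 4.9), "b"), ((4.8, 5.2), "b")]
seeds = [(0.0, 0.0), (5.0, 5.0)]
parts = partition(train, seeds)
# One refinement pass: recompute centroids, then reassign the samples.
centroids = [centroid([v for v, _ in part]) for part in parts]
parts = partition(train, centroids)
print(knn_predict((0.3, 0.3), parts, centroids))
```

Because each part is independent, the per-part distance computations are the natural unit to distribute across workers; the sketch keeps everything sequential for clarity.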
Purpose: The purpose of this paper is to introduce new implementations for parallel processing applications using bijective systolic networks and the corresponding carbon-based field emission controlled switching. The developed implementations are performed in the reversible domain to perform the required bijective parallel computing, where the implementations for parallel computations that utilize the presented field-emission controlled switching and their corresponding m-ary (many-valued) extensions for use in nano systolic networks are introduced. The first part of the paper presents important fundamentals with regard to systolic computing and carbon-based field emission that will be utilized in the implementations within the second part of the paper. Design/methodology/approach: The introduced systolic systems utilize recent findings in field emission and nano applications to implement the functionality of the basic bijective systolic network. This includes many-valued systolic computing via field emission techniques using carbon-based nanotubes and nanotips. The realization of bijective logic circuits in current and emerging technologies can be very important for various reasons. The reduction of power consumption is a major requirement for circuit design in future technologies, and thus the new nano systolic circuits can play an important role in the design of circuits that consume minimal power for future applications such as low-power signal processing. In addition, the implemented bijective systems can be utilized to implement massively parallel processing and thus obtain very high processing performance, where the implementation will also utilize the significant size reduction within the nano domain. The extensions of the implementations to field emission-based many-valued systolic networks using the introduced bijective nano systolic architectures are also presented. Findings: Novel bijective systolic architectures using nano-based field emission implementations are introduced in this paper, and the implementation using the general scheme of many-valued computing is presented. The carbon-based field emission implementation of nano systolic networks is also introduced. This is accomplished using the introduced field emission carbon-based devices, where field emission from carbon nanotubes and nano-apex carbon fibers is utilized. The implementations of the many-valued bijective systolic networks utilizing the introduced nano-based architectures are also presented. Originality/value: The introduced bijective systolic implementations form new important directions in systolic realizations using the newly emerging nano-based technologies. The 2-to-1 multiplexer is a basic building block in “switch logic,” in which a logic circuit is realized as a combination of switches rather than a combination of logic gates as in gate logic. Switch logic proves to be less costly in synthesizing a wide variety of multiplexer-based modern circuits and systems, since nano implementations occupy very compact space and carbon-based devices switch reliably using much less power than silicon-based devices. The introduced implementations for nano systolic computation are new and interesting for design in future nanotechnologies that require optimal design specifications of minimum power consumption and minimum size layout, such as in low-power control of autonomous robots and in adiabatic low-power very-large-scale-integration circuit design for signal processing applications.
Purpose: The purpose of this paper is to introduce new implementations for parallel processing applications using bijective systolic networks and their corresponding carbon-based field emission controlled switching. The developed implementations are performed in the reversible domain to perform the required bijective parallel computing, where the implementations for parallel computations that utilize the presented field-emission controlled switching and their corresponding many-valued (m-ary) extensions for use in nano systolic networks are introduced. The second part of the paper introduces the implementation of systolic computing using the two-to-one controlled switching via carbon-based field emission that was presented in the first part of the paper, and the computational extension to the general case of many-valued (m-ary) systolic networks utilizing many-to-one carbon-based field emission is also introduced. Design/methodology/approach: The introduced systolic systems utilize recent findings in field emission and nano applications to implement the functionality of the basic bijective systolic network. This includes many-valued systolic computing via field-emission techniques using carbon-based nanotubes and nanotips. The realization of bijective logic circuits in current and emerging technologies can be very important for various reasons. The reduction of power consumption is a major requirement for circuit design in future technologies, and thus the new nano systolic circuits can play an important role in the design of circuits that consume minimal power for future applications such as low-power signal processing. In addition, the implemented bijective systems can be utilized to implement massively parallel processing and thus obtain very high processing performance, where the implementation will also utilize the significant size reduction within the nano domain. The extensions of the implementations to field emission-based many-valued systolic networks using the introduced bijective nano systolic architectures are also presented. Findings: Novel bijective systolic architectures using nano-based field emission implementations are introduced in this paper, and the implementation using the general scheme of many-valued computing is presented. The carbon-based field emission implementation of nano systolic networks is also introduced. This is accomplished using the introduced field-emission carbon-based devices, where field emission from carbon nanotubes and nano-apex carbon fibers is utilized. The implementations of the many-valued bijective systolic networks utilizing the introduced nano-based architectures are also presented. Practical implications: The introduced bijective systolic implementations form new important directions in systolic realizations using the newly emerging nano-based technologies. The 2-to-1 multiplexer is a basic building block in “switch logic,” in which a logic circuit is realized as a combination of switches rather than a combination of logic gates as in gate logic. Switch logic proves to be less costly in synthesizing a wide variety of multiplexer-based modern circuits and systems, since nano implementations occupy very compact space and carbon-based devices switch reliably using much less power than silicon-based devices. The introduced implementations for nano systolic computation are new and interesting for design in future nanotechnologies that require optimal design specifications of minimum power consumption and minimum size layout, such as in low-power control of autonomous robots and in adiabatic low-power VLSI circuit design for signal processing applications. Originality/value: The introduced bijective systolic implementations form new important directions in systolic realizations utilizing the newly emerging nanotechnologies. The introduced implementations for nano systolic computation are new and interesting for design in future nanotechnologies that require optimal design specifications of high performance, minimum power and minimum size.
In order to improve femtosecond laser throughput, a parallel processing system that uses a liquid crystal on silicon (LCOS) device as a spatial light modulator is put forward. A method is described for displaying a Fourier hologram on the LCOS, and high uniformity of several diffraction peaks in the computer reconstruction is achieved. Application of this method to parallel femtosecond laser processing is also demonstrated: two intersecting rings and three tangent rings are each fabricated in the photoresist in a single step.
Suspicious mass traffic constantly evolves, making network behaviour tracing and structure more complex. Neural networks yield promising results by considering a sufficient number of processing elements with strong interconnections between them. They offer efficient computational Hopfield neural network models and optimization constraints that exploit a good amount of parallelism to yield optimal results. An artificial neural network (ANN) offers optimal solutions in classifying and clustering the various reels of data, and the results obtained depend purely on identifying the problem. In this research work, the design of optimized applications is presented in an organized manner. In addition, this research work examines theoretical approaches to achieving optimized results using ANN. It mainly focuses on designing rules. The optimizing design approach analyzes the internal process of the neural networks. Practices in developing the network are based on the interconnections among the hidden nodes and their learning parameters. The methodology is shown to work best for nonlinear resource allocation problems with a suitable design and for complex issues. The ANN proposed here considers roughly 46k nodes hidden inside 49 million connections, deployed on full-fledged parallel processors. The proposed ANN offered optimal results in real-world application problems, and the results were obtained using MATLAB.
The finite element method is a key player in computational electromagnetics for designing RF (Radio Frequency) components such as waveguides. Frequency-domain analysis is fundamental to identifying the characteristics of the components. In conventional frequency-domain electromagnetic analysis using FEM (Finite Element Method), the system matrix is complex-valued as well as indefinite. Iterative solvers can be faster than a direct solver when convergence is guaranteed and reached in a few steps. However, for such complex-valued and indefinite systems it is hard to exploit the merits of an iterative solver. It is also hard to benefit from matrix factorization techniques because parts of the system matrix vary with frequency. Overall, it is hard to adopt conventional iterative solvers even though the system matrix is sparse. In this paper, a new parallel iterative FEM solver for frequency-domain analysis is implemented for inhomogeneous waveguide structures. In this implementation, the previous solution of the preconditioned iterative solver of Matlab (Matrix Laboratory) is used as the initial guess for the next step's solution process. An overlapped parallel stage using Matlab's Parallel Computing Toolbox is also proposed to alleviate the cold start, which ruins the convergence of the early steps in each parallel stage. Numerical experiments based on waveguide structures have demonstrated the accuracy and efficiency of the proposed scheme.
This paper defines the communication-efficiency, which is directly related to the cost-efficiency, and studies the relationship between the communication-efficiency and the processor-efficiency when they are applied to scalability analysis. An example algorithm is given to analyze some typical architectures.
Organic reefs, the targets of deep-water petroleum exploration, developed widely in the Xisha area. However, there are concealed igneous rocks under the sea whose wave impedance is nearly equal to that of organic reefs. Because their seismic reflection characteristics are similar, the igneous rocks interfere with future exploration. Yet the density and magnetism of organic reefs are very different from those of igneous rocks, so gravity and magnetic data have obvious advantages for distinguishing organic reefs from igneous rocks. First, frequency decomposition was applied to the free-air gravity anomaly in the Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. Thus, the distribution of igneous rocks in the horizontal direction can be acquired according to the high-frequency field, the low-frequency field, and their physical properties. Then, 3D forward modeling of the gravitational field was carried out to establish the density model of this area by reference to the physical properties of rocks based on former research. Furthermore, 3D inversion of the gravity anomaly by a genetic algorithm with graphics processing unit (GPU) parallel processing was applied in the Xisha target area, and the 3D density structure of this area was obtained. In this way, the igneous rocks can be confined to a certain depth according to their density. Frequency decomposition and 3D inversion of the gravity anomaly by the GPU-parallel genetic algorithm proved to be a useful method for locating igneous rocks in their 3D geological position. Organic reefs and igneous rocks can therefore be identified, which provides advance information for further exploration.
MVP is a digital signal processor with an MIMD structure that is suited to multimedia applications. MVP contains several processors, and its operation is characterized by parallelism and pipelining; therefore, real-time signal processing can be done on it. This paper presents an image processing system based on MVP, explains the principles of parallel task assignment and hardware pipeline design, and gives examples of target tracking and edge detection.
This paper focuses on the parallel aggregation processing of data streams based on the shared-nothing architecture. A novel granularity-aware parallel aggregating model is proposed. It employs parallel sampling and linear regression to describe the characteristics of the data quantity in the query window in order to determine the partition granularity of tuples, and utilizes an equal-depth histogram to implement partitioning. This method can avoid data skew and reduce communication cost. The experimental results on both synthetic data and actual data show that the proposed method is efficient, practical and suitable for processing time-varying data streams.
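Equal-depth histogram partitioning, which the abstract credits with avoiding data skew, can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not give the sampling or regression details, so the sample keys, the function names, and the choice of three partitions are all hypothetical.

```python
from bisect import bisect_right

def equal_depth_boundaries(sample_keys, num_partitions):
    """Split points chosen so each bucket holds about the same number of sampled keys."""
    ordered = sorted(sample_keys)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]

def assign_partition(key, boundaries):
    """Index of the partition (0-based) that a key falls into."""
    return bisect_right(boundaries, key)

# Skewed sample (assumed): most keys are small, a few are very large.
sample = [5, 1, 9, 3, 7, 2, 8, 4, 6, 100, 200, 300]
boundaries = equal_depth_boundaries(sample, 3)

counts = [0, 0, 0]
for key in sample:
    counts[assign_partition(key, boundaries)] += 1
print(boundaries, counts)
```

With equal-width buckets over the same key range, almost every key would land in the first bucket; choosing boundaries from sample quantiles spreads the tuples evenly across the nodes instead.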
This paper presents partially asynchronous parallel simulation of continuous systems (PAPSoCS) and some approaches to the issues of its implementation on a multicomputer system. To guarantee correct simulation results and speed up the simulation, a scheme for efficient PAPSoCS is proposed, and a virtual star topology is constructed to match the path of message passing in order to solve the algorithm-architecture adequation problem. Because the messages frequently passed between processors are very short, typically within several 4 bytes, an asynchronous communication mode is employed to reduce the communication ratio. Experimental results show that asynchronous parallel simulation has much higher efficiency than its synchronous counterpart.
This paper takes the Sobel operator as an example to study how a sequential algorithm can be parallelized on a memory-sharing multiprocessor by using a virtual machine. Several different parallel algorithms using function decomposition and/or data decomposition methods are compared, and their performance is analyzed in terms of processor utilization, data traffic, shared memory access, and synchronization overhead. The analysis is validated through a simulation experiment on a virtual machine of 64 parallel processors. Conclusions are presented at the end of the paper.
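Data decomposition of the Sobel operator can be sketched by splitting the image into row bands and giving each band to a worker. The sketch below is an assumption-laden illustration, not the paper's algorithm: it uses Python threads sharing one image in memory, the |gx| + |gy| magnitude approximation, and an image of at least three rows; in CPython it shows the decomposition rather than a real speedup.

```python
from concurrent.futures import ThreadPoolExecutor

GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))   # horizontal gradient kernel
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))   # vertical gradient kernel

def sobel_rows(image, row_start, row_end):
    """Gradient magnitude |gx| + |gy| for interior pixels of rows [row_start, row_end)."""
    width = len(image[0])
    out = []
    for r in range(row_start, row_end):
        row = []
        for c in range(1, width - 1):
            gx = sum(GX[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(GY[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out

def sobel_parallel(image, workers=2):
    """Data decomposition: each worker processes one contiguous band of interior rows."""
    height = len(image)                      # assumed to be at least 3
    band = -(-(height - 2) // workers)       # ceiling division
    bounds = [(s, min(s + band, height - 1)) for s in range(1, height - 1, band)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(lambda b: sobel_rows(image, *b), bounds))
    return [row for chunk in chunks for row in chunk]

# Toy image (assumed): a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_parallel(image, workers=2)
print(edges)
```

Each band reads one extra row above and below its range, which is the shared-memory access that the abstract's comparison of data traffic and synchronization overhead is concerned with. A function decomposition would instead compute `gx` and `gy` in separate workers over the whole image.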
In this paper, according to the parallel environment of the ELXSI computer, a parallel solving process for the substructure method in static and dynamic analyses of large-scale, complex structures is put forward, and the corresponding parallel computational program has been developed.
A systolic array architecture computer (FXCQ) has been designed for signal processing. It can handle floating-point data at very high speed. It is composed of 16 processing cells and a cache that are connected linearly and form a ring structure. All processing cells are identical and programmable. Each processing cell has a peak performance of 20 million floating-point operations per second (20 MFLOPS). The machine therefore has a peak performance of 320 MFLOPS. It is integrated as an attached processor into a host system through a VME bus interface. Programs for FXCQ are written in a high-level language, the B language, which is supported by a parallel optimizing compiler. This paper describes the architecture of FXCQ, the B language and its compiler.
Parallel versions of the prestack Kirchhoff 3D integral migration algorithm, which is suitable for seismic data processing, are described in this paper. First, the inherent parallel characteristics of seismic data processing are analyzed. Then some principles of algorithm partitioning are discussed. Based on these analyses, the system architecture and the communication mechanism, the algorithm is divided into four subtasks allocated to four nodes of 990 STAR-l. We then describe in detail a module-partitioning method: the I/O processing and communication are separated from the computation process; the processes that include I/O processing and communication are allocated to the T805 transputer, and the other is allocated to the i860 processor. These two processes are synchronized by shared memory and a memory-lock mechanism, while the communication between different nodes is implemented through transputer links. Load balancing among the four processor modules is performed dynamically. Finally, we discuss the speedup of the parallel versions of the prestack Kirchhoff 3D integral migration algorithm running on four nodes. Some directions for further research are also mentioned.
Approximate computation errors in VLSI multiplier circuits are a critical concern and are increasing with technology scaling. The most common method for fast and energy-efficient multiplication is approximation of the operands, but this traditional approximate result is not suitable for image processing applications. This paper proposes two architectures for image compression: a highly accurate hybrid segment approximate multiplier (HSAM) and an enhanced HSAM (EHSAM). The existing approximate multiplier based on the static segment method is not suitable for applications that require a certain accuracy, and the approximate multiplier based on the dynamic segment method is not suitable for cost-efficient applications. The proposed work combines the advantages of both the static segment method and the dynamic segment method to improve accuracy and cost efficiency. The proposed approximate multipliers HSAM8 × 8 and EHSAM8 × 8 provide 99.85% and 99.999% accuracy respectively for various inputs. The proposed HSAM consumes less energy with a small increase in area overhead. The proposed EHSAM consumes less energy without any area overhead. The proposed HSAM and EHSAM improve the speed by 40% and 85% respectively compared to the existing SSM8 × 8 technique.
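The static segment baseline that the abstract compares against can be illustrated in software. The sketch below models static segmentation as it is commonly described in the approximate-multiplier literature, and rests on assumptions the abstract does not confirm: 16-bit unsigned operands, 8-bit segments (one reading of "SSM8 × 8"), and two fixed segment positions. It does not model the proposed HSAM or EHSAM.

```python
def static_segment(x, n=16, m=8):
    """Pick an m-bit segment of an n-bit operand from one of two fixed positions.

    Returns (segment, shift): the top m bits if any of the upper n - m bits
    is set, otherwise the low m bits (which then hold the exact value).
    """
    if x >> m:                      # a 1 exists above the low m bits
        return x >> (n - m), n - m  # keep the top m bits, drop the rest
    return x, 0

def ssm_multiply(a, b, n=16, m=8):
    """Approximate product: multiply the two m-bit segments, then shift back."""
    seg_a, shift_a = static_segment(a, n, m)
    seg_b, shift_b = static_segment(b, n, m)
    return (seg_a * seg_b) << (shift_a + shift_b)

def relative_error(a, b):
    """Relative error of the approximation for non-zero operands."""
    exact = a * b
    return abs(exact - ssm_multiply(a, b)) / exact

print(ssm_multiply(200, 100), ssm_multiply(0x1234, 0x0056))
print(round(relative_error(0x1234, 0x0056), 4))
```

Only an m × m multiplier is needed, which is where the energy saving comes from; the error arises whenever an operand is truncated to its top segment. A dynamic segment method would instead start the segment at the leading one of each operand, which is more accurate but needs leading-one detection and shifting logic.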
We present a solution based on a suitable combination of heuristics and parallel processing techniques for finding the best allocation of the financial assets of a pension fund, taking into account all the specific rules of the fund. We compare the values of an objective function computed with respect to a large set (thousands) of possible scenarios for the evolution of the Net Asset Value (NAV) of the share of each asset class in which the financial capital of the fund is invested. Our approach depends neither on the model used for the evolution of the NAVs nor on the objective function. In particular, it does not require any linearization or similar approximation of the problem. Although we applied it to a situation in which the number of possible asset classes is limited to a few units (six in the specific case), the same approach can also be followed in other cases by grouping asset classes according to their features.
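Scoring candidate allocations against a set of scenarios in parallel can be sketched as below. This is not the authors' method: the abstract does not describe their heuristics, fund rules, or objective, so the sketch assumes an exhaustive grid of weights, a per-asset cap as a stand-in fund rule, and worst-case portfolio growth as a hypothetical objective. All names and numbers are illustrative.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def candidate_allocations(num_assets, step=0.25, max_weight=0.75):
    """Weight vectors on a grid that sum to 1 and respect a per-asset cap (assumed rule)."""
    levels = [i * step for i in range(int(round(1 / step)) + 1)]
    for weights in product(levels, repeat=num_assets):
        if abs(sum(weights) - 1.0) < 1e-9 and max(weights) <= max_weight:
            yield weights

def objective(weights, scenarios):
    """Hypothetical objective: worst-case portfolio growth over all scenarios."""
    return min(sum(w * g for w, g in zip(weights, growth)) for growth in scenarios)

def best_allocation(scenarios, num_assets, workers=4):
    """Score every candidate independently, then keep the highest score."""
    candidates = list(candidate_allocations(num_assets))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(lambda w: objective(w, scenarios), candidates))
    best = max(range(len(candidates)), key=scores.__getitem__)
    return candidates[best], scores[best]

# Each scenario gives a growth factor of the NAV per asset class (assumed values).
scenarios = [(1.10, 1.02), (0.90, 1.02), (1.30, 1.01)]
weights, score = best_allocation(scenarios, num_assets=2)
print(weights, round(score, 4))
```

The objective is treated as a black box evaluated per candidate, which matches the abstract's point that no linearization is needed and that the approach is independent of the NAV model. The grid grows exponentially with the number of asset classes, which is one reason heuristics and grouping of asset classes matter in practice.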
Three parallel anaerobic-anoxic/anaerobic-aerobic (AN/AO) processes were developed to enrich denitrifying phosphorus removal bacteria (DPB) for low-strength wastewater treatment. The main body of the parallel AN/AO process consists of an AN (anaerobic-anoxic) process and an AO (anaerobic-aerobic) process. In the AO process, the common phosphorus accumulating organisms (PAOs) were dominant, while in the AN process, DPB were dominant. The volume ratio of anaerobic zone (Vana) : anoxic zone (Vano) : aerobic zone (Vaer) for the parallel AN/AO process is 1:1:1, in contrast with a Vana:Vaer and Vano:Vaer of 1:2 and 1:4 for a traditional biological nutrient removal (BNR) process. Process 3 performed best of the three processes in terms of COD, TN and TP removal. Over four months of operation, the effluent COD concentration of process 3 did not exceed 60 mg/L, the effluent TN concentration was lower than 15 mg/L, and the effluent TP concentration was lower than 1 mg/L.
Funding: The authors received no specific funding for this work.
Funding: This research was performed during sabbatical leave in 2015-2016 granted to the author from The University of Jordan and spent at Philadelphia University.
Abstract: Purpose: The purpose of this paper is to introduce new implementations for parallel processing applications using bijective systolic networks and their corresponding carbon-based field emission controlled switching. The developed implementations are performed in the reversible domain to perform the required bijective parallel computing, where the implementations for parallel computations that utilize the presented field-emission controlled switching and their corresponding many-valued (m-ary) extensions for use in nano systolic networks are introduced. The second part of the paper introduces the implementation of systolic computing using the two-to-one controlled switching via carbon-based field emission that was presented in the first part of the paper, and the computational extension to the general case of many-valued (m-ary) systolic networks utilizing many-to-one carbon-based field emission is also introduced. Design/methodology/approach: The introduced systolic systems utilize recent findings in field emission and nano applications to implement the functionality of the basic bijective systolic network. This includes many-valued systolic computing via field-emission techniques using carbon-based nanotubes and nanotips. The realization of bijective logic circuits in current and emerging technologies can be very important for various reasons. The reduction of power consumption is a major requirement for circuit design in future technologies, and thus the new nano systolic circuits can play an important role in the design of circuits that consume minimal power for future applications such as low-power signal processing. In addition, the implemented bijective systems can be utilized to implement massive parallel processing and thus obtain very high processing performance, where the implementation will also utilize the significant size reduction within the nano domain. The extensions of implementations to field emission-based many-valued systolic networks using the introduced bijective nano systolic architectures are also presented. Findings: Novel bijective systolic architectures using nano-based field emission implementations are introduced in this paper, and the implementation using the general scheme of many-valued computing is presented. The carbon-based field emission implementation of nano systolic networks is also introduced. This is accomplished using the introduced field-emission carbon-based devices, where field emission from carbon nanotubes and nano-apex carbon fibers is utilized. The implementations of the many-valued bijective systolic networks utilizing the introduced nano-based architectures are also presented. Practical implications: The introduced bijective systolic implementations form new important directions in systolic realizations using the newly emerging nano-based technologies. The 2-to-1 multiplexer is a basic building block in "switch logic," in which a logic circuit is realized as a combination of switches rather than a combination of logic gates as in gate logic. Switch logic proves to be less costly in synthesizing a wide variety of multiplexer-based modern circuits and systems, since nano implementations occupy very compact space and carbon-based devices switch reliably using much less power than silicon-based devices. The introduced implementations for nano systolic computation are new and interesting for design in future nanotechnologies that require optimal design specifications of minimum power consumption and minimum size layout, such as in low-power control of autonomous robots and in adiabatic low-power VLSI circuit design for signal processing applications. Originality/value: The introduced bijective systolic implementations form new important directions in systolic realizations utilizing the newly emerging nanotechnologies. The introduced implementations for nano systolic computation are new and interesting for design in future nanotechnologies that require optimal design specifications of high performance, minimum power and minimum size.
Funding: National Natural Science Foundation of China (No. 51275502); Natural Science Key Project of Anhui Province (No. KJ2011A014); China Postdoctoral Science Foundation funded project (No. 2012M511416); the Innovation Foundation of Anhui University and the Personnel Construction Project of Anhui University
Abstract: To improve femtosecond laser throughput, a parallel processing system that uses a liquid crystal on silicon (LCOS) device as the spatial light modulator is put forward. A method is described for displaying a Fourier hologram on the LCOS, and high uniformity among several diffraction peaks is achieved in the computer reconstruction. Application of this method to parallel femtosecond laser processing is also demonstrated: two intersecting rings and three tangent rings are each fabricated in the photoresist in a single step.
Funding: This research is funded by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2022R 151), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.
Abstract: Suspicious mass traffic constantly evolves, making network behaviour tracing and structure more complex. Neural networks yield promising results by using a sufficient number of processing elements with strong interconnections between them. They offer efficient computational Hopfield neural network models and optimization constraints that exploit a good amount of parallelism to yield optimal results. Artificial neural networks (ANNs) offer optimal solutions in classifying and clustering the various reels of data, and the results obtained depend purely on identifying the problem. In this research work, the design of optimized applications is presented in an organized manner. In addition, this research work examines theoretical approaches to achieving optimized results using ANNs, focusing mainly on design rules. The optimizing design approach analyzes the internal process of the neural networks. Practices in developing the network are based on the interconnections among the hidden nodes and their learning parameters. With a suitable design, the methodology is shown to work best for nonlinear resource allocation problems and complex issues. The ANN proposed here uses roughly 46k hidden nodes and 49 million connections, deployed on full-fledged parallel processors. The proposed ANN offered optimal results in real-world application problems, and the results were obtained using MATLAB.
Funding: Supported by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (No. 2019-0-00098, Advanced and Integrated Software Development for Electromagnetic Analysis) and by the Research Assistance Program (2021) of Incheon National University.
Abstract: The finite element method (FEM) is a key player in computational electromagnetics for designing radio frequency (RF) components such as waveguides. Frequency-domain analysis is fundamental to identifying the characteristics of the components. In conventional frequency-domain electromagnetic analysis using FEM, the system matrix is complex-valued as well as indefinite. Iterative solvers can be faster than a direct solver when convergence is guaranteed and reached in a few steps. However, it is hard to exploit the merits of iterative solvers on such complex-valued, indefinite systems. It is also hard to benefit from matrix factorization techniques because parts of the system matrix vary with frequency. Overall, it is hard to adopt conventional iterative solvers even though the system matrix is sparse. In this paper, a new parallel iterative FEM solver for frequency-domain analysis is implemented for inhomogeneous waveguide structures. In this implementation, the previous solution of the preconditioned iterative solver in Matlab (Matrix Laboratory) is used as the initial guess for the next step's solution process. An overlapped parallel stage using Matlab's Parallel Computing Toolbox is also proposed to alleviate the cold start, which ruins the convergence of the early steps in each parallel stage. Numerical experiments on waveguide structures have demonstrated the accuracy and efficiency of the proposed scheme.
Abstract: This paper defines communication-efficiency, which is directly related to cost-efficiency, and studies the relationship between communication-efficiency and processor-efficiency when they are applied to scalability analysis. An example algorithm is given to analyze some typical architectures.
Funding: Financially supported by the National Natural Science Foundation of China (No. 41174085)
Abstract: Organic reefs, the targets of deep-water petroleum exploration, are widely developed in the Xisha area. However, there are concealed igneous rocks under the sea whose wave impedance is nearly equal to that of the organic reefs. Because their seismic reflection characteristics are similar, the igneous rocks interfere with future exploration. The density and magnetism of organic reefs, however, are very different from those of igneous rocks, so gravity and magnetic data offer obvious advantages for distinguishing organic reefs from igneous rocks. First, frequency decomposition was applied to the free-air gravity anomaly in the Xisha area to obtain the 2D subdivision of the gravity anomaly and magnetic anomaly in the vertical direction. The horizontal distribution of igneous rocks can thus be acquired from the high-frequency field, the low-frequency field, and the physical properties of the rocks. Then, 3D forward modeling of the gravitational field was carried out to establish the density model of this area, with reference to the physical properties of rocks reported in earlier research. Furthermore, 3D inversion of the gravity anomaly by a genetic algorithm with graphics processing unit (GPU) parallel processing was applied in the Xisha target area, and the 3D density structure of this area was obtained. In this way, the igneous rocks can be confined to a certain depth according to their density. Frequency decomposition combined with GPU-parallel genetic-algorithm 3D inversion of the gravity anomaly proved to be a useful method for locating igneous rocks in 3D. Organic reefs and igneous rocks can therefore be distinguished, which provides predictive information for further exploration.
Abstract: MVP is a digital signal processor with an MIMD structure that is well suited to multimedia applications. MVP contains several processors, and its operation is characterized by parallelism and pipelining; therefore, real-time signal processing can be done on it. This paper presents an image processing system based on MVP, explains the principles of parallel task assignment and hardware pipeline design, and gives examples of target tracking and edge detection.
Funding: Supported by the Foundation of High Technology Project of Jiangsu (BG2004034) and the Foundation of Graduate Creative Program of Jiangsu (xm04-36)
Abstract: This paper focuses on the parallel aggregation processing of data streams based on the shared-nothing architecture. A novel granularity-aware parallel aggregating model is proposed. It employs parallel sampling and linear regression to describe the characteristics of the data quantity in the query window in order to determine the partition granularity of tuples, and utilizes an equal-depth histogram to implement partitioning. This method can avoid data skew and reduce communication cost. The experimental results on both synthetic data and actual data show that the proposed method is efficient, practical and suitable for processing time-varying data streams.
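The equal-depth histogram step mentioned in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' model: the function names `equal_depth_boundaries` and `assign_partition`, the sample values, and the choice of four partitions are all assumptions made for illustration. The idea shown is only that cut points are taken from a sorted sample so that each partition receives roughly the same number of tuples.

```python
from bisect import bisect_right

def equal_depth_boundaries(sample, num_partitions):
    """Return num_partitions - 1 cut points taken from the sorted sample.

    Each bucket between consecutive cut points holds about the same
    number of sampled keys (hypothetical helper, not from the paper).
    """
    ordered = sorted(sample)
    n = len(ordered)
    return [ordered[(i * n) // num_partitions] for i in range(1, num_partitions)]

def assign_partition(key, boundaries):
    """Map a key to the index of the bucket it falls into."""
    return bisect_right(boundaries, key)

# Illustrative, skewed key sample: half the keys are small, half are large.
sample = [1, 2, 3, 4, 5, 6, 50, 60, 70, 80, 90, 100]
boundaries = equal_depth_boundaries(sample, 4)

# Count how many sampled keys land in each of the four partitions.
counts = [0, 0, 0, 0]
for key in sample:
    counts[assign_partition(key, boundaries)] += 1
```

An equal-width split of the same key range would place most of these keys in the first bucket; taking the cut points from the sample quantiles is what keeps the partitions balanced when the key distribution is skewed.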
Abstract: This paper presents partially asynchronous parallel simulation of continuous systems (PAPSoCS) and some approaches to the issues of its implementation on a multicomputer system. To guarantee correct simulation results and speed up the simulation, a scheme for efficient PAPSoCS is proposed, and a virtual star topology is constructed to match the message-passing paths, solving the algorithm-architecture adequation problem. Because the messages passed frequently between processors are very short, typically no more than a few 4-byte words, an asynchronous communication mode is employed to reduce the communication ratio. Experimental results show that asynchronous parallel simulation is much more efficient than its synchronous counterpart.
Abstract: This paper takes the Sobel operator as an example to study the parallelization of a sequential algorithm on a shared-memory multiprocessor by using a virtual machine. Several parallel algorithms using function decomposition and/or data decomposition are compared, and their performance is analyzed in terms of processor utilization, data traffic, shared memory access, and synchronization overhead. The analysis is validated through a simulation experiment on a virtual machine with 64 parallel processors. Conclusions are presented at the end of the paper.
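As a rough illustration of the data decomposition mentioned in the abstract above, the following Python sketch splits the interior rows of an image into contiguous strips and applies the Sobel operator to each strip in a separate worker thread. It is an assumption-laden sketch, not the paper's algorithm: the names `sobel_rows` and `sobel_parallel`, the |Gx| + |Gy| magnitude, and the use of threads are illustrative choices. In CPython, threads show the structure of the decomposition but do not speed up pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor

# Standard 3x3 Sobel kernels for horizontal and vertical gradients.
GX = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
GY = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))

def sobel_rows(image, row_start, row_end):
    """Compute |Gx| + |Gy| for interior pixels of rows [row_start, row_end)."""
    width = len(image[0])
    out = []
    for r in range(row_start, row_end):
        row = []
        for c in range(1, width - 1):
            gx = sum(GX[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(GY[i][j] * image[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            row.append(abs(gx) + abs(gy))
        out.append(row)
    return out

def sobel_parallel(image, workers=4):
    """Data decomposition: each worker handles one contiguous strip of rows."""
    height = len(image)
    interior = max(0, height - 2)
    strip = max(1, -(-interior // workers))  # ceiling division
    bounds = [(s, min(s + strip, height - 1))
              for s in range(1, height - 1, strip)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(lambda b: sobel_rows(image, b[0], b[1]), bounds))
    # Strips come back in order, so concatenation restores the row order.
    return [row for part in parts for row in part]

# A 5x5 test image with a vertical edge between columns 1 and 2.
image = [[0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_parallel(image, workers=2)
```

Each strip reads one row above and one row below its own range, so neighbouring workers share read-only boundary rows; this shared access is the kind of memory traffic the abstract's analysis is concerned with.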
Abstract: In this paper, a parallel solution process for the substructure method in static and dynamic analyses of large-scale, complex structures is put forward for the parallel environment of the ELXSI computer, and the corresponding parallel computational program has been developed.
Abstract: A systolic array architecture computer (FXCQ) has been designed for signal processing. It can handle floating-point data at very high speed. It is composed of 16 processing cells and a cache that are connected linearly and form a ring structure. All processing cells are identical and programmable. Each processing cell has a peak performance of 20 million floating-point operations per second (20 MFLOPS). The machine therefore has a peak performance of 320 MFLOPS. It is integrated as an attached processor into a host system through a VME bus interface. Programs for FXCQ are written in a high-level language, the B language, which is supported by a parallel optimizing compiler. This paper describes the architecture of FXCQ, the B language and its compiler.
Abstract: Parallel versions of the prestack Kirchhoff 3D integral migration algorithm, which is suitable for seismic data processing, are described in this paper. First, the inherent parallel characteristics of seismic data processing are analyzed. Then some principles of algorithm partitioning are discussed. Based on these analyses, the system architecture and the communication mechanism, the algorithm is divided into four subtasks allocated to four nodes of 990 STAR-l. We then describe in detail a module-partitioning method: I/O processing and communication are separated from the computation process; the process comprising I/O processing and communication is allocated to the transputer T805, and the other is allocated to the processor i860. These two processes are synchronized by shared memory and a memory-lock mechanism, while communication between different nodes is implemented through the transputer links. Load balancing among the four processor modules is performed dynamically. Finally, we discuss the speed-up of the parallel versions of the prestack Kirchhoff 3D integral migration algorithm running on four nodes. Some directions for further research are also mentioned in this paper.
Abstract: Approximate computation errors in VLSI multiplier circuits are becoming critical as they increase with technology scaling. The most common method for fast and energy-efficient multiplication is approximation of the operands, but the traditional approximate result is not suitable for image processing applications. This paper proposes two architectures for image compression: a highly accurate hybrid segment approximate multiplier (HSAM) and an enhanced HSAM (EHSAM). The existing approximate multiplier based on the static segment method is not suitable for applications that require a certain accuracy, and the approximate multiplier based on the dynamic segment method is not suitable for cost-efficient applications. The proposed work combines the advantages of the static and dynamic segment methods to improve both accuracy and cost. The proposed approximate multipliers HSAM8 × 8 and EHSAM8 × 8 provide 99.85% and 99.999% accuracy, respectively, for various inputs. The proposed HSAM consumes less energy with a small increase in area overhead. The proposed EHSAM consumes less energy without any area overhead. The proposed HSAM and EHSAM improve the speed by 40% and 85%, respectively, compared with the existing SSM8 × 8 technique.
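For readers unfamiliar with the static segment method (SSM) that the abstract above uses as its baseline, the following Python sketch shows the general idea in simplified form. It does not implement the proposed HSAM or EHSAM, and the 16-bit operand width, the 8-bit segment width, and the names `static_segment` and `ssm_multiply` are assumptions for illustration only. Each operand is reduced to one of two fixed segments (the top bits if any upper bit is set, otherwise the low bits), the short segments are multiplied exactly, and the product is shifted back.

```python
def static_segment(value, n=16, m=8):
    """Pick one of two fixed m-bit segments of an n-bit operand.

    Returns (segment, shift). If any of the upper n - m bits is set, the
    top m bits are used and the discarded low bits are the approximation
    error; otherwise the low m bits represent the value exactly.
    """
    if value >> m:
        return value >> (n - m), n - m
    return value, 0

def ssm_multiply(a, b, n=16, m=8):
    """Approximate a * b using an m x m exact multiply on the chosen segments."""
    seg_a, shift_a = static_segment(a, n, m)
    seg_b, shift_b = static_segment(b, n, m)
    return (seg_a * seg_b) << (shift_a + shift_b)

exact = 0x1234 * 0x56
approx = ssm_multiply(0x1234, 0x56)
relative_error = (exact - approx) / exact
```

Small operands that fit in the low segment are multiplied without error, while larger operands lose their low-order bits; that truncation error for large operands is the accuracy limitation the abstract attributes to static segmentation.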
Abstract: We present a solution based on a suitable combination of heuristics and parallel processing techniques for finding the best allocation of the financial assets of a pension fund, taking into account all the specific rules of the fund. We compare the values of an objective function computed with respect to a large set (thousands) of possible scenarios for the evolution of the Net Asset Value (NAV) of the share of each asset class in which the financial capital of the fund is invested. Our approach depends neither on the model used for the evolution of the NAVs nor on the objective function. In particular, it does not require any linearization or similar approximation of the problem. Although we applied it to a situation in which the number of possible asset classes is limited to a few (six in the specific case), the same approach can also be followed in other cases by grouping asset classes according to their features.
Funding: The Shuguang Program of Shanghai Education Committee (No. 03SG20)
Abstract: Three parallel anaerobic-anoxic/anaerobic-aerobic (AN/AO) processes were developed to enrich denitrifying phosphorus removal bacteria (DPB) for low-strength wastewater treatment. The main body of the parallel AN/AO process consists of an AN (anaerobic-anoxic) process and an AO (anaerobic-aerobic) process. In the AO process, the common phosphorus accumulating organisms (PAOs) were dominant, while in the AN process, DPB were dominant. The volume ratio of the anaerobic zone (Vana) to the anoxic zone (Vano) to the aerobic zone (Vaer) in the parallel AN/AO process is 1:1:1, in contrast with Vana:Vaer and Vano:Vaer ratios of 1:2 and 1:4 for a traditional biological nutrient removal (BNR) process. Process 3 performed best among the three processes in terms of COD, TN and TP removal. Over 4 months of operation, the effluent COD concentration of process 3 did not exceed 60 mg/L, the effluent TN concentration was lower than 15 mg/L, and the effluent TP concentration was lower than 1 mg/L.