Funding: The authors wish to acknowledge the support of the National Natural Science Foundation of China (62135009), the National Key Research and Development Program of China (2019YFB1803500), and the Institute for Guo Qiang, Tsinghua University.
Abstract: Machine vision faces bottlenecks in computing power consumption and the large amounts of data involved. Although opto-electronic hybrid neural networks can help, they usually have complex structures and are highly dependent on a coherent light source; therefore, they are not suitable for natural-lighting applications. In this paper, we propose a novel lensless opto-electronic neural network architecture for machine vision applications. The architecture optimizes a passive optical mask through a task-oriented neural network design, performs the optical convolution operation with the lensless architecture, and reduces both the device size and the amount of computation required. We demonstrate performance on handwritten digit classification tasks with a multiple-kernel mask, achieving accuracies of up to 97.21%. Furthermore, we optimize a large-kernel mask to perform optical encryption for privacy-protecting face recognition, obtaining the same recognition accuracy as non-encryption methods. Compared with the random MLS pattern, the recognition accuracy is improved by more than 6%.
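As a rough illustration of the optical convolution idea in this abstract, the sketch below treats the passive mask as a point-spread function convolved with the scene under incoherent light; the mask values, kernel size, and classifier head are hypothetical stand-ins, not the authors' optimized design.

```python
# Minimal sketch: under incoherent illumination, a lensless mask acts roughly as
# a convolution of the scene with the mask's point-spread function (PSF).
import numpy as np
from scipy.signal import fftconvolve

rng = np.random.default_rng(0)
scene = rng.random((28, 28))        # stand-in for a 28x28 handwritten digit
psf = rng.random((8, 8))            # hypothetical optimized passive-mask PSF
psf /= psf.sum()                    # passive mask: non-negative, energy-conserving

measurement = fftconvolve(scene, psf, mode="same")  # convolution performed "optically" on the sensor
features = measurement.reshape(-1)                  # digitized output fed to a small electronic classifier
```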
Funding: National Key Research and Development Program of China (2019YFB1803500); National Natural Science Foundation of China (61771284).
Abstract: The novel camera architecture facilitates the development of machine vision. Instead of capturing frame sequences in the temporal domain as traditional video cameras do, Fourier Cam directly measures the pixel-wise temporal spectrum of the video in a single shot through optical coding. Compared with the classic video-camera and time-frequency transformation pipeline, this programmable frequency-domain sampling strategy offers an attractive combination of low detection bandwidth, low computational burden, and low data volume. Based on the various temporal filter kernels designed for Fourier Cam, we demonstrate a series of machine vision functions, such as video compression, background subtraction, object extraction, and trajectory tracking.
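The sketch below computes, in software, the quantity Fourier Cam measures optically: per-pixel temporal Fourier coefficients obtained by weighting frames with a temporal code and integrating over the exposure. The frame count and sampled frequencies are illustrative assumptions.

```python
# Pixel-wise temporal spectrum via temporal coding and integration (software analogue).
import numpy as np

T, H, W = 64, 32, 32
video = np.random.rand(T, H, W)              # stand-in video cube
freqs = [0, 1, 2, 5]                         # programmable frequency samples (assumed)

t = np.arange(T)
coeffs = []
for k in freqs:
    code = np.exp(-2j * np.pi * k * t / T)   # temporal code for frequency k
    coeffs.append((video * code[:, None, None]).sum(axis=0))  # integrate during the exposure
spectrum = np.stack(coeffs)                  # pixel-wise temporal spectrum, one shot per frequency
```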
Funding: National Natural Science Foundation of China (62135009); Beijing Municipal Science and Technology Commission (Z221100005322010).
Abstract: Ever-growing deep-learning technologies are making revolutionary changes to modern life. However, conventional computing architectures are designed to process sequential, digital programs and are burdened when performing massively parallel and adaptive deep-learning applications. Photonic integrated circuits provide an efficient approach to mitigating the bandwidth limitations and the power wall of their electronic counterparts, showing great potential for ultrafast and energy-efficient high-performance computation. Here, we propose an optical computing architecture enabled by on-chip diffraction to implement convolutional acceleration, termed the "optical convolution unit" (OCU). We demonstrate that any real-valued convolution kernel can be implemented by the OCU, with a prominent boost in computational throughput via the concept of structural reparameterization. With the OCU as the fundamental unit, we build an optical convolutional neural network (oCNN) to implement two popular deep learning tasks: classification and regression. For classification, the Fashion Modified National Institute of Standards and Technology (Fashion-MNIST) and Canadian Institute for Advanced Research (CIFAR-4) data sets are tested, with accuracies of 91.63% and 86.25%, respectively. For regression, we build an optical denoising convolutional neural network to handle Gaussian noise in gray-scale images with noise levels σ = 10, 15, and 20, resulting in clean images with average peak signal-to-noise ratios (PSNR) of 31.70, 29.39, and 27.72 dB, respectively. The proposed OCU exhibits low energy consumption and high information density owing to its fully passive nature and compact footprint, providing a parallel yet lightweight solution for future compute-in-memory architectures that handle high-dimensional tensors in deep learning.
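For reference, the denoising results above are quoted as PSNR values; a minimal sketch of that metric, assuming 8-bit grayscale images and additive Gaussian noise, is shown below (the image content is synthetic, not from the data sets used in the paper).

```python
# PSNR between a clean 8-bit grayscale image and a test image.
import numpy as np

def psnr(clean: np.ndarray, test: np.ndarray, peak: float = 255.0) -> float:
    mse = np.mean((clean.astype(np.float64) - test.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

clean = np.random.randint(0, 256, (64, 64)).astype(np.float64)
for sigma in (10, 15, 20):                       # noise levels quoted in the abstract
    noisy = clean + np.random.normal(0.0, sigma, clean.shape)
    print(sigma, round(psnr(clean, noisy), 2))   # PSNR before denoising: about 28.1 / 24.6 / 22.1 dB
```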
Funding: National Key Research and Development Program of China (2019YFB1803500); National Natural Science Foundation of China (61771284); Institute for Guo Qiang, Tsinghua University.
Abstract: For moving objects, 3D mapping and tracking have found important applications in 3D reconstruction for visual odometry and simultaneous localization and mapping. This paper presents a novel camera architecture that locates fast-moving objects in four-dimensional (4D) space (x, y, z, t) from a single-shot image. Our 3D tracking system records two orthogonal fields of view (FoVs) with different polarization states on one polarization sensor. An optical spatial modulator is applied to build temporal Fourier-phase coding channels, and the integration is performed in the corresponding CMOS pixels during the exposure time. With 8-bit grayscale modulation, each coding channel achieves a 256-fold improvement in temporal resolution. A fast single-shot 3D tracking system with 0.78 ms temporal resolution within a 200 ms exposure is experimentally demonstrated. Furthermore, it provides a new image format, the Fourier-phase map, which has a compact data volume. The latent spatio-temporal information in one 2D image can be efficiently reconstructed at relatively low computational cost through a straightforward phase-matching algorithm. Combined with scene-driven exposure and reasonable Fourier-phase prediction, one can acquire 4D data (x, y, z, t) of moving objects, segment 3D motion based on temporal cues, and track targets in complicated environments.
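A minimal numerical sketch of the Fourier-phase coding idea follows: a brief event within the exposure is weighted by a sinusoidal temporal code, and its time of occurrence is recovered from the phase of the integrated pixel value. The bin count follows the 8-bit modulation mentioned above; all other parameters are illustrative.

```python
# Recovering an event time from the Fourier phase of a temporally coded exposure.
import numpy as np

T = 256                                  # 8-bit modulation -> 256 time bins per exposure
exposure_ms = 200.0                      # exposure time from the abstract (~0.78 ms per bin)
t0 = 100                                 # hypothetical bin in which the object appears
signal = np.zeros(T)
signal[t0] = 1.0                         # impulse-like brightness at t0

code = np.exp(-2j * np.pi * np.arange(T) / T)   # first-harmonic temporal code
pixel = (signal * code).sum()                   # value integrated in the CMOS pixel
phase = np.angle(pixel)                         # Fourier phase stored in the single shot
t_hat = (-phase / (2 * np.pi)) % 1.0 * T        # phase matching: recover the time bin
print(t_hat, t_hat * exposure_ms / T)           # ~100 bins -> ~78.1 ms, with 0.78 ms resolution
```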
Funding: Supported by the National Natural Science Foundation of China (NSFC) (No. 62135009) and the Beijing Municipal Science & Technology Commission, Administrative Commission of Zhongguancun Science Park (No. Z221100005322010).
Abstract: Integrated diffractive optical neural networks (DONNs) have significant potential for complex machine learning tasks with high speed and ultralow energy consumption. However, the on-chip implementation of a high-performance optical neural network is limited by input dimensions. In contrast to existing photonic neural networks, a space-time interleaving technology based on arrayed waveguides is designed to realize an on-chip DONN with high-speed, high-dimensional, and all-optical input signal modulation. To demonstrate the performance of the on-chip DONN with high-speed space-time interleaving modulation, an on-chip DONN with a designed footprint of 0.0945 mm^2 is proposed to resolve the vowel recognition task, reaching a computation speed of about 1.4×10^13 operations per second and yielding an accuracy of 98.3% in numerical calculation. In addition, the function of the specially designed arrayed waveguides for realizing parallel signal inputs using space-time conversion has been verified experimentally. This method can realize on-chip DONNs with higher input dimensions and lower energy consumption.
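The toy sketch below mimics a DONN forward pass numerically: each layer is modeled as a fixed complex "diffraction" matrix followed by a trainable phase profile, and the intensities at a few output ports are read out as class scores. The layer sizes and the propagation model are assumptions, not the fabricated design.

```python
# Toy numerical stand-in for a diffractive optical neural network forward pass.
import numpy as np

rng = np.random.default_rng(1)
N, layers, classes = 64, 3, 6            # e.g., 6 vowel classes

x = rng.random(N) + 0j                   # input field (amplitude-encoded signal)
for _ in range(layers):
    H = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))  # stand-in diffraction operator
    phase = rng.uniform(0, 2 * np.pi, N)                        # trainable phase profile
    x = (H @ x) * np.exp(1j * phase)

scores = np.abs(x[:classes]) ** 2        # detected intensities at the output ports
print(scores.argmax())                   # predicted class index
```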
Funding: Supported by the National Natural Science Foundation of China (No. 62031018).
Abstract: Non-line-of-sight (NLOS) imaging is an emerging technique for detecting objects behind obstacles or around corners. Recent studies on passive NLOS imaging mainly focus on steady-state measurement and reconstruction methods, which show limitations in recognizing moving targets. To the best of our knowledge, we propose the first event-based passive NLOS imaging method. We acquire asynchronous event-based data of the diffusion spot on the relay surface, which contains detailed dynamic information about the NLOS target and effectively eases the degradation caused by target movement. In addition, we demonstrate the event-based cues through the derivation of an event-NLOS forward model. Furthermore, we propose the first event-based NLOS imaging data set, EM-NLOS, in which the movement features are extracted by a time-surface representation. We compare reconstructions from event-based data with those from frame-based data. The event-based method performs well in terms of peak signal-to-noise ratio and learned perceptual image patch similarity, which are 20% and 10% better than the frame-based method, respectively.
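A minimal sketch of the time-surface representation mentioned above: each pixel stores an exponentially decayed timestamp of its most recent event. The event list, sensor size, and decay constant are illustrative.

```python
# Time-surface from an asynchronous event stream (toy data).
import numpy as np

H, W, tau = 64, 64, 50e-3                        # sensor size and decay constant (assumed)
events = [(0.010, 5, 7, +1), (0.020, 5, 8, -1)]  # (t, x, y, polarity) toy events

last_t = np.full((H, W), -np.inf)
for t, x, y, _ in events:
    last_t[y, x] = t                             # keep the most recent event time per pixel

t_ref = max(t for t, *_ in events)
time_surface = np.exp((last_t - t_ref) / tau)    # decayed recency map (0 where no events occurred)
```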
Abstract: Figures 5(b) and 5(c) in the original article [1] are not consistent with their captions. Corrected images are shown as follows. The article [1] was corrected online on 29 March 2022.