The aspect-based sentiment analysis (ABSA) consists of two subtasks: aspect term extraction and aspect sentiment prediction. Existing methods handle the two subtasks one by one in a pipeline manner, which causes problems in both performance and real-world application. This study investigates end-to-end ABSA and proposes a novel multitask multiview network (MTMVN) architecture. Specifically, the architecture takes unified ABSA as the main task, with the two subtasks as auxiliary tasks. Meanwhile, the representation obtained from the branch network of the main task is regarded as the global view, whereas the representations of the two subtasks are considered two local views with different emphases. Through multitask learning, the main task is facilitated by additional accurate aspect-boundary information and sentiment-polarity information. By enhancing the correlations between the views under the idea of multiview learning, the representation of the global view can be optimized to improve the overall performance of the model. Experimental results on three benchmark datasets show that the proposed method exceeds existing pipeline and end-to-end methods, proving the superiority of the MTMVN architecture.
Whole-body optical imaging of post-embryonic-stage model organisms is a challenging and long-sought goal. It requires a combination of high resolution and high penetration depth. Optoacoustic (photoacoustic) mesoscopy holds great promise, as it penetrates deeper than optical and optoacoustic microscopy while providing high spatial resolution. However, optoacoustic mesoscopic techniques offer only partial visibility of oriented structures, such as blood vessels, due to a limited angular detection aperture or the use of ultrasound frequencies that yield insufficient resolution. We introduce 360° multi-orientation (multi-projection) raster-scan optoacoustic mesoscopy (MORSOM), based on detecting an ultra-wide frequency bandwidth (up to 160 MHz) and on weighted deconvolution to synthetically enlarge the angular aperture. We report unprecedented isotropic in-plane resolution in the 9–17 μm range and improved signal-to-noise ratio in phantoms and in opaque 21-day-old zebrafish. We find that MORSOM's performance defines a new operational specification for optoacoustic mesoscopy of adult organisms, with possible applications in the developmental biology of adulthood and aging.
Funding: Guangdong Basic and Applied Basic Research Foundation under Grant No. 2024A1515012485; in part by the Shenzhen Fundamental Research Program under Grant JCYJ20220810112354002.
Abstract: This paper addresses the problem of predicting population density from cellular base station data. As wireless communication devices have become ubiquitous, cellular station data has become integral to estimating population figures and studying population movement, with significant implications for urban planning. However, existing research struggles with preprocessing base station data and with modeling population prediction. To address this, we propose methodologies for preprocessing cellular station data to eliminate irregular or redundant records. The preprocessing reveals a distinct cyclical pattern and high-frequency variation in population shift. Further, we devise a multi-view enhancement model based on the Transformer (MVformer), targeting improved accuracy for long-horizon time-series population prediction. Comparative experiments on the above population dataset against four alternative Transformer-based models indicate that the proposed MVformer improves prediction accuracy by approximately 30% on both univariate and multivariate time-series prediction tasks, demonstrating strong performance on population prediction.
Funding: This work was supported by a Grant-in-Aid for Scientific Research (C) (No. 17500119).
Abstract: This paper describes a multi-camera method to reconstruct the 3D shape of a human foot. From a foot database, an initial 3D model of the foot, represented by a cloud of points, is built. The shape parameters, which can characterize more than 92% of a foot, are defined using principal component analysis. Then, using active shape models, the initial 3D model is adapted to the real foot captured in multiple images by applying constraints on edge-point distance and color variance. We focus on the experimental part, where we demonstrate the efficiency of the proposed method on a plastic foot model and on real human feet of various shapes. We propose and compare different ways of texturing the foot, which is needed for reconstruction. Based on experiments on the plastic model and on human feet, we propose two improvements to the accuracy of the final 3D shape. The first is densifying the cloud of points used to represent the initial model and the foot database. The second concerns the patterns projected to texture the foot. We conclude by showing the results obtained for a human foot, with an average computed shape error of only 1.06 mm.
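The PCA step above, which compresses the foot database into a few shape parameters, can be sketched as follows. This is a minimal illustration of the standard technique, not the authors' implementation; the function names and the synthetic data in the usage below are assumptions, and only the idea of keeping enough components to explain a target fraction of variance (e.g. 92%) comes from the paper.

```python
import numpy as np

def build_shape_model(shapes, variance_kept=0.92):
    """shapes: (n_samples, n_points*3) flattened 3D point clouds in correspondence.
    Returns the mean shape, the principal components kept, and their variances."""
    mean = shapes.mean(axis=0)
    centered = shapes - mean
    # PCA via SVD of the centered data matrix
    U, s, Vt = np.linalg.svd(centered, full_matrices=False)
    var = s ** 2 / (len(shapes) - 1)
    ratio = np.cumsum(var) / var.sum()
    # smallest k whose cumulative explained variance reaches the target
    k = int(np.searchsorted(ratio, variance_kept) + 1)
    return mean, Vt[:k], var[:k]

def reconstruct(mean, components, params):
    """Rebuild shapes from shape parameters: mean + linear combination of modes."""
    return mean + params @ components
```

A new foot is then described by the handful of `params` obtained by projecting its centered point cloud onto the retained components, which is what makes the active-shape-model fitting tractable.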
Abstract: Human activity recognition is an active area of research. It has many applications in smart homes, such as observing and tracking toddlers or elderly people for their safety, monitoring indoor and outdoor activities, developing tele-immersion systems, and detecting abnormal activity. Three-dimensional (3D) skeleton data is robust and largely view-invariant, which makes it a popular choice for human action recognition. This paper proposes using a transversal tree built from 3D skeleton data to represent the videos in a sequence, together with two neural networks: CNN_RNN_1, used to find optimal features, and CNN_RNN_2, used to classify actions. Both models combine a convolutional neural network (CNN) with long short-term memory (LSTM) and bidirectional LSTM (BiLSTM) layers. The system achieves 88.89% accuracy, surpassing state-of-the-art models. Experiments are conducted on NTU RGB+D, one of the largest benchmark datasets for human activity recognition, and the comparison shows that the proposed model outperforms the state of the art.
Funding: The authors extend their appreciation to the Deanship of Scientific Research at Imam Mohammad Ibn Saud Islamic University for funding this work through Research Group No. RG-21-07-01.
Abstract: Traffic accidents are often caused by driver fatigue or distraction. To prevent accidents, several low-cost hypovigilance (hypo-V) systems have been developed based on multimodal hybrid (physiological and behavioral) feature sets. In this paper, a real-time driver inattention and fatigue detection system (Hypo-Driver) is proposed that uses multi-view cameras and biosignal sensors to extract hybrid features. The features are derived from non-intrusive sensors and relate to changes in driving behavior and facial expressions. To obtain robust facial features in uncontrolled environments, three cameras are deployed at multi-view points (0°, 45°, and 90°) around the driver. The physiological signals (electroencephalography (EEG), electrocardiography (ECG), surface electromyography (sEMG), and electrooculography (EOG)) and behavioral measures (PERCLOS at 70/80/90%, mouth aspect ratio (MAR), eye aspect ratio (EAR), blinking frequency (BF), and head-tilt ratio (HT-R)) are collected and preprocessed, followed by feature selection and fusion. Driver behavior is classified into five stages: normal, fatigue, visual inattention, cognitive inattention, and drowsy. The Hypo-Driver system extracts behavioral features with a convolutional neural network (CNN), extracts physiological features with a recurrent neural network with long short-term memory (RNN-LSTM), and, after fusing these features, classifies hypo-V into the five stages using trained layers and a dropout layer in a deep residual neural network (DRNN). To test the performance, data from 20 drivers are acquired. Compared with state-of-the-art hypo-V systems, the Hypo-Driver system achieves an average detection accuracy (AC) of 96.5%. The results indicate that Hypo-Driver, based on multimodal and multiview features, outperforms other state-of-the-art driver hypovigilance systems by handling many anomalies.
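Two of the behavioral measures named above, EAR and PERCLOS, have standard definitions in the drowsiness-detection literature and can be sketched as follows. This assumes the common six-landmark eye model, not necessarily the authors' exact feature extraction; the closed-eye threshold of 0.2 is a conventional choice, not taken from the paper.

```python
import numpy as np

def eye_aspect_ratio(eye):
    """eye: (6, 2) landmarks p1..p6 around one eye (p1/p4 are the corners).
    EAR = (|p2-p6| + |p3-p5|) / (2 |p1-p4|); it drops toward 0 as the eye closes."""
    p1, p2, p3, p4, p5, p6 = eye
    vertical = np.linalg.norm(p2 - p6) + np.linalg.norm(p3 - p5)
    horizontal = np.linalg.norm(p1 - p4)
    return vertical / (2.0 * horizontal)

def perclos(ear_series, closed_thresh=0.2):
    """PERCLOS: fraction of frames in which the eye is judged closed."""
    ear = np.asarray(ear_series)
    return float((ear < closed_thresh).mean())
```

A blinking-frequency feature follows the same pattern: count the EAR dips below the threshold per unit time rather than the fraction of frames spent below it.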
Funding: Supported by the National Research Foundation of Korea grant funded by the Korea Ministry of Science and Technology under Grant No. 2012-0009228.
Abstract: In this paper, we propose a new algorithm for temporally consistent depth map estimation for three-dimensional video generation. The proposed algorithm adaptively computes the matching cost using a temporal weighting function obtained by block-based moving-object detection and motion estimation with variable block sizes. Experimental results show that the proposed algorithm improves the temporal consistency of the depth video and reduces both the flickering artefact in the synthesized view and the number of coding bits for depth video coding by about 38%.
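The temporal weighting idea can be sketched as follows: a hypothetical blend of the current and previous matching-cost volumes in which the temporal term is switched off for pixels that the motion detector marks as moving. This is a deliberate simplification of the paper's variable-block-size scheme; the weight value and the binary mask are assumptions.

```python
import numpy as np

def temporal_cost(cost_cur, cost_prev, motion_mask, w_static=0.7):
    """Blend the current matching-cost volume with the previous one.
    cost_*: (H, W, D) per-disparity cost volumes; motion_mask: (H, W) True where moving.
    Static pixels reuse the previous frame's cost, stabilizing the estimated depth
    over time; moving pixels keep the purely per-frame cost."""
    w = np.where(motion_mask, 0.0, w_static)[..., None]  # broadcast over disparities
    return (1.0 - w) * cost_cur + w * cost_prev
```

Picking the disparity with the minimum blended cost then tends to repeat last frame's decision in static regions, which is exactly what suppresses flicker in the synthesized view.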
Abstract: Unmanned aerial vehicle (UAV) tilt photogrammetry can acquire image data quickly. Thanks to its efficiency, reliability, low cost, and high precision, the technology has developed rapidly in recent years and is now widely used, especially for the rapid acquisition of high-resolution remote sensing images. By fully exploiting UAV tilt photogrammetry, construction progress can be observed in stages, and the construction site can be laid out rationally and optimally through three-dimensional modeling, creating a civilized, safe, and tidy construction environment.
Funding: Supported by the Natural Science Foundation of Shanghai Municipality (Grant No. 09ZR1412500), the Shanghai Rising-Star Program (Grant No. 11QA1402400), the National Natural Science Foundation of China (Grant Nos. 60832003 and 60902085), and the Key Laboratory of Advanced Display and System Applications (Shanghai University), Ministry of Education, China (Grant No. P200801).
Abstract: Variable-size motion estimation (ME) and disparity estimation (DE) are employed to select the best coding mode for each macroblock (MB) in the current joint multiview video model (JMVM). This technique achieves the highest possible coding efficiency, but its extremely large computational complexity keeps multiview video coding (MVC) from practical application. This paper proposes an adaptive early-termination fast mode decision algorithm for MVC. It uses the coding information of the corresponding MBs in the neighboring view, based on inter-view correlation, to terminate the mode decision procedure early. Experimental results show that the proposed fast mode decision algorithm saves about 50% of the computation with no significant loss of rate-distortion (RD) performance.
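The early-termination shape can be sketched as follows. The mode names, the threshold, and the "cheap large-partition mode" heuristic are hypothetical; the paper's actual rule uses richer coding information from the corresponding MB in the neighbor view.

```python
def decide_mode(neighbor_mode, neighbor_rdcost, full_search, threshold):
    """Early-terminated mode decision for one macroblock.
    If the co-located MB in the neighbor view chose a cheap large-partition mode
    at low rate-distortion cost, reuse that mode and skip the exhaustive search;
    otherwise fall back to it. `full_search` is a callable returning (mode, rd_cost)."""
    CHEAP_MODES = {"SKIP", "16x16"}  # hypothetical set of early-exit modes
    if neighbor_mode in CHEAP_MODES and neighbor_rdcost < threshold:
        return neighbor_mode  # terminate early; the full ME/DE search never runs
    mode, _ = full_search()
    return mode
```

The computation saving comes precisely from the branch that returns without calling `full_search`, which in JMVM is the expensive variable-size ME/DE sweep.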
Abstract: Recently, reference functions for the synthesis and analysis of autostereoscopic multiview and integral images in three-dimensional displays were introduced. In this paper, we propose wavelets to analyze such images. The wavelets are built on these reference functions, used as the scaling functions of the wavelet analysis. The continuous wavelet transform was successfully applied to test wireframe binary objects, and the restored locations correspond to the structure of those objects.
Funding: Supported by the National Natural Science Foundation of China (61075013).
Abstract: View synthesis is an important building block in three-dimensional (3D) video processing and communications. Based on one or several views, view synthesis creates other views for view prediction (for compression) or view rendering (for multiview displays). The quality of view synthesis depends on how the occlusion area is filled as well as how the pixels are created; consequently, luminance adjustment and hole filling are two key issues. In this paper, two views are used to produce an arbitrary virtual synthesized view. One view is merged into the other using a local luminance adjustment method, in which the adjustment coefficient is computed over a local neighborhood region. Moreover, a maximum-neighborhood-spreading-strength hole filling method is presented to handle micro texture structure during filling. For each pixel on the hole boundary, the neighborhood pixels along the maximum spreading-strength direction are selected as candidates, and among them the pixel with the maximum spreading strength is used to fill the hole from boundary to center. If disoccluded pixels remain after one scan, the filling process is repeated until all hole pixels are filled. Simulation results show that the proposed method is efficient and robust and achieves high performance in both subjective and objective evaluations.
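The boundary-to-center filling loop can be sketched as follows. This is a simplification: it copies any known 4-neighbor into a hole pixel, instead of ranking candidates by spreading strength as the paper does, but it shows the repeated-scan structure in which each pass fills one ring of the hole until none remain.

```python
import numpy as np

def fill_holes(img, hole_mask):
    """Iteratively fill hole pixels from their known 4-neighbors, boundary to center.
    img: (H, W) image; hole_mask: (H, W) True where the pixel is disoccluded."""
    img = img.astype(float).copy()
    hole = hole_mask.copy()
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    while hole.any():
        newly_filled = []
        for y, x in zip(*np.nonzero(hole)):
            for dy, dx in offsets:
                ny, nx = y + dy, x + dx
                in_bounds = 0 <= ny < img.shape[0] and 0 <= nx < img.shape[1]
                if in_bounds and not hole[ny, nx]:
                    img[y, x] = img[ny, nx]  # copy from a known neighbor
                    newly_filled.append((y, x))
                    break
        if not newly_filled:
            break  # remaining holes touch no known pixel
        # unmark only after the pass, so filling proceeds ring by ring
        for y, x in newly_filled:
            hole[y, x] = False
    return img
```

The paper's method replaces the `break`-on-first-neighbor choice with a selection of the neighbor of maximum spreading strength, which is what preserves micro texture across the filled region.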
Abstract: The use of hand gestures can be the most intuitive human-machine interaction medium. Early approaches to hand gesture recognition used device-based methods. These methods rely on mechanical or optical sensors attached to a glove or markers, which hinder natural human-machine communication. Vision-based methods, on the other hand, are less restrictive and allow for more spontaneous communication without the need for an intermediary between human and machine. Therefore, vision-based gesture recognition has been a popular area of research for the past thirty years. Hand gesture recognition finds application in many areas, particularly the automotive industry, where advanced automotive human-machine interface (HMI) designers are using gesture recognition to improve driver and vehicle safety. However, technology advances go beyond active/passive safety and into convenience and comfort. In this context, one of America's big three automakers has partnered with the Centre of Pattern Analysis and Machine Intelligence (CPAMI) at the University of Waterloo to investigate expanding their product segment through machine learning, providing increased driver convenience and comfort, with the particular application of hand gesture recognition for autonomous car parking. The present paper leverages state-of-the-art deep learning and optimization techniques to develop a vision-based multiview dynamic hand gesture recognizer for a self-parking system. We propose a 3D-CNN gesture model architecture that we train on a publicly available hand gesture database. We apply transfer learning methods to fine-tune the pre-trained gesture model on custom-made data, which significantly improves the proposed system's performance in a real-world environment. We adapt the architecture of the end-to-end solution to expand the state-of-the-art video classifier from a single-image input (fed by a monocular camera) to a multiview 360-degree feed offered by a six-camera module. Finally, we optimize the proposed solution to work on a resource-limited embedded platform (Nvidia Jetson TX2), which is used by automakers for vehicle-based features, without sacrificing the accuracy, robustness, or real-time functionality of the system.
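One simple way to combine the six camera streams into a single decision, as the multiview expansion above suggests, is late fusion of per-view classifier outputs. The sketch below averages per-view class probabilities; the paper's actual fusion strategy inside the 3D-CNN may differ.

```python
import numpy as np

def fuse_multiview_scores(view_probs):
    """Late-fuse per-view gesture class probabilities by averaging.

    view_probs -- array of shape (n_views, n_classes); each row is the
                  softmax output of one camera's gesture classifier.
    Returns the fused class index and the fused distribution.
    """
    fused = view_probs.mean(axis=0)        # average over the camera views
    return int(np.argmax(fused)), fused
```

Averaging keeps the fused vector a valid probability distribution and lets a gesture that is only clearly visible from some of the six cameras still dominate the decision.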
Funding: This work was supported by the National Key Research and Development Program of China [grant number 2017YFC0803802] and the National Natural Science Foundation of China [grant number 41771486].
Abstract: Targeting reliable image matching across multiple remote sensing images for the generation of digital surface models, this paper presents a geometric-constrained multi-view image matching method based on an energy minimization framework. By employing a geometric constraint, the cost value of the energy function is calculated from multiple images, and the cost value is aggregated in image space using a semi-global optimization approach. A homography transform parameter calculation method is proposed for fast calculation of the projection pixel on each image when computing cost values; it is based on the known interior orientation parameters, exterior orientation parameters, and a given elevation value. For efficient and reliable processing of multiple remote sensing images, the proposed matching method is performed via a coarse-to-fine strategy through an image pyramid. Three sets of airborne remote sensing images were used to evaluate the performance of the proposed method. The results reveal that multi-view image matching improves matching reliability, and the experiments show that the proposed method performs better than traditional methods.
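The projection step described above, mapping a ground location at a given elevation into each image using the known interior and exterior orientation, can be sketched with the standard pinhole/collinearity model. This is a generic illustration, not the paper's homography parameterization; `K`, `R`, and `t` are the usual camera matrix, rotation, and translation.

```python
import numpy as np

def project_ground_point(K, R, t, xy, z):
    """Project a ground point at elevation z into an image.

    K  -- 3x3 interior-orientation (camera) matrix
    R  -- 3x3 rotation, t -- 3-vector translation (exterior orientation)
    xy -- planimetric coordinates (X, Y) of the ground point
    z  -- the given elevation value
    Returns pixel coordinates (u, v).
    """
    X = np.array([xy[0], xy[1], z])
    p = K @ (R @ X + t)            # collinearity / pinhole projection
    return p[:2] / p[2]            # homogeneous -> pixel coordinates
```

Evaluating this per candidate elevation is what lets the matcher look up a cost value in every image for each hypothesized surface height.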
基金partially supported by Innoviris(3-DLicornea project)FWO(project G.0256.15)+3 种基金supported by the National Natural Science Foundation of China(Nos.61272226 and 61373069)Research Grant of Beijing Higher Institution Engineering Research CenterTsinghua-Tencent Joint Laboratory for Internet Innovation TechnologyTsinghua University Initiative Scientific Research Program
Abstract: Multiview video can provide a more immersive perception than traditional single 2-D video. It enables both interactive free-navigation applications and high-end autostereoscopic displays on which multiple users can perceive genuine 3-D content without glasses. The multiview format also comprises much more visual information than classical 2-D or stereo 3-D content, which makes it possible to perform various interesting editing operations at both the pixel level and the object level. This survey provides a comprehensive review of existing multiview video synthesis and editing algorithms and applications. For each topic, the related technologies in classical 2-D image and video processing are reviewed first. We then discuss recent advanced techniques for multiview video virtual view synthesis and various interactive editing applications. Given the ongoing progress in multiview video synthesis and editing, we foresee that more and more immersive 3-D video applications will appear in the future.
基金supported by the National Natural Science Foundation of China(No.61976247)
Abstract: Aspect-based sentiment analysis (ABSA) consists of two subtasks: aspect term extraction and aspect sentiment prediction. Existing methods handle the two subtasks one by one in a pipeline manner, which causes problems in both performance and real-world application. This study investigates end-to-end ABSA and proposes a novel multitask multiview network (MTMVN) architecture. Specifically, the architecture takes unified ABSA as the main task, with the two subtasks as auxiliary tasks. Meanwhile, the representation obtained from the branch network of the main task is regarded as the global view, whereas the representations of the two subtasks are considered two local views with different emphases. Through multitask learning, the main task can be facilitated by additional accurate aspect boundary information and sentiment polarity information. By enhancing the correlations between the views under the idea of multiview learning, the representation of the global view can be optimized to improve the overall performance of the model. Experimental results on three benchmark datasets show that the proposed method outperforms existing pipeline methods and end-to-end methods, demonstrating the superiority of the MTMVN architecture.
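The multitask setup above trains the main task jointly with the two auxiliary subtasks, which typically amounts to a weighted combination of their losses. The sketch below shows that generic combination; the weighting scheme is a stand-in and the paper's actual objective may differ.

```python
def multitask_loss(main_loss, aux_losses, aux_weights):
    """Overall training objective: the unified-ABSA (main task) loss plus
    weighted losses of the two auxiliary subtasks (aspect term
    extraction, aspect sentiment prediction).

    Generic weighted-sum formulation, used here only for illustration.
    """
    return main_loss + sum(w * l for w, l in zip(aux_weights, aux_losses))
```

The auxiliary weights control how strongly the aspect-boundary and sentiment-polarity signals shape the shared representation during training.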
基金sponsored by the Federal Ministry of Education and Research,Photonic Science Germany,Tech2See-13N12624.
Abstract: Whole-body optical imaging of post-embryonic stage model organisms is a challenging and long sought-after goal. It requires a combination of high resolution and high penetration depth. Optoacoustic (photoacoustic) mesoscopy holds great promise, as it penetrates deeper than optical and optoacoustic microscopy while providing high spatial resolution. However, optoacoustic mesoscopic techniques offer only partial visibility of oriented structures, such as blood vessels, due to a limited angular detection aperture or the use of ultrasound frequencies that yield insufficient resolution. We introduce 360° multi-orientation (multi-projection) raster-scan optoacoustic mesoscopy (MORSOM), based on detecting an ultra-wide frequency bandwidth (up to 160 MHz) and on weighted deconvolution to synthetically enlarge the angular aperture. We report unprecedented isotropic in-plane resolution in the 9–17 μm range and improved signal-to-noise ratio in phantoms and in opaque 21-day-old zebrafish. We find that MORSOM's performance defines a new operational specification for optoacoustic mesoscopy of adult organisms, with possible applications in the developmental biology of adulthood and aging.