Abstract: With the increasing popularity of solid-state lighting devices, Visible Light Communication (VLC) is globally recognized as an advanced and promising technology for realizing short-range, high-speed, large-capacity wireless data transmission. In this paper, we propose a prototype real-time audio and video broadcast system using inexpensive, commercially available light-emitting diode (LED) lamps. Experimental results show that real-time, high-quality audio and video can be achieved over distances of up to 3 m through proper layout of the LED sources and improved light concentration. A lighting model of the room environment is designed and simulated, indicating a close relationship between the layout of the light sources and the distribution of illuminance.
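The illuminance distribution referred to above is commonly modeled with a Lambertian emission pattern. Below is a minimal sketch of that standard model (not the paper's own simulation code); the room size, central intensity, and semi-angle are illustrative assumptions.

```python
import numpy as np

def lambertian_illuminance(led_xyz, center_intensity, half_angle_deg, rx_xyz):
    """Horizontal illuminance (lux) at a floor point from one downward-facing
    Lambertian LED with central luminous intensity I(0) in candela."""
    # Lambertian order m from the LED's semi-angle at half power
    m = -np.log(2.0) / np.log(np.cos(np.radians(half_angle_deg)))
    d_vec = np.asarray(rx_xyz, dtype=float) - np.asarray(led_xyz, dtype=float)
    d = np.linalg.norm(d_vec)
    cos_phi = -d_vec[2] / d   # emission angle (LED points straight down)
    cos_psi = cos_phi         # incidence angle at an upward-facing receiver
    return center_intensity * cos_phi**m * cos_psi / d**2

# 5 m x 5 m room, LED centred on a 3 m ceiling, I(0) = 1000 cd, 60-degree semi-angle
led = (2.5, 2.5, 3.0)
centre = lambertian_illuminance(led, 1000.0, 60.0, (2.5, 2.5, 0.0))
corner = lambertian_illuminance(led, 1000.0, 60.0, (0.0, 0.0, 0.0))
```

Evaluating this over a grid of floor points reproduces the kind of illuminance map the paper simulates: the point directly under the lamp is brightest and the corners are dimmest.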
Abstract: We present a method of 3D image mosaicing for real 3D representation of roadside buildings, and implement a Web-based interactive visualization environment for the 3D video mosaics it creates. The 3D image mosaicing technique developed in our previous work is a very powerful method for creating textured 3D-GIS data without the heavy data processing required by laser or stereo systems. For Web-based open access to the 3D video mosaics, we build an interactive visualization environment using X3D, the emerging Web 3D standard. We carry out data preprocessing for the 3D video mosaics and X3D modeling for the textured 3D data. The preprocessing converts each frame of the 3D video mosaics into concatenated image files that can be hyperlinked on the Web; the X3D modeling represents the concatenated images using the necessary X3D nodes. By employing X3D as the data format for 3D image mosaics, the real 3D representation of roadside buildings is extended to Web and mobile service systems.
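The X3D modeling step can be illustrated with a small generator for one textured face. Shape, Appearance, ImageTexture, IndexedFaceSet, and Coordinate are standard X3D nodes, while the file name and coordinates here are placeholders, not values from the paper.

```python
import xml.etree.ElementTree as ET

def textured_quad(image_url, points):
    """Minimal X3D Shape node: one quad textured with a mosaic frame image."""
    shape = ET.Element("Shape")
    appearance = ET.SubElement(shape, "Appearance")
    ET.SubElement(appearance, "ImageTexture", url=f'"{image_url}"')
    faces = ET.SubElement(shape, "IndexedFaceSet", coordIndex="0 1 2 3 -1")
    ET.SubElement(faces, "Coordinate", point=points)
    return ET.tostring(shape, encoding="unicode")

# hypothetical frame file and unit-square geometry
x3d = textured_quad("frame0001.jpg", "0 0 0  1 0 0  1 1 0  0 1 0")
```

One such Shape per concatenated image, assembled into an X3D scene, is the kind of "textured 3D data" the abstract describes serving to Web clients.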
Abstract: In recent years, real-time video streaming has grown in popularity. The growing adoption of the Internet of Things (IoT) and other wireless heterogeneous networks demands that network resources be carefully apportioned among diverse users to achieve the best Quality of Experience (QoE) and performance objectives. Most researchers have focused on Forward Error Correction (FEC) techniques when attempting to strike a balance between QoE and performance; however, as network load increases, performance degrades, impacting the live visual experience. Recently, Deep Learning (DL) algorithms have been successfully integrated with FEC to stream video across multiple heterogeneous networks, but these algorithms must be adapted to improve the experience without increasing packet loss or delay. To address this challenge, this paper proposes a novel intelligent algorithm that streams video in multi-homed heterogeneous networks based on network-centric characteristics. The proposed framework contains an Intelligent Content Extraction Module (ICEM), a Channel Status Monitor (CSM), and Adaptive FEC (AFEC). It adopts a Cognitive Learning-based Scheduling (CLS) module, built on the deep Reinforced Gated Recurrent Network (RGRN) principle and embedded alongside the FEC, to achieve better performance. The complete framework was developed using the Objective Modular Network Testbed in C++ (OMNeT++), the INET framework, and Python 3.10, with Keras as the front end and TensorFlow 2.10 as the back end. In extensive experiments, the proposed model outperforms existing intelligent models, improving QoE, minimizing End-to-End Delay (EED), and maintaining the highest accuracy (98%) and a low Root Mean Square Error (RMSE) of 0.001.
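To make the FEC side concrete, here is the simplest possible parity scheme, a single XOR repair packet per block, which recovers any one lost packet. This is only an illustrative stand-in, not the paper's AFEC module.

```python
def xor_parity(packets):
    """Byte-wise XOR of equal-length packets; used as a repair packet."""
    parity = bytearray(len(packets[0]))
    for p in packets:
        for i, b in enumerate(p):
            parity[i] ^= b
    return bytes(parity)

source = [b"AAAA", b"BBBB", b"CCCC"]   # one FEC block of k = 3 packets
repair = xor_parity(source)            # sent alongside the source packets

# packet 1 is lost in transit; XOR of the survivors and the repair rebuilds it
rebuilt = xor_parity([source[0], source[2], repair])
```

Adaptive FEC schemes like the paper's vary the amount of such redundancy per block as the monitored channel conditions change.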
Abstract: The scalable extension of H.264/AVC, known as scalable video coding (SVC), is currently the main focus of the Joint Video Team's work. In its present working draft, the high-level syntax of SVC follows the design principles of H.264/AVC, and self-contained network abstraction layer units (NAL units) form natural entities for packetization. The SVC specification is by no means finalized, but work towards an optimized RTP payload format has nevertheless already started. RFC 3984, the RTP payload specification for H.264/AVC, was taken as a starting point, but it quickly became clear that the scalable features of SVC require adaptation at least in capability/operation-point signaling and in the handling of the extended NAL unit header. This paper first gives an overview of the history of scalable video coding, then reviews the video coding layer (VCL) and NAL of the latest SVC draft specification. Finally, it discusses different aspects of the draft SVC RTP payload format, including the design criteria, use cases, signaling, and payload structure.
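The one-byte H.264/AVC NAL unit header mentioned above can be parsed as follows (the SVC draft extends this with additional extension bytes carrying scalability identifiers, which are omitted here):

```python
def parse_nal_header(first_byte):
    """Fields of the one-byte H.264/AVC NAL unit header
    (F: 1 bit, NRI: 2 bits, Type: 5 bits)."""
    return {
        "forbidden_zero_bit": (first_byte >> 7) & 0x01,
        "nal_ref_idc":        (first_byte >> 5) & 0x03,
        "nal_unit_type":       first_byte        & 0x1F,
    }

hdr = parse_nal_header(0x67)   # 0x67 is a typical SPS header byte
```

RFC 3984's packetization rules key off exactly these fields, which is why documenting the extended SVC header is central to the draft payload format.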
Funding: Supported by the National Natural Science Foundation of China (No. 61802423) and the Natural Science Foundation of Hunan Province, China (No. 2019JJ50739).
Abstract: Image mosaicking is widely used in Geographic Information Systems (GISs) for large-scale ground-surface analysis. However, most existing mosaicking methods can only be used offline because of their enormous computational cost. In this paper, we propose a novel and practical algorithm for real-time infrared video mosaicking. To achieve this, a fast template matching algorithm based on the Sum of Cosine Differences (SCD) is proposed to coarsely match sequential images; its high speed comes from computing the correlation with the Fast Fourier Transform (FFT). We also propose a fast Least Squares Matching (LSM) algorithm for inter-frame fine registration, which significantly reduces computation without degrading matching accuracy and adapts well to noise degradation and geometric distortion. Building on these two algorithms, we develop a practical real-time mosaicking approach that produces seamless mosaic images highly efficiently. Experiments on synthetic and real-world datasets demonstrate that the proposed algorithm is not only computationally efficient but also robust against various noise distortions.
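The FFT trick behind the fast template matching can be sketched with plain cross-correlation. This uses a zero-mean correlation rather than the paper's SCD measure, so it illustrates the speed-up, not SCD itself.

```python
import numpy as np

def fft_match(image, template):
    """Coarse template localization via FFT-based cross-correlation; the
    zero-mean template keeps bright flat regions from dominating the score."""
    t = template - template.mean()
    pad = np.zeros_like(image, dtype=float)
    pad[:t.shape[0], :t.shape[1]] = t
    # circular cross-correlation computed in O(N log N) instead of O(N^2)
    corr = np.fft.ifft2(np.fft.fft2(image) * np.conj(np.fft.fft2(pad))).real
    return np.unravel_index(corr.argmax(), corr.shape)   # (row, col) offset

rng = np.random.default_rng(0)
img = rng.random((128, 128))
tpl = img[40:56, 70:86].copy()   # template cut from a known location
loc = fft_match(img, tpl)
```

The same transform-domain idea underlies the paper's SCD matcher: any correlation-like similarity can be evaluated at every offset at once with two forward FFTs and one inverse FFT.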
Abstract: 360-degree video streaming services over the network are becoming popular; in particular, 360 video is easy to experience through the already ubiquitous smartphone. However, because of the nature of 360 video, it is difficult to provide a stable streaming service in a typical network environment: the amount of data to send is much larger than for conventional video, while the user's actual viewing area is very small compared to the amount sent. In this paper, we propose a cloud-based system that delivers high-quality 360 video streaming to users more efficiently, focusing in particular on playback through a head-mounted display (HMD).
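A basic building block of viewport-driven 360 streaming is mapping the HMD's viewing direction onto the equirectangular frame. A minimal sketch follows; the angle conventions are assumptions, not taken from the paper.

```python
def view_to_equirect(yaw_deg, pitch_deg, width, height):
    """Pixel in an equirectangular 360 frame hit by a viewing direction
    (yaw wraps around 0..360 degrees; pitch is +90 up to -90 down)."""
    u = (yaw_deg % 360.0) / 360.0        # longitude -> horizontal fraction
    v = (90.0 - pitch_deg) / 180.0       # latitude  -> vertical fraction
    return int(u * width) % width, int(v * height) % height
```

Once the gaze point is known in frame coordinates, a server can prioritize the tiles around it and send the rest at lower quality, which is how the mismatch between transmitted data and the small viewed area is exploited.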
Abstract: This paper presents a full-view video navigation platform and the approach used to build it. Unlike traditional full-view navigation systems, which are usually based on image panoramas, our system is based on a full-view video created by video mosaics. This differs from the usual notion of video mosaics, in which frames from a single video are pasted together to obtain a wide view; in this paper, video mosaics means stitching different videos together into a full-view video. Videos that together record 360° of the scenery are captured along a route by digital cameras arranged in a circle around the viewpoint. Each video is then broken into a frame sequence, and frames from different videos that share the same timestamp are stitched into a panoramic image. Treating these panoramic images as frames, we assemble them in time order into a full-view video. While watching it, the user can move forward, stop and look around, or proceed in a chosen direction. Full-view navigation usually refers to image-based cylindrical or spherical panoramas [2]. One way to create such a panorama is to stitch together, after a series of transformations, a group of images taken by a panning camera [1]; each image covers part of the scene and overlaps its neighbours to some degree. The stitched result is a panoramic image much wider than the raw images, best displayed by mapping it onto a cylinder or sphere, yielding a cylindrical or spherical panorama. During navigation with a spherical panorama, the user is assumed to stand at the centre of the sphere; rotating the sphere brings different views into sight, which feels much like standing in the real world. One limitation, however, is that the viewpoint is fixed to a single point: the user can look around but cannot truly move forward or backward (zooming in and out only imitates such motion). In our work, videos, that is, sequences of images taken from continuously changing viewpoints, are used as the stitching source instead of still images. They are captured by a circle of cameras and stitched into a full-view video: frames from different videos with the same timestamp are stitched into a panoramic frame, and the processed panoramic frames are then assembled into a video.
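The timestamp alignment described above can be sketched as a grouping step. The stitching itself is omitted, and the data layout (each video as a list of (timestamp, frame) pairs) is an assumption.

```python
def group_by_timestamp(videos):
    """Group frames that share a timestamp across several synchronized videos,
    ready to be stitched into one panoramic frame per time instant."""
    groups = {}
    for cam_id, video in enumerate(videos):
        for ts, frame in video:
            groups.setdefault(ts, {})[cam_id] = frame
    # keep only instants where every camera contributed a frame
    full = {ts: g for ts, g in groups.items() if len(g) == len(videos)}
    return [[g[c] for c in range(len(videos))] for ts, g in sorted(full.items())]

# three cameras; camera 2 missed the frame at timestamp 1
videos = [[(0, "a0"), (1, "a1")], [(0, "b0"), (1, "b1")], [(0, "c0")]]
panorama_inputs = group_by_timestamp(videos)
```

Each inner list is then handed to the panorama stitcher, and the resulting panoramic frames, taken in timestamp order, form the full-view video.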
Abstract: Automated live video stream analytics has been extensively researched in recent times. Most traditional methods for video anomaly detection are supervised and use a single classifier to identify an anomaly in a frame. We propose a three-stage ensemble-based unsupervised deep reinforcement algorithm built on a Long Short-Term Memory (LSTM) Recurrent Neural Network (RNN). In the first stage, an ensemble of LSTM-RNNs is deployed to generate anomaly scores. The second stage uses the least-squares method to combine them into an optimal anomaly score. The third stage adopts reward-based reinforcement learning to update the model. The proposed Hybrid Ensemble RR Model was tested on the standard pedestrian datasets UCSD Ped1 and UCSD Ped2, which contain 70 and 28 videos respectively, for a total of 18,560 frames. Since a real-time stream has strict memory constraints and storage issues, a simple computing machine does not suffice for analytics on stream data; the proposed research is therefore designed to work on a GPU (Graphics Processing Unit) and TPU (Tensor Processing Unit) supported framework. As shown in the experimental results section, recorded observations on frame-level EER (Equal Error Rate) and AUC (Area Under Curve) showed a 9% reduction in EER on UCSD Ped1, a 13% reduction in EER on UCSD Ped2, and a 4% improvement in accuracy on both datasets.
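The second stage's least-squares combination of ensemble scores can be sketched as follows; the synthetic scores and targets are illustrative, not from the UCSD data.

```python
import numpy as np

def lstsq_weights(scores, targets):
    """Least-squares weights that combine per-model anomaly scores into one
    score; scores is (n_frames, n_models), targets is (n_frames,)."""
    w, *_ = np.linalg.lstsq(scores, targets, rcond=None)
    return w

scores = np.array([[0.9, 0.2],
                   [0.1, 0.8],
                   [0.5, 0.5]])           # two LSTM-RNNs, three frames
targets = scores @ np.array([0.7, 0.3])   # synthetic reference scores
w = lstsq_weights(scores, targets)
combined = scores @ w                      # fused anomaly score per frame
```

In the full pipeline the fused score would then feed the reinforcement-learning stage that updates the ensemble.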
Funding: Research presented in this paper was funded by the National Key Research and Development Program of China [grant numbers 2016YFB0501503 and 2016YFB0501502] and the Hainan Provincial Department of Science and Technology [grant number ZDKJ2016021].
Abstract: An augmented virtual environment (AVE) fuses real-time video with 3D models or scenes so as to augment the virtual environment. In this paper, a new approach to establishing an AVE with a wide field of view is proposed, covering real-time video projection, fusion of multiple video textures, and 3D visualization of moving objects. A new diagonally weighted algorithm is proposed to smooth the apparent gaps within the overlapping area between two adjacent videos. A visualization method for the location and trajectory of a moving virtual object is proposed to display the object and its trajectory in the 3D virtual environment. The experimental results show that the proposed algorithms can fuse multiple real-time videos with 3D models efficiently: a 3D scene containing two million triangles and six real-time videos runs at around 55 frames per second on a laptop with 1 GB of graphics memory. In addition, a realistic wide-field-of-view AVE was created on the Digital Earth Science Platform by fusing three videos with a complex indoor virtual scene, visualizing a moving object and drawing its trajectory in real time.
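The gap-smoothing idea can be illustrated with the simplest overlap blend, a linear feather across the shared columns. This is a stand-in for, not a reproduction of, the paper's diagonally weighted algorithm.

```python
import numpy as np

def feather_blend(left, right, overlap):
    """Blend two horizontally adjacent images whose last/first `overlap`
    columns cover the same area, using linearly ramped weights."""
    w = np.linspace(1.0, 0.0, overlap)               # weight of the left image
    seam = left[:, -overlap:] * w + right[:, :overlap] * (1.0 - w)
    return np.hstack([left[:, :-overlap], seam, right[:, overlap:]])

a = np.full((4, 6), 100.0)   # darker left video frame
b = np.full((4, 6), 200.0)   # brighter right video frame
out = feather_blend(a, b, overlap=2)
```

A diagonal weighting scheme replaces the per-column ramp with weights that also vary along the seam, which hides exposure differences better when the overlap boundary is not a straight vertical line.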
Funding: Supported by the National Basic Research Program of China (2007CB310705), the National Natural Science Foundation of China (60772024, 60711140087), the Hi-Tech Research and Development Program of China (2007AA01Z255), the NCET (06-0090), the PCSIRT (IRT0609), the ISTCP (2006DFA11040), and the 111 Project of China (B07005).
Abstract: Real-time variable bit rate (VBR) video is expected to account for a significant portion of multimedia applications, yet its highly bursty traffic raises many challenges for VBR video service provision. To support multi-user real-time VBR video transmission with high bandwidth utilization and satisfactory quality of service (QoS), this article proposes a practical dynamic bandwidth management scheme. The scheme forecasts the future media rate of VBR video with a time-domain adaptive linear predictor and uses the media delivery index (MDI) both as a QoS measurement and as a complementary management reference. In addition, to support multi-user operation, a priority-classified adjustment strategy is put forward. Finally, a test bed based on this management scheme is established. The experimental results demonstrate that the proposed scheme is efficient, increasing bandwidth utilization by 20%-60% compared to a fixed service rate while guaranteeing QoS.
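The time-domain adaptive linear predictor can be sketched with a normalized-LMS (NLMS) filter; the filter order, step size, and the periodic rate trace below are illustrative assumptions.

```python
import numpy as np

def nlms_predict(series, order=4, mu=0.5, eps=1e-8):
    """One-step-ahead forecasts of a rate series with a normalized-LMS
    adaptive linear predictor; returns the prediction made at each step."""
    w = np.zeros(order)
    preds = []
    for n in range(order, len(series)):
        x = series[n - order:n][::-1]      # most recent sample first
        y_hat = w @ x                      # predict next media rate
        preds.append(y_hat)
        e = series[n] - y_hat              # prediction error once it arrives
        w += mu * e * x / (x @ x + eps)    # normalized LMS weight update
    return np.array(preds)

rates = np.tile([3.0, 5.0, 4.0, 6.0], 50)  # periodic VBR-like rate trace (Mbit/s)
preds = nlms_predict(rates)
```

As the filter adapts, the prediction error shrinks, which is what lets a bandwidth manager reserve close to the actual upcoming rate instead of a fixed worst-case rate.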
Funding: Supported by the Fundação para a Ciência e a Tecnologia, Portugal, under projects PEst-OE/EEI/LA0008/2013 and TURBO-PTDC/EEATEL/104358/2008, and by the European FIVER-FP7-ICT-2009-4-249142 project.
Abstract: Real-time video streaming using ultra-wideband (UWB) technology is experimentally demonstrated over long-reach passive optical networks (LR-PONs) with different wired and wireless reaches. Experimental tests using external and direct modulation, with UWB wireless radiation in the 10- and 60-GHz bands, are performed, and an ultra-bendable fiber is also considered for last-mile distribution. The video quality at the output of the optical fiber infrastructure of the LR-PON is assessed using the error vector magnitude (EVM), and the link quality indicator (LQI) is used as the figure of merit after wireless radiation. An EVM below -17 dB is achieved for both externally and directly modulated LR-PONs comprising up to 125 km of optical fiber. EVM improvement is observed for longer LR-PONs when directly modulated lasers (DMLs) are used, owing to the amplitude gain provided by the combined effect of dispersion and the DML's chirp. Compared with optical back-to-back operation, the LQI level degrades by at most around 20% for LR-PONs with 75 to 125 km of fiber reach and 2 m of wireless coverage in the 10-GHz UWB band. The same level of LQI degradation is observed in the 60-GHz UWB band with an LR-PON integrating 101 km of access network, last-mile distribution over ultra-bendable fiber, and a 5.2-m wireless link.
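EVM, used above as the wired-link figure of merit, has a standard computation. The sketch below uses one common convention (error vector power relative to RMS reference power, expressed in dB) with illustrative QPSK symbols; other normalizations (e.g. to peak constellation power) exist.

```python
import numpy as np

def evm_db(received, ideal):
    """Error vector magnitude in dB: mean error-vector power relative to
    the mean power of the ideal constellation symbols."""
    err = np.mean(np.abs(received - ideal) ** 2)
    ref = np.mean(np.abs(ideal) ** 2)
    return 10.0 * np.log10(err / ref)

ideal = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j])   # QPSK constellation
noisy = ideal + 0.1                                     # fixed offset impairment
measured = evm_db(noisy, ideal)
```

More negative values mean cleaner constellations, so the "-17 dB after 125 km" result quoted in the abstract indicates the error vectors stay well below the symbol power.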