Funding: This work is supported in part by the National Natural Science Foundation of China (Grant Number 61971078), which provided domain expertise and computational power that greatly assisted the work. It was also financially supported by the Chongqing Municipal Education Commission Grants for Major Science and Technology Project (KJZD-M202301901) and the Science and Technology Research Project of Jiangxi Department of Education (GJJ2201049).
Abstract: Text perception is crucial for understanding the semantics of outdoor scenes, making it a key requirement for building intelligent systems for driver assistance or autonomous driving. Text information in car-mounted videos can assist drivers in making decisions. However, car-mounted video text images pose challenges such as complex backgrounds, small fonts, and the need for real-time detection. We propose a robust Car-mounted Video Text Detector (CVTD), a lightweight text detection model that uses ResNet18 for feature extraction and can detect text of arbitrary shapes. Our model efficiently extracts global text positions through Coordinate Attention Threshold Activation (CATA) and strengthens feature representation by stacking two Feature Pyramid Enhancement Fusion Modules (FPEFM), integrating local text features with global position information. When the enhanced feature maps are acted upon by Text Activation Maps (TAM), the model effectively distinguishes text foreground from non-text regions. Additionally, we collected and annotated a dataset of 2200 Car-mounted Video Text (CVT) images under various road conditions for training and evaluating our model's performance. We further tested the model on four other challenging public natural scene text detection benchmark datasets, demonstrating strong generalization ability and real-time detection speed. The model holds potential for practical applications in real-world scenarios.
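The coordinate-attention idea underlying a module like CATA can be illustrated with a minimal numpy sketch. This is not the paper's implementation: the actual CATA module presumably uses learned 1x1 convolutions and a threshold activation, whereas this sketch replaces them with a plain sigmoid gate; the function name and shapes are assumptions for illustration only.

```python
import numpy as np

def coordinate_attention(feat):
    """Minimal sketch of coordinate-attention-style gating.

    feat: (C, H, W) feature map. The map is pooled along each spatial
    axis separately, so the resulting attention weights keep positional
    information along rows and columns, and the input is rescaled by
    both. A sigmoid stands in for the module's learned transforms.
    """
    pool_h = feat.mean(axis=2, keepdims=True)   # (C, H, 1): pool over width
    pool_w = feat.mean(axis=1, keepdims=True)   # (C, 1, W): pool over height
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    att_h = sigmoid(pool_h)                     # per-row attention in (0, 1)
    att_w = sigmoid(pool_w)                     # per-column attention in (0, 1)
    return feat * att_h * att_w                 # broadcasts back to (C, H, W)

x = np.random.randn(8, 16, 16)                  # toy 8-channel feature map
y = coordinate_attention(x)
```

Pooling along each axis separately, rather than globally, is what lets this family of attention modules encode where along each row and column the salient (here, text-bearing) responses lie.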
Abstract: There has been tremendous growth in digital data owing to the stunning progress of digital devices that facilitate capturing it. Digital data include images, text, and video. Video represents a rich source of information; thus, there is an urgent need to retrieve, organize, and automate videos. Video retrieval is a vital process in multimedia applications such as video search engines, digital museums, and video-on-demand broadcasting. In this paper, the different approaches to video retrieval are outlined and briefly categorized. Moreover, the different methods that bridge the semantic gap in video retrieval are discussed in more detail.
Abstract: To compute and predict link travel time from video vehicle detector data, queue length data were incorporated into the travel time calculation, and an empirical study of road links was carried out using a BP neural network optimized by an improved particle swarm algorithm together with time series analysis. Including queue length yielded a coefficient of determination of 93.36%, an improvement of 41.03% over the BP neural network using flow data alone and of 23.37% over the BPR (Bureau of Public Roads) road resistance function. Using real-time link travel times, time series prediction of subsequent travel times gave a relative error of 0.06, and the mean relative errors for predicting the link travel time of the next interval and the next cycle were 0.14 and 0.15, respectively. The results show that queue length enables highly accurate link travel time calculation, can be used for urban road travel time prediction, offers a useful approach for computing other indices in intelligent transportation algorithms, and provides decision support for improving traffic conditions.
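The two metrics this study reports, the coefficient of determination (R²) and the mean relative error, can be sketched as follows. The travel-time numbers in the example are hypothetical illustrations, not values from the paper's dataset.

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
    return 1.0 - ss_res / ss_tot

def mean_relative_error(y_true, y_pred):
    """Mean of |prediction error| / actual value."""
    return np.mean(np.abs(y_pred - y_true) / y_true)

# Hypothetical link travel times (seconds) over five intervals.
actual    = np.array([120.0, 150.0, 180.0, 160.0, 140.0])
predicted = np.array([125.0, 145.0, 175.0, 165.0, 138.0])

r2  = r_squared(actual, predicted)            # closer to 1 is better
mre = mean_relative_error(actual, predicted)  # closer to 0 is better
```

An R² of 93.36%, as reported for the queue-length-augmented model, means the model explains that fraction of the variance in observed travel times; the relative-error figures (0.06, 0.14, 0.15) measure prediction accuracy at increasing forecast horizons.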