Funding: Supported by the National Natural Science Foundation of China under Grants 62172059 and 62072055; the Hunan Provincial Natural Science Foundation of China under Grant 2020JJ4626; the Scientific Research Fund of Hunan Provincial Education Department of China under Grant 19B004; the "Double First-class" International Cooperation and Development Scientific Research Project of Changsha University of Science and Technology under Grant 2018IC25; and the Young Teacher Growth Plan Project of Changsha University of Science and Technology under Grant 2019QJCZ076.
Abstract: Deep-learning-based object detection now explores strategies for training networks on less data while matching the results of large-dataset training. However, existing methods usually fail to balance the number of network parameters against the amount of training data, so the information in a small number of images is insufficient to optimize the model parameters, and detection results are unsatisfactory. To improve the accuracy of few-shot object detection, this paper proposes a network based on a transformer and high-resolution feature extraction (THR). High-resolution feature extraction maintains a high-resolution representation of the image. Channel and spatial attention make the network focus on the features that are most useful for the object. In addition, a transformer is used to fuse the features of the existing objects, which compensates for the shortcomings of previous networks by making full use of existing object features. Experiments on the Pascal VOC and MS-COCO datasets show that the THR network achieves better results than previous mainstream few-shot object detection methods.
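The abstract does not say how the transformer fuses object features. As a rough illustration only, the sketch below shows plain scaled dot-product attention, the core operation of a transformer, for a single query vector. The names `attend`, `query`, `keys`, and `values` are hypothetical, and treating query-image features as the query and known-object features as keys and values is an assumption, not the THR design.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Scaled dot-product attention for one query vector.

    query:  feature vector of length d (assumed: a query-image feature)
    keys:   list of length-d vectors (assumed: known-object features)
    values: list of vectors, one per key, that get mixed together
    Returns the fused vector and the attention weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    fused = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return fused, weights
```

Keys that are more similar to the query receive larger weights, so the fused vector leans toward the values of the best-matching known objects.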
Funding: Supported by the Young Teacher Support Plan of Heilongjiang Province and Harbin Engineering University in China (No. 1155G17), and partially by the Fundamental Research Funds for the Central Universities (grant to X. Xiang).
Abstract: Shot boundary detection is a fundamental component of many real applications such as video retrieval. This paper tackles the problem of obtaining video segments in complex movie videos. First, an intermediate descriptor is proposed to depict the variation of both abrupt and gradual changes at shot boundaries; it is formed from a distance vector on Local Binary Pattern (LBP) features, GIST features, or their fusion. Instead of using only the distance between adjacent frames, the intermediate descriptor keeps the distances between the current frame and several consecutive frames. It therefore characterizes local temporal structure comprehensively, which is especially important for gradual changes. Because random forests are well suited to feature fusion, a random forest is adopted to verify the fusion effect of the intermediate descriptor on LBP and GIST. The experiments are conducted on a subset of the TRECVid 2013 INS (INstance Search) task to verify the effectiveness of the proposed intermediate descriptor and the fusion ability of the random forest. Compared with static and adaptive threshold approaches, the best performance is achieved by post-fusion of the intermediate descriptor on LBP and GIST.
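The idea of keeping distances from the current frame to several following frames can be sketched as below. This is a minimal illustration under stated assumptions: each frame is already reduced to a feature vector (for example an LBP histogram), the distance is L1, the window length is 3, and the last frame is reused when the window runs past the end of the video. None of these choices, nor the names `l1_distance` and `intermediate_descriptor`, come from the abstract.

```python
def l1_distance(a, b):
    """Sum of absolute differences between two equal-length feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def intermediate_descriptor(features, t, window=3):
    """Distances from frame t to each of the next `window` frames.

    features: list of per-frame feature vectors (assumed precomputed)
    t:        index of the current frame
    window:   number of following frames to compare against (assumed value)
    Frames past the end of the video are replaced by the last frame.
    """
    last = len(features) - 1
    return [
        l1_distance(features[t], features[min(t + k, last)])
        for k in range(1, window + 1)
    ]
```

For a gradual transition the distances grow steadily across the window even though each adjacent-frame distance stays small, which is the pattern a single adjacent-frame distance cannot capture.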
Abstract: A number of automated video shot boundary detection methods for indexing a video sequence to facilitate browsing and retrieval have been proposed in recent years. With these methods, the dissolve shot boundary is not detected accurately because it involves camera operation and object movement. In this paper, a method based on the support vector machine (SVM) is proposed to detect dissolve shot boundaries in MPEG compressed sequences. Distinguishing the dissolve shot boundary from other boundaries is treated as a two-class classification problem. Features are extracted directly from the compressed sequences without decoding them, and the optimal class boundary between the two classes is learned from training data using the SVM. Experiments comparing various classification methods show that the proposed method improves the performance of video shot boundary detection.
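To make the two-class formulation concrete, the sketch below trains a linear SVM by stochastic subgradient descent on the regularized hinge loss. It is a simplified stand-in, not the paper's method: the abstract names neither the kernel nor the compressed-domain features, so the feature vectors, the linear kernel, the hyperparameters, and the function names `train_linear_svm` and `predict` are all assumptions. Labels are +1 for "dissolve" and -1 for "other boundary".

```python
def train_linear_svm(samples, labels, lr=0.1, lam=0.01, epochs=50):
    """Fit w, b by subgradient descent on lam/2*|w|^2 + hinge loss.

    samples: list of feature vectors (assumed: features of candidate boundaries)
    labels:  list of +1 (dissolve) or -1 (other boundary)
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1.0:
                # Hinge loss is active: shrink w and step toward y * x.
                w = [wi - lr * (lam * wi - y * xi) for wi, xi in zip(w, x)]
                b += lr * y
            else:
                # Only the L2 regularizer contributes to the gradient.
                w = [wi - lr * lam * wi for wi in w]
    return w, b


def predict(w, b, x):
    """Return +1 (dissolve) or -1 (other) for one feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1
```

In practice a library implementation with a tuned kernel would replace this loop; the sketch only shows how a class boundary is learned from labeled examples.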
Funding: The authors would like to thank the Research Supporting Project (number RSP2024R444), King Saud University, Riyadh, Saudi Arabia.
Abstract: Because of the exponential growth of video data, aided by rapid advancements in multimedia technologies, it has become difficult for users to obtain information from a large video series. Static video summarization is the process of providing an abstract of the entire video that includes the most representative frames. It enables rapid exploration, indexing, and retrieval of massive video libraries. We propose a framework for static video summarization based on Binary Robust Invariant Scalable Keypoints (BRISK) and the bisecting K-means clustering algorithm. The method recognizes relevant frames by using BRISK to extract keypoints and descriptors from video sequences. The video frames' BRISK features are clustered using bisecting K-means, and the keyframe is determined by selecting the frame nearest the cluster center. The appropriate number of clusters is determined using the silhouette coefficient, without requiring any clustering parameters to be set. Experiments were carried out on the publicly available Open Video Project (OVP) dataset, which contains videos of different genres. The proposed method's effectiveness is compared with that of existing methods using a variety of evaluation metrics, and the proposed method achieves a trade-off between computational cost and quality.
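Two steps of this pipeline are easy to illustrate in isolation: picking the frame nearest each cluster center, and scoring a clustering with the silhouette coefficient. The sketch below assumes each frame has already been reduced to one numeric feature vector and that cluster labels are given; BRISK extraction and bisecting K-means themselves are not shown. The function names and the use of Euclidean distance are assumptions, not details from the abstract.

```python
import math
from collections import defaultdict


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_by_label(labels):
    """Map each cluster label to the list of frame indices it contains."""
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)
    return clusters


def select_keyframes(features, labels):
    """For each cluster, return the index of the frame nearest its centroid."""
    keyframes = {}
    for label, members in group_by_label(labels).items():
        points = [features[i] for i in members]
        center = [sum(col) / len(points) for col in zip(*points)]
        keyframes[label] = min(
            members, key=lambda i: euclidean(features[i], center)
        )
    return keyframes


def silhouette(features, labels):
    """Mean silhouette coefficient; needs at least two clusters.

    Frames that are alone in their cluster contribute 0.
    """
    clusters = group_by_label(labels)
    total = 0.0
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            continue
        a = sum(euclidean(features[i], features[j]) for j in own) / len(own)
        b = min(
            sum(euclidean(features[i], features[j]) for j in m) / len(m)
            for other, m in clusters.items()
            if other != label
        )
        total += (b - a) / max(a, b)
    return total / len(labels)
```

One way to choose the number of clusters is to run the clustering for several candidate values and keep the one with the highest silhouette score, which is consistent with the abstract's description but not confirmed as the authors' exact procedure.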
Abstract: It is difficult to detect dissolves accurately in video segmentation. Two new parameters, AEI and IDM, are computed to describe a dissolve. An improved method based on the change curves of AEI and IDM is proposed, and experiments show that it detects dissolves accurately.