Funding: Supported by MOST under Grant No. MOST 104-2221-E-468-007.
Abstract: The number of films available over the Internet and from multimedia sources is enormous, and film contents are complex, so it is time-consuming for a viewer to select a favorite film. This paper presents an automatic film-type recognition system. First, a film is sampled as a frame sequence. The color space of each sampled frame, comprising hue, saturation, and brightness value (HSV), is analyzed by computing the mean and deviation of HSV over the film. These features are used as inputs to a deep-learning neural network (DNN) for recognizing the film type. One hundred films are used to train and validate the model parameters of the DNN. In the testing phase, the trained DNN classifies a film into one of five categories: action, comedy, horror thriller, romance, and science fiction. The experimental results reveal that film types can be effectively recognized by the proposed approach, enabling viewers to select an interesting film accurately and quickly.
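The per-film HSV statistics described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `film_hsv_features` is hypothetical, NumPy is an assumed tool, and frames are assumed to be already decoded and converted to HSV (e.g., by a video decoder or OpenCV upstream).

```python
import numpy as np

def film_hsv_features(frames_hsv):
    """Build the six-dimensional descriptor suggested by the abstract:
    the mean and standard deviation of H, S, and V, aggregated over
    all sampled frames of one film.

    frames_hsv: list of (height, width, 3) float arrays in HSV order.
    Returns a (6,) vector: [mean_H, mean_S, mean_V, std_H, std_S, std_V].
    """
    # Flatten every frame to a (num_pixels, 3) matrix and stack them,
    # so statistics are taken over all pixels of the whole film.
    pixels = np.concatenate([f.reshape(-1, 3) for f in frames_hsv], axis=0)
    means = pixels.mean(axis=0)  # per-channel mean of H, S, V
    stds = pixels.std(axis=0)    # per-channel deviation of H, S, V
    return np.concatenate([means, stds])

# Toy usage with two random "frames"; a real pipeline would sample
# frames from the film and convert them to HSV first.
rng = np.random.default_rng(0)
frames = [rng.random((4, 4, 3)) for _ in range(2)]
feat = film_hsv_features(frames)
```

The resulting fixed-length vector is what would then be fed to the DNN classifier, one vector per film.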
Funding: Supported by the National High Technology Research and Development Program of China (863 Program, 2015AA016306), the National Natural Science Foundation of China (61231015), the Internet of Things Development Funding Project of the Ministry of Industry in 2013 (25), the Technology Research Program of the Ministry of Public Security (2016JSYJA12), and the Natural Science Foundation of Hubei Province (2014CFB712).
Abstract: Action recognition is important for understanding human behaviors in video, and video representation is the basis of action recognition. This paper provides a new video representation based on convolutional neural networks (CNN). To capture human motion information in one CNN, we take both optical flow maps and gray images as input, and we combine multiple convolutional features by max pooling across frames. In another CNN, we input a single color frame to capture context information. Finally, we take the top fully connected layer vectors as the video representation and train classifiers with a linear support vector machine. The experimental results show that the representation integrating optical flow maps and gray images is more discriminative than representations depending on only one element. On the most challenging data sets, HMDB51 and UCF101, this video representation obtains competitive performance.
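The fusion step described above — max pooling per-frame convolutional features over time, then concatenating the motion-stream and context-stream vectors — can be sketched as follows. The feature dimension (4096), the helper names, and the random stand-ins for the CNN forward passes are all illustrative assumptions; the abstract does not give these details.

```python
import numpy as np

def max_pool_across_frames(per_frame_feats):
    """Combine per-frame feature vectors by element-wise max pooling
    over the temporal axis, as described for the motion CNN."""
    return np.max(np.stack(per_frame_feats, axis=0), axis=0)

# Stand-in for the motion CNN's per-frame outputs (flow maps + gray
# images as input); a real system would run a CNN forward pass here.
motion_feats = [np.random.rand(4096) for _ in range(10)]
motion_vec = max_pool_across_frames(motion_feats)   # shape (4096,)

# Stand-in for the context CNN's output on a single color frame.
context_vec = np.random.rand(4096)                  # shape (4096,)

# Final video representation: the two streams concatenated. The
# abstract then trains a linear SVM on such vectors (not shown).
video_repr = np.concatenate([motion_vec, context_vec])
```

Max pooling keeps, for each feature dimension, the strongest response seen in any frame, which makes the motion descriptor invariant to where in the clip the action occurs.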