What is it about?

This work proposes a way to improve the performance of 3D CNNs for human action recognition when only limited training data is available. From each frame we extract four channels of information (optical flow in the horizontal and vertical directions, and image gradients in the horizontal and vertical directions) and feed them to the 3D CNNs. We then propose our 3D CNN model in three architectures: single-stream, two-stream, and four-stream. In the single-stream model, all four information channels from each frame are fed into a single stream. In the two-stream architecture, optical flow-x and optical flow-y go into one stream, and gradient-x and gradient-y go into the other. In the four-stream architecture, each information channel is fed into its own separate stream.
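The channel layout described above can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the gradients are computed with `np.gradient` finite differences (the summary does not say which gradient operator the paper uses), the optical flow fields are assumed to be precomputed by some optical flow method (not shown) and to have the same (T, H, W) shape as the frames, and the names `gradient_channels`, `build_stream_inputs`, and `STREAM_LAYOUTS` are hypothetical.

```python
import numpy as np

# The four per-frame information channels described in the summary.
CHANNELS = ("flow_x", "flow_y", "grad_x", "grad_y")

# Hypothetical mapping of channels to streams for each of the three architectures.
STREAM_LAYOUTS = {
    "single-stream": [["flow_x", "flow_y", "grad_x", "grad_y"]],
    "two-stream": [["flow_x", "flow_y"], ["grad_x", "grad_y"]],
    "four-stream": [["flow_x"], ["flow_y"], ["grad_x"], ["grad_y"]],
}


def gradient_channels(frames):
    """Horizontal and vertical image gradients of grayscale frames shaped (T, H, W).

    Uses central finite differences via np.gradient (an assumption; the
    paper's exact gradient operator is not stated in this summary).
    """
    frames = np.asarray(frames, dtype=np.float32)
    # Derivatives come back in axis order: rows (y) first, then columns (x).
    grad_y, grad_x = np.gradient(frames, axis=(1, 2))
    return grad_x, grad_y


def build_stream_inputs(channels, layout):
    """Stack channels into one (C, T, H, W) array per stream for the given layout.

    `channels` maps each name in CHANNELS to a (T, H, W) array. Optical flow
    is assumed to be precomputed elsewhere and resampled to the frames' shape.
    """
    missing = set(CHANNELS) - set(channels)
    if missing:
        raise KeyError(f"missing channels: {sorted(missing)}")
    return [np.stack([channels[name] for name in group], axis=0)
            for group in STREAM_LAYOUTS[layout]]


# Usage with random frames and placeholder (all-zero) optical flow.
frames = np.random.rand(16, 60, 80)
grad_x, grad_y = gradient_channels(frames)
zeros = np.zeros_like(grad_x)
channels = {"flow_x": zeros, "flow_y": zeros, "grad_x": grad_x, "grad_y": grad_y}
for layout in STREAM_LAYOUTS:
    print(layout, [s.shape for s in build_stream_inputs(channels, layout)])
```

The only difference between the three architectures in this sketch is how the same four channels are grouped, which keeps the preprocessing shared and lets each stream's input depth (4, 2, or 1 channels) follow directly from the layout.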

Why is it important?

Training a 3D CNN for human action recognition with limited training data is a challenging problem. Convolutional neural networks learn effective features automatically, but to do so they need a large number of training samples. This requirement becomes a limitation when little training data is available, and in practice the training sets in video action recognition are usually small, because collecting and labeling a large number of videos is difficult and time-consuming. To overcome this limitation, we propose to help the CNN by feeding it pre-extracted features. Using several streams keeps the number of parameters in each CNN small, and this is the main idea behind the learning capability of our multi-stream CNN.
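The parameter argument can be made concrete with a little arithmetic. A 3D convolution with a k_t × k_h × k_w kernel, C_in input channels, and C_out filters has k_t·k_h·k_w·C_in·C_out weights plus C_out biases. The sketch below counts parameters for a hypothetical two-layer stream; the layer widths are illustrative assumptions, not the sizes used in the paper, and it assumes each stream in the multi-stream models is made proportionally narrower. The helper names `conv3d_params` and `stream_params` are hypothetical.

```python
# Parameter count of a 3D convolution: kt*kh*kw*c_in*c_out weights + c_out biases.
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), bias=True):
    kt, kh, kw = kernel
    return kt * kh * kw * c_in * c_out + (c_out if bias else 0)


def stream_params(c_in, widths, kernel=(3, 3, 3)):
    """Total parameters of a stack of 3D conv layers (hypothetical helper)."""
    total = 0
    for c_out in widths:
        total += conv3d_params(c_in, c_out, kernel)
        c_in = c_out  # the next layer consumes this layer's feature maps
    return total


# Hypothetical configurations: (input channels per stream, layer widths, number of streams).
# Widths are illustrative only and shrink as the number of streams grows.
CONFIGS = {
    "single-stream": (4, (32, 64), 1),
    "two-stream": (2, (16, 32), 2),
    "four-stream": (1, (8, 16), 4),
}

for name, (c_in, widths, n_streams) in CONFIGS.items():
    per_stream = stream_params(c_in, widths)
    print(f"{name}: {per_stream} per stream, {per_stream * n_streams} total")
# single-stream: 58848 per stream, 58848 total
# two-stream: 14736 per stream, 29472 total
# four-stream: 3696 per stream, 14784 total
```

Under these assumed widths, each four-stream branch has 3,696 parameters versus 58,848 for the single stream. The saving comes from both fewer input channels and narrower layers, so the actual reduction depends on how the streams are sized.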

Perspectives

I hope this article encourages us to go deeper into artificial intelligence and helps others advance in this area, because artificial intelligence can help people live better and easier lives. I hope you find this article thought-provoking.

Vahid Ashkani Chenarlogh
Islamic Azad University

Read the Original

This page is a summary of: A Multi-Stream 3D CNN Structure for Human Action Recognition Trained by Limited Data, IET Computer Vision, November 2018, the Institution of Engineering and Technology (the IET).
DOI: 10.1049/iet-cvi.2018.5088.