What is it about?

Action progress is estimated from videos with deep learning. A convolutional backbone captures appearance, and LSTM layers model the temporal evolution of the action. Multiple actors can be recognized and localized simultaneously, and their progress is estimated online. We model fine-grained action phases by drawing a parallel with the distinction between telic and atelic actions in the natural language literature.
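
To make the notion of "progress" concrete, here is a minimal sketch of one way a per-frame progress value could be defined. It assumes progress is simply the fraction of the action that has elapsed between its first and last frame. This summary does not state that definition, and the function names `linear_progress` and `progress_targets` are hypothetical, chosen only for illustration.

```python
from typing import List


def linear_progress(start: int, end: int, frame: int) -> float:
    """Fraction of the action completed at `frame`, clamped to [0, 1].

    Assumption: progress grows linearly from 0 at `start` to 1 at `end`.
    """
    if end <= start:
        raise ValueError("end must be greater than start")
    fraction = (frame - start) / (end - start)
    # Frames before the action map to 0, frames after it map to 1.
    return min(1.0, max(0.0, fraction))


def progress_targets(start: int, end: int) -> List[float]:
    """Progress value for every frame of an action spanning [start, end]."""
    return [linear_progress(start, end, t) for t in range(start, end + 1)]
```

Under this assumption, a model that estimates progress online would output one such value per actor at every frame, using only the frames observed so far.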

Why is it important?

Estimating action progress is important for interactive applications, where intelligent observers such as robots must react promptly to what they observe and interact with the actor or the environment. This is the first attempt at modeling action progress in the literature.

Read the Original

This page is a summary of: Am I Done? Predicting Action Progress in Videos, ACM Transactions on Multimedia Computing, Communications, and Applications, November 2020, ACM (Association for Computing Machinery), DOI: 10.1145/3402447.