Two-Stream Transformer Architecture for Long Video Understanding
Published in British Machine Vision Conference (BMVC), 2022
This paper introduces STAN, an efficient two-stream Spatio-Temporal Attention Network that models long videos for classification tasks on a single GPU.
Recommended citation: Fish, E., Weinbren, J., & Gilbert, A. (2022). "Two-Stream Transformer Architecture for Long Video Understanding." Proceedings of the British Machine Vision Conference (BMVC).