Posts by Collection

Publications

Two-Stream Transformer Architecture for Long Video Understanding

Published in British Machine Vision Conference (BMVC), 2022

This paper introduces STAN, an efficient two-stream Spatio-Temporal Attention Network that effectively models long videos for classification tasks on a single GPU.

Recommended citation: Fish, E., Weinbren, J., & Gilbert, A. (2022). "Two-Stream Transformer Architecture for Long Video Understanding." Proceedings of the British Machine Vision Conference (BMVC).
Download Paper

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

Published in Interspeech, 2023

This paper introduces myQASR, a novel method for personalizing and compressing large ASR models for on-device deployment without fine-tuning, improving performance for specific users and languages.

Recommended citation: Fish, E., Michieli, U., & Ozay, M. (2023). "A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization." Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech).
Download Paper

Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization

Published in NeurIPS Workshop - Machine Learning for Audio (ML4Audio), 2023

This paper introduces MRAV-FF, a versatile method that enhances Temporal Action Localization by effectively fusing audio-visual data across multiple resolutions using a novel gated cross-attention mechanism.

Recommended citation: Fish, E., Weinbren, J., & Gilbert, A. (2023). "Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization." NeurIPS Workshop on Machine Learning for Audio.
Download Paper

Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation

Published in arXiv Preprint (Under Review), 2025

This paper introduces Geo-Sign, a method that uses hyperbolic geometry to learn more discriminative skeletal representations for Sign Language Translation, improving on state-of-the-art methods while enhancing privacy and efficiency.

Recommended citation: Fish, E., & Bowden, R. (2025). "Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation." arXiv preprint arXiv:2506.00129.
Download Paper

SLRTP2025 Sign Language Production Challenge: Methodology, Results and Future Work

Published in CVPR Workshop - Sign Language Recognition, Translation & Production (SLRTP), 2025

This paper presents the methodology and results of the first Sign Language Production Challenge, held at the SLRTP Workshop at CVPR 2025, establishing a new baseline for the field.

Recommended citation: Walsh, H., Fish, E., et al. (2025). "SLRTP2025 Sign Language Production Challenge: Methodology, Results and Future Work." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
Download Paper

VALLR: Visual ASR Language Model for Lip Reading

Published in International Conference on Computer Vision (ICCV), 2025

This paper introduces a novel, data-efficient, two-stage framework for lip reading that first predicts phonemes from video and then uses a Large Language Model to reconstruct sentences, achieving state-of-the-art results.

Recommended citation: Thomas, M., Fish, E., & Bowden, R. (2025). "VALLR: Visual ASR Language Model for Lip Reading." Proceedings of the International Conference on Computer Vision (ICCV).
Download Paper

PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

Published in ICCV Workshop - Closing the Loop Between Vision and Language (CLVL), 2025

This paper introduces PLOT-TAL, a framework that uses multi-prompt ensembles and Optimal Transport to achieve state-of-the-art results in few-shot temporal action localization by learning compositional sub-events.

Recommended citation: Fish, E., & Gilbert, A. (2025). "PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization." Proceedings of the ICCV Workshop on Closing the Loop Between Vision and Language (CLVL).
Download Paper