Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
Posts
Future Blog Post
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Blog Post number 4
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Publications
Two-Stream Transformer Architecture for Long Video Understanding
Published in British Machine Vision Conference (BMVC), 2022
This paper introduces STAN, an efficient two-stream Spatio-Temporal Attention Network that effectively models long videos for classification tasks on a single GPU.
Recommended citation: Fish, E., Weinbren, J., & Gilbert, A. (2022). "Two-Stream Transformer Architecture for Long Video Understanding." Proceedings of the British Machine Vision Conference (BMVC).
A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization
Published in Interspeech, 2023
This paper introduces myQASR, a novel method for personalizing and compressing large ASR models for on-device deployment without fine-tuning, improving performance for specific users and languages.
Recommended citation: Fish, E., Michieli, U., & Ozay, M. (2023). "A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization." Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech).
Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization
Published in NeurIPS Workshop - Machine Learning for Audio (ML4Audio), 2023
This paper introduces MRAV-FF, a versatile method that enhances Temporal Action Localization by effectively fusing audio-visual data across multiple resolutions using a novel gated cross-attention mechanism.
Recommended citation: Fish, E., Weinbren, J., & Gilbert, A. (2023). "Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization." NeurIPS Workshop on Machine Learning for Audio.
Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation
Published in arXiv Preprint (Under Review), 2025
This paper introduces Geo-Sign, a method that uses hyperbolic geometry to create more discriminative skeletal representations for Sign Language Translation, improving on SOTA methods while enhancing privacy and efficiency.
Recommended citation: Fish, E., & Bowden, R. (2025). "Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation." arXiv preprint arXiv:2506.00129.
SLRTP2025 Sign Language Production Challenge: Methodology, Results and Future Work
Published in CVPR Workshop - Sign Language Recognition, Translation & Production (SLRTP), 2025
This paper presents the methodology and results of the first Sign Language Production Challenge, held at the SLRTP Workshop at CVPR 2025, establishing a new baseline for the field.
Recommended citation: Walsh, H., Fish, E., et al. (2025). "SLRTP2025 Sign Language Production Challenge: Methodology, Results and Future Work." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.
VALLR: Visual ASR Language Model for Lip Reading
Published in International Conference on Computer Vision (ICCV), 2025
This paper introduces a novel, data-efficient, two-stage framework for lip reading that first predicts phonemes from video and then uses a Large Language Model to reconstruct sentences, achieving state-of-the-art results.
Recommended citation: Thomas, M., Fish, E., & Bowden, R. (2025). "VALLR: Visual ASR Language Model for Lip Reading." Proceedings of the International Conference on Computer Vision (ICCV).
PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization
Published in ICCV Workshop - Closing the Loop Between Vision and Language (CLVL), 2025
This paper introduces PLOT-TAL, a framework that uses multi-prompt ensembles and Optimal Transport to achieve state-of-the-art results in few-shot temporal action localization by learning compositional sub-events.
Recommended citation: Fish, E., & Gilbert, A. (2025). "PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization." Proceedings of the ICCV Workshop on Closing the Loop Between Vision and Language (CLVL).