Sitemap

A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.

Posts

Future Blog Post

less than 1 minute read

Published: January 01, 2199

This post will show up by default. To disable scheduling of future posts, edit _config.yml and set future: false.
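
As a minimal sketch, assuming the file referred to above is Jekyll's standard `_config.yml` in the site's root directory, the setting would look like this:

```yaml
# _config.yml (assumed location: the site's root directory)
# Do not publish posts whose date is later than the build time.
future: false
```

Jekyll reads this file only at startup, so a running `jekyll serve` needs to be restarted before the change takes effect.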

Blog Post number 4

less than 1 minute read

Published: August 14, 2015

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 3

less than 1 minute read

Published: August 14, 2014

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 2

less than 1 minute read

Published: August 14, 2013

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Blog Post number 1

less than 1 minute read

Published: August 14, 2012

This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.

Publications

Two-Stream Transformer Architecture for Long Video Understanding

Published in British Machine Vision Conference (BMVC), 2022

This paper introduces STAN, an efficient two-stream Spatio-Temporal Attention Network that effectively models long videos for classification tasks on a single GPU.

Recommended citation: Fish, E., Weinbren, J., & Gilbert, A. (2022). "Two-Stream Transformer Architecture for Long Video Understanding." Proceedings of the British Machine Vision Conference (BMVC).

A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization

Published in Interspeech, 2023

This paper introduces myQASR, a novel method for personalizing and compressing large ASR models for on-device deployment without fine-tuning, improving performance for specific users and languages.

Recommended citation: Fish, E., Michieli, U., & Ozay, M. (2023). "A Model for Every User and Budget: Label-Free and Personalized Mixed-Precision Quantization." Proceedings of the Annual Conference of the International Speech Communication Association (Interspeech).

Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization

Published in NeurIPS Workshop - Machine Learning for Audio (ML4Audio), 2023

This paper introduces MRAV-FF, a versatile method that enhances Temporal Action Localization by effectively fusing audio-visual data across multiple resolutions using a novel gated cross-attention mechanism.

Recommended citation: Fish, E., Weinbren, J., & Gilbert, A. (2023). "Multi-Resolution Audio-Visual Feature Fusion for Temporal Action Localization." NeurIPS Workshop on Machine Learning for Audio.

Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation

Published in NeurIPS, 2025

This paper introduces Geo-Sign, a method that uses hyperbolic geometry to create more discriminative skeletal representations for Sign Language Translation, improving on SOTA methods while enhancing privacy and efficiency.

Recommended citation: Fish, E., & Bowden, R. (2025). "Geo-Sign: Hyperbolic Contrastive Regularisation for Geometrically Aware Sign Language Translation." arXiv preprint arXiv:2506.00129.

SLRTP2025 Sign Language Production Challenge: Methodology, Results and Future Work

Published in CVPR Workshop - Sign Language Recognition, Translation & Production (SLRTP), 2025

This paper presents the methodology and results of the first Sign Language Production Challenge, held at the SLRTP Workshop at CVPR 2025, establishing a new baseline for the field.

Recommended citation: Walsh, H., Fish, E., et al. (2025). "SLRTP2025 Sign Language Production Challenge: Methodology, Results and Future Work." Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops.

VALLR: Visual ASR Language Model for Lip Reading

Published in International Conference on Computer Vision (ICCV), 2025

This paper introduces a novel, data-efficient, two-stage framework for lip reading that first predicts phonemes from video and then uses a Large Language Model to reconstruct sentences, achieving state-of-the-art results.

Recommended citation: Thomas, M., Fish, E., & Bowden, R. (2025). "VALLR: Visual ASR Language Model for Lip Reading." Proceedings of the International Conference on Computer Vision (ICCV).

PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization

Published in ICCV Workshop - Closing the Loop Between Vision and Language (CLVL), 2025

This paper introduces PLOT-TAL, a framework that uses multi-prompt ensembles and Optimal Transport to achieve state-of-the-art results in few-shot temporal action localization by learning compositional sub-events.

Recommended citation: Fish, E., & Gilbert, A. (2025). "PLOT-TAL: Prompt Learning with Optimal Transport for Few-Shot Temporal Action Localization." Proceedings of the ICCV Workshop on Closing the Loop Between Vision and Language (CLVL).

Edward Fish

Pages

Page Not Found

Edward Fish, PhD

Archive Layout with Content

Posts by Category

Posts by Collection

CV

CV

Markdown

Page not in menu

Page Archive

Portfolio

Publications

Sitemap

Posts by Tags

Talk map

Talks and presentations

Teaching

Terms and Privacy Policy

Blog posts

Jupyter notebook markdown generator
