Video-and-Language (VidL) models and their cognitive relevance

Anne Zonneveld; Albert Gatt; Iacer Calixto

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

In this paper we give a narrative review of multi-modal video-language (VidL) models. We introduce the current landscape of VidL models and benchmarks, and draw inspiration from neuroscience and cognitive science to propose avenues for future research in VidL models in particular and artificial intelligence (AI) in general. We argue that iterative feedback loops between AI, neuroscience, and cognitive science are essential to spur progress across these disciplines. We motivate why we focus specifically on VidL models and their benchmarks as a promising type of model to bring improvements in AI and categorise current VidL efforts across multiple 'cognitive relevance axioms'. Finally, we provide suggestions on how to effectively incorporate this interdisciplinary viewpoint into research on VidL models in particular and AI in general. In doing so, we hope to create awareness of the potential of VidL models to narrow the gap between neuroscience, cognitive science, and AI.

Original language	English
Title of host publication	Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops
Pages	325-338
Number of pages	8
Publication status	Published - 2023

Access to Document

https://openaccess.thecvf.com/content/ICCV2023W/MMFM/html/Zonneveld_Video-and-Language_VidL_models_and_their_cognitive_relevance_ICCVW_2023_paper.html

Licence: Unspecified

Cite this

@inproceedings{54ff29366e6b4454932c8b4c6e098c9c,

title = "Video-and-Language (VidL) models and their cognitive relevance",

author = "Anne Zonneveld and Albert Gatt and Iacer Calixto",

year = "2023",

language = "English",

pages = "325--338",

booktitle = "Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops",

}
