A long-term goal of AI research is to build intelligent agents that can see the rich visual environment around us, communicate this understanding in natural language to humans and other agents, and act in a physical or embodied environment. To this end, recent work at the nexus of Computer Vision and Natural Language Processing has made tremendous progress -- from generating natural language descriptions of images and videos, to answering questions about them, to holding free-form conversations about visual content.
Most recently, Embodied AI, where agents are trained to perform various tasks with egocentric perception, has attracted a surge of interest within the computer vision, natural language processing, and robotics communities. Vision-and-Language Navigation (VLN) is a fundamental task in Embodied AI, first proposed by Anderson, Wu, et al.
In this tutorial, we will not only cover the latest approaches and principles at the frontier of vision-and-language research, but also present a comprehensive overview of the field of VLN.
The tutorial will be a full-day event (9:00 am to 5:00 pm) with several breaks.
Our program is divided into two sub-sessions: (1) Vision-and-Language Pre-training and (2) Vision-and-Language Navigation. A recording of the panel discussion will be available after the tutorial.
| Time | Session | Speakers |
| --- | --- | --- |
| Prerecorded Sessions | | |
| 4 min | Opening Remarks [Video] | Jingjing Liu and Xiaodong He |
| 50 min | Representations and Training Strategies for VLP [Video] [Slides] | Zhe Gan |
| 40 min | Robustness, Efficiency and Extensions for VLP [Video] [Slides] | Linjie Li |
| 40 min | Video-and-Language Pre-training [Video] [Slides] | Luowei Zhou |
| 42 min | Introduction to VLN [Video] [Slides] | Qi Wu |
| 55 min | Generalizable VLN Methods [Video] [Slides] | Xin Eric Wang |
| 58 min | Forward to Realistic VLN [Video] [Slides] | Yoav Artzi and Peter Anderson |
| 15 min | VLN Summary [Video] [Slides] | Qi Wu |
| Live Session | | |
| 16:00-17:00 | Panel Discussion LIVE on Zoom [Video] | All speakers |
Contact the Organizing Committee: vqa2vln.tutorial@gmail.com