Tutorials

1. Vision-Language Models for Multimedia Applications: From Foundations to State-of-the-Art
Website: https://csyanbin.github.io/MMAsian_tutorial.html

Vision-Language Models (VLMs) are revolutionizing the multimedia landscape by integrating visual and textual data for a wide range of applications, such as image captioning, visual question answering (VQA), and multimodal retrieval. This tutorial will explore both foundational and state-of-the-art VLMs, providing attendees with a deep understanding of how these models function and how they can be applied effectively.

Participants will explore the evolution of VLMs from classical architectures based on CNNs and RNNs to cutting-edge transformer-based models such as CLIP and BLIP. The tutorial will also focus on key challenges such as scaling these models, optimizing their performance, and improving their interpretability for real-world multimedia applications.
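
To give a concrete flavor of the models covered, the short sketch below performs zero-shot image-text matching with CLIP, the core operation behind multimodal retrieval. It is a minimal illustration only: the tutorial does not prescribe a library or checkpoint, so the Hugging Face transformers library, the openai/clip-vit-base-patch32 checkpoint, and the sample image URL are all assumptions made here.

    # Minimal sketch: zero-shot image-text matching with CLIP.
    # Library, checkpoint, and image URL are illustrative assumptions,
    # not choices made by the tutorial itself.
    from PIL import Image
    import requests
    from transformers import CLIPModel, CLIPProcessor

    model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
    processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

    # Load an example image (any RGB image works).
    url = "http://images.cocodataset.org/val2017/000000039769.jpg"
    image = Image.open(requests.get(url, stream=True).raw)

    # Candidate captions to rank against the image.
    captions = ["a photo of two cats", "a photo of a dog", "a city skyline"]
    inputs = processor(text=captions, images=image,
                       return_tensors="pt", padding=True)

    outputs = model(**inputs)
    # logits_per_image holds one image-text similarity score per caption;
    # softmax turns them into a probability distribution over captions.
    probs = outputs.logits_per_image.softmax(dim=1)
    for caption, p in zip(captions, probs[0].tolist()):
        print(f"{p:.3f}  {caption}")

The same scored similarities that rank captions for one image can be computed over a gallery of images for one text query, which is how CLIP-style models are typically used for multimodal retrieval.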


Contact person:
Yanbin Liu (yanbin.liu@aut.ac.nz)
Speaker:
Yanbin Liu



2. Understanding Australian Sign Language
Website: https://uq-cvlab.github.io/MultiMediaAsia2024-Tutorial-Auslan/

This tutorial provides a comprehensive overview of Australian Sign Language (Auslan), examining its unique linguistic properties and current multimedia applications. Through a combination of theoretical lectures and hands-on sessions, participants will learn about the latest advances in sign language recognition, translation, and generation technologies. The tutorial addresses the societal impact of these innovations, focusing on the ways multimedia solutions can improve accessibility for the Deaf community. Attendees will also gain practical experience with Auslan interpretation tools, exploring how advances in machine learning and computer vision contribute to more inclusive communication technologies, particularly in the education and public service sectors.


Contact person:
Heming Du (heming.du@uq.edu.au)
Speakers:
Xin Yu
Peike Li
Heming Du
Xin Shen