Tentative Program

Note: All sessions are in New Zealand time.
Tuesday 3 December - Workshops, Tutorials, PhD School
08:30-09:00 Registration
09:00-10:30 Workshop 1 – Workshop on Collaboration and Evolution of Foundation and Specialized Models
Venue: Room 405-470
Organiser: Shengyu Zhang
Workshop 2 – Workshop on Multimodal Foundation Models of Remote Sensing and Agriculture
Venue: Room 405-460
Organiser: Kun Hu
Tutorial 1 – Vision-Language Models for Multimedia Applications: From Foundations to State-of-the-Art
Venue: Room 401-401
Organiser: Yanbin Liu
10:30-10:45 Morning Tea
10:45-12:15 Workshop 1 – Workshop on Collaboration and Evolution of Foundation and Specialized Models
Venue: Room 405-470
Organiser: Shengyu Zhang
Workshop 2 – Workshop on Multimodal Foundation Models of Remote Sensing and Agriculture
Venue: Room 405-460
Organiser: Kun Hu
Tutorial 2 – Understanding Australian Sign Language
Venue: Room 401-401
Organiser: Heming Du
12:15-13:45 Lunch
13:45-15:15 Workshop 3 – Workshop of Multimodal, Multilingual, and Multitask Modeling Technologies for Oriental Languages
Venue: Room 405-470
Organiser: Sheng Li
Workshop 4 – Workshop on Multi-Biological Sensing Data for Language Deterioration Prediction
Venue: Room 405-460
Organiser: Yilin Pan
PhD School
Venue: Room 401-401
Chair: Kun Hu
15:15-15:30 Afternoon Tea
15:30-17:00 Workshop 3 – Workshop of Multimodal, Multilingual, and Multitask Modeling Technologies for Oriental Languages
Venue: Room 405-470
Organiser: Sheng Li
Workshop 4 – Workshop on Multi-Biological Sensing Data for Language Deterioration Prediction
Venue: Room 405-460
Organiser: Yilin Pan
PhD School
Venue: Room 401-401
Chair: Kun Hu
Wednesday 4 December - Main Conference - Day 1
08:00-09:00 Registration
09:00-09:15 Conference Opening
Venue: Room 401-401
Speakers: Ruili Wang and Jiaying Liu
09:15-10:15 Keynote 1 – Prof Wenwu Zhu, Tsinghua University, China
Venue: Room 401-401
Chair: Ruili Wang
10:15-10:45 Morning Tea
10:45-12:15 Oral Session 1 – Highlighted Papers
Venue: Room 401-401
Chair: Wen-Huang Cheng
12:15-13:30 Lunch
13:30-15:00 Oral Session 2 – Deep Learning for Multimedia I
Venue: Room 405-470
Chair: Wei-Ta Chu
Oral Session 3 – Multimodal Analysis and Description I
Venue: Room 405-460
Chair: Xiao Wu
Oral Session 4 – Multimedia Systems
Venue: Room 401-401
Chair: Shengyu Zhang
15:00-15:30 Afternoon Tea
15:00-16:00 Poster Session 1
Venue: Level 4 Foyer
Chair: Kun Hu
16:00-17:30 Oral Session 5 – Multimodal Analysis and Description II
Venue: Room 405-470
Chair: Tong Qiao
Oral Session 6 – Multimedia Applications I
Venue: Room 405-460
Chair: Wong Lai Kuan
Oral Session 7 – Multimedia and Vision I
Venue: Room 401-401
Chair: Sheng Li
18:00-20:00 Welcome Reception
Venue: Level 4 Foyer
Thursday 5 December - Main Conference - Day 2
08:00-09:00 Registration
09:00-10:00 Keynote 2 – Prof Klara Nahrstedt, University of Illinois Urbana-Champaign, USA
Venue: Room 401-401
Chair: Jiaying Liu
10:00-10:30 Morning Tea
10:30-12:00 Oral Session 8 – Multimedia and Vision II
Venue: Room 405-470
Chair: Jiaying Liu
Oral Session 9 – Social Multimedia
Venue: Room 405-460
Oral Session 10 – Multimodal Analysis and Description III
Venue: Room 401-401
Chair: Laiyun Qing
12:00-13:30 Lunch
13:30-15:00 Oral Session 11 – Multimedia HCI and Quality of Experience
Venue: Room 405-470
Chair: Dongming Chen
Oral Session 12 – Emotional and Social Signals in Multimedia
Venue: Room 405-460
Chair: Weiqi Yan
Oral Session 13 – Deep Learning for Multimedia II
Venue: Room 401-401
Chair: Yanbin Liu
15:00-15:30 Afternoon Tea
15:00-16:00 Poster Session 2
Venue: Level 4 Foyer
Chair: Kun Hu
16:00-17:30 Oral Session 14 – Deep Learning for Multimedia III
Venue: Room 405-470
Chair: Abdulmotaleb El Saddik
Oral Session 15 – Multimedia and Vision III
Venue: Room 405-460
Chair: Reza Shahamiri
Oral Session 16 – Multimedia Applications II
Venue: Room 401-401
Chair: Wentong Wang
18:00-20:00 Conference Banquet/Dinner
Venue: Fale Pasifika
Chair: Ruili Wang, Jiaying Liu
Friday 6 December - Main Conference - Day 3
08:30-09:00 Registration
09:00-10:00 Keynote 3 – Prof Yonggang Wen, Nanyang Technological University (NTU), Singapore
Venue: Room 401-401
Chair: Zhiyong Wang
10:00-10:30 Morning Tea
10:30-12:00 Oral Session 17 – Music and Audio Processing in Multimedia
Venue: Room 405-470
Chair: Xiaotong Ji
Oral Session 18 – Short Papers
Venue: Room 405-460
Chair: Zihao Tang
Oral Session 19 – Deep Learning for Multimedia IV
Venue: Room 401-401
Chair: Kun Hu
12:00-13:30 Lunch
13:30-17:00 Guided Tour


Main Conference Day 1 – Wednesday 4 December

8:00-9:00

Registration

9:00-9:15

Conference Opening

9:15-10:15

Keynote 1 

Multimodal Generative AI in Dynamic and Open Environments

Professor Wenwu Zhu (Tsinghua University, China)

10:15-10:30

Morning Tea

10:30-12:00

Oral Session 1 – Highlighted Papers

10:30-10:45

TMM-CLIP: Task-guided Multi-Modal Alignment for Rehearsal-Free Class Incremental Learning

Yuankang Pan (Southwest Jiaotong University); Zhaoquan Yuan (Southwest Jiaotong University); Xiao Wu (Southwest Jiaotong University); Zechao Li (Nanjing University of Science and Technology); Changsheng Xu (CASIA)

10:45-11:00

SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization

Sicheng Liu (The University of Sydney); Lintao Wang (The University of Sydney); Xiaogang Zhu (The University of Adelaide); Xuequan Lu (La Trobe University); Zhiyong Wang (The University of Sydney); Kun Hu (The University of Sydney)

11:00-11:15

S2FB IoU: Improving Boundary-based Object-Centric Image Segmentation Quality Evaluation

Rim El Filali (Nanjing University of Science and Technology); Soufiane JDABA (Nanjing University of Science and Technology); Ronghui Xie (Nanjing University of Science and Technology); Ran Shi (Nanjing University of Science and Technology); Tong Qiao (Hangzhou Dianzi University); Pan Qiaodong (Shaoxing Public Security Institute); Ting Wu (Hangzhou Innovation Institute, Beihang University)

11:15-11:30

Efficient Low-Dimensional Representation Via Manifold Learning-Based Model for Multimodal Sentiment Analysis

Xingang Wang (Dalian Maritime University); Mengyi Wang (Dalian Maritime University); Hai Cui (Dalian Maritime University); Yijia Zhang (Dalian Maritime University)

11:30-11:45

ViCo: Engaging Video Comment Generation with Human Preference Rewards

Yuchong Sun (Renmin University of China); Bei Liu (Microsoft Research); Xu Chen (Renmin University of China); Ruihua Song (Renmin University of China); Jianlong Fu (Microsoft Research)

11:45-12:00

Personalized Sentiment Estimation Based on Recall and Resting Ratio of Frontal EEG

Shun Katada (Osaka University); Kazunori Komatani (Osaka University)

12:00-13:30

Lunch

13:30-15:00

Oral Session 2 – Deep Learning for Multimedia I

13:30-13:45

Accelerating Inference of Networks in the Frequency Domain

Chenqiu Zhao (University of Alberta); Guanfang Dong (University of Alberta); Anup Basu (University of Alberta)

13:45-14:00

CFRL: Coarse-Fine Decoupled Representation Learning For Long-Tailed Recognition

Yiran Song (Shanghai Jiao Tong University); Qianyu Zhou (Shanghai Jiao Tong University); Kun Hu (The University of Sydney); Lizhuang Ma (Shanghai Jiao Tong University); Xuequan Lu (La Trobe University)

14:00-14:15

Dual-Enhanced Disentangled Multi-View Clustering

Zhiqian Dong (Anhui University); Sheng Yang (Anhui University); Peng Zhou (Anhui University)

14:15-14:30

Where You See Is What You Know: A Visual-Semantic Conceptual Explainer

Luhao Zhu (Zhejiang University); Xiangwei Kong (Zhejiang University); Runsen Li (Zhejiang University); Guodong Guo (Eastern Institute of Technology)

14:30-14:45

A Multi-scale Framework towards Human-Machine Friendly Remote Sensing Image Coding

Yingkai He (Wuhan University); Zhen Zhang (Wuhan University); Jing Xiao (Wuhan University)

14:45-15:00

Latent Variables Coding for Perceptual Image Compression

Yingkai He (Wuhan University); Zhen Zhang (Wuhan University); Liang Liao (Nanyang Technological University); Jing Xiao (Wuhan University)

13:30-15:00

Oral Session 3 – Multimodal Analysis and Description I

13:30-13:45

Multimodal Energy Prompting for Video Salient Object Detection

Tao Jiang (Massey University); Feng Hou (Massey University); Yi Wang (Dalian University of Technology)

13:45-14:00

CS-HOI: Human Object Interaction Detection Enhanced by Common Sense

CHENG-KANG TAN (National Cheng Kung University); Wei-Ta Chu (National Cheng Kung University)

14:00-14:15

Adaptive Feature Inheritance and Thresholding for Ingredient Recognition in Multimedia Cooking Instructions

Yixin Zhang (Kyoto University); Yoko Yamakata (University of Tokyo, Japan); Keishi Tajima (Kyoto University)

14:15-14:30

ScaMo: Towards Text to Video Storyboard Generation Using Scale and Movement of Shots

Xu Gu (Renmin University of China); Xihua Wang (Renmin University of China); Chuhao Jin (Renmin University of China); Ruihua Song (Renmin University of China)

14:30-14:45

TCFusion: A Three-branch Cross-domain Fusion Network for Infrared and Visible Images

Wenyu Shao (Dalian Maritime University); Hongbo Liu (Dalian Maritime University)

13:30-15:00

Oral Session 4 – Multimedia Systems

13:30-13:45

QoS-Diff: Adaptive Auto-tuning Framework for Low-latency Diffusion Model Inference

Pingyi Huo (The Pennsylvania State University); Ajay Narayanan Sridhar (The Pennsylvania State University); Md Fahim Faysal Khan (The Pennsylvania State University); Kiwan Maeng (The Pennsylvania State University); Vijaykrishnan Narayanan (The Pennsylvania State University)

13:45-14:00

OpenVideoWalls: an Open-Source System for Building Video Walls with Recycling Heterogeneous Displays

Zichen Zhu (Rutgers University); Zhongze Tang (Rutgers University); Amir A Nassereldine (University at Buffalo); Jinjun Xiong (University at Buffalo); Sheng Wei (Rutgers University)

14:00-14:15

Advancing Multimodal LLMs: A Focus on Geometry Problem Solving, Reasoning, and Sequential Scoring

Raj Jaiswal (IIIT Delhi); Avinash Anand (IIIT Delhi); Rajiv Ratn Shah (IIIT Delhi)

14:15-14:30

Incorporating Pre-ordering Representations for Low-resource Neural Machine Translation

Yuan Gao (Massey University); Feng Hou (Massey University); Ruili Wang (Massey University)

14:30-14:45

ADP3D: Adaptive Point Selection for Efficient Multi-frame 3D Object Detection

Guohuan Gao (Beijing Institute of Technology); Gang Zhang (Tsinghua University); Xiangyang Xu (Beijing Institute of Technology)

14:45-15:00

Underwater Image Enhancement via Domain Adaptive Transfer Learning and Hybrid Reinforcement Model

Tingting Yao (Dalian Maritime University); Gao Yuan (Dalian Maritime University); Zihao Feng (Dalian Maritime University); Qing Hu (Dalian Maritime University); Zhiyong Wang (The University of Sydney)

15:00-15:15

Afternoon Tea 

15:15-16:00

Poster Session 1

HFS-HNeRV: High-Frequency Spectrum Hybrid Neural Representation for Videos

Jianhua Zhao (Auckland University of Technology); Xue Jun Li (Auckland University of Technology); Peter Han Joo Chong (Auckland University of Technology)

Following in the Footsteps: Predicting Human Trajectories Using Motion Pattern Memory

Yuxin Yang (Beijing University of Posts and Telecommunications); Pengfei Zhu (Beijing University of Posts and Telecommunications); Mengshi Qi (Beijing University of Posts and Telecommunications); Huadong Ma (Beijing University of Posts and Telecommunications)

On the Robustness of Deep Face Inpainting: An Adversarial Perspective

Wenhao Gao (Nanjing University of Science and Technology); Zhenbo Song (Nanjing University of Science and Technology); Zhenyuan Zhang (Nanjing University of Science and Technology); Jianfeng Lu (Nanjing University of Science and Technology)

Repetitive Action Counting with Feature Interaction Enhancement and Adaptive Gate Fusion

Jiazhen Zhang (Hefei University of Technology); Kun Li (Zhejiang University); Yanyan Wei (Hefei University of Technology); Fei Wang (Hefei University of Technology); Wei Qian (Hefei University of Technology); Jinxing Zhou (Hefei University of Technology); Dan Guo (Hefei University of Technology)

Focal Diffusion Process for Object-Aware 3D LiDAR Generation

Huijie Zhang (San Diego State University); Xiaobai Liu (San Diego State University)

Improving Sequential DeepFake Detection with Local Information Enhancement

Dong Long Yun (Harbin Institute of Technology); Yuanrong Xu (Harbin Institute of Technology); Jianping Zhong (Harbin Institute of Technology); Zhaobo Qi (Harbin Institute of Technology); Weigang Zhang (Harbin Institute of Technology, Weihai)

Moving Object Tracking based on Kernel and Random-coupled Neural Network

Yiran Chen (Chengdu University of Technology); Haoran Liu (Chengdu University of Technology); Mingzhe Liu (Chengdu University of Technology); Yanhua Liu (Chengdu University of Technology); Ruili Wang (Massey University); Peng Li (The College of Engineering Technology of Chengdu University of Technology)

Low-Light Image Enhancement via FourierTMamba: A Hybrid Frequency-Spatial Approach

Shuwei Peng (Jiangxi Normal University); Xu Zhang (Jiangxi Normal University); Aiwen Jiang (Jiangxi Normal University); Changhong Liu (Jiangxi Normal University); Jihua Ye (Jiangxi Normal University)

RandommaskFormer: Light Weight Remote Sensing Scene Classification with Masked Transformer

Xianbin Hu (Xidian University); Wei Wu (Xidian University); Zhu Li (University of Missouri, Kansas City)

Local Feature-Emphasizing Transformer for Cloth-Changing Person Re-identification

Jieqiong Zhou (Nanjing University of Information Science and Technology); Guoqing Zhang (Nanjing University of Information Science and Technology); Yuhui Zheng (Nanjing University of Information Science & Technology); Fuguo Zhang (Nanjing University of Information Science and Technology; Suzhou Kunke Intelligent Equipment Co., Ltd.)

Fast Online Adaptation of Visual SLAM via Variational Information Transfer and Preservation

Sangni Xu (South China University of Technology); Hao Xiong (Macquarie University); Qiuxia Wu (South China University of Technology, China); Zhihui Wang (Dalian University of Technology); Shlomo Berkovsky (Macquarie University); Zhiyong Wang (The University of Sydney)

MSTMENet: Multi-Scale Spatio-Temporal Mapping and Evolution Network for Video Deraining

Fengqi Li (Dalian Jiaotong University); Mengchao Guo (Dalian Jiaotong University); Renxuan Xiong (Dalian Jiaotong University); Donglei Yang (Dalian Jiaotong University); Yan Zhang (Department of Informatics, University of Oslo); Yi Wang (Dalian University of Technology); Fengqiang Xu (Dalian Jiaotong University)

16:00-17:30

Oral Session 5 – Multimodal Analysis and Description II

16:00-16:15

Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning

Meng Shen (Nanyang Technological University); Yake Wei (Renmin University of China); Jianxiong (Terry) Yin (NVIDIA AI Tech Centre); Deepu Rajan (Nanyang Technological University); Di Hu (Renmin University of China); Simon See (NVIDIA AI Tech Centre)

16:15-16:30

Active Object Segmentation: A New Modality for Egocentric Action Recognition

Jian Ma (Tianjin University); Bin Zhu (Singapore Management University); Kun Li (Tianjin University); Dima Damen (University of Bristol)

16:30-16:45

Incremental Few-Shot Object Detection by Leveraging External Information from Large Multimodal Models

Guan-Yu Wu (National Cheng Kung University); Wei-Ta Chu (National Cheng Kung University)

16:45-17:00

CISampler: Correlated Information Guided Frame Sampling for Gesture Recognition in Video

Yuanyuan Shi (Xidian University); Yunan Li (Xidian University); Siyu Liang (Xidian University); Huizhou Chen (Xidian University); Qiguang Miao (Xidian University)

17:00-17:15

Unified Multi-view Clustering based on Joint Multi-Structure Representation Learning

Song Huang (Shenzhen University); Zeng Ziming (Shenzhen University); Min Li (Shenzhen University); Jianping Wang (Shenzhen University)

17:15-17:30

DocPointer: A parameter-efficient Pointer Network for Key Information Extraction

Haipeng Li (Shandong University of Science and Technology); Guangcun Wei (Shandong University of Science and Technology); Haochen Xu (Shandong University of Science and Technology); Boyan Guo (Shandong University of Science and Technology)

16:00-17:30

Oral Session 6 – Multimedia Applications I

16:15-16:30

Joint Frame-Level and Block-Level Rate-Perception Optimized Preprocessing for Video Coding

Huajie Tan (Peking University); Guoqing Xiang (Alibaba); Xiaodong Xie (Peking University); Huizhu Jia (Peking University)

16:30-16:45

Robust Discriminative and Modal-Consistent Feature Learning for Fine-Grained Sketch-Based Image Retrieval

Junchao Ge (Kunming University of Science and Technology); Huafeng Li (Kunming University of Science and Technology); Yafei Zhang (Kunming University of Science and Technology)

16:45-17:00

LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset

Ruikun Zhang (Beijing Institute of Technology); Hao Yang (Beijing Institute of Technology); Yan Yang (The Australian National University); Ying Fu (Beijing Institute of Technology); Liyuan Pan (The Australian National University)

17:00-17:15

Variational Stochastic Multiple Auto-Encoder for Multimodal Recommendation

Ying Qiao (Shandong University); Aoxuan Chen (Shandong University); Xiang Li (Shandong University); Jinfei Gao (Shandong University)

17:15-17:30

Flexible Semantic Watermarking for Robust Diffusion Model Detection and Tracing

Zhitong Zhou (Institute of Information Engineering, Chinese Academy of Sciences); JING YU (Institute of Information Engineering, Chinese Academy of Sciences); Keke Gai (Beijing Institute of Technology); Jiamin Zhuang (Institute of Information Engineering, Chinese Academy of Sciences); Gaopeng Gou (Institute of Information Engineering, CAS); Gang Xiong (Institute of Information Engineering, Chinese Academy of Sciences)

16:00-17:30

Oral Session 7 – Multimedia and Vision – Video and Animation

16:00-16:15

LoopAnimate: Loopable Salient Object Animation

Fanyi Wang (Honor); Peng Liu (OPPO AI Center); Haotian Hu (Zhejiang Leapmotor Technology Co., Ltd.); Dan Meng (OPPO Research Institute); Jingwen Su (OPPO AI Center); Jinjin Xu (OPPO Research Institute); Yanhao Zhang (OPPO AI Center); Xiaoming Ren (OPPO); Zhiwang Zhang (NingboTech University)

16:15-16:30

SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition

Jiaqi Chen (Northeastern University); Yan Yang (The Australian National University); Shizhuo Deng (Northeastern University); Da Teng (Northeastern University); Liyuan Pan (The Australian National University)

16:30-16:45

STODINE: Decompose video to Object-centric Spatial-Temporal Slots for physical reasoning

Haoyuan Zhang (University of Chinese Academy of Sciences); Xiangyu Zhu (Chinese Academy of Sciences); Qu Tang (Institute of Automation, Chinese Academy of Sciences); Zhaoxiang Zhang (Chinese Academy of Sciences); Zhen Lei (Institute of Automation, Chinese Academy of Sciences)

16:45-17:00

GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video

Jingxuan Chen (Jinan University-University of Birmingham Joint Institute)

17:00-17:15

Action Selection Learning for Multi-label Multi-view Action Recognition

Trung Thanh NGUYEN (Nagoya University); Yasutomo Kawanishi (RIKEN); Takahiro Komamizu (Nagoya University); Ichiro Ide (Nagoya University)

17:15-17:30

MSTMENet: Multi-Scale Spatio-Temporal Mapping and Evolution Network for Video Deraining

Fengqi Li (Dalian Jiaotong University); Mengchao M Guo (Dalian Jiaotong University); Renxuan Xiong (Dalian Jiaotong University); Donglei Yang (Dalian Jiaotong University); Yan Zhang (Department of Informatics, University of Oslo); Yi Wang (Dalian University of Technology); Fengqiang Xu (Dalian Jiaotong University)

18:00-20:00

Welcome Reception


Main Conference Day 2 – Thursday 5 December

8:30-9:00

Registration

9:00-10:00

Keynote 2 

End-to-End System and Networking Challenges of Multi-View Video Systems

Professor Klara Nahrstedt (University of Illinois Urbana-Champaign, USA)

10:00-10:30

Morning Tea

10:30-12:00

Oral Session 8 – Multimedia and Vision – Segmentation and Classification

10:30-10:45

CA-OVS: Cluster and Adapt Mask Proposals for Open-Vocabulary Semantic Segmentation

Son Duy Dao (RMIT University); Hengcan Shi (Monash University); Dinh Q Phung (Monash University); Jianfei Cai (Monash University)

10:45-11:00

Feature-weighted Multi-stage Bayesian Prototype for Few-shot Classification

Xiaocong Zhou (Hohai University); Fan Liu (Hohai University); Chuanyi Zhang (Hohai University); Feifan Li (Hohai University); Wenwen Cai (Hohai University); Jun Zhou (Griffith University)

11:00-11:15

Layout Relationship Decoupling Framework for Multi-target Domain Adaptative Semantic Segmentation

Yuhang Zhang (Shenzhen University); Cuixin Yang (The Hong Kong Polytechnic University); Muxin Liao (Shenzhen University); Shishun Tian (Shenzhen University); Wenbin Zou (Shenzhen University); Chen Xu (Shenzhen University)

11:15-11:30

FreqFormer: A Frequency Transformer for Semantic Segmentation of Remote Sensing Images

Xin Li (Hohai University); Feng Xu (Hohai University; Jiangsu Ocean University); Yao Tong (Nanjing University of Chinese Medicine); Fan Liu (Hohai University); Yiwei Fang (Hohai University); Xin Lyu (Hohai University); Jun Zhou (Griffith University)

11:30-11:45

Ultrasound Video Segmentation of Pubic Symphysis and Fetal Head for Angle of Progression Measurement

Shuangping Chen (Jinan University); Huijin Wang (Jinan University); Shun Long (Jinan University); Jieyun Bai (The University of Auckland); Jianmei Jiang (Jinan University)

11:45-12:00

Mix-fine-tune: An Alternate Fine-tuning Strategy for Domain Adaptation and Generalization of Low-resource ASR

Chengxi Lei (Massey University); Satwinder Singh (University of Auckland); Feng Hou (Massey University); Ruili Wang (Massey University)

10:30-12:00

Oral Session 9 – Social and Interactive Multimedia

10:30-10:45

An Information Cascade Prediction Algorithm Based on Time Series

Dongming Chen (Northeastern University); Mingshuo Nie (Northeastern University); Zhengping Sun (Northeastern University); Huilin Chen (The Australian National University); Dongqi Wang (Northeastern University)

10:45-11:00

Watermarking Vision-Language Models

Shan Wan (Shanghai Normal University); Wu Liu (University of Science and Technology); Yijun Liu (Chinese Academy of Sciences; University of Chinese Academy of Sciences); Feiniu Yuan (Shanghai Normal University); Chunli Meng (Shanghai Normal University)

11:00-11:15

SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation

Chen-hsiu Huang (National Taiwan University); Ja-Ling Wu (National Taiwan University)

11:15-11:30

FeedMatch: Evolves for Semi-Supervised Multimedia Classification from Student Feedback

Junjiang Liu (Beijing University of Posts and Telecommunications); Dandan Sun (Beijing University of Posts and Telecommunications); Hailun Xia (Beijing University of Posts and Telecommunications); Jiangtao Bai (Beijing University of Posts and Telecommunications); Xinyue Fan (Beijing University of Posts and Telecommunications)

11:30-11:45

Dual-Stream Keyframe Enhancement for Video Question Answering

Zhenzhen Hu (Hefei University of Technology); Xin Guan (Hefei University of Technology); Jia Li (Hefei University of Technology); Zijie Song (Hefei University of Technology); Richang Hong (Hefei University of Technology)

11:45-12:00

Dual-stream Multi-modal Interactive Vision-language Tracking

Zhiyi Mo (Guangxi Normal University; Wuzhou University); Guangtong Zhang (Guangxi Normal University); Jian Nong (WuZhou University); Bineng Zhong (Guangxi Normal University); Zhi Li (Guangxi Normal University)

10:30-12:00

Oral Session 10 – Multimodal Analysis and Description III

10:30-10:45

Prompting Industrial Anomaly Segment with Large Vision-Language Models

Jinheng Zhou (Shanghai Normal University); Wu Liu (University of Science and Technology); Guang Yang (Zhongguancun Laboratory); He Zhao (Micro-intelligent); Feiniu Yuan (Shanghai Normal University)

10:45-11:00

Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation

Zhiyuan Li (University of Sydney); Dongnan Liu (University of Sydney); Heng Wang (The University of Sydney); Chaoyi Zhang (University of Sydney); Weidong Cai (University of Sydney)

11:00-11:15

Structured Bipartite Graph Ensemble Clustering

Chen Wang (Massey University)

11:15-11:30

Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video

Tomoya Sugihara (The University of Tokyo); Shuntaro Masuda (The University of Tokyo); Ling Xiao (The University of Tokyo); Toshihiko Yamasaki (The University of Tokyo)

11:30-11:45

MAFS: Modality-Aware Federated Semi-Supervised Learning with Selective Data Sharing Specified by Individual Clients

Yi-Chen Li (National Tsing Hua University); Chih-Fan Hsu (Inventec Corp.); JianKai Wang (Qualcomm); Chung-Chi Tsai (Qualcomm Technology); Cheng-Hsin Hsu (National Tsing Hua University)

11:45-12:00

Prompt-based Continual Learning for Extending Pretrained CLIP Models' Knowledge

Li Jiao (Communication University of China); Tian Wang (Communication University of China); Lihong Cao (Communication University of China)

12:00-13:00

Lunch

13:00-14:30

Oral Session 11 – Multimedia HCI and Quality of Experience

13:00-13:15

A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion

Zeyu Zhao (Institute of Automation, Chinese Academy of Sciences); Nan Gao (CASIA); Zhi Zeng (Institute of Automation, Chinese Academy of Sciences); GuiXuan Zhang (CASIA); Jie Liu (CASIA); ShuWu Zhang (CASIA)

13:15-13:30

MS-GeodesicPSIM: Predicting the Quality of Static Mesh with Texture Map via multi-scale Geodesic Patch Similarity

Bingyang Cui (Cooperative Medianet Innovation Center, Shanghai Jiaotong University); Yujie Zhang (Shanghai Jiao Tong University); Qi Yang (Tencent); Yiling Xu (Shanghai Jiao Tong University)

13:30-13:45

A Benchmark for Gaussian Splatting Compression and Quality Assessment Study

Qi Yang (Tencent); Kaifa Yang (Shanghai Jiao Tong University); Yuke Xing (Shanghai Jiao Tong University); Yiling Xu (Shanghai Jiao Tong University); Zhu Li (University of Missouri-Kansas City)

13:45-14:00

LMoW: A Latent Random Variable Model for Unconditional Human Motion Generation

Faisal Ahmed (University of Alberta); Justin Rozeboom (University of Alberta); Hanran Song (University of Alberta); Chenqiu Zhao (University of Alberta); Anup Basu (University of Alberta)

14:00-14:15

BCS-NeRF: Bundle Cross-Sensing Neural Radiance Fields

Mingwei Cao (Anhui University); Fengna Wang (Anhui University); Dengdi Sun (Anhui University); Haifeng Zhao (Anhui University)

14:15-14:30

Hierarchical Part-Attention Networks for 3D Human Reconstruction

Jinwei Li (University of Chinese Academy of Sciences); Yongkang Cheng (Northwest A&F University); Yonghe Zhang (Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences; Innovation Academy for Microsatellites of Chinese Academy of Sciences); Pengcheng Wang (Innovation Academy for Microsatellites of Chinese Academy of Sciences)

13:00-14:30

Oral Session 12 – Emotional and Social Signals in Multimedia

13:00-13:15

Emotion-Aware and Efficient Meme Sticker Dialogue Generation

Zhaojun Guo (Fudan University); Junqiang Huang (Fudan University); Guobiao Li (Fudan University); Wanli Peng (China Agricultural University); Xinpeng Zhang (Fudan University); Zhenxing Qian (Fudan University); Sheng Li (Fudan University)

13:15-13:30

MicroMamba: State Space Model with Partitioned Window Scan for Micro-Expression Recognition

Tianchen Zhou (Southeast University); Jiateng Liu (Southeast University); Yue JIN (Southeast University); Li Yao (Southeast University)

13:30-13:45

MFNet: Mixed Feature Network for Enhancing Facial Emotion Recognition on the Small-Scale Dataset

Huilin Chen (The Australian National University)

13:45-14:00

Adaptive Both homo- and hetero-Feature Integration for Multimodal Emotion Recognition

Ze Kun Wang (Tianjin University of Science and Technology); Zhan Jun Si (Tianjin University of Science and Technology)

14:00-14:15

HuBERT-CLAP: Contrastive Learning-Based Multimodal Emotion Recognition Using Self-Alignment Approach

Long Hoang Nguyen (Fruit); Nhat Truong Pham (Sungkyunkwan University); Mustaqeem Khan (Mohamed Bin Zayed University of Artificial Intelligence); Alice Ahlem OTHMANI (UPEC); Abdulmotaleb El Saddik (University of Ottawa)

14:15-14:30

Advancing Music Emotion Recognition: A Transformer Encoder-Based Approach

Yangyuan Chen (Wenzhou University of Technology); Zhizhong Ma (Wenzhou University of Technology); Mingjing Wang (Wenzhou University of Technology); Mingzhe LIU (Wenzhou University of Technology)

13:00-14:30

Oral Session 13 – Deep Learning for Multimedia II

13:00-13:15

Policy-driven Auto-Augmentation with Distillment Rewards for Scene Text Recognition

Pu Li (San Diego State University); Yibiao Zhao (iSee); Xiaobai Liu (San Diego State University)

13:15-13:30

T2QRM: Text-Driven Quadruped Robot Motion Generation

Minghui Wang (South China University of Technology); Zixu Wang (South China University of Technology); Hongbin Xu (South China University of Technology); Kun Hu (The University of Sydney); Zhiyong Wang (The University of Sydney); Wenxiong Kang (South China University of Technology)

13:30-13:45

FATO: Frequency Attention Transformer for Omnidirectional Image Super-Resolution

Hongyu An (University of Chinese Academy of Sciences); Xinfeng Zhang (University of Chinese Academy of Sciences); Shijie Zhao (Bytedance Inc.); Li Zhang (Bytedance Inc.)

13:45-14:00

IdentityKD: Identity-wise Cross-modal Knowledge Distillation for Person Recognition via mmWave Radar Sensors

Liqun Shan (University of Louisiana at Lafayette); Rujun Zhang (North East Petroleum University); Sai Venkatesh Chilukoti (University of Louisiana at Lafayette); Xingli Zhang (University of Louisiana); Insup Lee (University of Pennsylvania); Xiali Hei (University of Louisiana at Lafayette)

14:00-14:15

Mix-fine-tune: An Alternate Fine-tuning Strategy for Domain Adaptation and Generalization of Low-resource ASR

Chengxi Lei (Massey University); Satwinder Singh (University of Auckland); Feng Hou (Massey University); Ruili Wang (Massey University)

14:15-14:30

Multi-stage Image Deraining based on Pre-trained Diffusion Model

Xiong Zeng (Wuhan University of Science and Technology); Jiang Min (Wuhan University of Science and Technology); Ronghua Huang (Wuhan University of Science and Technology)

14:30-15:15

Afternoon Tea

15:15-16:00

Poster Session 2

DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer

Ying Hu (Nanjing University of Aeronautics and Astronautics); Chenyi Zhuang (Nanjing University of Aeronautics and Astronautics); Pan Gao (Nanjing University of Aeronautics and Astronautics)

Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition

Chao TAN (MonotaRO); Sheng Li (National Institute of Information & Communications Technology (NICT)); Yang Cao (Tokyo Institute of Technology); Zhao Ren (University of Bremen); Tanja Schultz (University of Bremen)

A method for detecting hands off the steering wheel

Yujia Xu (Hubei University of Education); Deyu Pan (Beijing Institute of Graphic Communication); Ling Ding (Hubei University of Education)

A Multi-angle Text Recognition Algorithm

Jie Wang (Northeastern University); Huilin Chen (The Australian National University); Wandong Xue (Northeastern University); Dongming Chen (Northeastern University); Dongqi Wang (Northeastern University)

Development of a Chinese Synonym Library: Enhancing Clinical Terminology Standardization and Interoperability

Yani Chen (Dalian Maritime University); Kaiyu Nie (Department of Burns and Reconstructive Surgery, Affiliated Hospital of Zunyi Medical University); Xiaoxia Nie (College of Foreign Languages, Guizhou University); Ruili Wang (Honeycomb Telemedicine Technology)

Fine-grained Video Semantic Distillation for Video-Text Retrieval

Zuyi Pei (Dalian University of Technology); Baoli Sun (Dalian University of Technology); Zhihui Wang (Dalian University of Technology); Haojie Li (Shandong University of Science and Technology)

SS-FS CSA: Self-Supervised and Fully Supervised Integration for 3D Cerebrovascular Segmentation

Chenxi Niu (University of Nottingham Ningbo China); Ziyu Liu (University of Nottingham Ningbo China); Xiangjian He (University of Nottingham Ningbo)

PCMark-NAS: Lightweight Print-Camera Resilient Watermarking Networks via Neural Architecture Search

Daidou Guo (University of Shanghai for Science and Technology); Chuan Qin (University of Shanghai for Science and Technology)

CSCCap: Plugging Sparse Coding in Zero-Shot Image Captioning

Yu Song (Henan University); Xiaohui Yang (Henan University); Rongping Huang (Southern University of Science and Technology); Bai Haifeng (DEEPEXI); Lili Yang (Southern University of Science and Technology)

Description-Driven Audiovisual Embedding Space Learning for Enhanced Movie Understanding

Wei-Lun Huang (National Taiwan University of Science and Technology); Shao-Hung Wu (National Tsing Hua University); Hung-Chang Huang (KKCompany Technologies); Min-Chun Hu (National Tsing Hua University); Tse-Yu Pan (National Taiwan University of Science and Technology)

MBC-ATA: Maximum Binary Classification and Anchor-based Triplet Augmentation for Unbiased Scene Graph Generation

Hao Zhang (Shandong University); Xingning Dong (Ant Group); Jinfei Gao (Shandong University); Liang Hao (HBIS DIGITAL TECH Co.,Ltd.); Pei Shen (HBIS DIGITAL TECH Co.,Ltd.); Tian Gan (Shandong University)

16:00-17:30

Oral Session 14 – Deep Learning for Multimedia III

16:00-16:15

Point-Supervised Temporal Action Detection with Label Supplementation Based on Transformer

Cui Xu (University of Chinese Academy of Sciences); Laiyun Qing (University of Chinese Academy of Sciences)

16:15-16:30

A Unified Contrastive Framework with Multi-Granularity Fusion for Text-to-Image Generation

Yachao He (Shandong Normal University); Li Liu (Shandong Normal University); Huaxiang Zhang (Shandong Normal University); Dongmei Liu (Shandong Normal University); Hongzhen Li (Shandong Normal University)

16:30-16:45

Local Feature-Emphasizing Transformer for Cloth-Changing Person Re-identification

Jieqiong Zhou (Nanjing University of Information Science and Technology); Guoqing Zhang (Nanjing University of Information Science and Technology); Yuhui Zheng (Nanjing University of Information Science & Technology); Fuguo Zhang (Nanjing University of Information Science and Technology; Suzhou Kunke Intelligent Equipment Co., Ltd.)

16:45-17:00

HSMnet: Hybrid Sampling and Matching Network for DETR-based Person Search

Zhengjie Lu (Hebei University); Jinjia Peng (Hebei University); Huibing Wang (Dalian Maritime University); Qingxuan Shi (Hebei University); Bin Wang (Hebei University)

17:00-17:15

Dlpp-Net: Degradation Location Prior Prediction Network for Image Restoration

Yongjian Liu (Wuhan University of Technology); Shunwei Zhang (Wuhan University of Technology); Jinyu Xu (Wuhan University of Technology); Jiachen Li (Wuhan University of Technology); Yanchun Ma (Wuhan Vocational College of Software and Engineering); Qing Xie (Wuhan University of Technology)

17:15-17:30

A Robust Few-shot Learning Framework via Dual-branch Adversarial Noise Pretraining

Jiale Wang (Hefei University of Technology); Xueliang Liu (Hefei University of Technology); Yuling Su (Hefei University of Technology)

16:00-17:30

Oral Session 15 – Multimedia and Vision – 3D and Point Cloud

16:00-16:15

RoboFormer: A Robust Multi-Modal Transformer for 3D Object Detection in Autonomous Driving

Yuang Liu (Beijing University of Posts and Telecommunications); Dacheng Liao (Beijing University of Posts and Telecommunications); Mengshi Qi (Beijing University of Posts and Telecommunications); Liang Liu (Beijing University of Posts and Telecommunications); Huadong Ma (Beijing University of Posts and Telecommunications)

16:15-16:30

MRGait: A Multi-range feature learning framework for Cross-View Gait Recognition

Muhammad Saad Shakeel (South China University of Technology); Kun Liu (South China University of Technology); Xiaochuan Liao (South China University of Technology); Wenxiong Kang (South China University of Technology)

16:30-16:45

Point Cloud Normal Estimation via Representation Learning on Height Maps

Yang Yi (Deakin University); Dasith T de Silva Edirimuni (The University of Western Australia); Ye Zhu (Deakin University); Shang Gao (Deakin University); Zhiyong Wang (The University of Sydney); Antonio Robles-Kelly (Deakin University); Xuequan Lu (La Trobe University)

16:45-17:00

An Efficient Multi-prior Hybrid Approach for Consistent 3D Generation from Single Images

Yichen Ouyang (Zhejiang University); Jiayi Ye (Zhejiang University); Wenhao Chai (University of Washington); Dapeng Tao (Yunnan University); Yibing Zhan (JD Explore Academy); Gaoang Wang (Zhejiang University)

17:00-17:15

Multi-Frame Sparse Convolutional Learning for Point Cloud Color Denoising

Tailin Yang (Xidian University); Wei Wu (Xidian University); Zhu Li (University of Missouri, Kansas City); Rui Zhou (Xidian University)

17:15-17:30

Multi-Modality Semantic-Shared Cross-View Ground-to-Aerial Localization

Kai Zhang (Nanjing University of Science and Technology); Xia Yuan (Nanjing University of Science and Technology); ShunTong Chen (Nanjing University of Science and Technology); Di Hu (Nanjing University of Science and Technology); Chunxia Zhao (Nanjing University of Science and Technology)

16:00-17:30

Oral Session 16 – Multimedia Applications II

16:00-16:15

MambaVesselNet: A Hybrid CNN-Mamba Architecture for 3D Cerebrovascular Segmentation

Yanming Chen (University of Nottingham Ningbo); Xiangjian He (University of Nottingham Ningbo)

16:15-16:30

MoE-Polyp: Shifting More Attention to Small Polyp Segmentation via Mixture-of-Experts

Zihuang Wu (Jiangxi Normal University); Xinyu Xiong (Hangzhou Hikvision Digital Technology Co Ltd); Ying Chen (Pazhou Lab); Siying Li (Shenzhen University); Hua Chen (Jiangxi Normal University)

16:30-16:45

MFTAnet: Two-step Aggregation Net of Multiscale Features for Pneumoconiosis Screening

Wei Qingjin (Sichuan Normal University); Xiaozhuo Li (Sichuan Normal University); Liu Dinglu (Sichuan Normal University); Zhiwu Liao (Sichuan Normal University)

16:45-17:00

FA-UNext: A Feedback Attention-based MLP Network for Medical Image Segmentation

Qianyu Li (Dalian University of Technology); Bingcai Chen (Dalian University of Technology); Jax Tian (Dalian University of Technology); RuoLan Liu (Dalian University of Technology)

17:00-17:15

CoolColor: Text-guided COherent OLd film COLORization

Zichuan Huang (Peking University); Yifan Li (Peking University); Shuai Yang (Peking University); Jiaying Liu (Peking University)

17:15-17:30

Multimodal Sign Language Knowledge Graph and Representation: Text, Video KeyFrames, and Motion Trajectories

Ziqiang Liu (Beijing Institute of Petrochemical Technology); Gongwei Fang (Beijing Institute of Petrochemical Technology); Wentong Wang (Beijing Institute of Petrochemical Technology); Qiang Liu (Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology)

18:00-20:00

Conference Banquet (to be confirmed)


Main Conference Day 3 – Friday 6 December

08:30-09:00

Registration

09:00-10:00

Keynote 3 

EasyFL: Optimising Federated Learning for Computer Vision Applications

Yonggang Wen (Nanyang Technological University, Singapore)

10:00-10:30

Morning Tea

10:30-12:00

Oral Session 17 – Music and Audio Processing in Multimedia

10:30-10:45

DCEPNet: Dual-Channel Emotional Perception Network for Speech Emotion Recognition

Fei Xiang (Dalian Maritime University); Ruili Wang (Dalian Maritime University); Junjie Hou (Dalian Maritime University); Xingang Wang (Dalian Maritime University)

10:45-11:00

StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech

Haowei Lou (University of New South Wales); Hye-Young Paik (University of New South Wales); Wen Hu (University of New South Wales); Lina Yao (University of New South Wales)

11:00-11:15

Bivariate Mixup for 2D Contact Point Localization with Piezoelectric Microphone Array

Shogo Yonezawa (Tokyo University of Science); Yukinobu Taniguchi (Tokyo University of Science); Go Irie (Tokyo University of Science)

11:15-11:30

Multi-domain Acoustic Feature Fusion for Speaker Recognition

Shanshan Yao (Shanxi University); Tian Li (Shanxi University)

11:30-11:45

The quantification of emotional expressions and perceptions of vocal vibrato in basic emotion: commercial operatic singing recordings

Liu JieYing (Tokyo University of the Arts)

11:45-12:00

Pitch-aware generative pretraining improves multi-pitch estimation with scarce data

Mary Pilataki (Queen Mary University of London); Matthias Mauch (Apple); Simon Dixon (Queen Mary University of London)

10:30-12:00

Oral Session 18 – Short Papers

10:30-10:45

Sketch-based 3D Model Retrieval with Cross-Modal Representation

Hairui Yang (Dalian University of Technology); Ning Wang (Dalian University of Technology); Zhihui Wang (Dalian University of Technology); Lei Wang (Dalian University of Technology)

10:45-11:00

Highly Fault-Tolerant Discrete Lattice Information Coding Method for Screen-Shooting Scenarios

Daidou Guo (University of Shanghai for Science and Technology); Ching-Chun Chang (National Institute of Informatics); Cheng SenMao (University of Shanghai for Science and Technology); Chuan Qin (University of Shanghai for Science and Technology)

11:00-11:15

Fibre Population-guided Pre-training for 3D Spatial Super-Resolution on Multimodal Brain Diffusion MR Imaging

Zihao Tang (University of Sydney); Xinyi Wang (University of Sydney); Mariano Cabezas (The University of Sydney); Arkiev D'Souza (The University of Sydney); Michael Barnett (Sydney Neuroimaging Analysis Centre); Fernando Calamante (The University of Sydney); Weidong Cai (University of Sydney); Chenyu Wang (University of Sydney, Sydney Neuroimaging Analysis Centre)

11:15-11:30

Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild

Tianqi Wei (The University of Queensland); Zhi Chen (The University of Queensland); Xin Yu (University of Queensland)

11:30-11:45

Emotionally Guided Symbolic Music Generation Using Diffusion Models: The AGE-DM Approach

Mingzhe Zhang (University of Queensland); Laura Ferris (University of Queensland); Lin Yue (University of Adelaide); Miao Xu (University of Queensland)

11:45-12:00

SS-FS CSA: Self-Supervised and Fully Supervised Integration for 3D Cerebrovascular Segmentation

Chenxi Niu (University of Nottingham Ningbo); Ziyu Liu (University of Nottingham Ningbo); Xiangjian He (University of Nottingham Ningbo)

10:30-12:00

Oral Session 19 – Multimedia and Vision – Detection

10:30-10:45

Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga

Takara Taniguchi (The University of Tokyo); Ryosuke Furuta (The University of Tokyo)

10:45-11:00

Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection

Xinhao Zhong (Institute of Information Science, Beijing Jiaotong University); Siyu Jiao (Beijing Jiaotong University); Yao Zhao (Beijing Jiaotong University); Yunchao Wei (UTS)

11:00-11:15

KBY-Net: A Dual Learning Framework for Improving Object Detection in Rainy Weather Conditions

Zheng-Xian Keh (Multimedia University); Lai-Kuan Wong (Multimedia University); Yuen Peng Loh (Multimedia University); Ke Gu (Beijing University of Technology); Weisi Lin (Nanyang Technological University, Singapore)

11:15-11:30

Fire and Smoke Detection with Burning Intensity Representation

Xiaoyi Han (Zhejiang University); Yanfei Wu (China Mobile (Suzhou) Software Technology Co., Ltd.); Nan Pu (University of Trento); Zunlei Feng (Zhejiang University); Qifei Zhang (Zhejiang University); Yijun Bei (Zhejiang University); Lechao Cheng (Hefei University of Technology)

11:30-11:45

Transition in Focus of Prediction Tasks for Skeleton Graph Component Detection with Transformer

Zhiyuan Wang (Soochow University); Cong Yang (Soochow University); Yulu Zhang (Chery); Zeyd Boukhers (Fraunhofer Institute for Applied Information Technology FIT); Wei Sui (Horizon Robotics); Yi Ji (Soochow University); Chunping Liu (Soochow University)

11:45-12:00

CSUNet: Contour-Sensitive Underwater Salient Object Detection

Wei Yu (Harbin Boiler Company Limited); Yi Wang (Dalian University of Technology); Shijun Yan (Dalian University of Technology); Tianzhu Wang (freelance researcher); Zhihan Wang (Massey University); Weirong Sun (Massey University); Yu Zhao (Shanghai Zhudian Semiconductor Co., Ltd); Xinwei Xue (Dalian University of Technology)

Workshop of Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages

13:45

Opening Speech

13:45 - 14:15

Invited Talk: Conversational AI Datasets for Large-Scale Model Training

Speaker: Qingqing Zhang, MagicData Inc.

14:15 - 14:45

Invited Talk: Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in?

Speaker: Qianying Liu, National Institute of Informatics, Tokyo, Japan

14:45 - 15:00

Invited Talk: Towards Advanced End-to-End Myanmar Speech Recognition and Synthesis

Speaker: Win Pa Pa, University of Computer Studies, Yangon, Myanmar

15:00 - 15:15

Selected Workshop Paper: How to Design Translation Prompts for ChatGPT: An Empirical Study

Speaker: Yuan Gao, Massey University, New Zealand

15:15 - 15:30

Tea Break

15:30 - 16:00

Invited Talk: Transferring the Power of Audio from Languages to Healthcare: Insights and Perspectives

Speaker: Qian Kun, Beijing Institute of Technology, China

16:00 - 16:30

Invited Talk: Unified Multimodal Understanding and Generation, a New Perspective

Speaker: Zuchao Li, Wuhan University, China

16:30 - 16:45

Selected Workshop Paper: Disentangling Singlish Discourse Particles with Task-Driven Representation

Speaker: Linus Tze En Foo, University of Edinburgh, United Kingdom

16:45 - 17:00

Invited Talk: AI4Bharat

Speaker: Raj Dabre, National Institute of Information and Communications Technology, Kyoto, Japan

17:00

Closing Speech: Organizers

Workshop on Multi-Biological Sensing Data for Language Deterioration Prediction (SpandLDeteriorate)

13:45

Opening

14:00 - 14:45

Keynote 1

14:45 - 15:00

Paper Presentation: Reference-free automatic speech severity evaluation using acoustic unit language modelling

Speaker: Bence M Halpern

15:00 - 15:15

Paper Presentation: Free-FreeSLT: A Gloss-Free, Parameter-Free model for Sign Language Translation

Speaker: Weirong Sun

15:15 - 15:30

Break and Afternoon Tea

15:30 - 16:15

Keynote 2

16:15 - 16:30

Paper Presentation: Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection

Speaker: Yilin Pan

16:30 - 16:45

Close

Workshop on Collaboration and Evolution of Foundation and Specialized Models

9:00 - 9:15

Opening Remarks

9:15 - 9:30

Paper Presentation: An Adaptive Aggregation Method for Federated Learning via Meta Controller

Speaker: Tao Shen

9:30 - 9:45

Paper Presentation: DHelper: A Collaborative Toolkit for Manuscript Restoration

Speaker: Yue Han/Yuqing Zhang

9:45 - 10:00

Paper Presentation: Distributed Optimization over Block-Cyclic Data

Speaker: Yucheng Ding/Chaoyue Niu

10:00 - 10:30

Invited Talk: Towards Industrial Large Models and Digital Twins

Speaker: Jiehan Zhou

10:30 - 10:45

Coffee Break

10:45 - 11:15

Invited Talk: Beyond Language: Revisiting ASR for Future Challenges

Speaker: Sheng Li

11:15 - 11:45

Invited Talk (Online): Fine-grained Action Analysis for Human Behavior Understanding

Speaker: Jinglin Xu

11:45 - 12:15

Invited Talk (Online): Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning

Speaker: Ting Wang

12:15

Close

Workshop on Multimodal Foundation Models of Remote Sensing and Agriculture (MFM-RsAg)

9:00 – 9:05

Welcome

9:05 – 9:50

Keynote Speech: Domain-Specific Foundation Models in Agriculture

Speaker: Xin Yu

9:50 – 10:20

Paper Presentation: Remote-sensing Foundation Model for Agriculture: A Survey

Speaker: Yu Luo

10:20 – 10:45

Morning Tea/Coffee Break

10:45 – 11:15

Paper Presentation: Agricultural Detection Using Spectral Adaptive Imaging Model

Speaker: Yuning Wang

11:15 – 11:45

Paper Presentation: An Encoder–Decoder Framework for Foundation Model-based Remote Sensing Semantic Segmentation

Speaker: Jiale Song

11:45 – 12:15

Paper Presentation: Improved downscaled inversion of soil moisture based on the BGKS methodology

Speaker: Wei Hu

12:15

Close