Tuesday 3 December - Workshops, Tutorials, PhD School | |||
08:30-09:00 | Registration | ||
09:00-10:30 |
Workshop 1 – Workshop on Collaboration and Evolution of Foundation and Specialized Models
Venue: Room 405-470 Organiser: Shengyu Zhang |
Workshop 2 – Workshop on Multimodal Foundation Models of Remote Sensing and Agriculture
Venue: Room 405-460 Organiser: Kun Hu |
Tutorial 1 – Vision-Language Models for Multimedia Applications: From Foundations to State-of-the-Art
Venue: Room 401-401 Organiser: Yanbin Liu |
10:30-10:45 | Morning Tea | ||
10:45-12:15 |
Workshop 1 – Workshop on Collaboration and Evolution of Foundation and Specialized Models
Venue: Room 405-470 Organiser: Shengyu Zhang |
Workshop 2 – Workshop on Multimodal Foundation Models of Remote Sensing and Agriculture
Venue: Room 405-460 Organiser: Kun Hu |
Tutorial 2 – Understanding Australian Sign Language
Venue: Room 401-401 Organiser: Heming Du |
12:15-13:45 | Lunch | ||
13:45-15:15 |
Workshop 3 – Workshop of Multimodal, Multilingual, and Multitask Modeling Technologies for Oriental Languages
Venue: Room 405-470 Organiser: Sheng Li |
Workshop 4 – Workshop on Multi-Biological Sensing Data for Language Deterioration Prediction
Venue: Room 405-460 Organiser: Yilin Pan |
PhD School
Venue: Room 401-401 Chair: Kun Hu |
15:15-15:30 | Afternoon Tea | ||
15:30-17:00 |
Workshop 3 – Workshop of Multimodal, Multilingual, and Multitask Modeling Technologies for Oriental Languages
Venue: Room 405-470 Organiser: Sheng Li |
Workshop 4 – Workshop on Multi-Biological Sensing Data for Language Deterioration Prediction
Venue: Room 405-460 Organiser: Yilin Pan |
PhD School
Venue: Room 401-401 Chair: Kun Hu |
Wednesday 4 December - Main Conference - Day 1 | |||
08:00-09:00 | Registration | ||
09:00-09:15 |
Conference Opening
Venue: Room 401-401 Speakers: Ruili Wang and Jiaying Liu |
||
09:15-10:15 |
Keynote 1 – Prof Wenwu Zhu, Tsinghua University, China
Venue: Room 401-401 Chair: Ruili Wang |
||
10:15-10:45 | Morning Tea | ||
10:45-12:15 |
Oral Session 1 – Highlighted Papers
Venue: Room 401-401 Chair: Wen-Huang Cheng |
||
12:15-13:30 | Lunch | ||
13:30-15:00 |
Oral Session 2 – Deep Learning for Multimedia I
Venue: Room 405-470 Chair: Wei-Ta Chu |
Oral Session 3 – Multimodal Analysis and Description I
Venue: Room 405-460 Chair: Xiao Wu |
Oral Session 4 – Multimedia Systems
Venue: Room 401-401 Chair: Shengyu Zhang |
15:00-15:30 | Afternoon Tea | ||
15:00-16:00 |
Poster Session 1
Venue: Level 4 Foyer Chair: Kun Hu |
||
16:00-17:30 |
Oral Session 5 – Multimodal Analysis and Description II
Venue: Room 405-470 Chair: Tong Qiao |
Oral Session 6 – Multimedia Applications I
Venue: Room 405-460 Chair: Wong Lai Kuan |
Oral Session 7 – Multimedia and Vision I
Venue: Room 401-401 Chair: Sheng Li |
18:00-20:00 |
Welcome Reception
Venue: Level 4 Foyer |
Thursday 5 December - Main Conference - Day 2 | |||
08:00-09:00 | Registration | ||
09:00-10:00 |
Keynote 2 – Prof Klara Nahrstedt, University of Illinois Urbana-Champaign, USA
Venue: Room 401-401 Chair: Jiaying Liu |
||
10:00-10:30 | Morning Tea | ||
10:30-12:00 |
Oral Session 8 – Multimedia and Vision II
Venue: Room 405-470 Chair: Jiaying Liu |
Oral Session 9 – Social Multimedia
Venue: Room 405-460 |
Oral Session 10 – Multimodal Analysis and Description III
Venue: Room 401-401 Chair: Laiyun Qing |
12:00-13:30 | Lunch | ||
13:30-15:00 |
Oral Session 11 – Multimedia HCI and Quality of Experience
Venue: Room 405-470 Chair: Dongming Chen |
Oral Session 12 – Emotional and Social Signals in Multimedia
Venue: Room 405-460 Chair: Weiqi Yan |
Oral Session 13 – Deep Learning for Multimedia II
Venue: Room 401-401 Chair: Yanbin Liu |
15:00-15:30 | Afternoon Tea | ||
15:00-16:00 |
Poster Session 2
Venue: Level 4 Foyer Chair: Kun Hu |
||
16:00-17:30 |
Oral Session 14 – Deep Learning for Multimedia III
Venue: Room 405-470 Chair: Abdulmotaleb El Saddik |
Oral Session 15 – Multimedia and Vision III
Venue: Room 405-460 Chair: Reza Shahamiri |
Oral Session 16 – Multimedia Applications II
Venue: Room 401-401 Chair: Wentong Wang |
18:00-20:00 |
Conference Banquet/Dinner
Venue: Fale Pasifika Chair: Ruili Wang, Jiaying Liu |
Friday 6 December - Main Conference - Day 3 | |||
08:30-09:00 | Registration | ||
09:00-10:00 |
Keynote 3 – Prof Yonggang Wen, Nanyang Technological University (NTU), Singapore
Venue: Room 401-401 Chair: Zhiyong Wang |
||
10:00-10:30 | Morning Tea | ||
10:30-12:00 |
Oral Session 17 – Music and Audio Processing in Multimedia
Venue: Room 405-470 Chair: Xiaotong Ji |
Oral Session 18 – Short Papers
Venue: Room 405-460 Chair: Zihao Tang |
Oral Session 19 – Deep Learning for Multimedia IV
Venue: Room 401-401 Chair: Kun Hu |
12:00-13:30 | Lunch | ||
13:30-17:00 | Guided Tour |
Main Conference Day 1 – Wednesday 4 December
8:00-9:00 |
Registration |
9:00-9:15 |
Conference Opening |
9:15-10:15 |
Keynote 1 Multimodal Generative AI in Dynamic and Open Environments Professor Wenwu Zhu (Tsinghua University, China) |
10:15-10:30 |
Morning Tea |
10:30-12:00 |
Oral Session 1 – Highlighted Papers |
10:30-10:45 |
TMM-CLIP: Task-guided Multi-Modal Alignment for Rehearsal-Free Class Incremental Learning Yuankang Pan (Southwest Jiaotong University); Zhaoquan Yuan (Southwest Jiaotong University); Xiao Wu (Southwest Jiaotong University); Zechao Li (Nanjing University of Science and Technology); Changsheng Xu (CASIA) |
10:45-11:00 |
SITransformer: Shared Information-Guided Transformer for Extreme Multimodal Summarization Sicheng Liu (The University of Sydney); Lintao Wang (The University of Sydney); Xiaogang Zhu (The University of Adelaide); Xuequan Lu (La Trobe University); Zhiyong Wang (The University of Sydney); Kun Hu (The University of Sydney) |
11:00-11:15 |
S2FB IoU: Improving Boundary-based Object-Centric Image Segmentation Quality Evaluation Rim El Filali (Nanjing University of Science and Technology); Soufiane JDABA (Nanjing University of Science and Technology); Ronghui Xie (Nanjing University of Science and Technology); Ran Shi (Nanjing University of Science and Technology); Tong Qiao (Hangzhou Dianzi University); Pan Qiaodong (Shaoxing Public Security Institute); Ting Wu (Hangzhou Innovation Institute Beihang University) |
11:15-11:30 |
Efficient Low-Dimensional Representation Via Manifold Learning-Based Model for Multimodal Sentiment Analysis Xingang Wang (Dalian Maritime University); Mengyi Wang (Dalian Maritime University); Hai Cui (Dalian Maritime University); Yijia Zhang (Dalian Maritime University) |
11:30-11:45 |
ViCo: Engaging Video Comment Generation with Human Preference Rewards Yuchong Sun (Renmin University of China); Bei Liu (Microsoft Research); Xu Chen (Renmin University of China); Ruihua Song (Renmin University of China); Jianlong Fu (Microsoft Research) |
11:45-12:00 |
Personalized Sentiment Estimation Based on Recall and Resting Ratio of Frontal EEG Shun Katada (Osaka University); Kazunori Komatani (Osaka University) |
12:00-13:30 |
Lunch |
13:30-15:00 |
Oral Session 2 – Deep Learning for Multimedia I |
13:30-13:45 |
Accelerating Inference of Networks in the Frequency Domain Chenqiu Zhao (University of Alberta); Guanfang Dong (University of Alberta); Anup Basu (University of Alberta) |
13:45-14:00 |
CFRL: Coarse-Fine Decoupled Representation Learning For Long-Tailed Recognition Yiran Song (Shanghai Jiao Tong University); Qianyu Zhou (Shanghai Jiao Tong University); Kun Hu (The University of Sydney); Lizhuang Ma (Shanghai Jiao Tong University); Xuequan Lu (La Trobe University) |
14:00-14:15 |
Dual-Enhanced Disentangled Multi-View Clustering Zhiqian Dong (Anhui University); Sheng Yang (Anhui University); Peng Zhou (Anhui University) |
14:15-14:30 |
Where You See Is What You Know: A Visual-Semantic Conceptual Explainer Luhao Zhu (Zhejiang University); Xiangwei Kong (Zhejiang University); Runsen Li (Zhejiang University); Guodong Guo (Eastern Institute of Technology) |
14:30-14:45 |
A Multi-scale Framework towards Human-Machine Friendly Remote Sensing Image Coding Yingkai He (Wuhan University); Zhen Zhang (Wuhan University); Jing Xiao (Wuhan University) |
14:45-15:00 |
Latent Variables Coding for Perceptual Image Compression Yingkai He (Wuhan University); Zhen Zhang (Wuhan University); Liang Liao (Nanyang Technological University); Jing Xiao (Wuhan University)
|
13:30-15:00 |
Oral Session 3 – Multimodal Analysis and Description I |
13:30-13:45 |
Multimodal Energy Prompting for Video Salient Object Detection Tao Jiang (Massey University); Feng Hou (Massey University); Yi Wang (Dalian University of Technology) |
13:45-14:00 |
CS-HOI: Human Object Interaction Detection Enhanced by Common Sense CHENG-KANG TAN (National Cheng Kung University); Wei-Ta Chu (National Cheng Kung University) |
14:00-14:15 |
Adaptive Feature Inheritance and Thresholding for Ingredient Recognition in Multimedia Cooking Instructions Yixin Zhang (Kyoto University); Yoko Yamakata (University of Tokyo, Japan); Keishi Tajima (Kyoto University) |
14:15-14:30 |
ScaMo: Towards Text to Video Storyboard Generation Using Scale and Movement of Shots Xu Gu (Renmin University of China); Xihua Wang (Renmin University of China); Chuhao Jin (Renmin University of China); Ruihua Song (Renmin University of China) |
14:30-14:45 |
TCFusion: A Three-branch Cross-domain Fusion Network for Infrared and Visible Images Wenyu Shao (Dalian Maritime University); Hongbo Liu (Dalian Maritime University) |
13:30-15:00 |
Oral Session 4 – Multimedia Systems |
13:30-13:45 |
QoS-Diff: Adaptive Auto-tuning Framework for Low-latency Diffusion Model Inference Pingyi Huo (The Pennsylvania State University); Ajay Narayanan Sridhar (The Pennsylvania State University); Md Fahim Faysal Khan (The Pennsylvania State University); Kiwan Maeng (The Pennsylvania State University); Vijaykrishnan Narayanan (The Pennsylvania State University) |
13:45-14:00 |
OpenVideoWalls: an Open-Source System for Building Video Walls with Recycling Heterogeneous Displays Zichen Zhu (Rutgers University); Zhongze Tang (Rutgers University); Amir A Nassereldine (University at Buffalo); Jinjun Xiong (University at Buffalo); Sheng Wei (Rutgers University) |
14:00-14:15 |
Advancing Multimodal LLMs: A Focus on Geometry Problem Solving, Reasoning, and Sequential Scoring Raj Jaiswal (IIIT Delhi); Avinash Anand (IIIT Delhi); Rajiv Ratn Shah (IIIT Delhi) |
14:15-14:30 |
Incorporating Pre-ordering Representations for Low-resource Neural Machine Translation Yuan Gao (Massey University); Feng Hou (Massey University); Ruili Wang (Massey University) |
14:30-14:45 |
ADP3D: Adaptive Point Selection for Efficient Multi-frame 3D Object Detection Guohuan Gao (Beijing Institute of Technology); Gang Zhang (Tsinghua University); Xiangyang Xu (Beijing Institute of Technology) |
14:45-15:00 |
Underwater Image Enhancement via Domain Adaptive Transfer Learning and Hybrid Reinforcement Model Tingting Yao (Dalian Maritime University); Gao Yuan (Dalian Maritime University); Zihao Feng (Dalian Maritime University); Qing Hu (Dalian Maritime University); Zhiyong Wang (The University of Sydney)
|
15:00-15:15 |
Afternoon Tea |
15:15-16:00 |
Poster Session 1 |
|
HFS-HNeRV: High-Frequency Spectrum Hybrid Neural Representation for Videos Jianhua Zhao (Auckland University of Technology); Xue Jun Li (Auckland University of Technology); Peter Han Joo Chong (Auckland University of Technology) |
|
Following in the Footsteps: Predicting Human Trajectories Using Motion Pattern Memory Yuxin Yang (Beijing University of Posts and Telecommunications); Pengfei Zhu (Beijing University of Posts and Telecommunications); Mengshi Qi (Beijing University of Posts and Telecommunications); Huadong Ma (Beijing University of Posts and Telecommunications) |
|
On the Robustness of Deep Face Inpainting: An Adversarial Perspective Wenhao Gao (Nanjing University of Science and Technology); Zhenbo Song (Nanjing University of Science and Technology); Zhenyuan Zhang (Nanjing University of Science and Technology); Jianfeng Lu (Nanjing University of Science and Technology) |
|
Repetitive Action Counting with Feature Interaction Enhancement and Adaptive Gate Fusion Jiazhen Zhang (Hefei University of Technology); Kun Li (Zhejiang University); Yanyan Wei (Hefei University of Technology); Fei Wang (Hefei University of Technology); Wei Qian (Hefei University of Technology); Jinxing Zhou (Hefei University of Technology); Dan Guo (Hefei University of Technology) |
|
Focal Diffusion Process for Object-Aware 3D LiDAR Generation Huijie Zhang (San Diego State University); Xiaobai Liu (San Diego State University) |
|
Improving Sequential DeepFake Detection with Local Information Enhancement Dong Long Yun (Harbin Institute of Technology); Yuanrong Xu (Harbin Institute of Technology); Jianping Zhong (Harbin Institute of Technology); Zhaobo Qi (Harbin Institute of Technology); Weigang Zhang (Harbin Institute of Technology, Weihai) |
|
Moving Object Tracking based on Kernel and Random-coupled Neural Network Yiran Chen (Chengdu University of Technology); Haoran Liu (Chengdu University of Technology); Mingzhe Liu (Chengdu University of Technology); Yanhua Liu (Chengdu University of Technology); Ruili Wang (Massey University); Peng Li (The College of Engineering Technology of Chengdu University of Technology) |
|
Low-Light Image Enhancement via FourierTMamba: A Hybrid Frequency-Spatial Approach Shuwei Peng (Jiangxi Normal University); Xu Zhang (Jiangxi Normal University); Aiwen Jiang (Jiangxi Normal University); Changhong Liu (Jiangxi Normal University); Jihua Ye (Jiangxi Normal University) |
|
RandommaskFormer: Light Weight Remote Sensing Scene Classification with Masked Transformer Xianbin Hu (Xidian University); Wei Wu (Xidian University); Zhu Li (University of Missouri, Kansas City) |
|
Local Feature-Emphasizing Transformer for Cloth-Changing Person Re-identification Jieqiong Zhou (Nanjing University of Information Science and Technology); Guoqing Zhang (Nanjing University of Information Science and Technology); Yuhui Zheng (Nanjing University of Information Science & Technology); Fuguo Zhang (Nanjing University of Information Science and Technology; Suzhou Kunke Intelligent Equipment co., ltd) |
|
Fast Online Adaptation of Visual SLAM via Variational Information Transfer and Preservation Sangni Xu (South China University of Technology); Hao Xiong (Macquarie University); Qiuxia Wu (South China University of Technology, China); Zhihui Wang (Dalian University of Technology); Shlomo Berkovsky (Macquarie University); Zhiyong Wang (The University of Sydney) |
|
MSTMENet: Multi-Scale Spatio-Temporal Mapping and Evolution Network for Video Deraining Fengqi Li (Dalian Jiaotong University); Mengchao Guo (Dalian Jiaotong University); Renxuan Xiong (Dalian Jiaotong University); Donglei Yang (Dalian Jiaotong University); Yan Zhang (Department of Informatics, University of Oslo); Yi Wang (Dalian University of Technology); Fengqiang Xu (Dalian Jiaotong University)
|
16:00-17:30 |
Oral Session 5 – Multimodal Analysis and Description II |
16:00-16:15 |
Enhancing Modality Representation and Alignment for Multimodal Cold-start Active Learning Meng Shen (Nanyang Technological University); Yake Wei (Renmin University of China); Jianxiong (Terry) Yin (NVIDIA AI Tech Centre); Deepu Rajan (Nanyang Technological University); Di Hu (Renmin University of China); Simon See (NVIDIA AI Tech Centre) |
16:15-16:30 |
Active Object Segmentation: A New Modality for Egocentric Action Recognition Jian Ma (Tianjin University); Bin Zhu (Singapore Management University); Kun Li (Tianjin University); Dima Damen (University of Bristol) |
16:30-16:45 |
Incremental Few-Shot Object Detection by Leveraging External Information from Large Multimodal Models Guan-Yu Wu (National Cheng Kung University); Wei-Ta Chu (National Cheng Kung University) |
16:45-17:00 |
CISampler: Correlated Information Guided Frame Sampling for Gesture Recognition in Video Yuanyuan Shi (Xidian University); Yunan Li (Xidian University); Siyu Liang (Xidian University); Huizhou Chen (Xidian University); Qiguang Miao (Xidian University) |
17:00-17:15 |
Unified Multi-view Clustering based on Joint Multi-Structure Representation Learning Song Huang (Shenzhen University); Zeng Ziming (Shenzhen University); Min Li (Shenzhen University); Jianping Wang (Shenzhen University) |
17:15-17:30 |
DocPointer: A parameter-efficient Pointer Network for Key Information Extraction Haipeng Li (Shandong University of Science and Technology); Guangcun Wei (Shandong University of Science and Technology); Haochen Xu (Shandong University of Science and Technology); Boyan Guo (Shandong University of Science and Technology)
|
16:00-17:30 |
Oral Session 6 – Multimedia Applications I |
16:15-16:30 |
Joint Frame-Level and Block-Level Rate-Perception Optimized Preprocessing for Video Coding Huajie Tan (Peking University); Guoqing Xiang (Alibaba); Xiaodong Xie (Peking University); Huizhu Jia (Peking University) |
16:30-16:45 |
Robust Discriminative and Modal-Consistent Feature Learning for Fine-Grained Sketch-Based Image Retrieval Junchao Ge (Kunming University of Science and Technology); Huafeng Li (Kunming University of Science and Technology); Yafei Zhang (Kunming University of Science and Technology) |
16:45-17:00 |
LMHaze: Intensity-aware Image Dehazing with a Large-scale Multi-intensity Real Haze Dataset Ruikun Zhang (Beijing Institute of Technology); Hao Yang (Beijing Institute of Technology); Yan Yang (The Australian National University); Ying Fu (Beijing Institute of Technology); Liyuan Pan (The Australian National University) |
17:00-17:15 |
Variational Stochastic Multiple Auto-Encoder for Multimodal Recommendation Ying Qiao (Shandong University); Aoxuan Chen (Shandong University); Xiang Li (Shandong University); Jinfei Gao (Shandong University) |
17:15-17:30 |
Flexible Semantic Watermarking for Robust Diffusion Model Detection and Tracing Zhitong Zhou (Institute of Information Engineering, Chinese Academy of Sciences); JING YU (Institute of Information Engineering, Chinese Academy of Sciences); Keke Gai (Beijing Institute of Technology); Jiamin Zhuang (Institute of Information Engineering, Chinese Academy of Sciences); Gaopeng Gou (Institute of Information Engineering, CAS); Gang Xiong (Institute of Information Engineering, Chinese Academy of Sciences)
|
16:00-17:30 |
Oral Session 7 – Multimedia and Vision – Video and Animation |
16:00-16:15 |
LoopAnimate: Loopable Salient Object Animation Fanyi Wang (Honor); Peng Liu (OPPO AI Center); Haotian Hu (Zhejiang Leapmotor Technology CO., LTD.); Dan Meng (OPPO Research Institute); Jingwen Su (OPPO AI Center); Jinjin Xu (OPPO Research Institute); Yanhao Zhang (OPPO AI Center); Xiaoming Ren (OPPO); Zhiwang Zhang (NingboTech University) |
16:15-16:30 |
SpikMamba: When SNN meets Mamba in Event-based Human Action Recognition Jiaqi Chen (Northeastern University); Yan Yang (The Australian National University); Shizhuo Deng (Northeastern University); Da Teng (Northeastern University); Liyuan Pan (The Australian National University) |
16:30-16:45 |
STODINE: Decompose video to Object-centric Spatial-Temporal Slots for physical reasoning Haoyuan Zhang (University of Chinese Academy of Sciences); Xiangyu Zhu (Chinese Academy of Sciences); Qu Tang (Institute of Automation, Chinese Academy of Sciences); Zhaoxiang Zhang (Chinese Academy of Sciences); Zhen Lei (Institute of Automation, Chinese Academy of Sciences) |
16:45-17:00 |
GGAvatar: Reconstructing Garment-Separated 3D Gaussian Splatting Avatars from Monocular Video Jingxuan Chen (Jinan University-University of Birmingham Joint Institute) |
17:00-17:15 |
Action Selection Learning for Multi-label Multi-view Action Recognition Trung Thanh NGUYEN (Nagoya University); Yasutomo Kawanishi (RIKEN); Takahiro Komamizu (Nagoya University); Ichiro Ide (Nagoya University) |
17:15-17:30 |
MSTMENet: Multi-Scale Spatio-Temporal Mapping and Evolution Network for Video Deraining Fengqi Li (Dalian Jiaotong University); Mengchao M Guo (Dalian Jiaotong University); Renxuan Xiong (Dalian Jiaotong University); Donglei Yang (Dalian Jiaotong University); Yan Zhang (Department of Informatics, University of Oslo); Yi Wang (Dalian University of Technology); Fengqiang Xu (Dalian Jiaotong University)
|
18:00-20:00 |
Welcome Reception |
Main Conference Day 2 – Thursday 5 December
8:30-9:00 |
Registration |
9:00-10:00 |
Keynote 2 End-to-End System and Networking Challenges of Multi-View Video Systems Professor Klara Nahrstedt (University of Illinois Urbana-Champaign, USA) |
10:00-10:30 |
Morning Tea |
10:30-12:00 |
Oral Session 8 – Multimedia and Vision – Segmentation and Classification |
10:30-10:45 |
CA-OVS: Cluster and Adapt Mask Proposals for Open-Vocabulary Semantic Segmentation Son Duy Dao (RMIT University); Hengcan Shi (Monash University); Dinh Q Phung (Monash University); Jianfei Cai (Monash University) |
10:45-11:00 |
Feature-weighted Multi-stage Bayesian Prototype for Few-shot Classification Xiaocong Zhou (Hohai University); Fan Liu (Hohai University); Chuanyi Zhang (Hohai University); Feifan Li (Hohai University); Wenwen Cai (Hohai University); Jun Zhou (Griffith University) |
11:00-11:15 |
Layout Relationship Decoupling Framework for Multi-target Domain Adaptative Semantic Segmentation Yuhang Zhang (Shenzhen University); Cuixin Yang (The Hong Kong Polytechnic University); Muxin Liao (Shenzhen University); Shishun Tian (Shenzhen University); Wenbin Zou (Shenzhen University); Chen Xu (Shenzhen University) |
11:15-11:30 |
FreqFormer: A Frequency Transformer for Semantic Segmentation of Remote Sensing Images Xin Li (Hohai University); Feng Xu (Hohai University; Jiangsu Ocean University); Yao Tong (Nanjing University of Chinese Medicine); Fan Liu (Hohai University); Yiwei Fang (Hohai University); Xin Lyu (Hohai University); Jun Zhou (Griffith University) |
11:30-11:45 |
Ultrasound Video Segmentation of Pubic Symphysis and Fetal Head for Angle of Progression Measurement Shuangping Chen (Jinan University); Huijin Wang (Jinan University); Shun Long (Jinan University); Jieyun Bai (The University of Auckland); Jianmei Jiang (Jinan University) |
11:45-12:00 |
Mix-fine-tune: An Alternate Fine-tuning Strategy for Domain Adaptation and Generalization of Low-resource ASR Chengxi Lei (Massey University); Satwinder Dr Singh (University of Auckland); Feng Hou (Massey University); Ruili Wang (Massey University)
|
10:30-12:00 |
Oral Session 9 – Social and Interactive Multimedia |
10:30-10:45 |
An Information Cascade Prediction Algorithm Based on Time Series Dongming Chen (Northeastern University); Mingshuo Nie (Northeastern University); Zhengping Sun (Northeastern University); Huilin Chen (The Australian National University); Dongqi Wang (Northeastern University) |
10:45-11:00 |
Watermarking Vision-Language Models Shan Wan (Shanghai Normal University); Wu Liu (University of Science and Technology); Yijun Liu (Chinese Academy of Sciences; University of Chinese Academy of Sciences); Feiniu Yuan (Shanghai Normal University); Chunli Meng (Shanghai Normal University) |
11:00-11:15 |
SLIC: Secure Learned Image Codec through Compressed Domain Watermarking to Defend Image Manipulation Chen-hsiu Huang (National Taiwan University); Ja-Ling Wu (National Taiwan University) |
11:15-11:30 |
FeedMatch: Evolves for Semi-Supervised Multimedia Classification from Student Feedback Junjiang Liu (Beijing University of Posts and Telecommunications); Dandan Sun (Beijing University of Posts and Telecommunications); Hailun Xia (Beijing University of Posts and Telecommunications); Jiangtao Bai (Beijing University of Posts and Telecommunications); Xinyue Fan (Beijing University of Posts and Telecommunications) |
11:30-11:45 |
Dual-Stream Keyframe Enhancement for Video Question Answering Zhenzhen Hu (Hefei University of Technology); Xin Guan (Hefei University of Technology); Jia Li (Hefei University of Technology); Zijie Song (Hefei University of Technology); Richang Hong (Hefei University of Technology) |
11:45-12:00 |
Dual-stream Multi-modal Interactive Vision-language Tracking Zhiyi Mo (Guangxi Normal University; Wuzhou University); Guangtong Zhang (Guangxi Normal University); Jian Nong (WuZhou University); Bineng Zhong (Guangxi Normal University); Zhi Li (Guangxi Normal University)
|
10:30-12:00 |
Oral Session 10 – Multimodal Analysis and Description III |
10:30-10:45 |
Prompting Industrial Anomaly Segment with Large Vision-Language Models Jinheng Zhou (Shanghai Normal University); Wu Liu (University of Science and Technology); Guang Yang (Zhongguancun Laboratory); He Zhao (Micro-intelligent); Feiniu Yuan (Shanghai Normal University) |
10:45-11:00 |
Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation Zhiyuan Li (University of Sydney); Dongnan Liu (University of Sydney); Heng Wang (The University of Sydney); Chaoyi Zhang (University of Sydney); Weidong Cai (University of Sydney) |
11:00-11:15 |
Structured Bipartite Graph Ensemble Clustering Chen Wang (Massey University) |
11:15-11:30 |
Language-Guided Self-Supervised Video Summarization Using Text Semantic Matching Considering the Diversity of the Video Tomoya Sugihara (The University of Tokyo); Shuntaro Masuda (The University of Tokyo); Ling Xiao (The University of Tokyo); Toshihiko Yamasaki (The University of Tokyo) |
11:30-11:45 |
MAFS: Modality-Aware Federated Semi-Supervised Learning with Selective Data Sharing Specified by Individual Clients Yi-Chen Li (National Tsing Hua University); Chih-Fan Hsu (Inventec Corp.); JianKai Wang (Qualcomm); Chung-Chi Tsai (Qualcomm Technology); Cheng-Hsin Hsu (National Tsing Hua University) |
11:45-12:00 |
Prompt-based Continual Learning for Extending Pretrained CLIP Models' Knowledge Li Jiao (Communication University of China); Tian Wang (Communication University of China); Lihong Cao (Communication University of China)
|
11:45-13:00 |
Lunch |
13:00-14:30 |
Oral Session 11 – Multimedia HCI and Quality of Experience |
13:00-13:15 |
A Unified Editing Method for Co-Speech Gesture Generation via Diffusion Inversion Zeyu Zhao (Institute of Automation, Chinese Academy of Sciences); Nan Gao (CASIA); Zhi Zeng (Institute of Automation, Chinese Academy of Sciences); GuiXuan Zhang (CASIA); Jie Liu (CASIA); ShuWu Zhang (CASIA) |
13:15-13:30 |
MS-GeodesicPSIM: Predicting the Quality of Static Mesh with Texture Map via multi-scale Geodesic Patch Similarity Bingyang Cui (Cooperative Medianet Innovation Center, Shanghai Jiaotong University); Yujie Zhang (Shanghai Jiao Tong University); Qi Yang (Tencent); Yiling Xu (Shanghai Jiao Tong University) |
13:30-13:45 |
A Benchmark for Gaussian Splatting Compression and Quality Assessment Study Qi Yang (Tencent); Kaifa Yang (Shanghai Jiao Tong University); Yuke Xing (Shanghai Jiao Tong University); Yiling Xu (Shanghai Jiao Tong University); Zhu Li (University of Missouri-Kansas City) |
13:45-14:00 |
LMoW: A Latent Random Variable Model for Unconditional Human Motion Generation Faisal Ahmed (University of Alberta); Justin Rozeboom (University of Alberta); Hanran Song (University of Alberta); Chenqiu Zhao (University of Alberta); Anup Basu (University of Alberta) |
14:00-14:15 |
BCS-NeRF: Bundle Cross-Sensing Neural Radiance Fields Mingwei Cao (Anhui University); Fengna Wang (Anhui University); Dengdi Sun (Anhui University); Haifeng Zhao (Anhui University) |
14:15-14:30 |
Hierarchical Part-Attention Networks for 3D Human Reconstruction Jinwei Li (University of Chinese Academy of Sciences); Yongkang Cheng (Northwest A&F University); Yonghe Zhang (Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences; Innovation Academy for Microsatellites of Chinese Academy of Sciences); Pengcheng Wang (Innovation Academy for Microsatellites of Chinese Academy of Sciences)
|
13:00-14:30 |
Oral Session 12 – Emotional and Social Signals in Multimedia |
13:00-13:15 |
Emotion-Aware and Efficient Meme Sticker Dialogue Generation Zhaojun Guo (Fudan University); Junqiang Huang (Fudan University); Guobiao Li (Fudan University); Wanli Peng (China Agricultural University); Xinpeng Zhang (Fudan University); Zhenxing Qian (Fudan University); Sheng Li (Fudan University) |
13:15-13:30 |
MicroMamba: State Space Model with Partitioned Window Scan for Micro-Expression Recognition Tianchen Zhou (Southeast University); Jiateng Liu (Southeast University); Yue JIN (Southeast University); Li Yao (Southeast University) |
13:30-13:45 |
MFNet: Mixed Feature Network for Enhancing Facial Emotion Recognition on the Small-Scale Dataset Huilin Chen (The Australian National University) |
13:45-14:00 |
Adaptive Both homo- and hetero-Feature Integration for Multimodal Emotion Recognition Ze Kun Wang (Tianjin University of Science and Technology); Zhan Jun Si (Tianjin University of Science and Technology) |
14:00-14:15 |
HuBERT-CLAP: Contrastive Learning-Based Multimodal Emotion Recognition Using Self-Alignment Approach Long Hoang Nguyen (Fruit); Nhat Truong Pham (Sungkyunkwan University); Mustaqeem Khan (Mohamed Bin Zayed University of Artificial Intelligence); Alice Ahlem OTHMANI (UPEC); Abdulmotaleb El Saddik (University of Ottawa) |
14:15-14:30 |
Advancing Music Emotion Recognition: A Transformer Encoder-Based Approach Yangyuan Chen (Wenzhou University of Technology); Zhizhong Ma (Wenzhou University of Technology); Mingjing Wang (Wenzhou University of Technology); Mingzhe LIU (Wenzhou University of Technology)
|
13:00-14:30 |
Oral Session 13 – Deep Learning for Multimedia II |
13:00-13:15 |
Policy-driven Auto-Augmentation with Distillment Rewards for Scene Text Recognition Pu Li (San Diego State University); Yibiao Zhao (iSee); Xiaobai Liu (San Diego State University) |
13:15-13:30 |
T2QRM: Text-Driven Quadruped Robot Motion Generation Minghui Wang (South China University of Technology); Zixu Wang (South China University of Technology); Hongbin Xu (South China University of Technology); Kun Hu (The University of Sydney); Zhiyong Wang (The University of Sydney); Wenxiong Kang (South China University of Technology) |
13:30-13:45 |
FATO: Frequency Attention Transformer for Omnidirectional Image Super-Resolution Hongyu An (University of Chinese Academy of Sciences); Xinfeng Zhang (University of Chinese Academy of Sciences); Shijie Zhao (Bytedance Inc.); Li Zhang (Bytedance Inc.) |
13:45-14:00 |
IdentityKD: Identity-wise Cross-modal Knowledge Distillation for Person Recognition via mmWave Radar Sensors Liqun Shan (University of Louisiana at Lafayette); Rujun Zhang (North East Petroleum University); Sai Venkatesh Chilukoti (University of Louisiana at Lafayette); Xingli Zhang (University of Louisiana); Insup Lee (University of Pennsylvania); Xiali Hei (University of Louisiana at Lafayette) |
14:00-14:15 |
Mix-fine-tune: An Alternate Fine-tuning Strategy for Domain Adaptation and Generalization of Low-resource ASR Chengxi Lei (Massey University); Satwinder Dr Singh (University of Auckland); Feng Hou (Massey University); Ruili Wang (Massey University) |
14:15-14:30 |
Multi-stage Image Deraining based on Pre-trained Diffusion Model Xiong Zeng (Wuhan University of Science and Technology); Jiang Min (Wuhan University of Science and Technology); Ronghua Huang (Wuhan University of Science and Technology)
|
14:30-15:15 |
Afternoon Tea |
15:15-16:00 |
Poster Session 2 |
|
DiffuseST: Unleashing the Capability of the Diffusion Model for Style Transfer Ying Hu (Nanjing University of Aeronautics and Astronautics); Chenyi Zhuang (Nanjing University of Aeronautics and Astronautics); Pan Gao (Nanjing University of Aeronautics and Astronautics) |
|
Investigating Effective Speaker Property Privacy Protection in Federated Learning for Speech Emotion Recognition Chao TAN (MonotaRO); Sheng Li (National Institute of Information & Communications Technology (NICT)); Yang Cao (Tokyo Institute of Technology); Zhao Ren (University of Bremen); Tanja Schultz (University of Bremen) |
|
A method for detecting hands off the steering wheel Yujia Xu (Hubei University of Education); Deyu Pan (Beijing Institute of Graphic Communication); Ling Ding (Hubei University of Education) |
|
A Multi-angle Text Recognition Algorithm Jie Wang (Northeastern University); Huilin Chen (The Australian National University); Wandong Xue (Northeastern University); Dongming Chen (Northeastern University); Dongqi Wang (Northeastern University) |
|
Development of a Chinese Synonym Library: Enhancing Clinical Terminology Standardization and Interoperability Yani Chen (Dalian Maritime University); Kaiyu Nie (Department of Burns and Reconstructive Surgery, Affiliated Hospital of Zunyi Medical University); Xiaoxia Nie (College of Foreign Languages, Guizhou University); Ruili Wang (Honeycomb Telemedicine Technology) |
|
Fine-grained Video Semantic Distillation for Video-Text Retrieval Zuyi Pei (Dalian University of Technology); Baoli Sun (Dalian University of Technology); Zhihui Wang (Dalian University of Technology); Haojie Li (Shandong University of Science and Technology) |
|
SS-FS CSA: Self-Supervised and Fully Supervised Integration for 3D Cerebrovascular Segmentation Chenxi Niu (University of Nottingham Ningbo China); Ziyu Liu (University of Nottingham, Ningbo, China); Xiangjian He (University of Nottingham Ningbo) |
|
PCMark-NAS: Lightweight Print-Camera Resilient Watermarking Networks via Neural Architecture Search Daidou Guo (University of Shanghai for Science and Technology); Chuan Qin (University of Shanghai for Science and Technology) |
|
CSCCap: Plugging Sparse Coding in Zero-Shot Image Captioning Yu Song (Henan University); Xiaohui Yang (Henan University); Rongping Huang (Southern University of Science and Technology); BAI HAIFENG (DEEPEXI); Lili Yang (Southern University of Science and Technology) |
|
Description-Driven Audiovisual Embedding Space Learning for Enhanced Movie Understanding Wei-Lun Huang (National Taiwan University of Science and Technology); Shao-Hung Wu (National Tsing Hua University); Hung-Chang Huang (KKCompany Technologies); Min-Chun Hu (National Tsing Hua University); Tse-Yu Pan (National Taiwan University of Science and Technology) |
|
MBC-ATA: Maximum Binary Classification and Anchor-based Triplet Augmentation for Unbiased Scene Graph Generation Hao Zhang (Shandong University); Xingning Dong (Ant Group); Jinfei Gao (Shandong University); Liang Hao (HBIS DIGITAL TECH Co.,Ltd.); Pei Shen (HBIS DIGITAL TECH Co.,Ltd.); Tian Gan (Shandong University)
|
16:00-17:30 |
Oral Session 14 – Deep Learning for Multimedia III |
16:00-16:15 |
Point-Supervised Temporal Action Detection with Label Supplementation Based on Transformer Cui Xu (University of Chinese Academy of Sciences); Laiyun Qing (University of Chinese Academy of Sciences) |
16:15-16:30 |
A Unified Contrastive Framework with Multi-Granularity Fusion for Text-to-Image Generation Yachao He (Shandong Normal University); Li Liu (Shandong Normal University); Huaxiang Zhang (Shandong Normal University); Dongmei Liu (Shandong Normal University); Hongzhen Li (Shandong Normal University) |
16:30-16:45 |
Local Feature-Emphasizing Transformer for Cloth-Changing Person Re-identification Jieqiong Zhou (Nanjing University of Information Science and Technology); Guoqing Zhang (Nanjing University of Information Science and Technology); Yuhui Zheng (Nanjing University of Information Science and Technology); Fuguo Zhang (Nanjing University of Information Science and Technology; Suzhou Kunke Intelligent Equipment Co., Ltd.) |
16:45-17:00 |
HSMnet: Hybrid Sampling and Matching Network for DETR-based Person Search Zhengjie Lu (Hebei University); Jinjia Peng (Hebei University); Huibing Wang (Dalian Maritime University); Qingxuan Shi (Hebei University); Bin Wang (Hebei University) |
17:00-17:15 |
Dlpp-Net: Degradation Location Prior Prediction Network for Image Restoration Yongjian Liu (Wuhan University of Technology); Shunwei Zhang (Wuhan University of Technology); Jinyu Xu (Wuhan University of Technology); Jiachen Li (Wuhan University of Technology); Yanchun Ma (Wuhan Vocational College of Software and Engineering); Qing Xie (Wuhan University of Technology) |
17:15-17:30 |
A Robust Few-shot Learning Framework via Dual-branch Adversarial Noise Pretraining Jiale Wang (Hefei University of Technology); Xueliang Liu (Hefei University of Technology); Yuling Su (Hefei University of Technology)
|
16:00-17:30 |
Oral Session 15 – Multimedia and Vision – 3D and Point Cloud |
16:00-16:15 |
RoboFormer: A Robust Multi-Modal Transformer for 3D Object Detection in Autonomous Driving Yuang Liu (Beijing University of Posts and Telecommunications); Dacheng Liao (Beijing University of Posts and Telecommunications); Mengshi Qi (Beijing University of Posts and Telecommunications); Liang Liu (Beijing University of Posts and Telecommunications); Huadong Ma (Beijing University of Posts and Telecommunications) |
16:15-16:30 |
MRGait: A Multi-range feature learning framework for Cross-View Gait Recognition Muhammad Saad Shakeel (South China University of Technology); Kun Liu (South China University of Technology); Xiaochuan Liao (South China University of Technology); Wenxiong Kang (South China University of Technology) |
16:30-16:45 |
Point Cloud Normal Estimation via Representation Learning on Height Maps Yang Yi (Deakin University); Dasith T de Silva Edirimuni (The University of Western Australia); Ye Zhu (Deakin University); Shang Gao (Deakin University); Zhiyong Wang (The University of Sydney); Antonio Robles-Kelly (Deakin University); Xuequan Lu (La Trobe University) |
16:45-17:00 |
An Efficient Multi-prior Hybrid Approach for Consistent 3D Generation from Single Images Yichen Ouyang (Zhejiang University); Jiayi Ye (Zhejiang University); Wenhao Chai (University of Washington); Dapeng Tao (Yunnan University); Yibing Zhan (JD Explore Academy); Gaoang Wang (Zhejiang University) |
17:00-17:15 |
Multi-Frame Sparse Convolutional Learning for Point Cloud Color Denoising Tailin Yang (Xidian University); Wei Wu (Xidian University); Zhu Li (University of Missouri, Kansas City); Rui Zhou (XDU) |
17:15-17:30 |
Multi-Modality Semantic-Shared Cross-View Ground-to-Aerial Localization Kai Zhang (Nanjing University of Science and Technology); Xia Yuan (Nanjing University of Science and Technology); ShunTong Chen (Nanjing University of Science and Technology); Di Hu (Nanjing University of Science and Technology); Chunxia Zhao (Nanjing University of Science and Technology)
|
16:00-17:30 |
Oral Session 16 – Multimedia Applications II |
16:00-16:15 |
MambaVesselNet: A Hybrid CNN-Mamba Architecture for 3D Cerebrovascular Segmentation Yanming Chen (University of Nottingham Ningbo); Xiangjian He (University of Nottingham Ningbo) |
16:15-16:30 |
MoE-Polyp: Shifting More Attention to Small Polyp Segmentation via Mixture-of-Experts Zihuang Wu (Jiangxi Normal University); Xinyu Xiong (Hangzhou Hikvision Digital Technology Co Ltd); Ying Chen (Pazhou Lab); Siying Li (Shenzhen University); Hua Chen (Jiangxi Normal University) |
16:30-16:45 |
MFTAnet: Two-step Aggregation Net of Multiscale Features for Pneumoconiosis Screening Wei Qingjin (Sichuan Normal University); Xiaozhuo Li (Sichuan Normal University); Liu Dinglu (Sichuan Normal University); Zhiwu Liao (Sichuan Normal University) |
16:45-17:00 |
FA-UNext: A Feedback Attention-based MLP Network for Medical Image Segmentation Qianyu Li (Dalian University of Technology); Bingcai Chen (Dalian University of Technology); Jax Tian (Dalian University of Technology); RuoLan Liu (Dalian University of Technology) |
17:00-17:15 |
CoolColor: Text-guided COherent OLd film COLORization Zichuan Huang (Peking University); Yifan Li (Peking University); Shuai Yang (Peking University); Jiaying Liu (Peking University) |
17:15-17:30 |
Multimodal Sign Language Knowledge Graph and Representation: Text, Video KeyFrames, and Motion Trajectories Ziqiang Liu (Beijing Institute of Petrochemical Technology); Gongwei Fang (Beijing Institute of Petrochemical Technology); Wentong Wang (Beijing Institute of Petrochemical Technology); Qiang Liu (Academy of Artificial Intelligence, Beijing Institute of Petrochemical Technology) |
18:00-20:00 |
Conference Banquet (to be confirmed)
|
Main Conference Day 3 – Friday 6 December
8:30-9:00 |
Registration |
9:00-10:00 |
Keynote 3 – EasyFL: Optimising Federated Learning for Computer Vision Applications Yonggang Wen (Nanyang Technological University, Singapore) |
10:00-10:30 |
Morning Tea |
10:30-12:00 |
Oral Session 17 – Music and Audio Processing in Multimedia |
10:30-10:45 |
DCEPNet: Dual-Channel Emotional Perception Network for Speech Emotion Recognition Fei Xiang (Dalian Maritime University); Ruili Wang (Dalian Maritime University); Junjie Hou (Dalian Maritime University); Xingang Wang (Dalian Maritime University) |
10:45-11:00 |
StyleSpeech: Parameter-efficient Fine Tuning for Pre-trained Controllable Text-to-Speech Haowei Lou (University of New South Wales); Hye-Young Paik (University of New South Wales); Wen Hu (University of New South Wales); Lina Yao (University of New South Wales) |
11:00-11:15 |
Bivariate Mixup for 2D Contact Point Localization with Piezoelectric Microphone Array Shogo Yonezawa (Tokyo University of Science); Yukinobu Taniguchi (Tokyo University of Science); Go Irie (Tokyo University of Science) |
11:15-11:30 |
Multi-domain Acoustic Feature Fusion for Speaker Recognition Shanshan Yao (Shanxi University); Tian Li (Shanxi University) |
11:30-11:45 |
The quantification of emotional expressions and perceptions of vocal vibrato in basic emotion: commercial operatic singing recordings Liu JieYing (Tokyo University of Arts) |
11:45-12:00 |
Pitch-aware generative pretraining improves multi-pitch estimation with scarce data Mary Pilataki (Queen Mary University of London); Matthias Mauch (Apple); Simon Dixon (Queen Mary University of London)
|
10:30-12:00 |
Oral Session 18 – Short Papers |
10:30-10:45 |
Sketch-based 3D Model Retrieval with Cross-Modal Representation Hairui Yang (Dalian University of Technology); Ning Wang (Dalian University of Technology); Zhihui Wang (Dalian University of Technology); Lei Wang (Dalian University of Technology) |
10:45-11:00 |
Highly Fault-Tolerant Discrete Lattice Information Coding Method for Screen-Shooting Scenarios Daidou Guo (University of Shanghai for Science and Technology); Ching-Chun Chang (National Institute of Informatics); Cheng SenMao (University of Shanghai for Science and Technology); Chuan Qin (University of Shanghai for Science and Technology) |
11:00-11:15 |
Fibre Population-guided Pre-training for 3D Spatial Super-Resolution on Multimodal Brain Diffusion MR Imaging Zihao Tang (University of Sydney); Xinyi Wang (University of Sydney); Mariano Cabezas (The University of Sydney); Arkiev D'Souza (The University of Sydney); Michael Barnett (Sydney Neuroimaging Analysis Centre); Fernando Calamante (The University of Sydney); Weidong Cai (University of Sydney); Chenyu Wang (University of Sydney, Sydney Neuroimaging Analysis Centre) |
11:15-11:30 |
Snap and Diagnose: An Advanced Multimodal Retrieval System for Identifying Plant Diseases in the Wild Tianqi Wei (The University of Queensland); Zhi Chen (The University of Queensland); Xin Yu (University of Queensland) |
11:30-11:45 |
Emotionally Guided Symbolic Music Generation Using Diffusion Models: The AGE-DM Approach Mingzhe Zhang (University of Queensland); Laura Ferris (University of Queensland); Lin Yue (University of Adelaide); Miao Xu (University of Queensland) |
11:45-12:00 |
SS-FS CSA: Self-Supervised and Fully Supervised Integration for 3D Cerebrovascular Segmentation Chenxi Niu (University of Nottingham Ningbo); Ziyu Liu (University of Nottingham Ningbo); Xiangjian He (University of Nottingham Ningbo)
|
10:30-12:00 |
Oral Session 19 – Multimedia and Vision – Detection |
10:30-10:45 |
Learning Gaussian Data Augmentation in Feature Space for One-shot Object Detection in Manga Takara Taniguchi (The University of Tokyo); Ryosuke Furuta (The University of Tokyo) |
10:45-11:00 |
Collaborative Feature-Logits Contrastive Learning for Open-Set Semi-Supervised Object Detection Xinhao Zhong (Institute of Information Science, Beijing Jiaotong University); Siyu Jiao (Beijing Jiaotong University); Yao Zhao (Beijing Jiaotong University); Yunchao Wei (UTS) |
11:00-11:15 |
KBY-Net: A Dual Learning Framework for Improving Object Detection in Rainy Weather Conditions Zheng-Xian Keh (Multimedia University); Lai-Kuan Wong (Multimedia University); Yuen Peng Loh (Multimedia University); Ke Gu (Beijing University of Technology); Weisi Lin (Nanyang Technological University, Singapore) |
11:15-11:30 |
Fire and Smoke Detection with Burning Intensity Representation Xiaoyi Han (Zhejiang University); Yanfei Wu (China Mobile (Suzhou) Software Technology Co., Ltd.); Nan Pu (University of Trento); Zunlei Feng (Zhejiang University); Qifei Zhang (Zhejiang University); Yijun Bei (Zhejiang University); Lechao Cheng (Hefei University of Technology) |
11:30-11:45 |
Transition in Focus of Prediction Tasks for Skeleton Graph Component Detection with Transformer Zhiyuan Wang (Soochow University); Cong Yang (Soochow University); Yulu Zhang (Chery); Zeyd Boukhers (Fraunhofer Institute for Applied Information Technology FIT); Wei Sui (Horizon Robotics); Yi Ji (Soochow University); Chunping Liu (Soochow University) |
11:45-12:00 |
CSUNet: Contour-Sensitive Underwater Salient Object Detection Wei Yu (Harbin Boiler Company Limited); Yi Wang (Dalian University of Technology); Shijun Yan (Dalian University of Technology); Tianzhu Wang (freelance researcher); Zhihan Wang (Massey University); Weirong Sun (Massey University); Yu Zhao (Shanghai Zhudian Semiconductor Co., Ltd); Xinwei Xue (Dalian University of Technology)
|
Workshop of Multimodal, Multilingual and Multitask Modeling Technologies for Oriental Languages
13:45 |
Opening Speech |
13:45 - 14:15 |
Invited Talk: Conversational AI Datasets for Large-Scale Model Training Speaker: Qingqing Zhang, MagicData Inc. |
14:15 - 14:45 |
Invited Talk: Beyond English-Centric LLMs: What Language Do Multilingual Language Models Think in? Speaker: Qianying Liu, National Institute of Informatics, Tokyo, Japan |
14:45 - 15:00 |
Invited Talk: Towards Advanced End-to-End Myanmar Speech Recognition and Synthesis Speaker: Win Pa Pa, University of Computer Studies, Yangon, Myanmar |
15:00 - 15:15 |
Selected Workshop Paper: How to Design Translation Prompts for ChatGPT: An Empirical Study Speaker: Yuan Gao, Massey University, New Zealand |
15:15 - 15:30 |
Tea Break |
15:30 - 16:00 |
Invited Talk: Transferring the Power of Audio from Languages to Healthcare: Insights and Perspectives Speaker: Qian Kun, Beijing Institute of Technology, China |
16:00 - 16:30 |
Invited Talk: Unified Multimodal Understanding and Generation, a New Perspective Speaker: Zuchao Li, Wuhan University, China |
16:30 - 16:45 |
Selected Workshop Paper: Disentangling Singlish Discourse Particles with Task-Driven Representation Speaker: Linus Tze En Foo, University of Edinburgh, United Kingdom |
16:45 - 17:00 |
Invited Talk: AI4Bharat Speaker: Raj Dabre, National Institute of Information and Communications Technology, Kyoto, Japan |
17:00 |
Closing Speech: Organizers |
Workshop on Multi-Biological Sensing Data for Language Deterioration Prediction (SpandLDeteriorate)
13:45 |
Opening |
14:00 - 14:45 |
Keynote 1 |
14:45 - 15:00 |
Paper Presentation: Reference-free automatic speech severity evaluation using acoustic unit language modelling Speaker: Bence M Halpern |
15:00 - 15:15 |
Paper Presentation: Free-FreeSLT: A Gloss-Free, Parameter-Free model for Sign Language Translation Speaker: Weirong Sun |
15:15 - 15:30 |
Break and Afternoon Tea |
15:30 - 16:15 |
Keynote 2 |
16:15 - 16:30 |
Paper Presentation: Swin-BERT: A Feature Fusion System designed for Speech-based Alzheimer's Dementia Detection Speaker: Yilin Pan |
16:30 - 16:45 |
Close |
Collaboration and Evolution of Foundation and Specialized Models Workshop
9:00 - 9:15 |
Opening Remarks |
9:15 - 9:30 |
Paper Presentation: An Adaptive Aggregation Method for Federated Learning via Meta Controller Speaker: Tao Shen |
9:30 - 9:45 |
Paper Presentation: DHelper: A Collaborative Toolkit for Manuscript Restoration Speaker: Yue Han/Yuqing Zhang |
9:45 - 10:00 |
Paper Presentation: Distributed Optimization over Block-Cyclic Data Speaker: Yucheng Ding/Chaoyue Niu |
10:00 - 10:30 |
Invited Talk: Towards Industrial Large Models and Digital Twins Speaker: Jiehan Zhou |
10:30 - 10:45 |
Coffee Break |
10:45 - 11:15 |
Invited Talk: Beyond Language: Revisiting ASR for Future Challenges Speaker: Sheng Li |
11:15 - 11:45 |
Invited Talk (Online): Fine-grained Action Analysis for Human Behavior Understanding Speaker: Jinglin Xu |
11:45 - 12:15 |
Invited Talk (Online): Heterogeneity-aware Personalized Federated Learning via Adaptive Dual-Agent Reinforcement Learning Speaker: Ting Wang |
12:15 |
Close |
Workshop on Multimodal Foundation Models of Remote Sensing and Agriculture (MFM-RsAg)
9:00 – 9:05 |
Welcome |
9:05 – 9:50 |
Keynote Speech: Domain specific foundation models in Agriculture Speaker: Xin Yu |
9:50 – 10:20 |
Paper Presentation: Remote-sensing Foundation Model for Agriculture: A Survey Speaker: Yu Luo |
10:20 – 10:45 |
Morning Tea/Coffee Break |
10:45 – 11:15 |
Paper Presentation: Agricultural Detection Using Spectral Adaptive Imaging Model Speaker: Yuning Wang |
11:15 – 11:45 |
Paper Presentation: An Encoder–Decoder Framework for Foundation Model-based Remote Sensing Semantic Segmentation Speaker: Jiale Song |
11:45 – 12:15 |
Paper Presentation: Improved downscaled inversion of soil moisture based on the BGKS methodology Speaker: Wei Hu |
12:15 |
Close |