We are delighted to announce that our paper has been officially accepted by the ACM International Conference on Multimedia (ACMMM 2025) and selected for Oral Presentation! Highlights of Review Results ...
Abstract: Prompt tuning is a valuable technique for adapting vision-language models (VLMs) to different downstream tasks, such as domain generalization and learning from a few examples. Previous ...
International publicity draws on a variety of modal symbols, including text, pictures, and sound, all of which are rich in expressive meaning. It is conducive for the Communist Party of China to use international ...
Abstract: Learning joint and coordinated features across modalities is essential for many audio-visual tasks. Existing pre-training methods primarily focus on global information, neglecting ...
Multimodal semantic segmentation shows significant potential for enhancing segmentation accuracy in complex scenes. However, current methods often incorporate specialized feature fusion modules ...
In the rapidly evolving landscape of AI-driven creativity, few projects have captured the imagination of Instagram like The Visual Dome, a sprawling and intricate digital world conceived by Tony ...
Large language models (LLMs) have transformed natural language processing (NLP) by demonstrating the effectiveness of increasing the number of parameters and training data for various reasoning tasks.