Joint-GRPO
A method that orchestrates the collaboration between a Vision-Language Model and a Video Diffusion Model to optimize their outputs based on a shared reward.
A method that orchestrates the collaboration between a Vision-Language Model and a Video Diffusion Model to optimize their outputs based on a shared reward.
An optimization technique used in reinforcement learning to improve policy performance based on feedback from the environment.
A method of reconstructing models using a limited number of input images.
A method for reconstructing 3D human avatars from images without the need for human pose inputs.
Symbolic natural language understanding involves the use of symbolic representations and rules to interpret and understand human language.
Benchmarking is the process of comparing performance metrics of systems or models against a standard or set of criteria.
A token information transfer module that compresses vision tokens into instruction tokens while preserving multimodal understanding.
This technique involves identifying and mining patches of data across different modalities based on their similarities.
Confidence-Aware Patch Mining focuses on evaluating the reliability of mined patches to improve the quality of data used in analysis.
Low-Magnification Screening is a method for analyzing pathology images at lower magnifications to identify relevant features.