Joint-GRPO
A method that orchestrates the collaboration between a Vision-Language Model and a Video Diffusion Model to optimize their outputs based on a shared reward.
The phenomenon where models generate outputs that are not grounded in the input data or reality.
A measure of how effectively a model uses tokens (or inputs) to produce outputs, which affects computational cost.
A task that involves predicting the next event in a video given a procedural or predictive question, requiring dynamic video responses.
A cognitive strategy that allows a model to switch between quick, heuristic-based decision-making and slower, analytical reasoning.
A dataset designed for testing human reconstruction algorithms.
A large-scale dataset for human avatar reconstruction tasks.
A dataset used for benchmarking human avatar reconstruction methods.
The true poses of a subject, used as a reference in various computer vision tasks.
A method of reconstructing models using a limited number of input images.