Chain-of-Frames reasoning
A reasoning approach that involves analyzing a sequence of frames in a video to derive conclusions.
The process of identifying patterns in data to make predictions or draw conclusions.
The study of how physical systems evolve over time, including motion and interaction of objects.
The ability to understand and remember the spatial relationships between objects.
The ability of a model to make predictions or inferences without having seen any examples of the task during training.
A reasoning approach that involves breaking down complex problems into manageable parts to find solutions.
A dedicated dataset created for training and evaluating models on the Video-Next-Event Prediction task.
A benchmark suite for assessing video reasoning abilities in generative video models.
A generative model that creates videos by iteratively denoising random noise into a coherent sequence of frames, often conditioned on textual or visual inputs.
A method that orchestrates the collaboration between a Vision-Language Model and a Video Diffusion Model to optimize their outputs based on a shared reward.