MathVision
A benchmark dataset designed to test the visual reasoning capabilities of models in mathematical contexts.
A benchmark dataset used to assess the mathematical reasoning abilities of models in a multimodal context.
A benchmark dataset for evaluating multimodal reasoning capabilities, particularly in interpreting charts.
A GUI agent is an intelligent software entity designed to interact with graphical user interfaces, performing tasks typically executed by human users.
The task of analyzing and interpreting long-duration video content for various applications, such as summarization or event detection.
The SUPERB benchmark is a suite of tasks designed to evaluate the performance of speech processing models across various applications.
A task in which a system retrieves or generates answers to questions posed in natural language.
A model architecture that embeds multiple reasoning capabilities within a single model, reducing the memory required at deployment.
A type of reinforcement learning task where the agent must make decisions in a continuous action space, often used in robotics and simulation environments.
A benchmark that assesses the ability of models to count unique objects in video sequences.
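To illustrate what "unique" means in the last entry, here is a minimal sketch in Python. It assumes each frame has already been reduced to a list of track IDs by some upstream tracker; the function name `count_unique_objects` and the ID strings are hypothetical and do not come from any particular benchmark. The point is only that an object seen in several frames is counted once, so the unique count can be much smaller than the sum of per-frame detections.

```python
from typing import Hashable, Iterable


def count_unique_objects(frames: Iterable[Iterable[Hashable]]) -> int:
    """Count distinct object identities across all frames.

    Each frame is an iterable of track IDs (hypothetical input format).
    An ID that appears in several frames is counted only once.
    """
    seen = set()
    for frame_ids in frames:
        seen.update(frame_ids)
    return len(seen)


# Three frames of made-up track IDs for illustration.
frames = [
    ["car_1", "person_1"],
    ["car_1", "person_2"],
    ["person_1", "person_2", "dog_1"],
]

per_frame_total = sum(len(f) for f in frames)  # naive sum double-counts repeats
unique_total = count_unique_objects(frames)

print(per_frame_total, unique_total)  # prints "7 4"
```

In practice the hard part is producing consistent identities across frames; this sketch takes that step as given.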