Sparse Input Reconstruction
A method of reconstructing models using a limited number of input images.
A model that integrates visual and textual information to understand and generate content based on multimodal inputs.
The process of determining the position and orientation of a person’s body parts in an image.
A technique for reconstructing 3D avatars from images or video.
A benchmark dataset designed to test the visual reasoning capabilities of models in mathematical contexts.
A method for reconstructing 3D human avatars from images without the need for human pose inputs.
A benchmark dataset used to assess the mathematical reasoning abilities of models in a multimodal context.
A popular large multimodal model used as the backbone for the EvoLMM framework.
A benchmark dataset for evaluating multimodal reasoning capabilities, particularly in interpreting charts.
A learning mechanism where agents receive feedback based on their performance, allowing them to improve autonomously.