Pose Estimation
The process of determining the position and orientation of a person’s body parts in an image.
A technique for reconstructing 3D avatars from images or video.
A method for reconstructing 3D human avatars from images without the need for human pose inputs.
A benchmark dataset used to assess the mathematical reasoning abilities of models in a multimodal context.
A benchmark dataset for evaluating multimodal reasoning capabilities, particularly in interpreting charts.
A popular large multimodal model used as the backbone for the EvoLMM framework.
Models that can process and integrate information from multiple modalities, such as text and images.
The Proposer is an agent in the EvoLMM framework that generates diverse, image-grounded questions.
OpenCyc is an open-source version of the Cyc knowledge base and inference engine, designed to provide a rich representation of commonsense knowledge.
EvoLMM is a self-evolving framework for training large multimodal models using continuous self-rewarding processes.