Vision-Language Model
A model that integrates visual and textual information to understand and generate content based on multimodal inputs.
A popular large multimodal model used as the backbone for the EvoLMM framework.
EvoLMM is a self-evolving framework for training large multimodal models using continuous self-rewarding processes.
Statistical language models are probabilistic models that predict the likelihood of sequences of words in a language.
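As a concrete illustration of the general idea (not tied to any framework above), a minimal bigram language model estimates sequence likelihood as a product of conditional word probabilities from counts. This is a hedged sketch using standard maximum-likelihood bigram estimation; the toy corpus and function names are illustrative only.

```python
from collections import defaultdict

def train_bigram(corpus):
    """Count bigram and preceding-word frequencies from tokenized sentences."""
    bigram = defaultdict(int)
    unigram = defaultdict(int)
    for sentence in corpus:
        tokens = ["<s>"] + sentence + ["</s>"]  # sentence boundary markers
        for a, b in zip(tokens, tokens[1:]):
            bigram[(a, b)] += 1
            unigram[a] += 1
    return bigram, unigram

def sentence_prob(sentence, bigram, unigram):
    """Approximate P(w1..wn) as the product of P(w_i | w_{i-1})."""
    tokens = ["<s>"] + sentence + ["</s>"]
    p = 1.0
    for a, b in zip(tokens, tokens[1:]):
        if unigram[a] == 0:
            return 0.0  # unseen context: probability collapses without smoothing
        p *= bigram[(a, b)] / unigram[a]
    return p

corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
bg, ug = train_bigram(corpus)
print(sentence_prob(["the", "cat", "sat"], bg, ug))  # 0.5
```

Real systems add smoothing (e.g. Laplace or Kneser-Ney) so that unseen bigrams receive nonzero probability; the unsmoothed version above is only the simplest possible instance.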
A hybrid model architecture that combines the efficiency of state-space models with the expressivity of attention mechanisms.
SurvAgent is a hierarchical multi-agent system designed for multimodal survival prediction in oncology.
Codec2Vec is a speech representation learning framework that uses discrete audio codec units for feature extraction.
A hybrid attention mechanism that enhances the efficiency and performance of large language models.
A framework for building reasoning-oriented large language models that incorporates multiple nested submodels optimized for different deployment configurations.
AINA is a framework designed to learn robot manipulation policies from human demonstrations in natural environments.