Category: Concepts

Mamba-Attention

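A hybrid architecture that interleaves Mamba (selective state-space) layers with self-attention layers, combining the linear-time sequence processing of state-space models with the token-to-token recall of attention to improve the efficiency and performance of large language models.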

Pruning

A technique used to reduce the size of a neural network by removing weights or neurons that contribute little to the model’s performance.
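
As an illustration, one common way to decide which weights "contribute little" is to rank them by absolute value and zero out the smallest ones (magnitude pruning). The sketch below assumes that criterion and a flat list of weights; the function name `magnitude_prune` and the `sparsity` parameter are hypothetical names chosen for this example, not part of any particular library.

```python
def magnitude_prune(weights, sparsity):
    """Return a copy of `weights` with the smallest-magnitude fraction zeroed.

    `weights` is a flat list of floats; `sparsity` is the fraction of
    entries to remove, in [0, 1]. Both names are illustrative assumptions.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be in [0, 1]")

    # Number of weights to remove (rounded down).
    n_prune = int(sparsity * len(weights))

    # Rank indices by absolute value, smallest first.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_idx = set(order[:n_prune])

    # Zero the selected weights and keep the rest unchanged.
    return [0.0 if i in pruned_idx else w for i, w in enumerate(weights)]


if __name__ == "__main__":
    layer = [0.1, -2.0, 0.05, 3.0]
    print(magnitude_prune(layer, 0.5))
```

In practice a pruned model is usually fine-tuned afterwards to recover accuracy, and structured variants remove whole neurons or channels rather than individual weights.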

Nemotron Elastic

A framework for building reasoning-oriented large language models in which a single parent model contains multiple nested submodels, each optimized for a different deployment configuration.

Kesten-Stigum Threshold

A critical threshold in community detection problems that separates the regime where community structure can be reliably detected from the regime where it cannot.
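
For a concrete case, consider the symmetric stochastic block model with k equal-sized communities, where two nodes are connected with probability a/n if they share a community and b/n otherwise. In that setting the threshold is commonly written as (a - b)^2 > k(a + (k - 1)b), which reduces to (a - b)^2 > 2(a + b) for two communities. The sketch below assumes this specific model; the function name `above_kesten_stigum` and its parameters are hypothetical names used only for illustration.

```python
def above_kesten_stigum(a, b, k=2):
    """Check the Kesten-Stigum condition for a symmetric block model.

    Assumes k equal-sized communities with edge probability a/n inside
    a community and b/n between communities (an assumption of this sketch).
    Returns True when (a - b)^2 > k * (a + (k - 1) * b).
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    if a < 0 or b < 0:
        raise ValueError("a and b must be non-negative")

    signal = (a - b) ** 2          # squared gap between in- and out-degree rates
    noise = k * (a + (k - 1) * b)  # k times the total average degree parameter
    return signal > noise


if __name__ == "__main__":
    print(above_kesten_stigum(5, 1))   # 16 > 12
    print(above_kesten_stigum(3, 1))   # 4 > 8 fails
```

Here `a = 5, b = 1` lies above the threshold for two communities, while `a = 3, b = 1` lies below it.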