Category: Reinforcement Learning

GRPO

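Group Relative Policy Optimization. A policy-gradient technique, a variant of PPO, that improves a policy from reward feedback without training a separate value function (critic). For each input it samples a group of outputs, scores them, and uses each output's reward relative to the group's mean and standard deviation as its advantage.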
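As a rough illustration, assuming GRPO here refers to Group Relative Policy Optimization, the group-relative advantage could be sketched as below. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for this sketch, not details taken from this glossary.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Score each sampled output relative to its own group (hypothetical helper).

    rewards: rewards for several outputs sampled for the same input.
    eps: small constant (an assumption) that avoids division by zero
         when every reward in the group is identical.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; a sample std would also be plausible
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four outputs sampled for one input, two rewarded and two not.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Outputs that score above the group average receive a positive advantage and the rest a negative one, so the group itself plays the role of the baseline.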

Reward Profiling

A method for selectively updating the policy based on high-confidence performance estimates, intended to improve the stability and convergence of reinforcement learning algorithms.
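One way to picture a selective, high-confidence update is an acceptance test on sampled returns. The sketch below is only an illustration under stated assumptions: the rule (accept a candidate policy only when a lower confidence bound on its mean return exceeds the current policy's estimated mean), the normal approximation, and the names `lower_confidence_bound` and `accept_update` are all hypothetical and are not taken from a specific Reward Profiling implementation.

```python
import math
from statistics import mean, stdev


def lower_confidence_bound(returns, z=1.645):
    """Approximate one-sided lower bound on the mean return.

    Uses a normal approximation; z=1.645 (roughly 95% one-sided) is an assumption.
    """
    return mean(returns) - z * stdev(returns) / math.sqrt(len(returns))


def accept_update(current_returns, candidate_returns, z=1.645):
    """Hypothetical rule: keep the candidate only if it is confidently better."""
    return lower_confidence_bound(candidate_returns, z) > mean(current_returns)


# Returns from evaluation rollouts (made-up numbers for illustration only).
current = [1.0, 1.2, 0.9, 1.1]
clearly_better = [2.0, 2.1, 1.9, 2.2]
noisy = [0.0, 3.0, -1.0, 3.2]

accepted_clear = accept_update(current, clearly_better)
accepted_noisy = accept_update(current, noisy)
```

In this toy data the noisy candidate has a higher average return than the current policy, yet it is rejected because its estimate is too uncertain, which is the kind of filtering the definition describes.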

Policy Gradient

A class of reinforcement learning algorithms that optimize the policy directly by adjusting its parameters in the direction of the gradient of the expected return (cumulative reward).
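A minimal sketch of the idea, using the REINFORCE estimator on a two-armed bandit with a softmax policy: each update moves the parameters along reward times the gradient of the log-probability of the sampled action. The bandit setup, the function name `reinforce_bandit`, and the step count, learning rate, and seed are assumptions chosen for illustration.

```python
import math
import random


def softmax(logits):
    """Convert logits to action probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, steps=500, lr=0.1, seed=0):
    """Train a softmax policy on a bandit with fixed per-arm rewards."""
    rng = random.Random(seed)
    theta = [0.0] * len(arm_rewards)  # one logit per arm
    for _ in range(steps):
        probs = softmax(theta)
        action = rng.choices(range(len(theta)), weights=probs)[0]
        reward = arm_rewards[action]
        for k in range(len(theta)):
            # d/d theta_k of log pi(action) for a softmax policy
            grad_log_pi = (1.0 if k == action else 0.0) - probs[k]
            theta[k] += lr * reward * grad_log_pi  # gradient ascent step
    return softmax(theta)


# Arm 1 always pays 1.0, arm 0 pays nothing.
final_probs = reinforce_bandit([0.0, 1.0])
```

Because only the rewarded arm produces a nonzero update, the policy's probability mass shifts toward it over training. Practical methods usually subtract a baseline from the reward to reduce variance, which this sketch omits.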