Tag: Technique

Reward Profiling

A method for selectively updating the policy based on high-confidence performance estimations to improve the stability and convergence of reinforcement learning algorithms.

NoSense

A baseline model that discards temporal structure and utilizes a bag-of-words approach with SigLIP for video analysis.

SigLIP

A model that uses a bag-of-words approach for processing video data, focusing on significant visual features.

Policy Gradient

A class of algorithms in reinforcement learning that optimize the policy directly by adjusting the parameters in the direction of the gradient of expected reward.