Policies and Value Functions
A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to be taken. During training, the agent tunes the parameters of its policy approximator to maximize the expected cumulative long-term reward.
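As a language-neutral illustration of this mapping, the following Python sketch turns an observation into a probability distribution over three discrete actions and then samples one of them. It is not toolbox code: the softmax-over-linear-scores form, the function names `policy` and `sample_action`, and the weight values are all assumptions chosen for the example.

```python
import math
import random

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def policy(observation, weights):
    """Map an observation to a probability distribution over actions.

    weights[a] is the (hypothetical) parameter vector for action a;
    these are the parameters an agent would tune during training.
    """
    scores = [sum(w * x for w, x in zip(w_a, observation)) for w_a in weights]
    return softmax(scores)

def sample_action(probs, rng):
    """Draw an action index according to the policy's probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Illustrative values only: 3 actions, 2-element observation.
weights = [[0.5, -0.2], [0.0, 0.3], [-0.4, 0.1]]
observation = [1.0, 2.0]

probs = policy(observation, weights)
action = sample_action(probs, random.Random(0))
```

Changing `weights` changes the distribution returned for the same observation, which is the sense in which training "tunes the parameters" of the policy approximator.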
Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor implements the policy that selects the best action to take. The critic implements the value (or Q-value) function that estimates the value (the cumulative long-term reward) of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables. For more information, see Create Policies and Value Functions.
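To make the actor and critic roles concrete, here is a minimal Python sketch that uses a look-up table as the approximation model. It does not use the toolbox objects: the `TableCritic` class, the `greedy_actor` function, and the table entries are hypothetical, and deriving the actor directly from the critic is a simplification made for the example.

```python
class TableCritic:
    """Q-value critic backed by a look-up table (hypothetical class).

    Stores one estimated long-term reward per (observation, action) pair.
    """

    def __init__(self, observations, actions):
        self.actions = list(actions)
        # Every estimate starts at zero; training would update these entries.
        self.table = {(o, a): 0.0 for o in observations for a in self.actions}

    def q_value(self, observation, action):
        """Return the estimated value of taking `action` in `observation`."""
        return self.table[(observation, action)]

def greedy_actor(critic, observation):
    """Select the action with the highest estimated Q-value."""
    return max(critic.actions, key=lambda a: critic.q_value(observation, a))

critic = TableCritic(observations=["s0", "s1"], actions=["left", "right"])

# Made-up estimates standing in for values that training would learn.
critic.table[("s0", "right")] = 1.5
critic.table[("s1", "left")] = 0.7

best_s0 = greedy_actor(critic, "s0")
best_s1 = greedy_actor(critic, "s1")
```

A table like this only suits small, discrete observation and action spaces. For larger or continuous spaces, the same two roles would be filled by a parameterized model such as a linear basis function or a deep neural network.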
Blocks
Policy | Reinforcement learning policy |
Functions
Topics
- Create Policies and Value Functions
Specify policies and value functions using function approximators, such as deep neural networks.
- Import Neural Network Models
You can import existing policies from other deep learning frameworks using the ONNX™ model format.