Policies and Value Functions

Define policy and value function approximators, such as actors and critics

A reinforcement learning policy is a mapping from the current environment observation to a probability distribution over the actions to take. During training, the agent tunes the parameters of its policy approximator to maximize the long-term reward.

Reinforcement Learning Toolbox™ software provides approximator objects for actors and critics. The actor implements the policy that selects the best action to take. The critic implements the value (or Q-value) function that estimates the value (the cumulative long-term reward) of the current policy. Depending on your application and selected agent, you can define policy and value function approximators using different approximation models, such as deep neural networks, linear basis functions, or look-up tables. For more information, see Create Policies and Value Functions.
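For example, the following minimal sketch creates a vector Q-value critic and a stochastic discrete actor for a hypothetical environment with a four-dimensional continuous observation and two discrete actions; the specifications and network sizes are illustrative assumptions, not taken from any shipped example. A second sketch after the function list below shows how to query these objects.

% Hypothetical observation and action specifications.
obsInfo = rlNumericSpec([4 1]);      % 4-dimensional continuous observation
actInfo = rlFiniteSetSpec([-1 1]);   % two discrete actions

% Critic: a vector Q-value function with one output per discrete action.
criticNet = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))];
critic = rlVectorQValueFunction(dlnetwork(criticNet),obsInfo,actInfo);

% Actor: a stochastic categorical actor over the discrete action set.
actorNet = [
    featureInputLayer(prod(obsInfo.Dimension))
    fullyConnectedLayer(16)
    reluLayer
    fullyConnectedLayer(numel(actInfo.Elements))];
actor = rlDiscreteCategoricalActor(dlnetwork(actorNet),obsInfo,actInfo);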

Blocks

Policy Reinforcement learning policy

Functions

rlTable Value table or Q table
rlValueFunction Value function approximator object for reinforcement learning agents
rlQValueFunction Q-Value function approximator object for reinforcement learning agents
rlVectorQValueFunction Vector Q-value function approximator for reinforcement learning agents
rlContinuousDeterministicActor Deterministic actor with a continuous action space for reinforcement learning agents
rlDiscreteCategoricalActor Stochastic categorical actor with a discrete action space for reinforcement learning agents
rlContinuousGaussianActor Stochastic Gaussian actor with a continuous action space for reinforcement learning agents
rlOptimizerOptions Optimization options for actors and critics
rlMaxQPolicy Policy object to generate discrete max-Q actions for custom training loops and application deployment
rlEpsilonGreedyPolicy Policy object to generate discrete epsilon-greedy actions for custom training loops
rlDeterministicActorPolicy Policy object to generate continuous deterministic actions for custom training loops and application deployment
rlAdditiveNoisePolicy Policy object to generate continuous noisy actions for custom training loops
rlStochasticActorPolicy Policy object to generate stochastic actions for custom training loops and application deployment
quadraticLayer Quadratic layer for actor or critic network
scalingLayer Scaling layer for actor or critic network
softplusLayer Softplus layer for actor or critic network
featureInputLayer Feature input layer
reluLayer Rectified Linear Unit (ReLU) layer
tanhLayer Hyperbolic tangent (tanh) layer
fullyConnectedLayer Fully connected layer
lstmLayer Long short-term memory (LSTM) layer
softmaxLayer Softmax layer
getActor Get actor from reinforcement learning agent
setActor Set actor of reinforcement learning agent
getCritic Get critic from reinforcement learning agent
setCritic Set critic of reinforcement learning agent
getLearnableParameters Obtain learnable parameter values from agent, function approximator, or policy object
setLearnableParameters Set learnable parameter values of agent, function approximator, or policy object
getModel Get function approximator model from actor or critic
setModel Set function approximation model for actor or critic
getAction Obtain action from agent, actor, or policy object given environment observations
getValue Obtain estimated value from a critic given environment observations and actions
getMaxQValue Obtain maximum estimated value over all possible actions from a Q-value function critic with discrete action space, given environment observations
evaluate Evaluate function approximator object given observation (or observation-action) input data
gradient Evaluate gradient of function approximator object given observation and action input data
accelerate Option to accelerate computation of gradient for approximator object based on neural network
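As a rough illustration of how several of these functions fit together, the following sketch queries the critic and actor created in the earlier example and wraps the actor in a policy object; the observation value is invented for illustration.

obs = {rand(4,1)};                        % sample observation, passed as a cell array
act = getAction(actor,obs);               % sample an action from the actor's distribution
qVals = getValue(critic,obs);             % estimated Q-values of all discrete actions
[maxQ,maxIdx] = getMaxQValue(critic,obs); % highest Q-value and the index of its action

% Wrap the actor in a policy object, for example for a custom training loop.
policy = rlStochasticActorPolicy(actor);
act = getAction(policy,obs);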

Topics

Create Policies and Value Functions