sim
Description
experience = sim(env,agents) simulates one or more reinforcement learning agents within an environment, using default simulation options.
experience = sim(agents,env) performs the same simulation as the previous syntax.
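As a rough illustration of the two call forms, the following sketch builds a predefined cart-pole environment and an untrained agent, then calls sim with each argument order. It assumes a release in which rlPGAgent can build a default agent directly from the observation and action specifications; the variable names are illustrative only.

```matlab
% Minimal sketch: simulate an untrained default agent with default options.
env = rlPredefinedEnv("CartPole-Discrete");   % predefined MATLAB environment
obsInfo = getObservationInfo(env);            % observation specification
actInfo = getActionInfo(env);                 % action specification

% Assumption: default-agent creation from specifications is available.
agent = rlPGAgent(obsInfo,actInfo);

% Both argument orders run the same kind of simulation.
experience  = sim(env,agent);
experience2 = sim(agent,env);

% Each result is a structure with the fields described under Output Arguments.
disp(fieldnames(experience))
```

Because the agent is untrained, the episode is likely to end after only a few steps.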
Examples
Simulate Reinforcement Learning Environment
Simulate a reinforcement learning environment with an agent configured for that environment. For this example, load an environment and agent that are already configured. The environment is a discrete cart-pole environment created with rlPredefinedEnv. The agent is a policy gradient (rlPGAgent) agent. For more information about the environment and agent used in this example, see Train PG Agent to Balance Cart-Pole System.
rng(0) % for reproducibility
load RLSimExample.mat
env
env = 
  CartPoleDiscreteAction with properties:
                  Gravity: 9.8000
                 MassCart: 1
                 MassPole: 0.1000
                   Length: 0.5000
                 MaxForce: 10
                       Ts: 0.0200
    ThetaThresholdRadians: 0.2094
               XThreshold: 2.4000
      RewardForNotFalling: 1
        PenaltyForFalling: -5
                    State: [4x1 double]
agent
agent = 
  rlPGAgent with properties:
            AgentOptions: [1x1 rl.option.rlPGAgentOptions]
    UseExplorationPolicy: 1
         ObservationInfo: [1x1 rl.util.rlNumericSpec]
              ActionInfo: [1x1 rl.util.rlFiniteSetSpec]
              SampleTime: 0.1000
Typically, you train the agent using train and then simulate the environment to test the performance of the trained agent. For this example, simulate the environment using the agent you loaded. Configure simulation options, specifying that the simulation run for 100 steps.
simOpts = rlSimulationOptions(MaxSteps=100);
For the predefined cart-pole environment used in this example, you can use plot to generate a visualization of the cart-pole system. When you simulate the environment, this plot updates automatically so that you can watch the system evolve during the simulation.
plot(env)
Simulate the environment.
experience = sim(env,agent,simOpts)
experience = struct with fields:
       Observation: [1x1 struct]
            Action: [1x1 struct]
            Reward: [1x1 timeseries]
            IsDone: [1x1 timeseries]
    SimulationInfo: [1x1 struct]
The output structure experience records the observations collected from the environment, the action and reward, and other data collected during the simulation. Each field contains a timeseries object or a structure of timeseries data objects. For instance, experience.Action is a structure whose CartPoleAction field is a timeseries containing the action imposed on the cart-pole system by the agent at each step of the simulation.
experience.Action
ans = struct with fields:
    CartPoleAction: [1x1 timeseries]
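To work with the logged data numerically, you can read the Data and Time properties of each timeseries. The following sketch assumes that experience is the structure returned by the sim call in this example; the variable names rewardTs, actionTs, force, and totalReward are illustrative only.

```matlab
% Assumes experience already exists in the workspace (returned by sim above).
rewardTs = experience.Reward;                  % timeseries, one sample per step
totalReward = sum(rewardTs.Data);              % cumulative reward of the episode

actionTs = experience.Action.CartPoleAction;   % timeseries of applied actions
force = squeeze(actionTs.Data);                % remove singleton dimensions

fprintf("Steps simulated: %d, total reward: %g\n",numel(force),totalReward);

% Plot the action applied at each simulation step.
plot(actionTs.Time,force,"o-")
xlabel("Time (s)")
ylabel("Force")
```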
Simulate Simulink Environment with Multiple Agents
Simulate an environment created for the Simulink® model used in the example Train Multiple Agents to Perform Collaborative Task, using the agents trained in that example.
Load the agents in the MATLAB® workspace.
load rlCollaborativeTaskAgents
Create an environment for the rlCollaborativeTask Simulink® model, which has two agent blocks. Since the agents used by the two blocks (agentA and agentB) are already in the workspace, you do not need to pass their observation and action specifications to create the environment.
env = rlSimulinkEnv(...
    "rlCollaborativeTask",...
    ["rlCollaborativeTask/Agent A","rlCollaborativeTask/Agent B"]);
Load the parameters that are needed by the rlCollaborativeTask Simulink® model to run.
rlCollaborativeTaskParams
Simulate the agents against the environment, saving the experiences in xpr.
xpr = sim(env,[agentA agentB]);
Plot actions of both agents.
subplot(2,1,1); plot(xpr(1).Action.forces)
subplot(2,1,2); plot(xpr(2).Action.forces)
Input Arguments
env
— Environment
reinforcement learning environment object
Environment in which the agents act, specified as one of the following kinds of reinforcement learning environment object:
- A predefined MATLAB® or Simulink® environment created using rlPredefinedEnv. This kind of environment does not support training multiple agents at the same time.
- A custom MATLAB environment you create with functions such as rlFunctionEnv or rlCreateEnvTemplate. This kind of environment does not support training multiple agents at the same time.
- A custom Simulink environment you create using rlSimulinkEnv. This kind of environment supports training multiple agents at the same time.
For more information about creating and configuring environments, see:
When env is a Simulink environment, calling sim compiles and simulates the model associated with the environment.
agents
— Agents
reinforcement learning agent object | array of agent objects
Agents to simulate, specified as a reinforcement learning agent object, such as rlACAgent or rlDDPGAgent, or as an array of such objects.
If env is a multi-agent environment created with rlSimulinkEnv, specify agents as an array. The order of the agents in the array must match the agent order used to create env. Multi-agent simulation is not supported for MATLAB environments.
For more information about how to create and configure agents for reinforcement learning, see Reinforcement Learning Agents.
simOpts
— Simulation options
rlSimulationOptions object
Simulation options, specified as an rlSimulationOptions object. Use this argument to specify options such as:
- Number of steps per simulation
- Number of simulations to run
For details, see rlSimulationOptions.
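As a sketch of how these two options interact with the output, the following code requests several simulations and then totals the reward of each run. It assumes that env and agent already exist in the workspace, for instance the cart-pole environment and PG agent from the first example; the option values and the names experiences and totalRewards are illustrative.

```matlab
% Assumes env and agent already exist in the workspace.
simOpts = rlSimulationOptions(MaxSteps=200,NumSimulations=3);
experiences = sim(env,agent,simOpts);

% sim returns one structure per simulation; total the reward of each run.
totalRewards = zeros(numel(experiences),1);
for k = 1:numel(experiences)
    totalRewards(k) = sum(experiences(k).Reward.Data);
end
disp(totalRewards)
```

Averaging totalRewards over several runs gives a less noisy picture of agent performance than a single episode.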
Output Arguments
experience
— Simulation results
structure | structure array
Simulation results, returned as a structure or structure array. The number of rows in the array is equal to the number of simulations specified by the NumSimulations option of rlSimulationOptions. The number of columns in the array is the number of agents. The fields of each experience structure are as follows.
Observation
— Observations
structure
Observations collected from the environment, returned as a structure with fields corresponding to the observations specified in the environment. Each field contains a timeseries of length N + 1, where N is the number of simulation steps.
To obtain the current observation and the next observation for a given simulation step, use code such as the following, assuming one of the fields of Observation is obs1.
Obs = getSamples(experience.Observation.obs1,1:N);
NextObs = getSamples(experience.Observation.obs1,2:N+1);
These values can be useful if you are writing your own training algorithm that uses sim to generate experiences for training.
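The index arithmetic behind the current and next observations can be checked on a small stand-in signal. The sketch below uses a synthetic timeseries in place of experience.Observation.obs1 and the base MATLAB timeseries method getsamples; the data, sample time, and variable names are hypothetical, and real observation data usually has more dimensions than this one-dimensional signal.

```matlab
% Synthetic stand-in for experience.Observation.obs1 (hypothetical data):
% N = 5 steps yields N+1 = 6 observation samples.
N = 5;
obsTs = timeseries((0:N)',(0:N)'*0.02);   % values 0..5, sample time 0.02 s

% getsamples selects timeseries samples by index.
obsNow  = getsamples(obsTs,1:N);          % observations at steps 1..N
obsNext = getsamples(obsTs,2:N+1);        % observations one step later

% Each row pairs an observation with its successor.
pairs = [obsNow.Data, obsNext.Data];
disp(pairs)
```

Row k of pairs holds the observation before step k and the observation after it, which is the pairing that a transition record for training needs.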
Action
— Actions
structure
Actions computed by the agent, returned as a structure with fields corresponding to the action signals specified in the environment. Each field contains a timeseries of length N, where N is the number of simulation steps.
Reward
— Rewards
timeseries
Reward at each step in the simulation, returned as a timeseries of length N, where N is the number of simulation steps.
IsDone
— Flag indicating termination of episode
timeseries
Flag indicating termination of the episode, returned as a timeseries of a scalar logical signal. This flag is set at each step by the environment, according to conditions you specify for episode termination when you configure the environment. When the environment sets this flag to 1, simulation terminates.
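One way to use this flag is to tell whether a run ended because the environment terminated the episode or because the step limit was reached. The sketch below uses a synthetic logical timeseries as a stand-in for experience.IsDone; the data and variable names are hypothetical.

```matlab
% Hypothetical stand-in for experience.IsDone: a 6-step episode in which
% the environment raises the flag on the final step.
isDoneTs = timeseries(logical([0 0 0 0 0 1]'),(1:6)'*0.02);

flags = squeeze(isDoneTs.Data);   % logical vector, one entry per step
stepsRun = numel(flags);          % number of steps actually simulated
terminated = any(flags);          % true if the environment ended the episode

if terminated
    fprintf("Episode terminated by the environment after %d steps.\n",stepsRun);
else
    fprintf("Episode ran %d steps without termination.\n",stepsRun);
end
```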
SimulationInfo
— Information collected during simulation
structure | vector of Simulink.SimulationOutput objects
Information collected during simulation, returned as one of the following:
- For MATLAB environments, a structure containing the field SimulationError. This structure contains any errors that occurred during simulation.
- For Simulink environments, a Simulink.SimulationOutput object containing simulation data. Recorded data includes any signals and states that the model is configured to log, simulation metadata, and any errors that occurred.
Version History
Introduced in R2019a