Create and Train Reinforcement Learning Agents Interactively
Design, train, and simulate reinforcement learning agents using a visual interactive workflow in the Reinforcement Learning Designer app. Use the app to set up a reinforcement learning problem in Reinforcement Learning Toolbox™ without writing MATLAB® code. Work through the entire reinforcement learning workflow to:
- Import an existing environment into the app
- Import or create a new agent for your environment and select the appropriate hyperparameters for the agent
- Use the default neural network architectures created by Reinforcement Learning Toolbox or import custom architectures
- Train the agent on single or multiple workers and simulate the trained agent against the environment
- Export the final agent to the MATLAB workspace for further use and deployment
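Each of these interactive steps also has a command-line equivalent in Reinforcement Learning Toolbox. As a minimal sketch (assuming the toolbox is installed), you can create the predefined cart-pole environment used below and then open the app, which can import the environment object from the workspace:

```matlab
% Create the predefined cart-pole MATLAB environment
% with a discrete action space
env = rlPredefinedEnv("CartPole-Discrete");

% Inspect the observation and action specifications
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);

% Open the Reinforcement Learning Designer app; the environment
% can then be imported from the MATLAB workspace
reinforcementLearningDesigner
```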
Starting with MATLAB release R2021a, Reinforcement Learning Toolbox lets you interactively design, train, and simulate RL agents with the new Reinforcement Learning Designer app. Open the app from the command line or from the MATLAB toolstrip. First, you need to create the environment object that your agent will train against. Reinforcement Learning Designer lets you import environment objects from the MATLAB workspace, select from several predefined environments, or create your own custom environment. For this example, let's create a predefined cart-pole MATLAB environment with a discrete action space, and we will also import a custom Simulink environment of a four-legged robot with a continuous action space from the MATLAB workspace. You can delete or rename environment objects from the Environments pane as needed, and you can view the dimensions of the observation and action spaces in the Preview pane.

To create an agent, click New in the Agent section of the Reinforcement Learning tab. Based on the selected environment and the nature of the observation and action spaces, the app will show a list of compatible built-in training algorithms. For this demonstration, we will choose the DQN algorithm. The app will generate a DQN agent with a default critic architecture. Before creating the agent, you can adjust some of the critic's default values as needed. The new agent will appear in the Agents pane, and the Agent Editor will show a summary view of the agent and the available hyperparameters that can be tuned. For example, let's change the agent's sample time and the critic's learn rate. Here, we can also adjust the exploration strategy of the agent and see how exploration will progress with respect to the number of training steps.

To view the default critic network, click View Critic Model on the DQN Agent tab. The Deep Learning Network Analyzer opens and displays the critic structure. You can change the critic neural network by importing a different critic network from the workspace. You can also import a different set of agent options or a different critic representation object altogether.

Click Train to specify training options such as stopping criteria for the agent. Here, let's set the maximum number of episodes to 1000 and leave the rest at their default values. To parallelize training, click the Use Parallel button. Parallelization options include additional settings such as the type of data workers will send back, whether data will be sent synchronously, and more. After setting the training options, you can generate a MATLAB script with the specified settings that you can use outside the app if needed. To start training, click Train. During the training process, the app opens the Training Session tab and displays the training progress. If visualization of the environment is available, you can also view how the environment responds during training.
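The agent creation and training settings above can also be scripted. The following is a hedged sketch of the command-line equivalent, assuming `env` is the cart-pole environment; note that some option names (such as `CriticOptimizerOptions` for the critic learn rate) vary between toolbox releases, so check the documentation for your version:

```matlab
% Create a DQN agent with a default critic for the environment
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
agent = rlDQNAgent(obsInfo, actInfo);

% Tune hyperparameters, e.g. the sample time and critic learn rate
% (illustrative values; property names may differ by release)
agent.AgentOptions.SampleTime = 0.1;
agent.AgentOptions.CriticOptimizerOptions.LearnRate = 1e-3;

% Training options: stop after 1000 episodes, train on parallel workers
trainOpts = rlTrainingOptions( ...
    MaxEpisodes=1000, ...
    UseParallel=true);

% Train the agent against the environment
trainingStats = train(agent, env, trainOpts);
```

This is essentially what the "generate a MATLAB script" feature in the app produces: a reproducible record of the settings chosen interactively.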
You can stop training at any time and choose to accept or discard the training results. Accepted results will show up under the Results pane, and a new trained agent will also appear under Agents.

To simulate an agent, go to the Simulate tab and select the appropriate agent and environment object from the drop-down list. For this task, let's import a pretrained agent for the four-legged robot environment we imported at the beginning. Double-click the agent object to open the Agent Editor. You can see that this is a DDPG agent that takes in 44 continuous observations and outputs 8 continuous torques. In the Simulate tab, select the desired number of simulations and the simulation length. If you need to run a large number of simulations, you can run them in parallel. After clicking Simulate, the app opens the Simulation Session tab. If available, you can view the visualization of the environment at this stage as well. When the simulations are complete, you will be able to see the reward for each simulation as well as the reward mean and standard deviation. Remember that the reward signal is provided as part of the environment. To analyze the simulation results, click Inspect Simulation Data. In the Simulation Data Inspector, you can view the saved signals for each simulation episode. If you want to keep the simulation results, click Accept.

When you finish your work, you can choose to export any of the agents shown under the Agents pane. For convenience, you can also directly export the underlying actor or critic representations, actor or critic neural networks, and agent options. To save the app session for future use, click Save Session on the Reinforcement Learning tab. For more information, please refer to the Reinforcement Learning Toolbox documentation.
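Simulation and the per-episode reward summary can likewise be reproduced at the command line. A minimal sketch, assuming `env` and `agent` from the earlier steps; the simulation count and step limit here are illustrative, not values from the app:

```matlab
% Run several simulations of the trained agent
simOpts = rlSimulationOptions(NumSimulations=5, MaxSteps=500);
experiences = sim(env, agent, simOpts);

% Total reward per episode (the Reward field is logged as a timeseries)
episodeRewards = arrayfun(@(e) sum(e.Reward.Data), experiences);
fprintf("Mean reward: %.2f (std %.2f)\n", ...
    mean(episodeRewards), std(episodeRewards));

% Save the trained agent for later use, as an alternative
% to exporting it from the Agents pane
save("trainedAgent.mat", "agent")
```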