Water Distribution System Scheduling Using Reinforcement Learning
This example shows how to learn an optimal pump scheduling policy for a water distribution system using reinforcement learning (RL).
Water Distribution System
The following figure shows a water distribution system.
Here:

Q_supply is the amount of water supplied to the water tank from a reservoir.
Q_demand is the amount of water flowing out of the tank to satisfy usage demand.

The objective of the reinforcement learning agent is to schedule the number of pumps running to both minimize the energy usage of the system and satisfy the usage demand (Q_demand). The dynamics of the tank system, where h is the water height in the tank, are governed by the following equation.

A dh/dt = Q_supply(t) - Q_demand(t)

Here, A = 40 m^2 and h_max = 7 m. The demand over a 24-hour period is a function of time, given as

Q_demand(t) = mu(t) + eta(t)

where mu(t) is the expected demand and eta(t) represents the demand uncertainty, which is sampled from a uniform random distribution.

The supply Q_supply is determined by the number of pumps running, according to the following mapping.

To simplify the problem, power consumption is defined as the number of pumps running.

The following function is the reward for this environment. To avoid overflowing or emptying the tank, an additional cost is added if the water height is close to the maximum or minimum water levels, h_max or h_min, respectively.
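The tank dynamics and reward shaping described above can be sketched numerically. The following Python sketch (not part of the MATLAB example) integrates the tank equation with forward Euler and applies a power cost plus a level penalty. The per-pump supply rate (120 m^3/h), the penalty magnitude, and the 10%/90% level thresholds are illustrative assumptions, not values from the Simulink model.

```python
# Numerical sketch of the tank dynamics A*dh/dt = Q_supply - Q_demand and a
# reward of the form described above. Supply rate and penalty thresholds are
# hypothetical placeholders.

A_TANK = 40.0   # tank area (m^2), from the model parameters
H_MAX = 7.0     # maximum tank height (m)

def step_height(h, q_supply, q_demand, dt):
    """One forward-Euler step of A*dh/dt = Q_supply - Q_demand."""
    h_next = h + dt * (q_supply - q_demand) / A_TANK
    return min(max(h_next, 0.0), H_MAX)  # tank cannot physically over/underflow

def reward(h, num_pumps, penalty=10.0):
    """Negative power cost (pumps running) plus a penalty near the level
    limits. The penalty form and thresholds are assumptions for illustration."""
    r = -float(num_pumps)
    if h <= 0.1 * H_MAX or h >= 0.9 * H_MAX:
        r -= penalty
    return r

# Example: two pumps at an assumed 120 m^3/h each against a 150 m^3/h demand,
# starting from the initial height of 3 m, for five 0.2 h steps.
h = 3.0
for _ in range(5):
    h = step_height(h, q_supply=2 * 120.0, q_demand=150.0, dt=0.2)
```

With a net inflow of 90 m^3/h over one hour, the height rises by 90/40 = 2.25 m, from 3 m to 5.25 m.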
Generate Demand Profile
To generate a water demand profile based on the number of days considered, use the generateWaterDemand function defined at the end of this example.

num_days = 4; % Number of days
[WaterDemand,T_max] = generateWaterDemand(num_days);
View the demand profile.
plot(WaterDemand)
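The demand profile follows directly from the generateWaterDemand function defined at the end of this example: a 24-hour mean profile repeated over the number of days, plus uniform noise in [-25, 25] m^3/hr. A Python sketch of the same logic (the function and variable names here are illustrative, not from the example):

```python
import random

# Python sketch of the generateWaterDemand logic: repeat the hourly mean
# profile over num_days and add uniform noise, as in the MATLAB function.
DEMAND_MEAN = [28, 28, 28, 45, 55, 110, 280, 450, 310, 170, 160, 145,
               130, 150, 165, 155, 170, 265, 360, 240, 120, 83, 45, 28]  # m^3/hr

def generate_water_demand(num_days, noise=25.0, seed=0):
    rng = random.Random(seed)
    demand = []
    for _ in range(num_days):
        for mu in DEMAND_MEAN:
            demand.append(mu + rng.uniform(-noise, noise))  # a = -25, b = 25
    t_max = num_days * 24 - 1  # hours, matching t = 0:(num_days*24)-1
    return demand, t_max

demand, t_max = generate_water_demand(num_days=4)
```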
Open and Configure Model
Open the distribution system Simulink model.
mdl = "watertankscheduling";
open_system(mdl)
In addition to the reinforcement learning agent, a simple baseline controller is defined in the Control law MATLAB Function block. This controller activates a certain number of pumps depending on the water level.
Specify the initial water height.
h0 = 3; % m
Specify the model parameters.
SampleTime = 0.2;
H_max = 7;   % Max tank height (m)
A_tank = 40; % Area of tank (m^2)
Create Environment Interface for RL Agent
To create an environment interface for the Simulink model, first define the action and observation specifications, actInfo and obsInfo, respectively. The agent action is the selected number of pumps. The agent observation is the water height, which is measured as a continuous-time signal.
actInfo = rlFiniteSetSpec([0,1,2,3]);
obsInfo = rlNumericSpec([1,1]);
Create the environment interface.
env = rlSimulinkEnv(mdl,mdl+"/RL Agent",obsInfo,actInfo);
Specify a custom reset function, which is defined at the end of this example, that randomizes the initial water height and the water demand. Doing so allows the agent to be trained on different initial water levels and water demand functions for each episode.
env.ResetFcn = @(in)localResetFcn(in);
Create DQN Agent
A DQN agent approximates the long-term reward, given observations and actions, using a critic Q-value function representation. To create the critic, first create a deep neural network. For more information on creating a deep neural network value function representation, see Create Policies and Value Functions.
% Fix the random generator seed for reproducibility.
rng(0);
Create a deep neural network for the critic. For this example, use a non-recurrent neural network. To use a recurrent neural network instead, set useLSTM to true.
useLSTM = false;
if useLSTM
    layers = [
        sequenceInputLayer(obsInfo.Dimension(1),"Name","state","Normalization","none")
        fullyConnectedLayer(32,"Name","fc_1")
        reluLayer("Name","relu_body1")
        lstmLayer(32,"Name","lstm")
        fullyConnectedLayer(32,"Name","fc_3")
        reluLayer("Name","relu_body3")
        fullyConnectedLayer(numel(actInfo.Elements),"Name","output")];
else
    layers = [
        featureInputLayer(obsInfo.Dimension(1),"Name","state","Normalization","none")
        fullyConnectedLayer(32,"Name","fc_1")
        reluLayer("Name","relu_body1")
        fullyConnectedLayer(32,"Name","fc_2")
        reluLayer("Name","relu_body2")
        fullyConnectedLayer(32,"Name","fc_3")
        reluLayer("Name","relu_body3")
        fullyConnectedLayer(numel(actInfo.Elements),"Name","output")];
end
dnn = dlnetwork(layers);
Specify optimizer options for the critic.
criticOpts = rlOptimizerOptions('LearnRate',0.001,'GradientThreshold',1);
Create a critic using rlVectorQValueFunction with the defined deep neural network and the observation and action specifications.
critic = rlVectorQValueFunction(dnn,obsInfo,actInfo);
To create the agent, first specify the agent options. If you are using an LSTM network, set the sequence length to 20.
opt = rlDQNAgentOptions('SampleTime',SampleTime);
if useLSTM
    opt.SequenceLength = 20;
else
    opt.SequenceLength = 1;
end
opt.DiscountFactor = 0.995;
opt.ExperienceBufferLength = 1e6;
opt.EpsilonGreedyExploration.EpsilonDecay = 1e-5;
opt.EpsilonGreedyExploration.EpsilonMin = 0.02;
opt.CriticOptimizerOptions = criticOpts;
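The EpsilonGreedyExploration options control how quickly the agent moves from exploration to exploitation: epsilon is reduced multiplicatively by the decay factor at each learning step until it reaches EpsilonMin. A Python sketch of this schedule, assuming an initial epsilon of 1 (the toolbox default) and the decay applied once per step:

```python
# Sketch of a multiplicative epsilon-greedy decay schedule:
# eps <- max(eps_min, eps * (1 - decay)), matching EpsilonDecay = 1e-5
# and EpsilonMin = 0.02 above. Initial epsilon of 1 is an assumption.
def epsilon_schedule(num_steps, eps0=1.0, decay=1e-5, eps_min=0.02):
    eps = eps0
    trace = []
    for _ in range(num_steps):
        trace.append(eps)
        eps = max(eps_min, eps * (1.0 - decay))
    return trace

trace = epsilon_schedule(500_000)
```

With a decay of 1e-5, epsilon reaches the 0.02 floor after roughly 390,000 steps, so exploration persists over many episodes before settling.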
Create the agent using the defined options and critic representation.
agent = rlDQNAgent(critic,opt);
Train Agent
To train the agent, first specify the training options. For this example, use the following options.
Run training for 1000 episodes, with each episode lasting at most ceil(T_max/SampleTime) time steps.
Display the training progress in the Episode Manager dialog box (set the Plots option).

Specify the options for training using an rlTrainingOptions object.
trainOpts = rlTrainingOptions(...
    'MaxEpisodes',1000, ...
    'MaxStepsPerEpisode',ceil(T_max/SampleTime), ...
    'Verbose',false, ...
    'Plots','training-progress', ...
    'StopTrainingCriteria','EpisodeCount', ...
    'StopTrainingValue',1000, ...
    'ScoreAveragingWindowLength',100);
Although you do not do so for this example, you can save agents during the training process. For example, the following options save every agent with a reward value greater than or equal to -42.

% Save agents using SaveAgentCriteria if necessary.
trainOpts.SaveAgentCriteria = 'EpisodeReward';
trainOpts.SaveAgentValue = -42;
Train the agent using the train function. Training this agent is a computationally intensive process that takes several hours to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.
doTraining = false;
if doTraining
    % Train the agent.
    % Connect the RL Agent block by toggling the manual switch block.
    set_param(mdl+"/Manual Switch",'sw','0');
    trainingStats = train(agent,env,trainOpts);
else
    % Load the pretrained agent for the example.
    load('SimulinkWaterDistributionDQN.mat','agent')
end
The following figure shows the training progress.
Simulate DQN Agent
To validate the performance of the trained agent, simulate it in the water-tank environment. For more information on agent simulation, see rlSimulationOptions and sim.
To simulate the agent performance, connect the RL Agent block by toggling the manual switch block.
set_param(mdl+"/Manual Switch",'sw','0');
Set the maximum number of steps for each simulation and the number of simulations. For this example, run 30 simulations. The environment reset function sets a different initial water height and water demand for each simulation.
NumSimulations = 30;
simOptions = rlSimulationOptions('MaxSteps',T_max/SampleTime, ...
    'NumSimulations',NumSimulations);
To compare the agent with the baseline controller under the same conditions, reset the initial random seed used in the environment reset function.
env.ResetFcn("Reset seed");
Simulate the agent against the environment.
experienceDQN = sim(env,agent,simOptions);
Simulate Baseline Controller
To compare the DQN agent with the baseline controller, you must simulate the baseline controller using the same simulation options and initial random seed for the reset function.
Enable the baseline controller.
set_param(mdl+"/Manual Switch",'sw','1');
To compare the agent with the baseline controller under the same conditions, reset the random seed used in the environment reset function.
env.ResetFcn("Reset seed");
Simulate the baseline controller against the environment.
experienceBaseline = sim(env,agent,simOptions);
Compare DQN Agent with Baseline Controller
Initialize cumulative reward result vectors for both the agent and baseline controller.
resultVectorDQN = zeros(NumSimulations,1);
resultVectorBaseline = zeros(NumSimulations,1);
Calculate cumulative rewards for both the agent and the baseline controller.
for ct = 1:NumSimulations
    resultVectorDQN(ct) = sum(experienceDQN(ct).Reward);
    resultVectorBaseline(ct) = sum(experienceBaseline(ct).Reward);
end
Plot the cumulative rewards.
plot([resultVectorDQN resultVectorBaseline],'o')
set(gca,'xtick',1:NumSimulations)
xlabel('Simulation number')
ylabel('Cumulative Reward')
legend('DQN','Baseline','Location','NorthEastOutside')
The cumulative reward obtained by the agent is consistently around –40. This value is much greater than the average reward obtained by the baseline controller. Therefore, the DQN agent consistently outperforms the baseline controller in terms of energy savings.
Local Functions
Water Demand Function
function [WaterDemand,T_max] = generateWaterDemand(num_days)
    t = 0:(num_days*24)-1; % hr
    T_max = t(end);
    Demand_mean = [28, 28, 28, 45, 55, 110, 280, 450, 310, 170, 160, 145, 130, ...
        150, 165, 155, 170, 265, 360, 240, 120, 83, 45, 28]'; % m^3/hr
    Demand = repmat(Demand_mean,1,num_days);
    Demand = Demand(:);

    % Add noise to demand.
    a = -25; % m^3/hr
    b = 25;  % m^3/hr
    Demand_noise = a + (b-a).*rand(numel(Demand),1);

    WaterDemand = timeseries(Demand + Demand_noise,t);
    WaterDemand.Name = "Water Demand";
    WaterDemand.TimeInfo.Units = 'hours';
end
Reset Function
function in = localResetFcn(in)
    % Use a persistent random seed value to evaluate the agent and the baseline
    % controller under the same conditions.
    persistent randomSeed
    if isempty(randomSeed)
        randomSeed = 0;
    end
    if strcmp(in,"Reset seed")
        randomSeed = 0;
        return
    end
    randomSeed = randomSeed + 1;
    rng(randomSeed)

    % Randomize water demand.
    num_days = 4;
    H_max = 7;
    [WaterDemand,~] = generateWaterDemand(num_days);
    assignin('base','WaterDemand',WaterDemand)

    % Randomize initial height.
    h0 = 3*randn;
    while h0 <= 0 || h0 >= H_max
        h0 = 3*randn;
    end
    blk = 'watertankscheduling/Water Tank System/Initial Water Height';
    in = setBlockParameter(in,blk,'Value',num2str(h0));
end
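The initial-height randomization in localResetFcn is a rejection sampler: it draws h0 = 3*randn and resamples until the value lies strictly between empty (0) and the maximum height. A Python sketch of the same idea (function name is illustrative):

```python
import random

# Sketch of the rejection sampling used for the initial water height:
# draw 3 * N(0,1) and resample until 0 < h0 < H_max.
H_MAX = 7.0

def sample_initial_height(rng):
    h0 = 3.0 * rng.gauss(0.0, 1.0)
    while h0 <= 0.0 or h0 >= H_MAX:
        h0 = 3.0 * rng.gauss(0.0, 1.0)
    return h0

rng = random.Random(0)
samples = [sample_initial_height(rng) for _ in range(1000)]
```

Because the accepted region covers roughly half the probability mass of the 3*N(0,1) draw, the loop terminates quickly in practice.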