Reward Wrappers¶
While the pcse_gym Gymnasium module natively supports a few reward wrappers, it may be desirable
to create a different reward to train an RL agent on, reflecting different priorities when growing a crop (reduced water use, reduced fertilizer runoff, etc.).
Base Class¶
The abstract pcse_gym.wrappers.RewardWrapper
base class inherits from the gymnasium.Wrapper
class.
It provides two functions: _validate(),
which enforces the correct environment wrapping order, and
reset(),
which correctly passes keyword arguments to the base environment.
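Conceptually, the base class behaves roughly like the sketch below. This is only an illustration of the behavior described above, not the actual pcse_gym source, which may differ in detail.
import gymnasium as gym

class RewardWrapper(gym.Wrapper):
    """Illustrative sketch only, not the pcse_gym implementation."""

    def _validate(self):
        # Enforce a correct wrapping order, e.g. raise an error if an
        # incompatible wrapper already encloses the environment.
        ...

    def reset(self, **kwargs):
        # Forward keyword arguments to the underlying environment's reset()
        # instead of dropping them.
        return self.env.reset(**kwargs)

    def _get_reward(self, output, act_tuple):
        # Subclasses override this to define the new reward (see below).
        raise NotImplementedError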
Caution
Always call the utils.make_gym_env()
function to create an environment before
wrapping it with a RewardWrapper
. Otherwise, the environment will be wrapped with a
gymnasium.wrappers.OrderEnforcing
wrapper, which does not pass keyword arguments through reset()
.
See Environment Creation.
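For example, the intended wrapping order looks like the sketch below. The exact make_gym_env() signature and the Args fields depend on your setup, and MyRewardWrapper stands in for the subclass defined in the next section.
from pcse_gym import utils
from my_wrappers import MyRewardWrapper  # hypothetical module holding your subclass

args = utils.Args()               # hypothetical: fill in your configuration fields
env = utils.make_gym_env(args)    # create the base environment first (signature assumed)
env = MyRewardWrapper(env, args)  # then apply the reward wrapper

obs, info = env.reset()           # keyword arguments now reach the base environment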
Creating a Reward Wrapper¶
To create a reward wrapper, inherit from the pcse_gym.wrappers.RewardWrapper
class. The __init__()
function should have the following header:
class RewardFertilizationThresholdWrapper(RewardWrapper):
    """
    Modifies the reward to impose high penalties if a
    threshold is crossed during fertilization or irrigation
    """
    def __init__(self, env: gym.Env, args):
        """Initialize the :class:`RewardFertilizationThresholdWrapper` wrapper with an environment."""
        super().__init__(env)
        self.env = env
Any required arguments should be passed via the Args
dataclass (specified in utils.py
).
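For instance, a reward parameter could be stored from the Args dataclass during initialization. The max_n field below is hypothetical; use whatever fields your Args dataclass actually defines.
def __init__(self, env: gym.Env, args):
    """Initialize the wrapper and store reward parameters from the Args dataclass."""
    super().__init__(env)
    self.env = env
    # Hypothetical field on the Args dataclass: a nitrogen threshold for penalties
    self.max_n = args.max_n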
Changing the Reward¶
The reward function generally depends on the output of the simulation and the action taken. Upon taking an action, an action_tuple
is returned
from the _take_action()
function, which specifies how much water, Nitrogen, Phosphorus, or Potassium was applied to the crop.
From this action tuple and the output of the crop simulation, the new reward can be computed.
See the example below, which penalizes the application of any fertilizer or water while rewarding the total growth of the crop, WSO
.
def _get_reward(self, output: dict, act_tuple: tuple):
    """Gets the reward as a penalty based on the amount of NPK/Water applied

    Args:
        output: dict - output from model
        act_tuple: tuple - NPK/Water amounts"""
    reward = output.iloc[-1]['WSO'] - \
        (np.sum(self.cost * np.array([act_tuple[2:]])))
    return reward
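As another example, the threshold-style reward described by RewardFertilizationThresholdWrapper above could be sketched roughly as follows. The max_n and penalty attributes are hypothetical values set in __init__(), and the nitrogen index in act_tuple is assumed; check _take_action() for the actual tuple layout.
def _get_reward(self, output, act_tuple: tuple):
    """Reward crop growth, but heavily penalize crossing a fertilization threshold.

    Args:
        output: output from model
        act_tuple: tuple - NPK/Water amounts"""
    nitrogen_applied = act_tuple[2]       # assumed index of nitrogen in the tuple
    if nitrogen_applied > self.max_n:     # hypothetical threshold from Args
        return -self.penalty              # hypothetical large penalty
    return output.iloc[-1]['WSO']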
Important
When using a pcse_gym.wrappers.RewardWrapper
, the output of the crop simulation must contain the output variables needed to compute the
new reward. For information on configuring simulation output, see Environment Configuration.
The Step Function¶
Often, the step function needs to be overridden with env.unwrapped
calls to remain compatible with a variety of wrappers,
especially if the environment was previously wrapped with NPKDictObservationWrapper
or NPKDictActionWrapper
.
To ensure compatibility, copy the step()
function from pcse_gym.wofost_base.NPK_Env
into the new reward wrapper class (e.g. pcse_gym.wrappers.NewRewardWrapper
). Below is the template, which does not need modification:
def step(self, action):
    """Run one timestep of the environment's dynamics.

    Sends the action to the WOFOST model and receives the resulting output,
    which is then passed to the _process_output() and _get_reward() functions
    to produce an observation and a reward.

    Args:
        action: integer
    """
    # Send action signal to model and run model
    act_tuple = self.env.unwrapped._take_action(action)
    output = self.env.unwrapped._run_simulation()

    observation = self.env.unwrapped._process_output(output)
    reward = self._get_reward(output, act_tuple)

    # Terminate based on crop finishing
    termination = output.iloc[-1]['FIN'] == 1.0
    # Truncate based on site end date
    truncation = self.env.unwrapped.date >= self.env.unwrapped.site_end_date

    self.env.unwrapped._log(output.iloc[-1]['WSO'], act_tuple, reward)
    return observation, reward, termination, truncation, self.env.unwrapped.log
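With __init__(), _get_reward(), and step() in place, the wrapped environment can be driven like any other Gymnasium environment. The loop below is a sketch that assumes env was created and wrapped as in the earlier example.
# Assuming `env` is a wrapped environment created as in the earlier sketch.
obs, info = env.reset()
terminated = truncated = False
while not (terminated or truncated):
    action = env.action_space.sample()  # stand-in for an RL agent's policy
    obs, reward, terminated, truncated, info = env.step(action)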