gfn.gym.box
===========

.. py:module:: gfn.gym.box


Classes
-------

.. autoapisummary::

   gfn.gym.box.BoxPolar


Module Contents
---------------

.. py:class:: BoxPolar(delta = 0.1, R0 = 0.1, R1 = 0.5, R2 = 2.0, epsilon = 0.0001, device = 'cpu', debug = False)

   Bases: :py:obj:`gfn.env.Env`

   Box environment with polar (norm-based) action validation.

   Corresponds to the environment in Section 4.1 of https://arxiv.org/abs/2301.12594

   Actions are 2D vectors whose L2 norm must equal ``delta`` (for forward
   steps from any state other than s0) or be at most ``delta`` (for the
   initial step from s0). Use with the polar estimators/distributions in
   ``box_polar_utils.py``.

   .. seealso::

      :class:`~gfn.gym.box_cartesian.BoxCartesian` for a simpler per-dimension Cartesian variant.

   .. attribute:: delta

      The step size.

   .. attribute:: R0

      The base reward.

   .. attribute:: R1

      The reward for being outside the first box.

   .. attribute:: R2

      The reward for being inside the second box.

   .. attribute:: epsilon

      A small value to avoid numerical issues.

   .. attribute:: device

      The device to use.

      :type: Literal["cpu", "cuda"] | torch.device

   .. py:attribute:: R0
      :value: 0.1

   .. py:attribute:: R1
      :value: 0.5

   .. py:attribute:: R2
      :value: 2.0

   .. py:method:: backward_step(states, actions)

      Backward step function for the Box environment.

      :param states: States object representing the current states.
      :param actions: Actions object representing the actions to be taken.

      :returns: The previous states as a States object.

   .. py:attribute:: delta
      :value: 0.1

   .. py:attribute:: epsilon
      :value: 0.0001

   .. py:method:: is_action_valid(states, actions, backward = False)

      Checks if the actions are valid (polar norm-based semantics).

      For polar actions:

      - Forward from s0: ``norm(action) <= delta``
      - Forward from non-s0: ``norm(action) == delta`` (within tolerance)
      - Backward: ``state - action >= 0`` component-wise
      - Backward to s0: if ``norm(state) < delta``, the action must equal the state

      :param states: The current states.
      :param actions: The actions to be taken.
      :param backward: Whether the actions are backward actions.

      :returns: True if the actions are valid, False otherwise.

   .. py:method:: log_partition(condition=None)

      Returns the log partition of the reward function.

   .. py:method:: make_random_states(batch_shape, conditions = None, device = None, debug = False)

      Generates a random states tensor of shape ``(*batch_shape, 2)``.

      :param batch_shape: The shape of the batch.
      :param conditions: Optional tensor of shape ``(*batch_shape, condition_dim)``
                         containing condition vectors for conditional GFlowNets.
      :param device: The device to use.
      :param debug: If True, emit States with debug guards (not compile-friendly).

      :returns: A States object with random states.

   .. py:method:: norm(x)
      :staticmethod:

      Computes the L2 norm of the input tensor along the last dimension.

      :param x: Input tensor of shape ``(*batch_shape, 2)``.

      :returns: Tensor of L2 norms of shape ``batch_shape``.

   .. py:method:: reward(final_states)

      Computes the reward of the final states: the base reward ``R0``, plus
      ``R1`` for states outside the first box and ``R2`` for states inside
      the second box.

      :param final_states: States object representing the final states.

      :returns: The reward tensor of shape ``batch_shape``.

   .. py:method:: step(states, actions)

      Step function for the Box environment.

      :param states: States object representing the current states.
      :param actions: Actions object representing the actions to be taken.

      :returns: The next states as a States object.
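
The four validity rules listed under ``is_action_valid`` can be illustrated for a single state/action pair. The following is a minimal sketch in plain Python, not the library's implementation: the real method operates on batched ``States`` and ``Actions`` objects. The helper names ``l2_norm`` and ``polar_action_is_valid`` are hypothetical, s0 is assumed to be the origin, and the way ``epsilon`` is applied as a tolerance is an assumption.

```python
import math


def l2_norm(v):
    """L2 norm of a 2D vector given as an (x, y) pair."""
    return math.hypot(v[0], v[1])


def polar_action_is_valid(state, action, delta=0.1, epsilon=1e-4, backward=False):
    """Check one (state, action) pair against the polar rules.

    Hypothetical helper: s0 is assumed to be the origin and epsilon is
    assumed to act as an absolute tolerance on equality checks.
    """
    if not backward:
        if all(c == 0.0 for c in state):
            # Forward from s0: any step of length at most delta.
            return l2_norm(action) <= delta
        # Forward from non-s0: step length must equal delta within tolerance.
        return abs(l2_norm(action) - delta) <= epsilon
    if l2_norm(state) < delta:
        # Backward to s0: the action must equal the state (within tolerance).
        return all(abs(s - a) <= epsilon for s, a in zip(state, action))
    # Backward otherwise: the previous state must stay non-negative.
    return all(s - a >= 0 for s, a in zip(state, action))


if __name__ == "__main__":
    print(polar_action_is_valid((0.0, 0.0), (0.03, 0.04)))                 # short first step
    print(polar_action_is_valid((0.5, 0.5), (0.06, 0.08)))                 # step of length delta
    print(polar_action_is_valid((0.03, 0.04), (0.03, 0.04), backward=True))  # return to s0
```

Treating the first step separately lets a trajectory reach any point within distance ``delta`` of s0, while every later forward step has a fixed length, so only its direction is free.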