gfn.gym.box
===========

.. py:module:: gfn.gym.box


Classes
-------

.. autoapisummary::

   gfn.gym.box.BoxPolar


Module Contents
---------------

.. py:class:: BoxPolar(delta = 0.1, R0 = 0.1, R1 = 0.5, R2 = 2.0, epsilon = 0.0001, device = 'cpu', debug = False)

   Bases: :py:obj:`gfn.env.Env`


   Box environment with polar (norm-based) action validation.

   Corresponds to the environment in Section 4.1 of
   https://arxiv.org/abs/2301.12594

   Actions are 2D vectors whose L2 norm must equal delta (for non-s0 forward
   steps) or be at most delta (for the initial s0 step). Use with the polar
   estimators/distributions in ``box_polar_utils.py``.

   .. seealso::

      :class:`~gfn.gym.box_cartesian.BoxCartesian` for a simpler
      per-dimension Cartesian variant.

   .. attribute:: delta

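      The step size: the required L2 norm of forward actions from non-s0
      states, and the maximum norm of the action taken from s0.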

   .. attribute:: R0

      The base reward.

   .. attribute:: R1

      The reward for being outside the first box.

   .. attribute:: R2

      The reward for being inside the second box.

   .. attribute:: epsilon

      A small value to avoid numerical issues.

   .. attribute:: device

      The device to use.

      :type: Literal["cpu", "cuda"] | torch.device


   .. py:attribute:: R0
      :value: 0.1


   .. py:attribute:: R1
      :value: 0.5


   .. py:attribute:: R2
      :value: 2.0


   .. py:method:: backward_step(states, actions)

      Backward step function for the Box environment.

      :param states: States object representing the current states.
      :param actions: Actions object representing the actions to be taken.

      :returns: The previous states as a States object.


   .. py:attribute:: delta
      :value: 0.1


   .. py:attribute:: epsilon
      :value: 0.0001


   .. py:method:: is_action_valid(states, actions, backward = False)

      Checks if the actions are valid (polar norm-based semantics).

      For polar actions:

      - Forward from s0: ``norm(action) <= delta``
      - Forward from non-s0: ``norm(action) == delta`` (within tolerance)
      - Backward: ``state - action >= 0`` component-wise
      - Backward to s0: if ``norm(state) < delta``, the action must equal the state

      :param states: The current states.
      :param actions: The actions to be taken.
      :param backward: Whether the actions are backward actions.

      :returns: True if the actions are valid, False otherwise.
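
      The rules listed above can be sketched for a single state/action pair
      in plain Python. This is an illustrative sketch, not the library
      implementation: the function name ``is_action_valid_sketch`` is
      hypothetical, s0 is assumed to be the origin ``(0.0, 0.0)``, and
      ``epsilon`` is assumed to serve as the equality tolerance.

      .. code-block:: python

         import math

         def is_action_valid_sketch(state, action, delta=0.1, epsilon=1e-4,
                                    backward=False):
             """Hypothetical single-sample version of the polar validity rules."""
             s0 = (0.0, 0.0)  # assumption: the source state is the origin
             action_norm = math.hypot(action[0], action[1])
             if not backward:
                 if tuple(state) == s0:
                     # Forward from s0: norm(action) <= delta
                     return action_norm <= delta
                 # Forward from non-s0: norm(action) == delta (within tolerance)
                 return abs(action_norm - delta) <= epsilon
             # Backward: state - action >= 0 component-wise
             if any(s - a < 0 for s, a in zip(state, action)):
                 return False
             # Backward to s0: if norm(state) < delta, action must equal state
             if math.hypot(state[0], state[1]) < delta:
                 return all(abs(s - a) <= epsilon for s, a in zip(state, action))
             return True

         print(is_action_valid_sketch((0.0, 0.0), (0.03, 0.04)))  # short first step
         print(is_action_valid_sketch((0.5, 0.5), (0.03, 0.04)))  # norm too small

      The real method operates on batched ``States`` and ``Actions`` objects,
      so the library applies these checks to tensors rather than tuples.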


   .. py:method:: log_partition(condition=None)

      Returns the logarithm of the partition function, i.e. the log of the
      total reward over the state space.


   .. py:method:: make_random_states(batch_shape, conditions = None, device = None, debug = False)

      Generates a random states tensor of shape ``(*batch_shape, 2)``.

      :param batch_shape: The shape of the batch.
      :param conditions: Optional tensor of shape ``(*batch_shape, condition_dim)`` containing
                         condition vectors for conditional GFlowNets.
      :param device: The device to use.
      :param debug: If True, emit States with debug guards (not compile-friendly).

      :returns: A States object with random states.


   .. py:method:: norm(x)
      :staticmethod:


      Computes the L2 norm of the input tensor along the last dimension.

      :param x: Input tensor of shape `(*batch_shape, 2)`.

      :returns: Tensor of L2 norms of shape `batch_shape`.
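
      As a plain-Python illustration of the reduction over the last
      dimension, the sketch below maps a list of 2D vectors to one norm per
      vector. The helper name ``l2_norm`` is hypothetical; on tensors,
      ``torch.linalg.norm(x, dim=-1)`` computes the same quantity.

      .. code-block:: python

         import math

         def l2_norm(batch):
             """Return one L2 norm per (x, y) pair in the batch."""
             return [math.sqrt(x * x + y * y) for x, y in batch]

         norms = l2_norm([(3.0, 4.0), (0.0, 0.0)])
         print(norms)  # [5.0, 0.0]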


   .. py:method:: reward(final_states)

      Computes the reward of the final states from ``R0``, ``R1``, and ``R2``.

      :param final_states: States object representing the final states.

      :returns: The reward tensor of shape `batch_shape`.
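
      Going by the attribute descriptions of ``R0``, ``R1``, and ``R2``, the
      reward can be sketched as a base value plus two box-dependent bonuses.
      The thresholds ``0.25``, ``0.3``, and ``0.4`` (measured as per-dimension
      distance from the centre ``0.5``) are assumptions recalled from the
      referenced paper, not values stated in this page, and ``box_reward`` is
      a hypothetical name.

      .. code-block:: python

         def box_reward(state, R0=0.1, R1=0.5, R2=2.0):
             """Hypothetical single-state reward; thresholds are assumptions."""
             # Per-dimension distance from the centre of the unit square.
             ax = [abs(c - 0.5) for c in state]
             # Assumed "outside the first box": every coordinate beyond 0.25.
             outside_first = all(a > 0.25 for a in ax)
             # Assumed "inside the second box": every coordinate in (0.3, 0.4).
             inside_second = all(0.3 < a < 0.4 for a in ax)
             return R0 + R1 * outside_first + R2 * inside_second

         print(box_reward((0.5, 0.5)))    # centre: base reward only
         print(box_reward((0.15, 0.15)))  # corner band: all three terms

      Under these assumptions only states near the four corners collect the
      ``R2`` term, which makes the reward landscape multimodal.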


   .. py:method:: step(states, actions)

      Step function for the Box environment.

      :param states: States object representing the current states.
      :param actions: Actions object representing the actions to be taken.

      :returns: The next states as a States object.
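
      Assuming transitions are additive (as suggested by the backward
      validity condition ``state - action >= 0``), ``step`` and
      ``backward_step`` act as inverses of each other. The sketch below uses
      plain tuples and hypothetical helper names instead of the library's
      ``States`` and ``Actions`` objects.

      .. code-block:: python

         def step_sketch(state, action):
             """Assumed forward transition: next state = state + action."""
             return tuple(s + a for s, a in zip(state, action))

         def backward_step_sketch(state, action):
             """Assumed backward transition: previous state = state - action."""
             return tuple(s - a for s, a in zip(state, action))

         nxt = step_sketch((0.25, 0.5), (0.25, 0.25))
         prev = backward_step_sketch(nxt, (0.25, 0.25))
         print(nxt, prev)  # (0.5, 0.75) (0.25, 0.5)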