gfn.gym.helpers.box_cartesian_utils
===================================

.. py:module:: gfn.gym.helpers.box_cartesian_utils

.. autoapi-nested-parse::

   Cartesian increment estimators and distributions for the Box environment.


Classes
-------

.. autoapisummary::

   gfn.gym.helpers.box_cartesian_utils.BoxCartesianDistribution
   gfn.gym.helpers.box_cartesian_utils.BoxCartesianPBDistribution
   gfn.gym.helpers.box_cartesian_utils.BoxCartesianPBEstimator
   gfn.gym.helpers.box_cartesian_utils.BoxCartesianPBMLP
   gfn.gym.helpers.box_cartesian_utils.BoxCartesianPFEstimator
   gfn.gym.helpers.box_cartesian_utils.BoxCartesianPFMLP
   gfn.gym.helpers.box_cartesian_utils.UniformBoxCartesianPBModule
   gfn.gym.helpers.box_cartesian_utils._BetaMixtureMixin
   gfn.gym.helpers.box_cartesian_utils._BoxCartesianMLP


Module Contents
---------------

.. py:class:: BoxCartesianDistribution(states, exit_logits, mixture_logits, alpha, beta, delta, epsilon = 1e-06, temperature = 1.0)

   Bases: :py:obj:`_BetaMixtureMixin`, :py:obj:`torch.distributions.Distribution`


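   Cartesian increment distribution for the Box environment.

   Each dimension's relative increment ``r`` in [0, 1] follows a Categorical-weighted
   mixture of Beta distributions, i.e. the model ``MixtureSameFamily(Categorical, Beta)``
   (evaluated by the ``_BetaMixtureMixin`` helpers). This is much simpler than the
   polar-coordinate parameterization: relative increments are sampled per dimension
   and converted to absolute increments with ``action = min_incr + r * max_range``.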
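
   A minimal sketch of the relative-to-absolute mapping described above, written
   with plain Python lists. The helper names ``relative_to_action`` and
   ``action_to_relative`` are hypothetical illustrations, not part of this module;
   the inverse map is what a ``log_prob`` implementation would need in order to
   recover ``r`` from a given action.

   .. code-block:: python

      def relative_to_action(r, min_incr, max_range):
          """Map relative increments r in [0, 1] to absolute increments, per dimension."""
          return [lo + ri * span for ri, lo, span in zip(r, min_incr, max_range)]


      def action_to_relative(action, min_incr, max_range):
          """Invert the map above to recover r from an absolute increment."""
          return [(a - lo) / span for a, lo, span in zip(action, min_incr, max_range)]


      # Hypothetical 2-D bounds: each increment lies in [0.125, 0.125 + 0.5].
      action = relative_to_action([0.0, 0.5], min_incr=[0.125, 0.125], max_range=[0.5, 0.5])
      # action == [0.125, 0.375]
      r = action_to_relative(action, [0.125, 0.125], [0.5, 0.5])
      # r == [0.0, 0.5]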

   .. attribute:: delta

      Minimum step size.

   .. attribute:: epsilon

      Small value for numerical stability.


   .. py:attribute:: alpha


   .. py:attribute:: arg_constraints
      :type:  dict


   .. py:attribute:: at_boundary


   .. py:attribute:: beta


   .. py:attribute:: delta


   .. py:attribute:: epsilon
      :value: 1e-06


   .. py:attribute:: exit_logits_scaled


   .. py:attribute:: is_s0


   .. py:method:: log_prob(actions)

      Compute the log probability of ``actions`` using the per-dimension Cartesian parameterization.
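
      A minimal sketch of the change of variables a per-dimension ``log_prob``
      needs, assuming (hypothetically) that each dimension is scored as
      ``log p_r(r_d) - log(max_range_d)`` with
      ``r_d = (action_d - min_incr_d) / max_range_d``. The helper
      ``action_log_prob`` is illustrative only, and the exit-action term
      (``exit_logits_scaled``) is deliberately left out.

      .. code-block:: python

         import math


         def action_log_prob(action, min_incr, max_range, log_p_r):
             """Log density of an absolute increment, given a log density over r in [0, 1]."""
             total = 0.0
             for a, lo, span in zip(action, min_incr, max_range):
                 r = (a - lo) / span
                 # Density of action_d is p_r(r_d) / max_range_d (linear change of variables).
                 total += log_p_r(r) - math.log(span)
             return total


         # With Beta(1, 1), i.e. uniform r, log p_r = 0, leaving only the Jacobian term.
         lp = action_log_prob([0.2, 0.3], [0.125, 0.125], [0.5, 0.25], lambda r: 0.0)
         # lp is approximately log(8), since -log(0.5) - log(0.25) = log(2) + log(4)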


   .. py:attribute:: log_weights


   .. py:attribute:: max_range


   .. py:attribute:: min_incr


   .. py:attribute:: n_dim


   .. py:method:: sample(sample_shape = Size())

      Sample actions using Cartesian per-dimension increments.


   .. py:attribute:: states


.. py:class:: BoxCartesianPBDistribution(states, bts_logits, mixture_logits, alpha, beta, delta, epsilon = 1e-06, temperature = 1.0)

   Bases: :py:obj:`_BetaMixtureMixin`, :py:obj:`torch.distributions.Distribution`


   Backward Cartesian distribution for the Box environment.

   In torchgfn's design, the source state ``s0`` is the origin ``[0, 0]``. The BTS
   (back-to-source) action moves directly to ``s0`` by setting ``action = state``.

   .. rubric:: Why the learned BTS Bernoulli is critical

   The Trajectory Balance (TB) loss is:

   .. math::

      \mathcal{L} = \left(\log P_F(\tau) + \log Z - \log P_B(\tau) - \log R(x_T)\right)^2

   The BTS step (x_1 → s0) is the *last* step of every backward trajectory.
   If BTS were always deterministic (forced), log P_B(BTS | x_1) would be 0
   for every trajectory. A constant term contributes nothing to the gradient,
   so P_B would receive no gradient from the BTS step, regardless of
   trajectory length.

   For 1-step trajectories (s0 → x_T, then immediately BTS back to s0), P_B
   would be *entirely* gradient-free: log P_B(τ) = 0 for every such
   trajectory, making P_B invisible to the TB loss. This is particularly
   harmful because the reward landscape is dominated by states reachable from
   s0 in one step (the high-reward modes, where |x_d - 0.5| ∈ (0.3, 0.4) holds
   in every dimension d).
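
   A minimal sketch of the TB residual for a single trajectory, using scalar
   log-probabilities (the helper ``tb_loss`` and all numbers are hypothetical).
   With a forced BTS step, the only backward term of a 1-step trajectory is the
   constant 0, so nothing in the loss depends on the backward policy.

   .. code-block:: python

      def tb_loss(log_z, log_pf_steps, log_pb_steps, log_reward):
          """Squared TB residual: (log P_F(tau) + log Z - log P_B(tau) - log R(x_T))**2."""
          residual = sum(log_pf_steps) + log_z - sum(log_pb_steps) - log_reward
          return residual ** 2


      # 1-step trajectory: one move s0 -> x_1 plus an exit step in P_F (assumed);
      # the backward trajectory is just the forced BTS step, scored as 0.
      loss = tb_loss(log_z=0.5, log_pf_steps=[-1.0, -0.5], log_pb_steps=[0.0], log_reward=-2.0)
      # loss == 1.0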

   With a learned Bernoulli P(BTS | x):

   - log P_B(BTS | x_1) = log P(BTS=1 | x_1), so gradient flows into P_B.
   - For every non-BTS step from a state x_t, log P_B includes the term
     log P(BTS=0 | x_t), which also receives gradient.

   This closes the TB loop: P_B now has an incentive to assign higher
   probability to BTS from states that are close to the reward modes, and
   lower probability from states that are far from them.
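
   The gradient argument can be checked by hand. Assuming, as a sketch, that
   ``P(BTS=1 | x) = sigmoid(bts_logit)`` (the library's ``bts_logits_scaled`` may
   additionally apply a temperature), the derivative of the squared TB residual
   with respect to the BTS logit is non-zero for both outcomes, whereas a forced
   step contributes a constant 0 whose derivative is 0. The helper name
   ``dloss_dbts_logit`` is hypothetical.

   .. code-block:: python

      import math


      def dloss_dbts_logit(residual, bts_taken, bts_logit):
          """d/d(bts_logit) of residual**2, where residual includes -log P_B(step | x).

          d log sigmoid(l) / dl       = 1 - sigmoid(l)   (BTS taken)
          d log(1 - sigmoid(l)) / dl  = -sigmoid(l)      (BTS not taken)
          """
          p = 1.0 / (1.0 + math.exp(-bts_logit))
          dlogpb = (1.0 - p) if bts_taken else -p
          # The residual contains -log P_B, hence the minus sign.
          return 2.0 * residual * (-dlogpb)


      # At bts_logit = 0 (p = 0.5) with residual 1.0:
      g_bts = dloss_dbts_logit(1.0, bts_taken=True, bts_logit=0.0)      # -1.0
      g_no_bts = dloss_dbts_logit(1.0, bts_taken=False, bts_logit=0.0)  # 1.0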

   .. rubric:: Forced vs. stochastic BTS

   When any dimension of the state satisfies ``state[d] <= delta``, the valid
   backward increment range ``[delta, state[d]]`` for that dimension is empty
   (or degenerate, a single point, when ``state[d] == delta``). BTS is then the
   *only* valid action, so it is forced (``log_prob = 0``, deterministic). For
   all other states, BTS is an optional choice sampled from the learned
   Bernoulli.
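
   A minimal sketch of the forced-BTS test described above, for one state given
   as a list of coordinates. It presumably mirrors what ``dim_near_origin`` and
   ``any_dim_near_origin`` hold for a batch, but both helper functions here are
   hypothetical.

   .. code-block:: python

      def backward_increment_ranges(state, delta):
          """Per-dimension interval [delta, x_d] of backward increments, or None if unusable."""
          # x_d == delta gives a zero-width interval, treated as unusable per the <= delta rule.
          return [(delta, x) if x > delta else None for x in state]


      def bts_is_forced(state, delta):
          """BTS is forced as soon as one dimension has no usable increment range."""
          return any(interval is None for interval in backward_increment_ranges(state, delta))


      bts_is_forced([0.05, 0.7], delta=0.1)  # True: the first coordinate is <= delta
      bts_is_forced([0.3, 0.7], delta=0.1)   # False: BTS is sampled from the Bernoulli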


   .. py:attribute:: alpha


   .. py:attribute:: any_dim_near_origin


   .. py:attribute:: arg_constraints
      :type:  dict


   .. py:attribute:: beta


   .. py:attribute:: bts_logits_scaled


   .. py:attribute:: delta


   .. py:attribute:: dim_near_origin


   .. py:attribute:: epsilon
      :value: 1e-06


   .. py:method:: log_prob(actions)

      Compute the log probability of backward actions.

      - Forced BTS (``any_dim_near_origin``): ``log_prob = 0`` (deterministic).
      - Stochastic BTS: ``log_prob = log P(BTS=1 | s)`` from the Bernoulli.
      - Non-BTS: ``log_prob = log P(BTS=0 | s) + log_p_beta + log_jacobian``.
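
      A minimal sketch of how the three cases combine for a single state and
      action, with scalar inputs. ``pb_log_prob`` is a hypothetical helper;
      ``p_bts`` stands for ``P(BTS=1 | s)``, and ``log_p_beta`` and
      ``log_jacobian`` are taken as precomputed numbers for the non-BTS increment.

      .. code-block:: python

         import math


         def pb_log_prob(state, action, delta, p_bts, log_p_beta, log_jacobian):
             if any(x <= delta for x in state):
                 return 0.0  # forced BTS: the only valid action, taken with probability 1
             if action == state:
                 return math.log(p_bts)  # stochastic BTS (the BTS action equals the state)
             # Non-BTS: decline BTS, then score the Beta-mixture increment.
             return math.log(1.0 - p_bts) + log_p_beta + log_jacobian


         s = [0.5, 0.6]
         lp_bts = pb_log_prob(s, [0.5, 0.6], 0.1, p_bts=0.25, log_p_beta=-0.3, log_jacobian=0.7)
         lp_step = pb_log_prob(s, [0.2, 0.2], 0.1, p_bts=0.25, log_p_beta=-0.3, log_jacobian=0.7)
         # lp_bts is log(0.25); lp_step is log(0.75) + 0.4, up to rounding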


   .. py:attribute:: log_weights


   .. py:attribute:: max_range


   .. py:attribute:: min_incr


   .. py:attribute:: n_dim


   .. py:method:: sample(sample_shape = Size())

      Sample backward actions.

      BTS (``action = state``) is forced when any dimension is within ``delta``
      of the origin (``state[d] <= delta``); otherwise it is sampled from the
      learned Bernoulli. Non-BTS increments are sampled from the Beta mixture.
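
      A minimal sketch of this sampling rule for one state.
      ``sample_backward_action`` and the ``sample_increment`` callback are
      hypothetical; the callback stands in for drawing a Beta-mixture increment,
      and ``p_bts`` for ``P(BTS=1 | s)``.

      .. code-block:: python

         import random


         def sample_backward_action(state, delta, p_bts, sample_increment, rng=random):
             """Return (action, is_bts) for a single state."""
             if any(x <= delta for x in state):
                 return list(state), True  # forced BTS: action = state
             if rng.random() < p_bts:
                 return list(state), True  # stochastic BTS drawn from Bernoulli(p_bts)
             return sample_increment(state), False  # non-BTS Beta-mixture increment


         def step(state):
             """Placeholder increment sampler: always returns 0.1 per dimension."""
             return [0.1 for _ in state]


         sample_backward_action([0.05, 0.7], 0.1, 0.0, step)  # ([0.05, 0.7], True), forced
         sample_backward_action([0.5, 0.7], 0.1, 0.0, step)   # ([0.1, 0.1], False)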


   .. py:attribute:: states


.. py:class:: BoxCartesianPBEstimator(env, module, n_components, min_concentration = 0.1, max_concentration = 100.0, numerical_epsilon = 1e-06, debug = False)

   Bases: :py:obj:`gfn.estimators.Estimator`, :py:obj:`gfn.estimators.PolicyMixin`


   Simplified PB estimator using Cartesian increments with back-to-source.


   .. py:attribute:: delta


   .. py:attribute:: epsilon


   .. py:property:: expected_output_dim
      :type: int


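      Expected output dimension: ``1 + 3 * n_dim * n_components``, i.e. one BTS
      logit plus mixture weights, ``alpha`` and ``beta`` for each of the
      ``n_dim * n_components`` (dimension, component) pairs.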
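
      As a quick arithmetic check (``expected_output_dim`` below is a standalone
      illustrative function, not the property itself): with ``n_dim = 2`` and
      ``n_components = 3``, the module must emit ``1 + 3 * 2 * 3 = 19`` values.
      The forward estimator uses the same count, with an exit logit in place of
      the BTS logit.

      .. code-block:: python

         def expected_output_dim(n_dim, n_components):
             # One leading logit, then mixture weights, alpha and beta per (dim, component) pair.
             return 1 + 3 * n_dim * n_components


         expected_output_dim(2, 3)  # 19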


   .. py:attribute:: max_concentration
      :value: 100.0


   .. py:attribute:: min_concentration
      :value: 0.1


   .. py:attribute:: n_components


   .. py:attribute:: n_dim
      :value: 2


   .. py:attribute:: numerical_epsilon
      :value: 1e-06


   .. py:attribute:: temperature
      :type:  float
      :value: 1.0


   .. py:method:: to_probability_distribution(states, module_output)

      Convert module output to backward probability distribution.


   .. py:method:: uniform(env, n_components, **kwargs)
      :classmethod:


      Create an estimator with a fixed (non-learned) uniform backward policy.

      :param env: The Box environment.
      :param n_components: Number of mixture components.
      :param \*\*kwargs: Extra keyword arguments forwarded to the constructor.

      :returns: A ``BoxCartesianPBEstimator`` whose module has no learnable parameters.


.. py:class:: BoxCartesianPBMLP(hidden_dim, n_hidden_layers, n_components, n_dim = 2, **kwargs)

   Bases: :py:obj:`_BoxCartesianMLP`


   MLP for the Box backward policy. The first output is the back-to-source (BTS) logit.


.. py:class:: BoxCartesianPFEstimator(env, module, n_components, min_concentration = 0.1, max_concentration = 100.0, numerical_epsilon = 1e-06, debug = False)

   Bases: :py:obj:`gfn.estimators.Estimator`, :py:obj:`gfn.estimators.PolicyMixin`


   Simplified PF estimator using Cartesian increments.

   Simpler than ``BoxPFEstimator``: it uses a single MLP whose output
   parameterizes a ``BoxCartesianDistribution``.


   .. py:attribute:: delta


   .. py:attribute:: epsilon


   .. py:property:: expected_output_dim
      :type: int


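      Expected output dimension: ``1 + 3 * n_dim * n_components``, i.e. one exit
      logit plus mixture weights, ``alpha`` and ``beta`` for each of the
      ``n_dim * n_components`` (dimension, component) pairs.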


   .. py:attribute:: max_concentration
      :value: 100.0


   .. py:attribute:: min_concentration
      :value: 0.1


   .. py:attribute:: n_components


   .. py:attribute:: n_dim
      :value: 2


   .. py:attribute:: numerical_epsilon
      :value: 1e-06


   .. py:attribute:: temperature
      :type:  float
      :value: 1.0


   .. py:method:: to_probability_distribution(states, module_output)

      Convert module output to a probability distribution.

      :param states: The states.
      :param module_output: Output from the module, shape (batch, expected_output_dim).

      :returns: BoxCartesianDistribution instance.
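
      A sketch of the kind of post-processing such a conversion typically
      involves, under explicit assumptions: mixture logits are divided by
      ``temperature`` and log-softmax normalized, and raw concentrations are
      squashed into ``(min_concentration, max_concentration)`` with a sigmoid.
      The actual transform used by this estimator is not documented here, so
      both helpers are hypothetical.

      .. code-block:: python

         import math


         def squash_concentration(raw, min_c=0.1, max_c=100.0):
             """Map an unconstrained value into (min_c, max_c) via a sigmoid."""
             return min_c + (max_c - min_c) / (1.0 + math.exp(-raw))


         def mixture_log_weights(logits, temperature=1.0):
             """Temperature-scaled log-softmax over mixture components."""
             scaled = [x / temperature for x in logits]
             m = max(scaled)  # subtract the max for numerical stability
             log_norm = m + math.log(sum(math.exp(s - m) for s in scaled))
             return [s - log_norm for s in scaled]


         squash_concentration(0.0)        # approximately 50.05, the midpoint of (0.1, 100.0)
         mixture_log_weights([0.0, 0.0])  # [-log(2), -log(2)]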


.. py:class:: BoxCartesianPFMLP(hidden_dim, n_hidden_layers, n_components, n_dim = 2, **kwargs)

   Bases: :py:obj:`_BoxCartesianMLP`


   MLP for the Box forward policy. The first output is the exit logit.


.. py:class:: UniformBoxCartesianPBModule(n_components, n_dim = 2)

   Bases: :py:obj:`gfn.utils.modules.UniformModule`


   Fixed (non-learned) backward policy module for Cartesian Box.

   Backward-compatible alias for ``UniformModule``.  Prefer
   ``BoxCartesianPBEstimator.uniform(env, n_components)`` for new code.


.. py:class:: _BetaMixtureMixin

   Shared Beta mixture sampling/log_prob for forward and backward distributions.


   .. py:method:: _beta_mixture_log_prob(r)

      Compute Beta mixture log_prob without MixtureSameFamily overhead.
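
      The quantity being computed, for one dimension, is
      ``log sum_k w_k * Beta(r; alpha_k, beta_k)``. A minimal scalar sketch using
      only the standard library (the helper names are hypothetical), evaluated
      with a log-sum-exp for numerical stability:

      .. code-block:: python

         import math


         def beta_log_pdf(r, a, b):
             """log Beta(r; a, b) for r strictly inside (0, 1)."""
             return ((a - 1) * math.log(r) + (b - 1) * math.log1p(-r)
                     + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))


         def beta_mixture_log_prob(r, log_weights, alphas, betas):
             """Log of the mixture density, via log-sum-exp over components."""
             terms = [lw + beta_log_pdf(r, a, b) for lw, a, b in zip(log_weights, alphas, betas)]
             m = max(terms)
             return m + math.log(sum(math.exp(t - m) for t in terms))


         beta_log_pdf(0.5, 2.0, 2.0)  # log(1.5): Beta(2, 2) has density 6 r (1 - r)
         half = math.log(0.5)
         mix = beta_mixture_log_prob(0.3, [half, half], [1.0, 1.0], [1.0, 1.0])  # about 0.0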


   .. py:method:: _sample_beta_mixture()

      Sample from Beta mixture without MixtureSameFamily overhead.
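
      Sampling follows the two-stage structure of the mixture: first pick a
      component from the mixture weights, then draw ``r`` from that component's
      Beta distribution. A scalar standard-library sketch (the helper name
      ``sample_beta_mixture`` is hypothetical):

      .. code-block:: python

         import random


         def sample_beta_mixture(weights, alphas, betas, rng=random):
             """Draw one r in [0, 1] from sum_k w_k * Beta(alpha_k, beta_k)."""
             k = rng.choices(range(len(weights)), weights=weights, k=1)[0]  # component index
             return rng.betavariate(alphas[k], betas[k])


         rng = random.Random(0)
         draws = [sample_beta_mixture([0.3, 0.7], [2.0, 5.0], [5.0, 2.0], rng) for _ in range(1000)]
         # The sample mean should be close to 0.3 * 2/7 + 0.7 * 5/7, about 0.586.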


   .. py:attribute:: alpha
      :type:  torch.Tensor


   .. py:attribute:: beta
      :type:  torch.Tensor


   .. py:attribute:: log_weights
      :type:  torch.Tensor


.. py:class:: _BoxCartesianMLP(hidden_dim, n_hidden_layers, n_components, n_dim = 2, **kwargs)

   Bases: :py:obj:`gfn.utils.modules.MLP`


   Base MLP for Box Cartesian policies (forward and backward).

   Output format: ``[logit, mixture_logits..., alpha..., beta...]``, where the
   first logit is the exit logit (PF) or the back-to-source logit (PB), and
   ``mixture_logits``, ``alpha`` and ``beta`` each have ``n_dim * n_components``
   entries.

   States are normalized from [0, 1] to [-1, 1] before the forward pass to
   match the gflownet reference implementation (``states2policy`` normalization).
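
   A minimal sketch of that layout and of the state normalization, on plain
   lists. Both helpers are hypothetical, and the ordering of dimensions and
   components *within* each block of ``n_dim * n_components`` entries is not
   specified here.

   .. code-block:: python

      def split_policy_output(output, n_dim, n_components):
          """Split [logit, mixture_logits..., alpha..., beta...] into its four parts."""
          block = n_dim * n_components
          if len(output) != 1 + 3 * block:
              raise ValueError(f"expected {1 + 3 * block} values, got {len(output)}")
          logit = output[0]
          mixture_logits = output[1 : 1 + block]
          alpha = output[1 + block : 1 + 2 * block]
          beta = output[1 + 2 * block :]
          return logit, mixture_logits, alpha, beta


      def normalize_state(state):
          """Map each coordinate from [0, 1] to [-1, 1]."""
          return [2.0 * x - 1.0 for x in state]


      split_policy_output(list(range(13)), n_dim=2, n_components=2)
      # (0, [1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12])
      normalize_state([0.0, 0.5, 1.0])  # [-1.0, 0.0, 1.0]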


   .. py:method:: forward(preprocessed_states)

      Forward pass. Normalizes [0, 1] states to [-1, 1] before the MLP.


   .. py:attribute:: n_components


   .. py:attribute:: n_dim
      :value: 2