tutorials.examples.train_line
=============================

.. py:module:: tutorials.examples.train_line


Attributes
----------

.. autoapisummary::

   tutorials.examples.train_line.parser


Classes
-------

.. autoapisummary::

   tutorials.examples.train_line.GaussianStepMLP
   tutorials.examples.train_line.ScaledGaussianWithOptionalExit
   tutorials.examples.train_line.StepEstimator


Functions
---------

.. autoapisummary::

   tutorials.examples.train_line.main
   tutorials.examples.train_line.render
   tutorials.examples.train_line.train


Module Contents
---------------

.. py:class:: GaussianStepMLP(hidden_dim, n_hidden_layers, policy_std_min = 0.1, policy_std_max = 1)

   Bases: :py:obj:`gfn.utils.modules.MLP`


   A deep neural network for the forward and backward policy.


   .. py:method:: forward(preprocessed_states)

      Calculate the Gaussian parameters, applying the bound to sigma.

      :param preprocessed_states: a tensor of shape ``(*batch_shape, 2)`` containing the states.

      :returns: a tensor of shape ``(*batch_shape, 2)`` containing the mean and standard deviation (sigma) of the Gaussian distribution, with sigma bounded to ``[policy_std_min, policy_std_max]``.


   .. py:attribute:: input_dim
      :value: 2


   .. py:attribute:: output_dim
      :value: 2


   .. py:attribute:: policy_std_max
      :value: 1


   .. py:attribute:: policy_std_min
      :value: 0.1
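
The bound on sigma mentioned for ``GaussianStepMLP.forward`` can be pictured with a plain-Python sketch. This is an illustration under stated assumptions, not the module's code: it assumes a sigmoid rescaling (the source does not say which squashing function is used), and ``bound_sigma`` is a hypothetical helper that works on a single float rather than a tensor.

```python
import math


def bound_sigma(raw, std_min=0.1, std_max=1.0):
    """Map an unbounded value into [std_min, std_max] (hypothetical helper)."""
    assert 0 < std_min < std_max
    sigmoid = 1.0 / (1.0 + math.exp(-raw))  # lies in (0, 1) for moderate raw
    return std_min + (std_max - std_min) * sigmoid


# Larger raw outputs give larger sigma, but never outside the bounds.
for raw in (-5.0, 0.0, 5.0):
    print(raw, bound_sigma(raw))
```

With the defaults ``policy_std_min = 0.1`` and ``policy_std_max = 1``, a raw output of 0 maps to the midpoint of the interval.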


.. py:class:: ScaledGaussianWithOptionalExit(states, mus, scales, backward, n_steps = 5)

   Bases: :py:obj:`torch.distributions.Distribution`


   Extends a Gaussian distribution, parameterized by ``mus`` and ``scales``, by
   considering the step counter. When sampling, the step counter can be used to
   ensure the ``exit_action`` [inf, inf] is sampled.


   .. py:attribute:: backward


   .. py:attribute:: dist


   .. py:attribute:: exit_action


   .. py:attribute:: idx_at_final_backward_step


   .. py:attribute:: idx_at_final_forward_step


   .. py:method:: log_prob(sampled_actions)

      Computes log-probabilities, returning 0 (that is, probability 1) for deterministic exit/BTS transitions.


   .. py:method:: sample(sample_shape=())
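
The interplay between the step counter, ``sample`` and ``log_prob`` can be sketched without tensors. The code below is a hedged illustration only. It assumes that the forced forward exit happens when the step counter equals ``n_steps`` and that the forced backward transition happens at step 1. ``EXIT_ACTION``, ``sample_step`` and ``log_prob_step`` are hypothetical names, and a single ``inf`` float stands in for the exit action.

```python
import math
import random

EXIT_ACTION = float("inf")  # stand-in for the exit marker (assumption)


def sample_step(mu, sigma, step, n_steps=5, backward=False, rng=random):
    """Draw a Gaussian step, or force the exit action at the final forward step."""
    if not backward and step == n_steps:
        return EXIT_ACTION
    return rng.gauss(mu, sigma)


def log_prob_step(action, mu, sigma, step, n_steps=5, backward=False):
    """Gaussian log-density, or 0 when the transition is forced."""
    forced_exit = not backward and step == n_steps
    forced_back = backward and step == 1
    if forced_exit or forced_back:
        return 0.0  # log(1): the transition is deterministic
    z = (action - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2 * math.pi)


action = sample_step(mu=0.0, sigma=1.0, step=5)
print(action, log_prob_step(action, 0.0, 1.0, step=5))
```

Returning 0 for forced transitions keeps them from contributing to a sum of trajectory log-probabilities.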


.. py:class:: StepEstimator(env, module, backward)

   Bases: :py:obj:`gfn.estimators.Estimator`, :py:obj:`gfn.estimators.PolicyMixin`


   Estimator for PF and PB of the Line environment.


   .. py:attribute:: backward


   .. py:property:: expected_output_dim
      :type: int


      Expected output dimension of the module.

      :returns: The expected output dimension of the module, or None if the output dimension
                is not well-defined (e.g., when the output is a TensorDict for GraphActions).


   .. py:attribute:: n_steps_per_trajectory


   .. py:method:: to_probability_distribution(states, module_output, scale_factor=0)

      Converts the output of the neural network to a probability distribution.

      :param states: The states to use for the distribution.
      :param module_output: The output of the neural network as a tensor of shape ``(*batch_shape, output_dim)``.
      :param scale_factor: The scale factor to use for the distribution.

      :returns: A distribution object.
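
One way to read ``to_probability_distribution`` is as a split of each output row into a mean and a scale, with ``scale_factor`` widening the scale. The sketch below assumes exactly that: an additive ``scale_factor`` and a ``[mu, sigma]`` column order. Neither is confirmed by this page, and ``split_and_inflate`` is a hypothetical helper working on nested lists rather than tensors.

```python
def split_and_inflate(module_output, scale_factor=0.0):
    """Split rows of [mu, sigma] and widen sigma by scale_factor (assumed additive)."""
    mus = [row[0] for row in module_output]
    scales = [row[1] + scale_factor for row in module_output]
    return mus, scales


# Two states, each with a predicted mean and scale.
mus, scales = split_and_inflate([[0.5, 0.25], [-1.0, 0.5]], scale_factor=2.0)
print(mus, scales)
```

Under this reading, the default ``scale_factor=0`` leaves the predicted scales unchanged.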


.. py:function:: main(args)

.. py:data:: parser

.. py:function:: render(env, validation_samples=None)

   Renders the reward distribution over the 1D environment.


.. py:function:: train(gflownet, env, seed=4444, n_trajectories=3000000.0, batch_size=128, lr_base=0.001, gradient_clip_value=5, exploration_var_starting_val=2)

   Trains a GFlowNet on the Line Environment.
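
The ``train`` signature suggests a number of iterations derived from ``n_trajectories`` and ``batch_size``, and an exploration variance that starts at ``exploration_var_starting_val``. The sketch below shows one plausible schedule, a linear decay to zero. This is an assumption, not something stated on this page, and ``exploration_schedule`` is a hypothetical helper.

```python
def exploration_schedule(start=2.0, n_trajectories=3000000.0, batch_size=128):
    """Linearly decay an exploration value from start to 0 (assumed schedule)."""
    n_iterations = int(n_trajectories // batch_size)
    assert n_iterations >= 1
    if n_iterations == 1:
        return [start]
    last = n_iterations - 1
    return [start * (1 - i / last) for i in range(n_iterations)]


schedule = exploration_schedule()
print(len(schedule), schedule[0], schedule[-1])
```

Each value could then be passed as ``scale_factor`` when sampling, so that early iterations explore more widely than late ones.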