tutorials.examples.train_line

Attributes

parser

Classes

GaussianStepMLP

A deep neural network for the forward and backward policy.

ScaledGaussianWithOptionalExit

Extends a Gaussian distribution by considering the step counter.

StepEstimator

Estimator for PF and PB of the Line environment.

Functions

main(args)

render(env[, validation_samples])

Renders the reward distribution over the 1D env.

train(gflownet, env[, seed, n_trajectories, ...])

Trains a GFlowNet on the Line Environment.

Module Contents

class tutorials.examples.train_line.GaussianStepMLP(hidden_dim, n_hidden_layers, policy_std_min=0.1, policy_std_max=1)

Bases: gfn.utils.modules.MLP

A deep neural network for the forward and backward policy.

Parameters:
  • hidden_dim (int)

  • n_hidden_layers (int)

  • policy_std_min (float)

  • policy_std_max (float)

forward(preprocessed_states)

Calculates the Gaussian parameters, applying the bound to sigma.

Parameters:

preprocessed_states (torch.Tensor) – a tensor of shape (*batch_shape, 2) containing the states.

Return type:

torch.Tensor

Returns a tensor of shape (*batch_shape, 2) containing the mean and the bounded standard deviation (sigma) of the Gaussian distribution.
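The bound on sigma can be illustrated with a minimal sketch. It assumes the raw network output is squashed with a sigmoid and rescaled into [policy_std_min, policy_std_max]. The page does not state the exact mechanism, so the sigmoid and the helper name bound_sigma are illustrative assumptions, written in plain Python rather than torch.

```python
import math


def bound_sigma(raw: float, std_min: float = 0.1, std_max: float = 1.0) -> float:
    """Map an unconstrained value into [std_min, std_max].

    Hypothetical helper: assumes a sigmoid squashing, which this page
    does not confirm.
    """
    # Numerically stable sigmoid: never exponentiates a large positive number.
    if raw >= 0:
        squashed = 1.0 / (1.0 + math.exp(-raw))
    else:
        e = math.exp(raw)
        squashed = e / (1.0 + e)
    # Rescale from [0, 1] to [std_min, std_max].
    return std_min + (std_max - std_min) * squashed


if __name__ == "__main__":
    for raw in (-5.0, 0.0, 5.0):
        print(raw, bound_sigma(raw))
```

With the defaults shown in the signature (policy_std_min=0.1, policy_std_max=1), a raw output of 0 lands at the midpoint of the interval.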

input_dim = 2
output_dim = 2
policy_std_max = 1
policy_std_min = 0.1
class tutorials.examples.train_line.ScaledGaussianWithOptionalExit(states, mus, scales, backward, n_steps=5)

Bases: torch.distributions.Distribution

Extends a Gaussian distribution by considering the step counter. When sampling, the step counter can be used to ensure the exit_action [inf, inf] is sampled.

Parameters:
  • states (gfn.states.States)

  • mus (torch.Tensor)

  • scales (torch.Tensor)

  • backward (bool)

  • n_steps (int)

backward
dist
exit_action
idx_at_final_backward_step
idx_at_final_forward_step
log_prob(sampled_actions)

Computes log-probabilities, returning 0 for deterministic exit/BTS transitions.
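As a rough illustration of this rule, the sketch below returns 0 for a transition flagged as deterministic and the Gaussian log-density otherwise. The function names and the is_deterministic flag are hypothetical. The real method operates on batched tensors and derives the flag from the step counter.

```python
import math


def gaussian_log_prob(x: float, mu: float, sigma: float) -> float:
    """Log-density of a Normal(mu, sigma) evaluated at x."""
    z = (x - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def step_log_prob(action: float, mu: float, sigma: float, is_deterministic: bool) -> float:
    """Hypothetical scalar version of the rule described above."""
    if is_deterministic:
        # A forced transition has probability 1, so its log-probability is 0.
        return 0.0
    return gaussian_log_prob(action, mu, sigma)


if __name__ == "__main__":
    print(step_log_prob(0.3, 0.0, 1.0, is_deterministic=False))
    print(step_log_prob(float("inf"), 0.0, 1.0, is_deterministic=True))
```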

sample(sample_shape=())
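The role of the step counter during sampling can be sketched as follows. This is a plain-Python approximation for a single forward step, not the class's actual implementation. EXIT_ACTION, sample_step, and the scalar step argument are hypothetical stand-ins.

```python
import math
import random
from typing import Optional

# Stand-in for the exit action; the class stores its own value in `exit_action`.
EXIT_ACTION = float("inf")


def sample_step(
    step: int,
    mu: float,
    sigma: float,
    n_steps: int = 5,
    rng: Optional[random.Random] = None,
) -> float:
    """Sample one forward action, forcing the exit once n_steps is reached."""
    rng = rng or random.Random()
    if step >= n_steps:
        # Final forward step: the exit action is returned deterministically.
        return EXIT_ACTION
    # Otherwise draw from the Gaussian policy.
    return rng.gauss(mu, sigma)


if __name__ == "__main__":
    rng = random.Random(0)
    for step in range(6):
        print(step, sample_step(step, mu=0.0, sigma=1.0, rng=rng))
```

The default n_steps=5 mirrors the default in the class signature.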
class tutorials.examples.train_line.StepEstimator(env, module, backward)

Bases: gfn.estimators.Estimator, gfn.estimators.PolicyMixin

Estimator for PF and PB of the Line environment.

Parameters:
  • env

  • module

  • backward

backward
property expected_output_dim: int

Expected output dimension of the module.

Returns:

The expected output dimension of the module, or None if the output dimension is not well-defined (e.g., when the output is a TensorDict for GraphActions).

Return type:

int

n_steps_per_trajectory
to_probability_distribution(states, module_output, scale_factor=0)

Converts the output of the neural network to a probability distribution.

Parameters:
  • states (gfn.states.States) – The states to use for the distribution.

  • module_output (torch.Tensor) – The output of the neural network as a tensor of shape (*batch_shape, output_dim).

  • scale_factor – The scale factor to use for the distribution.

Return type:

torch.distributions.Distribution

Returns a distribution object.
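One plausible reading of this conversion is sketched below. It assumes the last dimension of module_output holds (mean, scale) and that scale_factor is added to the scale to widen the distribution. Neither detail is stated on this page. The helper split_gaussian_params is hypothetical and uses nested lists in place of tensors.

```python
from typing import List, Sequence, Tuple


def split_gaussian_params(
    module_output: Sequence[Sequence[float]],
    scale_factor: float = 0.0,
) -> Tuple[List[float], List[float]]:
    """Split rows of (mean, scale) and widen each scale by scale_factor.

    Assumption: scale_factor is additive. The page only says it is
    "the scale factor to use for the distribution".
    """
    mus = [row[0] for row in module_output]
    scales = [row[1] + scale_factor for row in module_output]
    return mus, scales


if __name__ == "__main__":
    batch = [[0.5, 0.2], [-1.0, 0.7]]
    print(split_gaussian_params(batch))
    print(split_gaussian_params(batch, scale_factor=1.0))
```

Under this reading, the default scale_factor=0 leaves the predicted scales unchanged.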

tutorials.examples.train_line.main(args)
tutorials.examples.train_line.parser
tutorials.examples.train_line.render(env, validation_samples=None)

Renders the reward distribution over the 1D env.

tutorials.examples.train_line.train(gflownet, env, seed=4444, n_trajectories=3000000.0, batch_size=128, lr_base=0.001, gradient_clip_value=5, exploration_var_starting_val=2)

Trains a GFlowNet on the Line Environment.
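To show how the defaults relate to each other, the sketch below derives an iteration count from n_trajectories and batch_size and builds a linearly decaying exploration schedule starting at exploration_var_starting_val. Both the integer division and the linear decay to 0 are assumptions. The page does not describe the training loop, and linear_schedule is a hypothetical helper.

```python
from typing import List


def linear_schedule(start: float, n_iterations: int) -> List[float]:
    """Linearly decay from start to 0 over n_iterations values (assumed schedule)."""
    if n_iterations < 1:
        raise ValueError("n_iterations must be at least 1")
    if n_iterations == 1:
        return [start]
    last = n_iterations - 1
    return [start * (1.0 - i / last) for i in range(n_iterations)]


# Defaults taken from the train() signature.
n_trajectories = 3000000.0
batch_size = 128
exploration_var_starting_val = 2

# Assumed: one batch of trajectories per iteration, partial batch dropped.
n_iterations = int(n_trajectories // batch_size)
schedule = linear_schedule(float(exploration_var_starting_val), n_iterations)

if __name__ == "__main__":
    print(n_iterations)
    print(schedule[0], schedule[-1])
```

Under this assumption, each value in the schedule could be passed as scale_factor for the corresponding iteration, so exploration shrinks as training progresses.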