gfn.utils.trust_pcl

Trust-PCL ↔ RTB parameter conversion utilities.

Deleu et al. (2025, arXiv:2509.01632) proved that Relative Trajectory Balance (RTB; Venkatraman et al., NeurIPS 2024) is mathematically equivalent to Trust-PCL, an off-policy reinforcement learning method with KL regularization toward a reference policy (Nachum et al., 2017).

The core identity (Proposition 3.1):

\[\mathcal{L}_{\text{Trust-PCL}}(\phi, \psi) = \alpha^2 \,\mathcal{L}_{\text{RTB}}(\phi, \psi)\]

where \(\alpha = 1/\beta\) is the Trust-PCL temperature.

What this means: Training a GFlowNet with RTB is the same optimization problem as training a policy with Trust-PCL. The same parameters \((\phi, \psi)\) are updated, the gradients agree up to the constant factor \(\alpha^2\), and the same fixed point is reached. Only the loss scale differs.
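The scale relation can be checked numerically on a single trajectory. The sketch below assumes each loss is the square of its residual and takes the residual forms from the derivation sketch on this page; the function names and all numeric values are made up for illustration and are not part of `gfn`.

```python
import math


def rtb_loss(log_z, log_p_post, log_p_prior, log_r, beta):
    """Squared RTB residual for one trajectory (illustrative helper)."""
    delta = log_z + log_p_post - beta * log_r - log_p_prior
    return delta ** 2


def trust_pcl_loss(v_soft_s0, log_p_post, log_p_prior, log_r, alpha):
    """Squared Trust-PCL residual, assuming a terminal-only reward log_r."""
    delta = -v_soft_s0 + log_r + alpha * (log_p_prior - log_p_post)
    return delta ** 2


# Hypothetical trajectory quantities, chosen only for illustration.
beta = 0.5
alpha = 1.0 / beta
log_z, log_p_post, log_p_prior, log_r = 2.0, -3.2, -4.1, 1.7

l_rtb = rtb_loss(log_z, log_p_post, log_p_prior, log_r, beta)
# The soft value at s_0 is alpha * log Z under the correspondence.
l_tpcl = trust_pcl_loss(alpha * log_z, log_p_post, log_p_prior, log_r, alpha)

# The two losses differ only by the constant factor alpha squared.
assert math.isclose(l_tpcl, alpha ** 2 * l_rtb)
```

Because the factor is constant in \(\phi\) and \(\psi\), it can be absorbed into the learning rate of a plain gradient-descent optimizer.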

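Parameter correspondence:

  • Trust-PCL temperature \(\alpha = 1/\beta\) ↔ RTB reward scaling \(\beta\).

  • Soft value at the initial state \(V^{\text{soft}}_\psi(s_0) = \alpha \log Z_\psi\) ↔ RTB log-partition function \(\log Z_\psi\).

  • Reference policy \(\pi_{\text{ref}}\) ↔ prior \(p_\theta\).

  • Learned policy \(\pi_\phi\) ↔ posterior \(p_\phi\).

  • Summed rewards \(\sum_t r_t\) ↔ terminal log-reward \(\log r(x_T)\).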

Derivation sketch:

The RTB balance condition for a trajectory \(\tau\) is:

\[\log Z_\psi + \log p_\phi(\tau) = \beta \log r(x_T) + \log p_\theta(\tau)\]

Multiplying both sides by \(\alpha = 1/\beta\):

\[\alpha \log Z_\psi + \alpha \log p_\phi(\tau) = \log r(x_T) + \alpha \log p_\theta(\tau)\]

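Rearranging with \(V^{\text{soft}}_\psi(s_0) = \alpha \log Z_\psi\), identifying the summed reward with the terminal log-reward, \(\sum_t r_t = \log r(x_T)\), and writing the trajectory log-probabilities as per-step sums, \(\log p_\theta(\tau) = \sum_t \log \pi_{\text{ref}}(a_t|s_t)\) and \(\log p_\phi(\tau) = \sum_t \log \pi_\phi(a_t|s_t)\):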

\[-V^{\text{soft}}_\psi(s_0) + \sum_t r_t + \alpha \sum_t \log \frac{\pi_{\text{ref}}(a_t|s_t)}{\pi_\phi(a_t|s_t)} = 0\]

This is exactly the Trust-PCL consistency condition (Nachum et al. 2017, Equation 3). The KL regularization term \(\alpha \sum_t \log(\pi_{\text{ref}} / \pi_\phi)\) is the log-ratio of prior to posterior trajectory probabilities that already appears in the RTB equation. No separate KL penalty is added; the term is an intrinsic consequence of the balance condition.
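One way to see where the factor \(\alpha^2\) in Proposition 3.1 comes from is to compare the two residuals directly. The worked step below assumes each loss is the square of its residual; the symbols \(\delta_{\text{RTB}}\) and \(\delta_{\text{Trust-PCL}}\) are introduced here for illustration only.

\[\delta_{\text{RTB}}(\tau) = \log Z_\psi + \log p_\phi(\tau) - \beta \log r(x_T) - \log p_\theta(\tau)\]

\[\delta_{\text{Trust-PCL}}(\tau) = -V^{\text{soft}}_\psi(s_0) + \log r(x_T) + \alpha \log \frac{p_\theta(\tau)}{p_\phi(\tau)} = -\alpha \,\delta_{\text{RTB}}(\tau)\]

\[\delta_{\text{Trust-PCL}}(\tau)^2 = \alpha^2 \,\delta_{\text{RTB}}(\tau)^2\]

The middle equality uses \(V^{\text{soft}}_\psi(s_0) = \alpha \log Z_\psi\) and \(\alpha \beta = 1\); the sign flip disappears once the residual is squared.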

References

Deleu et al. “Relative Trajectory Balance is equivalent to Trust-PCL” (2025, arXiv:2509.01632).

Nachum et al. “Trust-PCL: An Off-Policy Trust Region Method for Continuous Control” (ICLR 2018, arXiv:1707.01891).

Venkatraman et al. “Amortizing intractable inference in diffusion models for vision, language, and control” (NeurIPS 2024, arXiv:2405.20971).

Functions

rtb_to_trust_pcl_params(logZ, beta)

Convert RTB parameters to Trust-PCL parameters.

trust_pcl_to_rtb_params(alpha, v_soft_s0)

Convert Trust-PCL parameters to RTB parameters.

Module Contents

gfn.utils.trust_pcl.rtb_to_trust_pcl_params(logZ, beta)

Convert RTB parameters to Trust-PCL parameters.

Parameters:
  • logZ (torch.Tensor | float) – RTB log-partition function \(\log Z_\psi\).

  • beta (torch.Tensor | float) – RTB reward scaling \(\beta\).

Returns:

  • alpha = 1 / beta — Trust-PCL temperature

  • v_soft_s0 = alpha * logZ — soft value function at \(s_0\)

Return type:

Dictionary with keys "alpha" and "v_soft_s0"

Example:

>>> rtb_to_trust_pcl_params(logZ=2.0, beta=0.5)
{'alpha': 2.0, 'v_soft_s0': 4.0}

gfn.utils.trust_pcl.trust_pcl_to_rtb_params(alpha, v_soft_s0)

Convert Trust-PCL parameters to RTB parameters.

Parameters:
  • alpha (torch.Tensor | float) – Trust-PCL temperature \(\alpha\).

  • v_soft_s0 (torch.Tensor | float) – Soft value function at \(s_0\), i.e. \(V^{\text{soft}}_\psi(s_0)\).

Returns:

  • beta = 1 / alpha — RTB reward scaling

  • logZ = v_soft_s0 / alpha — RTB log-partition function

Return type:

Dictionary with keys "beta" and "logZ"

Example:

>>> trust_pcl_to_rtb_params(alpha=2.0, v_soft_s0=4.0)
{'beta': 0.5, 'logZ': 2.0}
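The two conversions are inverses of each other, which a round trip makes concrete. The sketch below re-implements the documented formulas with stand-in functions so that it runs without `gfn` installed; the `_sketch` names are hypothetical, plain floats are assumed, and the library's actual implementation may differ (for example in how it handles tensors or invalid inputs).

```python
import math


def rtb_to_trust_pcl_sketch(log_z, beta):
    """Stand-in for the documented formulas: alpha = 1/beta, V = alpha * logZ."""
    alpha = 1.0 / beta  # beta must be nonzero in this sketch
    return {"alpha": alpha, "v_soft_s0": alpha * log_z}


def trust_pcl_to_rtb_sketch(alpha, v_soft_s0):
    """Stand-in for the documented formulas: beta = 1/alpha, logZ = V / alpha."""
    return {"beta": 1.0 / alpha, "logZ": v_soft_s0 / alpha}


# RTB -> Trust-PCL -> RTB should recover the starting values.
tp = rtb_to_trust_pcl_sketch(log_z=2.0, beta=0.5)
rtb = trust_pcl_to_rtb_sketch(**tp)

assert math.isclose(rtb["beta"], 0.5)
assert math.isclose(rtb["logZ"], 2.0)
```

The keys returned by the first conversion match the keyword arguments of the second, so the dictionary can be unpacked directly with `**`.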