gfn.utils.trust_pcl

Trust-PCL ↔ RTB parameter conversion utilities.

Deleu et al. (2025, arXiv:2509.01632) proved that Relative Trajectory Balance (RTB) is mathematically equivalent to Trust-PCL, an off-policy reinforcement learning method with KL regularization toward a reference policy (Nachum et al., 2017).

The core identity (Proposition 3.1):

\[\mathcal{L}_{\text{Trust-PCL}}(\phi, \psi) = \alpha^2 \,\mathcal{L}_{\text{RTB}}(\phi, \psi)\]

where \(\alpha = 1/\beta\) is the Trust-PCL temperature.

What this means: Training a GFlowNet with RTB is the same optimization problem as training a policy with Trust-PCL. The same parameters are updated, the gradients are identical up to the constant factor \(\alpha^2\), and the same fixed point is reached. Only the loss scale differs.
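One practical reading of the constant factor: under plain gradient descent, minimizing the \(\alpha^2\)-scaled loss with learning rate \(\eta\) produces the same iterates as minimizing the unscaled loss with learning rate \(\alpha^2 \eta\). The sketch below checks this on a toy one-dimensional quadratic that stands in for \(\mathcal{L}_{\text{RTB}}\). The quadratic, the target value, and the helper `grad_descent` are illustrative assumptions and not part of `gfn`.

```python
import math


def grad_descent(grad_fn, x0, lr, steps):
    """Plain gradient descent on a scalar parameter; returns every iterate."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * grad_fn(xs[-1]))
    return xs


beta = 0.5
alpha = 1.0 / beta   # Trust-PCL temperature
target = 3.0         # made-up minimizer of the toy loss


def grad_rtb(x):
    # Gradient of the toy stand-in loss (x - target)^2.
    return 2.0 * (x - target)


def grad_tpcl(x):
    # Gradient of the same loss scaled by alpha^2.
    return alpha**2 * grad_rtb(x)


lr = 0.01
path_tpcl = grad_descent(grad_tpcl, x0=0.0, lr=lr, steps=50)
path_rtb = grad_descent(grad_rtb, x0=0.0, lr=alpha**2 * lr, steps=50)

# The two runs trace the same iterates (up to floating-point rounding).
same_path = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12)
                for a, b in zip(path_tpcl, path_rtb))
print(same_path)
```

This argument covers plain gradient descent only; optimizers that rescale gradients may respond differently to a constant loss factor.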

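Parameter correspondence:

RTB	Trust-PCL
Reward scaling \(\beta\)	Temperature \(\alpha = 1/\beta\)
Log-partition function \(\log Z_\psi\)	Soft value at the initial state, \(V^{\text{soft}}_\psi(s_0) = \alpha \log Z_\psi\)
Prior \(p_\theta(\tau)\)	Reference policy \(\pi_{\text{ref}}\)
Posterior \(p_\phi(\tau)\)	Trained policy \(\pi_\phi\)
Log-reward \(\log r(x_T)\)	Return \(\sum_t r_t\)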

Derivation sketch:

The RTB balance condition for a trajectory \(\tau\) is:

\[\log Z_\psi + \log p_\phi(\tau) = \beta \log r(x_T) + \log p_\theta(\tau)\]

Multiplying both sides by \(\alpha = 1/\beta\):

\[\alpha \log Z_\psi + \alpha \log p_\phi(\tau) = \log r(x_T) + \alpha \log p_\theta(\tau)\]

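Rearranging with \(V^{\text{soft}}_\psi(s_0) = \alpha \log Z_\psi\), writing the return as \(\sum_t r_t = \log r(x_T)\), and factorizing the trajectory probabilities over steps as \(p_\theta(\tau) = \prod_t \pi_{\text{ref}}(a_t|s_t)\) and \(p_\phi(\tau) = \prod_t \pi_\phi(a_t|s_t)\):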

\[-V^{\text{soft}}_\psi(s_0) + \sum_t r_t + \alpha \sum_t \log \frac{\pi_{\text{ref}}(a_t|s_t)}{\pi_\phi(a_t|s_t)} = 0\]

This is exactly the Trust-PCL consistency condition (Nachum et al. 2017, Equation 3). The KL regularization term \(\alpha \sum_t \log(\pi_{\text{ref}} / \pi_\phi)\) comes directly from the log-ratio of prior to posterior trajectory probabilities in the original RTB equation. No separate KL penalty is added; the term is an intrinsic consequence of the balance condition.
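The derivation can be checked numerically on a single toy trajectory. The sketch below assumes each loss is the squared residual of its condition, in which case the Trust-PCL residual equals \(-\alpha\) times the RTB residual and the squared losses differ by \(\alpha^2\). All numbers are made up for illustration, and the variable names are not taken from `gfn`.

```python
import math

beta = 0.5
alpha = 1.0 / beta                 # Trust-PCL temperature
log_Z = 1.3                        # log Z_psi (made-up value)
log_reward = 0.7                   # log r(x_T) (made-up value)

# Made-up per-step log-probabilities along one 3-step trajectory.
log_pi_ref = [-0.9, -1.2, -0.4]    # prior / reference policy
log_pi_phi = [-0.6, -1.5, -0.8]    # posterior / trained policy

log_p_theta = sum(log_pi_ref)      # log p_theta(tau)
log_p_phi = sum(log_pi_phi)        # log p_phi(tau)

# RTB residual: left-hand side minus right-hand side of the balance condition.
rtb_residual = (log_Z + log_p_phi) - (beta * log_reward + log_p_theta)

# Trust-PCL residual, with V_soft(s0) = alpha * log Z and return = log r(x_T).
v_soft_s0 = alpha * log_Z
kl_term = alpha * sum(r - p for r, p in zip(log_pi_ref, log_pi_phi))
tpcl_residual = -v_soft_s0 + log_reward + kl_term

loss_rtb = rtb_residual**2
loss_tpcl = tpcl_residual**2

print(math.isclose(tpcl_residual, -alpha * rtb_residual))
print(math.isclose(loss_tpcl, alpha**2 * loss_rtb))
```

Both residuals vanish together, so a trajectory that satisfies one condition satisfies the other.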

References

Deleu et al. “Relative Trajectory Balance is equivalent to Trust-PCL” (2025, arXiv:2509.01632).

Nachum et al. “Trust-PCL: An Off-Policy Trust Region Method for Continuous Control” (ICLR 2018, arXiv:1707.01891).

Venkatraman et al. “Amortizing intractable inference in diffusion models for vision, language, and control” (NeurIPS 2024, arXiv:2405.20971).

Functions

`rtb_to_trust_pcl_params`(logZ, beta)	Convert RTB parameters to Trust-PCL parameters.
`trust_pcl_to_rtb_params`(alpha, v_soft_s0)	Convert Trust-PCL parameters to RTB parameters.

Module Contents

gfn.utils.trust_pcl.rtb_to_trust_pcl_params(logZ, beta)

Convert RTB parameters to Trust-PCL parameters.

Parameters:

logZ (torch.Tensor | float) – RTB log-partition function \(\log Z_\psi\).
beta (torch.Tensor | float) – RTB reward scaling \(\beta\).

Returns:

Dictionary with keys "alpha" and "v_soft_s0":

alpha = 1 / beta – Trust-PCL temperature
v_soft_s0 = alpha * logZ – soft value function at \(s_0\)

Return type:

dict

Example:

>>> from gfn.utils.trust_pcl import rtb_to_trust_pcl_params
>>> rtb_to_trust_pcl_params(logZ=2.0, beta=0.5)
{'alpha': 2.0, 'v_soft_s0': 4.0}

gfn.utils.trust_pcl.trust_pcl_to_rtb_params(alpha, v_soft_s0)

Convert Trust-PCL parameters to RTB parameters.
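Inverting the forward mapping gives \(\beta = 1/\alpha\) and \(\log Z_\psi = V^{\text{soft}}_\psi(s_0) / \alpha\). The following is a minimal sketch of both directions and a round trip, written independently of the library. The function names `sketch_rtb_to_trust_pcl` and `sketch_trust_pcl_to_rtb` and the inverse's return keys `"beta"` and `"logZ"` are assumptions made for illustration; the actual return format of `trust_pcl_to_rtb_params` may differ.

```python
def sketch_rtb_to_trust_pcl(logZ, beta):
    """Forward mapping as documented: alpha = 1/beta, v_soft_s0 = alpha * logZ."""
    alpha = 1.0 / beta
    return {"alpha": alpha, "v_soft_s0": alpha * logZ}


def sketch_trust_pcl_to_rtb(alpha, v_soft_s0):
    """Hypothetical inverse; the key names 'beta' and 'logZ' are assumed."""
    return {"beta": 1.0 / alpha, "logZ": v_soft_s0 / alpha}


# Round trip: RTB -> Trust-PCL -> RTB recovers the original values.
tpcl = sketch_rtb_to_trust_pcl(logZ=2.0, beta=0.5)
rtb = sketch_trust_pcl_to_rtb(**tpcl)
print(tpcl)
print(rtb)
```

Because the sketch uses only division and multiplication, the same code should also work element-wise on tensors, although that is not exercised here.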