gfn.utils.trust_pcl
Trust-PCL ↔ RTB parameter conversion utilities.
Deleu et al. (2025, arXiv:2509.01632) proved that Relative Trajectory Balance (RTB) is mathematically equivalent to Trust-PCL, an off-policy reinforcement learning method with KL regularization toward a reference policy (Nachum et al., NeurIPS 2017).
The core identity (Proposition 3.1):
\[
\mathcal{L}_{\text{Trust-PCL}} = \alpha^2 \, \mathcal{L}_{\text{RTB}},
\]
where \(\alpha = 1/\beta\) is the Trust-PCL temperature.
What this means: Training a GFlowNet with RTB is exactly the same optimization problem as training a policy with Trust-PCL. The same parameters are updated, the gradients agree up to a constant factor, and the same fixed point is reached. The two losses differ only by the scale factor \(\alpha^2\).
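A minimal numeric check of the scale identity, assuming the undiscounted Trust-PCL path-consistency residual with zero terminal value and a reward \(r(x)\) received only at the terminal state \(x\). The helpers `rtb_loss` and `trust_pcl_loss` are hypothetical and not part of `gfn.utils.trust_pcl`; the trajectory log-probabilities are arbitrary illustrative numbers.

```python
def rtb_loss(log_z, beta, log_pf, log_pref, r_x):
    """Squared RTB residual for one trajectory (hypothetical helper).

    log_pf / log_pref: per-step log-probs under pi_phi and pi_ref.
    r_x: log-reward of the terminal state x; RTB scales it by beta.
    """
    residual = log_z + sum(log_pf) - beta * r_x - sum(log_pref)
    return residual ** 2


def trust_pcl_loss(v_soft_s0, alpha, log_pf, log_pref, r_x):
    """Squared Trust-PCL path-consistency residual (hypothetical helper),
    assuming discount 1, zero terminal value, and reward only at x."""
    kl_term = alpha * sum(lp - lr for lp, lr in zip(log_pf, log_pref))
    residual = -v_soft_s0 + r_x - kl_term
    return residual ** 2


# Arbitrary, unbalanced trajectory, so both losses are non-zero.
log_z, beta = 1.3, 0.5
alpha = 1.0 / beta            # Trust-PCL temperature
v_soft_s0 = alpha * log_z     # soft value at s0
log_pf = [-0.2, -1.1, -0.7]
log_pref = [-0.5, -0.9, -1.0]
r_x = 2.0

l_rtb = rtb_loss(log_z, beta, log_pf, log_pref, r_x)
l_tpcl = trust_pcl_loss(v_soft_s0, alpha, log_pf, log_pref, r_x)
print(l_tpcl / l_rtb, alpha ** 2)  # equal up to floating-point rounding
```

Under these assumptions the Trust-PCL residual is exactly \(-\alpha\) times the RTB residual, so the \(\alpha^2\) ratio holds for every trajectory, not only at the optimum.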
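Parameter correspondence:
- RTB reward scaling \(\beta\) ↔ Trust-PCL temperature \(\alpha = 1/\beta\)
- RTB log-partition function \(\log Z_\psi\) ↔ soft value function \(V^{\text{soft}}_\psi(s_0) = \alpha \log Z_\psi\)
- GFlowNet (posterior) policy \(\pi_\phi\) ↔ Trust-PCL policy \(\pi_\phi\), with the same parameters \(\phi\)
- RTB prior \(\pi_{\text{ref}}\) ↔ Trust-PCL reference policy \(\pi_{\text{ref}}\)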
Derivation sketch:
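The RTB balance condition for a trajectory \(\tau\) ending in the terminal state \(x\), with log-reward \(r(x)\), is:
\[
\log Z_\psi + \sum_t \log \pi_\phi(a_t \mid s_t) = \beta\, r(x) + \sum_t \log \pi_{\text{ref}}(a_t \mid s_t).
\]
Multiplying both sides by \(\alpha = 1/\beta\):
\[
\alpha \log Z_\psi + \alpha \sum_t \log \pi_\phi(a_t \mid s_t) = r(x) + \alpha \sum_t \log \pi_{\text{ref}}(a_t \mid s_t).
\]
Rearranging with \(V^{\text{soft}}_\psi(s_0) = \alpha \log Z_\psi\):
\[
V^{\text{soft}}_\psi(s_0) = r(x) + \alpha \sum_t \log \frac{\pi_{\text{ref}}(a_t \mid s_t)}{\pi_\phi(a_t \mid s_t)}.
\]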
This is exactly the Trust-PCL consistency condition (Nachum et al. 2017, Equation 3). The KL regularization term \(\alpha \sum_t \log(\pi_{\text{ref}} / \pi_\phi)\) emerges naturally from the ratio of prior to posterior trajectory log-probabilities in the original RTB equation — no separate KL penalty is added; it is an intrinsic consequence of the balance condition.
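The derivation can be checked on a toy case. The sketch below assumes single-step trajectories \(s_0 \to x\) over a hypothetical three-element set with log-reward \(r(x)\), and builds the fixed point \(\pi_\phi(x) \propto \pi_{\text{ref}}(x)\exp(\beta\, r(x))\) with \(Z_\psi\) as its normalizer. It then confirms that the RTB balance residual vanishes and that \(V^{\text{soft}}_\psi(s_0) = \alpha \log Z_\psi\) satisfies the Trust-PCL consistency condition for every \(x\). The reward values and the closed-form fixed point are illustrative assumptions, not library output.

```python
import math

# Hypothetical toy problem: one-step trajectories s0 -> x.
pi_ref = {"a": 0.5, "b": 0.3, "c": 0.2}  # reference (prior) policy
r = {"a": 1.0, "b": 2.5, "c": -0.5}      # log-reward r(x)
beta = 0.5
alpha = 1.0 / beta

# Fixed point: pi_phi(x) = pi_ref(x) * exp(beta * r(x)) / Z
weights = {x: pi_ref[x] * math.exp(beta * r[x]) for x in pi_ref}
z = sum(weights.values())
log_z = math.log(z)
pi_phi = {x: w / z for x, w in weights.items()}
v_soft_s0 = alpha * log_z

rtb_residuals = {}
tpcl_residuals = {}
for x in pi_ref:
    # RTB balance: log Z + log pi_phi(x) - beta * r(x) - log pi_ref(x) = 0
    rtb_residuals[x] = (
        log_z + math.log(pi_phi[x]) - beta * r[x] - math.log(pi_ref[x])
    )
    # Trust-PCL consistency: V(s0) = r(x) + alpha * log(pi_ref(x) / pi_phi(x))
    tpcl_residuals[x] = v_soft_s0 - (
        r[x] + alpha * math.log(pi_ref[x] / pi_phi[x])
    )

print(rtb_residuals)
print(tpcl_residuals)  # both close to 0 up to floating-point rounding
```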
References
Deleu et al. “Relative Trajectory Balance is equivalent to Trust-PCL” (2025, arXiv:2509.01632).
Nachum et al. “Trust-PCL: An Off-Policy Trust Region Method for Continuous Control” (NeurIPS 2017, arXiv:1707.01891).
Venkatraman et al. “Amortizing intractable inference in diffusion models for vision, language, and control” (NeurIPS 2024, arXiv:2405.20971).
Functions
| rtb_to_trust_pcl_params(logZ, beta) | Convert RTB parameters to Trust-PCL parameters. |
| trust_pcl_to_rtb_params(alpha, v_soft_s0) | Convert Trust-PCL parameters to RTB parameters. |
Module Contents
- gfn.utils.trust_pcl.rtb_to_trust_pcl_params(logZ, beta)
Convert RTB parameters to Trust-PCL parameters.
- Parameters:
logZ (torch.Tensor | float) – RTB log-partition function \(\log Z_\psi\).
beta (torch.Tensor | float) – RTB reward scaling \(\beta\).
- Returns:
alpha = 1 / beta (Trust-PCL temperature)
v_soft_s0 = alpha * logZ (soft value function at \(s_0\))
- Return type:
Dictionary with keys "alpha" and "v_soft_s0"
Example:
>>> from gfn.utils.trust_pcl import rtb_to_trust_pcl_params
>>> rtb_to_trust_pcl_params(logZ=2.0, beta=0.5)
{'alpha': 2.0, 'v_soft_s0': 4.0}
- gfn.utils.trust_pcl.trust_pcl_to_rtb_params(alpha, v_soft_s0)
Convert Trust-PCL parameters to RTB parameters.
- Parameters:
alpha (torch.Tensor | float) – Trust-PCL temperature \(\alpha\).
v_soft_s0 (torch.Tensor | float) – Soft value function at \(s_0\), i.e. \(V^{\text{soft}}_\psi(s_0)\).
- Returns:
beta = 1 / alpha (RTB reward scaling)
logZ = v_soft_s0 / alpha (RTB log-partition function)
- Return type:
Dictionary with keys "beta" and "logZ"
Example:
>>> from gfn.utils.trust_pcl import trust_pcl_to_rtb_params
>>> trust_pcl_to_rtb_params(alpha=2.0, v_soft_s0=4.0)
{'beta': 0.5, 'logZ': 2.0}
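The two conversions are inverses of each other. As a hedged illustration, the following pure-Python sketch restates the documented formulas. It is not the library's source: it handles plain floats only (the documented functions also accept torch.Tensor), and the *_sketch names are hypothetical.

```python
def rtb_to_trust_pcl_sketch(logZ, beta):
    """Sketch of the documented RTB -> Trust-PCL mapping (floats only)."""
    alpha = 1.0 / beta
    return {"alpha": alpha, "v_soft_s0": alpha * logZ}


def trust_pcl_to_rtb_sketch(alpha, v_soft_s0):
    """Sketch of the documented Trust-PCL -> RTB mapping (floats only)."""
    return {"beta": 1.0 / alpha, "logZ": v_soft_s0 / alpha}


# The output keys of each function match the other's parameter names,
# so ** unpacking composes them into a round trip.
params = rtb_to_trust_pcl_sketch(logZ=2.0, beta=0.5)
print(params)                             # {'alpha': 2.0, 'v_soft_s0': 4.0}
print(trust_pcl_to_rtb_sketch(**params))  # {'beta': 0.5, 'logZ': 2.0}
```

Because the documented functions share the same key and parameter names, the same `**` composition should also work with rtb_to_trust_pcl_params and trust_pcl_to_rtb_params. This is inferred from the signatures above rather than tested here.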