gfn.gym.hypergrid
=================

.. py:module:: gfn.gym.hypergrid

.. autoapi-nested-parse::

   Adapted from https://github.com/Tikquuss/GflowNets_Tutorial


Attributes
----------

.. autoapisummary::

   gfn.gym.hypergrid.logger


Classes
-------

.. autoapisummary::

   gfn.gym.hypergrid.BitwiseXORReward
   gfn.gym.hypergrid.ConditionalHyperGrid
   gfn.gym.hypergrid.ConditionalMultiScaleReward
   gfn.gym.hypergrid.CosineReward
   gfn.gym.hypergrid.DeceptiveReward
   gfn.gym.hypergrid.GridReward
   gfn.gym.hypergrid.HyperGrid
   gfn.gym.hypergrid.MultiplicativeCoprimeReward
   gfn.gym.hypergrid.OriginalReward
   gfn.gym.hypergrid.SparseReward


Functions
---------

.. autoapisummary::

   gfn.gym.hypergrid._first_k_dims
   gfn.gym.hypergrid.get_bitwise_xor_presets
   gfn.gym.hypergrid.get_conditional_multiscale_presets
   gfn.gym.hypergrid.get_cosine_presets
   gfn.gym.hypergrid.get_deceptive_presets
   gfn.gym.hypergrid.get_multiplicative_coprime_presets
   gfn.gym.hypergrid.get_original_presets
   gfn.gym.hypergrid.get_reward_presets
   gfn.gym.hypergrid.get_sparse_presets
   gfn.gym.hypergrid.lcm
   gfn.gym.hypergrid.lcm_multiple
   gfn.gym.hypergrid.smallest_multiplier_to_integers


Module Contents
---------------

.. py:class:: BitwiseXORReward(height, ndim, **kwargs)

   Bases: :py:obj:`GridReward`

   Tiered, compositional reward based on bitwise XOR/parity constraints.

   This class implements the "Bitwise/XOR fractal" environment family, in which
   tiers progressively constrain bit-planes across a subset of dimensions via
   linear parity checks over GF(2). It supports easy sharding by high-bit
   prefixes, and difficulty control by adjusting which bit-planes and how many
   dimensions are constrained per tier.

   GF(2) is the finite field with two elements {0, 1}, where addition and
   multiplication are performed modulo 2. In this context, vector addition is
   equivalent to bitwise XOR, and matrix-vector products (A @ b) are evaluated
   entrywise modulo 2.
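
   As a rough illustration of these GF(2) rules, the sketch below checks parity
   constraints on a single bit-plane in plain Python (no torch). The helpers
   ``bit_plane``, ``gf2_matvec`` and ``satisfies_parity`` are hypothetical names
   chosen for this example, not part of ``gfn.gym.hypergrid``, and the default
   branch assumes the even-parity rule described under ``parity_checks``.

   ```python
   # Hypothetical helpers illustrating GF(2) parity checks on one bit-plane;
   # a sketch, not the actual BitwiseXORReward implementation.

   def bit_plane(coords, k):
       """Return b, the k-th bit of each coordinate (values in {0, 1})."""
       return [(x >> k) & 1 for x in coords]


   def gf2_matvec(A, b):
       """Compute A @ b over GF(2): multiplication is AND, addition is XOR."""
       result = []
       for row in A:
           acc = 0
           for a_ij, b_j in zip(row, b):
               acc ^= a_ij & b_j
           result.append(acc)
       return result


   def satisfies_parity(coords, k, A=None, c=None):
       """Check A @ b == c (mod 2) at bit-plane k; default is even parity."""
       b = bit_plane(coords, k)
       if A is None:
           # Default rule: a single even-parity check, sum(b) mod 2 == 0.
           return sum(b) % 2 == 0
       return gf2_matvec(A, b) == list(c)


   coords = [5, 3, 6]  # binary 101, 011, 110
   print([satisfies_parity(coords, k) for k in range(3)])  # [True, True, True]
   print(gf2_matvec([[1, 1, 0], [0, 1, 1]], bit_plane(coords, 0)))  # [0, 1]
   ```

   Because every bit-plane of ``[5, 3, 6]`` has even parity, the XOR of the
   three coordinates (``5 ^ 3 ^ 6``) is 0: even parity on all planes is the
   bit-plane view of "the coordinates XOR to zero".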

   Reward form::

      R(s) = R0 + Σ_t tier_weights[t] · 1[ state satisfies all constraints up to tier t ]

   Key kwargs (with reasonable defaults):

   - R0: float, base reward (default 0.0).
   - tier_weights: list[float], strictly increasing weights for each tier.
   - dims_constrained: Optional[list[int]], subset of dims to constrain
     (default: all dims).
   - bits_per_tier: list[tuple[int, int]]; for each tier t, the inclusive bit
     range (low_bit, high_bit). Example: [(0, 5), (0, 7), (0, 9)].
   - parity_checks: Optional[list[dict]]; per tier, an optional parity system.
     Each entry may contain ``{"A": IntTensor[num_checks, m], "c": IntTensor[num_checks]}``,
     where m = len(dims_constrained). Constraints apply identically to every
     bit-plane specified for that tier: ``A @ b (mod 2) == c``, where b are the
     bit values across constrained dimensions at the tested bit-plane. If
     omitted for a tier, a single even-parity check across all constrained dims
     is used by default: ``sum(b) mod 2 == 0``.

   Difficulty presets align with step ranges by controlling the highest bit used
   and the number of constrained dimensions. Typical distance from the origin for
   valid modes scales roughly like ``constrained_dims · 2^{highest_bit}``.

   Comparison with other compositional rewards:

   - MultiplicativeCoprimeReward: number-theoretic (prime factorization); same
     constraint type per tier (tighter exponent caps), no cross-scale dependency.
   - ConditionalMultiScaleReward: base-B digit decomposition with conditional
     constraints across scales; each tier's rule is a function of prior tiers,
     introducing qualitatively different structure at each level.
   - This class: GF(2) linear algebra on bit-planes; same parity check type per
     tier, but applied to progressively wider bit windows. Constraints at each
     bit-plane are independent of other planes.

   .. py:attribute:: R0
      :type: float

   .. py:method:: __call__(states_tensor)

   .. py:method:: _apply_parity_checks(bits_plane, tier_idx)

      Apply GF(2) linear parity checks at a single bit-plane.

      ``bits_plane``: (..., m) with m = len(dims_constrained), integer in {0, 1}.
      Returns a (...,) bool mask.

   .. py:method:: _even_parity_mask(bits)

      ``bits``: (..., m) int/bool. Returns a (...,) bool mask that is True for
      even parity.

   .. py:attribute:: bits_per_tier
      :type: list[tuple[int, int]]

   .. py:attribute:: dims_constrained
      :type: list[int]

   .. py:attribute:: parity_checks

   .. py:attribute:: tier_weights
      :type: list[float]


.. py:class:: ConditionalHyperGrid(*args, **kwargs)

   Bases: :py:obj:`HyperGrid`

   HyperGrid environment with condition-aware rewards.

   The condition c is a real value in [0, 1], and the reward is a linear
   interpolation between the uniform reward and the original reward. Special
   cases are:

   - c = 0: Uniform reward (all terminal states get reward = R0 + R1 + R2).
   - c = 1: Original HyperGrid reward (original multi-modal reward landscape).

   .. py:attribute:: _log_partition_cache
      :type: dict[torch.Tensor, float]

   .. py:attribute:: _max_reward
      :type: float

   .. py:attribute:: _original_reward_fn

   .. py:attribute:: _true_dist_cache
      :type: dict[torch.Tensor, torch.Tensor]

   .. py:attribute:: condition_dim
      :type: int
      :value: 1

   .. py:attribute:: is_conditional
      :type: bool
      :value: True

   .. py:method:: log_partition(condition)

      Compute the log partition function for the given condition.

      :param condition: The condition to compute the log partition for.
                        ``condition.shape`` should be ``(1,)``.
      :returns: The log partition function, as a float.

   .. py:method:: reward(states)

      Compute rewards for the conditional environment.

      The condition is continuous from 0 to 1:

      - 0: Fully uniform reward (all states get R0 + R1 + R2).
      - 1: Fully original HyperGrid reward.
      - In between: linear interpolation between uniform and original.

      :param states: The states to compute rewards for.
                     ``states.tensor.shape`` should be ``(*batch_shape, *state_shape)``.
      :returns: A tensor of shape ``(*batch_shape,)`` containing the rewards.

   .. py:method:: sample_conditions(batch_shape)

      Sample conditions for the environment.

   .. py:method:: true_dist(condition)

      Compute the true distribution for the given condition.

      :param condition: The condition to compute the true distribution for.
                        ``condition.shape`` should be ``(1,)``.
      :returns: The true distribution for the given condition as a
                1-dimensional tensor.


.. py:class:: ConditionalMultiScaleReward(height, ndim, **kwargs)

   Bases: :py:obj:`GridReward`

   Tiered reward via conditional digit constraints across spatial scales.

   Each coordinate is decomposed in base B into L = log_B(H) digits. Tier t
   constrains digit t-1 via a shifted filter that depends on all finer-scale
   digits, creating a hierarchy where learning lower-scale structure is a
   prerequisite for predicting higher-scale constraints.

   Per-dimension constraint at tier t::

      (d_{t-1}(i) + sigma_t(i)) mod B < f_t

   where ``sigma_t(i) = sum_{k=0}^{t-2} a_{t,k} * d_k(i) mod B`` is a linear
   function of finer-scale digits with seed-derived coefficients ``a_{t,k}``.

   Optional cross-dimensional constraint at tier t::

      sum_i d_{t-1}(i) ≡ 0 (mod m_t)

   Reward form (cumulative: tier t requires all tiers 1..t)::

      R(s) = R0 + sum_t tier_weights[t] * 1[s satisfies tiers 1..t]

   Mode count (exact closed form)::

      Without cross-dim: modes_T = (prod_{t=1}^T f_t)^d * B^{(L-T)*d}
      With cross-dim:    modes_T = (prod_t f_t)^d * B^{(L-T)*d} / prod_t m_t

   Partition function (analytic, no enumeration)::

      Z = R0 * H^d + sum_t w_t * modes_t

   Key kwargs:

   - R0: float, base reward (default 0.0).
   - tier_weights: list[float], reward weight per tier.
   - base: int, digit base B (default 4). H must be a power of B.
   - filter_width: int, number of passing digit values per tier (default B//2).
     Constant across tiers to avoid mode collapse at deep tiers.
   - seed: int, PRNG seed for generating shift coefficients (default 42).
   - cross_dim_mods: Optional[list[int | None]], per-tier modular cross-dim
     constraint. m_t must divide filter_width for exact mode counts.
     Default: no cross-dim constraints.
   - active_dims: Optional[list[int]], subset of dims to constrain
     (default: all dims).

   Comparison with other compositional rewards:

   - BitwiseXORReward: GF(2) parity checks on bit-planes; each tier widens the
     bit window but uses the same rule type. No cross-scale dependency.
   - MultiplicativeCoprimeReward: prime factorization with tightening exponent
     caps. Same constraint type at every tier.
   - This class: each tier introduces a qualitatively different constraint whose
     form depends on what was learned at prior tiers (conditional structure
     across scales).

   .. py:attribute:: R0
      :type: float

   .. py:method:: __call__(states_tensor)

   .. py:method:: _extract_digits(x, num_levels)

      Extract base-B digits from x.

      :param x: (..., m) integer tensor with coordinate values.
      :param num_levels: How many digit levels to extract.
      :returns: List of ``num_levels`` tensors, each (..., m), with digit values
                in [0, B). ``digits[k]`` is the k-th digit (scale k), i.e.,
                ``floor(x / B^k) mod B``.

   .. py:attribute:: active_dims
      :type: list[int]

   .. py:method:: analytic_log_partition()

      Compute log(Z) analytically, where ``Z = R0 * H^d + sum_t w_t * modes_t``.

   .. py:method:: analytic_mode_count(tier = None)

      Compute the exact mode count for a given tier (1-indexed) or all tiers.

      :param tier: 1-indexed tier number. If None, returns the count for the
                   highest (most constrained) tier.
      :returns: Number of states satisfying all constraints up to the given tier.

   .. py:attribute:: base
      :type: int

   .. py:attribute:: cross_dim_mods
      :type: list[int | None]
      :value: []

   .. py:attribute:: filter_width
      :type: int

   .. py:method:: mode_threshold(target_sparsity = 0.1)

      Return the reward threshold for mode counting at the adaptive tier.

      States with reward >= this value are counted as modes. The tier is chosen
      via ``mode_tier(target_sparsity)`` so that mode coverage adapts to
      dimensionality.

      :param target_sparsity: Passed to ``mode_tier()``.
      :returns: Reward threshold (R0 + sum of weights up to the mode tier).

   .. py:method:: mode_tier(target_sparsity = 0.1)

      Return the lowest tier whose mode coverage is below ``target_sparsity``.

      Coverage at tier t is ``(f/B)^(t*d)`` (before cross-dim constraints). This
      adapts the mode definition to dimensionality: at low d, deeper tiers are
      needed for modes to be sparse; at high d, even tier 1 is already a needle
      in a haystack.

      :param target_sparsity: Fraction of the total state space below which
                              modes are considered "interesting"
                              (default 0.10 = 10%).
      :returns: 1-indexed tier number, clamped to [1, num_tiers].

   .. py:attribute:: num_levels
      :type: int
      :value: 0

   .. py:attribute:: seed
      :type: int

   .. py:attribute:: shift_coeffs
      :type: list[list[int]]
      :value: []

   .. py:attribute:: tier_weights
      :type: list[float]


.. py:class:: CosineReward(height, ndim, **kwargs)

   Bases: :py:obj:`GridReward`

   Cosine reward function.

   .. py:method:: __call__(states_tensor)


.. py:class:: DeceptiveReward(height, ndim, **kwargs)

   Bases: :py:obj:`GridReward`

   Deceptive reward function from Adaptive Teachers (Kim et al., 2025).

   Note that the reward definition in the paper (eq. (9)) is incorrect; we
   follow the official implementation
   (https://github.com/alstn12088/adaptive-teacher/blob/8cfcb2298fce3f46eb36ead03791eeee75b7d066/grid/env.py#L27)
   but use EPS = 1e-12 to make its inequalities robust to floating-point error.

   .. py:method:: __call__(states_tensor)


.. py:class:: GridReward(height, ndim, **kwargs)

   Bases: :py:obj:`abc.ABC`

   Base class for reward functions that can be pickled.

   .. py:attribute:: _EPS
      :value: 1e-12

   .. py:method:: __call__(states_tensor)
      :abstractmethod:

   .. py:attribute:: height

   .. py:attribute:: kwargs

   .. py:attribute:: ndim


.. py:class:: HyperGrid(ndim = 2, height = 8, reward_fn_str = 'original', reward_fn_kwargs = None, device = 'cpu', calculate_partition = False, store_all_states = False, debug = False, validate_modes = True, mode_stats = 'none', mode_stats_samples = 20000)

   Bases: :py:obj:`gfn.env.DiscreteEnv`

   HyperGrid environment from the GFlowNets paper.

   The states are represented as 1-d tensors of length ``ndim`` with values in
   ``{0, 1, ..., height - 1}``.

   .. attribute:: ndim

      The dimension of the grid.

   .. attribute:: height

      The height of the grid.

   .. attribute:: reward_fn

      The reward function.

   .. attribute:: calculate_partition

      Whether to calculate the log partition function.

   .. attribute:: store_all_states

      Whether to store all states.

   .. attribute:: validate_modes

      Whether to check at init that at least one state reaches the mode
      threshold; raises if not.

   .. attribute:: mode_stats

      One of {"none", "approx", "exact"}. If not "none", computes (exact or
      approximate) ``n_modes`` and ``n_mode_states``. "exact" requires
      ``store_all_states=True`` and enumerates all states.

   .. attribute:: mode_stats_samples

      Number of random samples when ``mode_stats="approx"``.

   .. py:attribute:: States
      :type: type[gfn.states.DiscreteStates]

   .. py:attribute:: _all_states_tensor
      :value: None

   .. py:method:: _enumerate_all_states_tensor(batch_size = 20000)

      Enumerate all grid states, optionally storing them and computing log Z.

      Iterates over the full Cartesian product ``{0, ..., H-1}^D`` in batches
      (via multiprocessing) to avoid materializing all ``H^D`` states at once.

      :param batch_size: Number of states per batch.

   .. py:method:: _exists_bitwise_xor(thr)

      Feasibility and constructive check for ``BitwiseXORReward``.

      Steps:

      - For each tier, verify that the GF(2) parity system has at least one
        solution using Gaussian elimination modulo 2. If any tier is
        infeasible, no mode exists.
      - The all-zero configuration satisfies even-parity constraints, so if the
        tiers are feasible we evaluate that point against the threshold with
        tolerance.

   .. py:method:: _exists_cosine(thr)

      Analytic upper-bound check for ``CosineReward``.

      Idea:

      - The per-dimension factor is ``(cos(50·ax) + 1) · N(0,1)(5·ax)`` with ax
        in [0, 0.5]. We estimate its maximum over the discrete grid by
        evaluating all candidate ax and taking the maximum value ``m``.
      - The full reward upper bound is ``R0 + m^D * R1``. If this is at least
        the mode target and the given threshold, a mode-level state must exist.
      - We also compute a theoretical per-dimension peak (at ax ≈ 0) to form a
        slightly conservative target scaled by ``mode_gamma``.

   .. py:method:: _exists_fallback_random(thr)

      Random sampling fallback.

      Draw a modest batch of random states on CPU and accept if any of them
      exceeds the threshold within a small tolerance. This is a last resort to
      avoid expensive enumeration for large grids.

   .. py:method:: _exists_multiplicative_coprime(thr)

      Number-theoretic constructive check for ``MultiplicativeCoprimeReward``.

      Constructs a candidate state by factoring the target LCM (if any) over the
      allowed primes, assigning each prime power to a separate active
      dimension, and verifying coprimality and grid-bound constraints.

   .. py:method:: _exists_original_or_deceptive(thr)

      Constructive check for ``OriginalReward`` and ``DeceptiveReward``.

      Intuition:

      - These rewards form rings/bands around the center when each coordinate
        is normalized to [0, 1]. The mode lies on a thin band at specific
        normalized distances from the center.
      - We translate those fractional band boundaries into integer indices via
        small inside/outside nudges (using ``EPS_INDEX_CMP``) and test one
        candidate index from any non-empty feasible interval.
      - If the reward at that candidate exceeds the threshold (with
        ``EPS_REWARD_CMP`` tolerance), we return True.

   .. py:method:: _exists_sparse(thr)

      Constructive check for ``SparseReward``.

      This reward assigns positive mass only to a finite set of target
      configurations. When ``H>=2`` and ``D>=1``, a known target is the zero
      vector except for certain coordinates fixed at 1 or ``H-2``. We probe a
      canonical target and confirm the threshold is not above its reward.

   .. py:method:: _generate_combinations_in_batches(ndim, max_val, batch_size)

      Yield batches of the Cartesian product ``{0, ..., max_val}^ndim``.

      Uses multiprocessing to avoid materializing the full product (size
      ``(max_val+1)^ndim``) in memory.

      :param ndim: Number of dimensions (tuple length).
      :param max_val: Maximum coordinate value (inclusive).
      :param batch_size: Number of tuples per batch.
      :Yields: An iterator of tuples for each batch.

   .. py:attribute:: _log_partition
      :value: None

   .. py:method:: _mode_reward_threshold()

      Returns the reward threshold used to define a mode.

      By default, a state is considered in a mode if its reward is at least the
      schema-defined threshold derived from the configured reward.

   .. py:attribute:: _mode_stats_kind
      :type: str
      :value: 'none'

   .. py:method:: _modes_exist_quick_check()

      Lightweight check that a mode-level state exists.

      In simple terms, this answers "Is there at least one state whose reward
      reaches the mode threshold?" without enumerating all states. It proceeds
      in three stages:

      1) If the grid is small (or pre-enumerated), it computes rewards exactly
         and checks them against the threshold.
      2) Otherwise, it dispatches to reward-specific constructive tests that
         are sufficient to guarantee at least one state reaches the threshold.
      3) As a last resort, it samples a small batch of random states.

   .. py:method:: _modes_exist_quick_check_info()

      Same as ``_modes_exist_quick_check`` but returns ``(ok, message)``.

   .. py:attribute:: _n_mode_states_estimate
      :type: float | None
      :value: None

   .. py:attribute:: _n_mode_states_exact
      :type: int | None
      :value: None

   .. py:method:: _solve_gf2_has_solution(A, c)
      :staticmethod:

      Return True if ``A x = c`` over GF(2) has at least one solution.

      Performs Gaussian elimination modulo 2 (XOR arithmetic) without
      constructing a specific solution. A row that reduces to all-zero
      coefficients with a non-zero right-hand side (``0 = 1``) indicates
      inconsistency.

   .. py:attribute:: _true_dist
      :value: None

   .. py:method:: _worker(task)

      Return a slice of the Cartesian product for one batch.

      :param task: ``(values, ndim, start_idx, end_idx)``, where ``values`` is
                   the list of coordinate values, ``ndim`` is the number of
                   dimensions, and ``[start_idx, end_idx)`` is the range within
                   the full product.

   .. py:method:: all_indices()

      Generate all possible indices for the grid.

      :returns: A list of all possible indices for the grid.

   .. py:property:: all_states
      :type: gfn.states.DiscreteStates | None

      Returns all hypergrid states as a ``DiscreteStates`` instance.

   .. py:method:: backward_step(states, actions)

      Performs a backward step in the environment.

      :param states: The current states.
      :param actions: The actions to take.
      :returns: The previous states.

   .. py:attribute:: calculate_partition
      :value: False

   .. py:method:: get_states_indices(states)

      Get the indices of the states in the canonical ordering.

      :param states: The states to get the indices of.
      :returns: The indices of the states in the canonical ordering.

   .. py:method:: get_terminating_states_indices(states)

      Get the indices of the terminating states in the canonical ordering.

      :param states: The states to get the indices of.
      :returns: The indices of the terminating states in the canonical ordering.

   .. py:attribute:: height
      :value: 8

   .. py:method:: log_partition(condition=None)

      Returns the log partition function of the reward.

   .. py:method:: make_random_states(batch_shape, conditions = None, device = None, debug = False)

      Creates a batch of random states.

      :param batch_shape: The shape of the batch.
      :param conditions: Optional tensor of shape ``(*batch_shape, condition_dim)``
                         containing condition vectors for conditional GFlowNets.
      :param device: The device to use.
      :param debug: If True, emit States with debug guards (not compile-friendly).
      :returns: A ``DiscreteStates`` object with random states.

   .. py:method:: make_states_class()

      Returns the DiscreteStates class for the HyperGrid environment.

   .. py:method:: mode_mask(states)

      Boolean mask indicating which states are in a mode.

      A state is flagged as a mode if its reward is greater than or equal to
      the threshold based on ``reward_fn_kwargs`` (R0 + R1 + R2 by default).

   .. py:method:: modes_found(states)

      Returns the set of canonical state indices for mode states in the batch.

      Each mode state is identified by its unique canonical index (from
      ``get_states_indices``), not by a quadrant-based grouping. This allows
      correct mode-state tracking for all reward functions.

   .. py:property:: n_mode_states
      :type: int | float | None

      Number of states inside a mode (exact, approximate, or None).

      - If ``mode_stats="exact"``, returns an exact integer count.
      - If ``mode_stats="approx"``, returns a floating-point estimate.
      - If ``store_all_states`` is True (but ``mode_stats`` was "none"),
        computes the count on demand from ``all_states``.
      - Otherwise, returns None.

   .. py:property:: n_modes
      :type: int | float | None

      Returns the total number of mode states for this environment.

      Equivalent to ``n_mode_states``. Each individual grid cell whose reward
      meets the mode threshold counts as one mode.

   .. py:property:: n_states
      :type: int

      Returns the number of states in the environment.

   .. py:property:: n_terminating_states
      :type: int

      Returns the number of terminating states in the environment.

   .. py:attribute:: ndim
      :value: 2

   .. py:method:: reward(states)

      Computes the reward for a batch of final states.

      In the normal setting, the reward is:

      .. math::

         R(s) = R_0 + 0.5 \prod_{d=1}^D \mathbf{1} \left( \left\lvert \frac{s^d}{H-1} - 0.5 \right\rvert \in (0.25, 0.5] \right) + 2 \prod_{d=1}^D \mathbf{1} \left( \left\lvert \frac{s^d}{H-1} - 0.5 \right\rvert \in (0.3, 0.4) \right)

      :param states: The final states.
      :returns: The reward of the final states.

   .. py:attribute:: reward_fn

   .. py:attribute:: reward_fn_kwargs
      :value: None

   .. py:method:: step(states, actions)

      Performs a step in the environment.

      :param states: The current states.
      :param actions: The actions to take.
      :returns: The next states.

   .. py:attribute:: store_all_states
      :value: False

   .. py:property:: terminating_states
      :type: gfn.states.DiscreteStates | None

      Returns all terminating states of the environment.

   .. py:method:: true_dist(condition=None)

      Returns the pmf over all states in the hypergrid.


.. py:class:: MultiplicativeCoprimeReward(height, ndim, **kwargs)

   Bases: :py:obj:`GridReward`

   Tiered reward based on prime support and coprimality/LCM composition.

   Each tier enforces that per-dimension values use only a small shared prime
   set with bounded exponents, plus optional cross-dimension constraints
   (pairwise coprime pairs and/or a target LCM). Higher tiers tighten exponent
   caps or add further global targets. This encourages information sharing to
   learn the latent prime/exponent structure.

   Reward form::

      R(s) = R0 + Σ_t tier_weights[t] · 1[ constraints_0..t all satisfied ]

   Key kwargs:

   - R0: float, base reward (default 0.0).
   - tier_weights: list[float].
   - primes: list[int], e.g., [2, 3, 5, 7, 11].
   - exponent_caps: list[int], same length as tier_weights. Cap for every prime
     at tier t (uniform cap across primes for simplicity).
   - active_dims: Optional[list[int]]; constraints only apply to these dims
     (default: all dims). Other dims are ignored in constraints.
   - coprime_pairs: Optional[list[tuple[int, int]]]; indices relative to
     active_dims.
   - target_lcms: Optional[list[int | None]]; per-tier target LCM across
     active dims.

   Notes:

   - Value 0 is treated as invalid for prime-support constraints (it cannot be
     factorized); value 1 is valid with all-zero exponents.
   - The implementation divides out each allowed prime up to the current
     tier's exponent cap and checks that the residue equals 1. Exponent counts
     are accumulated to evaluate LCM targets.

   Comparison with other compositional rewards:

   - BitwiseXORReward: GF(2) parity checks on bit-planes; same constraint type
     per tier (wider bit window), no cross-scale dependency.
   - ConditionalMultiScaleReward: base-B digit decomposition with conditional
     constraints across scales; each tier's rule depends on prior tiers.
   - This class: prime factorization with bounded exponents and optional
     coprimality/LCM targets. Same constraint type per tier (tighter caps), but
     cross-dimension coupling via coprime pairs and LCM targets.

   .. py:attribute:: R0
      :type: float

   .. py:method:: __call__(states_tensor)

   .. py:method:: _factor_exponents_up_to_cap(v, cap)

      Trial-divide each element by the allowed primes, returning the residue
      and exponents.

      :param v: (...,) LongTensor of non-negative values to factorize.
      :param cap: Maximum number of times each prime may divide a value.
      :returns: A tuple ``(residue, exps)``. ``residue`` is a (...,) tensor of
                values after stripping allowed primes (1 if fully factored);
                ``exps`` is a [num_primes, ...] tensor of exponent counts per
                prime (the leading axis indexes primes).

   .. py:method:: _lcm_ok(exps, target_lcm)

      Check whether the max exponents across dims match the target LCM's
      factorization.

      :param exps: [num_primes, ..., num_active_dims] exponent counts.
      :param target_lcm: The target LCM value to match.
      :returns: (...,) bool mask, True where the LCM of active-dim values equals
                the target.

   .. py:method:: _pairwise_coprime_ok(v)

      Check that configured dimension pairs share no common allowed prime.

      :param v: (..., num_active_dims) coordinate values.
      :returns: (...,) bool mask, True where all coprime pair constraints hold.

   .. py:attribute:: active_dims
      :type: list[int]

   .. py:attribute:: coprime_pairs

   .. py:attribute:: exponent_caps
      :type: list[int]

   .. py:attribute:: primes
      :type: list[int]

   .. py:attribute:: target_lcms

   .. py:attribute:: tier_weights
      :type: list[float]


.. py:class:: OriginalReward(height, ndim, **kwargs)

   Bases: :py:obj:`GridReward`

   The reward function from the original GFlowNet paper (Bengio et al., 2021;
   https://arxiv.org/abs/2106.04399).

   .. py:method:: __call__(states_tensor)


.. py:class:: SparseReward(height, ndim, **kwargs)

   Bases: :py:obj:`GridReward`

   Sparse reward function from the GAFN paper (Pan et al., 2022;
   https://arxiv.org/abs/2210.03308).

   .. py:method:: __call__(states_tensor)

   .. py:attribute:: targets


.. py:function:: _first_k_dims(k, ndim)

   Return indices [0, 1, ..., min(k, ndim)-1] for the first k dimensions.


.. py:function:: get_bitwise_xor_presets(ndim, height)

   Return five difficulty presets for BitwiseXORReward.

   The presets target approximate L1 distance bands by selecting the highest
   constrained bit and the number of constrained dimensions. Typical distance
   scales like m · 2^b, where m is the number of constrained dims and b is the
   highest bit.

   Bands (steps from s0):

   - easy: ~50-100
   - medium: ~250-500
   - hard: ~1k-2.5k
   - challenging: ~2.5k-5k
   - impossible: 5k+

   Notes:

   - You may tweak m (dims) and bit windows to fine-tune distances for your D, H.
   - Tier weights are geometric to encourage reaching higher tiers.
   - Parity checks default to even parity across constrained dims per bit-plane.


.. py:function:: get_conditional_multiscale_presets(ndim, height)

   Return five difficulty presets for ConditionalMultiScaleReward.

   All presets use base=4 (requiring H to be a power of 4). The number of
   available digit levels is L = log_4(H).

   Difficulty is controlled by:

   - Number of tiers (more tiers = deeper compositional hierarchy)
   - Number of active dims (more = exponentially sparser modes)
   - Cross-dim modular constraints (further sparsification)

   Mode counts are computed via::

      modes_T = (f^T * 4^{L-T})^d / prod_t m_t

   with f = filter_width = 2 (i.e., B//2), so each tier halves the modes per
   coordinate.

   Presets (assuming H=256, i.e., L=4 digit levels):

   - easy: 2 tiers, 3 active dims -> ~2M modes at tier 2
   - medium: 3 tiers, 4 active dims -> ~1M modes at tier 3
   - hard: 3 tiers, 6 active dims, cross-dim -> ~260K modes at tier 3
   - challenging: 4 tiers, 8 active dims, cross-dim -> ~65K modes at tier 4
   - impossible: 4 tiers, 12 active dims, cross-dim -> ~4K modes at tier 4


.. py:function:: get_cosine_presets(ndim, height)

   Return five presets for CosineReward.

   R1 scales the oscillatory product, and ``mode_gamma`` (used only for mode
   detection thresholding) tightens what is considered a "mode-like" maximum.


.. py:function:: get_deceptive_presets(ndim, height)

   Return five presets for DeceptiveReward.

   Increase R2 to accentuate the thin band, and set a small but non-zero R0.
   R1 controls the center emphasis vs. the cancelled outer region.


.. py:function:: get_multiplicative_coprime_presets(ndim, height)

   Return five difficulty presets for MultiplicativeCoprimeReward.

   Bands (steps from s0):

   - easy: ~50-100 (small primes, small exponents, few active dims)
   - medium: ~250-500 (adds one prime, caps=2, more dims, light coupling)
   - hard: ~1k-2.5k (primes up to 11, caps=3, more dims, LCM target)
   - challenging: ~2.5k-5k (primes up to 13, caps=3-4, 10-12 dims, tighter)
   - impossible: 5k+ (primes up to 29, caps=4, 12-16 dims, multiple targets)

   Notes:

   - Distances are approximate; increase primes and exponent caps to push
     further.
   - ``active_dims`` indices are relative to state dims; we pick the first k
     for simplicity.
   - ``coprime_pairs`` are pairs within the ``active_dims`` index space.
   - Tier weights are geometric.


.. py:function:: get_original_presets(ndim, height)

   Return five presets for OriginalReward.

   These presets primarily control the relative importance of the outer ring
   (R1) and the thin band (R2). Exploration difficulty (distance from s0)
   depends more on (D, H) than on these weights; tune D and H externally to
   match your distance bands.


.. py:function:: get_reward_presets(reward_fn_str, ndim, height)

   Return presets for a given reward name: 'bitwise_xor',
   'multiplicative_coprime', 'conditional_multiscale'.

   Usage::

      from gfn.gym.hypergrid import HyperGrid, get_reward_presets

      # D (ndim) and H (height) are the grid dimension and height you choose.
      presets = get_reward_presets("bitwise_xor", D, H)
      kwargs = presets["hard"]
      env = HyperGrid(ndim=D, height=H, reward_fn_str="bitwise_xor", reward_fn_kwargs=kwargs)


.. py:function:: get_sparse_presets(ndim, height)

   Return five presets for SparseReward.

   SparseReward has built-in targets; it ignores most kwargs. Presets are
   provided for API symmetry and future extensibility.


.. py:function:: lcm(a, b)

   Returns the least common multiple of a and b.


.. py:function:: lcm_multiple(numbers)

   Returns the least common multiple of a list of numbers.


.. py:data:: logger


.. py:function:: smallest_multiplier_to_integers(float_vector, precision=3)

   Computes a scale factor (the smallest multiplier that maps the entries of
   ``float_vector`` to integers, up to ``precision`` decimal places) to avoid
   imprecise floating-point arithmetic.
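
The three number-theoretic helpers above are documented only by one-line
summaries. As an illustration of what they compute, here is a minimal
pure-Python sketch. It assumes that ``smallest_multiplier_to_integers`` rounds
each entry to ``precision`` decimal places and returns the least common
multiple of the resulting denominators; that behavior is inferred from the
function's name and signature, not taken from the module's implementation.

```python
import math
from fractions import Fraction
from functools import reduce


def lcm(a, b):
    """Least common multiple of two positive integers."""
    return a * b // math.gcd(a, b)


def lcm_multiple(numbers):
    """Least common multiple of a list of positive integers (1 if empty)."""
    return reduce(lcm, numbers, 1)


def smallest_multiplier_to_integers(float_vector, precision=3):
    """Smallest positive integer k such that k * x is an integer for every x,
    after rounding each x to `precision` decimal places (assumed semantics)."""
    denominators = [
        # limit_denominator strips binary floating-point noise, e.g. 0.1 -> 1/10.
        Fraction(round(x, precision)).limit_denominator(10**precision).denominator
        for x in float_vector
    ]
    return lcm_multiple(denominators)


scale = smallest_multiplier_to_integers([0.5, 0.25, 0.1])
print(scale)  # 20
print([round(x * scale) for x in [0.5, 0.25, 0.1]])  # [10, 5, 2]
```

On Python 3.9+ the standard library also provides ``math.lcm``, which accepts
any number of integer arguments and could replace the two LCM helpers in this
sketch.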