Noise Injection Strategies

This module provides several strategies for injecting noise into hidden states. The key idea is that noise injection should enable “trajectory resampling”: letting the model escape local optima in reasoning space.

Injection State

class structured_stochasticity.injection.InjectionState(step=0, total_steps=None)[source]

Bases: object

Tracks state across injection calls for stateful strategies.

Parameters:
  • step (int)

  • total_steps (int | None)

step: int = 0
total_steps: int | None = None
advance()[source]
reset()[source]
__init__(step=0, total_steps=None)
Parameters:
  • step (int)

  • total_steps (int | None)

Return type:

None
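The reference above lists the fields and methods of InjectionState but not their behavior. The following is a minimal sketch of what such a state tracker might look like. It assumes that advance() increments step by one and that reset() returns step to zero while leaving total_steps unchanged. The class name InjectionStateSketch is hypothetical and is not part of the library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InjectionStateSketch:
    """Hypothetical stand-in for InjectionState; behavior is assumed."""

    step: int = 0
    total_steps: Optional[int] = None

    def advance(self) -> None:
        # Assumption: one call to advance() corresponds to one injection call.
        self.step += 1

    def reset(self) -> None:
        # Assumption: reset only rewinds the counter; total_steps is kept.
        self.step = 0


state = InjectionStateSketch(total_steps=3)
state.advance()
state.advance()
step_after_advance = state.step

state.reset()
step_after_reset = state.step
```

A stateful strategy such as annealing could read state.step to decide how much noise to apply on the current call.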

Base Class

class structured_stochasticity.injection.NoiseInjector(scale=0.1, device='cuda')[source]

Bases: ABC

Abstract base class for noise injection strategies.

Parameters:
  • scale (float)

  • device (str)

__init__(scale=0.1, device='cuda')[source]
Parameters:
  • scale (float)

  • device (str)

abstractmethod sample(shape)[source]

Sample noise tensor of given shape.

Parameters:

shape (tuple[int, ...])

Return type:

torch.Tensor

inject(hidden_states)[source]

Inject noise into hidden states.

Parameters:

hidden_states (torch.Tensor) – Tensor of shape (batch, seq_len, hidden_dim)

Returns:

Perturbed hidden states of same shape

Return type:

torch.Tensor

reset()[source]

Reset internal state (call between generations).
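To illustrate how sample() and inject() might fit together, here is a minimal sketch of the base-class pattern. It assumes that inject() adds the sampled noise to the hidden states, which the reference does not state explicitly. NoiseInjectorSketch and ConstantInjector are hypothetical names, and the sketch defaults to device="cpu" so that it runs without a GPU.

```python
from abc import ABC, abstractmethod

import torch


class NoiseInjectorSketch(ABC):
    """Hypothetical base class; additive injection is an assumption."""

    def __init__(self, scale: float = 0.1, device: str = "cpu"):
        self.scale = scale
        self.device = device

    @abstractmethod
    def sample(self, shape):
        """Return a noise tensor of the given shape."""

    def inject(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Sample noise matching (batch, seq_len, hidden_dim) and add it.
        noise = self.sample(tuple(hidden_states.shape))
        noise = noise.to(dtype=hidden_states.dtype, device=hidden_states.device)
        return hidden_states + noise

    def reset(self) -> None:
        # Stateless by default; stateful subclasses would override this.
        pass


class ConstantInjector(NoiseInjectorSketch):
    """Deterministic toy subclass, used only to make the result checkable."""

    def sample(self, shape):
        return torch.full(shape, self.scale, device=self.device)


hidden = torch.zeros(2, 4, 8)
perturbed = ConstantInjector(scale=0.5).inject(hidden)
```

Subclasses only need to implement sample(); the shape handling and the addition live in one place.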

Gaussian Noise

class structured_stochasticity.injection.GaussianNoiseInjector(scale=0.1, device='cuda')[source]

Bases: NoiseInjector

Injects Gaussian noise scaled by a constant factor.

This is the simplest strategy: z ~ N(0, scale²)

The scale parameter controls the magnitude of the perturbation:
  • Too small: won’t escape attractor basins

  • Too large: destroys coherent reasoning

Parameters:
  • scale (float)

  • device (str)

__init__(scale=0.1, device='cuda')[source]
Parameters:
  • scale (float)

  • device (str)

sample(shape)[source]

Sample noise tensor of given shape.

Parameters:

shape (tuple[int, ...])

Return type:

torch.Tensor
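As a rough illustration of z ~ N(0, scale²), the sketch below draws standard normal noise and multiplies it by scale, which gives a standard deviation of scale. The function name sample_gaussian and the optional generator argument are assumptions made for this example and are not part of the documented API.

```python
import torch


def sample_gaussian(shape, scale=0.1, device="cpu", generator=None):
    # torch.randn draws from N(0, 1); multiplying by scale gives N(0, scale^2).
    return torch.randn(shape, device=device, generator=generator) * scale


# A seeded generator keeps the example reproducible.
gen = torch.Generator().manual_seed(0)
noise = sample_gaussian((64, 32, 16), scale=0.1, generator=gen)
empirical_std = noise.std().item()
```

With this many samples, empirical_std should land very close to the requested scale of 0.1.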

Uniform Noise

class structured_stochasticity.injection.UniformNoiseInjector(scale=0.1, device='cuda')[source]

Bases: NoiseInjector

Injects uniform noise in the range [-scale, scale].

Uniform noise has bounded magnitude, which may be preferable when you want to guarantee perturbations stay within a range.

Parameters:
  • scale (float)

  • device (str)

__init__(scale=0.1, device='cuda')[source]
Parameters:
  • scale (float)

  • device (str)

sample(shape)[source]

Sample noise tensor of given shape.

Parameters:

shape (tuple[int, ...])

Return type:

torch.Tensor
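One plausible way to draw bounded noise is to rescale torch.rand, as sketched below. The function name sample_uniform is hypothetical. Note that torch.rand samples from the half-open interval [0, 1), so this sketch produces values in [-scale, scale) rather than the closed range.

```python
import torch


def sample_uniform(shape, scale=0.1, device="cpu"):
    # Map [0, 1) to [-1, 1), then stretch to [-scale, scale).
    return (torch.rand(shape, device=device) * 2.0 - 1.0) * scale


noise = sample_uniform((4, 16, 32), scale=0.1)
max_abs = noise.abs().max().item()
```

Unlike the Gaussian case, no individual element can exceed the scale in magnitude, however many values are drawn.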

Annealed Noise

class structured_stochasticity.injection.AnnealedNoiseInjector(scale=0.1, anneal_factor=0.95, min_scale=0.01, device='cuda')[source]

Bases: NoiseInjector

Injects noise that decreases over the generation process.

Motivation: perturb strongly early, when problem framing matters most, then taper toward stability later, as the solution crystallizes.

Scale at step t: scale * (anneal_factor ^ t)

This mirrors a natural intuition: you want to explore different framings early, then commit and execute once you’ve found a good path.

Parameters:
  • scale (float)

  • anneal_factor (float)

  • min_scale (float)

  • device (str)

__init__(scale=0.1, anneal_factor=0.95, min_scale=0.01, device='cuda')[source]
Parameters:
  • scale (float)

  • anneal_factor (float)

  • min_scale (float)

  • device (str)

property current_scale: float
sample(shape)[source]

Sample noise tensor of given shape.

Parameters:

shape (tuple[int, ...])

Return type:

torch.Tensor
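The decay schedule can be sketched in plain Python. The documented formula does not say how min_scale enters, so this sketch assumes it acts as a lower bound on the decayed value. The function name annealed_scale is hypothetical.

```python
def annealed_scale(step, scale=0.1, anneal_factor=0.95, min_scale=0.01):
    # Exponential decay of the base scale over steps.
    decayed = scale * anneal_factor ** step
    # Assumption: min_scale is a floor, so the noise never vanishes entirely.
    return max(min_scale, decayed)


schedule = [annealed_scale(t) for t in (0, 1, 10, 100)]
last_step_above_floor = annealed_scale(44)
first_step_at_floor = annealed_scale(45)
```

Under this assumption and the default arguments, the decayed value first drops below min_scale at step 45, after which the scale stays constant at 0.01.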

Layer-Selective Injection

class structured_stochasticity.injection.LayerSelectiveInjector(layer_scales, default_scale=0.0, device='cuda')[source]

Bases: NoiseInjector

Applies different noise scales to different layers.

This allows testing the hypothesis that early layers (problem framing) and late layers (output realization) differ in their sensitivity to perturbation.

Parameters:
  • layer_scales (dict[int, float]) – Dict mapping layer index to noise scale

  • default_scale (float) – Scale for layers not in layer_scales

  • device (str)

__init__(layer_scales, default_scale=0.0, device='cuda')[source]
Parameters:
  • layer_scales (dict[int, float])

  • default_scale (float)

  • device (str)

current_layer: int | None
set_layer(layer_idx)[source]

Set which layer we’re currently injecting into.

Parameters:

layer_idx (int)

property current_scale: float
sample(shape)[source]

Sample noise tensor of given shape.

Parameters:

shape (tuple[int, ...])

Return type:

torch.Tensor
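The per-layer lookup behind current_scale might work roughly as follows. This is a sketch under two assumptions: layers missing from layer_scales fall back to default_scale, as the parameter description suggests, and an unset current_layer (None) also falls back to default_scale, which the reference does not confirm. The function name scale_for_layer is hypothetical.

```python
from typing import Dict, Optional


def scale_for_layer(
    layer_scales: Dict[int, float],
    current_layer: Optional[int],
    default_scale: float = 0.0,
) -> float:
    # Assumption: with no layer selected, use the default scale.
    if current_layer is None:
        return default_scale
    # Layers without an explicit entry get default_scale.
    return layer_scales.get(current_layer, default_scale)


# Perturb early layers strongly, one late layer weakly, everything else not at all.
layer_scales = {0: 0.2, 1: 0.2, 30: 0.05}
early = scale_for_layer(layer_scales, 0)
late = scale_for_layer(layer_scales, 30)
unlisted = scale_for_layer(layer_scales, 15)
unset = scale_for_layer(layer_scales, None)
```

With default_scale=0.0, only the listed layers receive noise, which makes it easy to compare early-layer and late-layer perturbation in isolation.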

Once-Per-Generation Injection

class structured_stochasticity.injection.OncePerGenerationInjector(scale=0.1, latent_dim=None, device='cuda')[source]

Bases: NoiseInjector

Samples noise once and reuses it for the entire generation.

This corresponds to the formalism in the paper:

z ~ P(z|X)    [sampled once]
h = f_θ(X, z)

The same z influences all tokens, creating a consistent “reasoning trajectory” rather than per-token perturbation.

Parameters:
  • scale (float)

  • latent_dim (int | None)

  • device (str)

__init__(scale=0.1, latent_dim=None, device='cuda')[source]
Parameters:
  • scale (float)

  • latent_dim (int | None)

  • device (str)

sample(shape)[source]

Sample noise tensor of given shape.

Parameters:

shape (tuple[int, ...])

Return type:

torch.Tensor

reset()[source]

Reset forces a new noise sample on the next generation.
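The sample-once behavior can be sketched as a cached latent vector. This sketch assumes that z has one entry per hidden dimension and is broadcast across batch and sequence positions; the real class may size z differently (for example via latent_dim, which is ignored here). OncePerGenerationSketch is a hypothetical name.

```python
import torch


class OncePerGenerationSketch:
    """Hypothetical illustration of caching one noise vector per generation."""

    def __init__(self, scale=0.1, device="cpu"):
        self.scale = scale
        self.device = device
        self._z = None  # cached latent; None means "not sampled yet"

    def sample(self, shape):
        if self._z is None:
            # Assumption: z spans the last (hidden) dimension only.
            self._z = torch.randn(shape[-1], device=self.device) * self.scale
        # Broadcast the same z to every batch element and token position.
        return self._z.expand(shape)

    def reset(self):
        # Dropping the cache forces a fresh z on the next sample() call.
        self._z = None


injector = OncePerGenerationSketch(scale=0.1)
prefill_noise = injector.sample((1, 5, 8))  # e.g. prompt processing
decode_noise = injector.sample((1, 1, 8))   # e.g. a later single-token step
same_within_generation = torch.equal(prefill_noise[0, 0], decode_noise[0, 0])

injector.reset()
next_noise = injector.sample((1, 1, 8))
changed_after_reset = not torch.equal(decode_noise[0, 0], next_noise[0, 0])
```

Caching a vector rather than a full tensor keeps the noise identical even when the sequence length changes between calls.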

Factory Function

structured_stochasticity.injection.create_injector(strategy, scale=0.1, device='cuda', **kwargs)[source]

Factory function to create noise injectors.

Parameters:
  • strategy (str) – One of “gaussian”, “uniform”, “annealed”, “once”, “layer_selective”

  • scale (float) – Base noise scale

  • device (str) – Torch device

  • **kwargs – Strategy-specific arguments

Returns:

Configured NoiseInjector instance

Return type:

NoiseInjector
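A factory of this kind is often implemented as a lookup from strategy name to class. The sketch below shows that pattern with placeholder classes; the registry, the stub classes, and the ValueError on unknown names are all assumptions and do not describe the library's actual implementation.

```python
class _StubInjector:
    """Placeholder standing in for a real injector class."""

    def __init__(self, scale=0.1, device="cpu", **kwargs):
        self.scale = scale
        self.device = device
        self.extra = kwargs  # strategy-specific arguments


class GaussianStub(_StubInjector):
    pass


class AnnealedStub(_StubInjector):
    pass


# Hypothetical registry; the real factory supports more strategy names.
REGISTRY = {"gaussian": GaussianStub, "annealed": AnnealedStub}


def create_injector_sketch(strategy, scale=0.1, device="cpu", **kwargs):
    try:
        cls = REGISTRY[strategy]
    except KeyError:
        known = ", ".join(sorted(REGISTRY))
        raise ValueError(f"Unknown strategy {strategy!r}; expected one of: {known}") from None
    # Strategy-specific arguments are forwarded unchanged.
    return cls(scale=scale, device=device, **kwargs)


injector = create_injector_sketch("annealed", scale=0.2, anneal_factor=0.9)
```

Selecting the strategy by string makes it straightforward to drive experiments from a configuration file or command-line flag.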