Noise Injection Strategies

This module provides several strategies for injecting noise into hidden states. The key idea is that noise injection should enable “trajectory resampling”, which lets the model escape local optima in reasoning space.

Injection State

class structured_stochasticity.injection.InjectionState(step=0, total_steps=None)

Bases: object

Tracks state across injection calls for stateful strategies.

Parameters:

step (int)
total_steps (int | None)

step: int = 0

total_steps: int | None = None

advance()

reset()

__init__(step=0, total_steps=None)

Parameters:

step (int)
total_steps (int | None)

Return type:

None
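
The page lists advance() and reset() without describing them. The following is a minimal sketch of how such a step tracker might behave, assuming that advance() increments step by one and that reset() returns it to zero. Both behaviours are assumptions, and InjectionStateSketch is a hypothetical stand-in rather than the library's class.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InjectionStateSketch:
    """Hypothetical stand-in for InjectionState; not the library's code."""

    step: int = 0
    total_steps: Optional[int] = None

    def advance(self) -> None:
        # Assumption: each injection call moves the counter forward by one.
        self.step += 1

    def reset(self) -> None:
        # Assumption: resetting returns the counter to the start of a generation.
        self.step = 0


state = InjectionStateSketch(total_steps=3)
state.advance()
state.advance()
steps_before_reset = state.step  # counter value after two injections
state.reset()
```

A stateful strategy such as annealing could read step from an object like this to decide how much noise to apply.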

Base Class

class structured_stochasticity.injection.NoiseInjector(scale=0.1, device='cuda')

Bases: ABC

Abstract base class for noise injection strategies.

Parameters:

scale (float)
device (str)

__init__(scale=0.1, device='cuda')

Parameters:

scale (float)
device (str)

abstractmethod sample(shape)

Sample noise tensor of given shape.

Parameters: shape (tuple[int, ...])
Return type: torch.Tensor

inject(hidden_states)

Inject noise into hidden states.

Parameters: hidden_states (torch.Tensor) – Tensor of shape (batch, seq_len, hidden_dim)
Returns: Perturbed hidden states of the same shape
Return type: torch.Tensor

reset()

Reset internal state (call between generations).
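
The split between an abstract sample() and a concrete inject() can be illustrated with a small dependency-free sketch. It assumes that injection is additive (perturbed = hidden_states + noise), which the page does not state explicitly. To stay runnable without torch, flat Python lists stand in for tensors, so inject() here takes an extra shape argument that the real method does not have. NoiseInjectorSketch and ConstantNoise are hypothetical names.

```python
from abc import ABC, abstractmethod
from math import prod
from typing import List, Tuple


class NoiseInjectorSketch(ABC):
    """Hypothetical stand-in; flat lists replace torch tensors."""

    def __init__(self, scale: float = 0.1) -> None:
        self.scale = scale

    @abstractmethod
    def sample(self, shape: Tuple[int, ...]) -> List[float]:
        """Return prod(shape) noise values as a flat list."""

    def inject(self, hidden_states: List[float], shape: Tuple[int, ...]) -> List[float]:
        # Assumption: injection is additive, h' = h + z, with z drawn by sample().
        noise = self.sample(shape)
        return [h + z for h, z in zip(hidden_states, noise)]

    def reset(self) -> None:
        # Stateless by default; a stateful subclass would override this.
        pass


class ConstantNoise(NoiseInjectorSketch):
    """Toy subclass returning `scale` everywhere, so the result is easy to check."""

    def sample(self, shape: Tuple[int, ...]) -> List[float]:
        return [self.scale] * prod(shape)


shape = (1, 2, 3)  # (batch, seq_len, hidden_dim)
hidden = [0.0] * prod(shape)
perturbed = ConstantNoise(scale=0.5).inject(hidden, shape)
```

Under this design, a new strategy only needs to implement sample(); inject() and reset() are inherited.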

Gaussian Noise

class structured_stochasticity.injection.GaussianNoiseInjector(scale=0.1, device='cuda')

Bases: NoiseInjector

Injects Gaussian noise scaled by a constant factor.

This is the simplest strategy: z ~ N(0, scale²)

The scale parameter controls the magnitude of the perturbation:

Too small: the model won’t escape attractor basins.
Too large: the noise destroys coherent reasoning.

Parameters:

scale (float)
device (str)

__init__(scale=0.1, device='cuda')

Parameters:

scale (float)
device (str)

sample(shape)

Sample noise tensor of given shape.

Parameters: shape (tuple[int, ...])
Return type: torch.Tensor
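
As an illustration of z ~ N(0, scale²), the sketch below draws standard-normal values and multiplies them by scale. It uses Python's random module instead of torch so that it runs without dependencies; sample_gaussian is a hypothetical helper, not part of the library.

```python
import random
import statistics
from typing import List


def sample_gaussian(n: int, scale: float, rng: random.Random) -> List[float]:
    # Scaling a standard-normal draw by `scale` gives z ~ N(0, scale^2).
    return [scale * rng.gauss(0.0, 1.0) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
noise = sample_gaussian(10_000, scale=0.1, rng=rng)

# The empirical mean should be near 0 and the standard deviation near `scale`.
empirical_mean = statistics.fmean(noise)
empirical_std = statistics.pstdev(noise)
```

Because Gaussian noise is unbounded, individual values can exceed scale by a wide margin even though the standard deviation equals scale.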

Uniform Noise

class structured_stochasticity.injection.UniformNoiseInjector(scale=0.1, device='cuda')

Bases: NoiseInjector

Injects uniform noise in range [-scale, scale].

Uniform noise has bounded magnitude, which may be preferable when perturbations must be guaranteed to stay within a fixed range.

Parameters:

scale (float)
device (str)

__init__(scale=0.1, device='cuda')

Parameters:

scale (float)
device (str)

sample(shape)

Sample noise tensor of given shape.

Parameters: shape (tuple[int, ...])
Return type: torch.Tensor
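
The bounded-magnitude property can be checked with a short sketch. It draws from [-scale, scale] using Python's random module rather than torch; sample_uniform is a hypothetical helper, not part of the library.

```python
import random
from typing import List


def sample_uniform(n: int, scale: float, rng: random.Random) -> List[float]:
    # Each value is drawn uniformly from the interval [-scale, scale].
    return [rng.uniform(-scale, scale) for _ in range(n)]


rng = random.Random(0)  # fixed seed so the run is repeatable
noise = sample_uniform(10_000, scale=0.1, rng=rng)

# No draw can exceed `scale` in absolute value.
max_abs = max(abs(z) for z in noise)
```

Note that at the same scale, uniform noise on [-scale, scale] has standard deviation scale/√3 (about 0.577 × scale), so it is a weaker perturbation than Gaussian noise with standard deviation scale.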

Annealed Noise

class structured_stochasticity.injection.AnnealedNoiseInjector(scale=0.1, anneal_factor=0.95, min_scale=0.01, device='cuda')

Bases: NoiseInjector

Injects noise that decreases over the generation process.

Motivation: apply strong perturbation early, when problem framing matters most, and taper toward stability later, when the solution is crystallizing.

Scale at step t: scale * (anneal_factor ^ t)

This mirrors a natural intuition: you want to explore different framings early, then commit and execute once you’ve found a good path.

Parameters:

scale (float)
anneal_factor (float)
min_scale (float)
device (str)

__init__(scale=0.1, anneal_factor=0.95, min_scale=0.01, device='cuda')

Parameters:

scale (float)
anneal_factor (float)
min_scale (float)
device (str)

property current_scale: float

sample(shape)

Sample noise tensor of given shape.

Parameters: shape (tuple[int, ...])
Return type: torch.Tensor
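
The decay schedule scale * (anneal_factor ^ t) can be worked through numerically. The sketch below assumes that min_scale acts as a lower bound on the decayed value; the page lists the parameter but does not say how it is applied, so that clamping is an assumption. annealed_scale is a hypothetical helper.

```python
def annealed_scale(
    step: int,
    scale: float = 0.1,
    anneal_factor: float = 0.95,
    min_scale: float = 0.01,
) -> float:
    # Geometric decay from the documented formula: scale * anneal_factor ** step.
    decayed = scale * anneal_factor ** step
    # Assumption: min_scale is a floor, so the noise never vanishes entirely.
    return max(decayed, min_scale)


# Scale at the start, after 10 steps, and late in the generation.
schedule = [annealed_scale(t) for t in (0, 10, 45, 100)]
```

With the default values and the assumed floor, the scale starts at 0.1, falls to roughly 0.06 by step 10, and is clamped to min_scale from step 45 onward.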

Layer-Selective Injection

class structured_stochasticity.injection.LayerSelectiveInjector(layer_scales, default_scale=0.0, device='cuda')

Bases: NoiseInjector

Applies different noise scales to different layers.

This allows testing the hypothesis that early layers (problem framing) and late layers (output realization) differ in their sensitivity to perturbation.

Parameters:

layer_scales (dict[int, float]) – Dict mapping layer index to noise scale
default_scale (float) – Scale for layers not in layer_scales
device (str)

__init__(layer_scales, default_scale=0.0, device='cuda')

Parameters:

layer_scales (dict[int, float])
default_scale (float)
device (str)

current_layer: int | None

set_layer(layer_idx)

Set which layer we’re currently injecting into.

Parameters: layer_idx (int)

property current_scale: float

sample(shape)

Sample noise tensor of given shape.

Parameters: shape (tuple[int, ...])
Return type: torch.Tensor
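
The interaction between layer_scales, default_scale, set_layer() and current_scale can be sketched as a dictionary lookup with a fallback. This sketch assumes that current_scale falls back to default_scale both for layers missing from layer_scales and when no layer has been set yet; the second case is not documented. LayerScaleLookup is a hypothetical class that models only the scale selection, not the noise sampling.

```python
from typing import Dict, List, Optional


class LayerScaleLookup:
    """Hypothetical sketch of per-layer scale selection only."""

    def __init__(self, layer_scales: Dict[int, float], default_scale: float = 0.0) -> None:
        self.layer_scales = dict(layer_scales)
        self.default_scale = default_scale
        self.current_layer: Optional[int] = None

    def set_layer(self, layer_idx: int) -> None:
        # Record which layer the next injection applies to.
        self.current_layer = layer_idx

    @property
    def current_scale(self) -> float:
        # Assumption: an unset layer behaves like an unlisted one.
        if self.current_layer is None:
            return self.default_scale
        return self.layer_scales.get(self.current_layer, self.default_scale)


lookup = LayerScaleLookup({0: 0.2, 1: 0.1})
scales: List[float] = []
for layer_idx in range(4):
    lookup.set_layer(layer_idx)
    scales.append(lookup.current_scale)
```

With default_scale=0.0, layers that are not listed receive no noise, which makes it possible to perturb a chosen subset of layers in isolation.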

Once-Per-Generation Injection

class structured_stochasticity.injection.OncePerGenerationInjector(scale=0.1, latent_dim=None, device='cuda')

Bases: NoiseInjector

Samples noise once and reuses it for the entire generation.

This corresponds to the formalism in the paper:

z ~ P(z|X)   [sampled once]
h = f_θ(X, z)

The same z influences all tokens, creating a consistent “reasoning trajectory” rather than per-token perturbation.

Parameters:

scale (float)
latent_dim (int | None)
device (str)

__init__(scale=0.1, latent_dim=None, device='cuda')

Parameters:

scale (float)
latent_dim (int | None)
device (str)

sample(shape)

Sample noise tensor of given shape.

Parameters: shape (tuple[int, ...])
Return type: torch.Tensor

reset()

Forces a new noise sample on the next generation.
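
The sample-once behaviour amounts to caching the first draw and clearing the cache in reset(). The sketch below illustrates that pattern with Python's random module. It assumes a Gaussian draw for z, which the page does not specify, and it does not model the latent_dim parameter. OnceNoiseSketch and its seed argument are hypothetical.

```python
import random
from typing import List, Optional


class OnceNoiseSketch:
    """Hypothetical sketch of sample-once caching; not the library's code."""

    def __init__(self, scale: float = 0.1, seed: int = 0) -> None:
        self.scale = scale
        self._rng = random.Random(seed)
        self._cached: Optional[List[float]] = None

    def sample(self, hidden_dim: int) -> List[float]:
        # Draw z only when nothing is cached; otherwise reuse the same z.
        if self._cached is None:
            # Assumption: z is Gaussian with standard deviation `scale`.
            self._cached = [self.scale * self._rng.gauss(0.0, 1.0) for _ in range(hidden_dim)]
        return self._cached

    def reset(self) -> None:
        # Dropping the cache forces a fresh draw on the next call.
        self._cached = None


injector = OnceNoiseSketch()
first = injector.sample(4)   # drawn at the start of a generation
second = injector.sample(4)  # reused for a later token
injector.reset()
third = injector.sample(4)   # new draw for the next generation
```

Within one generation every token sees the same z, so calling reset() between generations is what produces a different trajectory on the next run.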

Factory Function

structured_stochasticity.injection.create_injector(strategy, scale=0.1, device='cuda', **kwargs)

Factory function to create noise injectors.

Parameters:

strategy (str) – One of “gaussian”, “uniform”, “annealed”, “once”, “layer_selective”
scale (float) – Base noise scale
device (str) – Torch device
**kwargs – Strategy-specific arguments

Returns:

Configured NoiseInjector instance

Return type:

NoiseInjector
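
A string-keyed factory of this kind is commonly built on a registry dictionary. The sketch below shows that pattern with placeholder classes. It is not the library's implementation: the stub classes, the REGISTRY name, the one-to-one mapping from strategy names to classes, and the ValueError raised for unknown names are all assumptions.

```python
from typing import Any, Dict, Type


class StubInjector:
    """Hypothetical placeholder that only records its configuration."""

    def __init__(self, scale: float = 0.1, device: str = "cuda", **kwargs: Any) -> None:
        self.scale = scale
        self.device = device
        self.options = kwargs  # strategy-specific arguments


class GaussianStub(StubInjector):
    pass


class AnnealedStub(StubInjector):
    pass


# Assumption: each strategy name maps to exactly one class.
REGISTRY: Dict[str, Type[StubInjector]] = {
    "gaussian": GaussianStub,
    "annealed": AnnealedStub,
}


def create_injector_sketch(
    strategy: str, scale: float = 0.1, device: str = "cuda", **kwargs: Any
) -> StubInjector:
    try:
        cls = REGISTRY[strategy]
    except KeyError:
        # Assumption: unknown names are rejected with ValueError.
        raise ValueError(f"Unknown strategy: {strategy!r}") from None
    # Forward the shared arguments plus any strategy-specific ones.
    return cls(scale=scale, device=device, **kwargs)


injector = create_injector_sketch("annealed", scale=0.2, device="cpu", anneal_factor=0.9)
```

Passing device="cpu" as above is also a sensible override on machines without a GPU, since the documented default is 'cuda'.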
