Structured Stochasticity

An experimental framework for testing whether structured noise injection into LLM hidden states can mitigate reasoning collapse on complex algorithmic tasks.

The Problem

Contemporary large language models exhibit a sharp performance collapse as problem complexity increases. Recent research demonstrates that even state-of-the-art reasoning models fail on algorithmic tasks beyond a certain complexity threshold, regardless of how much "thinking" they're allowed to do.

Key Observation: Reasoning effort (e.g., length of reasoning traces) initially increases with complexity, then unexpectedly declines right when accuracy collapses. This suggests a structural limitation rather than mere resource exhaustion.

Core Hypothesis

We argue that this collapse is not an inherent limitation of neural sequence models, but rather a consequence of single-trajectory, deterministic inference.

In current LLMs, randomness enters only at the output sampling step: for a given input, the hidden states are computed deterministically, so every sampled continuation follows the same internal reasoning trajectory.

In contrast, human reasoning varies at the process level: given the same problem, people explore different decompositions, switch strategies, and backtrack, rather than replaying a single fixed computation.

The Solution: Structured Stochasticity

Weak vs Strong Stochasticity
Standard (Weak):
  Input X  -->  [Deterministic h = f(X)]  -->  Sample Output Y

Proposed (Strong):
  Input X  +  z ~ P(z|X)  -->  [Stochastic h = f(X,z)]  -->  Sample Output Y

By injecting a latent variable z into the model's hidden states, we enable distinct internal reasoning trajectories for the same input, letting the model explore multiple problem decompositions and solution strategies.
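The sketch below illustrates the distinction in code. It is a toy illustration, not the package API: f stands for any function producing a hidden state, and the isotropic Gaussian for z is simply the most basic choice of P(z|X).

import torch

def weak_hidden(f, x):
    # Standard inference: the hidden state is a deterministic function
    # of the input; randomness enters only when sampling the output.
    return f(x)

def strong_hidden(f, x, noise_scale=0.1):
    # Proposed inference: a latent perturbation z is drawn per trajectory
    # and injected into the hidden state, so two runs on the same input
    # can follow distinct internal computations.
    h = f(x)
    z = noise_scale * torch.randn_like(h)  # simplest choice: z ~ N(0, scale^2 I)
    return h + z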

Framework Components

Noise Injection

Multiple strategies for injecting noise: Gaussian, Uniform, Annealed (decreasing over generation), and Layer-Selective approaches.
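As an illustration, the four strategies could look like the following. This is a minimal sketch assuming additive perturbations of a hidden-state tensor h; these helper functions are hypothetical, not the package API.

import torch

def gaussian_noise(h, scale=0.1):
    # Isotropic Gaussian perturbation of the hidden state.
    return h + scale * torch.randn_like(h)

def uniform_noise(h, scale=0.1):
    # Uniform perturbation drawn from [-scale, scale].
    return h + scale * (2 * torch.rand_like(h) - 1)

def annealed_noise(h, step, total_steps, scale=0.1):
    # Noise magnitude decays linearly over generation: early exploration,
    # late-stage stability.
    decay = max(0.0, 1.0 - step / total_steps)
    return h + scale * decay * torch.randn_like(h)

def layer_selective_noise(h, layer_idx, target_layers, scale=0.1):
    # Perturb only a chosen subset of transformer layers.
    if layer_idx in target_layers:
        return h + scale * torch.randn_like(h)
    return h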

PyTorch Hooks

Non-invasive modification of hidden states using forward hooks. Supports all major transformer architectures (Llama, Mistral, GPT-2, etc.).
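A minimal sketch of the hook mechanism, assuming a Hugging Face Llama-style module tree (the exact layer path differs by architecture, and the helper name here is hypothetical):

import torch

def make_noise_hook(scale=0.1):
    # Forward hook that perturbs a layer's output hidden states without
    # touching the model's code; returning a value from a PyTorch forward
    # hook replaces the layer's output.
    def hook(module, inputs, output):
        hidden = output[0] if isinstance(output, tuple) else output
        noisy = hidden + scale * torch.randn_like(hidden)
        return (noisy,) + output[1:] if isinstance(output, tuple) else noisy
    return hook

# Attach to one decoder layer, generate, then clean up:
# handle = model.model.layers[8].register_forward_hook(make_noise_hook(0.1))
# ... run generation ...
# handle.remove()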

Benchmark Tasks

Algorithmic tasks with exact solutions: Tower of Hanoi (flagship), Arithmetic Sequences, and Logical Deduction puzzles.
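Exact solutions make scoring unambiguous. For the flagship task, the optimal move sequence is computable in a few lines (a reference implementation for illustration, with pegs labeled 0-2):

def hanoi_moves(n, source=0, target=2, aux=1):
    # Optimal solution for n disks: move n-1 disks aside, move the largest
    # disk, then move the n-1 disks back on top (2^n - 1 moves total).
    if n == 0:
        return []
    return (hanoi_moves(n - 1, source, aux, target)
            + [(source, target)]
            + hanoi_moves(n - 1, aux, target, source))

assert len(hanoi_moves(5)) == 2**5 - 1  # ground truth for scoring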

Evaluation

Trajectory aggregation via majority voting, oracle bounds, and K-scaling analysis to measure how performance improves with more trajectories.
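For concreteness, the two aggregation rules reduce to a few lines. This is an illustrative sketch, assuming the final answers from K trajectories are hashable values:

from collections import Counter

def majority_vote(answers):
    # Aggregate K trajectories by taking the most common final answer.
    return Counter(answers).most_common(1)[0][0]

def oracle_solved(answers, ground_truth):
    # Oracle upper bound: the problem counts as solved if any one of the
    # K trajectories produced the correct answer.
    return ground_truth in answers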

Quick Start

# Install the package
pip install -e .

# Run a quick experiment
from structured_stochasticity import run_quick_experiment

results = run_quick_experiment(
    model_name="meta-llama/Llama-3.2-1B",
    noise_scale=0.1,
    complexity_range=(3, 5),
    k_values=[1, 5, 10],
    trials=5
)

Or use the CLI:

# Run with default config
ss-experiment --config configs/default.yaml

# Override parameters
ss-experiment --model meta-llama/Llama-3.2-1B --scale 0.15

Key Question Being Tested

Hypothesis: if the maximum solvable complexity increases with K (the number of sampled trajectories) under a constant total token budget, reasoning collapse is trajectory-determined rather than capacity-limited.

Improved scaling with K would suggest that the model already has the capacity to represent correct solutions, and that the collapse stems from single-trajectory deterministic inference: diverse internal trajectories reach solutions the default trajectory misses. A sketch of the analysis follows.
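The test can be expressed in a few lines (illustrative only; the function and the accuracy-table layout are hypothetical, and the 50% threshold is an arbitrary choice):

def max_solvable_complexity(accuracy, threshold=0.5):
    # accuracy: dict mapping (complexity, K) -> mean accuracy over trials.
    # For each K, report the largest complexity whose accuracy clears the
    # threshold; the hypothesis predicts this value grows with K.
    best = {}
    for (complexity, k), acc in accuracy.items():
        if acc >= threshold:
            best[k] = max(best.get(k, 0), complexity)
    return best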

Documentation