Benchmark Tasks

Tasks are chosen to have:

  • Exact algorithmic solutions (objectively verifiable)

  • Scalable complexity (parameterized difficulty)

  • Minimal heuristic shortcuts (tests true reasoning)

Data Classes

class structured_stochasticity.tasks.TaskInstance(prompt, complexity, ground_truth=None, metadata=<factory>)

Bases: object

A single instance of a task.

Parameters:
  • prompt (str)

  • complexity (int)

  • ground_truth (str | None)

  • metadata (dict)

prompt: str
complexity: int
ground_truth: str | None = None
metadata: dict
__init__(prompt, complexity, ground_truth=None, metadata=<factory>)
Parameters:
  • prompt (str)

  • complexity (int)

  • ground_truth (str | None)

  • metadata (dict)

Return type:

None

class structured_stochasticity.tasks.TaskResult(response, is_correct, complexity, error_message=None, metadata=<factory>)

Bases: object

Result of attempting a task.

Parameters:
  • response (str)

  • is_correct (bool)

  • complexity (int)

  • error_message (str | None)

  • metadata (dict)

response: str
is_correct: bool
complexity: int
error_message: str | None = None
metadata: dict
__init__(response, is_correct, complexity, error_message=None, metadata=<factory>)
Parameters:
  • response (str)

  • is_correct (bool)

  • complexity (int)

  • error_message (str | None)

  • metadata (dict)

Return type:

None
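As a rough illustration of the two records above, the sketch below re-creates them as standalone dataclasses. These are stand-ins that mirror the documented fields and defaults, not the library's own source; the `<factory>` default for metadata is assumed here to be an empty dict created per instance.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    """Stand-in mirroring the documented fields (not the library source)."""
    prompt: str
    complexity: int
    ground_truth: str | None = None
    # ASSUMPTION: "<factory>" means a fresh empty dict per instance.
    metadata: dict = field(default_factory=dict)


@dataclass
class TaskResult:
    """Stand-in mirroring the documented fields (not the library source)."""
    response: str
    is_correct: bool
    complexity: int
    error_message: str | None = None
    metadata: dict = field(default_factory=dict)


# Only prompt and complexity are required for an instance.
instance = TaskInstance(prompt="Move 3 disks from peg A to peg C.", complexity=3)

# A result records the raw response and the verdict for that complexity level.
result = TaskResult(
    response="A -> C",
    is_correct=False,
    complexity=instance.complexity,
    error_message="incomplete solution",
)
```

Using a factory for metadata means two instances never share the same dict by accident.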

Base Task Class

class structured_stochasticity.tasks.Task

Bases: ABC

Abstract base class for benchmark tasks.

name: str = 'base_task'
abstractmethod generate_instance(complexity)

Generate a task instance at given complexity level.

Parameters:

complexity (int)

Return type:

TaskInstance

abstractmethod verify_solution(instance, response)

Verify if a response correctly solves the task.

Parameters:
  • instance (TaskInstance)

  • response (str)

Return type:

TaskResult

generate_batch(complexity_range, instances_per_level=1)

Generate multiple instances across complexity levels.

Parameters:
  • complexity_range (tuple[int, int])

  • instances_per_level (int)

Return type:

list[TaskInstance]
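The sketch below shows one way the base-class contract could fit together: a subclass supplies generate_instance, and generate_batch loops over complexity levels. SketchTask and EchoTask are hypothetical names, and treating complexity_range as inclusive on both ends is an assumption; the reference above does not say how the bounds are interpreted.

```python
from __future__ import annotations

from abc import ABC, abstractmethod
from dataclasses import dataclass, field


@dataclass
class TaskInstance:
    """Stand-in for the documented TaskInstance record."""
    prompt: str
    complexity: int
    ground_truth: str | None = None
    metadata: dict = field(default_factory=dict)


class SketchTask(ABC):
    """Hypothetical stand-in for the documented Task base class."""
    name: str = "base_task"

    @abstractmethod
    def generate_instance(self, complexity: int) -> TaskInstance:
        """Build one instance at the given complexity level."""

    def generate_batch(self, complexity_range, instances_per_level=1):
        low, high = complexity_range
        # ASSUMPTION: both bounds are inclusive.
        return [
            self.generate_instance(level)
            for level in range(low, high + 1)
            for _ in range(instances_per_level)
        ]


class EchoTask(SketchTask):
    """Toy task used only to exercise generate_batch."""
    name = "echo"

    def generate_instance(self, complexity: int) -> TaskInstance:
        return TaskInstance(
            prompt=f"Repeat the digit 1 exactly {complexity} times.",
            complexity=complexity,
            ground_truth="1" * complexity,
        )


batch = EchoTask().generate_batch((1, 3), instances_per_level=2)
```

Under the inclusive-range assumption, the batch holds two instances for each of the levels 1, 2 and 3.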

Tower of Hanoi

class structured_stochasticity.tasks.TowerOfHanoi(prompt_style='standard')

Bases: Task

Tower of Hanoi puzzle.

The classic recursive puzzle: move n disks from peg A to peg C, using peg B as auxiliary.

Rules:

  • Only one disk can be moved at a time

  • A larger disk cannot be placed on a smaller disk

Optimal solution requires 2^n - 1 moves.

This task is ideal because:

  • Solution is deterministic and verifiable

  • Requires recursive planning

  • No shortcuts: the puzzle must actually be solved

  • Complexity is well-defined (number of disks)

Tower of Hanoi is also our flagship task, because:

  1. It has exact recursive structure

  2. Complexity grows exponentially (2^n - 1 moves)

  3. Early commitment to a wrong approach is fatal

  4. Solution is easily verifiable
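The recursive structure and the 2^n - 1 move count can be illustrated with the textbook recursive solver below. hanoi_moves is a hypothetical helper written for this example, not part of the documented API; it returns the moves as (source, target) pairs.

```python
def hanoi_moves(n, source="A", target="C", auxiliary="B"):
    """Return the optimal move list for n disks as (source, target) pairs."""
    if n == 0:
        return []
    return (
        # Move the top n-1 disks out of the way, onto the auxiliary peg.
        hanoi_moves(n - 1, source, auxiliary, target)
        # Move the largest disk directly to the target peg.
        + [(source, target)]
        # Move the n-1 disks from the auxiliary peg onto the largest disk.
        + hanoi_moves(n - 1, auxiliary, target, source)
    )


moves = hanoi_moves(3)
print(len(moves))  # 7, i.e. 2**3 - 1
```

Each level of recursion doubles the work and adds one move, which is where the exponential growth in solution length comes from.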

Parameters:

prompt_style (str)

name: str = 'tower_of_hanoi'
__init__(prompt_style='standard')
Parameters:

prompt_style (str) – How to phrase the task:

  • “standard”: Clear, direct instructions

  • “minimal”: Bare problem statement

  • “detailed”: Include rules explanation

generate_instance(complexity)

Generate a Tower of Hanoi instance.

Parameters:

complexity (int) – Number of disks (n)

Return type:

TaskInstance

verify_solution(instance, response)

Verify a Tower of Hanoi solution by simulation.

Parses moves from the response and simulates their execution, checking that every move follows the rules and that the final state is correct.

Parameters:
  • instance (TaskInstance)

  • response (str)

Return type:

TaskResult
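Verification by simulation might look roughly like the sketch below. The move syntax ("A -> C") and the simulate_hanoi helper are assumptions made for illustration; the reference above does not specify what response format the real parser accepts or which error messages it produces.

```python
import re

# ASSUMPTION: moves are written as "X -> Y" with peg letters A, B, C.
MOVE_PATTERN = re.compile(r"\b([ABC])\s*->\s*([ABC])\b")


def simulate_hanoi(n, response):
    """Replay the moves found in response; return (is_correct, error_message)."""
    # Each peg is a stack; the last element is the top disk. Disk 1 is smallest.
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    moves = MOVE_PATTERN.findall(response)
    if not moves:
        return False, "no moves found"
    for step, (src, dst) in enumerate(moves, start=1):
        if not pegs[src]:
            return False, f"move {step}: peg {src} is empty"
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, f"move {step}: disk {disk} placed on a smaller disk"
        pegs[dst].append(pegs[src].pop())
    # Solved only if every disk ended up on peg C in the correct order.
    if pegs["C"] != list(range(n, 0, -1)):
        return False, "final state is not solved"
    return True, None


ok, error = simulate_hanoi(2, "A -> B, A -> C, B -> C")
```

Replaying the moves rather than comparing against one reference string means any legal solution is accepted, including non-optimal ones.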

Arithmetic Sequence

class structured_stochasticity.tasks.ArithmeticSequence(operations=None)

Bases: Task

Multi-step arithmetic task.

Generates a sequence of arithmetic operations that must be performed in order. Tests sequential reasoning without shortcuts.

Parameters:

operations (list[str])

name: str = 'arithmetic_sequence'
__init__(operations=None)
Parameters:

operations (list[str])

generate_instance(complexity)

Generate a sequence of arithmetic operations.

Parameters:

complexity (int) – Number of operations

Return type:

TaskInstance

verify_solution(instance, response)

Check whether the response contains the correct answer.

Parameters:
  • instance (TaskInstance)

  • response (str)

Return type:

TaskResult
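To make the idea of a multi-step arithmetic task concrete, here is a minimal sketch of a generator and a containment check. Everything in it is an assumption for illustration: the helper names, the operator symbols, the operand range, the prompt wording, and the rule that the answer must appear as a whole integer in the response. The actual defaults of the operations parameter are not documented above.

```python
import random
import re


def make_arithmetic_chain(complexity, operations=None, seed=0):
    """Return (prompt, ground_truth) for a chain of `complexity` operations."""
    rng = random.Random(seed)
    # ASSUMPTION: operations are given as the symbols "+", "-" and "*".
    operations = operations or ["+", "-", "*"]
    value = rng.randint(1, 9)
    steps = [f"Start with {value}."]
    for _ in range(complexity):
        op = rng.choice(operations)
        operand = rng.randint(1, 9)
        if op == "+":
            value += operand
            steps.append(f"Add {operand}.")
        elif op == "-":
            value -= operand
            steps.append(f"Subtract {operand}.")
        elif op == "*":
            value *= operand
            steps.append(f"Multiply by {operand}.")
        else:
            raise ValueError(f"unsupported operation: {op!r}")
    return " ".join(steps), str(value)


def contains_answer(response, ground_truth):
    """True if ground_truth appears in response as a whole integer token."""
    return ground_truth in re.findall(r"-?\d+", response)


prompt, truth = make_arithmetic_chain(3, operations=["+"], seed=1)
```

Matching whole integer tokens avoids counting "142" as a correct answer when the ground truth is "42".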

Logical Deduction

class structured_stochasticity.tasks.LogicalDeduction

Bases: Task

Logical deduction puzzles.

“A is taller than B. B is taller than C. Who is shortest?”

Complexity scales with number of entities and comparisons.

name: str = 'logical_deduction'
NAMES = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank', 'Grace', 'Henry']
COMPARISONS = ['taller than', 'shorter than', 'older than', 'younger than']
generate_instance(complexity)

Generate a logical ordering puzzle.

Parameters:

complexity (int) – Number of entities to order

Return type:

TaskInstance

verify_solution(instance, response)

Check whether the response contains the correct name.

Parameters:
  • instance (TaskInstance)

  • response (str)

Return type:

TaskResult
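An ordering puzzle of this kind could be generated along the lines of the sketch below. make_ordering_puzzle is a hypothetical helper; it uses only the "taller than" comparison and always asks for the shortest person, which is a simplification chosen for this example and not necessarily how the class combines NAMES and COMPARISONS.

```python
import random

NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace", "Henry"]


def make_ordering_puzzle(complexity, seed=0):
    """Return (prompt, answer) for a height-ordering puzzle over `complexity` people."""
    if not 2 <= complexity <= len(NAMES):
        raise ValueError("complexity must be between 2 and the number of names")
    rng = random.Random(seed)
    # Hidden ground-truth order, tallest first.
    ordered = rng.sample(NAMES, complexity)
    # One fact per adjacent pair is enough to determine the full order.
    facts = [f"{a} is taller than {b}." for a, b in zip(ordered, ordered[1:])]
    # Shuffle so the answer cannot be read off the order of the statements.
    rng.shuffle(facts)
    prompt = " ".join(facts) + " Who is shortest?"
    return prompt, ordered[-1]


prompt, answer = make_ordering_puzzle(4, seed=3)
```

With n entities the chain contains n - 1 comparisons, so the number of facts to combine grows with the complexity level.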

Task Factory

class structured_stochasticity.tasks.TaskFactory

Bases: object

Factory for creating task instances.

Available Tasks:

  • tower_of_hanoi - Tower of Hanoi puzzle

  • arithmetic - Multi-step arithmetic

  • logical_deduction - Logical ordering puzzles

TASKS = {'arithmetic': <class 'structured_stochasticity.tasks.ArithmeticSequence'>, 'logical_deduction': <class 'structured_stochasticity.tasks.LogicalDeduction'>, 'tower_of_hanoi': <class 'structured_stochasticity.tasks.TowerOfHanoi'>}
classmethod create(task_name, **kwargs)

Create a task by name.

Parameters:

task_name (str)

Return type:

Task

classmethod list_tasks()

List available tasks.

Return type:

list[str]
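The registry pattern behind the factory can be sketched as follows. The task classes here are empty stand-ins and SketchTaskFactory is a hypothetical name; forwarding **kwargs to the task constructor and raising ValueError for an unknown name are assumptions, since the reference above does not describe either behaviour.

```python
class TowerOfHanoi:
    """Stand-in for the documented task class."""
    name = "tower_of_hanoi"

    def __init__(self, prompt_style="standard"):
        self.prompt_style = prompt_style


class ArithmeticSequence:
    """Stand-in for the documented task class."""
    name = "arithmetic_sequence"

    def __init__(self, operations=None):
        self.operations = operations


class LogicalDeduction:
    """Stand-in for the documented task class."""
    name = "logical_deduction"


class SketchTaskFactory:
    """Hypothetical stand-in for TaskFactory, built on a name-to-class registry."""
    TASKS = {
        "arithmetic": ArithmeticSequence,
        "logical_deduction": LogicalDeduction,
        "tower_of_hanoi": TowerOfHanoi,
    }

    @classmethod
    def create(cls, task_name, **kwargs):
        if task_name not in cls.TASKS:
            # ASSUMPTION: unknown names are rejected with ValueError.
            raise ValueError(f"unknown task {task_name!r}; choose from {cls.list_tasks()}")
        # ASSUMPTION: keyword arguments are passed on to the task constructor.
        return cls.TASKS[task_name](**kwargs)

    @classmethod
    def list_tasks(cls):
        return list(cls.TASKS)


task = SketchTaskFactory.create("tower_of_hanoi", prompt_style="minimal")
```

Note that the registry key for the arithmetic task is 'arithmetic', while the class's name attribute is 'arithmetic_sequence', so the factory key and the name attribute are not interchangeable.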