Benchmark Tasks

Tasks are chosen to have:

Exact algorithmic solutions (objectively verifiable)
Scalable complexity (parameterized difficulty)
Minimal heuristic shortcuts (tests true reasoning)

Data Classes

class structured_stochasticity.tasks.TaskInstance(prompt, complexity, ground_truth=None, metadata=<factory>)

Bases: object

A single instance of a task.

Parameters:

prompt (str)
complexity (int)
ground_truth (str | None)
metadata (dict)

prompt: str

complexity: int

ground_truth: str | None = None

metadata: dict

__init__(prompt, complexity, ground_truth=None, metadata=<factory>)

Parameters:

prompt (str)
complexity (int)
ground_truth (str | None)
metadata (dict)

Return type:

None
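The field list above reads like a standard dataclass, with `metadata=<factory>` indicating a per-instance default. The sketch below is a local stand-in that mirrors the documented fields so it runs without the package installed; it is not the library class itself, and the `seed` metadata key is a made-up example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskInstance:
    """Local stand-in mirroring the documented fields; not the library class."""
    prompt: str
    complexity: int
    ground_truth: Optional[str] = None
    metadata: dict = field(default_factory=dict)  # shown as <factory> in the signature


a = TaskInstance(prompt="Move 3 disks from peg A to peg C.", complexity=3)
b = TaskInstance(prompt="Compute 2 + 3.", complexity=1, ground_truth="5")

a.metadata["seed"] = 0  # hypothetical key, for illustration only
print(a.ground_truth)   # None
print(b.metadata)       # {} (each instance gets its own dict)
```

Because the default comes from a factory, mutating one instance's `metadata` does not leak into another instance.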

class structured_stochasticity.tasks.TaskResult(response, is_correct, complexity, error_message=None, metadata=<factory>)

Bases: object

Result of attempting a task.

Parameters:

response (str)
is_correct (bool)
complexity (int)
error_message (str | None)
metadata (dict)

response: str

is_correct: bool

complexity: int

error_message: str | None = None

metadata: dict

__init__(response, is_correct, complexity, error_message=None, metadata=<factory>)

Parameters:

response (str)
is_correct (bool)
complexity (int)
error_message (str | None)
metadata (dict)

Return type:

None
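Since each result carries both `is_correct` and `complexity`, results can be grouped by difficulty level. The following is a rough sketch using a local stand-in for `TaskResult`; the helper `accuracy_by_complexity` and the error message text are hypothetical and not part of the documented API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskResult:
    """Local stand-in mirroring the documented fields; not the library class."""
    response: str
    is_correct: bool
    complexity: int
    error_message: Optional[str] = None
    metadata: dict = field(default_factory=dict)


def accuracy_by_complexity(results):
    """Hypothetical helper: fraction of correct results per complexity level."""
    buckets = defaultdict(list)
    for r in results:
        buckets[r.complexity].append(r.is_correct)
    return {level: sum(flags) / len(flags) for level, flags in sorted(buckets.items())}


results = [
    TaskResult(response="...", is_correct=True, complexity=2),
    TaskResult(response="...", is_correct=True, complexity=3),
    TaskResult(response="...", is_correct=False, complexity=3,
               error_message="illegal move"),  # made-up message
]
print(accuracy_by_complexity(results))  # {2: 1.0, 3: 0.5}
```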

Base Task Class

class structured_stochasticity.tasks.Task

Bases: ABC

Abstract base class for benchmark tasks.

name: str = 'base_task'

abstractmethod generate_instance(complexity)

Generate a task instance at a given complexity level.

Parameters: complexity (int)
Return type: TaskInstance

abstractmethod verify_solution(instance, response)

Verify if a response correctly solves the task.

Parameters:

instance (TaskInstance)
response (str)

Return type:

TaskResult

generate_batch(complexity_range, instances_per_level=1)

Generate multiple instances across complexity levels.

Parameters:

complexity_range (tuple[int, int])
instances_per_level (int)

Return type:

list[TaskInstance]
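To illustrate how the two abstract methods and `generate_batch` fit together, here is a self-contained sketch with local stand-ins for all three classes. Two things are assumptions rather than documented behaviour: that `complexity_range` includes both endpoints, and the toy `EchoTask` subclass, which exists only for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskInstance:  # stand-in for the documented dataclass
    prompt: str
    complexity: int
    ground_truth: Optional[str] = None
    metadata: dict = field(default_factory=dict)


@dataclass
class TaskResult:  # stand-in for the documented dataclass
    response: str
    is_correct: bool
    complexity: int
    error_message: Optional[str] = None
    metadata: dict = field(default_factory=dict)


class Task(ABC):  # stand-in base class
    name: str = "base_task"

    @abstractmethod
    def generate_instance(self, complexity: int) -> TaskInstance: ...

    @abstractmethod
    def verify_solution(self, instance: TaskInstance, response: str) -> TaskResult: ...

    def generate_batch(self, complexity_range, instances_per_level=1):
        # ASSUMPTION: both endpoints of complexity_range are included.
        lo, hi = complexity_range
        return [
            self.generate_instance(level)
            for level in range(lo, hi + 1)
            for _ in range(instances_per_level)
        ]


class EchoTask(Task):
    """Hypothetical toy task: write the letter 'a' `complexity` times."""
    name = "echo"

    def generate_instance(self, complexity):
        return TaskInstance(
            prompt=f"Write the letter a {complexity} times.",
            complexity=complexity,
            ground_truth="a" * complexity,
        )

    def verify_solution(self, instance, response):
        ok = response.strip() == instance.ground_truth
        return TaskResult(
            response=response,
            is_correct=ok,
            complexity=instance.complexity,
            error_message=None if ok else "response does not match ground truth",
        )


task = EchoTask()
batch = task.generate_batch((1, 3), instances_per_level=2)
print([inst.complexity for inst in batch])             # [1, 1, 2, 2, 3, 3]
print(task.verify_solution(batch[0], "a").is_correct)  # True
```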

Tower of Hanoi

class structured_stochasticity.tasks.TowerOfHanoi(prompt_style='standard')

Bases: Task

Tower of Hanoi puzzle.

The classic recursive puzzle: move n disks from peg A to peg C, using peg B as auxiliary.

Rules:

Only one disk can be moved at a time
A larger disk cannot be placed on a smaller disk

The optimal solution requires 2^n - 1 moves.

This is the flagship task because:

It has an exact recursive structure and requires recursive planning
The solution is deterministic and easily verifiable
Complexity is well-defined (the number of disks), and the optimal solution length grows exponentially (2^n - 1 moves)
There are no shortcuts; the puzzle must actually be solved
Early commitment to a wrong approach is fatal
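The recursive structure and the 2^n - 1 move count can be checked with the textbook recursion. This is a generic sketch, not the library's code; the function name `hanoi_moves` and the tuple move format are choices made for this example.

```python
def hanoi_moves(n, source="A", target="C", auxiliary="B"):
    """Return the optimal move list as (from_peg, to_peg) tuples."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, auxiliary, target)    # clear the top n-1 disks
        + [(source, target)]                             # move the largest disk
        + hanoi_moves(n - 1, auxiliary, target, source)  # restack them on top of it
    )


# The move count matches 2^n - 1 for small n.
for n in range(1, 6):
    assert len(hanoi_moves(n)) == 2**n - 1

print(hanoi_moves(2))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```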

Parameters: prompt_style (str)

name: str = 'tower_of_hanoi'

__init__(prompt_style='standard')

Parameters: prompt_style (str) – How to phrase the task:

"standard": Clear, direct instructions
"minimal": Bare problem statement
"detailed": Include rules explanation

generate_instance(complexity)

Generate a Tower of Hanoi instance.

Parameters: complexity (int) – Number of disks (n)
Return type: TaskInstance

verify_solution(instance, response)

Verify a Tower of Hanoi solution by simulation.

Parses moves from response and simulates execution, checking all rules are followed and final state is correct.

Parameters:

instance (TaskInstance)
response (str)

Return type:

TaskResult
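Verification by simulation can be sketched as follows. The move syntax (`A -> C`) and the error message wording are assumptions made for this example; the page does not specify how the library parses moves or reports failures.

```python
import re

# ASSUMED move syntax: "A -> C". The library's parser may accept other formats.
MOVE_RE = re.compile(r"\b([ABC])\s*->\s*([ABC])\b")


def simulate_hanoi(n, response):
    """Return (is_correct, error_message) for a response to an n-disk puzzle."""
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}  # top of each peg is last
    for step, (src, dst) in enumerate(MOVE_RE.findall(response), start=1):
        if not pegs[src]:
            return False, f"move {step}: peg {src} is empty"
        if pegs[dst] and pegs[dst][-1] < pegs[src][-1]:
            return False, f"move {step}: larger disk placed on smaller disk"
        pegs[dst].append(pegs[src].pop())
    if pegs["C"] != list(range(n, 0, -1)):
        return False, "final state is not all disks on peg C"
    return True, None


print(simulate_hanoi(2, "A -> B\nA -> C\nB -> C"))  # (True, None)
print(simulate_hanoi(2, "A -> C\nA -> C"))
# (False, 'move 2: larger disk placed on smaller disk')
```

Simulating the moves rather than comparing against a reference string accepts any legal solution, including non-optimal ones.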

Arithmetic Sequence

class structured_stochasticity.tasks.ArithmeticSequence(operations=None)

Bases: Task

Multi-step arithmetic task.

Generates a sequence of arithmetic operations that must be performed in order. Tests sequential reasoning without shortcuts.

Parameters: operations (list[str])

name: str = 'arithmetic_sequence'

__init__(operations=None)

Parameters: operations (list[str])

generate_instance(complexity)

Generate an arithmetic sequence instance.

Parameters: complexity (int) – Number of operations
Return type: TaskInstance

verify_solution(instance, response)

Check if response contains correct answer.

Parameters:

instance (TaskInstance)
response (str)

Return type:

TaskResult
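One way such a task could be built is to apply randomly chosen operations to a running value and record the final value as the ground truth. Everything specific here is an assumption for illustration: the operation symbols, the prompt wording, the `seed` parameter, and both function names. The page does not document the values `operations` accepts.

```python
import operator
import random
import re

# ASSUMPTION: operations are named by symbol; the library's names may differ.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def make_arithmetic_chain(complexity, operations=None, seed=0):
    """Hypothetical generator: return (prompt, ground_truth) for a chain of steps."""
    rng = random.Random(seed)
    symbols = operations or list(OPS)
    value = rng.randint(1, 9)
    steps = [f"Start with {value}."]
    for _ in range(complexity):  # complexity = number of operations
        symbol = rng.choice(symbols)
        operand = rng.randint(1, 9)
        value = OPS[symbol](value, operand)
        steps.append(f"Then apply {symbol} {operand}.")
    return " ".join(steps) + " What is the result?", str(value)


def contains_answer(ground_truth, response):
    """Hypothetical check: some integer in the response equals the answer."""
    return ground_truth in re.findall(r"-?\d+", response)


prompt, truth = make_arithmetic_chain(complexity=3, operations=["+"], seed=1)
print(prompt)
print(contains_answer(truth, f"The result is {truth}."))  # True
```

Matching whole integers rather than substrings avoids accepting "15" when the answer is "5".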

Logical Deduction

class structured_stochasticity.tasks.LogicalDeduction

Bases: Task

Logical deduction puzzles.

“A is taller than B. B is taller than C. Who is shortest?”

Complexity scales with number of entities and comparisons.

name: str = 'logical_deduction'

NAMES = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank', 'Grace', 'Henry']

COMPARISONS = ['taller than', 'shorter than', 'older than', 'younger than']

generate_instance(complexity)

Generate a logical ordering puzzle.

Parameters: complexity (int) – Number of entities to order
Return type: TaskInstance

verify_solution(instance, response)

Check if response contains correct name.

Parameters:

instance (TaskInstance)
response (str)

Return type:

TaskResult
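An ordering puzzle of this kind can be sketched by sampling a hidden order, stating the adjacent comparisons in shuffled order, and asking for one end of the chain. This sketch uses only "taller than" and a fixed question; the function names, the `seed` parameter, and the prompt wording are hypothetical, and the library may combine `NAMES` and `COMPARISONS` differently.

```python
import random
import re

NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace", "Henry"]


def make_ordering_puzzle(complexity, seed=0):
    """Hypothetical generator: return (prompt, ground_truth) for a height chain."""
    if not 2 <= complexity <= len(NAMES):
        raise ValueError("complexity must be between 2 and len(NAMES)")
    rng = random.Random(seed)
    order = rng.sample(NAMES, complexity)  # tallest first, shortest last
    facts = [f"{a} is taller than {b}." for a, b in zip(order, order[1:])]
    rng.shuffle(facts)                     # hide the chain order from the reader
    return " ".join(facts) + " Who is shortest?", order[-1]


def contains_name(ground_truth, response):
    """Hypothetical check: the correct name appears as a whole word."""
    return re.search(rf"\b{re.escape(ground_truth)}\b", response) is not None


prompt, answer = make_ordering_puzzle(complexity=3, seed=0)
print(prompt)
print(contains_name(answer, f"{answer} is shortest."))  # True
```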

Task Factory

class structured_stochasticity.tasks.TaskFactory

Bases: object

Factory for creating task instances.

Available Tasks:

tower_of_hanoi - Tower of Hanoi puzzle
arithmetic - Multi-step arithmetic
logical_deduction - Logical ordering puzzles

TASKS = {'arithmetic': <class 'structured_stochasticity.tasks.ArithmeticSequence'>, 'logical_deduction': <class 'structured_stochasticity.tasks.LogicalDeduction'>, 'tower_of_hanoi': <class 'structured_stochasticity.tasks.TowerOfHanoi'>}

classmethod create(task_name, **kwargs)

Create a task by name.

Parameters: task_name (str)
Return type: Task

classmethod list_tasks()

List available tasks.

Return type: list[str]
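The `TASKS` mapping suggests a simple registry pattern: look up the class by name and forward keyword arguments to its constructor. The sketch below uses empty stand-in task classes so it runs on its own. How the real `create` reports an unknown name is not documented here, so the `ValueError` is an assumption.

```python
class TowerOfHanoi:  # stand-in for the documented class
    def __init__(self, prompt_style="standard"):
        self.prompt_style = prompt_style


class ArithmeticSequence:  # stand-in
    def __init__(self, operations=None):
        self.operations = operations


class LogicalDeduction:  # stand-in
    pass


class TaskFactory:
    """Sketch of a name-to-class registry mirroring the documented TASKS keys."""

    TASKS = {
        "arithmetic": ArithmeticSequence,
        "logical_deduction": LogicalDeduction,
        "tower_of_hanoi": TowerOfHanoi,
    }

    @classmethod
    def create(cls, task_name, **kwargs):
        try:
            task_cls = cls.TASKS[task_name]
        except KeyError:
            # ASSUMPTION: the real error type and message may differ.
            raise ValueError(
                f"Unknown task {task_name!r}; available: {cls.list_tasks()}"
            ) from None
        return task_cls(**kwargs)  # extra keyword arguments go to the constructor

    @classmethod
    def list_tasks(cls):
        return list(cls.TASKS)


task = TaskFactory.create("tower_of_hanoi", prompt_style="minimal")
print(type(task).__name__, task.prompt_style)  # TowerOfHanoi minimal
```

Note that the registry key for the arithmetic task is `arithmetic`, while the class's own `name` attribute is `arithmetic_sequence`.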
