Benchmark Tasks
Tasks are chosen to have:
- Exact algorithmic solutions (objectively verifiable)
- Scalable complexity (parameterized difficulty)
- Minimal heuristic shortcuts (tests true reasoning)
Data Classes
- class structured_stochasticity.tasks.TaskInstance(prompt, complexity, ground_truth=None, metadata=<factory>)
Bases: object
A single instance of a task.
- Parameters:
prompt (str)
complexity (int)
ground_truth (str | None)
metadata (dict)
- prompt: str
- complexity: int
- ground_truth: str | None = None
- metadata: dict
- __init__(prompt, complexity, ground_truth=None, metadata=<factory>)
- Parameters:
prompt (str)
complexity (int)
ground_truth (str | None)
metadata (dict)
- Return type:
None
- class structured_stochasticity.tasks.TaskResult(response, is_correct, complexity, error_message=None, metadata=<factory>)
Bases: object
Result of attempting a task.
- Parameters:
response (str)
is_correct (bool)
complexity (int)
error_message (str | None)
metadata (dict)
- response: str
- is_correct: bool
- complexity: int
- error_message: str | None = None
- metadata: dict
- __init__(response, is_correct, complexity, error_message=None, metadata=<factory>)
- Parameters:
response (str)
is_correct (bool)
complexity (int)
error_message (str | None)
metadata (dict)
- Return type:
None
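The two signatures above, with their `metadata=<factory>` defaults, have the shape of Python dataclasses. The following is a minimal stand-in sketch that mirrors the documented fields so the example runs without the package installed; it assumes the real classes are dataclasses and is not copied from the library's source.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskInstance:
    """Stand-in mirroring the documented TaskInstance fields."""
    prompt: str
    complexity: int
    ground_truth: Optional[str] = None
    # default_factory gives every instance its own dict (the "<factory>" default)
    metadata: dict = field(default_factory=dict)


@dataclass
class TaskResult:
    """Stand-in mirroring the documented TaskResult fields."""
    response: str
    is_correct: bool
    complexity: int
    error_message: Optional[str] = None
    metadata: dict = field(default_factory=dict)


# Illustrative values only; the prompt and answer are made up for this sketch.
instance = TaskInstance(prompt="Compute 2 + 3.", complexity=1, ground_truth="5")
result = TaskResult(
    response="5",
    is_correct=(instance.ground_truth == "5"),
    complexity=instance.complexity,
)
```

Using a factory for `metadata` avoids the shared-mutable-default pitfall: two instances never write into the same dictionary.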
Base Task Class
- class structured_stochasticity.tasks.Task
Bases: ABC
Abstract base class for benchmark tasks.
- name: str = 'base_task'
- abstractmethod generate_instance(complexity)
Generate a task instance at given complexity level.
- Parameters:
complexity (int)
- Return type:
- abstractmethod verify_solution(instance, response)
Verify if a response correctly solves the task.
- Parameters:
instance (TaskInstance)
response (str)
- Return type:
- generate_batch(complexity_range, instances_per_level=1)
Generate multiple instances across complexity levels.
- Parameters:
complexity_range (tuple[int, int])
instances_per_level (int)
- Return type:
list[TaskInstance]
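To illustrate how a concrete task might plug into this interface, here is a self-contained sketch with stand-in classes. Several details are assumptions rather than documented behavior: `complexity_range` is treated as inclusive on both ends, `verify_solution` returns a plain `bool` for brevity, and `EchoTask` is a hypothetical toy task invented for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaskInstance:
    """Stand-in for the documented TaskInstance."""
    prompt: str
    complexity: int
    ground_truth: Optional[str] = None
    metadata: dict = field(default_factory=dict)


class Task(ABC):
    """Stand-in for the documented abstract base class."""
    name: str = "base_task"

    @abstractmethod
    def generate_instance(self, complexity: int) -> TaskInstance:
        ...

    @abstractmethod
    def verify_solution(self, instance: TaskInstance, response: str) -> bool:
        # ASSUMPTION: a bool is returned here; the real return type is not shown above.
        ...

    def generate_batch(self, complexity_range, instances_per_level=1):
        low, high = complexity_range
        # ASSUMPTION: both endpoints of the range are included.
        return [
            self.generate_instance(level)
            for level in range(low, high + 1)
            for _ in range(instances_per_level)
        ]


class EchoTask(Task):
    """Hypothetical toy task: the solver must repeat a number."""
    name = "echo"

    def generate_instance(self, complexity):
        return TaskInstance(
            prompt=f"Repeat the number {complexity}.",
            complexity=complexity,
            ground_truth=str(complexity),
        )

    def verify_solution(self, instance, response):
        return response.strip() == instance.ground_truth


batch = EchoTask().generate_batch((1, 3), instances_per_level=2)
```

Under these assumptions the batch holds two instances at each of the levels 1, 2, and 3.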
Tower of Hanoi
- class structured_stochasticity.tasks.TowerOfHanoi(prompt_style='standard')
Bases: Task
Tower of Hanoi puzzle.
The classic recursive puzzle: move n disks from peg A to peg C, using peg B as auxiliary. Rules:
- Only one disk can be moved at a time
- A larger disk cannot be placed on a smaller disk
The optimal solution requires 2^n - 1 moves.
This task is ideal because:
- Solution is deterministic and verifiable
- Requires recursive planning
- No shortcuts: the solver must actually solve it
- Complexity is well-defined (number of disks)
Tower of Hanoi is the classic recursive puzzle and our flagship task because:
- It has exact recursive structure
- Complexity grows exponentially (2^n - 1 moves)
- Early commitment to a wrong approach is fatal
- Solution is easily verifiable
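The 2^n - 1 move count follows from the recurrence T(n) = 2·T(n-1) + 1 with T(0) = 0: move n-1 disks out of the way, move the largest disk, then move the n-1 disks back on top. A short sketch of the textbook recursive solver makes this concrete; `hanoi_moves` is an illustrative helper, not a function of this package.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the optimal list of (from_peg, to_peg) moves for n disks."""
    if n == 0:
        return []
    return (
        hanoi_moves(n - 1, source, spare, target)    # clear the top n-1 disks onto the spare peg
        + [(source, target)]                         # move the largest disk
        + hanoi_moves(n - 1, spare, target, source)  # restack the n-1 disks on top of it
    )


# The move count matches 2^n - 1 for every size checked here.
for n in range(1, 8):
    assert len(hanoi_moves(n)) == 2**n - 1

print(hanoi_moves(2))  # [('A', 'B'), ('A', 'C'), ('B', 'C')]
```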
- Parameters:
prompt_style (str)
- name: str = 'tower_of_hanoi'
- __init__(prompt_style='standard')
- Parameters:
prompt_style (str) – How to phrase the task:
- "standard": Clear, direct instructions
- "minimal": Bare problem statement
- "detailed": Include rules explanation
- generate_instance(complexity)
Generate a Tower of Hanoi instance.
- Parameters:
complexity (int) – Number of disks (n)
- Return type:
- verify_solution(instance, response)
Verify a Tower of Hanoi solution by simulation.
Parses moves from response and simulates execution, checking all rules are followed and final state is correct.
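Verification by simulation can be sketched as follows. The move format is an assumption made for this example: moves are written as `A -> C` or `A to C`, and the parser the library actually uses may accept other forms. `verify_hanoi` is a hypothetical helper that returns an `(is_correct, error_message)` pair, loosely mirroring the `TaskResult` fields.

```python
import re


def verify_hanoi(n, response):
    """Simulate moves parsed from free text; return (is_correct, error_message)."""
    # Each peg is a stack with the top disk last; disk n is the largest.
    pegs = {"A": list(range(n, 0, -1)), "B": [], "C": []}
    # ASSUMED move format: "A -> C" or "A to C".
    moves = re.findall(r"\b([ABC])\s*(?:->|to)\s*([ABC])\b", response)
    if not moves:
        return False, "no moves found"
    for step, (src, dst) in enumerate(moves, start=1):
        if not pegs[src]:
            return False, f"move {step}: peg {src} is empty"
        disk = pegs[src][-1]
        if pegs[dst] and pegs[dst][-1] < disk:
            return False, f"move {step}: disk {disk} placed on a smaller disk"
        pegs[dst].append(pegs[src].pop())
    if pegs["C"] != list(range(n, 0, -1)):
        return False, "final state is not all disks on peg C"
    return True, None
```

Because the check replays the moves instead of comparing against one reference string, any legal sequence that ends with all disks on peg C passes, including non-optimal ones.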
- Parameters:
instance (TaskInstance)
response (str)
- Return type:
Arithmetic Sequence
- class structured_stochasticity.tasks.ArithmeticSequence(operations=None)
Bases: Task
Multi-step arithmetic task.
Generates a sequence of arithmetic operations that must be performed in order. Tests sequential reasoning without shortcuts.
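A chain of this kind could be generated as in the sketch below. The operator symbols, operand range, prompt wording, and the `seed` parameter are all assumptions chosen for illustration; the documentation above does not specify them.

```python
import random


def make_arithmetic_chain(complexity, seed=0, operations=("+", "-", "*")):
    """Build a chain of `complexity` operations; return (prompt, ground_truth)."""
    rng = random.Random(seed)  # seeded so the same arguments give the same instance
    verbs = {"+": "Add", "-": "Subtract", "*": "Multiply by"}
    value = rng.randint(1, 9)
    steps = [f"Start with {value}."]
    for _ in range(complexity):
        op = rng.choice(operations)
        operand = rng.randint(1, 9)
        if op == "+":
            value += operand
        elif op == "-":
            value -= operand
        elif op == "*":
            value *= operand
        else:
            raise ValueError(f"unsupported operation: {op!r}")
        steps.append(f"{verbs[op]} {operand}.")
    return " ".join(steps) + " What is the result?", str(value)


prompt, ground_truth = make_arithmetic_chain(complexity=3, seed=42)
```

Each step depends on the running value, so the final answer cannot be reached without carrying out every operation in order.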
- Parameters:
operations (list[str])
- name: str = 'arithmetic_sequence'
- generate_instance(complexity)
Generate arithmetic sequence.
- Parameters:
complexity (int) – Number of operations
- Return type:
- verify_solution(instance, response)
Check if response contains correct answer.
- Parameters:
instance (TaskInstance)
response (str)
- Return type:
Logical Deduction
- class structured_stochasticity.tasks.LogicalDeduction
Bases: Task
Logical deduction puzzles.
“A is taller than B. B is taller than C. Who is shortest?”
Complexity scales with number of entities and comparisons.
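One way to build such a puzzle is to sample a hidden total order and state only the adjacent comparisons. The sketch below makes several assumptions for illustration: it uses only the "taller than" relation, shuffles the facts, always asks for the shortest person, and checks the answer by substring match.

```python
import random

NAMES = ["Alice", "Bob", "Carol", "Dave", "Eve", "Frank", "Grace", "Henry"]


def make_height_puzzle(complexity, seed=0):
    """Return (prompt, ground_truth) for `complexity` entities (2 to 8)."""
    rng = random.Random(seed)
    order = rng.sample(NAMES, complexity)  # hidden order, tallest first
    # Adjacent pairs are enough to pin down the full order.
    facts = [f"{a} is taller than {b}." for a, b in zip(order, order[1:])]
    rng.shuffle(facts)  # presentation order should not leak the ranking
    return " ".join(facts) + " Who is shortest?", order[-1]


def contains_answer(ground_truth, response):
    """Containment check: does the response mention the correct name?"""
    return ground_truth.lower() in response.lower()


prompt, ground_truth = make_height_puzzle(complexity=4, seed=3)
```

A pure containment check is lenient: a response that lists every name would also pass, so a stricter verifier may be worth considering when precision matters.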
- name: str = 'logical_deduction'
- NAMES = ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank', 'Grace', 'Henry']
- COMPARISONS = ['taller than', 'shorter than', 'older than', 'younger than']
- generate_instance(complexity)
Generate a logical ordering puzzle.
- Parameters:
complexity (int) – Number of entities to order
- Return type:
- verify_solution(instance, response)
Check if response contains correct name.
- Parameters:
instance (TaskInstance)
response (str)
- Return type:
Task Factory
- class structured_stochasticity.tasks.TaskFactory
Bases: object
Factory for creating task instances.
Available Tasks:
- tower_of_hanoi - Tower of Hanoi puzzle
- arithmetic - Multi-step arithmetic
- logical_deduction - Logical ordering puzzles
- TASKS = {'arithmetic': <class 'structured_stochasticity.tasks.ArithmeticSequence'>, 'logical_deduction': <class 'structured_stochasticity.tasks.LogicalDeduction'>, 'tower_of_hanoi': <class 'structured_stochasticity.tasks.TowerOfHanoi'>}
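The `TASKS` mapping is a name-to-class registry. The factory's methods are not listed above, so the sketch below shows only the general pattern with stand-in classes and a hypothetical `create_task` helper; it should not be read as the library's API. Note that the registry key for `ArithmeticSequence` is `'arithmetic'`, while the class's own `name` attribute is `'arithmetic_sequence'`.

```python
class TowerOfHanoi:
    """Stand-in for the documented class."""
    def __init__(self, prompt_style="standard"):
        self.prompt_style = prompt_style


class ArithmeticSequence:
    """Stand-in for the documented class."""
    def __init__(self, operations=None):
        self.operations = operations


class LogicalDeduction:
    """Stand-in for the documented class."""


# Same keys as the documented TASKS mapping.
TASKS = {
    "arithmetic": ArithmeticSequence,
    "logical_deduction": LogicalDeduction,
    "tower_of_hanoi": TowerOfHanoi,
}


def create_task(name, **kwargs):
    """Hypothetical helper: look up a task class by key and instantiate it."""
    try:
        task_cls = TASKS[name]
    except KeyError:
        raise ValueError(
            f"Unknown task {name!r}; available: {sorted(TASKS)}"
        ) from None
    return task_cls(**kwargs)


task = create_task("tower_of_hanoi", prompt_style="minimal")
```

Keeping the registry as plain data means a new task can be exposed by adding one dictionary entry, and an unknown key fails with a message that lists the valid choices.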