# Python Integration

> [!WARNING]
> Python support is still under development and not production-ready. The APIs used to write it are not documented because they are still subject to large amounts of change.

## Overview
Psyche provides a Python integration that lets you write modeling code in Python using libraries like Hugging Face Transformers while leveraging Psyche's Rust core for training orchestration. This integration targets both research, where you want the flexibility of Python modeling on top of Psyche's training infrastructure, and production-scale training, where you want to take advantage of highly optimized training frameworks already built in Python.
The Python integration works through a "sidecar" process that Psyche spawns and communicates with during training.
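To make the spawning step concrete, here is a minimal sketch of how a parent process could assemble the sidecar command line for each rank. The helper name and the way arguments are filled in are hypothetical, for illustration only; it mirrors the `python -m psyche.sidecar` invocation shown in the Architecture section of this page.

```python
import os
import sys

def build_sidecar_command(parent_pid: int, backend: str, init_method: str,
                          world_size: int, rank: int) -> list[str]:
    """Build the argv for one Python sidecar process.

    Hypothetical helper for illustration; the real spawning logic
    lives in Psyche's Rust core.
    """
    return [
        sys.executable, "-m", "psyche.sidecar",
        "--parent-pid", str(parent_pid),
        "--backend", backend,
        "--init-method", init_method,
        "--world-size", str(world_size),
        "--rank", str(rank),
    ]

# One sidecar process per rank; the parent would then launch and
# monitor each of these commands.
cmds = [
    build_sidecar_command(os.getpid(), "nccl", "tcp://127.0.0.1:34567", 2, r)
    for r in range(2)
]
```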
## Development Setup
For development with the Python integration, we provide a Nix development shell with Python available.
This shell provides:

- The `psyche` Python module (built from Rust using PyO3)
- PyTorch
- The Transformers library
- Other required Python dependencies via `pyproject.toml`/`uv.lock`
### Development Workflow
You can use `uv pip` to install arbitrary packages. Dependencies are tracked via `uv.lock`, so if you don't have direnv set up, you must exit and re-enter the development shell with `nix develop .#dev-python`.

When you enter the dev shell, it compiles the Rust extension that provides the `psyche` Python module. If you modify any Rust code in the Python extension or its dependencies, you must exit and re-enter the dev shell to recompile the extension.
We recommend running commands directly through the dev shell without entering it, which will recompile the extension as needed.
For example, to run the train program using Python:

```bash
nix develop .#dev-python --command just train-model-python \
    --model emozilla/llama2-20m-init \
    --data-path ./data/fineweb-10bt/ \
    --total-batch 2 \
    --micro-batch 1 \
    --python
```
Alternatively, you could enter the shell with:

```bash
nix develop .#dev-python
```

and run the commands inside it, but this is likely to be a footgun, as it's easy to forget to exit and re-enter the shell.
## Architecture
The Python integration uses a sidecar architecture:

- **Psyche Core (Rust)**: Handles data loading, distributed training coordination, and spawning Python processes
- **Python Sidecar**: Runs the modeling code using PyTorch and Transformers, or any other Python code
When you use the `--python` flag, Psyche automatically spawns Python sidecar processes using:

```bash
python -m psyche.sidecar --parent-pid <pid> --backend <backend> --init-method <method> --world-size <size> --rank <rank>
```
By default, a single sidecar using one GPU is spawned. The number of sidecars depends on two arguments: `--data-parallelism` and `--tensor-parallelism`. The former spawns one entire copy of the model per GPU, while the latter splits the model across multiple GPUs. The number of sidecars spawned is the product of these two arguments, so keep in mind that you need `tensor_parallelism * data_parallelism` GPUs to run that many sidecars.
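To make the arithmetic concrete, here is a small sketch of how ranks and devices could be laid out for given parallelism values. The function name and the group-numbering scheme are illustrative assumptions, not Psyche's actual API; only the sidecar count (the product of the two arguments) comes from the text above.

```python
def sidecar_layout(data_parallelism: int, tensor_parallelism: int) -> list[dict]:
    """Illustrative rank/device layout: one sidecar per GPU,
    data_parallelism * tensor_parallelism sidecars in total."""
    world_size = data_parallelism * tensor_parallelism
    return [
        {
            "rank": rank,
            "device": rank,                           # one GPU per sidecar
            "dp_group": rank // tensor_parallelism,   # which model replica
            "tp_group": rank % tensor_parallelism,    # shard within the replica
        }
        for rank in range(world_size)
    ]

# 2 replicas, each split across 2 GPUs -> 4 sidecars on 4 GPUs
layout = sidecar_layout(data_parallelism=2, tensor_parallelism=2)
```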
Here's an overview of the different options that `psyche-sidecar` provides, in case you want to test sidecars with different configurations.
## Command-line options
# Command-Line Help for `psyche-sidecar`

This document contains the help content for the `psyche-sidecar` command-line program.

**Command Overview:**

- `psyche-sidecar`

## `psyche-sidecar`

Multi-node sidecar for Psyche distributed training

**Usage:** `psyche-sidecar <COMMAND>`

**Subcommands:**

- `python`
- `rust` — Run Rust sidecar process (TODO: implement)

## `psyche-sidecar python`

**Usage:** `psyche-sidecar python [OPTIONS] --main-host <MAIN_HOST> --world-size <WORLD_SIZE> --start-rank <START_RANK>`

**Options:**

- `--main-host <MAIN_HOST>` — Address of the main node
- `--port <PORT>` — Port for coordination

  Default value: `34567`
- `--world-size <WORLD_SIZE>` — World size for distributed training
- `--start-rank <START_RANK>` — Start rank for distributed training
- `--start-device <START_DEVICE>`
- `--num-local-ranks <NUM_LOCAL_RANKS>`
- `--backend <BACKEND>` — Backend for `torch.distributed`

  Default value: `nccl`

## `psyche-sidecar rust`

Run Rust sidecar process (TODO: implement)

**Usage:** `psyche-sidecar rust`

This document was generated automatically by `clap-markdown`.
## Testing Your Changes
To test modifications to the Python integration:

- Modify the sidecar code in the Python extension
- Run the training example with the same `just train-model-python` command we outlined earlier
## How It Works
- **Initialization**: Psyche spawns Python sidecar processes for each rank
- **Model Creation**: The sidecar receives model architecture and source information via the distributed store
- **Training Loop**: Psyche coordinates training by sending operations (train, optimize, extract) to the sidecar
- **Data Flow**: Training data is broadcast to all processes, and gradients/parameters are communicated back through PyTorch's distributed primitives
The sidecar handles three main operations:

- **Train**: Forward/backward pass with gradient accumulation
- **Optimize**: Apply DisTrO results to the model being trained
- **Extract**: Model state extraction for checkpointing
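Conceptually, the sidecar's main loop is a dispatch over these three operations. Here is a minimal pure-Python sketch of that dispatch shape; the operation names come from the list above, but the class, the handler signatures, and the return values are hypothetical stand-ins for the real PyTorch logic.

```python
from typing import Any, Callable

class SidecarLoop:
    """Toy dispatch loop over the three sidecar operations."""

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[dict], Any]] = {
            "train": self._train,
            "optimize": self._optimize,
            "extract": self._extract,
        }

    def handle(self, op: dict) -> Any:
        # Route an incoming operation to its handler by kind.
        return self.handlers[op["kind"]](op)

    def _train(self, op: dict) -> str:
        # Real sidecar: forward/backward pass with gradient accumulation.
        return f"trained on {op['num_batches']} micro-batches"

    def _optimize(self, op: dict) -> str:
        # Real sidecar: apply DisTrO results to the model being trained.
        return "applied DisTrO results"

    def _extract(self, op: dict) -> str:
        # Real sidecar: extract model state for checkpointing.
        return "extracted model state"

loop = SidecarLoop()
print(loop.handle({"kind": "train", "num_batches": 4}))
# prints "trained on 4 micro-batches"
```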
This architecture allows you to write complex modeling code in Python while integrating with Psyche's distributed training network.