# Python Integration

> [!WARNING]
> Python support is still under development and not production-ready. The APIs used to write it are not documented because they are still subject to large amounts of change.

## Overview
Psyche provides a Python integration that lets you write modeling code in Python using libraries like Hugging Face Transformers while leveraging Psyche's Rust core for training orchestration. This integration targets both research, where you want the flexibility of Python modeling on top of Psyche's training infrastructure, and production-scale training, where you want to take advantage of highly optimized training frameworks already built in Python.
The Python integration works through a "sidecar" process that Psyche spawns and communicates with during training.
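To make the spawning step concrete, here is a minimal sketch of how a parent process could assemble the sidecar command line for each rank. The helper name and the way arguments are filled in are hypothetical, for illustration only; it mirrors the `python -m psyche.sidecar` invocation shown in the Architecture section of this page.

```python
import os
import sys

def build_sidecar_command(parent_pid: int, backend: str, init_method: str,
                          world_size: int, rank: int) -> list[str]:
    """Build the argv for one Python sidecar process.

    Hypothetical helper for illustration; the real spawning logic
    lives in Psyche's Rust core.
    """
    return [
        sys.executable, "-m", "psyche.sidecar",
        "--parent-pid", str(parent_pid),
        "--backend", backend,
        "--init-method", init_method,
        "--world-size", str(world_size),
        "--rank", str(rank),
    ]

# One sidecar process per rank; the parent would then launch and
# monitor each of these commands.
cmds = [
    build_sidecar_command(os.getpid(), "nccl", "tcp://127.0.0.1:34567", 2, r)
    for r in range(2)
]
```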
## Development Setup
For development with the Python integration, we provide a Nix development shell with Python available.
This shell provides:

- The `psyche` Python module (built from Rust using PyO3)
- PyTorch
- The Transformers library
- Other required Python dependencies via `pyproject.toml`/`uv.lock`
### Development Workflow
You can use `uv pip` to install arbitrary packages. Dependencies are tracked via `uv.lock`, so if you don't have direnv set up, you must exit and re-enter the development shell with `nix develop .#dev-python`.

When you enter the dev shell, it compiles the Rust extension that provides the `psyche` Python module. If you modify any Rust code in the Python extension or its dependencies, you must exit and re-enter the dev shell to recompile the extension.
We recommend running commands directly through the dev shell without entering it, which will recompile the extension as needed.
For example, to run the train program using Python:

```bash
nix develop .#dev-python --command just train-model-python \
    --model emozilla/llama2-20m-init \
    --data-path ./data/fineweb-10bt/ \
    --total-batch 2 \
    --micro-batch 1 \
    --python
```
Alternatively, you could enter the shell with:

```bash
nix develop .#dev-python
```

and run the commands inside it, but this is likely to be a footgun, as it's easy to forget to exit and re-enter the shell.
## Architecture
The Python integration uses a sidecar architecture:

- **Psyche Core (Rust)**: Handles data loading, distributed training coordination, and spawning Python processes
- **Python Sidecar**: Runs the modeling code using PyTorch and Transformers, or any other Python code
When you use the `--python` flag, Psyche automatically spawns Python sidecar processes using:

```bash
python -m psyche.sidecar --parent-pid <pid> --backend <backend> --init-method <method> --world-size <size> --rank <rank>
```
By default, a single sidecar using one GPU is spawned. The number of sidecars depends on two arguments: `--data-parallelism` and `--tensor-parallelism`. The former spawns one entire copy of the model per GPU, while the latter splits the model across multiple GPUs. The number of sidecars spawned is the product of these two arguments, so keep in mind that you need `tensor_parallelism * data_parallelism` GPUs to run that many sidecars.
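To make the arithmetic concrete, here is a small sketch of how ranks and devices could be laid out for given parallelism values. The function name and the group-numbering scheme are illustrative assumptions, not Psyche's actual API; only the sidecar count (the product of the two arguments) comes from the text above.

```python
def sidecar_layout(data_parallelism: int, tensor_parallelism: int) -> list[dict]:
    """Illustrative rank/device layout: one sidecar per GPU,
    data_parallelism * tensor_parallelism sidecars in total."""
    world_size = data_parallelism * tensor_parallelism
    return [
        {
            "rank": rank,
            "device": rank,                           # one GPU per sidecar
            "dp_group": rank // tensor_parallelism,   # which model replica
            "tp_group": rank % tensor_parallelism,    # shard within the replica
        }
        for rank in range(world_size)
    ]

# 2 replicas, each split across 2 GPUs -> 4 sidecars on 4 GPUs
layout = sidecar_layout(data_parallelism=2, tensor_parallelism=2)
```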
Here's an overview of the different options that `psyche-sidecar` provides, in case you want to test sidecars with different configurations.
## Command-line options
# Command-Line Help for `psyche-sidecar`

This document contains the help content for the `psyche-sidecar` command-line program.

**Command Overview:**

- `psyche-sidecar`

## `psyche-sidecar`

Multi-node sidecar for Psyche distributed training

**Usage:** `psyche-sidecar <COMMAND>`

**Subcommands:**

- `python`
- `rust` — Run Rust sidecar process (TODO: implement)

## `psyche-sidecar python`

**Usage:** `psyche-sidecar python [OPTIONS] --main-host <MAIN_HOST> --world-size <WORLD_SIZE> --start-rank <START_RANK>`

**Options:**

- `--main-host <MAIN_HOST>` — Address of the main node
- `--port <PORT>` — Port for coordination

  Default value: `34567`
- `--world-size <WORLD_SIZE>` — World size for distributed training
- `--start-rank <START_RANK>` — Start rank for distributed training
- `--start-device <START_DEVICE>`
- `--num-local-ranks <NUM_LOCAL_RANKS>`
- `--backend <BACKEND>` — Backend for `torch.distributed`

  Default value: `nccl`

## `psyche-sidecar rust`

Run Rust sidecar process (TODO: implement)

**Usage:** `psyche-sidecar rust`

This document was generated automatically by `clap-markdown`.
## Testing Your Changes
To test modifications to the Python integration:

- Modify the sidecar code in the Python extension
- Run the training example with the same `just train-model-python` command we outlined earlier
## How It Works
- **Initialization**: Psyche spawns Python sidecar processes for each rank
- **Model Creation**: The sidecar receives model architecture and source information via the distributed store
- **Training Loop**: Psyche coordinates training by sending operations (train, optimize, extract) to the sidecar
- **Data Flow**: Training data is broadcast to all processes, and gradients/parameters are communicated back through PyTorch's distributed primitives
The sidecar handles three main operations:

- **Train**: Forward/backward pass with gradient accumulation
- **Optimize**: Apply DisTrO results to the model being trained
- **Extract**: Model state extraction for checkpointing
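Conceptually, the sidecar's main loop is a dispatch over these three operations. Here is a minimal pure-Python sketch of that dispatch shape; the operation names come from the list above, but the class, the handler signatures, and the return values are hypothetical stand-ins for the real PyTorch logic.

```python
from typing import Any, Callable

class SidecarLoop:
    """Toy dispatch loop over the three sidecar operations."""

    def __init__(self) -> None:
        self.handlers: dict[str, Callable[[dict], Any]] = {
            "train": self._train,
            "optimize": self._optimize,
            "extract": self._extract,
        }

    def handle(self, op: dict) -> Any:
        # Route an incoming operation to its handler by kind.
        return self.handlers[op["kind"]](op)

    def _train(self, op: dict) -> str:
        # Real sidecar: forward/backward pass with gradient accumulation.
        return f"trained on {op['num_batches']} micro-batches"

    def _optimize(self, op: dict) -> str:
        # Real sidecar: apply DisTrO results to the model being trained.
        return "applied DisTrO results"

    def _extract(self, op: dict) -> str:
        # Real sidecar: extract model state for checkpointing.
        return "extracted model state"

loop = SidecarLoop()
print(loop.handle({"kind": "train", "num_batches": 4}))
# prints "trained on 4 micro-batches"
```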
This architecture allows you to write complex modeling code in Python while integrating with Psyche's distributed training network.