Psyche Glossary

ActiveStep The state machine phases a Client goes through during a training Round or Epoch, synchronized with the Coordinator's RunState. Includes Warmup, Training, Witness, and Cooldown.

AMD ROCm An alternative GPU compute platform to NVIDIA's CUDA. Support for ROCm is planned for Psyche clients in the future.

Authorizer A Solana program that issues authorizations to specific users.

Authorization A specific role (scope) assigned to a single user (grantee) by a specific authority (grantor). The grantee can then delegate authorization to other keys (the delegates) that can act on its behalf. In practice, this is useful for managing permissions to nodes in data center clusters easily.

Batch A subset of the training data processed by clients in a single step within a Round. Identified by a BatchId.

BatchId A unique identifier for a specific Batch of training data.

Bloom Filter A probabilistic data structure used for efficient set membership testing (e.g., checking if a client's commitment has been witnessed). Used in WitnessBloom. Has a small chance of false positives.

BLOOM_FALSE_RATE The target false positive rate (1% in this case) for the Bloom Filters used in the witness protocol.
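
To make the relationship between the false positive rate and the filter size concrete, here is a minimal Python sketch of a Bloom filter sized with the standard formulas (bits = -n ln p / (ln 2)^2, hash count = (bits / n) ln 2). It is not Psyche's WitnessBloom implementation: the names `bloom_parameters` and `BloomFilter`, the salted SHA-256 hashing scheme, and the 100-item capacity in the usage note are all assumptions made for illustration.

```python
import hashlib
import math
from typing import Iterator, Tuple


def bloom_parameters(expected_items: int, false_positive_rate: float) -> Tuple[int, int]:
    """Return (bits, hash_count) from the standard Bloom filter sizing formulas."""
    bits = math.ceil(-expected_items * math.log(false_positive_rate) / (math.log(2) ** 2))
    hash_count = max(1, round(bits / expected_items * math.log(2)))
    return bits, hash_count


class BloomFilter:
    """Illustrative Bloom filter; hashing scheme is an assumption, not Psyche's."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01):
        self.bits, self.hash_count = bloom_parameters(expected_items, false_positive_rate)
        self.array = bytearray((self.bits + 7) // 8)

    def _positions(self, item: bytes) -> Iterator[int]:
        # Derive hash_count bit positions by salting SHA-256 with an index.
        for i in range(self.hash_count):
            digest = hashlib.sha256(i.to_bytes(4, "little") + item).digest()
            yield int.from_bytes(digest[:8], "little") % self.bits

    def add(self, item: bytes) -> None:
        for pos in self._positions(item):
            self.array[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: bytes) -> bool:
        # True may be a false positive; False is always definitive.
        return all(self.array[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))
```

With these formulas, a filter sized for 100 items at a 1% target rate needs 959 bits and 7 hash functions. Items that were added are always reported as present; only items that were never added can be misreported.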

Checkpoint A saved state of the LLM being trained. Psyche uses checkpoints to allow runs to be paused, resumed, or recovered after interruptions. Checkpoints can be stored in a central HubRepo or shared between clients via P2P.

Checkpointers Designated, trusted participants responsible for saving the model Checkpoint during the Cooldown phase.

Client The software participants run on their own hardware (typically with a GPU) to contribute to the distributed training process. Clients perform computations, submit results (Commitments), and participate in Witnessing.

ClientState The status of a Client as tracked by the Coordinator. Key states include Healthy, Dropped, Withdrawn, and Ejected.

Commitment A cryptographic hash (SHA-256) of a client's computational results for a given Batch. Submitting commitments allows the Coordinator and Witnesses to verify work was done without transferring the full results initially.
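
The commit-then-verify idea can be sketched with Python's standard `hashlib`. This is only an illustration: how Psyche serializes results before hashing is not described here, so `result_bytes` stands in for whatever byte encoding is actually used, and the function names are hypothetical.

```python
import hashlib


def make_commitment(result_bytes: bytes) -> bytes:
    """Hash a serialized training result into a 32-byte SHA-256 commitment."""
    return hashlib.sha256(result_bytes).digest()


def verify_commitment(result_bytes: bytes, commitment: bytes) -> bool:
    """Check that a full result, once transferred, matches the earlier commitment."""
    return hashlib.sha256(result_bytes).digest() == commitment


# Hypothetical payload standing in for a client's results on one Batch.
payload = b"serialized results for batch 7"
commitment = make_commitment(payload)
```

Only the 32-byte digest needs to be submitted up front. Any later change to the payload produces a different digest, so `verify_commitment` returns False for tampered results.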

Committee The particular role of a client in a given round. Can be one of Trainer, Verifier, or TieBreaker.

Cooldown A phase (RunState and ActiveStep) at the end of an Epoch where model Checkpoints are saved and the system prepares for the next epoch.

Coordinator The central orchestrator of the Psyche training system, implemented as a Solana program. It manages the training lifecycle (RunState), client participation (ClientState), data batch assignment, and Witnessing.

CoordinatorConfig The set of parameters defining how a specific training run operates (e.g., warmup_time, witness_quorum, rounds_per_epoch).

CUDA NVIDIA's parallel computing platform and programming model, required for running the Psyche client on NVIDIA GPUs.

Data Provider The component responsible for supplying the training data in organized Batches.

Desync An error state (StepError::Desync) occurring when a Client's ActiveStep falls out of synchronization with the Coordinator's RunState.
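
A desync check can be pictured as comparing the client's step against the step expected for the Coordinator's current state. The sketch below is in Python rather than the project's own code, and the mapping between RunState and ActiveStep values is an assumption inferred from the names in this glossary. `DesyncError` and `check_in_sync` are hypothetical stand-ins.

```python
from enum import Enum, auto


class RunState(Enum):
    WARMUP = auto()
    ROUND_TRAIN = auto()
    ROUND_WITNESS = auto()
    COOLDOWN = auto()


class ActiveStep(Enum):
    WARMUP = auto()
    TRAINING = auto()
    WITNESS = auto()
    COOLDOWN = auto()


# Assumed correspondence between coordinator state and client step.
EXPECTED_STEP = {
    RunState.WARMUP: ActiveStep.WARMUP,
    RunState.ROUND_TRAIN: ActiveStep.TRAINING,
    RunState.ROUND_WITNESS: ActiveStep.WITNESS,
    RunState.COOLDOWN: ActiveStep.COOLDOWN,
}


class DesyncError(Exception):
    """Hypothetical stand-in for StepError::Desync."""


def check_in_sync(run_state: RunState, active_step: ActiveStep) -> None:
    """Raise DesyncError if the client's step does not match the coordinator's state."""
    expected = EXPECTED_STEP[run_state]
    if active_step is not expected:
        raise DesyncError(
            f"client at {active_step.name}, coordinator at {run_state.name}"
        )
```

For example, a client still in TRAINING while the Coordinator has moved on to ROUND_WITNESS would raise `DesyncError` under this mapping.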

Docker A platform used to build, ship, and run applications in Containers. Psyche uses Docker to distribute and run the client software.

Dropped A ClientState indicating a client has become unresponsive or disconnected unexpectedly.

Ejected A ClientState indicating a client has been forcibly removed from the training run, typically due to failing health checks or malicious behavior. Ejected clients may be subject to Slashing.

Epoch A major cycle in the training process, composed of multiple Rounds. An Epoch starts with the WaitingForMembers and Warmup phases and ends with a Cooldown phase.

Exited Clients A buffer on the Coordinator holding records of clients that have recently left the run (Dropped, Withdrawn, Ejected).

Finished A RunState indicating that the training run has completed its configured total_steps.

Garnix A CI (Continuous Integration) service based on Nix, used by Psyche.

Health Check A verification procedure (health_check()) initiated by designated witness clients. Its purpose is to monitor peer clients and confirm they are actively processing their assigned training batches. When a witness client detects a peer that appears unresponsive or failing (unhealthy), it notifies the Coordinator. The Coordinator then independently verifies the status of the reported peer by running its own health check. If that check confirms the report, the peer is marked as unhealthy and kicked from the run.

Healthy The desired ClientState, indicating the client is connected, responsive, and participating correctly in the training process. Only Healthy clients typically receive Rewards.

HubRepo A centralized repository location (e.g., Hugging Face, S3 bucket) where the model Checkpoint can be stored, particularly when initializing or if P2P storage is unavailable.

Iroh A P2P library that Psyche uses for data-sharing between the clients.

Lightweight Hashing Using efficient hashing algorithms like SHA-256 for Commitments to allow for fast verification by the Coordinator and Witnesses.

Metal Apple's graphics and compute API. A future backend target for running the Psyche client on Mac hardware.

min_clients The minimum number of Healthy clients required for a training run to progress beyond the WaitingForMembers state.

Mining Pool A Solana program that implements a basic "mining" or lending pool mechanism. Users (lenders) deposit collateral into a pool to delegate funds to other participants with more compute power, and can eventually claim redeemable tokens proportionate to their share of the total deposited collateral.
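
The proportional claim can be illustrated with a small piece of integer arithmetic. This Python sketch is an assumption about how "proportionate to their share" might be computed. The function name, the rounding direction (floor), and the example amounts are all hypothetical and not taken from the Mining Pool program.

```python
def claimable_tokens(deposit: int, total_deposited: int, redeemable_total: int) -> int:
    """Tokens a lender could claim, proportional to their share of the pool.

    Uses integer math and rounds down, as an assumed convention.
    """
    if total_deposited <= 0:
        raise ValueError("total_deposited must be positive")
    if not 0 <= deposit <= total_deposited:
        raise ValueError("deposit must be between 0 and total_deposited")
    return redeemable_total * deposit // total_deposited


# A lender who supplied 250 of 1000 units of collateral, with 400 redeemable tokens.
share = claimable_tokens(250, 1000, 400)
print(share)  # 100
```

Multiplying before dividing keeps precision in integer arithmetic, which matters when amounts are expressed in indivisible base units.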

NUM_STORED_ROUNDS A constant defining how many past rounds' states are kept in the Coordinator's history buffer (e.g., 4 rounds).
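
A fixed-size history buffer of this kind behaves like a ring buffer: once it is full, each new round displaces the oldest one. The Python sketch below uses `collections.deque` to show the behavior. The record layout is hypothetical, and the Coordinator's actual storage is not described here.

```python
from collections import deque

NUM_STORED_ROUNDS = 4  # example value from the glossary entry

# A deque with maxlen drops the oldest entry when a new one is appended.
history = deque(maxlen=NUM_STORED_ROUNDS)
for round_index in range(6):
    history.append({"round": round_index})

stored = [entry["round"] for entry in history]
print(stored)  # [2, 3, 4, 5]
```

After six rounds, only the four most recent remain available for lookups.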

Nix A tool for declarative and reproducible builds, used by Psyche.

Opportunistic Witnessing A feature that allows progressing early from the RoundTrain phase to the Witness phase, given that the witness quorum is reached.

Paused A RunState where the training process is temporarily stopped by manual intervention. Can be resumed later.

P2P Peer-to-Peer, meaning a client acts both as a client and as a server, sharing data with its peers. This is the intended way of data-sharing during a stable run.

Psyche Nous Research's set of systems that enable distributed training of transformer-based AI models over the internet.

Round A smaller cycle within an Epoch. Involves a training phase (RoundTrain) and a validation phase (RoundWitness).

RoundTrain The phase (RunState and ActiveStep) where clients download assigned data Batches, perform training computations (e.g., calculate gradients), and submit Commitments.

RoundWitness The phase (RunState and ActiveStep) where clients act as Witnesses to validate the Commitments submitted by other clients during RoundTrain. Requires a witness_quorum to succeed.

rounds_per_epoch A configuration parameter (CoordinatorConfig) specifying how many Rounds make up one Epoch.

RunState The overall state of the training run as managed by the Coordinator. Examples include Uninitialized, WaitingForMembers, Warmup, RoundTrain, RoundWitness, Cooldown, Paused, Finished.

SHA-256 The specific cryptographic hash function used to create Commitments in Psyche.

Solana The blockchain platform on which the Psyche Coordinator program runs.

StepError A category of errors related to the Client's ActiveStep progression, such as Desync.

tick() A function periodically called on the Coordinator program to drive the state machine transitions (advancing RunState based on time limits, client counts, and submitted results). Specific versions exist for different states (e.g., tick_waiting_for_members, tick_round_witness).
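
The dispatch pattern described above can be sketched as follows. This is a simplified Python model, not the Solana program itself. Only two states are handled, `tick_warmup` is a hypothetical name by analogy with `tick_waiting_for_members`, the time unit is assumed to be seconds, and the exact transition conditions are assumptions based on the min_clients and warmup_time entries in this glossary.

```python
from dataclasses import dataclass
from enum import Enum, auto


class RunState(Enum):
    WAITING_FOR_MEMBERS = auto()
    WARMUP = auto()
    ROUND_TRAIN = auto()


@dataclass
class Coordinator:
    run_state: RunState
    min_clients: int
    warmup_time: int  # assumed to be in seconds
    healthy_clients: int = 0
    state_started_at: int = 0

    def _enter(self, state: RunState, now: int) -> None:
        self.run_state = state
        self.state_started_at = now

    def tick_waiting_for_members(self, now: int) -> None:
        # Advance once enough Healthy clients have joined.
        if self.healthy_clients >= self.min_clients:
            self._enter(RunState.WARMUP, now)

    def tick_warmup(self, now: int) -> None:
        # Advance once the warmup time limit has elapsed (hypothetical handler).
        if now - self.state_started_at >= self.warmup_time:
            self._enter(RunState.ROUND_TRAIN, now)

    def tick(self, now: int) -> None:
        # Dispatch to the handler for the current state; other states are omitted.
        handlers = {
            RunState.WAITING_FOR_MEMBERS: self.tick_waiting_for_members,
            RunState.WARMUP: self.tick_warmup,
        }
        handler = handlers.get(self.run_state)
        if handler is not None:
            handler(now)
```

Calling `tick()` repeatedly is safe in this model: a tick that arrives before its condition is met leaves the state unchanged.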

total_steps A configuration parameter defining the total number of training steps or batches the run aims to complete before entering the Finished state.

Training The ActiveStep where the client actively computes gradients or other training operations on its assigned data Batch.

Treasurer A Solana program that runs on top of Psyche's Coordinator, managing the distribution of rewards to the clients and keeping track of the points earned by each client in the training process.

Uninitialized The default starting RunState of the Coordinator before a training run is configured and started.

WaitingForMembers The RunState where the Coordinator waits for the minimum number of clients (min_clients) to connect and become Healthy before starting the training process.

Warmup The initial phase (RunState and ActiveStep) of a training run where clients download the model Checkpoint and initialize their training environment.

Witness A Client selected to validate other clients' work.

WitnessBloom The specific Bloom Filter used on the Coordinator to track which client Commitments have been successfully witnessed.

Witness Quorum The minimum number of clients that must successfully act as Witnesses and agree on the validity of results for a Round to be considered successful.
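
As a rough illustration, a quorum check reduces to counting successful witness reports against the configured threshold. The Python sketch below assumes each witness reports a single boolean per Round. The function name and the vote representation are hypothetical.

```python
from typing import Dict


def quorum_reached(witness_votes: Dict[str, bool], witness_quorum: int) -> bool:
    """True once at least witness_quorum witnesses have reported success."""
    approvals = sum(1 for ok in witness_votes.values() if ok)
    return approvals >= witness_quorum


# Three witnesses reported, two of them successfully.
votes = {"witness-a": True, "witness-b": False, "witness-c": True}
```

With these votes, a witness_quorum of 2 is satisfied, while a witness_quorum of 3 is not.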

Withdrawn A ClientState indicating that a client has exited the run.