Running offchain

When developing for Psyche, you might not want to spin up all the Solana infrastructure if you're working on a feature like the distributed networking or the training code.

To that end, we maintain "centralized" client and server packages that simply communicate over TCP instead of dealing with code deployed to a Solana network.

There's a server package and a client package. To develop with them, you'd spin up one server with whatever run config you want, then connect one or more clients to it.

Local Testnet

The local testnet is a helper application that spins up a server and multiple clients for you. It's useful for doing sample runs on your own hardware, and for development.

Pre-requisites

Since we want to run many clients and the server, we'll need several terminal windows to monitor them. The tool uses tmux to create them.

If you're using the Nix devShell, tmux is already included.

Running

A sample invocation that fires up 3 clients to train on a 20m model might look like this:

just local-testnet --num-clients 3 --config-path ./config/consilience-match-llama2-20m-fineweb-pretrain-dev/

There are a lot of options to configure the local testnet. Check them out below!

Command-line options

Command-Line Help for `psyche-centralized-local-testnet`

This document contains the help content for the psyche-centralized-local-testnet command-line program.

Command Overview:

psyche-centralized-local-testnet

Usage: psyche-centralized-local-testnet <COMMAND>

Subcommands:
  • start — Starts the local-testnet running each part of the system in a separate terminal pane

psyche-centralized-local-testnet start

Starts the local-testnet running each part of the system in a separate terminal pane

Usage: psyche-centralized-local-testnet start [OPTIONS] --num-clients <NUM_CLIENTS> --config-path <CONFIG_PATH>

Options:
  • --num-clients <NUM_CLIENTS> — Number of clients to start

  • --config-path <CONFIG_PATH> — File path to the configuration that the coordinator will need to start

  • --write-distro-data <WRITE_DISTRO_DATA> — If provided, write DisTrO data to disk in this path

  • --server-port <SERVER_PORT> — Port on which the server for this testnet will listen (this is the one that clients must use when connecting)

    Default value: 20000

  • --tui <TUI> — Enables a terminal-based graphical interface for monitoring analytics

    Default value: true

    Possible values: true, false

  • --random-kill-num <RANDOM_KILL_NUM> — Kill N clients randomly every <RANDOM_KILL_INTERVAL> seconds

  • --allowed-to-kill <ALLOWED_TO_KILL> — Which clients we're allowed to kill randomly

  • --random-kill-interval <RANDOM_KILL_INTERVAL> — Kill <RANDOM_KILL_NUM> clients randomly every N seconds

    Default value: 120

  • --log <LOG> — Sets the logging level for more granular information

    Default value: warn,psyche=debug

  • --first-client-checkpoint <FIRST_CLIENT_CHECKPOINT> — Hugging Face repo from which the first client can get the model and the configuration to use

  • --hf-token <HF_TOKEN>

  • --write-log

    Default value: false

  • --wandb-project <WANDB_PROJECT>

  • --wandb-group <WANDB_GROUP>

  • --wandb-entity <WANDB_ENTITY>

  • --optim-stats <OPTIM_STATS>

  • --eval-tasks <EVAL_TASKS>


This document was generated automatically by clap-markdown.
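To show how the random-kill options combine with the sample invocation, here is a minimal sketch. Only the flag names come from the help text above; the values (one client killed every 60 seconds) are illustrative assumptions, and `--allowed-to-kill` is left out because its value format is not described here. The `RUN=1` switch is a convention of this example, not of the tool: without it, the command is only printed.

```shell
#!/bin/sh
set -eu

# Illustrative values; only the flag names come from the help text.
NUM_CLIENTS=3
CONFIG_PATH=./config/consilience-match-llama2-20m-fineweb-pretrain-dev/
KILL_NUM=1          # clients to kill per round
KILL_INTERVAL=60    # seconds between rounds (the documented default is 120)

# Build the argument list once so it can be printed or executed.
set -- just local-testnet \
  --num-clients "$NUM_CLIENTS" \
  --config-path "$CONFIG_PATH" \
  --random-kill-num "$KILL_NUM" \
  --random-kill-interval "$KILL_INTERVAL"

TESTNET_CMD="$*"
echo "$TESTNET_CMD"

# RUN=1 is a convention of this sketch: without it the command is only printed.
if [ "${RUN:-0}" = "1" ]; then
  "$@"
fi
```

Printing first makes it easy to review the full invocation before tmux opens a pane per process.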

Server & Client

Both of these applications can be spun up individually at your discretion instead of using the local testnet. We include all their command-line options for your reading pleasure:

Client

Command-Line Help for `psyche-centralized-client`

This document contains the help content for the psyche-centralized-client command-line program.

Command Overview:

psyche-centralized-client

Usage: psyche-centralized-client <COMMAND>

Subcommands:
  • show-identity — Displays the client's unique identifier, used to participate in training runs
  • train — Allows the client to join a training run and contribute to the model's training process

psyche-centralized-client show-identity

Displays the client's unique identifier, used to participate in training runs

Usage: psyche-centralized-client show-identity [OPTIONS]

Options:
  • --identity-secret-key-path <IDENTITY_SECRET_KEY_PATH> — Path to the client's secret key. Create a new random one by running openssl rand 32 > secret.key, or use the RAW_IDENTITY_SECRET_KEY environment variable
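
A minimal sketch of creating a key and checking the identity it maps to. The `openssl rand 32` command is the one quoted above; the file location, the `chmod`, and the assumption that `psyche-centralized-client` is on your `PATH` are illustrative choices, not requirements stated by the tool.

```shell
#!/bin/sh
set -eu

KEY_PATH=./secret.key   # hypothetical location; any path works

# Generate 32 random bytes only if no key exists yet, to avoid overwriting an identity.
if [ ! -f "$KEY_PATH" ]; then
  openssl rand 32 > "$KEY_PATH"
  chmod 600 "$KEY_PATH"   # keep the secret readable by the owner only
fi

# Sanity check: `openssl rand 32` writes exactly 32 raw bytes.
KEY_SIZE=$(wc -c < "$KEY_PATH" | tr -d ' ')
echo "key size: $KEY_SIZE bytes"

# Show the identity if the client binary is available (assumed to be on PATH).
if command -v psyche-centralized-client >/dev/null 2>&1; then
  psyche-centralized-client show-identity --identity-secret-key-path "$KEY_PATH"
fi
```

Reusing the same key file across sessions should keep the client's identifier stable, whereas omitting the key on `train` yields a fresh random one each time.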

psyche-centralized-client train

Allows the client to join a training run and contribute to the model's training process

Usage: psyche-centralized-client train [OPTIONS] --run-id <RUN_ID> --server-addr <SERVER_ADDR>

Options:
  • -i, --identity-secret-key-path <IDENTITY_SECRET_KEY_PATH> — Path to the client's secret key. Create a new random one by running openssl rand 32 > secret.key. If not provided, a random one will be generated

  • --bind-p2p-port <BIND_P2P_PORT> — Sets the port for the client's P2P network participation. If not provided, a random port will be chosen

  • --bind-p2p-interface <BIND_P2P_INTERFACE> — Sets the network interface for the client's P2P network participation. If not provided, will bind to all interfaces

  • --logs <LOGS> — Sets the client's log interface. tui: enables a terminal-based graphical interface for monitoring analytics. console: standard logs. json: standard logs in JSON format

    Default value: tui

    Possible values: tui, console, json

  • --run-id <RUN_ID> — A unique identifier for the training run. This ID allows the client to join a specific active run

  • --data-parallelism <DATA_PARALLELISM>

    Default value: 1

  • --tensor-parallelism <TENSOR_PARALLELISM>

    Default value: 1

  • --micro-batch-size <MICRO_BATCH_SIZE>

  • --write-gradients-dir <WRITE_GRADIENTS_DIR> — If provided, every shared gradient this client sees will be written to this directory

  • --eval-tasks <EVAL_TASKS>

  • --eval-fewshot <EVAL_FEWSHOT>

    Default value: 0

  • --eval-seed <EVAL_SEED>

    Default value: 42

  • --eval-task-max-docs <EVAL_TASK_MAX_DOCS>

  • --checkpoint-dir <CHECKPOINT_DIR> — If provided, every model parameter update will be saved in this directory after each epoch

  • --hub-repo <HUB_REPO> — Path to the Hugging Face repository containing model data and configuration

  • --wandb-project <WANDB_PROJECT>

  • --wandb-run <WANDB_RUN>

  • --wandb-group <WANDB_GROUP>

  • --wandb-entity <WANDB_ENTITY>

  • --write-log <WRITE_LOG>

  • --optim-stats-steps <OPTIM_STATS_STEPS>

  • --grad-accum-in-fp32

    Default value: false

  • --dummy-training-delay-secs <DUMMY_TRAINING_DELAY_SECS>

  • --max-concurrent-parameter-requests <MAX_CONCURRENT_PARAMETER_REQUESTS>

    Default value: 8

  • --max-concurrent-downloads <MAX_CONCURRENT_DOWNLOADS>

    Default value: 8

  • --compression <COMPRESSION>

    Default value: 2

  • --server-addr <SERVER_ADDR>


This document was generated automatically by clap-markdown.
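
To illustrate how the `train` options fit together when starting clients by hand, this sketch prints one command per client, each with its own key file and P2P port. The run ID `my-run`, the `localhost:20000` address form, the base port 30000, and the key file names are all assumptions made for illustration; substitute the values that match your server.

```shell
#!/bin/sh
set -eu

RUN_ID=my-run                 # hypothetical; use the run ID your server is hosting
SERVER_ADDR=localhost:20000   # assumed host:port form
BASE_P2P_PORT=30000           # hypothetical; any free port range works
NUM_CLIENTS=2

# Print the `train` command for client number $1 (zero-based).
client_cmd() {
  echo "psyche-centralized-client train" \
    "--run-id $RUN_ID" \
    "--server-addr $SERVER_ADDR" \
    "--identity-secret-key-path ./client-$1.key" \
    "--bind-p2p-port $((BASE_P2P_PORT + $1))" \
    "--logs console"
}

i=0
while [ "$i" -lt "$NUM_CLIENTS" ]; do
  client_cmd "$i"
  i=$((i + 1))
done
```

Each printed line can be pasted into its own terminal. The key files would need to exist first (for example, created with `openssl rand 32`); dropping `--identity-secret-key-path` lets the client generate a random key instead.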

Server

Command-Line Help for `psyche-centralized-server`

This document contains the help content for the psyche-centralized-server command-line program.

Command Overview:

psyche-centralized-server

Usage: psyche-centralized-server <COMMAND>

Subcommands:
  • validate-config — Checks that the configuration declared in the state.toml file is valid
  • run — Starts the server and launches the coordinator with the declared configuration

psyche-centralized-server validate-config

Checks that the configuration declared in the state.toml file is valid

Usage: psyche-centralized-server validate-config [OPTIONS] --state <STATE>

Options:
  • --state <STATE> — Path to the state.toml file to validate
  • --data-config <DATA_CONFIG> — Path to the data.toml file to validate. If not provided, it will not be checked

psyche-centralized-server run

Starts the server and launches the coordinator with the declared configuration

Usage: psyche-centralized-server run [OPTIONS] --state <STATE>

Options:
  • --state <STATE> — Path to TOML of Coordinator state

  • -s, --server-port <SERVER_PORT> — Port for the server, which clients will use to connect. If not specified, a random free port will be chosen

  • --tui <TUI>

    Default value: true

    Possible values: true, false

  • --data-config <DATA_CONFIG> — Path to TOML of data server config

  • --save-state-dir <SAVE_STATE_DIR> — Path to save the server and coordinator state

  • --init-warmup-time <INIT_WARMUP_TIME> — Sets the warmup time for the run. This overrides the warmup_time declared in the state file

  • --init-min-clients <INIT_MIN_CLIENTS> — Sets the minimum number of clients required to start a run. This overrides the min_clients declared in the state file

  • --withdraw-on-disconnect <WITHDRAW_ON_DISCONNECT> — Automatically withdraw clients that disconnect from the server

    Default value: true

    Possible values: true, false


This document was generated automatically by clap-markdown.
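
The two server subcommands can be chained so the server only starts once the configuration has been checked. The following is a sketch under a few assumptions: the binary is on your `PATH`, `validate-config` exits non-zero for an invalid file (the help text does not say so explicitly), and the state file path and the values shown are hypothetical.

```shell
#!/bin/sh
set -eu

STATE=./my-run/state.toml   # hypothetical path to your run's state file
SERVER_PORT=20000           # fixed so clients know where to connect
MIN_CLIENTS=2               # overrides min_clients from the state file

start_server() {
  # Validate first; `&&` means `run` is skipped if validation fails
  # (assuming validate-config signals failure through its exit status).
  psyche-centralized-server validate-config --state "$STATE" &&
  psyche-centralized-server run \
    --state "$STATE" \
    --server-port "$SERVER_PORT" \
    --init-min-clients "$MIN_CLIENTS"
}

# Only start when the binary is actually installed.
if command -v psyche-centralized-server >/dev/null 2>&1; then
  start_server
else
  echo "psyche-centralized-server not found on PATH; skipping" >&2
fi
```

Pinning `--server-port` avoids having to look up the randomly chosen port before pointing clients at the server.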