Running offchain
When developing for Psyche, you might not want to spin up all the Solana infrastructure if you're working on a feature like the distributed networking or the training code.
To that end, we maintain "centralized" client and server packages that simply communicate over TCP instead of dealing with code deployed to a Solana network.
There's a server package and a client package. To develop with them, you'd spin up one server with whatever run config you want, then connect one or more clients to it.
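As a rough sketch of that workflow, the Python below builds the command lines for one server and a few clients, using only flags from the command-line reference further down. It assumes the `psyche-centralized-server` and `psyche-centralized-client` binaries are on your `PATH`, that `--server-addr` accepts a `host:port` string, and that the key files already exist. The helper `build_commands`, the paths, and the run ID `my-run` are hypothetical placeholders. It only builds argument lists; pass each one to `subprocess.Popen` (ideally in its own terminal) to launch it.

```python
def build_commands(
    state_path: str,
    run_id: str,
    num_clients: int,
    port: int = 20000,
    host: str = "127.0.0.1",
    key_dir: str = "keys",
) -> list[list[str]]:
    """Return argv lists: the server first, then one entry per client.

    Assumptions: --server-addr accepts "host:port", and run_id is the ID of
    the run the server coordinates (placeholder values, not real defaults).
    """
    server = [
        "psyche-centralized-server", "run",
        "--state", state_path,
        "--server-port", str(port),
        "--tui", "false",  # disable the terminal UI for the server
    ]
    commands = [server]
    for i in range(num_clients):
        commands.append([
            "psyche-centralized-client", "train",
            "--run-id", run_id,
            "--server-addr", f"{host}:{port}",
            # One pre-generated key per client (e.g. via `openssl rand 32`),
            # so each client keeps a stable identity across restarts.
            "--identity-secret-key-path", f"{key_dir}/client-{i}.key",
            "--logs", "console",  # plain console logs instead of the TUI
        ])
    return commands


if __name__ == "__main__":
    for argv in build_commands("./state.toml", "my-run", num_clients=2):
        print(" ".join(argv))
```

Giving every client its own key file is a choice, not a requirement: without `--identity-secret-key-path`, each client generates a random identity on startup.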
Local Testnet
The local testnet is a helper application designed to easily spin up a server and multiple clients. It's useful for doing sample runs on your own hardware, and for development.
Prerequisites
Since we want to run the server and many clients, we'll need several terminal windows to monitor them. The tool uses tmux to create them.
If you're using the Nix devShell, tmux is already included.
Running
A sample invocation that fires up 3 clients to train on a 20m model might look like this:
```sh
just local-testnet --num-clients 3 --config-path ./config/consilience-match-llama2-20m-fineweb-pretrain-dev/
```
There are a lot of options for configuring the local testnet. Check them out below!
Command-line options
Command-Line Help for `psyche-centralized-local-testnet`
This document contains the help content for the `psyche-centralized-local-testnet` command-line program.
Command Overview:

- `psyche-centralized-local-testnet`
- `psyche-centralized-local-testnet start`

psyche-centralized-local-testnet
Usage: `psyche-centralized-local-testnet <COMMAND>`
Subcommands:
- `start`: Starts the local-testnet running each part of the system in a separate terminal pane

psyche-centralized-local-testnet start
Starts the local-testnet running each part of the system in a separate terminal pane
Usage: `psyche-centralized-local-testnet start [OPTIONS] --num-clients <NUM_CLIENTS> --config-path <CONFIG_PATH>`
Options:

- `--num-clients <NUM_CLIENTS>`: Number of clients to start
- `--config-path <CONFIG_PATH>`: File path to the configuration that the coordinator will need to start
- `--write-distro-data <WRITE_DISTRO_DATA>`: If provided, write DisTrO data to disk in this path
- `--server-port <SERVER_PORT>`: Port the server for this testnet will listen on (this is the one clients must use when connecting). Default value: `20000`
- `--tui <TUI>`: Enables a terminal-based graphical interface for monitoring analytics. Default value: `true`. Possible values: `true`, `false`
- `--random-kill-num <RANDOM_KILL_NUM>`: Kill N clients randomly every `<RANDOM_KILL_INTERVAL>` seconds
- `--allowed-to-kill <ALLOWED_TO_KILL>`: Which clients we're allowed to kill randomly
- `--random-kill-interval <RANDOM_KILL_INTERVAL>`: Kill `<RANDOM_KILL_NUM>` clients randomly every N seconds. Default value: `120`
- `--log <LOG>`: Sets the logging level for more granular information. Default value: `warn,psyche=debug`
- `--first-client-checkpoint <FIRST_CLIENT_CHECKPOINT>`: HF repo where the first client can get the model and the configuration to use
- `--hf-token <HF_TOKEN>`
- `--write-log`: Default value: `false`
- `--wandb-project <WANDB_PROJECT>`
- `--wandb-group <WANDB_GROUP>`
- `--wandb-entity <WANDB_ENTITY>`
- `--optim-stats <OPTIM_STATS>`
- `--eval-tasks <EVAL_TASKS>`
This document was generated automatically by clap-markdown.
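The three random-kill options combine into a simple chaos schedule. The sketch below is our reading of the help text, not the testnet's actual implementation: every `random_kill_interval` seconds, pick `random_kill_num` clients uniformly at random from the allowed set. Treating `--allowed-to-kill` as a list of client indices, and letting every client be killed when no allow-list is given, are both assumptions; `pick_victims` and `kill_schedule` are hypothetical names. It only computes the schedule, so it's safe to run.

```python
import random
from typing import Optional


def pick_victims(allowed: list[int], num: int, rng: random.Random) -> list[int]:
    """Choose up to `num` distinct clients from the allowed set."""
    # Capping k avoids the ValueError random.sample raises when asked for
    # more items than the population contains.
    return sorted(rng.sample(allowed, k=min(num, len(allowed))))


def kill_schedule(
    num_clients: int,
    random_kill_num: int,
    random_kill_interval: int = 120,  # mirrors the documented default
    allowed_to_kill: Optional[list[int]] = None,
    duration_secs: int = 600,
    seed: int = 0,
) -> list[tuple[int, list[int]]]:
    """Return (time_in_seconds, client_indices) pairs for one simulated run."""
    # Assumption: with no allow-list, every client may be killed.
    if allowed_to_kill is None:
        allowed = list(range(num_clients))
    else:
        allowed = allowed_to_kill
    rng = random.Random(seed)  # seeded so the schedule is reproducible
    return [
        (t, pick_victims(allowed, random_kill_num, rng))
        for t in range(random_kill_interval, duration_secs + 1, random_kill_interval)
    ]


if __name__ == "__main__":
    for t, victims in kill_schedule(5, 2, allowed_to_kill=[1, 2, 3]):
        print(f"t={t:>4}s kill clients {victims}")
```

Seeding the generator is a convenience for reproducing a failure; whether the real testnet lets you seed its kills isn't covered by the options above.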
Server & Client
Both of these applications can be spun up individually at your discretion instead of using the local testnet. We include all their command-line options for your reading pleasure:
Client
Command-Line Help for `psyche-centralized-client`
This document contains the help content for the `psyche-centralized-client` command-line program.
Command Overview:

- `psyche-centralized-client`
- `psyche-centralized-client show-identity`
- `psyche-centralized-client train`

psyche-centralized-client
Usage: `psyche-centralized-client <COMMAND>`
Subcommands:
- `show-identity`: Displays the client's unique identifier, used to participate in training runs
- `train`: Allows the client to join a training run and contribute to the model's training process
psyche-centralized-client show-identity
Displays the client's unique identifier, used to participate in training runs
Usage: `psyche-centralized-client show-identity [OPTIONS]`
Options:
- `--identity-secret-key-path <IDENTITY_SECRET_KEY_PATH>`: Path to the client's secret key. Create a new random one by running `openssl rand 32 > secret.key`, or use the `RAW_IDENTITY_SECRET_KEY` environment variable
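If `openssl` isn't available, Python's `secrets` module can produce an equivalent key file: 32 cryptographically random bytes written raw, which is what `openssl rand 32` emits (no hex or base64 encoding). This sketch sticks to that raw 32-byte form; `write_secret_key` is a hypothetical helper, not part of Psyche.

```python
import os
import secrets
from pathlib import Path


def write_secret_key(path: str) -> Path:
    """Write 32 random bytes to `path`, like `openssl rand 32 > path`."""
    key_path = Path(path)
    key_path.write_bytes(secrets.token_bytes(32))
    # The key is the client's identity; restrict it to the owner (POSIX).
    os.chmod(key_path, 0o600)
    return key_path
```

After `write_secret_key("secret.key")`, pass the file to the client with `psyche-centralized-client show-identity --identity-secret-key-path secret.key`.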
psyche-centralized-client train
Allows the client to join a training run and contribute to the model's training process
Usage: `psyche-centralized-client train [OPTIONS] --run-id <RUN_ID> --server-addr <SERVER_ADDR>`
Options:

- `-i`, `--identity-secret-key-path <IDENTITY_SECRET_KEY_PATH>`: Path to the client's secret key. Create a new random one by running `openssl rand 32 > secret.key`. If not provided, a random one will be generated
- `--bind-p2p-port <BIND_P2P_PORT>`: Sets the port for the client's P2P network participation. If not provided, a random port will be chosen
- `--bind-p2p-interface <BIND_P2P_INTERFACE>`: Sets the network interface for the client's P2P network participation. If not provided, the client will bind to all interfaces
- `--logs <LOGS>`: Sets the client's log interface. `tui`: enables a terminal-based graphical interface for monitoring analytics; `console`: standard logs; `json`: standard logs in JSON format. Default value: `tui`. Possible values: `tui`, `console`, `json`
- `--run-id <RUN_ID>`: A unique identifier for the training run. This ID allows the client to join a specific active run
- `--data-parallelism <DATA_PARALLELISM>`: Default value: `1`
- `--tensor-parallelism <TENSOR_PARALLELISM>`: Default value: `1`
- `--micro-batch-size <MICRO_BATCH_SIZE>`
- `--write-gradients-dir <WRITE_GRADIENTS_DIR>`: If provided, every shared gradient this client sees will be written to this directory
- `--eval-tasks <EVAL_TASKS>`
- `--eval-fewshot <EVAL_FEWSHOT>`: Default value: `0`
- `--eval-seed <EVAL_SEED>`: Default value: `42`
- `--eval-task-max-docs <EVAL_TASK_MAX_DOCS>`
- `--checkpoint-dir <CHECKPOINT_DIR>`: If provided, every model parameter update will be saved in this directory after each epoch
- `--hub-repo <HUB_REPO>`: Path to the Hugging Face repository containing model data and configuration
- `--wandb-project <WANDB_PROJECT>`
- `--wandb-run <WANDB_RUN>`
- `--wandb-group <WANDB_GROUP>`
- `--wandb-entity <WANDB_ENTITY>`
- `--write-log <WRITE_LOG>`
- `--optim-stats-steps <OPTIM_STATS_STEPS>`
- `--grad-accum-in-fp32`: Default value: `false`
- `--dummy-training-delay-secs <DUMMY_TRAINING_DELAY_SECS>`
- `--max-concurrent-parameter-requests <MAX_CONCURRENT_PARAMETER_REQUESTS>`: Default value: `8`
- `--max-concurrent-downloads <MAX_CONCURRENT_DOWNLOADS>`: Default value: `8`
- `--compression <COMPRESSION>`: Default value: `2`
- `--server-addr <SERVER_ADDR>`
Server
Command-Line Help for `psyche-centralized-server`
This document contains the help content for the `psyche-centralized-server` command-line program.
Command Overview:

- `psyche-centralized-server`
- `psyche-centralized-server validate-config`
- `psyche-centralized-server run`

psyche-centralized-server
Usage: `psyche-centralized-server <COMMAND>`
Subcommands:
- `validate-config`: Checks that the configuration declared in the `state.toml` file is valid
- `run`: Starts the server and launches the coordinator with the declared configuration
psyche-centralized-server validate-config
Checks that the configuration declared in the `state.toml` file is valid
Usage: `psyche-centralized-server validate-config [OPTIONS] --state <STATE>`
Options:
- `--state <STATE>`: Path to the `state.toml` file to validate
- `--data-config <DATA_CONFIG>`: Path to the `data.toml` file to validate. If not provided, it will not be checked
psyche-centralized-server run
Starts the server and launches the coordinator with the declared configuration
Usage: `psyche-centralized-server run [OPTIONS] --state <STATE>`
Options:

- `--state <STATE>`: Path to the TOML file with the Coordinator state
- `-s`, `--server-port <SERVER_PORT>`: Port for the server, which clients will use to connect. If not specified, a random free port will be chosen
- `--tui <TUI>`: Default value: `true`. Possible values: `true`, `false`
- `--data-config <DATA_CONFIG>`: Path to the TOML file with the data server config
- `--save-state-dir <SAVE_STATE_DIR>`: Path to save the server and coordinator state
- `--init-warmup-time <INIT_WARMUP_TIME>`: Sets the warmup time for the run. This overrides the `warmup_time` declared in the state file
- `--init-min-clients <INIT_MIN_CLIENTS>`: Sets the minimum number of clients required to start a run. This overrides the `min_clients` declared in the state file
- `--withdraw-on-disconnect <WITHDRAW_ON_DISCONNECT>`: Automatically withdraw clients that disconnect from the server. Default value: `true`. Possible values: `true`, `false`