Quickstart: Providing Compute to NousNet
This guide walks you through the complete process of setting up your machine to provide compute to a NousNet training run. It assumes you have been provided the run-manager binary by the run administrator.
Prerequisites Checklist
Before starting, ensure you have:
- Linux operating system (Ubuntu recommended)
- NVIDIA GPU with sufficient VRAM for the model being trained
- The run-manager binary
- Run ID from the run administrator
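After you finish the steps below, you can confirm in one pass that the tools this guide relies on are on your PATH. check_prereqs is a hypothetical helper, not part of NousNet. The tool list in the example only mirrors commands used later in this guide, so some entries will be reported as missing until you complete those steps.

```shell
#!/usr/bin/env bash
# check_prereqs CMD...: hypothetical helper that reports which of the given
# commands are not on PATH. Returns the number of missing commands
# (0 means everything was found).
check_prereqs() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=$((missing + 1))
    fi
  done
  return "$missing"
}

# Example: the tools used throughout this guide.
# check_prereqs nvidia-smi docker solana solana-keygen tmux
```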
Step 1: Verify NVIDIA Drivers
NousNet requires an NVIDIA CUDA-capable GPU. Verify your drivers are installed:
nvidia-smi
You should see output showing your GPU model, driver version, and CUDA version. If this command fails, install NVIDIA drivers following the NVIDIA driver installation guide.
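nvidia-smi can also print only the fields you need. In the sketch below, the output of nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits is piped into check_vram, a hypothetical function that flags any GPU with less memory than a threshold you choose. The threshold in the example is only a placeholder. Ask the run administrator how much VRAM the model actually needs.

```shell
#!/usr/bin/env bash
# check_vram MIN_MIB: hypothetical helper. Reads lines of the form
# "index, name, memory_total_mib" on stdin (the CSV printed by
# nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits),
# prints one verdict per GPU, and returns 1 if any GPU is below MIN_MIB.
check_vram() {
  local min_mib=$1
  awk -F', *' -v min="$min_mib" '
    NF >= 3 {
      ok = ($3 + 0 >= min + 0)
      if (!ok) bad = 1
      printf "GPU %s (%s): %s MiB %s\n", $1, $2, $3, (ok ? "OK" : "TOO SMALL")
    }
    END { exit bad }
  '
}

# Usage on a GPU machine (24000 MiB is an illustrative threshold only):
# nvidia-smi --query-gpu=index,name,memory.total --format=csv,noheader,nounits | check_vram 24000
```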
Step 2: Install Docker
Install Docker Engine following the official Docker installation guide for your Linux distribution.
After installation, verify Docker is working:
docker --version
Docker Post-Installation Steps
Important: You must add your user to the docker group to run Docker without sudo:
sudo usermod -aG docker $USER
Then log out and back in (or reboot) for the group change to take effect.
Verify the change worked:
docker run hello-world
If this runs without requiring sudo, you're set.
For more details, see the Docker post-installation guide.
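To check whether the docker group is active in your current session, and not just recorded for your account, you can inspect the groups of the running shell with id -nG. in_group is a hypothetical helper name. If it reports the group as inactive, log out and back in. Alternatively, newgrp docker starts a subshell with the group active.

```shell
#!/usr/bin/env bash
# in_group GROUP: hypothetical helper. Succeeds if the current process
# already has GROUP among its groups. After "usermod -aG docker", this stays
# false in existing sessions until you log out and back in (or run "newgrp docker").
in_group() {
  # Without a user argument, id reports the groups of the running process.
  id -nG 2>/dev/null | tr ' ' '\n' | grep -qx "$1"
}

# Example:
# in_group docker && echo "docker group active" || echo "log out and back in"
```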
Step 3: Install NVIDIA Container Toolkit
The NVIDIA Container Toolkit enables GPU access inside Docker containers. This is required for NousNet to use your GPU for training.
Follow the NVIDIA Container Toolkit installation guide for your distribution.
After installation, verify GPU access works inside Docker:
docker run --rm --gpus all nvidia/cuda:12.2.2-devel-ubuntu22.04 nvidia-smi
You should see the same GPU information as running nvidia-smi directly.
Troubleshooting: If you see an error like could not select device driver "" with capabilities: [[gpu]], the NVIDIA Container Toolkit is not installed correctly. Revisit the installation guide.
Step 4: Install Solana CLI and Create Wallet
Install Solana CLI
sh -c "$(curl -sSfL https://release.anza.xyz/stable/install)"
After installation, add Solana to your PATH by adding this line to your ~/.bashrc or ~/.zshrc:
export PATH="$HOME/.local/share/solana/install/active_release/bin:$PATH"
Then reload your shell:
source ~/.bashrc # or source ~/.zshrc
Verify the installation:
solana --version
For more details, see the Solana installation docs.
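If you repeat the PATH step, your shell profile ends up with duplicate lines. The sketch below appends the same export line shown above only when it is not already present. add_solana_path is a hypothetical helper. Pass it whichever profile file your shell uses.

```shell
#!/usr/bin/env bash
# add_solana_path RCFILE: hypothetical helper that appends the Solana PATH
# export from this step to RCFILE, but only if the exact line is not there yet.
add_solana_path() {
  local rcfile=$1
  # Single quotes keep $HOME and $PATH literal; they expand when the shell reads RCFILE.
  local line='export PATH="$HOME/.local/share/solana/install/active_release/bin:$PATH"'
  touch "$rcfile" || return 1
  if grep -qxF "$line" "$rcfile"; then
    echo "already present in $rcfile"
    return 0
  fi
  # Start on a fresh line if the file does not end with a newline.
  if [ -s "$rcfile" ] && [ -n "$(tail -c 1 "$rcfile")" ]; then
    printf '\n' >> "$rcfile"
  fi
  printf '%s\n' "$line" >> "$rcfile"
  echo "added to $rcfile"
}

# Example:
# add_solana_path ~/.bashrc && source ~/.bashrc
```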
Generate a Keypair
Create a new Solana keypair for your node:
solana-keygen new --outfile ~/.config/solana/psyche-node.json
You'll be prompted to set an optional passphrase. The keypair file will be created at the specified path.
Important: Back up this keypair file securely. If you lose it, you lose access to any rewards earned.
Get your public key (you'll need it in Steps 5 and 7):
solana-keygen pubkey ~/.config/solana/psyche-node.json
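The backup advice above is easy to put off. One approach, shown as a minimal sketch: copy the keypair into a directory you control, restrict the copy to your user, and verify it byte for byte. backup_keypair is a hypothetical helper, and the destination in the example (for example a mounted backup drive) is purely illustrative.

```shell
#!/usr/bin/env bash
# backup_keypair SRC DEST_DIR: hypothetical helper. Copies the keypair file
# into DEST_DIR under a timestamped name, readable only by you, verifies the
# copy byte-for-byte, and prints the backup path.
backup_keypair() {
  local src=$1 dest_dir=$2 dest
  if [ ! -f "$src" ]; then
    echo "keypair not found: $src" >&2
    return 1
  fi
  mkdir -p "$dest_dir" || return 1
  dest="$dest_dir/$(basename "$src" .json)-$(date +%Y%m%d%H%M%S).json"
  # umask 077 in a subshell so the copy is never group- or world-readable.
  (umask 077 && cp "$src" "$dest") || return 1
  chmod 600 "$dest" || return 1
  if ! cmp -s "$src" "$dest"; then
    echo "backup verification failed: $dest" >&2
    return 1
  fi
  echo "$dest"
}

# Example (destination is illustrative):
# backup_keypair ~/.config/solana/psyche-node.json "/media/$USER/backup/solana"
```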
Step 5: Get Authorization to Join the Run
NousNet runs are permissioned. To join, you need the run administrator to authorize your wallet.
- Send your public key (the output of solana-keygen pubkey above) to the run administrator
- The administrator will create an authorization for your key
- Once authorized, you can proceed to join the run
Step 6: Fund Your Wallet (Devnet)
Your wallet needs SOL to pay for transaction fees when communicating with the Solana blockchain.
First, configure Solana CLI to use devnet:
solana config set --url https://api.devnet.solana.com
Then request an airdrop from the devnet faucet:
solana airdrop 2 ~/.config/solana/psyche-node.json
Verify your balance:
solana balance ~/.config/solana/psyche-node.json
Note: If the airdrop fails due to rate limiting, wait a few minutes and try again, or use the Solana Faucet web interface.
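Because faucet rate limits are transient, the airdrop is a good fit for a retry loop. The sketch below is a generic, hypothetical retry helper. Only the example invocation is Solana-specific. The delay doubles after each failure, and the attempt count and starting delay in the example are arbitrary.

```shell
#!/usr/bin/env bash
# retry ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]: hypothetical helper.
# Runs COMMAND until it succeeds or has failed ATTEMPTS times, sleeping
# DELAY_SECONDS between attempts and doubling the delay after each failure.
retry() {
  local attempts=$1 delay=$2 n=1
  shift 2
  while :; do
    if "$@"; then
      return 0
    fi
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts: $*" >&2
      return 1
    fi
    echo "attempt $n failed; retrying in ${delay}s" >&2
    sleep "$delay"
    delay=$((delay * 2))
    n=$((n + 1))
  done
}

# Example: up to 5 airdrop attempts, waiting 30s, then 60s, 120s, ...
# retry 5 30 solana airdrop 2 ~/.config/solana/psyche-node.json
```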
Step 7: Create the Environment File
Create a .env file with your configuration. This file tells the run-manager how to connect and authenticate.
# Create the config directory and the env file
mkdir -p ~/.config/psyche
cat > ~/.config/psyche/run.env << 'EOF'
# Path to your Solana keypair
WALLET_PRIVATE_KEY_PATH=/home/YOUR_USERNAME/.config/solana/psyche-node.json
# Solana RPC endpoints (devnet)
RPC=https://api.devnet.solana.com
WS_RPC=wss://api.devnet.solana.com
# The run you're joining (provided by run administrator)
RUN_ID=your_run_id_here
# Your public key (the one authorized by the run admin)
AUTHORIZER=YOUR_PUBLIC_KEY_HERE
# Required for GPU access in container
NVIDIA_DRIVER_CAPABILITIES=all
EOF
Replace the following values:
| Variable | Replace With |
|---|---|
| YOUR_USERNAME | Your Linux username |
| your_run_id_here | The run ID from your administrator |
| YOUR_PUBLIC_KEY_HERE | Your wallet's public key |
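Before starting the run-manager, a quick check can catch leftover placeholders or a wrong keypair path. validate_env below is a hypothetical helper. It assumes the file uses the plain KEY=value layout shown above, and its list of required keys simply mirrors that example.

```shell
#!/usr/bin/env bash
# validate_env FILE: hypothetical helper. Checks that FILE defines every
# variable from the example env file, that no placeholder values remain,
# and that WALLET_PRIVATE_KEY_PATH points at an existing file.
validate_env() {
  local file=$1 errors=0 key value wallet
  [ -r "$file" ] || { echo "cannot read $file" >&2; return 1; }
  for key in WALLET_PRIVATE_KEY_PATH RPC WS_RPC RUN_ID AUTHORIZER NVIDIA_DRIVER_CAPABILITIES; do
    # Value after the first "=" on the last line that sets this key.
    value=$(grep -E "^${key}=" "$file" | tail -n 1 | cut -d= -f2-)
    if [ -z "$value" ]; then
      echo "missing or empty: $key" >&2
      errors=$((errors + 1))
    fi
  done
  if grep -qE 'YOUR_USERNAME|your_run_id_here|YOUR_PUBLIC_KEY_HERE' "$file"; then
    echo "placeholder values still present" >&2
    errors=$((errors + 1))
  fi
  wallet=$(grep -E '^WALLET_PRIVATE_KEY_PATH=' "$file" | tail -n 1 | cut -d= -f2-)
  if [ -n "$wallet" ] && [ ! -f "$wallet" ]; then
    echo "keypair file not found: $wallet" >&2
    errors=$((errors + 1))
  fi
  [ "$errors" -eq 0 ]
}

# Example:
# validate_env ~/.config/psyche/run.env && echo "env file looks complete"
```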
Optional Configuration
You can add these optional variables to tune performance. Ask the run administrator for recommended values:
# Number of GPUs to use for data parallelism (default: 1)
DATA_PARALLELISM=1
# Number of GPUs to distribute model across (default: 1)
TENSOR_PARALLELISM=1
# Samples per GPU per training step (tune based on VRAM)
MICRO_BATCH_SIZE=4
Step 8: Run the Manager
Make the binary executable if needed:
chmod +x ./run-manager
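Start a named tmux session so the run-manager keeps running if your SSH connection drops (install tmux with sudo apt install tmux if it is missing):
tmux new -s psyche
You can detach with Ctrl+b followed by d, and reattach later with tmux attach -t psyche.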
Start providing compute to the network:
./run-manager --env-file ~/.config/psyche/run.env
The run-manager will:
- Connect to the Solana coordinator
- Pull the appropriate Docker image for the run
- Start the training container
- Stream logs to your terminal
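If you restart the client often, a small wrapper can repeat the basic pre-flight checks before handing off to the binary. start_node is a hypothetical function. It passes only the --env-file flag documented above and makes no assumptions about other run-manager options.

```shell
#!/usr/bin/env bash
# start_node BINARY ENV_FILE: hypothetical wrapper. Verifies that BINARY is
# executable and ENV_FILE is readable, then runs "BINARY --env-file ENV_FILE"
# and returns its exit status.
start_node() {
  local bin=$1 env_file=$2
  if [ ! -x "$bin" ]; then
    echo "not executable: $bin (try: chmod +x $bin)" >&2
    return 1
  fi
  if [ ! -r "$env_file" ]; then
    echo "cannot read env file: $env_file" >&2
    return 1
  fi
  "$bin" --env-file "$env_file"
}

# Example, inside your tmux session:
# start_node ./run-manager ~/.config/psyche/run.env
```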
Step 9: Verify It's Working
After starting, you should see:
- Image pull progress - Docker downloading the NousNet client image
- Container startup - The training container initializing
- Connection logs - Your client connecting to the coordinator
- Training logs - Progress updates as training proceeds
A healthy startup looks something like:
INFO run_manager: Docker tag for run 'your_run': nousresearch/psyche-client:v0.x.x
INFO run_manager: Pulling image from registry: nousresearch/psyche-client:v0.x.x
INFO run_manager: Starting container...
INFO run_manager: Started container: abc123...
[+] Starting to train in run your_run...
To stop the client gracefully, press Ctrl+C.
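If you keep a copy of the output, for example by appending 2>&1 | tee -a run-manager.log to the run-manager command, you can check for the startup milestones above without scrolling. check_startup is a hypothetical helper. It searches for substrings copied from the sample log, and the exact wording may differ between client versions, so treat the patterns as assumptions.

```shell
#!/usr/bin/env bash
# check_startup LOG_FILE: hypothetical helper. Reports which startup
# milestones from the sample log appear in LOG_FILE and fails if any are
# missing. The patterns come from the sample output and may change.
check_startup() {
  local log=$1 missing=0 pattern
  for pattern in "Docker tag for run" "Started container" "Starting to train"; do
    if grep -qF "$pattern" "$log"; then
      echo "seen:    $pattern"
    else
      echo "missing: $pattern"
      missing=$((missing + 1))
    fi
  done
  [ "$missing" -eq 0 ]
}

# Example:
# check_startup run-manager.log
```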
Troubleshooting
GPU Not Detected in Container
Error: could not select device driver "" with capabilities: [[gpu]]
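Solution: The NVIDIA Container Toolkit is not installed or configured correctly. Revisit Step 3 and make sure docker run --rm --gpus all nvidia/cuda:12.2.2-devel-ubuntu22.04 nvidia-smi runs successfully. If the toolkit is installed but Docker has not been configured to use it, run sudo nvidia-ctk runtime configure --runtime=docker followed by sudo systemctl restart docker, as described in the toolkit installation guide.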
Docker Permission Denied
Error: permission denied while trying to connect to the Docker daemon socket
Solution: Your user isn't in the docker group. Run:
sudo usermod -aG docker $USER
Then log out and back in.
Wallet Not Found
Error: Failed to read wallet file from: /path/to/keypair.json
Solution: Verify the WALLET_PRIVATE_KEY_PATH in your .env file points to an existing file:
ls -l ~/.config/solana/psyche-node.json
RPC Connection Failures
Error: RPC error: failed to get account, or connection timeouts
Solution:
- Verify that the RPC endpoints in your .env file are correct
- For devnet, use https://api.devnet.solana.com and wss://api.devnet.solana.com
- The public devnet RPC is rate-limited; if issues persist, consider using a dedicated RPC provider
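To distinguish an unreachable or unhealthy endpoint from other failures, you can query it directly. Solana's JSON-RPC API provides a getHealth method that returns "ok" for a healthy node. The hypothetical rpc_health function below posts that request with curl and checks the result. It tests only the HTTP endpoint, not the WebSocket one.

```shell
#!/usr/bin/env bash
# is_healthy_response: hypothetical helper. Reads a JSON-RPC response on
# stdin and succeeds if it contains "result":"ok" (whitespace-tolerant).
is_healthy_response() {
  tr -d ' \n\t' | grep -q '"result":"ok"'
}

# rpc_health URL: hypothetical helper. Calls the getHealth RPC method on URL
# with a 10-second timeout and reports whether the node answered "ok".
rpc_health() {
  local url=$1
  if curl -sS --max-time 10 -X POST -H 'Content-Type: application/json' \
      -d '{"jsonrpc":"2.0","id":1,"method":"getHealth"}' "$url" | is_healthy_response; then
    echo "healthy: $url"
  else
    echo "unhealthy or unreachable: $url" >&2
    return 1
  fi
}

# Example:
# rpc_health https://api.devnet.solana.com
```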
Not Authorized to Join
Error: Authorization or permission errors when trying to join
Solution: Confirm with the run administrator that your public key has been authorized. You can verify your authorization status:
./run-manager can-join \
--rpc https://api.devnet.solana.com \
--run-id YOUR_RUN_ID \
--authorizer YOUR_PUBLIC_KEY \
--address YOUR_PUBLIC_KEY
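To avoid retyping values, the same check can read them from your env file. The hypothetical can_join_from_env sketch below assumes the env file contains only simple KEY=value lines that are also valid shell assignments, as in Step 7. It loads them with set -a and derives the address from the keypair with solana-keygen pubkey. On a delegate machine it therefore checks the delegate address against your master AUTHORIZER, assuming can-join accepts delegate addresses that way. It uses only the can-join flags shown above. RUN_MANAGER is an assumed override for the binary path.

```shell
#!/usr/bin/env bash
# can_join_from_env ENV_FILE: hypothetical wrapper around the can-join
# command above. The body runs in a subshell, so variables loaded from
# ENV_FILE do not leak into your interactive shell.
# RUN_MANAGER (assumed override) defaults to ./run-manager.
can_join_from_env() (
  env_file=$1
  # "." searches PATH for names without a slash, so force a relative path.
  case "$env_file" in */*) ;; *) env_file=./$env_file ;; esac
  set -a            # export every variable the env file defines
  . "$env_file" || exit 1
  set +a
  # The address is the key this machine signs with (master or delegate).
  address=$(solana-keygen pubkey "$WALLET_PRIVATE_KEY_PATH") || exit 1
  "${RUN_MANAGER:-./run-manager}" can-join \
    --rpc "$RPC" \
    --run-id "$RUN_ID" \
    --authorizer "$AUTHORIZER" \
    --address "$address"
)

# Example:
# can_join_from_env ~/.config/psyche/run.env
```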
Container Keeps Restarting
Symptom: Container restarts repeatedly with "version mismatch"
Solution: This usually indicates a Docker image pull issue:
- Check your internet connection
- Verify Docker Hub is accessible: docker pull hello-world
- Check disk space: df -h
Running Multiple Machines
If you want to provide compute from multiple machines, each machine must use a different keypair. Running the same keypair on multiple machines simultaneously will cause issues.
NousNet uses a delegation system for this:
- Your main keypair (the one authorized by the run admin) acts as your master key
- You generate additional delegate keys for each machine
- You register those delegates under your master key
- Each machine uses its own delegate key
Setup for Multiple Machines
On your first machine (where your master key is):
- Generate a delegate keypair for each additional machine:
solana-keygen new --outfile ~/.config/solana/psyche-delegate-1.json
solana-keygen new --outfile ~/.config/solana/psyche-delegate-2.json
# ... etc
- Get the public keys:
solana-keygen pubkey ~/.config/solana/psyche-delegate-1.json
solana-keygen pubkey ~/.config/solana/psyche-delegate-2.json
- Register the delegates under your master key (this requires the run admin's join authority pubkey):
./run-manager join-authorization-delegate \
--rpc [RPC] \
--wallet-private-key-path [USER_MASTER_KEYPAIR_FILE] \
--join-authority [JOIN_AUTHORITY_PUBKEY] \
--delegates-clear [true/false] \
--delegates-added [USER_DELEGATES_PUBKEYS]
Set --delegates-clear to true to remove previously registered delegates. --delegates-added accepts multiple pubkeys.
Note: Ask the run administrator for the JOIN_AUTHORITY_PUBKEY.
- Copy each delegate keypair file to its respective machine.
- Fund each delegate wallet with SOL for transaction fees.
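With more than a couple of machines, the key generation and public-key lookup steps can be looped. In the hypothetical sketch below, make_delegates N creates psyche-delegate-1.json through psyche-delegate-N.json with solana-keygen new --no-bip39-passphrase and prints each file with its public key. It skips files that already exist, so existing keys are never overwritten. KEYGEN and KEY_DIR are assumed overrides, added only so the loop can be tried without touching real keys. The printed public keys are what you pass to the join-authorization-delegate command above.

```shell
#!/usr/bin/env bash
# make_delegates N: hypothetical helper. Creates delegate keypairs
# KEY_DIR/psyche-delegate-1.json .. psyche-delegate-N.json (skipping any that
# already exist) and prints "file pubkey" for each one on stdout.
# KEYGEN defaults to solana-keygen; KEY_DIR defaults to ~/.config/solana.
make_delegates() {
  local count=$1 i=1 file pub
  local keygen=${KEYGEN:-solana-keygen}
  local key_dir=${KEY_DIR:-$HOME/.config/solana}
  mkdir -p "$key_dir" || return 1
  while [ "$i" -le "$count" ]; do
    file="$key_dir/psyche-delegate-$i.json"
    if [ ! -e "$file" ]; then
      # Skip the optional BIP39 passphrase prompt so the loop runs unattended.
      # Keygen output (including the recovery seed phrase) goes to stderr so it is still shown.
      "$keygen" new --no-bip39-passphrase --outfile "$file" >&2 || return 1
    fi
    pub=$("$keygen" pubkey "$file") || return 1
    printf '%s %s\n' "$file" "$pub"
    i=$((i + 1))
  done
}

# Example: three delegate keys in ~/.config/solana
# make_delegates 3
```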
On each additional machine:
Configure the .env file to use that machine's delegate keypair:
WALLET_PRIVATE_KEY_PATH=/path/to/psyche-delegate-N.json
AUTHORIZER=YOUR_MASTER_PUBLIC_KEY
The AUTHORIZER should be your master key's public key (the one authorized by the run admin), not the delegate's public key.
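If you already have a working env file on your first machine, one way to produce each delegate's file is to copy it and rewrite only the two lines that differ. make_delegate_env is a hypothetical helper that does this with sed. It assumes the KEY=value layout from Step 7, and the keypair path you pass should be the delegate file's path on the target machine.

```shell
#!/usr/bin/env bash
# make_delegate_env BASE_ENV DELEGATE_KEY_PATH MASTER_PUBKEY OUT_ENV:
# hypothetical helper. Copies BASE_ENV to OUT_ENV, pointing
# WALLET_PRIVATE_KEY_PATH at the delegate keypair and setting AUTHORIZER to
# the master public key. All other lines are kept unchanged.
# Paths must not contain '|' or '&', which sed would interpret.
make_delegate_env() {
  local base=$1 key_path=$2 master=$3 out=$4
  [ -r "$base" ] || { echo "cannot read $base" >&2; return 1; }
  sed -e "s|^WALLET_PRIVATE_KEY_PATH=.*|WALLET_PRIVATE_KEY_PATH=$key_path|" \
      -e "s|^AUTHORIZER=.*|AUTHORIZER=$master|" \
      "$base" > "$out"
}

# Example: build the file for machine 2, then copy it there with its keypair.
# make_delegate_env ~/.config/psyche/run.env \
#   /home/YOUR_USERNAME/.config/solana/psyche-delegate-2.json \
#   YOUR_MASTER_PUBLIC_KEY run-delegate-2.env
```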
Claiming Rewards
After participating in training, you can claim rewards with:
./run-manager treasurer-claim-rewards \
--rpc https://api.devnet.solana.com \
--run-id YOUR_RUN_ID \
--wallet-private-key-path ~/.config/solana/psyche-node.json
Quick Reference
| Command | Purpose |
|---|---|
| nvidia-smi | Verify GPU and drivers |
| docker run --rm --gpus all nvidia/cuda:12.2.2-devel-ubuntu22.04 nvidia-smi | Verify GPU access in Docker |
| solana-keygen pubkey ~/.config/solana/psyche-node.json | Get your public key |
| solana balance ~/.config/solana/psyche-node.json | Check wallet balance |
| ./run-manager --env-file ~/.config/psyche/run.env | Start providing compute |
| Ctrl+C | Stop the client gracefully |