Install, configure, and run Rete on every device you own. Mesh setup, CLI reference, troubleshooting.
Rete ships on Mac, Windows, Linux, and iOS. One license key covers every platform you own — no per-device fee.
On macOS, Rete requires Apple Silicon (M1 or newer) and macOS 13 (Ventura) or later. Drag Rete into /Applications; data lives in ~/Library/Application Support/Rete/.

On Windows, Rete requires Windows 10 (build 17763+) or Windows 11, x64 only. Data lives in %APPDATA%\Rete\.

On Linux, x86_64 only for now. A Vulkan-capable GPU is recommended (NVIDIA via the Vulkan ICD, AMD via Mesa, Intel via ANV); Rete falls back to CPU otherwise.
For headless servers, NAS boxes, and datacenter rentals. The installer sets up a systemd service so Rete auto-starts on boot.
```
curl -fsSL https://get.retes.app/install.sh | sh
```
The script:

- installs to /opt/rete-node
- links the binary at /usr/local/bin/rete-node
- offers to set up a systemd unit (see below)

Override paths via env vars: RETE_INSTALL_DIR=/elsewhere or RETE_DOWNLOAD_BASE=https://your-mirror.
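For example, to install under a custom prefix from your own mirror, pass the variables to the shell that runs the script. A minimal sketch; both values below are placeholders for your own paths:

```
# RETE_INSTALL_DIR and RETE_DOWNLOAD_BASE here are example values
curl -fsSL https://get.retes.app/install.sh | \
  RETE_INSTALL_DIR=/srv/rete RETE_DOWNLOAD_BASE=https://mirror.example.com sh
```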
Self-contained binary. No install, no root. Best for desktop Linux.
```
wget https://get.retes.app/rete-node-x86_64.AppImage
chmod +x rete-node-x86_64.AppImage
./rete-node-x86_64.AppImage --host
```
The AppImage extracts to /tmp on each run. Bundle is ~49 MB.
For RunPod, Lambda, Vast.ai, and other GPU-on-demand cloud providers. Image: reteai/rete-node on Docker Hub.
```
docker run --gpus all -p 8080:8080 \
  -e RETE_MODE=host \
  -v rete-data:/data \
  reteai/rete-node:latest
```
Modes via RETE_MODE:

- RETE_MODE=host — full chat UI on port 8080
- RETE_JOIN_CODE=RETE-XXXX — join a remote mesh as compute requester

The --gpus all flag requires nvidia-container-toolkit on the host (preinstalled on most cloud GPU providers).
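The same image can also join an existing mesh instead of hosting. A sketch, with the invite code as a placeholder:

```
# Requester mode: no web UI, just join the remote mesh
docker run --gpus all \
  -e RETE_JOIN_CODE=RETE-AVTZ \
  -v rete-data:/data \
  reteai/rete-node:latest
```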
Search "Rete" on the iPhone or iPad App Store. Standalone with a small on-device model (SmolLM2-360M); pairs with a Mac/Windows/Linux host for bigger models.
On first launch, Rete shows a license activation screen. Paste your RETE-XXXX-XXXX-XXXX key (sent in your purchase email) and click Activate. The same key works on every platform you own — Mac, Windows, Linux, iPhone, iPad.
Lost your key? Recover it at retes.app/recover using the email you used at checkout.
After running rete-node --host, the web UI is at http://localhost:8080. To access it from another device on your network, find the Linux box's LAN IP and visit http://<ip>:8080. Rete listens on 0.0.0.0 by default; change that with --bind 127.0.0.1 if you want to restrict the UI to localhost.
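If you bind to loopback, an SSH tunnel still gets you to the UI from another machine. A minimal sketch, assuming the box is reachable as linuxbox:

```
# On the Linux box: web UI reachable from localhost only
rete-node --host --bind 127.0.0.1

# On your laptop: forward a local port, then browse http://localhost:8080
ssh -L 8080:127.0.0.1:8080 you@linuxbox
```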
| Platform | Path |
|---|---|
| macOS | ~/Library/Application Support/Rete/ |
| Windows | %APPDATA%\Rete\ |
| Linux | ~/.local/share/rete/ (XDG) |
| Docker | /data (mount a volume to persist) |
Conversations, downloaded models, and the license file all live there. To migrate Rete between machines, copy this directory over.
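For example, moving a Linux install to a new machine is a single copy; the hostname below is a placeholder:

```
# Copies conversations, models, and the license file
rsync -a ~/.local/share/rete/ newbox:~/.local/share/rete/
```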
Rete uses GGUF-format models from HuggingFace. The in-app picker has 7 curated options, but you can drop any GGUF into the models directory.
| Model | Size | RAM needed | Notes |
|---|---|---|---|
| Phi-3 Mini 4K | 2.4 GB | 4 GB | Small, fast. Good default for any machine. |
| Llama 3.1 8B | 4.9 GB | 8 GB | Strong all-around. |
| Qwen 2.5 7B | 4.7 GB | 8 GB | Multilingual, code-friendly. |
| Qwen 2.5 Coder 7B | 4.7 GB | 8 GB | Best small-model coder. |
| Mistral 7B v0.3 | 4.4 GB | 8 GB | Fast, reliable. |
| Mistral Small 24B | 14 GB | 16 GB | Mid-tier; works well meshed. |
| Llama 3.1 70B | 42 GB | 48 GB | Frontier-tier. Pool with another machine. |
Drop any *.gguf file into the models directory and it'll appear in the model picker. Q4_K_M quants are the recommended starting point.
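As a sketch, you can pull a quant straight from HuggingFace into the data directory. The models/ subdirectory and the exact repo URL are assumptions, not confirmed paths:

```
# Assumed layout: <data dir>/models/ holds the *.gguf files
wget -P ~/.local/share/rete/models/ \
  "https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
```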
Sizing: the largest model that fits on a single device is roughly (free RAM) − 2 GB. For bigger models, mesh: two devices with 16 GB each can run a 24 GB model that fits on neither alone.
Mesh networking lets multiple devices contribute compute to a single chat. Pair on LAN (zero config) or across the public internet (via invite code).
A host runs the full web UI and chat; a provider runs only llama.cpp's rpc-server, no UI. The same physical machine can be both — the Linux box you SSH into can be the host (web UI on port 8080) AND host model layers. Or it can be a pure provider with no model loaded.
Devices on the same WiFi auto-discover each other via mDNS. No config needed — open Rete on each device, they'll show up in the Mesh sidebar.
For pairing devices on different networks (your laptop on cellular + your home Linux box, or you + a friend on different ISPs), Rete uses an invite-code flow that tunnels through a Cloudflare WebSocket relay.
On the providing device, generate an invite code (from the Rete app, or by running rete-node headless on Linux). You'll get a code like RETE-AVTZ.

WAN performance: the relay tunnels llama.cpp's RPC frames over WebSocket. Bandwidth is fine (~6 KB per token activation), but RTT compounds across the many round-trips per token. Expect ~0.2 tok/s on WAN vs near-native on LAN. Use WAN mesh for memory pooling (running models that don't fit locally), not real-time chat.
The Linux rete-node binary can play any of the three roles:

```
rete-node                   # PROVIDER (default) — generate invite, contribute compute
rete-node --join RETE-XXXX  # REQUESTER — tunnel a remote machine's compute to a local port
rete-node --host            # HOST — full Rete webapp at http://localhost:8080
```
Full rete-node flag reference. Run with no arguments to default to provider mode.
| Flag | Default | Description |
|---|---|---|
| --host | off | Run the full webapp + chat UI on 0.0.0.0:8080. |
| --join CODE | — | Run as requester. Joins an invite code, exposes remote compute on a local port. |
| --port N | auto | Local rpc-server port (provider mode only). |
| --listen-port N | auto | Local listen port (requester mode only). Point llama-server --rpc 127.0.0.1:N at it. |
| --bind ADDR | 0.0.0.0 | Webapp bind address (host mode only). Use 127.0.0.1 to restrict to localhost. |
| --http-port N | 8080 | Webapp HTTP port (host mode only). |
| --relay-https URL | see below | HTTPS endpoint of the relay registry. Override for self-hosting. |
| --relay-ws URL | see below | WSS endpoint of the relay tunneler. |
| -h / --help | — | Print help and exit. |
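Putting --join and --listen-port together, the sketch below pools a remote box's compute into a plain llama.cpp run; the invite code and port are placeholders:

```
# On the remote provider: run with no arguments to print an invite code
rete-node

# Locally: join the mesh and pin the tunnel to a known port
rete-node --join RETE-AVTZ --listen-port 50052

# Point llama.cpp's RPC offload at the tunneled port
llama-server -m model.gguf --rpc 127.0.0.1:50052
```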
Most configuration is via environment variables. Override on the command line, in your shell profile, or in the systemd unit.
| Variable | Default | Description |
|---|---|---|
| RETE_DATA_DIR | ~/.local/share/rete | Where conversations, models, license, and logs live. |
| RETE_RELAY_HTTPS | https://rete-relay.sanders-creech.workers.dev | Relay registry endpoint. |
| RETE_RELAY_WS | wss://rete-relay.sanders-creech.workers.dev | Relay tunnel endpoint. |
| RETE_RPC_SERVER | auto | Override path to bundled rpc-server binary. |
| RETE_PEER_FREE_GB | 16 | Assumed free RAM for remote peers (used by the layer-split planner until peers report capabilities). |
| RETE_DOWNLOAD_BASE | https://get.retes.app | Used by install.sh only. |
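For instance, to keep Rete's data on a larger disk (the mount point here is an example):

```
# In ~/.profile, or as Environment= in the systemd unit
export RETE_DATA_DIR=/mnt/big/rete
rete-node --host
```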
The curl install script offers to install a systemd unit. To customize, edit /etc/systemd/system/rete-node.service:
```
[Unit]
Description=Rete Node — distributed inference compute provider
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/opt/rete-node/rete-node --host
Restart=on-failure
RestartSec=10
User=YOUR_USER
Environment=RETE_DATA_DIR=/var/lib/rete

[Install]
WantedBy=multi-user.target
```
Reload after editing: sudo systemctl daemon-reload && sudo systemctl restart rete-node.
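If you declined the installer's prompt, you can wire the service up manually:

```
sudo systemctl enable --now rete-node   # start now and at every boot
journalctl -u rete-node -f              # follow the service logs
```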
If rete-node fails with a missing-libgomp error: older Linux installs may not ship libgomp1 by default. The Rete bundle ships its own copy, so this should be rare — but if you see it:
```
# Debian / Ubuntu
sudo apt install libgomp1

# Fedora / RHEL
sudo dnf install libgomp
```
If GPU detection prints CPU [cpu] when you have a GPU, install the Vulkan loader:
```
# Debian / Ubuntu
sudo apt install libvulkan1 mesa-vulkan-drivers

# Fedora / RHEL
sudo dnf install vulkan-loader mesa-vulkan-drivers

# Arch
sudo pacman -S vulkan-icd-loader
```
NVIDIA GPUs need the proprietary driver installed for Vulkan to find them. AMD/Intel work via Mesa.
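To confirm the loader actually sees your GPU, vulkaninfo from the vulkan-tools package is a quick check (recent builds support --summary):

```
sudo apt install vulkan-tools   # Debian/Ubuntu; use the dnf/pacman equivalent elsewhere
vulkaninfo --summary | grep -i devicename
```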
If license activation fails, make sure you're online and your firewall allows HTTPS to retes.app. If the issue persists, the validation API may be slow scanning Stripe sessions for older purchases — the key is real, just retry in a minute. If you're still stuck, email hello@retes.app with your purchase email.
If port 8080 is already in use, another service (Jenkins, a dev server) is holding it. Either stop that service or change Rete's port: rete-node --host --http-port 9090.
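To see what's holding the port before killing anything:

```
# Show the process listening on 8080
sudo ss -ltnp 'sport = :8080'
```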
If Docker can't see the GPU, install nvidia-container-toolkit on the host and pass --gpus all. Check with docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi.
Logs: the rpc-server log is at ~/.local/share/rete-node/logs/rpc-server.log; for the systemd service, follow journalctl -u rete-node -f.

Rete costs $20 once, lifetime. One license activates Rete on every device you own — Mac, Windows, Linux, iPhone, iPad. No subscriptions, no per-device fees, no telemetry.
Rete is not currently open source. The llama.cpp backend is open source (we contribute patches upstream), but the Rete app, mesh runtime, web UI, and Cloudflare relay are closed source for now.
Your data never leaves your machines: conversations, documents, and model weights stay on your devices. The only network traffic Rete generates is license validation against retes.app/api/validate (one HTTP call per activation). No telemetry, no analytics, no account.
Single device: roughly (free RAM) − 2 GB for GGUF Q4_K_M quants. Meshed: the sum of free RAM across all paired devices. A pair of 32 GB Macs can run Llama 3.1 70B (42 GB), which fits on neither alone.
Rete builds on llama.cpp because its RPC backend is the only mature path I found for splitting a model's layers across machines at runtime, with a wire protocol simple enough to tunnel through a relay. Ollama wraps llama.cpp internally but doesn't expose distributed inference; vLLM is single-node multi-GPU, not designed for a cross-machine mesh.
Linux compute is free: the rete-node binary in provider or requester mode (without --host) doesn't require a license — it's just compute. The license activates the chat UI (--host mode and the desktop apps). One paid Mac/Windows host can mesh with any number of free Linux providers.
For support, email hello@retes.app with logs from the paths in Troubleshooting. A GitHub issues page is on the roadmap.