Install, configure, and run Rete on every device you own. Mesh setup, CLI reference, troubleshooting.
Rete ships on Mac, Windows, Linux, and iOS. One license key covers every platform you own — no per-device fee.
On macOS, Rete requires Apple Silicon (M1 or newer) and macOS 13 (Ventura) or later. Drag Rete into /Applications; data lives in ~/Library/Application Support/Rete/.

On Windows, Rete requires Windows 10 (build 17763+) or Windows 11, x64 only. Data lives in %APPDATA%\Rete\.

On Linux, x86_64 only for now. A Vulkan-capable GPU is recommended (NVIDIA via the Vulkan ICD, AMD via Mesa, Intel via ANV); Rete falls back to CPU otherwise.
For headless servers, NAS boxes, and datacenter rentals. The installer sets up a systemd service so Rete auto-starts on boot.
```
curl -fsSL https://get.retes.app/install.sh | sh
```
The script:

- installs to /opt/rete-node
- links the binary at /usr/local/bin/rete-node
- offers to set up a systemd unit (see below)

Override paths via env vars: RETE_INSTALL_DIR=/elsewhere or RETE_DOWNLOAD_BASE=https://your-mirror.
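For example, to install under a custom prefix from your own mirror, pass the variables to the shell that runs the script. A minimal sketch; both values below are placeholders for your own paths:

```
# RETE_INSTALL_DIR and RETE_DOWNLOAD_BASE here are example values
curl -fsSL https://get.retes.app/install.sh | \
  RETE_INSTALL_DIR=/srv/rete RETE_DOWNLOAD_BASE=https://mirror.example.com sh
```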
Self-contained binary. No install, no root. Best for desktop Linux.
```
wget https://get.retes.app/rete-node-x86_64.AppImage
chmod +x rete-node-x86_64.AppImage
./rete-node-x86_64.AppImage --host
```
The AppImage extracts to /tmp on each run. Bundle is ~49 MB.
For RunPod, Lambda, Vast.ai, and other GPU-on-demand cloud providers. Image: reteai/rete-node on Docker Hub.
```
docker run --gpus all -p 8080:8080 \
  -e RETE_MODE=host \
  -v rete-data:/data \
  reteai/rete-node:latest
```
Modes via RETE_MODE:

- RETE_MODE=host — full chat UI on port 8080
- RETE_JOIN_CODE=RETE-XXXX — join a remote mesh as compute requester

The --gpus all flag requires nvidia-container-toolkit on the host (preinstalled on most cloud GPU providers).
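The same image can also join an existing mesh instead of hosting. A sketch, with the invite code as a placeholder:

```
# Requester mode: no web UI, just join the remote mesh
docker run --gpus all \
  -e RETE_JOIN_CODE=RETE-AVTZ \
  -v rete-data:/data \
  reteai/rete-node:latest
```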
Search "Rete" on the iPhone or iPad App Store. Standalone with a small on-device model (SmolLM2-360M); pairs with a Mac/Windows/Linux host for bigger models.
On first launch, Rete shows a license activation screen. Paste your RETE-XXXX-XXXX-XXXX key (sent in your purchase email) and click Activate. The same key works on every platform you own — Mac, Windows, Linux, iPhone, iPad.
Lost your key? Recover it at retes.app/recover using the email you used at checkout.
After running rete-node --host, the web UI is at http://localhost:8080. To access it from another device on your network, find the Linux box's LAN IP and visit http://<ip>:8080. Rete listens on 0.0.0.0 by default; change that with --bind 127.0.0.1 if you want to restrict the UI to localhost.
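If you bind to loopback, an SSH tunnel still gets you to the UI from another machine. A minimal sketch, assuming the box is reachable as linuxbox:

```
# On the Linux box: web UI reachable from localhost only
rete-node --host --bind 127.0.0.1

# On your laptop: forward a local port, then browse http://localhost:8080
ssh -L 8080:127.0.0.1:8080 you@linuxbox
```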
| Platform | Path |
|---|---|
| macOS | ~/Library/Application Support/Rete/ |
| Windows | %APPDATA%\Rete\ |
| Linux | ~/.local/share/rete/ (XDG) |
| Docker | /data (mount a volume to persist) |
Conversations, downloaded models, and the license file all live there. To migrate Rete between machines, copy this directory over.
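For example, moving a Linux install to a new machine is a single copy; the hostname below is a placeholder:

```
# Copies conversations, models, and the license file
rsync -a ~/.local/share/rete/ newbox:~/.local/share/rete/
```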
Rete uses GGUF-format models from HuggingFace. The in-app picker has 7 curated options, but you can drop any GGUF into the models directory.
| Model | Size | RAM needed | Notes |
|---|---|---|---|
| Phi-3 Mini 4K | 2.4 GB | 4 GB | Small, fast. Good default for any machine. |
| Llama 3.1 8B | 4.9 GB | 8 GB | Strong all-around. |
| Qwen 2.5 7B | 4.7 GB | 8 GB | Multilingual, code-friendly. |
| Qwen 2.5 Coder 7B | 4.7 GB | 8 GB | Best small-model coder. |
| Mistral 7B v0.3 | 4.4 GB | 8 GB | Fast, reliable. |
| Mistral Small 24B | 14 GB | 16 GB | Mid-tier; works well meshed. |
| Llama 3.1 70B | 42 GB | 48 GB | Frontier-tier. Pool with another machine. |
Drop any *.gguf file into the models directory and it'll appear in the model picker. Q4_K_M quants are the recommended starting point.
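As a sketch, you can pull a quant straight from HuggingFace into the data directory. The models/ subdirectory and the exact repo URL are assumptions, not confirmed paths:

```
# Assumed layout: <data dir>/models/ holds the *.gguf files
wget -P ~/.local/share/rete/models/ \
  "https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/resolve/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf"
```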
Sizing: the largest model that fits on a single device is roughly (free RAM) − 2 GB. For bigger models, mesh: two devices with 16 GB each can run a 24 GB model that fits on neither alone.
Mesh networking lets multiple devices contribute compute to a single chat. Pair on LAN (zero config) or across the public internet (via invite code).
A host runs the full web UI and chat; a provider runs only llama.cpp's rpc-server, no UI. The same physical machine can be both — the Linux box you SSH into can be the host (web UI on port 8080) AND host model layers. Or it can be a pure provider with no model loaded.
Devices on the same WiFi auto-discover each other via mDNS. No config needed — open Rete on each device, they'll show up in the Mesh sidebar.
For pairing devices on different networks (your laptop on cellular + your home Linux box, or you + a friend on different ISPs), Rete uses an invite-code flow that tunnels through a Cloudflare WebSocket relay.
On the providing device, generate an invite code (from the Rete app, or by running rete-node headless on Linux). You'll get a code like RETE-AVTZ.

WAN performance: the relay tunnels llama.cpp's RPC frames over WebSocket. Bandwidth is fine (~6 KB per token activation), but RTT compounds across the many round-trips per token. Expect ~0.2 tok/s on WAN vs near-native on LAN. Use WAN mesh for memory pooling (running models that don't fit locally), not real-time chat.
The Linux rete-node binary can play any of the three roles:

```
rete-node                   # PROVIDER (default) — generate invite, contribute compute
rete-node --join RETE-XXXX  # REQUESTER — tunnel a remote machine's compute to a local port
rete-node --host            # HOST — full Rete webapp at http://localhost:8080
```
Full rete-node flag reference. Run with no arguments to default to provider mode.
| Flag | Default | Description |
|---|---|---|
| --host | off | Run the full webapp + chat UI on 0.0.0.0:8080. |
| --join CODE | — | Run as requester. Joins an invite code, exposes remote compute on a local port. |
| --port N | auto | Local rpc-server port (provider mode only). |
| --listen-port N | auto | Local listen port (requester mode only). Point llama-server --rpc 127.0.0.1:N at it. |
| --bind ADDR | 0.0.0.0 | Webapp bind address (host mode only). Use 127.0.0.1 to restrict to localhost. |
| --http-port N | 8080 | Webapp HTTP port (host mode only). |
| --relay-https URL | see below | HTTPS endpoint of the relay registry. Override for self-hosting. |
| --relay-ws URL | see below | WSS endpoint of the relay tunneler. |
| -h / --help | — | Print help and exit. |
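Putting --join and --listen-port together, the sketch below pools a remote box's compute into a plain llama.cpp run; the invite code and port are placeholders:

```
# On the remote provider: run with no arguments to print an invite code
rete-node

# Locally: join the mesh and pin the tunnel to a known port
rete-node --join RETE-AVTZ --listen-port 50052

# Point llama.cpp's RPC offload at the tunneled port
llama-server -m model.gguf --rpc 127.0.0.1:50052
```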
Most configuration is via environment variables. Override on the command line, in your shell profile, or in the systemd unit.
| Variable | Default | Description |
|---|---|---|
| RETE_DATA_DIR | ~/.local/share/rete | Where conversations, models, license, and logs live. |
| RETE_RELAY_HTTPS | https://rete-relay.sanders-creech.workers.dev | Relay registry endpoint. |
| RETE_RELAY_WS | wss://rete-relay.sanders-creech.workers.dev | Relay tunnel endpoint. |
| RETE_RPC_SERVER | auto | Override path to bundled rpc-server binary. |
| RETE_PEER_FREE_GB | 16 | Assumed free RAM for remote peers (used by the layer-split planner until peers report capabilities). |
| RETE_DOWNLOAD_BASE | https://get.retes.app | Used by install.sh only. |
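For instance, to keep Rete's data on a larger disk (the mount point here is an example):

```
# In ~/.profile, or as Environment= in the systemd unit
export RETE_DATA_DIR=/mnt/big/rete
rete-node --host
```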
The curl install script offers to install a systemd unit. To customize, edit /etc/systemd/system/rete-node.service:
```
[Unit]
Description=Rete Node — distributed inference compute provider
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/opt/rete-node/rete-node --host
Restart=on-failure
RestartSec=10
User=YOUR_USER
Environment=RETE_DATA_DIR=/var/lib/rete

[Install]
WantedBy=multi-user.target
```
Reload after editing: sudo systemctl daemon-reload && sudo systemctl restart rete-node.
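If you declined the installer's prompt, you can wire the service up manually:

```
sudo systemctl enable --now rete-node   # start now and at every boot
journalctl -u rete-node -f              # follow the service logs
```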
If rete-node fails with a missing-libgomp error: older Linux installs may not ship libgomp1 by default. The Rete bundle ships its own copy, so this should be rare — but if you see it:
```
# Debian / Ubuntu
sudo apt install libgomp1

# Fedora / RHEL
sudo dnf install libgomp
```
If GPU detection prints CPU [cpu] when you have a GPU, install the Vulkan loader:
```
# Debian / Ubuntu
sudo apt install libvulkan1 mesa-vulkan-drivers

# Fedora / RHEL
sudo dnf install vulkan-loader mesa-vulkan-drivers

# Arch
sudo pacman -S vulkan-icd-loader
```
NVIDIA GPUs need the proprietary driver installed for Vulkan to find them. AMD/Intel work via Mesa.
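To confirm the loader actually sees your GPU, vulkaninfo from the vulkan-tools package is a quick check (recent builds support --summary):

```
sudo apt install vulkan-tools   # Debian/Ubuntu; use the dnf/pacman equivalent elsewhere
vulkaninfo --summary | grep -i devicename
```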
If license activation fails, make sure you're online and your firewall allows HTTPS to retes.app. If the issue persists, the validation API may be slow scanning Stripe sessions for older purchases — the key is real, just retry in a minute. If you're still stuck, email hello@retes.app with your purchase email.
If port 8080 is already in use, another service (Jenkins, a dev server) is holding it. Either stop that service or change Rete's port: rete-node --host --http-port 9090.
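To see what's holding the port before killing anything:

```
# Show the process listening on 8080
sudo ss -ltnp 'sport = :8080'
```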
If Docker can't see the GPU, install nvidia-container-toolkit on the host and pass --gpus all. Check with docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi.
Logs: the rpc-server log is at ~/.local/share/rete-node/logs/rpc-server.log; for the systemd service, follow journalctl -u rete-node -f.

Rete costs $20 once, lifetime. One license activates Rete on every device you own — Mac, Windows, Linux, iPhone, iPad. No subscriptions, no per-device fees, no telemetry.
Rete is not currently open source. The llama.cpp backend is open source (we contribute patches upstream), but the Rete app, mesh runtime, web UI, and Cloudflare relay are closed source for now.
Your data never leaves your machines: conversations, documents, and model weights stay on your devices. The only network traffic Rete generates is license validation against retes.app/api/validate (one HTTP call per activation). No telemetry, no analytics, no account.
Single device: roughly (free RAM) − 2 GB for GGUF Q4_K_M quants. Meshed: the sum of free RAM across all paired devices. A pair of 32 GB Macs can run Llama 3.1 70B (42 GB), which fits on neither alone.
Rete builds on llama.cpp because its RPC backend is the only mature path I found for splitting a model's layers across machines at runtime, with a wire protocol simple enough to tunnel through a relay. Ollama wraps llama.cpp internally but doesn't expose distributed inference; vLLM is single-node multi-GPU, not designed for a cross-machine mesh.
Linux compute is free: the rete-node binary in provider or requester mode (without --host) doesn't require a license — it's just compute. The license activates the chat UI (--host mode and the desktop apps). One paid Mac/Windows host can mesh with any number of free Linux providers.
For support, email hello@retes.app with logs from the paths in Troubleshooting. A GitHub issues page is on the roadmap.