Rete docs

Install, configure, and run Rete on every device you own. Mesh setup, CLI reference, troubleshooting.

Installation

Rete ships on Mac, Windows, Linux, and iOS. One license key covers every platform you own — no per-device fee.

macOS

Apple Silicon required (M1 or newer). macOS 13 (Ventura) or later.

  1. Download the DMG: rete-mac.dmg
  2. Open the DMG and drag Rete into /Applications.
  3. Launch Rete. The first launch sets up data dirs at ~/Library/Application Support/Rete/.

Windows

Windows 10 (build 17763+) or Windows 11. x64 only.

  1. Open the Microsoft Store listing: apps.microsoft.com/detail/9ngwjp2dlcdn
  2. Click Install.
  3. Launch Rete from the Start menu. First launch sets up data dirs at %APPDATA%\Rete\.

Linux

x86_64 only for now. Vulkan-capable GPU recommended (NVIDIA via the Vulkan ICD, AMD via Mesa, Intel via ANV); falls back to CPU.

Install script

For headless servers, NAS boxes, and datacenter rentals. Sets up a systemd service so the node auto-starts on boot.

curl -fsSL https://get.retes.app/install.sh | sh

The script:

  • Verifies architecture (x86_64 only)
  • Downloads the tarball to /opt/rete-node
  • Symlinks /usr/local/bin/rete-node
  • Optionally installs a systemd unit (asks during install)

Override paths via env vars: RETE_INSTALL_DIR=/elsewhere or RETE_DOWNLOAD_BASE=https://your-mirror.

AppImage

Self-contained binary. No install, no root. Best for desktop Linux.

wget https://get.retes.app/rete-node-x86_64.AppImage
chmod +x rete-node-x86_64.AppImage
./rete-node-x86_64.AppImage --host

The AppImage extracts to /tmp on each run. Bundle is ~49 MB.

Docker

For RunPod, Lambda, Vast.ai, and other GPU-on-demand cloud providers. Image: reteai/rete-node on Docker Hub.

docker run --gpus all -p 8080:8080 \
  -e RETE_MODE=host \
  -v rete-data:/data \
  reteai/rete-node:latest

Container behavior is controlled by environment variables:

  • RETE_MODE=host — full chat UI on port 8080
  • RETE_JOIN_CODE=RETE-XXXX — join a remote mesh as compute requester
  • (neither set) — provider mode, prints an invite code on stdout

The --gpus all flag requires nvidia-container-toolkit on the host (preinstalled on most cloud GPU providers).

iOS

Search "Rete" on the iPhone or iPad App Store. Standalone with a small on-device model (SmolLM2-360M); pairs with a Mac/Windows/Linux host for bigger models.

First launch & license activation

On first launch, Rete shows a license activation screen. Paste your RETE-XXXX-XXXX-XXXX key (sent in your purchase email) and click Activate. The same key works on every platform you own — Mac, Windows, Linux, iPhone, iPad.

Lost your key? Recover it at retes.app/recover using the email you used at checkout.

Linux web UI access

After running rete-node --host, the web UI is at http://localhost:8080. To access from another device on your network, find the Linux box's LAN IP and visit http://<ip>:8080. Rete listens on 0.0.0.0 by default — change with --bind 127.0.0.1 if you want to restrict.
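To print the URL in one step, a small sketch (assumes `hostname -I` is available, as on most Linux distros):

```shell
# Print the LAN URL for the Rete web UI (falls back to a placeholder)
ip=$(hostname -I 2>/dev/null | awk '{print $1}')
echo "http://${ip:-<ip>}:8080"
```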

Where data lives

  • macOS: ~/Library/Application Support/Rete/
  • Windows: %APPDATA%\Rete\
  • Linux: ~/.local/share/rete/ (XDG)
  • Docker: /data (mount a volume to persist)

Conversations, downloaded models, and the license file all live there. To migrate Rete between machines, copy this directory over.

Choosing a model

Rete uses GGUF-format models from HuggingFace. The in-app picker has 7 curated options, but you can drop any GGUF into the models directory.

Curated catalog

  • Phi-3 Mini 4K (2.4 GB, needs 4 GB RAM): Small, fast. Good default for any machine.
  • Llama 3.1 8B (4.9 GB, needs 8 GB RAM): Strong all-around.
  • Qwen 2.5 7B (4.7 GB, needs 8 GB RAM): Multilingual, code-friendly.
  • Qwen 2.5 Coder 7B (4.7 GB, needs 8 GB RAM): Best small-model coder.
  • Mistral 7B v0.3 (4.4 GB, needs 8 GB RAM): Fast, reliable.
  • Mistral Small 24B (14 GB, needs 16 GB RAM): Mid-tier; works well meshed.
  • Llama 3.1 70B (42 GB, needs 48 GB RAM): Frontier-tier. Pool with another machine.

Bring your own GGUF

Drop any *.gguf file into the models directory and it'll appear in the model picker. Q4_K_M quants are the recommended starting point.
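For example, on Linux — note the `models` subdirectory name is an assumption here; check what your install actually created under the data dir:

```shell
# Create the models dir under the Rete data dir and note where to drop GGUFs
models_dir="${RETE_DATA_DIR:-$HOME/.local/share/rete}/models"
mkdir -p "$models_dir"
echo "drop *.gguf files into $models_dir"
```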

Sizing: the largest model that fits on a single device is roughly (free RAM) − 2 GB. For anything bigger, mesh: two devices with 16 GB each can run a 24 GB model that fits on neither alone.
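The rule of thumb as arithmetic, with illustrative numbers:

```shell
free_gb=16                                # free RAM on one device
echo "single: $(( free_gb - 2 )) GB"      # prints "single: 14 GB"

free_a=16; free_b=16                      # two meshed devices
echo "meshed: $(( free_a + free_b )) GB"  # prints "meshed: 32 GB" (room for a 24 GB model)
```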

Mesh setup

Mesh networking lets multiple devices contribute compute to a single chat. Pair on LAN (zero config) or across the public internet (via invite code).

Roles

A mesh device plays one of two roles: the host runs the chat UI and drives inference, and providers contribute compute and RAM for model layers. The same physical machine can be both: the Linux box you SSH into can be the host (web UI on port 8080) AND hold model layers. Or it can be a pure provider with no model loaded.

LAN mesh

Devices on the same WiFi auto-discover each other via mDNS. No config needed — open Rete on each device, they'll show up in the Mesh sidebar.

WAN mesh (across networks)

For pairing devices on different networks (your laptop on cellular + your home Linux box, or you + a friend on different ISPs), Rete uses an invite-code flow that tunnels through a Cloudflare WebSocket relay.

  1. On the compute provider, click Invite in the Mesh sidebar (or run rete-node headless on Linux). You'll get a code like RETE-AVTZ.
  2. On the chat host, paste the code into the Join field.
  3. The two endpoints pair through the relay automatically. Subsequent chats use the provider's compute via llama.cpp's RPC backend.

WAN performance: the relay tunnels llama.cpp's RPC frames over WebSocket. Bandwidth is fine (~6 KB per token activation), but RTT compounds across many round-trips per token. Expect ~0.2 tok/s on WAN vs near-native on LAN. Use WAN mesh for memory pooling (running models that don't fit locally), not real-time chat.
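A back-of-envelope for why RTT dominates. The round-trips-per-token count below is purely illustrative, not a measured figure:

```shell
rtt_ms=50; trips=100   # hypothetical: 50 ms RTT, 100 relay round-trips per token
awk -v r="$rtt_ms" -v t="$trips" 'BEGIN { printf "%.2f tok/s\n", 1000 / (r * t) }'
# prints "0.20 tok/s"
```

At LAN latencies (sub-millisecond RTT) the same round-trips cost almost nothing, which is why LAN mesh stays near-native.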

Linux node modes

The Linux rete-node binary can play any of the three roles:

rete-node                       # PROVIDER (default) — generate invite, contribute compute
rete-node --join RETE-XXXX      # REQUESTER — tunnel a remote machine's compute to a local port
rete-node --host                # HOST — full Rete webapp at http://localhost:8080
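
A sketch of the requester flow end to end, assuming llama.cpp's llama-server is installed separately; the invite code and port number are placeholders:

```shell
# On the remote provider (prints an invite code on stdout):
rete-node

# On the local machine: join the mesh, expose the remote compute on a local port
rete-node --join RETE-XXXX --listen-port 50052

# Point llama.cpp at the tunneled RPC backend:
llama-server -m model.gguf --rpc 127.0.0.1:50052
```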

CLI reference (Linux)

Full rete-node flag reference. Run with no arguments to default to provider mode.

  • --host (default: off): Run the full webapp + chat UI on 0.0.0.0:8080.
  • --join CODE: Run as requester. Joins an invite code, exposes remote compute on a local port.
  • --port N (default: auto): Local rpc-server port (provider mode only).
  • --listen-port N (default: auto): Local listen port (requester mode only). Point llama-server --rpc 127.0.0.1:N at it.
  • --bind ADDR (default: 0.0.0.0): Webapp bind address (host mode only). Use 127.0.0.1 to restrict to localhost.
  • --http-port N (default: 8080): Webapp HTTP port (host mode only).
  • --relay-https URL (default: see Configuration): HTTPS endpoint of the relay registry. Override for self-hosting.
  • --relay-ws URL (default: see Configuration): WSS endpoint of the relay tunneler.
  • -h / --help: Print help and exit.

Configuration

Most configuration is via environment variables. Override on the command line, in your shell profile, or in the systemd unit.

  • RETE_DATA_DIR (default: ~/.local/share/rete): Where conversations, models, license, and logs live.
  • RETE_RELAY_HTTPS (default: https://rete-relay.sanders-creech.workers.dev): Relay registry endpoint.
  • RETE_RELAY_WS (default: wss://rete-relay.sanders-creech.workers.dev): Relay tunnel endpoint.
  • RETE_RPC_SERVER (default: auto): Override path to the bundled rpc-server binary.
  • RETE_PEER_FREE_GB (default: 16): Assumed free RAM for remote peers (used by the layer-split planner until peers report capabilities).
  • RETE_DOWNLOAD_BASE (default: https://get.retes.app): Used by install.sh only.
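
For example, a provider pointed at a self-hosted relay and a custom data dir (the relay URLs here are placeholders):

```shell
RETE_DATA_DIR=/var/lib/rete \
RETE_RELAY_HTTPS=https://relay.example.com \
RETE_RELAY_WS=wss://relay.example.com \
rete-node
```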

systemd unit (advanced)

The curl install script offers to install a systemd unit. To customize, edit /etc/systemd/system/rete-node.service:

[Unit]
Description=Rete Node — distributed inference compute provider
After=network-online.target
Wants=network-online.target

[Service]
Type=simple
ExecStart=/opt/rete-node/rete-node --host
Restart=on-failure
RestartSec=10
User=YOUR_USER
Environment=RETE_DATA_DIR=/var/lib/rete

[Install]
WantedBy=multi-user.target

Reload after editing: sudo systemctl daemon-reload && sudo systemctl restart rete-node.

Troubleshooting

"libgomp.so.1: cannot open shared object file"

Older Linux installs may not ship libgomp1 by default. The Rete bundle ships its own copy, so this should be rare — but if you see it:

# Debian / Ubuntu
sudo apt install libgomp1
# Fedora / RHEL
sudo dnf install libgomp

Vulkan loader missing

If GPU detection prints CPU [cpu] when you have a GPU, install the Vulkan loader:

# Debian / Ubuntu
sudo apt install libvulkan1 mesa-vulkan-drivers
# Fedora / RHEL
sudo dnf install vulkan-loader mesa-vulkan-drivers
# Arch
sudo pacman -S vulkan-icd-loader

NVIDIA GPUs need the proprietary driver installed for Vulkan to find them. AMD/Intel work via Mesa.

License key shows "not found, check your connection"

Make sure you're online and your firewall allows HTTPS to retes.app. If the issue persists, the validation API may be slow scanning Stripe sessions for older purchases — the key is real, just retry in a minute. If still stuck, email hello@retes.app with your purchase email.

Mesh peer pairing fails

Check that both devices are online and that outbound HTTPS/WSS to the relay endpoints (RETE_RELAY_HTTPS / RETE_RELAY_WS, see Configuration) isn't blocked by a firewall. For LAN discovery, some routers block mDNS (client/AP isolation); if peers never appear in the Mesh sidebar, use the invite-code flow instead.

Port 8080 already in use

Another service (Jenkins, dev server) is using the port. Either stop that service or change Rete's port: rete-node --host --http-port 9090.

Docker container can't see GPU

Install nvidia-container-toolkit on the host and pass --gpus all. Check with docker run --rm --gpus all nvidia/cuda:12.6.0-base-ubuntu22.04 nvidia-smi.

Logs
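
When running under the systemd unit from the install script, follow live logs with journalctl; for manual runs, logs live in the data dir (per RETE_DATA_DIR under Configuration):

```shell
# systemd install: follow the service log
journalctl -u rete-node -f

# manual runs: look in the data dir (Linux default shown)
ls "${RETE_DATA_DIR:-$HOME/.local/share/rete}"
```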

FAQ

How does pricing work?

$20 once, lifetime. One license activates Rete on every device you own — Mac, Windows, Linux, iPhone, iPad. No subscriptions, no per-device fees, no telemetry.

Is Rete open source?

Not currently. The llama.cpp backend is open source (we contribute patches upstream). The Rete app, mesh runtime, web UI, and Cloudflare relay are closed source for now.

Does Rete send data anywhere?

No. Conversations, documents, and model weights stay on your devices. The only network traffic Rete generates is:

  • license activation and validation (retes.app)
  • model downloads (GGUF files from HuggingFace)
  • WAN mesh traffic, tunneled through the Cloudflare relay (LAN meshes stay on your network)

No telemetry, no analytics, no account.

What hardware is supported?

  • macOS: Apple Silicon (M1 or newer), macOS 13 (Ventura) or later
  • Windows: Windows 10 (build 17763+) or Windows 11, x64 only
  • Linux: x86_64 only; Vulkan-capable GPU recommended, CPU fallback
  • iOS: iPhone and iPad (small on-device model; pair with a host for bigger ones)

What's the largest model I can run?

Single device: roughly (free RAM) − 2 GB for the GGUF Q4_K_M quants. Meshed: sum of free RAM across all paired devices. A pair of 32 GB Macs can run Llama 3.1 70B (42 GB), which fits on neither alone.

Why llama.cpp instead of Ollama / vLLM / etc?

llama.cpp's RPC backend is the only mature path I found for splitting a model's layers across machines at runtime, with a wire protocol simple enough to tunnel through a relay. Ollama wraps llama.cpp internally but doesn't expose distributed inference. vLLM is single-node-multi-GPU, not designed for cross-machine mesh.

Can I run a Linux node without paying?

Yes. The rete-node binary in provider or requester mode (without --host) doesn't require a license — it's just compute. The license activates the chat UI (--host mode and the desktop apps). One paid Mac/Windows host can mesh with any number of free Linux providers.

Where do I report bugs?

Email hello@retes.app with logs from the paths in Troubleshooting. A GitHub issues page is on the roadmap.