Infrastructure

Proxmox cluster

A four-node home cluster plus a Raspberry Pi QDevice that keeps the cluster quorate even with three of the four nodes down.

Proxmox VE 9 · ZFS · Corosync · PBS · Mellanox 10G

Nodes: 4
QDevice: RPi
Quorum: 4 / 7
Uplink: 10 G

Topology

Four nodes meshed via corosync over the LAN, plus a Raspberry Pi off to the side that doubles as the backup server and a corosync QDevice.

The hardware

node	role	cpu	ram	notes
pvedesktopmsi	primary · GPU · NAS	i9-13900K · 24c/32t	64 GB	RTX 4080 · 2× ZFS mirror naspool (5.4 TB) · 10G SFP+ uplink
pvelaptop	haos host · battery UPS	i5-5200U · 2c/4t	16 GB	internal battery keeps HA alive past the rack UPS
pvewyse5070	services workhorse	Celeron J4105 · 4c/4t	32 GB	9 containers including production mapping backend · CyberPower UPS over USB
pvehp	general-purpose 4th node	i7-6700T · 4c/8t	32 GB	HP EliteDesk Mini · USB-2.5G + onboard 1G bonded (active-backup) · this site lives here
pi (192.168.1.223)	PBS + QDevice	BCM2711 · 4c	8 GB	8 TB IronWolf via USB-SATA · vote weight 3 under LMS algorithm

Why the laptop is one of the nodes

The whole rack is on a UPS, but the laptop’s internal battery buys the cluster a much longer tail in an extended outage. The HAOS virtual machine (Home Assistant’s control plane for the rest of the house) lives on that node so the orchestration brain is the last thing to lose power.

A 5-minute systemd timer pushes the battery’s capacity, voltage, cycle count and AC-online state into InfluxDB; Grafana plots the long-term health curve so the day the battery falls below 80 % of design capacity doesn’t sneak up.

bash

# /usr/local/bin/push-battery-metrics.sh (excerpt)
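# Read one sysfs attribute of the battery, stripping the trailing newline.
read_int() { tr -d '\n' < "/sys/class/power_supply/BAT1/$1" 2>/dev/null; }
charge_now=$(read_int charge_now)
charge_full=$(read_int charge_full)
charge_full_design=$(read_int charge_full_design)
# Health = current full-charge capacity as a percentage of design capacity.
health_pct=$(awk -v a="$charge_full" -v b="$charge_full_design" 'BEGIN{printf "%.2f", 100*a/b}')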
curl -sS -u proxmox_writer:****** -XPOST \
  "http://192.168.1.118:8086/write?db=proxmox" \
  --data-binary "battery,host=pvelaptop,model=PA5185U-1BRS health_pct=$health_pct"
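
The timer that drives this script is not shown here. A minimal sketch of what a 5-minute timer and its oneshot service could look like follows; the unit file names, the descriptions and the `OnBootSec` delay are assumptions, and only the script path and the 5-minute interval come from the text above.

```ini
# /etc/systemd/system/push-battery-metrics.service (hypothetical unit name)
[Unit]
Description=Push laptop battery metrics to InfluxDB

[Service]
Type=oneshot
ExecStart=/usr/local/bin/push-battery-metrics.sh

# /etc/systemd/system/push-battery-metrics.timer (hypothetical unit name)
[Unit]
Description=Run push-battery-metrics every 5 minutes

[Timer]
# First run shortly after boot (assumed delay), then every 5 minutes.
OnBootSec=2min
OnUnitActiveSec=5min

[Install]
WantedBy=timers.target
```

The two units would live in separate files; with those names, `systemctl enable --now push-battery-metrics.timer` would start the schedule.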

The 10G networking story

The desktop host used to ride two onboard I226-V 2.5 Gbps interfaces. A known firmware bug on that NIC produced a hard link flap every few minutes. Each flap broke long-running TCP streams, most painfully PBS chunk uploads, whose connections the kernel held open well past the HTTP/2 idle timeout.

A Mellanox ConnectX-3 EN now collapses everything onto a single 10 G SFP+ uplink to the aggregation switch. An iperf3 run against the Pi holds steady at the Pi's gigabit ceiling with zero retransmits:

shell

$ iperf3 -c 192.168.1.223 -t 30 -P 4
[SUM]   0.00-30.00  sec  3.29 GBytes  942 Mbits/sec  0    sender
[SUM]   0.00-30.04  sec  3.29 GBytes  942 Mbits/sec       receiver
# (capped at ~942 Mb/s by the Pi side, not the link)

The trade-off is a single point of failure, an acceptable one for a home lab: the two onboard ports remain cabled in, just administratively down, ready to be bonded back in if the Mellanox ever dies.

Quorum math

Four cluster nodes (1 vote each) plus a Pi-hosted QDevice (3 votes under the LMS, or last-man-standing, algorithm) give 7 total votes, so quorum is floor(7 / 2) + 1 = 4. One surviving node plus the Pi clears that bar, so three of the four nodes can drop and the cluster keeps running.

text

$ sudo pvecm status
Quorum information
  Date:             Fri May 22 21:47:50 2026
  Nodes:            4
  Quorate:          Yes

Votequorum information
  Expected votes:   7
  Highest expected: 7
  Total votes:      7
  Quorum:           4
  Flags:            Quorate Qdevice

Membership information
  Nodeid      Votes    Qdevice Name
  0x00000001       1   A,V,NMW pvedesktopmsi
  0x00000002       1   A,V,NMW pvelaptop
  0x00000003       1   A,V,NMW pvewyse5070
  0x00000004       1   A,V,NMW pvehp
  0x00000000       3            Qdevice
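
The arithmetic behind those numbers can be checked with a short sketch. It assumes a plain majority rule and that the QDevice's 3 votes count for the surviving partition whenever the Pi is reachable. The function names are illustrative and are not part of corosync or pvecm.

```shell
# Sketch only: majority arithmetic for the vote layout shown above.
# Function and variable names are illustrative, not corosync or pvecm commands.

NODE_COUNT=4        # cluster nodes, 1 vote each
QDEVICE_VOTES=3     # QDevice weight reported by pvecm status

# Majority threshold: floor(total / 2) + 1
quorum_threshold() {
  echo $(( $1 / 2 + 1 ))
}

# check_quorum LIVE_NODES QDEVICE_UP(0|1)
# Prints "quorate" if the votes still present reach the threshold.
check_quorum() {
  total=$(( NODE_COUNT + QDEVICE_VOTES ))
  present=$1
  if [ "$2" -eq 1 ]; then
    present=$(( present + QDEVICE_VOTES ))
  fi
  if [ "$present" -ge "$(quorum_threshold "$total")" ]; then
    echo "quorate"
  else
    echo "not quorate"
  fi
}

check_quorum 1 1   # one node + Pi: 4 of 7 votes
check_quorum 4 0   # all nodes, Pi down: 4 of 7 votes
check_quorum 3 0   # three nodes, Pi down: 3 of 7 votes
```

The last call shows the flip side of this layout: under the same arithmetic, losing the Pi and any one node leaves 3 of 7 votes, which is below quorum.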

Backups

Proxmox Backup Server runs on the same Pi. The schedule is staggered per node (21:00, 22:00, 23:00, 00:00) because the Pi cannot drain all four nodes’ dirty bitmaps simultaneously without stalling, and a stalled QEMU I/O path has corrupted the ext4 filesystem inside HAOS more than once. The big container (a ~2 TB photo store on a ZFS mirror) gets its own weekly slot at 02:00 on Sunday so its 6–10 hour run never collides with anything else.
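
The hourly stagger, including the wrap past midnight, can be sketched as a small helper. The node-to-slot order below is an assumption, since the text gives the four slot times but not which node owns which slot. The helper name `stagger_slots` is hypothetical, and it only prints a plan without touching any backup job.

```shell
# Sketch only: assign one backup slot per node, one hour apart.
# stagger_slots START_HOUR NODE...
# Prints "HH:00 node" per line, wrapping past midnight with modulo 24.
stagger_slots() {
  hour=$1
  shift
  for node in "$@"; do
    printf '%02d:00 %s\n' "$hour" "$node"
    hour=$(( (hour + 1) % 24 ))
  done
}

# Assumed order: the source does not say which node runs at which time.
stagger_slots 21 pvedesktopmsi pvelaptop pvewyse5070 pvehp
```

Each printed time would then go into the schedule of that node's backup job, keeping any two nodes from hitting the Pi at the same start time.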