Proxmox cluster
A four-node home cluster plus a Raspberry Pi QDevice that keeps quorum alive even with three of the four nodes down.
- Nodes: 4
- QDevice: RPi
- Quorum: 4 / 7
- Uplink: 10 G
Topology
Four nodes meshed via corosync over the LAN, plus a Raspberry Pi off to the side that doubles as the backup server and a corosync QDevice.
The hardware
| node | role | cpu | ram | notes |
|---|---|---|---|---|
| pvedesktopmsi | primary · GPU · NAS | i9-13900K · 24c/32t | 64 GB | RTX 4080 · 2× ZFS mirror naspool (5.4 TB) · 10G SFP+ uplink |
| pvelaptop | haos host · battery UPS | i5-5200U · 2c/4t | 16 GB | internal battery keeps HA alive past the rack UPS |
| pvewyse5070 | services workhorse | Celeron J4105 · 4c/4t | 32 GB | 9 containers including production mapping backend · CyberPower UPS over USB |
| pvehp | general-purpose 4th node | i7-6700T · 4c/8t | 32 GB | HP EliteDesk Mini · USB-2.5G + onboard 1G bonded (active-backup) · this site lives here |
| pi (192.168.1.223) | PBS + QDevice | BCM2711 · 4c | 8 GB | 8 TB IronWolf via USB-SATA · vote weight 3 under LMS algorithm |
Why the laptop is one of the nodes
The whole rack is on a UPS, but the laptop’s internal battery buys the cluster a much longer tail in an extended outage. The HAOS virtual machine (Home Assistant’s control plane for the rest of the house) lives on that node so the orchestration brain is the last thing to lose power.
A 5-minute systemd timer pushes the battery’s capacity, voltage, cycle count and AC-online state into InfluxDB; Grafana plots the long-term health curve so the day the battery falls below 80 % of design capacity doesn’t sneak up.
# /usr/local/bin/push-battery-metrics.sh (excerpt)
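# Read one sysfs attribute for BAT1; prints nothing if the file is missing.
read_int() { tr -d '\n' 2>/dev/null < "/sys/class/power_supply/BAT1/$1"; }
charge_now=$(read_int charge_now)
charge_full=$(read_int charge_full)
charge_full_design=$(read_int charge_full_design)
# Health = current full-charge capacity as a percentage of design capacity.
health_pct=$(awk -v a="$charge_full" -v b="$charge_full_design" 'BEGIN{printf "%.2f", 100*a/b}')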
curl -sS -u proxmox_writer:****** -XPOST \
"http://192.168.1.118:8086/write?db=proxmox" \
--data-binary "battery,host=pvelaptop,model=PA5185U-1BRS health_pct=$health_pct"
The 10G networking story
The desktop host used to ride two onboard I226-V 2.5 Gbps interfaces. A known firmware bug on that NIC caused a hard link flap every few minutes. Each flap broke long-running TCP streams, most painfully PBS chunk uploads, whose connections the kernel held open well past the HTTP/2 idle timeout.
A Mellanox ConnectX-3 EN now collapses everything onto a single 10 G SFP+ uplink to the aggregation switch. An iperf3 run against the Pi saturates the Pi's gigabit port with zero retransmits:
$ iperf3 -c 192.168.1.223 -t 30 -P 4
[SUM] 0.00-30.00 sec 3.29 GBytes 942 Mbits/sec 0 sender
[SUM] 0.00-30.04 sec 3.29 GBytes 942 Mbits/sec receiver
# (capped at ~942 Mb/s by the Pi side, not the link)
The trade-off is a single point of failure, which is the right one to take for a home lab: the two onboard ports remain cabled in, just admin-down, ready to bond back in if the Mellanox ever dies.
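The ~942 Mb/s figure in the iperf3 run is close to the usual TCP goodput ceiling of gigabit Ethernet. As a rough check, assuming a standard 1500-byte MTU, IPv4, TCP timestamps enabled and no VLAN tag (none of which the output confirms), every 1538 bytes on the wire carry 1448 bytes of payload. The function name `goodput_mbps` is a hypothetical helper for illustration:

```shell
#!/bin/sh
# Rough TCP goodput ceiling for a given Ethernet link speed in Mb/s.
# Assumptions (not taken from the iperf3 output): 1500-byte MTU, IPv4,
# TCP timestamps on, no VLAN tag.
goodput_mbps() {
    awk -v link="$1" 'BEGIN {
        mtu     = 1500
        wire    = mtu + 14 + 4 + 8 + 12   # Ethernet header, FCS, preamble, inter-frame gap
        payload = mtu - 20 - 20 - 12      # minus IPv4 header, TCP header, timestamp option
        printf "%.1f\n", link * payload / wire
    }'
}

goodput_mbps 1000    # gigabit port on the Pi side; prints 941.5
```

Under the same assumptions a 10 G endpoint would top out around 9.4 Gb/s, so the measured number says more about the Pi's port than about the Mellanox.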
Quorum math
Four cluster nodes (1 vote each) plus a Pi-hosted QDevice (3 votes under the LMS algorithm) gives 7 total votes, so quorum is a strict majority: ⌊7/2⌋ + 1 = 4. One surviving node plus the Pi clears that bar, so three of the four nodes can drop and the cluster keeps running.
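The vote arithmetic can be checked with a short shell sketch. The function names `quorum` and `is_quorate` are hypothetical helpers for illustration, not part of corosync or `pvecm`; the vote counts are the ones stated above (four nodes at 1 vote each, the QDevice at 3).

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of corosync or pvecm.

# Strict majority of the expected votes: floor(total / 2) + 1.
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# Succeeds when the surviving votes reach quorum.
# $1 = surviving node votes
# $2 = QDevice votes (0 if the Pi is unreachable)
# $3 = total expected votes
is_quorate() {
    [ $(( $1 + $2 )) -ge "$(quorum "$3")" ]
}

total=$(( 4 * 1 + 3 ))
echo "total=$total quorum=$(quorum "$total")"

if is_quorate 1 3 "$total"; then echo "1 node + Pi: quorate"; fi
if ! is_quorate 3 0 "$total"; then echo "3 nodes, no Pi: not quorate"; fi
```

By the same arithmetic, the flip side of a 3-vote QDevice is that with the Pi unreachable all four nodes have to be up to reach 4 votes.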
$ sudo pvecm status
Quorum information
Date: Fri May 22 21:47:50 2026
Nodes: 4
Quorate: Yes
Votequorum information
Expected votes: 7
Highest expected: 7
Total votes: 7
Quorum: 4
Flags: Quorate Qdevice
Membership information
Nodeid Votes Qdevice Name
0x00000001 1 A,V,NMW pvedesktopmsi
0x00000002 1 A,V,NMW pvelaptop
0x00000003 1 A,V,NMW pvewyse5070
0x00000004 1 A,V,NMW pvehp
0x00000000 3 Qdevice
Backups
Proxmox Backup Server runs on the same Pi. The schedule is staggered per-node (21:00, 22:00, 23:00, 00:00) because the Pi cannot drain all four nodes’ dirty bitmaps simultaneously without stalling, and a stalled qemu I/O path has cost me an ext4 corruption inside HAOS more than once. The big container (a ~2 TB photo store on a ZFS mirror) gets its own weekly slot at 2 a.m. on Sunday so its 6–10 hour run never collides with anything else.
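The per-node staggering is modular arithmetic on the start hour. A minimal sketch, assuming the slots are handed out in the order the nodes appear in the hardware table (the actual node-to-slot mapping is not stated here) and using a hypothetical helper `slot_for`:

```shell
#!/bin/sh
# Hypothetical helper: start time for the i-th node (0-based), one hour apart,
# beginning at 21:00 and wrapping past midnight.
slot_for() {
    printf '%02d:00\n' $(( (21 + $1) % 24 ))
}

i=0
# Node order is an assumption; only the four start times come from the text.
for node in pvedesktopmsi pvelaptop pvewyse5070 pvehp; do
    echo "$node $(slot_for "$i")"
    i=$(( i + 1 ))
done
```

The last slot wraps to 00:00, two hours ahead of the weekly 2 a.m. Sunday slot reserved for the big container.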