HomeLab HQ | Data Extremes

Why PBS Over vzdump

Proxmox VE's built-in vzdump writes a full VM/CT dump on every job. It works, but every backup is a complete image: slow, space-heavy, and painful to retain for more than a week or two.

Proxmox Backup Server (PBS) speaks a different protocol: client-side chunking (fixed-size chunks for VM disk images, content-defined boundaries for file-level archives such as container backups), server-side deduplication, and incremental-forever VM backups via dirty-bitmap tracking. The second daily backup of a 100GB VM might transfer only 200MB. Verify jobs cryptographically check chunks; sync jobs replicate to another PBS for offsite copies.
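
To make the dedup claim concrete, here is a minimal Python sketch of fixed-size chunking with a content-addressed store. It is an illustration, not PBS code: the function names, the in-memory set standing in for the datastore, and the tiny demo chunk size are all assumptions. It also models only deduplication; dirty-bitmap tracking additionally spares the client from re-reading unchanged blocks.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # assumed fixed chunk size for this sketch


def chunk_digests(image: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split an image into fixed-size chunks and hash each one with SHA-256."""
    return [
        hashlib.sha256(image[off:off + chunk_size]).hexdigest()
        for off in range(0, len(image), chunk_size)
    ]


def upload_new_chunks(image: bytes, store: set[str], chunk_size: int = CHUNK_SIZE) -> int:
    """Add chunks the store lacks and return how many had to be uploaded."""
    new = set(chunk_digests(image, chunk_size)) - store
    store.update(new)
    return len(new)


if __name__ == "__main__":
    size = 1024  # tiny chunks so the demo runs instantly
    day1 = b"".join(bytes([i]) * size for i in range(8))  # 8 distinct chunks
    day2 = bytearray(day1)
    day2[3 * size + 5] ^= 0xFF  # the guest changed one byte inside chunk 3
    store: set[str] = set()
    print("day 1 uploads:", upload_new_chunks(day1, store, size))
    print("day 2 uploads:", upload_new_chunks(bytes(day2), store, size))
```

Only the chunk containing the changed byte is new on day 2, which is why a mostly idle 100GB VM produces a small second backup.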

Install

PBS is its own distro. The clean path is a dedicated VM or bare-metal host:

Download the PBS ISO from proxmox.com
Install on a host with at least 4GB RAM and a dedicated datastore disk
Disable the enterprise repo, enable the no-subscription repo
apt update && apt full-upgrade

Resist the urge to run PBS in an LXC on the same Proxmox host you are backing up. It works, but a host failure takes both your VMs and your backups offline at once.

Datastore Layout

A datastore is a directory PBS uses to store chunks, indices, and metadata. The disk layout choice is permanent for that datastore:

ZFS pool (mirror or raidz2): Recommended. Snapshots, scrub, and checksums on top of PBS's own checksums. See the ZFS pool design guide; mirrors are best because PBS workloads are random I/O during garbage collection and verify.
Single SSD: Fast, but no redundancy. Pair with a sync job to a remote PBS or you have one copy.
Avoid USB drives. PBS will create thousands of chunk files; USB enclosures choke on the metadata workload.

Create the datastore via the web UI: Datastore → Add Datastore. Point it at a mount path (e.g., /mnt/datastore/main).

Connect PVE to PBS

Get the server's certificate fingerprint from PBS: Dashboard → Show Fingerprint. Create an API token under Access Control → API Token, then grant that token the DatastoreBackup role on the datastore.

On Proxmox VE: Datacenter → Storage → Add → Proxmox Backup Server. Fill in:

ID: pbs-main
Server: PBS hostname or IP
Username: backup@pbs!pve-token (the format with the !)
Password: the token secret
Datastore: name you created
Fingerprint: paste it
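
The token ID format trips people up, so here is a small Python sketch that splits it into its parts. The helper and class names are hypothetical and exist only for illustration; the one thing taken from the guide is the user@realm!tokenname shape.

```python
from typing import NamedTuple


class TokenId(NamedTuple):
    user: str
    realm: str
    token_name: str


def parse_token_id(token_id: str) -> TokenId:
    """Split 'user@realm!tokenname' into its parts, raising ValueError if malformed."""
    auth_id, sep, token_name = token_id.partition("!")
    if not sep or not token_name:
        raise ValueError("missing '!tokenname' part: this looks like a plain user ID")
    user, at, realm = auth_id.rpartition("@")
    if not at or not user or not realm:
        raise ValueError("expected 'user@realm' before the '!'")
    return TokenId(user, realm, token_name)


if __name__ == "__main__":
    print(parse_token_id("backup@pbs!pve-token"))
```

If the Username field holds only backup@pbs, the parse fails: that is a user login, and the token secret will not authenticate against it.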

Backup Jobs

On PVE: Datacenter → Backup → Add. Settings worth tuning:

Schedule: Daily at 2 AM is fine for most. Stagger if you have many jobs so they do not all hit network/disk at once.
Mode: Snapshot for live backups. Uses qemu agent for guest filesystem freeze if installed — install qemu-guest-agent in every VM.
Selection: All with explicit exclusions beats picking VMs one by one and forgetting new ones.
Notification: "Always" is annoying noise; pick "On failure" and trust the PBS dashboard for daily green status.
Retention: Set per-job or globally on the datastore — discussed below.

Retention Policy

PBS uses a GFS (grandfather-father-son) policy with explicit keep-N values. A sane starting point:

keep-last:    3
keep-daily:   7
keep-weekly:  4
keep-monthly: 6
keep-yearly:  1
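
To see what those numbers keep, the following Python sketch simulates the selection on a list of snapshot timestamps. It is a simplified model, not PBS's implementation: it assumes the rules are applied in order (keep-last, then daily, weekly, monthly, yearly), that each rule keeps the newest snapshot per period, and that periods already covered by an earlier rule are skipped. keep-hourly is omitted, timestamps are naive local times, and all names are hypothetical.

```python
from datetime import datetime, timedelta

# Snapshots that share a key fall into the same day, ISO week, month, or year.
PERIOD_KEYS = {
    "keep-daily": lambda t: t.strftime("%Y-%m-%d"),
    "keep-weekly": lambda t: "{}-W{:02d}".format(*t.isocalendar()[:2]),
    "keep-monthly": lambda t: t.strftime("%Y-%m"),
    "keep-yearly": lambda t: t.strftime("%Y"),
}


def select_kept(snapshots: list[datetime], policy: dict[str, int]) -> set[datetime]:
    """Return the snapshots this simplified keep-N model would retain."""
    ordered = sorted(snapshots, reverse=True)  # newest first
    mark: dict[datetime, bool] = {}  # True = keep, False = remove
    for t in ordered[: policy.get("keep-last", 0)]:
        mark[t] = True
    for option, key in PERIOD_KEYS.items():
        budget = policy.get(option, 0)
        if budget == 0:
            continue
        # Periods already represented by snapshots kept under earlier rules.
        covered = {key(t) for t, keep in mark.items() if keep}
        picked: set[str] = set()
        for t in ordered:
            if t in mark:
                continue
            period = key(t)
            if period in covered:
                continue
            if period in picked:
                mark[t] = False  # older snapshot in a period that has a keeper
            elif len(picked) >= budget:
                break
            else:
                picked.add(period)
                mark[t] = True
    return {t for t, keep in mark.items() if keep}


if __name__ == "__main__":
    newest = datetime(2024, 6, 30, 2, 0)
    daily = [newest - timedelta(days=i) for i in range(800)]
    policy = {"keep-last": 3, "keep-daily": 7, "keep-weekly": 4,
              "keep-monthly": 6, "keep-yearly": 1}
    for t in sorted(select_kept(daily, policy), reverse=True):
        print(t.date())
```

With 800 consecutive daily snapshots, this model retains 21 of them (3 + 7 + 4 + 6 + 1), reaching back about a year and a half.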

Configure under Datastore → Prune & GC or per job. Run a Prune job nightly and a GC job weekly. GC walks the chunk store and frees deduped chunks that no snapshot references — the actual space reclaimer.
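
The prune/GC split is easier to see in code. This Python sketch models GC as a plain mark-and-sweep over chunk digests; the data structures and names are assumptions made for illustration, and the real GC also applies a grace period based on chunk access times, which is left out here.

```python
def garbage_collect(chunk_store: set[str], snapshots: dict[str, list[str]]) -> set[str]:
    """Mark every chunk a remaining snapshot references, then sweep the rest."""
    referenced: set[str] = set()
    for digests in snapshots.values():  # mark phase
        referenced.update(digests)
    garbage = chunk_store - referenced  # sweep phase
    chunk_store.difference_update(garbage)
    return garbage


if __name__ == "__main__":
    store = {"a", "b", "c"}
    snapshots = {"vm/100/day1": ["a", "b"], "vm/100/day2": ["b", "c"]}

    del snapshots["vm/100/day1"]  # prune: only the snapshot index goes away
    print("chunks after prune:", sorted(store))

    freed = garbage_collect(store, snapshots)
    print("freed by GC:", sorted(freed), "remaining:", sorted(store))
```

Chunk b survives because day2 still references it; chunk a is freed only when GC runs, not when the snapshot is pruned.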

Verify Jobs: Catch Bit Rot

Schedule a Verify job to re-read and cryptographically check stored chunks. Weekly, skipping snapshots already verified within the last 30 days, is a common cadence. Without verification, a backup that looks fine in the UI can still be silently corrupted, and you find out at restore time.
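
The idea behind verification fits in a few lines of Python: a chunk is stored under the hash of its content, so re-hashing it exposes any change. This is a sketch under simplifying assumptions (raw bytes in a dict, hypothetical names); the real on-disk chunk format has more to it.

```python
import hashlib


def verify_chunks(chunks: dict[str, bytes]) -> list[str]:
    """Return digests whose stored bytes no longer hash to their own name."""
    return sorted(
        digest for digest, data in chunks.items()
        if hashlib.sha256(data).hexdigest() != digest
    )


if __name__ == "__main__":
    good, rotten = b"database page", b"kernel image"
    store = {hashlib.sha256(d).hexdigest(): d for d in (good, rotten)}

    # Simulate bit rot: flip one bit in a stored chunk without renaming it.
    key = hashlib.sha256(rotten).hexdigest()
    damaged = bytearray(store[key])
    damaged[0] ^= 0x01
    store[key] = bytes(damaged)

    print("corrupt chunks:", verify_chunks(store))
```

Nothing in a directory listing would reveal the flipped bit; only re-reading and re-hashing does.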

Namespaces for Multi-PVE

If you back up multiple PVE clusters or hosts, use namespaces so they do not collide on VM ID (every PVE numbers VMs from 100). Under Datastore → namespace:

pve-main/
pve-edge/
external/

Set the namespace in the PVE storage config. Each PVE only sees its own backups in the restore UI.
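
A quick Python sketch of why the collision happens. The key format below is purely illustrative (not a statement about PBS's on-disk layout), and the function name is hypothetical.

```python
def backup_group(guest_type: str, guest_id: int, namespace: str = "") -> str:
    """Build an illustrative group key; an empty namespace means the datastore root."""
    prefix = f"{namespace}/" if namespace else ""
    return f"{prefix}{guest_type}/{guest_id}"


if __name__ == "__main__":
    # Two independent PVE hosts, each with its own VM 100, sharing one datastore.
    print(backup_group("vm", 100) == backup_group("vm", 100))  # same key: collision
    print(backup_group("vm", 100, "pve-main"))
    print(backup_group("vm", 100, "pve-edge"))
```

Without a namespace, both hosts write into the same backup group and their snapshots interleave; with one namespace per host, the groups stay separate.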

Sync Jobs: The Offsite Story

Backups on one box are not backups. Configure a second PBS at a friend's house, a cheap VPS, or a colo, and set up a Sync Job:

On the offsite PBS, add the source PBS as a Remote (server, fingerprint, token).
Create a sync job that pulls from the source datastore to a local one. The offsite server fetches only the chunks it does not already have; bandwidth is roughly the daily delta.
Schedule daily, after the primary backup window. Set retention separately on the remote — usually longer, cheaper retention.

Sync only transfers chunks the remote does not have, so the second day's sync is small. Initial sync is full transfer — kick it off on a Saturday.
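
For a rough feel of sync traffic, this Python sketch plans a pull from the offsite side: compare the source's chunk digests with what is already held locally and fetch the difference. The names and the digest-to-size mapping are assumptions for illustration only.

```python
def plan_pull(source_chunks: dict[str, int], local_digests: set[str]) -> tuple[list[str], int]:
    """Return the digests missing locally and the total bytes needed to fetch them."""
    missing = sorted(d for d in source_chunks if d not in local_digests)
    return missing, sum(source_chunks[d] for d in missing)


if __name__ == "__main__":
    mib = 1024 * 1024
    source = {"c1": 4 * mib, "c2": 4 * mib, "c3": 4 * mib}
    offsite: set[str] = set()

    missing, size = plan_pull(source, offsite)  # initial sync: everything
    print("initial:", missing, size // mib, "MiB")
    offsite.update(missing)

    source["c4"] = 4 * mib  # next day's backup added one new chunk
    missing, size = plan_pull(source, offsite)
    print("day 2:", missing, size // mib, "MiB")
```

The first run moves every chunk; later runs move only what the previous night's backups added.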

Restore Procedures

Whole-VM restore: On PVE, Storage → Backups → Restore. Choose new VM ID and target storage. Takes ~10 minutes for a 50GB VM on local SSD.
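Single-file restore from a snapshot: select the backup on the PBS storage in the PVE UI and use File Restore to navigate the guest filesystem and download the file (the PBS UI can also browse file-level archives such as container backups). No need to spin up a clone.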
Pull from offsite: If the primary PBS is gone, point PVE at the offsite PBS as a storage and restore from there. Practice this once.

Encrypted Backups

Configure encryption client-side on PVE so chunks are encrypted before leaving the host. Generate a key per node (Datacenter → Storage → Edit → Encryption Key), and back up the key separately — losing it makes backups unreadable.

With encryption on, server-side dedup still works among backups encrypted with the same key, but not across backups encrypted with different keys (for example, from separate PVE hosts). Tradeoff: privacy vs dedup ratio.
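
Why keys break cross-host dedup can be shown with a toy model in Python. The sketch below derives a chunk ID with HMAC-SHA256 over the plaintext; that construction is an assumption chosen for illustration and is not claimed to be the scheme PBS uses. The point is only that a key-dependent ID makes identical data look different under different keys.

```python
import hashlib
import hmac


def chunk_id(data: bytes, key: bytes) -> str:
    """Derive a key-dependent chunk ID (toy model, not the PBS scheme)."""
    return hmac.new(key, data, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    base_image = b"identical OS blocks shared by many guests"
    key_a, key_b = b"key-of-host-a", b"key-of-host-b"

    same_key = chunk_id(base_image, key_a) == chunk_id(base_image, key_a)
    cross_key = chunk_id(base_image, key_a) == chunk_id(base_image, key_b)
    print("dedups within one key:", same_key)
    print("dedups across keys:", cross_key)
```

Identical blocks backed up under one key collapse into a single stored chunk; under two keys the server sees two unrelated chunks and stores both.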

Operational Notes

qemu-guest-agent in every Linux VM. Without it, snapshots are crash-consistent only — most filesystems survive, but databases may need recovery on restore.
Container backups use vzdump-style snapshot under the hood. Slightly less elegant than VM backups: there is no dirty bitmap, so the whole container is read on every run, but only new chunks are uploaded and stored.
Monitor PBS itself via Prometheus — there is a community exporter. See the Prometheus + Grafana guide. Alert on failed jobs and on missing backups (job that hasn't run in 36h).
Restore test quarterly. A backup you have never restored is a rumor.
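
The missing-backup alert from the monitoring note boils down to a staleness check. Here is a minimal Python sketch, assuming you can already obtain the last successful backup time per guest from whatever you monitor with; the function name, the guest labels, and the data source are hypothetical.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=36)  # threshold from the monitoring note above


def stale_guests(last_backup: dict[str, datetime], now: datetime,
                 max_age: timedelta = MAX_AGE) -> list[str]:
    """Return guests whose newest backup is older than max_age."""
    return sorted(g for g, ts in last_backup.items() if now - ts > max_age)


if __name__ == "__main__":
    now = datetime(2024, 6, 30, 12, 0, tzinfo=timezone.utc)
    last_backup = {
        "vm/100": now - timedelta(hours=10),  # backed up last night
        "vm/101": now - timedelta(hours=58),  # missed two nightly runs
        "ct/200": now - timedelta(hours=35),  # just inside the window
    }
    print("stale:", stale_guests(last_backup, now))
```

36 hours tolerates one late nightly run but catches a guest that has silently dropped out of the job.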

Common Pitfalls

PBS on the same host as PVE: single point of failure. At minimum, put it on a different physical disk; ideally on a different machine.
No GC running: Pruned snapshots free chunks only after GC. Disk usage looks bloated until you realize GC is disabled.
Fingerprint mismatch: If PBS is reinstalled, the cert changes and PVE rejects connections. Update the fingerprint in PVE storage config.
Forgetting the encryption key backup: Stored only on the source PVE host by default. Save it offline (password manager, paper) — without it, encrypted backups are bricks.

Validation Checklist

Daily backup job ran successfully (PBS dashboard green)
Weekly verify job completed without errors
GC job runs weekly and reclaims space
Sync to offsite PBS is current (check timestamp)
Quarterly: restored a random VM to a test ID and it boots
Encryption key (if used) is backed up outside PBS
PBS health is alerted on via Prometheus or PBS notification settings