Why PBS Over vzdump
Proxmox VE's built-in vzdump writes a full VM/CT dump on every job. It works, but each backup is a full image: slow, space-heavy, and painful to retain for more than a week or two.
Proxmox Backup Server (PBS) speaks a different protocol: client-side chunking (fixed-size chunks for VM disk images, content-defined boundaries for file-level archives such as containers), server-side deduplication, and incremental-forever backups via dirty-bitmap tracking for running VMs. The second daily backup of a 100GB VM might be 200MB. Verify jobs cryptographically check chunks; sync jobs replicate to another PBS for offsite.
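To make the deduplication claim concrete, here is a minimal Python sketch of a content-addressed chunk store. It is an illustration only, not PBS code: it uses tiny fixed-size chunks rather than content-defined boundaries, and the names backup, store, and CHUNK_SIZE are made up for the example.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example stays readable


def backup(image: bytes, store: dict) -> tuple:
    """Split image into fixed-size chunks and store only unseen ones.

    Returns (index, new_chunk_count), where index is the ordered list of
    chunk digests needed to rebuild the image.
    """
    index = []
    new_chunks = 0
    for offset in range(0, len(image), CHUNK_SIZE):
        chunk = image[offset:offset + CHUNK_SIZE]
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:  # dedup: identical content is stored once
            store[digest] = chunk
            new_chunks += 1
        index.append(digest)
    return index, new_chunks


store = {}
day1 = b"AAAABBBBCCCCDDDD"
day2 = b"AAAABBBBXXXXDDDD"  # one 4-byte region changed overnight

index1, new1 = backup(day1, store)
index2, new2 = backup(day2, store)
print(new1, new2, len(store))  # 4 1 5
```

The second backup adds a single chunk, yet its index still describes the full image, which is why every snapshot restores like a full backup.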
Install
PBS is its own distro. The clean path is a dedicated VM or bare-metal host:
- Download the PBS ISO from proxmox.com
- Install on a host with at least 4GB RAM and a dedicated datastore disk
- Disable the enterprise repo, enable the no-subscription repo
- Run apt update && apt full-upgrade
Resist the urge to run PBS in an LXC on the same Proxmox host you are backing up. It works, but a host failure takes both your VMs and your backups offline at once.
Datastore Layout
A datastore is a directory PBS uses to store chunks, indices, and metadata. The disk layout choice is permanent for that datastore:
- ZFS pool (mirror or raidz2): Recommended. Snapshots, scrub, and checksums on top of PBS's own checksums. See the ZFS pool design guide — mirrors are best because PBS workloads are random I/O during prune/verify.
- Single SSD: Fast, but no redundancy. Pair with a sync job to a remote PBS or you have one copy.
- Avoid USB drives. PBS will create thousands of chunk files; USB enclosures choke on the metadata workload.
Create the datastore via the web UI: Datastore → Add Datastore. Point it at a mount path (e.g., /mnt/datastore/main).
Connect PVE to PBS
Get the PBS server's certificate fingerprint: Dashboard → Show Fingerprint. Create an API token (Access Control → API Token) and grant it the DatastoreBackup role on the datastore.
On Proxmox VE: Datacenter → Storage → Add → Proxmox Backup Server. Fill in:
- ID: pbs-main
- Server: PBS hostname or IP
- Username: backup@pbs!pve-token (the format with the !)
- Password: the token secret
- Datastore: name you created
- Fingerprint: paste it
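The Username field is the easiest one to get wrong, so here is a small Python sketch that splits a token ID of the form user@realm!tokenname into its parts. The helper name parse_token_id is hypothetical and the check is intentionally loose; it only illustrates the shape of the string, not the validation PBS itself performs.

```python
def parse_token_id(auth_id: str) -> dict:
    """Split 'user@realm!tokenname' into parts; raise ValueError if malformed."""
    user_part, sep, token_name = auth_id.partition("!")
    if not sep or not token_name:
        raise ValueError("missing '!tokenname' suffix: %r" % auth_id)
    user, sep, realm = user_part.rpartition("@")
    if not sep or not user or not realm:
        raise ValueError("expected 'user@realm' before '!': %r" % auth_id)
    return {"user": user, "realm": realm, "token": token_name}


parts = parse_token_id("backup@pbs!pve-token")
print(parts)  # {'user': 'backup', 'realm': 'pbs', 'token': 'pve-token'}
```

A plain user ID such as backup@pbs fails this check, which mirrors the common mistake of pasting a token secret next to a user name that lacks the token suffix.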
Backup Jobs
On PVE: Datacenter → Backup → Add. Settings worth tuning:
- Schedule: Daily at 2 AM is fine for most. Stagger if you have many jobs so they do not all hit network/disk at once.
- Mode: Snapshot for live backups. It uses the QEMU guest agent to freeze the guest filesystem if the agent is installed, so install qemu-guest-agent in every VM.
- Selection: All with explicit exclusions beats picking VMs one by one and forgetting new ones.
- Notification: "Always" is annoying noise; pick "On failure" and trust the PBS dashboard for daily green status.
- Retention: Set per-job or globally on the datastore — discussed below.
Retention Policy
PBS uses a GFS (grandfather-father-son) policy with explicit keep-N values. A sane starting point:
keep-last: 3
keep-daily: 7
keep-weekly: 4
keep-monthly: 6
keep-yearly: 1
Configure under Datastore → Prune & GC or per job. Run a Prune job nightly and a GC job weekly. GC walks the chunk store and frees deduped chunks that no snapshot references — the actual space reclaimer.
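To see what the keep-N values above actually retain, the following Python sketch simulates a prune run over 400 daily snapshots. It is a simplified approximation of the documented behaviour (each rule keeps the newest snapshot per period and only considers periods not already covered by an earlier rule), not PBS's implementation; prune_keep is a hypothetical name.

```python
from datetime import datetime, timedelta


def prune_keep(snapshots, keep_last=0, keep_daily=0, keep_weekly=0,
               keep_monthly=0, keep_yearly=0):
    """Return the snapshots a GFS-style policy would keep, newest first."""
    ordered = sorted(snapshots, reverse=True)
    kept = set(ordered[:keep_last])
    rules = [
        (keep_daily, lambda t: t.strftime("%Y-%m-%d")),
        (keep_weekly, lambda t: "%d-W%02d" % tuple(t.isocalendar()[:2])),
        (keep_monthly, lambda t: t.strftime("%Y-%m")),
        (keep_yearly, lambda t: t.strftime("%Y")),
    ]
    for limit, period_of in rules:
        covered = {period_of(t) for t in kept}  # periods earlier rules handled
        selected = set()
        for t in ordered:
            if t in kept:
                continue
            period = period_of(t)
            if period in covered or period in selected:
                continue
            if len(selected) >= limit:
                break  # this rule is full; older snapshots go to the next rule
            selected.add(period)
            kept.add(t)
    return sorted(kept, reverse=True)


# One snapshot per day at 02:00, from 2024-01-01 through 2025-02-03.
snapshots = [datetime(2024, 1, 1, 2) + timedelta(days=i) for i in range(400)]
kept = prune_keep(snapshots, keep_last=3, keep_daily=7, keep_weekly=4,
                  keep_monthly=6, keep_yearly=1)
print(len(kept), kept[0].date(), kept[-1].date())  # 20 2025-02-03 2024-06-30
```

With only 13 months of history, keep-yearly has nothing left to select, so 400 snapshots shrink to 20 while still reaching back about seven months.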
Verify Jobs: Catch Bit Rot
Schedule a Verify job to re-read and cryptographically check stored chunks. Weekly with "skip verified within 30 days" is the standard cadence. Without verification, a backup that looks fine in the UI can still be silently corrupted — you find out at restore time.
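The idea behind verification can be sketched in a few lines of Python: because each chunk is addressed by the hash of its content, re-reading a chunk and re-hashing it reveals corruption. This is a toy model with an in-memory dict standing in for the chunk store; verify is a hypothetical helper, not a PBS API.

```python
import hashlib


def verify(store: dict) -> list:
    """Return digests whose stored bytes no longer hash to their own name."""
    return [digest for digest, data in store.items()
            if hashlib.sha256(data).hexdigest() != digest]


chunks = [b"chunk-one", b"chunk-two"]
store = {hashlib.sha256(c).hexdigest(): c for c in chunks}
clean_result = verify(store)

# Simulate bit rot: the file keeps its name but one byte on disk changed.
victim = hashlib.sha256(b"chunk-two").hexdigest()
store[victim] = b"chunk-twO"
corrupt_result = verify(store)

print(len(clean_result), len(corrupt_result))  # 0 1
```

Nothing in a snapshot listing would reveal the damaged chunk; only re-reading the data does, which is the point of scheduling the job.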
Namespaces for Multi-PVE
If you back up multiple PVE clusters or hosts, use namespaces so they do not collide on VM ID (every PVE numbers VMs from 100). Under Datastore → namespace:
pve-main/
pve-edge/
external/
Set the namespace in the PVE storage config. Each PVE only sees its own backups in the restore UI.
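The collision can be illustrated with a toy Python model in which backup groups are keyed by type and ID. The dictionary layout is an assumption made for the example, not how PBS stores groups on disk, and the cluster names are the ones listed above.

```python
from collections import defaultdict

# Without namespaces: a group is identified by (type, id) only.
flat = defaultdict(list)
flat[("vm", 100)].append("web server from pve-main")
flat[("vm", 100)].append("router from pve-edge")  # same group, unrelated VM

# With namespaces: the namespace becomes part of the key.
namespaced = defaultdict(list)
namespaced[("pve-main", "vm", 100)].append("web server from pve-main")
namespaced[("pve-edge", "vm", 100)].append("router from pve-edge")

print(len(flat), len(namespaced))  # 1 2
```

In the flat layout two unrelated machines share one snapshot history; with namespaces each cluster keeps its own vm/100.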
Sync Jobs: The Offsite Story
Backups on one box are not backups. Configure a second PBS at a friend's house, a cheap VPS, or a colo, and set up a Sync Job:
- On the offsite PBS, add the source PBS as a Remote (server, fingerprint, token).
- Create a sync job that pulls from the source datastore to a local one. The offsite server fetches only the chunks it does not already have; bandwidth is roughly the daily delta.
- Schedule daily, after the primary backup window. Set retention separately on the remote — usually longer, cheaper retention.
Sync only transfers chunks the remote does not have, so the second day's sync is small. Initial sync is full transfer — kick it off on a Saturday.
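Why the second sync is small can be sketched in Python as a set difference over chunk digests. This is a simplified model, not the sync protocol itself; chunks_to_transfer and the sample chunk contents are invented for the example.

```python
import hashlib


def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def chunks_to_transfer(source_index, remote_store):
    """Return the digests in source_index that the remote does not hold yet."""
    needed = []
    seen = set()
    for d in source_index:
        if d not in remote_store and d not in seen:
            seen.add(d)
            needed.append(d)
    return needed


source = {digest(c): c for c in (b"os", b"app", b"logs-mon", b"logs-tue")}
remote = {}

# Day 1: initial sync, the remote is empty so everything is transferred.
monday = [digest(b"os"), digest(b"app"), digest(b"logs-mon")]
first = chunks_to_transfer(monday, remote)
remote.update({d: source[d] for d in first})

# Day 2: only the changed log chunk is missing on the remote.
tuesday = [digest(b"os"), digest(b"app"), digest(b"logs-tue")]
second = chunks_to_transfer(tuesday, remote)

print(len(first), len(second))  # 3 1
```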
Restore Procedures
- Whole-VM restore: On PVE, Storage → Backups → Restore. Choose new VM ID and target storage. Takes ~10 minutes for a 50GB VM on local SSD.
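- Single-file restore from a snapshot: browse the snapshot's contents (for VM backups, via the File Restore button on the backup in the PVE UI; file-level container archives can also be browsed in the PBS UI), navigate the guest filesystem, and download the file. No need to spin up a clone.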
- Pull from offsite: If the primary PBS is gone, point PVE at the offsite PBS as a storage and restore from there. Practice this once.
Encrypted Backups
Configure encryption client-side on PVE so chunks are encrypted before leaving the host. Generate a key per node (Datacenter → Storage → Edit → Encryption Key), and back up the key separately — losing it makes backups unreadable.
With encryption on, server-side dedup still works within a single key/namespace, but not across encrypted backups from different nodes. Tradeoff: privacy vs dedup ratio.
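The dedup tradeoff can be sketched in Python under one stated assumption: that the chunk ID of an encrypted backup depends on both the content and a secret derived from the key. The concatenate-and-hash construction below is a stand-in chosen for brevity, not PBS's actual scheme, and the key values are placeholders.

```python
import hashlib


def chunk_id(data: bytes, id_secret: bytes = b"") -> str:
    """Toy chunk ID: hash of content plus a per-key secret (assumed scheme)."""
    return hashlib.sha256(data + id_secret).hexdigest()


shared_block = b"identical OS image block"

# Two nodes using the same key produce the same ID, so the chunk is stored once.
same_key = {chunk_id(shared_block, b"key-A"), chunk_id(shared_block, b"key-A")}

# Two nodes with different keys produce different IDs for identical data.
different_keys = {chunk_id(shared_block, b"key-A"), chunk_id(shared_block, b"key-B")}

print(len(same_key), len(different_keys))  # 1 2
```

If many nodes back up near-identical guests, sharing one key across them keeps the dedup ratio, at the cost of a single key protecting everything.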
Operational Notes
- qemu-guest-agent in every Linux VM. Without it, snapshots are crash-consistent only — most filesystems survive, but databases may need recovery on restore.
- Container backups use vzdump-style snapshot under the hood. Slightly less elegant than VM backups but still incremental.
- Monitor PBS itself via Prometheus — there is a community exporter. See the Prometheus + Grafana guide. Alert on failed jobs and on missing backups (job that hasn't run in 36h).
- Restore test quarterly. A backup you have never restored is a rumor.
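The missing-backup alert mentioned above amounts to a staleness check. A minimal Python sketch, assuming you already have the timestamp of the latest snapshot per guest from whatever exporter or API you use (the dictionary and the helper name stale_guests are hypothetical):

```python
from datetime import datetime, timedelta

MAX_AGE = timedelta(hours=36)  # threshold taken from the note above


def stale_guests(last_backup: dict, now: datetime, max_age=MAX_AGE) -> list:
    """Return guests whose newest backup is older than max_age."""
    return sorted(guest for guest, ts in last_backup.items()
                  if now - ts > max_age)


now = datetime(2024, 5, 10, 8, 0)
last_backup = {
    "vm/100": datetime(2024, 5, 10, 2, 0),  # 6 hours old
    "vm/101": datetime(2024, 5, 8, 2, 0),   # 54 hours old
    "ct/200": datetime(2024, 5, 9, 2, 0),   # 30 hours old
}
stale = stale_guests(last_backup, now)
print(stale)  # ['vm/101']
```

A 36-hour window tolerates one late nightly run but flags a guest that has missed a whole day, which a failed-job alert alone would not catch if the job never started.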
Common Pitfalls
- PBS on the same host as PVE: single point of failure. At minimum, put it on a different physical disk; ideally on a different machine.
- No GC running: Pruned snapshots free chunks only after GC. Disk usage looks bloated until you realize GC is disabled.
- Fingerprint mismatch: If PBS is reinstalled, its certificate changes and PVE rejects connections. Update the fingerprint in the PVE storage config.
- Forgetting the encryption key backup: Stored only on the source PVE host by default. Save it offline (password manager, paper) — without it, encrypted backups are bricks.
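The "No GC running" pitfall is easier to reason about with a toy mark-and-sweep in Python. This is a simplified model: it omits the safeguards a real garbage collector needs (such as a grace period for chunks belonging to backups still in progress), and garbage_collect is a hypothetical name.

```python
def garbage_collect(store: dict, indexes: dict) -> int:
    """Delete chunks no remaining snapshot index references; return count freed."""
    referenced = set()
    for index in indexes.values():  # mark phase: collect every digest in use
        referenced.update(index)
    orphaned = [d for d in store if d not in referenced]
    for d in orphaned:              # sweep phase: remove unreferenced chunks
        del store[d]
    return len(orphaned)


store = {"a": b"...", "b": b"...", "c": b"...", "d": b"..."}
indexes = {"snap1": ["a", "b", "c"], "snap2": ["a", "b", "d"]}

del indexes["snap1"]          # prune removes the snapshot's index only
after_prune = len(store)      # chunk "c" is still on disk
freed = garbage_collect(store, indexes)

print(after_prune, freed, len(store))  # 4 1 3
```

Pruning changes which chunks are referenced; only the sweep changes disk usage.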
Validation Checklist
- Daily backup job ran successfully (PBS dashboard green)
- Weekly verify job completed without errors
- GC job runs weekly and reclaims space
- Sync to offsite PBS is current (check timestamp)
- Quarterly: restored a random VM to a test ID and it boots
- Encryption key (if used) is backed up outside PBS
- PBS health is alerted on via Prometheus or PBS notification settings