Proxmox Backup Server

Incremental, deduplicated, verified VM and container backups — and the offsite story to match.

Why PBS Over vzdump

When it targets ordinary storage (a directory, NFS, or CIFS share), Proxmox VE's built-in vzdump writes a full VM/CT dump on every job. It works, but each backup is the full image: slow, space-heavy, and painful to keep for more than a week or two.

Proxmox Backup Server (PBS) speaks a different protocol: client-side chunking (fixed-size chunks for VM disk images, content-defined boundaries for file-level archives such as containers), server-side deduplication, and incremental-forever backups of running VMs via QEMU dirty-bitmap tracking. The second daily backup of a 100GB VM might transfer only 200MB. Verify jobs cryptographically check chunks; sync jobs replicate to another PBS for offsite.
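
To make the deduplication idea concrete, here is a minimal sketch of a content-addressed chunk store. It assumes fixed-size chunks named by their SHA-256 digest; the 4-byte chunk size, the in-memory dict, and the function names are illustrative only and are not PBS internals.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is readable


def split_fixed(data, size=CHUNK_SIZE):
    """Split data into fixed-size chunks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def backup(image, store):
    """Store an image; return its index (list of digests) and the new-chunk count."""
    index, new_chunks = [], 0
    for chunk in split_fixed(image):
        digest = hashlib.sha256(chunk).hexdigest()
        if digest not in store:      # only unseen chunks cost space
            store[digest] = chunk
            new_chunks += 1
        index.append(digest)
    return index, new_chunks


def restore(index, store):
    """Rebuild an image by concatenating the chunks its index points to."""
    return b"".join(store[d] for d in index)


store = {}
day1 = b"AAAABBBBCCCCDDDD"
day2 = b"AAAABBBBXXXXDDDD"           # one chunk changed since day 1
idx1, new1 = backup(day1, store)
idx2, new2 = backup(day2, store)
print(new1, new2, len(store))        # 4 1 5
```

Both snapshots stay fully restorable from their own index, yet the second one added a single chunk. That is the effect behind a small second-day backup.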

Install

PBS is its own distro. The clean path is a dedicated VM or bare-metal host:

  • Download the PBS ISO from proxmox.com
  • Install on a host with at least 4GB RAM and a dedicated datastore disk
  • Disable the enterprise repo, enable the no-subscription repo
  • apt update && apt full-upgrade

Resist the urge to run PBS in an LXC on the same Proxmox host you are backing up. It works, but a host failure takes both your VMs and your backups offline at once.

Datastore Layout

A datastore is a directory PBS uses to store chunks, indices, and metadata. The disk layout choice is permanent for that datastore:

  • ZFS pool (mirror or raidz2): Recommended. Snapshots, scrub, and checksums on top of PBS's own checksums. See the ZFS pool design guide; mirrors are best because PBS workloads are random I/O during garbage collection and verify.
  • Single SSD: Fast, but no redundancy. Pair with a sync job to a remote PBS or you have one copy.
  • Avoid USB drives. PBS will create thousands of chunk files; USB enclosures choke on the metadata workload.

Create the datastore via the web UI: Datastore → Add Datastore. Point it at a mount path (e.g., /mnt/datastore/main).

Connect PVE to PBS

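Get the PBS server's certificate fingerprint: Dashboard → Show Fingerprint. Create an API token under Access Control → API Token, then grant it the DatastoreBackup role on the datastore under Access Control → Permissions. A token's effective privileges never exceed those of the user that owns it, so the owning user needs the role as well.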

On Proxmox VE: Datacenter → Storage → Add → Proxmox Backup Server. Fill in:

  • ID: pbs-main
  • Server: PBS hostname or IP
  • Username: backup@pbs!pve-token (the format with the !)
  • Password: the token secret
  • Datastore: name you created
  • Fingerprint: paste it

Backup Jobs

On PVE: Datacenter → Backup → Add. Settings worth tuning:

  • Schedule: Daily at 2 AM is fine for most. Stagger if you have many jobs so they do not all hit network/disk at once.
  • Mode: Snapshot for live backups. Uses qemu agent for guest filesystem freeze if installed — install qemu-guest-agent in every VM.
  • Selection: All with explicit exclusions beats picking VMs one by one and forgetting new ones.
  • Notification: Notifying on every run is annoying noise; pick "On failure" and trust the PBS dashboard for daily green status.
  • Retention: Set per-job or globally on the datastore — discussed below.

Retention Policy

PBS uses a GFS (grandfather-father-son) policy with explicit keep-N values. A sane starting point:

keep-last:    3
keep-daily:   7
keep-weekly:  4
keep-monthly: 6
keep-yearly:  1
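
How these values interact is easier to see in a simulation. The sketch below is a simplified model of the selection order (last, then daily, weekly, monthly, yearly, with each rule only considering snapshots older than what earlier rules already cover). It is an approximation written for this guide, not the server's code, and the function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta


def prune_keep(snapshots, keep_last=0, keep_daily=0, keep_weekly=0,
               keep_monthly=0, keep_yearly=0):
    """Return the set of snapshot times a keep-N policy would retain."""
    ordered = sorted(snapshots, reverse=True)  # newest first
    kept, removed = set(), set()

    def apply(limit, period_of):
        if not limit:
            return
        # Periods already represented by a kept snapshot are not counted again.
        covered = {period_of(s) for s in kept}
        chosen = set()
        for snap in ordered:
            if snap in kept or snap in removed:
                continue
            period = period_of(snap)
            if period in covered:
                continue
            if period in chosen:
                removed.add(snap)   # older duplicate inside a period already kept
            elif len(chosen) >= limit:
                break               # rule exhausted; older snapshots go to the next rule
            else:
                chosen.add(period)
                kept.add(snap)

    apply(keep_last, lambda s: s)                             # each snapshot is its own period
    apply(keep_daily, lambda s: s.date())
    apply(keep_weekly, lambda s: tuple(s.isocalendar())[:2])  # (ISO year, ISO week)
    apply(keep_monthly, lambda s: (s.year, s.month))
    apply(keep_yearly, lambda s: s.year)
    return kept


# Hypothetical data: one backup per night at 02:00, 1 to 30 January 2024.
snaps = [datetime(2024, 1, 1, 2) + timedelta(days=i) for i in range(30)]
kept = prune_keep(snaps, keep_last=3, keep_daily=7, keep_weekly=4,
                  keep_monthly=6, keep_yearly=1)
print(sorted(s.day for s in kept))
```

With a month of nightly backups this keeps the 7th, the 14th, and the 21st through the 30th: ten recent snapshots from keep-last plus keep-daily, and two older weekly ones. The monthly and yearly slots only start filling once the history is longer.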

Configure under Datastore → Prune & GC or per job. Run a Prune job nightly and a GC job weekly. GC walks the chunk store and frees deduped chunks that no snapshot references — the actual space reclaimer.
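
The split between prune and GC can be sketched as a mark-and-sweep over the chunk store. This is a conceptual model only: the snapshot names, the dict-based store, and the function are made up for illustration, and a real GC run also applies a grace period before deleting anything, which is omitted here.

```python
def garbage_collect(indexes, store):
    """Delete chunks no snapshot index references; return the bytes freed."""
    # Mark: collect every digest still referenced by a remaining snapshot.
    referenced = {digest for index in indexes.values() for digest in index}
    # Sweep: drop everything else from the chunk store.
    orphans = [digest for digest in store if digest not in referenced]
    return sum(len(store.pop(digest)) for digest in orphans)


# Two snapshots sharing chunk "a" (hypothetical digests and sizes).
store = {"a": b"x" * 100, "b": b"y" * 50, "c": b"z" * 70}
indexes = {"vm/100/day1": ["a", "b"], "vm/100/day2": ["a", "c"]}

del indexes["vm/100/day1"]                # prune: removes the index only
size_after_prune = sum(len(v) for v in store.values())
freed = garbage_collect(indexes, store)   # GC: reclaims chunk "b"
print(size_after_prune, freed, sorted(store))
```

Pruning alone leaves disk usage unchanged; space only drops once GC has run.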

Verify Jobs: Catch Bit Rot

Schedule a Verify job to re-read and cryptographically check stored chunks. Weekly, skipping snapshots verified within the last 30 days, is a common cadence. Without verification, a backup that looks fine in the UI can still be silently corrupted, and you find out at restore time.
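
What a verify pass checks can be sketched in a few lines, assuming chunks are stored under their SHA-256 digest. The dict-based store and the function name are illustrative, not the PBS on-disk format.

```python
import hashlib


def find_corrupt(store):
    """Return digests whose stored bytes no longer hash to their own name."""
    return sorted(d for d, data in store.items()
                  if hashlib.sha256(data).hexdigest() != d)


chunks = (b"chunk one", b"chunk two")
store = {hashlib.sha256(c).hexdigest(): c for c in chunks}

rotten = hashlib.sha256(b"chunk two").hexdigest()
store[rotten] = b"chunk twp"          # simulate silent corruption on disk
print(find_corrupt(store) == [rotten])
```

Nothing in the snapshot listing changes when a chunk rots. Only re-reading and re-hashing the data exposes it, which is why the job has to be scheduled.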

Namespaces for Multi-PVE

If you back up multiple PVE clusters or hosts, use namespaces so they do not collide on VM ID (every PVE numbers VMs from 100). Under Datastore → namespace:

pve-main/
pve-edge/
external/

Set the namespace in the PVE storage config. Each PVE only sees its own backups in the restore UI.

Sync Jobs: The Offsite Story

Backups on one box are not backups. Configure a second PBS at a friend's house, a cheap VPS, or a colo, and set up a Sync Job:

  • On the offsite PBS, add the source PBS as a Remote (server, fingerprint, token).
  • Create a sync job that pulls from the source datastore to a local one. The offsite server fetches only the chunks it does not already have; bandwidth is roughly the daily delta.
  • Schedule daily, after the primary backup window. Set retention separately on the remote — usually longer, cheaper retention.

Sync only transfers chunks the remote does not have, so the second day's sync is small. Initial sync is full transfer — kick it off on a Saturday.
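
The transfer logic amounts to a set difference over chunk digests. A minimal sketch, with made-up digests and a hypothetical helper name:

```python
def chunks_to_pull(snapshot_index, remote_digests):
    """Digests the offsite store lacks, each listed once, in index order."""
    missing, seen = [], set(remote_digests)
    for digest in snapshot_index:
        if digest not in seen:
            seen.add(digest)
            missing.append(digest)
    return missing


offsite = {"c1", "c2", "c3"}                 # chunks already synced
day2_index = ["c1", "c2", "c9", "c3", "c9"]  # today's snapshot on the source
print(chunks_to_pull(day2_index, offsite))   # ['c9']
print(len(chunks_to_pull(day2_index, set())))  # 4 (initial sync: every distinct chunk)
```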

Restore Procedures

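  • Whole-VM restore: On PVE, Storage → Backups → Restore. Choose new VM ID and target storage. Expect roughly 10 minutes for a 50GB VM on local SSD (around 80 to 85 MB/s); network and datastore speed move that number a lot.
  • Single-file restore from a snapshot: for VM backups, select the backup in the PVE storage's backup list and use File Restore to navigate the guest filesystem and download the file. File-level archives (containers) can also be browsed in the PBS UI. No need to spin up a clone.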
  • Pull from offsite: If the primary PBS is gone, point PVE at the offsite PBS as a storage and restore from there. Practice this once.

Encrypted Backups

Configure encryption client-side on PVE so chunks are encrypted before leaving the host. Generate a key for the PBS storage entry (Datacenter → Storage → Edit → Encryption Key), and back up the key separately; losing it makes backups unreadable.

With encryption on, server-side dedup still works among backups encrypted with the same key, but not across backups encrypted with different keys. Tradeoff: privacy vs dedup ratio.
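
Why dedup stops at the key boundary can be illustrated with a keyed hash. HMAC-SHA256 is used here purely as a stand-in for "a chunk identity that depends on the key"; it is not the scheme PBS uses, and the keys are hypothetical.

```python
import hashlib
import hmac


def keyed_digest(chunk, key):
    """Stand-in for a key-dependent chunk ID (NOT the actual PBS scheme)."""
    return hmac.new(key, chunk, hashlib.sha256).hexdigest()


chunk = b"identical guest data"
key_a, key_b = b"key-for-storage-a", b"key-for-storage-b"  # hypothetical keys

same_key = keyed_digest(chunk, key_a) == keyed_digest(chunk, key_a)
cross_key = keyed_digest(chunk, key_a) == keyed_digest(chunk, key_b)
print(same_key, cross_key)  # True False
```

Identical data under one key maps to one chunk and is stored once. Under two keys the server sees two unrelated chunks and stores both.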

Operational Notes

  • qemu-guest-agent in every Linux VM. Without it, snapshots are crash-consistent only — most filesystems survive, but databases may need recovery on restore.
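  • Container backups use vzdump-style snapshot under the hood. Slightly less elegant than VM backups: there is no dirty bitmap, so every run reads the whole container, but only chunks the server does not already have are uploaded.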
  • Monitor PBS itself via Prometheus — there is a community exporter. See the Prometheus + Grafana guide. Alert on failed jobs and on missing backups (job that hasn't run in 36h).
  • Restore test quarterly. A backup you have never restored is a rumor.
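
The "missing backup" alert is the one most often skipped, because a job that never runs never fails. A minimal sketch of the 36-hour staleness check follows; how the last-backup timestamps are collected (exporter, API, log scraping) is left open, and the guest names and times below are made up.

```python
from datetime import datetime, timedelta


def stale_guests(last_backup, now, max_age=timedelta(hours=36)):
    """Return guests whose newest backup is older than max_age."""
    return sorted(guest for guest, ts in last_backup.items()
                  if now - ts > max_age)


now = datetime(2024, 1, 10, 12, 0)
last_backup = {
    "vm/100": datetime(2024, 1, 10, 2, 0),  # 10 hours old
    "vm/101": datetime(2024, 1, 8, 2, 0),   # 58 hours old
    "ct/200": datetime(2024, 1, 9, 2, 0),   # 34 hours old
}
print(stale_guests(last_backup, now))       # ['vm/101']
```

The 36-hour threshold tolerates one late nightly run without paging anyone, while still catching a guest that silently dropped out of the job.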

Common Pitfalls

  • PBS on the same host as PVE: single point of failure. At minimum, put it on a different physical disk; ideally on a different machine.
  • No GC running: Pruned snapshots free chunks only after GC. Disk usage looks bloated until you realize GC is disabled.
  • Fingerprint mismatch: If PBS is reinstalled, the certificate changes and PVE rejects connections. Update the fingerprint in the PVE storage config.
  • Forgetting the encryption key backup: By default the key exists only on the PVE side (under /etc/pve/priv/storage/). Save it offline (password manager, paper); without it, encrypted backups are bricks.

Validation Checklist

  • Daily backup job ran successfully (PBS dashboard green)
  • Weekly verify job completed without errors
  • GC job runs weekly and reclaims space
  • Sync to offsite PBS is current (check timestamp)
  • Quarterly: restored a random VM to a test ID and it boots
  • Encryption key (if used) is backed up outside PBS
  • PBS health is alerted on via Prometheus or PBS notification settings

- Crafted by Axiom|Spectre