
Linux Administration Playbook

Updated: 5 days ago

This is a practitioner-focused, end-to-end Linux administration guide for experienced system administrators and SREs. It emphasizes commands, checklists, and failure-mode thinking (“what to do when things go wrong”) rather than deep theory. It is distro-agnostic: examples cover both the Debian/Ubuntu and RHEL/Rocky/Alma families, with call-outs where behaviour differs.


Table of Contents


1. Foundations: Layout, permissions, and must‑know commands

2. Boot & Recovery: UEFI, GRUB2, initramfs, and rescue workflows

3. systemd Deep Dive: units, services, sockets, and timers

4. Accounts & Privilege: users, groups, sudo, and PAM

5. Disks & Partitioning: GPT, udev, and device naming

6. LVM Mastery: PV/VG/LV, thin pools, and snapshots

7. Filesystems: ext4, XFS, Btrfs (features, tuning, growth)

8. Memory & Kernel Tuning: swap, vm.*, and sysctl hygiene

9. Networking: iproute2, VLAN/bridge/bond, NetworkManager

10. Firewalls: nftables and firewalld (policy, NAT, and zones)

11. Time & NTP: chrony, stratum sanity, and drift triage

12. Logging: journald vs rsyslog, persistence, and shipping

13. Security Core: SELinux/AppArmor, SSH hardening, fapolicyd

14. Software Lifecycle: apt/dpkg and dnf/rpm (pinning & rollback)

15. Virtualization & Containers: KVM/libvirt and Podman (systemd)

16. File Services: NFS and Samba (domain member patterns)

17. Backup Strategies: rsync, tar incrementals, and Borg

18. Observability & Performance: sysstat, perf, eBPF (bpftrace)

19. Troubleshooting Playbook: boot failures, network, and storage

20. Automation Starter: Ansible-ready conventions and layouts

21. Schedules & Jobs: cron vs systemd timers (idempotent jobs)

22. Audit & Compliance: auditd, rules hygiene, and reporting

23. Disaster Recovery: restore drills and notes you actually need

24. Admin Cheat Sheets: quick commands and one‑liners

25. Appendix: Sample configs and unit files


1) Foundations: Layout, permissions, and must‑know commands


Filesystem layout. Know what must be present and writable:

 `/`, `/boot`, `/etc`, `/usr`, `/var`, `/home`, `/tmp`, `/run`. 

Ownership and modes. Use `chmod`, `chown`, `setfacl`/`getfacl`. Prefer ACLs to avoid “world‑writable” creep; record exceptions.
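
To catch “world-writable” creep early, a periodic scan helps. The following is a minimal sketch, assuming GNU or BSD `find`; the function name `find_world_writable` is illustrative, and sticky-bit directories such as `/tmp` are skipped because they are usually intentional.

```shell
#!/usr/bin/env bash
# find_world_writable ROOT : print regular files and directories under ROOT
# whose "other" write bit is set, skipping sticky-bit directories.
find_world_writable() {
  local root=$1
  # -xdev keeps the scan on one filesystem; -perm -0002 means "o+w is set";
  # ! -perm -1000 drops entries that carry the sticky bit.
  find "$root" -xdev \( -type f -o -type d \) -perm -0002 ! -perm -1000 -print
}
```

Run it against a data tree (for example `find_world_writable /srv`) and either fix each hit with `chmod o-w` or an ACL, or record it as an accepted exception.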

 

Essential commands & habits

- Discovery: `lsblk -f`, `findmnt`, `mount`, `df -h`, `du -xh --max-depth 1 /path`

- Process: `ps -eo pid,comm,%cpu,%mem --sort=-%cpu`, `top/htop`, `pidstat`, `lsof -p <pid>`

- Networking: `ip a`, `ip r`, `ss -ltnp`, `resolvectl status` / `dig +short`, `tcpdump -nn -i <iface> port <p>`

- Packages: `apt`, `dnf`, `rpm`, `dpkg -l` (inventory)

- Services: `systemctl status NAME`, `journalctl -u NAME --since -2h`

 

Golden rules

1. Never change system files without a backup & comment block.

2. Prefer drop‑in snippets over editing vendor files (systemd, rsyslog, sudoers). 

3. Script with `set -euo pipefail` and explicit logging; lint with `shellcheck`.
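
The rules above can be sketched as a small script preamble. This is one possible shape, not a standard: the helper names `log` and `backup_file` and the `.bak.<timestamp>` suffix are assumptions chosen for illustration.

```shell
#!/usr/bin/env bash
# Abort on errors, on unset variables, and on failures inside pipelines.
set -euo pipefail

# log LEVEL MESSAGE... : write a UTC-timestamped line to stderr.
log() {
  local level=$1; shift
  printf '%s [%s] %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$level" "$*" >&2
}

# backup_file PATH : copy PATH to PATH.bak.<UTC timestamp>, preserving mode,
# ownership and timestamps, and print the backup path on stdout.
backup_file() {
  local src=$1 dest
  dest="${src}.bak.$(date -u +%Y%m%dT%H%M%SZ)"
  cp -a -- "$src" "$dest"
  log INFO "backed up $src to $dest"
  printf '%s\n' "$dest"
}
```

A typical call is `backup_file /etc/chrony.conf` immediately before editing the file, so every change has a dated rollback copy next to it.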


2) Boot & Recovery: UEFI, GRUB2, initramfs, and rescue


Boot chain: firmware (UEFI or legacy BIOS) → GRUB2 → Linux kernel + initramfs (built by dracut or initramfs-tools) → `systemd` as PID 1.

Reality checks

- GRUB config is generated from templates (`/etc/default/grub` + `/etc/grub.d/`) → `grub.cfg` via `grub2-mkconfig` (RHEL family) or `update-grub` (Debian/Ubuntu).

- When storage/driver topology changes, regenerate initramfs so the kernel can mount `/`. On RHEL‑like: `dracut -f --kver $(uname -r)`; on Debian/Ubuntu: `update-initramfs -u -k all`.

 

Rescue workflow (predictable)

1. Boot from live ISO, mount target root under `/mnt`, then mount `/boot` and `/boot/efi` if separate. 

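2. `mount --bind /proc /mnt/proc && mount --bind /sys /mnt/sys && mount --bind /dev /mnt/dev`. On UEFI systems also bind `/sys/firmware/efi/efivars`: a plain `--bind` of `/sys` does not carry submounts, and `grub-install`/`efibootmgr` need it to write boot entries.

3. `chroot /mnt` → inspect `/etc/fstab`, regenerate initramfs for the installed kernel (inside the chroot `uname -r` reports the live ISO's kernel, so take the version from `ls /lib/modules`), reinstall GRUB to the device (`grub2-install /dev/sdX` or `grub-install`) and rebuild config. On UEFI RHEL-family hosts the usual route is to reinstall the `grub2-efi-*` and `shim-*` packages rather than run `grub2-install`.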

4. Verify `lsblk` UUIDs match `fstab`. Reboot.
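
Step 4 can be checked mechanically instead of by eye. The sketch below compares the `UUID=` entries in an fstab file against a list of UUIDs that actually exist; the function names are illustrative, and the list of known UUIDs is assumed to come from something like `lsblk -rno UUID` or `blkid -s UUID -o value`.

```shell
#!/usr/bin/env bash
# fstab_uuids FILE : print every UUID used as "UUID=..." in the first field
# of a non-comment line.
fstab_uuids() {
  awk '!/^[[:space:]]*#/ && $1 ~ /^UUID=/ {
         sub(/^UUID=/, "", $1); gsub(/"/, "", $1); print $1
       }' "$1"
}

# missing_fstab_uuids FSTAB KNOWN : print UUIDs referenced in FSTAB that are
# not listed (one per line) in the file KNOWN.
missing_fstab_uuids() {
  local fstab=$1 known uuid
  known=$(cat -- "$2")
  fstab_uuids "$fstab" | while IFS= read -r uuid; do
    printf '%s\n' "$known" | grep -qxF -- "$uuid" || printf '%s\n' "$uuid"
  done
}
```

Inside the chroot, `lsblk -rno UUID > /tmp/known && missing_fstab_uuids /etc/fstab /tmp/known` should print nothing; any UUID it prints would fail to mount at boot.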

 

Common fixes

- Wrong root= UUID → edit GRUB kernel cmdline at boot, fix `fstab`, rebuild. 

- Missing storage driver → dracut/initramfs rebuild with `--add-drivers`. 

- Encrypted root (LUKS) → ensure `crypttab` and initramfs modules include `dm-crypt`/`cryptsetup`.


3) systemd Deep Dive: units, services, sockets, timers


Unit anatomy. Create custom services under `/etc/systemd/system/NAME.service`. Avoid editing `/usr/lib/systemd/system/*.service` — use drop‑ins: 


```
mkdir -p /etc/systemd/system/example.service.d
printf "[Service]\nEnvironment=ENV=prod\n" > /etc/systemd/system/example.service.d/10-env.conf
systemctl daemon-reload && systemctl restart example
```


Key sections

- `[Unit]` After=, Wants=; explicit ordering & failure isolation

- `[Service]` Type=simple/notify, ExecStart=, Restart=on-failure, RestartSec=

- `[Install]` WantedBy=multi-user.target (enablement)

 

Timers vs cron

- Use timers for dependency-aware, logged, retryable jobs with `journalctl -u`. 

- Example: run a health script every 5 minutes with jitter:


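```
# service: /etc/systemd/system/health.service
[Unit]
Description=Health check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/health.sh

# timer: /etc/systemd/system/health.timer
[Unit]
Description=Run health check periodically

[Timer]
# OnUnitActiveSec= alone never fires a first run; OnBootSec= bootstraps the cycle
OnBootSec=2min
OnUnitActiveSec=5min
RandomizedDelaySec=60

[Install]
WantedBy=timers.target
```

Enable with `systemctl daemon-reload && systemctl enable --now health.timer`, then confirm the next run with `systemctl list-timers health.timer`.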


4) Accounts & Privilege: users, groups, sudo, PAM


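- Create users: `useradd -m -s /bin/bash alice && passwd alice`; grant admin rights through the distro's admin group: `usermod -aG wheel alice` (RHEL family) or `usermod -aG sudo alice` (Debian/Ubuntu). Naming a group that does not exist makes `usermod` fail.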

- Sudo: edit with `visudo` and keep minimal, explicit rules; prefer group‑based policy. 

- PAM: logins pass through `/etc/pam.d/*`. Enforce password quality (`pam_pwquality`), lockouts (`pam_faillock`), and 2FA where required.

 

Template sudo policy


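```
Defaults use_pty,log_output
Defaults logfile="/var/log/sudo.log"
# Caution: "*" in sudoers arguments also matches spaces, so app* permits extra unit names after the first
%ops ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart app*
```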


 

5) Disks & Partitioning: GPT, udev, device naming

- Discover: `lsblk -o NAME,SIZE,MODEL,TYPE,FSTYPE,UUID,MOUNTPOINT` 

- Create GPT: `parted /dev/nvme0n1 --script mklabel gpt` then aligned partitions (`mkpart`), or `sgdisk` for scripted workflows. 

- Name devices predictably with `/dev/disk/by-id/*` (stable across reboots); prefer UUIDs in `/etc/fstab`.

 

fstab template

```
UUID=<fs-uuid>   /data  xfs   defaults,noatime  0  0
UUID=<swap-uuid> none   swap  sw                0  0
```


 

6) LVM Mastery: PV/VG/LV, thin pools, snapshots

Create: `pvcreate /dev/sdb` → `vgcreate vg0 /dev/sdb` → `lvcreate -L 200G -n lvdata vg0` → `mkfs.xfs /dev/vg0/lvdata`. 

Resize: `lvextend -r -L +50G /dev/vg0/lvdata` (the `-r` grows the FS in one step for supported FS). 

Thin provisioning:


```
lvcreate -L 500G -T vg0/poolthin
lvcreate -V 50G -T vg0/poolthin -n lvapp1
mkfs.ext4 /dev/vg0/lvapp1
```


Snapshots for fast backups or pre‑patch safety:


```
lvcreate -s -L 10G -n lvapp1_snap /dev/vg0/lvapp1
# mount, verify, then drop:
lvremove /dev/vg0/lvapp1_snap
```
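
A fixed-size snapshot becomes invalid once its copy-on-write space fills, so it is worth watching the fill level while a snapshot exists. The sketch below filters `lvs` output for volumes at or above a threshold; the function name and the 80% figure are illustrative, and the input is assumed to come from `LC_ALL=C lvs --noheadings --separator , -o lv_name,data_percent vg0`.

```shell
#!/usr/bin/env bash
# snapshots_over_threshold PCT : read "name,percent" lines on stdin (as
# printed by lvs with --separator ,) and print the names whose fill
# percentage is >= PCT. Volumes with an empty percent column are ignored.
snapshots_over_threshold() {
  awk -F, -v limit="$1" '
    { gsub(/[[:space:]]/, "", $1); gsub(/[[:space:]]/, "", $2) }
    $2 != "" && $2 + 0 >= limit + 0 { print $1 }
  '
}
```

Piping the `lvs` command above into `snapshots_over_threshold 80` from a timer gives a simple alert hook: extend the snapshot with `lvextend` or drop it before it reaches 100%.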


 

7) Filesystems: ext4, XFS, Btrfs

ext4

- Solid default for general purpose. Consider `noatime,lazytime` to reduce metadata churn. 

- Grow online: `resize2fs /dev/vg0/lvdata` after LV growth.

XFS

- Default on RHEL-derived servers; great parallelism. Grow online via `xfs_growfs /mountpoint` (not shrink). 

- Check: `xfs_repair` (on unmounted FS). 

Btrfs

- Subvolumes and snapshots (Copy-on-Write), send/receive for replication, built‑in RAID modes; ideal for fast rollback and immutable infra patterns.

 

Common mount options


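```
# ext4
UUID=... /data ext4  defaults,noatime,lazytime            0 0
# xfs (inode64 is already the default; attr2 is deprecated on current kernels)
UUID=... /data xfs   defaults,noatime                     0 0
# btrfs
UUID=... /data btrfs compress=zstd,autodefrag,noatime     0 0
```

ext4 has no `discard=async` option (that one is Btrfs-only). For SSD trim on ext4 and XFS, prefer the periodic `fstrim.timer` over the synchronous `discard` mount option.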


 

8) Memory & Kernel Tuning: swap, vm.*, and sysctl hygiene

- Swap sizing: swap is a memory safety valve, not a performance tool. Plan extra space if you rely on hibernation or crash dumps.

- Edit `/etc/sysctl.d/99-local.conf`; apply with `sysctl --system`.

 

Practical sysctl set


```
vm.swappiness = 10
vm.dirty_background_ratio = 5
vm.dirty_ratio = 20
net.core.somaxconn = 1024
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65000
```
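
Part of sysctl hygiene is confirming that the running kernel still matches the file. The sketch below compares `key = value` lines with the live values under `/proc/sys`; the function name `sysctl_drift` and the optional second argument (an alternative root, handy for testing) are assumptions. Keys that embed dotted interface names (for example a VLAN interface under `net.ipv4.conf`) are not handled.

```shell
#!/usr/bin/env bash
# Collapse runs of whitespace (including tabs) to single spaces and trim.
normalize_ws() { awk '{ $1 = $1; print }'; }

# sysctl_drift CONF [ROOT] : print one line per key in CONF whose live value
# under ROOT (default /proc/sys) differs from the configured value.
sysctl_drift() {
  local conf=$1 root=${2:-/proc/sys} key want have path
  while IFS='=' read -r key want; do
    key=$(printf '%s' "$key" | tr -d '[:space:]')
    case $key in ''|'#'*|';'*) continue ;; esac   # skip blanks and comments
    want=$(printf '%s\n' "$want" | normalize_ws)
    path="$root/$(printf '%s' "$key" | tr . /)"    # vm.swappiness -> vm/swappiness
    if [ -r "$path" ]; then
      have=$(normalize_ws < "$path")
    else
      have="<unreadable>"
    fi
    [ "$want" = "$have" ] || printf '%s: want "%s", have "%s"\n' "$key" "$want" "$have"
  done < "$conf"
}
```

`sysctl_drift /etc/sysctl.d/99-local.conf` printing nothing means the file and the kernel agree; any output points at a value that was overridden at runtime or by a later file.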


9) Networking: iproute2, VLAN/bridge/bond, NetworkManager

Core tools

- Addresses/routes: `ip addr`, `ip route`, `ip -br link` 

- Sockets: `ss -ltnup` 

- Traffic control (latency fairness): enable `fq_codel` on egress:


```
tc qdisc replace dev eth0 root fq_codel
```


Layer‑2 primitives

- VLANs: `ip link add link eth0 name eth0.20 type vlan id 20`

- Bridge: `ip link add name br0 type bridge && ip link set eth0 master br0`

- Bonding (active‑backup or LACP):


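```
modprobe bonding
ip link add bond0 type bond mode 802.3ad miimon 100
# Members must be down before they can be enslaved
ip link set eth0 down && ip link set eth0 master bond0
ip link set eth1 down && ip link set eth1 master bond0
ip link set bond0 up
```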


 

With NetworkManager (`nmcli`)


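```
nmcli con add type bond con-name bond0 ifname bond0 mode 802.3ad
nmcli con add type ethernet con-name bond0-eth0 ifname eth0 master bond0
nmcli con add type ethernet con-name bond0-eth1 ifname eth1 master bond0
nmcli con add type vlan con-name bond0.20 ifname bond0.20 dev bond0 id 20
# To bridge the bond, create br0 and make the bond profile a bridge port
nmcli con add type bridge con-name br0 ifname br0
nmcli con modify bond0 connection.master br0 connection.slave-type bridge
```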


 

10) Firewalls: nftables and firewalld


nftables minimal policy

```nft
table inet filter {
  set allowed_tcp {
    type inet_service
    elements = { 22, 80, 443 }
  }

  chain input {
    type filter hook input priority 0; policy drop;
    ct state established,related accept
    iif "lo" accept
    tcp dport @allowed_tcp accept
    ip protocol icmp accept
    ip6 nexthdr icmpv6 accept
    counter drop
  }

  chain forward { type filter hook forward priority 0; policy drop; }
  chain output  { type filter hook output  priority 0; policy accept; }
}
```

firewalld (zones)

```

firewall-cmd --set-default-zone=public

firewall-cmd --permanent --add-service=ssh

firewall-cmd --permanent --add-service=https

firewall-cmd --reload

```

 

NAT (masquerade)

```

nft add table ip nat

nft add chain ip nat prerouting  '{' type nat hook prerouting  priority -100 ';' '}'

nft add chain ip nat postrouting '{' type nat hook postrouting priority  100  ';' '}'

nft add rule ip nat postrouting oifname "eth0" masquerade

```

 

 

11) Time & NTP: chrony

- Ensure only one time service: `systemctl disable --now systemd-timesyncd` if running `chronyd`. 

- Validate peers & drift: `chronyc tracking`, `chronyc sources -v`. 

- Basic config in `/etc/chrony.conf` (RHEL family) or `/etc/chrony/chrony.conf` (Debian/Ubuntu):

```

pool pool.ntp.org iburst

makestep 1.0 3

rtcsync

logdir /var/log/chrony

```
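
For drift triage it helps to turn `chronyc tracking` into a pass/fail check. The sketch below reads that command's output and succeeds only when the reported "System time" offset is within a limit; the function name and the example limit are illustrative assumptions.

```shell
#!/usr/bin/env bash
# chrony_offset_ok MAX_SECONDS : read `chronyc tracking` output on stdin and
# return 0 only if a "System time" line exists and its offset (always printed
# as a positive number followed by "slow" or "fast") is <= MAX_SECONDS.
chrony_offset_ok() {
  awk -v max="$1" '
    /^System time/ { found = 1; offset = $4 + 0 }
    END { exit !(found && offset <= max + 0) }
  '
}
```

Usage: `chronyc tracking | chrony_offset_ok 0.1 || echo "clock offset too large or chronyd not synced"`. A missing "System time" line (for example when `chronyd` is down) is treated as a failure.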


12) Logging: journald vs rsyslog, persistence, shipping


- Persist journals: create `/var/log/journal` and set `Storage=persistent` in `journald.conf`. 

- Forward systemd journal to rsyslog (`ForwardToSyslog=yes`) or ship via RELP/TLS to a central collector. 

- Query examples: `journalctl -b`, `journalctl -p err -u nginx --since "1 hour ago"` 

- rsyslog TLS/RELP sketch:

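```
# /etc/rsyslog.d/60-relp-tls.conf
module(load="imjournal")
module(load="omrelp")
# gtls is a netstream driver, not a loadable module; omrelp handles TLS itself.
# Add tls.cacert/tls.mycert/tls.myprivkey so peer names can be verified.
action(type="omrelp" target="log.example.net" port="6514" tls="on"
       tls.authmode="name" tls.permittedpeer=["log.example.net"])
```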

 

13) Security Core: SELinux/AppArmor, SSH hardening, fapolicyd


SELinux

- Check: `getenforce`, `sestatus`; switch temporarily: `setenforce 0/1`. 

- Find denials: `ausearch -m AVC -ts recent` → `sealert -a /var/log/audit/audit.log` 

- Allow with policy or boolean instead of disabling. Label fixes: `restorecon -Rv /path`.

 

SSH hardening (`/etc/ssh/sshd_config`)

```

PermitRootLogin no

PasswordAuthentication no

PubkeyAuthentication yes

ChallengeResponseAuthentication no

X11Forwarding no

AllowUsers alice opsbot

AuthenticationMethods publickey

ClientAliveInterval 300

ClientAliveCountMax 2

```

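Validate, then reload: `sshd -t && systemctl reload sshd` (the unit is `sshd` on the RHEL family and `ssh` on Debian/Ubuntu). Keep an existing session open until a fresh login succeeds.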
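
A quick text check can confirm that a hardened file really sets what you expect. The sketch below prints the first value given for a keyword in the global section of an `sshd_config`-style file; the function name is illustrative. It does not follow `Include` directives or the `Keyword=value` form, so `sshd -T` remains the authoritative view of the effective configuration.

```shell
#!/usr/bin/env bash
# sshd_setting FILE KEYWORD : print the first value set for KEYWORD before any
# Match block. Keywords are case-insensitive, and sshd keeps the first value
# it reads for each keyword.
sshd_setting() {
  awk -v want="$(printf '%s' "$2" | tr '[:upper:]' '[:lower:]')" '
    /^[[:space:]]*#/ { next }                      # skip comments
    /^[[:space:]]*[Mm]atch[[:space:]]/ { exit }    # stop at the first Match block
    tolower($1) == want { $1 = ""; sub(/^ /, ""); print; exit }
  ' "$1"
}
```

For example, `[ "$(sshd_setting /etc/ssh/sshd_config PermitRootLogin)" = "no" ]` can gate a deployment step; an empty result means the keyword is not set and the compiled-in default applies.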

 

fapolicyd (exec/LD_PRELOAD control): allowlisted binaries only. Pilot in detect mode before enforce.

 

14) Software Lifecycle: apt/dpkg and dnf/rpm


Debian/Ubuntu (apt)

- Update/upgrade: `apt update && apt full-upgrade` 

- Pin versions in `/etc/apt/preferences.d/` and keep history in `/var/log/apt/`

RHEL/Rocky/Alma (dnf)

- Lock critical packages: `dnf install 'dnf-command(versionlock)' && dnf versionlock add 'kernel*'` (quote the glob so the shell does not expand it against local files)

- Query: `rpm -qa --last`, verify file owner: `rpm -qf /path/bin`

 

Build basics

- Debian: `dpkg-deb --build` for simple local packages; full workflow uses `debuild`/`debhelper`. 

- RPM: `rpmbuild -ba SPECS/pkg.spec` with `%files` and `%post` scripts; sign with GPG.

 

 

15) Virtualization & Containers: KVM/libvirt and Podman


KVM/libvirt

- Host packages: `qemu-kvm libvirt virt-install virt-manager` 

- Start a VM quickly:

```

virt-install --name testvm --memory 4096 --vcpus 2 \

  --disk size=40,bus=virtio --cdrom /iso/debian.iso \

  --network bridge=br0,model=virtio

```

- Bridges: create `br0` and attach NICs for L2 adjacency.

 

Podman (rootless) with systemd

```

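podman run -d --name web -p 8080:80 docker.io/nginx:alpine

# Legacy approach (deprecated in recent Podman releases in favour of Quadlet):
podman generate systemd --name web --files --new

# Quadlet: put .container files in ~/.config/containers/systemd/ (rootless)
# or /etc/containers/systemd/ (system-wide)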

```


16) File Services: NFS and Samba


NFS server (v4)

```

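dnf install nfs-utils        # Debian/Ubuntu: apt install nfs-kernel-server

# root_squash (the default) maps client root to an unprivileged user;
# switch to no_root_squash only when a client genuinely needs root access
echo "/srv/share  10.0.0.0/24(rw,sync,root_squash)" >> /etc/exports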

exportfs -rav

systemctl enable --now nfs-server

```

Client: `mount -t nfs4 nfsserver:/srv/share /mnt`

 

Samba (domain member)

- Join to AD with `realmd`/`adcli` and configure `winbind` or `sssd`. 

- Minimal share:

```

[shared]

   path = /srv/smb/shared

   browsable = yes

   read only = no

   valid users = "@DOMAIN\Domain Users"

```


17) Backup Strategies: rsync, tar incrementals, Borg


rsync

- Full mirror with metadata: `rsync -aHAX --delete --numeric-ids /src/ /dest/` 

- Resume large files over flaky links: add `--partial --inplace`

tar incrementals

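- Maintain the snapshot file between runs, and keep the archive itself and pseudo-filesystems out of the backup: `tar -g /var/backups/root.snar --one-file-system --exclude=/backup -cpf /backup/root-$(date +%F).tar /`. With `--one-file-system`, separately mounted filesystems such as `/boot` or `/home` need their own runs.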

Borg

- Init repository with encryption, run `borg create` with sensible excludes, and `borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6`

 

Test restores monthly in an isolated path; never trust backups you haven’t restored.
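
The tar incremental cycle and its restore can be rehearsed on a scratch directory before trusting it with `/`. The sketch below assumes GNU tar; the function names and file layout are illustrative.

```shell
#!/usr/bin/env bash
# incremental_backup SRC SNAR OUT : archive SRC into OUT. The first run with a
# new SNAR file is a full (level 0) dump; later runs with the same SNAR store
# only what changed since the previous run.
incremental_backup() {
  local src=$1 snar=$2 out=$3
  tar --listed-incremental="$snar" -cpf "$out" -C "$src" .
}

# restore_chain TARGET ARCHIVE... : extract archives into TARGET, oldest
# first. /dev/null as the snapshot file tells tar to treat them as
# incremental archives without recording new state.
restore_chain() {
  local target=$1 archive
  shift
  mkdir -p "$target"
  for archive in "$@"; do
    tar --listed-incremental=/dev/null -xpf "$archive" -C "$target"
  done
}
```

A drill then looks like `incremental_backup /srv/data /var/backups/data.snar /backup/data-0.tar`, a later run writing `data-1.tar`, and `restore_chain /restore-test /backup/data-0.tar /backup/data-1.tar` followed by a diff against the source.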

 

 

18) Observability & Performance: sysstat, perf, eBPF


- sysstat: `sar -n DEV 1 5`, `iostat -xz 1` (queue depth & await), `mpstat -P ALL 1` 

- perf (PMU counters): `perf stat -a -- sleep 10`, `perf top` 

- bpftrace examples:

```

bpftrace -e 'kprobe:tcp_sendmsg { @[comm] = count(); }'

bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'

```

 


19) Troubleshooting Playbook


Boot

- Kernel panic on root mount → verify `fstab` UUIDs, rebuild initramfs, reinstall GRUB, check LUKS/crypttab. 

- SELinux blocking service → `ausearch -m AVC`, `restorecon`, adjust boolean/policy. 

Network

- Link up but no traffic → check VLAN tags, bridge membership, firewall counters (`nft list ruleset | less`). 

Storage

- Path flaps on SAN → enable multipath (`mpathconf --enable`), verify `multipath -ll`, and set `no_path_retry` for HA behavior.

 

General pattern

1. Reproduce → 2. Isolate layer → 3. Inspect logs (`journalctl -xeu svc`) → 4. Change one thing → 5. Validate and revert if needed.

 

 

20) Automation Starter (Ansible skeleton)

```

inventory/

group_vars/all.yml

roles/base/{tasks,files,templates}

playbooks/site.yml

```

- Idempotent tasks only; handlers for restarts; gather facts; tag everything. 

- Keep prechecks that fail fast (kernel, disk, memory, network reachability).
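
In a playbook these prechecks would normally be `assert` tasks on gathered facts; the same fail-fast idea can be sketched in shell for use from a wrapper or a `script` task. The function names and thresholds below are illustrative assumptions, and the memory check reads `/proc/meminfo`, so it is Linux-only.

```shell
#!/usr/bin/env bash
# require_mem_mb MIN : fail unless total RAM is at least MIN MiB.
require_mem_mb() {
  local min=$1 total_kb
  total_kb=$(awk '/^MemTotal:/ { print $2 }' /proc/meminfo)
  if [ "$((total_kb / 1024))" -lt "$min" ]; then
    echo "precheck failed: need ${min} MiB RAM, have $((total_kb / 1024))" >&2
    return 1
  fi
}

# require_free_mb PATH MIN : fail unless the filesystem holding PATH has at
# least MIN MiB available.
require_free_mb() {
  local path=$1 min=$2 avail_kb
  # df -P guarantees one line per filesystem; column 4 is available KiB.
  avail_kb=$(df -Pk -- "$path" | awk 'NR == 2 { print $4 }')
  if [ "$((avail_kb / 1024))" -lt "$min" ]; then
    echo "precheck failed: need ${min} MiB free on $path, have $((avail_kb / 1024))" >&2
    return 1
  fi
}
```

Chaining them as `require_mem_mb 2048 && require_free_mb /var 5120 || exit 1` at the top of a run stops the change before it touches a host that cannot take it.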

 

21) Schedules & Jobs: cron vs timers


- Use cron for simple user jobs; migrate system jobs to timers for logs/service ordering. 

- Cron entry (five time fields, here daily at 02:00): `0 2 * * * /usr/local/snapshots.sh >> /var/log/snapshots.log 2>&1`

- Timer provides jitter, retries, and lifecycle managed by systemd.
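
Cron does nothing to stop a slow job from overlapping with its next run, which is one reason jobs should be safe to re-run. A common guard is a non-blocking lock; the sketch below assumes `flock` from util-linux is installed, and the function name and exit code 75 are illustrative choices.

```shell
#!/usr/bin/env bash
# run_exclusive LOCKFILE CMD... : run CMD only if no other holder has the
# lock on LOCKFILE; otherwise return 75 without running it. The lock is
# released when the subshell exits and file descriptor 9 closes.
run_exclusive() {
  local lock=$1
  shift
  (
    flock -n 9 || { echo "already running, skipping: $lock" >&2; exit 75; }
    "$@"
  ) 9>"$lock"
}
```

In a crontab this becomes, for example, a wrapper script that calls `run_exclusive /run/lock/snapshots.lock /usr/local/snapshots.sh`. A systemd `Type=oneshot` service gets the same protection for free, because a timer will not start a unit that is still running.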

 

22) Audit & Compliance


- Enable `auditd`, load rules via `/etc/audit/rules.d/*.rules`. 

- Track changes to `/etc/passwd`, `/etc/shadow`, privileged binaries (`-F perm=x -F auid>=1000 -F euid=0`). 

- Report: `aureport --summary`, search: `ausearch -k policychange`.

 


23) Disaster Recovery


- Vital: document the exact steps to rebuild access (SSH keys, vault root, sudo policies).

- Keep offline copies of encryption keys (LUKS/Borg). 

- Quarterly restore drills: time the RTO and document gaps.
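
Timing a drill is easier when every step records its own duration. The sketch below wraps each restore step and appends a tab-separated record to a log; the function name, the `DRILL_LOG` variable and the default file name are assumptions for illustration.

```shell
#!/usr/bin/env bash
# timed_step LABEL CMD... : run CMD, append "LABEL<TAB>seconds<TAB>exit status"
# to $DRILL_LOG (default ./restore-drill.tsv), and return CMD's exit status.
timed_step() {
  local label=$1 start end status=0
  shift
  start=$(date +%s)
  "$@" || status=$?
  end=$(date +%s)
  printf '%s\t%s\t%s\n' "$label" "$((end - start))" "$status" \
    >> "${DRILL_LOG:-restore-drill.tsv}"
  return "$status"
}
```

Wrapping each stage, for example `timed_step "fetch repo key" ...` and `timed_step "borg extract" ...`, yields a per-step breakdown whose sum is the measured RTO, and a non-zero status column marks the gaps to document.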

 

24) Admin Cheat Sheets


User mgmt

```

id alice; groups alice

useradd -m -s /bin/bash alice

passwd -l alice    # lock account

chage -l alice     # password aging

```

Networking

```

ip -br a; ip -d link show bond0

ss -s; ss -tnp sport = :443

```

Storage

```

lsblk -f; blkid; udevadm info -q all -n /dev/sda

pvs; vgs; lvs -a -o+devices

```

 

25) Appendix: Sample files

 

25.1 journald.conf

```

[Journal]

Storage=persistent

Compress=yes

SystemMaxUse=1G

RateLimitIntervalSec=30s

RateLimitBurst=5000

ForwardToSyslog=yes

```

 

25.2 nftables complete example

```nft

flush ruleset

 

table inet filter {

  chain input {

    type filter hook input priority 0; policy drop;

    ct state established,related accept

    iif lo accept

    tcp dport { 22, 80, 443 } accept

    ip protocol icmp accept

    ip6 nexthdr icmpv6 accept

    counter drop

  }

  chain forward { type filter hook forward priority 0; policy drop; }

  chain output  { type filter hook output  priority 0; policy accept; }

}

```

 

25.3 sshd_config (hardened)

```
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
KexAlgorithms sntrup761x25519-sha512@openssh.com,curve25519-sha256,ecdh-sha2-nistp521
```

Linux Administration Playbook — Expert Edition (v1.0)