Linux Administration Playbook
- sathyahraj

- Oct 18
Updated: 5 days ago
This is a practitioner-focused, end-to-end Linux administration guide for experienced system administrators and SREs. It emphasizes commands, checklists, and “what to do when things go wrong” (failure-mode thinking) rather than deep theory. It is distro-agnostic: examples cover both the Debian/Ubuntu and RHEL/Rocky/Alma families, with call-outs where behaviour differs.
Table of Contents
1. Foundations: Layout, permissions, and must‑know commands
2. Boot & Recovery: UEFI, GRUB2, initramfs, and rescue workflows
3. systemd Deep Dive: units, services, sockets, and timers
4. Accounts & Privilege: users, groups, sudo, and PAM
5. Disks & Partitioning: GPT, udev, and device naming
6. LVM Mastery: PV/VG/LV, thin pools, and snapshots
7. Filesystems: ext4, XFS, Btrfs (features, tuning, growth)
8. Memory & Kernel Tuning: swap, vm.*, and sysctl hygiene
9. Networking: iproute2, VLAN/bridge/bond, NetworkManager
10. Firewalls: nftables and firewalld (policy, NAT, and zones)
11. Time & NTP: chrony, stratum sanity, and drift triage
12. Logging: journald vs rsyslog, persistence, and shipping
13. Security Core: SELinux/AppArmor, SSH hardening, fapolicyd
14. Software Lifecycle: apt/dpkg and dnf/rpm (pinning & rollback)
15. Virtualization & Containers: KVM/libvirt and Podman (systemd)
16. File Services: NFS and Samba (domain member patterns)
17. Backup Strategies: rsync, tar incrementals, and Borg
18. Observability & Performance: sysstat, perf, eBPF (bpftrace)
19. Troubleshooting Playbook: boot failures, network, and storage
20. Automation Starter: Ansible-ready conventions and layouts
21. Schedules & Jobs: cron vs systemd timers (idempotent jobs)
22. Audit & Compliance: auditd, rules hygiene, and reporting
23. Disaster Recovery: restore drills and notes you actually need
24. Admin Cheat Sheets: quick commands and one‑liners
25. Appendix: Sample configs and unit files
1) Foundations: Layout, permissions, and must‑know commands
Filesystem layout. Know what must be present and writable:
`/`, `/boot`, `/etc`, `/usr`, `/var`, `/home`, `/tmp`, `/run`.
Ownership and modes. Use `chmod`, `chown`, `setfacl`/`getfacl`. Prefer ACLs to avoid “world‑writable” creep; record exceptions.
Essential commands & habits
- Discovery: `lsblk -f`, `findmnt`, `mount`, `df -h`, `du -xh --max-depth 1 /path`
- Process: `ps -eo pid,comm,%cpu,%mem --sort=-%cpu`, `top/htop`, `pidstat`, `lsof -p <pid>`
- Networking: `ip a`, `ip r`, `ss -ltnp`, `resolvectl status` / `dig +short`, `tcpdump -nn -i <iface> port <p>`
- Packages: `apt`, `dnf`, `rpm`, `dpkg -l` (inventory)
- Services: `systemctl status NAME`, `journalctl -u NAME --since -2h`
Golden rules
1. Never change system files without a backup & comment block.
2. Prefer drop‑in snippets over editing vendor files (systemd, rsyslog, sudoers).
3. Script with `set -euo pipefail` and explicit logging; lint with `shellcheck`.
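Rules 1 and 3 combine naturally in a small script skeleton. The sketch below is one possible shape rather than a standard: the helper names `log` and `backup_file` and the `.bak.<timestamp>` suffix are assumptions chosen for illustration, and it targets bash because `pipefail` is not available in every POSIX shell.

```shell
#!/usr/bin/env bash
# Sketch: strict mode, timestamped logging, and backup-before-edit.
# The names log/backup_file and the .bak.<timestamp> suffix are illustrative.
set -euo pipefail

log() {
    # Timestamped message on stderr so stdout stays clean for data.
    printf '%s %s\n' "$(date '+%Y-%m-%dT%H:%M:%S%z')" "$*" >&2
}

backup_file() {
    # Copy FILE to FILE.bak.<timestamp>, preserving mode and timestamps,
    # and print the backup path on stdout.
    local file=$1
    local backup
    backup="${file}.bak.$(date +%Y%m%d%H%M%S)"
    cp -p -- "$file" "$backup"
    log "backed up $file -> $backup"
    printf '%s\n' "$backup"
}
```

A typical call is `bak=$(backup_file /etc/fstab)` right before an edit, which leaves the backup path in `$bak` for a quick `diff` or rollback.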
2) Boot & Recovery: UEFI, GRUB2, initramfs, and rescue
Boot chain: firmware → UEFI/BIOS → GRUB2 → Linux kernel → initramfs (dracut/initramfs-tools) → `systemd` PID 1.
Reality checks
- GRUB config is generated from templates (`/etc/default/grub` + `/etc/grub.d/`) → `grub.cfg` via `grub2-mkconfig` (RHEL family) or `update-grub` (Debian/Ubuntu).
- When storage/driver topology changes, regenerate initramfs so the kernel can mount `/`. On RHEL‑like: `dracut -f --kver $(uname -r)`; on Debian/Ubuntu: `update-initramfs -u -k all`.
Rescue workflow (predictable)
1. Boot from live ISO, mount target root under `/mnt`, then mount `/boot` and `/boot/efi` if separate.
2. `mount --bind /proc /mnt/proc && mount --bind /sys /mnt/sys && mount --bind /dev /mnt/dev`
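3. `chroot /mnt` → inspect `/etc/fstab`, regenerate initramfs, reinstall GRUB, and rebuild the config. On BIOS systems install to the disk (`grub2-install /dev/sdX` on the RHEL family, `grub-install /dev/sdX` on Debian/Ubuntu); on UEFI RHEL-family systems (x86_64) reinstall the `grub2-efi-x64` and `shim-x64` packages instead of running `grub2-install`.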
4. Verify `lsblk` UUIDs match `fstab`. Reboot.
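Step 4 can be made mechanical. The following is a minimal sketch, assuming `fstab` entries use the unquoted `UUID=<value>` form; the function name `missing_fstab_uuids` and its two file arguments are hypothetical helpers, not part of any distro tooling.

```shell
#!/usr/bin/env bash
# Sketch: list UUIDs referenced in an fstab that are missing from a list of
# UUIDs known to the system. Function name and file arguments are illustrative.

missing_fstab_uuids() {
    # $1: path to an fstab file
    # $2: file with one known UUID per line (e.g. from `blkid -s UUID -o value`)
    local fstab=$1 known=$2
    # Take the first field of each line, keep only UUID= entries (comment
    # lines never match), strip the prefix, then print the UUIDs that do
    # not appear in the known list. grep exits 1 when nothing is printed,
    # so "|| true" keeps the function's status at 0.
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
        grep -vxFf "$known" || true
}
```

Inside the chroot this might be used as `blkid -s UUID -o value > /tmp/uuids && missing_fstab_uuids /etc/fstab /tmp/uuids`; any output is a UUID that `fstab` expects but no block device provides.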
Common fixes
- Wrong root= UUID → edit GRUB kernel cmdline at boot, fix `fstab`, rebuild.
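- Missing storage driver → rebuild the initramfs with the module included: `dracut -f --add-drivers "<module>"` on the RHEL family, or add the module to `/etc/initramfs-tools/modules` and run `update-initramfs -u` on Debian/Ubuntu.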
- Encrypted root (LUKS) → ensure `crypttab` and initramfs modules include `dm-crypt`/`cryptsetup`.
3) systemd Deep Dive: units, services, sockets, timers
Unit anatomy. Create custom services under `/etc/systemd/system/NAME.service`. Avoid editing `/usr/lib/systemd/system/*.service` — use drop‑ins:
```
mkdir -p /etc/systemd/system/example.service.d
printf "[Service]\nEnvironment=ENV=prod\n" > /etc/systemd/system/example.service.d/10-env.conf
systemctl daemon-reload && systemctl restart example
```
Key sections
- `[Unit]` After=, Wants=; explicit ordering & failure isolation
- `[Service]` Type=simple/notify, ExecStart=, Restart=on-failure, RestartSec=
- `[Install]` WantedBy=multi-user.target (enablement)
Timers vs cron
- Use timers for dependency-aware, logged, retryable jobs; inspect runs with `journalctl -u NAME`.
- Example: run a health script every 5 minutes with jitter:
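```
# service: /etc/systemd/system/health.service
[Unit]
Description=Health check
[Service]
Type=oneshot
ExecStart=/usr/local/bin/health.sh

# timer: /etc/systemd/system/health.timer
[Unit]
Description=Run health check periodically
[Timer]
# OnUnitActiveSec= counts from the last run, so OnBootSec= (value is an example) triggers the first one
OnBootSec=2min
OnUnitActiveSec=5min
RandomizedDelaySec=60
[Install]
WantedBy=timers.target

# enable with: systemctl daemon-reload && systemctl enable --now health.timer
```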
4) Accounts & Privilege: users, groups, sudo, PAM
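- Create users: `useradd -m -s /bin/bash alice && passwd alice`; add to the admin group: `usermod -aG wheel alice` (RHEL family) or `usermod -aG sudo alice` (Debian/Ubuntu). Naming a group that does not exist makes `usermod` fail.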
- Sudo: edit with `visudo` and keep minimal, explicit rules; prefer group‑based policy.
- PAM: logins pass through `/etc/pam.d/*`. Enforce password quality (`pam_pwquality`), lockouts (`pam_faillock`), and 2FA where required.
Template sudo policy
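```
Defaults use_pty,log_output
Defaults logfile="/var/log/sudo.log"
# Wildcards in sudoers arguments also match spaces, so app* permits extra arguments; list exact unit names where possible
%ops ALL=(ALL) NOPASSWD: /usr/bin/systemctl restart app*
```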
5) Disks & Partitioning: GPT, udev, device naming
- Discover: `lsblk -o NAME,SIZE,MODEL,TYPE,FSTYPE,UUID,MOUNTPOINT`
- Create GPT: `parted /dev/nvme0n1 --script mklabel gpt` then aligned partitions (`mkpart`), or `sgdisk` for scripted workflows.
- Name devices predictably with `/dev/disk/by-id/*` (stable across reboots); prefer UUIDs in `/etc/fstab`.
fstab template
```
UUID=<fs-uuid> /data xfs defaults,noatime 0 0
UUID=<swap-uuid> none swap sw 0 0
```
6) LVM Mastery: PV/VG/LV, thin pools, snapshots
Create: `pvcreate /dev/sdb` → `vgcreate vg0 /dev/sdb` → `lvcreate -L 200G -n lvdata vg0` → `mkfs.xfs /dev/vg0/lvdata`.
Resize: `lvextend -r -L +50G /dev/vg0/lvdata` (the `-r` grows the FS in one step for supported FS).
Thin provisioning:
```
lvcreate -L 500G -T vg0/poolthin
lvcreate -V 50G -T vg0/poolthin -n lvapp1
mkfs.ext4 /dev/vg0/lvapp1
```
Snapshots for fast backups or pre‑patch safety:
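```
lvcreate -s -L 10G -n lvapp1_snap /dev/vg0/lvapp1
# -L creates a classic copy-on-write snapshot; for a thin LV, omitting -L gives a thin snapshot in the same pool
# mount, verify, then drop:
lvremove /dev/vg0/lvapp1_snap
```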
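Thin pools and classic snapshots both stop being useful once they fill, so it helps to watch their usage. A minimal sketch, assuming the `lv_name` and `data_percent` report fields of `lvs`; the function name `lvs_usage_over` and the 80% threshold in the usage note are illustrative choices.

```shell
#!/usr/bin/env bash
# Sketch: flag thin pools or snapshots whose data usage crosses a threshold.
# The function name and threshold are illustrative.

lvs_usage_over() {
    # $1: threshold in percent
    # stdin: "name data_percent" rows, e.g. from
    #        lvs --noheadings -o lv_name,data_percent vg0
    # Rows without a data_percent value (ordinary LVs) are skipped.
    # Prints offending rows and returns 1 if any were found, else 0.
    awk -v limit="$1" '
        NF >= 2 && ($2 + 0) >= (limit + 0) { print $1, $2; found = 1 }
        END { exit (found ? 1 : 0) }
    '
}
```

For example, `lvs --noheadings -o lv_name,data_percent vg0 | lvs_usage_over 80` exits non-zero when something needs attention, which makes it easy to wire into a timer or monitoring check.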
7) Filesystems: ext4, XFS, Btrfs
ext4
- Solid default for general purpose. Consider `noatime,lazytime` to reduce metadata churn.
- Grow online: `resize2fs /dev/vg0/lvdata` after LV growth.
XFS
- Default on RHEL-derived servers; great parallelism. Grow online via `xfs_growfs /mountpoint` (not shrink).
- Check: `xfs_repair -n <device>` (read-only check, on an unmounted FS); drop `-n` to actually repair.
Btrfs
- Subvolumes and snapshots (copy-on-write), send/receive for replication, and built‑in RAID modes (RAID 5/6 is still flagged unstable upstream); well suited to fast rollback and immutable-infrastructure patterns.
Common mount options
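```
# ext4 (discard=async is Btrfs-only; on ext4 use plain discard or, more commonly, fstrim.timer)
UUID=... /data ext4 defaults,noatime,lazytime 0 0
# xfs (inode64 is the default on current kernels; attr2 is deprecated)
UUID=... /data xfs defaults,noatime,inode64 0 0
# btrfs
UUID=... /data btrfs compress=zstd,autodefrag,noatime 0 0
```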
8) Memory & Kernel Tuning: swap, vm.*, and sysctl hygiene
- Swap sizing: treat swap as a memory safety valve, not a performance tool. Plan extra capacity if you rely on hibernation or crash dumps.
- Edit `/etc/sysctl.d/99-local.conf`; apply with `sysctl --system`.
Practical sysctl set
```
vm.swappiness = 10
vm.dirty_background_ratio = 5
vm.dirty_ratio = 20
net.core.somaxconn = 1024
net.ipv4.tcp_tw_reuse = 1
net.ipv4.ip_local_port_range = 1024 65000
```
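Part of sysctl hygiene is confirming that the running kernel still matches the file. The sketch below compares a `sysctl.d`-style file against the values under `/proc/sys`; the function name `sysctl_drift` and its optional second argument (an alternative root, handy for testing) are assumptions, and keys whose names contain dots inside an interface name are not handled.

```shell
#!/usr/bin/env bash
# Sketch: report keys in a sysctl.d-style file whose live value differs.
# The function name and the optional root argument are illustrative.

sysctl_drift() {
    # $1: sysctl.d-style file ("key = value" lines, # or ; comments)
    # $2: root of the sysctl tree (defaults to /proc/sys)
    local conf=$1 root=${2:-/proc/sys}
    local key want path have
    while IFS='=' read -r key want; do
        # Strip whitespace from the key; skip blank lines and comments.
        key=$(printf '%s' "$key" | tr -d '[:space:]')
        case "$key" in ''|'#'*|';'*) continue ;; esac
        # Collapse whitespace runs in the wanted value to single spaces.
        want=$(printf '%s\n' "$want" | awk '{$1=$1; print}')
        # vm.swappiness -> <root>/vm/swappiness
        path="$root/$(printf '%s' "$key" | tr . /)"
        if [ ! -r "$path" ]; then
            printf 'MISSING %s\n' "$key"
            continue
        fi
        # Live values may be tab-separated; normalize the same way.
        have=$(awk '{$1=$1; print}' "$path")
        [ "$have" = "$want" ] || printf 'DRIFT %s want=%s have=%s\n' "$key" "$want" "$have"
    done < "$conf"
}
```

Running `sysctl_drift /etc/sysctl.d/99-local.conf` after `sysctl --system` should print nothing; any `DRIFT` line points at a later override or a value the kernel rejected.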
9) Networking: iproute2, VLAN/bridge/bond, NetworkManager
Core tools
- Addresses/routes: `ip addr`, `ip route`, `ip -br link`
- Sockets: `ss -ltnup`
- Traffic control (latency fairness): enable `fq_codel` on egress:
```
tc qdisc replace dev eth0 root fq_codel
```
Layer‑2 primitives
- VLANs: `ip link add link eth0 name eth0.20 type vlan id 20`
- Bridge: `ip link add name br0 type bridge && ip link set eth0 master br0`
- Bonding (active‑backup or LACP):
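```
modprobe bonding
ip link add bond0 type bond mode 802.3ad miimon 100
# members must be down before they can be enslaved
ip link set eth0 down && ip link set eth1 down
ip link set eth0 master bond0 && ip link set eth1 master bond0
ip link set bond0 up
```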
With NetworkManager (`nmcli`)
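```
nmcli con add type bond con-name bond0 ifname bond0 mode 802.3ad
nmcli con add type ethernet ifname eth0 master bond0
nmcli con add type ethernet ifname eth1 master bond0
nmcli con add type vlan ifname bond0.20 dev bond0 id 20
# to bridge over the bond, attach the bond connection itself as a bridge port
nmcli con add type bridge con-name br0 ifname br0
nmcli con mod bond0 connection.master br0 connection.slave-type bridge
```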
10) Firewalls: nftables and firewalld
nftables minimal policy
```nft
table inet filter {
set allowed_tcp {
type inet_service
elements = { 22, 80, 443 }
}
chain input {
type filter hook input priority 0; policy drop;
ct state established,related accept
iif lo accept
tcp dport @allowed_tcp accept
ip protocol icmp accept
ip6 nexthdr icmpv6 accept
counter drop
}
chain forward { type filter hook forward priority 0; policy drop; }
chain output { type filter hook output priority 0; policy accept; }
}
```
firewalld (zones)
```
firewall-cmd --set-default-zone=public
firewall-cmd --permanent --add-service=ssh
firewall-cmd --permanent --add-service=https
firewall-cmd --reload
```
NAT (masquerade)
```
nft add table ip nat
nft add chain ip nat prerouting '{' type nat hook prerouting priority -100 ';' '}'
nft add chain ip nat postrouting '{' type nat hook postrouting priority 100 ';' '}'
nft add rule ip nat postrouting oifname "eth0" masquerade
```
11) Time & NTP: chrony
- Ensure only one time service: `systemctl disable --now systemd-timesyncd` if running `chronyd`.
- Validate peers & drift: `chronyc tracking`, `chronyc sources -v`.
- Basic config (`/etc/chrony.conf` on the RHEL family, `/etc/chrony/chrony.conf` on Debian/Ubuntu):
```
pool pool.ntp.org iburst
makestep 1.0 3
rtcsync
logdir /var/log/chrony
```
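For drift triage it can be handy to turn `chronyc tracking` into a pass/fail check. This is a rough sketch that parses the human-readable `System time` and `Leap status` lines of that output; the function name `chrony_offset_ok` and the threshold argument are hypothetical, and the 0.1 s figure in the usage note is only an example.

```shell
#!/usr/bin/env bash
# Sketch: succeed only if chrony reports a system-time offset within a limit
# and does not report an unsynchronised clock. Function name is illustrative.

chrony_offset_ok() {
    # $1: maximum tolerated absolute offset in seconds (e.g. 0.1)
    # stdin: output of `chronyc tracking`
    awk -v max="$1" -F': *' '
        /^System time/ {
            split($2, parts, " ")      # parts[1] is the offset magnitude
            seen = 1
            if (parts[1] + 0 > max + 0) bad = 1
        }
        /^Leap status/ && $2 ~ /Not synchronised/ { bad = 1 }
        END { exit ((seen && !bad) ? 0 : 1) }
    '
}
```

Usage would look like `chronyc tracking | chrony_offset_ok 0.1 || echo "clock needs attention"`; the function also fails when no `System time` line is present, which covers the case of `chronyd` not running.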
12) Logging: journald vs rsyslog, persistence, shipping
- Persist journals: create `/var/log/journal` and set `Storage=persistent` in `journald.conf`.
- Forward systemd journal to rsyslog (`ForwardToSyslog=yes`) or ship via RELP/TLS to a central collector.
- Query examples: `journalctl -b`, `journalctl -p err -u nginx --since "1 hour ago"`
- rsyslog TLS/RELP sketch:
```
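# /etc/rsyslog.d/60-relp-tls.conf
module(load="imjournal")
module(load="omrelp")
# omrelp gets TLS from librelp, so no separate gtls module is loaded here;
# certificate validation additionally needs tls.authmode="name" plus tls.cacert/tls.mycert/tls.myprivkey
action(type="omrelp" target="log.example.net" port="6514" tls="on" tls.permittedpeer="log.example.net")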
```
13) Security Core: SELinux/AppArmor, SSH hardening, fapolicyd
SELinux
- Check: `getenforce`, `sestatus`; switch temporarily: `setenforce 0/1`.
- Find denials: `ausearch -m AVC -ts recent` → `sealert -a /var/log/audit/audit.log`
- Allow with policy or boolean instead of disabling. Label fixes: `restorecon -Rv /path`.
SSH hardening (`/etc/ssh/sshd_config`)
```
# Protocol 2 is implicit in current OpenSSH; the Protocol keyword is deprecated
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
ChallengeResponseAuthentication no
X11Forwarding no
AllowUsers alice opsbot
AuthenticationMethods publickey
ClientAliveInterval 300
ClientAliveCountMax 2
```
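Validate first with `sshd -t`, then reload: `systemctl reload sshd` (the unit is `sshd` on the RHEL family and `ssh` on Debian/Ubuntu). Keep an existing session open until a fresh login succeeds.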
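A quick lint can catch a hardening setting that was lost in a later edit. The sketch below checks a single file for three of the settings above; the function name `sshd_config_findings` is an assumption, and it deliberately ignores `Match` blocks and `Include` files, so on a live host `sshd -T` remains the authoritative view of the effective configuration.

```shell
#!/usr/bin/env bash
# Sketch: report expected sshd_config settings that are absent or different.
# Function name is illustrative; Match blocks and Include files are ignored.

sshd_config_findings() {
    # $1: path to an sshd_config-style file.
    # sshd uses the first occurrence of a keyword and treats keywords
    # case-insensitively, so the lookup does the same. Commented-out
    # lines never match because their first field starts with '#'.
    local file=$1 want key value have
    for want in 'PermitRootLogin no' 'PasswordAuthentication no' 'PubkeyAuthentication yes'; do
        key=${want%% *}
        value=${want#* }
        have=$(awk -v k="$key" 'tolower($1) == tolower(k) { print tolower($2); exit }' "$file")
        [ "$have" = "$value" ] || printf '%s: want %s, found %s\n' "$key" "$value" "${have:-<unset>}"
    done
}
```

An empty result from `sshd_config_findings /etc/ssh/sshd_config` means all three settings are stated explicitly with the expected values; `<unset>` means the file relies on the compiled-in default.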
fapolicyd (exec/LD_PRELOAD control): only allowlisted binaries may execute. Pilot with `permissive = 1` in `/etc/fapolicyd/fapolicyd.conf` before enforcing.
14) Software Lifecycle: apt/dpkg and dnf/rpm
Debian/Ubuntu (apt)
- Update/upgrade: `apt update && apt full-upgrade`
- Pin versions in `/etc/apt/preferences.d/` and keep history in `/var/log/apt/`
RHEL/Rocky/Alma (dnf)
- Lock critical packages: `dnf install 'dnf-command(versionlock)' && dnf versionlock add 'kernel*'` (quote the glob so the shell does not expand it)
- Query: `rpm -qa --last`, verify file owner: `rpm -qf /path/bin`
Build basics
- Debian: `dpkg-deb --build` for simple local packages; full workflow uses `debuild`/`debhelper`.
- RPM: `rpmbuild -ba SPECS/pkg.spec` with `%files` and `%post` scripts; sign with GPG.
15) Virtualization & Containers: KVM/libvirt and Podman
KVM/libvirt
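- Host packages: `qemu-kvm libvirt virt-install virt-manager` (RHEL family); on Debian/Ubuntu the rough equivalents are `qemu-system-x86 libvirt-daemon-system virtinst virt-manager`.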
- Start a VM quickly:
```
virt-install --name testvm --memory 4096 --vcpus 2 \
--disk size=40,bus=virtio --cdrom /iso/debian.iso \
--network bridge=br0,model=virtio
```
- Bridges: create `br0` and attach NICs for L2 adjacency.
Podman (rootless) with systemd
```
podman run -d --name web -p 8080:80 docker.io/nginx:alpine
podman generate systemd --name web --files --new
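# or Quadlet (.container files): ~/.config/containers/systemd/ for rootless, /etc/containers/systemd/ for system services
# newer Podman releases deprecate `generate systemd` in favour of Quadlet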
```
16) File Services: NFS and Samba
NFS server (v4)
```
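dnf install nfs-utils   # Debian/Ubuntu: apt install nfs-kernel-server
# no_root_squash lets root on a client act as root on the export; keep the default root_squash unless it is required
echo "/srv/share 10.0.0.0/24(rw,sync,no_root_squash)" >> /etc/exports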
exportfs -rav
systemctl enable --now nfs-server
```
Client: `mount -t nfs4 nfsserver:/srv/share /mnt`
Samba (domain member)
- Join to AD with `realmd`/`adcli` and configure `winbind` or `sssd`.
- Minimal share:
```
[shared]
path = /srv/smb/shared
browsable = yes
read only = no
valid users = "@DOMAIN\Domain Users"
```
17) Backup Strategies: rsync, tar incrementals, Borg
rsync
- Full mirror with metadata: `rsync -aHAX --delete --numeric-ids /src/ /dest/`
- Resume large files over flaky links: add `--partial --inplace`
tar incrementals
- Maintain the snapshot file between runs and keep the archive out of its own input: `tar --one-file-system --exclude=/backup -g /var/backups/root.snar -cpf /backup/root-$(date +%F).tar /`
Borg
- Init repository with encryption, run `borg create` with sensible excludes, and `borg prune --keep-daily=7 --keep-weekly=4 --keep-monthly=6`
Test restores monthly in an isolated path; never trust backups you haven’t restored.
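A restore drill for a tar-based backup can be scripted. This is a minimal sketch under two assumptions: the archive was created with relative paths (`tar -cpf backup.tar -C /src .`), and the source has not changed since the backup (a snapshot or quiesced copy). The function name `verify_tar_restore` is illustrative, and `diff -r` compares file contents only, not ownership or modes.

```shell
#!/usr/bin/env bash
# Sketch: extract an archive into a throwaway directory and compare it with
# the directory it was created from. Function name is illustrative.

verify_tar_restore() {
    # $1: tar archive created with relative paths (tar -cpf ... -C SRC .)
    # $2: source directory the archive was created from
    # Returns 0 if extraction succeeds and contents match, 1 otherwise.
    local archive=$1 src=$2 scratch status=0
    scratch=$(mktemp -d)
    tar -xpf "$archive" -C "$scratch" || status=1
    if [ "$status" -eq 0 ]; then
        # Content-only comparison; any difference fails the drill.
        diff -r "$src" "$scratch" >/dev/null || status=1
    fi
    rm -rf "$scratch"
    return "$status"
}
```

The scratch directory keeps the drill isolated from production paths; a non-zero exit is the signal to investigate before the backup is needed for real.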
18) Observability & Performance: sysstat, perf, eBPF
- sysstat: `sar -n DEV 1 5`, `iostat -xz 1` (queue depth & await), `mpstat -P ALL 1`
- perf (PMU counters): `perf stat -a -- sleep 10`, `perf top`
- bpftrace examples:
```
bpftrace -e 'kprobe:tcp_sendmsg { @[comm] = count(); }'
bpftrace -e 'tracepoint:syscalls:sys_enter_openat { printf("%s %s\n", comm, str(args->filename)); }'
```
19) Troubleshooting Playbook
Boot
- Kernel panic on root mount → verify `fstab` UUIDs, rebuild initramfs, reinstall GRUB, check LUKS/crypttab.
- SELinux blocking service → `ausearch -m AVC`, `restorecon`, adjust boolean/policy.
Network
- Link up but no traffic → check VLAN tags, bridge membership, firewall counters (`nft list ruleset | less`).
Storage
- Path flaps on SAN → enable multipath (`mpathconf --enable`), verify `multipath -ll`, and set `no_path_retry` for HA behavior.
General pattern
1. Reproduce → 2. Isolate layer → 3. Inspect logs (`journalctl -xeu svc`) → 4. Change one thing → 5. Validate and revert if needed.
20) Automation Starter (Ansible skeleton)
```
inventory/
group_vars/all.yml
roles/base/{tasks,files,templates}
playbooks/site.yml
```
- Idempotent tasks only; handlers for restarts; gather facts; tag everything.
- Keep prechecks that fail fast (kernel, disk, memory, network reachability).
21) Schedules & Jobs: cron vs timers
- Use cron for simple user jobs; migrate system jobs to timers for logs/service ordering.
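- Cron entry (user crontab, daily at 02:00): `0 2 * * * /usr/local/snapshots.sh >> /var/log/snapshots.log 2>&1`
- Timers provide jitter (`RandomizedDelaySec=`), catch-up for missed runs (`Persistent=true` with `OnCalendar=`), and a lifecycle managed by systemd; retry behaviour is configured on the service unit.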
22) Audit & Compliance
- Enable `auditd`, load rules via `/etc/audit/rules.d/*.rules`.
- Track changes to `/etc/passwd`, `/etc/shadow`, privileged binaries (`-F perm=x -F auid>=1000 -F euid=0`).
- Report: `aureport --summary`, search: `ausearch -k policychange`.
23) Disaster Recovery
- Vital: document the exact steps to rebuild access (SSH keys, vault root, sudo policies).
- Keep offline copies of encryption keys (LUKS/Borg).
- Quarterly restore drills: time the RTO and document gaps.
24) Admin Cheat Sheets
User mgmt
```
id alice; groups alice
useradd -m -s /bin/bash alice
passwd -l alice # lock account
chage -l alice # password aging
```
Networking
```
ip -br a; ip -d link show bond0
ss -s; ss -tnp sport = :443
```
Storage
```
lsblk -f; blkid; udevadm info -q all -n /dev/sda
pvs; vgs; lvs -a -o+devices
```
25) Appendix: Sample files
25.1 journald.conf
```
[Journal]
Storage=persistent
Compress=yes
SystemMaxUse=1G
RateLimitIntervalSec=30s
RateLimitBurst=5000
ForwardToSyslog=yes
```
25.2 nftables complete example
```nft
flush ruleset
table inet filter {
chain input {
type filter hook input priority 0; policy drop;
ct state established,related accept
iif lo accept
tcp dport { 22, 80, 443 } accept
ip protocol icmp accept
ip6 nexthdr icmpv6 accept
counter drop
}
chain forward { type filter hook forward priority 0; policy drop; }
chain output { type filter hook output priority 0; policy accept; }
}
```
25.3 sshd_config (hardened)
```
# Protocol 2 is implicit in current OpenSSH; the Protocol keyword is deprecated
PermitRootLogin no
PasswordAuthentication no
PubkeyAuthentication yes
KexAlgorithms sntrup761x25519-sha512@openssh.com,curve25519-sha256,ecdh-sha2-nistp521
```