
Site Recovery Manager

 What Is SRM?

VMware Site Recovery Manager (SRM) is an enterprise-grade disaster recovery orchestration tool. It automates the planning, testing, failover, and failback of virtual machine workloads between two vCenter-managed sites: a primary (protected) site and a secondary (recovery) site.



Core Components and Architecture

  • vCenter & SRM Servers at Each Site: An SRM appliance runs alongside vCenter at both the protected and recovery sites and coordinates actions during recovery events.

  • Replication Mechanisms: SRM supports VMware vSphere Replication (hypervisor-level) and array-based replication via Storage Replication Adapters (SRAs) or Virtual Volumes.



✅ Key Features

1. Automated Orchestration

  • Executes recovery plans with minimal manual steps.

  • Controls shutdown, storage sync, virtual machine startup, and network configuration in a predefined order to meet RTOs reliably.

2. Non‑Disruptive Testing

  • Runs recovery plan validation in an isolated environment without impacting production.

  • Uses temporary copies of replicated data to simulate failover scenarios safely.

3. Failback / Planned Migration

  • Returns operations to the original site once the primary site is restored.

  • Can be automated and treated as a planned migration rather than an ad hoc manual restoration.

4. Flexible Recovery Planning

  • Allows for granular recovery plans including sequencing, IP/network settings, resource mappings, and custom scripting for post-failover automation.

5. Scalable & Application-Agnostic

  • Designed to protect thousands of VMs across multiple sites.

  • Works with any application running in the VMware environment without needing app-specific plugins.

6. Support for Advanced Topologies

  • Supports active‑passive and bi‑directional (A ↔ B) configurations.

  • The shared recovery site model enables multiple protected sites to fail over into one recovery hub.

7. Integration Ecosystem

  • Integrates with VMware NSX for network virtualization, vSAN for storage, and VMware Cloud Foundation. It is also usable in public cloud DRaaS setups (e.g., VMC on AWS, Azure VMware Solution).



🧰 Deployment & Licensing

  • Installation: Deploy SRM as a Photon‑OS-based appliance at both sites. Use the HTML5 Clarity UI for configuration and management.

  • Licensing: SRM is licensed per protected VM, not per CPU. Term and perpetual licenses are supported, and vSphere Replication is included free with vSphere Essentials Plus and above.



⚖️ Benefits and Use Cases

  • Reduced Downtime & Errors: Automated execution ensures consistency and speed.

  • Compliance Support: Non‑disruptive testing and documented recovery plans meet audit and regulatory needs.

  • Lower TCO: Leverages existing VMware integrations and automation to cut manual labor and overhead.

Typical Use Cases:

  • Critical site-level DR and failover

  • Planned datacenter migrations

  • Maintenance-based application testing

  • Hybrid cloud and multi-cloud DR strategies



🧠 Deployment Workflow Summary

Step | Description
1. Site Pairing | Establish SRM connection between protected and recovery vCenter servers
2. Inventory Mapping | Map resources like folders, networks, datastores and resource pools
3. Protection Groups | Group VMs based on policies: array-based, datastore, or vSphere replication
4. Recovery Plan Configuration | Define VM order, network customization, scripts, and priorities
5. Testing | Run non‑disruptive failover tests using isolated copies of replicated data
6. Execution | Initiate planned migration or disaster recovery failover
7. Reprotect / Failback | After recovery, reprotect VMs and optionally move them back to primary site

1. Designing the SRM Recovery Plan ✅

🔄 Recovery Plan Structure & Workflow

  • Protection Groups: Organize VMs into groups based on application tiers or data dependencies (e.g. web servers, DB servers). Each group maps to a replication set, so group together VMs that share RTO/RPO requirements.

  • Recovery Plans: A plan is essentially an automated runbook controlling VM shutdown, replication sync, startup sequence, IP/network customization, and scripts. Multiple plans can reference the same protection groups. 

  • Dependencies & Sequencing: Define inter-VM dependencies (e.g. domain controllers first, then application servers), enabling parallel startup within priority groups for faster recovery.

  • Pre/Post Actions & IP Customization: Customize VM IPs and gateways, run in-guest scripts (DNS updates, service starts), and display prompts during recovery execution.
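The sequencing idea above (priority groups started in order, with dependencies resolved inside each group) can be sketched in plain Python. This is only an illustrative model, not SRM's API: the VM names, the `VMS` structure, and the `startup_waves` helper are all hypothetical, and the sketch assumes dependencies are honored only between VMs in the same priority group.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical inventory: VM name -> (priority group, VMs it depends on).
VMS = {
    "dc01": (1, set()),
    "db01": (2, set()),
    "app-cache": (3, set()),
    "app01": (3, {"app-cache"}),
    "web01": (4, set()),
    "web02": (4, set()),
}

def startup_waves(vms):
    """Return a list of waves; VMs in the same wave can start in parallel."""
    waves = []
    for priority in sorted({p for p, _ in vms.values()}):
        # Only keep dependencies that point at VMs in the same priority group.
        group = {n: deps for n, (p, deps) in vms.items() if p == priority}
        sorter = TopologicalSorter(
            {n: {d for d in deps if d in group} for n, deps in group.items()}
        )
        sorter.prepare()
        while sorter.is_active():
            ready = sorted(sorter.get_ready())  # everything startable right now
            waves.append(ready)
            sorter.done(*ready)
    return waves

WAVES = startup_waves(VMS)
```

With this sample data, the two web servers land in the same wave and start together, while `app01` waits for `app-cache` even though both share a priority group.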

🧪 Testing

  • Use non‑disruptive test recovery: it runs against isolated snapshots of the replicated data, so production isn't impacted and replication continues during the test. Test VMs can be attached to an isolated network or to a network that duplicates production, depending on needs.

  • Run cleanup afterward: SRM powers off the test VMs, restores the placeholder VMs, and deletes the temporary snapshots.

🛠️ Execution & Failback

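  • Unplanned Failover: Run the recovery plan in disaster recovery mode after a disaster. SRM attempts shutdown and a final sync if the protected site is still reachable, then orchestrates startup and IP reconfiguration at the recovery site.

  • Planned Migration / Failback: If the protected site is operational, a planned migration moves workloads in an orderly way: VMs are shut down and fully synchronized first, so no data is lost. Afterward, reprotect reverses the replication direction so the workloads can later fail back.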



2. Topology Mapping & Resource Alignment

  • Site Pairing: Pair protected and recovery vCenter servers and their SRM instances via site-pair configuration.

  • Inventory Mapping: Map folders, resource pools, networks, and datastores between sites to ensure smooth migration. NSX universal logical switches can span L2 networks for seamless failover.

  • Resource Considerations: Ensure sufficient compute, storage, and network resources at the recovery site. Use fewer, larger datastores and group VMs to minimize recovery time.
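A quick way to reason about inventory mapping is to check that every object a protected VM uses has a counterpart at the recovery site. The following is a minimal sketch under stated assumptions: the mapping dictionaries, VM records, and the `unmapped_objects` helper are hypothetical illustrations, not data pulled from SRM.

```python
# Hypothetical inventory mappings: protected-site object -> recovery-site object.
NETWORK_MAP = {"prod-web": "dr-web", "prod-db": "dr-db"}
FOLDER_MAP = {"/Prod/Apps": "/DR/Apps"}

# Hypothetical protected VMs and the inventory objects they use.
PROTECTED_VMS = [
    {"name": "web01", "network": "prod-web", "folder": "/Prod/Apps"},
    {"name": "db01", "network": "prod-db", "folder": "/Prod/Apps"},
    {"name": "batch01", "network": "prod-batch", "folder": "/Prod/Batch"},
]

def unmapped_objects(vms, network_map, folder_map):
    """Return {vm_name: [missing mappings]} for VMs lacking a mapping."""
    problems = {}
    for vm in vms:
        missing = []
        if vm["network"] not in network_map:
            missing.append(f"network:{vm['network']}")
        if vm["folder"] not in folder_map:
            missing.append(f"folder:{vm['folder']}")
        if missing:
            problems[vm["name"]] = missing
    return problems

GAPS = unmapped_objects(PROTECTED_VMS, NETWORK_MAP, FOLDER_MAP)
```

Here `batch01` is flagged because neither its network nor its folder has a recovery-site counterpart. The same pattern extends to resource pools and datastores.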



3. Choosing Replication Methods: vSphere vs. Array-Based

Feature | Array-Based Replication (ABR) | vSphere Replication (VR)
Replication Layer | Storage array (LUN/volume level) | Hypervisor (VM-level replication)
RPO | As low as sub‑minute (vendor dependent) | 5 minutes to 24 hours (5 min with vSAN/vVols)
Write-order fidelity | Maintains across multiple VMs in group | Fidelity only within individual VM disks
Scale | Up to thousands of VMs | ~2,000 VMs per SRM instance
Storage dependency | Requires same vendor array at both sites | Storage‑agnostic (VMware‑supported)
Cost | Higher licensing and vendor-specific setup | Included with many vSphere licenses (Essentials Plus+)

🟢 When to Choose:

  • Array-Based: Ideal for enterprise use cases requiring sub-minute RPOs, write-order consistency across multiple VMs, large VM scale, and tight SLAs.

  • vSphere Replication: Best suited for smaller environments, mixed storage, budget-conscious setups, or non-critical workloads.

You can even mix both within the same SRM deployment: use VR for lower-tier VMs and ABR for critical workloads. Just don't protect the same VM with both mechanisms.
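The selection guidance above can be expressed as a small decision helper. This is a simplified sketch, not a sizing tool: the `Workload` type and `choose_replication` function are hypothetical, the 5-minute threshold is taken from the comparison table, and cost and storage constraints are deliberately ignored.

```python
from dataclasses import dataclass

VR_MIN_RPO_MINUTES = 5  # lower RPO bound for vSphere Replication per the table

@dataclass
class Workload:
    name: str
    rpo_minutes: float                # required recovery point objective
    needs_cross_vm_consistency: bool  # write-order fidelity across several VMs

def choose_replication(w: Workload) -> str:
    """Return 'ABR' or 'VR' using only RPO and consistency requirements."""
    if w.rpo_minutes < VR_MIN_RPO_MINUTES or w.needs_cross_vm_consistency:
        return "ABR"
    return "VR"

# Hypothetical workloads for illustration.
CHOICES = {
    w.name: choose_replication(w)
    for w in [
        Workload("trading-db", rpo_minutes=1, needs_cross_vm_consistency=True),
        Workload("erp-cluster", rpo_minutes=15, needs_cross_vm_consistency=True),
        Workload("intranet-wiki", rpo_minutes=240, needs_cross_vm_consistency=False),
    ]
}
```

In a mixed deployment, a helper like this would assign each VM to exactly one mechanism, which matches the rule that a VM must not be protected by both.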


4. Putting It All Together: Architecture Planning

  1. Map required RTO/RPO per application/tier → choose replication accordingly.

  2. Design protection groups aligned to workloads, dependencies, and replication capabilities.

  3. Configure inventory mappings of compute, network, folders, and storage to match planned failover topologies.

  4. Build and test recovery plans:

    • set sequence, customize IPs

    • include pre/post scripts

    • test non-disruptively

  5. Plan execution strategy:

    • scheduled migrations

    • failover vs planned migration scenarios

    • reprotect and failback workflows

  6. Baseline performance with recommended settings (e.g. fewer, larger datastores and grouped VM startups) to reduce recovery time.



5. Best Practices & Practical Tips

  • Place page/swap files (especially for large VMs) on separate disks excluded from replication to avoid unnecessary replication load.

  • Tune bandwidth and replication settings: consider VR change tracking and compression, and the network latency between sites.

  • Use parallel startup within priority groups and minimize the number of protection groups to improve RTO.

  • Integrate SRM with NSX, vSAN, and VMware Cloud if using hybrid or multi‑site deployments for better automation. 

  • Document recovery plan history, run test reports, and align with compliance or audit requirements.
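To see why parallel startup within priority groups helps RTO, consider a rough estimate in which each priority group takes as long as its slowest VM when started in parallel, versus the sum of all VM startup times when started one by one. The function name and the sample timings below are hypothetical, and the model ignores storage recovery steps, concurrency limits, and script runtimes.

```python
def estimated_startup_minutes(priority_groups, parallel=True):
    """priority_groups: per-VM startup times (minutes), one list per priority group.

    Groups always run one after another; within a group, VMs start either
    all at once (parallel=True) or strictly one at a time.
    """
    if parallel:
        return sum(max(group) for group in priority_groups if group)
    return sum(sum(group) for group in priority_groups)

# Hypothetical timings: one infrastructure VM, two DB VMs, three web VMs.
GROUPS = [[4], [6, 5], [3, 3, 2]]
PARALLEL_MIN = estimated_startup_minutes(GROUPS, parallel=True)   # 4 + 6 + 3
SERIAL_MIN = estimated_startup_minutes(GROUPS, parallel=False)    # 4 + 11 + 8
```

With these sample numbers the parallel estimate is 13 minutes against 23 minutes serially; the gap grows as more VMs share a priority group.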


SRM Topology Explanation

[Diagram: SRM topology]


1. Protected (Primary) Site

  • vCenter Server and SRM appliance manage production workloads.

  • vSphere Replication appliances or storage arrays (with SRAs) handle replication.

  • Protected VMs reside in clusters and storage datastores ready for replication.

2. Recovery (Secondary) Site

  • Mirrored setup with vCenter + SRM appliance.

  • Placeholder VMs are created in advance to reserve inventory slots.

  • Replication targets: VR receives VM-level blocks; array-based replication mirrors LUNs/volumes.

3. Replication & Network Links

  • Network connectivity links the SRM servers, the vSphere Replication services (e.g. replication traffic on ports 31031 and 44046), and any SRA-managed arrays across sites.

  • Storage replication occurs either via the hypervisor (vSphere replication) or directly between arrays (ABR).

  • Replication traffic uses dedicated replication networks for isolation and performance.
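For firewall planning, a basic TCP reachability check against the ports mentioned above (31031 and 44046) can confirm that the path between sites is open. This is a minimal sketch using only the Python standard library; the `port_open` and `check_ports` helpers are hypothetical names, and a successful connect only shows that the port accepts connections, not that replication itself is healthy.

```python
import socket

REPLICATION_PORTS = [31031, 44046]  # ports named in the text above

def port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, or DNS failure
        return False

def check_ports(host, ports=REPLICATION_PORTS, timeout=2.0):
    """Return {port: reachable} for each port on the given host."""
    return {port: port_open(host, port, timeout) for port in ports}
```

A call such as `check_ports("vr-dr.example.com")` (a placeholder hostname) would return a dictionary showing which ports are reachable from the machine running the check.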

4. Inventory & Resource Mapping

  • Folders, resource pools, datastores, and networks are mapped from the protected site to the recovery site.

  • NSX or inventory-based network mappings ensure consistent virtual networking and, if used, universal logical switches can allow seamless L2 failover.

5. Recovery Plan Execution Flow

  1. Initiate recovery or planned migration.

  2. Shut down the protected VMs and perform a final sync (if the source site is still online).

  3. At the recovery site, replace placeholder VMs with the recovered VMs and power them on in defined priority groups, honoring dependencies.

  4. Apply IP changes, launch post‑power-on scripts, and reconfigure services.

  5. After a test, run cleanup; after a recovery, optionally reprotect and fail back.
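The execution flow differs slightly by mode, which a small model can make explicit. The sketch below is a simplified illustration of the flow listed above, not SRM's actual step list: the step names and the `recovery_steps` function are hypothetical, and it assumes VMs are shut down before the final sync so that no writes are missed.

```python
MODES = {"test", "planned_migration", "disaster_recovery"}

def recovery_steps(mode, source_online=True):
    """Return the ordered steps for a test, planned migration, or DR run."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    if mode == "planned_migration" and not source_online:
        raise ValueError("planned migration requires the protected site online")

    steps = []
    if mode == "test":
        # Tests never touch production; they work on an isolated copy.
        steps.append("present isolated copy of replicated data")
    else:
        if source_online:
            steps += ["shut down protected VMs", "final sync"]
        steps.append("promote replicated storage at recovery site")
    steps += [
        "power on VMs by priority group",
        "apply IP customization and run post-power-on scripts",
    ]
    steps.append(
        "clean up test" if mode == "test" else "reprotect (optional), then fail back"
    )
    return steps
```

For example, `recovery_steps("disaster_recovery", source_online=False)` skips the shutdown and final sync, which reflects that data written after the last replication cycle may be lost in a true disaster.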



🔍 Key Takeaways from the Diagram

  • Provides a holistic view of components: vCenter servers, SRM appliances, replication layer, VM inventory, networks.

  • Shows dual replication modes: vSphere Replication (VM-level) vs. Array-Based Replication using SRAs.

  • Illustrates bi‑directional topology, supporting both planned migrations and failbacks.

  • Includes network port/service mapping, which is especially useful for firewall and compliance planning.



🧩 Enhancing for Your Environment

You can tailor this layout to various SRM topologies:

  • Shared Recovery Site: Multiple protected sites map into one recovery site (multi-pair SRM).

  • Stretched Cluster Integration: Combine SRM with vSAN stretched clusters, protecting workloads that span metro sites to a third site for added resiliency.

  • NSX-Aware Deployment: Use Cross‑VC NSX logical networks and automated mapping to keep identical IP addressing and security across sites, which is ideal for test and DR networks.



📝 How to Create Your Own Topology Diagram

Consider the following when building your custom diagram:

  • Clearly mark vCenter + SRM pairs at each site.

  • Show replication components: VR appliances and/or array replication adapters.

  • Annotate network connectivity: control, replication, and VM traffic.

  • Indicate inventory mappings: network, resource pools, datastores, folder names.

  • Define placeholder VM logic, recovery priority groups, and sequencing.

  • Include pre/post script stages, IP customization steps.

  • Layer in optional components like NSX, stretched clusters, or F5 BIG-IP for routing and DNS failover.


 
 
 
