Saturday, September 27, 2014

SnapMirror Performance Considerations



Performance impact of poorly planned SnapMirror configurations

VOLUME SNAPMIRROR PERFORMANCE
Volume SnapMirror performance depends on the update frequency, the network bandwidth, and the storage system utilization. Volume SnapMirror Async performance is particularly affected by the volume size, the rate of data change, and, for traditional volumes, the disk geometry.

Disk geometry
For versions of Data ONTAP earlier than 7.0 and for traditional volumes, the source and destination volumes should contain disks of the same size, organized in the same RAID group configuration, for optimal performance. For flexible volumes, disk geometry matching is no longer a consideration.

Snapshot copy creation and update frequency
SnapMirror creates a Snapshot copy before every update and deletes a Snapshot copy at the end. On heavily loaded storage systems, Snapshot copy creation can take longer, which restricts how frequently SnapMirror updates can run. When many SnapMirror updates are scheduled for the same time, SnapMirror creates many Snapshot copies on the source storage system simultaneously, which can affect client access. For this reason, staggered SnapMirror schedules are recommended, as sketched below.
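
For illustration, a hypothetical /etc/snapmirror.conf on the destination system might stagger three hourly volume updates at 20-minute intervals (all host and volume names here are assumptions):

# format: source:volume destination:volume arguments minute hour day-of-month day-of-week
src_filer:vol1  dst_filer:vol1_mirror  -  0  * * *
src_filer:vol2  dst_filer:vol2_mirror  -  20 * * *
src_filer:vol3  dst_filer:vol3_mirror  -  40 * * *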

Volume size and changed blocks
To perform an incremental update, the block map in the new Snapshot copy is compared to the block map in the baseline Snapshot copy. The time required to determine the block changes depends on the volume size. With Data ONTAP 7.0 and later, you can use the snap delta command to determine the rate of data change between Snapshot copies on a volume.
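
For example (the volume name is an assumption), run on the source system:

source> snap delta vol1

This reports the amount and rate of change between each pair of Snapshot copies on vol1, which helps in sizing the SnapMirror update schedule.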

QTREE SNAPMIRROR PERFORMANCE
Qtree SnapMirror performance is affected by deep directory structures and by replicating very large numbers of small files, on the order of tens of millions.

Directory structures and large numbers of small files
To determine changed data, qtree SnapMirror scans the inode file to identify which inodes belong to the qtree of interest and which inodes have changed. If the inode file is large but the inodes of interest are few, qtree SnapMirror spends a lot of time going through the inode file to find very few changes, and the disk I/Os used to access the data become small and inefficient.

Transfer size
While a qtree SnapMirror update is transferring, the snapmirror status -l command shows how many kilobytes have been transferred so far; this value may be greater than the expected delta (the expected amount of changed data). The overhead is due to metadata transfer: for example, 4-KB headers, file creations and deletions, ACLs, and so on.
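
For example (the volume and qtree names are assumptions), on the destination system:

destination> snapmirror status -l /vol/dst_vol/qt1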

A few more points:
CONCURRENT TRANSFER LIMITATION
A transfer fails when the system reaches its maximum number of concurrent replication operations. Each transfer beyond the limit reattempts to run once per minute.
To optimize SnapMirror deployment, it is recommended that the schedules be staggered. For qtree SnapMirror, if there are too many qtrees per destination volume, the solution is to re-baseline those qtrees to another volume.

CPU UTILIZATION
SnapMirror consumes some CPU, but in the majority of cases the impact is not significant.
You can monitor storage system CPU utilization with Operations Manager Performance Advisor or the Data ONTAP sysstat command.
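
For example (the prompt name is an assumption), to sample extended system statistics, including CPU, once per second:

filer> sysstat -x 1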

SYSTEM ACTIVITIES
On heavily loaded systems, SnapMirror competes with other processes and may impact response times.
To address this problem, you can use FlexShare® software to raise the priority of system activity to High or VeryHigh on storage systems dedicated to SnapMirror replication.
You can also schedule SnapMirror updates at times when NFS or CIFS traffic is low, and reduce the frequency of updates.
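
As a sketch (the volume name is an assumption, and this assumes FlexShare, the priority command, is available on the system), FlexShare priorities are set per volume:

filer> priority on
filer> priority set volume dst_vol system=High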

NETWORK DISTANCE AND BANDWIDTH
When deploying SnapMirror, you have to consider the round-trip travel time of a packet from the source to the destination storage system, because network distance causes write latency. The round trip has a latency of approximately 2 milliseconds if the source and the destination storage systems are 100 miles apart: light travels through optical fiber at roughly two-thirds of its speed in a vacuum, so a 200-mile round trip takes about 1.6 ms, plus switching and routing overhead.

Networking issues that affect SnapMirror performance can be addressed by limiting the bandwidth SnapMirror uses, with either the system-wide or the per-transfer network throttle feature (see the throttling section below).

Networking issues can also be addressed by using a dedicated path for SnapMirror transfers or using multiple paths for load balancing and failover.
If the network still does not perform as expected, look for typical network problems; for example, duplex mismatches can make a network very slow.

How to resume normal operations after a disaster?



 
Disaster strikes. In this example, a backhoe has dug up the network cables that connect the Data Center to clients. The Data Center volume (dc_vol) is unavailable.

From the Disaster Recovery site, break the mirror; the SnapMirror replica then becomes writable.
The syntax of the snapmirror break command is:
destination> snapmirror break destination_vol
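
In this example (the Disaster Recovery system name, dr_filer, is an assumption):

dr_filer> snapmirror break dr_vol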
After breaking the mirror, direct clients to the Disaster Recovery volume (dr_vol); they can then continue reading and writing their data.

The Data Center volume is offline and becoming out of date. The last common Snapshot copy is preserved, however. After the problem is fixed, a combination of snapmirror resync and snapmirror break commands lets you resume normal operations.

Re-establish Normal Operations 1

With the problem fixed, you can now move the new production data back to the Data Center with the snapmirror resync command, executed from the Data Center storage system.
The syntax of the snapmirror resync command is:
destination> snapmirror resync destination_vol
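
A minimal sketch (system names are assumptions; -S names the source explicitly because the reversed relationship is typically not yet in /etc/snapmirror.conf on the Data Center system):

dc_filer> snapmirror resync -S dr_filer:dr_vol dc_vol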

Executing the snapmirror resync command from the Data Center storage system has the effect of reversing the direction of the SnapMirror relationship. The Data Center storage system is now the destination storage system; the Data Center volume (dc_vol) is now the destination volume. The Disaster Recovery volume is now the source of the new production data written while the Data Center storage system was offline.

While the Data Center source volume is receiving data from the snapmirror resync operation, clients are still accessing their data from the disaster recovery site.

Re-establish Normal Operations 2

The next step in re-establishing normal operations is to stop user access to the Disaster Recovery volume and complete the update of the Data Center volume with any production data written since the snapmirror resync operation began.
Execute the snapmirror update command from the Data Center storage system.
The syntax of the command is:
destination> snapmirror update -S source:source_volume destination_vol
(In our example, the Data Center storage system became the destination when the snapmirror resync command was used to move the Disaster Recovery production data to the Data Center storage system.)
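
Continuing the example (system names are assumptions):

dc_filer> snapmirror update -S dr_filer:dr_vol dc_vol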

Re-establish Normal Operations 3

The Data Center volume (dc_vol) now has all of the production data; however, at this point dc_vol is a read-only SnapMirror replica of dr_vol, the Disaster Recovery volume. You must now reverse the direction of the SnapMirror relationship by breaking the mirror with the snapmirror break command, executed from the Data Center storage system.
The syntax of the snapmirror break command is:
destination> snapmirror break destination_vol
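
Continuing the example (the system name is an assumption):

dc_filer> snapmirror break dc_vol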

Re-establish Normal Operations 4

The final step to resuming normal operations is to execute the snapmirror resync command from the Disaster Recovery site.
The syntax of the snapmirror resync command is:
destination> snapmirror resync destination_vol
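
Continuing the example (the system name is an assumption; running resync without -S assumes the original relationship is still defined in /etc/snapmirror.conf on the Disaster Recovery system):

dr_filer> snapmirror resync dr_vol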

This final snapmirror resync command, executed from the Disaster Recovery site, returns SnapMirror to the original source and destination relationship.

How to control bandwidth consumption for SnapMirror transfers (throttling)?



NETWORK THROTTLING
Network throttling can be configured on a per-transfer basis, using the kbs argument in /etc/snapmirror.conf.
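
For example (host and volume names are assumptions), to limit one relationship to 2,000 kilobytes per second, with updates at 15 minutes past every hour:

src_filer:src_vol  dst_filer:dst_vol  kbs=2000  15 * * *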

Dynamic throttling lets you change the throttle value for a SnapMirror relationship while the transfer is active. This feature is available in Data ONTAP 7.1 and later.
snapmirror throttle <n> dst_hostname:dst_path
where <n> is the new throttle value in kilobytes per second.
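
Continuing the example (names are assumptions), to raise the throttle of an active transfer to 4,000 KB per second, from the destination system:

dst_filer> snapmirror throttle 4000 dst_filer:dst_vol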

System-wide throttling, available in Data ONTAP 7.2 and later, limits the total bandwidth used by all transfers (SnapMirror and SnapVault) at any time.
There are three options.

a) Enable or disable system-wide throttling on all systems: replication.throttle.enable [on|off]

b) Set maximum bandwidth for all incoming transfers: replication.throttle.incoming.max_kbs <value>

c) Set maximum bandwidth for all outgoing transfers: replication.throttle.outgoing.max_kbs <value>

The default value is unlimited, which means there is no limit on the total bandwidth used. Valid throttle values range from 1 to 125,000 kilobytes per second.
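
For example (the values are illustrative), these options are set with the options command on each system:

filer> options replication.throttle.enable on
filer> options replication.throttle.incoming.max_kbs 50000
filer> options replication.throttle.outgoing.max_kbs 50000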

How to configure SnapMirror from scratch?

SnapMirror Configuration


1.    The source and destination NetApp controllers must be connected via Fibre Channel or Ethernet TCP/IP networks.

2.    Install the SnapMirror license. A standard SnapMirror license must be installed on both the source and destination NetApp controllers.
For example: license add <code>

3.    To support all SnapMirror functionality, the source and destination controllers must run the same version of Data ONTAP.

4.    Source and destination volumes must be of the same type:
Traditional volumes
32-bit FlexVol volumes
64-bit FlexVol volumes

5.    The destination volume must be the same size as or larger than the source volume.
** If you wish to increase the size of the source volume, you must manually increase the size of the destination volume first, then increase the size of the source volume.
** Alternatively, this is handled automatically by the fs_size_fixed volume option, which forces the destination volume's file system to remain the same size as the source volume.
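For example (the volume name is an assumption), on the destination system:
dst_filer> vol options dst_vol fs_size_fixed on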

6.    The names and IP addresses of the source and destination controllers must be in the /etc/hosts file and/or be resolvable through DNS.
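For example (addresses and names are assumptions), an /etc/hosts entry on each controller:
10.0.0.10  src_filer
10.0.0.20  dst_filer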


7.    Check that SnapMirror is enabled:
options snapmirror.enable            on

8.    No deduplication on source or destination volumes

9.    TCP/IP over Ethernet will be used for the replication. A single IP address is used on both the source and destination NetApp controllers, each on a single virtual interface to provide fault tolerance.
The IP addresses of the controllers are registered in DNS.

On the source, specify the host names or IP addresses of the SnapMirror destination systems that you wish to authorize to replicate from this source system.
For example: options snapmirror.access host=dst_hostname1,dst_hostname2