Saturday, September 27, 2014

SnapMirror Performance Considerations



Performance impact of poorly planned SnapMirror configurations

VOLUME SNAPMIRROR PERFORMANCE
Volume SnapMirror performance depends primarily on the update frequency, the available network bandwidth, and the storage system utilization. Volume SnapMirror Async performance is also affected by the volume size, the rate of data change, and, for traditional volumes, the disk geometry.

Disk geometry
For versions of Data ONTAP earlier than 7.0 and for traditional volumes, it is recommended that the source and destination volumes contain disks of the same size and be organized in the same RAID group configuration for optimal performance. For flexible volumes, disk geometry matching is no longer a consideration.

Snapshot copy creation and update frequency
SnapMirror creates a Snapshot copy before every update and deletes the previous SnapMirror Snapshot copy when the update completes. On heavily loaded storage systems, Snapshot copy creation can take longer, which restricts how frequently SnapMirror updates can run. Schedules that trigger updates for many volumes at the same time also cause SnapMirror to create many Snapshot copies on the source storage system simultaneously, which can affect client access. For these reasons, staggered SnapMirror schedules are recommended to avoid system contention.

Volume size and changed blocks
To perform an incremental update, SnapMirror compares the block map in the new Snapshot copy with the block map in the baseline Snapshot copy. The time required to determine the changed blocks depends on the volume size. With Data ONTAP 7.0 and later, you can use the snap delta command to determine the rate of data change between Snapshot copies on a volume.
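
As an illustration, the snap delta command can be run on the source system to estimate how much data changes between Snapshot copies before choosing an update schedule. The volume and Snapshot copy names below are only examples.

    # Rate of change between all Snapshot copies on a volume
    filer> snap delta vol1

    # Rate of change between two specific Snapshot copies
    filer> snap delta vol1 nightly.1 nightly.0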

QTREE SNAPMIRROR PERFORMANCE
Qtree SnapMirror performance is affected by deep directory structures and by the replication of very large numbers of small files, such as tens of millions.

Directory structures and large numbers of small files
To determine changed data, qtree SnapMirror scans the inode file and identifies which inodes belong to the qtree of interest and which of those inodes have changed. If the inode file is large but the inodes of interest are few, qtree SnapMirror spends a lot of time going through the inode file to find very few changes, and the disk I/Os used to access the data become small and inefficient.

Transfer size
While a qtree SnapMirror update is transferring, the snapmirror status -l command shows how many kilobytes have been transferred so far; this value can be greater than the expected delta (the changes you expect). The overhead is due to metadata transfer, for example 4-KB headers, file creations and deletions, ACLs, and so on.
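
For example, the detailed status for a single qtree relationship can be checked on the destination system as shown below; the system and path names are hypothetical.

    # Long (detailed) status for one qtree SnapMirror relationship
    dst> snapmirror status -l /vol/dstvol/qt1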

A few more points:
CONCURRENT TRANSFER LIMITATION
A transfer fails when the system has reached its maximum number of concurrent replication operations; each transfer beyond the limit retries once per minute.
To optimize SnapMirror deployment, it is recommended that the schedules be staggered. For qtree SnapMirror, if there are too many qtrees per destination volume, the solution is to re-baseline those qtrees to another volume.
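
As a sketch of what staggering can look like in Data ONTAP 7-Mode, the /etc/snapmirror.conf entries below start each volume's hourly update at a different minute so that Snapshot copy creation and transfers do not all begin at once. The system and volume names are made up.

    # /etc/snapmirror.conf on the destination system
    # format: source destination arguments schedule (minute hour day-of-month day-of-week)
    srcfiler:vol1  dstfiler:vol1_mirror  -  0  * * *
    srcfiler:vol2  dstfiler:vol2_mirror  -  15 * * *
    srcfiler:vol3  dstfiler:vol3_mirror  -  30 * * *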

CPU UTILIZATION
SnapMirror consumes CPU on the storage system, but in the majority of cases the impact is not significant.
You can monitor storage system CPU utilization by using Operations Manager Performance Advisor or the Data ONTAP sysstat command.
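
For example, the following console invocations sample CPU utilization; the intervals are just illustrative choices.

    # Extended statistics (CPU, operations, throughput) every second
    filer> sysstat -x 1

    # Per-processor CPU utilization every 5 seconds on multiprocessor systems
    filer> sysstat -m 5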

SYSTEM ACTIVITIES
On heavily loaded systems, SnapMirror competes with other processes and may impact response times.
To address this, on storage systems dedicated to SnapMirror replication you can use FlexShare® software to set the system priority to High or VeryHigh.
You can also schedule SnapMirror updates for times when NFS or CIFS traffic is low, and reduce the frequency of updates.
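
A rough sketch of the FlexShare side of this, assuming 7-Mode syntax; the volume name and priority level are only examples of the idea.

    # Enable FlexShare and raise the priority of system work (such as SnapMirror)
    # relative to user workloads on a volume involved in replication
    filer> priority on
    filer> priority set volume vol1 system=VeryHigh
    filer> priority show volume vol1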

NETWORK DISTANCE AND BANDWIDTH
When deploying SnapMirror, you have to consider the round-trip travel time of a packet from the source to the destination storage system, because network distance adds latency to every write. As a rule of thumb, the round trip adds approximately 2 milliseconds of latency when the source and destination storage systems are 100 miles apart (light in fiber covers roughly 100 miles per millisecond, so each direction contributes about 1 millisecond).

Networking issues that affect SnapMirror performance can be addressed by limiting the bandwidth SnapMirror uses, with either the system-wide or the per-transfer network throttle feature.
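
The examples below illustrate both throttle options in 7-Mode; the limits, system names, and volume names are only examples.

    # System-wide throttle for incoming and outgoing SnapMirror transfers (values in KB/s)
    filer> options replication.throttle.enable on
    filer> options replication.throttle.outgoing.max_kbs 5000
    filer> options replication.throttle.incoming.max_kbs 5000

    # Per-transfer throttle, set in /etc/snapmirror.conf on the destination system
    srcfiler:vol1  dstfiler:vol1_mirror  kbs=2048  0 * * *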

Networking issues can also be addressed by using a dedicated path for SnapMirror transfers or using multiple paths for load balancing and failover.
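
A sketch of the multiple-path configuration in /etc/snapmirror.conf, assuming 7-Mode syntax; the connection name, interface host names, and volume names are made up.

    # Define a connection that balances transfers across two source/destination interface pairs
    repl_conn = multi(srcfiler-e0a,dstfiler-e0a)(srcfiler-e0b,dstfiler-e0b)

    # Use the connection name in place of the source system name
    repl_conn:vol1  dstfiler:vol1_mirror  -  0 * * *
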
If the network still does not perform up to expectations, look for typical network problems. For example, duplex mismatches can cause networks to be very slow.
