Wednesday, November 19, 2014

Performance issue on NetApp filer

In case you receive an incident from the host/system team reporting a performance issue, what are your very basic troubleshooting steps?

 

1               Steps


A performance issue on a filer is generally related to one of the following activities on the filer:

  1. SnapVault replication in progress
  2. SnapMirror replication in progress
  3. NDMP tape backup in progress
  4. More than one of the above activities running simultaneously.
 

1.1            Check CPU utilization:


To verify that the issue is actually related to CPU utilization on the filer, run the command below.
In the output below, CPU utilization is around 96%, which is high and needs corrective action.


FAS3050 > sysstat -c 5 -i -u -s 5
 CPU   Total      Net kB/s        Disk kB/s      Tape kB/s  Cache  Cache    CP   CP  Disk
       ops/s     in    out     read   write    read  write    age    hit  time   ty  util
 95%    1630   5089   5448    69608    5591       0  60017     0s    58%  100%    #   22%
 98%     417     57   2114    93420       0       0  84689     0s    57%  100%    #   38%
 97%      10      4     41    86399       0       0  87375     0s    57%  100%    #   27%
 94%     851   3739   4182    91529   25743       0  73157     1s    57%  100%    b   24%
 95%    3805  19196  12931    50400   31652       0  14575     0s    61%  100%    :   50%
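If you capture this output to a file or a script, the check can be automated. The following is a minimal Python sketch, not a NetApp tool: the column names, the `parse_sysstat` and `average_cpu` helpers, and the 90% threshold are all assumptions made for illustration. It splits each data row of the output above into named fields and averages the CPU column.

```python
from statistics import mean

# Hypothetical field names for the 13 columns of the output above, left to right.
COLUMNS = [
    "cpu", "ops", "net_in", "net_out", "disk_read", "disk_write",
    "tape_read", "tape_write", "cache_age", "cache_hit",
    "cp_time", "cp_type", "disk_util",
]

CPU_THRESHOLD = 90.0  # assumed alert level, not a value defined by NetApp

SAMPLE = """\
 CPU   Total    Net kB/s       Disk kB/s          Tape kB/s Cache Cache  CP  CP Disk
            ops/s    in   out          read  write          read write   age   hit time ty util
 95%    1630  5089  5448     69608   5591     0 60017     0s  58% 100%  #  22%
 98%     417    57     2114     93420      0        0 84689     0s  57% 100%  #  38%
 97%      10     4       41         86399      0        0 87375     0s  57% 100%  #  27%
 94%     851  3739   4182     91529  25743    0 73157     1s  57% 100%  b  24%
 95%    3805 19196 12931   50400  31652    0 14575     0s  61% 100%  :  50%
"""


def parse_sysstat(text):
    """Turn data rows into dicts; header lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 13 fields and start with a CPU percentage such as "95%".
        if len(fields) == len(COLUMNS) and fields[0].endswith("%"):
            rows.append(dict(zip(COLUMNS, fields)))
    return rows


def average_cpu(rows):
    """Mean CPU utilization (percent) across the sampled intervals."""
    return mean(float(row["cpu"].rstrip("%")) for row in rows)


rows = parse_sysstat(SAMPLE)
avg = average_cpu(rows)
print(f"average CPU {avg:.1f}% over {len(rows)} samples")
if avg > CPU_THRESHOLD:
    print("CPU is high: check tape backup and replication activity")
```

Whitespace splitting is used because the column alignment in pasted console output is often unreliable.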

1.2            Check tape backup status:

The output below shows high tape write activity. To bring down CPU utilization, we can stop the tape backup.

FAS3050 > sysstat -c 5 -i -u -s 5
 CPU   Total      Net kB/s        Disk kB/s      Tape kB/s  Cache  Cache    CP   CP  Disk
       ops/s     in    out     read   write    read  write    age    hit  time   ty  util
 95%    1630   5089   5448    69608    5591       0  60017     0s    58%  100%    #   22%
 98%     417     57   2114    93420       0       0  84689     0s    57%  100%    #   38%
 97%      10      4     41    86399       0       0  87375     0s    57%  100%    #   27%
 94%     851   3739   4182    91529   25743       0  73157     1s    57%  100%    b   24%
 95%    3805  19196  12931    50400   31652       0  14575     0s    61%  100%    :   50%


Stop the tape backup from the backup server:

    1. Stop the backup from the backup server by aborting the related NDMP backup group for the filer.
    2. If you still see NDMP save sessions on the backup server, kill the save sessions.

Verify that the backup sessions have stopped on the filer by running the command below:

FAS3050 >  ndmpd status
ndmpd ON.
Session: 75
  Active
  version:                4
  Operating on behalf of primary host.
  tape device:    nrst3a
  mover state:    Active
  data state:     Active
  data operation: Backup

If you still see active backup sessions running, issue the ndmpd killall command on the filer.
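When there are many sessions, reading the status output by eye gets tedious. The sketch below is one possible way to pick out the sessions that are still backing up. It assumes the output has the same layout as the sample above, and `active_backup_sessions` is a hypothetical helper name, not part of any NetApp tooling.

```python
SAMPLE = """\
ndmpd ON.
Session: 75
  Active
  version:                4
  Operating on behalf of primary host.
  tape device:    nrst3a
  mover state:    Active
  data state:     Active
  data operation: Backup
"""


def active_backup_sessions(status_text):
    """Return the IDs of sessions with data state Active and data operation Backup."""
    sessions = {}
    current = None
    for raw in status_text.splitlines():
        line = raw.strip()
        if line.startswith("Session:"):
            # A new session block starts; later "key: value" lines belong to it.
            current = line.split(":", 1)[1].strip()
            sessions[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            sessions[current][key.strip()] = value.strip()
    return [
        session_id
        for session_id, info in sessions.items()
        if info.get("data state") == "Active"
        and info.get("data operation") == "Backup"
    ]


still_running = active_backup_sessions(SAMPLE)
if still_running:
    print("Backup sessions still active:", ", ".join(still_running))
    print("If they do not stop, run: ndmpd killall")
else:
    print("No active backup sessions")
```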


1.3            Check SnapVault replication status:

To check the SnapVault replication status, run the command below on the affected filer:

FAS3050>snapvault status

Snapvault is ON.
Source                    Destination                 State    Lag        Status
FAS3050:/vol/VOLak1a/log  FAS6030:/vol/VOLak1asv/log  Source   09:15:13   Idle
FAS3050:/vol/VOLak2a/log  FAS6030:/vol/VOLak2asv/log  Source   08:12:43   Idle

If the status shows Transferring, and the CPU utilization is still very high and end users still experience slowness, stop the SnapVault replication:

snapvault abort secondary_system:/vol/volx/secondary_qtree

For example:

FAS3050>snapvault abort FAS6030:/vol/ntest/sg
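With many qtrees, it can help to generate the abort commands from the status output instead of typing them. The following Python sketch assumes the five-column layout shown above and builds each command from the Destination column, following the syntax given above. The sample text is made up for illustration (the second row is changed to Transferring), and `snapvault_abort_commands` is a hypothetical helper.

```python
# Illustrative sample: same layout as the status output above, with one row
# changed to "Transferring" so that there is something to abort.
SAMPLE = """\
Snapvault is ON.
Source                    Destination                 State    Lag        Status
FAS3050:/vol/VOLak1a/log  FAS6030:/vol/VOLak1asv/log  Source   09:15:13   Idle
FAS3050:/vol/VOLak2a/log  FAS6030:/vol/VOLak2asv/log  Source   08:12:43   Transferring
"""


def snapvault_abort_commands(status_text):
    """Build one 'snapvault abort' command per relationship that is transferring."""
    commands = []
    for line in status_text.splitlines():
        fields = line.split()
        # Relationship rows: source, destination, state, lag, status[, progress].
        if len(fields) >= 5 and ":/vol/" in fields[0] and fields[4] == "Transferring":
            commands.append(f"snapvault abort {fields[1]}")
    return commands


for command in snapvault_abort_commands(SAMPLE):
    # Only print the command; review it before running it on the filer.
    print(command)
```

The sketch only prints the commands, so that each abort stays a deliberate manual step.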

1.4            Check SnapMirror replication status:


FAS01> snapmirror status
Snapmirror is on.
Source       Destination  State    Lag        Status
FAS01:VOL1   FAS02:VOL1   Source   02:16:47   Idle
FAS01:VOL2   FAS02:VOL2   Source   02:19:02   Idle
FAS01:VOL3   FAS02:VOL3   Source   02:17:30   Idle
FAS01:VOL4   FAS02:VOL4   Source   02:10:36   Transferring  (2080 MB done)


If the status shows Transferring, and the CPU utilization is still very high and end users still experience slowness, stop the SnapMirror replication:

snapmirror abort destination_filer:destination_volume

For example:


FAS02*> snapmirror abort FAS01:vol01
Mon Dec 13 20:00:51 GMT [replication.src.err:error]: SnapMirror: source transfer from asiambclr1mbdb9 to FAS01:vol01 : transfer failed.
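The Lag column is also worth a look, because a large lag suggests that replication has been running or waiting for a long time. The sketch below is only an illustration: the `Relationship` class and both helper functions are assumptions, and the parsing expects the column layout of the status output shown earlier in this section. It converts the lag to seconds and lists the relationships that are currently transferring.

```python
from dataclasses import dataclass
from typing import Optional

SAMPLE = """\
Snapmirror is on.
Source       Destination  State    Lag        Status
FAS01:VOL1   FAS02:VOL1   Source   02:16:47   Idle
FAS01:VOL2   FAS02:VOL2   Source   02:19:02   Idle
FAS01:VOL3   FAS02:VOL3   Source   02:17:30   Idle
FAS01:VOL4   FAS02:VOL4   Source   02:10:36   Transferring  (2080 MB done)
"""


@dataclass
class Relationship:
    source: str
    destination: str
    state: str
    lag_seconds: Optional[int]  # None when the lag is not in HH:MM:SS form
    status: str


def lag_to_seconds(lag):
    """Convert an HH:MM:SS lag string to seconds; return None if it does not parse."""
    parts = lag.split(":")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        return None
    hours, minutes, seconds = (int(part) for part in parts)
    return hours * 3600 + minutes * 60 + seconds


def parse_snapmirror_status(text):
    """Parse relationship rows; banner and header lines are skipped."""
    relationships = []
    for line in text.splitlines():
        fields = line.split()
        # Relationship rows have filer:volume in the first two columns.
        if len(fields) >= 5 and ":" in fields[0] and ":" in fields[1]:
            relationships.append(
                Relationship(fields[0], fields[1], fields[2],
                             lag_to_seconds(fields[3]), fields[4])
            )
    return relationships


relationships = parse_snapmirror_status(SAMPLE)
for rel in relationships:
    if rel.status == "Transferring":
        print(f"{rel.source} -> {rel.destination} is transferring, "
              f"lag {rel.lag_seconds} s")
```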


1.5            Verify the CPU utilization

Run the sysstat command as shown below on the filer to verify the current CPU utilization after performing the above steps.

FAS3050 > sysstat -c 5 -i -u -s 5

 CPU   Total      Net kB/s        Disk kB/s      Tape kB/s  Cache  Cache    CP   CP  Disk
       ops/s     in    out     read   write    read  write    age    hit  time   ty  util
 55%    1630   5089   5448    69608    5591       0      0     0s    58%  100%    #   22%
 68%     417     57   2114    93420       0       0      0     0s    57%  100%    #   38%
 72%      10      4     41    86399       0       0      0     0s    57%  100%    #   27%
 64%     851   3739   4182    91529   25743       0      0     1s    57%  100%    b   24%
 65%    3805  19196  12931    50400   31652       0      0     0s    61%  100%    :   50%
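To put a number on the improvement, the samples taken before and after stopping the backup can be compared. This is a small worked example in Python using the CPU and tape write values from the outputs in this post. The `summarize` helper and the variable names are made up for illustration.

```python
from statistics import mean

# CPU % and tape write kB/s, copied from the sysstat samples in this post.
before = {
    "cpu": [95, 98, 97, 94, 95],
    "tape_write": [60017, 84689, 87375, 73157, 14575],
}
after = {
    "cpu": [55, 68, 72, 64, 65],
    "tape_write": [0, 0, 0, 0, 0],
}


def summarize(samples):
    """Average each metric over the sampled intervals."""
    return {name: mean(values) for name, values in samples.items()}


before_avg = summarize(before)
after_avg = summarize(after)
cpu_drop = before_avg["cpu"] - after_avg["cpu"]

print(f"CPU before: {before_avg['cpu']:.1f}%  after: {after_avg['cpu']:.1f}%  "
      f"drop: {cpu_drop:.1f} points")
print(f"Tape write before: {before_avg['tape_write']:.0f} kB/s  "
      f"after: {after_avg['tape_write']:.0f} kB/s")
```

With these samples, the average CPU utilization falls from about 96% to about 65% once tape writes drop to zero.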

 

1.6            Identify the cause of high CPU utilization:


Check the logs on the filer and the backup server to find out why the SnapMirror/SnapVault replication or the NDMP tape backup was running outside its scheduled window. For example: a backup overrunning its backup window.





