In case you receive an incident from the host/system team reporting a performance issue, what are your basic troubleshooting steps?
1 Steps
A performance issue on a filer is generally related to one of the following activities running on it:
- SnapVault replication in progress
- SnapMirror replication in progress
- NDMP tape backup in progress
- More than one of the above activities running simultaneously.
1.1 Check CPU utilization:
To verify that the issue is actually related to CPU utilization on the filer, run the command below.
In the output below, CPU utilization is around 96%, which is high and needs corrective action.
FAS3050> sysstat -c 5 -i -u -s 5
CPU  Total      Net kB/s     Disk kB/s     Tape kB/s Cache Cache    CP  CP  Disk
     ops/s     in    out   read  write   read  write   age   hit  time  ty  util
95%   1630   5089   5448  69608   5591      0  60017    0s   58%  100%   #   22%
98%    417     57   2114  93420      0      0  84689    0s   57%  100%   #   38%
97%     10      4     41  86399      0      0  87375    0s   57%  100%   #   27%
94%    851   3739   4182  91529  25743      0  73157    1s   57%  100%   b   24%
95%   3805  19196  12931  50400  31652      0  14575    0s   61%  100%   :   50%
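As a rough illustration of how the "around 96%" figure can be derived, the sketch below averages the CPU column of the five sample rows above. The 90% threshold and the function names are assumptions made for this example, not NetApp defaults or tools.

```python
# Data rows copied from the sysstat output above (headers omitted).
SYSSTAT_ROWS = """\
95% 1630 5089 5448 69608 5591 0 60017 0s 58% 100% # 22%
98% 417 57 2114 93420 0 0 84689 0s 57% 100% # 38%
97% 10 4 41 86399 0 0 87375 0s 57% 100% # 27%
94% 851 3739 4182 91529 25743 0 73157 1s 57% 100% b 24%
95% 3805 19196 12931 50400 31652 0 14575 0s 61% 100% : 50%
"""

CPU_THRESHOLD = 90.0  # hypothetical alert threshold, chosen for this example


def cpu_samples(text):
    """Return the CPU percentage (first column) of every data row."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a value such as "95%"; skip headers and blanks.
        if fields and fields[0].endswith("%") and fields[0][:-1].isdigit():
            samples.append(int(fields[0][:-1]))
    return samples


def average_cpu(text):
    """Average CPU utilization over all data rows (0.0 if there are none)."""
    samples = cpu_samples(text)
    return sum(samples) / len(samples) if samples else 0.0


if __name__ == "__main__":
    avg = average_cpu(SYSSTAT_ROWS)
    state = "high" if avg >= CPU_THRESHOLD else "normal"
    print(f"average CPU: {avg:.1f}% ({state})")
```

The parser only looks at the first column, so it keeps working if the remaining columns change between sysstat options.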
1.2 Check tape backup status:
The output below shows high tape write activity. To bring down CPU utilization, we can stop the tape backup.
FAS3050> sysstat -c 5 -i -u -s 5
CPU  Total      Net kB/s     Disk kB/s     Tape kB/s Cache Cache    CP  CP  Disk
     ops/s     in    out   read  write   read  write   age   hit  time  ty  util
95%   1630   5089   5448  69608   5591      0  60017    0s   58%  100%   #   22%
98%    417     57   2114  93420      0      0  84689    0s   57%  100%   #   38%
97%     10      4     41  86399      0      0  87375    0s   57%  100%   #   27%
94%    851   3739   4182  91529  25743      0  73157    1s   57%  100%   b   24%
95%   3805  19196  12931  50400  31652      0  14575    0s   61%  100%   :   50%
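To make the "high tape write activity" reading concrete, here is a small sketch that pulls the tape write column out of the sample rows. It assumes the column order of the output above, where tape write kB/s is the eighth field of each data row; the names used are hypothetical.

```python
# Data rows copied from the sysstat output above (headers omitted).
SYSSTAT_ROWS = """\
95% 1630 5089 5448 69608 5591 0 60017 0s 58% 100% # 22%
98% 417 57 2114 93420 0 0 84689 0s 57% 100% # 38%
97% 10 4 41 86399 0 0 87375 0s 57% 100% # 27%
94% 851 3739 4182 91529 25743 0 73157 1s 57% 100% b 24%
95% 3805 19196 12931 50400 31652 0 14575 0s 61% 100% : 50%
"""

# 0-based position of "Tape kB/s write" in a data row (assumed from the sample).
TAPE_WRITE_COL = 7


def tape_write_kbps(text):
    """Return the tape write throughput (kB/s) of every data row."""
    values = []
    for line in text.splitlines():
        fields = line.split()
        # Only rows that begin with a CPU percentage are data rows.
        if len(fields) > TAPE_WRITE_COL and fields[0].endswith("%"):
            values.append(int(fields[TAPE_WRITE_COL]))
    return values


def backup_is_writing(text):
    """True if any sample shows data being written to tape."""
    return any(v > 0 for v in tape_write_kbps(text))


if __name__ == "__main__":
    print(tape_write_kbps(SYSSTAT_ROWS), backup_is_writing(SYSSTAT_ROWS))
```

A non-zero tape write column while users report slowness is the hint that an NDMP backup is competing for the filer's resources.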
Stop tape backup from backup server:
- Stop the backup from the backup server by aborting the related NDMP backup group for the filer.
- If you still see NDMP save sessions on the backup server, kill those save sessions.
Verify that the backup sessions have stopped on the filer by running the command below:
FAS3050> ndmpd status
ndmpd ON.
Session: 75
  Active
  version: 4
  Operating on behalf of primary host.
  tape device: nrst3a
  mover state: Active
  data state: Active
  data operation: Backup
If you still see active backup sessions running, issue the “ndmpd killall” command on the filer.
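The check above can be sketched in code as follows. This is a minimal example that assumes the "ndmpd status" text has the same layout as the sample in this post (a "Session:" line followed by "key: value" lines); the function name is hypothetical, and real output may differ between Data ONTAP releases.

```python
# Sample text copied from the "ndmpd status" output above.
NDMPD_STATUS = """\
ndmpd ON.
Session: 75
  Active
  version: 4
  Operating on behalf of primary host.
  tape device: nrst3a
  mover state: Active
  data state: Active
  data operation: Backup
"""


def active_backup_sessions(text):
    """Return IDs of sessions with an active data state running a Backup."""
    sessions = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Session:"):
            # A new session block starts here.
            current = {"id": line.split(":", 1)[1].strip()}
            sessions.append(current)
        elif current is not None and ":" in line:
            # Collect "key: value" attributes of the current session.
            key, value = line.split(":", 1)
            current[key.strip()] = value.strip()
    return [
        s["id"]
        for s in sessions
        if s.get("data state") == "Active" and s.get("data operation") == "Backup"
    ]


if __name__ == "__main__":
    print(active_backup_sessions(NDMPD_STATUS))
```

An empty result after the backup group has been aborted suggests no backup session is left on the filer; a non-empty one is the case where the killall step applies.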
1.3 Check SnapVault replication status:
To check the SnapVault replication status, run the command below on the affected filer:
FAS3050> snapvault status
Snapvault is ON.
Source                    Destination                 State   Lag       Status
FAS3050:/vol/VOLak1a/log  FAS6030:/vol/VOLak1asv/log  Source  09:15:13  Idle
FAS3050:/vol/VOLak2a/log  FAS6030:/vol/VOLak2asv/log  Source  08:12:43  Idle
If the status shows Transferring, and CPU utilization is still very high and end users still experience slowness, stop the SnapVault replication:
snapvault abort secondary_system:/vol/volx/secondary_qtree
For example:
FAS3050> snapvault abort FAS6030:/vol/ntest/sg
1.4 Check SnapMirror replication status:
FAS01> snapmirror status
Snapmirror is on.
Source      Destination  State   Lag       Status
FAS01:VOL1  FAS02:VOL1   Source  02:16:47  Idle
FAS01:VOL2  FAS02:VOL2   Source  02:19:02  Idle
FAS01:VOL3  FAS02:VOL3   Source  02:17:30  Idle
FAS01:VOL4  FAS02:VOL4   Source  02:10:36  Transferring (2080 MB done)
If the status shows Transferring, and CPU utilization is still very high and end users still experience slowness, stop the SnapMirror replication:
snapmirror abort destination_filer:destination_volume
For example:
FAS02*> snapmirror abort FAS01:vol01
Mon Dec 13 20:00:51 GMT [replication.src.err:error]: SnapMirror: source transfer from asiambclr1mbdb9 to FAS01:vol01 : transfer failed.
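Spotting the relationships that are still transferring can be sketched as below. The example assumes the five-column layout (Source, Destination, State, Lag, Status) of the "snapmirror status" sample in this post, which the "snapvault status" sample shares; the function name is hypothetical.

```python
# Sample text copied from the "snapmirror status" output above.
SNAPMIRROR_STATUS = """\
Snapmirror is on.
Source      Destination  State   Lag       Status
FAS01:VOL1  FAS02:VOL1   Source  02:16:47  Idle
FAS01:VOL2  FAS02:VOL2   Source  02:19:02  Idle
FAS01:VOL3  FAS02:VOL3   Source  02:17:30  Idle
FAS01:VOL4  FAS02:VOL4   Source  02:10:36  Transferring (2080 MB done)
"""


def transferring(text):
    """Return (source, destination) pairs whose Status is Transferring."""
    pairs = []
    for line in text.splitlines():
        fields = line.split()
        # Relationship rows have at least five fields and a "host:path" source;
        # this skips the banner line and the column header.
        if len(fields) >= 5 and ":" in fields[0] and fields[4] == "Transferring":
            pairs.append((fields[0], fields[1]))
    return pairs


if __name__ == "__main__":
    for source, destination in transferring(SNAPMIRROR_STATUS):
        print(f"{source} -> {destination} is still transferring")
```

The sketch only reports candidates; whether to abort a transfer remains the judgment call described above, based on CPU utilization and user impact.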
1.5 Verify the CPU utilization:
After performing the above steps, issue the “sysstat” command on the filer as shown below to verify the current CPU utilization.
FAS3050> sysstat -c 5 -i -u -s 5
CPU  Total      Net kB/s     Disk kB/s     Tape kB/s Cache Cache    CP  CP  Disk
     ops/s     in    out   read  write   read  write   age   hit  time  ty  util
55%   1630   5089   5448  69608   5591      0      0    0s   58%  100%   #   22%
68%    417     57   2114  93420      0      0      0    0s   57%  100%   #   38%
72%     10      4     41  86399      0      0      0    0s   57%  100%   #   27%
64%    851   3739   4182  91529  25743      0      0    1s   57%  100%   b   24%
65%   3805  19196  12931  50400  31652      0      0    0s   61%  100%   :   50%
1.6 Identify the cause of high CPU utilization:
Check the logs on the filer and the backup server to find out why the SnapMirror/SnapVault replication or NDMP tape backup was running outside its scheduled window. For example, a backup may have overrun its backup window.