Performance monitoring is a complex topic, but it is vital to successfully implementing and maintaining any system. In the past I’ve written several posts about using Perl to gather performance statistics from a 7-mode system (running ONTAP 7.3.x, which is quite old at this point), so I thought it was a good time for an update.
I originally documented some of this information in a response on the NetApp Community site. This post expands on that a bit and documents it externally.
The NetApp PowerShell Toolkit has three cmdlets which we can use to determine what objects, counters, and instances are available, and a fourth cmdlet to actually collect the data.
Finding the Right Performance Object
Performance reporting in the clustered Data ONTAP API is broken out by two things: Object and Counter. In order to monitor something, for example aggregate performance, we need to find the object which pertains to that “something”. We do this using the Get-NcPerfObject cmdlet.
Throughout the rest of this post I will be using the example of aggregate monitoring, specifically how many reads and writes are being done against an aggregate.
PS C:\> Get-NcPerfObject
Name PrivilegeLevel
---- --------------
affinity diag
affiperclass diag
affiperqid diag
affitotal diag
aggregate admin
...
...
...
For my cDOT 8.3 cluster this returned 358 items, which is a lot of different categories of monitoring! In many cases we can narrow the list by using the PrivilegeLevel property. The most commonly monitored things are going to be at either the admin or advanced privilege level, whereas diag is used for very detailed, infrequently needed counters. To view non-diag objects, we change the command slightly.
Get-NcPerfObject | ?{ $_.PrivilegeLevel -ne "diag" }
PS C:\Users\Andrew> Get-NcPerfObject | ?{ $_.PrivilegeLevel -ne "diag" }
Name PrivilegeLevel
---- --------------
aggregate admin
audit_ng admin
audit_ng:vserver admin
cifs admin
cifs:node admin
cifs:vserver admin
client admin
client:vserver admin
cluster_peer admin
cpx admin
cpx_op advanced
disk admin
disk:constituent admin
disk:raid_group admin
ext_cache admin
ext_cache_obj admin
This results in just 113 objects returned, a much shorter list to consider. This privilege level also indicates how much permission on the cluster the user collecting the information will need. A user with diag privileges is going to have considerably more permission on the cluster than one with only admin or advanced.
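To see how the objects break down by privilege level before deciding what to filter out, grouping can help. Here is a minimal sketch that uses a few hand-typed stand-in objects so it runs without a cluster. The $perfObjects sample is made up for illustration; in practice the input would be the output of Get-NcPerfObject.

```powershell
# Stand-in data with the same two properties Get-NcPerfObject displays above.
# These rows are illustrative only, not a real cluster listing.
$perfObjects = @(
    [pscustomobject]@{ Name = 'affinity';  PrivilegeLevel = 'diag' }
    [pscustomobject]@{ Name = 'aggregate'; PrivilegeLevel = 'admin' }
    [pscustomobject]@{ Name = 'cifs';      PrivilegeLevel = 'admin' }
    [pscustomobject]@{ Name = 'cpx_op';    PrivilegeLevel = 'advanced' }
)

# Count how many objects exist at each privilege level
$summary = $perfObjects |
    Group-Object -Property PrivilegeLevel |
    Select-Object Name, Count

# Keep only the objects that do not require diag privileges
$nonDiag = @($perfObjects | Where-Object { $_.PrivilegeLevel -ne 'diag' })

$summary
```

Swapping $perfObjects for a live Get-NcPerfObject call should give the same kind of summary for a real cluster.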
Finding the Counters
The objects give us a categorical view of what’s available. To find out which counters are collected for each one, we use the Get-NcPerfCounter cmdlet. Using the aggregate object as an example, we see the following:
PS C:\Users\Andrew> Get-NcPerfCounter -Name aggregate | ?{ $_.PrivilegeLevel -ne "diag" } | Select-Object Name,PrivilegeLevel,Unit,Properties,Desc | Format-Table
Name                  PrivilegeLevel Unit    Properties        Desc
----                  -------------- ----    ----------        ----
cp_read_blocks        admin          per_sec rate              Number of blocks read per second during a CP on the aggregate
cp_read_blocks_hdd    admin          per_sec rate              Number of blocks read per second during a CP on the aggregate HDD disks
cp_read_blocks_ssd    admin          per_sec rate              Number of blocks read per second during a CP on the aggregate SSD disks
cp_reads              admin          per_sec rate              Number of reads per second done during a CP to the aggregate
cp_reads_hdd          admin          per_sec rate              Number of reads per second done during a CP to the aggregate HDD disks
cp_reads_ssd          admin          per_sec rate              Number of reads per second done during a CP to the aggregate SSD disks
instance_name         admin          none    string            Name of the aggreagte instance
instance_uuid         admin          none    string            UUID for aggregate instance
node_name             admin          none    string            Node Name
node_uuid             admin          none    string,no-display System node id
total_transfers       admin          per_sec rate              Total number of transfers per second serviced by the aggregate
total_transfers_hdd   admin          per_sec rate              Total number of transfers per second serviced by the aggregate HDD disks
total_transfers_ssd   admin          per_sec rate              Total number of transfers per second serviced by the aggregate SSD disks
user_read_blocks      admin          per_sec rate              Number of blocks read per second on the aggregate
user_read_blocks_hdd  admin          per_sec rate              Number of blocks read per second on the aggregate HDD disks
user_read_blocks_ssd  admin          per_sec rate              Number of blocks read per second on the aggregate SSD disks
user_reads            admin          per_sec rate              Number of user reads per second to the aggregate
user_reads_hdd        admin          per_sec rate              Number of user reads per second to the aggregate HDD disks
user_reads_ssd        admin          per_sec rate              Number of user reads per second to the aggregate SSD disks
user_write_blocks     admin          per_sec rate              Number of blocks written per second to the aggregate
user_write_blocks_hdd admin          per_sec rate              Number of blocks written per second to the aggregate HDD disks
user_write_blocks_ssd admin          per_sec rate              Number of blocks written per second to the aggregate SSD disks
user_writes           admin          per_sec rate              Number of user writes per second to the aggregate
user_writes_hdd       admin          per_sec rate              Number of user writes per second to the aggregate HDD disks
user_writes_ssd       admin          per_sec rate              Number of user writes per second to the aggregate SSD disks
Notice that, once again, I removed the counters which are at the diag level. You may want to look at them, but for the most part they only infrequently need to be monitored because they are very low-level details.
I included the properties field because it’s very important…it tells us how to read the counter. From the API documentation:
- raw: single counter value is used
- delta: change in counter value between two samples is used
- rate: delta divided by the time in seconds between samples is used
- average: delta divided by the delta of a base counter is used
- percent: 100*average is used
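To make those definitions concrete, here is a small worked sketch of the arithmetic behind each property type. The two samples, the base counter values, and the 5-second gap are all invented for illustration and do not come from a real cluster.

```powershell
# Two hypothetical samples of a counter and its base counter, taken 5 seconds apart.
# Every number here is made up to illustrate the arithmetic.
$sample1 = @{ Value = 1000; Base = 200; Time = 0 }
$sample2 = @{ Value = 1150; Base = 500; Time = 5 }

$raw     = $sample2.Value                             # raw: the single counter value as-is
$delta   = $sample2.Value - $sample1.Value            # delta: change between the two samples
$rate    = $delta / ($sample2.Time - $sample1.Time)   # rate: delta divided by elapsed seconds
$average = $delta / ($sample2.Base - $sample1.Base)   # average: delta divided by the base counter's delta
$percent = 100 * $average                             # percent: 100 * average

[pscustomobject]@{
    raw     = $raw
    delta   = $delta
    rate    = $rate
    average = $average
    percent = $percent
}
```

With these numbers the delta is 150, so the rate is 30 per second, the average is 0.5, and the percent is 50.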
Looking at the descriptions, it appears that we want the user_reads, user_writes, and total_transfers counters to determine how much activity is happening on our aggregate. Each of these is a rate counter, which means we need to measure it once, wait some known amount of time (e.g. 5 seconds), then measure again and divide the difference between the two values by the number of seconds.
Instances of the Object
Now that we know the objects and counters, and we’ve determined what we want to monitor, we need to find the instances. To do that we use the Get-NcPerfInstance cmdlet.
PS C:\Users\Andrew> Get-NcPerfInstance -Name aggregate | Where-Object { $_.Name -notlike "*root" }
Name Uuid
---- ----
VICE01_aggr1_sas 96f8b6c9-4444-11b2-be67-123478563412
VICE02_aggr1_sas 49f45938-45a8-11b2-9ea8-123478563412
VICE03_aggr1_sas 0b916a30-45a8-11b2-9a6d-123478563412
VICE04_aggr1_sas 6ee009b9-45a8-11b2-8bac-123478563412
VICE05_aggr1_sata 8dffa99a-45a8-11b2-839d-123478563412
VICE06_aggr1_sata 15c61be8-b5a6-4db1-b61a-8566bd967c32
I excluded root aggregates from this listing using the Where-Object snippet because I’m not interested in those at this time.
Reporting Performance
We now have everything needed to monitor performance: the object, the counters, and the instance. We use the Get-NcPerfData cmdlet to query for information.
Get-NcPerfData -Name aggregate -Instance VICE01_aggr1_sas -Counter user_reads,user_writes,total_transfers
Here is what it looks like in action:
PS C:\> (Get-NcPerfData -Name aggregate -Instance VICE01_aggr1_sas -Counter user_reads,user_writes,total_transfers).counters | Select-Object Name,Value
Name Value
---- -----
total_transfers 10477200561
user_reads 10168492251
user_writes 157344312
Remember that these are rate counters. To determine the per-second values, we simply sample twice and divide the difference by the number of seconds between the samples:
# collect the first values
$one = (Get-NcPerfData -Name aggregate -Instance VICE01_aggr1_sas -Counter user_reads,user_writes,total_transfers).counters
# wait a few seconds
Start-Sleep -Seconds 5
# collect the second values
$two = (Get-NcPerfData -Name aggregate -Instance VICE01_aggr1_sas -Counter user_reads,user_writes,total_transfers).counters
# an object to print results in
$result = "" | Select-Object "user_reads","user_writes","total_transfers"
# do the math for each counter...(value_at_t2 - value_at_t1) / time
$result.user_reads = (($two | ?{ $_.Name -eq "user_reads" }).value - ($one | ?{ $_.Name -eq "user_reads" }).value ) / 5
$result.user_writes = (($two | ?{ $_.Name -eq "user_writes" }).value - ($one | ?{ $_.Name -eq "user_writes" }).value ) / 5
$result.total_transfers = (($two | ?{ $_.Name -eq "total_transfers" }).value - ($one | ?{ $_.Name -eq "total_transfers" }).value ) / 5
# print the result
$result
And the output. Remember, this is a per-second average over the time between polls (5 seconds in this instance):
user_reads user_writes total_transfers
---------- ----------- ---------------
47.4 18.6 81.6
We can modify this slightly to get a per-second report for an aggregate:
$aggregate = "VICE01_aggr1_sas"
$waitSeconds = 1
Write-Host "user_reads user_writes total_transfers"
Write-Host "---------- ----------- ---------------"
# collect the first values
$one = (Get-NcPerfData -Name aggregate -Instance $aggregate -Counter user_reads,user_writes,total_transfers).counters
while ($true) {
    # wait a bit
    Start-Sleep -Seconds $waitSeconds
    # collect the second values
    $two = (Get-NcPerfData -Name aggregate -Instance $aggregate -Counter user_reads,user_writes,total_transfers).counters
    # an object to print results in
    $result = "" | Select-Object "user_reads","user_writes","total_transfers"
    # do the math for each counter...(value_at_t2 - value_at_t1) / time...and print
    $result.user_reads = (($two | ?{ $_.Name -eq "user_reads" }).value - ($one | ?{ $_.Name -eq "user_reads" }).value ) / $waitSeconds
    $result.user_writes = (($two | ?{ $_.Name -eq "user_writes" }).value - ($one | ?{ $_.Name -eq "user_writes" }).value ) / $waitSeconds
    $result.total_transfers = (($two | ?{ $_.Name -eq "total_transfers" }).value - ($one | ?{ $_.Name -eq "total_transfers" }).value ) / $waitSeconds
    # format the output and display it
    "{0,10} {1,11} {2,15}" -f $result.user_reads,$result.user_writes,$result.total_transfers
    # set the starting values for the next iteration
    $one = $two
}
This gives us an easy-to-read, per-second output of the number of reads, writes, and total transfers for our aggregate:
user_reads user_writes total_transfers
---------- ----------- ---------------
       102           0             102
         0           0               0
         1           0               1
         0           0               0
         7          26              89
         1          40              58
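The subtract-and-divide step is repeated for every counter in the scripts above, so it could be factored into a helper. The following is only a sketch: Get-CounterRate is a hypothetical function name, not part of the NetApp PowerShell Toolkit, and the sample counter values are invented so the snippet runs without a cluster. It assumes each counter object exposes Name and Value, as in the Select-Object output shown earlier.

```powershell
# Hypothetical helper (not a Toolkit cmdlet): turns two polls into per-second rates.
function Get-CounterRate {
    param(
        [Parameter(Mandatory)] $First,           # counters from the first poll
        [Parameter(Mandatory)] $Second,          # counters from the second poll
        [Parameter(Mandatory)] [double] $Seconds # time between the two polls
    )
    $rates = [ordered]@{}
    foreach ($counter in $Second) {
        # find the same counter in the earlier poll
        $previous = $First | Where-Object { $_.Name -eq $counter.Name }
        # (value_at_t2 - value_at_t1) / time
        $rates[$counter.Name] = ([double]$counter.Value - [double]$previous.Value) / $Seconds
    }
    [pscustomobject]$rates
}

# Invented sample data standing in for two Get-NcPerfData polls taken 5 seconds apart
$first = @(
    [pscustomobject]@{ Name = 'user_reads';      Value = 1000 }
    [pscustomobject]@{ Name = 'user_writes';     Value = 400 }
    [pscustomobject]@{ Name = 'total_transfers'; Value = 2000 }
)
$second = @(
    [pscustomobject]@{ Name = 'user_reads';      Value = 1250 }
    [pscustomobject]@{ Name = 'user_writes';     Value = 500 }
    [pscustomobject]@{ Name = 'total_transfers'; Value = 2450 }
)

$rates = Get-CounterRate -First $first -Second $second -Seconds 5
$rates
```

In a live script, passing the measured elapsed time between polls (for example from a [System.Diagnostics.Stopwatch]) rather than the nominal sleep interval would account for the time spent in the API calls themselves.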
Performance Monitoring is Fun!
This has been just a short introduction to performance monitoring of a cDOT system using the PowerShell Toolkit. There is a huge number of things that can be monitored, and you can choose to display the information however you like: a real-time report of performance for troubleshooting, intermittent collection to go into a summary report, or collection at regular intervals to feed into a trend analysis tool.
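As one possible way to feed a trend analysis tool, per-interval results can be tagged with a timestamp and written out as CSV. This is a sketch under assumptions: the timestamps are hypothetical, the rows are hand-typed to match the shape of the per-second results, and a real script would build each row inside the polling loop.

```powershell
# Hand-typed rows shaped like the per-second results; the timestamps are hypothetical.
$samples = @(
    [pscustomobject]@{ Timestamp = '2015-01-01T00:00:00'; user_reads = 102; user_writes = 0;  total_transfers = 102 }
    [pscustomobject]@{ Timestamp = '2015-01-01T00:00:01'; user_reads = 7;   user_writes = 26; total_transfers = 89 }
)

# Convert to CSV text: one header line plus one line per sample
$csv = $samples | ConvertTo-Csv -NoTypeInformation
$csv
```

Piping $samples to Export-Csv with a file path instead would write the same rows to disk.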
Please reach out to me using the comments below or the NetApp Community site with any questions about how to collect performance information from your systems.
The post cDOT Performance Monitoring Using PowerShell appeared first on The Practical Administrator.