--- Sjoerd Hooft's InFormation Technology ---

reallocate [2013/04/20 09:32] (current)
sjoerd created
 += NetApp Reallocate =
  
 += What is Reallocation =
 +Reallocate optimizes the layout of data on disk for "Sequential Read Access". There are three types of reallocation: volume/LUN reallocation, aggregate reallocation and read reallocation. The first one comes in two flavors, traditional and physical. When most engineers talk about reallocation they mean volume/LUN reallocation, so in this article we'll do the same.
 +\\
 +
 +== Reallocation ==
 +Reallocation should be run as follows. First you measure the current layout, which gives you an optimization score between 1 (optimized) and 10 (very unoptimized). You set a threshold (4 by default), and if the optimization score is above the threshold you run the reallocation process.
 +\\
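The measure-then-decide rule above boils down to a single comparison. A minimal Python sketch, assuming only what the text states (a score from 1 to 10 and a default threshold of 4); the function name `should_reallocate` is hypothetical and not part of ONTAP:

```python
DEFAULT_THRESHOLD = 4  # default threshold mentioned in the text


def should_reallocate(score: int, threshold: int = DEFAULT_THRESHOLD) -> bool:
    """Return True when a measured layout score calls for reallocation.

    Hypothetical helper: 1 means optimized, 10 means very unoptimized, and
    reallocation is only worth it when the score is above the threshold.
    """
    if not 1 <= score <= 10:
        raise ValueError(f"score must be between 1 and 10, got {score}")
    return score > threshold


if __name__ == "__main__":
    for measured in (3, 4, 8):
        print(measured, should_reallocate(measured))
```

Note that a score equal to the threshold does not trigger reallocation; only a score above it does.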
 +Reallocation has two flavors, traditional and physical. The [[https://fieldportal.netapp.com/viewcontent.asp?qv=1&docid=33904|TR-3929]] guide from NetApp provides an excellent description of these two flavors:
 +
 +=== Traditional Reallocation ===
 +The reallocation process progresses through the file system and moves data blocks by rewriting them when Data ONTAP determines that the layout can be improved. If no improvement is predicted, no data is moved. NetApp Snapshot® data is not moved even when active file system data has been moved to new, optimized locations. Because data is rewritten to disk, if Snapshot copies are used, additional space is required to maintain the copies.
 +
 +=== Physical Reallocation ===
 +The reallocate tool also provides a physical reallocation option. Physical reallocation follows the same process as traditional reallocation; however, instead of completely rewriting data to the disks, the data blocks are moved by changing the physical block location while maintaining the logical block location within the FlexVol® volume. The benefit of using physical reallocation is that no additional space is required for Snapshot copies, compared to using traditional reallocation.
 +
 +> Note that best practice is to always use physical reallocation if possible.
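To make the space difference concrete, here is a toy model in Python. It is not how WAFL works internally; it only assumes, as the quote says, that traditional reallocation writes moved blocks to new locations while existing Snapshot copies keep the old blocks, and that physical reallocation keeps the logical location so Snapshot copies need nothing extra. The block numbers are made up.

```python
from typing import Set


def extra_blocks_needed(moved: Set[int], in_snapshot: Set[int], physical: bool) -> int:
    """Toy estimate of the extra blocks a reallocation run consumes.

    moved: logical blocks the reallocation decides to move.
    in_snapshot: logical blocks still referenced by a Snapshot copy.
    """
    if physical:
        # Physical reallocation keeps the logical block location, so the
        # Snapshot copy does not pin a second copy of the data.
        return 0
    # Traditional reallocation writes a new copy; the old copy stays
    # allocated for as long as a Snapshot copy references it.
    return len(moved & in_snapshot)


if __name__ == "__main__":
    moved = set(range(100))       # 100 blocks get moved
    in_snapshot = set(range(50))  # half of them are also in a Snapshot copy
    print("traditional:", extra_blocks_needed(moved, in_snapshot, physical=False))
    print("physical:", extra_blocks_needed(moved, in_snapshot, physical=True))
```

In this model the traditional run pins 50 extra blocks and the physical run none, which is the reason behind the best-practice note.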
 +
 +== Aggregate Reallocation ==
 +The [[https://fieldportal.netapp.com/viewcontent.asp?qv=1&docid=33904|TR-3929]] guide from NetApp also provides an excellent description of aggregate reallocation:
 +
 +An additional option, -A, is available at the aggregate abstraction. This reallocation method reallocates blocks within an aggregate to improve contiguous free space. The -A option does not reallocate all of the data in the aggregate following the normal reallocation method. It should not be used to improve sequential read performance. Because aggregate reallocation uses the physical reallocation method to move blocks to create contiguous free space, the impacts of using physical reallocate still apply.
 +
 +> Note that best practice is to use the -A option only when directed to do so by NetApp.
 +
 +== Read Reallocate ==
 +The [[https://fieldportal.netapp.com/viewcontent.asp?qv=1&docid=33904|TR-3929]] guide from NetApp also provides an excellent description of read reallocation:
 +Read reallocate is a volume option that performs opportunistic reallocation on data to improve performance. Read reallocation uses the normal workload reads along with the read-ahead engine to determine the current layout optimization. If the read was less than optimal, the data will be reallocated to improve the next read of this data. Read reallocate offers both the traditional and physical reallocation methods associated with the reallocate command. Also, because read reallocate uses the existing read workload, it does not require additional scanning or scheduling.
 +
 +=== Should Read Reallocate Be Turned On ===
 +Now, the thing is, read reallocation happens after you have paid the penalty for unoptimized data reads. But it will prevent you from paying that penalty the next time. It doesn't have a big impact on performance because it uses the existing read workload.
 +\\
 +Turning it on is quite simple:
 +<code>
 +vol options <myvol> read_realloc on
 +</code>
 +And [[https://communities.netapp.com/community/netapp-blogs/pseudo_benchmark/blog/2011/06/29/readrealloc|this]] and [[http://jakub.wartak.pl/blog/?p=343|this]] are the reports of two people who tested it.
 +\\
 +I think that the conclusion is that if you have a lot of sequential reads in your environment you should turn it on, monitor the impact and use it in combination with physical reallocation for your LUNs and volumes.
 +
 += Reallocation Versus Aggregate Reallocation =
 +If you're not sure whether you understand the difference between (volume/LUN) reallocation and aggregate reallocation, I suggest you read this excellent article from Erick Moore. It uses pictures to explain it ;):
 +
 +http://www.theselights.com/2010/03/understanding-netapp-volume-and.html
 +
 += More Information =
 +
 +== Reallocate Command ==
 +
 +<code>
 +> reallocate
 +usage:
 +reallocate on | off
 +reallocate start [-t threshold] [-p] [-o] [-n] [-i interval] <path> | /vol/<volname>
 +reallocate start -f [-p] <path> | /vol/<volname>
 +reallocate start -A [-o] [-i interval] <aggr_name>
 +  NOTE: -A is for aggregate (freespace) reallocation.
 +        Do NOT use -A after growing an aggregate if you wish to
 +        optimize the layout of existing data; instead use
 +            reallocate start -f /vol/<volname>
 +        for each volume in the aggregate.
 +reallocate status [-v] [<path> | <aggr_name>]
 +reallocate stop <path> | <aggr_name>
 +reallocate quiesce <path> | <aggr_name>
 +reallocate restart [-i] <path> | <aggr_name>
 +reallocate schedule [-d] [-s <schedule>] <path> | <aggr_name>
 +reallocate measure [-l logfile] [-t threshold] [-o] [-i interval] <path> | /vol/<volname>
 +</code>
 + 
 +Reallocate parameters:
 +* -p
 +** Executes reallocate by using physical reallocation. Generally recommended.
 +* -f
 +** Executes a forced (full) reallocation. Generally not recommended.
 +* -o
 +** Executes reallocation one time only.
 +* -n
 +** Executes reallocation without measuring the layout first.
 +* -t
 +** Forces reallocate to use a custom threshold.
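A small Python sketch that assembles `reallocate start` command lines from the options above and rejects combinations the usage text does not allow (per the usage, `-f` can only be combined with `-p`). The helper name is hypothetical, and the `-i interval` and `-A` options are deliberately left out; it only builds strings and does not talk to a filer.

```python
def build_reallocate_start(path: str, physical: bool = False, force: bool = False,
                           once: bool = False, no_check: bool = False,
                           threshold=None) -> str:
    """Build a 'reallocate start' command line following the usage text."""
    if force and (once or no_check or threshold is not None):
        # Usage: reallocate start -f [-p] <path> | /vol/<volname>
        raise ValueError("-f can only be combined with -p")
    args = ["reallocate", "start"]
    if force:
        args.append("-f")
    if threshold is not None:
        args += ["-t", str(threshold)]
    if physical:
        args.append("-p")
    if once:
        args.append("-o")
    if no_check:
        args.append("-n")
    args.append(path)
    return " ".join(args)


if __name__ == "__main__":
    # The two start commands used later in this article.
    print(build_reallocate_start("/vol/volname/lunname", force=True, physical=True))
    print(build_reallocate_start("/vol/volname/lunname", threshold=3, physical=True))
```

The options are emitted in the same order as in the usage text, so the output can be compared against it at a glance.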
 +
 += Run Reallocate After Adding Storage to Aggregate =
 +
 +In [[netappaggrexpansion|this article]] we had to run reallocation after adding disks to the aggregate. Please read that article for the performance impact. To run reallocate after adding disks, issue this command for every volume in the aggregate:
 +<code>
 +filer01> reallocate start -f /vol/Volume_1
 +Reallocation scan will be started on '/vol/Volume_1'.
 +Monitor the system log for results.
 +</code>
 +
 +Note that you only use the above command after disks have been added to an aggregate. Adding disks can create hotspots because:
 +# New disks are empty
 +# So that's where the new data goes
 +# New data is the most used data
 +# So the new disks become the most used disks, for both reading and writing.
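A rough back-of-the-envelope calculation shows why the new disks become hotspots. It is a simplified Python model, not WAFL's actual allocator: it assumes new writes are spread over the disks in proportion to their free space. The disk counts and fill level are made-up example values.

```python
def new_disk_write_share(old_disks: int, old_used_fraction: float, new_disks: int) -> float:
    """Fraction of new writes landing on the new disks (simplified model).

    Assumption: writes are distributed in proportion to free space, and the
    new disks start out completely empty.
    """
    free_on_old = old_disks * (1.0 - old_used_fraction)
    free_on_new = new_disks * 1.0
    return free_on_new / (free_on_old + free_on_new)


if __name__ == "__main__":
    # Example: 8 existing disks that are 90% full, plus 2 new empty disks.
    share = new_disk_write_share(old_disks=8, old_used_fraction=0.9, new_disks=2)
    print(f"{share:.0%} of new writes go to 20% of the disks")
```

In this example roughly 71% of the new writes hit the two new disks, and since new data tends to be the most used, the reads follow.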
 +
 +Also note that in the long run this will solve itself. In the short run, run reallocate to bring performance back to the level it was at before the disks were added to the aggregate.
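With many volumes in the aggregate, generating the per-volume commands avoids typos. A minimal Python sketch; the volume names other than `Volume_1` are hypothetical placeholders, and it only prints the command from the usage note (`reallocate start -f /vol/<volname>`) for you to paste on the filer console.

```python
def reallocate_commands(volumes):
    """Return one full-reallocation command per volume in the aggregate."""
    return [f"reallocate start -f /vol/{name}" for name in volumes]


if __name__ == "__main__":
    # Hypothetical volume names; replace them with the volumes in your aggregate.
    for command in reallocate_commands(["Volume_1", "Volume_2", "Volume_3"]):
        print(command)
```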
 +
 += Run Reallocate to Improve Performance =
 +
 +If you experience lots of latency, and performance is not what you'd expect from your super filer, reallocation might be what you're looking for. Might be. Just remember that reallocate only optimizes for "Sequential Reads". If you don't know what that means, keep on reading; it is explained below.
 +\\
 +As explained before, to reallocate a volume or a LUN you go through the following steps:
 +* Check the stats
 +* Check the reallocation level
 +* Start the reallocation
 +* Schedule the reallocation
 +
 +== Checking the Stats ==
 +You can start by checking the stats of the LUN. The command below shows, every second (-i 1), the number of read, write and other operations, the read and write throughput in kB, the average latency and the queue length:
 +<code>
 +lun stats -o -i 1 /vol/volname/lunname
 +</code>
 +For example:
 +<code>
 +filer01a> lun stats -o -i 1 /vol/SATA_PRD/192_SATA_PRD
 + Read Write Other QFull   Read  Write Average   Queue     Partner  Lun
 +  Ops   Ops   Ops           kB     kB Latency  Length   Ops     kB
 +    0     0     0     0      0      0    0.00    0.00     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    0    16     0     0      0    170    1.06    2.01     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    1    28     0     0      8    185    0.44    0.09     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    0    23     0     0      0    195    5.69    0.04     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    4    31     0     0     32    104    0.65    1.04     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    0     1     0     0      0      4    0.00    0.08     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    0     8     0     0      0     44    0.62    1.00     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    0    15     0     0      0     52    2.40    1.00     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    0     1     0     0      0      4    1.00    0.01     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    0     4     0     0      0     20    3.50    1.05     0      0 /vol/SATA_PRD/192_SATA_PRD
 +---
 +    0     4     0     0      0      4    0.25    1.01     0      0 /vol/SATA_PRD/192_SATA_PRD
 +
 +</code>
 +You can stop the stats by pressing <CTRL>+C.
 +\\
 +Note that this can only be done on the LUN level; as far as I know there is no equivalent command for a volume. For that level of monitoring we use "NetApp Management Console" combined with "[[netappoperationsmanager|NetApp Operations Manager]]".
 +\\
 +Now the hard part is to analyze these stats. If the queue length is high, the LUN might simply be too heavily utilized and you need to spread the load. If latency is really high it might be a reallocation issue, but it could also be a faulty disk, a configuration error, a filer under heavy load, etc.
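Eyeballing a long run of samples gets tedious. A minimal Python sketch that parses captured `lun stats -o -i 1` output (copied from the console, as in the example above) and summarizes latency and queue length. It assumes the column layout shown above (ten numeric columns followed by the LUN path); it only summarizes the numbers and does not decide what counts as "high", which depends on your environment.

```python
from dataclasses import dataclass


@dataclass
class LunSample:
    read_ops: int
    write_ops: int
    other_ops: int
    qfull: int
    read_kb: int
    write_kb: int
    avg_latency: float
    queue_length: float
    partner_ops: int
    partner_kb: int
    lun: str


def parse_lun_stats(text: str) -> list:
    """Parse data rows; header rows, '---' separators and blank lines are skipped."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 11 or not fields[-1].startswith("/vol/"):
            continue
        counts = [int(value) for value in fields[0:6]]
        latency, queue = float(fields[6]), float(fields[7])
        partner_ops, partner_kb = int(fields[8]), int(fields[9])
        samples.append(LunSample(*counts, latency, queue, partner_ops, partner_kb, fields[10]))
    return samples


def summarize(samples: list) -> dict:
    """Average and peak latency plus peak queue length over all samples."""
    if not samples:
        raise ValueError("no samples found")
    latencies = [s.avg_latency for s in samples]
    return {
        "samples": len(samples),
        "avg_latency": sum(latencies) / len(latencies),
        "max_latency": max(latencies),
        "max_queue_length": max(s.queue_length for s in samples),
    }


SAMPLE = """\
 Read Write Other QFull   Read  Write Average   Queue     Partner  Lun
  Ops   Ops   Ops           kB     kB Latency  Length   Ops     kB
    0    16     0     0      0    170    1.06    2.01     0      0 /vol/SATA_PRD/192_SATA_PRD
---
    0    23     0     0      0    195    5.69    0.04     0      0 /vol/SATA_PRD/192_SATA_PRD
---
    4    31     0     0     32    104    0.65    1.04     0      0 /vol/SATA_PRD/192_SATA_PRD
"""

if __name__ == "__main__":
    print(summarize(parse_lun_stats(SAMPLE)))
```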
 +
 +== Check Reallocation Level ==
 +Checking the current reallocation level is done by issuing a measure command. To check the reallocation level (one time only) on a LUN, do this:
 +<code>
 +reallocate measure -o /vol/volname/lunname
 +</code>
 +To check the reallocation level (one time only) on a volume, do this:
 +<code>
 +reallocate measure -o /vol/volname
 +</code>
 +For example:
 +<code>
 +filer01a> reallocate measure /vol/SATA_PRD/192_SATA_PRD
 +Reallocation scan will be started on '/vol/SATA_PRD/192_SATA_PRD'.
 +Monitor the system log for results.
 +filer01a> reallocate measure /vol/SATA_DMZ
 +Reallocation scan will be started on '/vol/SATA_DMZ'.
 +Monitor the system log for results.
 +</code>
 +As the command output already tells you, the results are logged in the system log, just like the start of the scan:
 +<code>
 +Event: wafl.scan.start
 +Severity: info
 +Message: Starting WAFL layout measurement on volume SATA_PRD.
 +Triggered: Wed Oct 17 09:55:18 CEST
 +
 +Event: wafl.scan.start
 +Severity: info
 +Message: Starting WAFL layout measurement on volume SATA_DMZ.
 +Triggered: Wed Oct 17 09:56:16 CEST
 +</code>
 +
 +> Weird, both show the volume while the first should show the LUN...
 +
 +Results:
 +<code>
 +Event: wafl.reallocate.check.value
 +Severity: info
 +Message: Allocation measurement check on '/vol/SATA_PRD/192_SATA_PRD' is 3.
 +Triggered: Wed Oct 17 10:04:19 CEST
 +
 +Event: wafl.reallocate.check.value
 +Severity: info
 +Message: Allocation measurement check on '/vol/n_04A_SATA_DMZ' is 2.
 +Triggered: Wed Oct 17 10:21:51 CEST
 +</code>
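When several volumes and LUNs are measured, the scores can be pulled out of the system log instead of read by hand. A minimal Python sketch; it assumes the message text matches the `wafl.reallocate.check.value` messages shown above and that you have copied the log text into a string. The helper names are hypothetical.

```python
import re

# Matches messages like: Allocation measurement check on '/vol/x' is 3.
CHECK_RE = re.compile(r"Allocation measurement check on '(?P<path>[^']+)' is (?P<score>\d+)\.")


def parse_measurements(log_text: str) -> dict:
    """Map each measured path to its optimization score."""
    return {m.group("path"): int(m.group("score")) for m in CHECK_RE.finditer(log_text)}


def above_threshold(scores: dict, threshold: int = 4) -> list:
    """Paths whose score is above the threshold (candidates for reallocation)."""
    return sorted(path for path, score in scores.items() if score > threshold)


LOG = """\
Message: Allocation measurement check on '/vol/SATA_PRD/192_SATA_PRD' is 3.
Message: Allocation measurement check on '/vol/n_04A_SATA_DMZ' is 2.
"""

if __name__ == "__main__":
    scores = parse_measurements(LOG)
    print(scores)
    print(above_threshold(scores))  # empty with the default threshold of 4
```

With scores of 3 and 2, neither path is above the default threshold of 4, so neither would be reallocated.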
 +
 +== Start Reallocation ==
 +If a LUN or a volume has a high score (above your threshold), optimize it by starting the reallocation:
 +<code>
 +reallocate start -f -p /vol/volname/lunname
 +</code>
 +Note that -f does a full reallocation and -p does a physical reallocation, which is faster and doesn't degrade performance that much.
 +
 +== Schedule Reallocation ==
 +Before you can schedule anything you'll first have to create the job you want to schedule; scheduling can only be done on existing jobs. So first create a job with a specific threshold, so that the data is only reallocated when the measured score is above that threshold:
 +<code>
 +reallocate start -t 3 -p /vol/volname/lunname
 +</code>
 +
 +> Note that the above command will immediately start a measurement of the system.
 +
 +Then use the schedule command to schedule the job. Note that you can only schedule it to start at a specific time; you cannot schedule it to stop at a specific time:
 +<code>
 +reallocate schedule -s "0 23 * 6" /vol/volname/lunname
 +</code>
 +The above example schedules the job for every Saturday at 23:00 hours. The schedule string looks a bit like scheduling with [[cron]], except that the month field is missing:
 +<code>
 +-s  schedule is a string with the following fields:
 +
 +minute hour day_of_month day_of_week
 +
 +    minute is a value from 0 to 59.
 +    hour is a value from 0 (midnight) to 23 (11:00 p.m.).
 +    day_of_month is a value from 1 to 31.
 +    day_of_week is a value from 0 (Sunday) to 6 (Saturday).
 +</code>
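A minimal Python sketch that validates a schedule string against the field ranges listed above before you paste it into the command. It assumes each field is either a single number or `*` (any value), as in the example `"0 23 * 6"`; other forms ONTAP might accept are not covered, and the helper name is hypothetical.

```python
FIELDS = (
    ("minute", 0, 59),
    ("hour", 0, 23),
    ("day_of_month", 1, 31),
    ("day_of_week", 0, 6),  # 0 = Sunday ... 6 = Saturday
)
DAY_NAMES = ("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")


def parse_schedule(schedule: str) -> dict:
    """Validate 'minute hour day_of_month day_of_week'; '*' becomes None."""
    parts = schedule.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    parsed = {}
    for (name, low, high), value in zip(FIELDS, parts):
        if value == "*":
            parsed[name] = None  # wildcard: any value
            continue
        number = int(value)  # raises ValueError for non-numeric input
        if not low <= number <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {number}")
        parsed[name] = number
    return parsed


if __name__ == "__main__":
    parsed = parse_schedule("0 23 * 6")
    print(parsed)
    print("Runs on", DAY_NAMES[parsed["day_of_week"]], f"at {parsed['hour']:02d}:{parsed['minute']:02d}")
```

For the example string this prints `Runs on Saturday at 23:00`, matching the explanation above.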
 +
 +=== Stopping (Quiescing) and Resuming a Scheduled Reallocation Job ===
 +Although you can't schedule the job to stop when business hours start again, you can do so manually:
 +Stopping the job:
 +<code>
 +reallocate quiesce /vol/volname/lunname
 +</code>
 +Resuming the job:
 +<code>
 +reallocate restart /vol/volname/lunname
 +</code>
 +> Note that you can pass the -i parameter, but that ignores the checkpoint and simply starts from the beginning. That is usually not what you want; otherwise you could have just stopped the job.
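The difference between a plain restart and `restart -i` is easiest to see in a toy model. This Python class is not ONTAP code; it only assumes, as described above, that a quiesced job keeps a checkpoint and that `-i` throws that checkpoint away. The class name and block counts are made up.

```python
class ToyReallocationJob:
    """Toy model of a reallocation scan with a checkpoint (not ONTAP code)."""

    def __init__(self, total_blocks: int):
        self.total_blocks = total_blocks
        self.checkpoint = 0  # next block the scan will process
        self.running = False

    def start(self) -> None:
        self.running = True

    def work(self, blocks: int) -> None:
        """Process up to `blocks` blocks if the job is running."""
        if self.running:
            self.checkpoint = min(self.total_blocks, self.checkpoint + blocks)

    def quiesce(self) -> None:
        # Like 'reallocate quiesce': pause, but keep the checkpoint.
        self.running = False

    def restart(self, ignore_checkpoint: bool = False) -> None:
        # Like 'reallocate restart'; ignore_checkpoint mimics the -i flag.
        if ignore_checkpoint:
            self.checkpoint = 0
        self.running = True

    def remaining(self) -> int:
        return self.total_blocks - self.checkpoint


if __name__ == "__main__":
    job = ToyReallocationJob(total_blocks=1000)
    job.start()
    job.work(600)    # the night-time run gets 60% done
    job.quiesce()    # business hours start
    job.restart()    # resume from the checkpoint
    print("remaining after restart:", job.remaining())     # 400
    job.quiesce()
    job.restart(ignore_checkpoint=True)
    print("remaining after restart -i:", job.remaining())  # 1000
```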
 +
 += Data Patterns =
 +Reallocate optimizes the layout of data on disk for "Sequential Read Access". The workload that benefits most from this is "Sequential Reads After Random Writes".
 +
 +Examples of this kind of workload are:
 +* Online transaction processing databases that have large table scans
 +* Email systems that use database storage with verification processes
 +* Host-side backup of LUNs
 +
 +If you want to read more on what this is exactly, please look [[datapatterns|here]].
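Why do random writes hurt later sequential reads? A toy Python model (not WAFL itself) under the simplifying assumption of a write-anywhere layout: overwriting a block writes it to the next free physical location instead of in place. Counting the contiguous physical runs needed to read the file in logical order gives a rough idea of the number of seeks; reallocation rewrites the blocks contiguously again. The block numbers are made up.

```python
def count_runs(layout: list) -> int:
    """Number of contiguous physical runs when reading logical blocks in order."""
    if not layout:
        return 0
    runs = 1
    for previous, current in zip(layout, layout[1:]):
        if current != previous + 1:
            runs += 1  # a jump means an extra seek
    return runs


def overwrite(layout: list, logical_blocks: list, next_free: int) -> int:
    """Write-anywhere overwrite: each block goes to the next free location."""
    for block in logical_blocks:
        layout[block] = next_free
        next_free += 1
    return next_free


def reallocate(layout: list, next_free: int) -> list:
    """Rewrite all blocks contiguously, starting at next_free."""
    return [next_free + i for i in range(len(layout))]


if __name__ == "__main__":
    layout = list(range(10))  # written sequentially: 1 run
    print("initial runs:", count_runs(layout))
    next_free = overwrite(layout, [7, 2, 5], next_free=10)  # "random" writes
    print("after random writes:", count_runs(layout))       # 7 runs
    layout = reallocate(layout, next_free)
    print("after reallocation:", count_runs(layout))         # 1 run
```

Three overwrites are enough to turn one sequential read into seven pieces in this model, which is exactly the pattern reallocation is meant to undo.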
 +
 += A Reallocate Bug =
 +
 +On one of our filers (ONTAP 7.3.4) we had a weird status on one of our aggregates and all of its containing volumes. A welcome opportunity to dive into reallocation on NetApp filers, because that's what the status was about:
 +
 +Aggregate status: redirect
 +> Meaning: Aggregate reallocation or file reallocation with the "-p" option has been started on the aggregate; read performance will be degraded.
 +
 +Volume statuses: redirect,active_redirect
 +> Volume redirect: The volume's containing aggregate is undergoing aggregate reallocation or file reallocation with the -p option. Read performance to volumes in the aggregate might be degraded.
 +> Volume active_redirect: The volume's containing aggregate is undergoing reallocation (with the -p option specified). Read performance may be reduced while the volume is in this state.
 +
 +The explanation is from the [[http://www.datadisk.co.uk/html_docs/netapp/netapp_cs.htm|NetApp Cheat Sheet]].
 +
 +We have a few performance issues on this filer and are working on it from different angles, so I decided to not ignore this and look into it.
 +
 +== The Filer in Place ==
 +The aggregate:
 +<code>
 +storage01*> reallocate status -v aggr0
 +Reallocation scans are on
 +No reallocation status was found for 'aggr0'.
 +</code>
 +
 +One of the volumes:
 +<code>
 +storage01> reallocate status -v /vol/OS_Volume
 +Reallocation scans are on
 +No reallocation status was found for '/vol/OS_Volume'.
 +</code>
 +
 +All of the schedules:
 +<code>
 +storage01> reallocate status -v
 +Reallocation scans are on
 +/vol/vol0:
 +        State: Idle
 +        Flags: whole_vol,measure_only,repeat
 +    Threshold: 4
 +     Schedule: n/a
 +     Interval: 1 day
 + Optimization: 8
 +  Measure Log: n/a
 +</code>
 +
 +> As you can see, this schedule only does a measurement of the volume.
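On a filer with many reallocation jobs, the `reallocate status -v` output can be checked with a small script. A minimal Python sketch, assuming the layout shown above (a path line ending in a colon, followed by indented `Key: value` lines) and assuming the `Optimization` field holds the last measured score; it flags measure-only jobs whose score is above their threshold, like `/vol/vol0` here (8 versus 4). The helper names are hypothetical.

```python
def parse_status(text: str) -> dict:
    """Parse 'reallocate status -v' output into {path: {field: value}}."""
    jobs = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        if not line[0].isspace() and stripped.endswith(":"):
            current = stripped[:-1]  # e.g. '/vol/vol0'
            jobs[current] = {}
        elif current is not None and line[0].isspace() and ":" in stripped:
            key, value = stripped.split(":", 1)
            jobs[current][key.strip()] = value.strip()
    return jobs


def measure_only_above_threshold(jobs: dict) -> list:
    """Jobs that only measure, although their score exceeds their threshold."""
    flagged = []
    for path, fields in jobs.items():
        flags = fields.get("Flags", "").split(",")
        try:
            score = int(fields.get("Optimization", ""))
            threshold = int(fields.get("Threshold", ""))
        except ValueError:
            continue  # field missing or not a number
        if "measure_only" in flags and score > threshold:
            flagged.append(path)
    return flagged


STATUS = """\
Reallocation scans are on
/vol/vol0:
        State: Idle
        Flags: whole_vol,measure_only,repeat
    Threshold: 4
     Schedule: n/a
     Interval: 1 day
 Optimization: 8
  Measure Log: n/a
"""

if __name__ == "__main__":
    jobs = parse_status(STATUS)
    print(jobs["/vol/vol0"]["Optimization"])   # 8
    print(measure_only_above_threshold(jobs))  # ['/vol/vol0']
```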
 +
 +== About the Status ==
 +So what about the status I noticed? It turned out that the reallocate statuses stay in the status field once a reallocation has been run on the aggregate in the past. See [[https://kb.netapp.com/support/index?page=content&id=2012768&actp=LIST_RECENT&viewlocale=en_US&searchid=1328797690201|this]] NetApp knowledge base article:
 +
 +> If block reallocation has been run on the aggregate, then the aggregate will show the "redirect" status. This status can only be cleared by reverting to a version of Data ONTAP prior to 7.2.3. If block reallocation has not been run on an aggregate, then the "redirect" keyword will not be displayed.
 +> Flexible volumes within aggregates that have started a block level reallocation (reallocate -A) may show an "active_redirect" status within 'vol status -v' output. This is only true if there are blocks that have been reallocated, but the redirect scanner (final phase of block level reallocation) has not completed.
 +
 +So the redirect status is normal, but the active_redirect status is not. According to [[https://communities.netapp.com/thread/15696|this thread]] you could do:
 +<code>
 +priv set diag
 +wafl scan redirect volumename
 +priv set
 +</code>
 +But I haven't found any confirmation of that.
 +
 +Right now I'm thinking of opening a support case. 
 +
 +=== Status Follow-Up ===
 +I did open a case for this, but it took a long time for IBM (we have a rebranded NetApp, an IBM N series system) to find the [[http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=395114|actual bug report]]. So it is a bug. To remove the status, run these commands:
 +<code>
 +priv set diag
 +wafl scan redirect -a <vol>
 +priv set
 +</code>
 +
 += Links =
 +NetApp Cheat Sheet: http://www.datadisk.co.uk/html_docs/netapp/netapp_cs.htm \\
 +NetApp TR-3929, Reallocate Best Practices Guide: https://fieldportal.netapp.com/viewcontent.asp?qv=1&docid=33904 \\
 +NetApp Community: https://communities.netapp.com/thread/6530 \\
 +NetApp Community: https://communities.netapp.com/thread/7456 \\
 +wafl.co.uk: http://www.wafl.co.uk/reallocate/ \\
 +Erick Moore: http://www.theselights.com/2010/03/understanding-netapp-volume-and.html \\
 +NetApp Knowledge Base: https://kb.netapp.com/support/index?page=content&id=2012768&actp=LIST_RECENT&viewlocale=en_US&searchid=1328797690201 \\
 +NetApp bug report 395114: http://support.netapp.com/NOW/cgi-bin/bol?Type=Detail&Display=395114 \\
 +
 +{{tag>netapp storage performance}}
reallocate.txt · Last modified: 2013/04/20 09:32 by sjoerd