NetApp CDoT Move Root Aggregate

by John C. Wray III Wednesday, September 24, 2014 8:46 AM

Taken straight from NetApp's site.

https://kb.netapp.com/support/index?page=content&id=3013873&locale=en_US

In clustered Data ONTAP 8.2, it has become a lot easier to move the root volume to a new root aggregate than it was in earlier releases. Still, there are a number of steps involved, especially if the maintenance is to be non-disruptive.

Warning: Although not required, it is recommended to upgrade to 8.2.1P1 or 8.2.2RC1 (or a later release, none of which were available at the time this document was written) before doing this maintenance, to avoid Burt # 810014. Due to the nature of this bug, avoid moving the root aggregate more than once if possible.
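
To confirm which release the cluster is currently running before starting, you can use the standard version command (output will differ per system):
::> version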

Because a node will be down for the duration of this maintenance, and that can affect other nodes in the cluster, the first steps are somewhat different for 2-node, 4-node, and larger clusters; they exist to protect data access in the rest of the cluster.

In the example below, the root volume on node01 is going to be moved from root aggregate aggr0_node1 to a new root aggregate hosted on 3 manually specified drives.

Warning: The preparation steps below are very important to perform first. If the steps for 2-node or 4-node clusters are not performed, you risk an outage of all nodes in the cluster and will need to contact support to get data access recovered, which can take hours. Users who have skipped these pre-steps have experienced real outages as a result.

Important: If you are running a 2-node cluster, make sure to disable cluster HA prior to maintenance (this will not disable failover but will change quorum voting for 2-node clusters) and make sure the node that will not be changed (the partner node of the system being worked on) is set to be epsilon.
::> cluster ha modify -configured false
::> set adv
::*> cluster modify -node node01 -epsilon false
::*> cluster modify -node node02 -epsilon true
::*> cluster show


The commands above will allow halting node01 to Maintenance mode without a takeover (required for some of the steps later on), and prevent node02 from going out of quorum as a result. Prior to this halt, relevant storage will be moved to node02 so it can continue to be served during the steps below. If the above steps are not followed in a 2-node cluster, the surviving node hosting all the storage will not serve any of its own or its partner's data, and there will be an outage of all data on both nodes of the 2-node cluster.
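
Disabling cluster HA does not disable storage failover itself. If you want to confirm that takeover is still possible before continuing, a quick check (node names follow this article's example) is:
::> storage failover show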

If you are running a 4-node cluster, run the following prior to maintenance. Check if the node you are working on is epsilon; if it is, move epsilon to a different node in the cluster to reduce the risk of a cluster quorum loss in case of an unexpected failover in other parts of the cluster.
::> set adv
::*> cluster show
::*> cluster modify -node node01 -epsilon false
::*> cluster modify -node node03 -epsilon true
::*> cluster show

If you are running a cluster with more than 4 nodes, no additional protections are necessary.

Important: Check the administration documentation for the root volume size requirements for the platform you are using. If the new root aggregate drives you are using are smaller in size, you may need a root aggregate of more than 3 disks to accommodate the required space.
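
As a quick way to check the current root volume size and the sizes of the spare drives you intend to use, the following standard commands can help (the node and volume names follow this article's example and may differ on your system):
::> volume show -vserver node01 -volume vol0 -fields size
::> run -node node01 aggr status -s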

If you are running a cluster with only a single node, please refer to KB 1014615, How to move mroot to a new root aggregate in a single node cluster. A single-node cluster root volume migration cannot be done non-disruptively.

Perform the following steps to create the new root aggregate and have it host the new root volume:

  1. Relocate the data aggregates on the node you want to change to the partner. Include all SFO aggregates that have data volumes on them.
    Aggregate relocation will cause the relevant aggregates to be reassigned to the partner. Any volumes in aggregates that are not relocated will be inaccessible during the maintenance, so be sure to verify afterwards that all aggregates with user data have been moved. Once maintenance is done, be sure to move the aggregates back.
    ::> aggr relocation start -node node01 -destination node02 -aggregate-list aggr1_node1,aggr2_node1
  2. Once you have successfully moved all data aggregates, only the root aggregate and at least 3 spare drives should remain on the maintenance node. Run the following to verify:
    ::> aggr show -nodes node01 -fields has-mroot
    aggregate   has-mroot
    ----------- ---------
    aggr0_node1 true
  3. Create the new root aggregate with 3 disks, specifying the exact drives.
    ::*> aggr create -aggregate newroot -disklist node01:0a.11.16,node01:0a.11.19,node01:0a.11.22 -force-small-aggregate true
    ---->Notice the requirement for the -force-small-aggregate true flag because there are only 3 drives. This is an advanced level option.
  4. Migrate all LIFs on the relevant node to other nodes in the cluster (make sure any failover groups are configured correctly!).
    ::*> net int migrate-all -node node01
  5. Verify that the aggregate is fully created. If the drives used for the new aggregate were not zeroed beforehand, the aggregate creation may take some time. Don't start the next step until the newly created root aggregate is fully ready.

    You can verify the status with sysconfig -r in the nodeshell of the relevant node. Again, this may take a few hours if no zeroed disks were available for the aggregate creation.
    ::*> run -node node01 sysconfig -r
  6. Verify that all of the above steps have been done correctly before proceeding. Once you are confident that the quorum settings (epsilon and cluster HA) are correct, the data aggregates and LIFs have been moved to the right nodes, and the new root aggregate is fully created and ready, reboot the node without takeover with the following command and go to Maintenance mode.
    ::*> reboot -node node01 -inhibit-takeover true

    NetApp Data ONTAP 8.2 Cluster-Mode
    Copyright (C) 1992-2013 NetApp.
    All rights reserved.
    md1.uzip: 39168 x 16384 blocks
    md2.uzip: 7360 x 16384 blocks
    *******************************
    *                             *
    * Press Ctrl-C for Boot Menu. *
    *                             *
    *******************************      
    ^C^C^C^C                      <------ Ctrl-C
    Boot Menu will be available.

    Select  one of the following:

    (1) Normal Boot.
    (2) Boot without /etc/rc.
    (3) Change password.
    (4) Clean configuration and initialize all disks.
    (5) Maintenance mode boot.
    (6) Update flash from backup config.
    (7) Install new software first.
    (8) Reboot node.
    Selection (1-8)? 5   <---------------option 5

  7. Set the new aggregate's HA policy to CFO, which allows it to become a root aggregate, and then set the root flag, which will make it the new root aggregate at the next boot. After booting, this aggregate will automatically have a new root volume pre-created. This new root volume is called AUTOROOT (or AUTOROOT-1 if a volume with that name already exists).

    *> aggr options newroot ha_policy cfo
    Setting ha_policy to cfo will substantially increase the client outage
    during giveback for volumes on aggregate "newroot".

    Are you sure you want to proceed (y/n)? y

    *> aggr options newroot root
    Aggregate 'newroot' will become root at the next boot.

    Bring the system back up so you can clear the recovery flag.

    *>halt

    LOADER-A> boot_ontap
  8. The system will now boot with a newly created skeleton root volume. Based on the data stored in the cf card and NVRAM, the node knows its identity in the cluster. Because the node previously had cluster database data in its root volume, and this volume is now empty, the system will set a recovery flag and give a warning at boot:

    Sep 11 17:00:33 [node01:mgmtgwd.rootvol.recovery.new:EMERGENCY]: A new root volume was detected. This node is not fully operational. Contact technical support to obtain the root volume recovery procedures.
    Sep 11 17:00:33 [node01:callhome.root.vol.recovery.reqd:EMERGENCY]: Call home for ROOT VOLUME NOT WORKING PROPERLY: RECOVERY REQUIRED.

    Wed Sep 11 17:00:35 EST 2013
    login: admin
    Password:
    ******************************************************
    * This is a serial console session. Output from this *
    * session is mirrored on the SP/RLM console session. *
    ******************************************************
    ***********************
    **  SYSTEM MESSAGES  **
    ***********************

    A new root volume was detected.  This node is not fully operational.  Contact
    support personnel for the root volume recovery procedures. 
  9. To unset the recovery flag and let the node synchronize its cluster database with the rest of the cluster, bring the system to a halt:

    ::*> halt -node node01 -inhibit-takeover true
    (system node halt)
  10. Unset the recovery flag at the loader prompt and boot the node back up.

    LOADER-A*> unsetenv bootarg.init.boot_recovery
    LOADER-A*> boot_ontap

    The node should boot normally, without the recovery warning this time. At the end of the boot logging it may take a few extra seconds before the login prompt is visible, while the node synchronizes the cluster database.
  11. Check the health of the node with cluster ring show at the advanced privilege level; all rings should show numbers.
    ::*> set adv

    Warning: These advanced commands are potentially dangerous; use them only when directed to do so by NetApp personnel.
    Do you want to continue? {y|n}: y

    ::*> cluster ring show
    Node      UnitName Epoch    DB Epoch DB Trnxs Master    Online
    --------- -------- -------- -------- -------- --------- ---------
    cm3240c-rtp-01
              mgmt     22       22       2971287  cm3240c-rtp-01
                                                            master
    cm3240c-rtp-01
              vldb     20       20       1        cm3240c-rtp-01
                                                            master
    cm3240c-rtp-01
              vifmgr   20       20       932      cm3240c-rtp-01
                                                            master
    cm3240c-rtp-01
              bcomd    20       20       1        cm3240c-rtp-01
                                                            master
    cm3240c-rtp-01
              crs      4        4        2        cm3240c-rtp-01
                                                            master
    cm3240c-rtp-02
              mgmt     22       22       2971287  cm3240c-rtp-01
                                                            secondary
    cm3240c-rtp-02
              vldb     20       20       1        cm3240c-rtp-01
                                                            secondary
    cm3240c-rtp-02
              vifmgr   20       20       932      cm3240c-rtp-01
                                                            secondary
    cm3240c-rtp-02
              bcomd    20       20       1        cm3240c-rtp-01
                                                            secondary

Warning: If any of the rings show dashes or RPC errors instead of numbers in the Epoch, DB Epoch, and DB Trnxs columns, wait for at least 10 more minutes and check cluster ring show again. If dashes or errors still show for some of the rows after waiting 20 minutes, contact NetApp support before taking further action.

  1. If you are running a 2-node cluster, run the following to re-enable HA after all rings show numbers:

::> cluster ha modify -configured true

Check the health of the HA relationship:

::> storage failover show
                              Takeover
Node           Partner        Possible State Description
-------------- -------------- -------- -------------------------------------
cm3240c-rtp-01 cm3240c-rtp-02 true     Connected to cm3240c-rtp-02
cm3240c-rtp-02 cm3240c-rtp-01 true     Connected to cm3240c-rtp-01
2 entries were displayed.

::> set diag

Warning: These diagnostic commands are for use by NetApp personnel only.
Do you want to continue? {y|n}: y

::*> cluster ha show
      High Availability Configured: true
      High Availability Backend Configured (MBX): true

Warning: If you see false after enabling HA, contact NetApp support before taking further action.
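
If the LIFs that were migrated off node01 earlier have not automatically returned to their home ports, they can be reverted and verified with the standard commands below. This is a minimal sketch using wildcards; adjust it to your own vservers, LIFs, and failover policies:

::> network interface revert -vserver * -lif *
::> network interface show -is-home false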

Important Additional Cleanup steps: 

  1. With the system back up and in a fully redundant state, you can now delete the old root volume and aggregate. The old root volume is likely called vol0 (the default) but might be called something else. Check aggr status in the nodeshell to see which volume resides in the old root aggregate. In this example, it is called vol0.
    ::*> run -node node01

    node01> vol offline vol0
    Volume 'vol0' is now offline.
    node01> vol destroy vol0
    Are you sure you want to destroy volume 'vol0'? y
    Volume 'vol0' destroyed.
  2. Return to the clustershell to delete the old root aggregate.
    ::*> aggr delete -aggregate aggr0_node1

    Warning: Are you sure you want to destroy aggregate "aggr0_node1"? {y|n}: y
    [Job 110] Job succeeded: DONE

  3. When moving the root volume, some of the volume and aggregate changes are made without the cluster database's knowledge.
    As a result, it is important to make sure that the cluster database is updated to reflect the aggregate and volume changes made during this maintenance. To make the volume location database (VLDB) aware of the changes, run the following diag-level commands:
    ::*> set diag
    ::*> volume remove-other-volume -volume vol0 -vserver node01
    ::*> volume add-other-volumes -node node01

    Verify the correctness of the VLDB with the following diag-level command:
    ::*> debug vreport show
    This table is currently empty.

    Info: WAFL and VLDB volume/aggregate records are consistent.
    If the above message is displayed, it is confirmed that there are no issues.
     
  4. Since there are no issues, move the relocated aggregates back:

    ::> aggr relocation start -node node02 -destination node01 -aggregate-list aggr1_node1,aggr2_node1
  5. Add the NVFAIL option to the new root vol.
    ::*> node run -node node01 vol options AUTOROOT nvfail on
    ::*> volume show -volume AUTOROOT -fields nvfail
    vserver       volume   nvfail
    ------------- -------- ------
    node01        AUTOROOT on
  6. Rename the new root volume and the new aggregate back to their previous names. The new root volume is most likely named AUTOROOT, and it can be renamed if desired.
    In this example, the AUTOROOT volume is renamed to vol0 and the aggregate created as newroot is renamed to aggr0_node1.
    ::*> vol rename -volume AUTOROOT -newname vol0 -vserver node01
    (volume rename)
    [Job 111] Job succeeded: Successful
    ::*> aggr rename newroot -newname aggr0_node1
    [Job 112] Job succeeded: DONE
  7. There are size restrictions for the root volume. Make sure the root volume size is still correct and adheres to the size requirements stated in the Data ONTAP system administration guide for your release. You might need to increase the size of the volume. If the new aggregate consists of smaller drives than the drives used before, it might need an additional disk to provide the required space. See the example check below.
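    For example, to check the current size of the renamed root volume and, if needed, grow it from the nodeshell (the 350g value is only a placeholder; use the size documented for your platform and release):
    ::*> volume show -vserver node01 -volume vol0 -fields size
    ::*> run -node node01 vol size vol0 350g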

Tags:

Storage | NetApp | CDOT
