
Steps to Remove a Node from the Cluster When the Node Crashes Due to OS/Hardware Failure and Cannot Boot Up

Initial Stage

At this stage, Oracle Clusterware is up and running on nodes lc2n1 and lc2n2 (the good nodes). Node lc2n3 is down and cannot be accessed. Note that the virtual IP of lc2n3 has failed over to node 1; the rest of the lc2n3 resources are OFFLINE:

[oracle@lc2n1 ~]$ crsstat
Name                                     Target     State      Host
-------------------------------------------------------------------------------
ora.LC2DB1.LC2DB11.inst                  ONLINE     ONLINE     lc2n1
ora.LC2DB1.LC2DB12.inst                  ONLINE     ONLINE     lc2n2
ora.LC2DB1.LC2DB13.inst                  ONLINE     OFFLINE
ora.LC2DB1.LC2DB1_SRV1.LC2DB11.srv       ONLINE     ONLINE     lc2n1
ora.LC2DB1.LC2DB1_SRV1.LC2DB12.srv       ONLINE     ONLINE     lc2n2
ora.LC2DB1.LC2DB1_SRV1.LC2DB13.srv       ONLINE     OFFLINE
ora.LC2DB1.LC2DB1_SRV1.cs                ONLINE     ONLINE     lc2n1
ora.LC2DB1.db                            ONLINE     ONLINE     lc2n2
ora.lc2n1.ASM1.asm                       ONLINE     ONLINE     lc2n1
ora.lc2n1.LISTENER_LC2N1.lsnr            ONLINE     ONLINE     lc2n1
ora.lc2n1.gsd                            ONLINE     ONLINE     lc2n1
ora.lc2n1.ons                            ONLINE     ONLINE     lc2n1
ora.lc2n1.vip                            ONLINE     ONLINE     lc2n1
ora.lc2n2.ASM2.asm                       ONLINE     ONLINE     lc2n2
ora.lc2n2.LISTENER_LC2N2.lsnr            ONLINE     ONLINE     lc2n2
ora.lc2n2.gsd                            ONLINE     ONLINE     lc2n2
ora.lc2n2.ons                            ONLINE     ONLINE     lc2n2
ora.lc2n2.vip                            ONLINE     ONLINE     lc2n2
ora.lc2n3.ASM3.asm                       ONLINE     OFFLINE
ora.lc2n3.LISTENER_LC2N3.lsnr            ONLINE     OFFLINE
ora.lc2n3.gsd                            ONLINE     OFFLINE
ora.lc2n3.ons                            ONLINE     OFFLINE
ora.lc2n3.vip                            ONLINE     ONLINE     lc2n1
[oracle@lc2n1 ~]$
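Note that crsstat as used above is not a stock Clusterware binary; it is usually a small site-specific wrapper around crs_stat that produces the columnar output shown. A minimal sketch of such a wrapper (the script name and formatting are assumptions, not part of the Oracle distribution):

#!/bin/bash
# crsstat - hypothetical helper used throughout this post (not shipped by
# Oracle). It reformats crs_stat output into Name/Target/State/Host columns.
# Assumes CRS_HOME is set in the environment.
$CRS_HOME/bin/crs_stat | awk -F= '
  BEGIN      { printf "%-40s %-10s %-10s %s\n", "Name", "Target", "State", "Host" }
  /^NAME=/   { name = $2 }
  /^TARGET=/ { target = $2 }
  /^STATE=/  { split($2, a, " on ")
               printf "%-40s %-10s %-10s %s\n", name, target, a[1], a[2] }'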


Step 1 Remove oifcfg information for the failed node

Most installations use the global flag of the oifcfg command and can therefore skip this step. This can be confirmed with:

[oracle@lc2n1 bin]$ $CRS_HOME/bin/oifcfg getif
eth0  192.168.56.0  global  public
eth1  192.168.57.0  global  cluster_interconnect

If the output returns global as shown above, you can skip the following step (executing the command below against a global definition will return an error, as shown below).

If the output of the oifcfg getif command does not return global, use the following command:

[oracle@lc2n1 bin]$ $CRS_HOME/bin/oifcfg delif -node lc2n3
PROC-4: The cluster registry key to be operated on does not exist.
PRIF-11: cluster registry error
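If you want to double-check whether any node-specific interface definitions are stored for the failed node, the per-node view can be queried as well. This is only a sketch, assuming the -node option of oifcfg getif in this release; on a purely global configuration it should return nothing:

# Hypothetical check: list interface entries stored specifically for lc2n3
$CRS_HOME/bin/oifcfg getif -node lc2n3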


Step 2 Remove ONS information

Execute the following command to find the ONS remote port number that will be needed:

[oracle@lc2n1 bin]$ cat $CRS_HOME/opmn/conf/ons.config
localport=6113
remoteport=6200
loglevel=3
useocr=on
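If you prefer to extract just the port value rather than reading the whole file, a small sketch (assuming the key=value format of ons.config shown above):

# Print only the remoteport value (6200 in the example above)
awk -F= '/^remoteport/ {print $2}' $CRS_HOME/opmn/conf/ons.config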

Then remove the ONS information pertaining to the node being deleted:

[oracle@lc2n1 bin]$ $CRS_HOME/bin/racgons remove_config lc2n3:6200


Step 3 Remove resources

In this step, the resources that were defined on the failed node have to be removed. These include the database instance, ASM, listener, and nodeapps resources. A list of these can be obtained by running the crsstat (crs_stat -t) command from any surviving node:

[oracle@lc2n1 ~]$ crsstat |grep OFFLINE
ora.LC2DB1.LC2DB13.inst                  ONLINE     OFFLINE
ora.LC2DB1.LC2DB1_SRV1.LC2DB13.srv       ONLINE     OFFLINE
ora.lc2n3.ASM3.asm                       ONLINE     OFFLINE
ora.lc2n3.LISTENER_LC2N3.lsnr            ONLINE     OFFLINE
ora.lc2n3.gsd                            ONLINE     OFFLINE
ora.lc2n3.ons                            ONLINE     OFFLINE

Before removing any resource it is recommended to take a backup of the OCR:

[root@lc2n1 ~]# cd $CRS_HOME/cdata/lc2
[root@lc2n1 lc2]# $CRS_HOME/bin/ocrconfig -export ocr_before_node_removal.exp
[root@lc2n1 lc2]# ls -l ocr_before_node_removal.exp
-rw-r--r-- 1 root root 151946 Nov 15 15:24 ocr_before_node_removal.exp
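It can also be useful to note which automatic OCR backups already exist before making changes; as root, the backup list can be displayed with the following (output omitted here):

# List the automatic OCR backups maintained by Clusterware
$CRS_HOME/bin/ocrconfig -showbackup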

Use ‘srvctl’ from the database home to delete the database instance on node 3:

[oracle@lc2n1 ~]$ . oraenv
ORACLE_SID = [oracle] ? LC2DB1
[oracle@lc2n1 ~]$ $ORACLE_HOME/bin/srvctl remove instance -d LC2DB1 -i LC2DB13
Remove instance LC2DB13 from the database LC2DB1? (y/[n]) y

Use ‘srvctl’ from the ASM home to delete the ASM instance on node 3:

[oracle@lc2n1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
[oracle@lc2n1 ~]$ $ORACLE_HOME/bin/srvctl remove asm -n lc2n3

Next remove the listener resource.

Please note that there is no ‘srvctl remove listener’ subcommand prior to 11.1, so that approach will not work in 10.2. Using ‘netca’ to delete the listener of a down node is not an option either, because netca needs to remove the listener configuration from that node's listener.ora.

10.2 only:

The only way to remove the listener resource is to use the ‘crs_unregister’ command; use this command only in this particular scenario:

[oracle@lc2n1 lc2]$ $CRS_HOME/bin/crs_unregister ora.lc2n3.LISTENER_LC2N3.lsnr

11.1 only:

Set the environment to the home from which the listener runs (ASM or database):

[oracle@lc2n1 ~]$ . oraenv
ORACLE_SID = [oracle] ? +ASM1
[oracle@lc2n1 lc2]$ $ORACLE_HOME/bin/srvctl remove listener -n lc2n3

As user root stop the nodeapps resources:

[root@lc2n1 oracle]# $CRS_HOME/bin/srvctl stop nodeapps -n lc2n3
[root@lc2n1 oracle]# crsstat |grep OFFLINE
ora.lc2n3.LISTENER_LC2N3.lsnr            OFFLINE    OFFLINE
ora.lc2n3.gsd                            OFFLINE    OFFLINE
ora.lc2n3.ons                            OFFLINE    OFFLINE
ora.lc2n3.vip                            OFFLINE    OFFLINE

Now remove them:

[root@lc2n1 oracle]# $CRS_HOME/bin/srvctl remove nodeapps -n lc2n3
Please confirm that you intend to remove the node-level applications on node lc2n3 (y/[n]) y

At this point all resources from the bad node should be gone:

[oracle@lc2n1 ~]$ crsstat
Name                                     Target     State      Host
-------------------------------------------------------------------------------
ora.LC2DB1.LC2DB11.inst                  ONLINE     ONLINE     lc2n1
ora.LC2DB1.LC2DB12.inst                  ONLINE     ONLINE     lc2n2
ora.LC2DB1.LC2DB1_SRV1.LC2DB11.srv       ONLINE     ONLINE     lc2n1
ora.LC2DB1.LC2DB1_SRV1.LC2DB12.srv       ONLINE     ONLINE     lc2n2
ora.LC2DB1.LC2DB1_SRV1.cs                ONLINE     ONLINE     lc2n1
ora.LC2DB1.db                            ONLINE     ONLINE     lc2n2
ora.lc2n1.ASM1.asm                       ONLINE     ONLINE     lc2n1
ora.lc2n1.LISTENER_LC2N1.lsnr            ONLINE     ONLINE     lc2n1
ora.lc2n1.gsd                            ONLINE     ONLINE     lc2n1
ora.lc2n1.ons                            ONLINE     ONLINE     lc2n1
ora.lc2n1.vip                            ONLINE     ONLINE     lc2n1
ora.lc2n2.ASM2.asm                       ONLINE     ONLINE     lc2n2
ora.lc2n2.LISTENER_LC2N2.lsnr            ONLINE     ONLINE     lc2n2
ora.lc2n2.gsd                            ONLINE     ONLINE     lc2n2
ora.lc2n2.ons                            ONLINE     ONLINE     lc2n2
ora.lc2n2.vip                            ONLINE     ONLINE     lc2n2


Step 4 Execute rootdeletenode.sh

From a node that you are not deleting, execute the following command to find the node number of the node that you want to delete:

[oracle@lc2n1 ~]$ $CRS_HOME/bin/olsnodes -n
lc2n1   1
lc2n2   2
lc2n3   3

This node name and number pair is passed to the rootdeletenode.sh script, which is to be executed as root from any node that is going to remain in the cluster:

[root@lc2n1 ~]# cd $CRS_HOME/install
[root@lc2n1 install]# ./rootdeletenode.sh lc2n3,3
CRS-0210: Could not find resource 'ora.lc2n3.ons'.
CRS-0210: Could not find resource 'ora.lc2n3.vip'.
CRS-0210: Could not find resource 'ora.lc2n3.gsd'.
CRS-0210: Could not find resource ora.lc2n3.vip.
CRS nodeapps are deleted successfully
clscfg: EXISTING configuration version 3 detected.
clscfg: version 3 is 10G Release 2.
Successfully deleted 14 values from OCR.
Key SYSTEM.css.interfaces.nodelc2n3 marked for deletion is not there. Ignoring.
Successfully deleted 5 keys from OCR.
Node deletion operation successful.
'lc2n3,3' deleted successfully
[root@lc2n1 install]# $CRS_HOME/bin/olsnodes -n
lc2n1   1
lc2n2   2
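As an extra sanity check after the node deletion, OCR integrity can be verified with ocrcheck (run as root; output omitted here):

# Verify OCR integrity after removing the node entries
$CRS_HOME/bin/ocrcheck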


Step 5 Update the Inventory

From a node that is going to remain in the cluster, run the following command as the owner of the CRS_HOME. The argument passed to CLUSTER_NODES is a comma-separated list of the node names that will remain in the cluster. This step needs to be performed once per home (Clusterware, ASM, and RDBMS homes).

[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/crs "CLUSTER_NODES={lc2n1,lc2n2}" CRS=TRUE
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.

[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm "CLUSTER_NODES={lc2n1,lc2n2}"
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.

[oracle@lc2n1 install]$ $CRS_HOME/oui/bin/runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/10.2.0/db_1 "CLUSTER_NODES={lc2n1,lc2n2}"
Starting Oracle Universal Installer...

No pre-requisite checks found in oraparam.ini, no system pre-requisite checks will be executed.
The inventory pointer is located at /etc/oraInst.loc
The inventory is located at /u01/app/oracle/oraInventory
'UpdateNodeList' was successful.
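To confirm that lc2n3 no longer appears in the central inventory, the per-home node lists in inventory.xml can be inspected directly. A quick sketch, assuming the inventory location reported above (/u01/app/oracle/oraInventory):

# Show the node entries recorded for each home; lc2n3 should not be listed
grep 'NODE NAME' /u01/app/oracle/oraInventory/ContentsXML/inventory.xml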
