Saturday 6 February 2016

oakcli validate -c OSDiskStorage ERROR: Raid device /dev/md1 not clean

Recently, we were validating all of our Oracle Database Appliance (ODA) hardware.

From the ILOM, all the hardware appeared to be fine, but as of today (2/6/2016) the ILOM does not report the correct status of the hard drives.

To verify hard drive health, I ran the following command:

oakcli validate -c OSDiskStorage --> this command checks the hard drives on the host compute node.

It failed with:

[root@OVP-S-ODA21 /]# oakcli validate -c OSDiskStorage
INFO: Checking Operating System Storage
SUCCESS: The OS disks have the boot stamp
RESULT: Raid device /dev/md0 found clean
ERROR: Raid device /dev/md1 not clean    <<<<============
ERROR: mdadm detail command output

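It seems that, for some reason, the RAID dropped one of the drives. This is evident from the command output below: md1 lists only sdax2, and [2/1] [_U] means that only one of its two mirror members is active.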

[root@OVP-S-ODA21 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdaw1[0] sdax1[1]
104320 blocks [2/2] [UU]

md1 : active raid1 sdax2[1]     <<<<============
585954688 blocks [2/1] [_U]   <<<<============

unused devices: <none>
[root@OVP-S-ODA21 ~]# 
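
The same check can be scripted. The following is a minimal Python sketch, not an Oracle tool: it assumes the /proc/mdstat layout shown above, and the function name degraded_arrays is hypothetical. It reports every array whose member status string (such as [_U]) contains an underscore, which marks a missing member.

```python
import re

def degraded_arrays(mdstat_text):
    """Return the names of md arrays whose status string contains '_'."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        # Array header line, e.g. "md1 : active raid1 sdax2[1]"
        header = re.match(r"^(md\d+)\s*:", line)
        if header:
            current = header.group(1)
            continue
        # Status line, e.g. "585954688 blocks [2/1] [_U]"
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            if "_" in status.group(3):
                degraded.append(current)
            current = None
    return degraded

# Sample text copied from the output above (annotation arrows removed).
SAMPLE = """\
Personalities : [raid1]
md0 : active raid1 sdaw1[0] sdax1[1]
104320 blocks [2/2] [UU]

md1 : active raid1 sdax2[1]
585954688 blocks [2/1] [_U]
"""

print(degraded_arrays(SAMPLE))  # prints ['md1']
```

On a live host, the text could be read with open("/proc/mdstat").read() instead of the sample string.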



While I was working with Oracle Support, they suggested removing the disk from the RAID and adding it back.

To remove the disk, Oracle suggested running this command:

mdadm --manage /dev/md1 --remove /dev/sdaw2

When I executed it, it failed, saying that it could not remove the device (obviously, since the RAID had already dropped it, there was nothing to find):

[root@OVP-S-ODA21 ~]# mdadm --manage /dev/md1 --remove /dev/sdaw2
mdadm: hot remove failed for /dev/sdaw2: No such device or address
[root@OVP-S-ODA21 ~]#

To add it back, I ran:

[root@OVP-S-ODA21 ~]# mdadm -a /dev/md1 /dev/sdaw2
mdadm: added /dev/sdaw2
[root@OVP-S-ODA21 ~]#

It successfully added the disk back to the RAID. To validate the addition, I ran cat /proc/mdstat and saw that the RAID was recovering the newly added disk.

[root@OVP-S-ODA21 ~]#
[root@OVP-S-ODA21 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdaw1[0] sdax1[1]
104320 blocks [2/2] [UU]

md1 : active raid1 sdaw2[2] sdax2[1]
585954688 blocks [2/1] [_U]
[>....................] recovery = 0.2% (1558400/585954688) finish=93.7min speed=103893K/sec

unused devices: <none>
[root@OVP-S-ODA21 ~]# 
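
The finish estimate in that output can be cross-checked by hand: it is the number of blocks still to be rebuilt divided by the current speed. Below is a small Python sketch of the arithmetic. It assumes that the block counts and the speed use the same unit (K, as the K/sec suffix suggests), and the function name is only illustrative.

```python
def estimated_finish_minutes(done_blocks, total_blocks, speed_k_per_sec):
    """Remaining blocks divided by rebuild speed, converted to minutes."""
    remaining = total_blocks - done_blocks
    return remaining / speed_k_per_sec / 60.0

# Values taken from the recovery line: (1558400/585954688) at 103893K/sec
minutes = estimated_finish_minutes(1558400, 585954688, 103893)
print(f"{minutes:.1f} min")  # prints 93.7 min
```

The result agrees with the finish=93.7min value in the output. The estimate follows the current speed, so it can move up or down while the rebuild runs.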


After a couple of hours, I checked again and confirmed that the recovery had completed and the disk had been successfully added back to the RAID.


[root@OVP-S-ODA21 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdaw1[0] sdax1[1]
104320 blocks [2/2] [UU]

md1 : active raid1 sdaw2[0] sdax2[1]
585954688 blocks [2/2] [UU]

unused devices: <none>
[root@OVP-S-ODA21 ~]#

I reran the ODA command "oakcli validate -c OSDiskStorage" and it came out clean:

[root@OVP-S-ODA21 ~]# oakcli validate -c OSDiskStorage
INFO: Checking Operating System Storage
SUCCESS: The OS disks have the boot stamp
RESULT: Raid device /dev/md0 found clean
RESULT: Raid device /dev/md1 found clean
RESULT: Physical Volume /dev/md1 in VolGroupSys has 370206.05M out of total 599986.80M
RESULT: Volumegroup VolGroupSys consist of 1 physical volumes,contains 4 logical volumes, has 0 volume snaps with total size of 599986.80M and free space of 370206.05M
RESULT: Logical Volume LogVolOpt in VolGroupSys Volume group is of size 60.00G
RESULT: Logical Volume LogVolRoot in VolGroupSys Volume group is of size 30.00G
RESULT: Logical Volume LogVolSwap in VolGroupSys Volume group is of size 24.00G
RESULT: Logical Volume LogVolU01 in VolGroupSys Volume group is of size 100.00G
RESULT: Device /dev/mapper/VolGroupSys-LogVolRoot is mounted on / of type ext3 in (rw)
RESULT: Device /dev/md0 is mounted on /boot of type ext3 in (rw)
RESULT: Device /dev/mapper/VolGroupSys-LogVolU01 is mounted on /u01 of type ext3 in (rw)
RESULT: Device /dev/mapper/VolGroupSys-LogVolOpt is mounted on /opt of type ext3 in (rw)
RESULT: / has 13226 MB free out of total 29758 MB
RESULT: /boot has 41 MB free out of total 99 MB
RESULT: /u01 has 42698 MB free out of total 99194 MB
RESULT: /opt has 44648 MB free out of total 59516 MB
[root@OVP-S-ODA21 ~]#
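
When validating several appliances, scanning the captured validation output for ERROR lines is quicker than reading it by eye. Here is a minimal Python sketch. It assumes the output has been saved as text and that each line begins with a severity keyword followed by a colon, as in the transcripts above. error_lines is a hypothetical helper, not part of oakcli.

```python
def error_lines(validate_output):
    """Return the lines of a validation transcript that start with ERROR:."""
    return [line.strip()
            for line in validate_output.splitlines()
            if line.strip().startswith("ERROR:")]

# Sample text copied from the failed run earlier in this post.
FAILED_RUN = """\
INFO: Checking Operating System Storage
SUCCESS: The OS disks have the boot stamp
RESULT: Raid device /dev/md0 found clean
ERROR: Raid device /dev/md1 not clean
ERROR: mdadm detail command output
"""

for line in error_lines(FAILED_RUN):
    print(line)
```

An empty list means the run reported no errors, which is what the clean run above would produce.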


More details are available in the following MetaLink note:

ODA (Oracle Database Appliance) : How to replace FAILED SYSTEM BOOT DISK ( Doc ID 1382300.1 )

Thanks,
Suresh Nooka.
