Monday 23 May 2016

adop phase=prepare AdminServer has not yet started by master. Will wait for another minute

I was trying to run adop phase=prepare and got the error below:

"AdminServer has not yet started by master. Will wait for another minute"

The same error appears even if I run:

adop phase=fs_clone allnodes=no force=yes



      [EVENT]     [END   2016/05/19 23:29:32] Executing txkADOPValidation script on ovpxvcppd
    [EVENT]     [END   2016/05/19 23:29:32] Executing txkADOPValidation script on ovpxvcppd
        [EVENT]     [START 2016/05/19 23:29:32] Performing check to see if pending cleanup actions exist
          [EVENT]     No pending cleanup actions, proceeding with other steps
        [EVENT]     [END   2016/05/19 23:29:32] Performing check to see if pending cleanup actions exist
          [ERROR]     Cannot connect to database
          [ERROR]     Error Message : ORA-12514: TNS:listener does not currently know of service requested in connect descriptor (DBD ERROR: OCIServerAttach)
          [ERROR]     *** Error occurred while calling ad_zd_log. Please check log file for the details. ***
          [ERROR]     *** Error occurred while calling ad_zd_log. Please check log file for the details. ***
          [ERROR]     *** Error occurred while calling ad_zd_log. Please check log file for the details. ***
        AdminServer has not yet started by master. Will wait for another minute


First things first: I thought it was a database connection issue, since the log file shows the error below:

 ORA-12514: TNS:listener does not currently know of service requested in connect descriptor (DBD ERROR: OCIServerAttach)

I checked the tnsnames.ora file in $TNS_ADMIN and found the following entry:

  (DESCRIPTION=
                (ADDRESS=(PROTOCOL=tcp)(HOST=xxxxxx)(PORT=1541))
            (CONNECT_DATA=
                (SERVICE_NAME=ebs_patch)
                (INSTANCE_NAME=EBSPD1)
            )
        )

I ran the following in the database so that the listener also registers the ebs_patch service:

alter system set service_names='EBSDB,ebs_patch'; 
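
A quick way to see which service names the client side expects is to pull them out of tnsnames.ora and compare them with what the listener reports. The following is only a minimal sketch: tns_service_names is a helper name made up for this example, and it assumes a plain-text tnsnames.ora with entries like the one above.

```sh
# Print each SERVICE_NAME value found in a tnsnames.ora-style file, one per line.
# tns_service_names is a hypothetical helper, not an Oracle utility.
tns_service_names() {
    # grep -o emits every "SERVICE_NAME = value" match on its own line;
    # sed then strips everything up to and including the "=".
    grep -io 'SERVICE_NAME[[:space:]]*=[[:space:]]*[^)[:space:]]*' "$1" |
        sed 's/^[^=]*=[[:space:]]*//' |
        sort -u
}
```

For the entry above, tns_service_names "$TNS_ADMIN/tnsnames.ora" would list ebs_patch; running lsnrctl status on the database host then shows whether the listener actually knows that service.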

I ran adop phase=abort and adop phase=cleanup, then re-ran adop phase=prepare.

Again, it failed with the error "AdminServer has not yet started by master. Will wait for another minute".

I searched Metalink (support.oracle.com) and did not find a single hit.

A Google search turned up comments from people saying that the admin server on the patch file system needs to be started manually for this command to finish successfully.

So I started the admin server manually.

You can use the command below from the patch file system:

./adadminsrvctl.sh start forcepatchfs

Alternatively, run nohup ./startWebLogic.sh & from:
/u01/install/APPS/fs1/FMW_Home/user_projects/domains/EBS_domain_EBSPD/bin

After starting the admin server, I re-ran adop phase=abort and adop phase=cleanup, then adop phase=prepare, and it went through successfully.

Thursday 25 February 2016

Weblogic : Unresolved application library references - ASCP deployment

I was trying to deploy the ASCP application EAR file (PlanningUI.ear) in WebLogic.

It deployed fine, but activating the changes failed with the error below:

Error: An error occurred during activation of changes, please see the log for details.
Error: [J2EE:160149]Error while processing library references. Unresolved application library references, defined in weblogic-application.xml: [Extension-Name: adf.oracle.domain, exact-match: false], [Extension-Name: oracle.jsp.next, exact-match: false], [Extension-Name: oracle.adf.desktopintegration.model, exact-match: false].
 
Just to check whether I could ignore the above errors, I tried to start the deployment to service all requests, and as expected it failed with the error below:
 
Error: weblogic.management.DeploymentException: [Deployer:149003]Unable to access application source information in '/u01/install/ascp/FMW_Home/user_projects/domains/ascp_domain/servers/ASCPManagedServer/stage/PlanningUI/PlanningUI.ear' for application 'PlanningUI'. The specific error is: [Deployer:149158]No application files exist at '/u01/install/ascp/FMW_Home/user_projects/domains/ascp_domain/servers/ASCPManagedServer/stage/PlanningUI/PlanningUI.ear'..

Looking further, it seems to be complaining about the following libraries:

adf.oracle.domain
oracle.jsp.next
oracle.adf.desktopintegration.model

I checked whether these libraries were available at all, and they were. To check this, you can search for the libraries in the Deployments tab.

However, when I clicked on each library and checked its targets, the library was targeted only to the admin server and not to the managed server.

Since I was trying to deploy the planning EAR file on the managed server, and the libraries were deployed only on the admin server, I hit this issue.

I clicked on each library and targeted it to both the admin and the managed server. I did that not only for the libraries above, but for all the available libraries.

Then I retried the deployment and it was successful.

Also, I had earlier had to set StartScriptEnabled to true in the Node Manager properties file, located in the directory below, to get this working:

/u01/install/ascp/FMW_Home/wlserver_10.3/common/nodemanager
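
For reference, the property lives in nodemanager.properties in that directory. Below is a minimal sketch of setting it from the shell. set_property is a hypothetical helper written for this example; it assumes simple key=value lines and values that contain no | character.

```sh
# Ensure key=value is present in a Java-style properties file.
# set_property is a hypothetical helper; it rewrites the file in place.
set_property() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        # Replace the existing line. Write to a temp file first so that a
        # failed sed cannot truncate the original.
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$file.tmp" &&
            mv "$file.tmp" "$file"
    else
        # Key not present yet: append it.
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}
```

In my case the call would have been set_property /u01/install/ascp/FMW_Home/wlserver_10.3/common/nodemanager/nodemanager.properties StartScriptEnabled true. As far as I know, Node Manager reads this file at startup, so it needs a restart afterwards.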

Hope this helps someone.

Monday 22 February 2016

adop issue ebs 12.2

We recently installed EBS 12.2 for use with the VCP/ASCP/Demantra suite of products.

While we had quite a few issues, I am covering this particular one here.

We were trying to apply patch 19549533 on our brand-new EBS installation.

To apply the patch, these are the steps:

adop phase=prepare
adop phase=apply patches=19549533
adop phase=finalize
adop phase=cutover

When I tried to run adop phase=prepare, it failed with the error below:

---------

SETUP VALIDATION is in progress. This may take few minutes to complete.
        [UNEXPECTED]Error occurred while executing "perl /u01/install/APPS/fs1/EBSapps/appl/ad/12.0.0/patch/115/bin/txkADOPValidations.pl  -contextfile=/u01/install/APPS/fs1/inst/apps/EBSQA_ovpxvcpqa/appl/admin/EBSQA_ovpxvcpqa.xml -patchctxfile=/u01/install/APPS/fs2/inst/apps/EBSQA_ovpxvcpqa/appl/admin/EBSQA_ovpxvcpqa.xml -phase=prepare -logloc=/u01/install/APPS/fs_ne/EBSapps/log/adop/35/prepare_20160222_134702/EBSQA_ovpxvcpqa -promptmsg=hide"
        [UNEXPECTED]Error 1 occurred while Executing txkADOPValidation script on ovpxvcpqa
        Log file: /u01/install/APPS/fs_ne/EBSapps/log/adop/35/adop_20160222_134702.log


[STATEMENT] Please run adopscanlog utility, using the command

"adopscanlog -latest=yes"

to get the list of the log files along with snippet of the error message corresponding to each log file.


adop exiting with status = 1 (Fail)

------------

The same error is seen when I try to run adop phase=fs_clone.

I did not find any relevant hits on Google or Metalink. Taking a closer look at the log files, I found that one of them said the following:

-----
[root@ovpxvcpqa EBSQA_ovpxvcpqa]# tail -f ADOPValidations_Mon_Feb_22_13_34_11_2016.log
Nodes with context files in the FND_OAM_CONTEXT_FILES table on both run and patch file systems: NONE
Nodes without context files in the FND_OAM_CONTEXT_FILES table on either/or run and patch file systems: ovpxvcpqa
Corrective Action:
- If the run file system context file for a node is missing, run AutoConfig on the run file system of that node to sync with the value with the database.
- If the patch file system context file of a node is missing, run AutoConfig on the patch file system of that node with the -syncctx option as follows to sync with the value with the database.
On UNIX:
 sh /bin/adconfig.sh contextfile= -syncctx
On Windows:
 \bin\adconfig.cmd contextfile= -syncctx
Exiting validations as further tests will break.
^C
[root@ovpxvcpqa EBSQA_ovpxvcpqa]#
--------

I was sure that the run file system was fine, but was doubtful about the patch file system.
Anyway, I decided to run the command below on both the run and patch file systems, supplying the context file of each:

 sh $AD_TOP/bin/adconfig.sh contextfile=<context file> -syncctx

For the run file system:

Source the run file system environment using:

. /u01/install/APPS/EBSapps.env run

[oracle@ovpxvcpqa EBSQA_ovpxvcpqa]$ sh $AD_TOP/bin/adconfig.sh contextfile=/u01/install/APPS/fs1/inst/apps/EBSQA_ovpxvcpqa/appl/admin/EBSQA_ovpxvcpqa.xml -syncctx
Enter the APPS user password:
The log file for this session is located at: /u01/install/APPS/fs1/inst/apps/EBSQA_ovpxvcpqa/admin/log/02221207/adconfig.log

Option specified      : Synchronize context file
Only context file synchronization will be performed

        Classpath                   : /u01/install/APPS/fs1/FMW_Home/Oracle_EBS-app1/shared-libs/ebs-appsborg/WEB-INF/lib/ebsAppsborgManifest.jar:/u01/install/APPS/fs1/EBSapps/comn/java/classes

        Using ContextFile     : /u01/install/APPS/fs1/inst/apps/EBSQA_ovpxvcpqa/appl/admin/EBSQA_ovpxvcpqa.xml

        Synchronizing the context file ......COMPLETED

AutoConfig completed successfully.
[oracle@ovpxvcpqa EBSQA_ovpxvcpqa]$

For the patch file system:

Source the patch file system environment by running:

. /u01/install/APPS/EBSapps.env patch

[oracle@ovpxvcpqa EBSQA_ovpxvcpqa]$ sh $AD_TOP/bin/adconfig.sh contextfile=/u01/install/APPS/fs2/inst/apps/EBSQA_ovpxvcpqa/appl/admin/EBSQA_ovpxvcpqa.xml -syncctx
Enter the APPS user password:
The log file for this session is located at: /u01/install/APPS/fs2/inst/apps/EBSQA_ovpxvcpqa/admin/log/02221207/adconfig.log

Option specified      : Synchronize context file
Only context file synchronization will be performed

        Classpath                   : /u01/install/APPS/fs2/FMW_Home/Oracle_EBS-app1/shared-libs/ebs-appsborg/WEB-INF/lib/ebsAppsborgManifest.jar:/u01/install/APPS/fs2/EBSapps/comn/java/classes

        Using ContextFile     : /u01/install/APPS/fs2/inst/apps/EBSQA_ovpxvcpqa/appl/admin/EBSQA_ovpxvcpqa.xml

        Synchronizing the context file ......COMPLETED

AutoConfig completed successfully.
[oracle@ovpxvcpqa EBSQA_ovpxvcpqa]$


After completing this, I re-ran adop phase=fs_clone, and it failed with the error below:


  There is already a session which is incomplete. Details are:
        Session Id            :   35
        Prepare phase status  :   NOT COMPLETED
        Apply phase status    :   NOT COMPLETED
        Cutover  phase status :   NOT COMPLETED
        Abort phase status    :   NOT COMPLETED
        Session status        :   FAILED
  Will continue with previous session
  [UNEXPECTED]FS clone cannot be called while a patching cycle is active


To resolve this, I had to run:

adop phase=abort

Then do the cleanup using:

adop phase=cleanup cleanup_mode=standard

Then I re-ran:

adop phase=fs_clone
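
As a sketch, the abort, cleanup and fs_clone sequence above could be chained so that it stops at the first phase that fails. reset_and_fs_clone is a name I made up for this example, and it assumes the run file system environment is already sourced. adop still prompts for its passwords as usual.

```sh
# Abort the failed session, clean up, then retry fs_clone.
# reset_and_fs_clone is a hypothetical wrapper; "&&" makes it stop at the
# first phase that returns a non-zero status.
reset_and_fs_clone() {
    adop phase=abort &&
    adop phase=cleanup cleanup_mode=standard &&
    adop phase=fs_clone
}
```

The function returns the exit status of the last phase it ran, so a failure in abort or cleanup means fs_clone is never attempted.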

This time it went past the point where it had errored out previously :) but soon failed with the error below:

----
Validation Results for Node: ovpxvcpqa
---------------------------------------
[ERROR]: At least one Oracle inventory check has failed.

[WARNING]: There could be issues while validating the ports used for E-Business Suite instance against ports used in /etc/services. Refer the log file for more details.

[WARNING]: Either some of the required entries in /etc/hosts file might be missing (e.g. localhost or hostname) OR the file /etc/hosts could not be read.

Some validations failed, please check /u01/install/APPS/fs_ne/EBSapps/log/adop/36/fs_clone_20160222_140723/EBSQA_ovpxvcpqa/ADOPValidations_Mon_Feb_22_14_07_46_2016.log

=========================== END OF VALIDATION REPORT ======================
        [UNEXPECTED]Error occurred while executing "perl /u01/install/APPS/fs1/EBSapps/appl/ad/12.0.0/patch/115/bin/txkADOPValidations.pl  -contextfile=/u01/install/APPS/fs1/inst/apps/EBSQA_ovpxvcpqa/appl/admin/EBSQA_ovpxvcpqa.xml -patchctxfile=/u01/install/APPS/fs2/inst/apps/EBSQA_ovpxvcpqa/appl/admin/EBSQA_ovpxvcpqa.xml -phase=fs_clone -logloc=/u01/install/APPS/fs_ne/EBSapps/log/adop/36/fs_clone_20160222_140723/EBSQA_ovpxvcpqa -promptmsg=hide"
        [UNEXPECTED]Error 1 occurred while Executing txkADOPValidation script on ovpxvcpqa
        Log file: /u01/install/APPS/fs_ne/EBSapps/log/adop/36/adop_20160222_140723.log

-------

So from the error, it was evident that there was an issue with oraInventory.

The following log file also indicated that the issue was with the /u01/install/APPS/fs1/EBSapps/10.1.2 Oracle home.



/u01/install/APPS/fs_ne/EBSapps/log/adop/36/fs_clone_20160222_140723/EBSQA_ovpxvcpqa/ADOPValidations_Mon_Feb_22_14_07_46_2016.log:
----------------------------------------------------------------------------------------------------------------------------------

Lines #(191-195):
====================================

ERROR: /u01/install/APPS/fs1/EBSapps/10.1.2 is not registered in the inventory
Corrective Action: Provide the location of a valid inventory file. If you believe the inventory is valid, you may want to attach the /u01/install/APPS/fs1/EBSapps/10.1.2.

[oracle@ovpxvcpqa APPS]$

OK, good. So how do we fix this inventory issue?

There is a very good note on Metalink that describes which Oracle homes to expect in a 12.2 EBS environment and how to re-attach them:

R12.2 How To Re-attach Oracle Homes To The Central Inventory (Doc ID 1586607.1)

Using the above note, I attached the problem Oracle homes:

/u01/install/APPS/fs1/EBSapps/10.1.2/oui/bin/runInstaller -silent -attachHome -invPtrLoc /etc/oraInst.loc ORACLE_HOME="/u01/install/APPS/fs1/EBSapps/10.1.2" ORACLE_HOME_NAME="OH582721206"

/u01/install/APPS/fs2/EBSapps/10.1.2/oui/bin/runInstaller -silent -attachHome -invPtrLoc /etc/oraInst.loc ORACLE_HOME="/u01/install/APPS/fs2/EBSapps/10.1.2" ORACLE_HOME_NAME="OH165974127"
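
To confirm that a home is registered before re-running adop, one option is to look for it in the central inventory's inventory.xml. This is only a rough sketch: home_in_inventory is a hypothetical helper, and it assumes the usual layout where each home is a single HOME line with a LOC attribute and detached homes carry REMOVED="T".

```sh
# Return 0 if the given Oracle home path is an active HOME entry in inventory.xml.
# home_in_inventory is a hypothetical helper name.
home_in_inventory() {
    inventory_xml=$1 oracle_home=$2
    # -F treats the path as a literal string, so dots and slashes are not regex.
    # The second grep drops entries that were detached (REMOVED="T").
    grep -F "LOC=\"${oracle_home}\"" "$inventory_xml" |
        grep -Fv 'REMOVED="T"' > /dev/null
}

# Example (paths from this post; the inventory location is read from oraInst.loc):
#   inv=$(sed -n 's/^inventory_loc=//p' /etc/oraInst.loc)/ContentsXML/inventory.xml
#   home_in_inventory "$inv" /u01/install/APPS/fs1/EBSapps/10.1.2 || echo "not registered"
```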

Excellent!!!

Now I re-ran adop phase=fs_clone once more.

This time it seemed to be running successfully...

and yes, it completed successfully:

----
#############################################################

END: FSCloneApplyFSCloneApply - appsTier Finished at Mon Feb 22 15:30:10 CST 2016
Status: Completed Successfully
Total Time Taken : 35.950 mins
#############################################################



END:  doClone..


CLONE APPLY COMPLETED SUCCESSFULLY
----

Hope this helps someone.


Wednesday 17 February 2016

Weblogic install fail with insufficient space in file system

While I was trying to install WebLogic, the installer complained that there was 0 MB of space left in the file system.

However, when I checked, there was 17T of space left in the file system. Yes, you read that correctly: 17 terabytes.

It seems the Oracle installer does not handle free space above a certain size correctly, and its check fails, reporting 0 MB of space left.

I put a quota on the file system, which reduced the reported space from 17T to 100G.

I then clicked the Next button in the installer and it proceeded as normal.

So in case you face this, try lowering the available space or putting a quota on the file system; that should help.

vnc issue

I had this issue earlier as well, but all the blogs misled me.

When I started VNC, I got the grey VNC screen but no terminal window on it.
There is not much I can do on the VNC screen without a terminal (putty-like) window.

When I googled it, a lot of blogs talked about modifying the /home/oracle/.vnc/xstartup file.

I modified it and restarted VNC a number of times, but nothing helped.

Then I turned to the VNC log file to see if I could find a clue about what was wrong.

The following line caught my attention:

/root/.vnc/xstartup: line 11: xterm: command not found

Upon checking, the xterm command was indeed not found.

[root@ovpxpippd .vnc]#
[root@ovpxpippd .vnc]# ls -l /usr/bin/xterm
ls: cannot access /usr/bin/xterm: No such file or directory
[root@ovpxpippd .vnc]#
[root@ovpxpippd .vnc]#
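
A more general check is to ask the shell whether each command that xstartup launches is on the PATH. Here is a minimal sketch; missing_commands is a made-up helper name. xterm comes from the log line above, and any other command you pass in is whatever your own xstartup happens to call.

```sh
# Print the name of every command in the argument list that is not on PATH.
# missing_commands is a hypothetical helper name.
missing_commands() {
    for cmd in "$@"; do
        # command -v succeeds only if the shell can resolve the name.
        command -v "$cmd" > /dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}
```

On this host, missing_commands xterm would have printed xterm, which points at the missing package without any trial-and-error edits to xstartup.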

I proceeded to install xterm.

[root@ovpxpippd .vnc]#
[root@ovpxpippd .vnc]# yum install xterm
Loaded plugins: security
Setting up Install Process
Resolving Dependencies
--> Running transaction check
---> Package xterm.x86_64 0:215-8.el5_4.1 will be installed
--> Processing Dependency: libtermcap.so.2()(64bit) for package: xterm-215-8.el5_4.1.x86_64
--> Processing Dependency: libXaw.so.7()(64bit) for package: xterm-215-8.el5_4.1.x86_64
--> Running transaction check
---> Package libXaw.x86_64 0:1.0.2-8.1 will be installed
--> Processing Dependency: libXpm.so.4()(64bit) for package: libXaw-1.0.2-8.1.x86_64
---> Package libtermcap.x86_64 0:2.0.8-46.1 will be installed
--> Processing Dependency: /etc/termcap for package: libtermcap-2.0.8-46.1.x86_64
--> Running transaction check
---> Package libXpm.x86_64 0:3.5.5-3 will be installed
---> Package termcap.noarch 1:5.5-1.20060701.1 will be installed
--> Finished Dependency Resolution

Dependencies Resolved

=================================================================================================================================================================
 Package                              Arch                             Version                                        Repository                            Size
=================================================================================================================================================================
Installing:
 xterm                                x86_64                           215-8.el5_4.1                                  el5_latest                           411 k
Installing for dependencies:
 libXaw                               x86_64                           1.0.2-8.1                                      el5_latest                           329 k
 libXpm                               x86_64                           3.5.5-3                                        el5_latest                            44 k
 libtermcap                           x86_64                           2.0.8-46.1                                     el5_latest                            14 k
 termcap                              noarch                           1:5.5-1.20060701.1                             el5_latest                           265 k

Transaction Summary
=================================================================================================================================================================
Install       5 Package(s)

Total download size: 1.0 M
Installed size: 2.8 M
Is this ok [y/N]: y
Downloading Packages:
(1/5): libXaw-1.0.2-8.1.x86_64.rpm                                                                                                        | 329 kB     00:00
(2/5): libXpm-3.5.5-3.x86_64.rpm                                                                                                          |  44 kB     00:00
(3/5): libtermcap-2.0.8-46.1.x86_64.rpm                                                                                                   |  14 kB     00:00
(4/5): termcap-5.5-1.20060701.1.noarch.rpm                                                                                                | 265 kB     00:01
(5/5): xterm-215-8.el5_4.1.x86_64.rpm                                                                                                     | 411 kB     00:00
-----------------------------------------------------------------------------------------------------------------------------------------------------------------
Total                                                                                                                            206 kB/s | 1.0 MB     00:05
Running rpm_check_debug
Running Transaction Test
Transaction Test Succeeded
Running Transaction
  Installing : 1:termcap-5.5-1.20060701.1.noarch                                                                                                             1/5
  Installing : libtermcap-2.0.8-46.1.x86_64                                                                                                                  2/5
  Installing : libXpm-3.5.5-3.x86_64                                                                                                                         3/5
  Installing : libXaw-1.0.2-8.1.x86_64                                                                                                                       4/5
  Installing : xterm-215-8.el5_4.1.x86_64                                                                                                                    5/5
  Verifying  : libtermcap-2.0.8-46.1.x86_64                                                                                                                  1/5
  Verifying  : libXaw-1.0.2-8.1.x86_64                                                                                                                       2/5
  Verifying  : xterm-215-8.el5_4.1.x86_64                                                                                                                    3/5
  Verifying  : libXpm-3.5.5-3.x86_64                                                                                                                         4/5
  Verifying  : 1:termcap-5.5-1.20060701.1.noarch                                                                                                             5/5

Installed:
  xterm.x86_64 0:215-8.el5_4.1

Dependency Installed:
  libXaw.x86_64 0:1.0.2-8.1            libXpm.x86_64 0:3.5.5-3            libtermcap.x86_64 0:2.0.8-46.1            termcap.noarch 1:5.5-1.20060701.1

Complete!
[root@ovpxpippd .vnc]#
[root@ovpxpippd .vnc]#
[root@ovpxpippd .vnc]#


Then I started up VNC, and yes, now I get my terminal window.

So even if you have the VNC packages, you cannot make much use of VNC unless the xterm package is installed.

Hope this helps someone.

emctl config oms -store_repos_details command failed

Once, I was trying to migrate the OEM database from one database server to another.

1. We created the standby on the new server
2. During downtime, we synced it up
3. Opened the standby as primary
4. Tried re-pointing the connection string to the new database using:
emctl config oms -store_repos_details -repos_conndesc "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=ovpd-scan)(PORT=1521)))(LOAD_BALANCE=ON)(CONNECT_DATA=(SERVICE_NAME=CMPREP)))" -repos_user SYSMAN 

This failed with the below error:

[oracle@ovp-v-oempd1 bin]$ emctl config oms -store_repos_details -repos_conndesc "(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=ovpd-scan)(PORT=1521)))(LOAD_BALANCE=ON)(CONNECT_DATA=(SERVICE_NAME=CMPREP)))" -repos_user SYSMAN 
Oracle Enterprise Manager Cloud Control 12c Release 4 
Copyright (c) 1996, 2014 Oracle Corporation. All rights reserved. 
Error occurred. Check the log /u01/app/oracle/product/gc_inst1/em/EMGC_OMS1/sysman/log/secure.log 

When I checked secure.log, it just said:

secure.log 

2016-01-18 14:31:17,589 [main] INFO util.EmctlUtil logp.251 - Connecting over t3s to: ovp-v-oempd1.compassminerals.com/7102 using id: weblogic 
2016-01-18 14:32:23,368 [main] INFO util.EmctlUtil logp.251 - Unable to get mbean conn over t3s :null 
2016-01-18 14:32:23,368 [main] INFO oms.StoreReposDetails logp.251 - Since there is a failure, rolling back the getLockConn 
2016-01-18 14:32:23,373 [main] ERROR oms.StoreReposDetails logp.251 - 
java.lang.NullPointerException 
at weblogic.rjvm.ResponseImpl.unmarshalReturn(ResponseImpl.java:237) 
at weblogic.rmi.internal.BasicRemoteRef.invoke(BasicRemoteRef.java:223) 
at weblogic.management.remote.iiop.IIOPServerImpl_1036_WLStub.newClient(Unknown Source) 
at weblogic.management.remote.common.RMIServerWrapper.newClient(ClientProviderBase.java:348) 
at javax.management.remote.rmi.RMIConnector.getConnection(RMIConnector.java:2327) 
at javax.management.remote.rmi.RMIConnector.connect(RMIConnector.java:277) 
at weblogic.management.remote.common.WLSRMIConnector.doConnect(WLSRMIConnector.java:152) 
at weblogic.management.remote.common.WLSRMIConnector.access$100(WLSRMIConnector.java:29) 

at oracle.sysman.emctl.util.EmctlUtil.getMBeanServerConn(EmctlUtil.java:775) 
at oracle.sysman.emctl.config.oms.StoreReposDetails.processCmd(StoreReposDetails.java:248) 
at oracle.sysman.emctl.config.oms.StoreReposDetails.main(StoreReposDetails.java:549) 
Caused by: java.lang.NullPointerException 
at oracle.security.jps.az.internal.runtime.policy.SystemPolicyImpl.(SystemPolicyImpl.java:85) 
at oracle.security.jps.az.internal.runtime.service.PDPServiceImpl.initializeUncontrolledMode(PDPServiceImpl.java:570) 
at oracle.security.jps.az.internal.runtime.service.PDPServiceImpl.initial(PDPServiceImpl.java:489) 
at oracle.security.jps.az.internal.runtime.service.PDPServiceImpl.getSystemPolicy(PDPServiceImpl.java:888) 
at oracle.security.jps.az.internal.runtime.s 


I initially thought it was some issue with WebLogic, but I could not find any issues that I could relate to the above error.

I could connect fine using the same TNS entry.

So, long story short, it turned out to be a DNS issue.

The DNS servers in our network are very unreliable and unstable.

To resolve this issue I had to do the following:


2) Stopped the OMS using:

emctl stop oms -all -force

3) Killed the remaining processes and restarted the OMS
-> OMS services started successfully

4) Temporarily changed the repository database IP address back to the old database IP address, 10.177.10.10, and added an entry in the /etc/hosts file

5) Forced local hostname resolution on the OMS host by adding a line to its /etc/hosts file that maps the new host name, newhost.oracle.com, to the old IP address of the repository database, 10.177.10.10

6) Changed the host in the connection string using emctl on the OMS, from 10.177.10.10 to newhost.oracle.com:

emctl config oms -store_repos_details -repos_conndesc '(DESCRIPTION=(ADDRESS=(PROTOCOL=TCP)(HOST=newhost.oracle.com)(PORT=1521))(CONNECT_DATA=(SERVER=DEDICATED)(SERVICE_NAME=EMREP)))' -repos_user sysman 

This completed successfully.

7) Stopped the OMS

8) Changed the IP address of the repository database back to the new IP, 10.177.10.50, and removed the entries from its /etc/hosts file (as DNS was configured to point to the new IP address); also removed the old IP address entries from /etc/hosts on the OMS host

9) With that line removed from /etc/hosts on the OMS host, DNS is used to resolve newhost.oracle.com, which now maps to 10.177.10.50

10) Started the OMS services successfully
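
Since the procedure depends on adding and later removing a temporary /etc/hosts entry, it helps to check what the file currently maps a name to. The following is a minimal sketch, assuming the standard hosts format (IP address first, then names, # for comments); hosts_lookup is a hypothetical helper name.

```sh
# Print the IP address(es) that a hosts-format file maps to the given name.
# hosts_lookup is a hypothetical helper name.
hosts_lookup() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # drop comments; awk re-splits the fields afterwards
        { for (i = 2; i <= NF; i++) if ($i == name) print $1 }
    ' "$1"
}
```

After the last step, hosts_lookup /etc/hosts newhost.oracle.com should print nothing on the OMS host, and getent hosts newhost.oracle.com shows what the system resolver returns for the name.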


This is a bit complex and rather unpredictable, but it helped us in this situation.

Hope this helps someone.

Saturday 6 February 2016

oakcli validate -c OSDiskStorage ERROR: Raid device /dev/md1 not clean

Recently, we were validating all our Oracle Database Appliance (ODA) hardware.

From the ILOM I could see that all the hardware was fine, but as of today (2/6/2016) it does not report the correct status of the hard drives.

To verify hard drive health, I ran the following command:

oakcli validate -c OSDiskStorage --> this command checks the hard drives on the host compute node.

It failed with:

[root@OVP-S-ODA21 /]# oakcli validate -c OSDiskStorage
INFO: Checking Operating System Storage
SUCCESS: The OS disks have the boot stamp
RESULT: Raid device /dev/md0 found clean
ERROR: Raid device /dev/md1 not clean    <<<<============
ERROR: mdadm detail command output

It seems that, for some reason, the RAID dropped the drive. This is evident from the command output below:

[root@OVP-S-ODA21 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdaw1[0] sdax1[1]
104320 blocks [2/2] [UU]

md1 : active raid1 sdax2[1]     <<<<============
585954688 blocks [2/1] [_U]   <<<<============

unused devices: <none>
[root@OVP-S-ODA21 ~]# 
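
The thing to look for in that output is the member map at the end of the "blocks" line: [UU] means both mirror halves are present, while an underscore, as in [_U], marks a missing member. Here is a small sketch that picks out such arrays; degraded_md_arrays is a made-up helper name, and it reads /proc/mdstat unless another file is given.

```sh
# Print the name of every md array whose member map shows a missing disk.
# degraded_md_arrays is a hypothetical helper name.
degraded_md_arrays() {
    awk '
        /^md[0-9]+ :/ { name = $1 }                    # remember the array from its header line
        /blocks/ && /\[[U_]*_[U_]*\]/ { print name }   # "_" in the map marks a missing member
    ' "${1:-/proc/mdstat}"
}
```

Against the output above it prints md1. Note that an array that is still rebuilding also shows an underscore, so it keeps being reported until the recovery finishes.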



While I was working with Oracle Support, they suggested removing the disk from the RAID and adding it back.

To remove the disk, Oracle suggested running this command:

mdadm --manage /dev/md1 --remove /dev/sdaw2

When I executed it, it failed, saying it could not remove the disk (obviously, since it could not find it):

[root@OVP-S-ODA21 ~]# mdadm --manage /dev/md1 --remove /dev/sdaw2
mdadm: hot remove failed for /dev/sdaw2: No such device or address
[root@OVP-S-ODA21 ~]#

To add it back:

[root@OVP-S-ODA21 ~]# mdadm -a /dev/md1 /dev/sdaw2
mdadm: added /dev/sdaw2
[root@OVP-S-ODA21 ~]#

It successfully added the disk back to the RAID. To validate the addition, I ran cat /proc/mdstat and saw that the RAID was recovering the newly added disk.

[root@OVP-S-ODA21 ~]#
[root@OVP-S-ODA21 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdaw1[0] sdax1[1]
104320 blocks [2/2] [UU]

md1 : active raid1 sdaw2[2] sdax2[1]
585954688 blocks [2/1] [_U]
[>....................] recovery = 0.2% (1558400/585954688) finish=93.7min speed=103893K/sec

unused devices: <none>
[root@OVP-S-ODA21 ~]# 
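
Rather than re-reading the whole output, the recovery percentage can be pulled out of the progress line. This is a minimal sketch that assumes the "recovery = N%" wording shown above; md_recovery_percent is a hypothetical helper name.

```sh
# Print the recovery percentage from mdstat output; prints nothing when
# no recovery is in progress.
# md_recovery_percent is a hypothetical helper name.
md_recovery_percent() {
    sed -n 's/.*recovery *= *\([0-9.]*\)%.*/\1/p' "${1:-/proc/mdstat}"
}
```

For the output above it prints 0.2. If you would rather block until the rebuild is done, mdadm --wait /dev/md1 waits for any resync or recovery on the array to finish.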


After a couple of hours, I revalidated that the recovery had gone fine and the disk was successfully added back to the RAID.


[root@OVP-S-ODA21 ~]# cat /proc/mdstat
Personalities : [raid1]
md0 : active raid1 sdaw1[0] sdax1[1]
104320 blocks [2/2] [UU]

md1 : active raid1 sdaw2[0] sdax2[1]
585954688 blocks [2/2] [UU]

unused devices: <none>
[root@OVP-S-ODA21 ~]#

I re-ran the ODA command "oakcli validate -c OSDiskStorage" and it came out clean:

[root@OVP-S-ODA21 ~]# oakcli validate -c OSDiskStorage
INFO: Checking Operating System Storage
SUCCESS: The OS disks have the boot stamp
RESULT: Raid device /dev/md0 found clean
RESULT: Raid device /dev/md1 found clean
RESULT: Physical Volume /dev/md1 in VolGroupSys has 370206.05M out of total 599986.80M
RESULT: Volumegroup VolGroupSys consist of 1 physical volumes,contains 4 logical volumes, has 0 volume snaps with total size of 599986.80M and free space of 370206.05M
RESULT: Logical Volume LogVolOpt in VolGroupSys Volume group is of size 60.00G
RESULT: Logical Volume LogVolRoot in VolGroupSys Volume group is of size 30.00G
RESULT: Logical Volume LogVolSwap in VolGroupSys Volume group is of size 24.00G
RESULT: Logical Volume LogVolU01 in VolGroupSys Volume group is of size 100.00G
RESULT: Device /dev/mapper/VolGroupSys-LogVolRoot is mounted on / of type ext3 in (rw)
RESULT: Device /dev/md0 is mounted on /boot of type ext3 in (rw)
RESULT: Device /dev/mapper/VolGroupSys-LogVolU01 is mounted on /u01 of type ext3 in (rw)
RESULT: Device /dev/mapper/VolGroupSys-LogVolOpt is mounted on /opt of type ext3 in (rw)
RESULT: / has 13226 MB free out of total 29758 MB
RESULT: /boot has 41 MB free out of total 99 MB
RESULT: /u01 has 42698 MB free out of total 99194 MB
RESULT: /opt has 44648 MB free out of total 59516 MB
[root@OVP-S-ODA21 ~]#


More details are available in the Metalink note below:

ODA (Oracle Database Appliance) : How to replace FAILED SYSTEM BOOT DISK ( Doc ID 1382300.1 )

Thanks,
Suresh Nooka.