Which is slow: RMAN or the Media Management Library?

Backup execution time is slow... where is the problem?

When we back up a database through third-party software and the backup is slow, there is always some uncertainty about what is causing the slowness.

The Database Administrator (DBA) says it is the Media Management Library (aka MML), and the Backup Operator says it is RMAN.

To end this conflict, I will show how to identify where the problem is.

I'll use the term RMAN (as in "RMAN spends time"), but what I actually mean is that the database spends time, because RMAN is only a client. So typically it is the database that is slow, not the RMAN client.

Note: I will not diagnose what is causing the slowness; I will only help you identify whether the problem is in the MML or in RMAN.

Media Management

The Oracle Media Management Layer (MML) API lets third-party vendors build a media manager, software that works with RMAN and the vendor's hardware to allow backups to sequential media devices such as tape drives. A media manager handles loading, unloading, and labeling of sequential media such as tapes.

RMAN Interaction with a Media Manager

RMAN does not issue specific commands to load, label, or unload tapes. When backing up, RMAN gives the media manager a stream of bytes and associates a unique name with this stream. When RMAN must restore the backup, it asks the media manager to retrieve the byte stream. All details of how and where that stream is stored are handled entirely by the media manager. For example, the media manager labels and keeps track of the tape and names of files on each tape, and automatically loads and unloads tapes, or signals an operator to do so.

RMAN provides a list of files requiring backup or restore to the media manager, which in turn makes all decisions regarding how and when to move the data.

Before making a call to any function in the media management API, the server records a wait event. With these wait events it is possible to get the number of waits and the time (in seconds) that the server has spent waiting for the call to return.

So we can calculate how much time RMAN spends waiting for the MML to process a request (e.g. writing or querying a backup piece) and return to RMAN.

Complete list of MML events

Oracle 11.2 or above:

SELECT NAME
FROM   V$EVENT_NAME
WHERE  NAME LIKE '%MML%';
NAME
----------------------------------------
Backup: MML initialization
Backup: MML v1 open backup piece
Backup: MML v1 read backup piece
Backup: MML v1 write backup piece
Backup: MML v1 close backup piece
Backup: MML v1 query backup piece
Backup: MML v1 delete backup piece
Backup: MML create a backup piece
Backup: MML commit backup piece
Backup: MML command to channel
Backup: MML shutdown
Backup: MML obtain textual error
Backup: MML query backup piece
Backup: MML extended initialization
Backup: MML read backup piece
Backup: MML delete backup piece
Backup: MML restore backup piece
Backup: MML write backup piece
Backup: MML proxy initialize backup
Backup: MML proxy cancel
Backup: MML proxy commit backup piece
Backup: MML proxy session end
Backup: MML datafile proxy backup?
Backup: MML datafile proxy restore?
Backup: MML proxy initialize restore
Backup: MML proxy start data movement
Backup: MML data movement done?
Backup: MML proxy prepare to start
Backup: MML obtain a direct buffer
Backup: MML release a direct buffer
Backup: MML get base address
Backup: MML query for direct buffers

In versions before Oracle Database 11.2 the MML event names do not exist, because the names were changed in 11.2 from %sbt% to %MML%.

So, if you are using Oracle 11.1 or earlier, you can query V$EVENT_NAME where NAME like '%sbt%'.

SELECT NAME
FROM   V$EVENT_NAME
WHERE  NAME LIKE '%sbt%';

Backup: sbtinit
Backup: ssbtopen
Backup: ssbtread
Backup: ssbtwrite
Backup: ssbtbackup
.
.
.

So, let's start...
Oracle stores cumulative statistics (since the instance was started) for these waits in v$system_event. I always use GV$ because it is very common to administer RAC environments these days.

Before starting the backup I'll take an initial snapshot of gv$system_event by creating the table RMAN_MML_EVENT_T1.

CREATE TABLE  RMAN_MML_EVENT_T1 AS
SELECT inst_id,
  event,
  TOTAL_WAITS,
  TOTAL_TIMEOUTS,
  TIME_WAITED,
  AVERAGE_WAIT,
  TIME_WAITED_MICRO,
  sysdate as SNAPSHOT_TIME
FROM gv$system_event
WHERE event LIKE 'Backup%';

SQL> select * from RMAN_MML_EVENT_T1;

   INST_ID EVENT                                   TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO SNAPSHOT_TIME
---------- --------------------------------------- ----------- -------------- ----------- ------------ ----------------- -----------------
         1 Backup: MML initialization                      371              0       54365       146.54         543651136 08-08-12 17:11:05
         1 Backup: MML create a backup piece               450              0        4827        10.73          48270960 08-08-12 17:11:05
         1 Backup: MML commit backup piece                 450              0        7417        16.48          74172281 08-08-12 17:11:05
         1 Backup: MML shutdown                            371              0          47          .13            469267 08-08-12 17:11:05
         1 Backup: MML query backup piece                  894              0       11222        12.55         112222166 08-08-12 17:11:05
         1 Backup: MML extended initialization             371              0           0            0              3655 08-08-12 17:11:05
         1 Backup: MML delete backup piece                 444              0        5348        12.05          53480530 08-08-12 17:11:05
         1 Backup: MML write backup piece              1378078              0     3053683         2.22        3.0537E+10 08-08-12 17:11:05

8 rows selected.

I started the backup using RMAN and an MML (Tivoli Storage Manager). When the backup finishes, you can query V$RMAN_BACKUP_JOB_DETAILS to get the accurate backup time.

SELECT START_TIME,
  END_TIME,
  ROUND(INPUT_BYTES  /1024/1024/1024,2) INPUT_GBYTES ,
  ROUND(OUTPUT_BYTES /1024/1024/1024,2) OUTPUT_GBYTES,
  INPUT_TYPE,
  ELAPSED_SECONDS
FROM V$RMAN_BACKUP_JOB_DETAILS
WHERE TRUNC(START_TIME) = TRUNC(SYSDATE)
AND INPUT_TYPE LIKE 'DB%';

START_TIME        END_TIME          INPUT_GBYTES OUTPUT_GBYTES INPUT_TYPE    ELAPSED_SECONDS
----------------- ----------------- ------------ ------------- ------------- ---------------
08-08-12 17:23:44 08-08-12 17:26:38        12.85         10.06 DB FULL                   174

In my case the full backup took 174 seconds to read 12.85 GB and write 10.06 GB of data to the MML.

So, after the backup finishes I take the second snapshot by creating the table RMAN_MML_EVENT_T2.


CREATE TABLE  RMAN_MML_EVENT_T2 AS
SELECT inst_id,
  event,
  TOTAL_WAITS,
  TOTAL_TIMEOUTS,
  TIME_WAITED,
  AVERAGE_WAIT,
  TIME_WAITED_MICRO,
  sysdate as SNAPSHOT_TIME
FROM gv$system_event
WHERE event LIKE 'Backup%';

SQL> select * from RMAN_MML_EVENT_T2;

   INST_ID EVENT                                   TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED AVERAGE_WAIT TIME_WAITED_MICRO SNAPSHOT_TIME
---------- --------------------------------------- ----------- -------------- ----------- ------------ ----------------- -----------------
         1 Backup: MML initialization                      373              0       54665       146.56         546652333 08-08-12 17:27:45
         1 Backup: MML create a backup piece               454              0        4860        10.71          48604759 08-08-12 17:27:45
         1 Backup: MML commit backup piece                 454              0        7482        16.48          74820999 08-08-12 17:27:45
         1 Backup: MML shutdown                            373              0          47          .13            471590 08-08-12 17:27:45
         1 Backup: MML query backup piece                  900              0       11281        12.53         112808077 08-08-12 17:27:45
         1 Backup: MML extended initialization             373              0           0            0              3665 08-08-12 17:27:45
         1 Backup: MML delete backup piece                 446              0        5373        12.05          53727006 08-08-12 17:27:45
         1 Backup: MML write backup piece              1419274              0     3067298         2.16        3.0673E+10 08-08-12 17:27:45

8 rows selected.

Now I can subtract the values in RMAN_MML_EVENT_T1 from those in RMAN_MML_EVENT_T2 to get the real time spent on MML.
Note:
EVENT: Name of the wait event
TOTAL_WAITS: Total number of waits for the event
TOTAL_TIMEOUTS: Total number of timeouts for the event
TIME_WAITED: Total amount of time waited for the event (in hundredths of a second)
AVERAGE_WAIT: Average amount of time waited for the event (in hundredths of a second)
TIME_WAITED_MICRO: Total amount of time waited for the event (in microseconds)

SELECT t1.inst_id,
  t1.event,
  t2.total_waits       - t1.total_waits total_waits,
  t2.total_timeouts    -t1.total_timeouts total_timeouts,
  t2.time_waited       - t1.time_waited time_waited,
  t2.time_waited_micro - t1.time_waited_micro time_waited_micro
FROM RMAN_MML_EVENT_T1 T1,
  RMAN_MML_EVENT_T2 T2
WHERE t1.inst_id = t2.inst_id
AND t1.event     = t2.event;

   INST_ID EVENT                                   TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED TIME_WAITED_MICRO
---------- --------------------------------------- ----------- -------------- ----------- -----------------
         1 Backup: MML initialization                        2              0         300           3001197
         1 Backup: MML create a backup piece                 4              0          33            333799
         1 Backup: MML commit backup piece                   4              0          65            648718
         1 Backup: MML shutdown                              2              0           0              2323
         1 Backup: MML query backup piece                    6              0          59            585911
         1 Backup: MML extended initialization               2              0           0                10
         1 Backup: MML delete backup piece                   2              0          25            246476
         1 Backup: MML write backup piece                41196              0       13615         136141912

8 rows selected.

As the output above shows, the MML spent most of its time writing backup pieces.

So, I'll sum the times to get the total time spent on MML.

SELECT SUM (total_waits) total_waits,
  SUM(total_timeouts) total_timeouts ,
  SUM (time_waited)/100 time_waited_in_second,
  SUM (time_waited_micro) time_waited_micro
FROM
  (SELECT t1.inst_id,
    t1.event,
    t2.total_waits       - t1.total_waits total_waits,
    t2.total_timeouts    -t1.total_timeouts total_timeouts,
    t2.time_waited       - t1.time_waited time_waited,
    t2.time_waited_micro - t1.time_waited_micro time_waited_micro
  FROM RMAN_MML_EVENT_T1 T1,
    RMAN_MML_EVENT_T2 T2
  WHERE t1.inst_id = t2.inst_id
  AND t1.event     = t2.event
  );

TOTAL_WAITS TOTAL_TIMEOUTS TIME_WAITED_IN_SECOND TIME_WAITED_MICRO
----------- -------------- --------------------- -----------------
      41218              0                140.97         140960346

Calculating the total backup window, the time spent on MML and the time spent by RMAN.

Note: TIME_SPEND_BY_RMAN_SECOND = (ELAPSED_SECONDS_BACKUP - TIME_SPEND_BY_MML_SECOND)

ELAPSED_SECONDS_BACKUP          TIME_SPEND_BY_MML_SECOND       TIME_SPEND_BY_RMAN_SECOND
------------------------------ ------------------------------ -------------------
174                             140.97                         33.03

Summarizing:
Total time of backup: 174 seconds
Time spent by MML: 141 seconds
Time spent by RMAN: 33 seconds

If this backup is slow, it is because the MML takes (141*100/174) 81% of the backup window.
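The split between MML and RMAN can be reproduced with a few lines of shell. This is only a sketch: the function name mml_share is made up for illustration, and the two input figures are the ones measured in this example (ELAPSED_SECONDS from V$RMAN_BACKUP_JOB_DETAILS and the summed TIME_WAITED delta divided by 100).

```shell
#!/bin/sh
# Hypothetical helper (not part of any Oracle tool): split a backup window
# into time waited on MML and time left for RMAN/database work.
# $1 = elapsed seconds of the backup
# $2 = seconds waited on MML events
mml_share() {
  awk -v elapsed="$1" -v mml="$2" 'BEGIN {
    rman = elapsed - mml
    printf "elapsed=%.2f mml=%.2f rman=%.2f mml_pct=%.0f\n",
           elapsed, mml, rman, mml * 100 / elapsed
  }'
}

# Figures from the backup in this post
mml_share 174 140.97
```

With other backups, only the two arguments change; the percentage tells you at a glance which side to investigate first.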

Additional info:
As my backup was done over the LAN:
(10.06GB * 1024 = 10301MB)
10301MB / 141 seconds spent on MML = about 73 MB/second

As I'm using a 1 Gbit network interface, I consider this a normal throughput.
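The throughput arithmetic can be sketched the same way. The helper name throughput_mb_s is an assumption for illustration; it simply divides the data written to the MML by a number of seconds, so you can compare the rate against the MML wait time and against the whole backup window.

```shell
#!/bin/sh
# Hypothetical helper: average throughput in MB/s.
# $1 = data written in GB (OUTPUT_GBYTES), $2 = seconds
throughput_mb_s() {
  awk -v gb="$1" -v sec="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / sec }'
}

throughput_mb_s 10.06 140.97   # rate while waiting on MML
throughput_mb_s 10.06 174      # rate over the whole backup window
```

For reference, a 1 Gbit link cannot carry more than 125 MB/s (1000 Mbit / 8) before protocol overhead, so both figures are well within what the interface can deliver.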

You can also monitor in real time where the wait is.

Just execute the script below:

Note: if you are using a version earlier than 11.2, change %MML% to %sbt%.

vi monitoring_mml.sh
sqlplus -s sys/<password>@<db_name> as sysdba<<EOF
set echo off
COLUMN EVENT FORMAT a17
COLUMN SEC_WAIT FORMAT 999
COLUMN STATE FORMAT a15
COLUMN CLIENT_INFO FORMAT a30
set linesize 200

select to_char(sysdate,'dd-mm-yyyy hh24:mi:ss') actual_date from dual;

-- Join on INST_ID as well: SID and process ADDR are only unique within one instance.
SELECT p.SPID, sw.EVENT, sw.SECONDS_IN_WAIT AS SEC_WAIT,
       sw.STATE, s.CLIENT_INFO
FROM   gV\$SESSION_WAIT sw, gv\$SESSION s, gV\$PROCESS p
WHERE  sw.EVENT LIKE '%MML%'
AND    s.INST_ID=sw.INST_ID
AND    s.SID=sw.SID
AND    p.INST_ID=s.INST_ID
AND    p.ADDR=s.PADDR;
EOF
exit

Run the loop below from the shell and you will see the waits on MML in real time.

while true
do
sh monitoring_mml.sh
sleep 1
done



Local/SCAN Listener – Enhancing Security (Oracle Security Alert)

Recently we discovered a possible vulnerability in the SCAN Listener, so we opened an SR and Oracle gave us a solution.

I recommend that everyone apply this security fix. “As far as I know only availability can be affected; there is no concern about data integrity.”

Thread: How prevent REMOTE LISTENER register on SCAN LISTENER
https://forums.oracle.com/forums/thread.jspa?threadID=2369472

Oracle Security Alert for CVE-2012-1675

This security alert addresses the security issue CVE-2012-1675, a vulnerability in the TNS listener which has been recently disclosed as “TNS Listener Poison Attack” affecting the Oracle Database Server. This vulnerability may be remotely exploitable without authentication, i.e. it may be exploited over a network without the need for a username and password. A remote user can exploit this vulnerability to impact the confidentiality, integrity and availability of systems that do not have recommended solution applied.

Affected Products and Versions
Oracle Database 11g Release 2, versions 11.2.0.2, 11.2.0.3
Oracle Database 11g Release 1, version 11.1.0.7
Oracle Database 10g Release 2, versions 10.2.0.3, 10.2.0.4, 10.2.0.5

Solution

Recommendations for protecting against this vulnerability can be found at:

Please note that Oracle has added Oracle Advanced Security SSL/TLS to the Oracle Database Standard Edition license when used with the Real Application Clusters and Oracle has added Oracle Advanced Security SSL/TLS to the Enterprise Edition Real Application Clusters (Oracle RAC) and RAC One Node options so that the directions provided in the Support Notes referenced above can be applied by all Oracle customers without additional cost.

Note: Please refer to the Oracle licensing documentation available on Oracle.com regarding licensing changes that allow Oracle Advanced Security SSL/TLS to be used with Oracle SE Oracle Real Application Clusters and Oracle Enterprise Edition Real Application Clusters (Oracle RAC) and Oracle RAC One Node options.

Due to the threat posed by a successful attack, Oracle strongly recommends that customers apply this Security Alert solution as soon as possible.

http://www.oracle.com/technetwork/topics/security/alert-cve-2012-1675-1608180.html



RACcheck – RAC Configuration Audit Tool

RACcheck is a tool developed by the RAC Assurance development team for use by customers to automate the assessment of RAC systems for known configuration problems and best practices.

RACcheck is a RAC Configuration Audit tool  designed to audit various important configuration settings within a Real Application Clusters (RAC), Oracle Clusterware (CRS), Automatic Storage Management (ASM) and Grid Infrastructure environment. The tool audits configuration settings within the following categories:

  1. OS kernel parameters
  2. OS packages
  3. Many other OS configuration settings important to RAC.
  4. CRS/Grid Infrastructure
  5. RDBMS
  6. ASM
  7. Database parameters
  8. Many other database configuration settings important to RAC.

Features
1. RACcheck is NON-INTRUSIVE and does not change anything in the environment, except as detailed below:

– SSH user equivalence for the RDBMS software owner is assumed to be configured among all the database servers being audited in order for it to execute commands on the remote database server nodes. If the tool determines that this user equivalence is not established it will offer to set it up either temporarily or permanently at the option of the user. If the user chooses to set up SSH user equivalence temporarily then the script will do so for the duration of the execution of the tool but then it will return the system to the state in which it found SSH user equivalence originally. For those wishing to configure SSH user equivalence outside the tool (if not already configured), consult My Oracle Support Note: 372795.1.

– RACcheck creates a number of small output files into which the data necessary to perform the assessment is collected

– RACcheck creates and executes some scripts dynamically in order to accomplish some of the data collection

– RACcheck cleans up after itself any temporary files that are created and not needed as part of the collection.

2. RACcheck interrogates the system to determine the status of the Oracle stack components (i.e., Grid Infrastructure, RDBMS, RAC, etc.) and whether they are installed and/or running. Depending upon the status of each component, the tool runs the appropriate collections and audit checks. If, due to local environmental configuration, the tool is unable to properly determine the needed environmental information, please refer to the TROUBLESHOOTING section.

3. Watchdog daemon – RACcheck automatically runs a daemon in the background to monitor command execution progress. If, for any reason, one of the commands run by the tool should hang or take longer than anticipated, the monitor daemon kills the hung command after a configurable timeout so that main tool execution can progress. If that happens then the collection or command that was hung is skipped and a notation is made in the log. If the default timeout is too short please see the TROUBLESHOOTING section regarding adjustment of the RAT_TIMEOUT, and RAT_ROOT_TIMEOUT parameters.

4. If RACcheck’s driver files are older than 90 days, the driver files are considered to be “stale” and the script will notify the user of a stale driver file. A new version of the tool and its driver files (kit) must be obtained from MOS Note 1268927.1.

5. When RACcheck completes the collection and analysis it produces two reports, summary and detailed. An output .zip file is also produced by RACcheck. This output .zip file can be provided to Oracle Support for further analysis if an SR needs to be logged. The detailed report will contain Benefit/Impact, Risk and Action/Repair information. In many cases it will also reference publicly available documents with additional information about the problem and how to resolve it.

6. The results of the audit checks can be optionally uploaded into database tables for reporting purposes. See below for more details on this subject.

7. In some cases customers may want to stage RACcheck on a shared filesystem so that it can be accessed from various systems but be maintained in a single location rather than being copied to each cluster on which it may be used. The default behavior of the tool is to create a subdirectory and its output files in the location where the tool is staged. If that staging area is a read only filesystem or if the user for any reason would like the output to be created elsewhere then there is an environment variable which can be used for that purpose. The RAT_OUTPUT parameter can be set to any valid writable location and the output will be created there.
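The environment variables named in features 3 and 7 are set in the shell before the tool is started. The fragment below is only a sketch: the directory and the numbers are placeholders invented for illustration, and the unit of the two timeouts is assumed to be seconds, so check the TROUBLESHOOTING section of the RACcheck documentation for the real defaults before using them.

```shell
#!/bin/sh
# Placeholder values chosen for illustration, not recommended settings.
export RAT_OUTPUT=/tmp/raccheck_output   # hypothetical writable directory for the output files
export RAT_TIMEOUT=120                   # assumed to be in seconds
export RAT_ROOT_TIMEOUT=600              # assumed to be in seconds

# Make sure the output location exists before the tool is started.
mkdir -p "$RAT_OUTPUT"
```

Start RACcheck from its staging directory in the same shell session so that it inherits these variables.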

Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.1 to 11.2.0.2 – Release: 10.2 to 11.2

  • Linux x86
  • IBM AIX on POWER Systems (64-bit)
  • Oracle Solaris on SPARC (64-bit)
  • Linux x86-64

To download the RACcheck tool, use this note on MOS:
RACcheck – RAC Configuration Audit Tool [ID 1268927.1]

Example of report output:

raccheck Report

Enjoy


IBM – Live Partition Mobility for Oracle RAC

This paper documents the concept and recommendation of using Live Partition Mobility (LPM) in an Oracle RAC environment.
It describes the configuration of the IBM Power systems infrastructure and Oracle RAC to perform the Live Partition Mobility of an Oracle RAC node.
The paper describes two scenarios. The first is given as an example for test purposes, in order to set up the configuration and understand the interaction of all components.
The second scenario is the officially supported LPM process for an Oracle RAC environment.
This paper is illustrated with a real example of the LPM of a RAC node from a POWER6 processor-based source server to a POWER7 target server.

Introduction

Live Partition Mobility (LPM) is a feature of PowerVM Enterprise Edition which allows for moving an LPAR from an IBM Power system to another physical IBM Power system.
LPM increases availability of the application and improves workload management. LPM improves flexibility of the entire infrastructure, as you are able to run the applications continuously during planned maintenance of your server by migrating the logical partitions to another server without disruption.
You are also able to easily manage the application workload by migrating LPARs to free CPU and memory resources for your most important production workload running on the source server.
Both source and target systems can be POWER6 or POWER7 processor based. The I/O adapters configured in the AIX LPAR must be defined as virtual devices, which requires that the network and the SAN disks be accessed through a Virtual I/O Server partition. The LPM feature is not compatible with physical resources, so they have to be removed at least for the LPM operation and, if there are any, reattached to the LPAR after the migration.
Remember that you can keep physical adapters configured in the LPAR and also configure virtual adapters as a backup path for the network access and heterogeneous multi-path for the SAN disk access.
Also all the disks have to be defined from the SAN and shared to VIO servers of both source and target servers.
The LPM process consists of several steps, such as reading the LPAR configuration on the source server to create it on the target server, creating the virtual devices at the target VIO Server, copying the physical memory blocks from the source to the target server, activating the LPAR on the target server and starting the AIX processes on the target server partition.
Once the processes are running on the target server partition, the virtual devices corresponding to the LPAR are removed from the source VIO Server and the LPAR is deleted on the source server.
The major migration step is the copy of the logical memory blocks (LMBs) through the network while AIX processes are running. Near the end of the memory copy, at almost 95%, a checkpoint operation is done and the processes are run from the target server. Depending on the amount of memory assigned to the LPAR and on the memory activity of the running processes, this checkpoint operation may freeze the processes for a short duration.

From an Oracle RAC environment point of view, this step is the most critical and requires some certification tests. That is the reason why, at the time of writing, LPM is supported for Oracle single-instance databases and Oracle RAC is not officially supported.
In the following you will see a functional example of an LPM operation in an Oracle RAC environment, which you can run as a test, without support from either IBM or Oracle.
The supported process for an LPM operation in an Oracle RAC environment consists of stopping Oracle for the migration step while keeping AIX alive.

Live Partition Mobility for Oracle RAC

LPM_a_RAC_node_July 27 2011
Enjoy…


Monitoring the Cluster in Real Time with CHM (Cluster Health Monitor)

Why Cluster Health Monitor ?

Node reboots and Oracle Clusterware and Oracle Database performance problems caused by a lack of CPU/memory resources lead customers to ask how to monitor their OS. Some customers have rudimentary scripts that use vmstat and mpstat, but the data is often not collected at regular intervals. In some cases, we have seen customers collect this once per hour, which is not very useful when the node hangs or is evicted via reboot in the middle of the hour. OSwatcher did a wonderful job of making the data collection uniform with uniform collection intervals. Cluster Health Monitor extends OSwatcher by ensuring it is always scheduled and collects data points while providing a client GUI to view current load.

With this new tool we need to buy a new display to monitor the activities of the cluster in real time.

Why do this? Because it’s cool you have full control of your environment in real time.

In this post I’ll show you how to install and configure the IPD Cluster Monitor.

  • 2 server hosts (Linux OEL 5) with Oracle Clusterware/RAC already installed.
  • 1 desktop client (my laptop) to monitor the cluster (using GUI Mode).

Let’s start.

On what platforms can I run the Cluster Health Monitor?   Updated 19/07/2011

The Cluster Health Monitor is NOT available for Itanium platforms (Linux, Windows, and HP Itanium) in any version.

11.2.0.1 and earlier: Linux and Windows only (download from OTN)
11.2.0.2: Solaris (Sparc and x86-64) and Linux
11.2.0.3 (to be released): AIX, Solaris (Sparc and x86-64), Linux , and Windows

The Cluster Health Monitor is integrated part of 11.2.0.2 Oracle Grid Infrastructure for Linux (not on Linux Itanium) and Solaris (Sparc 64 and x86-64 only), so installing 11.2.0.2 Oracle Grid Infrastructure on those platforms will automatically install the Cluster Health Monitor. AIX will have the Cluster Health Monitor starting from 11.2.0.3. The Cluster Health Monitor is also enabled for Windows (except Windows Itanium) in 11.2.0.3.

Installation

For the OTN version of Cluster Health Monitor, the complete steps to install the tool are explained in the readme file shipped with the product.

For 11.2.0.2 or later version, the cluster health monitor is installed automatically when Grid Infrastructure (aka CRS) is installed.  The resource name for Cluster Health Monitor is ora.crf that is managed by ohasd.

Where can I get latest copy of Cluster Health Monitor?

Prior to 11.2.0.2 on Linux and prior to 11.2.0.3 Windows excluding Itanium platform, the Cluster Health Monitor can be downloaded from OTN.

http://www.oracle.com/technetwork/database/clustering/downloads/index.html

Important: GUI Mode is available only with the OTN version, up to version 11.2.0.1. GUI Mode is not available for 11.2.0.2.

Online mode can be used to detect problems live on the problem environment. The data can be viewed using Cluster Health Monitor utility. The GUI is not installed on the nodes of the server but can be installed on any other client.

If you are using Oracle Clusterware 11.2.0.2, as of today (09/08/2011) the GUI Mode is not available.

If you are using 11.2.0.1 or earlier, you must install CHM on the servers and on the client.

I’ll show you how to perform a full installation of CHM on servers and client.

Installing CHM on Linux Servers

On Linux, the tool requires a kernel version greater than or equal to 2.6.9 and the x86 architecture. The install will work on x86_64 as well if the kernel is configured to run 32-bit binaries.
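A quick way to check the kernel requirement before installing is sketched below. The function name kernel_at_least is made up for this post; it compares the first three numeric fields of a kernel release string against a minimum version and ignores any vendor suffix after the first dash.

```shell
#!/bin/sh
# Hypothetical helper: succeed if kernel release $1 (e.g. 2.6.18-238.el5)
# is greater than or equal to the minimum version $2 (e.g. 2.6.9).
kernel_at_least() {
  awk -v have="${1%%-*}" -v want="$2" 'BEGIN {
    split(have, h, "."); split(want, w, ".")
    for (i = 1; i <= 3; i++) {
      if (h[i] + 0 > w[i] + 0) exit 0   # newer in this field: OK
      if (h[i] + 0 < w[i] + 0) exit 1   # older in this field: too old
    }
    exit 0                              # all three fields equal
  }'
}

if kernel_at_least "$(uname -r)" 2.6.9; then
  echo "kernel $(uname -r) on $(uname -m): OK"
else
  echo "kernel $(uname -r) is older than 2.6.9"
fi
```

The architecture printed by uname -m still has to be checked by eye against the x86 / x86_64 requirement above.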

1. In Linux, create user ‘<user>:<group>’ (e.g. crfuser:oinstall) on all the nodes where the tool is being installed. Make sure the user's home directory is the same on all nodes. Typically, on most systems, you will issue:

On all nodes:

useradd  -d /opt/crfuser -s /bin/sh -g oinstall crfuser
passwd crfuser
Changing password for user crfuser.
New UNIX password:
BAD PASSWORD: it is based on a dictionary word
Retype new UNIX password:
passwd: all authentication tokens updated successfully.

while logged in as root.

2. In Linux, set up passwordless ssh for the user created in step 1. Test that the user can ssh to all nodes (including the local node) using the hostname (without domain), without a password and without any user intervention such as acknowledging prompts.

You can use this post:
https://levipereira.wordpress.com/2010/12/07/configure-ssh-for-user-equivalence/
P.S. When prompted with “Enter passphrase (empty for no passphrase):”, just press [Enter]; don’t create a passphrase.
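The ssh test in step 2 can be scripted. The sketch below uses a made-up function name, check_ssh_equivalence; it relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail instead of asking for a password, a passphrase or a host key confirmation.

```shell
#!/bin/sh
# Hypothetical helper: verify that the current user can ssh to every node
# given as an argument without any prompt.
check_ssh_equivalence() {
  rc=0
  for node in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
      echo "OK   $node"
    else
      echo "FAIL $node"
      rc=1
    fi
  done
  return $rc
}

# Example, run as crfuser on each node (node names are the ones used in this post):
# check_ssh_equivalence alemanha holanda
```

Run it on every node and include the local node in the list, since the requirement covers ssh to the local host as well.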

3. CHM has its own database (BDB), so you must specify the location where CHM will store the OS metrics.
This location MUST be outside of the location where you unzipped the ZIP file, because all the directories under that location which were created by unzip will be removed.
BDB files can be kept as they are for later usage. The location should be a path on a volume with at least 2GB of space available per node and writable by a privileged user only.
It cannot be on the root filesystem in Linux. This location is required to be the same on all hosts.
The path MUST NOT be on a shared disk. If a shared BDB path is provided to multiple hosts, BDB corruption will happen.

I created a new 5GB disk on each server and a new VG on Linux to store the CHM database.

/dev/mapper/VG_ORACRFDB-LV_ORACRFDB
                      4.9G  334M  4.1G   7% /opt/oracrfdb/db
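Two of the requirements above (not on the root filesystem, enough free space) are easy to check from the shell before running the installer. This is a sketch with a made-up function name, check_bdb_location; it does not verify that the path is local rather than shared, nor who can write to it, and it assumes the mount point contains no spaces.

```shell
#!/bin/sh
# Hypothetical pre-check for the CHM BDB directory.
# $1 = candidate directory, $2 = required free space in GB (default 2)
check_bdb_location() {
  dir=$1
  need_kb=$(( ${2:-2} * 1024 * 1024 ))
  if [ ! -d "$dir" ]; then
    echo "FAIL: $dir is not a directory"
    return 1
  fi
  # df -P prints one POSIX-format line per filesystem:
  # column 4 = available 1K blocks, column 6 = mount point
  info=$(df -Pk "$dir" | awk 'NR == 2 { print $4, $6 }')
  avail_kb=${info%% *}
  mount_point=${info#* }
  if [ "$mount_point" = "/" ]; then
    echo "FAIL: $dir is on the root filesystem"
    return 1
  fi
  if [ "$avail_kb" -lt "$need_kb" ]; then
    echo "FAIL: only $avail_kb KB free on $mount_point"
    return 1
  fi
  echo "OK: $dir ($avail_kb KB free on $mount_point)"
}

# Example with the path used in this post:
# check_bdb_location /opt/oracrfdb/db 2
```

Since the location must be the same on all hosts, run the check on each node with the same path.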

4. Login as ‘crfuser’ on Linux.
Unzip the crfpack.zip file.

cd /opt/crfuser/install/
$ ls
crfpack-linux.zip
$ unzip crfpack-linux.zip
Archive:  crfpack-linux.zip
   creating: admin/
   creating: admin/run/
....
   creating: log/
   creating: mesg/

5. Run crfinst.pl (see below for usage details) script on a node with desired node list, specified as comma separated list, for cluster-wide install. You will find this script in the install subdirectory.

   $ cd /opt/crfuser/install/install/
   $ ./crfinst.pl -i alemanha,holanda -b /opt/oracrfdb/db -m alemanha

Performing checks on nodes: "alemanha holanda" ...
/opt/crfuser/install doesn't exist on holanda, creating it...
Assigning holanda as replica

Generating cluster wide configuration file...

Creating a bundle for remote nodes...

Installing on nodes "holanda alemanha" ...

Configuration complete on nodes "holanda alemanha" ...

Please run "/opt/crfuser/install/install/crfinst.pl -f, optionally specifying BDB location with -b  as root on each node to complete the install process.

$ su -

6. Once the step 5 finishes, it will instruct you to run crfinst.pl script with -f and optionally -b on each node while logged in as root/admin to finalize the install on that node.

# /opt/crfuser/install/install/crfinst.pl -f
Removing contents of BDB Directory /opt/oracrfdb/db

Installation completed successfully at /usr/lib/oracrf...

# ssh holanda
root@holanda's password:
Last login: Tue Jul 19 15:41:02 2011 from alemanha.partnerit.com.br
[root@holanda ~]# /opt/crfuser/install/install/crfinst.pl -f
Removing contents of BDB Directory /opt/oracrfdb/db

Installation completed successfully at /usr/lib/oracrf...

7. Enable the tool on all nodes. Once the finalize operation is complete, run
the following on Linux while logged in as a privileged user to enable the tool:

   # /etc/init.d/init.crfd enable

DO NOT bypass any of the above steps or try other ways to install, because the daemons will not work correctly and you will not be supported.

The CHM installation on the servers is finished.

Installing GUI Mode on Windows 7 x64

Standalone UI installation. Oracle recommends not installing the UI on the servers. You can use this option to install the UI-only client on a separate machine outside of the cluster.

Perl must be available on the desktop client to install the GUI Mode on Windows.

We do not need to install Perl on Windows separately; we can use the Perl shipped with Oracle Client (if you have Oracle Client installed on your desktop, of course!).

As I have Oracle Client installed on my desktop, I will use the Perl from the Oracle Client installation.

1. Download CHM from OTN for Windows.

2. Unzip the crfpack.zip file
3. Install CHM GUI Mode on C:\oracle\product\crf

# Using CMD prompt on Windows
c:\> cd C:\Users\Levi\Downloads\crfpack-winnt\install
\> c:\oracle\product\11.2.0\client_1\perl\bin\perl.exe crfinst.pl -g c:\oracle\product\crf
Installation completed successfully at c:\\oracle\\product\\crf ...

4. Point the PERL variable at the Oracle Client Perl in the file “C:\oracle\product\crf\bin\crfgui.bat”
Change this:

...
set PERL=perl.exe
...

To:

set PERL=C:\oracle\product\11.2.0\client_1\perl\bin\perl.exe

Now you can use the CHM GUI to connect to your cluster.

C:\oracle\product\crf\bin>crfgui.bat -m 192.168.217.10
Cluster Health Analyzer V1.10
        Look for Loggerd via node alemanha
 ...Connected to Loggerd on alemanha
Note: Node alemanha is now up
Cluster 'MyCluster', 2 nodes. Ext time=2011-07-19 23:12:58
Making Window: IPD Cluster Monitor V1.10 on mlevi, Logger V1.04.20091223, Cluster "MyCluster"  (View 0), Refresh rate: 1 sec

You can monitor your cluster.

Inside the GUI, you can use the ‘node ‘ command to open a Node View, which gives more detailed information about a node. Alternatively, you can double-click a node to get the Node View.
A Node View presents the detailed statistics on interesting processes, disks and NICs based on heuristics.

Click on Host Holanda…

Click on Host Alemanha..

You can drill down into the partition details for the disks listed in Node View by double-clicking the disk. The information is presented in the Disk View.
The Disk View provides a detailed list of partitions and corresponding stats for each one of them. It also clearly marks partitions which are found to belong to certain categories like Voting/OCR/SWAP/ASM disks.

Both Cluster View and Node View show text alerts at the bottom. These alerts are generated when the sampled value of a resource metric either goes above or falls below a threshold that could lead to potential problems on the node and hence on the cluster.
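
The threshold idea behind these alerts can be sketched in a few lines of Python. This is only an illustration and not how CHM implements it: the metric names (cpu_used_pct, free_mem_mb) and the threshold values are hypothetical, chosen just for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Threshold:
    """Hypothetical allowed range for one metric (None means no bound)."""
    low: Optional[float] = None
    high: Optional[float] = None


def check_sample(metric: str, value: float, threshold: Threshold) -> Optional[str]:
    """Return an alert text if the value leaves the allowed range, else None."""
    if threshold.high is not None and value > threshold.high:
        return f"{metric}={value} is above threshold {threshold.high}"
    if threshold.low is not None and value < threshold.low:
        return f"{metric}={value} is below threshold {threshold.low}"
    return None


# Hypothetical thresholds, not taken from CHM.
THRESHOLDS = {
    "cpu_used_pct": Threshold(high=90.0),
    "free_mem_mb": Threshold(low=512.0),
}


def alerts_for(sample: dict) -> list:
    """Collect alert texts for one sampled set of metrics."""
    alerts = []
    for metric, value in sample.items():
        threshold = THRESHOLDS.get(metric)
        if threshold is None:
            continue  # metric is not monitored in this sketch
        alert = check_sample(metric, value, threshold)
        if alert is not None:
            alerts.append(alert)
    return alerts
```

Calling alerts_for() once per sample keeps the check stateless: each sampled value is compared to its own bounds, which is the "above or below a threshold" behaviour described above.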

For example, you can check for problems on the private network (interconnect) at the OS level:

LATENCY(ms) and other metrics are monitored.


The default refresh rate of the GUI is 1 second. To change the refresh rate, use -r with the number of seconds (e.g. -r 5 for a 5-second refresh rate):

C:\oracle\product\crf\bin>crfgui.bat -r 5 -m 192.168.217.10

Historical Mode

Invoking the GUI with ‘-d’ option starts it in historical mode.

C:\oracle\product\crf\bin>crfgui.bat -d "hh:mm:ss" -m 192.168.217.10

where -d specifies how many hours (hh), minutes (mm) and seconds (ss) in the past, counted from the current time, the GUI should start from. For example, crfgui -d "05:10:00" starts the GUI and displays information from the database beginning 5 hours and 10 minutes before the current time.
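
To make the offset arithmetic concrete, here is a small Python sketch that turns an "hh:mm:ss" value into the point in time the history view would start from. The function names parse_offset and history_start are made up for this example and are not part of CHM.

```python
from datetime import datetime, timedelta


def parse_offset(offset: str) -> timedelta:
    """Parse an "hh:mm:ss" offset string into a timedelta."""
    hours, minutes, seconds = (int(part) for part in offset.split(":"))
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def history_start(now: datetime, offset: str) -> datetime:
    """Return the timestamp that lies `offset` in the past, counted from `now`."""
    return now - parse_offset(offset)


if __name__ == "__main__":
    # Example: the current time minus 5 hours and 10 minutes.
    print(history_start(datetime.now(), "05:10:00"))
```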

Invoking the GUI with ‘-i’ option provides the same shell at the command prompt as is seen in the GUI windows with a prompt of ‘toprac>’. You can use ‘?’ at this prompt to get detailed information about available commands and options.

Enjoy


Load Balancing and Failover with Oracle 10gR2 RAC

Oracle Net is a software component that resides on the client and on the Oracle database server. It establishes and maintains the connection between the client application and the server, and exchanges messages between them using industry standard protocols. For the client application and a database to communicate, the client application must specify location details for the database it wants to connect to, and the database must provide some sort of identification or address.

On the database server, the Oracle Net Listener, commonly known as the Listener, is a process that listens for client connection requests. The configuration file for the Listener is the listener.ora.

The client uses a connect descriptor to specify the database to which to connect. This connect descriptor contains a protocol and a database service name. When a client requests a connection, the Listener on the server receives the request and forwards the connection to the Oracle database. You can define your connect descriptors in the tnsnames.ora file on the client machine, or include them as part of the connection request.

When the client connects to the cluster database using a service, you can use the Oracle Net connection load balancing feature to spread user connections across all of the instances that are supporting that service. There are two types of load balancing that you can implement: client-side and server-side load balancing.

In an Oracle RAC database, client connections should use both types of connection load balancing. When you create an Oracle RAC database using Oracle Database Configuration Assistant (DBCA), DBCA configures and enables server-side load balancing by default.
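
As a rough illustration of the client-side half, the Python sketch below simulates what happens when a connect descriptor lists several listener addresses: the client tries them in random order (load balancing) and moves on to the next one when an address does not answer (connect-time failover). It is a simulation only and does not use Oracle Net; the host names alemanha-vip and holanda-vip and the is_reachable callback are assumptions made up for the example.

```python
import random
from typing import Callable, Optional, Sequence


def pick_listener(
    addresses: Sequence[str],
    is_reachable: Callable[[str], bool],
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Try the addresses in random order and return the first reachable one.

    Random ordering spreads new connections across the listeners;
    skipping unreachable addresses models connect-time failover.
    Returns None when no listener answers.
    """
    rng = rng or random.Random()
    candidates = list(addresses)
    rng.shuffle(candidates)
    for address in candidates:
        if is_reachable(address):
            return address
    return None


if __name__ == "__main__":
    # Hypothetical VIP names; pretend the listener on alemanha is down.
    listeners = ["alemanha-vip", "holanda-vip"]
    down = {"alemanha-vip"}
    print(pick_listener(listeners, lambda address: address not in down))
```

Server-side load balancing is not modelled here: in that case the listener, not the client, decides which instance receives the connection.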

load-bal-and-failover-10gr2

Enjoy…


Oracle Real Application Clusters on IBM AIX Best practices in memory tuning and configuring for system stability

Introduction

Customers who experience Oracle Real Application Clusters (RAC) node evictions due to excessive AIX kernel paging should carefully review and implement these recommended best practices. Testing and experience have found that memory over commitments may cause scheduling delays for Oracle’s ‘oprocd’ process resulting in node evictions.
Implementing all of these recommendations will reduce scheduling delays and corresponding oprocd initiated evictions.

Problem validation

This paper addresses the best practices for environments experiencing node evictions caused by critical processes not being able to get scheduled in a timely fashion on AIX due to memory overcommitment. To validate that node evictions are caused by this situation, the following validation steps should be taken.

Click link below…

rac_aix_memory_tuning October 17 2011

Enjoy