Monitoring the Cluster in Real Time with CHM (Cluster Health Monitor)

Why Cluster Health Monitor?

Performance problems and node reboots in Oracle Clusterware and Oracle Database caused by a lack of CPU or memory resources lead customers to ask how to monitor their OS. Some customers have rudimentary scripts that use vmstat and mpstat, but the data is often not collected at regular intervals. In some cases, we have seen customers collect it once per hour, which is not very useful when the node hangs or is evicted via reboot in the middle of the hour. OSWatcher did a wonderful job of making data collection uniform, with uniform collection intervals. Cluster Health Monitor extends the OSWatcher approach by ensuring the collector is always scheduled and collects data points, while providing a client GUI to view the current load.

With this new tool you may want to buy a new display just to monitor the activities of the cluster in real time.

Why do this? Because it’s cool: you get full visibility of your environment in real time.

In this post I’ll show you how to install and configure the Cluster Health Monitor (the GUI still identifies itself as IPD Cluster Monitor), using the following environment:

  • 2 server hosts (Oracle Enterprise Linux 5) with Oracle Clusterware/RAC already installed.
  • 1 desktop client (my laptop) to monitor the cluster (using GUI mode).

Let’s start.

On which platforms can I run the Cluster Health Monitor? (Updated 19/07/2011)

The Cluster Health Monitor is NOT available for Itanium platforms (Linux, Windows, and HP Itanium) in any version.

11.2.0.1 and earlier: Linux and Windows only (download from OTN)
11.2.0.2: Solaris (Sparc and x86-64) and Linux
11.2.0.3 (to be released): AIX, Solaris (Sparc and x86-64), Linux, and Windows

The Cluster Health Monitor is an integrated part of 11.2.0.2 Oracle Grid Infrastructure for Linux (not on Linux Itanium) and Solaris (Sparc 64 and x86-64 only), so installing 11.2.0.2 Oracle Grid Infrastructure on those platforms automatically installs the Cluster Health Monitor. AIX will have the Cluster Health Monitor starting with 11.2.0.3. The Cluster Health Monitor is also enabled for Windows (except Windows Itanium) in 11.2.0.3.

Installation

For the OTN version of Cluster Health Monitor, the complete installation steps are explained in the readme file shipped with the product.

For version 11.2.0.2 or later, the Cluster Health Monitor is installed automatically when Grid Infrastructure (aka CRS) is installed. The resource name for Cluster Health Monitor is ora.crf, which is managed by ohasd.

Where can I get the latest copy of Cluster Health Monitor?

For versions prior to 11.2.0.2 on Linux and prior to 11.2.0.3 on Windows (Itanium platforms excluded), the Cluster Health Monitor can be downloaded from OTN.

http://www.oracle.com/technetwork/database/clustering/downloads/index.html

Important: GUI mode is available only with the OTN version, up to 11.2.0.1. GUI mode is not available in 11.2.0.2.

Online mode can be used to detect problems live in the affected environment. The data can be viewed using the Cluster Health Monitor utility. The GUI is not installed on the server nodes, but it can be installed on any other client machine.

As of today (09/08/2011), GUI mode is still not available if you are using Oracle Clusterware 11.2.0.2.

If you are using 11.2.0.1 or earlier, you must install CHM on both the servers and the client.

I’ll show you how to perform a full installation of CHM on the servers and the client.

Installing CHM on Linux Servers

On Linux, the tool requires a kernel version greater than or equal to 2.6.9 and the x86 architecture. The install also works on x86_64 if the kernel is configured to run 32-bit binaries.
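
If you want to check the kernel requirement before you start, a minimal sketch like the one below can help. It is my own helper, not part of CHM: the function names version_ge and kernel_ok are made up for illustration, and it assumes a sort command that supports -V (GNU coreutils does).

```shell
# version_ge A B: succeed when dotted version A is >= B.
# Assumption: `sort -V` (version sort) is available, as in GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# kernel_ok RELEASE: compare a kernel release string against 2.6.9.
# Everything after the first '-' is dropped first,
# e.g. 2.6.18-194.el5 is compared as 2.6.18.
kernel_ok() {
    version_ge "${1%%-*}" "2.6.9"
}

# Typical use on a node:
#   kernel_ok "$(uname -r)" && echo "kernel version is fine for CHM"
```

This only covers the version number. Whether an x86_64 kernel can run 32-bit binaries still has to be checked separately.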

1. On Linux, create a dedicated user and group (e.g. crfuser:oinstall) on all the nodes where the tool is being installed. Make sure the user’s home directory is the same on all nodes. Typically, on most systems, you will issue the following on all nodes while logged in as root:

# useradd -d /opt/crfuser -s /bin/sh -g oinstall crfuser
# passwd crfuser
Changing password for user crfuser.
New UNIX password:
BAD PASSWORD: it is based on a dictionary word
Retype new UNIX password:
passwd: all authentication tokens updated successfully.

2. On Linux, set up passwordless ssh for the user created in step 1. Test that this user can ssh to all nodes (including the local node) using the hostname (without domain), without a password and without any user intervention such as acknowledging prompts.

You can use this post:
https://levipereira.wordpress.com/2010/12/07/configure-ssh-for-user-equivalence/
P.S. When prompted with “Enter passphrase (empty for no passphrase):”, just press [Enter]; do not create a passphrase.
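
To confirm that step 2 really works without any prompt, you can run a small check as crfuser on each node. This is only a sketch of mine: check_ssh_nodes is a hypothetical helper name, and it relies on the OpenSSH BatchMode option, which makes ssh fail instead of asking for a password or a host key confirmation.

```shell
# check_ssh_nodes NODE...
# Try a non-interactive ssh to every node and report the result.
# BatchMode=yes disables password and host key prompts, so any node
# that would need user intervention shows up as FAIL.
check_ssh_nodes() {
    failed=0
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true \
               </dev/null >/dev/null 2>&1; then
            echo "OK   $node"
        else
            echo "FAIL $node"
            failed=1
        fi
    done
    return "$failed"
}

# Typical use with the two nodes of this post (short hostnames):
#   check_ssh_nodes alemanha holanda
```

Run it from every node, including against the local node, because the installer needs ssh to work in all directions.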

3. CHM has its own database (Berkeley DB, BDB), so you must specify a location where CHM will store the OS metrics.
This location MUST be outside the location where you unzipped the ZIP file, because all the directories created by unzip under that location will be removed.
BDB files can be kept as they are for later use. The location should be a path on a volume with at least 2GB of space available per node, writable by the privileged user only.
It cannot be on the root filesystem on Linux. The location must be the same on all hosts.
The path MUST NOT be on a shared disk. If a shared BDB path is provided to multiple hosts, BDB corruption will happen.

I created a new 5GB disk on each server and a new volume group (VG) on Linux to store the CHM database.

/dev/mapper/VG_ORACRFDB-LV_ORACRFDB
                      4.9G  334M  4.1G   7% /opt/oracrfdb/db
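
Some of the requirements above can be checked mechanically. The sketch below is my own (bdb_fs_ok and bdb_path_ok are hypothetical names): it reads the free space and mount point from df -Pk and applies the "not on the root filesystem" and "2GB per node" rules. I am assuming that "2GB per node" means 2GB multiplied by the number of nodes.

```shell
# bdb_fs_ok AVAIL_KB MOUNT_POINT NODES
# Check numbers already read from df: the filesystem must not be "/"
# and must have at least 2 GB (2097152 KB) free per node (assumption:
# the requirement scales with the number of cluster nodes).
bdb_fs_ok() {
    avail_kb=$1
    mount_point=$2
    nodes=$3
    need_kb=$((nodes * 2097152))
    if [ "$mount_point" = "/" ]; then
        echo "on root filesystem"
        return 1
    fi
    if [ "$avail_kb" -lt "$need_kb" ]; then
        echo "only ${avail_kb} KB free, need ${need_kb} KB"
        return 1
    fi
    echo "ok"
}

# bdb_path_ok DIR NODES
# Take available space (column 4) and mount point (column 6) for DIR
# from the POSIX-format output of `df -Pk` and pass them to bdb_fs_ok.
# Mount points containing spaces are not handled by this sketch.
bdb_path_ok() {
    [ -d "$1" ] || { echo "$1 is not a directory"; return 1; }
    bdb_fs_ok $(df -Pk "$1" | awk 'NR == 2 { print $4, $6 }') "$2"
}

# Typical use for the two-node cluster in this post:
#   bdb_path_ok /opt/oracrfdb/db 2
```

It does not verify that the path is local (not shared) or that only the privileged user can write to it, so those two points still need a manual look.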

4. Log in as ‘crfuser’ on Linux and unzip the crfpack zip file (crfpack-linux.zip here).

$ cd /opt/crfuser/install/
$ ls
crfpack-linux.zip
$ unzip crfpack-linux.zip
Archive:  crfpack-linux.zip
   creating: admin/
   creating: admin/run/
....
   creating: log/
   creating: mesg/

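5. Run the crfinst.pl script on one node with the desired node list, specified as a comma-separated list, for a cluster-wide install. You will find this script in the install subdirectory. In the command below, -i takes the node list, -b the BDB location, and -m names the master node (the other node is assigned as replica, as the output shows).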

   $ cd /opt/crfuser/install/install/
   $ ./crfinst.pl -i alemanha,holanda -b /opt/oracrfdb/db -m alemanha

Performing checks on nodes: "alemanha holanda" ...
/opt/crfuser/install doesn't exist on holanda, creating it...
Assigning holanda as replica

Generating cluster wide configuration file...

Creating a bundle for remote nodes...

Installing on nodes "holanda alemanha" ...

Configuration complete on nodes "holanda alemanha" ...

Please run "/opt/crfuser/install/install/crfinst.pl -f, optionally specifying BDB location with -b  as root on each node to complete the install process.

$ su -

6. Once step 5 finishes, it instructs you to run the crfinst.pl script with -f, and optionally -b, on each node while logged in as root/admin to finalize the install on that node.

# /opt/crfuser/install/install/crfinst.pl -f
Removing contents of BDB Directory /opt/oracrfdb/db

Installation completed successfully at /usr/lib/oracrf...

# ssh holanda
root@holanda's password:
Last login: Tue Jul 19 15:41:02 2011 from alemanha.partnerit.com.br
[root@holanda ~]# /opt/crfuser/install/install/crfinst.pl -f
Removing contents of BDB Directory /opt/oracrfdb/db

Installation completed successfully at /usr/lib/oracrf...

7. Enable the tool on all nodes. Once the finalize operation is complete, run the following on Linux while logged in as a privileged user to enable the tool:

   # /etc/init.d/init.crfd enable

DO NOT bypass any of the above steps or try other ways to install, because the daemons will not work correctly and you will not be supported.

The CHM installation on the servers is now finished.

Installing GUI Mode on Windows 7 x64

Standalone UI installation. Oracle recommends not installing the UI on the servers. You can use this option to install the UI-only client on a separate machine outside the cluster.

Perl is required on the desktop client to install GUI mode on Windows.

You do not need to install Perl separately, though. You can use the Perl shipped with Oracle Client (if Oracle Client is installed on your desktop, of course!).

Since I have Oracle Client installed on my desktop, I will use the Perl from that installation.

1. Download CHM from OTN for Windows.

2. Unzip the crfpack.zip file
3. Install CHM GUI Mode on C:\oracle\product\crf

Using the CMD prompt on Windows:
c:\> cd C:\Users\Levi\Downloads\crfpack-winnt\install
C:\Users\Levi\Downloads\crfpack-winnt\install> c:\oracle\product\11.2.0\client_1\perl\bin\perl.exe crfinst.pl -g c:\oracle\product\crf
Installation completed successfully at c:\\oracle\\product\\crf ...

4. Point the PERL variable at the Oracle Client Perl in the file “C:\oracle\product\crf\bin\crfgui.bat”.
Change this:

...
set PERL=perl.exe
...

To:

set PERL=C:\oracle\product\11.2.0\client_1\perl\bin\perl.exe

Now you can start the CHM GUI and connect to your cluster.

C:\oracle\product\crf\bin>crfgui.bat -m 192.168.217.10
Cluster Health Analyzer V1.10
        Look for Loggerd via node alemanha
 ...Connected to Loggerd on alemanha
Note: Node alemanha is now up
Cluster 'MyCluster', 2 nodes. Ext time=2011-07-19 23:12:58
Making Window: IPD Cluster Monitor V1.10 on mlevi, Logger V1.04.20091223, Cluster "MyCluster"  (View 0), Refresh rate: 1 sec

You can monitor your cluster.

Inside the GUI, you can use the ‘node’ command (followed by a node name) to open a Node View, which gives more detailed information about that node. Alternatively, you can double-click a node to get the Node View.
A Node View presents the detailed statistics on interesting processes, disks and NICs based on heuristics.

Click on Host Holanda…

Click on Host Alemanha..

You can drill down into the partition details of the disks listed in the Node View by double-clicking a disk. The information is presented in the Disk View.
The Disk View provides a detailed list of partitions and corresponding stats for each one of them. It also clearly marks partitions which are found to belong to certain categories like Voting/OCR/SWAP/ASM disks.

Both Cluster View and Node View show text alerts at the bottom. These alerts are generated when the sampled value of a resource metric either goes above or falls below a threshold that could lead to potential problems on the node and hence on the cluster.

For example, you can check for problems on the private network (interconnect) at the OS level:

Latency (ms) and other metrics are monitored.


The default refresh rate of the GUI is 1 second. To change the refresh rate, use -r with the number of seconds (e.g. -r 5 for a 5 second refresh rate):

C:\oracle\product\crf\bin>crfgui.bat -r 5 -m 192.168.217.10

Historical Mode

Invoking the GUI with ‘-d’ option starts it in historical mode.

C:\oracle\product\crf\bin>crfgui.bat -d "hh:mm:ss" -m 192.168.217.10

where -d specifies the hours (hh), minutes (mm) and seconds (ss) in the past, counted from the current time, at which the GUI should start. For example, crfgui -d "05:10:00" starts the GUI and displays information from the database beginning 5 hours and 10 minutes before the current time.
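
To make the meaning of the offset concrete, here is a tiny sketch that turns an "hh:mm:ss" value into seconds. The function name offset_to_seconds is mine and has nothing to do with crfgui itself; it only illustrates the arithmetic.

```shell
# offset_to_seconds "hh:mm:ss"
# Convert an offset such as the one passed to -d into seconds.
# awk treats fields like "05" or "08" as plain decimal numbers.
# Prints nothing when the argument does not have three fields.
offset_to_seconds() {
    printf '%s\n' "$1" | awk -F: 'NF == 3 { print $1 * 3600 + $2 * 60 + $3 }'
}

# offset_to_seconds "05:10:00" prints 18600 (5*3600 + 10*60 + 0)
```

So -d "05:10:00" asks for data starting 18600 seconds before now. On a system with GNU date, something like date -d "@$(( $(date +%s) - 18600 ))" shows the corresponding wall-clock time.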

Invoking the GUI with ‘-i’ option provides the same shell at the command prompt as is seen in the GUI windows with a prompt of ‘toprac>’. You can use ‘?’ at this prompt to get detailed information about available commands and options.

Enjoy

9 Comments on “Monitoring the Cluster in Real Time with CHM (Cluster Health Monitor)”

  1. orawiss says:

    Levi,
    CHM is installed by default with 11.2.0.2 version;
    What else needed to do to get it configured? nothing_?

    Thanks,
    Wissem

  2. […] Nothing … Just use. Cluster Health Monitor (CHM) FAQ [ID 1328466.1] See this example: Monitoring the Cluster in Real Time with CHM (Cluster Health Monitor) To monitor Database: PERFORMANCE TUNING USING ADVISORS AND MANAGEABILITY FEATURES: AWR, ASH, and […]

  3. Orhan Karaman says:

    Levi,
    I’ve successfully setup CHM GUI in my desktop. But when i try to connect to my Exadata using crfgui.bat -m it fails with below message. Do you have any comment?

    ?E: Cannot connect to Loggerd on
    Reason: Invalid node name
    Note: Node localnode is now up

  4. davyp74 says:

    Hi Levi,
    i’m trying to install CHM GUI Mode on my Win 7 (64bit), client is 11.2.0.1 (32 bit).
    I get an error when try to execute crfinst.pl

    C:\oracle\oracrf\install>C:\oracle\product\11.2.0\client\perl\bin\perl.exe crfinst.pl -g c:\oracle\product\crf
    The getpwuid function is unimplemented at crfinst.pl line 685.

    Can you help me?

  5. Mandy says:

    This is a very good tip especially to those fresh to the blogosphere.
    Brief but very accurate information… Thank you for sharing this one.
    A must read article!
