TIP: Some points about setting up RMAN on RAC Environment

In this post I will some tips for setting up RMAN in RAC environment.

I will not cover  topics about  RMAN that can be configured in standalone environment (e.g Incremental backup, use of FRA, etc.)

First question: Is there a difference of setting up RMAN between  standalone and RAC environments?

The answer is YES, not too much but some points must be observed.

RMAN Catalog

First of all, In my point of view use RMAN Catalog is mandatory. Because it’s a HA environment and  to restore a database without RMAN Catalog can take long time.

To protect and keep backup metadata for longer retention times than can be accommodated by the control file, you can create a recovery catalog. You should create the recovery catalog schema in a dedicated standalone database. Do not locate the recovery catalog with other production data. If you use Oracle Enterprise Manager, you can create the recovery catalog schema in the Oracle Enterprise Manager repository database.

About HA of RMAN Catalog?

I always recommend to place the Host that will hold RMAN Catalog on VirtualMachine, because  is a machine which require low resource and disk space and have low activity.

In case of the failure of Host RMAN Catalog is easy move that host to another Physical Host or Recover the whole virtual machine.

But if the option of use a VM is not avaliable. Use another cluster (e.g Test Cluster) env if avaliable.

The Database of RMAN catalog must be in ARCHIVELOG. Why?

It’s a prod env (is enough), will generate very small amount of  archivelogs, and in case of any corruption or user errors (e.g User generated new incarnation of Prod Database  during a Test Validation of Backup in Test env) can be necessary recovery point in time.

Due a small database of low activity I see some customers not giving importance to this database. It’s should not happens.

High availability of execution of backup using RMAN:

We have some challenges:

  1.  The backup must not affect the availability cluster, but the backup must be executed daily.
  2.  The backup cannot be dependent of nodes (i.e backup must be able to execute in all nodes independently if have some nodes active or not)
  3. Where store the scripts of backup? Where store the Logs?

I don’t recommend use any nodes of cluster to start/store scripts backups. Due if that node fail backup will not be executed.
Use the Host where is stored RMAN Catalog to store your backup scripts too and start these scripts from this host, the utility RMAN works as client only… the backup is always performed on server side.

Doing this you will centralize all scripts and logs of backup from your environment. That will ease the management of backup.

Configuring the RMAN Snapshot Control File Location in a RAC 11.2

RMAN creates a copy of the control file for read consistency, this is the snapshot controlfile. Due to the changes made to the controlfile backup mechanism in 11gR2 any instances in the cluster may write to the snapshot controlfile. Therefore, the snapshot controlfile file needs to be visible to all instances.
The same happens when a backup of the controlfile is created directly from sqlplus any instance in the cluster may write to the backup controfile file.
In 11gR2 onwards, the controlfile backup happens without holding the control file enqueue. For non-RAC database, this doesn’t change anything.
But, for RAC database, the snapshot controlfile location must be in a shared file system that will be accessible from all the nodes.
The snapshot controlfile MUST be accessible by all nodes of a RAC database.

See how do that:

https://forums.oracle.com/forums/thread.jspa?messageID=9997615

Since version 11.1 : Node Affinity Awareness of Fast Connections

In some cluster database configurations, some nodes of the cluster have faster access to certain data files than to other data files. RMAN automatically detects this, which is known as node affinity awareness. When deciding which channel to use to back up a particular data file, RMAN gives preference to the nodes with faster access to the data files that you want to back up. For example, if you have a three-node cluster, and if node 1 has faster read/write access to data files 7, 8, and 9 than the other nodes, then node 1 has greater node affinity to those files than nodes 2 and 3.

Channel Connections to Cluster Instances with RMAN

Channel connections to the instances are determined using the connect string defined by channel configurations. For example, in the following configuration, three channels are allocated using dbauser/pwd@service_name. If you configure the SQL Net service name with load balancing turned on, then the channels are allocated at a node as decided by the load balancing algorithm.

However, if the service name used in the connect string is not for load balancing, then you can control at which instance the channels are allocated using separate connect strings for each channel configuration. So,your backup scripts will fail if  that node/instance is down.

So, my recommendation in admin-managed database environment is create a set of nodes to perform the backup.

E.g : If  you have 3 nodes you should use one or two node to perform backup, while the other node is less loaded. If you are using Load Balance in your connection… the new connection will be directed to the least loaded node.

Autolocation for Backup and Restore Commands

RMAN automatically performs autolocation of all files that it must back up or restore. If you use the noncluster file system local archiving scheme, then a node can only read the archived redo logs that were generated by an instance on that node. RMAN never attempts to back up archived redo logs on a channel it cannot read.

During a restore operation, RMAN automatically performs the autolocation of backups. A channel connected to a specific node only attempts to restore files that were backed up to the node. For example, assume that log sequence 1001 is backed up to the drive attached to node1, while log 1002 is backed up to the drive attached to node2. If you then allocate channels that connect to each node, then the channel connected to node1 can restore log 1001 (but not 1002), and the channel connected to node2 can restore log 1002 (but not 1001).

Configuring Channels to Use Automatic Load Balancing

To configure channels to use automatic load balancing, use the following syntax:

CONFIGURE DEVICE TYPE [disk | sbt] PARALLELISM number_of_channels;

Where number_of_channels is the number of channels that you want to use for the operation. After you complete this one-time configuration, you can issue BACKUP or RESTORE commands.

Setup  Parallelism on RMAN  is not enough to keep a balance, because if you start the backup from remote host using default SERVICE_NAME and if you are using parallelism the RMAN can start a session in each node and the backup be performed by all nodes at same time, this is not a problem, but can cause a performance   issue on your environment due high load.

Even at night the backup can cause performance problems due maintenance of the database (statistics gathering, verification of new SQL plans “automatic sql tuning set”, etc).

The bottlenecks are usually in or LAN or SAN, so use all nodes to perform backup can be  a waste. If the backup is run via LAN you can gain by using more than one node, but the server that is receiving the backup data will become a bottleneck.

I really don’t like to use more than 50% of nodes of RAC to execute backup due it can increase the workload in all nodes of clusters and this can be a problem to the application or database.

So, thinking to prevent it we can configure a Database Service to control where backup will be performed.

Creating a Database Service to perform Backup

Before start I should explain about limitation of database service.

Some points about Oracle Services.

When a user or application connects to a database, Oracle recommends that you use a service for the connection. Oracle Database automatically creates one database service (default service is always the database  name)  when the database is created.   For more flexibility in the management of the workload using the database, Oracle Database enables you to create multiple services and specify which database instances offer the services.

You can define services for both policy-managed and administrator-managed databases.

  • Policy-managed database: When you define services for a policy-managed database, you assign the service to a server pool where the database is running. You can define the service as either uniform (running on all instances in the server pool) or singleton (running on only one instance in the server pool).
  • Administrator-managed database: When you define a service for an administrator-managed database, you define which instances normally support that service. These are known as the PREFERRED instances. You can also define other instances to support a service if the preferred instance fails. These are known as AVAILABLE instances.
About Service Failover in Administrator-Managed Databases

When you specify a preferred instance for a service, the service runs on that instance during normal operation. Oracle Clusterware attempts to ensure that the service always runs on all the preferred instances that have been configured for a service. If the instance fails, then the service is relocated to an available instance. You can also manually relocate the service to an available instance.

About Service Failover in Policy-Managed Databases

When you specify that a service is UNIFORM, Oracle Clusterware attempts to ensure that the service always runs on all the available instances for the specified server pool. If the instance fails, then the service is no longer available on that instance. If the cardinality of the server pool increases and a instance is added to the database, then the service is started on the new instance. You cannot manually relocate the service to a specific instance.

When you specify that a service is SINGLETON, Oracle Clusterware attempts to ensure that the service always runs on only one of the available instances for the specified server pool. If the instance fails, then the service fails overs to a different instance in the server pool. You cannot specify which instance in the server pool the service should run on.

For SINGLETON services, if a service fails over to an new instance, then the service is not moved back to its original instance when that instance becomes available again.

Summarizing about use Services

If your database is Administrator-Managed  we can create a service and define  where backup will be executed, and how much nodes we can use with preferred and available nodes.

If your database is Policy-Managed we cannot define where backup will be executed, but we can configure a service SINGLETON, that will be sure that backup  will be executed in  only node, if that node fail the service will be moved to another available node, but we cannot choose in which node backup will be performed.

Note:

For connections to the target and auxiliary databases, the following rules apply:

Starting with 10gR2, these connections can use a connect string that does not bind to any particular instance. This means you can use load balancing.

Once a connection is established, however, it must be a dedicated connection  that cannot migrate to any other process or instance. This means that you still can’t use MTS or TAF.

Example creating service for Administrator-Managed Database

The backup will be executed on db11g2 and db11g3, but can be executed on db11g1 if db11g2 and db11g3 fail.

Set ORACLE_HOME  to same used by Database

$ srvctl add service -d db_unique_name -s service_name -r preferred_list [-a available_list] [-P TAF_policy]

$ srvctl add service -d db11g -s srv_rman -r db11g2,db11g3 -a db11g1 -P NONE -j LONG

$ srvctl start service -d db11g -s srv_rman
Example creating service for Policy-Managed Database

Using a service SINGLETON the backup will be executed on node which service was started/assigned. The service will be changed to another host only if that node fail.

Set ORACLE_HOME  to same used by Database

$ srvctl config database -d db11g |grep "Server pools"

Server pools: racdb11gsp

$ srvctl add service -d db11g -s srv_rman -g racdb11gsp -c SINGLETON

$ srvctl start service -d db11g -s srv_rman

If you have more than 2 nodes on cluster (with policy managed database) and you want use only 2 or more nodes to perform backup, you can choose the options below.

Configure a Service UNIFORM (the service will be available on all nodes)   you can control how much instance will be used to perform backup, but you cannot choose in which node backup will be performed. In fact the service does not control anything, you will set  PARALLELISM (RMAN) equal number of nodes wich you want use .

Ex: I have 4 Nodes but I want start backup in 2 nodes. I must choose parallelism 2.  Remember that Oracle can start 2 Channel on same host, this depend on workload of each node.

Using Policy Managed Database you should be aware that you do not care where (node) each instance is running, you will have a pool with many nodes and Oracle will manage all resources inside that pool. For this reason is not possible to control where you will place a heavier load.

This will only work if you are performing online backup or are using Parallel Backup.

Configuring RMAN to Automatically Backup the Control File and SPFILE

If you set CONFIGURE CONTROLFILE AUTOBACKUP to ON, then RMAN automatically creates a control file and an SPFILE backup after you run the BACKUP or COPYcommands. RMAN can also automatically restore an SPFILE, if this is required to start an instance to perform recovery, because the default location for the SPFILE must be available to all nodes in your Oracle RAC database.

These features are important in disaster recovery because RMAN can restore the control file even without a recovery catalog. RMAN can restore an autobackup of the control file even after the loss of both the recovery catalog and the current control file. You can change the default name that RMAN gives to this file with the CONFIGURE CONTROLFILE AUTOBACKUP FORMAT command. Note that if you specify an absolute path name in this command, then this path must exist identically on all nodes that participate in backups.

RMAN performs the control file autobackup on the first allocated channel. Therefore, when you allocate multiple channels with different parameters, especially when you allocate a channel with the CONNECT command, determine which channel will perform the control file autobackup. Always allocate the channel for this node first.

Enjoy..

Advertisements

7 Comments on “TIP: Some points about setting up RMAN on RAC Environment”

  1. Linda says:

    Hi, thanks.
    I use Rman agent for netbackup and the “hot backups” are performing well wednesday.
    I have two nodes and it is running from node #2 because node #1 is always super loaded and I put it on the cron of node #2. Archives backups also only from node #2.
    Please advice.

    Like

    • Levi Pereira says:

      Hi Linda,
      I updated this post with a better explanation about load balance.
      You have two nodes and knows that node 1 should not be utilized to perform backup due high load.
      So, if you have a Administrator-Managed Database create a service which have instance on node 2 as preffered and instance on node 1 as avaliable. RMAN will start the session on Node 2, and will start on node 1 only if node 2 is not avaliable.
      If you have a Policy-Managed Database I recommend you create a service UNIFORM (due high load) the Oracle decide where backup will be performed, if you use more that one channel the RMAN can start a backup in both nodes if the nodes are with same load.

      Regards,
      Levi Pereira

      Like

  2. Nikhil Mehta says:

    Hi Levi,

    I am very new to ORACLE RAC. My confusion is which user takes backup in RMAN. Grid user or oracle user. Sorry to ask such a basic question. Thanks in advance.

    Regards,
    Nikhil Mehta

    Like

    • Levi Pereira says:

      Any database user with grant on role “SYSDBA” of database or any OS user on group “DBA”(unix), “OSDBA”(windows) of OS.

      Regards,
      Levi Pereira

      Like

  3. Patrick David says:

    Hi,

    We have a two nodes RAC cluster. And we are upgrading it from Oracle 10g R2 to 11g R2. We have required the help of an external consultant to help us in this migration. My question is about the RMAN recovery catalog. The consultant asked us to create two recovery catalogs, on two different servers, and on each site, each of them backing up one node of the cluster. For instance imagine site A and B. We have one node of our RAC cluster running on site A and the other one on site B. I was thinking to create one RMAN catalog in a database hosted on a virtual server on site A, and it would backup one node of our RAC. But the consultant told us in case of site A becomes unavailable and if unfortunately we have to perform a restore/recovery of one instance in node B then as the recovery catalog is on site A it wouldn’t be available and so impossible to perform the restore/recovery. So to avoid this situation he advised us to create two RMAN recovery catalogs, one on each node, and each RMAN recovery catalog would backup each instances on each nodes of our RAC.

    What do you think of that? Is that pertinent or not at all?

    Thanks in advance for your answer.

    Regards

    Patrick

    Like

    • Levi Pereira says:

      Hi,

      You are think to create 2 different catalog in different servers to a database? I don’t get reason behind this.

      It will not work… how you will keep both catalog updated?

      Are you using RAC extended? If yes… Where is stored Quorum Voting Disk? you can place RMAN Catalog there?

      I was thinking to create one RMAN catalog in a database hosted on a virtual server on site A, and it would backup one node of our RAC. But the consultant told us in case of site A becomes unavailable and if unfortunately we have to perform a restore/recovery of one instance in node B then as the recovery catalog is on site A it wouldn’t be available and so impossible to perform the restore/recovery.

      My recommendation is:

      If your concern is high availability of RMAN catalog.

      You have two options:

      Scenario 1:
      Create a RMAN catalog on Site A and use a RMAN catalog Standby on site B ( don’t need be a Dataguard) only a database in recovery mode applying archivelogs (any Shell script can copy archivelog to site B and apply it).

      Scenario 2:
      If you want to use the same cluster where RAC is installed. Just create a other RAC database (recovery catalog) one instance in each site and problem is solved.

      If you lost Site A:
      Scenario 1:
      Put database in open Mode and Resync catalog with target database
      Resynchronizing the Recovery Catalog Before Control File Records Age Out

      http://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmcatdb.htm#BRADV89642

      Scenario 2:
      It’s transparent Catalog will keep running.

      If you lost Both Site A and B:

      You will need:
      Restore OS and Oracle Producuts (Clusterware,RAC,ASM,and so on) of Nodes of Clusterware and the host of RMAN catalog.
      Restore database of RMAN catalog and after that restore database.
      So, I believe the time to recovery Database RMAN Catalog is short compared to time to restore whole RAC env.
      You need think on actions that you will take at moment of failure.

      I don’t believe you need a complex env to guarantee HA of RMAN Catalog.

      A recovery catalog creates redundancy for the RMAN repository stored in the control file of each target database. The recovery catalog serves as a secondary metadata repository. If the target control file and all backups are lost, then the RMAN metadata still exists in the recovery catalog.

      So, If you lost RMAN Database Catalog you need bring up then before number of days configured on parameter CONTROL_FILE_RECORD_KEEP_TIME (default is 7 days) .

      So, make sure that CONTROL_FILE_RECORD_KEEP_TIME is longer than the interval between backups or resynchronizations. Otherwise, control file records could be reused before they are propagated to the recovery catalog. An extra week is a safe margin in most circumstances.

      If you are backing to TAPE… Just configure AUTOBACKUP CONTROLFILE to DISK and keep this backup file safe until you recovery your catalog, AUTOBACKUP will hold you controlfile(RMAN Catalog).

      Regards,
      Levi Pereira

      Like

  4. Hi,

    How do you create/define a service as a SINGLETON 10g?

    Like


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s