Long-Term RMAN Backups (Online Backup Without Keeping All Logs)

[Image: short-term vs. long-term backups]

Short-term backups take periodic images of active data to provide a way to recover data that has been deleted or destroyed.
These backups are retained for only a few days or weeks. Their primary purpose is to recover recent data or, if a media failure or disaster occurs, to restore the backups and bring the data back online.

Most backup systems have two different setups, one for short-term and one for long-term backups.

Short-term backups are classic backups with a short retention period, stored in an active data pool; these backups are usually always available in the tape library or on disk.

Long-term backups are archival backups with a long retention period, often stored only on tapes; usually these tapes are kept off-site or in safes.

In a few lines, I will try to explain the purpose of long-term backups.

For example, you can back up the database on the first day of every year to satisfy a regulatory requirement and store the media off-site.
Years after making the archival backup, you can restore and recover it to query the data as it appeared at the time of the backup. This is data preservation.

Data preservation is related to data protection, but it serves a different purpose.
For example, you may need to preserve a copy of a database as it existed at the end of a business quarter.
This backup is not part of the disaster recovery strategy. The media to which these backups are written are often unavailable after the backup is complete.
You may send the tape into fire storage or ship a portable hard drive to a testing facility.

RMAN provides a convenient way to create a backup and exempt it from your backup retention policy. This type of backup is known as an archival backup.

Why this introduction about short-term and long-term backups?

I manage several environments and I see that some DBAs fail to understand this difference and end up with incorrect configurations for archival and classic backups.

IMHO, the main villain is RDBMS 10.2 and earlier versions, because there is no long-term backup concept for online backups.

In 10.2, to take an archival backup without keeping all archived logs we must shut down the database (in many environments this is not possible); if we take an online backup with long retention (KEEP LOGS), we need to keep all archived logs until the first backup becomes obsolete. In this version there is essentially no difference between a classic backup and an archival backup (in fact there is a difference, but it is a slight one).

Because of that, many DBAs are still stuck on the concept from that version.

In 11.1 this changed: you can create a consistent archival backup without needing to keep all archived logs.

When should this RMAN command (“backup database keep logs forever / until sysdate + XXX”) be used on 11.1 or above?
It should only be used if the DBA has a requirement to keep ALL database changes forever, or until some date, and needs to be able to recover to ANY point in time of the database's life. If you don't need that, this setup is overkill and increases the cost and management burden of the environment.

Let's start with some tests. I'm using RDBMS 11.2.0.4.

How to take a long-term backup in 11.1 or higher.

Best practices: three important things.

  1. Use RESTORE POINT to make the restore easier. The RESTORE POINT clause creates a normal restore point, which is a label for the SCN to which the backup must be recovered to be made consistent. The SCN is captured just after the datafile backups complete. RMAN resynchronizes restore points with the recovery catalog and maintains them as long as the backup exists.
    More detail in the documentation:
    https://docs.oracle.com/cd/E11882_01/backup.112/e10642/rcmbckba.htm#BRADV89551
  2. Use a non-default TAG; this makes it easier to manage the backup later.
  3. Use the CHECK LOGICAL option to avoid backing up logically corrupted blocks. By default, RMAN does not check for logical corruption. If you specify CHECK LOGICAL on the BACKUP command, however, RMAN tests data and index blocks for logical corruption, such as corruption of a row piece or index entry, and logs them in the alert log located in the Automatic Diagnostic Repository (ADR). If you use RMAN with the CHECK LOGICAL option when backing up or restoring files, it detects all types of block corruption that are possible to detect. What a disappointment to try to restore a backup taken last year and find out that many blocks are logically corrupted.

P.S.: I always use CHECK LOGICAL for all my backups (short- and long-term).

 

Performing a Long-Term Backup

Recovery Manager: Release 11.2.0.4.0 - Production on Thu Feb 18 15:56:59 2016

Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.

connected to target database: TESTDB (DBID=2633899005)
 connected to recovery catalog database

RMAN> run {
 2> ALLOCATE CHANNEL tape1 
    DEVICE TYPE 'SBT_TAPE' PARMS 
    'ENV=(TDPO_OPTFILE=/usr/tivoli/tsm/client/oracle/bin64/tdpo.opt)';
 3> BACKUP CHECK LOGICAL 
    FILESPERSET 10 
    FULL DATABASE 
    TAG 'MONTHLY_FULL_20160218' 
    FORMAT 'RMAN_MONTHLY_FULL_%d_%s_%p_%T_%c' 
    KEEP UNTIL TIME 'SYSDATE + 400' 
    RESTORE POINT BKP_MONTHLY_20160218_1556;
 4> RELEASE CHANNEL tape1;
 5> }
 6> exit
 allocated channel: tape1
 channel tape1: SID=241 device type=SBT_TAPE
 channel tape1: Data Protection for Oracle: version 5.5.2.0

Starting backup at Feb 18 2016 15:57:07
 current log archived

backup will be obsolete on date Mar 24 2017 15:57:15
 archived logs required to recover from this backup will be backed up
 channel tape1: starting compressed full datafile backup set
 channel tape1: specifying datafile(s) in backup set
 input datafile file number=00001 name=+TSM/testdb/datafile/system.501.901826545
 input datafile file number=00006 name=+TSM/testdb/datafile/users.487.901826545
 input datafile file number=00002 name=+TSM/testdb/datafile/sysaux.542.901826547
 input datafile file number=00003 name=+TSM/testdb/datafile/undotbs1.442.901826547
 input datafile file number=00004 name=+TSM/testdb/datafile/users.314.901826547
 input datafile file number=00005 name=+TSM/testdb/datafile/users.510.901826547
 channel tape1: starting piece 1 at Feb 18 2016 15:57:15
 channel tape1: finished piece 1 at Feb 18 2016 15:58:40
 piece handle=RMAN_MONTHLY_FULL_TESTDB_328_1_20160218_1 tag=MONTHLY_FULL_20160218 
 comment=API Version 2.0,MMS Version 5.5.2.0
 channel tape1: backup set complete, elapsed time: 00:01:25

backup will be obsolete on date Mar 24 2017 15:58:40
 archived logs required to recover from this backup will be backed up
 channel tape1: starting compressed full datafile backup set
 channel tape1: specifying datafile(s) in backup set
 including current SPFILE in backup set
 channel tape1: starting piece 1 at Feb 18 2016 15:58:41
 channel tape1: finished piece 1 at Feb 18 2016 15:58:42
 piece handle=RMAN_MONTHLY_FULL_TESTDB_329_1_20160218_1 tag=MONTHLY_FULL_20160218 
 comment=API Version 2.0,MMS Version 5.5.2.0
 channel tape1: backup set complete, elapsed time: 00:00:01
 current log archived
 backup will be obsolete on date Mar 24 2017 15:58:46
 archived logs required to recover from this backup will be backed up
 channel tape1: starting compressed archived log backup set
 channel tape1: specifying archived log(s) in backup set
 input archived log thread=1 sequence=91 RECID=830 STAMP=904147125
 channel tape1: starting piece 1 at Feb 18 2016 15:58:46
 channel tape1: finished piece 1 at Feb 18 2016 15:58:47
 piece handle=RMAN_MONTHLY_FULL_TESTDB_330_1_20160218_1 tag=MONTHLY_FULL_20160218 
 comment=API Version 2.0,MMS Version 5.5.2.0
 channel tape1: backup set complete, elapsed time: 00:00:01

backup will be obsolete on date Mar 24 2017 15:58:47
 archived logs required to recover from this backup will be backed up
 channel tape1: starting compressed full datafile backup set
 channel tape1: specifying datafile(s) in backup set
 including current control file in backup set
 channel tape1: starting piece 1 at Feb 18 2016 15:58:49
 channel tape1: finished piece 1 at Feb 18 2016 15:58:50
 piece handle=RMAN_MONTHLY_FULL_TESTDB_331_1_20160218_1 tag=MONTHLY_FULL_20160218 
 comment=API Version 2.0,MMS Version 5.5.2.0
 channel tape1: backup set complete, elapsed time: 00:00:01
 Finished backup at Feb 18 2016 15:58:50

released channel: tape1

Recovery Manager complete.

Change Backup To UNAVAILABLE

If this backup will be taken offline from the tape library, mark it as UNAVAILABLE so that RMAN does not try to use it during a normal restore operation. A normal restore should use the short-term backups, which are usually active in the tape library.

Since I used a non-default tag, it is easy to change it.

RMAN> CHANGE BACKUP TAG MONTHLY_FULL_20160218 UNAVAILABLE;

changed backup piece unavailable
 backup piece handle=RMAN_MONTHLY_FULL_TESTDB_328_1_20160218_1 RECID=311 STAMP=904147035
 changed backup piece unavailable
 backup piece handle=RMAN_MONTHLY_FULL_TESTDB_329_1_20160218_1 RECID=312 STAMP=904147121
 changed backup piece unavailable
 backup piece handle=RMAN_MONTHLY_FULL_TESTDB_330_1_20160218_1 RECID=313 STAMP=904147126
 changed backup piece unavailable
 backup piece handle=RMAN_MONTHLY_FULL_TESTDB_331_1_20160218_1 RECID=314 STAMP=904147129
 Changed 4 objects to UNAVAILABLE status

How to restore it?

RMAN> shutdown abort;

Oracle instance shut down

RMAN> startup nomount;

connected to target database (not started)
 Oracle instance started

Total System Global Area 801701888 bytes

Fixed Size 2250648 bytes
 Variable Size 251660392 bytes
 Database Buffers 536870912 bytes
 Redo Buffers 10919936 bytes

RMAN> run {
 2> SET UNTIL RESTORE POINT BKP_MONTHLY_20160218_1556;
 3> RESTORE CONTROLFILE;
 4> STARTUP MOUNT;
 5> RESTORE DATABASE;
 6> RECOVER DATABASE;
 7> ALTER  DATABASE OPEN RESETLOGS;
 8> };

executing command: SET until clause

Starting restore at 18-02-2016 16:56:32
 allocated channel: ORA_SBT_TAPE_1
 channel ORA_SBT_TAPE_1: SID=225 device type=SBT_TAPE
 channel ORA_SBT_TAPE_1: Data Protection for Oracle: version 5.5.2.0
 allocated channel: ORA_DISK_1
 channel ORA_DISK_1: SID=233 device type=DISK

channel ORA_SBT_TAPE_1: starting datafile backup set restore
 channel ORA_SBT_TAPE_1: restoring control file
 channel ORA_SBT_TAPE_1: reading from backup piece RMAN_MONTHLY_FULL_TESTDB_331_1_20160218_1
 channel ORA_SBT_TAPE_1: piece handle=RMAN_MONTHLY_FULL_TESTDB_331_1_20160218_1 
 tag=MONTHLY_FULL_20160218
 channel ORA_SBT_TAPE_1: restored backup piece 1
 channel ORA_SBT_TAPE_1: restore complete, elapsed time: 00:00:02
 output file name=+TSM/testdb/controlfile/current.518.901826451
 Finished restore at 18-02-2016 16:56:42

database is already started
 database mounted
 released channel: ORA_SBT_TAPE_1
 released channel: ORA_DISK_1

Starting restore at 18-02-2016 16:56:48
 Starting implicit crosscheck backup at 18-02-2016 16:56:48
 allocated channel: ORA_DISK_1
 channel ORA_DISK_1: SID=225 device type=DISK
 Finished implicit crosscheck backup at 18-02-2016 16:56:49

Starting implicit crosscheck copy at 18-02-2016 16:56:49
 using channel ORA_DISK_1
 Finished implicit crosscheck copy at 18-02-2016 16:56:49

searching for all files in the recovery area
 cataloging files...
 no files cataloged

allocated channel: ORA_SBT_TAPE_1
 channel ORA_SBT_TAPE_1: SID=233 device type=SBT_TAPE
 channel ORA_SBT_TAPE_1: Data Protection for Oracle: version 5.5.2.0
 using channel ORA_DISK_1

channel ORA_SBT_TAPE_1: starting datafile backup set restore
 channel ORA_SBT_TAPE_1: specifying datafile(s) to restore from backup set
 channel ORA_SBT_TAPE_1: restoring datafile 00001 to +TSM/testdb/datafile/system.501.901826545
 channel ORA_SBT_TAPE_1: restoring datafile 00002 to +TSM/testdb/datafile/sysaux.542.901826547
 channel ORA_SBT_TAPE_1: restoring datafile 00003 to +TSM/testdb/datafile/undotbs1.442.901826547
 channel ORA_SBT_TAPE_1: restoring datafile 00004 to +TSM/testdb/datafile/users.314.901826547
 channel ORA_SBT_TAPE_1: restoring datafile 00005 to +TSM/testdb/datafile/users.510.901826547
 channel ORA_SBT_TAPE_1: restoring datafile 00006 to +TSM/testdb/datafile/users.487.901826545
 channel ORA_SBT_TAPE_1: reading from backup piece RMAN_MONTHLY_FULL_TESTDB_328_1_20160218_1
 channel ORA_SBT_TAPE_1: piece handle=RMAN_MONTHLY_FULL_TESTDB_328_1_20160218_1 
 tag=MONTHLY_FULL_20160218
 channel ORA_SBT_TAPE_1: restored backup piece 1
 channel ORA_SBT_TAPE_1: restore complete, elapsed time: 00:01:45
 Finished restore at 18-02-2016 16:58:43

Starting recover at 18-02-2016 16:58:44
 using channel ORA_SBT_TAPE_1
 using channel ORA_DISK_1

starting media recovery

channel ORA_SBT_TAPE_1: starting archived log restore to default destination
 channel ORA_SBT_TAPE_1: restoring archived log
 archived log thread=1 sequence=91
 channel ORA_SBT_TAPE_1: reading from backup piece RMAN_MONTHLY_FULL_TESTDB_330_1_20160218_1
 channel ORA_SBT_TAPE_1: piece handle=RMAN_MONTHLY_FULL_TESTDB_330_1_20160218_1 
 tag=MONTHLY_FULL_20160218
 channel ORA_SBT_TAPE_1: restored backup piece 1
 channel ORA_SBT_TAPE_1: restore complete, elapsed time: 00:00:01
 archived log file name=+TSM/testdb/archivelog/2016_02_18/thread_1_seq_91.267.904150729 thread=1 sequence=91
 channel default: deleting archived log(s)
 archived log file name=+TSM/testdb/archivelog/2016_02_18/thread_1_seq_91.267.904150729 RECID=834 STAMP=904150729
 media recovery complete, elapsed time: 00:00:02
 Finished recover at 18-02-2016 16:58:51

database opened
 new incarnation of database registered in recovery catalog
 starting full resync of recovery catalog
 full resync complete

How to migrate from the old approach to the new approach for long-term backups

Many environments inherit old backup configurations from previous DBAs, so it is very common to see long-term backups set up using “backup database keep logs;”.

You may ask: I don't want to keep all logs, only the backups that are not yet obsolete. Is that possible?

Yes. There is a workaround to stop RMAN from keeping all logs forever, if you are using 11.1 or above.

AGAIN, if you are using 10.2 or an earlier version, you CANNOT do what follows.

 

What you need to do.

First of all, reconfigure your long-term backup scripts to use a restore point from now on.

What to do with old backups.

We can leave the old setup alone until the last backup taken with “KEEP LOGS” retention becomes obsolete. At that point all the archived log backups will be obsolete and RMAN will delete them on the next housekeeping run. (Recommended.)
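
That housekeeping is just the standard obsolescence commands; a minimal sketch:

RMAN> REPORT OBSOLETE;            # list what no longer satisfies the retention policy
RMAN> DELETE OBSOLETE;            # delete it (prompts for confirmation)
RMAN> DELETE NOPROMPT OBSOLETE;   # non-interactive variant for scripts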

P.S.: If you have a long-term backup with “KEEP FOREVER”, the archived log backups will never become obsolete.

There is also a workaround (or bug) to delete all archived logs and make a “self-contained” backup even when using “KEEP LOGS”. This can be dangerous if not configured properly: RMAN will delete all archived logs that do not satisfy the retention policy. That means, if it goes wrong, you will have a database backup without the archived logs needed to recover it (i.e., you will not be able to recover that backup).

Steps:

  • Find in the RMAN catalog all backups taken with the KEEP clause.
  • Find all archived log backups taken between 24 hours before and 24 hours after the backup with the KEEP clause.
    • E.g., if the backup finished at 02/01/2016 10:00, find all archived log backups between 02/01/2016 10:00 and 03/01/2016 10:00.
  • Change these backups (archived logs, control file and datafiles) to UNAVAILABLE, as sketched below.
  • After these backups are changed to UNAVAILABLE, all the archived log backups become orphaned and therefore obsolete.
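
A minimal sketch of these steps. The query assumes a recovery catalog (view RC_BACKUP_SET); the tag and the dates are hypothetical and must be replaced with the values returned for your own KEEP backups:

-- 1. Find the backups taken with the KEEP clause (run as the catalog owner)
SQL> SELECT bs_key, start_time, completion_time, keep_until, keep_options
       FROM rc_backup_set
      WHERE keep = 'YES'
      ORDER BY completion_time;

# 2./3. Mark the KEEP backup and the surrounding archived log backups UNAVAILABLE
RMAN> CHANGE BACKUP TAG 'MONTHLY_FULL_20160102' UNAVAILABLE;
RMAN> CHANGE BACKUP OF ARCHIVELOG
        FROM TIME "to_date('02/01/2016 10:00','dd/mm/yyyy hh24:mi')"
        UNTIL TIME "to_date('03/01/2016 10:00','dd/mm/yyyy hh24:mi')"
        UNAVAILABLE;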

 

E.g.:

(dd/mm/yyyy hh24:mi)

Backup 1 ( 01/01/2016 12:00)

Backup 2 (01/02/2016 12:00)

Backup 3 (01/03/2016 12:00)

Backup 4 (01/04/2016 12:00)

 

 

Scenario 1:

Change only Backup 1 and the archived log backups needed to restore it to UNAVAILABLE.

  • All archived log backups up to 01/02/2016 12:00 that do not have status UNAVAILABLE become obsolete and will be deleted on the next “DELETE OBSOLETE”.
  • All future archived log backups will be kept until Backup 2 becomes obsolete.

Scenario 2:

Change only Backup 2 and the archived log backups needed to restore it to UNAVAILABLE.

  • All archived log backups will be kept.

Scenario 3:

Change only Backups 1, 2, 3 and 4 to UNAVAILABLE.

  • All archived log backups become obsolete; after a “DELETE OBSOLETE” you can still restore Backup 1, 2, 3 or 4, BUT YOU WILL NOT BE ABLE TO RECOVER THEM since there are no archived logs.

Scenario 4:

Change Backups 1, 2, 3 and 4 and the archived log backups needed to restore them to UNAVAILABLE.

  • All archived log backups with status AVAILABLE become obsolete, and all archived logs follow the default retention policy.

 

It seems to me like a bug, because Oracle simply ignores the archived log retention implied by the KEEP clause when the KEEP backup is UNAVAILABLE.

 

 

 

 


IBM POWER7 AIX and Oracle Database performance considerations


This post is intended to provide DBAs with advice and website links to assist with IBM Power Systems running in an Oracle environment.
Most issues that show up on POWER7 are the result of not following best practices advice that applies to all Power Systems generations.

Metalink: IBM POWER7 AIX and Oracle Database performance considerations -- 10g & 11g (Doc ID 1507249.1)

IBM White paper: IBM POWER7 AIX and Oracle Database performance considerations


TIP: Some points about setting up RMAN in a RAC environment

In this post I will give some tips for setting up RMAN in a RAC environment.

I will not cover RMAN topics that are configured the same way in a standalone environment (e.g., incremental backups, use of the FRA, etc.).

First question: is there a difference in setting up RMAN between standalone and RAC environments?

The answer is YES; not a big one, but some points must be observed.

RMAN Catalog

First of all, in my opinion using an RMAN catalog is mandatory, because RAC is an HA environment and restoring a database without an RMAN catalog can take a long time.

To protect and keep backup metadata for longer retention times than can be accommodated by the control file, you can create a recovery catalog. You should create the recovery catalog schema in a dedicated standalone database. Do not locate the recovery catalog with other production data. If you use Oracle Enterprise Manager, you can create the recovery catalog schema in the Oracle Enterprise Manager repository database.
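
A minimal sketch of creating the catalog schema and registering a target database; the user, password, tablespace and TNS alias below are hypothetical:

SQL> -- assumes Oracle Managed Files; otherwise specify a datafile name and size
SQL> CREATE TABLESPACE rman_ts DATAFILE SIZE 200M AUTOEXTEND ON;
SQL> CREATE USER rcat_owner IDENTIFIED BY "SomePassword"
       DEFAULT TABLESPACE rman_ts QUOTA UNLIMITED ON rman_ts;
SQL> GRANT recovery_catalog_owner TO rcat_owner;

$ rman catalog rcat_owner@rmancat            # connect to the dedicated catalog database
RMAN> CREATE CATALOG;

$ rman target / catalog rcat_owner@rmancat   # run from the target database host
RMAN> REGISTER DATABASE;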

What about HA of the RMAN catalog?

I always recommend placing the host that holds the RMAN catalog on a virtual machine, because it is a machine that requires few resources and little disk space and has low activity.

If the RMAN catalog host fails, it is easy to move that virtual machine to another physical host or to recover the whole virtual machine.

If using a VM is not an option, use another cluster environment (e.g., a test cluster) if one is available.

The RMAN catalog database must run in ARCHIVELOG mode. Why?

It is a production environment (that alone is reason enough), it generates a very small amount of archived logs, and in case of corruption or user errors (e.g., a user registered a new incarnation of the production database while validating a backup in the test environment) a point-in-time recovery may be necessary.
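
Putting the catalog database in ARCHIVELOG mode is a one-time change that requires a short outage; a minimal sketch:

SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP MOUNT
SQL> ALTER DATABASE ARCHIVELOG;
SQL> ALTER DATABASE OPEN;
SQL> ARCHIVE LOG LIST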

Because it is a small database with low activity, I see some customers not giving this database the importance it deserves. That should not happen.

High availability of backup execution using RMAN:

We have some challenges:

  1. The backup must not affect the availability of the cluster, but it must be executed daily.
  2. The backup cannot depend on specific nodes (i.e., it must be able to run regardless of which nodes are up).
  3. Where should the backup scripts be stored? Where should the logs be stored?

I don't recommend using any cluster node to store or start the backup scripts, because if that node fails the backup will not run.
Use the host where the RMAN catalog is stored to hold your backup scripts as well, and start the scripts from that host; the RMAN utility works only as a client, and the backup is always performed on the server side.

This way you centralize all the backup scripts and logs of your environment, which eases backup management.

Configuring the RMAN Snapshot Control File Location in a RAC 11.2

RMAN creates a copy of the control file for read consistency; this is the snapshot control file. Due to the changes made to the control file backup mechanism in 11gR2, any instance in the cluster may write to the snapshot control file. Therefore, the snapshot control file needs to be visible to all instances.
The same happens when a backup of the control file is created directly from SQL*Plus: any instance in the cluster may write to the backup control file.
From 11gR2 onwards, the control file backup happens without holding the control file enqueue. For a non-RAC database, this doesn't change anything.
But for a RAC database, the snapshot control file must be located on shared storage accessible from all the nodes.
The snapshot control file MUST be accessible by all nodes of a RAC database.

See how to do that:

https://forums.oracle.com/forums/thread.jspa?messageID=9997615
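
In short, the fix is a one-line configuration pointing the snapshot control file to shared storage; a minimal sketch assuming an ASM disk group named +FRA:

RMAN> SHOW SNAPSHOT CONTROLFILE NAME;
RMAN> CONFIGURE SNAPSHOT CONTROLFILE NAME TO '+FRA/testdb/snapcf_testdb.f';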

Since version 11.1: Node Affinity Awareness of Fast Connections

In some cluster database configurations, some nodes of the cluster have faster access to certain data files than to other data files. RMAN automatically detects this, which is known as node affinity awareness. When deciding which channel to use to back up a particular data file, RMAN gives preference to the nodes with faster access to the data files that you want to back up. For example, if you have a three-node cluster, and if node 1 has faster read/write access to data files 7, 8, and 9 than the other nodes, then node 1 has greater node affinity to those files than nodes 2 and 3.

Channel Connections to Cluster Instances with RMAN

Channel connections to the instances are determined using the connect string defined by channel configurations. For example, in the following configuration, three channels are allocated using dbauser/pwd@service_name. If you configure the SQL Net service name with load balancing turned on, then the channels are allocated at a node as decided by the load balancing algorithm.
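
A sketch of such a channel configuration (the credentials and service name are placeholders):

RMAN> CONFIGURE DEVICE TYPE sbt PARALLELISM 3;
RMAN> CONFIGURE CHANNEL 1 DEVICE TYPE sbt CONNECT 'dbauser/pwd@service_name';
RMAN> CONFIGURE CHANNEL 2 DEVICE TYPE sbt CONNECT 'dbauser/pwd@service_name';
RMAN> CONFIGURE CHANNEL 3 DEVICE TYPE sbt CONNECT 'dbauser/pwd@service_name';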

However, if the service name used in the connect string is not for load balancing, then you can control at which instance the channels are allocated by using separate connect strings for each channel configuration. In that case, your backup scripts will fail if that node/instance is down.

So, my recommendation in an admin-managed database environment is to define a set of nodes to perform the backup.

E.g.: if you have 3 nodes, use one or two of them to perform the backup, leaving the other node less loaded. If you are using load balancing in your connection, new connections will be directed to the least loaded node.

Autolocation for Backup and Restore Commands

RMAN automatically performs autolocation of all files that it must back up or restore. If you use the noncluster file system local archiving scheme, then a node can only read the archived redo logs that were generated by an instance on that node. RMAN never attempts to back up archived redo logs on a channel it cannot read.

During a restore operation, RMAN automatically performs the autolocation of backups. A channel connected to a specific node only attempts to restore files that were backed up to the node. For example, assume that log sequence 1001 is backed up to the drive attached to node1, while log 1002 is backed up to the drive attached to node2. If you then allocate channels that connect to each node, then the channel connected to node1 can restore log 1001 (but not 1002), and the channel connected to node2 can restore log 1002 (but not 1001).

Configuring Channels to Use Automatic Load Balancing

To configure channels to use automatic load balancing, use the following syntax:

CONFIGURE DEVICE TYPE [disk | sbt] PARALLELISM number_of_channels;

Where number_of_channels is the number of channels that you want to use for the operation. After you complete this one-time configuration, you can issue BACKUP or RESTORE commands.

Setting up parallelism in RMAN is not enough to keep things balanced: if you start the backup from a remote host using the default SERVICE_NAME and use parallelism, RMAN can start a session on each node and the backup will be performed by all nodes at the same time. That is not a problem in itself, but it can cause a performance issue in your environment due to the high load.

Even at night the backup can cause performance problems, because it competes with database maintenance (statistics gathering, verification of new SQL plans by the automatic SQL tuning task, etc.).

The bottleneck is usually either the LAN or the SAN, so using all nodes to perform the backup can be a waste. If the backup runs over the LAN you may gain by using more than one node, but the server receiving the backup data will become the bottleneck.

I really don't like to use more than 50% of the RAC nodes to execute the backup, because it increases the workload across the whole cluster and this can be a problem for the application or the database.

So, to prevent this, we can configure a database service to control where the backup is performed.

Creating a Database Service to perform Backup

Before starting, I should explain some limitations of database services.

Some points about Oracle Services.

When a user or application connects to a database, Oracle recommends that you use a service for the connection. Oracle Database automatically creates one database service (default service is always the database  name)  when the database is created.   For more flexibility in the management of the workload using the database, Oracle Database enables you to create multiple services and specify which database instances offer the services.

You can define services for both policy-managed and administrator-managed databases.

  • Policy-managed database: When you define services for a policy-managed database, you assign the service to a server pool where the database is running. You can define the service as either uniform (running on all instances in the server pool) or singleton (running on only one instance in the server pool).
  • Administrator-managed database: When you define a service for an administrator-managed database, you define which instances normally support that service. These are known as the PREFERRED instances. You can also define other instances to support a service if the preferred instance fails. These are known as AVAILABLE instances.

About Service Failover in Administrator-Managed Databases

When you specify a preferred instance for a service, the service runs on that instance during normal operation. Oracle Clusterware attempts to ensure that the service always runs on all the preferred instances that have been configured for a service. If the instance fails, then the service is relocated to an available instance. You can also manually relocate the service to an available instance.

About Service Failover in Policy-Managed Databases

When you specify that a service is UNIFORM, Oracle Clusterware attempts to ensure that the service always runs on all the available instances for the specified server pool. If the instance fails, then the service is no longer available on that instance. If the cardinality of the server pool increases and an instance is added to the database, then the service is started on the new instance. You cannot manually relocate the service to a specific instance.

When you specify that a service is SINGLETON, Oracle Clusterware attempts to ensure that the service always runs on only one of the available instances for the specified server pool. If the instance fails, then the service fails over to a different instance in the server pool. You cannot specify which instance in the server pool the service should run on.

For SINGLETON services, if a service fails over to a new instance, then the service is not moved back to its original instance when that instance becomes available again.

Summarizing the use of services

If your database is administrator-managed, we can create a service and define where the backup will be executed, and how many nodes it can use, through the preferred and available instances.

If your database is policy-managed, we cannot define where the backup will be executed, but we can configure a SINGLETON service, which ensures the backup runs on only one node; if that node fails, the service is moved to another available node, but we cannot choose on which node the backup will be performed.

Note:

For connections to the target and auxiliary databases, the following rules apply:

Starting with 10gR2, these connections can use a connect string that does not bind to any particular instance. This means you can use load balancing.

Once a connection is established, however, it must be a dedicated connection  that cannot migrate to any other process or instance. This means that you still can’t use MTS or TAF.

Example creating service for Administrator-Managed Database

The backup will be executed on db11g2 and db11g3, but can be executed on db11g1 if db11g2 and db11g3 fail.

Set ORACLE_HOME to the same one used by the database.

$ srvctl add service -d db_unique_name -s service_name -r preferred_list [-a available_list] [-P TAF_policy]

$ srvctl add service -d db11g -s srv_rman -r db11g2,db11g3 -a db11g1 -P NONE -j LONG

$ srvctl start service -d db11g -s srv_rman

Example creating service for Policy-Managed Database

Using a SINGLETON service, the backup will be executed on the node where the service was started/assigned. The service is moved to another host only if that node fails.

Set ORACLE_HOME to the same one used by the database.

$ srvctl config database -d db11g |grep "Server pools"

Server pools: racdb11gsp

$ srvctl add service -d db11g -s srv_rman -g racdb11gsp -c SINGLETON

$ srvctl start service -d db11g -s srv_rman

If you have more than 2 nodes in the cluster (with a policy-managed database) and you want to use only 2 or more of them to perform the backup, you can use the option below.

Configure a UNIFORM service (the service will be available on all nodes); you can control how many instances will be used to perform the backup, but you cannot choose which nodes they will be. In fact the service itself does not control anything: you set the RMAN PARALLELISM equal to the number of nodes you want to use, as shown in the sketch below.

Ex.: I have 4 nodes but want the backup to start on 2 of them, so I choose parallelism 2. Remember that Oracle can start 2 channels on the same host; this depends on the workload of each node.
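
A sketch of that setup, reusing the hypothetical pool, database and service names from the examples above:

$ srvctl add service -d db11g -s srv_rman -g racdb11gsp -c UNIFORM

$ srvctl start service -d db11g -s srv_rman

RMAN> CONFIGURE DEVICE TYPE 'SBT_TAPE' PARALLELISM 2;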

With a policy-managed database, the idea is that you should not care on which node each instance is running: you have a pool with many nodes and Oracle manages all the resources inside that pool. For this reason it is not possible to control where you place a heavier load.

This will only work if you are performing an online backup or using parallel backup.

Configuring RMAN to Automatically Backup the Control File and SPFILE

If you set CONFIGURE CONTROLFILE AUTOBACKUP to ON, then RMAN automatically creates a control file and an SPFILE backup after you run the BACKUP or COPY commands. RMAN can also automatically restore an SPFILE, if this is required to start an instance to perform recovery, because the default location for the SPFILE must be available to all nodes in your Oracle RAC database.

These features are important in disaster recovery because RMAN can restore the control file even without a recovery catalog. RMAN can restore an autobackup of the control file even after the loss of both the recovery catalog and the current control file. You can change the default name that RMAN gives to this file with the CONFIGURE CONTROLFILE AUTOBACKUP FORMAT command. Note that if you specify an absolute path name in this command, then this path must exist identically on all nodes that participate in backups.
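
For example (the format below is only an illustration; in RAC it must point to a location visible to all nodes and must contain the %F substitution variable):

RMAN> CONFIGURE CONTROLFILE AUTOBACKUP ON;
RMAN> CONFIGURE CONTROLFILE AUTOBACKUP FORMAT FOR DEVICE TYPE DISK TO '+FRA/%F';
RMAN> SHOW CONTROLFILE AUTOBACKUP;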

RMAN performs the control file autobackup on the first allocated channel. Therefore, when you allocate multiple channels with different parameters, especially when you allocate a channel with the CONNECT command, determine which channel will perform the control file autobackup. Always allocate the channel for this node first.

Enjoy..


Oracle Database Failover Active/Passive on Unix/Linux Platforms – Free of Charge

In this post I will show you how to set up a high-availability environment without the Oracle RAC option.

Oracle Fail Safe is available only for Windows; on Unix/Linux you would need third-party cluster software to do the failover.

Good News From Oracle:

Oracle Clusterware

Oracle Clusterware provides cluster membership and high availability services. It provides the cluster membership for features such as Oracle Real Application Clusters and Oracle ASM. It includes the following features:

  • Application monitoring, restart, and failover
  • Cluster membership services
  • Server monitoring and fencing
  • Single Client Access Name (SCAN)
  • Server Pools
  • Grid Naming Services

Oracle Clusterware can be used to protect any application (restarting or failing over the application in the event of a failure), free of charge, if one or more of the following conditions are met:

  • The server OS is supported by a valid Oracle Unbreakable Linux support contract.
  • The product to be protected is either:
    • Any Oracle product (e.g. Oracle Applications, Siebel, Hyperion, Oracle Database EE, Oracle Database XE)
    • Any third-party product that directly or indirectly stores data in an Oracle database
  • At least one of the servers in the cluster is licensed for Oracle Database (SE or EE)

A cluster is defined to include all the machines that share the same Oracle Cluster Registry (OCR) and Voting Disk.

http://download.oracle.com/docs/cd/E11882_01/license.112/e10594/editions.htm

See the step-by-step guide below using Clusterware 11.1; we can adapt this setup to 11.2 using the SCAN feature, which makes it even easier.

http://www.oracle.com/technetwork/products/clusterware/overview/si-db-failover-11g-134623.pdf

If the link above is down, click here


Local/SCAN Listener – Enhancing Security (Oracle Security Alert)

Recently we discovered a possible vulnerability in the SCAN listener, so we opened an SR and Oracle gave us a solution.

I recommend that everyone apply this security fix. As far as I know only availability can be affected; there is no concern about data integrity.

Thread: How prevent REMOTE LISTENER register on SCAN LISTENER
https://forums.oracle.com/forums/thread.jspa?threadID=2369472

Oracle Security Alert for CVE-2012-1675

This security alert addresses the security issue CVE-2012-1675, a vulnerability in the TNS listener which has been recently disclosed as “TNS Listener Poison Attack” affecting the Oracle Database Server. This vulnerability may be remotely exploitable without authentication, i.e. it may be exploited over a network without the need for a username and password. A remote user can exploit this vulnerability to impact the confidentiality, integrity and availability of systems that do not have recommended solution applied.

Affected Products and Versions
Oracle Database 11g Release 2, versions 11.2.0.2, 11.2.0.3
Oracle Database 11g Release 1, version 11.1.0.7
Oracle Database 10g Release 2, versions 10.2.0.3, 10.2.0.4, 10.2.0.5

Solution

Recommendations for protecting against this vulnerability can be found at:

Please note that Oracle has added Oracle Advanced Security SSL/TLS to the Oracle Database Standard Edition license when used with the Real Application Clusters and Oracle has added Oracle Advanced Security SSL/TLS to the Enterprise Edition Real Application Clusters (Oracle RAC) and RAC One Node options so that the directions provided in the Support Notes referenced above can be applied by all Oracle customers without additional cost.

Note: Please refer to the Oracle licensing documentation available on Oracle.com regarding licensing changes that allow Oracle Advanced Security SSL/TLS to be used with Oracle SE Oracle Real Application Clusters and Oracle Enterprise Edition Real Application Customers (Oracle RAC) and Oracle RAC OneNode Options.

Due to the threat posed by a successful attack, Oracle strongly recommends that customers apply this Security Alert solution as soon as possible.

http://www.oracle.com/technetwork/topics/security/alert-cve-2012-1675-1608180.html



How to quickly recover data lost due to user errors (without RMAN)

To a DBA, time is money. In cases of downtime, more time stopped means less money; less time stopped means more money.

Many times we make mistakes because we do not know all the resources available in situations that require quick action.

When someone calls you and says that somebody mistakenly deleted data, usually the first thought is that we need to restore the data from a backup.

Here comes an important issue: how to restore the data in a short time, causing less impact and less downtime.

Below is the request of a colleague who was in this situation:
Version: 10g R2
One of our DBA colleagues accidentally updated all the records of a table to a particular value without a WHERE clause. I need to revert it back to its previous state (as of 11 am this morning).
Since this is a 3 TB schema, we don't take logical backups of it. The flashback feature is not enabled either.
I have a LEVEL 0 backup from Monday morning and a LEVEL 1 backup from Tuesday (today) morning. How can I restore the table to its previous state?

We have many options, but which one you decide to use makes all the difference.

The first feature I will try is FLASHBACK QUERY. Although this feature has existed since version 9i, unfortunately it is still a novelty for many people.

Let’s test:

$ sqlplus user_test@db11g

SQL*Plus: Release 11.2.0.2.0 Production on Fri Feb 3 14:32:56 2012
Copyright (c) 1982, 2009, Oracle.  All rights reserved.
Enter password:

Connected to:
Oracle Database 11g Release 11.2.0.1.0 - Production
With the Automatic Storage Management option

SQL> select flashback_on from v$database;

FLASHBACK_ON
------------------
NO

### CREATING TABLE############
SQL>
CREATE TABLE USER_ERROR AS SELECT * FROM DBA_OBJECTS;
Table created.
##############################

### Populating Data ##########
INSERT INTO USER_ERROR SELECT * FROM USER_ERROR;
COMMIT;
76416 rows inserted
commited

INSERT INTO USER_ERROR SELECT * FROM USER_ERROR;
COMMIT;
152832 rows inserted
commited

INSERT INTO USER_ERROR SELECT * FROM USER_ERROR;
COMMIT;
305664 rows inserted
commited

INSERT INTO USER_ERROR SELECT * FROM USER_ERROR;
COMMIT;
611328 rows inserted
commited
##############################

### Checking Data
SELECT COUNT(1) FROM USER_ERROR;

COUNT(1)
----------------------
1222656

SQL> select count(object_type) qtd,object_type
from user_error
group by object_type
order by 1;

       QTD OBJECT_TYPE
---------- -------------------
        16 LOB PARTITION
        16 EDITION
        16 MATERIALIZED VIEW
        16 RULE
        32 DESTINATION
        48 JAVA SOURCE
        48 SCHEDULE
        64 SCHEDULER GROUP
       112 CONTEXT
       144 INDEXTYPE
       144 WINDOW
       144 UNDEFINED
       160 RESOURCE PLAN
       160 CLUSTER
       208 EVALUATION CONTEXT
       208 JOB CLASS
       256 DIRECTORY
       304 RULE SET
       304 PROGRAM
       400 CONSUMER GROUP
       448 JOB
       576 QUEUE
       816 XML SCHEMA
       880 OPERATOR
      2928 LIBRARY
      3552 PROCEDURE
      3824 TYPE BODY
      3984 SEQUENCE
      4448 TABLE PARTITION
      4864 INDEX PARTITION
      4880 JAVA DATA
      5392 FUNCTION
     13344 JAVA RESOURCE
     17168 TRIGGER
     17296 LOB
     23680 PACKAGE BODY
     24752 PACKAGE
     44624 TYPE
     58752 TABLE
     84384 INDEX
     86384 VIEW
    366768 JAVA CLASS
    446112 SYNONYM

43 rows selected.

SQL> SELECT CURRENT_TIMESTAMP FROM DUAL;

CURRENT_TIMESTAMP
---------------------------------------------------------------------------
03-FEB-12 02.38.08.300885 PM -02:00

### UPDATE WITHOUT WHERE CLAUSE

SQL> UPDATE USER_ERROR SET OBJECT_TYPE='UNKNOWN';

1222656 rows updated.

SQL> COMMIT;

Commit complete.

SQL> select count(object_type) qtd,object_type
from user_error
group by object_type
order by 1;

       QTD OBJECT_TYPE
---------- -------------------
   1222656 UNKNOWN

###### RECOVERING DATA QUICKLY #####

SQL>  CREATE TABLE USER_ERROR_RECOVERED
      AS
      SELECT * FROM USER_ERROR
      AS OF TIMESTAMP TO_TIMESTAMP('03-02-2012 14:38:08','DD-MM-YYYY HH24:MI:SS');

Table created.

SQL> SELECT COUNT(1) FROM USER_ERROR_RECOVERED;

  COUNT(1)
----------
   1222656

SQL> select count(object_type) qtd,object_type
from user_error_recovered
group by object_type
order by 1;
       QTD OBJECT_TYPE
---------- -------------------
        16 RULE
        16 LOB PARTITION
        16 MATERIALIZED VIEW
        16 EDITION
        32 DESTINATION
        48 JAVA SOURCE
        48 SCHEDULE
        64 SCHEDULER GROUP
       112 CONTEXT
       144 UNDEFINED
       144 WINDOW
       144 INDEXTYPE
       160 CLUSTER
       160 RESOURCE PLAN
       208 JOB CLASS
       208 EVALUATION CONTEXT
       256 DIRECTORY
       304 PROGRAM
       304 RULE SET
       400 CONSUMER GROUP
       448 JOB
       576 QUEUE
       816 XML SCHEMA
       880 OPERATOR
      2928 LIBRARY
      3552 PROCEDURE
      3824 TYPE BODY
      3984 SEQUENCE
      4448 TABLE PARTITION
      4864 INDEX PARTITION
      4880 JAVA DATA
      5392 FUNCTION
     13344 JAVA RESOURCE
     17168 TRIGGER
     17296 LOB
     23680 PACKAGE BODY
     24752 PACKAGE
     44624 TYPE
     58752 TABLE
     84384 INDEX
     86384 VIEW
    366768 JAVA CLASS
    446112 SYNONYM

43 rows selected.

### DROPPING TABLE #######

SQL> DROP TABLE USER_ERROR;

Table dropped.

SQL> SELECT COUNT(1) FROM USER_ERROR;
SELECT COUNT(1) FROM USER_ERROR
                     *
ERROR at line 1:
ORA-00942: table or view does not exist

SQL> FLASHBACK TABLE USER_ERROR TO BEFORE DROP;

Flashback complete.

SQL>  SELECT COUNT(1) FROM USER_ERROR;

  COUNT(1)
----------
   1222656

SQL>

In the example above I was able to restore the data quickly, saving work and downtime.

For this procedure to succeed, the data must still be in the UNDO tablespace, so the faster you identify the error and try to fix it, the higher the probability of success.

Oracle uses the term “flashback” to cover very different things; I don't like this because it confuses which feature we can actually use.

To use Flashback Table or Flashback Query you don't need the “flashback feature” (Flashback Database) enabled, because flashback query is based on undo information, unlike Flashback Database, which needs flashback logs plus redo logs.
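
A quick way to estimate how far back a flashback query can reach is to check the undo retention the instance is currently sustaining (a rough indicator only; visibility also depends on the pressure on the undo tablespace):

SQL> SHOW PARAMETER undo_retention

SQL> -- the first row of V$UNDOSTAT is the current 10-minute bucket;
SQL> -- TUNED_UNDORETENTION is the retention (in seconds) currently being honoured
SQL> SELECT begin_time, end_time, tuned_undoretention
       FROM v$undostat
      WHERE rownum = 1;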

In which Oracle editions can I use Flashback Query?
Flashback Query is available in all Oracle editions at no additional cost.
http://www.oracle.com/us/products/database/product-editions-066501.html

In which Oracle editions can I use Flashback Table?
Flashback Table (to a timestamp or SCN) is licensed only in Oracle Enterprise Edition; the FLASHBACK TABLE ... TO BEFORE DROP form used above relies on the recycle bin and also works in Standard Edition.
http://www.oracle.com/us/products/database/product-editions-066501.html

Enjoy…



RACcheck – RAC Configuration Audit Tool

RACcheck is a tool developed by the RAC Assurance development team for use by customers to automate the assessment of RAC systems for known configuration problems and best practices.

RACcheck is a RAC Configuration Audit tool  designed to audit various important configuration settings within a Real Application Clusters (RAC), Oracle Clusterware (CRS), Automatic Storage Management (ASM) and Grid Infrastructure environment. The tool audits configuration settings within the following categories:

  1. OS kernel parameters
  2. OS packages
  3. Many other OS configuration settings important to RAC.
  4. CRS/Grid Infrastructure
  5. RDBMS
  6. ASM
  7. Database parameters
  8. Many other database configuration settings important to RAC.

Features
1. RACcheck is NON-INTRUSIVE and does not change anything in the environment, except as detailed below:

– SSH user equivalence for the RDBMS software owner is assumed to be configured among all the database servers being audited in order for it to execute commands on the remote database server nodes. If the tool determines that this user equivalence is not established it will offer to set it up either temporarily or permanently at the option of the user. If the user chooses to set up SSH user equivalence temporarily then the script will do so for the duration of the execution of the tool but then it will return the system to the state in which it found SSH user equivalence originally. For those wishing to configure SSH user equivalence outside the tool (if not already configured), consult My Oracle Support Note: 372795.1.

– RACcheck creates a number of small output files into which the data necessary to perform the assessment is collected

– RACcheck creates and executes some scripts dynamically in order to accomplish some of the data collection

– RACcheck cleans up after itself any temporary files that are created and not needed as part of the collection.

2. RACcheck interrogates the system to determine the status of the Oracle stack components (ie., Grid Infrastructure, RDBMS, RAC, etc) and whether they are installed and/or running. Depending upon the status of each component, the tool runs the appropriate collections and audit checks. If due to local environmental configuration the tool is unable to properly determine the needed environmental information please refer to the TROUBLESHOOTING section.

3. Watchdog daemon – RACcheck automatically runs a daemon in the background to monitor command execution progress. If, for any reason, one of the commands run by the tool should hang or take longer than anticipated, the monitor daemon kills the hung command after a configurable timeout so that main tool execution can progress. If that happens then the collection or command that was hung is skipped and a notation is made in the log. If the default timeout is too short please see the TROUBLESHOOTING section regarding adjustment of the RAT_TIMEOUT, and RAT_ROOT_TIMEOUT parameters.

4. If RACcheck’s driver files are older than 90 days, the driver files are considered to be “stale” and the script will notify the user of a stale driver file. A new version of the tool and its driver files (kit) must be obtained from MOS Note 1268927.1.

5. When the RACcheck completes the collection and analysis it produces two reports, summary and detailed. An output .zip file is also produced by RACcheck. This output .zip file can be provided to Oracle Support for further analysis if an SR needs to be logged. The detailed report will contain Benefit/Impact, Risk and Action/Repair information. In many cases it will also reference publicly available documents with additional information about the problem and how to resolve it.

6. The results of the audit checks can be optionally uploaded into database tables for reporting purposes. See below for more details on this subject.

7. In some cases customers may want to stage RACcheck on a shared filesystem so that it can be accessed from various systems but be maintained in a single location rather than being copied to each cluster on which it may be used. The default behavior of the tool is to create a subdirectory and its output files in the location where the tool is staged. If that staging area is a read only filesystem or if the user for any reason would like the output to be created elsewhere then there is an environment variable which can be used for that purpose. The RAT_OUTPUT parameter can be set to any valid writable location and the output will be created there.

Applies to:
Oracle Server – Enterprise Edition – Version: 10.2.0.1 to 11.2.0.2 – Release: 10.2 to 11.2

  • Linux x86
  • IBM AIX on POWER Systems (64-bit)
  • Oracle Solaris on SPARC (64-bit)
  • Linux x86-64

To download the RACcheck tool, use this note on MOS:
RACcheck – RAC Configuration Audit Tool [ID 1268927.1]

Example of report output:

raccheck Report

Enjoy