Understanding Structure of the Controlfile

The first block of the controlfile is a header block that records just the controlfile block size and the number of blocks in the controlfile. The controlfile block size is the same as the database block size. When mounting a database, Oracle checks that the controlfile block size and the file size recorded in the controlfile header block match the db_block_size parameter and the file size reported by the operating system (if available). Otherwise an error is raised to indicate that the controlfile might have been corrupted or truncated.

After the header block, all controlfile blocks occur in pairs. Each logical block is represented by two physical blocks. This is necessary for the controlfile transaction mechanism.

It is theoretically possible that a hot backup of a controlfile could contain a split block. Therefore all controlfile blocks other than the file header have a cache header and tail that can be compared when mounting a database and whenever a controlfile block is read. The block type is 0 for virgin controlfile blocks and 21 otherwise. The physical controlfile block number is used in place of an RDBA in the cache header, and a controlfile sequence number is used in place of an SCN to record when the block was last changed. An ORA-00227 error is returned if the header and tail do not match, or if the block checksum does not match the checksum recorded in the cache header (if any).

The controlfile contains several different types of records, each in its own record section of one or more logical blocks. Records may span block boundaries within their section. The fixed view V$CONTROLFILE_RECORD_SECTION lists the types of records stored in each record section, together with the size of the record type, and the number of record slots available and in use within that section. The underlying X$KCCRS structure includes the starting logical block number (RSLBN) for each section.

Controlfile Transactions

Sessions must hold an exclusive lock on the CF enqueue for the duration of controlfile transactions. This prevents concurrent controlfile transactions, and in-flux controlfile reads, because a shared lock on the CF enqueue is needed for controlfile reads. However, there is also a need for recoverability should a process, instance or system failure occur during a controlfile transaction.

For the first record section of the controlfile, the database information entry section, this requirement is trivial, because the database information entry only takes about 210 bytes and is therefore guaranteed to always fit into a single controlfile block that can be written atomically. Therefore changes to the database entry can be implicitly committed as they are written, without any recoverability concerns.

Recoverability for changes to the other controlfile records sections is provided by maintaining all the information in duplicate. Each logical block is represented by two physical blocks. One contains the current information, and the other contains either an old copy of the information, or a pending version that is yet to be committed. To keep track of which physical copy of each logical block contains the current information, Oracle maintains a block version bitmap with the database information entry in the first record section of the controlfile.

To read information from the controlfile, a session must first read the block version bitmap to determine which physical block to read. Then if a change must be made to the logical block, the change is first written to the alternate physical block for that logical block, and then committed by atomically rewriting the block containing the block version bitmap with the bit representing that logical block flipped. When changes need to be made to multiple records in the same controlfile block, such as when updating the checkpoint SCN in all online datafiles, those changes are buffered and then written together. Note that each controlfile transaction requires at least 4 serial I/O operations against the controlfile, and possibly more if multiple blocks are affected, or if the controlfile is multiplexed and asynchronous I/O is not available. So controlfile transactions are potentially expensive in terms of I/O latency.

Whenever a controlfile transaction is committed, the controlfile sequence number is incremented. This number is recorded with the block version bitmap and database information entry in the first record section of the controlfile. It is used in the cache header of each controlfile block in place of an SCN to detect possible split blocks from hot backups. It is also used in queries that perform multiple controlfile reads to ensure that a consistent snapshot of the controlfile has been seen. If not, an ORA-00235 error is returned.

The controlfile transaction mechanism is not used for updates to the checkpoint heartbeat. Instead the size of the checkpoint progress record is overstated as half of the available space in a controlfile block, so that one physical block is allocated to the checkpoint progress record section per thread. Then, instead of using pairs of physical blocks to represent each logical block, each checkpoint progress record is maintained in its own physical block so that checkpoint heartbeat writes can be performed and committed atomically without affecting any other data.


Controlfile Resizing

The slots in some control file record sections can be reused circularly. The most obvious examples are the log history, archived logs and offline ranges, but the various backup related record types are also cyclically reusable.

The control_file_record_keep_time parameter sets the minimum number of days that must have elapsed before a reusable controlfile record slot can be reused. The default is 7 days. If all the slots in a record section are in use and that number of days has not yet elapsed since the timestamp on the earliest entry, then Oracle will dynamically expand the record section (and thus the controlfile too) to make more slots available, up to a maximum of 65535 slots per section, or the controlfile size limit. (The controlfile size limit is based on the number of blocks that can be represented in the block version bitmap, and is thus most unlikely to be reached.) Informational “kccrsz” messages about the dynamic expansion of the controlfile (or the failure to do so) may be seen in the alert.log.

There are V$ views for each reusable controlfile record section, each with a timestamp column. These views can be used to estimate the number of controlfile slots that would be required in each record section for a particular keep time setting. The control_file_record_keep_time parameter can also be set to zero to prevent keep time related controlfile expansion. However, dynamic controlfile expansion may nevertheless be required for the non-reusable record sections. For example, the controlfile may grow when creating a tablespace if either of the datafile or tablespace record sections are already full.

Controlfile resizing happens in a special controlfile transaction under the protection of the CF enqueue. An instance or system failure during the resizing of the controlfile can result in its corruption. Therefore controlfile backups are, as always, important.


Controlfile Dumps

The contents of the current controlfile can be dumped in text form to a process trace file in the user_dump_dest directory using the CONTROLF dump. The levels for this dump are as follows.

Dump Level Dump Contains
1 only the file header
2 just the file header, the database info record, and checkpoint progress records
3 all record types, but just the earliest and latest records for circular reuse record types
4 as above, but includes the 4 most recent records for circular reuse record types
5+ as above, but the number of circular reuse records included doubles with each level

For example, the following syntax could be used to get a text dump on the controlfile in the trace file of the current process showing all the controlfile record types but only the oldest and most recent of the circular reuse records.

oradebug setmypid
oradebug dump controlf 3

Of course, the session must be connected AS SYSDBA to use the ORADEBUG facility. However, any session with the ALTER SESSION privilege can use the following event syntax to take the same dump.


alter session set events 'immediate trace name controlf level 3';

Wait event: refresh controlfile command

The refresh controlfile command wait event is not important for tuning, but can be helpful in problem diagnosis. It occurs when mounting a database, and when a controlfile transaction changes the range of used record numbers in any controlfile record section. This information is cached in the SGA and must be updated in all instances whenever it is changed by a controlfile transaction. Accordingly, a refresh controlfile command is sent to the CKPT process in each instance causing it to refresh the cached controlfile information in the SGA from the physical controlfile. While this is being done, the requesting process waits on the refresh controlfile command wait event.

The inter-process communication message that is used can be seen in X$MESSAGES as follows.

SQL> select dest, description from x$messages where description like 'refresh %';

DEST DESCRIPTION
---- ----------------------------------------
CKPT refresh control file

This doc was written by Xuji

Advertisements

3 Comments on “Understanding Structure of the Controlfile”

  1. Amit says:

    Good explaination there but would you not suggest rebuilding the CF if records_total and records_used columns are having same values? i.e. everytime the section is being used or a new entry is entred by the system in CF, it has to expand which in turn going to lock CF for expenssion and may result in performance issue. Specially “archive log” and “log history” sections?

    Like


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s