Wednesday, April 05, 2006

How to check for and repair block corruption with RMAN in Oracle 9i and Oracle 10g

Problem: the application encounters an ORA-01578 runtime error because there are one or more corrupt blocks in a table it is reading.

How can corrupt blocks be caused?
First of all we have two diffent kinds of block corruption:
- physical corruption (media corrupt)
- logical corruption (soft corrupt)
Physical corruption can be caused by defected memory boards, controllers or broken sectors on a hard disk;
Logical corrution can amoung other reasons be caused by an attempt to recover through a NOLOGGING action.
There are two initialization parameters for dealing with block corruption:
- DB_BOCK_CHECKSUM (calculates a checksum for each block before it is written to disk, every time)
causes 1-2% performance overhead
- DB_BLOCK_CHECKING (serverprocess checks block for internal consistency after every DML)
causes 1-10% performance overhead
If performance is not a big issue then you should use these!

Normally RMAN checks only for physically corrupt blocks
with every backup it takes and every image copy it makes.
This is a common misunderstanding amoung a lot of DBAs.
RMAN doesn not automatically detect logical corruption by default!
We have to tell it to do so by using
CHECK LOGICAL!
The info about corruptions can be found in the following views:

SYS @ orcl AS SYSDBA SQL > select * from v$backup_corruption;

RECID STAMP SET_STAMP SET_COUNT PIECE# FILE# BLOCK#
---------- ---------- ---------- ---------- ---------- ---------- ----------
BLOCKS CORRUPTION_CHANGE# MAR CORRUPTIO
---------- ------------------ --- ---------
1 586945441 586945402 3 1 5 81
4 0 YES CORRUPT

-- SYS @ orcl AS SYSDBA SQL > select * from v$copy_corruption;

Here is a case study:

HR @ orcl SQL > select last_name, salary
2 from employees;

ERROR at line 2:

ORA-01578: ORACLE data block corrupted (file # 5, block # 83)
# this could be an ORA-26040 in Oracle 8i! and before
ORA-01110: data file 5: '/u01/app/oracle/oradata/orcl/
example01.dbf'


This is what you find in the alert_.log:

Wed Apr 5 08:17:40 2006
Hex dump of (file 5, block 83) in trace file
/u01/app/oracle/admin/orcl/udump/orcl_ora_14669.trc
Corrupt block relative dba: 0x01400053 (file 5, block 83)
Bad header found during buffer read
Data in bad block:
type: 67 format: 7 rdba: 0x0a545055
last change scn: 0x0000.0006d162 seq: 0x1 flg: 0x04
spare1: 0x52 spare2: 0x52 spare3: 0x0
consistency value in tail: 0xd1622301
check value in block header: 0x63be
computed block checksum: 0xe420
Reread of rdba: 0x01400053 (file 5, block 83)
found same corrupted data

Wed Apr 5 08:17:41 2006
Corrupt Block Found
TSN = 6, TSNAME = EXAMPLE
RFN = 5, BLK = 83, RDBA = 20971603
OBJN = 51857, OBJD = 51255, OBJECT = , SUBOBJECT =
SEGMENT OWNER = , SEGMENT TYPE =

Starting with Oracle 9i we can use RMAN
to check a database for both physically and logically corrupt blocks.

Here is the syntax:
RMAN> backup validate check logical database;
Starting backup at 05-04-2006:08:23:20
allocated channel: ORA_DISK_1
channel ORA_DISK_1: sid=136 devtype=DISK
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
input datafile fno=00001 name=/u01/app/oracle/oradata/orcl/
system01.dbf
input datafile fno=00003 name=/u01/app/oracle/oradata/orcl/
sysaux01.dbf
input datafile fno=00005 name=/u01/app/oracle/oradata/orcl/
example01.dbf
input datafile fno=00002 name=/u01/app/oracle/oradata/orcl/
undotbs01.dbf
input datafile fno=00004 name=/u01/app/oracle/oradata/orcl/
users01.dbf
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:45
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
channel ORA_DISK_1: starting full datafile backupset
channel ORA_DISK_1: specifying datafile(s) in backupset
including current control file in backupset
including current SPFILE in backupset
channel ORA_DISK_1: backup set complete, elapsed time: 00:00:03
Finished backup at 05-04-2006:08:24:10

RMAN does not physically backup the database with this command
but it reads all blocks and checks for corruptions.

If it finds corrupted blocks it will place the information about the corruption into a view:

SYS @ orcl AS SYSDBA SQL > select * from v$database_block_corruption;
FILE# BLOCK# BLOCKS CORRUPTION_CHANGE# CORRUPTIO
---------- ---------- ---------- ------------------ ---------
5 81 4 0 CORRUPT

this is what we find in the alert_.log:
Corrupt block relative dba: 0x014000b1 (file 5, block 177)
Bad header found during backing up datafile
Data in bad block:
type: 67 format: 7 rdba: 0x0a545055
last change scn: 0x0000.0007bc77 seq: 0x3 flg: 0x04
spare1: 0x52 spare2: 0x52 spare3: 0x0
consistency value in tail: 0xbc772003
check value in block header: 0xb32
computed block checksum: 0xe4c1
Reread of blocknum=177, file=/u01/app/oracle/oradata/orcl/
example01.dbf.
found same corrupt data


Now we can tell RMAN to recover all the blocks
which it has found as being corrupt:

RMAN> blockrecover corruption list;
# (all blocks from v$database_block_corruption)

Starting blockrecover at 05-04-2006:10:09:15
using channel ORA_DISK_1
channel ORA_DISK_1: restoring block(s) from datafile copy /u01/app/
oracle/flash_recovery_area/ORCL/datafile/o1_mf_example_236tmb1c_.dbf

starting media recovery
archive log thread 1 sequence 2 is already on disk as file /u01/app/oracle/
flash_recovery_area/ORCL/archivelog/2006_04_05/o1_mf_1_2_236wxbsp_.arc

archive log thread 1 sequence 1 is already on disk as file
/u01/app/oracle/oradata/
orcl/redo01.log

media recovery complete, elapsed time: 00:00:01
Finished blockrecover at 05-04-2006:10:09:24


this is in the alert_.log:
Starting block media recovery
Wed Apr 5 10:09:22 2006
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/
archivelog/2006_04_05/o1_mf_1_2_%u_.arc

Wed Apr 5 10:09:23 2006
Media Recovery Log /u01/app/oracle/flash_recovery_area/ORCL/
archivelog/2006_04_05/o1_mf_1_2_236wxbsp_.arc ( restored)

Wed Apr 5 10:09:23 2006
Recovery of Online Redo Log: Thread 1 Group 1 Seq 1 Reading mem 0
Mem# 0 errs 0: /u01/app/oracle/oradata/orcl/redo01.log
Wed Apr 5 10:09:23 2006
Completed block media recovery


I recommend you to check your database for corrupt blocks
with RMAN on a regular basis, proactively.
If you do so you RMAN finds out about block corruptions
before your application runs into an ORA-01578 and
before you find out that you have backed up the corrupt blocks again and again.

There have been incidents when DBAs found out
that they did not have a backup with the un-corruted block any more,
because you have deleted the last one with a not corrupted version.
They could not recover the block any more!

For more detailed info about recovering corrupt blocks
(without and with RMAN, releases 7-10g)
pls see metalink also:
Subject: Handling Oracle Block Corruptions in Oracle7/8/8i/9i/10g
Doc ID: Note:28814.1 Type: BULLETIN
Last Revision Date: 26-MAR-2006 Status: PUBLISHED

4 Comments:

At Tue May 30, 02:02:00 PM, Anonymous Anonymous said...

Nice article Lutz. You mentioned, v$database_block_corruption; restore validate doesn't populate this view...right?

BTW...do you know where one can get one of those Oracle 10g DBA Views posters you have in your picture?

Thanks.

 
At Tue Jun 13, 01:56:00 PM, Anonymous Anonymous said...

What do we do if the corruption occurs in a temp file. I dropped and re-created the temp tablespace, but the corruption still persists. We are using RAW devices for the temp files.

 
At Wed Aug 23, 04:00:00 AM, Blogger roobaron said...

Thanks for the tip about validate and check logical, the example in the doco is as good.

 
At Wed Nov 08, 08:12:00 AM, Anonymous Anonymous said...

Thanks,

Its very much beneficial for evry dba's.

Pratheep

 

Post a Comment

<< Home