ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [639852], [6110] 故障处理

一、问题背景

2015年1月16日早晨8点03分,数据库后台报ORA-00600[kddummy_blkchk]错误,系统显示SMON进程在恢复的死事务的过程中遇到了坏块,在多次尝试无效后,最终SMON出现致命错误导致系统宕机。

Fri Jan 16 08:03:19 2015
Errors in file /u01/oracle/admin/actdb/udump/actdb_ora_10355644.trc:
ORA-00600: [kddummy_blkchk], [118], [639865], [6110], [], [], [], []
Fri Jan 16 08:03:19 2015
Corrupt Block Found
         TSN = 8, TSNAME = ACTUARY_DATA
         RFN = 118, BLK = 639865, RDBA = 495567737
         OBJN = 1087274, OBJD = 1097591, OBJECT = TBL_TRAD_QID, SUBOBJECT = 
         SEGMENT OWNER = ACTUARY, SEGMENT TYPE = Table Segment

Fri Jan 16 08:34:25 2015
Errors in file /u01/oracle/admin/actdb/bdump/actdb_p006_65077728.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [639852], [6110], [], [], [], []
Fri Jan 16 08:34:27 2015
Doing block recovery for file 118 block 639852
Block recovery from logseq 90945, block 1260180 to scn 9106774485353
Fri Jan 16 08:35:03 2015
Errors in file /u01/oracle/admin/actdb/bdump/actdb_smon_45548262.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [639852], [6110], [], [], [], []
Fri Jan 16 08:35:04 2015
Errors in file /u01/oracle/admin/actdb/bdump/actdb_pmon_51053242.trc:
ORA-00474: SMON process terminated with error

二、问题处理

出现该问题,首先可以看到SMON尝试对文件118的639852块做恢复。而在恢复的过程中会因为发现坏块,会导致宕库。所以建议首先设置10513事件去阻止SMON对死事务的恢复。

Fri Jan 16 09:37:45 2015
ALTER SYSTEM SET event='10513 trace name context forever,level 2' SCOPE=SPFILE;

设置完事件后重启数据库,发现SMON已经不在尝试去做恢复了,但是后台进程依然会报错。这里主要是检测到报检测的坏块的错误。

Fri Jan 16 09:42:28 2015
Corrupt Block Found
         TSN = 8, TSNAME = ACTUARY_DATA
         RFN = 118, BLK = 639852, RDBA = 495567724
         OBJN = 1087274, OBJD = 1097591, OBJECT = TBL_TRAD_QID, SUBOBJECT = 
         SEGMENT OWNER = ACTUARY, SEGMENT TYPE = Table Segment

于是考虑对该对象进行重建。在执行CTAS对表TBL_TRAD_QID进行重建的过程中无法完成,报如下错误:

Fri Jan 16 09:50:32 2015
Errors in file /u01/oracle/admin/actdb/udump/actdb_ora_42664692.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [639852], [6110], [], [], [], []
Fri Jan 16 09:54:39 2015
Errors in file /u01/oracle/admin/actdb/udump/actdb_ora_34472916.trc:
ORA-00600:  [kddummy_blkchk], [118], [639852], [6110], [], [], [], []
Fri Jan 16 09:54:45 2015
Corrupt Block Found
         TSN = 8, TSNAME = ACTUARY_DATA
         RFN = 118, BLK = 639852, RDBA = 495567724
         OBJN = 0, OBJD = 1097591, OBJECT = , SUBOBJECT = 
         SEGMENT OWNER = , SEGMENT TYPE = Invalid Type

此时可以看到错误发生在用户进程上面,而不是发生在SMON进程上,因此不会导致数据库宕机。但是因为这个表对于应用来说很重要,所以必须考虑重建进行修复。但是重建的时候还是会去读这个对象,读的过程同样会对表做逻辑检查,一旦做逻辑检查,就会出现上述错误,从而导致读取也会失败,报ORA-00600错误。所以为了能够正常的读取该对象,我们考虑关闭数据库的逻辑校验。修改下列两个参数。

SQL> alter system set db_block_checking=false;
System altered.
SQL> alter system set db_block_checksum=false;
System altered.

修改参数成功后,再次运行CTAS重建表成功。把老的对象成功drop掉之后,将event 10513事件取消和两个校验参数重新设置回来后重启数据库仍然报错。

Fri Jan 16 11:18:05 2015
Errors in file /u01/oracle/admin/actdb/bdump/actdb_smon_51446784.trc:
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [639852], [6110], [], [], [], []
Doing block recovery for file 118 block 639852
Block recovery from logseq 90953, block 66187 to scn 9106774948071

此时SMON进程再一次尝试恢复该对象,但是这个对象在我们的数据库中已经被我们删除不存在了,经过分析发现表drop掉了,会放在回收站中,物理的对象仍然存在。为了彻底的解决这个问题,需要把对象彻底的purge删除。

purge table table_name

三、原因分析

从后台trace,我们可以看到是下列语句造成了该问题。

ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [639852], [6110], [], [], [], []
ORA-00607: Internal error occurred while making a change to a data block
ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [639865], [6110], [], [], [], []
Current SQL statement for this session:
UPDATE Tbl_Trad_QID A
                 SET QCCXRQ =
                     (SELECT /*+PARALLEL(NT,3)*/
                       ACCTDATE
                        FROM NREGCLM NT
                       WHERE NT.BDH = A.BDH
                         AND A.YXYQCBZW=1
                         AND ROWNUM = 1)

下列语句涉及到一组更新,在mos上查询下列错误能够找到下列bug。

ALERT: Bug 7662491 – Array Update can corrupt a row. ORA-600 [kghstack_free1] ORA-600 [kddummy_blkchk][6110/6129] (文档 ID 861965.1)

7662491: INSTANCE CRASH / ORA-600 [KDDUMMY_BLKCHK] HIT DURING RECOVER

根据文档的提示,能在trace中找到下列信息:

kdbchk: the amount of space used is not equal to block size
        used=7888 fsc=0 avsp=302 dtl=8064
rechecking block failed with error code 6110

同时文档还指出

The trace file shows that the error is produced by an UPDATE with
OP:11.19 (Array Update)

and check code [6110] (“kdbchk: the amount of space used is not equal to block size”) or check code [6129] (“kdbchk: fsbo() wrong, (hsz )”). Note that check codes are not limited to 6129/6110.

可以看到OP:11.19就是Array Update操作。我们在仔细的看Trace,我们发现下列信息。

CHANGE #1 TYP:0 CLS: 1 AFN:118 DBA:0x1d89c370 OBJ:1097591 SCN:0x0848.55de4efc SEQ:  2 OP:11.19

这里可以看到在Trace里面对象1097591做了OP:11.19(Array Update)的操作,命中了这个Bug。

强烈建议:安装补丁7662491。

稿源:Buddy Yuan (源链) | 关于 | 阅读提示

本站遵循[CC BY-NC-SA 4.0]。如您有版权、意见投诉等问题,请通过eMail联系我们处理。
酷辣虫 » 后端存储 » ORA-00600: internal error code, arguments: [kddummy_blkchk], [118], [639852], [6110] 故障处理

喜欢 (0)or分享给?

专业 x 专注 x 聚合 x 分享 CC BY-NC-SA 4.0

使用声明 | 英豪名录