|
| ORA-600 [17033] | 21 October 1999 |
|
Just started getting this error:
ORA-00600: internal error code, arguments: [17033], [509], [128], [], [], [], [], []
An entry is being generated every few seconds, and the trace files keep filling up the file system. For now I'm moving them off, but it looks like queries are being affected as well. This happened one other time earlier this summer, and bouncing the database took care of the problem. The only thing I can find on MetaLink is this reference to the _kgl_bucket_count parameter:
Bug 381193 defines a scenario where ORA-600 [17033] is signaled due to lack of resources for the library cache hash table. The workaround for this bug is to set _kgl_bucket_count = 4. Again, this should only be set if it has been determined that the ORA-600 is actually caused by this bug as it can be signaled for other reasons.
Looks like bouncing the database is inevitable. Should I set this parameter as described? Any other suggestions? | ||
| First try to flush the shared pool. If that does not work then bounce. If the problem comes back, a change to _kgl_bucket_count will be necessary. Don't just go for 4. Use the APT script kgl_bucket_count.sql to work out the right number for your instance. Of course, the normal caveats about X$ tables and hidden parameters apply. The correct value for this parameter depends on the number of named objects in the library cache, once the library cache has reached its steady state. That is, it is sensitive to the shared pool size, as well as to the application workload. If you change the shared pool size or the workload significantly, then this parameter should be checked. |
|
| ORA-600 [2845] | 24 November 1999 |
| I have just run into a load of ORA-600 [2845] [0] [90] [67328] errors (Oracle 7.3.4.3.0 running on IBM SP AIX 4.3). Oracle have informed me that this problem is related to index node splitting leading to the corruption of an index. Is there anything you would add to this? Oracle also advised me to upgrade to 7.3.4.4.0 and rebuild the offending indexes. Are there any methods you would recommend for proactively looking for indexes that might become corrupt, if this is at all possible? | ||
| You would get this error when attempting to access file 90, block 67328 by rowid, if either the file or block number is invalid. If the row number were invalid, you would get ORA-1410. In Oracle8, the ORA-1410 error is used for both conditions. From the information you've given me, I cannot confirm that you have an index corruption, but it is certainly a strong possibility. To confirm it, you would need to run ANALYZE TABLE ... VALIDATE STRUCTURE CASCADE on the underlying table. Validating the structure of the index itself (without the CASCADE keyword) does not check the validity of the rowids. |
|
| Core dumps | 9 December 1999 |
| We are trying to import into a new database and keep getting core dumps. We have tried a number of times and the failure does not occur at the same place in the import each time. We are not sure whether this is an Oracle problem or an environment problem. We are using Oracle 8.0.5 on Solaris 2.7. | ||
| These errors are normally due to incompatibility between the Oracle and operating system versions in use. You may not have relinked Oracle after upgrading the operating system. Or you may be short a few operating system patches. These errors can also be due to Oracle or operating system bugs, but I would suggest you make sure that it is not your own fault before contacting either support organization. |
|
| Fractured block found | 11 December 1999 |
I came across this error message in an alert log recently, a couple of days after the event:
Corrupt block relative dba: 0x26c085d6 file=155. blocknum=34262.
Fractured block found during buffer read
Data in bad block - type:6. format:2. rdba:0x26c085d6
last change scn:0x0000.00970ace seq:0x2 flg:0x00
consistency value in tail 0x00000001
check value in block header: 0x0, check value not calculated
spare1:0x0, spare2:0x0, spare2:0x0
When I did a block dump of the block (155, 34262) it was perfectly sound (and I am fairly sure it would not have been subject to a drop/create cycle in between). My guess at the moment is that there was a soft corruption in the SGA that Oracle detected before writing the block out, so it re-read the block and applied redo to bring it up to date. | ||
|
I suspect that the disk is faulty.
The buffer cannot have been reconstructed from redo, because this happened during a buffer read, not during instance recovery.
The fact that you did not get an ORA-1578 error indicates that the situation was resolved with a buffer read retry.
However, the tail value could never have been valid.
Therefore this is not an in-flux buffer being written by another node of an OPS database, or a fractured block due to a system crash.
Check the hardware diagnostic logs for the disk, and if you find any errors or warnings, I suggest you have it replaced. |
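The numbers in the alert log excerpt can be cross-checked with a little arithmetic. The sketch below splits the relative dba into a 10-bit file number and a 22-bit block number, which reproduces the file=155 and blocknum=34262 that the alert log itself reports. The `expected_tail` function rests on an assumption that is not stated in the answer above: that the tail of a sound block is built from the low 16 bits of the SCN base, the block type, and the sequence number. The function names are illustrative only.

```python
def decode_rdba(rdba):
    """Split a 32-bit relative data block address into (file number, block number)."""
    file_no = rdba >> 22          # top 10 bits: relative file number
    block_no = rdba & 0x3FFFFF    # low 22 bits: block number within the file
    return file_no, block_no


def expected_tail(scn_base, block_type, seq):
    """Tail value for a consistent block.

    ASSUMPTION: the tail is (low 16 bits of SCN base) + (type byte) + (seq byte).
    """
    return ((scn_base & 0xFFFF) << 16) | ((block_type & 0xFF) << 8) | (seq & 0xFF)


# Values taken from the alert log excerpt in the question.
print(decode_rdba(0x26c085d6))                       # (155, 34262)
print(expected_tail(0x00970ace, 6, 2) == 0x00000001)  # False
```

Under that assumption, the tail of 0x00000001 seen in the log does not match the header values, which is consistent with the block image read from disk being bad on that first read.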
|
| Instance crashing | 3 January 2000 |
| Oracle crashed last night when starting up after a backup. There were trace files for all the background processes except LGWR, complaining about the death of other background processes. I was able to get the instance up again with a STARTUP FORCE, and everything seems OK. Why might this have happened? This is Oracle 7.3.4 on HP-UX 10.20. | ||
| A known cause of this problem is that Omniback does not close all its file descriptors in the process forked to start up Oracle after a cold backup. Oracle inherits those open descriptors, which can cause LGWR to exceed the operating system limit on the number of open files per process. If you start the database up manually, it is fine. The problem often hits shortly after a data file has been added to the database. The solution is to increase the relevant kernel parameter (maxfiles) or upgrade to a newer version of Omniback. |
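To see how close a startup wrapper is to the per-process limit, something like the following could be run from the same process that starts the instance. This is only a sketch: `count_open_fds` and `fd_headroom` are hypothetical helper names, the scan cap of 4096 is arbitrary, and the number of files LGWR needs to open is something you would have to supply for your own database.

```python
import os
import resource


def count_open_fds(scan_limit=4096):
    """Count descriptors currently open in this process by probing each number."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # Cap the scan so an unlimited or very large soft limit does not loop for long.
    if soft == resource.RLIM_INFINITY or soft > scan_limit:
        soft = scan_limit
    open_count = 0
    for fd in range(soft):
        try:
            os.fstat(fd)          # succeeds only if fd is an open descriptor
        except OSError:
            continue
        open_count += 1
    return open_count


def fd_headroom(files_to_open, inherited_fds, soft_limit):
    """Descriptors left over after a process opens its files; negative means it will fail."""
    return soft_limit - inherited_fds - files_to_open


if __name__ == "__main__":
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    print("soft limit:", soft, "already open:", count_open_fds())
```

Every descriptor left open by the parent reduces the headroom by one, which is why adding a single data file can be enough to tip a previously working startup over the limit.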
|
| ORA-600 [2667] | 12 January 2000 |
| I have a 7.3.4 NT database that is crashing with ORA-600 [2667] and a core dump at irregular intervals. | ||
| The 2667 error is raised by LGWR if its buffers have been corrupted. One known cause of this is the (re)compilation or (re)loading of a large package body. Look for an INVALID package body, and try to compile it to see if the error is reproduced. If so, breaking the package into two smaller ones is likely to work around the problem. |
|
| ORA-600 [2103] | 17 January 2000 |
| During a hot backup of an Oracle 7.3 database, the LGWR crashed with ORA-600 [2103] [900]. | ||
| This is a timeout trying to get the controlfile enqueue to perform a control file transaction - probably trying to put a tablespace into backup mode, or bring it out again. This could be an operating system or hardware problem, so check your diagnostic logs. Alternatively, this could happen in OPS if your DLM has crashed. And of course it could be an Oracle bug. If this is happening frequently, you can set _controlfile_enqueue_timeout in seconds (default 900) to make it time out more quickly, so you can get going again. |
|
| ORA-7445 | 17 January 2000 |
| We've recently started getting individual processes crashing with ORA-7445. Do you know what it is? | ||
| This error covers a multitude of sins. Basically it means that the process got an unexpected signal from the operating system. The primary argument is the signal number or name. Normally it is either a segmentation violation or a bus error, but I have also seen illegal instruction traps. These are all Oracle bugs, so your best course of action is to get the stack trace from the trace file and send it to Oracle Support. |
|
|
I had a look in the trace file, and there is no stack trace, only this ...
ORA-07445: exception detected: core-dump [4] [9] [0] [0] [] []
----- Call Stack Trace -----
calling              call     entry                argument values in hex
location             type     point                (? means dubious value)
-------------------- -------- -------------------- ----------------------------
Error from U_get_previous_frame_x is 1
Stack is not Windable
Stack has no Unwind_descriptor
What now? |
|
This happens because you are running an older version on HP-UX on the newer PA8000 hardware architecture.
You can work around it by getting the stack trace from the core file using the t command in xdb.
Change to your core_dump_dest directory and find the subdirectory for the right core file, then start up xdb as follows.
xdb $ORACLE_HOME/bin/oracle core |
|
| ORA-600 [4036] | 20 January 2000 |
| Whenever I run a particular batch job, the instance aborts with ORA-600 [4036]. The batch job incorrectly tries to bring the R_BIG rollback segment online after each commit, and gets ORA-1636 because it is already online. However, we have been ignoring this error for two years, and the job has worked just fine. If I restart both instances cleanly after the crash, and rerun the batch job, it seems to succeed. | ||
| The [4036] error is happening when trying to terminate an active transaction (ktudax) because the SGA rollback segment array shows that there are no active transactions in that rollback segment. The corruption of the SGA rollback segment array may be due to bug 602492 or bug 967166. You will need to work with Oracle support to make further progress on this issue. |
|
| NetApp problems | 28 January 2000 |
|
We have 6 databases running Oracle 8.1.5 on 3 Solaris 2.6 servers.
All the database files reside on a Network Appliance box.
According to their documentation, Oracle runs perfectly on this hardware,
and the system administrator and I followed their recommended configuration exactly.
Nevertheless, we have been having problems.
On several occasions all 6 databases have crashed simultaneously, needing media recovery. This happened again last night. Oracle Support suggest that it may have been a hardware problem, but our system administrator has checked and the only problem was a full file system. A copy of the log writer trace file is attached. Is it possible for you to give me some explanations about this? My boss wants to know what really occurred. Is it Oracle, Solaris, NetApp or what?
ORA-00345: redo log write error block 99252 count 2
ORA-00312: online log 1 thread 1: '/oracle_logs/oradata/stage02/online_redo/stage02_redo01a.log'
ORA-27063: skgfospo: number of bytes read/written is incorrect
SVR4 Error: 28: No space left on device
Additional information: -1
Additional information: 1024
ORA-00346: log member marked as STALE
ksedmp: internal or fatal error
ORA-00600: internal error code, arguments: [2745], [0], [], [], [], [], [], [] | ||
|
The trace shows that a write to an online log file failed with errno 28, which normally indicates that the file system is full.
This can only occur if your log files were sparse files, which they should not be.
However, some backup and restore software can turn Oracle files into sparse files - so that is one area to check.
Copy the online log files to another file system using Unix cp and check that the final size is unchanged.
If so, as I suspect, the error 28 should never have been returned to Oracle, and should be regarded as a NetApp bug.
The instances crashed, because Oracle must mark any log file for which a write fails as STALE, and stop using it. Oracle did this, but then it had no valid online log file member to write to, so it crashed with [2745] which is the expected behaviour. I've had a look at the NetApp site, and read their paper 3023 saying that it is supported with Oracle. Nevertheless, I am skeptical. Log files should already exist on disk, and should not hit file system full errors. Also, how does Oracle do synchronous writes over NFS? I don't know what special intelligence they have got there, but I strongly suspect that using a NetApp box for Oracle is a bad idea, despite the certification. |
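A different way to look for sparse files, not the cp test described above, is to compare each file's logical size with the space actually allocated to it. The sketch below assumes a Unix system where `st_blocks` is reported in 512-byte units. Allocation figures reported over NFS or by compressing file systems may not be reliable, so treat a positive result as a prompt for further checking, not as proof. The function names are illustrative.

```python
import os


def looks_sparse(size_bytes, allocated_512_blocks):
    """True when fewer bytes are allocated on disk than the file's logical size."""
    return allocated_512_blocks * 512 < size_bytes


def check_file(path):
    """Apply looks_sparse to a real file using the values reported by stat."""
    st = os.stat(path)
    # ASSUMPTION: st_blocks counts 512-byte units, as on most Unix systems.
    return looks_sparse(st.st_size, st.st_blocks)
```

For example, `check_file('/oracle_logs/oradata/stage02/online_redo/stage02_redo01a.log')` would return True if that log member had been restored as a sparse file.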
|
| ORA-1578: ORACLE data block corrupted | 2 February 2000 |
Please let me know what to do about this error.
It occurs every time we gather statistics.
ORA-01578: ORACLE data block corrupted (file # 80, block # 2843)
ORA-01110: data file 5: '/oracle/PRD/saprawe4c/pool2d.data03' | ||
First work out which database segment holds that block ...
select segment_name, segment_type
from dba_extents
where file_id = 80
and 2843 between block_id and block_id + blocks - 1
/
If it is an index, you can drop it and recreate it. If it is a table, you've lost the data in that block. You may be able to get it back from backups. You may be able to derive it from the indexes. You may be able to work it out from a hex block dump. Oracle have a tool to edit and fix corrupted blocks (BBED). You may be happy to just salvage the rest of the data from the table. As you can see, there are lots of options, and it is non-trivial. |
|
| OSD-04008: WriteFile() failure | 9 February 2000 |
I realise this is a bit obscure, but would appreciate any hints.
The following segment from an alert.log indicates that some non-Oracle process is locking portions of the datafiles at the O/S level.
KCF: write/open error block=0x4fd1 online=1
file=3 D:\ORANT\DATABASE\USR1WIND.ORA
error=27072 txt: 'OSD-04008: WriteFile() failure, unable to write to file
O/S-Error: (OS 33) The process cannot access the file because another process has locked a portion of the file.'
Instance terminating due to error 1242
Would you have any suggestions as to the most likely culprits in the O/S?
What would be the best NT tool to use to track down the offending process?
| ||
| This is a reasonably well known problem on NT. Generally the error is seen when a routine NT file system backup takes a byte-range lock on one of the Oracle datafiles in that file system in order to back it up. There are many solutions, such as configuring the backup software to skip open files, or at least not to lock them; or you can use ocopy to copy the datafiles and back those up instead; or you can shutdown the instance; and so on. |
|
| ORA-600 [17034] | 28 February 2000 |
| I am getting an ORA-600 error that appears to be associated with triggers. | ||
I need to see an example and a process state dump to help you with this.
The syntax to get a process state dump when an ORA-600 error occurs is
alter session set events '600 trace name processstate forever, level 10';
The dump will be written to the trace file in the user_dump_dest directory. |
|
|
Here is an example. The trace file is attached.
SQL> create table emp(empno number(5), empname varchar2(30));
Table created.
SQL> alter table emp add constraint empl_pk primary key (empno);
Table altered.
SQL> create or replace trigger emp_bef_ins
2 before insert on emp for each row
3 begin null; end;
4 /
Trigger created.
SQL> alter session set events '600 trace name processstate forever, level 10';
Session altered.
SQL> insert into emp values(1,'steve adams');
1 row created.
SQL> commit;
Commit complete.
SQL> insert into emp values(1,'steve adams');
insert into emp values(1,'steve adams')
*
ERROR at line 1:
ORA-00600: internal error code, arguments: [17034], [2281304776], [0], [], [], [], [], []
ORA-00001: unique constraint (DEMO.EMPL_PK) violated
Here I drop the trigger, and there is no ORA-600.
SQL> drop trigger emp_bef_ins;
Trigger dropped.
SQL> insert into emp values(1,'steve adams');
insert into emp values(1,'steve adams')
*
ERROR at line 1:
ORA-00001: unique constraint (DEMO.EMPL_PK) violated
|
| I have not seen this before, so I don't know the bug number, but it is clearly a bug. The state dump shows that the call has completed, but library cache call pins are still thought to be held. The error indicates that Oracle is trying to unpin something that is no longer pinned. I suggest that you send the trace file, the spool output above, and my comment here to Oracle Support. It may be that it is a known bug, in which case an upgrade will probably help you. |
| Copyright © Ixora Pty Ltd |
|