The Alert Log
There are a number of sources of information that the Oracle
Database provides to help in getting to the bottom of issues. The first place
in a situation where you do not know what has caused the issue is to check the
alert log for the instance. This log file contains messages for significant
events which will include general information and warnings/errors
·
All startup and shutdown commands, times taken
for transitions from nomount to mount
·
All Log switches and archive logs
·
Details on tablespace changes
·
Changes to the database involving datafiles and
redo logs.
·
Initialization parameters
·
Unable to open file ( if there is an issue with
a datafile or control file during normal startup)
·
Corrupt data blocks
·
Deadlocks
·
Problems with archiving such as the destination
being 100%
·
Trace file information for background processes
which have encountered problems
These messages are in chronological order so checking the
last number of changes or errors is as easy as going to the end of the file and
scanning up. Many of the messages will also include the details of a trace
file. These trace files are generated when a server or background process
encounters an error.
The Location of the Alert Log is set in the database
pararamter : BACKGROUND_DUMP_DEST
(This database parameter is dynamic and can be changed with
shutting down the instance)
The naming convention for the alert log is alert_<ORACLE_SID>.log
Viewing the alert log at the Operating System level can be
achieved by navigating to its location and opening with any editor. You can
also find it in Enterprise Manager on the database home page in the Diagnostic
Summary section with an Alert log link.
Trace Files
As mentioned in relation to the alert log, when a background
process encounters a problem it will write its output to a trace file in the
same location as the alert log : BACKGROUND_DUMP_DEST.
When a server process encounters a problem it will go to:
USER_DUMP_DEST
Trace file headers contain the version of Oracle and
Operating System the trace file was generated on , instance name and
process/thread id, followed by the actual error. When a trace file is generated
then unlike message in the alert log which can be informational, an actual
error has occurred.
You can limit the size a trace file will be (unlike the
alert log which cannot be limited) with the database parameter
MAX_DUMP_FILE_SIZE which by default is unlimited.
SQL> ALTER SYSTEM SET MAX_DUMP_FILE_SIZE = "50m"
scope = both
Server Generated
Alerts
Both the alert log and trace files looked at previously
provide a starting point when looking at an issue, but out of the box, Oracle
comes with an alert system which is very configurable. As well as using this to
look at immediate problems, it can be configured to alert you before the issue
becomes critical and effects the end users.
You can interact with the alert system through Database
Control which will be the easiest way but you can also you setup changes
through API’s and be notified by subscribing to the ALERT_QUE which will be
covered in the Alert System Architecture.
A server generated alert will fire when a certain limit is
reached or when certain events occur. These are looked after by the background
process MMON. When we talk about limits we are talking about database metrics
of which over 120 can be configured ( V$ALERT_TYPES). A metric is a statistic converted into some
meaningful figure such as disk reads(statistic) per second or space
used(statistic) is in proportion to total space which would be metrics.
Two Types of Alerts
1.
Threshold (stateful) - these can persist for some time until cleared.
An example would be a warning when a tablespace it 87% full. Once action is
taken these clear
2.
Nonthreshold(stateless) – unpredictable events
for example snapshot to old errors which are resolved immediately
For threshold alerts, when the problem condition is cleared
by fixing it, it’s moved to the alert history unlike Nonthreshold alerts which
go there directly.
Again you can view alerts in enterprise manager on the home
page under Alert and Related Alerts and under the Alert Log link. Each alert
can be drilled down into to find out more information.
Alert System
Architecture
The statistics used by the alert system are collected when
the database parameter STATISTICS_LEVEL is set to
TYPICAL(DEFAULT) or ALL. Setting this to BASIC will disable statistic gathering.
MMON gathers these statistics and calculates the metrics as all as monitoring
thresholds. If a metric breaks a given threshold then it will write it to the predefined
ALERT_QUE(if there is a problem with this queue or any other it would write it
to the alert log, this is the only interaction between the alert system and the
alert log). The ALERT_QUE can have more than one consumer so third-party
software could subscribe to it using the DBMS_AQADM package and CREATE_AQ_AGENT
, ADD_SUBSCRIBER procedures.
A notification rule
can be created that when an alert occurs, the DBA can take appropriate action
sometimes even before the end user notices.
User Trace Files
Trace files for a
user’s session which normally are generated on demand in helping to diagnose an
issue locally.
Setting the SQL_TRACE
value to TRUE for the session will generate performance statistics for the
processing of all SQL in that session.
SQL> ALTER
SESSION SET SQL_TRACE TRUE;
These files will be
produced in : USER_DUMP_DEST
If the user is
using an application then they probably will not be able to open a SQL session
and run commands like the above. In this case you need to first find the SID
from v$session that the user is running. This can be narrowed down using any of
the columns in v$session.
Then you can
monitor the session using the DBMS_MONITOR package like below
SQL>
execute dbms_monitor.session_trace_enable(
session_id=>101,serial_num=>99);
SQL>
execute dbms_monitor.session_trace_disable(
session_id=>101,serial_num=>99);
You can implement tracing through Database Control and
indeed it maybe easier to do this, especially in identifying the session for
the user be that through TOP CONSUMERS or something else.
System Log Files
You may or not also by the System Administrator or have to
co-ordinate with another team who manage the operating system but these would
need checked periodically and immediately if there was an issue. From
experience I’ve spent time going through the whole above process only to find
out the System Administrator never checked the system log files and in there
found out the root cause ( possible mount point issue or SMTP issue) e.g. in
Unix systems : /var/log/syslog or /var/adm/messages
Summary
For reactive situations, you will want to first check the
alert log and then drill down into any trace files which look like candidates
for the issue.
For proactive monitoring you will want to start playing with
and configuring the server alert system.
If trying to identify unique issues, you can trace the database by user , session
or program
No comments:
Post a Comment