Comprehensive Oracle Database Health Check

Overview

Download PDF


An Oracle database health check involves a thorough review of the database's performance, configuration, and security to ensure optimal operation. It typically includes examining system resource usage (CPU, memory, and storage), checking database logs for errors, and verifying backup and recovery processes. Performance metrics such as query response times, index efficiency, and I/O bottlenecks are analyzed to identify potential issues. A health check also assesses database security by reviewing user access, roles, and privileges. Regular checks help detect early signs of failure or inefficiency, ensuring that the database runs smoothly and securely.


Oracle database deployment and design are critical to ensuring the system is robust, scalable, and efficient. The deployment process involves installing the database on physical or virtual servers, configuring the storage, memory, and network settings for optimal performance, and setting up failover options like Oracle RAC or Data Guard for high availability.


Fundamentally, Oracle's architecture follows a multi-tiered approach with the instance (memory and background processes), the database (physical storage), and clients (applications) interacting with it. Key components include tablespaces, data files, redo logs, and control files, which manage data and recovery. Security, backup strategies, and efficient indexing are essential in the design phase to meet business continuity, performance, and security requirements.


The design should also account for scalability, by utilizing Oracle's partitioning, clustering, and load balancing features, while maintaining security through role-based access and encryption.
Improving Oracle Database performance involves several strategies. First, optimizing SQL queries can significantly reduce execution time. Second, proper indexing can enhance data retrieval speed. Third, regular database maintenance, such as statistics collection and defragmentation, ensures optimal performance. Fourth, using Oracle's built-in performance tuning tools, like Automatic Workload Repository (AWR) and SQL Tuning Advisor, helps identify and resolve bottlenecks. Lastly, scaling hardware resources, such as CPU and memory, can provide the necessary power for high-demand operations.

Below are some major categories involving performing Health check of Oracle Database.

  • Oracle Database Architecture Health Check
  • Oracle Database Engine Health Check
  • Oracle Database Processes Level Health Check

    Oracle Database Architecture Health Check

    Oracle Database Architecture Health Check ensures optimal performance, much like reviewing the core components of a computer. The database’s memory (SGA and PGA) must be balanced to handle workloads efficiently. ASM (Automatic Storage Management) is used to manage datafiles for efficient disk I/O. Critical components like Redo Log files are stored on fast, high-performance storage for quick recovery, while Archive logs are placed on separate storage for point-in-time recovery. Regular backups are stored on tape drives or other media, isolated from the main database storage, ensuring data protection.

    • Check CPU Utilization

    Top

    cat /proc/stat

    yum install sysstat
    mpstat

    • Check Storage I/O

    iostat

    yum install iotop
    iotop

    Using Resource Monitor:
    • Press Ctrl + Shift + Esc to open Task Manager.
    • Go to the Performance tab and click Open Resource Monitor at the bottom.
    • Switch to the Disk tab in Resource Monitor.
    • Here, you can monitor disk I/O statistics, including:
    o Disk Reads/sec: Number of read operations per second.
    o Disk Writes/sec: Number of write operations per second.
    o Disk Queue Length: The length of the queue of I/O operations waiting for disk access.

    Using Performance Monitor (PerfMon):
    • Press Windows + R, type perfmon, and press Enter.
    • In Performance Monitor, add counters for disk-related metrics:
    o Right-click on Performance Monitor in the left panel.
    o Click Add Counters.
    o Choose PhysicalDisk and select counters such as:
     Disk Reads/sec
     Disk Writes/sec
     Avg. Disk sec/Read
     Avg. Disk sec/Write
     Disk Queue Length
    These counters give detailed real-time data about I/O performance.

    • Check Memory Utilization


    Free -g

    cat /proc/meminfo

    vmstat 1

    Using Task Manager:
    • Press Ctrl + Shift + Esc to open Task Manager.
    • Go to the Performance tab, and select Memory on the left.

    Using Resource Monitor:
    • Open Task Manager (Ctrl + Shift + Esc), go to the Performance tab, and click on Open Resource Monitor.
    • Switch to the Memory tab to get detailed information about memory utilization.

    Using Performance Monitor (PerfMon):
    • Press Windows + R, type perfmon, and press Enter.
    • In Performance Monitor, you can add memory-related counters to track usage.
    o Right-click in the window, choose Add Counters.
    o Select Memory, and choose counters like:
     Available MBytes
     Committed Bytes
     Cache Faults/sec
     Page Faults/sec

    • Redo Log File Should be on Fast Storage


    dd if=/dev/zero of=testfile bs=1G count=1 oflag=direct

    dd if=testfile of=/dev/null bs=1G count=1 iflag=direct

    winsat disk -drive c

    • Backups Should Be Copy External Storage Like Tape Drive or Other Storage.

    Oracle Database Engine Health Check

    An Oracle Database Engine Health Check is essential for maintaining optimal performance and involves monitoring several critical components. The System Global Area (SGA) should be appropriately sized to minimize memory contention, while the Program Global Area (PGA) needs careful management to ensure efficient memory allocation for session data and sorting operations. Additionally, the size of the redo logs plays a significant role in recovery performance, and resizing them can prevent frequent log switches that may hinder throughput. Monitoring log switches is crucial, as frequent occurrences can indicate transaction volume issues, and analyzing wait events provides insights into resource contention, helping identify bottlenecks and optimize the database environment effectively.

    System Global Area (SGA): This shared memory area contains data and control information for the Oracle instance, and its size should be tuned to accommodate the workload, minimizing memory contention.

    Program Global Area (PGA): This memory region is allocated to each Oracle process, and monitoring its usage helps ensure efficient memory allocation for sorting, hashing, and managing session data.

    Redo Log Size: The size of the redo log files impacts recovery performance; appropriately sizing these logs can enhance database recovery speed and prevent frequent log switches.

    Log Switches: Frequent log switches can indicate issues with transaction volume or log size, and monitoring this helps in optimizing the log configuration for better throughput.

    Wait Events: Analyzing wait events provides insights into resource contention, helping identify bottlenecks and performance issues, which is essential for tuning and optimizing the database environment.

    Shared Server and Dedicated Server Process

    For Configure Shared Server
    select name,value from gv$parameter where name in ('shared_servers','dispatchers','max_shared_servers','shared_server_sessions','max_dispatchers');
    SELECT * FROM V$SHARED_SERVER;
    SELECT * FROM V$DISPATCHER;
    ALTER SYSTEM SET SHARED_SERVERS = 5;
    ALTER SYSTEM SET DISPATCHERS = '(PROTOCOL=TCP)(DISPATCHERS=3)';
    ALTER SYSTEM SET MAX_SHARED_SERVERS = 10;
    ALTER SYSTEM SET SHARED_SERVER_SESSIONS = 100;
    ALTER SYSTEM SET MAX_DISPATCHERS = 5;
    SELECT * FROM V$SHARED_SERVER;
    SELECT * FROM V$DISPATCHER;
    Monitor Block Tracking File Size which use for Incremental Backup.

    ALTER DATABASE Disable BLOCK CHANGE TRACKING;
    ALTER DATABASE ENABLE BLOCK CHANGE TRACKING USING FILE ‘/u01/blocktracking.ora';
    SELECT filename, status, bytes FROM v$block_change_tracking;
    Configure Keep_Pool and Move Frequently Used Objects in Keep Pool

    select table_name,BUFFER_POOL from dba_tables where OWNER='T24' AND not BUFFER_POOL='DEFAULT';
    alter TABLE T24.F_BATCH storage (buffer_pool keep);

    Oracle Database Processes Execution Level Health Check

    An Oracle Database Execution Level Health Check focuses on monitoring and optimizing SQL query performance and execution plans. It involves analyzing SQL statements using tools like EXPLAIN PLAN, SQL Trace, and TKPROF to identify inefficient queries, missing indexes, or suboptimal joins. The check also reviews bind variable usage, index efficiency, and parallel execution settings to enhance performance. By detecting long-running queries or heavy resource-consuming statements, this health check ensures that database operations are executed optimally, improving overall query performance and resource utilization.
    Operational Health Check

    • Archive Log Location Space Monitoring


    SELECT TRUNC (first_time) "Date", inst_id, TO_CHAR (first_time, 'Dy') "Day",
    COUNT (1) "Total",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '00', 1, 0)) "h0",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '01', 1, 0)) "h1",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '02', 1, 0)) "h2",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '03', 1, 0)) "h3",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '04', 1, 0)) "h4",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '05', 1, 0)) "h5",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '06', 1, 0)) "h6",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '07', 1, 0)) "h7",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '08', 1, 0)) "h8",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '09', 1, 0)) "h9",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '10', 1, 0)) "h10",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '11', 1, 0)) "h11",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '12', 1, 0)) "h12",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '13', 1, 0)) "h13",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '14', 1, 0)) "h14",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '15', 1, 0)) "h15",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '16', 1, 0)) "h16",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '17', 1, 0)) "h17",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '18', 1, 0)) "h18",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '19', 1, 0)) "h19",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '20', 1, 0)) "h20",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '21', 1, 0)) "h21",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '22', 1, 0)) "h22",
    SUM (DECODE (TO_CHAR (first_time, 'hh24'), '23', 1, 0)) "h23",
    ROUND (COUNT (1) / 24, 2) "Avg"
    FROM gv$log_history
    WHERE thread# = inst_id
    AND first_time > sysdate -7
    GROUP BY TRUNC (first_time), inst_id, TO_CHAR (first_time, 'Dy')
    ORDER BY 1,2;

    • Monitor Flash Recover Area


    select name, floor(space_limit / 1024 / 1024/1024) "Size GB", ceil(space_used / 1024 / 1024/1024) "Used GB" from v$recovery_file_dest order by name;

    • Check Tablespace Size


    SELECT ts.tablespace_name,
    size_info.megs_alloc,
    size_info.megs_free/1024,
    size_info.megs_used,
    size_info.pct_free,
    size_info.pct_used,
    size_info.MAX,
    size_info.Available /1024 Available_GB,
    Available_Percentage
    FROM (SELECT a.tablespace_name,
    ROUND (a.bytes_alloc / 1024 / 1024) megs_alloc,
    ROUND (NVL (b.bytes_free, 0) / 1024 / 1024) megs_free,
    ROUND ( (a.bytes_alloc - NVL (b.bytes_free, 0)) / 1024 / 1024)
    megs_used,
    ROUND ( (NVL (b.bytes_free, 0) / a.bytes_alloc) * 100)
    Pct_Free,
    100 - ROUND ( (NVL (b.bytes_free, 0) / a.bytes_alloc) * 100)
    Pct_used,
    ROUND (maxbytes / 1048576) MAX,
    ((ROUND (maxbytes / 1048576))- (ROUND ( (a.bytes_alloc - NVL (b.bytes_free, 0)) / 1024 / 1024))) as Available
    , (((ROUND (maxbytes / 1048576))- (ROUND ( (a.bytes_alloc - NVL (b.bytes_free, 0)) / 1024 / 1024))) /(ROUND (maxbytes / 1048576)))*100 as Available_Percentage
    FROM ( SELECT f.tablespace_name,
    SUM (f.bytes) bytes_alloc,
    SUM (
    DECODE (f.autoextensible,
    'YES', f.maxbytes,
    'NO', f.bytes))
    maxbytes
    FROM dba_data_files f
    GROUP BY tablespace_name) a,
    ( SELECT f.tablespace_name, SUM (f.bytes) bytes_free
    FROM dba_free_space f
    GROUP BY tablespace_name) b
    WHERE a.tablespace_name = b.tablespace_name(+)
    UNION ALL
    SELECT h.tablespace_name,
    ROUND (SUM (h.bytes_free + h.bytes_used) / 1048576) megs_alloc,
    ROUND (
    SUM (
    (h.bytes_free + h.bytes_used) - NVL (p.bytes_used, 0))
    / 1048576)
    megs_free,
    ROUND (SUM (NVL (p.bytes_used, 0)) / 1048576) megs_used,
    ROUND (
    ( SUM (
    (h.bytes_free + h.bytes_used)
    - NVL (p.bytes_used, 0))
    / SUM (h.bytes_used + h.bytes_free))
    * 100)
    Pct_Free,
    100
    - ROUND (
    ( SUM (
    (h.bytes_free + h.bytes_used)
    - NVL (p.bytes_used, 0))
    / SUM (h.bytes_used + h.bytes_free))
    * 100)
    pct_used,
    ROUND (
    SUM (
    DECODE (f.autoextensible,
    'YES', f.maxbytes,
    'NO', f.bytes)
    / 1048576))
    MAX,
    (((ROUND (SUM (DECODE (f.autoextensible,'YES', f.maxbytes,'NO', f.bytes) / 1048576))) -(ROUND (SUM (NVL (p.bytes_used, 0)) / 1048576)))/1024) as Available_GB
    ,((((ROUND (SUM (DECODE (f.autoextensible,'YES', f.maxbytes,'NO', f.bytes) / 1048576))) -(ROUND (SUM (NVL (p.bytes_used, 0)) / 1048576)))) /(ROUND (SUM (DECODE (f.autoextensible,'YES', f.maxbytes,'NO', f.bytes) / 1048576)))) * 100 as Available_Percentage
    FROM sys.v_$TEMP_SPACE_HEADER h,
    sys.v_$Temp_extent_pool p,
    dba_temp_files f
    WHERE p.file_id(+) = h.file_id
    AND p.tablespace_name(+) = h.tablespace_name
    AND f.file_id = h.file_id
    AND f.tablespace_name = h.tablespace_name
    GROUP BY h.tablespace_name) size_info,
    sys.dba_tablespaces ts,
    sys.dba_tablespace_groups tsg
    WHERE ts.tablespace_name = size_info.tablespace_name
    AND ts.tablespace_name = tsg.tablespace_name(+)
    order by Available_Percentage asc;

    • Monitor Database Backups

    SELECT status status,
    object_type object_type,
    input_bytes / 1024 / 1024 / 1024 input_bytes,
    output_bytes / 1024 / 1024 / 1024 output_bytes,
    TO_CHAR (start_time, 'mm/dd/yyyy:hh:mi:ss') AS start_time,
    TO_CHAR (end_time, 'mm/dd/yyyy:hh:mi:ss') AS end_time
    FROM v$rman_status
    WHERE start_time > SYSDATE - 1 AND operation = 'BACKUP';

    • Monitor Database Alert Log

    Performance Related Health Check

    • Check Database Session Locking


    select blocking_session, sid, serial#, wait_class, seconds_in_wait
    from
    v$session where blocking_session is not NULL order by blocking_session;
    select username, sid,serial#,sql_id , wait_time from v$session where (sid ) in
    (
    select blocking_session from v$session
    );
    select inst_id,last_call_et,username,machine,event,status from gv$session where username='T24' and lower(event) like '%contention%';
    select * from gv$session a , gv$lock b where a.sid=b.sid and A.INST_ID=b.inst_id and username='T24' and lower(event) like '%contention%';
    select * from dba_blockers;

    • Check the Queries which last_call_et column value high and increasing in your Database


    select event,last_call_et,username,module,program,machine,sql_id,prev_sql_id,
    ((select max(sql_text) from gv$sql a where a.sql_id=s.sql_id)) as sql1,
    ((select max(sql_text) from gv$sql a where a.sql_id=s.prev_sql_id)) as sql2
    from gv$session s where status='ACTIVE' and not username is null order by last_call_et desc;

    • Check Active / In Active Database Sessions


    select inst_id,sid,last_call_et,username,machine,event,status from gv$session where username='T24' order by last_call_et desc;
    select inst_id,sid,last_call_et,username,machine,event,status from gv$session where username='T24' and status='ACTIVE' order by last_call_et desc;
    Check User Level Session Locking.
    SELECT se.inst_id,se.last_call_et,
    lk.SID, se.username, se.OSUser, se.Machine,DECODE (lk.TYPE,'TX', 'Transaction','TM', 'DML','UL', 'PL/SQL User Lock',lk.TYPE)
    lock_type, DECODE (lk.lmode, 0, 'None', 1, 'Null',
    2, 'Row-S (SS)',
    3, 'Row-X (SX)',
    4, 'Share',
    5, 'S/Row-X (SSX)',
    6, 'xclusive',
    TO_CHAR (lk.lmode))
    mode_held,
    DECODE (lk.request,
    0, 'None',
    1, 'Null',
    2, 'Row-S (SS)',
    3, 'Row-X (SX)',
    4, 'Share',
    5, 'S/Row-X (SSX)',
    6, 'Exclusive',
    TO_CHAR (lk.request))
    mode_requested,
    TO_CHAR (lk.id1) lock_id1,
    TO_CHAR (lk.id2) lock_id2,
    ob.owner,
    ob.object_type,
    ob.object_name,
    DECODE (lk.Block, 0, 'No', 1, 'Yes', 2, 'Global') block,
    se.lockwait
    FROM GV$lock lk, dba_objects ob, GV$session se
    WHERE lk.TYPE IN ('TX', 'TM', 'UL')
    AND lk.SID = se.SID
    AND lk.id1 = ob.object_id(+)
    AND lk.inst_id = se.inst_id
    and last_call_et > 100;

    • Generate AWR Report
    • Check "DB Time"
    • Check "SQL ordered by Elapsed Time"
    • Check Top 10 Foreground Events by Total Wait Time
    • Wait Classes by Total Wait Time
    • Check Table Stats


    select max(LAST_ANALYZED),'execute dbms_stats.gather_table_stats(ownname=>' || '''' || 't24' || '''' || ',' || 'tabname=>' || '''' || table_name || '''' || ',DEGREE=> 10, estimate_percent=> 10 );' from dba_tab_statistics where owner='T24' and stale_stats ='YES'
    group by table_name order by max(LAST_ANALYZED) asc;
    select max(LAST_ANALYZED),'execute dbms_stats.gather_table_stats(ownname=>' || '''' || 't24' || '''' || ',' || 'tabname=>' || '''' || table_name || '''' || ',DEGREE=> 10, estimate_percent=> 10 );' from dba_tab_statistics where owner='T24' and stale_stats ='YES' group by table_name order by max(LAST_ANALYZED) asc;
    select owner,table_name,LAST_ANALYZED from dba_tables;
    Check Invalid Objects
    select * from dba_ind_partitions where index_owner='T24' and not status='USABLE' ;
    select index_name,status from dba_indexes where not status='VALID' and owner='T24';
    Check Execution Plan for Query
    explain plan for insert into test1 select * from test1;
    select * from table(dbms_xplan.display);

    • Check Usage of Indexes

    ALTER INDEX abc1 MONITORING USAGE;
    SELECT index_name, table_name, monitoring, used FROM V$OBJECT_USAGE;
    ALTER INDEX abc1 NO MONITORING USAGE;

    • Check List of Queries Perform Full Scan

    SELECT sql_id,
    child_number,
    object_owner,
    object_name,
    operation,
    options
    FROM v$sql_plan
    WHERE operation = 'TABLE ACCESS'
    AND options = 'FULL' ;