Tuesday, June 4, 2013

OGG-01031 Pump Abending

A fairly common occurance in the Golden Gate world. The pump abends as the target destination becomes full.
In this scenario, simply clearing space on the drive will not restart the pump.
The pump complains as below


2013-05-14 10:41:42 INFO OGG-00993 Oracle GoldenGate Capture for Oracle, p1ppppd1.prm: EXTRACT P1PPPPD1 started. 2013-05-14 10:41:47 INFO OGG-01226 Oracle GoldenGate Capture for Oracle, p1ppppd1.prm: Socket buffer size set to 27985 (flush size 27985). 2013-05-14 10:41:58 ERROR OGG-01031 Oracle GoldenGate Capture for Oracle, p1ppppd1.prm: There is a problem in network communication, a remote file problem, encryption keys for target and source do not match (if using ENCRYPT) or an unknown error. (Reply received is Unable to open file "/u01/oraogg/dirdat/rp/rp000014" (error 13, Permission denied)). 2013-05-14 10:41:58 ERROR OGG-01668 Oracle GoldenGate Capture for Oracle, p1ppppd1.prm: PROCESS ABENDING.

We tell the pump to Rollover onto a new trace file.


alter extract P1PPPPD1 etrollover
2013-05-14 10:57:19 INFO OGG-01520 Oracle GoldenGate Command Interpreter for Oracle: Rollover performed. For each affected output trail of Version 10 or higher format, after starting the source extract, issue ALTER EXTSEQNO for that trail's reader (either pump EXTRACT or REPLICAT) to move the reader's scan to the new trail file; it will not happen automatically.


Run info pumpname to get the new trail file name.

At the target end, the replicat should have parsed all the data received until the drive became full and data stopped coming through.
To let the target know to switch to the new file, use below command.


alter replicat R1PPPPD2 ,EXTSEQNO 15 , EXTRBA 0 --- replicat was reading the seqno 14. moved to 15.

Wednesday, April 24, 2013

LUNs size and ASM

It is fairly straight forward to query ASM instance to find which path and what their sizes are. I found myself increasingly looking at a fallen ASM instance(storage configuration problems) and trying to figure out which disk belongs to which LUN and what size it is.
After doing this for 2 times manually; I had to write a script to do this.



for i in `ls /dev/oracleasm/disks`
do
fp=$(/etc/init.d/oracleasm querydisk -p $i |grep /dev/mapper| awk -F ":" '{ print $1 }')
echo "----------$i--------------"
echo "$fp"
p=$(echo "$fp"| awk -F "/" '{ print substr($4,1, length($4) - 2) }')
sudo /sbin/multipath -ll| grep -A 1 $p|awk '{ print $1 }'
done;


It goes without saying, you need to customize for your environment and setup. Here it is Linux multipath.

Saturday, March 30, 2013

Data Guard Real time apply check

Physical standby can be in two modes. Real-time apply or Archive log shipping mode.
In real-time apply mode, the LGWR from the primary writes to the standby redo log at the target site.
Comparing archive log sequence numbers is not the correct way to check if the dataguard is operating correctly.
I have written the below script to check if the real time apply is working properly.

This works on the dataguard physical standby database.
It checks two columns (value and Datum_time) in V$dataguard_stats.


#!/bin/bash
################################################################################
## File Name : /adg_lag_check.sh #
## Description : Job to check gap in real time apply for DG #
## Author : Harris #
## Call Syntax : No parameters, writes local log file #
## Last Modified: 28/Mar/2013 By Harris #
################################################################################
export ORACLE_HOME=/u01/app/mydb01/oracle/product/11.2.0.3/db
export ORACLE_SID=mydb01
export PATH=$ORACLE_HOME/bin:$PATH
DATE=`date '+%m%d%Y%H%M%S'`
DBA_GROUP='harris@company.com'
sqlplus -s / as sysdba << EOF > adg_lag.log
set head off
set numf 99999999999
set feedback off
set echo off
set serveroutput on
select CASE WHEN
((extract(second from to_dsinterval(value)) + extract(minute from to_dsinterval(value)) * 60
+ extract(hour from to_dsinterval(value)) *60*60 + extract(day from to_dsinterval(value)) *60*60*24)
> 900) THEN 'Critical Data Guard lag more than 15mins'
WHEN value is null THEN ' Critical Data Broken status'
WHEN (((sysdate - to_date(DATUM_TIME,'MM/DD/YYYY HH24:MI:SS'))*24*60) > 15) THEN 'Critical Data Guard Network broken'
ELSE 'Data Guard OK'
END
from v\$dataguard_stats where name='apply lag';

EOF

if grep -c 'Critical' adg_lag.log; then
cat adg_lag.log | mailx -s "DataGuard Problem" $DBA_GROUP
fi;


Wednesday, March 27, 2013

Redo logs on Physical Standby

Redo logs on Physical Standby Online redo logs on a physical standby database is not really required. For real-time apply , standby redo logs are a must on standby site.

Standby redo logs are optional at the source site, but always advisable to be prepared for switchover or failovers.

If DB_CREATE_ONLINE_LOG_DEST_1 is not set and
DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST is specified,
the redo logs are created multiplexed each under DB_CREATE_FILE_DEST and DB_RECOVERY_FILE_DEST

If you need to move the redo log to another diskgroup or another filesystem on standby database, these are the steps


--cancel the recovery
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE CANCEL;

--change the Standby File Management
ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT='MANUAL';

--Clear the redo logs
ALTER DATABASE CLEAR LOGFILE GROUP 2;

--Drop the redo logs
ALTER DATABASE DROP LOGFILE GROUP 2;

--Add the new logs at the correct locations
ALTER DATABASE ADD LOGFILE THREAD 1 GROUP 2 ('+REDO_DG1','+REDO_DG2') size 1024M;

-- If you need to add standby redo logs, do so...
ALTER DATABASE ADD STANDBY LOGFILE THREAD 1 GROUP 5 ('+REDO_DG','+REDO_DG2') size 1024M;



Put the standby database back into action


ALTER SYSTEM SET STANDBY_FILE_MANAGEMENT='AUTO';

ALTER DATABASE RECOVER MANAGED STANDBY DATABASE USING CURRENT LOGFILE DISCONNECT;

Wednesday, March 20, 2013

EMACS and SQLPLUS

I have recently discovered EMACS thanks to the unfaultering effort by my co-brother Roy. I have only one word for EMACS- fan-"bleedy"-tastic.
Although not very intuitive, it is immensely powerful and extremely great to improve productivity.

I will break this blog into three sections for easier reading.

Section 1 - How does this fit into my work?




Remember the days spent with a SQL in a notepad, and server has no scripts. The day drags on with you running Ctrl+C, Ctrl+V , then right click into the Putty Console.


Now here is how it happens in Emacs.

So start the SQL-Plus buffer in Emacs. This should be inbuilt inside the 24.2 version onwards.

M-x sql-oracle
It should prompt for a userid, password and host name. Hostnames is the TNS names entry in your local tnsnames.ora file.

If you get the below error, like I did
ORA-12557: TNS:protocol adapter not loadable

Set ORACLE_HOME and PATH to same Oracle home(I have two Oracle Homes in my laptop) .Restart EMacs
Try M-x sql-oracle, again
This time hopefully you should get connected.
You are on the command prompt so you try any of the SQLs.

Now the interesting bit of working efficiently.
C-X, C-f

Then enter a existing or new SQL file name.

Your screen should split with the SQL file and the SQLPlus buffer in another.

Use C- to set the mark and move the keys to select the whole SQL

To send the selected SQL into SQLPlus buffer use the C-C C-r

Section 2 - Errors I encountered



Some of the errors I encountered were --

If you get the below error,

No SQL process started

You can try below

M-x sql-set-sqli-buffer
select *SQL* ( should be the default)

That lead me to another error

There is not suitable SQLi buffer

To fix this error, note the bottom of the opened SQL file. The status bar should say SQL[Oracle].

If it says SQL[ANSI], it needs to be changed by doing

M-x sql-set-product
key in "oracle"

Now the "No SQL process started" error can be fixed by using the M-x sql-set-sqli-buffer like above.

Now use C-X C-r to send the SQL from the file to the SQL buffer.

Section 3 - Couple of other hacks.




You must have noticed that pressing the backspace deletes the "SQL>" prompt. To fix this , add this to your .emacs file

(setq comint-prompt-regexp "^SQL>")

(custom-set-variables
'(comint-prompt-read-only t)
'(comint-use-prompt-regexp t))
(custom-set-faces)




PS:

This blog was written in EMACS

Roy's awesome blog on technology can be found here

Thursday, August 16, 2012

DataGuard ReadOnly to Apply and back

Dataguard and Reporting Before the days of Active Dataguard, were the days of read-only reporting off the physical standby. It pains to see data-centers with production like hardware and unused physical standby databases doing nothing but receive redo content once every few hours. Active Dataguard is a boon in that way.

In older versions, reporting environments typically received all the archive logs through the day and then, applied them overnight. The database is available through the day as read only , for querying and taken offline once a day for applying the logs and bringing back online to read-only mode.

Below is a shell script to do exactly that.

Its self explanatory




#!/bin/ksh
################################################################
################################################################
########## AUTHOR          : Harris
########## DATE            : 13 Aug 2012
########## VERSION         : 1.0
########## DESCRIPTION     :This script is used to stop physical
#### standby in readonly mode, apply the logs and bring back
##### online
################################################################
# VERSION |    DATE    |   Author   | DETAIL OF CHANGES
#---------------------------------------------------------------
# V0.1    | 08/13/2012 |Harris Baskaran |Initial Script
################################################################
################################################################
set -x
ORATAB=/var/opt/oracle/oratab
##Entries to update##########
export ORACLE_SID=STANDBYTNS
export CONNECTSTR=dg/dg@PRODTNS
export ENVIRON=DB service name
#####Review before editing parameters below this

export NLS_DATE_FORMAT='Mon DD YYYY HH24:MI:SS'
export ORACLE_HOME=`grep -i "$ORACLE_SID:" $ORATAB | grep -v "^#" | grep -v "^\*"| grep ":" | awk -F":" '{print $2}'`
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/ctx/lib:$ORACLE_HOME/lib:$LD_LIBRARY_PATH

touch error.log
touch shutdown_report.log
touch apply_report.log
touch check_report.log
touch switch2ro.log
touch handshake.log


## Email functionality

mail_it_to_all()
{
#Email the report
SEND_DATE=`date`
RECIPIENTS=dba@company.com
MESSAGE_TITLE="Reporting DB ${ORACLE_SID} on ${ENVIRON} ${SEND_DATE}"
/usr/bin/mailx -s "${MESSAGE_TITLE}" ${RECIPIENTS}  << EOF
${SEND_DATE}
`cat error.log`
`cat shutdown_report.log`
`cat apply_report.log`
`cat check_report.log`
`cat switch2ro.log`
`cat handshake.log`
EOF
}


##Find Sequence of logs shipped
log_shipping_check()
{
sqlplus -s -l / as sysdba  << LOGSHIP
whenever sqlerror exit sql.sqlcode;
whenever oserror exit FAILURE
set heading off
SELECT SEQUENCE# FROM V\$ARCHIVED_LOG WHERE FIRST_TIME = (SELECT MAX(FIRST_TIME) FROM V\$ARCHIVED_LOG GROUP BY THREAD#);
exit
LOGSHIP
}

## function to get Production Sequence
runsqlprod()
{
sqlplus -s -l ${CONNECTSTR}  << PRDCHK
whenever sqlerror exit sql.sqlcode;
whenever oserror exit FAILURE
set heading off
SELECT distinct SEQUENCE# FROM V\$ARCHIVED_LOG WHERE (THREAD#,FIRST_TIME ) IN (SELECT THREAD#,MAX(FIRST_TIME) FROM V\$ARCHIVED_LOG GROUP BY THREAD#);
exit
PRDCHK
}

## function to get Standby Sequence
runsqlstdby()
{
sqlplus -s -l / as sysdba  << STDCHK
whenever sqlerror exit sql.sqlcode;
whenever oserror exit FAILURE
set heading off
select max(sequence#) from V\$ARCHIVED_LOG where applied='YES';
exit
STDCHK
}

shutdown_db()
{
sqlplus -s -l / as sysdba <<E1
whenever sqlerror exit sql.sqlcode;
whenever oserror exit FAILURE
spool shutdown_report.log
select name, open_mode from V\$database;
select username, machine from v\$session where status='ACTIVE';
shutdown immediate;
spool off
exit
E1
}

apply_logs()
{
sqlplus -s -l / as sysdba <<E2
whenever sqlerror exit sql.sqlcode;
whenever oserror exit FAILURE
spool apply_report.log
startup nomount;
alter database mount standby database;
recover managed standby database using current logfile disconnect from session;
spool off
exit
E2
}


#### custom handshake function with whatever reporting job you run
handshake()
{
sqlplus -s -l / as sysdba <<E4
whenever sqlerror exit sql.sqlcode;
whenever oserror exit FAILURE
##Replace with code here
##@handshake.sql
exit
E4
}

switch_to_ro()
{
sqlplus -s -l / as sysdba <<E2
whenever sqlerror exit sql.sqlcode;
whenever oserror exit FAILURE
spool switch2ro.log
recover managed standby database cancel;
alter database open read only;
spool off
exit
E2
}



##Find out if the logs are shipped already, if not no point continuing

PRODSEQ=`runsqlprod`
SHIPPED=`log_shipping_check`

if [ ${SHIPPED} -lt ${PRODSEQ} ]
then
echo "The "${ENVIRON}" Reporting and Source Logs are not shipped through the day. Need DBA intervention" >> error.log
echo "Production sequence is "${PRODSEQ} >> error.log
echo "Standby Sequence shipped is "${SHIPPED} >> error.log
mail_it_to_all()
exit 1
fi




####Shutdown the reporting to allow log application

shutdown_db

if grep -c 'ORA-' shutdown_report.log
then
echo ${ENVIRON}"-Error in bringing down reporting database..."
mail_it_to_all
exit 1
else
echo ${ENVIRON}"-Reporting Shutdown..."
fi


####Apply the archive logs and bring to current

apply_logs

if grep -c 'ORA-' apply_report.log
then
echo ${ENVIRON}"-Error in applying the logs on reporting database..."
mail_it_to_all
exit 1
else
echo ${ENVIRON}"-Reporting Logs Applying..."
fi




####Ensure it is synched with Production



### Sleep 15 minutes 10 times until Standby catches up with Prod site
### Increase freq as required


for i in 1 2 3 4 5 6 7 8 9 10
do
STDBSEQ=`runsqlstdby`
if [ ${STDBSEQ} -lt ${PRODSEQ} ]
then
sleep 60
else
break
fi
done


## If catch up still doesnt happen, send email and die!
if [ ${STDBSEQ} -lt ${PRODSEQ} ]
then
echo "The ${ENVIRON} Reporting and Source could not be synched. Need DBA intervention" >> error.log
echo "Production sequence is ${PRODSEQ} " >> error.log
echo "Standby Sequence is ${STDBSEQ} " >> error.log
mail_it_to_all()
exit 1
EEOF
fi

## Bring to read only

switch_to_ro

####Handshake with Application

#handshake()

if grep -c 'ORA-' handshake.log
then
echo ${ENVIRON}"-Error in handshaking with reporting module..."
mail_it_to_all
exit 1
else
echo ${ENVIRON}"-Reporting handshake done..."
fi

mail_it_to_all
mkdir `date "+%m%d%y"`
/usr/bin/mv *.log `date "+%m%d%y"`\/.

Thursday, January 19, 2012

SCN headroom and CVE-2012-0082

CVE-2012-0082 is a security flaw in Oracle Databases; which manifests in some scenarios which can significantly increase the rate SCN increases. InfoWorld also has documented the problem in details - InfoWorld(http://www.infoworld.com/d/security/fundamental-oracle-flaw-revealed-184163-0)
Oracle’s CPU2012Jan addresses this problem. Oracle Document 1374524.1

To understand the problem it is necessary to define SCN headroom.
The SCN value is an ever increasing value for a database. The threshold for this SCN value is also ever increasing number too. The max limit of SCN is defined by the number of seconds past the year 1988 X 16k, hence as time passes this value also keeps increasing. This is the soft-limit. There is also a hard limit of 218 trillion.

It is also necessary to understand that when a DB link operation between two databases, Oracle automatically selects the higher SCN out of the two databases and syncs them between the two databases.


--On first database
SQL> select name from V$database;

NAME
---------
Database1

SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
398829972857 -- note the SCN number



--On second database
SQL> select name from V$database;

NAME
---------
Database2

SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
391409782850


SQL> select sysdate from dual@Database1.WORLD; -- DB link query

SYSDATE
---------
19-JAN-12



SQL> select dbms_flashback.get_system_change_number from dual;

GET_SYSTEM_CHANGE_NUMBER
------------------------
398829973230 -- notice the significant increase for synchronizing the SCNs




InfoWorld also documents that Oracle Hot backups can significantly increase the scn generation. To find the headroom at any given time use the below query.


select
version,
to_char(SYSDATE,'YYYY/MM/DD HH24:MI:SS') CURR_TIME,
((((
((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) + ((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
(((to_number(to_char(sysdate,'DD'))-1))*24*60*60) + (to_number(to_char(sysdate,'HH24'))*60*60) +
(to_number(to_char(sysdate,'MI'))*60) + (to_number(to_char(sysdate,'SS')))) * (16*1024))
- dbms_flashback.get_system_change_number)/ (16*1024*60*60*24)) scn_headroom
from v$instance;


See Oracle document : ID 1393363.1
Reading through Oracle’ patch if the headroom is less than 62, urgent attention is required. i.e. Apply patch immediately and contact Oracle.
To ensure that the headroom is not being reached rapidly, run the below query.


select max(first_change#)-min(first_change#), to_char(first_time,'YYYY-MM-DD') from V$log_history
group by to_char(first_time,'YYYY-MM-DD')
order by 2 desc;




Sometimes there may be a significant jump in the SCN which can be associated with a DBlink activity. Hot backups can also significantly increase the SCN generation rate as detailed by InfoWorld
If you notice that the SCN rate is very high there is plenty to worry about. If the high SCN rate database is interconnected via DBlink with other databases it can bring down a lot of databases together.
Along with the patch, Oracle have also added a hidden parameter _external_scn_rejection_threshold_hours which needs to be set to 24. There is not much details about this hidden parameter , but it might have something to do with the distributed transaction SCN synch.
As always please ensure you have read and understood the Oracle documentation before applying.