Loading log files into Oracle

by MikeHogg, 8 March 2012 17:51

One of my last Oracle projects was pretty neat, because I got to work with external tables for the first time, on our 11g database. They let Oracle mount a flat file as a table, and they were incredibly fast compared to SQL*Loader, which was what we had been using for years.

In this case I was loading Unix log files on the order of millions of rows per daily file, by loading the external table and then processing that table into our permanent logging table. The data sets were big enough that ordinary inserts of millions of rows would take hours and hours, so changing from SQL*Loader to external tables saved a lot of time. I still had a lot of inserts to make, though, so I added some tweaks, like dropping indexes and recreating them afterwards, then updating statistics on the new indexes for Oracle's query optimizer.
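Those two tweaks show up later as the RebuildIndices and GatherStats calls at the end of the processing proc. Their bodies aren't in the post, so here is a minimal sketch of what they might look like, assuming the MDF_META_ACTIVITY table and the IDX_MDF_META_ACT_SYMID index named in the code comments below, and assuming everything lives in the current schema:

procedure RebuildIndices is
begin
  -- rebuild the activity index in one pass instead of maintaining it row by row;
  -- pairs with marking it unusable (or dropping it) before the bulk insert
  execute immediate 'alter index idx_mdf_meta_act_symid rebuild';
end RebuildIndices;

procedure GatherStats is
begin
  -- refresh optimizer statistics so the new data and indexes are costed correctly
  dbms_stats.gather_table_stats(
      ownname => user,                  -- assumes the objects are in the current schema
      tabname => 'MDF_META_ACTIVITY',
      cascade => true);                 -- gather index stats too
end GatherStats;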

Once I had the files shared on a network location accessible to the Oracle Unix server, I loaded them with this proc:

procedure LoadExtTable(filedate varchar2) is
begin
  -- mount the day's log file as an external table; the dynamic DDL string must
  -- not end in a semicolon, and LOCATION is resolved against the exttabdir
  -- directory object. filedate could drive the file name; the location here
  -- points at a fixed placeholder file.
  execute immediate 'create table mdf_meta_activity_dump ( IP_ADDRESS VARCHAR2(255), PID NUMBER,' ||
                    ' SYMBOL VARCHAR2(255), USER_ID VARCHAR2(50), APPLICATION VARCHAR2(60),' ||
                    ' HOSTNAME VARCHAR2(60), SYMBOL_MESSAGE VARCHAR2(255), SYMBOL_FORMAT VARCHAR2(255),' ||
                    ' SCRIPT_NAME VARCHAR2(255), PROCMON_PROCESS VARCHAR2(255), TIME_STAMP DATE )' ||
                    ' organization external (type oracle_loader default directory exttabdir' ||
                    ' access parameters (records delimited by newline fields terminated by ''|'')' ||
                    ' location (''\someplace\somedb\udpserver\udp.txt''))';
end;
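One thing to note: the proc issues a bare CREATE TABLE, so a rerun needs the previous dump table dropped first. Here's a minimal sketch of how a daily run might wrap it, assuming LoadExtTable lives in the PKG_META package named in the logging calls below; the drop-and-swallow block and the date format are my assumptions:

begin
  -- drop the old dump table if it exists; -942 is "table or view does not exist"
  begin
    execute immediate 'drop table mdf_meta_activity_dump';
  exception
    when others then
      if sqlcode != -942 then raise; end if;
  end;

  pkg_meta.LoadExtTable('20120308');  -- hypothetical date format

  -- the external table reads the file at query time, so counting it
  -- forces a full read of the file and makes a decent smoke test
  for r in (select count(*) cnt from mdf_meta_activity_dump) loop
    dbms_output.put_line('rows in dump file: ' || r.cnt);
  end loop;
end;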

I would process the dump with this proc, which also updated two other tables. It was written to be re-runnable, so that in case of failure, or just a manual mistake, running the same file of millions of rows again would not leave a mess of a million duplicates.

You will also see Oracle bulk statements here, and logging, which let someone monitor the process in real time, since it usually took minutes or even hours.

procedure ProcessActivityDump is
  -- join the dump to the symbol table; the expression columns carry aliases
  -- so they map cleanly onto fields of the c_log%rowtype records below
  cursor c_log(p_file_date date) is
    select s.id, d.user_id, d.symbol_message, d.time_stamp,
           p_file_date file_date,
           trunc(d.time_stamp) day_stamp,
           to_char(d.time_stamp, 'MM') month_num
      from mdf_meta_symbol s
      join mdf_meta_activity_dump d on s.name = d.symbol;

  type t_activity is table of c_log%rowtype;
  r_activity  t_activity;
  v_count     number;
  v_file_date date;
begin
  -- PROCS: add any process/script pairs not seen before
  merge into mdf_meta_proc p
  using (select distinct procmon_process, script_name
           from mdf_meta_activity_dump) d
  on (p.procmonjob = d.procmon_process and p.script = d.script_name)
  when not matched then insert
    (id, procmonjob, script, active_fg, insert_date, audit_date, audit_user)
    values (seq_mdf_id.nextval, d.procmon_process, d.script_name, 1, sysdate, sysdate, 'PKG_META');
  Log_This('PKG_META.ProcessActivityDump',
           'MDF_META_PROC new rows inserted: ' || sql%rowcount, 'INFO');
  -- SYMBOL, rerunnable: only symbols that don't already exist get inserted
  merge into mdf_meta_symbol s
  using (select distinct symbol, p.id
           from mdf_meta_activity_dump
           join mdf_meta_proc p
             on procmon_process = procmonjob and script_name = script) d
  on (s.name = d.symbol)
  when not matched then insert (id, name, proc_id)
    values (seq_mdf_id.nextval, d.symbol, d.id);
  Log_This('PKG_META.ProcessActivityDump',
           'MDF_META_SYMBOL new rows inserted: ' || sql%rowcount, 'INFO');
  -- ACTIVITY: the file date is the day with the most rows in the dump
  select file_date
    into v_file_date
    from (select trunc(time_stamp) file_date, count(*)
            from mdf_meta_activity_dump
           group by trunc(time_stamp)
           order by count(*) desc)
   where rownum = 1;
 
  -- delete existing activity for this day, to make rerunnable
  delete from mdf_meta_activity where file_date = v_file_date;
  Log_This('PKG_META.ProcessActivityDump',
           'Dump_Date: ' || v_file_date || ' rows deleted in preparation for new load: ' || sql%rowcount,
           'INFO');
  -- now add the activity in bulk, logging progress every 200k rows
  -- maybe need to drop idx and recreate after:
  -- create index IDX_MDF_META_ACT_SYMID on MDF_META_ACTIVITY (SYMBOL_ID)
  open c_log(v_file_date);
  v_count := 0;
  loop
    fetch c_log bulk collect into r_activity limit 1000;
    exit when r_activity.count = 0;

    -- bind the whole batch into the permanent table in one statement
    forall idx in 1 .. r_activity.count
      insert into mdf_meta_activity values r_activity(idx);

    v_count := v_count + r_activity.count;
    if mod(v_count, 200000) = 0 then
      Log_This('PKG_META.ProcessActivityDump',
               'Cumulative insert now at ' || v_count || ' rows', 'INFO');
    end if;
  end loop;
  close c_log;

  RebuildIndices;
  GatherStats;
end ProcessActivityDump;
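Log_This itself isn't shown here. For the realtime monitoring to work, the log rows have to be visible while the load is still running, which points to an autonomous transaction. Below is a minimal sketch along those lines; the mdf_meta_log table and its columns are hypothetical, since the real ones aren't in the post:

procedure Log_This(p_source varchar2, p_message varchar2, p_level varchar2) is
  pragma autonomous_transaction;  -- commits independently of the caller's transaction
begin
  -- mdf_meta_log and its column names are hypothetical
  insert into mdf_meta_log (log_date, source, message, log_level)
  values (sysdate, p_source, p_message, p_level);
  commit;  -- an autonomous transaction must commit (or roll back) before returning
end Log_This;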

And that’s it.
