Loading log files into Oracle

by MikeHogg8. March 2012 17:51

One of my last Oracle projects was pretty neat, because I started working with the new 11g feature, external tables.  This allowed Oracle to mount a file as a table, and was incredibly fast compared to using sqlloader, which was what we had been doing for years. 

In this case I was loading unix log files in the order of millions of rows for each daily file, by loading the external table, and then processing that table into our permanent logging table.  The data sets involved here were pretty big, and so usual manipulation like inserts for millions of rows would take hours and hours, so changing from Sql Loader to external tables saved a lot of time, but I still had a lot of inserts to make, so I added some tweaks, like dropping indices and recreating them after, and then updated stats with the new indices for Oracle’s Query Optimizer. 

Once I had the files shared on a network location accessible to this Oracle unix server, I loaded them with this proc:

procedure LoadExtTable(filedate varchar2) is
execute immediate 'create table mdf_meta_activity_dump ( IP_ADDRESS VARCHAR2(255), PID NUMBER,' ||
                      'SYMBOL VARCHAR2(255), USER_ID VARCHAR2(50), APPLICATION VARCHAR2(60),' ||
                      'HOSTNAME VARCHAR2(60), SYMBOL_MESSAGE VARCHAR2(255), SYMBOL_FORMAT VARCHAR2(255),' ||
                      'SCRIPT_NAME VARCHAR2(255), PROCMON_PROCESS VARCHAR2(255), TIME_STAMP DATE )' ||
                      'organization external (type oracle_loader default directory exttabdir access parameters ' ||
                      '(RECORDS DELIMITED BY NEWLINE FIELDS TERMINATED by ''|'' ' ||
                      ' ) LOCATION (''\someplace\somedb\udpserver\udp.txt''));';

I would process the dump with this proc, which also updated two other tables and was written to be re-runnable, so that, in case of failure or just manual mistake, running the same file of millions of rows would not result in a mess of a million duplicates. 

You will also see here Oracle bulk statements, and logging, which allowed someone to monitor the process realtime, as it usually took some minutes or hours.

procedure ProcessActivityDump is
cursor c_log(p_file_date date) is
select s.id, d.user_id, d.symbol_message, d.time_stamp, p_file_date, trunc(d.time_stamp), to_char(d.time_stamp,'M')
 from mdf_meta_symbol s
 join mdf_meta_activity_dump d
 on s.name = d.symbol
  type t_activity is table of c_log%rowtype;
  r_activity t_activity;
  v_count number;
  v_file_date date;
    merge into mdf_meta_proc p
using (select distinct procmon_process, script_name from mdf_meta_activity_dump) d
on (p.procmonjob = d.procmon_process and p.script = d.script_name) 
when not matched then
insert (id, procmonjob, script, active_fg, insert_date, audit_date, audit_user)
values(seq_mdf_id.nextval, procmon_process, script_name, 1, sysdate, sysdate, 'PKG_META');
    Log_This('PKG_META.ProcessActivityDump','MDF_META_PROC new rows inserted: ' || sql%rowcount ,'INFO');
-- SYMBOL, rerunnable
    merge into mdf_meta_symbol s
using (select distinct symbol, p.id from mdf_meta_activity_dump join mdf_meta_proc p on procmon_process = procmonjob and script_name = script) d
on (s.name = d.symbol)
when not matched then
insert(id, name, proc_id) values (seq_mdf_id.nextval, symbol, d.id);
    Log_This('PKG_META.ProcessActivityDump','MDF_META_SYMBOL new rows inserted: ' || sql%rowcount ,'INFO');
select file_date into v_file_date from (
 select trunc(time_stamp) file_date, count(*) 
 from mdf_meta_activity_dump
 group by trunc(time_stamp) 
 order by count(*) desc) where rownum = 1;
-- delete existing activity for this day, to make rerunnable
delete from mdf_meta_activity where file_date = v_file_date;
    Log_This('PKG_META.ProcessActivityDump','Dump_Date: ' || v_file_date || ' rows deleted in preparation for new load: ' || sql%rowcount ,'INFO');
-- now add the activity, logging only every 200k or so
-- maybe need to drop idx and recreate after
open c_log(v_file_date);
    v_count := 0;
fetch c_log bulk collect into r_activity limit 1000;
exit when r_activity.count = 0;
      forall idx in 1..r_activity.count
insert into mdf_meta_activity
values   r_activity(idx);
      v_count := v_count + r_activity.count;
if Mod(v_count, 200000) = 0then
        Log_This('PKG_META.ProcessActivityDump','Cumulative insert now at ' || v_count || ' rows','INFO');
end if;
end loop;
end ProcessActivityDump;

And that’s it.

Oracle aggregation query

by MikeHogg20. January 2012 14:56

One of the processes I set up required a merging of a few different data sources.  Besides our own data store of user input data, we also had our own data store of operational account data, and went to another system to get account billing data, and yet another system for customer contract data.  This query was an aggregation of several queries that, all together, aggregated the exact data we needed for our data warehouse import.  Some of the neat features of this query are the company name partition with replace to account for manual data entry discrepencies…

cursor c_contracts(p_start_date date, p_end_date date, p_snapshot_date date) is
Select seq_some.nextval, c.propid, to_char(p_start_date,'YYYYMM'), 
         PRODUCT_NAME ,
         ACCOUNTS_SOLD ,
case when bui_account.bdm_id is not null then trim(csh_account.bdm)
 when bui_contract.bdm_id is not null then trim(csh_contract.bdm)
 else 'Unknown' end BDM, 
trim(c.account_bdm) account_bdm,
trim(c.contract_bdm) contract_bdm,
          c.maxid  COMPANY_ID,
          substr(1000000000 + nvl(trim(regexp_replace(regexp_replace(c.duns_no,'-|/|[a-z|A-Z]|\.|[[:space:]]',''),'([[:digit:]]{9})[[:digit:]]*','\1')),'999999999'),2,9) DUNS_NO,
          TRUNC(FM_NET_MARGIN,2) FLOW_Margin,
          TRUNC(FM_VOLUME,2)  FLOW_Volume,
          s_billing.bill_type, ----------  BILL TYPES AND OPTIONS
trim(Green_Volume) ,
          1, p_snapshot_date
          , c.OFFER_TYPE
from (  select e.*, max(e.cust_id) over (partition by replace(replace(e.company_name,','),'.')) maxid
                       , NULL OFFER_TYPE
 select null customers_sold, null accounts_sold, 
                       r.MASTERCONTRACT_OUR_NUM propid, r.CONTRACT_STATUS status,
                       r.PRODUCTNAME product_name, r.BDM account_bdm, 'RESI' company_name, 
                       r.BDM contract_bdm, to_char(r.COUNTER_SIGN_DATE,'dd-MON-yy') COUNTER_SIGN_DATE, 
                       '1' cust_id, to_char(r.EST_POWER_FLOW_DATE,'dd-MON-yy') EST_POWER_FLOW_DATE, to_char(r.EST_POWER_END_DATE,'dd-MON-yy') EST_POWER_END_DATE, 
                       r.LEAD_SEGMENT, null mw_Sold, null product_group, r.BUSUNIT_NAME region,
 null sales_agent_fee, null salesfee_coname, null sic_code, r.STATE, r.ZONE, null LEG_MONTH, null LEG_YEAR, null fm_net_margin, 
 null fm_volume, null fm_sales_agent_fee, r.TOTAL_MARGIN origination_netmargin, r.TOTAL_VOLUME origination_volume, 
 null origination_grossmargin, r.TOTAL_CREDITCOST total_credit_cost, null duns_no, null busunit_id, r.ISO, 
 null price_date, null ccm_id, null rec_green_volume, null green_volume, r.UDC_NAME edc_name, r.contract_term contract_term_months, 
                       r.EDC_ACCTNO, r.ZIP servzip, r.ANNUAL_MWH, null contract_type, null non_std_terms_flg, null telesales_flg, null acquisition_type, '1'
                       , r.OFFER_TYPE
                            BUSUNIT_NAME, STATE, ZONE, 
 sum(TOTAL_MARGIN) over (partition by mastercontract_our_num) TOTAL_MARGIN, 
 sum(TOTAL_VOLUME) over (partition by mastercontract_our_num) TOTAL_VOLUME, 
 sum(TOTAL_CREDITCOST) over (partition by mastercontract_our_num) TOTAL_CREDITCOST, 
                            ISO, UDC_NAME, contract_term, 
                            EDC_ACCTNO, ZIP, ANNUAL_MWH
                            , OFFER_TYPE
 from cer_res.residential) r
-------------- somebillingsystem 
left join ( 
 select ag.agree_num, (select brk.name from s_org_ext@sblreadonly brk where brk.row_id = ag.x_broker_2_account_id) broker2, 
                         (select count(q.row_id) from s_doc_quote@sblreadonly q where q.quote_num = qu.quote_num) num_quotes,
                         round(months_between(sysdate,co.cust_since_dt),2) cust_accumulated_term, co.x_sub_type, co.ou_type_cd,
 case when exists (select ai.row_id from somebillingsystem.s_agree_item@sblreadonly ai
 join somebillingsystem.s_org_ext@sblreadonly sa on ai.x_sa_ou_id = sa.row_id
 where ag.row_id = ai.doc_agree_id
 and ai.itm_stat_cd = 'Pending' and sa.cust_stat_cd = 'Enrolled'
 and ag.stat_cd = 'Pending' and ag.x_offer_type = 'Renewal' ) then 1 else 0 end Early_Renewal
 from S_DOC_AGREE@sblreadonly AG
        )somebillingsystem on c.propid = somebillingsystem.agree_num
-------------   BILL OPTIONS
left join ( select ag.agree_num, 
 min(bp.x_group_bill)  billing_option, -- Multi over Single
 min(bp.bill_pref_cd) bill_type -- SOME (Charges) over UDC (Rate) 
 from s_doc_agree@sblreadonly ag
 join s_agree_item@sblreadonly ai on ag.row_id = ai.doc_agree_id
 join s_org_ext@sblreadonly sa on ai.x_sa_ou_id = sa.row_id
 join s_party_rel@sblreadonly pr on sa.row_id = pr.rel_party_id
 and pr.rel_type_cd = 'Bill Group'
 and pr.x_primary_bill_group = 'Y'
 join s_org_ext@sblreadonly bg on pr.party_id = bg.row_id
 join S_INV_PROF@sblreadonly bp on bg.row_id = bp.accnt_id
 group by ag.agree_num
           ) s_billing on s_billing.agree_num = c.propid
---------    AR_BALANCE
left join (select contract_id, /*ar.edc_acctno, */sum(ar.ar_bal_current) ar_bal_current, sum(ar.ar_bal_1_30) ar_bal_1_30 , 
 sum(ar.ar_bal_31_60) ar_bal_31_60, sum(ar.ar_bal_61_90) ar_bal_61_90, 
 sum(ar.ar_bal_91plus) ar_bal_91plus, ar.snapshot_date, round(sum(bad.balance),2) bad_debt_balance
 from (select x.contract_id, x.edc_acctno, sum(b.ar_bal_current) ar_bal_current, sum(b.ar_bal_1_30) ar_bal_1_30 , 
 sum(b.ar_bal_31_60) ar_bal_31_60, sum(b.ar_bal_61_90) ar_bal_61_90, 
 sum(b.ar_bal_91plus) ar_bal_91plus, b.snapshot_date
 from some_uni_cust_arb b
 join some_uni_edc_x_propid x on b.edc_acctno = x.edc_acctno and x.snapshot_date = b.snapshot_date
 where b.snapshot_date = (select max(snapshot_date) from some_uni_cust_arb where snapshot_date <= p_end_date)
 --and x.contractid = '1-220606257'
 group by x.contract_id, x.edc_acctno, b.snapshot_date
                           ) ar
 -- BAD DEBT LOGIC : ave of EDCs, summed up at contract level, so three EDCs in a contract have 24 month avgs of 12, 1250, 650 bad debt show at contract 1912
 join (select b.edc_acctno, avg(b.ar_bal_91plus) balance
 from (select a.edc_acctno,trunc(a.snapshot_date,'MM'),  max(a.snapshot_date) maxdate
 from some_uni_cust_arb a group by a.edc_acctno, trunc(snapshot_date, 'MM')
                                   ) maxdate
 join some_uni_cust_arb b on maxdate.edc_acctno = b.edc_acctno and maxdate.maxdate = b.snapshot_date
 where b.snapshot_date between add_months(p_start_date,-23) and p_end_date
 group by b.edc_acctno
                           ) bad on ar.edc_acctno = bad.edc_acctno
 group by ar.contract_id, ar.snapshot_date
        ) ar on ar.contract_id = c.propid
left join ( 
 select contractid,
 -- for breakout analysis- remove grouping if you run this query alone
 /*min(billgroupoption) over(partition by contractid) billgroupoption, -- M trumps S
                            max(bgbmf) over(partition by contractid) bgbmf, -- in case of multiple just get the highest
                            max(ssomef) over(partition by contractid) ssomef, 
                            q.uidwqitem, q.openedtime ,q.wqtypecode, q.wqqueuecode, q.weight
                            ,sum(nvl(weight,0)) over(partition by contractid) sumweightpercontract*/
                            decode(min(billgroupoption),'M',max(bgbmf), max(ssomef)) + sum(weight) Billing_Weight
 from (
 select con.contractid,
                                    bg.billingmodeflag bgbmf, 
                                    sa.billingmodeflag ssomef, 
 --wq.uidwqitem, wq.openedtime ,wq.wqtypecode, wq.wqqueuecode, 
 from (select contractid, uidaccount
 from (select c.contractid, ci.uidaccount, max(c.contractid) over (partition by ci.uidaccount) maxid
 from (select uidaccount, uidcontract from CUSTOMERSYS.lscmcontractitem@ls_link) ci
 join (select contractid, starttime, stoptime, uidcontract, executetime
 from CUSTOMERSYS.lscmcontract@ls_link) c on ci.uidcontract = c.uidcontract and c.executetime is not null
 where c.starttime < p_end_date and c.stoptime > add_months(p_start_date,-2)
                                            ) where contractid = maxid
                                     ) con
 -- SUBSELECT to pick one(can't tell which is correct) when LS has bad dupes
 left join (select uidaccount, accountid, billingmodeflag
 from (select a.uidaccount, a.accountid, a.udcacctid, a.billingmodeflag,
                                                    a.lstime, max(a.lstime) over (partition by a.udcacctid) maxlstime
 where a.accounttypecode = 'SERVICE_ACCOUNT' ) 
 where maxlstime = lstime
                                     ) sa on sa.uidaccount = con.uidaccount
 LEFT JOIN (select lci.accountid, lci.wqtypecode, lci.wqqueuecode, q.weight--, lci.uidwqitem, lci.openedtime
 join some_work_queue q on lci.WQTYPECODE = q.type and lci.WQQUEUECODE = q.code
 where CLOSEDTIME Between add_months(p_start_date,-2) and p_end_date
                                        ) wqON wq.accountid = sa.accountid
 -- SUBSELECT to pick one(can't tell which is correct) when LS has bad overlap
 left JOIN (select uidaccountfrom, uidaccountto
 from (select l.uidaccountfrom, l.uidaccountto, 
 max(l.uidaccountfrom) over (partition by l.uidaccountto) maxbgacctid
 where l.starttime < p_end_date and l.stoptime > add_months(p_start_date,-2))
 where uidaccountfrom = maxbgacctid
                                    )x ON sa.UIDACCOUNT = x.UIDACCOUNTTO
 and bg.accounttypecode = 'BILLING_GROUP'
 left JOIN (select billgroupoption, uidaccount
 from (select ah.billgroupoption, ah.uidaccount, 
                                                   ah.starttime, min(ah.starttime) over (partition by ah.uidaccount) maxstart
 where starttime = maxstart
                                   )h ON bg.UIDACCOUNT = h.UIDACCOUNT
                            ) q group by contractid
         ) ls on ls.contractid = c.propid
-----------   BDM JOIN
left join main_rpt.SOME_sales_hierarchy_new csh_account on UPPER(c.account_bdm) = UPPER(csh_account.bdm_crm)
left join some_uni_bdm bui_account on csh_account.employee_id = bui_account.bdm_id and bui_account.month = to_char(p_start_date,'YYYYMM')
left join main_rpt.SOME_sales_hierarchy_new csh_contract on UPPER(c.contract_bdm) = UPPER(csh_contract.bdm_crm)
left join some_uni_bdm bui_contract on csh_contract.employee_id = bui_contract.bdm_id and bui_contract.month = to_char(p_start_date,'YYYYMM')
where trim(c.propid) is not null
and c.est_power_end_date >= p_start_date
and c.counter_sign_date  <= p_end_date;
the subselects for AR balances, and the Bad Debt aggregations over different partitions… the Decode for deciding which weights to use in calculations… and picking distinct values from an external system that had inherent 1-3% data anomalies.



Using a Data Sandbox to allow users to push their own data changes to Prod

by MikeHogg14. November 2011 19:03

Most Pl/Sql code is fairly boring, selecting lots of columns from lots of joins and maybe doing calculations and decodes on the fly, or groupings and other aggregations, and sub queries.  But one project I had was pretty neat.  Users had two lists of Products, and although the names and attributes were not usually the same, they were the same, just called different things by different companies. 

When we started using a new Product, we had to find the two matching ones and map them together.  This involved a process that ultimately led to what we called a data fix.  No code was changed, but data that the business process relied on had to be created or changed in the production database.  To do this required an IT resource to team up with a business resource and first make the changes in a test environment, and then test, and make adjustments, and test again, and then do the same exact cumulative change they did in the test environment, in the production environment. 

This was time intensive, and highly manual.  I wanted to make this quick, and automatic, without any room for manual errors.  So one of the features of the new tool I built was for the users to make these changes themselves, in a test environment, with full auditing and editing capability, and be able to see for themselves the effects their changes had, and when they were ready, push a button to have the exact same results applied to their production environment. This worked wonderfully, and worked with full auditing and history tracking in case any recovery was necessary. Besides some finely tuned controls in the new UI, all it required was an Oracle database link to the test environment.


procedure PushToProd(p_audit_user varchar2,
                       p_id         my_product.id%type,
                       p_notevarchar2) is
insert into my_product_history
      (audit_user,  audit_date,  type,
       id,  name, pf_code,  pf_type, 
       cc_code, exch_id,  gs_name,
       pf_commodity, lot_size,  uom_cd, 
       some_lot_sz, term, opt_style,
       product_type,  calendar,  location,
       description_id, trader_group,  settle_date_rule,
       eot_date_rule,  uses_dst, exchange)
      (select p_audit_user,   sysdate, 'PTP',
              nvl(p.id, pp.id),  nvl(p.name, pp.name),  nvl(p.pf_code, pp.pf_code),  nvl(p.pf_type, pp.pf_type),
              nvl(p.cc_code, pp.cc_code),  nvl(p.exch_id, pp.exch_id),  nvl(p.gs_name, pp.gs_name),
              nvl(p.pf_commodity, pp.pf_commodity),  nvl(p.lot_size, pp.lot_size),  nvl(p.uom_cd, p.uom_cd),
              nvl(p.some_lot_sz, pp.some_lot_sz),  nvl(p.term, pp.term),   nvl(p.opt_style, pp.opt_style),
              nvl(p.product_type, pp.product_type),  nvl(p.calendar, pp.calendar),  nvl(p.location, pp.location),
              nvl(p.description_id, pp.description_id),  nvl(p.trader_group, pp.trader_group),  nvl(p.settle_date_rule, pp.settle_date_rule),
              nvl(p.eot_date_rule, pp.eot_date_rule),  nvl(p.uses_dst, pp.uses_dst), nvl(p.exchange, pp.exchange)
from my_product p
full outer join my_product@SOMEW_LINK pp on p.id =
                                                                  pp.id -- handles new 
where p.id = p_id
or pp.id = p_id);
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' new record(s) inserted by User ' ||
                              p_audit_user || ' in my_product_history');
delete from my_product where id = p_id;
delete from some_fee where product_id = p_id;
insert into my_product
      (select * from my_product@SOMEW_LINK where id = p_id);
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' new record(s) inserted by User ' ||
                              p_audit_user || ' in my_product');
insert into some_fee
      (select *
from SOME.some_fee@SOMEW_LINK
where product_id = p_id);
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' new record(s) inserted by User ' ||
                              p_audit_user || ' in some_fee');
when others then
procedure PushMappingToProd(p_audit_user varchar2,
                              p_v_id       my_values.id%type,
                              p_notevarchar2) is
insert into my_values_history
      (audit_user,  audit_date,  type,
       id, key,  value,
       val_type, schema,  rule,  exch_id)
select p_audit_user,  sysdate,  'PTP',
             nvl(v.id, vv.id),  nvl(v.key, vv.key),  nvl(v.value, vv.value),
             nvl(v.val_type, vv.val_type),  nvl(v.schema, vv.schema),
             nvl(v.rule, vv.rule),  nvl(v.exch_id, vv.exch_id)
from my_values v
full outer join my_values@SOMEW_LINK vv on v.id =
where v.id = p_v_id
or vv.id = p_v_id;
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' record(s) inserted by User ' ||
                              p_audit_user || ' in my_values_history');
delete from my_values where id = p_v_id;
insert into my_values
      (select * from my_values@SOMEW_LINK where id = p_v_id);
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' new record(s) inserted by User ' ||
                              p_audit_user || ' in my_value');
when others then


I also put some auditing features in the new UI.  The system relied on the premise that the two databases to start with exact primary key ids and to keep them in sync between the two systems.  I built several auditing screens, showing even the detail for each user’s changes in each environment, so that more thatn one user could see what other users had done.  Here’s one of the auditing tools that showed the user the differences between some of the aggregations of the two systems.  This is the boring sql I was talking about before.  This also works with that oracle database link.

procedure ViewLotDiff(p_valuation_date in me_trade.valuation_date%type,
                        cur_out          out cur) is
-- Added condition v.expiration_date >=  p_valuation_date
open cur_out for
select gg.anothercompany_name,
 when sum(some_w_lots) <>sum(gs_lots) then
 case when sum(some_p_lots) <>sum(gs_lots) then 'BOTH'
 else 'some_W' end
 end as issue, 
 sum(some_w_lots) total_some_w_lots,
 sum(some_p_lots) total_some_p_lots,
 sum(gs_lots) total_gs_lots
from (select some_w_lots,
 from (select round(sum(v.lots),1) some_w_lots,
                             0 some_p_lots,
                             0 gs_lots,
                             'some_W' src,
 from some_.MAP_A_VW v
 where 1 = 1
 and v.VALUATION_DATE = p_valuation_date
 and (v.term <> 'DAY' or
                             v.contract_month >= p_valuation_date)
 and ( ( v.pf_commodity in ('WEA','WTI') and nvl(v.expiration_date,to_date('12/31/9999','mm/dd/yyyy')) >= p_valuation_date )
                             ( v.pf_commodity not in ('WEA','WTI') and nvl(v.settle_date,to_date('12/31/9999','mm/dd/yyyy')) >= trunc(to_date(p_valuation_date),'MM')))
 group by v.anothercompany_NAME, v.PF_CODE )
 union (select 0 some_w_lots,
                           round(sum(d.lots),1) some_p_lots,
                           0 gs_lots,
                           'some_P' src,
                           p.GS_NAME anothercompany_NAME
 from some_.me_trade_detail@some_P_LINK d
 join some_.me_trade_header@some_P_LINK h on d.eti = h.eti
 join some_.me_product@some_P_LINK p on d.pf_code = p.pf_code and d.pf_type = p.pf_type and h.exchange = p.exchange
 where 1 = 1
 and d.VALUATION_DATE = p_valuation_date
 and nvl(d.some__status,'nvl') <> 'ORI'
 and (h.term <> 'DAY' or
                           d.contract_month >= p_valuation_date)
 and ( ( p.pf_commodity in ('WEA','WTI') and nvl(d.expiration_date,to_date('12/31/9999','mm/dd/yyyy')) >= p_valuation_date  )
                             ( p.pf_commodity not in ('WEA','WTI') and nvl(d.settle_date,to_date('12/31/9999','mm/dd/yyyy')) >= trunc(to_date(p_valuation_date),'MM')))
 group by p.GS_NAME, d.pf_code )
 union (select 0 some_w_lots,
                           0 some_p_lots,
 sum(g.net_qty) gs_lots,
                           'GS' src,
                           '' pf_code,
                           g.prod_description anothercompany_name
 from gsd_open_positions g
 where 1 = 1
 and g.stmt_date = p_valuation_date
 and g.expiry_date >= p_valuation_date
 group by g.prod_description ) 
              ) gg
group by anothercompany_name
having ( (sum(some_w_lots) - sum(some_p_lots) <> 0) or (sum(gs_lots) - sum(some_p_lots) <> 0) or (sum(gs_lots) - sum(some_w_lots) <> 0) )
order by anothercompany_NAME;

Creating large tables in Oracle with dynamic partitions to allow for instant reads

by MikeHogg29. September 2011 14:55

A lot of my work in Oracle was with tables between a million rows and a billion, and so indexing and query performance was always part of writing the system.  In this case I knew my table was going to grow a quarter to a half million rows a day, and so to retain immediate selects, I needed partitions.  While it is easy to build a table with the initial partitions, you have to maintain the scheme with a nightly or a weekly job.  Here is the initial table creation:

createtable SOME_CONTRACT
  ID                             NUMBER not null,
  CONTRACT_IDVARCHAR2(150) not null,
MONTH                          NUMBER not null,
  <snip some hundred columns>
  ACTIVE_FG                      NUMBER,
tablespace SOME_TBS01
  pctfree 10
  initrans 1
  maxtrans 255
    initial 64K
    minextents 1
    maxextents unlimited
partition by range (snapshot_date)( 
partition P20110101 values less than (TO_DATE(' 2011-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110201 values less than (TO_DATE(' 2011-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110301 values less than (TO_DATE(' 2011-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110401 values less than (TO_DATE(' 2011-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110501 values less than (TO_DATE(' 2011-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110601 values less than (TO_DATE(' 2011-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110701 values less than (TO_DATE(' 2011-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110801 values less than (TO_DATE(' 2011-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110901 values less than (TO_DATE(' 2011-09-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110902 values less than (TO_DATE(' 2011-09-03 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110905 values less than (TO_DATE(' 2011-09-06 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110906 values less than (TO_DATE(' 2011-09-07 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110907 values less than (TO_DATE(' 2011-09-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110908 values less than (TO_DATE(' 2011-09-09 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110909 values less than (TO_DATE(' 2011-09-10 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110912 values less than (TO_DATE(' 2011-09-13 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110913 values less than (TO_DATE(' 2011-09-14 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110914 values less than (TO_DATE(' 2011-09-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110915 values less than (TO_DATE(' 2011-09-16 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition PMAX values less than (TO_DATE(' 2099-12-31 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) 
-- Create/Recreate primary, unique and foreign key constraints 
alter table SOME_CONTRACT
add constraint PK_SOMECONTRACTS_ID primary key (ID)
using index
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
    initial 64K
    minextents 1
    maxextents unlimited
-- Create/Recreate indexes 
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
    initial 64K
    minextents 1
    maxextents unlimited

So new rows get inserted into PMAX each day in a daily process.  Partitions in Oracle are like separate tables, and in our case, most of our select queries will always be on snapshot_date, so it is like we are querying a single table.  We can also add indices if we want.  I think in my case here I was using an active flag and so I added a composite index with the PK and the column Active_fg.

To maintain this scheme, you have to periodically add new partitions, and split up PMAX.  I did that with a scheduled job, that ran a Stored Proc each night.

procedure Maintain_Partitions is
cursor c_nightly_deletes is
 select sdate
 from (select distinct snapshot_date sdate, count(distinct snapshot_date) over (partition by month) total
 from SOME_contract_nightly
 where month = to_char(  sysdate  ,'YYYYMM') 
 order by snapshot_date ) 
 where rownum < total - 3;-- keep three latest in any given month
cursor c_compress_ddl (p_table_name varchar2 ) is
select 'alter table ' || p_table_name || ' move partition ' || partition_name || ' compress' ddl, partition_name
from user_tab_partitions
where table_name = p_table_name
 and partition_name not like '%MAX%'
 and (compression <> 'ENABLED'
 or -- recompress NIGHTLYs since those older than 3 days are being deleted
                 table_name = 'SOME_CONTRACT_NIGHTLY' and partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%'
cursor c_truncate_ddl ( p_table varchar2  ) is
select 'alter table ' || table_name || ' truncate partition ' || partition_name || ' update global indexes' as ddl, partition_name
from user_tab_partitions
where table_name = p_table
 and partition_name not like '%MAX%'
 and to_date( trim( substr(partition_name, 2) ), 'YYYYMMDD' )   <= (add_months(sysdate,-24));
for r_nightly_deletes in c_nightly_deletes loop
delete from SOME_contract_nightly
where snapshot_date = r_nightly_deletes.sdate;
      Log_This('PKG_SOME_UI.Maintain_Partitions','SOME_CONTRACT_NIGHTLY snapshot_date' ||
        to_char(r_nightly_deletes.sdate,'DD-MON-YY') || ' deleted: ' || sql%rowcount);
end loop;
for each_table in (select distinct table_name from user_tab_partitions where table_name like 'SOME%') loop
-- create new partitions
      SplitWeekdayPartitions( each_table.TABLE_NAME);
-- compress (today's) data
for r_compress_ddl in c_compress_ddl ( each_table.table_name ) loop
execute immediate r_compress_ddl.ddl;
        Log_This('PKG_SOME_UI.Maintain_Partitions',each_table.table_name || ' partition compressed: ' || r_compress_ddl.partition_name);
end loop;
-- remove any partitions older than two years
for r_truncate_ddl in c_truncate_ddl ( each_table.table_name  ) loop
 execute immediate r_truncate_ddl.ddl;
          Log_This('PKG_SOME_UI.Maintain_Partitions', each_table.table_name || ' Partition Truncated: ' || r_truncate_ddl.partition_name);
end loop;
end loop;
for x in (  select t.owner, t.table_name, p.partition_name
 from all_tables t
 join all_tab_partitions p
 on t.table_name = p.table_name
where p.table_name like 'SOME%'
 and (p.last_analyzed is null
                  p.table_name = 'SOME_CONTRACT_NIGHTLY'
 and p.partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%' ) )
                      degree=> dbms_stats.AUTO_DEGREE,
                      ownname => x.owner,
                      tabname => x.table_name,
                      partname => x.partition_name,
 cascade => dbms_stats.AUTO_CASCADE );
              Log_This('PKG_SOME_UI.Maintain_Partitions','Statistics gathered for table.partition: ' || 
                                    x.table_name || '.' || x.partition_name);
end loop;
exception when others then
      Rebuild_Indices(); -- do not leave in unuseable state
      raise; -- notify user to investigate and rerun
end Maintain_Partitions;


In this particular case, I was deleting old partitions, and using compression to keep the total database size down.   The SplitWeekdayPartitions created the new ones:

PROCEDURE SplitWeekdayPartitions( p_table_name in varchar2 ) AS
  v_latest_date date;
  v_new_date date;
  v_part_name varchar2(8);
  v_upper_bound varchar2(8);
SELECT to_date(substr(MAX(partition_name) KEEP (DENSE_RANK FIRST ORDER BY partition_position desc ), 2),'YYYYMMDD')
into v_latest_date
FROM (SELECT partition_name, partition_position
 FROM user_tab_partitions
 WHERE table_name = upper(p_table_name) 
 AND partition_name <> 'PMAX' );
if v_latest_date < sysdate
            v_new_date := v_latest_date + 1;
 while to_char(v_new_date, 'd') in (1,7) loop -- skip weekend dates
                v_new_date := v_new_date + 1;
 end loop;
            v_part_name := to_char(v_new_date,'YYYYMMDD');
            v_upper_bound := to_char(v_new_date + 1,'YYYYMMDD');
 execute immediate
            'ALTER TABLE ' || p_table_name || ' SPLIT PARTITION PMAX AT (to_date(' || v_upper_bound ||
                ',''YYYYMMDD'')) INTO ' || '( PARTITION P' || v_part_name || 
            Log_This('PKG_SOME_UI.SplitWeekdayPartitions', 'New partition P' || v_part_name || ' created for ' || p_table_name);
            SplitWeekdayPartitions( p_table_name );-- do we need to catch up to today?
end if;
END SplitWeekdayPartitions;

It’s interesting that I picked a recursive call here instead of a while loop.

… and in the MaintainPartitions proc, RebuildIndices was necessary afterwards because the alter table commands and truncating didn’t trigger index updates.

procedure Rebuild_Indices is
-- LOCAL INDICES   (not using any now but maybe in the future...)
 execute immediate
                          'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD PARTITION ' || x.PARTITION_NAME ;
            Log_This('PKG_SOME_UI.Maintain_Partitions','Partition Index Rebuilt: ' || x.index_name);
end loop;
 WHERE table_name like 'SOME%' and STATUS not in ( 'VALID', 'N/A' ) )
 execute immediate
                        'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD ' ;
             Log_This('PKG_SOME_UI.Maintain_Partitions','Global Index Rebuilt: ' || x.index_name);
end loop;
and that’s it.  Neat feature to plug into your next large table.

Mapping patterns in Oracle

by MikeHogg28. June 2011 19:38

In my research, Mapping patterns came in three forms and they varied by levels of complexity.  The simplest form was a mapping table that most people recognize as a many to many join.  This is easy to use, and easy to understand, but doesn’t allow dynamic mapping.  You have to have both entities primary keys and then you have to do an insert to map the relationship.  This won’t work on a data store that is constantly receiving new data.

The second form allows dynamic mapping based on attributes.  You can set up a process that automatically inserts the rows or assigned the mappings according to each entity’s attribute value.  Then you could set up a process where every new entity whose X = Y could be automatically mapped to entity A.  This requires columns for each attribute though, and didn’t allow multiple entities.

The third form of mapping table, involves dynamic attributes, dynamic tables, and allows the most flexibility, but is the most difficult to implement and use, which you realize just as soon as you start writing your procs against it.

I happened to inherit a system where the original consultant set up the third form, and the following generations of developers built a variety of complicated tools and complex business processes to maintain it.  While it took me a while to understand it, once I did, and realized its strengths, I also realized a simple UI could abstract away the complexity and make it easy to use.

Behind the UI though had to be the data layer, and while the UI was simple looking, the data layer was where all the complexity had to be simplified.  Here is one of the queries I wrote to run a mapping function on the fly, for one particular entity, having at least three but no more than four different mapping attributes.  This is one of the few times I had to use dynamic SQL in Oracle, whose PL/SQL allowed for great flexibility but would not help me in this case:


procedure MapUnmappedparts(p_certain_date my_part.certain_date%type) is
cursor c_mappings is
select  v.id, v.value, v.schema, v.exch_id,
            firstkey.attr_name firstname, regexp_substr(v.key, '[^()]+',1,2) firstvalue,
            secondkey.attr_name secondname, regexp_substr(v.key, '[^=()]+',1,4) secondvalue,
            thirdkey.attr_name thirdname, regexp_substr(v.key, '[^=()]+',1,6) thirdvalue,
            fourthkey.attr_name fourthname, regexp_substr(v.key, '[^=()]+',1,8) fourthvalue,
            fifthkey.attr_name fifthname, regexp_substr(v.key, '[^=()]+',1,10) fifthvalue,
            sixthkey.attr_name sixthname, regexp_substr(v.key, '[^=()]+',1,12) sixthvalue,
            seventhkey.attr_name seventhname, regexp_substr(v.key, '[^=()]+',1,14) seventhvalue
from my_values v
join my_attributes firstkey on firstkey.short_name = regexp_substr(v.key, '[^=]+',1,1)
join my_attributes secondkey on secondkey.short_name = regexp_substr(v.key, '[^=()]+',1,3)
join my_attributes thirdkey on thirdkey.short_name = regexp_substr(v.key, '[^=()]+',1,5)
left outer join my_attributes fourthkey on fourthkey.short_name = regexp_substr(v.key, '[^=()]+',1,7)
left outer join my_attributes fifthkey on fifthkey.short_name = regexp_substr(v.key, '[^=()]+',1,9) 
left outer join my_attributes sixthkey on sixthkey.short_name = regexp_substr(v.key, '[^=()]+',1,11) 
left outer join my_attributes seventhkey on seventhkey.short_name = regexp_substr(v.key, '[^=()]+',1,13) 
join my_schemas s on s.sch_id = v.schema and s.key_order = 1
order by s.eval_order, s.sch_id;
    v_query varchar2(2000);
    Pkg_somy__Log.writeMessage(p_module   => 'Pkg_ProductMapper.MapUnmappedparts',
                              p_severity => 'DEBUG',
                              p_message  =>  'Started');
for r_mapping in c_mappings loop
      v_query := 'merge into my_part_Detail d
 using (select dd.id, h.co, h.d_type, dd.curve, dd.delivery_point, 
                                dd.curve_2, dd.commodity, h.part_start_month, h.part_end_month, 
                                dd.term_of_trading_cd, h.price_src_1, h.price_src_2, h.term,
                                dd.vintage, dd.adj_dm_year, h.product_code, dd.region, dd.technology
 from my_part_detail dd join my_part_header h on dd.eti = h.eti
 where dd.certain_date = :1
 and dd.pf_code is null
 and nvl(dd.somy__status,''nvl'') <> ''ORI''
                           ) unmapped on (unmapped.id = d.id)
 when matched then update set d.pf_code = :2
 where 1=1 ';
      v_query := v_query || ' and unmapped.' || r_mapping.firstname || ' = :3 ';
      v_query := v_query || ' and unmapped.' || r_mapping.secondname || ' = :4 ';
      v_query := v_query || ' and unmapped.' || r_mapping.thirdname || ' = :5 ';
if r_mapping.fourthname is not null
then      v_query := v_query || ' and unmapped.' || r_mapping.fourthname || ' = :6 ';end if;
if r_mapping.fifthname is not null
then      v_query := v_query || ' and unmapped.' || r_mapping.fifthname || ' = :7 ';end if;
if r_mapping.sixthname is not null
then      v_query := v_query || ' and unmapped.' || r_mapping.sixthname || ' = :8 ';end if;
if r_mapping.seventhname is not null
then      v_query := v_query || ' and unmapped.' || r_mapping.seventhname || ' = :9 ';end if;
when r_mapping.seventhname is not null then
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue, r_mapping.sixthvalue, r_mapping.seventhvalue ;
when r_mapping.sixthname is not null then
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue, r_mapping.sixthvalue ;
when r_mapping.fifthname is not null then
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue ;
when r_mapping.fourthname is not null then
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue ;
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue ;
end case;
end loop;



About Mike Hogg

Mike Hogg is a c# developer in Brooklyn.

More Here

Favorite Books

This book had the most influence on my coding style. It drastically changed the way I write code and turned me on to test driven development even if I don't always use it. It made me write clearer, functional-style code using more principles such as DRY, encapsulation, single responsibility, and more.amazon.com

This book opened my eyes to a methodical and systematic approach to upgrading legacy codebases step by step. Incrementally transforming code blocks into testable code before making improvements. amazon.com

More Here