Loading log files into Oracle

by MikeHogg 8. March 2012 17:51

One of my last Oracle projects was pretty neat, because I started working with the new 11g feature, external tables.  This allowed Oracle to mount a file as a table, and was incredibly fast compared to using sqlloader, which was what we had been doing for years. 

In this case I was loading unix log files in the order of millions of rows for each daily file, by loading the external table, and then processing that table into our permanent logging table.  The data sets involved here were pretty big, and so usual manipulation like inserts for millions of rows would take hours and hours, so changing from Sql Loader to external tables saved a lot of time, but I still had a lot of inserts to make, so I added some tweaks, like dropping indices and recreating them after, and then updated stats with the new indices for Oracle’s Query Optimizer. 

Once I had the files shared on a network location accessible to this Oracle unix server, I loaded them with this proc:

  procedure LoadExtTable(filedate varchar2) is 
    execute immediate 'create table mdf_meta_activity_dump ( IP_ADDRESS VARCHAR2(255), PID NUMBER,' ||
                      'SYMBOL VARCHAR2(255), USER_ID VARCHAR2(50), APPLICATION VARCHAR2(60),' ||
                      'HOSTNAME VARCHAR2(60), SYMBOL_MESSAGE VARCHAR2(255), SYMBOL_FORMAT VARCHAR2(255),' ||
                      'SCRIPT_NAME VARCHAR2(255), PROCMON_PROCESS VARCHAR2(255), TIME_STAMP DATE )' ||
                      'organization external (type oracle_loader default directory exttabdir access parameters ' ||
                      '(RECORDS DELIMITED BY NEWLINE FIELDS TERMINATED by ''|'' ' ||
                      ' ) LOCATION (''\someplace\somedb\udpserver\udp.txt''));';

I would process the dump with this proc, which also updated two other tables and was written to be re-runnable, so that, in case of failure or just manual mistake, running the same file of millions of rows would not result in a mess of a million duplicates. 

You will also see here Oracle bulk statements, and logging, which allowed someone to monitor the process realtime, as it usually took some minutes or hours.

  procedure ProcessActivityDump is
    cursor c_log(p_file_date date) is 
           select s.id, d.user_id, d.symbol_message, d.time_stamp, p_file_date, trunc(d.time_stamp), to_char(d.time_stamp,'M')
            from mdf_meta_symbol s
            join mdf_meta_activity_dump d
              on s.name = d.symbol
  type t_activity is table of c_log%rowtype;
  r_activity t_activity;
  v_count number; 
  v_file_date date;
    -- PROCS
    merge into mdf_meta_proc p
    using (select distinct procmon_process, script_name from mdf_meta_activity_dump) d
    on (p.procmonjob = d.procmon_process and p.script = d.script_name)    
    when not matched then 
      insert (id, procmonjob, script, active_fg, insert_date, audit_date, audit_user)
      values(seq_mdf_id.nextval, procmon_process, script_name, 1, sysdate, sysdate, 'PKG_META');
    Log_This('PKG_META.ProcessActivityDump','MDF_META_PROC new rows inserted: ' || sql%rowcount ,'INFO');
    -- SYMBOL, rerunnable
    merge into mdf_meta_symbol s
    using (select distinct symbol, p.id from mdf_meta_activity_dump join mdf_meta_proc p on procmon_process = procmonjob and script_name = script) d
    on (s.name = d.symbol)
    when not matched then 
      insert(id, name, proc_id) values (seq_mdf_id.nextval, symbol, d.id);
    Log_This('PKG_META.ProcessActivityDump','MDF_META_SYMBOL new rows inserted: ' || sql%rowcount ,'INFO');    
    select file_date into v_file_date from (
                     select trunc(time_stamp) file_date, count(*) 
                       from mdf_meta_activity_dump 
                      group by trunc(time_stamp) 
                      order by count(*) desc) where rownum = 1;
    -- delete existing activity for this day, to make rerunnable   
    delete from mdf_meta_activity where file_date = v_file_date; 
    Log_This('PKG_META.ProcessActivityDump','Dump_Date: ' || v_file_date || ' rows deleted in preparation for new load: ' || sql%rowcount ,'INFO');
    -- now add the activity, logging only every 200k or so
    -- maybe need to drop idx and recreate after
    open c_log(v_file_date);    
    v_count := 0;
    fetch c_log bulk collect into r_activity limit 1000;
    exit when r_activity.count = 0;
      forall idx in 1..r_activity.count
        insert into mdf_meta_activity
        values   r_activity(idx);
      v_count := v_count + r_activity.count;
      if Mod(v_count, 200000) = 0  then
        Log_This('PKG_META.ProcessActivityDump','Cumulative insert now at ' || v_count || ' rows','INFO');
      end if;
    end loop; 
  end ProcessActivityDump;

And that’s it.


Oracle | Automation

Oracle aggregation query

by MikeHogg 20. January 2012 14:56

One of the processes I set up required a merging of a few different data sources.  Besides our own data store of user input data, we also had our own data store of operational account data, and went to another system to get account billing data, and yet another system for customer contract data.  This query was an aggregation of several queries that, all together, aggregated the exact data we needed for our data warehouse import.  Some of the neat features of this query are the company name partition with replace to account for manual data entry discrepencies…

  cursor c_contracts(p_start_date date, p_end_date date, p_snapshot_date date) is
  Select seq_some.nextval, c.propid, to_char(p_start_date,'YYYYMM'), 
         PRODUCT_NAME ,
         ACCOUNTS_SOLD ,
          case when bui_account.bdm_id is not null then trim(csh_account.bdm)
               when bui_contract.bdm_id is not null then trim(csh_contract.bdm)
               else 'Unknown' end BDM, 
          trim(c.account_bdm) account_bdm,
          trim(c.contract_bdm) contract_bdm,
          c.maxid  COMPANY_ID,
          substr(1000000000 + nvl(trim(regexp_replace(regexp_replace(c.duns_no,'-|/|[a-z|A-Z]|\.|[[:space:]]',''),'([[:digit:]]{9})[[:digit:]]*','\1')),'999999999'),2,9) DUNS_NO,
          TRUNC(FM_NET_MARGIN,2) FLOW_Margin,
          TRUNC(FM_VOLUME,2)  FLOW_Volume,
          s_billing.bill_type,                    ----------  BILL TYPES AND OPTIONS
          trim(Green_Volume) ,
          1, p_snapshot_date 
          , c.OFFER_TYPE
        from (  select e.*, max(e.cust_id) over (partition by replace(replace(e.company_name,','),'.')) maxid
                       , NULL OFFER_TYPE
                  from some_UNI_CONTRACT_SOME_EXTRACT e 
                select null customers_sold, null accounts_sold, 
                       r.MASTERCONTRACT_OUR_NUM propid, r.CONTRACT_STATUS status,
                       r.PRODUCTNAME product_name, r.BDM account_bdm, 'RESI' company_name, 
                       r.BDM contract_bdm, to_char(r.COUNTER_SIGN_DATE,'dd-MON-yy') COUNTER_SIGN_DATE, 
                       '1' cust_id, to_char(r.EST_POWER_FLOW_DATE,'dd-MON-yy') EST_POWER_FLOW_DATE, to_char(r.EST_POWER_END_DATE,'dd-MON-yy') EST_POWER_END_DATE, 
                       r.LEAD_SEGMENT, null mw_Sold, null product_group, r.BUSUNIT_NAME region,
                       null sales_agent_fee, null salesfee_coname, null sic_code, r.STATE, r.ZONE, null LEG_MONTH, null LEG_YEAR, null fm_net_margin, 
                       null fm_volume, null fm_sales_agent_fee, r.TOTAL_MARGIN origination_netmargin, r.TOTAL_VOLUME origination_volume, 
                       null origination_grossmargin, r.TOTAL_CREDITCOST total_credit_cost, null duns_no, null busunit_id, r.ISO, 
                       null price_date, null ccm_id, null rec_green_volume, null green_volume, r.UDC_NAME edc_name, r.contract_term contract_term_months, 
                       r.EDC_ACCTNO, r.ZIP servzip, r.ANNUAL_MWH, null contract_type, null non_std_terms_flg, null telesales_flg, null acquisition_type, '1'
                       , r.OFFER_TYPE
                from (select distinct MASTERCONTRACT_OUR_NUM, CONTRACT_STATUS, PRODUCTNAME, BDM,  
                            BUSUNIT_NAME, STATE, ZONE, 
                            sum(TOTAL_MARGIN) over (partition by mastercontract_our_num) TOTAL_MARGIN, 
                            sum(TOTAL_VOLUME) over (partition by mastercontract_our_num) TOTAL_VOLUME, 
                            sum(TOTAL_CREDITCOST) over (partition by mastercontract_our_num) TOTAL_CREDITCOST, 
                            ISO, UDC_NAME, contract_term, 
                            EDC_ACCTNO, ZIP, ANNUAL_MWH 
                            , OFFER_TYPE
                       from cer_res.residential) r
        -------------- somebillingsystem 
        left join ( 
                   select ag.agree_num, (select brk.name from s_org_ext@sblreadonly brk where brk.row_id = ag.x_broker_2_account_id) broker2, 
                         (select count(q.row_id) from s_doc_quote@sblreadonly q where q.quote_num = qu.quote_num) num_quotes,
                         round(months_between(sysdate,co.cust_since_dt),2) cust_accumulated_term, co.x_sub_type, co.ou_type_cd,
                         case when exists (select ai.row_id from somebillingsystem.s_agree_item@sblreadonly ai 
                                             join somebillingsystem.s_org_ext@sblreadonly sa on ai.x_sa_ou_id = sa.row_id
                                            where ag.row_id = ai.doc_agree_id 
                                              and ai.itm_stat_cd = 'Pending' and sa.cust_stat_cd = 'Enrolled'
                                              and ag.stat_cd = 'Pending' and ag.x_offer_type = 'Renewal' ) then 1 else 0 end Early_Renewal 
                    from S_DOC_AGREE@sblreadonly AG 
                    JOIN S_DOC_QUOTE@sblreadonly QU  ON AG.QUOTE_ID = QU.ROW_ID(+)
                    JOIN S_ORG_EXT@sblreadonly CO ON AG.TARGET_OU_ID = CO.ROW_ID               
        )somebillingsystem on c.propid = somebillingsystem.agree_num 
        -------------   BILL OPTIONS
        left join ( select ag.agree_num, 
                           min(bp.x_group_bill)  billing_option, -- Multi over Single
                           min(bp.bill_pref_cd) bill_type -- SOME (Charges) over UDC (Rate) 
                      from s_doc_agree@sblreadonly ag 
                      join s_agree_item@sblreadonly ai on ag.row_id = ai.doc_agree_id
                      join s_org_ext@sblreadonly sa on ai.x_sa_ou_id = sa.row_id
                      join s_party_rel@sblreadonly pr on sa.row_id = pr.rel_party_id 
                                                     and pr.rel_type_cd = 'Bill Group' 
                                                     and pr.x_primary_bill_group = 'Y'
                      join s_org_ext@sblreadonly bg on pr.party_id = bg.row_id 
                      join S_INV_PROF@sblreadonly bp on bg.row_id = bp.accnt_id 
                     group by ag.agree_num
           ) s_billing on s_billing.agree_num = c.propid
        ---------    AR_BALANCE  
        left join (select contract_id, /*ar.edc_acctno, */sum(ar.ar_bal_current) ar_bal_current, sum(ar.ar_bal_1_30) ar_bal_1_30 , 
                          sum(ar.ar_bal_31_60) ar_bal_31_60, sum(ar.ar_bal_61_90) ar_bal_61_90, 
                          sum(ar.ar_bal_91plus) ar_bal_91plus, ar.snapshot_date, round(sum(bad.balance),2) bad_debt_balance
                     from (select x.contract_id, x.edc_acctno, sum(b.ar_bal_current) ar_bal_current, sum(b.ar_bal_1_30) ar_bal_1_30 , 
                                  sum(b.ar_bal_31_60) ar_bal_31_60, sum(b.ar_bal_61_90) ar_bal_61_90, 
                                  sum(b.ar_bal_91plus) ar_bal_91plus, b.snapshot_date
                             from some_uni_cust_arb b
                             join some_uni_edc_x_propid x on b.edc_acctno = x.edc_acctno and x.snapshot_date = b.snapshot_date
                            where b.snapshot_date = (select max(snapshot_date) from some_uni_cust_arb where snapshot_date <= p_end_date)
                              --and x.contractid = '1-220606257'
                            group by x.contract_id, x.edc_acctno, b.snapshot_date
                           ) ar
                      -- BAD DEBT LOGIC : ave of EDCs, summed up at contract level, so three EDCs in a contract have 24 month avgs of 12, 1250, 650 bad debt show at contract 1912
                      join (select b.edc_acctno, avg(b.ar_bal_91plus) balance
                              from (select a.edc_acctno,trunc(a.snapshot_date,'MM'),  max(a.snapshot_date) maxdate 
                                      from some_uni_cust_arb a group by a.edc_acctno, trunc(snapshot_date, 'MM')
                                   ) maxdate
                              join some_uni_cust_arb b on maxdate.edc_acctno = b.edc_acctno and maxdate.maxdate = b.snapshot_date 
                              where b.snapshot_date between add_months(p_start_date,-23) and p_end_date 
                              group by b.edc_acctno  
                           ) bad on ar.edc_acctno = bad.edc_acctno
                      group by ar.contract_id, ar.snapshot_date
        ) ar on ar.contract_id = c.propid        
        ---------    WORK QUEUES AND BILLG MODES 
        left join ( 
                     select contractid,
                            -- for breakout analysis- remove grouping if you run this query alone
                            /*min(billgroupoption) over(partition by contractid) billgroupoption, -- M trumps S 
                            max(bgbmf) over(partition by contractid) bgbmf, -- in case of multiple just get the highest
                            max(ssomef) over(partition by contractid) ssomef, 
                            q.uidwqitem, q.openedtime ,q.wqtypecode, q.wqqueuecode, q.weight 
                            ,sum(nvl(weight,0)) over(partition by contractid) sumweightpercontract*/  
                            decode(min(billgroupoption),'M',max(bgbmf), max(ssomef)) + sum(weight) Billing_Weight
                      from (
                             select con.contractid,
                                    bg.billingmodeflag bgbmf, 
                                    sa.billingmodeflag ssomef,   
                                    --wq.uidwqitem, wq.openedtime ,wq.wqtypecode, wq.wqqueuecode, 
                              from (select contractid, uidaccount 
                                      from (select c.contractid, ci.uidaccount, max(c.contractid) over (partition by ci.uidaccount) maxid
                                              from (select uidaccount, uidcontract from CUSTOMERSYS.lscmcontractitem@ls_link) ci 
                                              join (select contractid, starttime, stoptime, uidcontract, executetime 
                                                      from CUSTOMERSYS.lscmcontract@ls_link) c on ci.uidcontract = c.uidcontract and c.executetime is not null
                                             where c.starttime < p_end_date and c.stoptime > add_months(p_start_date,-2)
                                            ) where contractid = maxid 
                                     ) con   
                              -- SUBSELECT to pick one(can't tell which is correct) when LS has bad dupes
                              left join (select uidaccount, accountid, billingmodeflag   
                                       from (select a.uidaccount, a.accountid, a.udcacctid, a.billingmodeflag,
                                                    a.lstime, max(a.lstime) over (partition by a.udcacctid) maxlstime
                                               from CUSTOMERSYS.ACCOUNT@ls_link a
                                              where a.accounttypecode = 'SERVICE_ACCOUNT' ) 
                                       where maxlstime = lstime
                                     ) sa on sa.uidaccount = con.uidaccount      
                             LEFT JOIN (select lci.accountid, lci.wqtypecode, lci.wqqueuecode, q.weight--, lci.uidwqitem, lci.openedtime
                                         from CUSTOMERSYS.LSWQCLOSEDITEM@ls_link  lci                                                       
                                         join some_work_queue q on lci.WQTYPECODE = q.type and lci.WQQUEUECODE = q.code 
                                        where CLOSEDTIME Between add_months(p_start_date,-2) and p_end_date     
                                        ) wq  ON wq.accountid = sa.accountid           
                              -- SUBSELECT to pick one(can't tell which is correct) when LS has bad overlap
                             left JOIN (select uidaccountfrom, uidaccountto   
                                      from (select l.uidaccountfrom, l.uidaccountto, 
                                                   max(l.uidaccountfrom) over (partition by l.uidaccountto) maxbgacctid
                                             from CUSTOMERSYS.LSACCOUNTREL@ls_link l
                                            where l.starttime < p_end_date and l.stoptime > add_months(p_start_date,-2))
                                      where uidaccountfrom = maxbgacctid
                                    )x ON sa.UIDACCOUNT = x.UIDACCOUNTTO                                                 
                             left join CUSTOMERSYS.ACCOUNT@ls_link bg on bg.UIDACCOUNT = x.UIDACCOUNTFROM 
                                                                and bg.accounttypecode = 'BILLING_GROUP'  
                             left JOIN (select billgroupoption, uidaccount
                                      from (select ah.billgroupoption, ah.uidaccount, 
                                                   ah.starttime, min(ah.starttime) over (partition by ah.uidaccount) maxstart
                                              from CUSTOMERSYS.ACCOUNTHISTORY@ls_link ah  )
                                     where starttime = maxstart
                                   )h ON bg.UIDACCOUNT = h.UIDACCOUNT  
                            ) q group by contractid  
         ) ls on ls.contractid = c.propid 
        -----------   BDM JOIN     
        left join main_rpt.SOME_sales_hierarchy_new csh_account on UPPER(c.account_bdm) = UPPER(csh_account.bdm_crm)
        left join some_uni_bdm bui_account on csh_account.employee_id = bui_account.bdm_id and bui_account.month = to_char(p_start_date,'YYYYMM')
        left join main_rpt.SOME_sales_hierarchy_new csh_contract on UPPER(c.contract_bdm) = UPPER(csh_contract.bdm_crm)
        left join some_uni_bdm bui_contract on csh_contract.employee_id = bui_contract.bdm_id and bui_contract.month = to_char(p_start_date,'YYYYMM')
        where trim(c.propid) is not null
        and c.est_power_end_date >= p_start_date 
        and c.counter_sign_date  <= p_end_date;   
the subselects for AR balances, and the Bad Debt aggregations over different partitions… the Decode for deciding which weights to use in calculations… and picking distinct values from an external system that had inherent 1-3% data anomalies.



Using a Data Sandbox to allow users to push their own data changes to Prod

by MikeHogg 14. November 2011 19:03

Most Pl/Sql code is fairly boring, selecting lots of columns from lots of joins and maybe doing calculations and decodes on the fly, or groupings and other aggregations, and sub queries.  But one project I had was pretty neat.  Users had two lists of Products, and although the names and attributes were not usually the same, they were the same, just called different things by different companies. 

When we started using a new Product, we had to find the two matching ones and map them together.  This involved a process that ultimately led to what we called a data fix.  No code was changed, but data that the business process relied on had to be created or changed in the production database.  To do this required an IT resource to team up with a business resource and first make the changes in a test environment, and then test, and make adjustments, and test again, and then do the same exact cumulative change they did in the test environment, in the production environment. 

This was time intensive, and highly manual.  I wanted to make this quick, and automatic, without any room for manual errors.  So one of the features of the new tool I built was for the users to make these changes themselves, in a test environment, with full auditing and editing capability, and be able to see for themselves the effects their changes had, and when they were ready, push a button to have the exact same results applied to their production environment. This worked wonderfully, and worked with full auditing and history tracking in case any recovery was necessary. Besides some finely tuned controls in the new UI, all it required was an Oracle database link to the test environment.


  procedure PushToProd(p_audit_user varchar2,
                       p_id         my_product.id%type,
                       p_note       varchar2) is
    insert into my_product_history
      (audit_user,  audit_date,  type,
       id,  name, pf_code,  pf_type,  
       cc_code, exch_id,  gs_name,
       pf_commodity, lot_size,  uom_cd, 
       some_lot_sz, term, opt_style,
       product_type,  calendar,  location,
       description_id, trader_group,  settle_date_rule,
       eot_date_rule,  uses_dst, exchange)
      (select p_audit_user,   sysdate, 'PTP',
              nvl(p.id, pp.id),  nvl(p.name, pp.name),  nvl(p.pf_code, pp.pf_code),  nvl(p.pf_type, pp.pf_type),
              nvl(p.cc_code, pp.cc_code),  nvl(p.exch_id, pp.exch_id),  nvl(p.gs_name, pp.gs_name),
              nvl(p.pf_commodity, pp.pf_commodity),  nvl(p.lot_size, pp.lot_size),  nvl(p.uom_cd, p.uom_cd),
              nvl(p.some_lot_sz, pp.some_lot_sz),  nvl(p.term, pp.term),   nvl(p.opt_style, pp.opt_style),
              nvl(p.product_type, pp.product_type),  nvl(p.calendar, pp.calendar),  nvl(p.location, pp.location),
              nvl(p.description_id, pp.description_id),  nvl(p.trader_group, pp.trader_group),  nvl(p.settle_date_rule, pp.settle_date_rule),
              nvl(p.eot_date_rule, pp.eot_date_rule),  nvl(p.uses_dst, pp.uses_dst), nvl(p.exchange, pp.exchange)
         from my_product p
         full outer join my_product@SOMEW_LINK pp on p.id =
                                                                  pp.id -- handles new 
        where p.id = p_id
           or pp.id = p_id);
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' new record(s) inserted by User ' ||
                              p_audit_user || ' in my_product_history');
    delete from my_product where id = p_id;
    delete from some_fee where product_id = p_id;
    insert into my_product
      (select * from my_product@SOMEW_LINK where id = p_id);
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' new record(s) inserted by User ' ||
                              p_audit_user || ' in my_product');
    insert into some_fee
      (select *
         from SOME.some_fee@SOMEW_LINK
        where product_id = p_id);
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' new record(s) inserted by User ' ||
                              p_audit_user || ' in some_fee');
    when others then
  procedure PushMappingToProd(p_audit_user varchar2,
                              p_v_id       my_values.id%type,
                              p_note       varchar2) is
    insert into my_values_history
      (audit_user,  audit_date,  type,
       id,   key,  value,
       val_type,   schema,  rule,  exch_id)
      select p_audit_user,  sysdate,  'PTP',
             nvl(v.id, vv.id),  nvl(v.key, vv.key),  nvl(v.value, vv.value),
             nvl(v.val_type, vv.val_type),  nvl(v.schema, vv.schema),
             nvl(v.rule, vv.rule),  nvl(v.exch_id, vv.exch_id)
        from my_values v
        full outer join my_values@SOMEW_LINK vv on v.id =
       where v.id = p_v_id
          or vv.id = p_v_id;
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' record(s) inserted by User ' ||
                              p_audit_user || ' in my_values_history');
    delete from my_values where id = p_v_id;
    insert into my_values
      (select * from my_values@SOMEW_LINK where id = p_v_id);
                              'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
                              ' new record(s) inserted by User ' ||
                              p_audit_user || ' in my_value');
    when others then


I also put some auditing features in the new UI.  The system relied on the premise that the two databases to start with exact primary key ids and to keep them in sync between the two systems.  I built several auditing screens, showing even the detail for each user’s changes in each environment, so that more thatn one user could see what other users had done.  Here’s one of the auditing tools that showed the user the differences between some of the aggregations of the two systems.  This is the boring sql I was talking about before.  This also works with that oracle database link.

  procedure ViewLotDiff(p_valuation_date in me_trade.valuation_date%type,
                        cur_out          out cur) is
    -- Added condition v.expiration_date >=  p_valuation_date    
    open cur_out for
      select gg.anothercompany_name,
               when sum(some_w_lots) <>  sum(gs_lots) then
                 case when sum(some_p_lots) <>  sum(gs_lots) then 'BOTH'
                 else 'some_W' end 
             end as issue,            
             sum(some_w_lots) total_some_w_lots,
             sum(some_p_lots) total_some_p_lots,
             sum(gs_lots) total_gs_lots            
        from (select some_w_lots,
                from (select round(sum(v.lots),1) some_w_lots,
                             0 some_p_lots,
                             0 gs_lots,
                             'some_W' src,
                        from some_.MAP_A_VW v
                       where 1 = 1
                         and v.VALUATION_DATE = p_valuation_date
                         and (v.term <> 'DAY' or
                             v.contract_month >= p_valuation_date)
                         and ( ( v.pf_commodity in ('WEA','WTI') and nvl(v.expiration_date,to_date('12/31/9999','mm/dd/yyyy')) >= p_valuation_date )
                             ( v.pf_commodity not in ('WEA','WTI') and nvl(v.settle_date,to_date('12/31/9999','mm/dd/yyyy')) >= trunc(to_date(p_valuation_date),'MM')))
                       group by v.anothercompany_NAME, v.PF_CODE )
              union (select 0 some_w_lots,
                           round(sum(d.lots),1) some_p_lots,
                           0 gs_lots,
                           'some_P' src,
                           p.GS_NAME anothercompany_NAME
                      from some_.me_trade_detail@some_P_LINK d
                      join some_.me_trade_header@some_P_LINK h on d.eti = h.eti
                      join some_.me_product@some_P_LINK p on d.pf_code = p.pf_code and d.pf_type = p.pf_type and h.exchange = p.exchange
                     where 1 = 1
                       and d.VALUATION_DATE = p_valuation_date
                       and nvl(d.some__status,'nvl') <> 'ORI'
                       and (h.term <> 'DAY' or
                           d.contract_month >= p_valuation_date)
                       and ( ( p.pf_commodity in ('WEA','WTI') and nvl(d.expiration_date,to_date('12/31/9999','mm/dd/yyyy')) >= p_valuation_date  )
                             ( p.pf_commodity not in ('WEA','WTI') and nvl(d.settle_date,to_date('12/31/9999','mm/dd/yyyy')) >= trunc(to_date(p_valuation_date),'MM')))
                     group by p.GS_NAME, d.pf_code )
              union (select 0 some_w_lots,
                           0 some_p_lots,
                           sum(g.net_qty) gs_lots,
                           'GS' src,
                           '' pf_code,
                           g.prod_description anothercompany_name 
                      from gsd_open_positions g
                     where 1 = 1
                       and g.stmt_date = p_valuation_date                       
                       and g.expiry_date >= p_valuation_date
                     group by g.prod_description )              
              ) gg 
       group by anothercompany_name
      having ( (sum(some_w_lots) - sum(some_p_lots) <> 0) or (sum(gs_lots) - sum(some_p_lots) <> 0) or (sum(gs_lots) - sum(some_w_lots) <> 0) )
       order by anothercompany_NAME;

Creating large tables in Oracle with dynamic partitions to allow for instant reads

by MikeHogg 29. September 2011 14:55

A lot of my work in Oracle was with tables between a million rows and a billion, and so indexing and query performance was always part of writing the system.  In this case I knew my table was going to grow a quarter to a half million rows a day, and so to retain immediate selects, I needed partitions.  While it is easy to build a table with the initial partitions, you have to maintain the scheme with a nightly or a weekly job.  Here is the initial table creation:

create   table SOME_CONTRACT 
  ID                             NUMBER not null,
  CONTRACT_ID                    VARCHAR2(150) not null,
  MONTH                          NUMBER not null,
  CUSTOMER_ID                    VARCHAR2(150),
  BUSUNIT_ID                     VARCHAR2(25),
  SEGMENT_ID                     VARCHAR2(25),
  PRODUCT_ID                     VARCHAR2(100),
  <snip some hundred columns>
  ZONE_MARKER                    VARCHAR2(50),
  ACTIVE_FG                      NUMBER,
  SNAPSHOT_DATE                  DATE,
  AUDIT_DATE                     DATE
tablespace SOME_TBS01
  pctfree 10
  initrans 1
  maxtrans 255
    initial 64K
    minextents 1
    maxextents unlimited
partition by range (snapshot_date)(  
partition P20110101 values less than (TO_DATE(' 2011-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110201 values less than (TO_DATE(' 2011-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110301 values less than (TO_DATE(' 2011-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110401 values less than (TO_DATE(' 2011-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110501 values less than (TO_DATE(' 2011-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110601 values less than (TO_DATE(' 2011-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110701 values less than (TO_DATE(' 2011-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110801 values less than (TO_DATE(' 2011-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110901 values less than (TO_DATE(' 2011-09-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110902 values less than (TO_DATE(' 2011-09-03 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110905 values less than (TO_DATE(' 2011-09-06 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110906 values less than (TO_DATE(' 2011-09-07 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110907 values less than (TO_DATE(' 2011-09-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110908 values less than (TO_DATE(' 2011-09-09 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110909 values less than (TO_DATE(' 2011-09-10 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110912 values less than (TO_DATE(' 2011-09-13 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110913 values less than (TO_DATE(' 2011-09-14 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110914 values less than (TO_DATE(' 2011-09-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110915 values less than (TO_DATE(' 2011-09-16 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition PMAX values less than (TO_DATE(' 2099-12-31 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) 
-- Create/Recreate primary, unique and foreign key constraints 
alter table SOME_CONTRACT
  add constraint PK_SOMECONTRACTS_ID primary key (ID)
  using index 
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
    initial 64K
    minextents 1
    maxextents unlimited
-- Create/Recreate indexes 
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
    initial 64K
    minextents 1
    maxextents unlimited

So new rows get inserted into PMAX each day in a daily process.  Partitions in Oracle are like separate tables, and in our case, most of our select queries will always be on snapshot_date, so it is like we are querying a single table.  We can also add indices if we want.  I think in my case here I was using an active flag and so I added a composite index with the PK and the column Active_fg.

To maintain this scheme, you have to periodically add new partitions, and split up PMAX.  I did that with a scheduled job, that ran a Stored Proc each night.

procedure Maintain_Partitions is
  cursor c_nightly_deletes is
            select sdate 
             from (select distinct snapshot_date sdate, count(distinct snapshot_date) over (partition by month) total
                     from SOME_contract_nightly 
                    where month = to_char(  sysdate  ,'YYYYMM')   
                    order by snapshot_date ) 
            where rownum < total - 3;  -- keep three latest in any given month
  cursor c_compress_ddl (p_table_name varchar2 ) is
         select 'alter table ' || p_table_name || ' move partition ' || partition_name || ' compress' ddl, partition_name
           from user_tab_partitions
          where table_name = p_table_name
            and partition_name not like '%MAX%'
            and (compression <> 'ENABLED'
                 or -- recompress NIGHTLYs since those older than 3 days are being deleted
                 table_name = 'SOME_CONTRACT_NIGHTLY' and partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%'
  cursor c_truncate_ddl ( p_table varchar2  ) is
         select 'alter table ' || table_name || ' truncate partition ' || partition_name || ' update global indexes' as ddl, partition_name
           from user_tab_partitions
          where table_name = p_table
            and partition_name not like '%MAX%'
            and to_date( trim( substr(partition_name, 2) ), 'YYYYMMDD' )   <= (add_months(sysdate,-24));
    for r_nightly_deletes in c_nightly_deletes loop
      delete from SOME_contract_nightly 
        where snapshot_date = r_nightly_deletes.sdate;
      Log_This('PKG_SOME_UI.Maintain_Partitions','SOME_CONTRACT_NIGHTLY snapshot_date' ||
        to_char(r_nightly_deletes.sdate,'DD-MON-YY') || ' deleted: ' || sql%rowcount);
    end loop;
    for each_table in (select distinct table_name from user_tab_partitions where table_name like 'SOME%') loop 
      -- create new partitions
      SplitWeekdayPartitions( each_table.TABLE_NAME);
      -- compress (today's) data
      for r_compress_ddl in c_compress_ddl ( each_table.table_name ) loop
        execute immediate r_compress_ddl.ddl;
        Log_This('PKG_SOME_UI.Maintain_Partitions',each_table.table_name || ' partition compressed: ' || r_compress_ddl.partition_name);
      end loop;
      -- remove any partitions older than two years
      for r_truncate_ddl in c_truncate_ddl ( each_table.table_name  ) loop 
              execute immediate r_truncate_ddl.ddl;
          Log_This('PKG_SOME_UI.Maintain_Partitions', each_table.table_name || ' Partition Truncated: ' || r_truncate_ddl.partition_name);
      end loop; 
    end loop;
    for x in (  select t.owner, t.table_name, p.partition_name 
            from all_tables t
            join all_tab_partitions p
              on t.table_name = p.table_name 
           where p.table_name like 'SOME%' 
             and (p.last_analyzed is null 
                  p.table_name = 'SOME_CONTRACT_NIGHTLY' 
                  and p.partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%' ) )
                      degree=> dbms_stats.AUTO_DEGREE,
                      ownname => x.owner,
                      tabname => x.table_name,
                      partname => x.partition_name,
                      cascade => dbms_stats.AUTO_CASCADE );
              Log_This('PKG_SOME_UI.Maintain_Partitions','Statistics gathered for table.partition: ' || 
                                    x.table_name || '.' || x.partition_name);
    end loop;  
    exception when others then
      Rebuild_Indices(); -- do not leave in unuseable state
      raise; -- notify user to investigate and rerun
end Maintain_Partitions;


In this particular case, I was deleting old partitions, and using compression to keep the total database size down.   The SplitWeekdayPartitions created the new ones:

PROCEDURE SplitWeekdayPartitions( p_table_name in varchar2 ) AS
  v_latest_date date;
  v_new_date date; 
  v_part_name varchar2(8);
  v_upper_bound varchar2(8);
     SELECT to_date(substr(MAX(partition_name) KEEP (DENSE_RANK FIRST ORDER BY partition_position desc ), 2),'YYYYMMDD')
       into v_latest_date 
       FROM (SELECT partition_name, partition_position 
               FROM user_tab_partitions
              WHERE table_name = upper(p_table_name) 
                AND partition_name <> 'PMAX' );
        if v_latest_date < sysdate
            v_new_date := v_latest_date + 1;
            while to_char(v_new_date, 'd') in (1,7) loop -- skip weekend dates
                v_new_date := v_new_date + 1;
            end loop;
            v_part_name := to_char(v_new_date,'YYYYMMDD');
            v_upper_bound := to_char(v_new_date + 1,'YYYYMMDD');
            execute immediate 
            'ALTER TABLE ' || p_table_name || ' SPLIT PARTITION PMAX AT (to_date(' || v_upper_bound ||
                ',''YYYYMMDD'')) INTO ' || '( PARTITION P' || v_part_name ||  
            Log_This('PKG_SOME_UI.SplitWeekdayPartitions', 'New partition P' || v_part_name || ' created for ' || p_table_name);
            SplitWeekdayPartitions( p_table_name );   -- do we need to catch up to today?
        end if;
END SplitWeekdayPartitions;

It’s interesting that I picked a recursive call here instead of a while loop.

… and in the MaintainPartitions proc, RebuildIndices was necessary afterwards because the alter table commands and truncating didn’t trigger index updates.

procedure Rebuild_Indices is 
    -- LOCAL INDICES   (not using any now but maybe in the future...)
            WHERE UIP.STATUS <> 'USABLE' )
            execute immediate
                          'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD PARTITION ' || x.PARTITION_NAME ;
            Log_This('PKG_SOME_UI.Maintain_Partitions','Partition Index Rebuilt: ' || x.index_name);
    end loop;
    -- GLOBAL INDICES     
    for x in ( SELECT INDEX_NAME
                FROM USER_INDEXES
                WHERE table_name like 'SOME%' and STATUS not in ( 'VALID', 'N/A' ) )
             execute immediate
                        'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD ' ;
             Log_This('PKG_SOME_UI.Maintain_Partitions','Global Index Rebuilt: ' || x.index_name);
    end loop;   
and that’s it.  Neat feature to plug into your next large table.

Mapping patterns in Oracle

by MikeHogg 28. June 2011 19:38

In my research, Mapping patterns came in three forms and they varied by levels of complexity.  The simplest form was a mapping table that most people recognize as a many to many join.  This is easy to use, and easy to understand, but doesn’t allow dynamic mapping.  You have to have both entities primary keys and then you have to do an insert to map the relationship.  This won’t work on a data store that is constantly receiving new data.

The second form allows dynamic mapping based on attributes.  You can set up a process that automatically inserts the rows or assigned the mappings according to each entity’s attribute value.  Then you could set up a process where every new entity whose X = Y could be automatically mapped to entity A.  This requires columns for each attribute though, and didn’t allow multiple entities.

The third form of mapping table, involves dynamic attributes, dynamic tables, and allows the most flexibility, but is the most difficult to implement and use, which you realize just as soon as you start writing your procs against it.

I happened to inherit a system where the original consultant set up the third form, and the following generations of developers built a variety of complicated tools and complex business processes to maintain it.  While it took me a while to understand it, once I did, and realized its strengths, I also realized a simple UI could abstract away the complexity and make it easy to use.

Behind the UI though had to be the data layer, and while the UI was simple looking, the data layer was where all the complexity had to be simplified.  Here is one of the queries I wrote to run a mapping function on the fly, for one particular entity, having at least three but no more than four different mapping attributes.  This is one of the few times I had to use dynamic SQL in Oracle, whose PL/SQL allowed for great flexibility but would not help me in this case:


  procedure MapUnmappedparts(p_certain_date my_part.certain_date%type) is 
    cursor c_mappings is      
    select  v.id, v.value, v.schema, v.exch_id,
            firstkey.attr_name firstname, regexp_substr(v.key, '[^()]+',1,2) firstvalue,
            secondkey.attr_name secondname, regexp_substr(v.key, '[^=()]+',1,4) secondvalue,
            thirdkey.attr_name thirdname, regexp_substr(v.key, '[^=()]+',1,6) thirdvalue,
            fourthkey.attr_name fourthname, regexp_substr(v.key, '[^=()]+',1,8) fourthvalue,
            fifthkey.attr_name fifthname, regexp_substr(v.key, '[^=()]+',1,10) fifthvalue,
            sixthkey.attr_name sixthname, regexp_substr(v.key, '[^=()]+',1,12) sixthvalue,
            seventhkey.attr_name seventhname, regexp_substr(v.key, '[^=()]+',1,14) seventhvalue
    from my_values v 
    join my_attributes firstkey on firstkey.short_name = regexp_substr(v.key, '[^=]+',1,1)
    join my_attributes secondkey on secondkey.short_name = regexp_substr(v.key, '[^=()]+',1,3)
    join my_attributes thirdkey on thirdkey.short_name = regexp_substr(v.key, '[^=()]+',1,5)
    left outer join my_attributes fourthkey on fourthkey.short_name = regexp_substr(v.key, '[^=()]+',1,7)
    left outer join my_attributes fifthkey on fifthkey.short_name = regexp_substr(v.key, '[^=()]+',1,9) 
    left outer join my_attributes sixthkey on sixthkey.short_name = regexp_substr(v.key, '[^=()]+',1,11) 
    left outer join my_attributes seventhkey on seventhkey.short_name = regexp_substr(v.key, '[^=()]+',1,13) 
    join my_schemas s on s.sch_id = v.schema and s.key_order = 1 
    order by s.eval_order, s.sch_id;
    v_query varchar2(2000);   
    Pkg_somy__Log.writeMessage(p_module   => 'Pkg_ProductMapper.MapUnmappedparts',
                              p_severity => 'DEBUG',
                              p_message  =>  'Started'); 
    for r_mapping in c_mappings loop
      v_query := 'merge into my_part_Detail d
                  using (select dd.id, h.co, h.d_type, dd.curve, dd.delivery_point, 
                                dd.curve_2, dd.commodity, h.part_start_month, h.part_end_month, 
                                dd.term_of_trading_cd, h.price_src_1, h.price_src_2, h.term,
                                dd.vintage, dd.adj_dm_year, h.product_code, dd.region, dd.technology
                          from my_part_detail dd join my_part_header h on dd.eti = h.eti 
                         where dd.certain_date = :1 
                           and dd.pf_code is null 
                           and nvl(dd.somy__status,''nvl'') <> ''ORI''
                           ) unmapped on (unmapped.id = d.id)
                  when matched then update set d.pf_code = :2
                    where 1=1 ';
      v_query := v_query || ' and unmapped.' || r_mapping.firstname || ' = :3 ';
      v_query := v_query || ' and unmapped.' || r_mapping.secondname || ' = :4 ';
      v_query := v_query || ' and unmapped.' || r_mapping.thirdname || ' = :5 ';
      if r_mapping.fourthname is not null 
        then      v_query := v_query || ' and unmapped.' || r_mapping.fourthname || ' = :6 ';  end if;
      if r_mapping.fifthname is not null 
        then      v_query := v_query || ' and unmapped.' || r_mapping.fifthname || ' = :7 ';  end if;
      if r_mapping.sixthname is not null 
        then      v_query := v_query || ' and unmapped.' || r_mapping.sixthname || ' = :8 ';  end if;                   
      if r_mapping.seventhname is not null 
        then      v_query := v_query || ' and unmapped.' || r_mapping.seventhname || ' = :9 ';  end if;      
        when r_mapping.seventhname is not null then 
          execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue, r_mapping.sixthvalue, r_mapping.seventhvalue ;
        when r_mapping.sixthname is not null then 
          execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue, r_mapping.sixthvalue ;
        when r_mapping.fifthname is not null then 
          execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue ;                                  
        when r_mapping.fourthname is not null then 
          execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue ;
          execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue ;
      end case;
    end loop;



About Mike Hogg

Mike Hogg is a c# developer in Brooklyn.

More Here

Favorite Books

This book had the most influence on my coding style. It drastically changed the way I write code and turned me on to test driven development even if I don't always use it. It made me write clearer, functional-style code using more principles such as DRY, encapsulation, single responsibility, and more. amazon.com

This book opened my eyes to a methodical and systematic approach to upgrading legacy codebases step by step. Incrementally transforming code blocks into testable code before making improvements. amazon.com

More Here