Loading log files into Oracle

by MikeHogg, 8 March 2012 17:51

One of my last Oracle projects was pretty neat, because I started working with the new 11g feature, external tables.  These let Oracle mount a flat file as a table, and they were incredibly fast compared to SQL*Loader, which was what we had been doing for years. 

In this case I was loading unix log files on the order of millions of rows per daily file: load the external table, then process that table into our permanent logging table.  The data sets were big enough that the usual manipulation, like inserting millions of rows, would take hours and hours, so switching from SQL*Loader to external tables saved a lot of time.  I still had a lot of inserts to make, though, so I added some tweaks, like dropping indexes and recreating them afterwards, and then updating stats on the new indexes for Oracle's query optimizer. 
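Those tweaks aren't shown in the procs below, but the pattern is roughly this sketch (the index name here is hypothetical; the table is the real target table from the proc):

```sql
-- Sketch of the drop-index / reload / rebuild / re-stats pattern.
-- idx_activity_ts is a made-up index name for illustration.

-- 1. Drop the index before the big bulk insert
drop index idx_activity_ts;

-- 2. ... bulk insert the day's millions of rows here ...

-- 3. Recreate the index, then refresh optimizer statistics
create index idx_activity_ts on mdf_meta_activity (time_stamp);

begin
  dbms_stats.gather_table_stats(ownname => user,
                                tabname => 'MDF_META_ACTIVITY',
                                cascade => true);  -- also gathers index stats
end;
/
```

Without the stats refresh at the end, the query optimizer keeps planning against the old row counts, which is what makes the subsequent processing queries slow.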

Once I had the files shared on a network location accessible to this Oracle unix server, I loaded them with this proc:

procedure LoadExtTable(filedate varchar2) is
begin
  execute immediate 'create table mdf_meta_activity_dump ( IP_ADDRESS VARCHAR2(255), PID NUMBER,' ||
                    'SYMBOL VARCHAR2(255), USER_ID VARCHAR2(50), APPLICATION VARCHAR2(60),' ||
                    'HOSTNAME VARCHAR2(60), SYMBOL_MESSAGE VARCHAR2(255), SYMBOL_FORMAT VARCHAR2(255),' ||
                    'SCRIPT_NAME VARCHAR2(255), PROCMON_PROCESS VARCHAR2(255), TIME_STAMP DATE )' ||
                    ' organization external (type oracle_loader default directory exttabdir access parameters ' ||
                    '(RECORDS DELIMITED BY NEWLINE FIELDS TERMINATED by ''|'' ' ||
                    ') LOCATION (''\someplace\somedb\udpserver\udp.txt''))';
end LoadExtTable;

I would process the dump with this proc, which also updated two other tables and was written to be re-runnable, so that, in case of failure or just manual mistake, running the same file of millions of rows would not result in a mess of a million duplicates. 

You will also see Oracle bulk statements here, and logging, which let someone monitor the process in real time, as it usually took some minutes or hours.

procedure ProcessActivityDump is
  cursor c_log(p_file_date date) is
    -- column aliases here are assumed to line up with mdf_meta_activity
    select s.id symbol_id, d.user_id, d.symbol_message, d.time_stamp,
           p_file_date file_date, trunc(d.time_stamp) activity_date,
           to_char(d.time_stamp,'MM') activity_month
      from mdf_meta_symbol s
      join mdf_meta_activity_dump d on s.name = d.symbol;
  type t_activity is table of c_log%rowtype;
  r_activity t_activity;
  v_count number;
  v_file_date date;
begin
  -- PROC, rerunnable
  merge into mdf_meta_proc p
  using (select distinct procmon_process, script_name from mdf_meta_activity_dump) d
  on (p.procmonjob = d.procmon_process and p.script = d.script_name)
  when not matched then
    insert (id, procmonjob, script, active_fg, insert_date, audit_date, audit_user)
    values (seq_mdf_id.nextval, d.procmon_process, d.script_name, 1, sysdate, sysdate, 'PKG_META');
  Log_This('PKG_META.ProcessActivityDump','MDF_META_PROC new rows inserted: ' || sql%rowcount ,'INFO');
  -- SYMBOL, rerunnable
  merge into mdf_meta_symbol s
  using (select distinct symbol, p.id
           from mdf_meta_activity_dump
           join mdf_meta_proc p on procmon_process = procmonjob and script_name = script) d
  on (s.name = d.symbol)
  when not matched then
    insert (id, name, proc_id) values (seq_mdf_id.nextval, d.symbol, d.id);
  Log_This('PKG_META.ProcessActivityDump','MDF_META_SYMBOL new rows inserted: ' || sql%rowcount ,'INFO');
  -- the file's date is the day most of its rows fall on
  select file_date into v_file_date
    from (select trunc(time_stamp) file_date, count(*) cnt
            from mdf_meta_activity_dump
           group by trunc(time_stamp)
           order by count(*) desc)
   where rownum = 1;
  -- delete existing activity for this day, to make rerunnable
  delete from mdf_meta_activity where file_date = v_file_date;
  Log_This('PKG_META.ProcessActivityDump','Dump_Date: ' || v_file_date || ' rows deleted in preparation for new load: ' || sql%rowcount ,'INFO');
  -- now add the activity, logging only every 200k or so
  -- maybe need to drop idx and recreate after
  open c_log(v_file_date);
  v_count := 0;
  loop
    fetch c_log bulk collect into r_activity limit 1000;
    exit when r_activity.count = 0;
    forall idx in 1 .. r_activity.count
      insert into mdf_meta_activity values r_activity(idx);
    v_count := v_count + r_activity.count;
    if mod(v_count, 200000) = 0 then
      Log_This('PKG_META.ProcessActivityDump','Cumulative insert now at ' || v_count || ' rows','INFO');
    end if;
  end loop;
  close c_log;
end ProcessActivityDump;

And that’s it.

Streaming 100 megabyte csv files with WCF

by MikeHogg, 21 February 2012 20:35

WCF is very configurable, but that is also the thing I like least about it.  Often I have a project and a need, and I'm deep in WCF documentation researching how to do exactly what I want; I get my config just right, tweaked and tested until it works perfectly, but I never learn a new pattern or algorithm.  They are just config options.  Here is a case where I had to play with the buffer limits to be able to stream very large files, and there are more than a few options related to buffer size, so you have to know exactly which ones apply to your case.

Here is my client endpoint.  You will see I am using NetTcp; I'm in an intranet environment.  I added settings for transferMode (streamed), receiveTimeout (ten minutes), sendTimeout (the client uploaded large files too), and maxReceivedMessageSize (250MB).

 <binding name="NetTcpBindingEndpoint" closeTimeout="00:01:00"
          openTimeout="00:01:00" receiveTimeout="00:10:00" sendTimeout="00:10:00"
          transactionFlow="false" transferMode="StreamedResponse" transactionProtocol="OleTransactions"
          hostNameComparisonMode="StrongWildcard" listenBacklog="10"
          maxBufferPoolSize="524288" maxBufferSize="65536" maxConnections="10"
          maxReceivedMessageSize="250000000">
   <readerQuotas maxDepth="32" maxStringContentLength="8192" maxArrayLength="16384"
                 maxBytesPerRead="4096" maxNameTableCharCount="16384" />
   <security mode="Transport">
     <transport clientCredentialType="Windows" protectionLevel="EncryptAndSign" />
     <message clientCredentialType="Windows" />
   </security>
 </binding>

 <endpoint address="net.tcp://somedevserver:8445/SOME_DATA_SERVICE"
           binding="netTcpBinding" bindingConfiguration="NetTcpBindingEndpoint"
           contract="DataServiceReference.IDataService" name="NetTcpBindingEndpoint">
   <identity>
     <dns value="localhost" />
   </identity>
 </endpoint>

You will see similar options in my service config:

 <behavior name="WindowsService.DataServiceBehavior">
   <serviceMetadata httpGetEnabled="false" />
   <serviceDebug includeExceptionDetailInFaults="false" />
 </behavior>

 <binding name="NetTcpBinding" maxReceivedMessageSize="250000000" transferMode="StreamedResponse"
          receiveTimeout="00:10:00" sendTimeout="00:10:00">
   <security mode="Transport">
     <transport clientCredentialType="Windows" protectionLevel="EncryptAndSign" />
     <message clientCredentialType="Windows" />
   </security>
 </binding>

 <service behaviorConfiguration="WindowsService.DataServiceBehavior">
   <endpoint address="" binding="netTcpBinding" bindingConfiguration="NetTcpBinding"
             name="NetTcpBindingEndpoint" contract="SOME_DATA_SERVICE.IDataService">
     <identity>
       <dns value="localhost" />
     </identity>
   </endpoint>
   <endpoint address="mex" binding="mexTcpBinding" bindingConfiguration=""
             name="MexTcpBindingEndpoint" contract="IMetadataExchange" />
   <host>
     <baseAddresses>
       <add baseAddress="net.tcp://localhost:8445/SOME_DATA_SERVICE" />
     </baseAddresses>
   </host>
 </service>

 <!-- added this to see trace file -->
 <source name="System.ServiceModel">
   <listeners>
     <add name="ServiceModelTraceListener"
          type="System.Diagnostics.XmlWriterTraceListener, System, Version=, Culture=neutral, PublicKeyToken=b77a5c561934e089" />
   </listeners>
 </source>


And the c# code to stream:

private System.IO.MemoryStream GetCSVStream(System.Data.Common.DbDataReader dr)
{
    try
    {
        System.IO.MemoryStream ms = new System.IO.MemoryStream();
        System.IO.StreamWriter sw = new System.IO.StreamWriter(ms);
        // write the header row, pipe-delimited
        System.Data.DataTable header = dr.GetSchemaTable();
        foreach (System.Data.DataRow r in header.Rows)
        {
            string column = r["ColumnName"].ToString();
            if (column == "ID") column = "id";
            sw.Write(column);
            if (header.Rows.IndexOf(r) < header.Rows.Count - 1) sw.Write("|");
        }
        sw.Write(sw.NewLine); sw.Flush();
        // write each data row
        while (dr.Read())
        {
            sw.Write(dr[0]);
            for (int x = 1; x < dr.FieldCount; x++)
            {
                sw.Write("|");
                sw.Write(dr[x]);
            }
            sw.Write(sw.NewLine); sw.Flush();
        }
        ms.Position = 0;
        return ms;
    }
    catch (Exception ex)
    {
        LogError(ex, APPNAME);
        return null;
    }
}
and that’s it!



Miscellaneous code to tweak IE access rights on random servers

by MikeHogg, 23 January 2012 21:33

I set up a framework where developers could write data parsers for a variety of sources that could be scheduled, normalized, logged, and archived for various purposes.  There were hundreds of data sources and jobs, and enough of them were similar that a framework with standard libraries made sense.  A large portion were web scrapes.  (I used WatiN for those, a great mocking solution for our purposes.)  I would run into problems because our scheduler ran on a server farm of hundreds of machines, with new machines/instances being created at intervals, so we basically had to write everything into our jobs.  Here are some of the hacks I put into a Web library to get the usual https sites working with the web scrapes, where I had a variety of popup problems (I needed to turn off IE's built-in popup blocker), file download problems (add the domain to trusted sites)…  They are all probably obsolete with Server 2012 now, but they might come in handy for the next few years still…

private static bool IsPopupEnabled(string domain)
{
    string keylocation = @"Software\Microsoft\Internet Explorer\New Windows\Allow";
    Microsoft.Win32.RegistryKey cu = Microsoft.Win32.Registry.CurrentUser;
    Microsoft.Win32.RegistryKey parent = cu.OpenSubKey(keylocation);
    if (parent != null) return parent.GetValue(domain) != null;
    else return false;
}
private static bool EnablePopup(string domain)
{
    string keylocation = @"Software\Microsoft\Internet Explorer\New Windows\Allow";
    Microsoft.Win32.RegistryKey cu = Microsoft.Win32.Registry.CurrentUser;
    Microsoft.Win32.RegistryKey parentkey = cu.CreateSubKey(keylocation);
    // an empty binary value is enough; IE only checks that the domain entry exists
    parentkey.SetValue(domain, new byte[0], Microsoft.Win32.RegistryValueKind.Binary);
    return IsPopupEnabled(domain);
}
private static bool TrustedSiteAddition(string domain)
{
    const string domainsKeyLocation = @"Software\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\Domains";
    Microsoft.Win32.RegistryKey currentUserKey = Microsoft.Win32.Registry.CurrentUser;
    Microsoft.Win32.RegistryKey parentKey = currentUserKey.OpenSubKey(domainsKeyLocation, true);
    Microsoft.Win32.RegistryKey key = parentKey.CreateSubKey(domain);
    // 0x02 maps the scheme to the Trusted Sites zone
    object objSubDomainValue = key.GetValue("http");
    if (objSubDomainValue == null || Convert.ToInt32(objSubDomainValue) != 0x02)
        key.SetValue("http", 0x02, Microsoft.Win32.RegistryValueKind.DWord);
    objSubDomainValue = key.GetValue("https");
    if (objSubDomainValue == null || Convert.ToInt32(objSubDomainValue) != 0x02)
        key.SetValue("https", 0x02, Microsoft.Win32.RegistryValueKind.DWord);
    return IsTrusted(domain);
}
private static bool IsTrusted(string domain)
{
    const string domainsKeyLocation = @"Software\Microsoft\Windows\CurrentVersion\Internet Settings\ZoneMap\Domains";
    string keyLocation = string.Format(@"{0}\{1}", domainsKeyLocation, domain);
    Microsoft.Win32.RegistryKey currentUserKey = Microsoft.Win32.Registry.CurrentUser;
    Microsoft.Win32.RegistryKey foundRegistryKey = currentUserKey.OpenSubKey(keyLocation, false);
    return foundRegistryKey != null;
}
private static bool CheckFileDownloadEnabled()
{
    // zone 2 = Trusted Sites; value 2200 controls automatic file downloads
    string keylocation = @"Software\Microsoft\Windows\CurrentVersion\Internet Settings\Zones\2";
    Microsoft.Win32.RegistryKey cu = Microsoft.Win32.Registry.CurrentUser;
    Microsoft.Win32.RegistryKey parent = cu.OpenSubKey(keylocation, true);
    if (parent == null) return false;
    object val = parent.GetValue(@"2200");
    if (val != null && (int)val != 0x00)
        parent.SetValue(@"2200", 0x00);
    return true;
}
public static bool IEFullyEnabled(string domain)
{
    if (!IsPopupEnabled(domain) && !EnablePopup(domain)) return false;
    if (!IsTrusted(domain) && !TrustedSiteAddition(domain)) return false;
    if (!CheckFileDownloadEnabled()) return false;
    return true;
}


Automation | C#

Oracle aggregation query

by MikeHogg, 20 January 2012 14:56

One of the processes I set up required merging a few different data sources.  Besides our own data store of user-input data, we also had our own store of operational account data, and we went to another system for account billing data, and yet another for customer contract data.  This query was an aggregation of several queries that, all together, produced exactly the data we needed for our data warehouse import.  One of the neat features of this query is the company name partition, with a replace to account for manual data-entry discrepancies…

cursor c_contracts(p_start_date date, p_end_date date, p_snapshot_date date) is
Select seq_some.nextval, c.propid, to_char(p_start_date,'YYYYMM'), 
         PRODUCT_NAME ,
         ACCOUNTS_SOLD ,
case when bui_account.bdm_id is not null then trim(csh_account.bdm)
 when bui_contract.bdm_id is not null then trim(csh_contract.bdm)
 else 'Unknown' end BDM, 
trim(c.account_bdm) account_bdm,
trim(c.contract_bdm) contract_bdm,
          c.maxid  COMPANY_ID,
          substr(1000000000 + nvl(trim(regexp_replace(regexp_replace(c.duns_no,'-|/|[a-z|A-Z]|\.|[[:space:]]',''),'([[:digit:]]{9})[[:digit:]]*','\1')),'999999999'),2,9) DUNS_NO,
          TRUNC(FM_NET_MARGIN,2) FLOW_Margin,
          TRUNC(FM_VOLUME,2)  FLOW_Volume,
          s_billing.bill_type, ----------  BILL TYPES AND OPTIONS
trim(Green_Volume) ,
          1, p_snapshot_date
          , c.OFFER_TYPE
from (  select e.*, max(e.cust_id) over (partition by replace(replace(e.company_name,','),'.')) maxid
                       , NULL OFFER_TYPE
 select null customers_sold, null accounts_sold, 
                       r.MASTERCONTRACT_OUR_NUM propid, r.CONTRACT_STATUS status,
                       r.PRODUCTNAME product_name, r.BDM account_bdm, 'RESI' company_name, 
                       r.BDM contract_bdm, to_char(r.COUNTER_SIGN_DATE,'dd-MON-yy') COUNTER_SIGN_DATE, 
                       '1' cust_id, to_char(r.EST_POWER_FLOW_DATE,'dd-MON-yy') EST_POWER_FLOW_DATE, to_char(r.EST_POWER_END_DATE,'dd-MON-yy') EST_POWER_END_DATE, 
                       r.LEAD_SEGMENT, null mw_Sold, null product_group, r.BUSUNIT_NAME region,
 null sales_agent_fee, null salesfee_coname, null sic_code, r.STATE, r.ZONE, null LEG_MONTH, null LEG_YEAR, null fm_net_margin, 
 null fm_volume, null fm_sales_agent_fee, r.TOTAL_MARGIN origination_netmargin, r.TOTAL_VOLUME origination_volume, 
 null origination_grossmargin, r.TOTAL_CREDITCOST total_credit_cost, null duns_no, null busunit_id, r.ISO, 
 null price_date, null ccm_id, null rec_green_volume, null green_volume, r.UDC_NAME edc_name, r.contract_term contract_term_months, 
                       r.EDC_ACCTNO, r.ZIP servzip, r.ANNUAL_MWH, null contract_type, null non_std_terms_flg, null telesales_flg, null acquisition_type, '1'
                       , r.OFFER_TYPE
                            BUSUNIT_NAME, STATE, ZONE, 
 sum(TOTAL_MARGIN) over (partition by mastercontract_our_num) TOTAL_MARGIN, 
 sum(TOTAL_VOLUME) over (partition by mastercontract_our_num) TOTAL_VOLUME, 
 sum(TOTAL_CREDITCOST) over (partition by mastercontract_our_num) TOTAL_CREDITCOST, 
                            ISO, UDC_NAME, contract_term, 
                            EDC_ACCTNO, ZIP, ANNUAL_MWH
                            , OFFER_TYPE
 from cer_res.residential) r
-------------- somebillingsystem 
left join ( 
 select ag.agree_num, (select brk.name from s_org_ext@sblreadonly brk where brk.row_id = ag.x_broker_2_account_id) broker2, 
                         (select count(q.row_id) from s_doc_quote@sblreadonly q where q.quote_num = qu.quote_num) num_quotes,
                         round(months_between(sysdate,co.cust_since_dt),2) cust_accumulated_term, co.x_sub_type, co.ou_type_cd,
 case when exists (select ai.row_id from somebillingsystem.s_agree_item@sblreadonly ai
 join somebillingsystem.s_org_ext@sblreadonly sa on ai.x_sa_ou_id = sa.row_id
 where ag.row_id = ai.doc_agree_id
 and ai.itm_stat_cd = 'Pending' and sa.cust_stat_cd = 'Enrolled'
 and ag.stat_cd = 'Pending' and ag.x_offer_type = 'Renewal' ) then 1 else 0 end Early_Renewal
 from S_DOC_AGREE@sblreadonly AG
        ) somebillingsystem on c.propid = somebillingsystem.agree_num
-------------   BILL OPTIONS
left join ( select ag.agree_num, 
 min(bp.x_group_bill)  billing_option, -- Multi over Single
 min(bp.bill_pref_cd) bill_type -- SOME (Charges) over UDC (Rate) 
 from s_doc_agree@sblreadonly ag
 join s_agree_item@sblreadonly ai on ag.row_id = ai.doc_agree_id
 join s_org_ext@sblreadonly sa on ai.x_sa_ou_id = sa.row_id
 join s_party_rel@sblreadonly pr on sa.row_id = pr.rel_party_id
 and pr.rel_type_cd = 'Bill Group'
 and pr.x_primary_bill_group = 'Y'
 join s_org_ext@sblreadonly bg on pr.party_id = bg.row_id
 join S_INV_PROF@sblreadonly bp on bg.row_id = bp.accnt_id
 group by ag.agree_num
           ) s_billing on s_billing.agree_num = c.propid
---------    AR_BALANCE
left join (select contract_id, /*ar.edc_acctno, */sum(ar.ar_bal_current) ar_bal_current, sum(ar.ar_bal_1_30) ar_bal_1_30 , 
 sum(ar.ar_bal_31_60) ar_bal_31_60, sum(ar.ar_bal_61_90) ar_bal_61_90, 
 sum(ar.ar_bal_91plus) ar_bal_91plus, ar.snapshot_date, round(sum(bad.balance),2) bad_debt_balance
 from (select x.contract_id, x.edc_acctno, sum(b.ar_bal_current) ar_bal_current, sum(b.ar_bal_1_30) ar_bal_1_30 , 
 sum(b.ar_bal_31_60) ar_bal_31_60, sum(b.ar_bal_61_90) ar_bal_61_90, 
 sum(b.ar_bal_91plus) ar_bal_91plus, b.snapshot_date
 from some_uni_cust_arb b
 join some_uni_edc_x_propid x on b.edc_acctno = x.edc_acctno and x.snapshot_date = b.snapshot_date
 where b.snapshot_date = (select max(snapshot_date) from some_uni_cust_arb where snapshot_date <= p_end_date)
 --and x.contractid = '1-220606257'
 group by x.contract_id, x.edc_acctno, b.snapshot_date
                           ) ar
 -- BAD DEBT LOGIC : ave of EDCs, summed up at contract level, so three EDCs in a contract have 24 month avgs of 12, 1250, 650 bad debt show at contract 1912
 join (select b.edc_acctno, avg(b.ar_bal_91plus) balance
 from (select a.edc_acctno,trunc(a.snapshot_date,'MM'),  max(a.snapshot_date) maxdate
 from some_uni_cust_arb a group by a.edc_acctno, trunc(snapshot_date, 'MM')
                                   ) maxdate
 join some_uni_cust_arb b on maxdate.edc_acctno = b.edc_acctno and maxdate.maxdate = b.snapshot_date
 where b.snapshot_date between add_months(p_start_date,-23) and p_end_date
 group by b.edc_acctno
                           ) bad on ar.edc_acctno = bad.edc_acctno
 group by ar.contract_id, ar.snapshot_date
        ) ar on ar.contract_id = c.propid
left join ( 
 select contractid,
 -- for breakout analysis- remove grouping if you run this query alone
 /*min(billgroupoption) over(partition by contractid) billgroupoption, -- M trumps S
                            max(bgbmf) over(partition by contractid) bgbmf, -- in case of multiple just get the highest
                            max(ssomef) over(partition by contractid) ssomef, 
                            q.uidwqitem, q.openedtime ,q.wqtypecode, q.wqqueuecode, q.weight
                            ,sum(nvl(weight,0)) over(partition by contractid) sumweightpercontract*/
                            decode(min(billgroupoption),'M',max(bgbmf), max(ssomef)) + sum(weight) Billing_Weight
 from (
 select con.contractid,
                                    bg.billingmodeflag bgbmf, 
                                    sa.billingmodeflag ssomef, 
 --wq.uidwqitem, wq.openedtime ,wq.wqtypecode, wq.wqqueuecode, 
 from (select contractid, uidaccount
 from (select c.contractid, ci.uidaccount, max(c.contractid) over (partition by ci.uidaccount) maxid
 from (select uidaccount, uidcontract from CUSTOMERSYS.lscmcontractitem@ls_link) ci
 join (select contractid, starttime, stoptime, uidcontract, executetime
 from CUSTOMERSYS.lscmcontract@ls_link) c on ci.uidcontract = c.uidcontract and c.executetime is not null
 where c.starttime < p_end_date and c.stoptime > add_months(p_start_date,-2)
                                            ) where contractid = maxid
                                     ) con
 -- SUBSELECT to pick one(can't tell which is correct) when LS has bad dupes
 left join (select uidaccount, accountid, billingmodeflag
 from (select a.uidaccount, a.accountid, a.udcacctid, a.billingmodeflag,
                                                    a.lstime, max(a.lstime) over (partition by a.udcacctid) maxlstime
 where a.accounttypecode = 'SERVICE_ACCOUNT' ) 
 where maxlstime = lstime
                                     ) sa on sa.uidaccount = con.uidaccount
 LEFT JOIN (select lci.accountid, lci.wqtypecode, lci.wqqueuecode, q.weight--, lci.uidwqitem, lci.openedtime
 join some_work_queue q on lci.WQTYPECODE = q.type and lci.WQQUEUECODE = q.code
 where CLOSEDTIME Between add_months(p_start_date,-2) and p_end_date
                                         ) wq ON wq.accountid = sa.accountid
 -- SUBSELECT to pick one(can't tell which is correct) when LS has bad overlap
 left JOIN (select uidaccountfrom, uidaccountto
 from (select l.uidaccountfrom, l.uidaccountto, 
 max(l.uidaccountfrom) over (partition by l.uidaccountto) maxbgacctid
 where l.starttime < p_end_date and l.stoptime > add_months(p_start_date,-2))
 where uidaccountfrom = maxbgacctid
                                    )x ON sa.UIDACCOUNT = x.UIDACCOUNTTO
 and bg.accounttypecode = 'BILLING_GROUP'
 left JOIN (select billgroupoption, uidaccount
 from (select ah.billgroupoption, ah.uidaccount, 
                                                   ah.starttime, min(ah.starttime) over (partition by ah.uidaccount) maxstart
 where starttime = maxstart
                                   )h ON bg.UIDACCOUNT = h.UIDACCOUNT
                            ) q group by contractid
         ) ls on ls.contractid = c.propid
-----------   BDM JOIN
left join main_rpt.SOME_sales_hierarchy_new csh_account on UPPER(c.account_bdm) = UPPER(csh_account.bdm_crm)
left join some_uni_bdm bui_account on csh_account.employee_id = bui_account.bdm_id and bui_account.month = to_char(p_start_date,'YYYYMM')
left join main_rpt.SOME_sales_hierarchy_new csh_contract on UPPER(c.contract_bdm) = UPPER(csh_contract.bdm_crm)
left join some_uni_bdm bui_contract on csh_contract.employee_id = bui_contract.bdm_id and bui_contract.month = to_char(p_start_date,'YYYYMM')
where trim(c.propid) is not null
and c.est_power_end_date >= p_start_date
and c.counter_sign_date  <= p_end_date;
Other neat features: the subselects for AR balances, the Bad Debt aggregations over different partitions, the Decode that decides which weights to use in the calculations, and picking single distinct values out of an external system that had inherent 1-3% data anomalies.



Using a Data Sandbox to allow users to push their own data changes to Prod

by MikeHogg, 14 November 2011 19:03

Most Pl/Sql code is fairly boring, selecting lots of columns from lots of joins and maybe doing calculations and decodes on the fly, or groupings and other aggregations, and sub queries.  But one project I had was pretty neat.  Users had two lists of Products, and although the names and attributes were not usually the same, they were the same, just called different things by different companies. 

When we started using a new Product, we had to find the two matching ones and map them together.  This involved a process that ultimately led to what we called a data fix.  No code was changed, but data that the business process relied on had to be created or changed in the production database.  To do this required an IT resource to team up with a business resource and first make the changes in a test environment, and then test, and make adjustments, and test again, and then do the same exact cumulative change they did in the test environment, in the production environment. 

This was time intensive, and highly manual.  I wanted to make it quick and automatic, with no room for manual errors.  So one of the features of the new tool I built let the users make these changes themselves, in a test environment, with full auditing and editing capability; see for themselves the effects their changes had; and, when they were ready, push a button to have the exact same results applied to their production environment.  This worked wonderfully, with full auditing and history tracking in case any recovery was necessary.  Besides some finely tuned controls in the new UI, all it required was an Oracle database link to the test environment.
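The link itself is one statement.  In this sketch the user, password, and TNS alias are placeholders, not the real ones; only the link name SOMEW_LINK matches what the procedures use:

```sql
-- Placeholder credentials/alias; SOMEW_LINK is the link name the procs reference.
create database link SOMEW_LINK
  connect to sandbox_user identified by sandbox_password
  using 'TESTDB';  -- TNS alias of the test environment

-- Once the link exists, test-environment tables are addressable directly:
-- select count(*) from my_product@SOMEW_LINK;
```

With that in place, a single PL/SQL procedure can read the sandbox rows and write them into production in one transaction, which is what makes the push-button promotion below possible.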


procedure PushToProd(p_audit_user varchar2,
                     p_id         my_product.id%type,
                     p_note       varchar2) is
begin
  insert into my_product_history
    (audit_user,  audit_date,  type,
     id,  name, pf_code,  pf_type,
     cc_code, exch_id,  gs_name,
     pf_commodity, lot_size,  uom_cd,
     some_lot_sz, term, opt_style,
     product_type,  calendar,  location,
     description_id, trader_group,  settle_date_rule,
     eot_date_rule,  uses_dst, exchange)
    (select p_audit_user,  sysdate, 'PTP',
            nvl(p.id, pp.id),  nvl(p.name, pp.name),  nvl(p.pf_code, pp.pf_code),  nvl(p.pf_type, pp.pf_type),
            nvl(p.cc_code, pp.cc_code),  nvl(p.exch_id, pp.exch_id),  nvl(p.gs_name, pp.gs_name),
            nvl(p.pf_commodity, pp.pf_commodity),  nvl(p.lot_size, pp.lot_size),  nvl(p.uom_cd, pp.uom_cd),
            nvl(p.some_lot_sz, pp.some_lot_sz),  nvl(p.term, pp.term),  nvl(p.opt_style, pp.opt_style),
            nvl(p.product_type, pp.product_type),  nvl(p.calendar, pp.calendar),  nvl(p.location, pp.location),
            nvl(p.description_id, pp.description_id),  nvl(p.trader_group, pp.trader_group),  nvl(p.settle_date_rule, pp.settle_date_rule),
            nvl(p.eot_date_rule, pp.eot_date_rule),  nvl(p.uses_dst, pp.uses_dst),  nvl(p.exchange, pp.exchange)
       from my_product p
       full outer join my_product@SOMEW_LINK pp on p.id = pp.id -- handles new
      where p.id = p_id
         or pp.id = p_id);
  Log_This('PushToProd',
           'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
           ' new record(s) inserted by User ' ||
           p_audit_user || ' in my_product_history', 'INFO');
  delete from my_product where id = p_id;
  delete from some_fee where product_id = p_id;
  insert into my_product
    (select * from my_product@SOMEW_LINK where id = p_id);
  Log_This('PushToProd',
           'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
           ' new record(s) inserted by User ' ||
           p_audit_user || ' in my_product', 'INFO');
  insert into some_fee
    (select *
       from SOME.some_fee@SOMEW_LINK
      where product_id = p_id);
  Log_This('PushToProd',
           'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
           ' new record(s) inserted by User ' ||
           p_audit_user || ' in some_fee', 'INFO');
exception
  when others then
    -- original error handling elided
    raise;
end PushToProd;

procedure PushMappingToProd(p_audit_user varchar2,
                            p_v_id       my_values.id%type,
                            p_note       varchar2) is
begin
  insert into my_values_history
    (audit_user,  audit_date,  type,
     id, key,  value,
     val_type, schema,  rule,  exch_id)
    select p_audit_user,  sysdate,  'PTP',
           nvl(v.id, vv.id),  nvl(v.key, vv.key),  nvl(v.value, vv.value),
           nvl(v.val_type, vv.val_type),  nvl(v.schema, vv.schema),
           nvl(v.rule, vv.rule),  nvl(v.exch_id, vv.exch_id)
      from my_values v
      full outer join my_values@SOMEW_LINK vv on v.id = vv.id
     where v.id = p_v_id
        or vv.id = p_v_id;
  Log_This('PushMappingToProd',
           'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
           ' record(s) inserted by User ' ||
           p_audit_user || ' in my_values_history', 'INFO');
  delete from my_values where id = p_v_id;
  insert into my_values
    (select * from my_values@SOMEW_LINK where id = p_v_id);
  Log_This('PushMappingToProd',
           'Note:' || p_note || '; ' || SQL%ROWCOUNT ||
           ' new record(s) inserted by User ' ||
           p_audit_user || ' in my_value', 'INFO');
exception
  when others then
    -- original error handling elided
    raise;
end PushMappingToProd;


I also put some auditing features in the new UI.  The system relied on the premise that the two databases started with exactly the same primary key ids and kept them in sync.  I built several auditing screens, showing even the detail of each user's changes in each environment, so that more than one user could see what the others had done.  Here's one of the auditing tools, which showed the user the differences between some of the aggregations of the two systems.  This is the boring sql I was talking about before.  It also works over that Oracle database link.

procedure ViewLotDiff(p_valuation_date in me_trade.valuation_date%type,
                        cur_out          out cur) is
begin
  -- Added condition v.expiration_date >= p_valuation_date
  open cur_out for
select gg.anothercompany_name,
       case when sum(some_w_lots) <> sum(gs_lots) then
              case when sum(some_p_lots) <> sum(gs_lots) then 'BOTH'
                   else 'some_W' end
       end as issue,
 sum(some_w_lots) total_some_w_lots,
 sum(some_p_lots) total_some_p_lots,
 sum(gs_lots) total_gs_lots
from (select some_w_lots, some_p_lots, gs_lots, anothercompany_name
        from (select round(sum(v.lots),1) some_w_lots,
                     0 some_p_lots,
                     0 gs_lots,
                     'some_W' src,
                     v.pf_code,
                     v.anothercompany_NAME
                from some_.MAP_A_VW v
 where 1 = 1
 and v.VALUATION_DATE = p_valuation_date
 and (v.term <> 'DAY' or
                             v.contract_month >= p_valuation_date)
 and ( ( v.pf_commodity in ('WEA','WTI') and nvl(v.expiration_date,to_date('12/31/9999','mm/dd/yyyy')) >= p_valuation_date )
                              or ( v.pf_commodity not in ('WEA','WTI') and nvl(v.settle_date,to_date('12/31/9999','mm/dd/yyyy')) >= trunc(p_valuation_date,'MM')))
 group by v.anothercompany_NAME, v.PF_CODE )
 union (select 0 some_w_lots,
                           round(sum(d.lots),1) some_p_lots,
                           0 gs_lots,
                            'some_P' src,
                            d.pf_code,
                            p.GS_NAME anothercompany_NAME
 from some_.me_trade_detail@some_P_LINK d
 join some_.me_trade_header@some_P_LINK h on d.eti = h.eti
 join some_.me_product@some_P_LINK p on d.pf_code = p.pf_code and d.pf_type = p.pf_type and h.exchange = p.exchange
 where 1 = 1
 and d.VALUATION_DATE = p_valuation_date
 and nvl(d.some__status,'nvl') <> 'ORI'
 and (h.term <> 'DAY' or
                           d.contract_month >= p_valuation_date)
 and ( ( p.pf_commodity in ('WEA','WTI') and nvl(d.expiration_date,to_date('12/31/9999','mm/dd/yyyy')) >= p_valuation_date  )
                              or ( p.pf_commodity not in ('WEA','WTI') and nvl(d.settle_date,to_date('12/31/9999','mm/dd/yyyy')) >= trunc(p_valuation_date,'MM')))
 group by p.GS_NAME, d.pf_code )
 union (select 0 some_w_lots,
                           0 some_p_lots,
 sum(g.net_qty) gs_lots,
                           'GS' src,
                           '' pf_code,
                           g.prod_description anothercompany_name
 from gsd_open_positions g
 where 1 = 1
 and g.stmt_date = p_valuation_date
 and g.expiry_date >= p_valuation_date
 group by g.prod_description ) 
              ) gg
group by anothercompany_name
having ( (sum(some_w_lots) - sum(some_p_lots) <> 0) or (sum(gs_lots) - sum(some_p_lots) <> 0) or (sum(gs_lots) - sum(some_w_lots) <> 0) )
order by anothercompany_NAME;
end ViewLotDiff;

Using Moq to write Unit tests in VS2008

by MikeHogg 5. October 2011 18:26

I’ve read about Test Driven Development, and after trying it out myself I have noticed a difference in the code that I write, even when I am not using it for a particular project.  My classes have become simpler and my functions more atomic.  I write more for the Open/Closed principle, keeping my classes closed for modification but open for extension, and I find my projects are more organized.  Development even seems quicker: the base classes get written faster, and additional features drop right into place as if they were expected, rather than the spaghetti code I used to end up with.

There is a difference between unit tests and other kinds of tests.  I have heard developers say that they can’t test because most of their code depends on a specific data layer or web UI, and I think they may not realize that integration testing is different from unit testing.  This is what I meant about changing my writing style: TDD prompted me to write smaller, more abstract classes with unit-testable atomic functions, and then to write application-specific interfaces and more specific inherited classes.

In some cases, though, your tests cross that line and you do need integration testing, and there are several mocking libraries out there now that make it easy.  The main ones all appear to be similar.  Here is a simple project where I used Moq to fake my data service.  The project had a few components; the end-user UI was WPF, built with MVVM.

Because these UI-related elements are decoupled from the actual UI in MVVM, I can unit test them.  Here are some tests for the MainViewModel, which takes a user via constructor dependency injection.

[TestMethod]
public void CanRestartServiceTest()
{
    MainViewModel_Accessor target = new MainViewModel_Accessor(AdminUser);
    Assert.IsTrue(target.CanRestartService());   // enabled for admins
    target = new MainViewModel_Accessor(ReadonlyUser);
    Assert.IsFalse(target.CanRestartService());  // disabled for readonly users
}
[TestMethod]
public void CanLoadAllTablesTest()
{
    MainViewModel_Accessor target = new MainViewModel_Accessor(AdminUser);
    target._backgroundmanager.IsRunning = true;  // a running job disables the control
    Assert.IsFalse(target.CanLoadAllTables());
    target = new MainViewModel_Accessor(ReadonlyUser);
    Assert.IsFalse(target.CanLoadAllTables());
}
[TestMethod]
public void StartProgressBarTest()
{
    MainViewModel_Accessor target = new MainViewModel_Accessor(AdminUser);
    Assert.IsTrue(target.ProgressVisibility == Visibility.Visible);
    Assert.IsTrue(target.ProgressValue == 0);
}
[TestMethod]
public void BM_TimerFinishedTest()
{
    MainViewModel_Accessor target = new MainViewModel_Accessor(AdminUser);
    TimerFinishedEventArgs e = new TimerFinishedEventArgs();
    e.Message = "TickTock";
    target.BM_TimerFinished(new object(), e);
    Assert.IsTrue(target.ProgressVisibility == Visibility.Hidden);
    Assert.IsTrue(target.Dialog.Visibility == Visibility.Visible);
}
[TestMethod]
public void HandleErrorTest()
{
    MainViewModel_Accessor target = new MainViewModel_Accessor(AdminUser);
    target.HandleError(new Exception("Test Exception"));
    Assert.IsTrue(target.Error.Visibility == Visibility.Visible);
    Assert.IsTrue(target.Error.Message.Contains("Test Exception"));
}

I created all of my authorization features as properties on the VM and bound Enabled in the UI to them, so I can test whether the VM is enabled for admin users or readonly users, as I do in CanRestartServiceTest.  I also disable certain controls through this mechanism when certain background jobs are running, and CanLoadAllTablesTest covers that.  I have a progressbar control, also with its own VM, which I hook into through a property called ProgressVisibility on the main VM, so StartProgressBarTest can check that it is working.  This UI runs several background jobs, and I wrote a custom BackgroundJobManager class to manage them all; BM_TimerFinishedTest tests one of the manager’s behaviors as it is implemented in the VM.  And HandleErrorTest tests the DialogErrorVM I am using in the main VM.  So with MVVM it is possible to write unit tests of a sort for your UI components.

So, it’s great that I can test the UI plumbing in some way, but most of what this app does is interact with a Windows service, a DTS package through a COM wrapper, and a web service, and MVVM doesn’t help me test those integration-layer features.  In the past, since I had written each of these as interfaces, I would have had to write test classes for each: just stubs returning dummy data.  That is fine, but with a mocking library you no longer have to write all those test classes.  You can usually set up your mocked class and dummy results in a line or two, and there is a load of generic flexibility built in.
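The pattern is much the same across libraries. For comparison, here is the one-or-two-line setup idea in Python's unittest.mock (an illustrative sketch, not the project code):

```python
from unittest.mock import Mock

# One configured attribute replaces a whole hand-written stub class
data_service = Mock()
data_service.StageAcorn.return_value = "DTSER is Success"

# The mock accepts any argument, like Moq's It.IsAny<DateTime>()
result = data_service.StageAcorn("2011-10-05")
```

The mock records every call, so the test can also verify the interaction afterwards rather than just the return value.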

Here I am setting up a mocked object based on my IDataService interface, with one instantiated method, StageAcorn(DateTime), which I set up to accept any DateTime argument.  My MVVM method takes an IDataService injected as an argument, so I can now test it without writing any IDataService test stub.

[TestMethod]
public void StageTablesTest()
{
    MainViewModel_Accessor target = new MainViewModel_Accessor(AdminUser);
    var mock = new Moq.Mock<AUM.DataServiceReference.IDataService>();
    mock.Setup<string>(f => f.StageAcorn(Moq.It.IsAny<DateTime>())).Returns("DTSER is Success");
    target.StageTables(mock.Object); // run the VM method against the mocked service
    Assert.IsTrue(target.Dialog.Visibility == Visibility.Visible);
}

Here are a couple other similar, super simple tests also using mocked interfaces…

[TestMethod]
public void GetSavePathTest()
{
    var mock = new Moq.Mock<IFileDialogService>();
    mock.Setup(f => f.GetSaveFolder()).Returns("testfolderpath");
    Assert.AreEqual("testfolderpath", MainViewModel_Accessor.GetSavePath(mock.Object));
}
[TestMethod]
public void RestartServiceTest()
{
    MainViewModel_Accessor target = new MainViewModel_Accessor(AdminUser);
    var mock = new Moq.Mock<AUM.RestarterServiceReference.IRestarterService>();
    mock.Setup(f => f.StopAcornAppService()).Returns(true);
    mock.Setup(f => f.StartAcornAppService()).Returns(true);
    target.RestartService(mock.Object); // kick off the restart against the mocked service
    Assert.IsTrue(target.ProgressVisibility == Visibility.Visible);
    bool running = true; int testtimeout = 0;
    while (running)
    {
        if (target.ProgressVisibility == Visibility.Hidden) running = false;
        if (testtimeout++ > 200)
        {
            Assert.Inconclusive("Test Timeout");
            running = false;
        }
        System.Threading.Thread.Sleep(50); // poll until the background restart finishes
    }
}


These are very simple examples, and not every method on every class is meant to be unit tested, but they show how easy it is to get started with MS testing projects.

Creating large tables in Oracle with dynamic partitions to allow for instant reads

by MikeHogg 29. September 2011 14:55

A lot of my work in Oracle was with tables between a million and a billion rows, so indexing and query performance were always part of writing the system.  In this case I knew my table was going to grow by a quarter to a half million rows a day, so to keep selects instant I needed partitions.  While it is easy to build a table with its initial partitions, you have to maintain the scheme with a nightly or weekly job.  Here is the initial table creation:

create table SOME_CONTRACT (
  ID                             NUMBER not null,
  CONTRACT_ID                    VARCHAR2(150) not null,
  MONTH                          NUMBER not null,
  <snip some hundred columns>
  ACTIVE_FG                      NUMBER )
tablespace SOME_TBS01
  pctfree 10
  initrans 1
  maxtrans 255
  storage
  ( initial 64K
    minextents 1
    maxextents unlimited )
partition by range (snapshot_date)( 
partition P20110101 values less than (TO_DATE(' 2011-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110201 values less than (TO_DATE(' 2011-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110301 values less than (TO_DATE(' 2011-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110401 values less than (TO_DATE(' 2011-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110501 values less than (TO_DATE(' 2011-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110601 values less than (TO_DATE(' 2011-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110701 values less than (TO_DATE(' 2011-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110801 values less than (TO_DATE(' 2011-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110901 values less than (TO_DATE(' 2011-09-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110902 values less than (TO_DATE(' 2011-09-03 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110905 values less than (TO_DATE(' 2011-09-06 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110906 values less than (TO_DATE(' 2011-09-07 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110907 values less than (TO_DATE(' 2011-09-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110908 values less than (TO_DATE(' 2011-09-09 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110909 values less than (TO_DATE(' 2011-09-10 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110912 values less than (TO_DATE(' 2011-09-13 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110913 values less than (TO_DATE(' 2011-09-14 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110914 values less than (TO_DATE(' 2011-09-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110915 values less than (TO_DATE(' 2011-09-16 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition PMAX values less than (TO_DATE(' 2099-12-31 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))
);
-- Create/Recreate primary, unique and foreign key constraints 
alter table SOME_CONTRACT
  add constraint PK_SOMECONTRACTS_ID primary key (ID)
  using index
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
  storage
  ( initial 64K
    minextents 1
    maxextents unlimited );
-- Create/Recreate indexes 
create index IX_SOME_CONTRACT_ACTIVE on SOME_CONTRACT (ID, ACTIVE_FG) -- index name illustrative; the composite index described below
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
  storage
  ( initial 64K
    minextents 1
    maxextents unlimited );

So new rows get inserted into PMAX each day by a daily process.  Partitions in Oracle behave like separate tables, and since most of our select queries always filter on snapshot_date, Oracle prunes down to a single partition and it is as if we are querying a small table.  We can also add indices if we want; in my case here I was using an active flag, so I added a composite index on the PK and the ACTIVE_FG column.

To maintain this scheme, you have to periodically add new partitions and split up PMAX.  I did that with a scheduled job that ran a stored proc each night.

procedure Maintain_Partitions is
cursor c_nightly_deletes is
 select sdate
 from (select distinct snapshot_date sdate, count(distinct snapshot_date) over (partition by month) total
 from SOME_contract_nightly
 where month = to_char(  sysdate  ,'YYYYMM') 
 order by snapshot_date ) 
 where rownum < total - 3; -- keep the three latest in any given month
cursor c_compress_ddl (p_table_name varchar2 ) is
select 'alter table ' || p_table_name || ' move partition ' || partition_name || ' compress' ddl, partition_name
from user_tab_partitions
where table_name = p_table_name
 and partition_name not like '%MAX%'
 and (compression <> 'ENABLED'
 or -- recompress NIGHTLYs since those older than 3 days are being deleted
                 table_name = 'SOME_CONTRACT_NIGHTLY' and partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%' );
cursor c_truncate_ddl ( p_table varchar2  ) is
select 'alter table ' || table_name || ' truncate partition ' || partition_name || ' update global indexes' as ddl, partition_name
from user_tab_partitions
where table_name = p_table
 and partition_name not like '%MAX%'
 and to_date( trim( substr(partition_name, 2) ), 'YYYYMMDD' )   <= (add_months(sysdate,-24));
begin
for r_nightly_deletes in c_nightly_deletes loop
delete from SOME_contract_nightly
where snapshot_date = r_nightly_deletes.sdate;
      Log_This('PKG_SOME_UI.Maintain_Partitions','SOME_CONTRACT_NIGHTLY snapshot_date' ||
        to_char(r_nightly_deletes.sdate,'DD-MON-YY') || ' deleted: ' || sql%rowcount);
end loop;
for each_table in (select distinct table_name from user_tab_partitions where table_name like 'SOME%') loop
-- create new partitions
      SplitWeekdayPartitions( each_table.TABLE_NAME);
-- compress (today's) data
for r_compress_ddl in c_compress_ddl ( each_table.table_name ) loop
execute immediate r_compress_ddl.ddl;
        Log_This('PKG_SOME_UI.Maintain_Partitions',each_table.table_name || ' partition compressed: ' || r_compress_ddl.partition_name);
end loop;
-- remove any partitions older than two years
for r_truncate_ddl in c_truncate_ddl ( each_table.table_name  ) loop
 execute immediate r_truncate_ddl.ddl;
          Log_This('PKG_SOME_UI.Maintain_Partitions', each_table.table_name || ' Partition Truncated: ' || r_truncate_ddl.partition_name);
end loop;
end loop;
for x in (  select t.owner, t.table_name, p.partition_name
            from all_tables t
            join all_tab_partitions p on t.table_name = p.table_name
            where p.table_name like 'SOME%'
              and (p.last_analyzed is null
                   or p.table_name = 'SOME_CONTRACT_NIGHTLY'
                      and p.partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%' ) ) loop
      dbms_stats.gather_table_stats(
                      degree => dbms_stats.AUTO_DEGREE,
                      ownname => x.owner,
                      tabname => x.table_name,
                      partname => x.partition_name,
                      cascade => dbms_stats.AUTO_CASCADE );
              Log_This('PKG_SOME_UI.Maintain_Partitions','Statistics gathered for table.partition: ' || 
                                    x.table_name || '.' || x.partition_name);
end loop;
exception when others then
      Rebuild_Indices(); -- do not leave in unuseable state
      raise; -- notify user to investigate and rerun
end Maintain_Partitions;
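The two-year retention rule in c_truncate_ddl works by parsing the date back out of the PYYYYMMDD partition name. In rough Python terms (an illustrative sketch, whole years only):

```python
from datetime import date

def partitions_older_than(names, today, years=2):
    """Select PYYYYMMDD partition names at least `years` old;
    PMAX-style names are skipped, like the cursor's not-like filter."""
    cutoff = date(today.year - years, today.month, today.day)
    out = []
    for name in names:
        if 'MAX' in name:
            continue
        # parse the date embedded after the leading 'P'
        d = date(int(name[1:5]), int(name[5:7]), int(name[7:9]))
        if d <= cutoff:
            out.append(name)
    return out
```

The PL/SQL version does the same thing with substr and to_date, then issues a truncate-partition statement for each hit.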


In this particular case, I was deleting old partitions, and using compression to keep the total database size down.   The SplitWeekdayPartitions created the new ones:

PROCEDURE SplitWeekdayPartitions( p_table_name in varchar2 ) AS
  v_latest_date date;
  v_new_date date;
  v_part_name varchar2(8);
  v_upper_bound varchar2(8);
BEGIN
  SELECT to_date(substr(MAX(partition_name) KEEP (DENSE_RANK FIRST ORDER BY partition_position desc ), 2),'YYYYMMDD')
  into v_latest_date
  FROM (SELECT partition_name, partition_position
        FROM user_tab_partitions
        WHERE table_name = upper(p_table_name)
          AND partition_name <> 'PMAX' );
  if v_latest_date < sysdate then
    v_new_date := v_latest_date + 1;
    while to_char(v_new_date, 'd') in ('1','7') loop -- skip weekend dates
      v_new_date := v_new_date + 1;
    end loop;
    v_part_name := to_char(v_new_date,'YYYYMMDD');
    v_upper_bound := to_char(v_new_date + 1,'YYYYMMDD');
    execute immediate
      'ALTER TABLE ' || p_table_name || ' SPLIT PARTITION PMAX AT (to_date(' || v_upper_bound ||
          ',''YYYYMMDD'')) INTO ( PARTITION P' || v_part_name || ', PARTITION PMAX )';
    Log_This('PKG_SOME_UI.SplitWeekdayPartitions', 'New partition P' || v_part_name || ' created for ' || p_table_name);
    SplitWeekdayPartitions( p_table_name ); -- do we need to catch up to today?
  end if;
END SplitWeekdayPartitions;

It’s interesting that I picked a recursive call here instead of a while loop.
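The choice is easy to picture outside PL/SQL. This small Python sketch (dates only, no DDL; it assumes partitions named PYYYYMMDD with weekend dates skipped) shows the recursive catch-up next to the equivalent loop:

```python
from datetime import date, timedelta

def next_weekday(d):
    """Next calendar day, skipping Saturday and Sunday."""
    d += timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d += timedelta(days=1)
    return d

def catch_up_recursive(latest, today, names=None):
    """Mirrors SplitWeekdayPartitions: add one partition, then recurse."""
    if names is None:
        names = []
    if latest < today:
        latest = next_weekday(latest)
        names.append('P' + latest.strftime('%Y%m%d'))
        catch_up_recursive(latest, today, names)
    return names

def catch_up_loop(latest, today):
    """The equivalent while loop."""
    names = []
    while latest < today:
        latest = next_weekday(latest)
        names.append('P' + latest.strftime('%Y%m%d'))
    return names
```

Both produce the same partition list; the recursive version just re-derives its state on each pass, which in the PL/SQL proc means re-querying user_tab_partitions until the table is caught up to today.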

… and in the Maintain_Partitions proc, Rebuild_Indices was necessary afterwards because the alter table and truncate partition commands didn’t keep the indexes usable.

procedure Rebuild_Indices is
begin
-- LOCAL INDICES   (not using any now but maybe in the future...)
 for x in ( select index_name, partition_name
            from user_ind_partitions
            where status = 'UNUSABLE' ) loop
 execute immediate
                          'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD PARTITION ' || x.PARTITION_NAME ;
            Log_This('PKG_SOME_UI.Maintain_Partitions','Partition Index Rebuilt: ' || x.index_name);
end loop;
-- GLOBAL INDICES
 for x in ( select index_name
            from user_indexes
            WHERE table_name like 'SOME%' and STATUS not in ( 'VALID', 'N/A' ) ) loop
 execute immediate
                        'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD ' ;
             Log_This('PKG_SOME_UI.Maintain_Partitions','Global Index Rebuilt: ' || x.index_name);
end loop;
end Rebuild_Indices;
And that’s it.  A neat feature to plug into your next large table.

An example of one of my favorite projects

by MikeHogg 27. September 2011 15:00

Here’s a small one I like.

One time I inherited a system of sorts that supported a single user and her third-party data warehouse application. We didn’t support the warehouse, but we were supposed to deliver the data extracts that she imported into the warehouse at monthly intervals. The existing IT process was very manual and very time intensive. Besides involving data from four different sources and the queries or processes to get them, it involved a dozen files per run, sometimes up to three people from different departments, several runs per month taking four to eight hours each, and no history or state tracking except keeping the files in folders forever.

The initial attempt to automate this also left behind a number of files and processes to maintain, and it had been running for over a year with 60 monthly man hours of IT dedicated to it and now several hundred files, folders, and processes in assorted locations.

This is one of my favorite kinds of jobs: people put a mess in front of me and I turn it into something easy to use that saves time. One of the things that bugged me about the existing process was that there was no history and it took too long. I expanded our small database to include tables for each of our entities and started automating the extracts in a nightly process. This dropped the user’s request time from several hours for the complicated queries to almost instant, since we were now caching the data ourselves, and it gave IT an easy way to hook into historic runs.

Another thing that I wanted to change was to streamline the steps. The existing process exported from data sources, inserted into databases, extracted into files, joined with other datasources, imported into databases again. So I built an SSIS package that did the data transformations on our Oracle database and inserted the data directly into the warehouse MSSQL server. This removed the need for the files and a whole staging process, and made the whole process easier to maintain from an IT perspective.

Another thing that I wanted to change was to remove the IT resource component. I don’t believe IT needs to be involved in day-to-day business operation requests unless something breaks. So I built a simple WPF ClickOnce intranet application with a handful of features, enabling the user to run the whole extract/import instantly for any date she chooses, and even view the data via Excel export. I like that it not only provided so much convenience for the user, but also dropped the IT maintenance cost from an average of 60 man hours a month to almost zero.

Mapping patterns in Oracle

by MikeHogg 28. June 2011 19:38

In my research, mapping patterns came in three forms, varying by level of complexity.  The simplest form is the mapping table most people recognize as a many-to-many join table.  It is easy to use and easy to understand, but it doesn’t allow dynamic mapping: you have to have both entities’ primary keys, and then you have to do an insert to map the relationship.  That won’t work on a data store that is constantly receiving new data.

The second form allows dynamic mapping based on attributes.  You can set up a process that automatically inserts the rows or assigns the mappings according to each entity’s attribute values, so that every new entity whose X = Y is automatically mapped to entity A.  This requires a column for each attribute, though, and doesn’t allow for multiple entity types.
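A minimal sketch of that second form, with a rule table keyed on attribute values (the names here are hypothetical):

```python
def auto_map(entities, rules):
    """entities: dicts of attribute values keyed by 'id'; rules: a list of
    (attr_name, attr_value, target) tuples. Every new entity whose
    attribute matches a rule gets mapped to that rule's target."""
    mappings = {}
    for e in entities:
        for attr, value, target in rules:
            if e.get(attr) == value:
                mappings[e['id']] = target
                break  # first matching rule wins
    return mappings
```

Run on a schedule, this is all the automatic assignment process needs to be; the limitation is visible too, since every attribute a rule can reference has to exist as a column.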

The third form of mapping table involves dynamic attributes and dynamic tables.  It allows the most flexibility, but it is the most difficult to implement and use, which you realize as soon as you start writing your procs against it.

I happened to inherit a system where the original consultant had set up the third form, and the following generations of developers had built a variety of complicated tools and complex business processes to maintain it.  It took me a while to understand it, but once I did, and realized its strengths, I also realized that a simple UI could abstract away the complexity and make it easy to use.

Behind the UI, though, sat the data layer, and while the UI looked simple, the data layer was where all the complexity had to be handled.  Here is one of the queries I wrote to run a mapping function on the fly, for one particular entity having at least three and up to seven different mapping attributes.  This was one of the few times I had to use dynamic SQL in Oracle: PL/SQL allows for great flexibility, but static SQL would not cover this case:


procedure MapUnmappedparts(p_certain_date my_part.certain_date%type) is
cursor c_mappings is
select  v.id, v.value, v.schema, v.exch_id,
            firstkey.attr_name firstname, regexp_substr(v.key, '[^()]+',1,2) firstvalue,
            secondkey.attr_name secondname, regexp_substr(v.key, '[^=()]+',1,4) secondvalue,
            thirdkey.attr_name thirdname, regexp_substr(v.key, '[^=()]+',1,6) thirdvalue,
            fourthkey.attr_name fourthname, regexp_substr(v.key, '[^=()]+',1,8) fourthvalue,
            fifthkey.attr_name fifthname, regexp_substr(v.key, '[^=()]+',1,10) fifthvalue,
            sixthkey.attr_name sixthname, regexp_substr(v.key, '[^=()]+',1,12) sixthvalue,
            seventhkey.attr_name seventhname, regexp_substr(v.key, '[^=()]+',1,14) seventhvalue
from my_values v
join my_attributes firstkey on firstkey.short_name = regexp_substr(v.key, '[^=]+',1,1)
join my_attributes secondkey on secondkey.short_name = regexp_substr(v.key, '[^=()]+',1,3)
join my_attributes thirdkey on thirdkey.short_name = regexp_substr(v.key, '[^=()]+',1,5)
left outer join my_attributes fourthkey on fourthkey.short_name = regexp_substr(v.key, '[^=()]+',1,7)
left outer join my_attributes fifthkey on fifthkey.short_name = regexp_substr(v.key, '[^=()]+',1,9) 
left outer join my_attributes sixthkey on sixthkey.short_name = regexp_substr(v.key, '[^=()]+',1,11) 
left outer join my_attributes seventhkey on seventhkey.short_name = regexp_substr(v.key, '[^=()]+',1,13) 
join my_schemas s on s.sch_id = v.schema and s.key_order = 1
order by s.eval_order, s.sch_id;
    v_query varchar2(2000);
begin
    Pkg_somy__Log.writeMessage(p_module   => 'Pkg_ProductMapper.MapUnmappedparts',
                              p_severity => 'DEBUG',
                              p_message  =>  'Started');
for r_mapping in c_mappings loop
      v_query := 'merge into my_part_Detail d
 using (select dd.id, h.co, h.d_type, dd.curve, dd.delivery_point, 
                                dd.curve_2, dd.commodity, h.part_start_month, h.part_end_month, 
                                dd.term_of_trading_cd, h.price_src_1, h.price_src_2, h.term,
                                dd.vintage, dd.adj_dm_year, h.product_code, dd.region, dd.technology
 from my_part_detail dd join my_part_header h on dd.eti = h.eti
 where dd.certain_date = :1
 and dd.pf_code is null
 and nvl(dd.somy__status,''nvl'') <> ''ORI''
                           ) unmapped on (unmapped.id = d.id)
 when matched then update set d.pf_code = :2
 where 1=1 ';
      v_query := v_query || ' and unmapped.' || r_mapping.firstname || ' = :3 ';
      v_query := v_query || ' and unmapped.' || r_mapping.secondname || ' = :4 ';
      v_query := v_query || ' and unmapped.' || r_mapping.thirdname || ' = :5 ';
if r_mapping.fourthname is not null
then      v_query := v_query || ' and unmapped.' || r_mapping.fourthname || ' = :6 ';end if;
if r_mapping.fifthname is not null
then      v_query := v_query || ' and unmapped.' || r_mapping.fifthname || ' = :7 ';end if;
if r_mapping.sixthname is not null
then      v_query := v_query || ' and unmapped.' || r_mapping.sixthname || ' = :8 ';end if;
if r_mapping.seventhname is not null
then      v_query := v_query || ' and unmapped.' || r_mapping.seventhname || ' = :9 ';end if;
case
when r_mapping.seventhname is not null then
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue, r_mapping.sixthvalue, r_mapping.seventhvalue ;
when r_mapping.sixthname is not null then
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue, r_mapping.sixthvalue ;
when r_mapping.fifthname is not null then
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue,
                                          r_mapping.fifthvalue ;
when r_mapping.fourthname is not null then
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue, r_mapping.fourthvalue ;
else
execute immediate v_query using p_certain_date, r_mapping.value, r_mapping.firstvalue,
                                          r_mapping.secondvalue, r_mapping.thirdvalue ;
end case;
end loop;
end MapUnmappedparts;
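The regexp_substr calls in the cursor peel the Nth run of characters outside '=', '(' and ')' out of the composite key. Here is the same extraction in Python; the key format is inferred from the occurrence indexes, so treat it as an assumption:

```python
import re

def nth_token(key, pattern, n):
    """Oracle's regexp_substr(key, pattern, 1, n): the n-th match, 1-based."""
    matches = re.findall(pattern, key)
    return matches[n - 1] if len(matches) >= n else None

def parse_mapping_key(key):
    """Split a composite key such as 'CO=(ABC)(D_TYPE=FUT)' into
    (attribute, value) pairs, mirroring the cursor's regexp_substr calls."""
    pairs = [(nth_token(key, r'[^=]+', 1), nth_token(key, r'[^()]+', 2))]
    i = 3
    while nth_token(key, r'[^=()]+', i) is not None:
        pairs.append((nth_token(key, r'[^=()]+', i),
                      nth_token(key, r'[^=()]+', i + 1)))
        i += 2
    return pairs
```

The cursor then joins each extracted short name back to my_attributes to get the column name the merge statement should filter on.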



Repository Patterns without Entity Framework

by MikeHogg 2. March 2011 18:03

At the time EF came out, I was usually interfacing with million-row Oracle databases, and EF didn’t connect to Oracle.  It wasn’t until I started working in a different industry later on that I got to enjoy EF, so for some time my Repositories wrapped a stored procedure data layer. 

Here’s an example of my Repository usage in WPF with MVVM, LINQ, and DI.  This was a financial reporting application, and one of the interesting features I was tasked with was running calculations on the fly based on data from its different sources. 

First, a couple of simple models: a Group model and a Details model, both exposing some basic datasource properties as well as some derived properties and the methods to calculate them, with the Group holding a List<> of Details:


public class MatchGroup : IEquatable<MatchGroup> // IEquatable for LINQ.Distinct()
{
    public decimal Match_Group_Id { get; set; }
    public string Product { get; set; }
    public DateTime Contract_Month { get; set; }
    public decimal? Total_CER_Lots { get; set; }
    public decimal? Total_GS_Lots { get; set; }
    public decimal? Lot_Size { get; set; }
    public decimal? CER_Price { get; set; }
    public decimal? GS_Price { get; set; }
    public decimal? CER_Float { get; set; }
    public decimal? CER_Final_Float { get; set; }
    public decimal? GS_Float { get; set; }
    // ... etc etc
    public List<MatchDetail> Details { get; set; }
    public string Single_ETI { get; set; }
    public string Single_Goldman_Id { get; set; }

    #region Constructor
    public MatchGroup()
    {
        Details = new List<MatchDetail>();
    }
    #endregion

    public void Calculate()
    {
        SetSingleETI();
        CheckDetails();
        CheckGroup();
        SumGroupChecks();
    }
    private void SetSingleETI()
    {
        if (Details.Count() > 1)
            Single_ETI = "+";
        else
        {
            Single_ETI = Details.Select(f => f.ETI).FirstOrDefault();
            Single_Goldman_Id = Details.Select(f => f.Goldman_Id).FirstOrDefault();
        }
    }
    private void CheckDetails()
    {
        foreach (MatchDetail d in Details) d.CheckDetails();
    }
    private void CheckGroup()
    {
        Float_Diff = GS_Price - CER_Float;
        Price_Diff = GS_Price - CER_Price;
        Qty_Diff = Total_GS_Lots - Total_CER_Lots;
    }
    private void SumGroupChecks()
    {
        // most of these can be summed from linq grouping in initial creation?
        CER_Settled_Total = Details.Sum(d => d.CER_Settled_Total);
        CER_Check = Details.Sum(d => d.CER_Check);
        CER_Diff = Details.Sum(d => d.CER_Diff);
    }

    #region IEquatable
    public bool Equals(MatchGroup other)
    {
        return other != null && Match_Group_Id == other.Match_Group_Id;
    }
    public override int GetHashCode()
    {
        return Match_Group_Id == 0 ? 0 : Match_Group_Id.GetHashCode();
    }
    #endregion
}
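The IEquatable/GetHashCode pair is what lets LINQ's Distinct() collapse two MatchGroup instances with the same Match_Group_Id. The contract is the same in any language; here is the idea in Python (an illustrative class, not the project code):

```python
class Group:
    """Identity is the group id, so set-based dedup collapses duplicates."""
    def __init__(self, group_id):
        self.group_id = group_id

    def __eq__(self, other):
        # equality and hash must agree on the same key, or dedup breaks
        return isinstance(other, Group) and self.group_id == other.group_id

    def __hash__(self):
        return hash(self.group_id)

distinct = set([Group(1), Group(2), Group(1)])  # collapses to two groups
```

If equality and hash disagree, hash-based containers silently keep duplicates, which is why C# warns when you override Equals without GetHashCode.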


The Details Model was similar.

The ViewModel has some UI-specific properties, but mostly just exposes the Model’s properties directly; the Model is instantiated through constructor injection, as this is a read-only reporting View.

public class MatchGroupVM : ViewModelBase
{
    Models.MatchGroup _matchgroup;
    public decimal Match_Group_Id { get { return _matchgroup.Match_Group_Id; } }
    public string Product { get { return _matchgroup.Product; } }
    public DateTime Contract_Month { get { return _matchgroup.Contract_Month; } }
    public decimal? Total_CER_Lots { get { return _matchgroup.Total_CER_Lots; } }
    public decimal? Total_GS_Lots { get { return _matchgroup.Total_GS_Lots; } }
    public MatchGroupVM(Models.MatchGroup mg)
    {
        _matchgroup = mg;
    }
}

I use a list of these GroupVMs on the main page or tab’s VM, with Repository injection.

public class FloatPricesVM : ViewModelBase
{
    #region Fields
    Models.MatchGroupRepository _repository;
    List<MatchGroupVM> _matchgroups;
    ErrorVM _error;
    DialogVMBase _dialog;
    #endregion

    #region Properties
    public string SelectedProductName { get; set; }
    public int SelectedDeliveryMonth { get; set; }
    public int SelectedDeliveryYear { get; set; }
    public List<string> ProductNames { get; set; }
    public List<int> DeliveryMonths { get; set; }
    public List<int> DeliveryYears { get; set; }
    public ErrorVM Error
    {
        get { return _error; }
        set { SetProperty(ref _error, value, "Error"); }
    }
    public List<MatchGroupVM> MatchGroups
    {
        get { return _matchgroups; }
        set { SetProperty(ref _matchgroups, value, "MatchGroups"); }
    }
    #endregion

    public FloatPricesVM(Models.MatchGroupRepository repo)
    {
        _repository = repo;
        _error = new ErrorVM();
        _dialog = new DialogVMBase();
        ProductNames = _repository.GetProductNames();
        DeliveryMonths = _repository.GetDeliveryMonths();
        DeliveryYears = _repository.GetDeliveryYears();
        SelectedDeliveryMonth = DateTime.Now.Month - 1;
        SelectedDeliveryYear = DateTime.Now.Year;
    }

    #region Commands
    private List<MatchGroupVM> GetMatchGroups(string productname, DateTime deliverymonth)
    {
        List<MatchGroupVM> result = new List<MatchGroupVM>();
        try
        {
            foreach (Models.MatchGroup mg in _repository.GetMatchGroups(productname, deliverymonth))
            {
                result.Add(new MatchGroupVM(mg));
            }
        }
        catch (Exception e)
        {
            Error = new ErrorVM(e);
        }
        return result;
    }
    #endregion
}
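The ViewModels here derive from ViewModelBase and call SetProperty, but that base class isn't shown in the post. A minimal sketch of the usual INotifyPropertyChanged plumbing, with the signature assumed to match the SetProperty calls above, might look like:

```csharp
// Hypothetical sketch of ViewModelBase; the real implementation isn't shown
// in the post. SetProperty assigns the backing field and raises
// PropertyChanged only when the value actually changed.
public abstract class ViewModelBase : System.ComponentModel.INotifyPropertyChanged
{
    public event System.ComponentModel.PropertyChangedEventHandler PropertyChanged;

    protected bool SetProperty<T>(ref T field, T value, string propertyName)
    {
        if (object.Equals(field, value)) return false;
        field = value;
        var handler = PropertyChanged;
        if (handler != null)
            handler(this, new System.ComponentModel.PropertyChangedEventArgs(propertyName));
        return true;
    }
}
```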

The Repository wraps our data access layer, which in this case is Oracle stored procs. It gets injected with a pricing engine provider.
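The pricing provider's interface isn't shown in the post; judging from the commented-out `_fpp.GetFloat(productname, mg.Key.CM)` calls in the Repository, it's presumably something like this (signature assumed, not from the original):

```csharp
// Assumed shape of the injected pricing engine provider, inferred from the
// commented-out _fpp.GetFloat(...) calls; the actual interface isn't shown.
public interface IFloatPriceProvider
{
    decimal? GetFloat(string productName, DateTime contractMonth);
}
```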

public class MatchGroupRepository
{
    IFloatPriceProvider _fpp;

    public MatchGroupRepository(IFloatPriceProvider fpp)
    {
        _fpp = fpp;
    }

    public List<MatchGroup> GetMatchGroups(string productname, DateTime deliverymonth)
    {
        List<MatchGroup> result = new List<MatchGroup>();
        DataTable dt = new DataTable();
        try
        {
            dt = LIB.MyOraDBDAC.GetMatchDetails(productname, deliverymonth);
            var groups = from row in dt.AsEnumerable()
                         group row by new
                         {
                             Id = (decimal)row.Field<OracleDecimal>("MATCH_GROUP_ID"),
                             P = (string)row.Field<OracleString>("PRODUCT"),
                             CM = (DateTime)row.Field<OracleDate>("CONTRACT_MONTH")
                         } into mg
                         orderby mg.Key.CM
                         select new Models.MatchGroup
                         {
                             Match_Group_Id = mg.Key.Id,
                             Product = mg.Key.P,
                             Contract_Month = mg.Key.CM,
                             Total_CER_Lots = RoundNullableDecimal(mg.Sum(f => OracleDecimalToDotNet(f.Field<OracleDecimal>("CER_LOTS"))), 0),
                             Total_GS_Lots = RoundNullableDecimal(mg.Sum(f => OracleDecimalToDotNet(f.Field<OracleDecimal>("GS_LOTS"))), 0),
                             Lot_Size = mg.Max(f => OracleDecimalToDotNet(f.Field<OracleDecimal>("LOT_SIZE"))),
                             CER_Price = mg.Max(f => OracleDecimalToDotNet(f.Field<OracleDecimal>("CER_PRICE"))),
                             GS_Price = mg.Max(f => OracleDecimalToDotNet(f.Field<OracleDecimal>("GS_PRICE"))),
                             //CER_Float = _fpp.GetFloat(productname, mg.Key.CM),
                             CER_Float = mg.Max(f => OracleDecimalToDotNet(f.Field<OracleDecimal>("CER_FLOAT"))),
                             //CER_Final_Float = _fpp.GetFloat(productname, mg.Key.CM),
                         };
            foreach (Models.MatchGroup mg in groups)
            {
                mg.Details = GetDetails(mg, dt);
                result.Add(mg);
            }
        }
        catch (InvalidCastException ex)
        {
            // Tag the exception with each column's name and CLR type before re-throwing
            foreach (DataColumn c in dt.Columns)
            {
                ex.Data.Add(c.ColumnName, c.DataType.Name);
            }
            throw;
        }
        return result;
    }

    private List<Match.Models.MatchDetail> GetDetails(Models.MatchGroup mg, DataTable dt)
    {
        var dets = from row in dt.AsEnumerable()
                   where (decimal)row.Field<OracleDecimal>("MATCH_GROUP_ID") == mg.Match_Group_Id
                   select new Models.MatchDetail
                   {
                       MG = mg,
                       GS_Lots = RoundNullableDecimal(OracleDecimalToDotNet(row.Field<OracleDecimal>("GS_LOTS")), 2),
                       Goldman_Id = OracleStringToDotNet(row.Field<OracleString>("GOLDMAN_ID")),
                       ETI = OracleStringToDotNet(row.Field<OracleString>("ETI")),
                       CER_Lots = RoundNullableDecimal(OracleDecimalToDotNet(row.Field<OracleDecimal>("CER_LOTS")), 2),
                       CER_Settled_Value_TC = OracleDecimalToDotNet(row.Field<OracleDecimal>("CER_SETTLED_VALUE_TC")),
                   };
        return dets.ToList();
    }

    private decimal? RoundNullableDecimal(decimal? d, int decimalplaces)
    {
        return d.HasValue ? (decimal?)Decimal.Round(d.Value, decimalplaces) : null;
    }

    private decimal? OracleDecimalToDotNet(OracleDecimal od)
    {
        OracleDecimal d = OracleDecimal.SetPrecision(od, 28);
        return d.IsNull ? null : (decimal?)d;
    }

    private string OracleStringToDotNet(OracleString os)
    {
        return os.IsNull ? null : (string)os;
    }
}


There’s some neat stuff in there: catching the InvalidCastException and stuffing each column’s name and type into ex.Data before re-throwing, Linq grouping over a DataTable, and here you see the call to the model’s Calculate function we showed in the beginning.
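For illustration, here’s how a caller might surface that enriched exception data. This is a hedged sketch, not part of the original code; the variable names are assumed:

```csharp
try
{
    var groups = repo.GetMatchGroups(SelectedProductName, deliveryMonth);
}
catch (InvalidCastException ex)
{
    // ex.Data now holds one entry per DataTable column (name -> CLR type),
    // pinpointing which column's type broke the cast.
    foreach (System.Collections.DictionaryEntry entry in ex.Data)
        Console.WriteLine("{0}: {1}", entry.Key, entry.Value);
    throw;
}
```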

And that’s it.


WPF | Linq

About Mike Hogg

Mike Hogg is a C# developer in Brooklyn.
