Creating large tables in Oracle with dynamic partitions to allow for instant reads

by MikeHogg 29. September 2011 14:55

A lot of my work in Oracle was with tables between a million rows and a billion, and so indexing and query performance was always part of writing the system.  In this case I knew my table was going to grow a quarter to a half million rows a day, and so to retain immediate selects, I needed partitions.  While it is easy to build a table with the initial partitions, you have to maintain the scheme with a nightly or a weekly job.  Here is the initial table creation:

create   table SOME_CONTRACT 
(
  ID                             NUMBER not null,
  CONTRACT_ID                    VARCHAR2(150) not null,
  MONTH                          NUMBER not null,
  CUSTOMER_ID                    VARCHAR2(150),
  BUSUNIT_ID                     VARCHAR2(25),
  SEGMENT_ID                     VARCHAR2(25),
  PRODUCT_ID                     VARCHAR2(100),
  <snip some hundred columns>
  ZONE_MARKER                    VARCHAR2(50),
  ACTIVE_FG                      NUMBER,
  SNAPSHOT_DATE                  DATE,
  AUDIT_DATE                     DATE
)
tablespace SOME_TBS01
  pctfree 10
  initrans 1
  maxtrans 255
  storage
  (
    initial 64K
    minextents 1
    maxextents unlimited
  )
partition by range (snapshot_date)(  
partition P20110101 values less than (TO_DATE(' 2011-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110201 values less than (TO_DATE(' 2011-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110301 values less than (TO_DATE(' 2011-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110401 values less than (TO_DATE(' 2011-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110501 values less than (TO_DATE(' 2011-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110601 values less than (TO_DATE(' 2011-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110701 values less than (TO_DATE(' 2011-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110801 values less than (TO_DATE(' 2011-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110901 values less than (TO_DATE(' 2011-09-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110902 values less than (TO_DATE(' 2011-09-03 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110905 values less than (TO_DATE(' 2011-09-06 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110906 values less than (TO_DATE(' 2011-09-07 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110907 values less than (TO_DATE(' 2011-09-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110908 values less than (TO_DATE(' 2011-09-09 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110909 values less than (TO_DATE(' 2011-09-10 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110912 values less than (TO_DATE(' 2011-09-13 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110913 values less than (TO_DATE(' 2011-09-14 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110914 values less than (TO_DATE(' 2011-09-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110915 values less than (TO_DATE(' 2011-09-16 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition PMAX values less than (TO_DATE(' 2099-12-31 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) 
)
-- Create/Recreate primary, unique and foreign key constraints 
alter table SOME_CONTRACT
  add constraint PK_SOMECONTRACTS_ID primary key (ID)
  using index 
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
  storage
  (
    initial 64K
    minextents 1
    maxextents unlimited
  );
-- Create/Recreate indexes 
 
create unique index UK_SOME_PK_CONTR_ACTIVE on SOME_CONTRACT (DECODE(ACTIVE_FG,1,CONTRACT_ID||TO_CHAR(MONTH)))
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
  storage
  (
    initial 64K
    minextents 1
    maxextents unlimited
  );

So new rows get inserted into PMAX each day in a daily process.  Partitions in Oracle are like separate tables, and in our case, most of our select queries will always be on snapshot_date, so it is like we are querying a single table.  We can also add indices if we want.  I think in my case here I was using an active flag and so I added a composite index with the PK and the column Active_fg.

To maintain this scheme, you have to periodically add new partitions, and split up PMAX.  I did that with a scheduled job, that ran a Stored Proc each night.

procedure Maintain_Partitions is
  cursor c_nightly_deletes is
            select sdate 
             from (select distinct snapshot_date sdate, count(distinct snapshot_date) over (partition by month) total
                     from SOME_contract_nightly 
                    where month = to_char(  sysdate  ,'YYYYMM')   
                    order by snapshot_date ) 
            where rownum < total - 3;  -- keep three latest in any given month
            
  cursor c_compress_ddl (p_table_name varchar2 ) is
         select 'alter table ' || p_table_name || ' move partition ' || partition_name || ' compress' ddl, partition_name
           from user_tab_partitions
          where table_name = p_table_name
            and partition_name not like '%MAX%'
            and (compression <> 'ENABLED'
                 or -- recompress NIGHTLYs since those older than 3 days are being deleted
                 table_name = 'SOME_CONTRACT_NIGHTLY' and partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%'
                 );
            
  cursor c_truncate_ddl ( p_table varchar2  ) is
         select 'alter table ' || table_name || ' truncate partition ' || partition_name || ' update global indexes' as ddl, partition_name
           from user_tab_partitions
          where table_name = p_table
            and partition_name not like '%MAX%'
            and to_date( trim( substr(partition_name, 2) ), 'YYYYMMDD' )   <= (add_months(sysdate,-24));
          
begin
    
    for r_nightly_deletes in c_nightly_deletes loop
      delete from SOME_contract_nightly 
        where snapshot_date = r_nightly_deletes.sdate;
        
      Log_This('PKG_SOME_UI.Maintain_Partitions','SOME_CONTRACT_NIGHTLY snapshot_date' ||
        to_char(r_nightly_deletes.sdate,'DD-MON-YY') || ' deleted: ' || sql%rowcount);
    
    end loop;
      
    for each_table in (select distinct table_name from user_tab_partitions where table_name like 'SOME%') loop 
      
      -- create new partitions
      SplitWeekdayPartitions( each_table.TABLE_NAME);
      
      -- compress (today's) data
      for r_compress_ddl in c_compress_ddl ( each_table.table_name ) loop
        execute immediate r_compress_ddl.ddl;
        Log_This('PKG_SOME_UI.Maintain_Partitions',each_table.table_name || ' partition compressed: ' || r_compress_ddl.partition_name);
      end loop;
      
      -- remove any partitions older than two years
      for r_truncate_ddl in c_truncate_ddl ( each_table.table_name  ) loop 
              execute immediate r_truncate_ddl.ddl;
          Log_This('PKG_SOME_UI.Maintain_Partitions', each_table.table_name || ' Partition Truncated: ' || r_truncate_ddl.partition_name);
      end loop; 
       
    end loop;
    
    
    Rebuild_Indices();
    
    -- GATHER STATISTICS FOR NEW PARTITIONS
    for x in (  select t.owner, t.table_name, p.partition_name 
            from all_tables t
            join all_tab_partitions p
              on t.table_name = p.table_name 
           where p.table_name like 'SOME%' 
             and (p.last_analyzed is null 
                  or
                  p.table_name = 'SOME_CONTRACT_NIGHTLY' 
                  and p.partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%' ) )
    loop
              dbms_stats.gather_table_stats(
                      degree=> dbms_stats.AUTO_DEGREE,
                      ownname => x.owner,
                      tabname => x.table_name,
                      partname => x.partition_name,
                      cascade => dbms_stats.AUTO_CASCADE );
              Log_This('PKG_SOME_UI.Maintain_Partitions','Statistics gathered for table.partition: ' || 
                                    x.table_name || '.' || x.partition_name);
    end loop;  
      
    exception when others then
      Rebuild_Indices(); -- do not leave in unuseable state
      raise; -- notify user to investigate and rerun
end Maintain_Partitions;

 

In this particular case, I was deleting old partitions, and using compression to keep the total database size down.   The SplitWeekdayPartitions created the new ones:

--------- PARTITIONING
PROCEDURE SplitWeekdayPartitions( p_table_name in varchar2 ) AS
  v_latest_date date;
  v_new_date date; 
  v_part_name varchar2(8);
  v_upper_bound varchar2(8);
BEGIN
 
     SELECT to_date(substr(MAX(partition_name) KEEP (DENSE_RANK FIRST ORDER BY partition_position desc ), 2),'YYYYMMDD')
       into v_latest_date 
       FROM (SELECT partition_name, partition_position 
               FROM user_tab_partitions
              WHERE table_name = upper(p_table_name) 
                AND partition_name <> 'PMAX' );
                 
        if v_latest_date < sysdate
        then
            v_new_date := v_latest_date + 1;
            while to_char(v_new_date, 'd') in (1,7) loop -- skip weekend dates
                v_new_date := v_new_date + 1;
            end loop;
 
            v_part_name := to_char(v_new_date,'YYYYMMDD');
            v_upper_bound := to_char(v_new_date + 1,'YYYYMMDD');
            execute immediate 
            'ALTER TABLE ' || p_table_name || ' SPLIT PARTITION PMAX AT (to_date(' || v_upper_bound ||
                ',''YYYYMMDD'')) INTO ' || '( PARTITION P' || v_part_name ||  
                ', PARTITION PMAX ) UPDATE GLOBAL INDEXES';
                
                
            Log_This('PKG_SOME_UI.SplitWeekdayPartitions', 'New partition P' || v_part_name || ' created for ' || p_table_name);
            
            SplitWeekdayPartitions( p_table_name );   -- do we need to catch up to today?
        end if;
END SplitWeekdayPartitions;

It’s interesting that I picked a recursive call here instead of a while loop.

… and in the MaintainPartitions proc, RebuildIndices was necessary afterwards because the alter table commands and truncating didn’t trigger index updates.

procedure Rebuild_Indices is 
begin 
    -- LOCAL INDICES   (not using any now but maybe in the future...)
    for x in ( SELECT UIP.INDEX_NAME, UIP.PARTITION_NAME, UIP.COMPRESSION
            FROM USER_IND_PARTITIONS UIP
            WHERE UIP.STATUS <> 'USABLE' )
    loop
            execute immediate
                          'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD PARTITION ' || x.PARTITION_NAME ;
            Log_This('PKG_SOME_UI.Maintain_Partitions','Partition Index Rebuilt: ' || x.index_name);
    end loop;
        
    -- GLOBAL INDICES     
    for x in ( SELECT INDEX_NAME
                FROM USER_INDEXES
                WHERE table_name like 'SOME%' and STATUS not in ( 'VALID', 'N/A' ) )
    loop
             execute immediate
                        'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD ' ;
             Log_This('PKG_SOME_UI.Maintain_Partitions','Global Index Rebuilt: ' || x.index_name);
    end loop;   
end;    
and that’s it.  Neat feature to plug into your next large table.

An example of one of my favorite projects

by MikeHogg 27. September 2011 15:00

Here’s a small one I like.

One time I inherited a system of sorts that supported a single user, with her third party data warehouse application. We didn’t support the warehouse, but we were supposed to get the data extracts that she imported into the warehouse at monthly intervals. The existing IT process was very manual, and very time intensive. As well as involving data from 4 different sources and the queries or processes to get them, it involved a dozen files per run, sometimes up to three people from different departments, with several runs per month, taking four to eight hours each run, and no history or state tracking except to keep the files in folders forever.

The initial attempt to automate this also left behind a number of files and processes to maintain, and it had been running for over a year with 60 monthly man hours of IT dedicated to it and now several hundred files, folders, and processes in assorted locations.

This is one of my favorite jobs. People put a mess in front of me and I turn it into something easy to use that saves time. One of the things that bugged me about the existing process was that there was no history and it took too long. I expanded our small database to include tables for each of our entities, and started automating the extracts in a nightly process. This had the effect of making the user’s request time drop from several hours for the complicated queries to almost instant since we were now caching the data ourselves, as well as provided an easy way for IT to hook into historic runs.

Another thing that I wanted to change was to streamline the steps. The existing process exported from data sources, inserted into databases, extracted into files, joined with other datasources, imported into databases again. So I built an SSIS package that did the data transformations on our Oracle database and inserted the data directly into the warehouse MSSQL server. This removed the need for the files and a whole staging process, and made the whole process easier to maintain from an IT perspective.

Another thing that I wanted to change was to remove the IT resource component. I don’t believe IT needs to be involved for day to day business operation requests, unless something breaks. So I built a simple WPF click-once intranet application with a handful of features, enabling the user to run the whole extract/import instantly for any date they choose, and even view the data by Excel export if they want. I like that it not only provided so much convenience for the user, but that it dropped the IT cost to maintain from an average of 60 monthly man hours to almost zero.

About Mike Hogg

Mike Hogg is a c# developer in Brooklyn.

More Here

Favorite Books

This book had the most influence on my coding style. It drastically changed the way I write code and turned me on to test driven development even if I don't always use it. It made me write clearer, functional-style code using more principles such as DRY, encapsulation, single responsibility, and more. amazon.com

This book opened my eyes to a methodical and systematic approach to upgrading legacy codebases step by step. Incrementally transforming code blocks into testable code before making improvements. amazon.com

More Here