Creating large tables in Oracle with dynamic partitions to allow for instant reads

by MikeHogg29. September 2011 14:55

A lot of my work in Oracle was with tables between a million rows and a billion, and so indexing and query performance was always part of writing the system.  In this case I knew my table was going to grow a quarter to a half million rows a day, and so to retain immediate selects, I needed partitions.  While it is easy to build a table with the initial partitions, you have to maintain the scheme with a nightly or a weekly job.  Here is the initial table creation:

createtable SOME_CONTRACT
(
  ID                             NUMBER not null,
  CONTRACT_IDVARCHAR2(150) not null,
MONTH                          NUMBER not null,
  CUSTOMER_IDVARCHAR2(150),
  BUSUNIT_IDVARCHAR2(25),
  SEGMENT_IDVARCHAR2(25),
  PRODUCT_IDVARCHAR2(100),
  <snip some hundred columns>
  ZONE_MARKERVARCHAR2(50),
  ACTIVE_FG                      NUMBER,
  SNAPSHOT_DATEDATE,
  AUDIT_DATEDATE
)
tablespace SOME_TBS01
  pctfree 10
  initrans 1
  maxtrans 255
  storage
  (
    initial 64K
    minextents 1
    maxextents unlimited
  )
partition by range (snapshot_date)( 
partition P20110101 values less than (TO_DATE(' 2011-02-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110201 values less than (TO_DATE(' 2011-03-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110301 values less than (TO_DATE(' 2011-04-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110401 values less than (TO_DATE(' 2011-05-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110501 values less than (TO_DATE(' 2011-06-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110601 values less than (TO_DATE(' 2011-07-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110701 values less than (TO_DATE(' 2011-08-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110801 values less than (TO_DATE(' 2011-09-01 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110901 values less than (TO_DATE(' 2011-09-02 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN'))    tablespace CER_ABM_TBS01    pctfree 0    initrans 1    maxtrans 255    storage    (      initial 64K      minextents 1      maxextents unlimited    ),
partition P20110902 values less than (TO_DATE(' 2011-09-03 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110905 values less than (TO_DATE(' 2011-09-06 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110906 values less than (TO_DATE(' 2011-09-07 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110907 values less than (TO_DATE(' 2011-09-08 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110908 values less than (TO_DATE(' 2011-09-09 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110909 values less than (TO_DATE(' 2011-09-10 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110912 values less than (TO_DATE(' 2011-09-13 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110913 values less than (TO_DATE(' 2011-09-14 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110914 values less than (TO_DATE(' 2011-09-15 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition P20110915 values less than (TO_DATE(' 2011-09-16 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) ,
  partition PMAX values less than (TO_DATE(' 2099-12-31 00:00:00', 'SYYYY-MM-DD HH24:MI:SS', 'NLS_CALENDAR=GREGORIAN')) 
)
-- Create/Recreate primary, unique and foreign key constraints 
alter table SOME_CONTRACT
add constraint PK_SOMECONTRACTS_ID primary key (ID)
using index
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
  storage
  (
    initial 64K
    minextents 1
    maxextents unlimited
  );
-- Create/Recreate indexes 
create unique index UK_SOME_PK_CONTR_ACTIVE on SOME_CONTRACT (DECODE(ACTIVE_FG,1,CONTRACT_ID||TO_CHAR(MONTH)))
  tablespace SOME_TBS01
  pctfree 10
  initrans 2
  maxtrans 255
  storage
  (
    initial 64K
    minextents 1
    maxextents unlimited
  );

So new rows get inserted into PMAX each day in a daily process.  Partitions in Oracle are like separate tables, and in our case, most of our select queries will always be on snapshot_date, so it is like we are querying a single table.  We can also add indices if we want.  I think in my case here I was using an active flag and so I added a composite index with the PK and the column Active_fg.

To maintain this scheme, you have to periodically add new partitions, and split up PMAX.  I did that with a scheduled job, that ran a Stored Proc each night.

procedure Maintain_Partitions is
cursor c_nightly_deletes is
 select sdate
 from (select distinct snapshot_date sdate, count(distinct snapshot_date) over (partition by month) total
 from SOME_contract_nightly
 where month = to_char(  sysdate  ,'YYYYMM') 
 order by snapshot_date ) 
 where rownum < total - 3;-- keep three latest in any given month
 
cursor c_compress_ddl (p_table_name varchar2 ) is
select 'alter table ' || p_table_name || ' move partition ' || partition_name || ' compress' ddl, partition_name
from user_tab_partitions
where table_name = p_table_name
 and partition_name not like '%MAX%'
 and (compression <> 'ENABLED'
 or -- recompress NIGHTLYs since those older than 3 days are being deleted
                 table_name = 'SOME_CONTRACT_NIGHTLY' and partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%'
                 );
 
cursor c_truncate_ddl ( p_table varchar2  ) is
select 'alter table ' || table_name || ' truncate partition ' || partition_name || ' update global indexes' as ddl, partition_name
from user_tab_partitions
where table_name = p_table
 and partition_name not like '%MAX%'
 and to_date( trim( substr(partition_name, 2) ), 'YYYYMMDD' )   <= (add_months(sysdate,-24));
 
begin
for r_nightly_deletes in c_nightly_deletes loop
delete from SOME_contract_nightly
where snapshot_date = r_nightly_deletes.sdate;
      Log_This('PKG_SOME_UI.Maintain_Partitions','SOME_CONTRACT_NIGHTLY snapshot_date' ||
        to_char(r_nightly_deletes.sdate,'DD-MON-YY') || ' deleted: ' || sql%rowcount);
end loop;
for each_table in (select distinct table_name from user_tab_partitions where table_name like 'SOME%') loop
-- create new partitions
      SplitWeekdayPartitions( each_table.TABLE_NAME);
-- compress (today's) data
for r_compress_ddl in c_compress_ddl ( each_table.table_name ) loop
execute immediate r_compress_ddl.ddl;
        Log_This('PKG_SOME_UI.Maintain_Partitions',each_table.table_name || ' partition compressed: ' || r_compress_ddl.partition_name);
end loop;
-- remove any partitions older than two years
for r_truncate_ddl in c_truncate_ddl ( each_table.table_name  ) loop
 execute immediate r_truncate_ddl.ddl;
          Log_This('PKG_SOME_UI.Maintain_Partitions', each_table.table_name || ' Partition Truncated: ' || r_truncate_ddl.partition_name);
end loop;
end loop;
    Rebuild_Indices();
-- GATHER STATISTICS FOR NEW PARTITIONS
for x in (  select t.owner, t.table_name, p.partition_name
 from all_tables t
 join all_tab_partitions p
 on t.table_name = p.table_name
where p.table_name like 'SOME%'
 and (p.last_analyzed is null
 or
                  p.table_name = 'SOME_CONTRACT_NIGHTLY'
 and p.partition_name like 'P' || to_char(sysdate,'YYYYMM') ||'%' ) )
    loop
              dbms_stats.gather_table_stats(
                      degree=> dbms_stats.AUTO_DEGREE,
                      ownname => x.owner,
                      tabname => x.table_name,
                      partname => x.partition_name,
 cascade => dbms_stats.AUTO_CASCADE );
              Log_This('PKG_SOME_UI.Maintain_Partitions','Statistics gathered for table.partition: ' || 
                                    x.table_name || '.' || x.partition_name);
end loop;
exception when others then
      Rebuild_Indices(); -- do not leave in unuseable state
      raise; -- notify user to investigate and rerun
end Maintain_Partitions;

 

In this particular case, I was deleting old partitions, and using compression to keep the total database size down.   The SplitWeekdayPartitions created the new ones:

--------- PARTITIONING
PROCEDURE SplitWeekdayPartitions( p_table_name in varchar2 ) AS
  v_latest_date date;
  v_new_date date;
  v_part_name varchar2(8);
  v_upper_bound varchar2(8);
BEGIN
SELECT to_date(substr(MAX(partition_name) KEEP (DENSE_RANK FIRST ORDER BY partition_position desc ), 2),'YYYYMMDD')
into v_latest_date
FROM (SELECT partition_name, partition_position
 FROM user_tab_partitions
 WHERE table_name = upper(p_table_name) 
 AND partition_name <> 'PMAX' );
 
if v_latest_date < sysdate
then
            v_new_date := v_latest_date + 1;
 while to_char(v_new_date, 'd') in (1,7) loop -- skip weekend dates
                v_new_date := v_new_date + 1;
 end loop;
            v_part_name := to_char(v_new_date,'YYYYMMDD');
            v_upper_bound := to_char(v_new_date + 1,'YYYYMMDD');
 execute immediate
            'ALTER TABLE ' || p_table_name || ' SPLIT PARTITION PMAX AT (to_date(' || v_upper_bound ||
                ',''YYYYMMDD'')) INTO ' || '( PARTITION P' || v_part_name || 
                ', PARTITION PMAX ) UPDATE GLOBAL INDEXES';
 
 
            Log_This('PKG_SOME_UI.SplitWeekdayPartitions', 'New partition P' || v_part_name || ' created for ' || p_table_name);
 
            SplitWeekdayPartitions( p_table_name );-- do we need to catch up to today?
end if;
END SplitWeekdayPartitions;

It’s interesting that I picked a recursive call here instead of a while loop.

… and in the MaintainPartitions proc, RebuildIndices was necessary afterwards because the alter table commands and truncating didn’t trigger index updates.

procedure Rebuild_Indices is
begin
-- LOCAL INDICES   (not using any now but maybe in the future...)
for x in ( SELECT UIP.INDEX_NAME, UIP.PARTITION_NAME, UIP.COMPRESSION
 FROM USER_IND_PARTITIONS UIP
 WHERE UIP.STATUS <> 'USABLE' )
    loop
 execute immediate
                          'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD PARTITION ' || x.PARTITION_NAME ;
            Log_This('PKG_SOME_UI.Maintain_Partitions','Partition Index Rebuilt: ' || x.index_name);
end loop;
-- GLOBAL INDICES
for x in ( SELECT INDEX_NAME
 FROM USER_INDEXES
 WHERE table_name like 'SOME%' and STATUS not in ( 'VALID', 'N/A' ) )
    loop
 execute immediate
                        'ALTER INDEX ' || x.INDEX_NAME || ' REBUILD ' ;
             Log_This('PKG_SOME_UI.Maintain_Partitions','Global Index Rebuilt: ' || x.index_name);
end loop;
end;
and that’s it.  Neat feature to plug into your next large table.

An example of one of my favorite projects

by MikeHogg27. September 2011 15:00

Here’s a small one I like.

One time I inherited a system of sorts that supported a single user, with her third party data warehouse application. We didn’t support the warehouse, but we were supposed to get the data extracts that she imported into the warehouse at monthly intervals. The existing IT process was very manual, and very time intensive. As well as involving data from 4 different sources and the queries or processes to get them, it involved a dozen files per run, sometimes up to three people from different departments, with several runs per month, taking four to eight hours each run, and no history or state tracking except to keep the files in folders forever.

The initial attempt to automate this also left behind a number of files and processes to maintain, and it had been running for over a year with 60 monthly man hours of IT dedicated to it and now several hundred files, folders, and processes in assorted locations.

This is one of my favorite jobs. People put a mess in front of me and I turn it into something easy to use that saves time. One of the things that bugged me about the existing process was that there was no history and it took too long. I expanded our small database to include tables for each of our entities, and started automating the extracts in a nightly process. This had the effect of making the user’s request time drop from several hours for the complicated queries to almost instant since we were now caching the data ourselves, as well as provided an easy way for IT to hook into historic runs.

Another thing that I wanted to change was to streamline the steps. The existing process exported from data sources, inserted into databases, extracted into files, joined with other datasources, imported into databases again. So I built an SSIS package that did the data transformations on our Oracle database and inserted the data directly into the warehouse MSSQL server. This removed the need for the files and a whole staging process, and made the whole process easier to maintain from an IT perspective.

Another thing that I wanted to change was to remove the IT resource component. I don’t believe IT needs to be involved for day to day business operation requests, unless something breaks. So I built a simple WPF click-once intranet application with a handful of features, enabling the user to run the whole extract/import instantly for any date they choose, and even view the data by Excel export if they want. I like that it not only provided so much convenience for the user, but that it dropped the IT cost to maintain from an average of 60 monthly man hours to almost zero.

About Mike Hogg

Mike Hogg is a c# developer in Brooklyn.

More Here

Favorite Books

This book had the most influence on my coding style. It drastically changed the way I write code and turned me on to test driven development even if I don't always use it. It made me write clearer, functional-style code using more principles such as DRY, encapsulation, single responsibility, and more.amazon.com

This book opened my eyes to a methodical and systematic approach to upgrading legacy codebases step by step. Incrementally transforming code blocks into testable code before making improvements. amazon.com

More Here