Limitations of exp/imp
**********************
Traditional exp/imp is a client-side utility. During export, the server process extracts the data from the database, and the user process (on the client side) formats it and writes it to the dump file. During import, the user process creates the SQL statements to be executed on the database, and the server process executes them. Notice that in this architecture the workload falls heavily on the user process, and the server keeps no record of the export/import activity. If the session is lost or the instance crashes, the whole export/import has to be restarted from the beginning.
Data Pump Architecture
**********************
Data Pump is a server-side utility: unlike traditional exp/imp, the whole process runs on the server. When the user process launches an export/import job, the following components are created on the server side to manage it.
1) Data Pump Master process (DMnn)
Starts when the Data Pump job is launched. When multiple Data Pump jobs are launched, each job gets its own DM process. It controls the worker (DWnn) processes.
2) Data Pump Worker processes (DWnn)
The Master process starts n worker processes, where n is the degree of parallelism specified when the job is launched.
3) Control or Command queue
All work requests created by the Master process, and the associated responses from the worker processes, pass through this queue. It operates on a deliver-exactly-once model: messages are enqueued by DMnn and dequeued by DWnn.
4) Status queue
The Master process posts the job status and error details to this queue. It operates on a publish-and-subscribe model: any user process can subscribe to the queue to monitor the job's progress.
5) Master Table
This table holds the detailed information about the current Data Pump operation: each object being exported or imported and its location in the dump file set, the parameters supplied by the calling program, worker status information, and output file information. It provides everything required to restart a failed or stopped Data Pump job. Each Data Pump job has its own master table, created in the schema that launched the job and named after the job.
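As a sketch of how these components can be observed, the queries below list running jobs and peek into a master table. The DBA_DATAPUMP_JOBS view is standard; the master table name SYS_EXPORT_SCHEMA_01 is only an assumption here (the default name Oracle gives the first schema-mode export job).

```sql
-- List active Data Pump jobs; OWNER_NAME is the schema holding the
-- master table and JOB_NAME matches the master table's name.
SELECT owner_name, job_name, operation, job_mode, state
FROM   dba_datapump_jobs;

-- Inspect the master table itself (job/table name assumed for illustration).
SELECT * FROM system.SYS_EXPORT_SCHEMA_01 WHERE rownum <= 10;
```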
Data Pump job working process
*****************************
- The user process establishes a session through a server process.
- The user process launches the Data Pump job by creating the master table and starting the Master process, then updates the master table with the job details.
- The Master process starts the worker processes and creates the control and status queues.
- The Master process reads the job details from the master table and places the commands to be executed by the worker processes on the control queue.
- Each worker process picks a command from the control queue and executes it (extracting data and writing it to the dump file, or reading the dump file and executing SQL statements against the database). The response is sent back through the control queue, and the worker process also updates the master table with its status.
- The Master process updates the master table with the completed and remaining job details, posts the completed job status and error details to the status queue, and writes the log file, which records the history of the job.
Steps to launch the Data Pump jobs
**********************************
- Create an Oracle directory object in the database. Data Pump writes the dump file and the log file to this directory location (and reads the dump file from it during import). The directory object is owned by SYS even when it is created by another user who has the "CREATE DIRECTORY" privilege.
- Grant read and write access on that directory to the user running the Data Pump job.
- Launch expdp/impdp from the client side. Specifying PARALLEL=5, for example, tells the Master process to start 5 worker processes.
(For the full list of options, run expdp HELP=Y or impdp HELP=Y.)
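Putting the steps together, a minimal worked example might look like the following. The directory path, schema names, and file names are illustrative assumptions, not taken from the original post.

```sql
-- Step 1: create the directory object (requires the CREATE DIRECTORY
-- privilege; the object is owned by SYS regardless of who creates it).
CREATE DIRECTORY exp_dump AS '/u01/app/oracle/dump';

-- Step 2: let the exporting user read and write through it.
GRANT READ, WRITE ON DIRECTORY exp_dump TO scott;

-- Step 3 (from the OS shell): launch the export; PARALLEL=5 makes the
-- Master process start five worker processes.
-- $ expdp scott/tiger DIRECTORY=exp_dump DUMPFILE=apps_exp_dump%U.dmp
--         LOGFILE=apps_exp_dump.log PARALLEL=5
```

The %U in the dump file name lets Data Pump generate one file per worker, which is why it pairs naturally with PARALLEL.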
Data Pump Features
******************
- Faster performance than traditional exp/imp
- Generate SQL files (the SQLFILE parameter writes the DDL to a file instead of executing the import)
IMPDP SYSTEM/MANAGER DIRECTORY=exp_dump DUMPFILE=apps_exp_dump%U.dmp LOGFILE=apps_imp_dump.log SQLFILE=create_database_sql.sql
- Transfer the data without creating a dump file (the NETWORK_LINK parameter imports directly over a database link)
IMPDP SYSTEM/MANAGER DIRECTORY=exp_dump NETWORK_LINK=db_link LOGFILE=apps_imp_dump.log
- Fine-Grained Object Selections
IMPDP SYSTEM/MANAGER DIRECTORY=exp_dump NETWORK_LINK=db_link LOGFILE=apps_imp_dump.log INCLUDE=TABLE:"IN ('EMP','DEPT')"
Only the tables 'EMP' and 'DEPT' will be imported.
IMPDP SYSTEM/MANAGER DIRECTORY=exp_dump NETWORK_LINK=db_link LOGFILE=apps_imp_dump.log EXCLUDE=TABLE:"= 'BONUS'"
All tables except 'BONUS' will be imported.
- Easy to monitor the Data Pump job
IMPDP SYSTEM/MANAGER ATTACH='JOB_NAME'
IMPDP>STATUS
Other ways to monitor the job status are querying the master table, querying the dynamic performance view V$SESSION_LONGOPS, and looking into the log file.
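As a sketch of the V$SESSION_LONGOPS route, a query like the one below shows the progress of long-running Data Pump work. The view and its columns are standard; the OPNAME filter is an assumption about how the job appears there.

```sql
-- Progress of long-running operations; Data Pump jobs report the job
-- name in OPNAME and their progress in SOFAR/TOTALWORK.
SELECT sid, serial#, opname, sofar, totalwork,
       ROUND(sofar / totalwork * 100, 2) AS pct_done
FROM   v$session_longops
WHERE  opname LIKE '%EXPORT%'   -- illustrative filter for an export job
AND    totalwork > 0;
```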