Tutorial 2: Create ETL Files
Now the scott schema has been created, ETL files can be built to create the tables and insert the data into PostgreSQL.
Use the
filesonly.pypython script in thetutorialdirectory.Create a path to store the created files and modify the
filesonly.pyconstantETL_PATH.The
filesonly.pyscript will create a new directory for each run. This directory name will be based on the date and time of the run. These will all be in a further sub-directory of theETL_PATHdirectory, calledexports.Set Oracle Instant Client location.
Locate your local Oracle Instant Client location and modify the
ORACLE_INSTANT_CLIENTconstant in thefilesonly.pyscript.Modify the connection constants.
... ORAUSER = 'scott' ORAPASS = 'tiger' ORAPORT = '1521' ORAHOST = '<specific host>' ORADBASE = '<specific database>' ...
Run the ETL files on PostgreSQL.
You can now create the scott tables on PostgreSQL. Using
psql, run the newly created_run_.sqlfile. The command to run can be found as a comment at the top of the_run_.sqlfile.Important
psql needs to be set to the same character set as
easyo2p, which by default is UTF-8. To set psql to UTF-8, set thePGCLIENTENCODINGenvironment variable before running psql.SET PGCLIENTENCODING=utf-8
psql -d easyo2p_db -U easyo2p_user easyo2p_db=> \i 'C:\Temp\EasyO2P\filesonly\<datetime>\_run_.sql'
Note
The run script drops the scott schema before recreating it, so it can be run after any of the following steps where the files are created.
Run the script.
Run the
filesonly.pyscript. When complete, four files should be created in the<ETL_PATH>/exports/<datetime>/etldirectory. One for each of the table create statements, and another one for the constraints and indexes.Turn on the data parameter.
... parameters = { easyo2p.ETL_DATA: True, ...
Run the script again.
Run the
filesonly.pyscript. When complete, a new dated directory should contain six migration files, four as before, and two new files containing data for each table, along with the run and log files.Migrate only selected tables.
By default, all tables in the schema will be migrated. To select specific tables, include the
TABLESparameter. This expects a list of table names.To migrate only the EMP table modify the parameters declaration:
.. parameters = { easyo2p.TABLES: ['EMP'], ...
When the script is run now only the EMP table is migrated.
ETL Files
The migration process creates up to three files for each table.
Files suffixed .3.sql contain the table create statements.
These should all be run first.
Files suffixed .4.sql contain the data insert statements.
These should always be all run next, before the constraints.
Files suffixed .5.sql contain the constraint, index and trigger statements.
These should always be all run afterwards,
to ensure all referencing tables are created and not to affect the data migration.
Note
Files suffixed
.1.sqlare custom pre-build files, such as PL/pgSQL objects, to be run first. The file suffixed.2.sqlissequences.2.sql, this contains the create sequence statements. Files suffixed.6.sqlare custom post-build files, such as for granting permissions, to be run last of all. These will be covered in more detail in a later step.
We should now understand how to use easyo2p to create ETL scripts
that can build tables and insert data.