Picasso Database Query Optimizer Visualizer

©Indian Institute of Science, Bangalore, India

 

 

 

TPC-H data generation and loading

 

Note: Picasso can be used with any generic relational database schema and SQL queries. The examples in the Picasso documentation are with respect to the TPC-H benchmark, and the procedure for generating and loading the TPC-H database is given here.

 

1.    Download the TPC-H benchmark programs DBGEN and QGEN from the TPC-H site

2.    Create the TPC-H schema in your database engine using the file dss.ddl as per these instructions: DB2  Oracle  SQL Server  Sybase ASE  PostgreSQL MySql 

3.    Generate the data.
 

a.    Compile DBGEN and QGEN after creating a suitable makefile.

b.      Run it to produce the data files (.tbl files) and queries.

4.    Load the TPC-H data as per these instructions: DB2  Oracle  SQL Server  Sybase ASE  PostgreSQL MySql

Note:  While loading data into tables, it is preferable to initially not specify the integrity constraints (primary key, foreign key and index) since it might slow down the insertion process. These constraints can be added later as mentioned below after the data has been loaded into the tables.

5.    Apply the integrity constraints on the data using file dss.ri as per these instructions:  DB2  Oracle  SQL Server  Sybase ASE  PostgreSQL MySql

Note 1:  The dss.ddl and dss.ri files may have to be modified to suit the syntax requirements of particular database engines.  For example:

a.    The connect to statement should not be present for Oracle.  Also this statement is not required (for other engines) if a connection to the database has already been established.  The equivalent command for Sybase ASE (through the isql prompt) is use <dbname>.  And for PostgreSQL (through the psql prompt), it is \c <dbname>

b.    Sybase ASE (through the Interactive SQL prompt) requires that the script should not have a semi-colon (;) at the end of any statement.

c.    The schema name (qualifier to the table names) should be changed to reflect the actual schema name or the table names should be made un-qualified.

d.    While specifying foreign key constraints, Oracle requires the keyword constraint and the constraint name to appear before the keywords foreign key.

e.    While specifying any constraints, SQL Server, Sybase ASE and PostgreSQL require that these constraints should not be named.

f.     The isql prompt of Sybase ASE requires that there should be no semi-colon after each statement – instead there should be a go command.

g.    The commit work statements are not needed since any ddl command is generally automatically followed by a commit.  In the case of some engines, this could also lead to a syntax error.

h.    SQLServer uses the datetime data type in place of date.


Note 2:  The .tbl file may have to be modified to suit the syntax requirements of particular database engines.  For example,

a.    PostgreSQL requires that the delimiter should not appear at the end of every line and that string values should not be quoted.

b.    DB2 requires that the values should be comma-separated and the string values should be enclosed in double quotes (“”).

 

Documentation Home