Picasso Database Query Optimizer
Visualizer
©Indian Institute of
Science,
TPC-H data
generation and loading
Note:
Picasso can be used with any generic relational database schema and SQL queries.
The examples in the Picasso documentation are with respect to the TPC-H
benchmark, and the procedure for generating and loading the TPC-H database is
given here.
1.
Download the TPC-H benchmark programs DBGEN and QGEN
from the TPC-H site.
2. Create
the TPC-H schema in your database engine using the file dss.ddl as per these instructions: DB2 Oracle SQL Server
Sybase ASE PostgreSQL MySql
3.
Generate the data.
a.
Compile DBGEN and QGEN after creating a suitable makefile.
b. Run
it to produce the data files (.tbl files) and queries.
4.
Load the TPC-H data as per these instructions: DB2 Oracle SQL Server
Sybase ASE
PostgreSQL MySql
Note: While loading
data into tables, it is preferable to initially not specify the integrity constraints (primary key, foreign
key and index) since
it might slow down the insertion process. These constraints can be added later
as mentioned below after the data has been loaded into the tables.
5.
Apply the integrity constraints on the data using file
dss.ri as
per these instructions: DB2 Oracle SQL Server Sybase ASE PostgreSQL MySql
Note
1: The dss.ddl and dss.ri files may have
to be modified to suit the syntax requirements of particular database
engines. For example:
a. The connect to statement should not be present for Oracle. Also this statement is not required (for
other engines) if a connection to the database has already been
established. The equivalent command for
Sybase ASE (through the isql
prompt) is use <dbname>. And for PostgreSQL
(through the psql
prompt), it is \c
<dbname>
b. Sybase
ASE (through the Interactive SQL
prompt) requires that the script should not have a semi-colon (;) at the end of any statement.
c. The
schema name (qualifier to the table names) should be changed to reflect the
actual schema name or the table names should be made un-qualified.
d. While
specifying foreign key constraints, Oracle requires the keyword constraint and the constraint name to
appear before the keywords foreign key.
e. While
specifying any constraints, SQL Server, Sybase ASE and PostgreSQL
require that these constraints should not
be named.
f. The
isql prompt of Sybase
ASE requires that there should be no semi-colon
after each statement – instead there should be a go
command.
g. The
commit work statements are not needed
since any ddl
command is generally automatically followed by a commit. In the case of some
engines, this could also lead to a syntax error.
h. SQLServer uses the datetime
data type in place of date.
Note 2: The .tbl file may have to be modified to suit the
syntax requirements of particular database engines. For example,
a.
PostgreSQL
requires that the delimiter should not appear
at the end of every line and that string values should not be quoted.
b.
DB2 requires that the values should be comma-separated
and the string values should be enclosed in double quotes (“”).