TPC-H

TPC-H is an ad-hoc, decision support benchmark. TPC benchmarks are widely used today in evaluating the performance of relational database systems; the results are published on the TPC web site. TPC-H provides the detail specification on the benchmark.

Please be aware that this page is the record of the expriment done on Sept. 2009. I have not checked any TPC-H changes since then.

Setup experiment environments

At the TPC-H web page on the right side, source codes are available that will generate the test data. It also contains the table definitions.

For MySQL, similar experiments have done by Ming 1 which refers 2. I think that Ming’s post has several errors that we will describe in detail here. For other databases, please visit 3.

Compile source codes to get dbgen.exe and qgen.exe

I have tested TPC-H under MS-Windows XP. So related environments will be confined to Windows XP. Download source codes from 4. You should read README as its name asks you to do so. Then modify or consult makefile.suite file to compile the source code to get dbgen.exe and qgen.exe. Since MySQL is not a TPC supported database, you need some trick to compile these. Refer Ming’s post 5 for tips. Or if you are a lazy boy, then just download these that I self-compiled Media:tcp-h_windows.zip. Remember that you run this program at your own risk as most software you download from the Web means to. In this distribution, dbgen (which is the data generator) is located at the root. Qgen (which is the query generator) is placed under queries directory.

Generate data files

Usage: dbgen -s 1

1 is a scale factor representing the size of database (1 = 1 GB data). Change it to your preferred size. You know in the real TPC-H experiment, this number easily goes up to millions but not for personal laptop that I use for this experiment to check the MySQL TPC-H compatibility. So please bear with the small scale. Anyway, information here is scalable to the big scale without much modification. dbgen will generate 8 tbl (table data) files. Keep it at where you can remember.

Setup MySQL

Open MySQL client and create the database for TCP-H simulation. Now create the tables that are defined in dss.ddl (for table creation) and dss.ri (for key creation). Both files are included in the source distribution. Or just run below commands. If you prefer MyISAM as your table engine, then attach ENGINE=MyISAM at the end of each statement.

CREATE TABLE NATION  ( N_NATIONKEY  INTEGER NOT NULL,
                            N_NAME       CHAR(25) NOT NULL,
                            N_REGIONKEY  INTEGER NOT NULL,
                            N_COMMENT    VARCHAR(152));

CREATE TABLE REGION  ( R_REGIONKEY  INTEGER NOT NULL,
                            R_NAME       CHAR(25) NOT NULL,
                            R_COMMENT    VARCHAR(152));

CREATE TABLE PART  ( P_PARTKEY     INTEGER NOT NULL,
                          P_NAME        VARCHAR(55) NOT NULL,
                          P_MFGR        CHAR(25) NOT NULL,
                          P_BRAND       CHAR(10) NOT NULL,
                          P_TYPE        VARCHAR(25) NOT NULL,
                          P_SIZE        INTEGER NOT NULL,
                          P_CONTAINER   CHAR(10) NOT NULL,
                          P_RETAILPRICE DECIMAL(15,2) NOT NULL,
                          P_COMMENT     VARCHAR(23) NOT NULL );

CREATE TABLE SUPPLIER ( S_SUPPKEY     INTEGER NOT NULL,
                             S_NAME        CHAR(25) NOT NULL,
                             S_ADDRESS     VARCHAR(40) NOT NULL,
                             S_NATIONKEY   INTEGER NOT NULL,
                             S_PHONE       CHAR(15) NOT NULL,
                             S_ACCTBAL     DECIMAL(15,2) NOT NULL,
                             S_COMMENT     VARCHAR(101) NOT NULL);

CREATE TABLE PARTSUPP ( PS_PARTKEY     INTEGER NOT NULL,
                             PS_SUPPKEY     INTEGER NOT NULL,
                             PS_AVAILQTY    INTEGER NOT NULL,
                             PS_SUPPLYCOST  DECIMAL(15,2)  NOT NULL,
                             PS_COMMENT     VARCHAR(199) NOT NULL );

CREATE TABLE CUSTOMER ( C_CUSTKEY     INTEGER NOT NULL,
                             C_NAME        VARCHAR(25) NOT NULL,
                             C_ADDRESS     VARCHAR(40) NOT NULL,
                             C_NATIONKEY   INTEGER NOT NULL,
                             C_PHONE       CHAR(15) NOT NULL,
                             C_ACCTBAL     DECIMAL(15,2)   NOT NULL,
                             C_MKTSEGMENT  CHAR(10) NOT NULL,
                             C_COMMENT     VARCHAR(117) NOT NULL);

CREATE TABLE ORDERS  ( O_ORDERKEY       INTEGER NOT NULL,
                           O_CUSTKEY        INTEGER NOT NULL,
                           O_ORDERSTATUS    CHAR(1) NOT NULL,
                           O_TOTALPRICE     DECIMAL(15,2) NOT NULL,
                           O_ORDERDATE      DATE NOT NULL,
                           O_ORDERPRIORITY  CHAR(15) NOT NULL,
                           O_CLERK          CHAR(15) NOT NULL,
                           O_SHIPPRIORITY   INTEGER NOT NULL,
                           O_COMMENT        VARCHAR(79) NOT NULL);

CREATE TABLE LINEITEM ( L_ORDERKEY    INTEGER NOT NULL,
                             L_PARTKEY     INTEGER NOT NULL,
                             L_SUPPKEY     INTEGER NOT NULL,
                             L_LINENUMBER  INTEGER NOT NULL,
                             L_QUANTITY    DECIMAL(15,2) NOT NULL,
                             L_EXTENDEDPRICE  DECIMAL(15,2) NOT NULL,
                             L_DISCOUNT    DECIMAL(15,2) NOT NULL,
                             L_TAX         DECIMAL(15,2) NOT NULL,
                             L_RETURNFLAG  CHAR(1) NOT NULL,
                             L_LINESTATUS  CHAR(1) NOT NULL,
                             L_SHIPDATE    DATE NOT NULL,
                             L_COMMITDATE  DATE NOT NULL,
                             L_RECEIPTDATE DATE NOT NULL,
                             L_SHIPINSTRUCT CHAR(25) NOT NULL,
                             L_SHIPMODE     CHAR(10) NOT NULL,
                             L_COMMENT      VARCHAR(44) NOT NULL);

Now its time to import data. Open the MySQL console (If you don’t know what MySQL console is, then plz Google it and do not email me!) and run below commands. Don’t forget to modify Yourdirectory to comply with your directory where you stored tbl files. Still don’t get it? Your tbl files are located at where you unzipped dbgen program or at some place I told you to remember :)

load data infile "Yourdirectory/part.tbl" into table part fields terminated by "|" lines terminated by "\r\n";
load data infile "Yourdirectory/partsupp.tbl" into table partsupp fields terminated by "|" lines terminated by "\r\n";
load data infile "Yourdirectory/customer.tbl" into table customer fields terminated by "|" lines terminated by "\r\n";
load data infile "Yourdirectory/nation.tbl" into table nation fields terminated by "|" lines terminated by "\r\n";
load data infile "Yourdirectory/orders.tbl" into table orders fields terminated by "|" lines terminated by "\r\n";
load data infile "Yourdirectory/region.tbl" into table region fields terminated by "|" lines terminated by "\r\n";
load data infile "Yourdirectory/supplier.tbl" into table supplier fields terminated by "|" lines terminated by "\r\n";
load data infile "Yourdirectory/lineitem.tbl" into table lineitem fields terminated by "|" lines terminated by "\r\n";

Let’s add keys to tables. Belows are default keys specified in dss.ri.

ALTER TABLE REGION
ADD PRIMARY KEY (R_REGIONKEY);

ALTER TABLE NATION
ADD PRIMARY KEY (N_NATIONKEY),
ADD PRIMARY KEY (N_REGIONKEY);

ALTER TABLE CUSTOMER
ADD PRIMARY KEY (C_CUSTKEY),
ADD KEY (C_NATIONKEY);

ALTER TABLE SUPPLIER
ADD PRIMARY KEY (S_SUPPKEY),
ADD KEY (S_NATIONKEY);

ALTER TABLE PART
ADD PRIMARY KEY (P_PARTKEY);

ALTER TABLE PARTSUPP
ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY),
ADD KEY (PS_PARTKEY),
ADD KEY (PS_SUPPKEY);

ALTER TABLE ORDERS
ADD PRIMARY KEY (O_ORDERKEY),
ADD KEY (O_CUSTKEY);

ALTER TABLE LINEITEM
ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER),
ADD KEY (L_ORDERKEY),
ADD KEY (L_PARTKEY,L_SUPPKEY);

If you are using the InnoDB engine and want to test TPC-H under transaction environment, then specify the foreign key like below. Be aware to the order of statements that should comply with the foreign key order.


ALTER TABLE NATION
ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references REGION (R_REGIONKEY);

ALTER TABLE SUPPLIER
ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references NATION (N_NATIONKEY);

ALTER TABLE CUSTOMER
ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references NATION (N_NATIONKEY);

ALTER TABLE PARTSUPP
ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references SUPPLIER (S_SUPPKEY),
ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references PART (P_PARTKEY);

ALTER TABLE ORDERS
ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references CUSTOMER (C_CUSTKEY);

ALTER TABLE LINEITEM
ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY)  references ORDERS (O_ORDERKEY),
ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references PARTSUPP (PS_PARTKEY,PS_SUPPKEY);

Setup MonetDB

Up to above, I created the enviornment for MySQL database. On this section I am talking about MonetDB. The reason that I decided to put this db is because it is column-based database specifically good for data-ware housing. You would see the speed comparison between MySQL and MonetDB at the end. If not interested in MonetDB, jump to the below section.

Create tables modified for MonetDB table creation specification.

COPY RECORDS INTO region FROM "Yourdirectory/region.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
COPY RECORDS INTO nation FROM "Yourdirectory/nation.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
COPY RECORDS INTO part FROM "Yourdirectory/part.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
COPY RECORDS INTO supplier FROM "Yourdirectory/supplier.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
COPY RECORDS INTO customer FROM "Yourdirectory/customer.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
COPY RECORDS INTO partsupp FROM "Yourdirectory/partsupp.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
COPY RECORDS INTO orders FROM "Yourdirectory/orders.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
COPY RECORDS INTO lineitem FROM "Yourdirectory/lineitem.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';

MonetDB uses slightly different grammar to setup foreign key

ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_SUPPKEY) references SUPPLIER (S_SUPPKEY);
ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_PARTKEY) references PART (P_PARTKEY);
ALTER TABLE ORDERS ADD FOREIGN KEY (O_CUSTKEY) references CUSTOMER (C_CUSTKEY);
ALTER TABLE LINEITEM ADD FOREIGN KEY (L_ORDERKEY)  references ORDERS (O_ORDERKEY);
ALTER TABLE LINEITEM ADD FOREIGN KEY (L_PARTKEY,L_SUPPKEY) references PARTSUPP (PS_PARTKEY, PS_SUPPKEY);

Get query statements

Since you already got qgen, you can set its option to generate the query as you want. Or just use below commands to get the default queries. The output of default queries can be verified with that of 6 to see whether your TPC-H database works as expected.

qgen -d 1 >  default_1.sql
qgen -d 2 >  default_2.sql
qgen -d 3 >  default_3.sql
qgen -d 4 >  default_4.sql
qgen -d 5 >  default_5.sql
qgen -d 6 >  default_6.sql
qgen -d 7 >  default_7.sql
qgen -d 8 >  default_8.sql
qgen -d 9 >  default_9.sql
qgen -d 10 >  default_10.sql
qgen -d 11 >  default_11.sql
qgen -d 12 >  default_12.sql
qgen -d 13 >  default_13.sql
qgen -d 14 >  default_14.sql
qgen -d 15 >  default_15.sql
qgen -d 16 >  default_16.sql
qgen -d 17 >  default_17.sql
qgen -d 18 >  default_18.sql
qgen -d 19 >  default_19.sql
qgen -d 20 >  default_20.sql
qgen -d 21 >  default_21.sql
qgen -d 22 >  default_22.sql

If you look at the default SQL files, you will see some problems that every statement includes limit after semicolon. This invalidates the statement. So you have to modify it slightly to move limit statement back into the body. In doing so, if the limit has -1 then just delete that line. Then in my experiment on MySQL v. 5.1.23, only Q13 needs some modification to run at MySQL, which is contradiction to Ming’s comments.

Here is the modified Q13.

select
    c_count,
    count(*) as custdist
from
    (
        select
            c_custkey,
            count(o_orderkey) as c_count
        from
            customer left outer join orders on
                c_custkey = o_custkey
                and o_comment not like '%special%requests%'
        group by
            c_custkey
    ) as c_orders
group by
    c_count
order by
    custdist desc,
    c_count desc;

Or I will just give away all default queries. See Project/EFIM/TPC-H default queries.

Experiment results

My laptop is Lenovo T61, Intel Core(TM)2 Duo CPU, T 8300 @ 2.4GHz with 3GB of RAM. MySQL version is 5.1.23-rc-community server on top of Microsoft Windows XP Service pack 3. I performed this experiment with more than 10 applications at the background and in addition I was reading some on-line newspaper (See not looking serious :) Thus there could be some CPU cut-in moments by some background process but I do not think that affects the result that much. Real server configurations for serious benchmarking will not be like this. They mostly have a highly-tunned OS components optimized only to run the database server best. For this experiment, the purpose is for relative comparison using different scales and between MySQL and MonetDB.

TPC-H Scale 1

Query   Test in seconds
Q1  31.08
Q2  330.5
Q3  228.4
Q4  3.03
Q5  330.94
Q6  5.31
Q7  67.83
Q8  7.95
Q9  165.22
Q10 27.45
Q11 1.11
Q12 6.14
Q13 24.66
Q14 104.47
Q15 10.38
Q16 21.83
Q17 4.86
Q18 > 1 hour, Give up!
Q19 3.84
Q20 6.34
Q21 334.69
Q22 0.98

For TPC-H scale 0.01, we perform the test for two databases for comparison. MonetDB is well-known for its column-based engine. In the experiment, MonetDB outperforms the MySQL except for Q16 that needs to create the view and manage data on the view.

TPC-H Scale 0.01

    MySQL   MonetDB
Q1  0.437   0.015
Q2  0.719   0.016
Q3  0.5 0.078
Q4  0.093   0.062
Q5  1.969   0.032
Q6  0.063   0.0001
Q7  0.587   0.016
Q8  0.078   0.016
Q9  0.671   0.047
Q10 0.25    0.125
Q11 0.047   0.0001
Q12 0.078   0.016
Q13 0.266   0.188
Q14 0.906   0.062
Q15 0.172   0.313
Q16 0.25    0.0001
Q17 null    null
Q18 inf 0.062
Q19 0.047   0.016
Q20 0.031   0.016
Q21 1.984   0.031
Q22 0.062   0.016

Review

My result is very slow compared to MonetDB TPC-H report that compares their performance with MySQL and PostgreSQL. See the environment difference! Anyway I described this experiment here to help someone out there who wants to setup TPC-H benchmark environment on top of MS-Windows and MySQL combination for FUN since I found no easy or good set up instruction on the Web.

OK. Mine is slow. Then how to get a fast database system?

Money will give you a fast system. You know that.

Using MySQL, there are very interesting systems out there. One I am surprised is the product of KickFire. Their outputs reported at TPC 7 achieved the lowest Price/Performance 8. However, Kickfire uses proprietary custom hardware.

Then any fastest open-source database can we get? MonetDB compares their performance with other open-source databases using TPC-H experiment sets at 9. One more database of my interest which is less popular compared to other database systems is C-Store database at 10.

References

  1. TPC-H
  2. TPC-H specification
  3. TPC-H Run on MySQL 5.1 and 6.0 11 12.
  4. MonetDB TPC-H report