TPC-H is an ad-hoc, decision support benchmark. TPC benchmarks are widely used today in evaluating the performance of relational database systems; the results are published on the TPC web site. TPC-H provides the detail specification on the benchmark.
Please be aware that this page is the record of the expriment done on Sept. 2009. I have not checked any TPC-H changes since then.
I have tested TPC-H under MS-Windows XP. So related environments will be confined to Windows XP. Download source codes from 4. You should read README as its name asks you to do so. Then modify or consult makefile.suite file to compile the source code to get dbgen.exe and qgen.exe. Since MySQL is not a TPC supported database, you need some trick to compile these. Refer Ming’s post 5 for tips. Or if you are a lazy boy, then just download these that I self-compiled Media:tcp-h_windows.zip. Remember that you run this program at your own risk as most software you download from the Web means to. In this distribution, dbgen (which is the data generator) is located at the root. Qgen (which is the query generator) is placed under queries directory.
Usage: dbgen -s 1
1 is a scale factor representing the size of database (1 = 1 GB data). Change it to your preferred size. You know in the real TPC-H experiment, this number easily goes up to millions but not for personal laptop that I use for this experiment to check the MySQL TPC-H compatibility. So please bear with the small scale. Anyway, information here is scalable to the big scale without much modification. dbgen will generate 8 tbl (table data) files. Keep it at where you can remember.
Open MySQL client and create the database for TCP-H simulation. Now create the tables that are defined in dss.ddl (for table creation) and dss.ri (for key creation). Both files are included in the source distribution. Or just run below commands. If you prefer MyISAM as your table engine, then attach ENGINE=MyISAM at the end of each statement.
CREATE TABLE NATION ( N_NATIONKEY INTEGER NOT NULL, N_NAME CHAR(25) NOT NULL, N_REGIONKEY INTEGER NOT NULL, N_COMMENT VARCHAR(152)); CREATE TABLE REGION ( R_REGIONKEY INTEGER NOT NULL, R_NAME CHAR(25) NOT NULL, R_COMMENT VARCHAR(152)); CREATE TABLE PART ( P_PARTKEY INTEGER NOT NULL, P_NAME VARCHAR(55) NOT NULL, P_MFGR CHAR(25) NOT NULL, P_BRAND CHAR(10) NOT NULL, P_TYPE VARCHAR(25) NOT NULL, P_SIZE INTEGER NOT NULL, P_CONTAINER CHAR(10) NOT NULL, P_RETAILPRICE DECIMAL(15,2) NOT NULL, P_COMMENT VARCHAR(23) NOT NULL ); CREATE TABLE SUPPLIER ( S_SUPPKEY INTEGER NOT NULL, S_NAME CHAR(25) NOT NULL, S_ADDRESS VARCHAR(40) NOT NULL, S_NATIONKEY INTEGER NOT NULL, S_PHONE CHAR(15) NOT NULL, S_ACCTBAL DECIMAL(15,2) NOT NULL, S_COMMENT VARCHAR(101) NOT NULL); CREATE TABLE PARTSUPP ( PS_PARTKEY INTEGER NOT NULL, PS_SUPPKEY INTEGER NOT NULL, PS_AVAILQTY INTEGER NOT NULL, PS_SUPPLYCOST DECIMAL(15,2) NOT NULL, PS_COMMENT VARCHAR(199) NOT NULL ); CREATE TABLE CUSTOMER ( C_CUSTKEY INTEGER NOT NULL, C_NAME VARCHAR(25) NOT NULL, C_ADDRESS VARCHAR(40) NOT NULL, C_NATIONKEY INTEGER NOT NULL, C_PHONE CHAR(15) NOT NULL, C_ACCTBAL DECIMAL(15,2) NOT NULL, C_MKTSEGMENT CHAR(10) NOT NULL, C_COMMENT VARCHAR(117) NOT NULL); CREATE TABLE ORDERS ( O_ORDERKEY INTEGER NOT NULL, O_CUSTKEY INTEGER NOT NULL, O_ORDERSTATUS CHAR(1) NOT NULL, O_TOTALPRICE DECIMAL(15,2) NOT NULL, O_ORDERDATE DATE NOT NULL, O_ORDERPRIORITY CHAR(15) NOT NULL, O_CLERK CHAR(15) NOT NULL, O_SHIPPRIORITY INTEGER NOT NULL, O_COMMENT VARCHAR(79) NOT NULL); CREATE TABLE LINEITEM ( L_ORDERKEY INTEGER NOT NULL, L_PARTKEY INTEGER NOT NULL, L_SUPPKEY INTEGER NOT NULL, L_LINENUMBER INTEGER NOT NULL, L_QUANTITY DECIMAL(15,2) NOT NULL, L_EXTENDEDPRICE DECIMAL(15,2) NOT NULL, L_DISCOUNT DECIMAL(15,2) NOT NULL, L_TAX DECIMAL(15,2) NOT NULL, L_RETURNFLAG CHAR(1) NOT NULL, L_LINESTATUS CHAR(1) NOT NULL, L_SHIPDATE DATE NOT NULL, L_COMMITDATE DATE NOT NULL, L_RECEIPTDATE DATE NOT NULL, L_SHIPINSTRUCT CHAR(25) NOT NULL, L_SHIPMODE CHAR(10) NOT NULL, L_COMMENT VARCHAR(44) NOT NULL);
Now its time to import data. Open the MySQL console (If you don’t know what MySQL console is, then plz Google it and do not email me!) and run below commands. Don’t forget to modify Yourdirectory to comply with your directory where you stored tbl files. Still don’t get it? Your tbl files are located at where you unzipped dbgen program or at some place I told you to remember :)
load data infile "Yourdirectory/part.tbl" into table part fields terminated by "|" lines terminated by "\r\n"; load data infile "Yourdirectory/partsupp.tbl" into table partsupp fields terminated by "|" lines terminated by "\r\n"; load data infile "Yourdirectory/customer.tbl" into table customer fields terminated by "|" lines terminated by "\r\n"; load data infile "Yourdirectory/nation.tbl" into table nation fields terminated by "|" lines terminated by "\r\n"; load data infile "Yourdirectory/orders.tbl" into table orders fields terminated by "|" lines terminated by "\r\n"; load data infile "Yourdirectory/region.tbl" into table region fields terminated by "|" lines terminated by "\r\n"; load data infile "Yourdirectory/supplier.tbl" into table supplier fields terminated by "|" lines terminated by "\r\n"; load data infile "Yourdirectory/lineitem.tbl" into table lineitem fields terminated by "|" lines terminated by "\r\n";
Let’s add keys to tables. Belows are default keys specified in dss.ri.
ALTER TABLE REGION ADD PRIMARY KEY (R_REGIONKEY); ALTER TABLE NATION ADD PRIMARY KEY (N_NATIONKEY), ADD PRIMARY KEY (N_REGIONKEY); ALTER TABLE CUSTOMER ADD PRIMARY KEY (C_CUSTKEY), ADD KEY (C_NATIONKEY); ALTER TABLE SUPPLIER ADD PRIMARY KEY (S_SUPPKEY), ADD KEY (S_NATIONKEY); ALTER TABLE PART ADD PRIMARY KEY (P_PARTKEY); ALTER TABLE PARTSUPP ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY), ADD KEY (PS_PARTKEY), ADD KEY (PS_SUPPKEY); ALTER TABLE ORDERS ADD PRIMARY KEY (O_ORDERKEY), ADD KEY (O_CUSTKEY); ALTER TABLE LINEITEM ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER), ADD KEY (L_ORDERKEY), ADD KEY (L_PARTKEY,L_SUPPKEY);
If you are using the InnoDB engine and want to test TPC-H under transaction environment, then specify the foreign key like below. Be aware to the order of statements that should comply with the foreign key order.
ALTER TABLE NATION ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references REGION (R_REGIONKEY); ALTER TABLE SUPPLIER ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references NATION (N_NATIONKEY); ALTER TABLE CUSTOMER ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references NATION (N_NATIONKEY); ALTER TABLE PARTSUPP ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references SUPPLIER (S_SUPPKEY), ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references PART (P_PARTKEY); ALTER TABLE ORDERS ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references CUSTOMER (C_CUSTKEY); ALTER TABLE LINEITEM ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY) references ORDERS (O_ORDERKEY), ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references PARTSUPP (PS_PARTKEY,PS_SUPPKEY);
Up to above, I created the enviornment for MySQL database. On this section I am talking about MonetDB. The reason that I decided to put this db is because it is column-based database specifically good for data-ware housing. You would see the speed comparison between MySQL and MonetDB at the end. If not interested in MonetDB, jump to the below section.
Create tables modified for MonetDB table creation specification.
COPY RECORDS INTO region FROM "Yourdirectory/region.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n'; COPY RECORDS INTO nation FROM "Yourdirectory/nation.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n'; COPY RECORDS INTO part FROM "Yourdirectory/part.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n'; COPY RECORDS INTO supplier FROM "Yourdirectory/supplier.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n'; COPY RECORDS INTO customer FROM "Yourdirectory/customer.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n'; COPY RECORDS INTO partsupp FROM "Yourdirectory/partsupp.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n'; COPY RECORDS INTO orders FROM "Yourdirectory/orders.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n'; COPY RECORDS INTO lineitem FROM "Yourdirectory/lineitem.tbl" DELIMITERS tuple_seperator '|' record_seperator '\r\n';
MonetDB uses slightly different grammar to setup foreign key
ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_SUPPKEY) references SUPPLIER (S_SUPPKEY); ALTER TABLE PARTSUPP ADD FOREIGN KEY (PS_PARTKEY) references PART (P_PARTKEY); ALTER TABLE ORDERS ADD FOREIGN KEY (O_CUSTKEY) references CUSTOMER (C_CUSTKEY); ALTER TABLE LINEITEM ADD FOREIGN KEY (L_ORDERKEY) references ORDERS (O_ORDERKEY); ALTER TABLE LINEITEM ADD FOREIGN KEY (L_PARTKEY,L_SUPPKEY) references PARTSUPP (PS_PARTKEY, PS_SUPPKEY);
Since you already got qgen, you can set its option to generate the query as you want. Or just use below commands to get the default queries. The output of default queries can be verified with that of 6 to see whether your TPC-H database works as expected.
qgen -d 1 > default_1.sql qgen -d 2 > default_2.sql qgen -d 3 > default_3.sql qgen -d 4 > default_4.sql qgen -d 5 > default_5.sql qgen -d 6 > default_6.sql qgen -d 7 > default_7.sql qgen -d 8 > default_8.sql qgen -d 9 > default_9.sql qgen -d 10 > default_10.sql qgen -d 11 > default_11.sql qgen -d 12 > default_12.sql qgen -d 13 > default_13.sql qgen -d 14 > default_14.sql qgen -d 15 > default_15.sql qgen -d 16 > default_16.sql qgen -d 17 > default_17.sql qgen -d 18 > default_18.sql qgen -d 19 > default_19.sql qgen -d 20 > default_20.sql qgen -d 21 > default_21.sql qgen -d 22 > default_22.sql
If you look at the default SQL files, you will see some problems that every statement includes limit after semicolon. This invalidates the statement. So you have to modify it slightly to move limit statement back into the body. In doing so, if the limit has -1 then just delete that line. Then in my experiment on MySQL v. 5.1.23, only Q13 needs some modification to run at MySQL, which is contradiction to Ming’s comments.
Here is the modified Q13.
select c_count, count(*) as custdist from ( select c_custkey, count(o_orderkey) as c_count from customer left outer join orders on c_custkey = o_custkey and o_comment not like '%special%requests%' group by c_custkey ) as c_orders group by c_count order by custdist desc, c_count desc;
Or I will just give away all default queries. See Project/EFIM/TPC-H default queries.
My laptop is Lenovo T61, Intel Core(TM)2 Duo CPU, T 8300 @ 2.4GHz with 3GB of RAM. MySQL version is 5.1.23-rc-community server on top of Microsoft Windows XP Service pack 3. I performed this experiment with more than 10 applications at the background and in addition I was reading some on-line newspaper (See not looking serious :) Thus there could be some CPU cut-in moments by some background process but I do not think that affects the result that much. Real server configurations for serious benchmarking will not be like this. They mostly have a highly-tunned OS components optimized only to run the database server best. For this experiment, the purpose is for relative comparison using different scales and between MySQL and MonetDB.
Query Test in seconds Q1 31.08 Q2 330.5 Q3 228.4 Q4 3.03 Q5 330.94 Q6 5.31 Q7 67.83 Q8 7.95 Q9 165.22 Q10 27.45 Q11 1.11 Q12 6.14 Q13 24.66 Q14 104.47 Q15 10.38 Q16 21.83 Q17 4.86 Q18 > 1 hour, Give up! Q19 3.84 Q20 6.34 Q21 334.69 Q22 0.98
For TPC-H scale 0.01, we perform the test for two databases for comparison. MonetDB is well-known for its column-based engine. In the experiment, MonetDB outperforms the MySQL except for Q16 that needs to create the view and manage data on the view.
MySQL MonetDB Q1 0.437 0.015 Q2 0.719 0.016 Q3 0.5 0.078 Q4 0.093 0.062 Q5 1.969 0.032 Q6 0.063 0.0001 Q7 0.587 0.016 Q8 0.078 0.016 Q9 0.671 0.047 Q10 0.25 0.125 Q11 0.047 0.0001 Q12 0.078 0.016 Q13 0.266 0.188 Q14 0.906 0.062 Q15 0.172 0.313 Q16 0.25 0.0001 Q17 null null Q18 inf 0.062 Q19 0.047 0.016 Q20 0.031 0.016 Q21 1.984 0.031 Q22 0.062 0.016
My result is very slow compared to MonetDB TPC-H report that compares their performance with MySQL and PostgreSQL. See the environment difference! Anyway I described this experiment here to help someone out there who wants to setup TPC-H benchmark environment on top of MS-Windows and MySQL combination for FUN since I found no easy or good set up instruction on the Web.
Money will give you a fast system. You know that.
Using MySQL, there are very interesting systems out there. One I am surprised is the product of KickFire. Their outputs reported at TPC 7 achieved the lowest Price/Performance 8. However, Kickfire uses proprietary custom hardware.
Then any fastest open-source database can we get? MonetDB compares their performance with other open-source databases using TPC-H experiment sets at 9. One more database of my interest which is less popular compared to other database systems is C-Store database at 10.