InfiniSQL™ Guide

InfiniSQL™ Guide

2013-12-14 18:10:31


Preface

This is a set of guides for using InfiniSQL. Starting with preparing a suitable environment, then on to acquiring, building, running and testing. Performance optimization and benchmark testing using bundled tools is also explained.

Part I. Getting Started Guide

This document can walk you through all the steps necessary to have a functioning InfiniSQL system up and running.

Chapter 1. License

InfiniSQL is free software licenced under the the GPL version 3..

Chapter 2. Requirements

Table of Contents

Installing Dependencies

InfiniSQL runs on Linux x86_64.

Building InfiniSQL has the following dependencies:

  • GCC g++ >= 4.7

  • Autotools (automake, autoconf, libtool)

  • bison & flex

The InfiniSQL server daemon has the following dependencies:

  • Crypto++ Library

  • Perl Compatible Regular Expressions

  • 0mq

  • Boost Headers >= 1.53

  • Message Pack

  • Lockless Allocator

  • LZ4

Python modules:

  • argparse

  • msgpack-python

  • pyzmq

Connecting as a client requires PostgreSQL client libraries >=7.4 for any given platform, available from PostgreSQL.

Using bundled test scripts, such as scripts/regression.pl, requires perl modules:

  • DBD::Pg

  • Data::MessagePack

To benchmark, the patch misc/pgbench_persecond-v1.patch must be applied to PostgreSQL 9.2.4.

Installing Dependencies

InfiniSQL should run on any x86_64 Linux distribution assuming the dependencies are met. There is a script in the source distribution that will install the necessary dependencies for a Debian-based system: https://raw.github.com/infinisql/infinisql/master/scripts/install/debian-prequisites.py

The remainder of this section provides some installation details for Centos 6, but it may be worth reading even if your preferred distribution is something else.

Centos 6 doesn't include an appropriate version of GCC by default. The best way to get a proper version is to install the Devtoolset. As root:

yum install wget
cd /etc/yum.repos.d
wget http://people.centos.org/tru/devtools-1.1/devtools-1.1.repo 
yum install devtoolset-1.1-gcc devtoolset-1.1-gcc-c++ devtoolset-1.1 binutils devtoolset-1.1-gdb  

Then, create a user for running InfiniSQL, such as infinisql. Append the following to their login script, such as ~/.bash_profile:

exec /usr/bin/scl enable devtoolset-1.1 bash

Install the EPEL yum repository, still as root:

cd
wget http://dl.fedoraproject.org/pub/epel/6/x86_64/epel-release-6-8.noarch.rpm
wget http://rpms.famillecollet.com/enterprise/remi-release-6.rpm
rpm -Uvh remi-release-6*.rpm epel-release-6*.rpm

Install packages for most of the above dependencies:

yum install cryptopp-devel pcre-devel zeromq3-devel python-devel flex bison python-pip cpan libtool readline-devel zlib-devel make ruby patch git gcc-c++ cmake msgpack-devel git perl-DBD-Pg perl-Module-Build perl-Test-Requires postgresql

Note

Do not install Boost, Lockless Allocator, or LZ4 from yum. Instead, try the following.

InfiniSQL does not compile successfully when using Boost headers installed via yum. Later versions of Boost, such as >= 1.53, allow InfiniSQL to compile successfully.

wget http://sourceforge.net/projects/boost/files/boost/1.54.0/boost_1_54_0.tar.gz/download
tar xzf boost_1_54_0.tar.gz
sudo cp -r boost_1_54_0/boost /usr/local/include  

The allocator from Lockless, Inc. isn't available as an RPM package, but is very easy to acquire and install.

wget http://locklessinc.com/downloads/lockless_allocator_linux.tgz
tar xzf lockless_allocator_linux.tgz
sudo cp lockless_allocator_linux/64bit/* /usr/local/lib  

There's no Centos package for an LZ4 library. One needs to be built by hand. Download source from here.

cd <lz4srcdir>/cmake
cmake -DBUILD_SHARED_LIBS:BOOL=ON
make
sudo make install

Note

LZ4 uses gcc's -march=native to determine CPU architecture. If compiled in a virtual machine such as KVM on Centos, make sure to set the CPU architecture explicitly to match host CPU configuration, or some other valid 64bit architecture. Otherwise, gcc may become mistaken about CPU type and compiling LZ4 will fail.

Install python modules, as root:

python-pip install argparse
python-pip install msgpack-python
python-pip install pyzmq  

Perl modules, using CPAN:

cpan
install Data::MessagePack
exit  

Make sure ld.so can find the new libraries, as root:

echo /usr/local/lib > /etc/ld.so.conf.d/userlocal.conf
ldconfig

Apply some network tuning parameters, as root:

cat >> /etc/sysctl.conf <<EOF
net.ipv4.tcp_timestamps=1
net.ipv4.tcp_sack=1
net.core.netdev_max_backlog=250000
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.core.optmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216  
EOF
sysctl -f

Increase file descriptor limits and allow user programs to escalate scheduling privileges, as root:

cat >> /etc/security/limits.conf <<EOF
* soft nofile 1048576
* hard nofile 1048576
* soft rtprio 30
* hard rtprio 30
EOF

Log out of all sessions, and log back in for the new limits to take effect.

Chapter 3. Getting & Building

InfiniSQL source is acquired and built as follows. As a non-root user, such as infinisql, clone the repository from GitHub:

git clone git://github.com/infinisql/infinisql.git

Enter into sourcedir, and checkout a particular version (or skip this step if you want to work from HEAD):

git checkout v0.1.2-alpha

Build and install in some <installationdir>:

libtoolize
autoreconf --force --install
./configure --prefix=<installationdir>
make
make install
cd procs
./makem.sh <installationdir>

Binaries and test stored procedures have now been installed under <installationdir>.

Chapter 4. Running

InfiniSQL uses passwordless ssh to log in to each host that runs infinisqld. Create a keyfile using ssh-keygen with no passphrase. Make sure the public key is contained in ~/.ssh/authorized_keys on each host. The default filesystem location for the key is set in the config parameter sshkey.

If infinisqld needs to dump core, it will do so in one of two places. If config parameter mgmthost is set to some value, then core dumps will show up in the home directory of the username configured for infinisqld. If mgmthost is blank, then core files will occur in <sourcedir/infinisqlmgr>, which is where InfiniSQL is launched. Make sure to allow core files of any size in your shell startup file with "ulimit -c unlimited". Make sure that the ulimit command appears in a line before /usr/bin/scl enable ..., if exists.

Once proper authentication is set up, simply do the following to run a single node instance of InfiniSQL, on the same host as your <sourcedir> (the supplied infinisqlmgr.conf-sample is configured to run a single infinisqld node on localhost):

cp <sourcedir>/etc/infinisqlmgr.conf-sample <sourcedir>etc/infinisqlmgr.conf
cd <sourcedir>/infinisqlmgr
./infinisqlmgr.py

There should be a single infinisqld process running. Logs are written to <installationdir>/var. To stop the server, "pkill infinisqld".

InfiniSQL becomes interesting once multiple nodes are set up. Deploy the installation files using <sourcedir>/scripts/benchmark/deploy.sh, and configure any number of nodes and hosts by referring to the Configuration Reference.

Chapter 5. Testing

Regression Test

Edit <sourcedir>/scripts/infinisql.plib and set $SCRIPTDIR and $PROCDIR to <sourcedir>/scripts and <installationdir>/procs, respectively. Functional behavior should be identical regardless of the quantity of nodes. With a running cluster, do the following:

cd <sourcedir>/scripts
./prime_regression.pl -h localhost

The output should look a lot like the following, though will vary from version to version:

logging in
17	17	0

createdomain texas:
19	19	0
0
creating user
19	19	0
0
logout
connecting
logging in domain texas:
17	17	0

creating schema
17	17	0

createtable:
19	19	0
1
add column1
19	19	0
0
add column2
19	19	0
1
add column3
19	19	0
2
creating table manycolumns
19	19	0
2
19	19	0
0
19	19	0
1
19	19	0
2
19	19	0
3
19	19	0
4
19	19	0
5
19	19	0
6
19	19	0
7
19	19	0
8
19	19	0
9
20	20	0
10
20	20	0
11
20	20	0
12
20	20	0
13
20	20	0
14
20	20	0
15
20	20	0
16
20	20	0
17
20	20	0
18
20	20	0
19
20	20	0
20
20	20	0
21
20	20	0
22
20	20	0
23
creating table liketable
19	19	0
3
19	19	0
0
19	19	0
1
creating table nulltable
19	19	0
4
19	19	0
0
19	19	0
1
19	19	0
2
19	19	0
3
19	19	0
4
19	19	0
5
19	19	0
6
19	19	0
7
19	19	0
8
19	19	0
9
20	20	0
10
20	20	0
11
20	20	0
12
20	20	0
13
creating table nonuniquetable
19	19	0
5
19	19	0
0
19	19	0
1
19	19	0
2
19	19	0
3
19	19	0
4
19	19	0
5
19	19	0
6
19	19	0
7
19	19	0
8
19	19	0
9
20	20	0
10
20	20	0
11
20	20	0
12
creating tables for update/delete testing
19	19	0
6
19	19	0
0
19	19	0
1
19	19	0
7
19	19	0
0
19	19	0
1
19	19	0
8
19	19	0
0
19	19	0
1
19	19	0
9
19	19	0
0
19	19	0
1
20	20	0
10
19	19	0
0
19	19	0
1
20	20	0
11
19	19	0
0
19	19	0
1
20	20	0
12
19	19	0
0
19	19	0
1
20	20	0
13
19	19	0
0
19	19	0
1
20	20	0
14
19	19	0
0
19	19	0
1
20	20	0
15
19	19	0
0
19	19	0
1
20	20	0
16
19	19	0
0
19	19	0
1
20	20	0
17
19	19	0
0
19	19	0
1
20	20	0
18
19	19	0
0
19	19	0
1
20	20	0
19
19	19	0
0
19	19	0
1
20	20	0
20
19	19	0
0
19	19	0
1
20	20	0
21
19	19	0
0
19	19	0
1
20	20	0
22
19	19	0
0
19	19	0
1
20	20	0
23
19	19	0
0
19	19	0
1
20	20	0
24
19	19	0
0
19	19	0
1
20	20	0
25
19	19	0
0
19	19	0
1
20	20	0
26
19	19	0
0
19	19	0
1
20	20	0
27
19	19	0
0
19	19	0
1
creating table for transfer procedures
20	20	0
28
19	19	0
0
19	19	0
1
compiling statement debitbuyer:
	UPDATE accountstable SET balance = balance - :0 where accountid = :1
28	28	0
debitbuyer
compiling statement creditseller:
	UPDATE accountstable SET balance = balance + :0 where accountid = :1
30	30	0
creditseller
loading SqlProc procedure
17	17	0  

Next, run the SQL regression test and put the output in <somefile>:

./regression.pl -h localhost > <somefile>

The output can be compared with <sourcedir>/scripts/regression.out-sample. Check for daemon errors under <installationdir>/var.

More information about these scripts is in the Command Reference.

Command Line

PostgreSQL's psql command-line utility can be used to connect and query your InfiniSQL database. Look through prime_regression.pl to see the test user, domain and tables that have been created. Or, create some of your own using commands in the Command Reference. The way to connect using psql is as follows:

PGSSLMODE=disable psql -p <port> -h <hostname> -d <domainname> -U <username>

For the test database created by prime_regression.pl, port is 15432, hostname is localhost, domainname (equivalent to PostgreSQL dbname) is texas, username is mayor, and the password, when prompted, is austin.

After logging in, something like this should appear:


Password for user mayor: 
psql (9.2.4)
Type "help" for help.

texas=>   

Try a query: select * from mastertable;

  
 accountid | isactive | balance 
-----------+----------+---------
        15 | t        |    1500
        25 | t        |    2500
         5 | t        |     500
        35 | t        |    3500
        40 | t        |    4000
        30 | t        |    3000
        50 | t        |    5000
        10 | t        |    1000
        45 | t        |    4500
        20 | t        |    2000
(10 rows)

texas=>     
  

Part II. Benchmarking Guide

This guide is for benchmarking InfiniSQL using the tools provided in the source distribution. It assumes that you've already built InfiniSQL and have done functional testing to ensure that it's operating properly.

Chapter 6. About the Tools and Environment

Build PostgreSQL 9.2.4, as non-root user:

wget http://ftp.postgresql.org/pub/source/v9.2.4/postgresql-9.2.4.tar.gz
tar xzf postgresql-9.2.4.tar.gz
cd postgresql-9.2.4
./configure
make

Patch pgbench:

patch -p1 < <infinisqlsourcedir>/misc/pgbench_persecond-v1.patch
cd contrib/pgbench
make PG_LIBS='../../src/interfaces/libpq/libpq.a $(PTHREAD_LIBS)'
sudo cp pgbench /usr/local/bin  

The scripts for benchmarking InfiniSQL are in <sourcedir>scripts/benchmark. The Command and File Reference describes each of these tools. Many of them are quick and dirty scripts, so are meant to be read and altered as needs arise.

pgbench is the main performer for benchmarking. The reference manual describes the modifications made for InfiniSQL testing. Those modifications would also work for regular PostgreSQL testing, but at extreme volumes. bm.pl is configured to launch each pgbench process as a single thread with 200 database connections. That value is optimal on the Xeon-based bare metal servers which have been used mainly for InfiniSQL benchmarking. The total number of clients for a particular test is the product of the value passed as the -c parameter to pgbench in bm.pl, times the value passed to --nclients on the command line.

The test types bundled with the source code are defined in the *.pgb files in the benchmark directory. procedure.pgb performs the stored procedure in <sourcedir>/procs/PgbenchProc.cc. This performs 3 updates, a select, and an insert for every transaction. The likelihood of the 3 updates to occur on distinct hosts is very high, because the recordid being updated is randomized for each update. InfiniSQL distributes data across nodes by hash partitioning, so that causes this type of workload to make heavy use of intra-cluster network traffic. procedurenoinsert.pgb is identical to procedure.pgb, except that no insert is performed. It corresponds to PgbenchNoinsertProc.cc. Inserts cause memory consumption to go up, obviously, which makes long-running benchmarks difficult, especially in somewhat limited memory environments. setkey and getkey.pgb perform workloads equivalent to what a distributed key-store does. The random range in the pgbench scripts that correspond to recordid's should be equal to the number of accountid's inserted by bm.pl's fill test types. That range in procedure.pgb and procedurenoinsert.pgb is 1000000 (at least for v0.1.2-alpha). pgbench scripting can either use regular SQL statements, or stored procedures.

Setting up the environment is documented in the reference manual and Part I, “Getting Started Guide”.

The most important configuration parameters for performance are transactionagents, engines, ibgateways, and obgateways. They define the number of thread instances for each actor type. Too many instances, and performance suffers from excessive context switching. Too few, and CPUs go idle. Optimal performance for any particular workload will vary, but here are a few guidelines that should work for a large number of workloads. They are based on previous benchmarking efforts of InfiniSQL:

  • 2:1 ratio of transactionagents to engines.

  • transactionagents = the number of CPU threads. Ex. a 4-core Xeon with hyperthreading enabled has 8 visible threads. A 6-core Opteron has 6 threads.

  • If available, processor hyper-threading should be enabled.

  • Make sure to allow user-level programs to increase schedule priority (rtprio), as described above. This allows ibgateway and obgateway to increase their priorities, which increases throughput significantly.

  • configure lots of file descriptors

  • NICs with multiple RX queues are ideal. 10GBE is better than 1GB for latency, throughput, and host CPU utilization. One goal for InfiniSQL is to implement Infiniband Verbs for cluster communication, but that is a future activity.

Individual actor thread utilization can be observed using the -H flag with top, and then doing a stack trace (or cause core dump) and correlate thread PIDs. Inbound and Outbound Gateway threads will appear in top output with increase priority (if configured). Benchmarking the included PgbenchNoinsert stored procedure has shown that TransactionAgents generally use 2-3x more CPU than Engines, and gateway utilization varies based on a number of factors. For transactionagents and engines, it is better to have too many threads than too few. Gateways performance will tend to degrade if too many exist. Testing your workload on your environment is the best way to learn how to configure InfiniSQL for best performance.

Chapter 7. Sample Benchmark Run

The process of benchmarking any application requires environment configuration, test planning, and lots of iterative testing, data collection, and evaluation. InfiniSQL is no different. What follows is an example workflow for a single iteration using the tools provided in the source distribution.

  1. On the host from which you start InfiniSQL, edit <sourcedir>/etc/infinisqlmgr.conf to reflect the node topology.

  2. Edit <sourcedir>/scripts/benchmark/client, daemons, and daemonhosts to reflect client and daemon topology.

  3. Copy InfiniSQL binaries and sources to all hosts:

    cd <sourcedir>/scripts/benchmark
    ./deploy.sh <installationdir> <sourcedir>
    

  4. Start InfiniSQL:

    cd <sourcedir>/infinisqlmgr
    ./infinisqlmgr.py
    

  5. Start daemon host resource collection scripts (these scripts not included in source distribution--but are trivial to customize for yourself):

    cd <sourcedir>/scripts/benchmark
    ./collectcpudaemons.sh
    

  6. Prepare database:

    cd <sourcedir>/scripts/benchmark
    ./primebm.pl -h <someinfinisqldnodehostname>
    

  7. Fill 3 benchmark tables with 1,000,000 rows each:

    ./bm.pl --test=preparepgbench --accountids=1000000
    

    Wait for all perl fill.pl scripts to finish on each client host.

  8. Execute benchmark of PgbenchNoinsert stored procedure for 180 seconds and a total of 100,000 clients.

    ./bm.pl --test=procedurenoinsert --duration=180 --bmresultsdir=/home/infinisql/bmresults --nclients=500
    

    Wait for all pgbench processes to finish on each client host.

  9. Analyze the results.

    ./report.pl -t procedurenoinsert -s <unixsecondfrombm.pl>
    

  10. Stop InfiniSQL

    ./killdaemons.sh
    

  11. Grab daemon logs.

    ./getlogs <unixsecondfrom bm.pl>