SCICLUST Manual
Version 0.5
2004-07-09
Sven Hartrumpf © 2002-2004

Introduction

SCICLUST is a high-level and lean load balancer for computer clusters. It does not reimplement secure remote command execution but relies on an established tool such as the secure shell (ssh). This lets SCICLUST concentrate on the actual load balancing and offer many parameters for fine tuning.

These are some of SCICLUST's advantages:

Of course, there are some disadvantages that might disqualify SCICLUST for your cluster:

Prerequisites

You must ensure that you can log in to all nodes of your cluster using ssh (or similar) without being prompted for a password or passphrase. The following documents explain good ways to achieve this:

One favorite combination is

Installation

The whole SCICLUST system is distributed as a tar file at this location:

http://pi7.fernuni-hagen.de/hartrumpf/sciclust/

After downloading the tar file, unpack it like this:

linuxbox0> tar xvfz sciclust-0.5.tar.gz

The binaries sciclust and sciclust_server (built for your architecture) must be on your PATH. Furthermore, you must provide configuration files for the server (the per-user file ~/.sciclust or the machine-wide file /etc/sciclust on each machine) and for the client (the per-user file ~/.sciclustc or the machine-wide file /etc/sciclustc on each machine). Use the distributed configuration files as a starting point. Only two adaptations are required: in the server configuration file, add the client nodes that are allowed to submit jobs after the keyword client-nodes; in the client configuration file, add all server nodes after the keyword nodes. To add a node, write its numerical IP address (or its host name) as a string enclosed in double quotes.

Usage

On each server node (here: linuxbox1, linuxbox2, ...), start a SCICLUST server process (sciclust_server), for example by using the script sciclust_add:

linuxbox0> sciclust_add linuxbox1
linuxbox0> sciclust_add linuxbox2
...

Instead of typing one command per node, you can adapt the script sciclust_add_nodes. A SCICLUST server process may write to standard output; therefore, its output should be redirected to a file, as the script sciclust_add does.
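If you prefer to drive node startup from your own tooling rather than adapting sciclust_add_nodes, a loop like the following sketch would do the same job. It only assumes that sciclust_add is on your PATH and takes a host name as its single argument, as in the examples above; the node list and the function name add_nodes are hypothetical. By default it performs a dry run and just returns the commands.

```python
import subprocess

# Hypothetical node list; replace with the host names of your server nodes.
SERVER_NODES = ["linuxbox1", "linuxbox2"]


def add_nodes(nodes, dry_run=True):
    """Start a SCICLUST server on each node via the sciclust_add script.

    With dry_run=True the commands are only returned, not executed.
    """
    commands = [["sciclust_add", node] for node in nodes]
    if not dry_run:
        for cmd in commands:
            # check=False: keep going even if one node cannot be reached
            subprocess.run(cmd, check=False)
    return commands


if __name__ == "__main__":
    for cmd in add_nodes(SERVER_NODES):
        print(" ".join(cmd))
```

Since sciclust_add already takes care of redirecting the server's output, the loop itself does not need to redirect anything.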

Then, you can start a SCICLUST client, which is called just sciclust, for example

linuxbox0> sciclust hostname

(If no command follows sciclust, the hostname command is run by default.) To follow some of SCICLUST's reasoning, you can add debug output with the option -d:

linuxbox0> sciclust -d hostname

Add one or two further -d options to get even more information.

It is convenient to have one or more aliases for sciclust, e.g. add the following line to ~/.bashrc (if you are using bash as your login shell):

alias s="sciclust"

Then, to use your cluster for a command, just prepend s to it.

Advanced Usage

Once you have a working cluster and are familiar with it, you can change some settings (options) of the client and/or the server to influence the behavior of the cluster. The default value of each option is shown in parentheses after its name.

To understand the options, one should know how SCICLUST selects a node to execute a job submitted with sciclust. A SCICLUST client sends a load query to the nodes and picks the node with the minimal load value. To improve load balancing across the cluster, this load value is increased by several penalties (see the server options open-query-penalty, recency-penalty, and run-job-penalty).
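The client side of this selection can be sketched as follows, using the client options query-percentage, min-queries, random-order, and wait-reply-percentage (with the defaults listed under Client Configuration). This is an illustration, not SCICLUST's actual code: the function names, the rounding with ceil, and breaking ties by reply arrival order are all assumptions.

```python
import math
import random


def nodes_to_query(nodes, query_percentage=100, min_queries=1,
                   random_order=False, rng=random):
    """Choose which nodes receive a load query (rounding up is an assumption)."""
    count = max(min_queries, math.ceil(len(nodes) * query_percentage / 100))
    count = min(count, len(nodes))
    order = list(nodes)
    if random_order:
        rng.shuffle(order)  # spread queries over the whole cluster
    return order[:count]


def pick_node(replies, expected, wait_reply_percentage=100):
    """Pick the node with the minimal reported load.

    replies: iterable of (node, load) pairs in arrival order.
    expected: number of queries that were sent.
    Returns None if no reply arrived (the client would then retry later).
    """
    needed = math.ceil(expected * wait_reply_percentage / 100)
    received = {}
    for node, load in replies:
        received[node] = load
        if len(received) >= needed:
            break  # enough replies: stop waiting for the rest
    if not received:
        return None
    # min() keeps the first minimal entry, so ties go to the earliest reply.
    return min(received, key=received.get)
```

For example, with wait-reply-percentage set to 50 and three queries sent, the client decides after the first two replies, so a lightly loaded but slow node may be skipped in exchange for a faster decision.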

Server Configuration

A SCICLUST server first reads /etc/sciclust (if present). If local-server-configuration is not set to no, then ~/.sciclust is processed too.
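The reading order can be modelled as below. The sketch represents each configuration file as a Python dict of option names to values (it does not parse the real file syntax) and assumes that a value from the per-user file overrides the same option from the machine-wide file. The function name effective_options is hypothetical; the same logic applies to the client with /etc/sciclustc, ~/.sciclustc, and local-client-configuration.

```python
def effective_options(global_opts, local_opts, defaults, flag):
    """Merge options in the order a SCICLUST server or client reads them.

    global_opts: options from /etc/sciclust (or /etc/sciclustc), None if absent
    local_opts:  options from ~/.sciclust (or ~/.sciclustc), None if absent
    flag:        "local-server-configuration" or "local-client-configuration"
    Assumption: later files override earlier ones, option by option.
    """
    options = dict(defaults)
    if global_opts is not None:
        options.update(global_opts)
    # The per-user file is processed only if the flag is not "no" at this point,
    # so an administrator can disable it in the machine-wide file.
    if local_opts is not None and options.get(flag, "yes") != "no":
        options.update(local_opts)
    return options
```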

client-nodes (no default!)
the IP addresses (or host names) of the client nodes that may submit jobs to this sciclust_server's node (This option must be the last one.)
delay-server-process-end (0)
delays the end of a communication with a client by the given number of seconds
local-server-configuration (yes)
allow local configuration of options, i.e. processing of the configuration file ~/.sciclust.
max-open-query-penalty (1.0)
the penalty from the open-query-penalty option is limited to this maximal value
open-query-penalty (0.5)
load value is increased by this value for each query that has not yet led to a start or retreat message
port (9414)
port that a SCICLUST server opens for clients (should match the port option in the client configuration)
recency-penalty (0.5)
load value is increased by this value if at least recency-penalty-jobs jobs were started during the last recency-penalty-interval seconds
recency-penalty-interval (30)
number of seconds that count as recent for option recency-penalty
recency-penalty-jobs (1)
threshold number of jobs that must be started during the recency interval in order to cause a recency-penalty
run-job-penalty (0.00000001)
load is increased by this value for each completed job so that jobs are better distributed if all servers have equal load
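The penalty options above can be combined into one adjusted load value, sketched below with the documented defaults. The sketch assumes that the base load is the node's ordinary system load average and that the penalties are simply added to it as each option describes; the function name and parameters are hypothetical and not taken from the SCICLUST sources.

```python
import time


def adjusted_load(base_load, open_queries, start_times, completed_jobs,
                  now=None,
                  open_query_penalty=0.5, max_open_query_penalty=1.0,
                  recency_penalty=0.5, recency_penalty_interval=30,
                  recency_penalty_jobs=1, run_job_penalty=0.00000001):
    """Load value a server might report, following the option descriptions.

    open_queries:   queries not yet answered by a start or retreat message
    start_times:    start times (seconds since the epoch) of jobs on this node
    completed_jobs: number of jobs completed on this node
    """
    if now is None:
        now = time.time()
    load = base_load
    # open-query-penalty per open query, capped at max-open-query-penalty
    load += min(open_queries * open_query_penalty, max_open_query_penalty)
    # recency-penalty if enough jobs started within the recency interval
    recent = sum(1 for t in start_times if now - t <= recency_penalty_interval)
    if recent >= recency_penalty_jobs:
        load += recency_penalty
    # tiny run-job-penalty per completed job to break ties between idle nodes
    load += completed_jobs * run_job_penalty
    return load
```

For example, a node with base load 0.2, three open queries, and one job started 5 seconds ago reports 0.2 + 1.0 (capped from 1.5) + 0.5 = 1.7.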

Client Configuration

A SCICLUST client first reads /etc/sciclustc (if present). If local-client-configuration is not set to no, then ~/.sciclustc is processed too.

delay-client-end (0)
seconds to wait before a SCICLUST client run ends
first-retry-interval (30)
seconds to wait before the first retry if no node is found
local-client-configuration (yes)
allow local configuration of options, i.e. processing of the configuration file ~/.sciclustc.
max-retry-interval (600)
retry-interval will never become greater than this maximum
max-runs (10000)
maximum number of jobs that SCICLUST may run on one node at the same time
min-queries (1)
query at least this many nodes
min-retry-interval (1)
retry-interval will never become less than this minimum
nice ("nice")
the command (optionally with arguments) that should be used to adjust the nice level of a job (This option can be set to the empty string.)
nodes (no default!)
the IP addresses (or host names) of all nodes that should be considered for a submitted job (This option must be the last one.)
port (9414)
port that a SCICLUST server opens for clients (should match the port option in the server configuration)
query-percentage (100)
query only this percentage of nodes to limit message traffic for large clusters
random-order (no)
use random order for querying servers
retry-interval-factor (2)
factor by which the retry interval is multiplied for subsequent retries
shell ("ssh")
the command for remote execution of jobs
shell-options ("-n -e none")
the arguments added after the shell command (The ssh option -n may cause problems on some systems, e.g. Solaris; if it does, remove it from the configuration file.)
use-current-directory (yes)
add a cd command (to the current directory) to the command executed on the remote host
wait-reply-percentage (100)
stop waiting for query replies after this percentage of expected replies has been received
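Two groups of client options can be illustrated in a short sketch: the retry options (first-retry-interval, retry-interval-factor, min-retry-interval, max-retry-interval) and the options that shape the remote command (shell, shell-options, nice, use-current-directory). The exact backoff rule and the form of the remote command line (in particular joining cd and the job with &&) are assumptions made for illustration; SCICLUST's own command line may differ.

```python
import shlex


def retry_intervals(attempts, first=30, factor=2, minimum=1, maximum=600):
    """Waiting times in seconds before successive retries if no node is found.

    Assumption: each interval is the previous one times factor,
    clamped to the range [minimum, maximum].
    """
    intervals = []
    interval = first
    for _ in range(attempts):
        interval = max(minimum, min(maximum, interval))
        intervals.append(interval)
        interval = interval * factor
    return intervals


def remote_command(node, command, cwd, shell="ssh", shell_options="-n -e none",
                   nice="nice", use_current_directory=True):
    """Argument list for running command on node via the remote shell."""
    remote = command
    if nice:  # nice may be the empty string, which disables it
        remote = nice + " " + remote
    if use_current_directory:
        # && ensures the job does not run if the directory is missing remotely
        remote = "cd " + shlex.quote(cwd) + " && " + remote
    return shlex.split(shell) + shlex.split(shell_options) + [node, remote]
```

With the defaults, the retry intervals grow as 30, 60, 120, 240, 480, and then stay at 600 seconds; a job such as hostname submitted from /home/user would be run as ssh -n -e none linuxbox1 "cd /home/user && nice hostname".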

Cluster Management

Besides the two scripts for adding cluster nodes, analogous scripts exist for removing cluster nodes:

linuxbox2> sciclust_remove linuxbox34
linuxbox4> sciclust_remove_nodes

For a quick check of the presence and status of SCICLUST servers on possible nodes, one can use the scripts sciclust_check and sciclust_check_nodes:

linuxbox17> sciclust_check linuxbox4
linuxbox67> sciclust_check_nodes

Limitations

SCICLUST has been used extensively over several years on a cluster of around 10 nodes running various Linux and Solaris versions. Nevertheless, SCICLUST probably needs testing on other systems to mature further.

SCICLUST can be run on a cluster with up to 6 nodes. If you want to run it on a larger cluster, please contact me.