Zach Garner's blog

My Blog Entry

I was reading my email... now I'm blogging.

Ganglia Installed and Configured on Medusa

I've configured Ganglia on Medusa. See its native HTML output at http://medusa.lab.ac.uab.edu/ganglia/. It's also integrated with MDS, so grid-info-search queries will display information about the entire cluster.

Ganglia seems to be about beta quality. No major problems with it, except that the dynamically generated icons aren't showing up. This doesn't matter much to us, since we aren't going to be using its web front end.

To integrate Ganglia with MDS, I have to use the ganglia-python client, which seems to be alpha-quality software. I had to edit the Python code to change hardcoded paths (they referred to the developer's home directory).
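
For anyone hitting the same thing, here is a rough sketch of how one might hunt down those hardcoded paths before editing. The function name and the example prefix /home/dev are made up for illustration; the post does not say what the developer's actual home directory was.

```shell
#!/bin/bash
# List every line in *.py files under a directory that contains a given
# path prefix, so hardcoded paths can be reviewed before editing.
# Hypothetical helper; the prefix is whatever home directory shows up
# in the client's source.
find_hardcoded_paths() {
    local dir=$1 prefix=$2
    # -r: recurse, -n: show line numbers, -F: treat the prefix as a
    # plain string rather than a regular expression.
    grep -rnF --include='*.py' -- "$prefix" "$dir"
}

# Example (placeholder values):
#   find_hardcoded_paths ./ganglia-python /home/dev
```

Each hit is printed as file:line:text, which is enough to go through them by hand.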

GridPort and OGCE installed but unconfigured

I've installed GridPort but have not configured it yet. I've put up their example/demo site at http://bechamel.lab.ac.uab.edu/gridport/. For comparison, OGCE (Open Grid Computing Environment) is also installed at http://bechamel.lab.ac.uab.edu:10081/uabgrid

NWS Installed and Configured on bechamel

NWS has been installed and configured on bechamel. It's not tied in to the rest of our systems just yet.

I've created an init.d startup script for it, as follows.

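#!/bin/bash
#
# Startup script for NWS.
# Author: Zach Garner
#
# chkconfig: - 85 15
# description: NWS

export NWS_HOME=/opt/nws-2.8.1
export HOSTNAME=$(hostname)
export NAME_SERVER=$HOSTNAME
export SCRATCH_DIR=/tmp/nws

start() {
    # Start the name server in the background.
    "$NWS_HOME/bin/nws_nameserver" -e /var/log/nws/nameserver.err \
        -l /var/log/nws/nameserver.log -f "$SCRATCH_DIR/registrations" &
    # Start the memory server in the background, pointed at this host.
    "$NWS_HOME/bin/nws_memory" -d "$SCRATCH_DIR" -e /var/log/nws/memory.err \
        -l /var/log/nws/memory.log -N "$HOSTNAME" &
}

# The posted listing breaks off inside start(). The dispatcher below is a
# minimal assumed completion so the script can be run as-is.
case "$1" in
    start) start ;;
    *) echo "Usage: $0 start" >&2; exit 1 ;;
esac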
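
A note on the "# chkconfig: - 85 15" header in that script: the three fields are the default runlevels ("-" means the service is not enabled in any runlevel by default), the start priority, and the stop priority. The little helper below is only a sketch for reading that header back out of a script; the function name is made up, and the /etc/init.d/nws path in the comments is an assumption about where the script gets saved.

```shell
#!/bin/bash
# Print the three fields of the "# chkconfig:" header of an init script.
# Hypothetical helper for illustration only.
chkconfig_header() {
    # $1: path to the init script
    awk '/^# chkconfig:/ {
        printf "runlevels=%s start=%s stop=%s\n", $3, $4, $5
        exit
    }' "$1"
}

# Assuming the script is saved as /etc/init.d/nws, it would typically
# be registered and started with:
#   chkconfig --add nws
#   chkconfig nws on
#   service nws start
```

Because the header uses "-", the service stays off after "chkconfig --add" until it is turned on explicitly.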

Globus3 mpicc32 flavor

The mpicc compiler is not compiling Globus 3 correctly. It looks like the problem exists in only one file, in one package. I'm trying to find a workaround.

Globus Build Process Sucks!

It looks like the build script supplied by Globus hard-codes the flavor. As far as I can tell, there is no way to create anything other than a gcc32dbg[pth] build without editing the build script. Editing is therefore needed for non-debug builds, builds based on a vendor compiler (e.g. our Portland compilers), or builds that use MPI.

Note that this is not GPT's fault. Globus is providing a script that calls GPT with the proper FLAVOR arguments. It's this script (install-gt3) that is the problem.
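
One way to do the edit without touching the original is to write out a patched copy. This is just a sketch: it assumes the flavor appears in install-gt3 as the literal string gcc32dbg, which I have not shown here, and the function name is made up.

```shell
#!/bin/bash
# Emit a copy of a build script on stdout with one flavor string
# replaced by another. Assumes both flavor names are plain words
# (no slashes or regex metacharacters) and that the flavor is spelled
# out literally in the script.
swap_flavor() {
    local script=$1 old=$2 new=$3
    sed "s/$old/$new/g" "$script"
}

# Example (output file name is a placeholder):
#   swap_flavor install-gt3 gcc32dbg mpicc32 > install-gt3.mpicc32
```

Note that a plain substitution also rewrites longer names that start with the old flavor, so the patched copy should be diffed against the original before it is run.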

NMI MPICH-G2

NMI-based MPICH-G2 is working to some extent.

One problem is that either all systems in the MPICH-G2 cluster need to share filesystems, or the user needs to manually copy executables to every system. The NMI documentation does not mention this requirement.
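
Without a shared filesystem, the manual copy could look roughly like this. It is a sketch under a few assumptions: the executable is given as an absolute path, the same directory exists on every remote system, and plain scp is an acceptable way to copy. The function name, the COPY variable, and the host-list file are all made up for illustration.

```shell
#!/bin/bash
# Copy one executable to the same path on every host listed in a file
# (one host name per line; blank lines and lines starting with # are
# skipped). COPY defaults to scp; set COPY=echo for a dry run.
stage_executable() {
    local exe=$1 hostfile=$2 host
    while read -r host; do
        case $host in ''|'#'*) continue ;; esac
        # </dev/null keeps the copy command from reading the host list.
        ${COPY:-scp} "$exe" "$host:$exe" </dev/null || return 1
    done < "$hostfile"
}

# Example (placeholder paths):
#   stage_executable /home/me/bin/cpi hosts.txt
```

Running it once with COPY=echo prints the copies it would make, which is a cheap way to check the host list first.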

We now have these remaining issues to take care of:
1. The current setup is only working with my GridN test systems.
2. MPICH-G2 applications are running on the head node of the Beowulf cluster, but are not returning STDOUT/STDERR messages.
3. The applications are only running on the head node, not being distributed across the cluster.

OpenCA Running

OpenCA is now running. It will still take some time to integrate it with our Grid infrastructure.

Condor Pool Created

I've created a small Condor Pool.

The master (nori) is running on a VM. My workstation (ceviche) is the only other machine in the pool. The following "screenshot" shows three jobs that were submitted. Two are running on my workstation's idle processors. The other is running on the VM (I've got to disable this; no one should be computing on the VM).

Name          OpSys Arch  State     Activity LoadAv Mem ActvtyTime
vm1@ceviche.l LINUX INTEL Unclaimed Idle     0.000  851
vm2@ceviche.l LINUX INTEL Claimed   Busy     0.000  851
vm3@ceviche.l LINUX INTEL Claimed   Busy     0.000  851
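
To get a quick tally from a listing like this instead of reading it by eye, something along these lines should do. It is a sketch that assumes the listing arrives on stdin in the column order shown above, and that slot rows are the ones whose name contains an "@"; the function name is made up.

```shell
#!/bin/bash
# Count slots per State from a status listing on stdin.
# Only rows whose first field contains "@" (such as vm1@ceviche.l) are
# counted, which skips the header line. The State is the fourth column.
count_states() {
    awk '$1 ~ /@/ { n[$4]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Example, assuming the listing was saved to a file named pool.txt:
#   count_states < pool.txt
```

For the three rows in the listing, this reports two Claimed slots and one Unclaimed slot.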

Adding a node to an existing Condor Pool

If you install the Condor RPM, it comes pre-configured as a master. This is what you need to do if you want to use it as an execute node:

run: "./condor_configure --central-manager=condor_master.example.domain --owner condor"

Make sure the user 'condor' exists. Now rerun condor_master (the controller of the daemons on the execute node, not the server).
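
The two steps can be strung together so the configure command only runs once the account is there. This is a sketch: both function names are made up, the host name in the usage comment is the same placeholder as above, and it assumes it is run from the directory that holds condor_configure.

```shell
#!/bin/bash
# Succeeds if the named local account exists.
user_exists() {
    id -u "$1" >/dev/null 2>&1
}

# Hypothetical wrapper around the condor_configure call shown above.
# $1: host name of the central manager.
configure_execute_node() {
    local manager=$1
    if ! user_exists condor; then
        echo "user 'condor' does not exist; create it first" >&2
        return 1
    fi
    ./condor_configure --central-manager="$manager" --owner condor
}

# Example (placeholder host name):
#   configure_execute_node condor_master.example.domain
```

condor_master still has to be restarted by hand afterwards.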
