[phpwiki]
To support the UABgrid2 pilot we purchased a few Dell systems that arrived in April: two [Dell 2950s| http://www.dell.com/content/products/productdetails.aspx/pedge_2950?c=us&cs=RC956904&l=en&s=hied] with dual 3.0GHz Xeons, 8GB RAM, and 300GB SAS disks (Perc5/i-based mirrors), a [Dell 1950|http://www.dell.com/content/products/productdetails.aspx/pedge_1950?c=us&cs=RC956904&l=en&s=hied] with dual CPUs and some local storage, and a [Dell/EMC 6TB SAN|http://www.dell.com/content/products/productdetails.aspx/pvaul_ax150?c=us&cs=RC956904&l=en&s=hied] to connect to the 1950. The 2950s will host various aspects of the UABgrid2 infrastructure, including the identity management (VO, CA, and MyProxy) and application support (GridWay, Gridsphere, and other collaborative apps) systems. They will run VMware-based virtual machines to carry out most of these tasks, with the goal of easing application deployment when conflicting system requirements arise. The 1950 will act as a quasi-NAS device, supporting traditional network shares locally and high-bandwidth file transfers via GridFTP (and potentially other protocols) for UABgrid job management. Together these systems will form the UABgrid infrastructure cluster.
I've finally gotten to the point of running juice through these systems and bringing them to life. (Work has already been underway constructing the VMs on existing systems.) Being an occasional, though half-hearted, "RedHat for the servers" advocate, I dutifully worked on getting [CentOS4|http://centos.org/] installed on these systems. Here's how it went:
!! Branding confusion: x86_64 = EM64T != Itanium
Several hours were spent dealing with the system not wanting to boot my install disks. I've created many bootable ISOs in the past with [K3b|http://en.wikipedia.org/wiki/K3b] and was confused why it wasn't working. At one point I thought the 2950's drive might be DVD only and I was using CDRs. That wasn't the case. The Dell "Start Here" disk would boot but wouldn't install an OS off any disk not containing key files with "RedHat 4" in them. Dell's installation support CD just didn't want to deal with CentOS.
It wasn't until after trying several different disks (all using the IA64 architecture build of CentOS), asking around in CIS and ENG (who hadn't experienced any such behavior, but gave a good recommendation to use the #centos IRC channel on Freenode), and looking up [all manner|http://developer.intel.com/technology/efi/index.htm] of [scary posts|http://www.centos.org/docs/4/html/rhel-ig-x8664-multi-en-4/s1-ia64-intro-efi-shell.html] regarding Itanium installation that I finally realized my error. Having an Intel 64-bit chip does not mean it is [Itanium|http://en.wikipedia.org/wiki/Itanium]. Remember the [marketplace confusion|http://en.wikipedia.org/wiki/X86] a few years ago when [Intel flubbed 64-bit and AMD got it right|http://www.geek.com/news/geeknews/2004Mar/bch20040406024621.htm]? I'd only used AMD for 64-bit chips, so I didn't know Intel's naming conventions. I just remembered Intel touting that they had 64 bits in Itanium! Well, [EM64T is Intel's version of AMD's x86_64 architecture|http://www.hardwaresecrets.com/article/262]. These Dells have [Xeon 5160 (EM64T chips)|http://www.intel.com/performance/server/xeon/intthru.htm]. I switched over to using x86_64 boot ISOs and was moving forward with installing CentOS4.
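The naming lesson above can be written down as a small lookup. This is only a sketch: the table and the function name {{iso_arch_for}} are made up for illustration, and the set of names is an assumption about what a tool like {{uname -m}} or a vendor spec sheet might report, not an official list.

```python
# Hypothetical lookup: CPU/architecture name -> which install ISO build to grab.
# The entries are illustrative assumptions, not an authoritative list.
ISO_ARCH = {
    "x86_64": "x86_64",  # AMD's 64-bit extension of x86
    "amd64": "x86_64",   # another name for the same thing
    "em64t": "x86_64",   # Intel's branding for the same instruction set
    "ia64": "ia64",      # Itanium: a different architecture entirely
    "i386": "i386",
    "i486": "i386",
    "i586": "i386",
    "i686": "i386",
}


def iso_arch_for(machine: str) -> str:
    """Return the ISO architecture label for a reported machine name."""
    key = machine.strip().lower()
    if key not in ISO_ARCH:
        raise ValueError(f"unrecognized machine type: {machine!r}")
    return ISO_ARCH[key]
```

The point of the table is simply that "EM64T" and "x86_64" land on the same ISO, while "ia64" does not. On a box that already boots something (as the Suse CD did here), the output of {{uname -m}} would be a reasonable input.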
!! CentOS4 and modern hardware: installing drivers manually is not fun
The main reason I like CentOS is that it seems to do a better job of integrating with non-RedHat package sources. It's freely available, which makes the [community support infrastructure stronger|http://www.gtlib.gatech.edu/pub/centos/]. Earlier versions of CentOS (4.2) have also served the @lab well running our existing VMware Server infrastructure. I felt comfortable using it for the task.
My first boot of the CentOS4 installer revealed a problem. It immediately reported there was no disk available to install on.
My main system requirement was running [VMware Server 1.0.3|http://www.vmware.com/products/server/]. The [support matrix|http://pubs.vmware.com/server1/admin/wwhelp/wwhimpl/common/html/wwhelp.htm?context=admin&file=intro_admin.2.24.html] showed that only up to RHEL 4.3 was officially supported. The forums seemed to indicate some [issues on tests with RHEL 5.0|http://www.vmware.com/community/message.jspa?messageID=642099]. I decided to get [CentOS 4.3|http://lists.centos.org/pipermail/centos/2006-March/061519.html], even though using a vault release is frowned upon, because I didn't want to deal with unknown VMware Server compatibility. (The older hardware in the lab has been running CentOS 4.2 just fine.) The CentOS 4.3 network install ISO seemed to survive the disk detection test (though that check may just come later in the install), but it couldn't deal with the Broadcom 1Gig cards on the system.
!! Using Suse 10.1: path of least resistance
In my DVD-ROM scare earlier, I'd popped in a pressed Suse 10.1 CD and was pleased to see it booted just fine. (In fact, the architecture support list on that disk had helped resolve my Intel-induced hardware confusion.) In looking up the VMware Server hardware requirements matrix, I had also noticed that VMware Server 1.x was supported on Suse 10.1. Given that 10.1 is a little more current than the CentOS 4 line, I decided to see if it detected the hardware properly. It did. Given the options, it was an easy decision to switch to Suse 10.1 for this platform.
I've used Suse 9.x on the server side for a while and Suse 9.0 through 10.2 on the desktop for years, so it's not an unfamiliar selection. I trust it as much as RedHat. My "RedHat on the server" assumption is sometimes driven by the perception that others are more comfortable with it that way. This little adventure was a good reminder to me: CentOS/RedHat, Suse, and Debian are all outstanding systems. They've got a long, positive track record. But it's important to use the appropriate tool for the job. That's been my motivation for system selection all along. In this case, the right tool is Suse 10.1.
Suse 10.1 does have the bug with its out-of-the-box package updater, but this is easily [worked around|http://spinink.net/2006/06/09/suse-linux-101-package-management-issues-fixed/] now by installing updates to it first thing. [10.1|http://www.linux.com/article.pl?sid=06/05/22/1817239] is a solid system and its administrative configuration and tools make it easy to manage. This is ideal for a system that won't have much custom software (VMware, Globus and cfengine) and exists mainly to serve VMs.
!! UABgrid2 VM platform is on-line
I'm using VMware Server 1.0.3 on the 2950, running Suse 10.1. The system firewall only allows connections via SSH and VMware at present. The system itself sits on a private address range behind the @lab firewall and only the VMs will be exposed. Direct access is not possible. This should be an adequately secure physical configuration and reflects its expected deployment configuration.
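A quick way to sanity-check a "SSH and VMware only" firewall policy is to probe the host from another machine on the private network and flag anything unexpected. The sketch below is not the check used on this system. The helper names are invented, and port 902 is an assumption for the VMware Server console port that should be verified against the actual install.

```python
import socket

# Assumed policy: SSH (22) plus the VMware Server console port.
# 902 is an assumption here; confirm it against your own configuration.
ALLOWED_PORTS = {22, 902}


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def unexpected_ports(open_ports, allowed=ALLOWED_PORTS):
    """Return a sorted list of open ports that the policy does not allow."""
    return sorted(set(open_ports) - set(allowed))


def audit(host: str, candidates):
    """Probe the candidate ports on host and report policy violations."""
    open_ports = [p for p in candidates if port_is_open(host, p)]
    return unexpected_ports(open_ports)
```

Calling {{audit}} with the host's private address and a handful of common ports (for example 21, 22, 80, 443, 902) should return an empty list if the firewall matches the description above.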
There are more steps, of course. The other Dell systems need to be harnessed up and put into service. Existing VMs need to be migrated and others built. Further software installs and configuration are ahead. Having the UABgrid infrastructure cluster come on line and begin hosting the VMs feels good, though. It's the second half of Milestone 1.
Look for an update next week on the progress with the first half of Milestone 1: the software core of UABgrid2. After all, that's the more interesting part.

