The Cult of Gary

24 Apr

EC2, Sun Grid Engine and host_aliases

Part of the application I’m working on right now can take advantage of a grid (as in chunking up jobs and distributing them to worker nodes). I spent my day getting EC2 and Sun Grid Engine to work together. It was a long, hard battle, but I prevailed.

The crux of the problem is DNS. Out of the gate, EC2 instances have two DNS names: the public, internet-resolvable name and the internal DNS name. On top of that, I typically CNAME a hostname of my own to the public name of the host and manually change the host’s hostname to match. When I’m logged in to a system, it’s a lot easier to see which one I’m on if it uses my own naming scheme rather than the one Amazon picked for me. The other handy thing is that EC2 instances get the internal IP when they do the CNAME lookup, which is very helpful for routing and firewalls.

SGE is very picky about DNS. It expects every host to have matching forward and reverse DNS, and that DNS name also needs to be the primary name of the host. I think these requirements are a bit on the excessive side, but that’s what I have to live with if I don’t want to write my own grid (and I thought about that several times today).
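To see whether a host passes that kind of check, a rough sketch in Python follows. It only approximates the requirement described above (forward lookup, then reverse lookup, then compare against the primary name); it is not SGE's actual resolver logic. The function name `dns_round_trip` is made up for illustration, and the lookup functions are parameters so the check can also be exercised without a live DNS.

```python
import socket


def dns_round_trip(hostname,
                   forward=socket.gethostbyname,
                   reverse=socket.gethostbyaddr):
    """Resolve hostname to an IP, then resolve that IP back to a name.

    Returns (matches, ip, primary_name). `matches` is True only when the
    primary name from the reverse lookup equals the name we started with
    (compared case-insensitively, since DNS names are case-insensitive).
    """
    ip = forward(hostname)
    # gethostbyaddr returns (primary_name, alias_list, address_list).
    primary_name, _aliases, _addresses = reverse(ip)
    matches = primary_name.lower() == hostname.lower()
    return matches, ip, primary_name
```

Running `dns_round_trip(socket.getfqdn())` on a node shows what the node thinks of its own name. With a CNAME'd hostname on EC2, the reverse lookup would be expected to come back with the internal name, so `matches` would be False.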

If I had stuck with the internal names of the hosts, everything probably would have worked a lot more easily.

During my fiddling, I came across an SGE config file called host_aliases. Each line lists the unique host name, followed by the alternate host names for that particular host. My first read through the documentation led me to believe that I could put these in any order: if I put my own hostnames first, they would show up in grid reports and such.

It turns out that this is not the case. I learned from much trial and error that you must put the official and real hostname (in this case, the internal one) as the unique host name.  
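As a minimal sketch of what that ordering looks like, the snippet below builds host_aliases lines with the internal name first and the friendlier names after it. All hostnames here are hypothetical placeholders, and `host_aliases_lines` is a helper invented for this example, not part of SGE. On the installs I know of, the file lives under the cell's `common` directory (`$SGE_ROOT/$SGE_CELL/common/host_aliases`), but check your own setup before relying on that path.

```python
def host_aliases_lines(aliases_by_host):
    """Build host_aliases lines: unique (internal) name first, aliases after.

    `aliases_by_host` maps each host's official internal name to the list
    of alternate names that should resolve to the same host.
    """
    lines = []
    for unique_name, aliases in aliases_by_host.items():
        # One host per line, names separated by whitespace.
        lines.append(" ".join([unique_name, *aliases]))
    return lines


# Hypothetical hosts: internal EC2-style names mapped to my own names.
example = {
    "ip-10-0-0-5.ec2.internal": ["grid01.example.com", "grid01"],
    "ip-10-0-0-6.ec2.internal": ["grid02.example.com", "grid02"],
}

print("\n".join(host_aliases_lines(example)))
# ip-10-0-0-5.ec2.internal grid01.example.com grid01
# ip-10-0-0-6.ec2.internal grid02.example.com grid02
```

The trade-off is the one described above: with this ordering, the internal names are what SGE treats as the hosts' identities, so those are the names to expect in grid reports rather than your own.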

4 Responses to “EC2, Sun Grid Engine and host_aliases”

  1.
    nate Says:

    Can you describe your setup a bit more? I’m trying to get sge running with ec2 and can’t figure out the necessary host_aliases tricks. Right now I’m just trying to get a qmaster and execd running on the same machine.

    Thanks!

  2.
    Alexey Bokov’s weblog » Blog Archive » Installing Sun Grid Engine on Amazon EC2 Says:

    [...] cause some problems in running SGE Amazon EC2 instances with SGE, this stuff can be fixed using host_aliases file in SGE, or other way it’s to use /etc/hosts file for it – some kind of this technique used in [...]

  3.
    Alexey Bokov Says:

    Yepp, SGE isn’t very friendly to Amazon EC2 – I also got same problems with DNS, but I use another way to fix them – I list all instances in /etc/hosts in format ( internal_ip external_name external_short_name internal_name internal_short_name ) and also use hostname external_short_name after it

  4.
    Justin Riley Says:

    Hi, I’ve created a project, called StarCluster that creates Sun Grid Engine clusters on Amazon EC2 and auto configures things like nfs mounting /home, passwordless ssh, OpenMPI, configuring scratch space. It also provides the ability to mount an EBS volume to /home for persistent storage. If you’re interested, give StarCluster a shot, it’s Open Source and freely available: http://web.mit.edu/starcluster


© 2010 The Cult of Gary
