Installing mpi4py and disco in the LittleFe

Computer Science Department - University of Puerto Rico

Prepared by: José R. Ortiz-Ubarri

Installing mpi4py

This document will guide you through the installation of mpi4py and disco in the LittleFe educational cluster.

mpi4py provides Python bindings for the Message Passing Interface (MPI).

disco is a distributed computing framework based on the MapReduce paradigm.

Requirements

  1. Privileged access to a LittleFe cluster (Duh!)
  2. Python (Duh x2!)
  3. Python development libraries

Installing the python development libraries

#apt-get update
#apt-get install python-dev

Getting mpi4py

Download the mpi4py source tarball:

#wget https://mpi4py.googlecode.com/files/mpi4py-1.3.tar.gz

Note: The latest mpi4py version was 1.3 at the time this slide was written. You might want to download the current release instead.

Installing mpi4py

Uncompress the tar.gz file
#tar xzvf mpi4py-1.3.tar.gz
Move to the mpi4py source code directory
#cd mpi4py-1.3
Build and install mpi4py
#python setup.py build
#python setup.py install

Testing the installation

Generate the cluster machine file. The file is generated and stored in your home directory with the name machines-openmpi.
#bccd-snarfhosts
This is an example of the content of the file:
node015.bccd.net slots=2
node014.bccd.net slots=2
node013.bccd.net slots=2
node012.bccd.net slots=2
node011.bccd.net slots=2
node000 slots=2
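
The process count passed to mpirun usually matches the total number of slots in the machine file. As a rough sketch, the following Python snippet sums the slots= values of the example above. The function name total_slots is hypothetical, and treating a host without a slots= field as one slot is an assumption of this sketch.

```python
def total_slots(machinefile_text):
    """Sum the slots= values found in machine file text."""
    total = 0
    for line in machinefile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        slots = 1  # assumption: a host listed without slots= counts as one slot
        for field in line.split()[1:]:
            if field.startswith("slots="):
                slots = int(field.split("=", 1)[1])
        total += slots
    return total


# The example machine file content shown above.
EXAMPLE = """\
node015.bccd.net slots=2
node014.bccd.net slots=2
node013.bccd.net slots=2
node012.bccd.net slots=2
node011.bccd.net slots=2
node000 slots=2
"""

if __name__ == "__main__":
    print(total_slots(EXAMPLE))
```

For the six hosts above with 2 slots each, this gives 12 processes.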

The helloworld test.

The hello world test is inside the demo directory of the mpi4py source code.

#mpirun -n 12 -machinefile ~/machines-openmpi python demo/helloworld.py
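
For orientation, here is a minimal sketch of what an MPI hello world looks like in mpi4py. It is written in the spirit of the demo script, not copied from it. The helper names hello_message and main are hypothetical; MPI.COMM_WORLD, Get_rank, Get_size and MPI.Get_processor_name are part of the mpi4py API. The script also reports when mpi4py cannot be imported, which makes it a quick check of the installation.

```python
def hello_message(rank, size, host):
    """Format the greeting printed by each MPI process."""
    return "Hello, World! I am process %d of %d on %s." % (rank, size, host)


def main():
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py could not be imported; check the installation.")
        return 1
    comm = MPI.COMM_WORLD  # communicator containing every started process
    print(hello_message(comm.Get_rank(), comm.Get_size(),
                        MPI.Get_processor_name()))
    return 0


if __name__ == "__main__":
    main()
```

Saved under a name of your choice, it can be launched with the same mpirun options as the demo script; each process should print one line with its own rank.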

Run the unit tests.

Run all the unit tests included in the source. This takes a while...

#mpirun -n 12 -machinefile ~/machines-openmpi python test/runtests.py

Getting disco

Since the BCCD installed on the LittleFe is a Debian-based distribution, you can install disco with apt.

Add the following line to your LittleFe's /etc/apt/sources.list

deb http://discoproject.org/debian /

Then update the package database

#apt-get update

Install disco

You need to install the following packages:

#apt-get install python-disco disco-master disco-node

Important: Turn off automatic start of the master daemon

The file system of the LittleFe nodes is an NFS share of the master. For this reason, if we do not turn off the automatic start of the daemon on the master, it will also start automatically on the nodes. As of BCCD version 3.0.0 this causes the nodes to become unreachable.

#update-rc.d disco-master disable
To start disco-master manually on the master node:
#service disco-master start

Disco authentication

Change to the disco user account
su disco
Generate ssh keys:
ssh-keygen -N '' -f ~/.ssh/id_dsa

Copy the public key to the authorized_keys file of the disco user to allow passwordless connectivity among the cluster nodes.

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Note: Since the disco home directory is shared among all the cluster nodes, you do not need to repeat this step on every node. In clusters that do not share the home directory you would have to copy the keys to each node and then add them to the authorized_keys file of each node's disco account.
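
Appending with cat adds the key again every time the command is repeated. As a hedged alternative, the Python sketch below appends a public key only when it is not already listed. The function name authorize_key and the throwaway file names in the demo are hypothetical; nothing here is part of disco.

```python
import os
import tempfile


def authorize_key(pub_path, auth_path):
    """Append the key in pub_path to auth_path unless it is already there.

    Returns True when the key was added, False when it was already present.
    """
    with open(pub_path) as f:
        key = f.read().strip()
    content = ""
    if os.path.exists(auth_path):
        with open(auth_path) as f:
            content = f.read()
    if key in [line.strip() for line in content.splitlines()]:
        return False
    with open(auth_path, "a") as f:
        if content and not content.endswith("\n"):
            f.write("\n")  # keep the new key on its own line
        f.write(key + "\n")
    os.chmod(auth_path, 0o600)  # keep the file private to its owner
    return True


if __name__ == "__main__":
    # Demonstrate on throwaway files instead of the real ~/.ssh directory.
    workdir = tempfile.mkdtemp()
    pub = os.path.join(workdir, "id_dsa.pub")
    auth = os.path.join(workdir, "authorized_keys")
    with open(pub, "w") as f:
        f.write("ssh-rsa AAAAexample disco@node000\n")
    print(authorize_key(pub, auth))  # first call adds the key
    print(authorize_key(pub, auth))  # second call finds it already present
```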

Configure the master. Add nodes to disco.

In your browser, go to your master's IP address or hostname on port 8989. For example:
http://littlefe.ccom.uprrp.edu:8989/

Add nodes to the system.

  1. Click configure
  2. Click add row, under the Available nodes table.
In the nodes column enter
node011:015
and in the max workers column enter
2

These values assume that the nodes are named node011 through node015 and that each node has 2 cores.
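
To make the range notation concrete, the sketch below expands a string such as node011:015 into individual host names. It assumes the range is inclusive and keeps the zero padding of the first number. The function expand_node_range is hypothetical and only illustrates the notation; it is not disco's own parser.

```python
import re


def expand_node_range(spec):
    """Expand 'node011:015' into ['node011', ..., 'node015'].

    A name without a ':' range is returned unchanged as a one-item list.
    """
    match = re.match(r"^(.*?)(\d+):(\d+)$", spec)
    if match is None:
        return [spec]
    prefix, first, last = match.groups()
    width = len(first)  # preserve zero padding, e.g. 011 stays three digits
    return [prefix + str(n).zfill(width)
            for n in range(int(first), int(last) + 1)]


if __name__ == "__main__":
    nodes = expand_node_range("node011:015")
    print(nodes)
    # With 2 max workers per node, this is the total worker capacity.
    print(len(nodes) * 2)
```

With five nodes and 2 max workers each, the configuration above provides 10 workers in total.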

Add nodes to disco screen shot.

Your System configuration, Available nodes table should look like this:

Status screen shot

Your status screen should look like this:

Test your disco installation

Save the following disco Python script on your LittleFe as count_words.py.
from disco.core import Job, result_iterator

def map(line, params):
    for word in line.split():
        yield word, 1

def reduce(iter, params):
    from disco.util import kvgroup
    for word, counts in kvgroup(sorted(iter)):
        yield word, sum(counts)

if __name__ == '__main__':
    job = Job().run(input=["http://discoproject.org/media/text/chekhov.txt"],
                    map=map,
                    reduce=reduce)
    for word, count in result_iterator(job.wait(show=True)):
        print(word, count)

Example taken from: http://discoproject.org/

Run the test script.

#python count_words.py

If successful, the output will be a list of words with the number of times each one appears in the text document chekhov.txt.
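
To see what the job computes without a running cluster, the following sketch emulates the map, sort and reduce phases locally in plain Python. It uses itertools.groupby in place of disco.util.kvgroup, and the names map_words, reduce_counts and local_wordcount are hypothetical. It does not exercise disco itself.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Map phase: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word, 1


def reduce_counts(pairs):
    """Reduce phase: sort the pairs, group them by word, sum the counts."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def local_wordcount(lines):
    """Run both phases in a single process on a list of lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return list(reduce_counts(pairs))


if __name__ == "__main__":
    sample = ["the cat sat", "the dog sat down"]
    for word, count in local_wordcount(sample):
        print(word, count)
```

On the two sample lines, "sat" and "the" are each counted twice and every other word once.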

References

Python, www.python.org

mpi4py, http://mpi4py.scipy.org/

Disco Project, http://discoproject.org/