Computer Science Department - University of Puerto Rico
Prepared by: José R. Ortiz-Ubarri
This document will guide you through the installation of mpi4py and disco in the LittleFe educational cluster.
mpi4py is an implementation of the Message Passing Interface (MPI) for Python.
disco is a distributed computing framework based on the MapReduce paradigm.
#apt-get update
#apt-get install python-dev
Download the mpi4py source:
#wget https://mpi4py.googlecode.com/files/mpi4py-1.3.tar.gz
Note: Version 1.3 was the latest mpi4py release when these slides were written. You may want to download the current release instead and adjust the file names in the following commands accordingly.
#tar xzvf mpi4py-1.3.tar.gz
Move to the mpi4py source code directory
cd mpi4py-1.3
Build and install mpi4py
python setup.py build
python setup.py install
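After the install step it can help to confirm that the package is importable. The following is a minimal sketch and not part of the original slides: the helper name check_module is hypothetical, and the sketch only assumes that the installed package may expose a __version__ attribute, falling back to "unknown" when it does not.

```python
import importlib


def check_module(name):
    """Return (True, version) if `name` imports, else (False, error text)."""
    try:
        module = importlib.import_module(name)
    except ImportError as exc:
        return False, str(exc)
    # Not every package defines __version__, so fall back to a placeholder.
    return True, getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    ok, detail = check_module("mpi4py")
    print("mpi4py import %s: %s" % ("succeeded" if ok else "failed", detail))
```

If the import fails, the error text usually indicates whether the package itself or the underlying MPI library could not be found.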
#bccd-snarfhosts
This is an example of the content of the machine file (~/machines-openmpi in the commands below):
node015.bccd.net slots=2
node014.bccd.net slots=2
node013.bccd.net slots=2
node012.bccd.net slots=2
node011.bccd.net slots=2
node000 slots=2
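The slots entries determine how many processes can run on each node, so their sum is a natural choice for the -n option of mpirun. The sketch below adds them up for the example file. It is only an illustration: the function name total_slots is hypothetical, and treating a line without a slots= entry as one slot is an assumption of this sketch, not documented BCCD behaviour.

```python
def total_slots(machinefile_text):
    """Sum the slots=N entries of a machine file given as a string."""
    total = 0
    for line in machinefile_text.splitlines():
        line = line.split("#", 1)[0].strip()  # ignore comments and blank lines
        if not line:
            continue
        slots = 1  # assumption of this sketch: one slot when none is listed
        for field in line.split()[1:]:
            if field.startswith("slots="):
                slots = int(field[len("slots="):])
        total += slots
    return total


EXAMPLE = """\
node015.bccd.net slots=2
node014.bccd.net slots=2
node013.bccd.net slots=2
node012.bccd.net slots=2
node011.bccd.net slots=2
node000 slots=2
"""

if __name__ == "__main__":
    print(total_slots(EXAMPLE))
```

For the example file the result is 12 (six nodes with 2 slots each), which is the value passed to -n in the mpirun commands of this guide.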
The hello world test is inside the demo directory of the mpi4py source code.
#mpirun -n 12 -machinefile ~/machines-openmpi python demo/helloworld.py
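For orientation, a hello world program in mpi4py can be as short as the sketch below. This is not the demo/helloworld.py file shipped with the source, only an illustration that assumes mpi4py is installed; the greeting helper and the file name hello_sketch.py are hypothetical.

```python
def greeting(rank, size, host):
    """Format the line that each MPI process prints."""
    return "Hello, World! I am process %d of %d on %s." % (rank, size, host)


def main():
    try:
        from mpi4py import MPI  # importing this module initializes MPI
    except ImportError:
        print("mpi4py is not available; install it first")
        return
    comm = MPI.COMM_WORLD
    # Every process reports its own rank, the total process count and its host.
    print(greeting(comm.Get_rank(), comm.Get_size(), MPI.Get_processor_name()))


if __name__ == "__main__":
    main()
```

Saved as hello_sketch.py, it would be launched the same way as the demo, for example: mpirun -n 12 -machinefile ~/machines-openmpi python hello_sketch.py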
Run all the unit tests included in the source. This takes a while...
#mpirun -n 12 -machinefile ~/machines-openmpi python test/runtests.py
Since the BCCD installed in the LittleFe is a Debian-based distribution, you can install disco with apt.
Add the following line to your LittleFe's /etc/apt/sources.list:
deb http://discoproject.org/debian /
Then update the package database:
#apt-get update
You need to install the following packages:
#apt-get install python-disco disco-master disco-node
The file system of the LittleFe nodes is an NFS share of the master. For this reason, if we do not turn off the automatic start of the disco daemon on the master, it will also start automatically on the nodes. As of BCCD version 3.0.0, this causes the nodes to become unreachable.
#update-rc.d disco-master disable
To start disco-master manually on the master node:
#service disco-master start
su disco
Generate ssh keys:
ssh-keygen -N '' -f ~/.ssh/id_dsa
Copy the key to the authorized_keys file of the disco user to allow passwordless connectivity among the cluster nodes.
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Note: Since the disco home directory is shared among all the cluster nodes, you do not need to repeat this step on every node. In clusters that do not share the home directory, you would have to copy the key to each node and then add it to the authorized_keys file of each node's disco account.
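Running the cat command more than once appends the same key again. As an alternative, the sketch below adds the key only when it is missing. It is an illustration rather than part of the original procedure: the function name authorize_key is hypothetical, and setting the file mode to 600 reflects the common sshd requirement that authorized_keys not be writable by others.

```python
import os
import sys


def authorize_key(pub_path, auth_path):
    """Append the public key to auth_path unless it is already listed.

    Returns True if the key was added, False if it was already present.
    """
    with open(pub_path) as handle:
        key = handle.read().strip()
    content = ""
    if os.path.exists(auth_path):
        with open(auth_path) as handle:
            content = handle.read()
    if key in [line.strip() for line in content.splitlines()]:
        return False
    with open(auth_path, "a") as handle:
        # Keep the new key on its own line even if the file lacks a final newline.
        if content and not content.endswith("\n"):
            handle.write("\n")
        handle.write(key + "\n")
    os.chmod(auth_path, 0o600)  # readable and writable by the owner only
    return True


if __name__ == "__main__":
    # Usage: python authorize_key.py PUBLIC_KEY_FILE AUTHORIZED_KEYS_FILE
    if len(sys.argv) == 3:
        print(authorize_key(sys.argv[1], sys.argv[2]))
```

As the disco user it could be called with ~/.ssh/id_dsa.pub and ~/.ssh/authorized_keys as the two arguments.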
Open the disco web interface:
http://littlefe.ccom.uprrp.edu:8989/
Add nodes to the system. Enter the node range
node011:015
and, in the max workers column,
2
if the nodes are named from node011 to node015 and each node has 2 cores.
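The range notation is a shorthand for consecutive node names. The sketch below shows one way to expand it, inferred from the description above (node011:015 standing for node011 through node015). It is not disco's own parser, and the function name expand_node_range is hypothetical.

```python
def expand_node_range(spec):
    """Expand 'node011:015' into ['node011', ..., 'node015']."""
    if ":" not in spec:
        return [spec]  # a single host name, nothing to expand
    first, last = spec.split(":", 1)
    width = len(last)        # number of digits, so zero padding is preserved
    prefix = first[:-width]  # for example 'node'
    start = int(first[-width:])
    stop = int(last)
    return ["%s%0*d" % (prefix, width, number)
            for number in range(start, stop + 1)]


if __name__ == "__main__":
    for name in expand_node_range("node011:015"):
        print(name)
```

With five nodes and 2 max workers each, this configuration offers 10 worker slots in total.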
The Available nodes table under System configuration should now list node011 through node015 with 2 max workers each.
Save the following word count example as count_words.py:
from disco.core import Job, result_iterator

def map(line, params):
    # Emit a (word, 1) pair for every word in the input line.
    for word in line.split():
        yield word, 1

def reduce(iter, params):
    from disco.util import kvgroup
    # Group the sorted pairs by word and sum the counts of each group.
    for word, counts in kvgroup(sorted(iter)):
        yield word, sum(counts)

if __name__ == '__main__':
    job = Job().run(input=["http://discoproject.org/media/text/chekhov.txt"],
                    map=map,
                    reduce=reduce)
    for word, count in result_iterator(job.wait(show=True)):
        print(word, count)
Example taken from: http://discoproject.org/
#python count_words.py
If successful, the output will be a list of words with the number of times each appears in the text document chekhov.txt.
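To see what the map and reduce steps compute without a cluster, the same logic can be tried in a single process. The sketch below is only an illustration of the data flow and does not use disco: itertools.groupby stands in for disco.util.kvgroup, and the function names and the sample lines are hypothetical.

```python
from itertools import groupby
from operator import itemgetter


def map_words(line):
    """Emit a (word, 1) pair for every whitespace-separated word."""
    for word in line.split():
        yield word, 1


def reduce_counts(pairs):
    """Sort the pairs, group them by word and sum the counts per word."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def count_words(lines):
    """Run the map step over every line, then the reduce step once."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return list(reduce_counts(pairs))


if __name__ == "__main__":
    sample = ["the cat and the hat", "the end"]
    for word, count in count_words(sample):
        print("%s %d" % (word, count))
```

In disco the map calls run on the worker nodes and the intermediate pairs are collected for the reduce step, whereas here everything happens in one list in memory.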
Python, http://www.python.org/
mpi4py, http://mpi4py.scipy.org/
Disco Project, http://discoproject.org/