First we start the cluster system with:
ipcluster start
On this six-core Mac Pro that gives me 12 threads. Then I'll start the IPython notebook with:
ipython notebook

Inside the notebook, I connect to the running cluster:
from __future__ import print_function
from IPython import parallel
clients = parallel.Client()
clients.block = True
print(clients.ids)
Now I'll create a view that can balance the load automatically:
view = clients.load_balanced_view()
Next let me get a list of all the Bach chorales' filenames inside music21:
from music21 import *
chorales = list(corpus.chorales.Iterator(returnType = 'filename'))
chorales[0:5]
Now I can use the view.map function to automatically run a function, in this case corpus.parse, on each element of the chorales list:
view.map(corpus.parse, chorales[0:4])
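To make the idea of a load-balanced map concrete without needing a running cluster, here is a minimal stand-in sketch. It uses the standard library's concurrent.futures rather than IPython.parallel, and slow_square is a hypothetical function invented for the illustration. Idle workers pick up the next pending task, and the results still come back in the order of the inputs, which is the behavior I'd expect from a blocking view.map as well.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(n):
    # Hypothetical task: smaller inputs sleep longer, so tasks
    # finish in a different order than they were submitted.
    time.sleep(0.01 * (5 - n))
    return n * n


with ThreadPoolExecutor(max_workers=4) as pool:
    # Whichever worker is free takes the next task; pool.map
    # returns results in input order regardless of finish order.
    squares = list(pool.map(slow_square, range(5)))

print(squares)  # [0, 1, 4, 9, 16]
```

Threads are used here only to keep the sketch self-contained; the engines started by ipcluster are separate processes.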
Note, though, that the overhead of returning a complete music21 Score from each processor is high enough that we don't save much time, if any, by parsing on each core and returning the Score object:
import time
t = time.time()
x = view.map(corpus.parse, chorales[0:30])
print("Multiprocessed", time.time() - t)
t = time.time()
x = [corpus.parse(y) for y in chorales[0:30]]
print("Single processed", time.time() - t)
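One way to get a feel for that overhead: whatever a worker returns has to be serialized and shipped back to the notebook. The sketch below uses pickle as a stand-in for that step and a made-up nested list (fake_score) in place of a real Score, so the byte counts only illustrate the gap between a large object and a small summary. They say nothing about actual music21 objects.

```python
import pickle

# Hypothetical stand-in for a parsed score: one dict per measure.
fake_score = [
    {"measure": m, "pitches": ["C4", "E4", "G4", "C5"], "durations": [1.0] * 4}
    for m in range(2000)
]

# Serializing the whole object versus a single integer summary.
full_payload = pickle.dumps(fake_score)
summary_payload = pickle.dumps(sum(len(m["pitches"]) for m in fake_score))

print(len(full_payload), "bytes for the full object")
print(len(summary_payload), "bytes for the summary")
```

The summary is a handful of bytes however large the original structure grows, which is why returning a count instead of the object can pay off.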
But let's instead just return the length of each chorale, so we don't need to pass much information back to the main server. First we need to import music21 on each client:
clients[:].execute('from music21 import *')
Now we'll define a function that parses a chorale and returns how many pitches it contains:
def parseLength(fn):
    # Parse on the engine and send back only an integer, not the Score.
    c = corpus.parse(fn)
    return len(c.flat.pitches)
Now we're going to see a big difference:
t = time.time()
x = view.map(parseLength, chorales[0:30])
print("Multiprocessed", time.time() - t)
t = time.time()
x = [parseLength(y) for y in chorales[0:30]]
print("Single processed", time.time() - t)
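To put a number on "a big difference", the two timings can be turned into a speedup and a per-worker efficiency. This is a small sketch; the helper names are my own, and the timings fed in are made-up placeholders, not measurements from the run above.

```python
def speedup(serial_seconds, parallel_seconds):
    # How many times faster the parallel run was.
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, workers):
    # Fraction of the ideal linear speedup actually achieved.
    return speedup(serial_seconds, parallel_seconds) / workers


# Placeholder timings for illustration only.
s = speedup(12.0, 2.0)
e = efficiency(12.0, 2.0, workers=12)
print(s, e)  # 6.0 0.5
```

An efficiency well below 1.0 is normal, since each task still pays for dispatch and for sending its result back.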
In fact, we can process the entire chorale dataset in about the same amount of time as it takes to do just the first 30 on a single core:
t = time.time()
x = view.map(parseLength, chorales)
print(len(chorales), 'chorales in', time.time() - t, 'seconds')
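The start-the-clock, run, print-the-difference pattern used throughout can be wrapped in a small helper. The sketch below is one possible version, assuming Python 3 (time.perf_counter is not available in Python 2). The name timed and the optional results dictionary are my own additions, not part of IPython or music21.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results=None):
    # Time the body of the with-block using a monotonic clock.
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if results is not None:
            results[label] = elapsed  # keep the number for later comparison
        print(label, round(elapsed, 3), "seconds")


timings = {}
with timed("Sum of squares", timings):
    total = sum(i * i for i in range(100000))
```

With a helper like this, each comparison becomes a with-block around the view.map call or the list comprehension, and the label can no longer drift out of sync with the code it times.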
I hope that this example gives some sense of what might be done with a cluster setup in music21. If you can't afford your own Mac Pro or you need even more power, it's possible to rent an hour of cluster computing time at Amazon Web Services for just a few bucks.