Tuesday, June 16, 2015

Parallel Computing with music21

First I start the cluster with ipcluster start, which on this six-core Mac Pro gives me 12 engines, one per hardware thread. Then I start the IPython notebook with ipython notebook.
from __future__ import print_function
from IPython import parallel
clients = parallel.Client()
clients.block = True
print(clients.ids)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Now I'll create a view that can balance the load automatically.
view = clients.load_balanced_view()
Next let me get a list of all the Bach chorales' filenames inside music21:
from music21 import *
chorales = list(corpus.chorales.Iterator(returnType='filename'))
chorales[0:5]
['bach/bwv269', 'bach/bwv347', 'bach/bwv153.1', 'bach/bwv86.6', 'bach/bwv267']

Now I can use view.map to run a function, in this case corpus.parse, on each element of a list. Here I run it on the first four chorales:
view.map(corpus.parse, chorales[0:4])
[<music21.stream.Score 4467044944>,
 <music21.stream.Score 4467216976>,
 <music21.stream.Score 4465996368>,
 <music21.stream.Score 4465734224>]
Note, though, that the overhead of returning a complete music21 Score from each engine is high enough that parsing on each core and returning the Score object saves little time, if any:
import time
t = time.time()
x = view.map(corpus.parse, chorales[0:30])
print("Multiprocessed", time.time() - t)
t = time.time()
x = [corpus.parse(y) for y in chorales[0:30]]
print("Single processed", time.time() - t)
Multiprocessed 1.7093911171
Single processed 2.04412794113
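To give a rough sense of why returning whole objects is expensive: results have to be serialized on the engine and deserialized on the client (by default with pickle, as far as I know). Here is a minimal sketch using only the standard library. The fake_score list is a hypothetical stand-in I made up for illustration; a real music21 Score is a far richer object, so the gap there is likely larger.

```python
import pickle

# Hypothetical stand-in for a parsed score: a flat list of note records.
# This is NOT a music21 object; it only illustrates the difference in scale.
fake_score = [
    {"pitch": 60 + (i % 12), "offset": i * 0.5, "duration": 0.5}
    for i in range(2000)
]

# Bytes needed to ship the whole object back to the client.
full_payload = pickle.dumps(fake_score)

# Bytes needed to ship only a one-number summary (here, the note count).
summary_payload = pickle.dumps(len(fake_score))

print("whole object:", len(full_payload), "bytes")
print("summary only:", len(summary_payload), "bytes")
```

The summary is a handful of bytes while the full object runs to tens of kilobytes, and every one of those bytes has to be packed, sent, and unpacked for each result.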

But let's instead return just the length of each chorale, so we don't need to pass much information back to the main process. First we need to import music21 on each engine:
clients[:].execute('from music21 import *')
<AsyncResult: finished>
Now we'll define a function that parses a chorale and returns how many pitches it contains:
def parseLength(fn):
    c = corpus.parse(fn)
    return len(c.flat.pitches)
Now we're going to see a big difference:
t = time.time()
x = view.map(parseLength, chorales[0:30])
print("Multiprocessed", time.time() - t)
t = time.time()
x = [parseLength(y) for y in chorales[0:30]]
print("Single processed", time.time() - t)
Multiprocessed 0.59440112114
Single processed 2.97019314766

In fact, we can do the entire chorale dataset in less than twice the time it takes to do just the first 30 on a single core:
t = time.time()
x = view.map(parseLength, chorales)
print(len(chorales), 'chorales in', time.time() - t, 'seconds')
347 chorales in 5.31799721718 seconds
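To put numbers on that, here is a small back-of-the-envelope calculation using the timings printed above. The single-core estimate for all 347 chorales is an extrapolation, assuming every chorale costs about as much to parse as the first 30 did; I did not actually time that run.

```python
# Timings copied from the runs above, in seconds.
parallel_30 = 0.59440112114    # parseLength on 30 chorales, load-balanced view
serial_30 = 2.97019314766      # parseLength on 30 chorales, single core
parallel_all = 5.31799721718   # parseLength on all chorales, load-balanced view
n_all = 347

# Measured speedup on the 30-chorale sample.
speedup_30 = serial_30 / parallel_30

# Extrapolated single-core time for the full set
# (assumption: roughly constant cost per chorale).
serial_per_chorale = serial_30 / 30
estimated_serial_all = serial_per_chorale * n_all
estimated_speedup_all = estimated_serial_all / parallel_all

print("speedup on 30 chorales: %.1fx" % speedup_30)
print("estimated single-core time for all: %.1f s" % estimated_serial_all)
print("estimated speedup on all: %.1fx" % estimated_speedup_all)
```

So the 30-chorale run is roughly 5 times faster in parallel, and under that assumption the full run comes out at about 6.5 times faster than a single core would manage.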

I hope that this example gives some sense of what can be done with a cluster setup in music21. If you can't afford your own Mac Pro or you need even more power, it's possible to rent an hour of cluster computing time at Amazon Web Services for just a few bucks.
