Tuesday, June 16, 2015

Parallel Computing with music21




First we start the cluster system with ipcluster start, which on this six-core Mac Pro gives me 12 engines (one per hardware thread). Then I'll start the IPython Notebook with ipython notebook.
from __future__ import print_function
from IPython import parallel
clients = parallel.Client()
clients.block = True
print(clients.ids)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

Now I'll create a view that can balance the load automatically.
view = clients.load_balanced_view()
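For readers without a cluster running, here is a minimal stand-in for the idea of a load-balanced map. It uses the standard library's ThreadPoolExecutor rather than IPython's view, so it is only an analogy: the function title_length and the input names are hypothetical placeholders for real work such as parsing a file.

```python
from concurrent.futures import ThreadPoolExecutor


def title_length(name):
    """Stand-in for real work, such as parsing one file."""
    return len(name)


# Hypothetical inputs, shaped like corpus paths.
names = ['bach/bwv269', 'bach/bwv347', 'bach/bwv153.1', 'bach/bwv86.6']

# The pool hands each item to whichever worker is free, but map()
# still yields the results in the order of the inputs.
with ThreadPoolExecutor(max_workers=4) as pool:
    lengths = list(pool.map(title_length, names))

print(lengths)
```

Threads are used here only to keep the sketch self-contained; the cluster in this post runs separate processes, which is what makes real parallel parsing possible.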
Next let me get a list of all the Bach chorales' filenames inside music21:
from music21 import *
chorales = list(corpus.chorales.Iterator(returnType = 'filename'))
chorales[0:5]
['bach/bwv269', 'bach/bwv347', 'bach/bwv153.1', 'bach/bwv86.6', 'bach/bwv267']

Now I can use the view.map function to automatically run a function, in this case corpus.parse, on each element of the chorales list.
view.map(corpus.parse, chorales[0:4])
[<music21.stream.Score 4467044944>,
 <music21.stream.Score 4467216976>,
 <music21.stream.Score 4465996368>,
 <music21.stream.Score 4465734224>]
Note, though, that the overhead of returning a complete music21 Score from each engine is high enough that we save little time, if any, by parsing on each core and returning the Score object:
import time
t = time.time()
x = view.map(corpus.parse, chorales[0:30])
print("Multiprocessed", time.time() - t)
t = time.time()
x = [corpus.parse(y) for y in chorales[0:30]]
print("Single processed", time.time() - t)
Multiprocessed 1.7093911171
Single processed 2.04412794113
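One likely reason for the small gain is that every result has to be serialized before it can travel back from a worker process. The sketch below only illustrates that cost: fake_score is hypothetical stand-in data (not a real music21 Score), and pickle stands in for whatever serialization the cluster actually performs.

```python
import pickle

# Stand-in for a parsed score: a large nested structure.
# Hypothetical data, not a real music21 object.
fake_score = [{'pitch': i % 128, 'offset': i * 0.25, 'duration': 1.0}
              for i in range(5000)]

# Bytes needed to send the whole object back to the main process...
full_payload = pickle.dumps(fake_score)
# ...versus bytes needed to send back a single number computed from it.
summary_payload = pickle.dumps(len(fake_score))

print(len(full_payload), 'bytes vs', len(summary_payload), 'bytes')
```

The smaller the value each worker returns, the less of the parallel speedup is eaten by moving data around.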

But let's instead just return the length of each chorale, so we don't need to pass much information back to the main server. First we need to import music21 on each client:
clients[:].execute('from music21 import *')
<AsyncResult: finished>
Now we'll define a function that parses the chorale and returns how many pitches are in it:
def parseLength(fn):
    c = corpus.parse(fn)
    return len(c.flat.pitches)
Now we're going to see a big difference:
t = time.time()
x = view.map(parseLength, chorales[0:30])
print("Multiprocessed", time.time() - t)
t = time.time()
x = [parseLength(y) for y in chorales[0:30]]
print("Single processed", time.time() - t)
Multiprocessed 0.59440112114
Single processed 2.97019314766

In fact, we can do the entire chorale dataset in not much more time than it takes to do just the first 30 on a single core:
t = time.time()
x = view.map(parseLength, chorales)
print(len(chorales), 'chorales in', time.time() - t, 'seconds')
347 chorales in 5.31799721718 seconds
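To put numbers on the difference, the timings printed above can be turned into speedup figures. This is a back-of-the-envelope sketch: the single-core time for all 347 chorales was never measured, so it is estimated by assuming every chorale takes about as long as the first 30 did on average.

```python
# Timings copied from the runs above, in seconds.
parallel_30 = 0.59440112114   # parseLength on 30 chorales, cluster
serial_30 = 2.97019314766     # parseLength on 30 chorales, one core
parallel_all = 5.31799721718  # parseLength on all chorales, cluster
n_all = 347

speedup_30 = serial_30 / parallel_30

# Assumption: the remaining chorales cost about the same per file
# as the first 30 did on a single core.
est_serial_all = serial_30 / 30 * n_all
est_speedup_all = est_serial_all / parallel_all

print('speedup on 30 chorales: {:.1f}x'.format(speedup_30))
print('estimated single-core time for all: {:.1f} s'.format(est_serial_all))
print('estimated speedup on all: {:.1f}x'.format(est_speedup_all))
```

Under that assumption the speedup is roughly fivefold or better, still short of the 12 engines available, which is expected since those engines share six physical cores and some overhead remains.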

I hope that this example gives some sense of what might be done with a cluster in music21. If you can't afford your own Mac Pro or you need even more power, it's possible to rent an hour of cluster computing time at Amazon Web Services for just a few bucks.

Music21 v.2.0.5 (beta) released

The newest version of the beta 2.0 track of music21 has been released. A reminder that the 2.0 track involves changes that are potentially incompatible with 1.x, so upgrade slowly and carefully if you need existing programs to work. Changes are being made to simplify and speed up usage and to make the system more expandable for the future.

Download at https://github.com/cuthbertLab/music21/releases or with PyPI.


Major Changes

  • Complete rewrite of TinyNotation. TinyNotation was one of the oldest modules in music21 and it showed: I was still learning Python when I wrote it. It provides a simple way of getting notation into music21 via a lily-like text interface. It was designed to be subclassable so that it could work with whatever notation you wanted to use. And technically it was, but subclassing was so difficult as to be nearly impossible. Now you’ll find it much simpler to subclass. Demos of subclassing are included in the code (esp. HarmonyNotation and trecento.notation); a tutorial is coming soon.
  • Backwards-incompatible changes: (1) You used to be able to specify an initial time signature to TinyNotation as converter.parse("tinynotation: c4 d e f", "4/4"); now you must put the time signature string into the text itself, as converter.parse("tinynotation: 4/4 c4 d e f"). The “cut” and “c” time signatures are no longer supported; use 2/2 and 4/4 instead. (2) Calling tinyNotation.TinyNotationStream() directly doesn’t work any more. Use the converter.parse interface instead, either with the "tinynotation:" header or with format="tinynotation". If you must use the guts, try tinyNotation.Converter("4/4 c4 d e f").parse().stream. (3) TinyNotation used to return its own “TinyNotationStream” class, which was basically incompatible with everything. Now it returns a standard stream.Part(). (4) TinyNotation used to leave notes outside of measures, so you needed to call .makeMeasures() afterwards; now measures are created for you. If you need the older behavior, use converter.parse('tinynotation: 4/4 c2 d', makeNotation=False).
  • MuseScore works as a PNG/PDF format. First run: us = environment.UserSettings(); us['musescoreDirectPNGPath'] = '/Applications/MuseScore 2.app/Contents/MacOS/mscore' (or wherever you have it). Then try calling .show('musicxml.png') and watch the image arrive about 100x faster than it would in Lilypond. Thanks, MuseScore folks! This is now the default format for .show() in IPython Notebook. Examples using lily.png and lily.pdf will migrate to this format, so that Lilypond can be moved to deprecated-but-not-to-be-removed status. (I just don’t have time to keep up.)
  • demos/gatherAccidentals : a good first test programming assignment for students. I use it a lot in teaching.
  • musicxml parses clefs mid-measure (thanks fzalkow)
  • installer.command updated for OS X (thanks Andrew Hankinson); let me know if this causes a problem.
  • postTonalTools demo in usersGuide.
  • DataSet feature extractor gets a .failFast = False option for debugging.


Under the hood / contributors

  • music21 now uses coverage checking via coveralls.io. We are at 91.5% code coverage, meaning that when the test suite is run, 91.5% of all the lines of code are executed. We are aiming for 95% (100% is impossible). Adding coverage checking let me find a lot of places that weren’t being tested and that, lo and behold, had bugs. What it means for contributors: any commit that is longer than 20 lines of code needs to improve the coverage percentage and help us get to 95%. So make sure that at least 92% (better, 99%) of your code is covered by tests.
  • the romanText.objects module has been renamed romanText.rtObjects to not conflict with external libraries. It’s an implementation detail.
  • added qm_converter.py demo of how to subclass SubConverter.
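
The coverage rule in the first bullet above comes down to simple arithmetic: a commit raises overall coverage only if its own lines are covered at a higher rate than the project currently is. Here is a small sketch of that calculation; the project size of 100,000 lines and the 200-line commits are hypothetical, and only the 91.5% figure comes from the text.

```python
def coverage_after_commit(total_lines, covered_lines, new_lines, new_covered):
    """Return overall coverage (0 to 1) after a commit adds new_lines,
    of which new_covered are exercised by the test suite."""
    return float(covered_lines + new_covered) / (total_lines + new_lines)


# Hypothetical project size; 91.5% covered, as stated above.
total, covered = 100000, 91500
before = float(covered) / total

# A 200-line commit with 184 lines covered (92% of the commit).
well_tested = coverage_after_commit(total, covered, 200, 184)
# A 200-line commit with 180 lines covered (90% of the commit).
poorly_tested = coverage_after_commit(total, covered, 200, 180)

print(before, well_tested, poorly_tested)
```

The 92%-covered commit nudges the overall figure up and the 90%-covered one pulls it down, which is why the threshold for new code sits just above the current project-wide percentage.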


Minor Changes

  • measure number suffixes in musicxml output, not just input.
  • language detector can detect Latin and Dutch language texts now.
  • fix pitch class errors in microtones.
  • midi files with negative durations no longer crash the system.
  • bugs fixed in tonalCertainty. You can be more certain that it works.
  • cPickle is used in Python3 now. Faster.
  • midi parsing can specify quantization levels.
  • music21.__version__ gives the version (maxalbert did a lot this commit; forgot to shout out before!)
  • better detection of lilypond binaries.
  • certain Sibelius MusicXML files with UTF-16 BOMs can now be read.
  • rests imported from MusicXML used to lose the expressions attached to them (fermatas, etc.); fixed.
  • serial.ToneRow() now has the notes each as quarter notes rather than as zero-length notes; it makes .show() possible; backwards incompatible for the small number of people using it.
  • colored notation now works better and in more places.
  • better docs.
  • about a trillion tiny bugs and untested pieces of code identified and fixed by glasperfan (Hugh Z.)

 

Looking forward to the 2.1 release!