Thursday, January 15, 2015

Ten(+) Years of music21 (pt. 1)


An important milestone for music21 passed unnoticed by me a few weeks ago: ten years since the first time that code (well mostly documentation and tests) that still remains somewhere in the music21 codebase was run and did something useful (a scrambled loop for a composition project I was working on called Fine dead sound, after Faulkner).  It seemed worth saying a bit about the history of the project and things learned along the way.

Music21 started as a Perl project -- the oldest code I can find that even vaguely resembles the current music21 comes from November 2003, a 30k Perl script to label a Bach chorale with Roman numerals and correct keys (I still need to publish the algorithm somewhere since I haven't seen a better one). It had no graphical output except using David Rakowski's Lassus font. I was able to get it back up and running, and through the modern miracle of web fonts, everyone (except IE users) can play with the oldest music21 system at: http://www.trecento.com/test/lassus.cgi?num=260. If I remember correctly, chords in red had only a passing functionality, the small letters are the roots of chords, and the second line of text is the locally active key. Pivot chords are labeled in brackets. This was done for an assignment in Avi Pfeffer's AI and Music class; others in my group had already produced the code (in C I believe) for removing passing tones from the score.

I had begun that project using Humdrum, a system I knew well and still have great admiration for, but I began to need to encapsulate more and more items into individual classes.  I googled around, because I figured someone would have already gotten far in an object-oriented replacement for Humdrum. But I found nothing, so when I got back to the project about a year later I started creating generalized objects which I incorporated into a package called "PMusic" for "Perl Music." (I think I first called it just "Music" but realized that this was going to be a problem.)  Here is part of the PMusic code for Chords, written on 12/31/2004.  Good documentation was already a music21 principle, but names of notes were still based on Kern/Humdrum:


=head1 PMusic::Chord (class)

class for dealing with chords

  $ch1 = PMusic::Chord->new(['CC', 'E', 'g', 'b-']);

Note double set of brackets, indicating a perl array reference.
Doesnt deal with rests yet.  Chords of more or less than four notes might
be odd...

Can also give an arrayRef of Note objects if you're that perverse...

=cut

package PMusic::Chord;
use strict;
use Carp;

sub new {
  my $proto = shift;
  my $class = ref($proto) || $proto;
  my %args;

  ## if only one arg, assume it's an arrayref of Note names
  if (scalar @_ == 1) { $args{Notes} = $_[0] }
  else { %args = @_ };
  my $self  = \%args;
  bless $self, $class;
}

### for now, first note is always bass of chord (does tenor ever go below bass? sometimes!)

=item NoteObjs()

Returns an array ref of PMusic::Note objects

=cut

sub NoteObjs {
  my $self = shift;
  return $self->{NoteObjs} if $self->{NoteObjs};
  my @objs = ();
  foreach (@{$self->{Notes}}) {
    if (ref($_)) { push @objs, $_; }
    else         { push @objs, Note->new($_); }
  }
  $self->{NoteObjs} = \@objs;
}

=item Bass([note PMusic::Note])

returns the bass note or sets it to note.  Usually defined as the first noteObj
but we want to be able to override this.  You might want an implied
bass for instance...  v o9.

=cut

sub Bass {
  my $self = shift;
  $self->{Bass} = $_[0] if defined $_[0];
  $self->{Bass} ||= $self->NoteObjs->[0];
  return $self->{Bass};

}

At this point I was living in Rome at the American Academy as a Rome Prize fellow in Medieval Studies, and halfway through my fellowship I realized that none of this code was going to write a six-hundred-page dissertation on fourteenth-century music fragments, and besides, I had job interviews to prepare for, and later, students to teach at Smith College, Mount Holyoke College, and then MIT, none of which (yes, including MIT) cared that I was working on generalized software for musical analysis as a side project. I also saw how long this project was going to take.  And besides, I was still convinced that someone had already written a generalized OO toolkit for musical analysis; I just wasn't searching well enough.  So the project was put aside for almost two years before it reemerged in Python, as the (to be written) Part 2 explains.

Sunday, January 11, 2015

Music21 v.2.0.0 (alpha) Released

We're happy to announce that the first public alpha of music21 v.2 has been released!

Version 2 is the first version of music21 since v.1 to make substantial changes in the code base that introduce backwards incompatibilities in order to make going forward faster and smoother.  It doesn't change anything super fundamental à la Python 3's print function, so most code should still run fine, but definitely test in a separate environment before upgrading on any code you have that needs to run without problems.  The system is still changing and more backward-incompatible changes could be included until v.2.1.

We have had 420 commits since the last release, so there is a lot that is new!

Substantial changes include:

  • Offsets and quarterLengths are now stored internally as Fractions if they cannot be exactly represented as floating point numbers. A lot of work went into making this conversion extremely fast; you probably won't ever notice the speed difference, but you can now be sure that seven septuplets will add up to exactly a half note.  For instance:
    >>> from music21 import note, duration
    >>> n = note.Note()
    >>> n.duration.appendTuplet(duration.Tuplet(3,2))
    >>> n.fullName
    'C in octave 4 Quarter Triplet (2/3 QL) Note'
    >>> n.quarterLength
    Fraction(2, 3)
    >>> n.quarterLengthFloat # if you need it...
    0.6666666666666666
  • Converter structure has been overhauled for more easily adding new converters in the future. If you've wanted to write a converter or already have one for a format not supported but have been daunted by how to include it in music21, now is a great time to do it.  Speaking of which...
  • MEI format is supported for import (thanks to Chris Antila and the ELVIS team at McGill University for this great enhancement)
  • Python 2.6 is no longer supported. All tests and demos pass and run on Python 2.7, 3.3, and 3.4. (3.2 and older are not supported)
  • FreezeThaw on Streams works much better and caching loaded scores works great (some of this was included in 1.9, so definitely upgrade at least to that).
  • Much improved Vexflow output using music21j, a Javascript/Vexflow rendering engine. Was in 1.9, but improved here.
  • Lots of places that used to return anonymous tuples now return namedtuples for more easily understanding what the return values mean.
  • Integrated Travis-CI testing and Coverage tests will keep many more bugs out of music21 in the future.
  • Many small problems with Sorting and stream handling fixed.
  • Corpus changed: for various licensing reasons, v.2.0 does not include the scores from the MuseData corpus anymore. This change mostly affects Haydn string quartets and Handel's Messiah.  However, new replacement scores are being included and 2.1 will have as many great works as before.  The MuseData scores are still available online.  MuseData is now a deprecated format and no further testing on it will be conducted; only bug fixes that are easily implemented will be accepted.
  • music21 is now available under the BSD license in addition to LGPL! 
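
To illustrate why exact rational storage matters for the first item in this list, here is a minimal sketch that uses only Python's standard fractions module rather than music21 itself. The variable names are made up for the illustration; the only fact taken from the post is that seven septuplets should add up to exactly a half note (two quarter lengths).

```python
from fractions import Fraction

# One septuplet: seven notes in the time of a half note (2 quarter lengths).
septuplet = Fraction(2, 7)

# Summing seven exact fractions gives exactly 2 quarter lengths.
total_exact = sum([septuplet] * 7)

# The same sum with floats may or may not land exactly on 2.0,
# which is the kind of uncertainty the Fraction storage removes.
total_float = sum([2 / 7] * 7)

print(total_exact == 2)  # True
print(total_float)
```

The float total is printed without a claimed value on purpose: whether rounding errors cancel depends on the particular numbers involved.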

We will try to stick to something close to the semantic versioning model in the future once v.2.1 is released. In other words, after 2.1, we'll try very hard not to break anything that worked in v.2.1 until v. 3.0 is released.  This will probably mean that the version numbers are going to creep up faster in the future.

Still to do before v.2.1 is a major change in how elements are stored in Streams.  Stay tuned if you care about performance tweaks etc., otherwise ignore it -- we'll keep the interface the same so you might not notice anything except speed improvements.

Smaller backward-incompatible changes include:
  • Stream __repr__ now includes a pointer rather than a number if .id is not set.  This change will make filtering out doctests easier in the future.
  • TinyNotation no longer allows for a two-element tuple where the second element is the time signature.  Replace: ("C4 D E", "3/4") with ("tinynotation: 3/4 C4 D E")
  • Obscure calls in SpannerBundle have been removed: spannerBundle.getByClassComplete etc.
  • Convenience classes: WholeNote, HalfNote, etc. have been removed.  Replace with Note(type='whole') etc.
  • Old convenience classes for moving from Perl to Python (DefaultHash, defList) have been removed or renamed (defaultlist) 
  • Articulations are marked as equal if they are of the same class, regardless of other attributes.
  • common.almostLessThan, etc. are gone; were only needed for float rounding, and that problem is fixed.
  • duration.aggregateTupletRatio is now aggregateTupletMultiplier, which is more correct.
  • scala.ScalaStorage renamed scala.ScalaData
  • common.nearestMultiplier now returns the signed difference.
  • layout -- names changed for consistency (some layout objects had "startMeasure" and some "measureStart" - now they're all the same); now all use namedtuples.
  • rarely used functions in Sites, base, Duration, SpannerStorage, VariantStorage, have been removed or renamed.  I'd be surprised if these affect anyone outside the development team.
Improvements and bug fixes:
  • common.mixedNumeral for working with large rational numbers
  • common.opFrac() optionally converts a float, int, or Fraction to a float, int, or Fraction depending on what is necessary to get exact representations. This is a highly optimized function responsible for music21 working with Fractions at about 10x the speed of normal Fraction work.
  • Rest objects get a .lineShift attribute for layout.
  • staffDetails, printObject MXL had a bug, writing out "True" instead of "yes"
  • staffLines is now an int not float. (duh!)
  • better checks for reused pointers.
  • lots of private methods are now ready for public hacking!
  • Lyric.rawText() will return "hel-" instead of "hel" for "hel-lo".
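
The idea behind common.opFrac() described in the list above can be sketched roughly as follows. This is not music21's actual implementation: the function name op_frac_sketch, the denominator limit, and the power-of-two test are assumptions chosen for illustration. Values that a float can hold exactly stay floats, and everything else becomes a Fraction.

```python
from fractions import Fraction

# Assumed cap on denominators when recovering a fraction from a float.
DENOMINATOR_LIMIT = 65535


def op_frac_sketch(num):
    """Return a float when it is exact, otherwise a Fraction (hypothetical helper)."""
    if isinstance(num, int):
        return float(num)  # whole numbers are always exact as floats here
    if isinstance(num, float):
        # Recover the "intended" ratio, e.g. 0.6666666666666666 -> 2/3.
        frac = Fraction(num).limit_denominator(DENOMINATOR_LIMIT)
    elif isinstance(num, Fraction):
        frac = num
    else:
        raise TypeError('expected int, float, or Fraction')

    denominator = frac.denominator
    # A denominator that is a power of two is exactly representable in binary.
    if denominator & (denominator - 1) == 0:
        return frac.numerator / denominator
    return frac


print(op_frac_sketch(0.5))                 # 0.5
print(op_frac_sketch(0.6666666666666666))  # 2/3
print(op_frac_sketch(Fraction(3, 4)))      # 0.75
```

The real function is described as highly optimized; this sketch only shows the decision it has to make, not how to make it fast.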

Wednesday, July 23, 2014

Beethoven's Piano Sonatas: an Application of music21

[This is a guest post by Derek Klinge who uses music21 in his research on music and disability. I thank him for his contribution. - MSC]

I am a researcher within the Performing Arts Medicine Association. I was interested in looking at Beethoven's use of range over time in his piano sonatas. Although several previous studies have looked at the question of how Beethoven's compositions were affected by his hearing loss, the results were far less than conclusive. A study in the British Medical Journal counted the notes in the first movements of the first violin parts of Beethoven's string quartets by hand. For a number of reasons, I thought it might be better to look at the piano sonatas, including that Beethoven wrote more piano sonatas than he did string quartets and symphonies, so the statistical power would be greater. Counting all of the notes in Beethoven's piano sonatas by hand would be a Herculean task for sure, but fortunately, with scores available from the Center for Computer Assisted Research in the Humanities and music21, sufficient coding skills would do the job.

Why music21?

In addition to the number of high notes, I was also interested in Beethoven's overall use of range, the average note, average frequency, number of measures with high notes, and in calculating values based on the number of notes, as well as weighting those measures by the duration of notes. The methods available in music21 allow the collection of this data very quickly. Collecting the majority of the data I needed from all 103 movements of Beethoven's piano sonatas, counting over a quarter million individual notes, organizing the data by sonata, and separating it by movement number takes about 11 minutes.
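
As a rough illustration of the kinds of measures described above, here is a small sketch that works on plain (MIDI number, quarter length) pairs instead of parsed scores. It is not the code used in the study: the function names, the sample data, and the high-note threshold of MIDI 84 are assumptions made for the example.

```python
def midi_to_hz(midi_number):
    """Equal-tempered frequency in Hz, with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2 ** ((midi_number - 69) / 12)


def summarize(notes, high_threshold=84):
    """Summarize a list of (midi_number, quarter_length) pairs.

    high_threshold is an assumed cutoff for what counts as a "high note".
    """
    pitches = [midi for midi, _ in notes]
    total_duration = sum(length for _, length in notes)
    return {
        'count': len(notes),
        'high_notes': sum(1 for midi in pitches if midi >= high_threshold),
        'range': max(pitches) - min(pitches),
        # Every note counts equally.
        'mean_pitch': sum(pitches) / len(pitches),
        # Longer notes count for more.
        'weighted_mean_pitch': sum(m * q for m, q in notes) / total_duration,
        'mean_frequency': sum(midi_to_hz(m) for m in pitches) / len(pitches),
    }


# Made-up sample data: C4 quarter, C5 quarter, C7 half note.
sample = [(60, 1.0), (72, 1.0), (96, 2.0)]
stats = summarize(sample)
print(stats['mean_pitch'], stats['weighted_mean_pitch'])  # 76.0 81.0
```

With real scores, the pairs would come from the notes of each movement; the weighting step is what distinguishes counting notes from measuring how long the music spends in a register.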

Some Interesting Findings

Beethoven's use of high notes was lowest around 1800 (for all the graphs below, the colors within the dots represent the Sonata Numbers, going from red to purple from 1-32):


The average frequency of each sonata follows a similar trend:

In general, as there are more notes per measure, there are more high notes per measure. This trend does not hold for many of the sonatas written before 1802.

Also, the relationship between the use of high notes, and the average frequency was different between the earlier and later sonatas:


Conclusions

Technology like music21 is an invaluable tool for the empirical study of musicology. Relatively quickly, data gathered can be used to analyze the possible relationships between Beethoven's use of high notes and his overall range, and compare that with what we understand about his hearing loss. These data suggest that Beethoven was significantly affected by his hearing loss, though it seems that sometime around 1802 he developed strategies to cope with his progressing disability.

Wednesday, June 25, 2014

Music21 v1.9 released

We are proud to release music21 v1.9.3, the latest and last release in the 1.x series.
There have been 147 commits in the two months since v1.8; here are some of the highlights:
  • MUCH faster .getContextByClass (KUDOS to Josiah Oberholtzer for this). Even if you don't use .getContextByClass in your own code, you're definitely calling something that calls it. This method figures out where the most recent key signature, time signature, clef, etc. is for any given object, finds relationships between notes in different voices, etc. For analysis of medium-sized scores (say, 3 voices, 100 measures) expect a 10-fold speedup. For larger pieces, the speedup can be over 100-fold.
  • A new stream/timespans module that makes the previous speedup possible by representing m21 Streams as AVL trees -- it's used in a few places (needs more docs); forthcoming releases will use it in a lot more places
  • Python3 support (3.3 and later). The entire test/multiprocessTest.py suite passes on Python 3. N.B. to contributors -- from now on all contributions need to pass tests on both Python 2.7 and 3.3 and later. Negative -- in the past you could have made music21 run on unsupported older systems (2.6 and sometimes 2.5); now from music21 import * will fail on pre-2.7. 2.7 has been a requirement since Music21 1.7. Fewer than 30% of Macs still in use are running Lion or earlier and thus will need to update to 2.7. This version of music21 runs about 25% faster on Python 3 than Python 2, but otherwise no new features of Python3 are used. Python 2.7 will be supported throughout the Music21 2.x cycle so no panicking -- it'll be years (if ever) before Python 3.3+ is a requirement.
  • Improvements to reductions of scores and to analyzing voiceleading motion (some of this is backwards incompatible)
  • Better, faster, and more consistent sorting of elements in a Stream
  • Changes to the derivations module that I doubt anyone else was using anyhow...
  • Removed obsolete files.
  • Stafflines import and export from musicxml (thanks Metalmike!)
  • Complete refactoring of converter.py to make it easier for users to write their own Subconverter formats (that can eventually be put into the system)
  • Complete serialization of Streams via a new version of jsonpickle. This has big implications down the line; for now it affects...
  • Vexflow output is much improved (unless you were counting on Voices; in which case do not upgrade) using the alpha version of music21j -- a Javascript reimplementation of music21's core features.
  • IPython improvements, allowing for robust and persistent communication between Javascript and Python. This will eventually (once I document it...) let you use the web browser as a UI for music21 python apps including live updating of music notation. It's too complex for most users right now, but I can attest that this will be one of the biggest perks of the 2.x development.
The usual bug fixes, documentation improvements and fixes, etc. are implemented. Thanks to MIT, the NEH, and the Seaver Institute for funding the project (and to MIT for tenuring me in part on the basis of music21). This is the last release that Josiah Oberholtzer was lead programmer for; his considerable talents will still be on display in Abjad and many other projects he works on, and the implications of the new storage system he has developed will continue to pay off for years.

What's next?

Starting work on music21 2.0 today. That release will have some backwards incompatible changes that developers will need to deal with -- just as the path to 1.0 meant that some things that were originally thought of as good ideas were thrown out, the path to 2.0 will rely on 8 years of using music21 to fix some things that really should've been done differently from the beginning. Having just spent 2 weeks making m21 compatible with Python 3, I will give my assurance that as few incompatibilities as possible will be introduced. Most of the major changes will be on the core -- so if you've never messed with Sites, SpannerStorage, etc., you'll be fine.
  • Problems with 5 quintuplets = .99999999 of a beat will disappear. Music21 2.X will store offsets and quarterLengths internally as rational numbers (actually a custom MixedNumeral class, so that the __repr__ is nicer...). All music21 objects will gain four properties: ".offsetRational, .duration.quarterLengthRational, .offsetFloat, and .duration.quarterLengthFloat" -- in music21 2.0, .offset and .duration.quarterLength will be aliases for offsetFloat and .duration.quarterLengthFloat -- so no changes will be needed to existing code. This will give a period of time (6 months?) to switch .offset either to .offsetFloat or .offsetRational. We'll have a tool to make the switch automatically. Then at a certain point, .offset will become an alias for .offsetRational. By music21 3.0 .offset will only support Rational numbers.
  • Streams will store the position of notes, etc. in them. Right now this is all stored in the Note object itself. There are some great reasons for doing it that way, but significant speedups will take place by shifting this.
  • inPlace will be False by default for all operations on Notes, Streams, etc. -- you can plan for the migration by explicitly setting inPlace for every call now.
  • Some changes to boundary cases in .getElementsByOffset will take place -- it will not change much, but for a few users this will be crucial.
  • NamedTuples and OrderedDicts will appear in a lot of places
That's all for now, but more examples to come soon. - Myke

Sunday, May 25, 2014

Python reimports

We've been working a lot recently on two kinds of optimization in music21: improving speed and then using some of the speed increases to add functionality and stability, so that new features can be added without slowing down the process. One of the places we found where we could make changes is in our over-cautious use of imports. 

While everyone says that in Python you can import a module inside a function without it going through the overhead of actually reimporting, there is some real overhead still, especially if the function is called a lot of times:

Here I compare ten million calls to reference an object vs. doing the same while also importing a module that is already imported:

>>> from timeit import timeit as t # number = ten million; output in secs to 3 decimals
>>> t('x', setup='import weakref; x=5', number=10000000)
0.278
>>> t('import weakref; x', setup='import weakref; x=5', number=10000000)
7.810

So it's more than an order of magnitude slower (about 28 times) than direct access alone.  Even with using the module and creating the weakref itself, the check-for-reimport time dominates roughly four-fold over the creation of the weakref:

>>> t('weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
2.098
>>> t('import weakref; weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
9.823

For historical reasons (porting to systems without weakref, etc.) the “common.wrapWeakref” function of music21 (which does a try: except to see if a weakref could be made) did the import within the function.  Moving it outside the function sped it up considerably and made it only half the speed of calling weakref.ref(x) directly -- worth it for the extra safety -- and only an order of magnitude slower than direct access to x itself:

before, with common.wrapWeakref doing a safety "import weakref" call:
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)
17.112

after, without it:
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)
4.171

So this is the speedup in music21 that you'd find if you managed to grab the GitHub repository right now.  But we're planning on using the speedup to make things more functional.

As a practical consideration, one of the things that I’ve never been able to fix in music21 is the ability of elements embedded in a Stream to change their duration without telling their sites that things have changed for an element. There are expensive operations, such as calculating the length of a Measure, the last object, etc., which we cache as long as no .append(), .insert(), .remove() etc. are called.  But a Note inside the measure may have changed length so that the information in the cache is no longer accurate. I've been wanting to fix this for a while.

The problem is that the Note object itself has no idea that its duration has changed, because while the Note has a reference to the Duration, the Duration does not have a reference to Note -- it can't have a normal reference because this would create a circular reference (Note.duration = Duration; Duration.client = Note). With a circular reference, neither the Note nor the Duration will ever disappear even after they're not needed anymore, causing memory leaks. The obvious solution is to use a weak reference which behaves mostly like a normal reference but does not cause circular references. If the Note should disappear then the Duration.client weakref is not strong enough to keep the two objects alive.
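
The weak-reference arrangement described above can be sketched with the standard weakref module alone. The classes below (SketchNote, SketchDuration) and their attribute names are hypothetical stand-ins, not music21's real classes; the sketch only shows a Duration-like object holding a weak "client" reference and using it to report changes.

```python
import gc
import weakref


class SketchDuration:
    """Hypothetical duration that only weakly references its owner."""

    def __init__(self, quarter_length=1.0):
        self._quarter_length = quarter_length
        self._client_ref = None

    @property
    def client(self):
        # Dereference the weakref; gives None once the owner is gone.
        return None if self._client_ref is None else self._client_ref()

    @client.setter
    def client(self, obj):
        self._client_ref = None if obj is None else weakref.ref(obj)

    @property
    def quarter_length(self):
        return self._quarter_length

    @quarter_length.setter
    def quarter_length(self, value):
        self._quarter_length = value
        owner = self.client
        if owner is not None:
            owner.duration_changed()  # tell the owner so it can clear caches


class SketchNote:
    """Hypothetical note that strongly references its duration."""

    def __init__(self):
        self.duration = SketchDuration()
        self.duration.client = self  # weak back-reference, so no cycle
        self.change_count = 0

    def duration_changed(self):
        self.change_count += 1


note = SketchNote()
dur = note.duration
dur.quarter_length = 2.0
changes_seen = note.change_count

del note      # drop the only strong reference to the note
gc.collect()  # not needed in CPython, but harmless on other interpreters
client_after_delete = dur.client

print(changes_seen)         # 1
print(client_after_delete)  # None
```

In a fuller version, duration_changed() would be where the owner tells its containing measures to discard cached lengths.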

With the speed increases, it should be possible to store a weakref on Duration and also Pitch to the object they’re attached to so that they can inform their “client” that they’ve changed.  The client can then inform its Sites (measures, etc.) that it has changed and clear the appropriate cache.  The extra overhead of creating the weakref ends up being only about 20% of object creation time; a small price to pay for the security of knowing that nothing can change and screw up the overall system:

>>> t('d=duration.Duration();', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
19.382
>>> t('d=duration.Duration(); pitchRef = common.wrapWeakref(x)', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
23.787

Expect to see more functionality like this in a forthcoming release of music21.

Friday, May 23, 2014

Speed, Speed, Speed, ... and news.

The newest GitHub repository contains a huge change to the under-the-hood processing of .getContextByClass() which is used in about a million places in music21.  It is the function that lets any note know what its current TimeSignature (and thus beatStrength, etc.) is, lets us figure out whether the sharp on a given note should be displayed or not given the current KeySignature, etc.  While we had tried to optimize the hell out of it, it’s been a major bottleneck in music21 for working with very large scores. We sped up parsing (at least the second time through) a lot in the last commit. This was the time to speed up Context searching.  We now use a form of AVL tree implemented in a new Stream.timespans module -- it’s not well-documented yet, so we’re only exposing it directly in one place, stream.asTimespans(recurse=True|False).  You don’t need to know about this unless you’re a developer; but I wanted to let you know that the results are extraordinary.
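
To give a rough feel for why an ordered index speeds up this kind of lookup, here is a simplified sketch. It deliberately swaps the AVL tree for a sorted list searched with the standard bisect module, and the class name ContextIndexSketch and the sample data are invented for the example; it is not how the timespans module is written.

```python
from bisect import bisect_right


class ContextIndexSketch:
    """Hypothetical index answering "what was active at this offset?"."""

    def __init__(self, events):
        # events: iterable of (offset, label) pairs, e.g. time signatures.
        self._events = sorted(events)
        self._offsets = [offset for offset, _ in self._events]

    def most_recent(self, offset):
        """Return the label of the last event at or before offset, or None."""
        # Binary search instead of walking backwards through every element.
        position = bisect_right(self._offsets, offset)
        if position == 0:
            return None
        return self._events[position - 1][1]


time_signatures = ContextIndexSketch([(0.0, '3/4'), (12.0, '4/4'), (40.0, '6/8')])
print(time_signatures.most_recent(13.5))  # 4/4
```

A sorted list is cheap to search but expensive to insert into, which is one reason a balanced tree is a better fit for Streams that change.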

Here’s a code snippet that loads a score with three parts and 126 measures and many TimeSignatures and calculates the TimeSignature active for every note, clef, etc. and then prints the time it takes to run:

>>> from music21 import corpus
>>> c = corpus.parse('luca/gloria')
>>> def allContext(c):
...     for n in c.recurse():
...         k = n.getContextByClass('TimeSignature')
... 
>>> from time import time as t
>>> x = t(); allContext(c); print(t() - x)

with the 1.8 release of Music21:
42.9 seconds

with the newest version in GitHub:
0.70 seconds

There’s a lot of caching that happens along the way, so the second call is much faster:

second call with 1.8 release:
44.6 seconds ( = same within a margin of error)
with the newest version in GitHub if the score hasn’t changed:
0.18 seconds

You’ll see the speedup immensely in places where every combination of notes, etc. needs to be found.  For instance, finding all parallel fifths in a large score of 8 parts could have taken hours before. Now you’ll likely get results in under a few seconds.

I have not heard of any issues arising from the change in sorting from the last posting on April 26, so people who were afraid of updating can breathe a bit more easily and update to the version of music21 at least as of yesterday. The newer version, like all GitHub commits, should be used with caution until we make a release.

Thanks to the NEH and the Digging into Data Challenge for supporting the creation of tools for working with much bigger scores than before.

In other news: 

Music21j — a Javascript implementation of music21’s core features — is running rapidly towards a public release.  See http://prolatio.blogspot.com/2014/05/web-pages-with-musical-examples.html for an example of usage.  We’ll be integrating it with the Python version over the summer.

Ian Quinn’s review of Music21 appeared in the Journal of the American Musicological Society yesterday.  Prior to this issue, no non-book had ever been reviewed. It’s a great feeling to have people not on this list know about the software as well.

Oh, and MIT was foolhardy enough to give me tenure! Largely on the basis of music21.  If you’re an academic working on a large digital project, I still advise proceeding with caution, but know that it can be done.  Thanks everyone for support.

Sunday, February 16, 2014

Plotting pitches and durations continuously in music21

With music21, it's not hard to plot discrete data (pitches, durations, etc.) as continuous data. There isn't a built-in tool for doing this, but since music21 is written in Python, it is easy to take advantage of the tools from matplotlib, numpy, and scipy to create "cubic bezier-curve splines" that show these points in an easily visualized format.

In music21 you can easily plot the position of notes as a piano roll:

from music21 import corpus

bach = corpus.parse('bwv66.6')

bach.plot('pianoroll')



which preserves pitch names, measure numbers, etc.  But the case we're asking for requires a plot more like this:

The numbers at the left are midi numbers while the bottom is number of quarter notes from the beginning. Here's some code to help you achieve this:

import numpy as np
import matplotlib.pyplot as plt
from scipy import interpolate
from music21 import corpus

bach = corpus.parse('bwv66.6')

fig = plt.figure()
subplot = fig.add_subplot(111)  # one set of axes shared by all the voices

for part in bach.parts:
    notes = part.flat.notes
    y = [n.ps for n in notes]  # pitch space (MIDI number) of each note
    x = [n.offset + n.quarterLength / 2.0 for n in notes]  # midpoint of each note

    tck = interpolate.splrep(x, y, s=0)  # s=0: the spline passes through every point
    xnew = np.arange(0, max(x), 0.01)
    ynew = interpolate.splev(xnew, tck, der=0)

    subplot.plot(xnew, ynew)

plt.title('Bach motion')
plt.show()

With this sort of graph it's easy to isolate each voice (not much overlap of voices in this chorale) and to see the preponderance of similar motion among the Soprano, Alto, and Tenor, but lack of coordination with the Bass (which would create forbidden parallels if it coordinated).  More sophisticated examples with better labels are easily created by those with knowledge of matplotlib, but this simple demonstration will suffice to get things started.