Friday, September 11, 2015
Durations and DurationTuples
If you don’t do a lot with Tuplets (Triplets, etc.) and have never heard of a DurationUnit, this is a post to skim. :-)
Music21 has the ability to consider two different, incompatible ideas of a note within a single note.Note (or note.Rest, chord.Chord, etc.): a note on a page and a note as heard.
(1) When we write a note on a page, it has certain properties: stems, flags, beams, dots, ties, tuplets, etc. Consider a half note tied to an eighth note. On the page these are definitely two notes — we read them as two notes and then sound them together because of the tie.
(2) Now consider a single sound from a trumpet, at quarter note = 60, which lasts 2.5 seconds. This is a note. One note. But try to notate it. You’ll find that it takes two notes on the page to notate it: a half note and an eighth note, tied together. But when we hear this sound, it sounds like one note.
How can we represent the second type of note in music21 even though it can’t be notated as a single note (well, not normally; more on that below)? Simple, we create a single Note object, which has a single Duration object. That Duration object, however, has two elements in its “.components” list — a component corresponding to a half note, and a component corresponding to an eighth note. The note’s overall duration has a type of “complex”. When sending this Note out to MusicXML, Lilypond, etc., we split it into two notes (you guessed it, a half note and an eighth note). We then look at something called “linkage” to see that each of these notes should be connected by a Tie. (Rests have no linkage, for instance). When we send it out to MIDI, on the other hand, we can leave it alone as one Note, since MIDI doesn’t support ties, but does support arbitrary lengths of notes.
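The split into page-notatable components can be sketched in plain Python. This is a toy greedy model, not music21's actual algorithm (which also handles dots, tuplets, and much more), but it shows the core idea of decomposing one sounding length into tie-able written notes:

```python
# simple (undotted) note values, in quarter lengths, largest first
SIMPLE_TYPES = [('whole', 4.0), ('half', 2.0), ('quarter', 1.0),
                ('eighth', 0.5), ('16th', 0.25)]

def decompose(quarter_length):
    """Greedily split a duration into simple components to be joined by ties."""
    components = []
    remaining = quarter_length
    for name, ql in SIMPLE_TYPES:
        while remaining >= ql:
            components.append(name)
            remaining -= ql
    return components

decompose(2.5)  # ['half', 'eighth'] -- the trumpet note above
```
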
So, this has been the music21 model since alpha 1, and in general it remains the model.
What has changed in the newest GitHub repository and will change in the next 2.X release is what these “.components” are. Up until now they’ve been an object called DurationUnit — an amazingly flexible object created by Chris Ariza that can represent everything that a “simple” duration does; DurationUnits can have tuplets, dot-groups (an obscure medieval term), and just about everything else you can think of. They’re extremely cool, and I’m going to miss them.
In music21 2.X, the components of a complex duration are called DurationTuples. They are much simpler objects that only store three pieces of information: the type (‘whole’, ‘half’, ‘16th’, etc., plus ‘zero’), the number of dots, and the quarter length of that component. They don’t have tuplets, dot groups, etc. And they’re called Tuples because they derive from namedtuple which derives from the Python tuple object — in other words they are immutable. Once a DurationTuple is set, it can’t be changed. To change a note’s duration from whole to half, the DurationTuple needs to be deleted from .components and a new one created and inserted. So they do everything a DurationUnit does and much less.
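The components' immutability is just tuple immutability; a minimal model of the idea (an illustration only, not music21's actual class definition):

```python
from collections import namedtuple

# the three pieces of information a component stores
DurationTuple = namedtuple('DurationTuple', ['type', 'dots', 'quarterLength'])

half = DurationTuple('half', 0, 2.0)

# fields cannot be assigned; to "change" a component you build a new one
try:
    half.type = 'whole'
except AttributeError:
    replaced = half._replace(type='whole', quarterLength=4.0)
```
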
So, why the change? Amazing flexibility and power, such as DurationUnits offer, comes at a price: speed. And complexity. The new DurationTuple makes creating the most common type of Duration with a single component much faster. The amount of time to do: “d = duration.Duration(1.0); d.type” has been cut by over half. This makes the creation of Notes about 20% faster than before (well, after the first check of durations), which is a pretty substantial improvement. And as Dmitri and others have noticed, there are a lot of ways to change the duration of a Note that can affect other things, such as Streams. This change reduces the complexity by making it so that only the Duration object itself can change its duration. Changes to underlying .components are impossible to make since DurationTuples are immutable.
The only practical effect that most users are likely to see is in the use of Tuplets. In the past, tuplets lived on DurationUnits. This meant, for instance, that a Duration could represent a single duration of "half-note-tied-to-eighth-note triplet" (QL = 2.3333...). Now, all the components of a Duration need to have the same tuplet or nested tuplets. So this duration can be represented as a (dotted-half-note + eighth-note) triplet. Or it can be represented as (do the math, but only when you have time) a single whole note in a 12:7 tuplet (because the latter is easier for the computer to determine, that's what is done right now, but that could change).
The other practical change in Tuplets is that there are generally three aspects to a Tuplet (well, four, but we'll keep it simpler): the number of "actual" notes (3 for a triplet), the number of "normal" notes (2), and the durationNormal ('eighth', no dots, for instance). In theory, once a tuplet was attached to a note, it became immutable (frozen), but because the normal note was a DurationUnit, it was possible to create the tuplet and then change the normal note's type, or dots, or whatever. Now that durationNormal references a DurationTuple, it is immutable; so instead of this:
t = duration.Tuplet()
t.durationNormal.type = 'eighth'
do this:
t = duration.Tuplet(durationNormal='eighth')
or if you must:
t = duration.Tuplet()
t.durationNormal = duration.durationTupleFromTypeDots('eighth', 0)
(this may go away, tuplets might become fully immutable in the future)
I hope these changes make using the system faster without much trouble.
Sunday, September 6, 2015
Speed improvements in music21
Music21 continues to get faster and faster. The average music21 internal operation takes about 1/8th as long on the same computer as it did when the system was first released, and normal operations now run about twice as fast as they did back then.
Huh? Why only twice as fast? Well, every time we get a speedup, we spend half of it on making the system more robust. For instance, here's how long it took (in seconds) to make 10,000 notes in 2008 and 2013:
2008 Sep ~1.1
2013 Nov 0.777
Well, that was a pretty good improvement. But there were all sorts of problems with tuplets in music21 (especially from MIDI), where, for instance, five quintuplet 16ths could add up to 0.9999 quarter notes. So we switched to a Fraction module for safety, and we lost the speedups:
2014 Jul 1.126
2015 Jan 1 1.154
That seemed too slow, so in January, we undertook a large number of tweaks, described below, and got it down to:
2015 Jan 19 0.516
We're still working, so in 2.0.10, when it is released, you'll find that 10,000 notes now takes:
2015 Sep 6 0.400
This gives a lot of room to play with to start making the system safer and more secure.
Deepcopy performance still leaves a lot to be desired. This will be the next focus.
This article will get updated as the timing improves (or is sacrificed for security).
>>> from timeit import timeit as t
========== Note
#1 Baseline
>>> t('n=note.Note()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
1.154
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
5.535
#2 Instantiation Tweaks
>>> t('n=note.Note()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.690
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
4.221
#3 Deepcopy of Durations
>>> t('n=note.Note()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.698
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
3.751
#4 Tweaks to Pitch
>>> t('n=note.Note()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.662
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
3.594
#5 Move imports out of frequently called objects
>>> t('n=note.Note()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.516
#6 2.0.10 -- 2015 Sep improvements to setting up durations and sites:
>>> t('n=note.Note()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.400
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
3.323
========= GeneralNote
# 1 Baseline
>>> t('n=note.GeneralNote()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.754
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.GeneralNote()', number=10000)
3.489
# 2 Instantiation Tweaks
>>> t('n=note.GeneralNote()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.301
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.GeneralNote()', number=10000)
1.888
# 3 Deepcopy of Durations
>>> t('n=note.GeneralNote()', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.311
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.GeneralNote()', number=10000)
1.389
For comparison:
>>> t('n=note.NotRest()', 'from music21 import base, note; import copy;', number=10000)
0.361
>>> t('n=note.Rest()', 'from music21 import base, note; import copy;', number=10000)
0.299
>>> t('n=note.Unpitched()', 'from music21 import base, note; import copy;', number=10000)
0.368
>>> t('copy.deepcopy(n)', 'from music21 import stream, note; import copy; n=note.Unpitched()', number=10000)
2.059
Chords are fast...
>>> t('c=chord.Chord()', 'from music21 import chord; import copy;', number=10000)
0.301
But each additional note is 0.5s per 10000
>>> t('c=chord.Chord(["C"])', 'from music21 import chord; import copy;', number=10000)
0.882
>>> t('c=chord.Chord(["C","E","G"])', 'from music21 import chord; import copy;', number=10000)
1.985
Pitches:
>>> t('copy.deepcopy(p)', 'from music21 import pitch; import copy; p=pitch.Pitch("C")', number=10000)
1.291
>>> t('p=pitch.Pitch("C")', 'from music21 import pitch; import copy; p=pitch.Pitch("C")', number=10000)
0.217
after tweaks:
>>> t('p=pitch.Pitch("C")', 'from music21 import pitch; import copy; p=pitch.Pitch("C")', number=10000)
0.189
Accidentals add about .08s per 10,000:
>>> t('n=note.Note("C")', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.642
>>> t('n=note.Note("C#")', 'from music21 import stream, note; import copy; n=note.Note()', number=10000)
0.721
========= Music21Object
# 1 Baseline
>>> t('n=base.Music21Object()', 'from music21 import base, note; import copy; n=base.Music21Object()', number=10000)
0.113
>>> t('copy.deepcopy(n)', 'from music21 import base, note; import copy; n=base.Music21Object()', number=10000)
0.912
One subclass away (its __init__ does nothing but call the superclass __init__):
>>> t('n=base.ElementWrapper()', 'from music21 import base, note; import copy;', number=10000)
0.308
>>> t('copy.deepcopy(n)', 'from music21 import base, note; import copy; n=base.ElementWrapper()', number=10000)
1.423
# 2 Sites and Duration improvements (Sep 2015)
>>> t('n=base.Music21Object()', 'from music21 import base, note; import copy; n=base.Music21Object()', number=10000)
0.034
Deepcopy is MUCH slower than just creating a new one...
>>> t('copy.deepcopy(n)', 'from music21 import base, note; import copy; n=base.Music21Object()', number=10000)
0.656
Tuesday, June 16, 2015
Parallel Computing with music21
First we start the cluster system with ipcluster start, which on this six-core Mac Pro gives me 12 threads. Then I'll start IPython Notebook with ipython notebook.
from __future__ import print_function
from IPython import parallel
clients = parallel.Client()
clients.block = True
print(clients.ids)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
Now I'll create a view that can balance the load automatically.
view = clients.load_balanced_view()
Next let me get a list of all the Bach chorales' filenames inside music21:
from music21 import *
chorales = list(corpus.chorales.Iterator(returnType = 'filename'))
chorales[0:5]
['bach/bwv269', 'bach/bwv347', 'bach/bwv153.1', 'bach/bwv86.6', 'bach/bwv267']
Now, I can use the view.map function to automatically run a function, in this case corpus.parse, on each element of the chorales list.
view.map(corpus.parse, chorales[0:4])
[<music21.stream.Score 4467044944>, <music21.stream.Score 4467216976>, <music21.stream.Score 4465996368>, <music21.stream.Score 4465734224>]
Note though that the overhead of returning a complete music21 Score from each processor is high enough that we don't get much of a savings, if any, from parsing on each core and returning the Score object:
import time
t = time.time()
x = view.map(corpus.parse, chorales[0:30])
print("Multiprocessed", time.time() - t)
t = time.time()
x = [corpus.parse(y) for y in chorales[0:30]]
print("Single processed", time.time() - t)
Multiprocessed 1.7093911171
Single processed 2.04412794113
But let's instead just return the length of each chorale, so we don't need to pass much information back to the main server. First we need to import music21 on each client:
clients[:].execute('from music21 import *')
<AsyncResult: finished>
Now, we'll define a function that parses the chorale and returns how many pitches are in the Chorale:
def parseLength(fn):
c = corpus.parse(fn)
return len(c.flat.pitches)
Now we're going to see a big difference:
t = time.time()
x = view.map(parseLength, chorales[0:30])
print("Multiprocessed", time.time() - t)
t = time.time()
x = [parseLength(y) for y in chorales[0:30]]
print("Single processed", time.time() - t)
Multiprocessed 0.59440112114
Single processed 2.97019314766
In fact, we can do the entire chorale dataset in about the same amount of time as it takes to do just the first 30 on single core:
t = time.time()
x = view.map(parseLength, chorales)
print(len(chorales), 'chorales in', time.time() - t, 'seconds')
347 chorales in 5.31799721718 seconds
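The same pattern -- return small results rather than whole Score objects -- carries over to the standard library. Here is a generic sketch with multiprocessing; parse_length below is a stand-in for the corpus-parsing function above, not music21 code:

```python
from multiprocessing import Pool

def parse_length(fn):
    # stand-in for: parse the file, return a small number (not a big Score)
    return len(fn)

if __name__ == '__main__':
    chorales = ['bach/bwv269', 'bach/bwv347', 'bach/bwv153.1', 'bach/bwv86.6']
    with Pool() as pool:  # one worker per core by default
        lengths = pool.map(parse_length, chorales)
    print(lengths)
```
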
I hope that this example gives some sense of what might be done with a cluster situation in music21. If you can't afford your own Mac Pro or you need even more power, it's possible to rent an hour of cluster computing time at Amazon Web Services for just a few bucks.
Sunday, January 11, 2015
Music21 v.2.0.0 (alpha) Released
We're happy to announce that the first public alpha of music21 v.2 has been released!
Version 2 is the first version of music21 since v.1 to make substantial changes in the code base that introduce backwards incompatibilities in order to make going forward faster and smoother. It doesn't change anything super fundamental à la Python 3's print function, so most code should still run fine, but definitely test in a separate environment before upgrading on any code you have that needs to run without problems. The system is still changing and more backward-incompatible changes could be included until v.2.1.
We have had 420 commits since the last release, so there is a lot that is new!
Substantial changes include:
- Offsets and quarterLengths are now stored internally as Fractions if they cannot be exactly represented as floating point numbers. A lot of work went into making this conversion extremely fast; you probably won't ever notice the speed difference, but you can now be sure that seven septuplets will add up to exactly a half note. For instance:
- >>> n = note.Note()
>>> n.duration.appendTuplet(duration.Tuplet(3, 2))
>>> n.fullName
'C in octave 4 Quarter Triplet (2/3 QL) Note'
>>> n.quarterLength
Fraction(2, 3)
>>> n.quarterLengthFloat  # if you need it...
0.6666666666666666
- Converter structure has been overhauled for more easily adding new converters in the future. If you've wanted to write a converter, or already have one for a format not supported, but have been daunted by how to include it in music21, now is a great time to do it. Speaking of which...
- MEI format is supported for import (thanks to Chris Antila and the ELVIS team at McGill University for this great enhancement)
- Python 2.6 is no longer supported. All tests and demos pass and run on Python 2.7, 3.3, and 3.4. (3.2 and older are not supported)
- FreezeThaw on Streams works much better and caching loaded scores works great (some of this was included in 1.9, so definitely upgrade at least to that).
- Much improved Vexflow output using music21j, a Javascript/Vexflow rendering engine. This was in 1.9, but is improved here.
- Lots of places that used to return anonymous tuples now return namedtuples, making it easier to understand what the return values mean.
- Integrated Travis-CI testing and Coverage tests will keep many more bugs out of music21 in the future.
- Many small problems with Sorting and stream handling fixed.
- Corpus changed: for various licensing reasons, v.2.0 does not include the scores from the MuseData corpus anymore. This change mostly affects Haydn string quartets and Handel's Messiah. However, new replacement scores are being included and 2.1 will have as many great works as before. The MuseData scores are still available online. MuseData is now a deprecated format and no further testing on it will be conducted; only bug fixes that are easily implemented will be accepted.
- music21 is now available under the BSD license in addition to LGPL!
We will try to stick to something close to the semantic versioning model in the future once v.2.1 is released. In other words, after 2.1, we'll try very hard not to break anything that worked in v.2.1 until v. 3.0 is released. This will probably mean that the version numbers are going to creep up faster in the future.
Still todo before v.2.1 is a major change in how elements are stored in Streams. Stay tuned if you care about performance tweaks etc., otherwise ignore it -- we'll keep the interface the same so you might not notice anything except speed improvements.
Smaller backward-incompatible changes include:
- Stream __repr__ now includes a pointer rather than a number if .id is not set. This change will make filtering out doctests easier in the future.
- TinyNotation no longer allows for a two-element tuple where the second element is the time signature. Replace: ("C4 D E", "3/4") with ("tinynotation: 3/4 C4 D E")
- Obscure calls in SpannerBundle have been removed: spannerBundle.getByClassComplete etc.
- Convenience classes: WholeNote, HalfNote, etc. have been removed. Replace with Note(type='whole') etc.
- Old convenience classes for moving from Perl to Python (DefaultHash, defList) have been removed or renamed (defaultlist)
- Articulations are marked as equal if they are of the same class, regardless of other attributes.
- common.almostLessThan, etc. are gone; they were only needed for float rounding, and that problem is fixed.
- duration.aggregateTupletRatio is now aggregateTupletMultiplier, which is more correct.
- scala.ScalaStorage renamed scala.ScalaData
- common.nearestMultiplier now returns the signed difference.
- layout -- names changed for consistency (some layout objects had "startMeasure" and some "measureStart" - now they're all the same); now all use namedtuples.
- rarely used functions in Sites, base, Duration, SpannerStorage, VariantStorage, have been removed or renamed. I'd be surprised if these affect anyone outside the development team.
Improvements and bug fixes:
- common.mixedNumeral for working with large rational numbers
- common.opFrac() optionally converts a float, int, or Fraction to a float, int, or Fraction depending on what is necessary to get exact representations. This is a highly optimized function responsible for music21 working with Fractions at about 10x the speed of normal Fraction work.
- Rest objects get a .lineShift attribute for layout.
- staffDetails/printObject MusicXML export had a bug, writing out "True" instead of "yes"
- staffLines is now an int not float. (duh!)
- better checks for reused pointers.
- lots of private methods are now ready for public hacking!
- Lyric.rawText() will return "hel-" instead of "hel" for "hel-lo".
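The opFrac() trick mentioned above can be sketched as follows. This is a simplified illustration of the idea, not music21's actual implementation; in particular, the denominator limit here is an assumption:

```python
from fractions import Fraction

DENOM_LIMIT = 65535  # hypothetical cap on denominators for this sketch

def op_frac(num):
    """Keep values that are exact binary fractions as floats;
    fall back to Fraction for everything else."""
    frac = Fraction(num).limit_denominator(DENOM_LIMIT)
    if frac.denominator & (frac.denominator - 1) == 0:
        # power-of-two denominator: exactly representable as a float
        return float(frac)
    return frac
```

With this, op_frac(0.5) stays a cheap float, while op_frac(2.0 / 7) becomes Fraction(2, 7), so seven septuplet-eighth values sum to exactly a half note (2 quarter lengths) instead of 1.9999....
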
Wednesday, June 25, 2014
Music21 v1.9 released
We are proud to release music21 v1.9.3, the latest and last release in the 1.x series.
There have been 147 commits in the two months since v1.8; here are some of the highlights:
- MUCH faster .getContextByClass (KUDOS to Josiah Oberholtzer for this). Even if you don't use .getContextByClass in your own code, you're definitely calling something that calls it. This method figures out where the most recent key signature, time signature, clef, etc. is for any given object, finds relationships between notes in different voices, etc. For analysis of medium-sized scores (say, 3 voices, 100 measures) expect a 10-fold speedup. For larger pieces, the speedup can be over 100-fold.
- A new stream/timespans module that makes the previous speedup possible by representing m21 Streams as AVL trees -- it's used in a few places (needs more docs), forthcoming releases will use it in a lot more places
- Python 3 support (3.3 and later). The entire test/multiprocessTest.py suite passes on Python 3. N.B. to contributors -- from now on all contributions need to pass tests on both Python 2.7 and 3.3 and later. One negative -- in the past you could have made music21 run on unsupported older systems (2.6 and sometimes 2.5); now "from music21 import *" will fail on pre-2.7. 2.7 has been a requirement since Music21 1.7. Fewer than 30% of Macs still in use are running Lion or earlier and thus will need to update to 2.7. This version of music21 runs about 25% faster on Python 3 than Python 2, but otherwise no new features of Python 3 are used. Python 2.7 will be supported throughout the Music21 2.x cycle, so no panicking -- it'll be years (if ever) before Python 3.3+ is a requirement.
- Improvements to reductions of scores, and to analyzing voice-leading motion (some of this is backwards incompatible)
- Better, faster, and more consistent sorting of elements in a Stream
- Changes to the derivations module that I doubt anyone else was using anyhow...
- Removed obsolete files.
- Stafflines import and export from musicxml (thanks Metalmike!)
- Complete refactoring of converter.py to make it easier for users to write their own Subconverter formats (that can eventually be put into the system)
- Complete serialization of Streams via a new version of jsonpickle. This has big implications down the line; for now it affects...
- Vexflow output is much improved (unless you were counting on Voices; in which case do not upgrade) using the alpha version of music21j -- Javascript reimplementation of music21's core features.
- IPython improvements, allowing for robust and persistent communication between Javascript and Python. This will eventually (once I document it...) let you use the web browser as a UI for music21 python apps including live updating of music notation. It's too complex for most users right now, but I can attest that this will be one of the biggest perks of the 2.x development.
The usual bug fixes, documentation improvements and fixes, etc. are implemented. Thanks to MIT, the NEH, and the Seaver Institute for funding the project. (and to MIT for tenuring me in part on the basis of music21). This is the last release that Josiah Oberholtzer was lead programmer for; his considerable talents will still be on display in Abjad and many other projects he works on, and the implications of the new storage system he has developed will continue to pay off for years.
What's next?
Starting work on music21 2.0 today. That release will have some backwards incompatible changes that developers will need to deal with -- just as the path to 1.0 meant that some things that were originally thought of as good ideas were thrown out, the path to 2.0 will rely on 8 years of using music21 to fix some things that really should've been done differently from the beginning. Having just spent 2 weeks making m21 compatible with Python 3, I will give my assurance that as few incompatibilities as possible will be introduced. Most of the major changes will be on the core -- so if you've never messed with Sites, SpannerStorage, etc., you'll be fine.
- Problems with 5 quintuplets = .99999999 of a beat will disappear. Music21 2.X will store offsets and quarterLengths internally as rational numbers (actually a custom MixedNumeral class, so that the __repr__ is nicer...). All music21 objects will gain four properties: ".offsetRational, .duration.quarterLengthRational, .offsetFloat, and .duration.quarterLengthFloat" -- in music21 2.0, .offset and .duration.quarterLength will be aliases for offsetFloat and .duration.quarterLengthFloat -- so no changes will be needed to existing code. This will give a period of time (6 months?) to switch .offset either to .offsetFloat or .offsetRational. We'll have a tool to make the switch automatically. Then at a certain point, .offset will become an alias for .offsetRational. By music21 3.0 .offset will only support Rational numbers.
- Streams will store the position of notes, etc. in them. Right now this is all stored in the Note object itself. There are some great reasons for doing it that way, but significant speedups will take place by shifting this.
- inPlace will be False by default for all operations on Notes, Streams, etc. -- you can plan for the migration by explicitly setting inPlace for every call now.
- Some changes to boundary cases in .getElementsByOffset will take place -- it will not change much, but for a few users this will be crucial.
- NamedTuples and OrderedDicts will appear in a lot of places
That's all for now, but more examples to come soon. - Myke
Sunday, May 25, 2014
Python reimports
We've been working a lot recently on two kinds of optimization in music21: improving speed and then using some of the speed increases to add functionality and stability, so that new features can be added without slowing down the process. One of the places we found where we could make changes is in our over-cautious use of imports.
While everyone says that in Python you can import a module inside a function without it going through the overhead of actually reimporting, there is some real overhead still, especially if the function is called a lot of times:
Here I compare ten million calls to reference an object vs. doing the same while also importing a module that is already imported:
>>> from timeit import timeit as t # number = ten million; output in secs to 3 decimals
>>> t('x', setup='import weakref; x=5', number=10000000)
0.278
>>> t('import weakref; x', setup='import weakref; x=5', number=10000000)
7.810
So the redundant import statement makes the lookup nearly thirty times slower than direct access alone. Even when the module is actually used to create a weakref, the check-for-reimport time dominates the cost of creating the weakref itself roughly five-fold:
>>> t('weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
2.098
>>> t('import weakref; weakref.ref(x)', setup='import weakref; from music21 import pitch; x=pitch.Pitch()', number=10000000)
9.823
For historical reasons (porting to systems without weakref, etc.) the "common.wrapWeakref" function of music21 (which does a try: except to see if a weakref could be made) did the import within the function. Moving it outside the function sped it up considerably and made it only half the speed of calling weakref.ref(x) directly -- worth it for the extra safety -- and only an order of magnitude slower than direct access to x itself:
Before, with common.wrapWeakref doing a safety "import weakref" call:
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)
17.112
After, without it:
>>> t('common.wrapWeakref(x)', setup='from music21 import common,pitch; x=pitch.Pitch()', number=10000000)
4.171
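The shape of the fix can be sketched in a few lines. This is only an illustration of the pattern, not music21's actual common.wrapWeakref source: the import sits at module level, paid once, and the try/except fallback for un-weakref-able objects is preserved.

```python
import weakref  # module-level import: the reimport check is paid once, not per call

def wrap_weakref(obj):
    """Return a weakref to obj when possible; fall back to obj itself
    for types (ints, strings, ...) that cannot be weak-referenced."""
    try:
        return weakref.ref(obj)
    except TypeError:
        return obj

class Client:
    pass

c = Client()
ref = wrap_weakref(c)
print(ref() is c)        # True: dereferencing the weakref yields the object

plain = wrap_weakref(5)  # ints can't be weak-referenced
print(plain)             # 5: the object itself comes back unchanged
```

Callers never need to know which case they got, as long as they unwrap consistently, which is what makes the extra safety nearly free once the import is hoisted.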
So this is the speedup in music21 that you'd find if you managed to grab the GitHub repository right now. But we're planning on using the speedup to make things more functional.
As a practical consideration, one of the things that I've never been able to fix in music21 is that elements embedded in a Stream can change their duration without telling their sites that anything has changed. There are expensive operations, such as calculating the length of a Measure or finding its last object, which we cache as long as no .append(), .insert(), .remove(), etc. are called. But a Note inside the measure may have changed length, so the information in the cache is no longer accurate. I've been wanting to fix this for a while.
The problem is that the Note object itself has no idea that its duration has changed, because while the Note has a reference to the Duration, the Duration does not have a reference to the Note -- it can't have a normal reference because this would create a circular reference (Note.duration = Duration; Duration.client = Note). With a circular reference, neither the Note nor the Duration will ever disappear, even after they're not needed anymore, causing memory leaks. The obvious solution is to use a weak reference, which behaves mostly like a normal reference but does not keep its target alive. If the Note should disappear, the Duration.client weakref is not strong enough to keep the two objects alive.
With the speed increases, it should be possible to store a weakref on Duration and also Pitch to the object they’re attached to so that they can inform their “client” that they’ve changed. The client can then inform its Sites (measures, etc.) that it has changed and clear the appropriate cache. The extra overhead of creating the weakref ends up being only about 20% of object creation time; a small price to pay for the security of knowing that nothing can change and screw up the overall system:
>>> t('d=duration.Duration();', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
19.382
>>> t('d=duration.Duration(); pitchRef = common.wrapWeakref(x)', setup='from music21 import common,duration,pitch; x=pitch.Pitch()', number=10000000)
23.787
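The client-notification scheme described above can be sketched as follows. All the names here (client, setQuarterLength, informSitesChanged) are hypothetical stand-ins for whatever music21 ends up using; the point is the weakref wiring:

```python
import weakref

class Duration:
    def __init__(self, client=None):
        # Weak reference back to the owning Note: no circular strong refs.
        self._client = weakref.ref(client) if client is not None else None
        self.quarterLength = 1.0

    @property
    def client(self):
        # Dereference; returns None once the owning Note has been collected.
        return self._client() if self._client is not None else None

    def setQuarterLength(self, ql):
        self.quarterLength = ql
        c = self.client
        if c is not None:
            c.informSitesChanged()  # hypothetical hook: tell the Note's sites

class Note:
    def __init__(self):
        self.duration = Duration(client=self)
        self.siteCacheCleared = False

    def informSitesChanged(self):
        # In real code this would invalidate the caches of containing Streams.
        self.siteCacheCleared = True

n = Note()
n.duration.setQuarterLength(2.5)
print(n.siteCacheCleared)  # True: the change propagated to the client
```

Because Duration only holds a weakref, deleting the Note lets both objects be collected normally, and Duration.client simply starts returning None.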
Expect to see more functionality like this in a forthcoming release of music21.
Friday, May 23, 2014
Speed, Speed, Speed, ... and news.
The newest GitHub repository contains a huge change to the under-the-hood processing of .getContextByClass(), which is used in about a million places in music21. It is the function that lets any note know what its current TimeSignature (and thus beatStrength, etc.) is, lets us figure out whether the sharp on a given note should be displayed given the current KeySignature, and so on. While we had tried to optimize the hell out of it, it has been a major bottleneck in music21 when working with very large scores. We sped up parsing (at least the second time through) a lot in the last commit; this was the time to speed up context searching. We now use a form of AVL tree implemented in a new stream.timespans module. It's not well documented yet, so we're only exposing it directly in one place, stream.asTimespans(recurse=True|False). You don't need to know about this unless you're a developer, but I wanted to let you know that the results are extraordinary.
Here’s a code snippet that loads a score with three parts and 126 measures and many TimeSignatures and calculates the TimeSignature active for every note, clef, etc. and then prints the time it takes to run:
>>> c = corpus.parse('luca/gloria')
>>> def allContext(c):
...     for n in c.recurse():
...         k = n.getContextByClass('TimeSignature')
...
>>> from time import time as t
>>> x = t(); allContext(c); print t() - x
with the 1.8 release of Music21:
42.9 seconds
with the newest version in GitHub:
0.70 seconds
There’s a lot of caching that happens along the way, so the second call is much faster:
second call with 1.8 release:
44.6 seconds (the same, within the margin of error)
with the newest version in GitHub if the score hasn’t changed:
0.18 seconds
You'll see the speedup most dramatically in places where every combination of notes, etc. needs to be found. For instance, finding all parallel fifths in a large eight-part score could have taken hours before. Now you'll likely get results in a few seconds.
I have not heard of any issues arising from the change in sorting from the last posting on April 26, so people who were afraid of updating can breathe a bit more easily and update to the version of music21 at least as of yesterday. The newer version, like all GitHub commits, should be used with caution until we make a release.
Thanks to the NEH and the Digging into Data Challenge for supporting the creation of tools for working with much bigger scores than before.
In other news:
Music21j — a Javascript implementation of music21’s core features — is running rapidly towards a public release. See http://prolatio.blogspot.com/2014/05/web-pages-with-musical-examples.html for an example of usage. We’ll be integrating it with the Python version over the summer.
Ian Quinn’s review of Music21 appeared in the Journal of the American Musicological Society yesterday. Prior to this issue, no non-book had ever been reviewed. It’s a great feeling to have people not on this list know about the software as well.
Oh, and MIT was foolhardy enough to give me tenure! Largely on the basis of music21. If you’re an academic working on a large digital project, I still advise proceeding with caution, but know that it can be done. Thanks, everyone, for your support.
Wednesday, May 9, 2012
Music21 speedups in Chordify and with PyPy
The biggest recurring complaint about using music21 is the speed of working with large scores. I wanted to point out two resources that are available in the latest SVN releases. Both will appear in the next public release, but some of you might want to try them already:
- Some parts of chordify move from O(m^2) time to O(m) where m is the number of measures in a part – for very large scores, this will mean a huge speedup. (usually noticeable after about 100 measures)
- Music21 works with the rewrite of Python called PyPy – a sped-up implementation of Python 2.7. The only parts that don’t work are the plotting routines, since matplotlib and numpy haven’t yet been ported to PyPy. Most operations will run in about half the time – the exception is parsing files a second or subsequent time (the first parse, however, is quite a bit faster).
Work on running music21 on multiple systems is proceeding, so we should be able to demonstrate that soon.
Thanks for the patience. My motto is “make it work first, make it faster later,” which I sometimes translate as, “we’ve waited 200 years for a tool that can analyze thousands of works at once; we can wait another 20 minutes.” But that doesn’t mean we’re not working all the time to make music21 run as fast as we can.