We collected training data for MuseNet from many different sources. ClassicalArchives and BitMidi donated their large collections of MIDI files for this project, and we also found several collections online, including jazz, pop, African, Indian, and Arabic styles. Additionally, we used the MAESTRO dataset.
The transformer is trained on sequential data: given a set of notes, we ask it to predict the upcoming note. We experimented with several different ways to encode the MIDI files into tokens suitable for this task. First, we tried a chordwise approach that treated every combination of notes sounding at one time as an individual “chord” and assigned a token to each chord. Second, we tried condensing the musical patterns by focusing only on the starts of notes, and then compressing that further using a byte pair encoding scheme.
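To make the two encoding ideas concrete, here is a minimal Python sketch of both. It is an illustration under stated assumptions, not MuseNet's actual tokenizer: the `(pitch, start_step, end_step)` note format, the quantized time grid, and the helper names `chordwise_tokens`, `onset_tokens`, and `bpe_merge_once` are all hypothetical, and a real pipeline would first parse and quantize the MIDI files.

```python
from collections import Counter

# Hypothetical toy input: (pitch, start_step, end_step) on a quantized time
# grid, with end_step exclusive. This is an assumed format for illustration.
NOTES = [
    (60, 0, 2),  # C4 sounds during steps 0 and 1
    (64, 0, 2),  # E4 sounds during steps 0 and 1
    (67, 1, 3),  # G4 sounds during steps 1 and 2
    (60, 3, 4),  # C4 sounds during step 3
]


def chordwise_tokens(notes, num_steps):
    """Chordwise idea: one token per distinct set of pitches sounding at a step."""
    vocab = {}   # frozenset of pitches -> token id, in order of first appearance
    tokens = []
    for step in range(num_steps):
        chord = frozenset(p for p, start, end in notes if start <= step < end)
        if chord not in vocab:
            vocab[chord] = len(vocab)
        tokens.append(vocab[chord])
    return tokens, vocab


def onset_tokens(notes):
    """Onset idea: keep only note starts, ordered by start time, then pitch."""
    return [p for p, start, _ in sorted(notes, key=lambda n: (n[1], n[0]))]


def bpe_merge_once(seq, new_token):
    """One byte-pair-encoding step: replace the most frequent adjacent pair."""
    pairs = Counter(zip(seq, seq[1:]))
    if not pairs:
        return list(seq), None
    best = pairs.most_common(1)[0][0]
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == best:
            merged.append(new_token)  # the pair collapses into one new token
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged, best


if __name__ == "__main__":
    print(chordwise_tokens(NOTES, 4)[0])
    print(onset_tokens(NOTES))
    print(bpe_merge_once([1, 2, 1, 2, 3], new_token=99))
```

The chordwise vocabulary grows with every new combination of simultaneous pitches, whereas the onset sequence starts from a small pitch vocabulary and relies on repeated merge steps to shorten frequent patterns. Repeating `bpe_merge_once` with a fresh `new_token` each time would build up a merge table in the usual byte pair encoding fashion.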