The MIDI File Format

The MIDI File Format

The Standard MIDI File (SMF) is a file format specifically designed to store the data that a sequencer records and plays (whether that sequencer be software or hardware based).

This format stores the standard MIDI messages (ie, status bytes with appropriate data bytes) plus a time-stamp for each message (ie, a series of bytes that represent how many clock pulses to wait before “playing” the event). The format allows saving information about tempo, pulses per quarter note resolution (or resolution expressed in divisions per second, ie SMPTE setting), time and key signatures, and names of tracks and patterns. It can store multiple patterns and tracks so that any application can preserve these structures when loading the file.

NOTE: A track usually is analogous to one musical part, such as a Trumpet part. A pattern would be analogous to all of the musical parts (ie, Trumpet, Drums, Piano, etc) for a song, or excerpt of a song.

The format was designed to be generic so that any sequencer could read or write such a file without losing the most important data, and flexible enough for a particular application to store its own proprietary, “extra” data in such a way that another application won’t be confused when loading the file and can safely ignore this extra stuff that it doesn’t need. Think of the MIDI file format as a musical version of an ASCII text file (except that the MIDI file contains binary data too), and the various sequencer programs as text editors all capable of reading that file. But, unlike ASCII, MIDI file format saves data in chunks (ie, groups of bytes preceded by an ID and size) which can be parsed, loaded, skipped, etc. Therefore, it can be easily extended to include a program’s proprietary info. For example, maybe a program wants to save a “flag byte” that indicates whether the user has turned on an audible metronome click. The program can put this flag byte into a MIDI file in such a way that another application can skip this byte without having to understand what that byte is for. In the future, the MIDI file format can also be extended to include new “official” chunks that all sequencer programs may elect to load and use. This can be done without making old data files obsolete (ie, the format is designed to be extensible in a backwardly compatible way).

What’s a Chunk?

Data is always saved within a chunk. There can be many chunks inside of a MIDI file. Each chunk can be a different size (ie, where size refers to how many bytes are contained in the chunk). The data bytes in a chunk are related in some way. A chunk is simply a group of related bytes.

Each chunk begins with a 4 character (ie, 4 ascii bytes) ID which tells what “type” of chunk this is. The next 4 bytes (all bytes are comprised of 8 bits) form a 32-bit length (ie, size) of the chunk. All chunks must begin with these two fields (ie, 8 bytes), which are referred to as the chunk header.

NOTE: The Length does not include the 8 byte chunk header. It simply tells you how many bytes of data are in the chunk following this header.

Here’s an example header (with bytes expressed in hex):

4D 54 68 64 00 00 00 06

Note that the first 4 bytes make up the ascii ID of MThd (ie, the first four bytes are the ascii values for ‘M’, ‘T’, ‘h’, and ‘d’). The next 4 bytes tell us that there should be 6 more data bytes in the chunk (and after that we should find the next chunk header or the end of the file).

In fact, all MIDI files begin with this MThd header (and that’s how you know that it’s a MIDI file).

NOTE: The 4 bytes that make up the Length are stored in Motorola 68000 byte order, not Intel reverse byte order (ie, the 06 is the fourth byte instead of the first of the four). All multiple byte fields in a MIDI file follow this standard, often called “Big Endian” form.

MThd Chunk

The MThd header has an ID of MThd, and a Length of 6.

Let’s examine the 6 data bytes (which follow the above, 8 byte header) in an MThd chunk.

The first two data bytes tell the Format (which I prefer to call Type). There are actually 3 different types (ie, formats) of MIDI files. A type of 0 means that the file contains one single track containing midi data on possibly all 16 midi channels. If your sequencer sorts/stores all of its midi data in one single block of memory with the data in the order that it’s “played”, then it should read/write this type. A type of 1 means that the file contains one or more simultaneous (ie, all start from an assumed time of 0) tracks, perhaps each on a single midi channel. Together, all of these tracks are considered one sequence or pattern. If your sequencer separates its midi data (i.e. tracks) into different blocks of memory but plays them back simultaneously (ie, as one “pattern”), it will read/write this type. A type of 2 means that the file contains one or more sequentially independant single-track patterns. f your sequencer separates its midi data into different blocks of memory, but plays only one block at a time (ie, each block is considered a different “excerpt” or “song”), then it will read/write this type.

The next 2 bytes tell how many tracks are stored in the file, NumTracks. Of course, for format type 0, this is always 1. For the other 2 types, there can be numerous tracks.

The last two bytes indicate how many Pulses (i.e. clocks) Per Quarter Note (abbreviated as PPQN) resolution the time-stamps are based upon, Division. For example, if your sequencer has 96 ppqn, this field would be (in hex):

00 60

Alternately, if the first byte of Division is negative, then this represents the division of a second that the time-stamps are based upon. The first byte will be -24, -25, -29, or -30, corresponding to the 4 SMPTE standards representing frames per second. The second byte (a positive number) is the resolution within a frame (ie, subframe). Typical values may be 4 (MIDI Time Code), 8, 10, 80 (SMPTE bit resolution), or 100.

You can specify millisecond-based timing by the data bytes of -25 and 40 subframes.

Here’s an example of a complete MThd chunk (with header):

4D 54 68 64     MThd ID
00 00 00 06     Length of the MThd chunk is always 6.
00 01           The Format type is 1.
00 02           There are 2 MTrk chunks in this file.
E7 28           Each increment of delta-time represents a millisecond.

MTrk Chunk

After the MThd chunk, you should find an MTrk chunk, as this is the only other currently defined chunk. (If you find some other chunk ID, it must be proprietary to some other program, so skip it by ignoring the following data bytes indicated by the chunk’s Length).

An MTrk chunk contains all of the midi data (with timing bytes), plus optional non-midi data for one track. Obviously, you should encounter as many MTrk chunks in the file as the MThd chunk’s NumTracks field indicated.

The MTrk header begins with the ID of MTrk, followed by the Length (ie, number of data bytes to read for this track). The Length will likely be different for each track. (After all, a track containing the violin part for a Bach concerto will likely contain more data than a track containing a simple 2 bar drum beat).

Variable Length Quantities — Event’s Time

Think of a track in the MIDI file in the same way that you normally think of a track in a sequencer. A sequencer track contains a series of events. For example, the first event in the track may be to sound a middle C note. The second event may be to sound the E above middle C. These two events may both happen at the same time. The third event may be to release the middle C note. This event may happen a few musical beats after the first two events (ie, the middle C note is held down for a few musical beats). Each event has a “time” when it must occur, and the events are arranged within a “chunk” of memory in the order that they occur.

In a MIDI file, an event’s “time” precedes the data bytes that make up that event itself. In other words, the bytes that make up the event’s time-stamp come first. A given event’s time-stamp is referenced from the previous event. For example, if the first event occurs 4 clocks after the start of play, then its “delta-time” is 04. If the next event occurs simultaneously with that first event, its time is 00. So, a delta-time is the duration (in clocks) between an event and the preceding event.

NOTE: Since all tracks start with an assumed time of 0, the first event’s delta-time is referenced from 0.

A delta-time is stored as a series of bytes which is called a variable length quantity. Only the first 7 bits of each byte is significant (right-justified; sort of like an ASCII byte). So, if you have a 32-bit delta-time, you have to unpack it into a series of 7-bit bytes (ie, as if you were going to transmit it over midi in a SYSEX message). Of course, you will have a variable number of bytes depending upon your delta-time. To indicate which is the last byte of the series, you leave bit #7 clear. In all of the preceding bytes, you set bit #7. So, if a delta-time is between 0-127, it can be represented as one byte. The largest delta-time allowed is 0FFFFFFF, which translates to 4 bytes variable length. Here are examples of delta-times as 32-bit values, and the variable length quantities that they translate to:

NUMBER        VARIABLE QUANTITY
 00000000              00
 00000040              40
 0000007F              7F
 00000080             81 00
 00002000             C0 00
 00003FFF             FF 7F
 00004000           81 80 00
 00100000           C0 80 00
 001FFFFF           FF FF 7F
 00200000          81 80 80 00
 08000000          C0 80 80 00
 0FFFFFFF          FF FF FF 7F

Here’s some C routines to read and write variable length quantities such as delta-times. With WriteVarLen(), you pass a 32-bit value (ie, unsigned long) and it spits out the correct series of bytes to a file. ReadVarLen() reads a series of bytes from a file until it reaches the last byte of a variable length quantity, and returns a 32-bit value.

void WriteVarLen(register unsigned long value)
{
   register unsigned long buffer;
   buffer = value & 0x7F;

   while ( (value >>= 7) )
   {
     buffer <<= 8;
     buffer |= ((value & 0x7F) | 0x80);
   }

   while (TRUE)
   {
      putc(buffer,outfile);
      if (buffer & 0x80)
          buffer >>= 8;
      else
          break;
   }
}


doubleword ReadVarLen()
{
    register doubleword value;
    register byte c;

    if ( (value = getc(infile)) & 0x80 )
    {
       value &= 0x7F;
       do
       {
         value = (value << 7) + ((c = getc(infile)) & 0x7F);
       } while (c & 0x80);
    }

    return(value);
}

NOTE: The concept of variable length quantities (ie, breaking up a large value into a series of bytes) is used with other fields in a MIDI file besides delta-times, as you’ll see later.


Events

The first (1 to 4) byte(s) in an MTrk will be the first event’s delta-time as a variable length quantity. The next data byte is actually the first byte of that event itself. I’ll refer to this as the event’s Status. For MIDI events, this will be the actual MIDI Status byte (or the first midi data byte if running status). For example, if the byte is hex 90, then this event is aNote-On upon midi channel 0. If for example, the byte was hex 23, you’d have to recall the previous event’s status (ie, midi running status). Obviously, the first MIDI event in the MTrk must have a status byte. After a midi status byte comes its 1 or 2 data bytes (depending upon the status – some MIDI messages only have 1 subsequent data byte). After that you’ll find the next event’s delta time (as a variable quantity) and start the process of reading that next event.

SYSEX (system exclusive) events (status = F0) are a special case because a SYSEX event can be any length. After the F0 status (which is always stored — no running status here), you’ll find yet another series of variable length bytes. Combine them with ReadVarLen() and you’ll come up with a 32-bit value that tells you how many more bytes follow which make up this SYSEX event. This length doesn’t include the F0 status.

For example, consider the following SYSEX MIDI message:

F0 7F 7F 04 01 7F 7F F7

This would be stored in a MIDI file as the following series of bytes (minus the delta-time bytes which would precede it):

F0 07 7F 7F 04 01 7F 7F F7

Some midi units send a system exclusive message as a series of small “packets” (with a time delay inbetween transmission of each packet). The first packet begins with an F0, but it doesn’t end with an F7. The subsequent packets don’t start with an F0 nor end with F7. The last packet doesn’t start with an F0, but does end with the F7. So, between the first packet’s opening F0 and the last packet’s closing F7, there’s 1 SYSEX message there. (Note: only extremely poor designs, such as the crap marketed by Casio exhibit these aberrations). Of course, since a delay is needed inbetween each packet, you need to store each packet as a separate event with its own time in the MTrk. Also, you need some way of knowing which events shouldn’t begin with an F0 (ie, all of them except the first packet). So, the MIDI file spec redefines a midi status of F7 (normally used as an end mark for SYSEX packets) as a way to indicate an event that doesn’t begin with F0. If such an event follows an F0 event, then it’s assumed that the F7 event is the second “packet” of a series. In this context, it’s referred to as a SYSEX CONTINUATION event. Just like the F0 type of event, it has a variable length followed by data bytes. On the other hand, the F7 event could be used to store MIDI REALTIME or MIDI COMMON messages. In this case, after the variable length bytes, you should expect to find a MIDI Status byte of F1, F2, F3, F6, F8, FA, FB, FC, or FE. (Note that you wouldn’t find any such bytes inside of a SYSEX CONTINUATION event). When used in this manner, the F7 event is referred to as an ESCAPED event.

A status of FF is reserved to indicate a special non-MIDI event. (Note that FF is used in MIDI to mean “reset”, so it wouldn’t be all that useful to store in a data file. Therefore, the MIDI file spec arbitrarily redefines the use of this status). After the FF status byte is another byte that tells you what Type of non-MIDI event it is. It’s sort of like a second status byte. Then after this byte is another byte(s — a variable length quantity again) that tells how many more data bytes follow in this event (ie, its Length). This Length doesn’t include the FF, Type byte, nor the Length byte. These special, non-MIDI events are called Meta-Events, and most are optional unless otherwise noted. What follows are some defined Meta-Events (including the FF Status and Length). Note that unless otherwise mentioned, more than one of these events can be placed in an Mtrk (even the same Meta-Event) at any delta-time. (Just like all midi events, Meta-Events have a delta-time from the previous event regardless of what type of event that may be. So, you can freely intermix MIDI and Meta events).

Sequence Number

FF 00 02 ss ss

This optional event which must occur at the beginning of a MTrk (ie, before any non-zero delta-times and before any midi events) specifies the number of a sequence. The two data bytes ss ss, are that number which corresponds to the MIDI Cue message. In a format 2 MIDI file, this number identifies each “pattern” (ie, Mtrk) so that a “song” sequence can use the MIDI Cue message to refer to patterns. If the ss ss numbers are omitted (ie, Length byte = 0 instead of 2), then the MTrk’s location in the file is used (ie, the first MTrk chunk is the first pattern). In format 0 or 1, which contain only one “pattern” (even though format 1 contains several MTrks), this event is placed in only the first MTrk. So, a group of format 1 files with different sequence numbers can comprise a “song collection”.

There can be only one of these events per MTrk chunk in a Format 2. There can be only one of these events in a Format 0 or 1, and it must be in the first MTrk.

Text

FF 01 len text

Any amount of text (amount of bytes = len) for any purpose. It’s best to put this event at the beginning of an MTrk. Although this text could be used for any purpose, there are other text-based Meta-Events for such things as orchestration, lyrics, track name, etc. This event is primarily used to add “comments” to a MIDI file which a program would be expected to ignore when loading that file.

Note that len could be a series of bytes since it is expressed as a variable length quantity.

Copyright

FF 02 len text

A copyright message (ie, text). It’s best to put this event at the beginning of an MTrk.

Note that len could be a series of bytes since it is expressed as a variable length quantity.

Sequence/Track Name

FF 03 len text

The name of the sequence or track (ie, text). It’s best to put this event at the beginning of an MTrk.

Note that len could be a series of bytes since it is expressed as a variable length quantity.

Instrument

FF 04 len text

The name of the instrument that the track plays (ie, text). This might be different than the Sequence/Track Name. For example, maybe the name of your sequence (ie, Mtrk) is “Butterfly”, but since the track is played on a piano, you might also include an Instrument Name of “Piano”.

It’s best to put one (or more) of this event at the beginning of an MTrk to provide the user with identification of what instrument(s) is playing the track. Usually, the instruments (ie, patches, tones, banks, etc) are setup on the audio devices via MIDI Program Change events within the MTrk, particularly in MIDI files that are intended for General MIDI Sound Modules. So, this event exists merely to provide the user with visual feedback of the instrumentation for a track.

Note that len could be a series of bytes since it is expressed as a variable length quantity.

Lyric

FF 05 len text

A song lyric (ie, text) which occurs on a given beat. A single Lyric MetaEvent should contain only one syllable.

Note that len could be a series of bytes since it is expressed as a variable length quantity.

Marker

FF 06 len text

A marker (ie, text) which occurs on a given beat. Marker events might be used to denote a loop start and loop end (ie, where the sequence loops back to a previous event).

Note that len could be a series of bytes since it is expressed as a variable length quantity.

Cue Point

FF 07 len text

A cue point (ie, text) which occurs on a given beat. A Cue Point might be used to denote where a WAVE (ie, sampled sound) file starts playing, where the text would be the WAVE’s filename.

Note that len could be a series of bytes since it is expressed as a variable length quantity.

MIDI Channel

FF 20 01 cc

This optional event which normally occurs at the beginning of an MTrk (ie, before any non-zero delta-times and before any MetaEvents except Sequence Number) specifies to which MIDI Channel any subsequent MetaEvent or System Exclusive events are associated. The data byte cc, is the MIDI channel, where 0 would be the first channel.

The MIDI spec does not give a MIDI channel to System Exclusive events. Nor do MetaEvents have an imbedded channel. When creating a Format 0 MIDI file, all of the System Exclusive and MetaEvents go into one track, so its hard to associate these events with respective MIDI Voice messages. (ie, For example, if you wanted to name the musical part on MIDI channel 1 “Flute Solo”, and the part on MIDI Channel 2 “Trumpet Solo”, you’d need to use 2 Track Name MetaEvents. Since both events would be in the one track of a Format 0 file, in order to distinguish which track name was associated with which MIDI channel, you would place a MIDI Channel MetaEvent with a channel number of 0 before the “Flute Solo” Track Name MetaEvent, and then place another MIDI Channel MetaEvent with a channel number of 1 before the “Trumpet Solo” Track Name MetaEvent.

It is acceptable to have more than one MIDI channel event in a given track, if that track needs to associate various events with various channels.

MIDI Port

FF 21 01 pp

This optional event which normally occurs at the beginning of an MTrk (ie, before any non-zero delta-times and before any midi events) specifies out of which MIDI Port (ie, buss) the MIDI events in the MTrk go. The data byte pp, is the port number, where 0 would be the first MIDI buss in the system.

The MIDI spec has a limit of 16 MIDI channels per MIDI input/output (ie, port, buss, jack, or whatever terminology you use to describe the hardware for a single MIDI input/output). The MIDI channel number for a given event is encoded into the lowest 4 bits of the event’s Status byte. Therefore, the channel number is always 0 to 15. Many MIDI interfaces have multiple MIDI input/output busses in order to work around limitations in the MIDI bandwidth (ie, allow the MIDI data to be sent/received more efficiently to/from several external modules), and to give the musician more than 16 MIDI Channels. Also, some sequencers support more than one MIDI interface used for simultaneous input/output. Unfortunately, there is no way to encode more than 16 MIDI channels into a MIDI status byte, so a method was needed to identify events that would be output on, for example, channel 1 of the second MIDI port versus channel 1 of the first MIDI port. This MetaEvent allows a sequencer to identify which MTrk events get sent out of which MIDI port. The MIDI events following a MIDI Port MetaEvent get sent out that specified port.

It is acceptable to have more than one Port event in a given track, if that track needs to output to another port at some point in the track.

End of Track

FF 2F 00

This event is NOT optional. It must be the last event in every MTrk. It’s used as a definitive marking of the end of an MTrk. Only 1 per MTrk.

Tempo

FF 51 03 tt tt tt

Indicates a tempo change. The 3 data bytes of tt tt tt are the tempo in microseconds per quarter note. In other words, the microsecond tempo value tells you how long each one of your sequencer’s “quarter notes” should be. For example, if you have the 3 bytes of 07 A1 20, then each quarter note should be 0x07A120 (or 500,000) microseconds long.

So, the MIDI file format expresses tempo as “the amount of time (ie, microseconds) per quarter note”.

BPM

Normally, musicians express tempo as “the amount of quarter notes in every minute (ie, time period)”. This is the opposite of the way that the MIDI file format expresses it.

When musicians refer to a “beat” in terms of tempo, they are referring to a quarter note (ie, a quarter note is always 1 beat when talking about tempo, regardless of the time signature. Yes, it’s a bit confusing to non-musicians that the time signature’s “beat” may not be the same thing as the tempo’s “beat” — it won’t be unless the time signature’s beat also happens to be a quarter note. But that’s the traditional definition of BPM tempo). To a musician, tempo is therefore always “how many quarter notes happen during every minute”. Musicians refer to this measurement as BPM (ie, Beats Per Minute). So a tempo of 100 BPM means that a musician must be able to play 100 steady quarter notes, one right after the other, in one minute. That’s how “fast” the “musical tempo” is at 100 BPM. It’s very important that you understand the concept of how a musician expresses “musical tempo” (ie, BPM) in order to properly present tempo settings to a musician, and yet be able to relate it to how the MIDI file format expresses tempo.

To convert the MIDI file format’s tempo (ie, the 3 bytes that specify the amount of microseconds per quarter note) to BPM:

BPM = 60,000,000/(tt tt tt)

For example, a tempo of 120 BPM = 07 A1 20.

So why does the MIDI file format use “time per quarter note” instead of “quarter notes per time” to specify its tempo? Well, its easier to specify more precise tempos with the former. With BPM, sometimes you have to deal with fractional tempos (for example, 100.3 BPM) if you want to allow a finer resolution to the tempo. Using microseconds to express tempo offers plenty of resolution.

Also, SMPTE is a time-based protocol (ie, it’s based upon seconds, minutes, and hours, rather than a musical tempo). Therefore it’s easier to relate the MIDI file’s tempo to SMPTE timing if you express it as microseconds. Many musical devices now use SMPTE to sync their playback.

PPQN Clock

A sequencer typically uses some internal hardware timer counting off steady time (ie, microseconds perhaps) to generate a software “PPQN clock” that counts off the timebase (Division) “ticks”. In this way, the time upon which an event occurs can be expressed to the musician in terms of a musical bar:beat:PPQN-tick rather than how many microseconds from the start of the playback. Remember that musicians always think in terms of a beat, not the passage of seconds, minutes, etc.

As mentioned, the microsecond tempo value tells you how long each one of your sequencer’s “quarter notes” should be. From here, you can figure out how long each one of your sequencer’s PPQN clocks should be by dividing that microsecond value by your MIDI file’s Division. For example, if your MIDI file’s Division is 96 PPQN, then that means that each of your sequencer’s PPQN clock ticks at the above tempo should be 500,000 / 96 (or 5,208.3) microseconds long (ie, there should be 5,208.3 microseconds in between each PPQN clock tick in order to yield a tempo of 120 BPM at 96 PPQN. And there should always be 96 of these clock ticks in each quarter note, 48 ticks in each eighth note, 24 ticks in each sixteenth, etc).

Note that you can have any timebase at any tempo. For example, you can have a 96 PPQN file playing at 100 BPM just as you can have a 192 PPQN file playing at 100 BPM. You can also have a 96 PPQN file playing at either 100 BPM or 120 BPM. Timebase and tempo are two entirely separate quantities. Of course, they both are needed when you setup your hardware timer (ie, when you set how many microseconds are in each PPQN tick). And of course, at slower tempos, your PPQN clock tick is going to be longer than at faster tempos.

MIDI Clock

MIDI clock bytes are sent over MIDI, in order to sync the playback of 2 devices (ie, one device is generating MIDI clocks at its current tempo which it internally counts off, and the other device is syncing its playback to the receipt of these bytes). Unlike with SMPTE frames, MIDI clock bytes are sent at a rate related to the musical tempo.

Since there are 24 MIDI Clocks in every quarter note, the length of a MIDI Clock (ie, time inbetween each MIDI Clock message) is the microsecond tempo divided by 24. In the above example, that would be 500,000/24, or 20,833.3 microseconds in every MIDI Clock. Alternately, you can relate this to your timebase (ie, PPQN clock). If you have 96 PPQN, then that means that a MIDI Clock byte must occur every 96 / 24 (ie, 4) PPQN clocks.

SMPTE

SMPTE counts off the passage of time in terms of seconds, minutes, and hours (ie, the way that non-musicians count time). It also breaks down the seconds into smaller units called “frames”. The movie industry created SMPTE, and they adopted 4 different frame rates. You can divide a second into 24, 25, 29, or 30 frames. Later on, even finer resolution was needed by musical devices, and so each frame was broken down into “subframes”.

So, SMPTE is not directly related to musical tempo. SMPTE time doesn’t vary with “musical tempo”.

Many devices use SMPTE to sync their playback. If you need to sychronize with such a device, then you may need to deal with SMPTE timing. Of course, you’re probably still going to have to maintain some sort of PPQN clock, based upon the passing SMPTE subframes, so that the user can adjust the tempo of the playback in terms of BPM, and can consider the time of each event in terms of bar:beat:tick. But since SMPTE doesn’t directly relate to musical tempo, you have to interpolate (ie, calculate) your PPQN clocks from the passing of subframes/frames/seconds/minutes/hours (just as we previously calculated the PPQN clock from a hardware timer counting off microseconds).

Let’s take the easy example of 25 Frames and 40 SubFrames. As previously mentioned in the discussion of Division, this is analogous to millisecond based timing because you have 1,000 SMPTE subframes per second. (You have 25 frames per second. Each second is divided up into 40 subframes, and you therefore have 25 * 40 subframes per second. And remember that 1,000 milliseconds are also in every second). Every millisecond therefore means that another subframe has passed (and vice versa). Every time you count off 40 subframes, a SMPTE frame has passed (and vice versa). Etc.

Let’s assume you desire 96 PPQN and a tempo of 500,000 microseconds. Considering that with 25-40 Frame-SubFrame SMPTE timing 1 millisecond = 1 subframe (and remember that 1 millisecond = 1,000 microseconds), there should be 500,000 / 1,000 (ie, 500) subframes per quarter note. Since you have 96 PPQN in every quarter note, then every PPQN ends up being 500 / 96 subframes long, or 5.2083 milliseconds (ie, there’s how we end up with that 5,208.3 microseconds PPQN clock tick just as we did above in discussing PPQN clock). And since 1 millisecond = 1 subframe, every PPQN clock tick also equals 5.2083 subframes at the above tempo and timebase.

Conclusions

BPM = 60,000,000/MicroTempo
MicrosPerPPQN = MicroTempo/TimeBase
MicrosPerMIDIClock = MicroTempo/24
PPQNPerMIDIClock = TimeBase/24
MicrosPerSubFrame = 1000000 * Frames * SubFrames
SubFramesPerQuarterNote = MicroTempo/(Frames * SubFrames)
SubFramesPerPPQN = SubFramesPerQuarterNote/TimeBase
MicrosPerPPQN = SubFramesPerPPQN * Frames * SubFrames

SMPTE Offset

FF 54 05 hr mn se fr ff

Designates the SMPTE start time (hours, minutes, secs, frames, subframes) of the MTrk. It should be at the start of the MTrk. The hour should not be encoded with the SMPTE format as it is in MIDI Time Code. In a format 1 file, the SMPTE OFFSET must be stored with the tempo map (ie, the first MTrk), and has no meaning in any other MTrk. The ff field contains fractional frames in 100ths of a frame, even in SMPTE based MTrks which specify a different frame subdivision for delta-times (ie, different from the subframe setting in the MThd).

Time Signature

FF 58 04 nn dd cc bb

Time signature is expressed as 4 numbers. nn and dd represent the “numerator” and “denominator” of the signature as notated on sheet music. The denominator is a negative power of 2: 2 = quarter note, 3 = eighth, etc. The cc expresses the number of MIDI clocks in a metronome click. The bb parameter expresses the number of notated 32nd notes in a MIDI quarter note (24 MIDI clocks). This event allows a program to relate what MIDI thinks of as a quarter, to something entirely different. For example, 6/8 time with a metronome click every 3 eighth notes and 24 clocks per quarter note would be the following event:

FF 58 04 06 03 18 08

Key Signature

FF 59 02 sf mi

sf = -7 for 7 flats, -1 for 1 flat, etc, 0 for key of c, 1 for 1 sharp, etc.

mi = 0 for major, 1 for minor

Proprietary Event

FF 7F len data…

This can be used by a program to store proprietary data. The first byte(s) should be a unique ID of some sort so that a program can identity whether the event belongs to it, or to some other program. A 4 character (ie, ascii) ID is recommended for such.

Note that len could be a series of bytes since it is expressed as a variable length quantity.


Errata

In a format 0 file, the tempo and time signature changes are scattered throughout the one MTrk. In format 1, the very first MTrk should consist of just the tempo and time signature events so that it could be read by some device capable of generating a “tempo map”. In format 2, each MTrk should begin with at least one initial tempo and time signature event.

NOTE: If there are no tempo and time signature events in a MIDI file, assume 120 BPM and 4/4.

RMID Files

The method of saving data in chunks (ie, where the data is preceded by an 8 byte header consisting of a 4 char ID and a 32-bit size field) is the basis for Interchange File Format. You should now read the article About Interchange File Format for background information.

As mentioned, MIDI File format is a “broken” IFF. It lacks a file header at the start of the file. One bad thing about this is that a standard IFF parsing routine will choke on a MIDI file (because it will expect the first 12 bytes to be the group ID, filesize, and type ID fields). In order to fix the MIDI File format so that it strictly adheres to IFF, Microsoft simply made up a 12-byte header that is prepended to MIDI files, and thereby came up with the RMID format. An RMID file begins with the group ID (4 ascii chars) of ‘R’, ‘I’, ‘F’, ‘F’, followed by the 32-bit filesize field, and then the type ID of ‘R’, ‘M’, ‘I’, ‘D’. Then, the chunks of a MIDI file follow (ie, the MThd and MTrk chunks). If you chop off the first 12 bytes of an RMID file, then you end up with a standard MIDI file.

Note that chunks within a MIDI file are not padded out (with an extra 0 byte) to an even number of bytes. I don’t know as if the RMID format corrects this aberration of the MIDI file format too.