This document was created by editing together a series of emails and IRC logs.
This means that the language might seem a little weird in places, but it should
outline most of the thinking and design behind adding MIDI support to GStreamer
so far.

Authors of this document include:

Steve Baker <steve@stevebaker.org>
Leif Johnson <leif@ambient.2y.net>
Andy Wingo <wingo@pobox.com>
Christian Schaller <Uraeus@gnome.org>

About MIDI
----------

MIDI (Musical Instrument Digital Interface) is used mainly as a communications
protocol for devices in a music studio. These devices could be physical entities
(e.g. synthesizers, sequencers, etc.) or purely logical (e.g. sequencers or
filter banks implemented as software applications).

The MIDI specification essentially consists of a list of MIDI messages that can
be passed among devices; these messages (also referred to as "events") are
usually things like NoteOn (start playing a sound), NoteOff (stop playing a
sound), Clock (for synchronization), ProgramChange (for signaling an instrument
or program change), etc.

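To make the message list a bit more concrete, here is a minimal Python sketch
of how the first byte of a message (the status byte) identifies the message
type. The status values are the ones from the MIDI 1.0 specification; the
function and table names are made up for illustration and are not part of any
GStreamer API.

```python
# Status-byte values as defined by the MIDI 1.0 specification.
# The names MESSAGE_NAMES and describe() are hypothetical helpers.
MESSAGE_NAMES = {
    0x80: "NoteOff",
    0x90: "NoteOn",
    0xC0: "ProgramChange",
}
CLOCK = 0xF8  # system real-time message, carries no data bytes


def describe(message: bytes) -> str:
    """Return a short human-readable description of one MIDI message."""
    status = message[0]
    if status == CLOCK:
        return "Clock"
    kind = status & 0xF0      # upper nibble: message type
    channel = status & 0x0F   # lower nibble: channel 0..15
    name = MESSAGE_NAMES.get(kind, "Other")
    return f"{name} channel={channel} data={list(message[1:])}"
```

For example, describe(bytes([0x90, 60, 100])) describes a NoteOn for middle C
(note 60) with velocity 100 on the first channel.
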
MIDI is different from other cross-device or inter-process streaming methods
(e.g. JACK and possibly some CORBA implementations) because MIDI messages are
discrete and usually only exchanged a few times per second; the devices
involved are supposed to interpret the MIDI messages and produce sounds or
signals. The devices in a MIDI chain typically send their audio signals out on
separate (physical) cables or (logical) channels that have nothing to do with
the MIDI chain itself.

We want to have MIDI messages available in GStreamer pipelines because MIDI is a
common protocol in many existing studios, and MIDI is more or less a standard
for inter-device communications. With MIDI support in GStreamer we can look
forward to (a) controlling and being controlled by external devices like
keyboards and sequencers, and (b) synchronizing and communicating among multiple
applications on a studio computer.

GStreamer and MIDI
------------------

MIDI could be thought of in terms of dataflow as a sparse, non-constant flow of
bytes. GStreamer works best with near-constant data flow, so a MIDI stream would
probably have to consist mostly of filler events, sent at a constant tick rate.
It makes the most sense at this point to distribute MIDI events in a GStreamer
pipeline as a sequence of subclasses of GstEvent (GstMidiEvent or suchlike).

On-the-wire hardware MIDI connections run at a fixed data rate:

  The MIDI data stream is a unidirectional asynchronous bit stream at 31.25
  Kbits/sec. with 10 bits transmitted per byte (a start bit, 8 data bits, and
  one stop bit).

Which is to say, 3125 bytes/sec. I would assume that the rawmidi interface
already filters out the stop and start bits? Dunno. How about the diagram on
[1]? I found that to be useful. The MIDI specification is also available (though
I can't find it online at the moment ... might have to buy a copy), and there
are several tutorial and help pages (just google for MIDI tutorial).

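The arithmetic behind that figure, as a small Python sketch. The constant and
function names are ours; the only inputs are the 31.25 kbit/s rate and the 10
bits on the wire per data byte quoted above.

```python
# Wire parameters quoted above; the names are illustrative only.
BITS_PER_SECOND = 31_250
BITS_PER_BYTE_ON_WIRE = 10  # 1 start bit + 8 data bits + 1 stop bit


def bytes_per_second() -> float:
    """Maximum number of data bytes a hardware MIDI link can carry per second."""
    return BITS_PER_SECOND / BITS_PER_BYTE_ON_WIRE


def transmit_time_ms(num_bytes: int) -> float:
    """Time in milliseconds needed to put num_bytes on the wire."""
    return num_bytes * BITS_PER_BYTE_ON_WIRE / BITS_PER_SECOND * 1000.0
```

bytes_per_second() gives 3125, and a three-byte message such as a NoteOn takes
a little under one millisecond to transmit, which gives a feel for the timing
resolution of the hardware link.
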
There's another form of MIDI (the common usage?), "Standard MIDI files," which
essentially specify how to save and restore MIDI events in a file. We'll talk
about that in a bit.

[1] http://www.philrees.co.uk/#midi

MIDI and current Linux/Unix audio systems
-----------------------------------------

We don't know very much about the OSS MIDI interface; apparently there exists an
evil /dev/sequencer interface, and maybe a better /dev/midi* one. I only know
this from overhearing it from people. For latency reasons, the ALSA MIDI
interface will be much more solid than using these devices; however, the
/dev/midi* devices might be more of a cross-platform standard.

ALSA has a couple of ways to access MIDI devices. One way is the sequencer API.
There's a tutorial[1], and some example code[2] -- the paradigm is 'wait on some
event fds until you get an event, then process the event'. Not very
GStreamer-like. This API timestamps the events, much like Standard MIDI files.

The other way to use MIDI with ALSA is with the rawmidi interface. There is a
canonical reference[3] and example code, too[4]. This is much more like
GStreamer. I do wonder about the ability to connect to other sequencer clients,
though...

[1] http://www.suse.de/~mana/alsa090_howto.html#sect04
[2] http://www.suse.de/~mana/seqdemo.c
[3] http://www.alsa-project.org/alsa-doc/alsa-lib/rawmidi.html#rawmidi
[4] http://www.alsa-project.org/alsa-doc/alsa-lib/_2test_2rawmidi_8c-example.html#example_test_rawmidi

Getting MIDI into GStreamer
---------------------------

All buffers are timestamped, and MIDI buffers should be no exception. A buffer
with MIDI data will have a timestamp which says exactly when the data should be
played. In some cases this would mean a buffer contains just a couple of bytes
(e.g. NoteOn). If this turns out to be inefficient we can deal with that later.

In addition to integrating more tightly with GStreamer audio pipelines (see the
dparams and midi2pcm sections below), there are several elements that we will
need for basic MIDI interaction in GStreamer. These basics include file parsing
and encoding (is that the opposite of parsing?), and direct hardware input and
output. We'll also probably need a more generic sequencer interface for defining
elements that are capable of sending and receiving this type of nonlinear stream
information.

For these tasks, we need to define some MIME types, some general properties, and
some MIDI elements for GStreamer.

Types:

- MIDI being passed to/from a file: audio/midi (This is in my midi.types
  file, associated with a .midi or .mid extension. It seems analogous to a .wav
  file, which contains "audio/x-wav" type information.)

- MIDI in a pipeline: audio/x-gst-midi ?

Properties:

- tick rate: (default to 96.0, or something like that) This is measured in
  ticks per quarter note (or "pulses per quarter note" (ppqn) to be picky). We
  should use float for this so we can support nonstandard (fractional)
  tempos.

- tempo: (default to 120 bpm) This can be measured in bpm (beats per minute,
  the musician's viewpoint), or mpq (microseconds per quarter note, the unit
  used in a MIDI file[1]). Seems like we might want a query format for these
  different units? Or maybe we should just use MPQ and leave it to apps to do
  conversions?

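The unit conversions implied by these two properties are simple enough to
sketch in Python. This assumes one beat equals one quarter note (which is how
bpm and mpq are usually related) and uses the 96 ppqn and 120 bpm defaults
suggested above; the function names are hypothetical and nothing here is an
existing GStreamer call.

```python
# Sketch of the tempo/tick conversions; assumes one beat == one quarter note.
MICROSECONDS_PER_MINUTE = 60_000_000


def bpm_to_mpq(bpm: float) -> float:
    """Beats per minute -> microseconds per quarter note."""
    return MICROSECONDS_PER_MINUTE / bpm


def mpq_to_bpm(mpq: float) -> float:
    """Microseconds per quarter note -> beats per minute."""
    return MICROSECONDS_PER_MINUTE / mpq


def ticks_to_nanoseconds(ticks: float, mpq: float, ppqn: float = 96.0) -> int:
    """Convert a tick count to a nanosecond timestamp.

    One tick lasts mpq / ppqn microseconds, i.e. mpq * 1000 / ppqn nanoseconds.
    """
    return round(ticks * mpq * 1000 / ppqn)
```

At the default 120 bpm, bpm_to_mpq(120) is 500000, so one quarter note (96
ticks) lasts half a second. Whichever unit the property ends up using, an app
only needs the first two functions to present the other one.
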
Elements:

- midiparse: audio/midi -> audio/x-gst-midi

  This element converts MIDI information from a file into MIDI signals that can
  be passed around in GStreamer pipelines. This would parse so-called Standard
  MIDI files (and XML format MIDI files?). Standard MIDI files are just
  timestamped MIDI data; they don't run at a constant bitrate, and for that
  reason you need this element.

  The timestamps that this element produces would be based on the tempo
  property, and the time deltas of the MIDI file data. If no data exists for a
  given tick, the element can just send a filler event.

  The element should support both globbing and streaming the file. Streaming it
  is the most GStreamerish way of handling it, but there are MIDI file formats
  which are by definition unstreamable, therefore a MIDI plugin needs to support
  streaming and globbing - and globbing might be easiest to implement first. The
  modplug plugin also reads an entire file before playing, so it's a valid
  technique.

- ossmidisink: audio/x-gst-midi -> hardware

  Could be added to the existing OSS plugin dir; sends MIDI data to the OSS MIDI
  sequencer device (/dev/midi). Makes extensive use of GstClock to send out data
  only when the buffer/event timestamp says it should. (Could instead use the
  raw MIDI device for clocking, doesn't matter which.)

- alsamidisink: audio/x-gst-midi -> ALSA rawmidi API

  Guess what this does. Don't know whether ALSA's sequencer interface would be
  better than its rawmidi one. Probably rawmidi?

- ossmidisrc, alsamidisrc: hardware -> audio/x-gst-midi

  Real-time MIDI input. This needs to be from the rawmidi APIs.

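As a rough illustration of what midiparse would do with the time deltas, here
is a Python sketch that turns (delta ticks, message) pairs into timestamped
events, optionally emitting a filler for every tick on which nothing happens.
It assumes a single fixed tempo for the whole file (real MIDI files can change
tempo mid-stream), and FILLER and timestamp_events are made-up names, not a
proposed API.

```python
from typing import Iterable, Iterator, Optional, Tuple

FILLER = None  # hypothetical stand-in for a "nothing happened on this tick" event


def timestamp_events(
    events: Iterable[Tuple[int, bytes]],
    mpq: float = 500_000.0,   # microseconds per quarter note (120 bpm)
    ppqn: float = 96.0,       # ticks per quarter note
    fill: bool = False,
) -> Iterator[Tuple[int, Optional[bytes]]]:
    """Yield (timestamp_ns, message) for each (delta_ticks, message) input pair.

    With fill=True, a (timestamp_ns, FILLER) pair is also yielded for every
    tick that carries no message, so the output has a constant tick rate.
    """
    ns_per_tick = mpq * 1000.0 / ppqn
    tick = 0
    last_emitted_tick = -1
    for delta_ticks, message in events:
        tick += delta_ticks
        if fill:
            # Fill every silent tick since the last tick we emitted something for.
            for silent in range(last_emitted_tick + 1, tick):
                yield round(silent * ns_per_tick), FILLER
        yield round(tick * ns_per_tick), message
        last_emitted_tick = tick
```

With fill=False this gives the sparse stream; with fill=True it gives the
"mostly filler" stream discussed earlier, which makes the cost of that approach
easy to measure on a real file before committing to it.
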
It seems like we could implement a class hierarchy around these elements. We
could use a GstMidiElement superclass, which would include the above properties
and contain utility functions for things like reading from the clock and
converting between different time measurement units. From this element we ought
to have GstMidiSource, GstMidiFilter, and GstMidiSink parent classes. (Maybe
that's overkill?) Each of the MIDI elements listed above could then inherit
from an appropriate superclass.

We also need an interface (GstSequencer) to allow multiple implementations for
one of the most common MIDI tasks (duh, sequencing). The midisinks could
implement this interface, as could other soft-sequencer elements like
playondemand. The sequencer interface needs to be able to support MIDI
sequencing tasks, but it should support more generic sequencing concepts.

As you might have guessed, getting MIDI support into GStreamer is thus a matter
of (a) creating a series of elements that handle MIDI data, and (b) creating a
sort of MIDI library (like Timidity?) that basically includes #defines for MIDI
message codes and stuff like that. This stuff should be coded in the gst-plugins
module, under gst-libs/gst/sequencer (for the interface) and
gst-libs/gst/audio/midi/ (for the defines and superclasses).

Of course, this is just the basics ... read on for the really gory future stuff.
:)

[1] http://www.borg.com/~jglatt/tech/midifile/ppqn.htm

Looking ahead
-------------

- MIDI to PCM

It would be nice to be able to transform MIDI (audio/midi) to audio
(audio/x-raw-{int|float}), which could be further processed in a GStreamer
pipeline. In effect this would be using GStreamer as some kind of softsynth.

The first way to do this would be to send MIDI data to softsynths and get audio
data out. There's a very, very nice way of doing this in ALSA (the sequencer
API). Timidity can already register itself as a sequencer client, as can
amSynth, AlsaModularSynth, SpiralSynth, etc... and these latter ones are *much*
more interesting. This is the proper, IMHO, way of doing things.

But the other question is getting that data back for use by GStreamer. In that
sense a librafied Timidity would be useful, I guess... see, the thing is that
all of these sequencer clients probably want to output to the sound card
directly, although they are configurable. In this, the musician's only hope is
Jack. If the synth is jacked up, we can get its output back into GStreamer. If
not, oh well, it's gone ...

- MIDI to dparams

Once we have MIDI streams, we can start doing fun things like writing a
midi2dparams element which would map MIDI data to control the dynamic parameters
of other elements, but let's not get ahead of ourselves.

Which gets back to MIDI. MIDI is a representation of control signals. So all you
need are elements to convert that representation to control signals. In
addition, you'd probably want something like SuperCollider's Voicer element --
see [1] for more information on that.

All of this is pretty specific to a synthesizer system, and rightly so: if
multiple projects use it, it could go in some kind of library, but otherwise it
can stay in individual projects.

[1] http://www.audiosynth.com/schtmldocs/Help/Unit_Generators/Spawners/Voicer.help.html

On using dparams for MIDI
-------------------------

You might want to look into using dparams if:

- you want your control parameters to change at a higher rate than your buffer
  rate (think zipper noise and sample-granularity interpolation)
- you want a better way to store and represent control data than MIDI files

We wrote a linear-interpolation, time-aware dparam so that we could really
demonstrate what they're good for.

It was always the intention for dparams to be able to send values to and get
values from pads. All we need is some simple elements to do the forwarding.

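To show what sample-granularity interpolation buys you, here is a tiny Python
sketch of a linear ramp across one audio buffer. This is only an illustration
of the idea; it is not the dparams API, and the function names are invented.

```python
from typing import List


def interpolate_control(start_value: float, end_value: float,
                        num_samples: int) -> List[float]:
    """Return one control value per sample, ramping linearly start -> end."""
    if num_samples <= 0:
        return []
    if num_samples == 1:
        return [float(end_value)]
    step = (end_value - start_value) / (num_samples - 1)
    return [start_value + step * i for i in range(num_samples)]


def apply_gain(samples: List[float], gains: List[float]) -> List[float]:
    """Multiply each audio sample by its own per-sample gain value."""
    return [sample * gain for sample, gain in zip(samples, gains)]
```

If a volume control jumps from 0.0 to 1.0 between two buffers, applying the new
value to the whole buffer at once produces an audible step (zipper noise),
whereas apply_gain(buffer, interpolate_control(0.0, 1.0, len(buffer))) spreads
the change over every sample in the buffer.
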
Possible inefficiency remedy: GstControlPad
-------------------------------------------

If it turns out that sending MIDI events spaced out with filler (blank) events
isn't efficient enough, we'll need to look into implementing something new; for
now, though, we'll just try the simple approach and hope our CPUs are fast
enough. But read on for a little brainstorming.

It seems like GStreamer could benefit from a different subclass of GstPad,
something like GstControlPad. Pads of this type could contain control data like
parameters for oscillators/filters, MIDI events, text information for subtitles,
etc. The defining characteristic of this type of data is that it operates at a
much lower sample rate than the multimedia data that GStreamer currently
handles.

I think that control data can be sent down existing pads without making any
changes.

GstControlPad instances could also contain a default value like Wingo has been
pondering, so apps wouldn't need to connect actual data to the pads if the
default value sufficed. There could also be some sweet integration with dparams,
it seems like.

If you want a default value on a control pad, just make the source element send
the value when the state changes.

Elements that have control pads could also have standard GstPads, and I'd
imagine there would need to be some scheduler modifications to enable the lower
processing demands of control pads.

An example: integrating amSynth[1]
----------------------------------

We would want to be able to write amSynth as a plugin. This would require that
when the process function is called, we have a MIDI buffer as input, containing
however many MIDI events occurred in, say, 1/100 sec, and then we generate an
audio buffer of the same duration.

Maybe this will indicate the kind of problems to be faced. GStreamer has solved
this problem for audio/video syncing, so you should probably do it the same way.
The first task would be to make this pipeline work:

  filesrc location=foo.mid ! midiparse ! amSynth ! osssink

midiparse will take MIDI file data as an input, and produce timestamped MIDI
buffers as output. It could have a beats-per-minute property as mentioned above
to specify how the MIDI beat offsets are converted to timestamps.

An amSynth element should be a loop element. It would read MIDI buffers until it
has more than enough to produce audio for the duration of one audio buffer. It
knows it has enough MIDI buffers by looking at the timestamps. Because amSynth
is setting the timestamps on the audio buffers going out, the sink element
downstream would know when to play them. Once this is working, a more
challenging pipeline might be:

  alsamidisrc ! amSynth ! alsasink

This would be a real-time pipeline: any MIDI input should instantly be
transformed into audio. You would have small audio buffers for low latency (64
samples seems to be typical).

This is a problem for amSynth because it can't sit there waiting for more MIDI
just in case there is more than one MIDI event per audio buffer. In this case
you could either:

- listen to the clock so you know when it's time to output the buffer
- have some kind of real-time mode for amSynth which doesn't wait for MIDI
  events which may never come
- have alsamidisrc produce empty timestamped MIDI buffers so that amSynth knows
  that it is time to spit out some audio.

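The loop-element behaviour can be sketched in a few lines of Python. This is a
toy model, not GStreamer code: MIDI buffers are (timestamp, message) pairs
sorted by timestamp, a message of None stands for one of the empty timestamped
buffers from the last option above, and split_into_audio_buffers is an invented
name.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

MidiEvent = Tuple[int, str]


def split_into_audio_buffers(
    midi_events: Iterable[Tuple[int, Optional[str]]],
    buffer_duration_ns: int,
) -> Iterator[Tuple[int, List[MidiEvent]]]:
    """Group timestamped MIDI events by the audio buffer they fall into.

    Yields (buffer_start_ns, events) as soon as an incoming timestamp proves
    that no more events can arrive for that audio buffer.
    """
    buffer_start = 0
    pending: List[MidiEvent] = []
    for timestamp_ns, message in midi_events:
        # A timestamp at or past the end of the current buffer means the
        # current buffer is complete, so it can be synthesized and pushed.
        while timestamp_ns >= buffer_start + buffer_duration_ns:
            yield buffer_start, pending
            pending = []
            buffer_start += buffer_duration_ns
        if message is not None:  # empty buffers only advance time
            pending.append((timestamp_ns, message))
    if pending:
        yield buffer_start, pending
```

Note how the empty buffer is what lets the synth release audio during silence:
without it, nothing is yielded until the next real event shows up, which is
exactly the waiting problem described above.
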
[1] http://amsynthe.sourceforge.net/amSynth/index.html

Extended MIDI files: .kar and karaoke
--------------------------------------

KAR files are standard MIDI files that also contain a stream with lyrics, for
karaoke, synchronised with the music. MIDI players play them without any
problem, ignoring the additional data.

It is the most widespread karaoke file format (another being .kok files, for
MP3).

KAR files are based on standard MIDI files with the following additional events:

  The KAR text meta events start with an @ followed by a character indicating
  the type of KAR text meta event, then followed by text for that event. The
  following text meta events occur embedded in regular MIDI text events:

  FileType:    @KMIDI KARAOKE FILE
  Version:     @V0100
  Information: @I<text>
  Language:    @LENGL
  Title 1:     @T<title>
  Title 2:     @T<author>
  Title 3:     @T<copyright>

  The following lyric text indicators are defined. A \ (backslash) in the
  text is to clear the screen. A / (forward slash) in the text is a line feed
  (next line).

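A minimal Python sketch of how an element might interpret these text events,
based only on the rules listed above. The tag table mirrors that list; the
"Unknown" and "Lyric" labels and both function names are our own invention.

```python
from typing import List, Tuple

# Type characters taken from the list above.
KAR_TAGS = {
    "K": "FileType",
    "V": "Version",
    "I": "Information",
    "L": "Language",
    "T": "Title",
}


def parse_kar_text(text: str) -> Tuple[str, str]:
    """Classify one MIDI text event as a KAR meta event or a lyric fragment."""
    if text.startswith("@") and len(text) >= 2:
        return KAR_TAGS.get(text[1], "Unknown"), text[2:]
    return "Lyric", text


def render_lyrics(fragments: List[str]) -> List[List[str]]:
    """Turn lyric fragments into screens, each screen being a list of lines."""
    screens: List[List[str]] = [[""]]
    for fragment in fragments:
        for char in fragment:
            if char == "\\":            # backslash: clear the screen
                if screens[-1] != [""]:
                    screens.append([""])
            elif char == "/":           # forward slash: next line
                screens[-1].append("")
            else:
                screens[-1][-1] += char
    return screens
```

For instance, parse_kar_text("@V0100") yields the Version tag with the text
"0100", and the fragments "\Hel", "lo ", "/wor", "ld" render as a single screen
with two lines.
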
Some more info on the data format can be found at these locations:

http://www.krazykats-karaoke.co.uk/file_formats.html
http://www.wotsit.org/download.asp?f=kar
http://filext.com/detaillist.php?extdetail=KAR

Some Linux players that handle this format:

http://lmuse.sourceforge.net/files.php
http://sourceforge.net/projects/gkaraoke/