The audio node adds the possibility to work with digital audio in Verse applications. Audio in Verse is represented as raw, uncompressed, pulse-code modulated (PCM) samples. This simply means that audio is described by providing a linear sequence of amplitude values, called samples, and a desired replay frequency. This way of describing audio digitally is fairly standardized and well supported by typical hardware.
| Terminology |
|---|
| The word sample is used to denote a single numerical value in a PCM sequence; it is not used to refer to the whole sound. |
Verse audio is always monaural; a single sound cannot be in stereo. This better mimics the properties of real-world audio sources, which are considered to be points in space from which audio is emitted. It is up to whatever software is used to "render", i.e. play, the audio to create different versions for a human listener's left and right ears, if so desired.
Audio, in both buffers and streams, is represented as uncompressed PCM (pulse-code modulation) data, meaning linear sequences of digital values that represent the audio amplitude at some point in time. The two main parameters that control the quality of the sound then become the sampling frequency, i.e. how many such values exist per unit of time, and the sampling resolution, i.e. how many bits are used to represent each value. Verse supports arbitrary sampling frequencies, and six different sample formats of both fixed-point (integer) and floating-point varieties.
Verse supports samples in the following formats:
| Format | Description |
|---|---|
| VN_A_BLOCK_INT8 | 8-bit signed integers. The most space-efficient format supported, but not a very high-quality one. |
| VN_A_BLOCK_INT16 | 16-bit signed integers. Perhaps the format that is most commonly used in low to medium-end audio applications, such as typical games on PCs and consoles. |
| VN_A_BLOCK_INT24 | 24-bit signed integers. Since most general-purpose CPUs don't have a native 24-bit integer data type, these are represented as three unsigned 8-bit bytes, stored in big-endian order without any padding. |
| VN_A_BLOCK_INT32 | 32-bit signed integers, for added precision. |
| VN_A_BLOCK_REAL32 | 32-bit floating point numbers, stored in IEEE-754 big-endian format. For high-end processing applications. |
| VN_A_BLOCK_REAL64 | 64-bit floating point numbers, stored in IEEE-754 big-endian format. For very high-end processing applications. |
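Because the 24-bit format has no native C type, a client has to assemble each sample from its three bytes by hand. The following is a minimal sketch of one way to do that, assuming the byte layout described in the table (most significant byte first, no padding). The helper names `decode_int24_be()` and `encode_int24_be()` are hypothetical and are not part of the Verse API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one 24-bit signed sample, stored as three big-endian bytes
 * without padding, into a 32-bit signed integer.
 * Hypothetical helper, not part of the Verse API. */
static int32_t decode_int24_be(const uint8_t *bytes)
{
    uint32_t raw;

    assert(bytes != NULL);
    raw = ((uint32_t) bytes[0] << 16) |
          ((uint32_t) bytes[1] << 8) |
          (uint32_t) bytes[2];
    /* Bit 23 is the sign bit; subtract 2^24 to sign-extend. */
    if (raw & 0x800000u)
        return (int32_t) raw - 0x1000000;
    return (int32_t) raw;
}

/* Encode a value in the 24-bit signed range as three big-endian bytes.
 * Hypothetical helper, not part of the Verse API. */
static void encode_int24_be(int32_t value, uint8_t *bytes)
{
    uint32_t raw;

    assert(bytes != NULL);
    assert(value >= -8388608 && value <= 8388607);
    raw = (uint32_t) value & 0xFFFFFFu;   /* keep the low 24 bits */
    bytes[0] = (uint8_t) (raw >> 16);
    bytes[1] = (uint8_t) (raw >> 8);
    bytes[2] = (uint8_t) raw;
}
```

Decoding into a 32-bit integer keeps the arithmetic simple, at the cost of one unused byte per sample in memory.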
Samples are always transmitted and received collected into blocks; it is not possible to send a single sample value. This coarser granularity helps reduce the overhead of transmitting bulk data such as audio. The number of samples in a single block depends on the chosen sample format, according to the following table:
| Format | Block Size |
|---|---|
| VN_A_BLOCK_INT8 | 1024 |
| VN_A_BLOCK_INT16 | 512 |
| VN_A_BLOCK_INT24 | 384 |
| VN_A_BLOCK_INT32 | 256 |
| VN_A_BLOCK_REAL32 | 256 |
| VN_A_BLOCK_REAL64 | 128 |
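The table above can be captured in a small lookup, which is handy when allocating storage for a block. This is only a sketch: the `SampleFormat` enum and its `FMT_*` members are local stand-ins for the `VN_A_BLOCK_*` constants, and the function names are hypothetical rather than taken from the Verse API. The per-sample byte sizes follow from the bit widths in the format table.

```c
#include <stddef.h>

/* Local stand-ins for the VN_A_BLOCK_* formats (assumed names). */
typedef enum {
    FMT_INT8, FMT_INT16, FMT_INT24, FMT_INT32, FMT_REAL32, FMT_REAL64,
    FMT_COUNT
} SampleFormat;

/* Number of samples in one block, as listed in the table above.
 * Returns 0 for an unknown format. */
static size_t block_sample_count(SampleFormat fmt)
{
    static const size_t counts[FMT_COUNT] = { 1024, 512, 384, 256, 256, 128 };

    return (unsigned int) fmt < (unsigned int) FMT_COUNT ? counts[fmt] : 0;
}

/* Size of one sample in bytes, derived from the bit width of each format.
 * Returns 0 for an unknown format. */
static size_t sample_byte_size(SampleFormat fmt)
{
    static const size_t sizes[FMT_COUNT] = { 1, 2, 3, 4, 4, 8 };

    return (unsigned int) fmt < (unsigned int) FMT_COUNT ? sizes[fmt] : 0;
}

/* Number of payload bytes needed to hold one full block of samples. */
static size_t block_byte_size(SampleFormat fmt)
{
    return block_sample_count(fmt) * sample_byte_size(fmt);
}
```

With these figures, a block of any format occupies 1024 bytes of sample data, except the 24-bit format, which occupies 1152 bytes.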
The sampling frequency associated with audio data is specified using a 64-bit floating point number, and is expressed in Hertz (Hz). So, for CD-quality audio, you would use VN_A_BLOCK_INT16 samples at a frequency of 44,100.0 Hz.
The audio node provides two distinct kinds of support for handling audio data:
- Buffers, which store but cannot play audio samples.
- Streams, which play but cannot efficiently store audio samples.

Because the number of samples per block varies with the block's format, and the playback duration of a single sample varies with the frequency (at N Hz, N samples are played per second), a block of audio has no fixed playback duration. The duration always depends on both the sample format and the sampling frequency.
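For a given format and frequency, the playback time of one block follows directly from the relationship above: the number of samples in the block divided by the number of samples played per second. A minimal sketch of that arithmetic, using hypothetical helper names that are not part of the Verse API:

```c
#include <stddef.h>

/* Playback time, in seconds, of one block holding samples_per_block
 * samples at the given sampling frequency. Returns 0.0 if the frequency
 * is not positive. Hypothetical helper, not part of the Verse API. */
static double block_duration_seconds(size_t samples_per_block,
                                     double frequency_hz)
{
    if (frequency_hz <= 0.0)
        return 0.0;
    return (double) samples_per_block / frequency_hz;
}

/* Number of blocks that must be delivered per second to sustain
 * continuous playback. Returns 0.0 if the block is empty. */
static double blocks_per_second(size_t samples_per_block,
                                double frequency_hz)
{
    if (samples_per_block == 0)
        return 0.0;
    return frequency_hz / (double) samples_per_block;
}
```

For the CD-quality case mentioned earlier, `block_duration_seconds(512, 44100.0)` gives roughly 11.6 milliseconds per block, so continuous playback needs about 86 blocks per second.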
Buffers, as in the text node, are used to store audio data for editing. A buffer is simply a named container that can hold blocks of samples. Each such block is given an index, simply an integer that tells you the location of the block in the buffer as a whole. There can be gaps in the index sequence; these represent silence.
The intended use for audio buffers is creating audio editing applications; they provide a host-side "back end" for audio storage. By storing the samples as blocks of the same size used by streams (see below), the transition from passive storage to active playback can be made easier.
The following image (Figure 2-5) illustrates how buffer blocks form a sequence, and that there can be gaps where the data is "clear", i.e. the amplitude is zero and the audio silent. The vertical lines at regular intervals illustrate block boundaries (these blocks are 128 samples each), and the digits below are the block indices of each. Note how they start off at zero, and how the index of the silent block is still a valid index.
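One way a client might mirror such a buffer locally is to store blocks in a table indexed by block number and treat missing entries as silence. The sketch below assumes 16-bit samples (512 per block) and a small fixed capacity; the `SparseBuffer` type and all function names are hypothetical illustrations, not part of the Verse API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SAMPLES 512   /* block size for 16-bit samples, per the table */
#define MAX_BLOCKS    8     /* arbitrary capacity chosen for this sketch */

/* Local mirror of an audio buffer; blocks that are not present are gaps. */
typedef struct {
    int16_t samples[MAX_BLOCKS][BLOCK_SAMPLES];
    bool    present[MAX_BLOCKS];
} SparseBuffer;

/* Mark every block as absent, i.e. the whole buffer is silent. */
static void buffer_init(SparseBuffer *buf)
{
    memset(buf, 0, sizeof *buf);
}

/* Store one full block at the given block index.
 * Returns false if the index is outside this sketch's capacity. */
static bool buffer_set_block(SparseBuffer *buf, size_t index,
                             const int16_t *samples)
{
    if (index >= MAX_BLOCKS)
        return false;
    memcpy(buf->samples[index], samples, sizeof buf->samples[index]);
    buf->present[index] = true;
    return true;
}

/* Read the sample at an absolute sample position. Positions that fall
 * in a gap, or beyond the stored range, read as zero amplitude. */
static int16_t buffer_sample_at(const SparseBuffer *buf, size_t position)
{
    size_t index = position / BLOCK_SAMPLES;    /* which block */
    size_t offset = position % BLOCK_SAMPLES;   /* where inside it */

    if (index >= MAX_BLOCKS || !buf->present[index])
        return 0;
    return buf->samples[index][offset];
}
```

Reading through a gap then needs no special handling by the caller: setting blocks 0 and 2 and leaving block 1 unset yields zeros for sample positions 512 through 1023.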
Streams are simply independent "places" where audio data can be sent for playback. It might be helpful to think of them as channels of a "radio" (the node), each of which can be subscribed to individually.
Unlike almost all other data in Verse, stream data is transferred in an unreliable way; any dropped audio commands will not be resent. This is because the commands are intended to contain data that is to be replayed imminently, so there would not be enough time to resend them.
Data in streams arrives in time-stamped blocks, one per command. The size of the blocks, in number of samples, varies with the data type chosen as per the table above. For network data encoding/decoding reasons, each block specifies its data type and sample frequency. Varying these within a single stream is not recommended, however, since that creates rather complex problems during replay.
The timestamp in each block lets the receiver know when that block is supposed to start playing. A typical stream playing client will put the block in a queue, sorted on the timestamp. Blocks are then de-queued and played, possibly employing some kind of double buffering scheme, as the current time reaches that of the block's timestamp.
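The queueing scheme described above can be sketched as a small array kept sorted by timestamp. This is an illustration only: timestamps are assumed here to be plain seconds held in a `double`, which may differ from the encoding Verse actually uses, and `QueuedBlock`, `BlockQueue`, and the function names are hypothetical. The `block_id` field stands in for the real sample payload.

```c
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_CAPACITY 16   /* arbitrary capacity chosen for this sketch */

typedef struct {
    double timestamp;   /* assumed: start time in seconds */
    int    block_id;    /* placeholder for the block's sample data */
} QueuedBlock;

typedef struct {
    QueuedBlock items[QUEUE_CAPACITY];  /* kept sorted, earliest first */
    size_t      count;
} BlockQueue;

/* Insert a block so that the queue stays sorted on timestamp.
 * Returns false if the queue is full. */
static bool queue_insert(BlockQueue *q, QueuedBlock block)
{
    size_t i;

    if (q->count >= QUEUE_CAPACITY)
        return false;
    /* Shift later blocks up one slot to make room. */
    for (i = q->count; i > 0 && q->items[i - 1].timestamp > block.timestamp; i--)
        q->items[i] = q->items[i - 1];
    q->items[i] = block;
    q->count++;
    return true;
}

/* Remove the earliest block if its start time has been reached.
 * Returns false if the queue is empty or nothing is due yet. */
static bool queue_pop_due(BlockQueue *q, double now, QueuedBlock *out)
{
    size_t i;

    if (q->count == 0 || q->items[0].timestamp > now)
        return false;
    *out = q->items[0];
    for (i = 1; i < q->count; i++)
        q->items[i - 1] = q->items[i];
    q->count--;
    return true;
}
```

A playback loop would call `queue_insert()` for each arriving block and poll `queue_pop_due()` with the current time. Because delivery is unreliable and unordered, sorted insertion lets blocks that arrive out of order still play in timestamp order.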