Converting audio between the formats you need to listen to it, store it, transmit it, and analyze it can be migraine-inducing (at least for me), so here are some shortcuts that get all of that done.
Here is a runnable script demonstrating all the snippets below.
Libraries
There is a vibrant and confusing landscape of libraries and tools for working with audio in Python.
One pair that has worked well for me is soundfile and simpleaudio. They cover the tasks I need to do and have licenses that allow me to use them in code at work.
soundfile is an open source tool for manipulating audio files, released under the BSD 3-clause license. The source code is browsable.
simpleaudio is an MIT-licensed cross-platform library for playing audio, that is, for turning audio data into physical vibrations of the air.
import simpleaudio as sa
import soundfile as sf
Read an .mp3 into a Numpy array
audio_data, samplerate = sf.read(mp3_filename)
Write a Numpy array to an .mp3 file.
sf.write(mp3_filename, audio_data, samplerate)
Convert a Numpy array to a string of .mp3-formatted bytes.
import io

mp3_buf = io.BytesIO()
mp3_buf.name = 'file.mp3'
sf.write(mp3_buf, audio_data, samplerate)
mp3_buf.seek(0)  # Necessary for read() to return all bytes
mp3_bytes = mp3_buf.read()
Giving mp3_buf a name that ends in .mp3 is important because soundfile infers the file type from the filename extension.
Convert a string of .mp3-formatted bytes to a Numpy array.
new_mp3_buf = io.BytesIO(mp3_bytes)
new_mp3_buf.name = 'new_file.mp3'
new_audio_array, new_samplerate = sf.read(new_mp3_buf)
Convert a string of bytes to a base64-encoded ASCII string.
import base64

base64_mp3_bytes = base64.b64encode(mp3_bytes)
base64_mp3_string = base64_mp3_bytes.decode("ascii")
Convert a base64-encoded ASCII string to a string of bytes.
new_base64_mp3_bytes = base64_mp3_string.encode("ascii")
new_mp3_bytes = base64.b64decode(new_base64_mp3_bytes)
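The two base64 conversions are exact inverses, so wrapping them in a pair of small functions makes it easy to check that a round trip gives back the original bytes. This is a minimal sketch: the function names bytes_to_base64_string and base64_string_to_bytes are hypothetical, and the example bytes are arbitrary stand-ins rather than a real .mp3.

```python
import base64


def bytes_to_base64_string(raw_bytes: bytes) -> str:
    """Encode arbitrary bytes (e.g. .mp3 file contents) as an ASCII string."""
    return base64.b64encode(raw_bytes).decode("ascii")


def base64_string_to_bytes(b64_string: str) -> bytes:
    """Decode an ASCII base64 string back into the original bytes."""
    return base64.b64decode(b64_string.encode("ascii"))


# Arbitrary stand-in bytes; in practice these would be mp3_bytes.
example_bytes = b"ID3\x04\x00\xff\xfb\x90\x00"
encoded = bytes_to_base64_string(example_bytes)
decoded = base64_string_to_bytes(encoded)
```

The ASCII string is what you would drop into JSON or another text-only channel; the receiving side only needs base64_string_to_bytes to recover the exact bytes.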
Other file types
soundfile works the same way with other file types, including .wav, .flac, and .ogg.
Normalize Numpy data for simpleaudio playback
This scales the audio so that playback takes advantage of the full range of the playback volume. It doesn't push so hard that it overdrives, but it also doesn't leave range unused. Adapted from the simpleaudio docs:
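import numpy as np

# np.max/np.abs (rather than the builtin max) also handle stereo arrays
audio_array = audio_data * 32767 / np.max(np.abs(audio_data))
audio_array = audio_array.astype(np.int16)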
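The normalization divides by the loudest sample, so an all-silent recording would divide by zero. A slightly more defensive version, sketched here as a hypothetical helper called normalize_to_int16, reduces over every element (so mono and stereo arrays both work) and returns silence for silent input. It assumes floating-point input, as soundfile returns by default.

```python
import numpy as np


def normalize_to_int16(audio_data: np.ndarray) -> np.ndarray:
    """Scale audio so its loudest sample reaches the 16-bit limit, then cast.

    Works for mono (frames,) and multichannel (frames, channels) arrays,
    because np.max and np.abs operate on every element of the array.
    """
    peak = np.max(np.abs(audio_data))
    if peak == 0:
        # All-silent input: avoid dividing by zero and return silence.
        return np.zeros(audio_data.shape, dtype=np.int16)
    scaled = audio_data * 32767 / peak
    # astype truncates toward zero, so no value can exceed +/-32767.
    return scaled.astype(np.int16)
```

Using 32767 rather than 32768 keeps the positive and negative peaks symmetric and inside the int16 range.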
Play audio from a Numpy array
# simpleaudio.play_buffer(audio_data, num_channels, bytes_per_sample, sample_rate)
num_channels = 1 if audio_array.ndim == 1 else audio_array.shape[1]
play_obj = sa.play_buffer(audio_array, num_channels, 2, samplerate)  # 2 bytes per int16 sample
play_obj.wait_done()
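play_buffer needs the channel count and bytes per sample spelled out rather than read from the array. A hypothetical helper, play_buffer_args, can derive them from an int16 Numpy array. This is a sketch under the assumptions that the audio is already normalized to int16 and shaped (frames,) for mono or (frames, channels) otherwise. It also makes the array C-contiguous, to be safe, since play_buffer reads the raw buffer memory.

```python
import numpy as np


def play_buffer_args(audio_array: np.ndarray, samplerate: int):
    """Return (audio_data, num_channels, bytes_per_sample, sample_rate),
    the positional arguments for simpleaudio.play_buffer.

    Assumes int16 samples shaped (frames,) or (frames, channels).
    """
    if audio_array.dtype != np.int16:
        raise ValueError("expected int16 samples; normalize first")
    num_channels = 1 if audio_array.ndim == 1 else audio_array.shape[1]
    bytes_per_sample = audio_array.dtype.itemsize  # 2 for int16
    # Copies only if the array is not already C-contiguous.
    contiguous = np.ascontiguousarray(audio_array)
    return contiguous, num_channels, bytes_per_sample, int(samplerate)
```

Usage would look like `play_obj = sa.play_buffer(*play_buffer_args(audio_array, samplerate))` followed by `play_obj.wait_done()`.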