Finding the source of a crash in Pyodide

I recently started experimenting with Pyodide, a project that brings Python to the web by compiling the Python interpreter to WebAssembly using Emscripten. I was able to create a small web application that applies a band-pass filter to an audio file.

This works just fine, except that it crashed when I tried to process Ogg/Vorbis files. The file I was trying to process is just a file from OpenGameArt.org. Since this is running in the browser sandbox, what actually happened was that the Pyodide interpreter seemed to lose its mind and do all sorts of unexpected things. I found this error in the console of my browser:

The browser console indicates "memory access out of bounds"

Since this is just Python code, what I decided to do was to reproduce this outside of Pyodide. I came up with this script to produce the same outcome

crash_soundfile.py

import soundfile as sf
import sys
import os
import numpy as np
audio_subtype = os.environ.get('AUDIO_SUBTYPE', 'VORBIS')
audio_format = 'OGG'
channel_cnt = 2
sample_rate = 44100

dtype = os.environ.get('DTYPE','float64')
sample_cnt = int(os.environ.get('SAMPLE_CNT', '11235535'))

if len(sys.argv) == 2:
  with sf.SoundFile(sys.argv[1]) as audioin:
    channels = audioin.read(dtype=dtype, always_2d=True)
    print("read %d samples" % (len(channels),))
else:
  channels = np.zeros((sample_cnt, channel_cnt), dtype=dtype)


with sf.SoundFile('crash.ogg', mode='w', samplerate=sample_rate, channels=channel_cnt, subtype=audio_subtype, format=audio_format) as audioout:
  audioout.write(channels)

This script can be used to read and write an audio file, or to just write an audio file containing nothing but total silence. Running this produces a segmentation fault

$ python3 crash_soundfile.py /home/ericu/Downloads/Path\ to\ Lake\ Land.ogg 
read 11235535 samples
Segmentation fault

This seems to be roughly equivalent to the problem seen when running under Pyodide in the browser. This output alone is not very informative. To get more information, I ran the script under gdb and also passed the -X faulthandler option to Python, which makes the interpreter dump the Python traceback when it receives a fatal signal such as SIGSEGV.

$ gdb -ex "file $(which python3)" -ex "run -X faulthandler crash_soundfile.py /home/ericu/Downloads/Path\ to\ Lake\ Land.ogg"

(No debugging symbols found in /home/ericu/tmp/soundfileenv/bin/python3)
Starting program: /home/ericu/tmp/soundfileenv/bin/python3 -X faulthandler crash_soundfile.py /home/ericu/Downloads/Path\ to\ Lake\ Land.ogg
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
Downloading separate debug info for /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/numpy/_core/_multiarray_umath.cpython-312-aarch64-linux-gnu.so
Downloading separate debug info for /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/numpy/_core/../../numpy.libs/libscipy_openblas64_-71e1b124.so
Downloading separate debug info for /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/numpy/_core/../../numpy.libs/libgfortran-daac5196-038a5e3c.so.5.0.0
[New Thread 0xffffed3ff180 (LWP 89749)]                                                                                              
[New Thread 0xffffecbef180 (LWP 89750)]
[New Thread 0xffffec3df180 (LWP 89751)]
Downloading separate debug info for /usr/lib/python3.12/lib-dynload/_contextvars.cpython-312-aarch64-linux-gnu.so
Downloading separate debug info for /usr/lib/python3.12/lib-dynload/_ctypes.cpython-312-aarch64-linux-gnu.so                         
Downloading separate debug info for /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/numpy/linalg/_umath_linalg.cpython-312-aarch64-linux-gnu.so
Downloading separate debug info for /usr/lib/python3.12/lib-dynload/_bz2.cpython-312-aarch64-linux-gnu.so                            
Downloading separate debug info for /usr/lib/python3.12/lib-dynload/_lzma.cpython-312-aarch64-linux-gnu.so                           
Downloading separate debug info for /lib/aarch64-linux-gnu/liblzma.so.5                                                              
Downloading separate debug info for /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_soundfile_data/libsndfile_arm64.so    
read 11235535 samples                                                                                                                

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000ffffeb170c04 in _preextrapolate_helper ()
   from /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_soundfile_data/libsndfile_arm64.so
(gdb) bt
#0  0x0000ffffeb170c04 in _preextrapolate_helper ()
   from /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_soundfile_data/libsndfile_arm64.so
#1  0x0000ffffeb171a60 in vorbis_analysis_wrote ()
   from /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_soundfile_data/libsndfile_arm64.so
#2  0x0000ffffeb0e7874 in vorbis_write_samples ()
   from /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_soundfile_data/libsndfile_arm64.so
#3  0x0000ffffeb0e7de0 in vorbis_write_d ()
   from /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_soundfile_data/libsndfile_arm64.so
#4  0x0000ffffeb0b74b8 in sf_writef_double ()
   from /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_soundfile_data/libsndfile_arm64.so
#5  0x0000ffffeb4e8050 in ffi_call_SYSV ()
   from /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_cffi_backend.cpython-312-aarch64-linux-gnu.so
#6  0x0000ffffeb4e6b04 in ffi_call_int ()
   from /home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/_cffi_backend.cpython-312-aarch64-linux-gnu.so
#7  0x0000ffffeb4e493c in cdata_call (cd=0xfffff75fa1c0, args=<optimized out>, kwds=<optimized out>) at src/c/_cffi_backend.c:3229
#8  0x00000000004c2d1c in _PyObject_MakeTpCall ()
#9  0x0000000000563824 in _PyEval_EvalFrameDefault ()
#10 0x0000000000561b74 in PyEval_EvalCode ()
#11 0x000000000059ac14 in ?? ()
#12 0x000000000067f274 in ?? ()
#13 0x000000000067ee48 in _PyRun_SimpleFileObject ()
#14 0x000000000067ec14 in _PyRun_AnyFileObject ()
#15 0x0000000000689c5c in Py_RunMain ()
#16 0x0000000000689818 in Py_BytesMain ()
#17 0x0000fffff7cd84c4 in __libc_start_call_main (main=main@entry=0x5f4df4 <_start+52>, argc=argc@entry=5, 
    argv=argv@entry=0xffffffffed98) at ../sysdeps/nptl/libc_start_call_main.h:58
#18 0x0000fffff7cd8598 in __libc_start_main_impl (main=0x5f4df4 <_start+52>, argc=5, argv=0xffffffffed98, init=<optimized out>, 
    fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:360
#19 0x00000000005f4df0 in _start ()
(gdb) continue
Continuing.
Fatal Python error: Segmentation fault

Current thread 0x0000fffff7ff5720 (most recent call first):
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1403 in _cdata_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1394 in _array_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1068 in write
  File "/home/ericu/tmp/crash_soundfile.py", line 25 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, _cffi_backend (total: 3)

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
Download failed: Invalid argument.  Continuing without source file ./nptl/./nptl/pthread_kill.c.
__pthread_kill_implementation (threadid=281474842449696, signo=signo@entry=11, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
warning: 44 ./nptl/pthread_kill.c: No such file or directory

This gives me both a Python stack trace and a C stack trace. I can see that the crash is happening as part of a call to vorbis_analysis_wrote. The documentation for this function says "This function tells the encoder new data is available for compression". In other words, there is nothing unusual about this function call. It is an expected part of encoding a file using the soundfile module in Python. The call stack, from outermost to innermost, is:

  1. My Python code
  2. The Python soundfile module
  3. Python C foreign function interface
  4. The libsndfile library
  5. The libvorbis library

I can't imagine a scenario where Vorbis output of the soundfile module is just completely broken. So I looked at the number of samples I was trying to write from the source file. This value is 11235535. I then tried writing blank data at different sample counts until I found one that works. I tried the following output sizes: 11235535, 8388608, 4194304, 2097152, then 1048576.

(soundfileenv) ericu@ericu-raspi5:~/tmp$ SAMPLE_CNT=11235535 python -X faulthandler ./crash_soundfile.py 
Fatal Python error: Segmentation fault

Current thread 0x0000ffffb188b720 (most recent call first):
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1403 in _cdata_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1394 in _array_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1068 in write
  File "/home/ericu/tmp/./crash_soundfile.py", line 25 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, _cffi_backend (total: 3)
Segmentation fault
(soundfileenv) ericu@ericu-raspi5:~/tmp$ SAMPLE_CNT=8388608 python -X faulthandler ./crash_soundfile.py 
Fatal Python error: Segmentation fault

Current thread 0x0000ffff87597720 (most recent call first):
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1403 in _cdata_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1394 in _array_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1068 in write
  File "/home/ericu/tmp/./crash_soundfile.py", line 25 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, _cffi_backend (total: 3)
Segmentation fault
(soundfileenv) ericu@ericu-raspi5:~/tmp$ SAMPLE_CNT=4194304 python -X faulthandler ./crash_soundfile.py 
Fatal Python error: Segmentation fault

Current thread 0x0000ffff854b2720 (most recent call first):
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1403 in _cdata_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1394 in _array_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1068 in write
  File "/home/ericu/tmp/./crash_soundfile.py", line 25 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, _cffi_backend (total: 3)
Segmentation fault
(soundfileenv) ericu@ericu-raspi5:~/tmp$ SAMPLE_CNT=2097152 python -X faulthandler ./crash_soundfile.py 
Fatal Python error: Segmentation fault

Current thread 0x0000ffff95d6b720 (most recent call first):
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1403 in _cdata_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1394 in _array_io
  File "/home/ericu/tmp/soundfileenv/lib/python3.12/site-packages/soundfile.py", line 1068 in write
  File "/home/ericu/tmp/./crash_soundfile.py", line 25 in <module>

Extension modules: numpy._core._multiarray_umath, numpy.linalg._umath_linalg, _cffi_backend (total: 3)
Segmentation fault
(soundfileenv) ericu@ericu-raspi5:~/tmp$ SAMPLE_CNT=1048576 python -X faulthandler ./crash_soundfile.py
(soundfileenv) ericu@ericu-raspi5:~/tmp$ ls -l ./crash.ogg
-rw-rw-r-- 1 ericu ericu 6520 Sep  1 19:29 ./crash.ogg
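The manual halving above could also be automated. The following is a minimal sketch, not part of the original investigation: the helper names crashes and first_passing are hypothetical, and it assumes crash_soundfile.py is in the current directory and reads SAMPLE_CNT from the environment as shown earlier. Each attempt runs in a child process so that a segmentation fault does not take down the harness itself.

```python
import os
import signal
import subprocess
import sys


def crashes(sample_cnt, script="crash_soundfile.py"):
    """Run the reproduction script in a child process.

    Returns True if the child was killed by SIGSEGV. On POSIX, subprocess
    reports death by signal N as a return code of -N.
    """
    env = dict(os.environ, SAMPLE_CNT=str(sample_cnt))
    result = subprocess.run([sys.executable, script], env=env)
    return result.returncode == -signal.SIGSEGV


def first_passing(sizes, crashes_fn=crashes):
    """Return the first size in `sizes` that does not crash, or None."""
    for size in sizes:
        if not crashes_fn(size):
            return size
    return None
```

Calling first_passing([11235535, 8388608, 4194304, 2097152, 1048576]) would repeat the sequence of runs shown above and return the first sample count that writes a file without crashing.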

When I tried to write 1048576 samples, it wrote an output file. The file is of course complete silence, but it does work. So with a small enough input, the write succeeds. This led me to believe that an internal buffer was being overrun somewhere inside libvorbis. The "memory access out of bounds" error in the browser console suggested this as well, but I did not know exactly where it happened. So I recompiled the following libraries in debug mode: libvorbis, libogg, libsndfile, and the Python "soundfile" module.

Now, I was able to launch this again with gdb and produce a backtrace with more data

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000ffffeb06d78c in _preextrapolate_helper (v=0xd3c580) at block.c:420
420   float *work=alloca(v->pcm_current*sizeof(*work));
(gdb) bt
#0  0x0000ffffeb06d78c in _preextrapolate_helper (v=0xd3c580) at block.c:420
#1  0x0000ffffeb06dd48 in vorbis_analysis_wrote (v=0xd3c580, vals=2097152) at block.c:516
#2  0x0000ffffeb22f7c0 in vorbis_write_samples (psf=0xfe1ee0, odata=0xfe3ec0, vdata=0xd3c520, in_frames=2097152)
    at src/ogg_vorbis.c:694
#3  0x0000ffffeb22fd2c in vorbis_write_d (psf=0xfe1ee0, ptr=0xffffe8faf010, lens=4194304) at src/ogg_vorbis.c:791
#4  0x0000ffffeb1fe2f8 in sf_writef_double (sndfile=0xfe1ee0, ptr=0xffffe8faf010, frames=2097152) at src/sndfile.c:2666
#5  0x0000ffffeb2e8050 in ffi_call_SYSV ()
   from /home/ericu/tmp/soundfilelocal/lib/python3.12/site-packages/_cffi_backend.cpython-312-aarch64-linux-gnu.so
#6  0x0000ffffeb2e6b04 in ffi_call_int ()
   from /home/ericu/tmp/soundfilelocal/lib/python3.12/site-packages/_cffi_backend.cpython-312-aarch64-linux-gnu.so
#7  0x0000ffffeb2e493c in cdata_call (cd=0xfffff7423770, args=<optimized out>, kwds=<optimized out>)
    at src/c/_cffi_backend.c:3229
#8  0x00000000004c2d1c in _PyObject_MakeTpCall ()
#9  0x0000000000563824 in _PyEval_EvalFrameDefault ()
#10 0x0000000000561b74 in PyEval_EvalCode ()
#11 0x000000000059ac14 in ?? ()
#12 0x000000000067f274 in ?? ()
#13 0x000000000067ee48 in _PyRun_SimpleFileObject ()
#14 0x000000000067ec14 in _PyRun_AnyFileObject ()
#15 0x0000000000689c5c in Py_RunMain ()
#16 0x0000000000689818 in Py_BytesMain ()
#17 0x0000fffff7cd84c4 in __libc_start_call_main (main=main@entry=0x5f4df4 <_start+52>, argc=argc@entry=6, 
    argv=argv@entry=0xffffffffed08) at ../sysdeps/nptl/libc_start_call_main.h:58
#18 0x0000fffff7cd8598 in __libc_start_main_impl (main=0x5f4df4 <_start+52>, argc=6, argv=0xffffffffed08, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=<optimized out>)
    at ../csu/libc-start.c:360
#19 0x00000000005f4df0 in _start ()

This is the same backtrace, but with debugging data. Since we're in the debugger we can see what is being executed here. The function being called is alloca and it is passed v->pcm_current * sizeof(*work) as the argument. We can figure out the argument values by using the debugger interactively

(gdb) print v->pcm_current
$1 = 2098176
(gdb) print sizeof(*work)
$2 = 4

Multiplying these two values together gives a request of 8392704 bytes. The alloca function allocates memory, but crucially it allocates that memory on the stack. This means that the code is trying to allocate 8 megabytes of memory on the stack. The stack is generally not this large: on Linux the default stack limit for the main thread is commonly 8 MiB (8388608 bytes), which is slightly less than this single request. When the requested amount of stack space exceeds what is available, a stack overflow happens. The actual function in libvorbis looks like this

/* from lib/block.c at line 420 in libvorbis-1.3.7 */
static void _preextrapolate_helper(vorbis_dsp_state *v){
  int i;
  int order=16;
  float *lpc=alloca(order*sizeof(*lpc));
  float *work=alloca(v->pcm_current*sizeof(*work));
  long j;
  v->preextrapolate=1;

There are two calls to alloca here, and it is the second one that fails. The documentation for alloca actually states "If the allocation causes stack overflow, program behavior is undefined." That matches exactly what I observed in my case. I'm pretty surprised to see that alloca is being used in libvorbis. Although I am aware of it, I've never written software that uses it. Instead I prefer to pre-allocate buffers using malloc and call free later to release the memory. There are 70 occurrences of alloca in the code, so even refactoring this one instance would likely not make much difference in the usability of this library.
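To put the number from the debugger next to the actual stack limit of a process, something like the following can be used. This is a sketch under assumptions: it uses the POSIX-only resource module, so it applies to the native Linux reproduction and not to the WebAssembly build, and the name alloca_request is made up for illustration.

```python
import resource

FLOAT_SIZE = 4  # sizeof(*work) as printed by gdb


def alloca_request(pcm_current, elem_size=FLOAT_SIZE):
    """Bytes requested by alloca(v->pcm_current * sizeof(*work))."""
    return pcm_current * elem_size


# Soft and hard stack limits of the current process, in bytes.
soft_limit, _hard_limit = resource.getrlimit(resource.RLIMIT_STACK)
request = alloca_request(2098176)  # v->pcm_current as printed by gdb

print("alloca request:", request, "bytes")
if soft_limit == resource.RLIM_INFINITY:
    print("stack limit: unlimited")
else:
    print("stack limit:", soft_limit, "bytes")
    print("request larger than the whole stack:", request > soft_limit)
```

The stack also holds every frame already on the call stack, so a request can overflow well before it reaches the full limit.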

One way to work around this would be to change my Python code to call the write method of the SoundFile class in small batches. This would avoid the call to alloca asking for so much stack space at once. However, it turns out the Vorbis audio codec is intended to be replaced by Opus. I don't really care to try and refactor a library that is intended to be replaced anyway. So what I settled on was to do the following when the input audio is Vorbis

  1. If the sample rate is 8000, 12000, 16000, 24000, or 48000 samples per second use the Opus audio format. These are the only sample rates supported by Opus.
  2. Otherwise, use PCM16 Wave file output.
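The two rules above can be sketched as a small helper. The function name choose_output is hypothetical, and the returned strings assume the format and subtype names that the soundfile module uses ("OGG" with "OPUS", and "WAV" with "PCM_16"); whether Opus is available depends on the libsndfile build, so it is worth checking before relying on it.

```python
# Sample rates listed above as supported by Opus.
OPUS_SAMPLE_RATES = {8000, 12000, 16000, 24000, 48000}


def choose_output(sample_rate):
    """Return an assumed (format, subtype) pair for the output file."""
    if sample_rate in OPUS_SAMPLE_RATES:
        return ("OGG", "OPUS")
    # Any other rate falls back to 16-bit PCM in a Wave container.
    return ("WAV", "PCM_16")
```

The result could then be passed as the format and subtype arguments of sf.SoundFile when opening the output file for writing.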

This solves the problem of the program crashing for me. Given the constraints of running via WebAssembly in the browser this seems like a good enough solution for now.
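For reference, the batching workaround that was not pursued might look roughly like this. It is only a sketch: write_in_blocks and the block size of 65536 frames are assumptions chosen for illustration, and it has not been verified against libvorbis that this size keeps the alloca request small enough in every environment.

```python
def write_in_blocks(audioout, frames, block_size=65536):
    """Write `frames` to `audioout` in slices of at most `block_size` frames.

    `audioout` is anything with a write() method, such as an open
    soundfile.SoundFile. `frames` is anything that supports len() and
    slicing along its first axis, such as a NumPy array of shape
    (frames, channels). Returns the number of frames written.
    """
    written = 0
    for start in range(0, len(frames), block_size):
        block = frames[start:start + block_size]
        audioout.write(block)
        written += len(block)
    return written
```

In the reproduction script this would replace the single audioout.write(channels) call with write_in_blocks(audioout, channels).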


Copyright Eric Urban 2025, or the respective entity where indicated