7.15 Case Study 3: FastAPI Background Task Failure
The Setup: Your FastAPI application processes uploaded files in the background. Users upload a CSV, get an immediate response, and receive an email when processing completes. This worked perfectly in development, but in production, background tasks fail silently about 30% of the time. No exceptions in logs, no error emails, no indication of what went wrong.
The code looks straightforward:
from fastapi import FastAPI, BackgroundTasks, UploadFile
import asyncio

app = FastAPI()

@app.post("/upload")
async def upload_file(file: UploadFile, background_tasks: BackgroundTasks):
    # Save file to disk
    content = await file.read()
    filepath = f"/tmp/{file.filename}"
    with open(filepath, "wb") as f:
        f.write(content)
    # Process in background
    background_tasks.add_task(process_csv, filepath)
    return {"status": "processing"}

async def process_csv(filepath: str):
    async with DatabaseConnection() as db:
        data = parse_csv(filepath)
        await db.insert_many(data)
        await send_completion_email()
The initial confusion: You add logging to every step:
async def process_csv(filepath: str):
    print(f"Starting process_csv for {filepath}")
    async with DatabaseConnection() as db:
        print("Database connected")
        data = parse_csv(filepath)
        print(f"Parsed {len(data)} rows")
        await db.insert_many(data)
        print("Data inserted")
        await send_completion_email()
        print("Email sent")
In production logs, you see:
Starting process_csv for /tmp/data.csv
Then... nothing. The task stops almost immediately. No error, no exception, no "Database connected" log. The process just vanishes.
You try wrapping everything in try-except:
async def process_csv(filepath: str):
    try:
        async with DatabaseConnection() as db:
            data = parse_csv(filepath)
            await db.insert_many(data)
            await send_completion_email()
    except Exception as e:
        print(f"Error: {e}")
        raise
Still no error appears in logs. The try-except never catches anything. How can code fail without raising an exception?
After 6 hours of adding more logging, inspecting database connections, checking file permissions, and restarting services, you're stuck. The task fails silently, and you have no way to see what's happening inside that async context manager.
This is the "silent failure in async code" problem—one of the most frustrating debugging scenarios in modern Python.
Problem: Background task fails silently
Let's solve this properly with the right tools in 30 minutes.
Phase 1: Understanding async task execution (10 minutes)
First, recognize the core issue: FastAPI background tasks run in the same event loop as your request handling, but they're not monitored the same way. If an exception occurs in a background task after the response is sent, there's no HTTP response to attach it to, and the exception might be suppressed depending on how the event loop handles it.
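This suppression is easiest to see with plain asyncio, outside the FastAPI machinery. A minimal sketch (`doomed` is a stand-in coroutine, not part of the app) showing how a fire-and-forget task stores its exception on the Task object instead of raising it anywhere visible:

```python
import asyncio

async def doomed():
    raise ValueError("boom")

async def main():
    # Fire and forget: the exception is stored on the Task object,
    # not raised in the caller - nothing surfaces unless you inspect it
    task = asyncio.create_task(doomed())
    await asyncio.sleep(0)  # one loop iteration is enough for it to fail
    return task

task = asyncio.run(main())
print(type(task.exception()).__name__)  # -> ValueError
```

Unless something eventually calls `task.exception()` or awaits the task, the failure stays invisible, which is exactly the production symptom here.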
The key question: Is the task actually failing, or is it just not completing? Let's find out with py-spy.
Install py-spy (if not already installed):
pip install py-spy
py-spy is a sampling profiler that can attach to running Python processes without modifying code or restarting. It shows you exactly what functions are executing at any moment.
Start your FastAPI application in one terminal:
uvicorn main:app --host 0.0.0.0 --port 8000
Find the process ID:
ps aux | grep uvicorn
# Output: user 12345 0.5 0.1 ... python -m uvicorn main:app
The process ID is 12345 in this example.
Upload a file to trigger the background task:
curl -F "file=@test.csv" http://localhost:8000/upload
Immediately attach py-spy while the background task should be running:
sudo py-spy dump --pid 12345
py-spy dump takes a snapshot of all threads and their current call stacks. You see:
Thread 0x7f8b2c3d4700 (active): "MainThread"
    File "asyncio/base_events.py", line 1823, in _run_once
    File "asyncio/events.py", line 80, in _run
    File "starlette/background.py", line 42, in __call__
    File "main.py", line 18, in process_csv
        async with DatabaseConnection() as db:
    File "database.py", line 34, in __aenter__
        self.conn = await asyncpg.connect(...)
    File "asyncpg/connection.py", line 156, in connect
    File "asyncio/selector_events.py", line 829, in _read_ready
    # Waiting for connection...
This is the key insight: The background task is stuck waiting in DatabaseConnection().__aenter__(). It's not failing—it's hanging during the database connection attempt.
This explains why you never saw "Data inserted" in your logs. The code never got past the async with DatabaseConnection() line. But why does asyncpg.connect() hang instead of raising a timeout error?
Check the DatabaseConnection implementation:
class DatabaseConnection:
    def __init__(self):
        self.conn = None

    async def __aenter__(self):
        self.conn = await asyncpg.connect(
            host=os.getenv("DB_HOST"),
            database=os.getenv("DB_NAME"),
            user=os.getenv("DB_USER"),
            password=os.getenv("DB_PASSWORD")
            # Missing: timeout parameter!
        )
        return self.conn

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        await self.conn.close()
The connection has no timeout. When the database is overloaded or network is slow, asyncpg.connect() waits indefinitely. In production, database connection pools might be exhausted, causing new connections to hang forever waiting for an available slot.
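The fix pattern is to wrap the connection attempt in `asyncio.wait_for`, which turns an indefinite hang into a catchable error. A stand-in sketch (`connect_forever` simulates a connect call to a host that never answers; no database required):

```python
import asyncio

async def connect_forever():
    # Stand-in for asyncpg.connect() against a host that never answers
    await asyncio.sleep(3600)

async def main():
    try:
        await asyncio.wait_for(connect_forever(), timeout=0.1)
        return "connected"
    except asyncio.TimeoutError:
        return "timed out"

result = asyncio.run(main())
print(result)  # -> timed out
```

A loud `TimeoutError` after a bounded wait is always preferable to a task that silently never finishes.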
But there's a second problem: even if the connection succeeded, you'd still need to verify that the async context manager is being used correctly.
Phase 2: Async context manager validation (10 minutes)
Add more detailed logging with exception handling that actually works for async:
import traceback
import sys

async def process_csv(filepath: str):
    try:
        print(f"Starting process_csv for {filepath}", flush=True)
        print("Creating DatabaseConnection...", flush=True)
        db_conn = DatabaseConnection()
        print("Entering context manager...", flush=True)
        async with db_conn as db:
            print("Database connected", flush=True)
            data = parse_csv(filepath)
            print(f"Parsed {len(data)} rows", flush=True)
            await db.insert_many(data)
            print("Data inserted", flush=True)
            await send_completion_email()
            print("Email sent", flush=True)
    except Exception as e:
        print(f"Exception caught: {type(e).__name__}: {e}", flush=True)
        traceback.print_exc(file=sys.stdout)
        # Don't just print - actually log somewhere persistent
        raise
    finally:
        print("process_csv complete (finally block)", flush=True)
Notice the flush=True parameter—this is crucial. Without it, print buffering might delay logs until the process ends, making you think code didn't execute when it actually did.
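You can demonstrate the buffering effect with a throwaway child process (`child` below is a hypothetical script, not the app): when a process is killed mid-hang, only explicitly flushed output survives in the pipe.

```python
import os
import subprocess
import sys

# Child script: one flushed line, one line left in the buffer, then a hang.
child = (
    'import time\n'
    'print("flushed line", flush=True)\n'
    'print("buffered line")\n'  # stdout is block-buffered when piped
    'time.sleep(60)\n'
)

env = dict(os.environ, PYTHONUNBUFFERED="")  # ensure default block buffering
proc = subprocess.Popen([sys.executable, "-c", child],
                        stdout=subprocess.PIPE, env=env)
try:
    proc.wait(timeout=2)
except subprocess.TimeoutExpired:
    proc.kill()  # the "hang" ends abruptly; anything still buffered is lost
out, _ = proc.communicate()
print(out.decode())  # only the flushed line made it to the pipe
```

This is why unflushed prints can make working code look like it never ran.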
After adding this and uploading another file, you see:
Starting process_csv for /tmp/data.csv
Creating DatabaseConnection...
Entering context manager...
Still hangs at the same place. But now let's use py-spy in a different mode—continuous recording:
sudo py-spy record -o profile.svg --pid 12345 --duration 30
This records the call stack every 10 milliseconds for 30 seconds and generates a flamegraph. Upload a file, wait 30 seconds, then open profile.svg in a browser.
The flamegraph shows:
process_csv (100% of time)
└─ DatabaseConnection.__aenter__ (100% of time)
└─ asyncpg.connect (100% of time)
└─ asyncio.selector_events._read_ready (100% of time)
The task spends 100% of its time waiting for the database connection. This confirms it's not a bug in your code logic—it's a connection timeout/configuration issue.
Phase 3: Finding the actual bug (10 minutes)
Now that you know the connection is the problem, add a timeout and see what error actually occurs:
async def __aenter__(self):
    try:
        self.conn = await asyncio.wait_for(
            asyncpg.connect(
                host=os.getenv("DB_HOST"),
                database=os.getenv("DB_NAME"),
                user=os.getenv("DB_USER"),
                password=os.getenv("DB_PASSWORD")
            ),
            timeout=10.0  # Add a 10-second timeout
        )
        return self.conn
    except asyncio.TimeoutError:
        print("Database connection timeout!")
        raise
Upload a file again. After 10 seconds, you see:
Starting process_csv for /tmp/data.csv
Creating DatabaseConnection...
Entering context manager...
Database connection timeout!
Exception caught: TimeoutError
Good! Now you're getting an actual error. But check the production configuration. You discover:
# production.env
DB_HOST=localhost
DB_NAME=production_db
DB_USER=app_user
DB_PASSWORD=secret123
Wait—DB_HOST=localhost? But in production, the database isn't running on localhost. It's running in a separate container. The environment variable is wrong!
The correct value should be DB_HOST=postgres-container (the Docker service name). The connection was trying to reach localhost:5432 where no database exists, hanging indefinitely waiting for a response that would never come.
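One defensive pattern this suggests, sketched here as an illustrative addition rather than part of the original app, is a fail-fast reachability check at startup, so a bad `DB_HOST` fails immediately instead of hanging a background task later (`can_reach` is a hypothetical helper):

```python
import socket

def can_reach(host: str, port: int = 5432, timeout: float = 0.5) -> bool:
    # Fail fast at startup instead of hanging later in a background task
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# 192.0.2.1 is a reserved TEST-NET address, so the check fails quickly
print(can_reach("192.0.2.1"))  # -> False
```

Wiring such a check into application startup turns a silent production hang into an immediate, obvious deployment failure.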
But here's the actual async gotcha. Even after fixing the host, you discover another issue. Look at this code again:
@app.post("/upload")
async def upload_file(file: UploadFile, background_tasks: BackgroundTasks):
    content = await file.read()
    filepath = f"/tmp/{file.filename}"
    with open(filepath, "wb") as f:  # Synchronous file I/O in an async function!
        f.write(content)
    background_tasks.add_task(process_csv, filepath)
    return {"status": "processing"}
The with open() is synchronous I/O that blocks the event loop. In production with many concurrent uploads, this blocks all other async tasks. Better approach:
import aiofiles

@app.post("/upload")
async def upload_file(file: UploadFile, background_tasks: BackgroundTasks):
    content = await file.read()
    filepath = f"/tmp/{file.filename}"
    async with aiofiles.open(filepath, "wb") as f:  # Async file I/O
        await f.write(content)
    background_tasks.add_task(process_csv, filepath)
    return {"status": "processing"}
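If you'd rather avoid the aiofiles dependency, the standard library's `asyncio.to_thread` achieves the same goal by running the blocking write in a worker thread. A minimal sketch (`write_file` and `save_upload` are illustrative names):

```python
import asyncio
import os
import tempfile

def write_file(path: str, data: bytes) -> None:
    # Ordinary blocking I/O - harmless inside a worker thread
    with open(path, "wb") as f:
        f.write(data)

async def save_upload(path: str, data: bytes) -> None:
    # Run the blocking write in the default thread pool,
    # keeping the event loop free for other tasks
    await asyncio.to_thread(write_file, path, data)

path = os.path.join(tempfile.gettempdir(), "upload_demo.bin")
asyncio.run(save_upload(path, b"hello"))
print(open(path, "rb").read())  # -> b'hello'
```

Either approach keeps slow disk writes from stalling every other request on the event loop.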
The complete bug: The actual problem was multi-layered:
1. Wrong database host in environment variables (localhost vs container name)
2. No connection timeout, causing tasks to hang silently forever
3. Synchronous I/O in async functions, blocking the event loop
4. No structured exception logging for background tasks
All four issues combined to create silent failures that were nearly impossible to debug with print statements alone.
Tools used: py-spy + custom exception logging
py-spy gave you:
- Call stack snapshots showing exactly where code was stuck (asyncpg.connect)
- Flamegraph visualization showing time spent in each function (100% in connection)
- No code changes required—attach to running process
- Minimal overhead—sampling profiler, not tracing profiler
- Works in production—safe to use on live systems
Custom exception logging (done right) gave you:
- Async-aware exception handling—catches exceptions in async context
- Explicit flush=True—ensures logs appear immediately
- Traceback printing—shows full call stack when errors occur
- Finally blocks—confirm whether code completed or hung
Why print debugging failed:
Print statements showed:
Starting process_csv
Database connected (never appeared)
This tells you the code stops somewhere between starting and "connected", but:
- ❌ Doesn't show if it's hanging or erroring
- ❌ Doesn't show the call stack (what function is blocking)
- ❌ Doesn't show timing (is it slow or stuck?)
- ❌ Doesn't reveal the environment misconfiguration
py-spy revealed:
- ✅ Code is hanging (not erroring)
- ✅ Exact location: asyncpg.connect() waiting for a network response
- ✅ Timing: spending 100% of time there
- ✅ Led you to check connection configuration
The workflow that worked:
1. Recognize the silent failure pattern
2. Attach py-spy to see where code is stuck
3. Identify the hanging operation (database connection)
4. Add a timeout to force an error instead of a hang
5. See the actual error message
6. Discover the configuration issue
Discovery: Async context manager not awaited correctly
Let's dig deeper into what "not awaited correctly" actually means, because this is a subtle async programming error that catches even experienced developers.
The problematic pattern:
# This looks correct but has a subtle bug
async def process_csv(filepath: str):
    async with DatabaseConnection() as db:
        # If an exception occurs here...
        data = parse_csv(filepath)  # Synchronous, might raise
        await db.insert_many(data)
If parse_csv() raises an exception, the async context manager's __aexit__ runs:
async def __aexit__(self, exc_type, exc_val, exc_tb):
    await self.conn.close()  # Closes the connection properly
But here's the gotcha: If the connection never completes in __aenter__, the context manager never fully "enters", so your code inside the async with block never runs. The task just hangs in __aenter__ forever.
Another common mistake:
# Creating the context manager without awaiting
db = DatabaseConnection()  # This doesn't connect yet
conn = await db.__aenter__()  # Manual await - don't do this!
The proper way:
# Let Python handle the async context manager protocol
async with DatabaseConnection() as db:
    # Python automatically:
    # 1. Calls await __aenter__() on the instance
    # 2. Assigns the result to 'db'
    # 3. Runs your code block
    # 4. Calls await __aexit__() even if an exception occurs
    pass
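A small self-contained demo (`Demo` is a throwaway class, not the case study's DatabaseConnection) confirming that `__aexit__` runs even when the body raises, and that returning False lets the exception propagate:

```python
import asyncio

events = []

class Demo:
    async def __aenter__(self):
        events.append("enter")
        return self

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        # Runs even when the body raises; returning False re-raises
        events.append(f"exit:{exc_type.__name__ if exc_type else 'clean'}")
        return False

async def main():
    try:
        async with Demo():
            raise ValueError("boom")
    except ValueError:
        events.append("caught")

asyncio.run(main())
print(events)  # -> ['enter', 'exit:ValueError', 'caught']
```

The guarantee only holds once `__aenter__` returns—which is exactly why a hang inside `__aenter__` leaves no trace at all.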
The fix for this case study:
import asyncio
import logging
import os

import asyncpg

class DatabaseConnection:
    def __init__(self, timeout=10.0):
        self.conn = None
        self.timeout = timeout

    async def __aenter__(self):
        try:
            # Wrap the connection attempt with a timeout
            self.conn = await asyncio.wait_for(
                asyncpg.connect(
                    host=os.getenv("DB_HOST", "localhost"),
                    database=os.getenv("DB_NAME"),
                    user=os.getenv("DB_USER"),
                    password=os.getenv("DB_PASSWORD"),
                    command_timeout=60  # Query timeout
                ),
                timeout=self.timeout  # Connection timeout
            )
            return self.conn
        except asyncio.TimeoutError as e:
            # Log properly for background tasks
            logging.error(f"Database connection timeout after {self.timeout}s")
            raise ConnectionError(
                f"Could not connect to database within {self.timeout} seconds"
            ) from e
        except Exception as e:
            logging.error(f"Database connection failed: {e}")
            raise

    async def __aexit__(self, exc_type, exc_val, exc_tb):
        if self.conn:
            try:
                await asyncio.wait_for(self.conn.close(), timeout=5.0)
            except asyncio.TimeoutError:
                # Force-close if the graceful close hangs
                self.conn.terminate()
        return False  # Don't suppress exceptions
Additional async gotchas discovered:
- Background tasks don't propagate exceptions by default:
# Exceptions in background tasks are swallowed!
background_tasks.add_task(process_csv, filepath)

# Better: add an exception-handling wrapper
async def safe_process_csv(filepath):
    try:
        await process_csv(filepath)
    except Exception:
        logging.exception(f"Background task failed for {filepath}")
        # Send alert, store error, etc.

background_tasks.add_task(safe_process_csv, filepath)
- Async functions must be awaited in background tasks:
# WRONG: passes a coroutine object, which never gets properly run
background_tasks.add_task(process_csv(filepath))

# RIGHT: passes the function reference; FastAPI awaits it with the argument
background_tasks.add_task(process_csv, filepath)
- Event loop differences between dev and prod:
  - Development: uvicorn --reload (restarts on code changes, masks some issues)
  - Production: uvicorn with multiple workers (different event loop per worker)
Time to fix: 30 minutes with tools, 6 hours without
Time breakdown with proper tools:
- 10 minutes: Attach py-spy, identify the hanging location
- 5 minutes: Add a connection timeout, reproduce the error
- 10 minutes: Discover the environment variable misconfiguration
- 5 minutes: Fix DatabaseConnection with proper error handling
Total: 30 minutes from silent failure to deployed fix.
Time spent without tools (the first 6 hours):
- 2 hours: Adding print statements throughout the code
- 1 hour: Trying different exception handling patterns
- 1 hour: Manually testing database connections from a Python shell
- 1 hour: Reading asyncpg documentation looking for gotchas
- 1 hour: Restarting services, checking logs, examining Docker networking
Total: 6 hours of frustration with no clear progress.
Why the 12× time difference?
Without py-spy:
- You're debugging blind—you don't know if code is running, hanging, or erroring
- Print statements don't appear when code hangs
- Async hangs are nearly impossible to diagnose with logging alone
- You chase wrong hypotheses (checking database permissions, file I/O, etc.)
With py-spy:
- The first run shows exactly where code is stuck
- The flamegraph quantifies the problem (100% of time in connection)
- You immediately focus on the actual issue (connection config)
- No code changes needed for diagnosis
The key lesson: Async debugging requires runtime inspection tools. Print statements don't show you:
- What the event loop is doing
- Where async operations are blocking
- Whether code is waiting vs running vs failed
py-spy reveals the complete picture instantly.
When to use py-spy:
- ✅ Silent failures in production
- ✅ Hanging async operations
- ✅ Performance profiling without code changes
- ✅ Understanding where time is spent
- ✅ Diagnosing deadlocks or stuck tasks
When py-spy isn't enough:
- ❌ Logic errors in your code (use a debugger)
- ❌ Data flow problems (use a debugger with breakpoints)
- ❌ One-time crashes (use exception logging)
- ❌ Race conditions (use async debugging tools like asyncio debug mode)
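As a taste of asyncio debug mode: it is particularly good at flagging the "forgot await" mistake, attaching a creation traceback to the RuntimeWarning for an unawaited coroutine. A sketch (`fetch` is a placeholder coroutine):

```python
import asyncio
import warnings

async def fetch():
    return 42

async def main():
    fetch()  # bug: missing await - the coroutine object is never run

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    asyncio.run(main(), debug=True)

# The discarded coroutine triggers a "never awaited" RuntimeWarning
print([w.category.__name__ for w in caught])
```

Debug mode also logs callbacks that block the loop for too long, which would have flagged the synchronous file I/O in this case study.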
This case study demonstrates that modern Python async code requires modern debugging tools. The patterns that worked for synchronous Python (print debugging, try-except logging) are insufficient for async. Tools like py-spy bridge that gap.