
Part I: The Problem Space

7.1 Why Static Code Reading Fails

You open a Django project with 47 installed apps. Someone reports that user passwords aren't being hashed correctly during registration. You need to find where password hashing actually happens.

Your first instinct? Search for password in the codebase. You find 89 files. You grep for set_password. Twelve matches. You open models.py and see:

class User(AbstractBaseUser):
    def set_password(self, raw_password):
        self.password = make_password(raw_password)

Perfect! Except... when does this actually run? You search for calls to set_password. You find references in:

A custom registration view
A password reset handler
Three different form classes
A management command
Something called UserCreationForm from Django's auth

Which one executes during the registration flow you care about? You start reading through registration views. One inherits from CreateView. Another uses UserCreationForm. A third seems to handle OAuth separately. You spend twenty minutes reading code. You think you understand the flow. You're probably wrong.

The illusion of understanding through reading alone

Here's the uncomfortable truth: reading code creates a mental model of what you think happens, not a record of what actually happens. Consider this seemingly simple Django form submission:

# forms.py
from django import forms
from django.utils import timezone


class ProfileUpdateForm(forms.ModelForm):
    class Meta:
        model = UserProfile
        fields = ['bio', 'avatar']

    def save(self, commit=True):
        # Build the instance without writing it, so it can be stamped first.
        profile = super().save(commit=False)
        profile.last_modified = timezone.now()
        if commit:
            profile.save()
        return profile

Reading this, you conclude: "When the form saves, it updates last_modified and saves the profile." But what you don't see by reading:

A pre_save signal handler that validates the avatar dimensions
A post_save signal handler that invalidates three different cache keys
Middleware that wraps every database write in a transaction
An override of UserProfile.save() that triggers a Celery task
A database trigger that updates a materialized view

You've read the code. You understand the explicit logic. You have no idea what actually executes when that form submits.

This is not a failure of reading skill—it's a fundamental limitation of static analysis. Modern applications are assemblages of explicit code, framework magic, signals, decorators, middleware chains, and implicit behavior. The execution path isn't encoded in the source text. It's determined at runtime by configuration, inheritance hierarchies, decorator stacking order, and framework conventions you might not even know exist.

Let's watch a real developer hit this wall. Sarah inherits a Django project. A user reports: "When I update my profile bio, my notification preferences get reset." Sarah searches for "notification" in ProfileUpdateForm. Nothing. She searches the entire forms file. Nothing. She searches the view that handles the form. Still nothing. She's certain there must be code that does this; she just can't find it by reading alone.

The code that resets notifications? It's in a post_save signal handler in a completely different app, registered in apps.py, triggered implicitly by Django when any model saves. You cannot find this by reading the form code. You can only find it by watching what executes.
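To make the shape of Sarah's problem concrete, here is a minimal, framework-free sketch of the signal pattern. It is not Django's dispatcher: `Signal`, `UserProfile`, and `reset_notifications` are hypothetical stand-ins invented for illustration. The point is that `save()` never names the handler, yet the handler runs, and only the runtime registry reveals it.

```python
# A toy stand-in for a signal dispatcher (NOT Django's implementation).
class Signal:
    def __init__(self):
        self.receivers = []          # populated at import/startup time

    def connect(self, receiver):
        self.receivers.append(receiver)
        return receiver              # lets connect() double as a decorator

    def send(self, sender, **kwargs):
        for receiver in self.receivers:
            receiver(sender=sender, **kwargs)


post_save = Signal()


# --- "profiles" app: the code being read -------------------------------
class UserProfile:
    def __init__(self):
        self.bio = ""
        self.notifications_enabled = True

    def save(self):
        # Nothing here mentions notifications or names any handler.
        post_save.send(sender=type(self), instance=self)


# --- "notifications" app: code there was no reason to open -------------
@post_save.connect
def reset_notifications(sender, instance, **kwargs):
    instance.notifications_enabled = False


profile = UserProfile()
profile.bio = "Updated bio"
profile.save()

# Asking the running system what is connected finds what grep did not.
connected = [receiver.__name__ for receiver in post_save.receivers]
print(connected, profile.notifications_enabled)
```

Searching the "profiles" code for "notification" turns up nothing relevant, while one look at the live receiver list names the culprit.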

When documentation doesn't match reality

Documentation lies. Not maliciously—but inevitably. The docs say the authentication flow works one way. The code was refactored twice since those docs were written. Now it works differently.

Real example from a production codebase:

The Documentation Says:

User authentication flow:
1. LoginView receives credentials
2. authenticate() verifies username/password
3. login() creates session
4. User redirected to dashboard

What Actually Happens:

1. LoginView receives credentials
2. Custom middleware intercepts request, checks for SSO headers
3. If SSO headers present, bypass form validation
4. Custom authenticate() override queries three different user tables
5. Signal handler triggers email verification check
6. Another signal handler logs login attempt to audit table
7. Third signal handler checks if user is in beta program
8. Middleware wraps response, sets 4 different cookies
9. JavaScript in template makes AJAX call to /api/user/preferences
10. That API endpoint triggers another Celery task
11. Finally, redirect to dashboard (sometimes)

You read the four-step documented flow. You understand those four steps perfectly. You have no idea about steps 2-10. And when debugging a login failure, those hidden steps are where the bug lives.

Notice the pattern: the documented code is the tip of the iceberg. The real behavior includes:

Middleware that runs before and after every view
Signal handlers that trigger on model events
Decorators that modify function behavior
Base class methods you inherited but never read
Configuration-dependent behavior paths
Third-party packages doing their own magic

You cannot find these by reading. You cannot trust documentation to enumerate them. You can only discover them by watching the system execute.
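As one illustration of code that runs "before and after every view", the following sketch composes two middleware around a view using the same get_response calling style as Django-style middleware. The names `sso` and `cookies` and the tiny `view` are assumptions made up for this example; the only claim is about ordering, which is recorded rather than guessed.

```python
calls = []  # records the order in which things actually run


def view(request):
    calls.append("view")
    return {"status": 200}


def make_middleware(name):
    """Build a middleware factory in the get_response style."""
    def middleware_factory(get_response):
        def middleware(request):
            calls.append(f"{name}: before view")
            response = get_response(request)
            calls.append(f"{name}: after view")
            return response
        return middleware
    return middleware_factory


# Hypothetical stack, listed outermost first, as a settings list would be.
stack = [make_middleware("sso"), make_middleware("cookies")]

# Wrap the view from the inside out so the first entry ends up outermost.
handler = view
for factory in reversed(stack):
    handler = factory(handler)

response = handler({"path": "/login/"})
print(calls)
```

The view body contains one line of logic, but five steps execute per request, and four of them live in files the view never references.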

The "where does this actually run?" problem

Here's a particularly insidious variant. You find a function. You want to know when it executes. You search for calls to it. You find... nothing. Or you find 47 calls and need to know which one runs in your scenario.

Consider this real example:

# utils/notifications.py
def send_notification(user, message, channel='email'):
    """Send notification to user via specified channel."""
    if channel == 'email':
        send_email(user.email, message)
    elif channel == 'sms':
        send_sms(user.phone, message)
    elif channel == 'push':
        send_push_notification(user.device_token, message)

Question: When does this function run? You grep for send_notification and find it called in:

Password reset flow
New message received handler
Comment reply notification
Weekly digest generator
Admin actions
Three different Celery tasks
A management command

But you need to understand: "When I post a comment, why do some users get notified and others don't?" Which of those call sites handles comment notifications? You read through them. Two seem plausible. You read their implementations. Both could be it. You're not sure which actually executes, or if both do, or if there's a conditional you're missing.

The fundamental problem: function call graphs don't tell you execution graphs. Just because code CAN call a function doesn't mean it DOES in your scenario. Static analysis shows possibilities. You need to see actualities.
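A small sketch of the difference between "can call" and "does call": the function below records who actually called it by looking at the calling frame. The caller names (`notify_comment_reply`, `weekly_digest`, `post_comment`) are hypothetical, and the delivery logic is stubbed out; this is an illustration of the idea, not a recommended production logger.

```python
import inspect
from collections import Counter

observed_callers = Counter()


def send_notification(user, message, channel="email"):
    # f_back is the frame that called this function.
    caller_name = inspect.currentframe().f_back.f_code.co_name
    observed_callers[caller_name] += 1
    # ... the real delivery logic would go here ...


# Two call sites that a grep would report. Both CAN call send_notification.
def notify_comment_reply(user):
    send_notification(user, "Someone replied to your comment")


def weekly_digest(user):
    send_notification(user, "Your weekly digest")


# Only one of them runs in the scenario under investigation.
def post_comment(author, parent_author):
    notify_comment_reply(parent_author)


post_comment("alice", "bob")
print(dict(observed_callers))
```

Static search lists both call sites as equals. The recorded callers contain only the one that executed for this request.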

This gets worse with dynamic languages and framework conventions:

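# views.py
from rest_framework import viewsets
from rest_framework.permissions import IsAuthenticatedOrReadOnly


class ArticleViewSet(viewsets.ModelViewSet):
    queryset = Article.objects.all()
    serializer_class = ArticleSerializer
    permission_classes = [IsAuthenticatedOrReadOnly]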

When you hit GET /api/articles/, what code executes? There's no explicit view function to read. Django REST Framework generates it based on conventions. Where does permission checking happen? Authentication? Query filtering? Pagination? These are configured declaratively. The execution path exists, but it's not in text you can read—it's generated by framework machinery at runtime.

Case study: Following a Django form submission through 15 files

Let's walk through the exact experience of a developer (let's call him Marcus) trying to understand a real Django form submission. This is reconstructed from actual debugging notes.

The Task: Figure out why submitting the "Create Project" form takes 8 seconds and sometimes fails with a cryptic "IntegrityError: duplicate key value violates unique constraint."

Attempt 1: Reading the form (10 minutes)

Marcus opens forms.py:

from django import forms
from django.core.exceptions import ValidationError


class ProjectCreateForm(forms.ModelForm):
    class Meta:
        model = Project
        fields = ['name', 'description', 'team']

    def clean_name(self):
        name = self.cleaned_data['name']
        # Application-level uniqueness check (a read, not a lock).
        if Project.objects.filter(name=name).exists():
            raise ValidationError("Project name must be unique")
        return name

This looks straightforward. The form validates name uniqueness. But if it validates, why the IntegrityError? Marcus assumes there must be a race condition. He reads the view.

Attempt 2: Reading the view (15 minutes)

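from django.contrib.auth.mixins import LoginRequiredMixin
from django.urls import reverse_lazy
from django.views.generic import CreateView


class ProjectCreateView(LoginRequiredMixin, CreateView):
    model = Project
    form_class = ProjectCreateForm
    template_name = 'projects/create.html'
    success_url = reverse_lazy('project_list')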

A generic class-based view. Four lines. Where's the form handling logic? Marcus knows Django's CreateView handles form submission, but WHERE exactly? He's read Django tutorials. He knows conceptually that form_valid() gets called. But what does form_valid() actually DO in this case?

He opens Django's source code for CreateView. It inherits, by way of BaseCreateView, from ModelFormMixin. He reads that. Its form_valid() calls form.save(). Okay, so ProjectCreateForm.save() must run. But the form doesn't override save(), so it uses the default ModelForm.save(). What does that do? He starts reading Django's ModelForm implementation. Fifteen minutes in, he's lost in Django's source code, several inheritance levels deep.
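One way out of that maze is to ask Python which class in the hierarchy actually supplies a method, instead of opening parent classes one by one. The sketch below walks the method resolution order. The classes are simplified stand-ins that only mimic the shape of a generic view hierarchy (the real one has more mixins), and `defining_class` is a hypothetical helper written for this example.

```python
def defining_class(cls, method_name):
    """Return the first class in cls's MRO that defines method_name itself."""
    for klass in cls.__mro__:
        if method_name in vars(klass):
            return klass
    return None


# Simplified stand-ins, NOT Django's actual class hierarchy.
class FormMixin:
    def form_valid(self, form):
        return "redirect"


class ModelFormMixin(FormMixin):
    def form_valid(self, form):
        self.object = form.save()
        return super().form_valid(form)


class CreateView(ModelFormMixin):
    pass


class ProjectCreateView(CreateView):
    template_name = "projects/create.html"


owner = defining_class(ProjectCreateView, "form_valid")
mro_names = [klass.__name__ for klass in ProjectCreateView.__mro__]
print(owner.__name__)
print(mro_names)
```

The same helper can be pointed at a real view class in a project shell, where the MRO will be longer but the question it answers is the same.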

Attempt 3: Searching for signals (20 minutes)

Marcus remembers that Django projects often use signals. Maybe something is hooked into post_save? He searches the codebase for post_save and Project. He finds:

# signals.py (in 'projects' app)
from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=Project)
def create_default_categories(sender, instance, created, **kwargs):
    if created:
        for category_name in ['Development', 'Testing', 'Documentation']:
            Category.objects.create(project=instance, name=category_name)

Aha! When a project is created, it auto-creates three categories. Could that be related to the slowness? But wait, this is triggered AFTER the project saves. The IntegrityError happens ON the project save, not after. He keeps searching.

# signals.py (in 'teams' app)
from django.db.models.signals import post_save
from django.dispatch import receiver


@receiver(post_save, sender=Project)
def add_creator_to_team(sender, instance, created, **kwargs):
    if created:
        TeamMembership.objects.create(
            team=instance.team,
            user=get_current_user(),  # ??? How does this work?
            role='owner'
        )

Another signal. This one creates a team membership. Marcus notices get_current_user()—that seems like it shouldn't work outside a request context. He searches for that function.

# middleware.py
from threading import local

_thread_locals = local()


def get_current_user():
    return getattr(_thread_locals, 'user', None)


class ThreadLocalUserMiddleware:
    def __init__(self, get_response):
        self.get_response = get_response

    def __call__(self, request):
        _thread_locals.user = request.user
        response = self.get_response(request)
        return response

Oh no. Thread-local user storage. Marcus knows this is a code smell, but it's here, and it works (usually). Could thread locals cause the IntegrityError? He's now 45 minutes into reading code, and he's discovered:

A form with validation
A generic view
Two signal handlers
Thread-local middleware

He STILL doesn't know:

What exact order these execute in
Where the 8 seconds is spent
What causes the IntegrityError
If there are MORE signal handlers he hasn't found
What other middleware runs
If there are model save() overrides
If there are database constraints beyond the code

Attempt 4: Adding print statements (30 minutes, abandoned)

Marcus starts adding prints:

def clean_name(self):
    name = self.cleaned_data['name']
    print(f"VALIDATING NAME: {name}")
    if Project.objects.filter(name=name).exists():
        print(f"NAME EXISTS: {name}")
        raise ValidationError("Project name must be unique")
    print(f"NAME OK: {name}")
    return name

He adds prints to both signal handlers. He adds prints in the middleware. He submits the form. His console output:

MIDDLEWARE: User = marcus@example.com
VALIDATING NAME: New Project
NAME OK: New Project
(8 second pause)
Created default categories for project: New Project
Added creator to team

Wait. The validation happens instantly. Then there's an 8-second pause BEFORE the first signal handler. What happens during those 8 seconds? The print statements can't tell him. He'd need to add prints inside Django's ModelForm.save() method? Inside the ORM's save() method? He could do that, but it feels wrong to modify installed packages.

He tries adding more prints, but realizes he's playing whack-a-mole. Every print statement requires a code change, server restart, form submission, log reading, hypothesis formation, and another print statement. After the tenth iteration, he's exhausted and still doesn't have a complete picture.
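The alternative to editing code for every hypothesis is to let the interpreter report what it calls. The sketch below uses the standard library's `sys.setprofile` hook to record every Python function entered while a call runs, without touching the functions themselves. `trace_calls` is a hypothetical helper, and the functions being traced are toy stand-ins for the pieces Marcus was instrumenting by hand, not his project's code.

```python
import sys


def trace_calls(func, *args, **kwargs):
    """Run func and return (result, names of Python functions entered, in order)."""
    called = []

    def profiler(frame, event, arg):
        # "call" fires for Python functions; C functions report "c_call".
        if event == "call":
            called.append(frame.f_code.co_name)

    previous = sys.getprofile()
    sys.setprofile(profiler)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.setprofile(previous)   # always restore the earlier hook
    return result, called


# --- toy stand-ins for code spread across several files ----------------
def generate_unique_slug(name):
    return name.lower().replace(" ", "-")


def create_default_categories(slug):
    pass


def log_to_analytics(slug):
    pass


def run_post_save_handlers(slug):
    create_default_categories(slug)
    log_to_analytics(slug)


def save_project(name):
    slug = generate_unique_slug(name)
    run_post_save_handlers(slug)
    return slug


slug, called = trace_calls(save_project, "New Project")
print(called)
```

No print statement was added to any traced function, yet the list includes `log_to_analytics`, the kind of handler that is easy to miss when you only instrument the code you already know about.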

The Actual Problem (revealed later with proper tools)

When Marcus finally used the Django Debug Toolbar and a debugger, he discovered the actual execution flow:

1. Form validation runs (instant)
2. CreateView.form_valid() calls form.save()
3. ModelForm.save() calls Project.save()
4. Project.save() was overridden in models.py to call a method generate_unique_slug()
5. generate_unique_slug() does this:

```python
def generate_unique_slug(self):
    base_slug = slugify(self.name)
    slug = base_slug
    counter = 1
    # One existence query per candidate slug until a free one is found.
    while Project.objects.filter(slug=slug).exists():
        slug = f"{base_slug}-{counter}"
        counter += 1
    self.slug = slug
```

6. In a database with 50,000 projects and poor indexing, this query runs slowly (0.5 seconds × 15 iterations = 7.5 seconds)
7. After slug generation completes, the project saves
8. Two signal handlers run (the ones he found)
9. A THIRD signal handler (in the 'analytics' app, which he never searched) logs the creation to an external analytics service, and that network call adds another 0.5 seconds
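The arithmetic behind the slow slug loop can be checked with a small simulation. This sketch keeps the same loop but injects the existence check so that lookups can be counted. The data is assumed for illustration: 14 slugs are already taken, which makes the loop issue 15 lookups, and the 0.5 seconds per lookup is the figure quoted in the flow above.

```python
def generate_unique_slug(base_slug, slug_exists):
    """Same loop as in the flow above, with the existence check injected."""
    slug = base_slug
    counter = 1
    while slug_exists(slug):
        slug = f"{base_slug}-{counter}"
        counter += 1
    return slug


# Assumed data: "new-project" and "new-project-1" .. "new-project-13" exist.
taken = {"new-project"} | {f"new-project-{n}" for n in range(1, 14)}
stats = {"lookups": 0}


def slug_exists(slug):
    stats["lookups"] += 1        # in the real app, each lookup is a query
    return slug in taken


slug = generate_unique_slug("new-project", slug_exists)
estimated_seconds = stats["lookups"] * 0.5
print(slug, stats["lookups"], estimated_seconds)
```

The cost grows with the number of existing projects that share a name, which is why the delay never shows up on a fresh development database.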

The IntegrityError? It happened when two users submitted forms with the same project name simultaneously. Both passed the clean_name() validation check (no existing project yet), both entered the slug generation loop, both saved with different slugs but the same name—and the database had a unique constraint on the name field that the Django model didn't declare.
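The check-then-insert race can be reproduced in a few lines. The sketch below uses an in-memory SQLite database as a stand-in for the production database and simulates the interleaving sequentially on one connection, which is an assumption made to keep the example deterministic. The table layout and the helper names `name_is_free` and `insert_project` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The constraint lives in the database; no application code declares it.
conn.execute("CREATE TABLE project (id INTEGER PRIMARY KEY, name TEXT UNIQUE)")


def name_is_free(name):
    """Application-level check, equivalent in spirit to clean_name()."""
    row = conn.execute(
        "SELECT 1 FROM project WHERE name = ?", (name,)
    ).fetchone()
    return row is None


def insert_project(name):
    conn.execute("INSERT INTO project (name) VALUES (?)", (name,))


# Two requests interleave: both validate before either one writes.
request_a_ok = name_is_free("New Project")
request_b_ok = name_is_free("New Project")

insert_project("New Project")          # request A saves first
try:
    insert_project("New Project")      # request B already passed validation
    second_insert_error = None
except sqlite3.IntegrityError as exc:
    second_insert_error = exc

print(request_a_ok, request_b_ok, type(second_insert_error).__name__)
```

Both validations pass and the second write still fails, so no amount of reading clean_name() explains the error. The constraint that raises it is not in the Python source at all.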

Marcus spent 75 minutes reading code and adding print statements. He found the form, the view, the middleware, and two of the three signal handlers, which is only part of the execution flow. He never found the third signal handler. He never saw the overridden save() method (it was inherited from a base class in a different file). He never understood the timing.

This is why static code reading fails. The execution flow of a modern web application is an emergent property of code, configuration, framework conventions, and runtime state. You cannot reconstruct it by reading. You can only observe it by tracing.

This isn't a story about a bad codebase (though the thread-local middleware is questionable). This is a story about the normal complexity of Django applications. Every project Marcus works on has:

Class-based views with implicit behavior
Signal handlers registered in multiple apps
Middleware chains
Model method overrides in base classes
Third-party packages doing their own things
Database constraints not reflected in models

Reading code gives you partial understanding. Tracing execution gives you complete understanding. The difference isn't academic—it's the difference between 75 minutes of frustrated guessing and 10 minutes of systematic discovery.