
Part VIII: Mastery and Beyond


Chapter 22: Common Pitfalls and How to Avoid Them

You've learned the patterns. You've seen the examples. Now let's talk about the mistakes you'll make anyway—and how to recover from them quickly.

The "Write Before Looking" Mistake

Picture this: You're staring at a tree-sitter node. You need to find all function definitions. Your fingers start typing a recursive function. Twenty minutes later, you have 40 lines of code. Then you check the documentation and discover .children is already a list you can iterate over, and there's a .type field that tells you exactly what you're looking at.

The trap: We love to solve problems. The moment we see tree structure, our brains light up with recursive algorithms. But most libraries already did the hard work.

The fix: Spend two minutes before you write any code. Open a Python shell:

# When you get a new object, investigate it first
print(type(node))
print(dir(node))
help(node)

# Try obvious things
print(node.children)  # Does it have children?
print(list(node))     # Is it iterable?

I still catch myself doing this. Last month I wrote a JSON flattener before realizing pandas.json_normalize() existed. The evidence of my hubris sits in my scratch/ folder.

Real example: A student once spent an hour writing recursive BeautifulSoup traversal to find all links. The .find_all('a') method would have taken one line. But here's the thing—they learned from writing that recursion. The mistake becomes wisdom when you recognize it early next time.
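The same contrast shows up in the standard library. The sketch below uses xml.etree.ElementTree on a small, well-formed snippet; the markup and the helper name find_links_by_hand are made up for illustration. It compares a hand-written recursion with the built-in Element.iter():

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed markup used only for this illustration.
MARKUP = """
<html>
  <body>
    <p>See <a href="/docs">the docs</a>.</p>
    <div><a href="/blog">Blog</a><span><a href="/about">About</a></span></div>
  </body>
</html>
"""

def find_links_by_hand(element):
    """The long way: recurse through every child looking for <a> tags."""
    links = []
    if element.tag == "a":
        links.append(element.get("href"))
    for child in element:
        links.extend(find_links_by_hand(child))
    return links

root = ET.fromstring(MARKUP.strip())

# The short way: Element.iter() already walks the whole subtree in document order.
links_builtin = [a.get("href") for a in root.iter("a")]

print(find_links_by_hand(root))
print(links_builtin)
```

Both lines print the same list. The recursion is not wrong, it is just work the library had already done.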

Mixing Access Patterns

This one catches everyone with tree-sitter:

# This doesn't work the way you think
cursor = root.walk()
for child in cursor.node.children:  # ❌ Confused mixing
    process(child)

# Pick one approach
# Option 1: Direct access (usually better)
for child in root.children:  # ✓ Clean and clear
    process(child)

# Option 2: Manual cursor control (when you need memory efficiency)
cursor = root.walk()
if cursor.goto_first_child():
    process(cursor.node)
    while cursor.goto_next_sibling():
        process(cursor.node)

Why this happens: The tree-sitter documentation mentions both patterns. Your brain tries to combine them. I've done this. You've probably done this. The TreeCursor looks like it should be a smart iterator, but it's actually a manual pointer.

The fix: When you start working with a new tree structure, write a small test file. Navigate it both ways. Pick the approach that feels natural for your problem, then stick with it.
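If you do commit to the cursor, a full depth-first walk needs a little bookkeeping, because the cursor only moves when you tell it to. The sketch below assumes nothing beyond the TreeCursor methods goto_first_child(), goto_next_sibling(), and goto_parent() plus the .node attribute; walk_with_cursor is a hypothetical helper name, not part of tree-sitter.

```python
def walk_with_cursor(cursor):
    """Yield every node under the cursor's starting node, depth-first.

    Assumes a tree-sitter style cursor: goto_first_child(), goto_next_sibling()
    and goto_parent() return True when the move succeeded, and .node is the
    node the cursor currently points at.
    """
    depth = 0  # how far below the starting node the cursor currently is
    while True:
        yield cursor.node

        # Prefer going down.
        if cursor.goto_first_child():
            depth += 1
            continue

        # No children: move sideways, climbing until a sibling exists.
        while True:
            if depth == 0:
                return  # back at the starting node, nothing left to visit
            if cursor.goto_next_sibling():
                break
            cursor.goto_parent()
            depth -= 1
```

Usage would look like `for node in walk_with_cursor(root.walk()): process(node)`. Compare it with the plain loop over root.children and the trade-off is clear: the cursor version avoids building child lists, at the cost of managing position yourself.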

Ignoring Library Features

You're searching nested JSON for all occurrences of a key. You write a beautiful recursive function with path tracking. It works. Then someone mentions, "Why didn't you use jsonpath-ng?"

The pattern: We learn to crawl before we walk, so when we see a tree, we reach for recursion. But libraries encode years of edge cases you haven't hit yet.

When custom code is actually better:

When to use the library:

I have a rule: Write the simple version first. If it gets complicated (more than 20 lines, or handling more than two special cases), look for a library.
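For reference, the hand-rolled search with path tracking described at the start of this section might look like the sketch below. The function name and sample data are made up for illustration, and it only handles dicts and lists:

```python
def find_key_with_paths(data, target_key, path=()):
    """Yield (path, value) for every occurrence of target_key in nested dicts/lists."""
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                yield path + (key,), value
            # Keep searching inside the value, even if the key matched.
            yield from find_key_with_paths(value, target_key, path + (key,))
    elif isinstance(data, list):
        for index, item in enumerate(data):
            yield from find_key_with_paths(item, target_key, path + (index,))

# Made-up sample data.
data = {"user": {"name": "Ada", "friends": [{"name": "Grace"}, {"id": 7}]}}
matches = list(find_key_with_paths(data, "name"))
print(matches)
```

At about ten lines with no special cases, this sits comfortably under the threshold. Once you start adding wildcards, filters, or array slices, that is the signal to reach for a query library instead.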

Infinite Recursion

def find_nodes(node, node_type):
    results = []
    if node.type == node_type:
        results.append(node)
    for child in node.children:
        results.extend(find_nodes(child, node_type))
    return results

This looks fine. It is fine—for trees. But what if your structure has cycles? What if you're traversing HTML with circular references? What if your JSON (through some API quirk) references itself?

The symptom: Your program hangs. You kill it. You add a print statement. You see the same path repeating forever.

The fix pattern:

def find_nodes(node, node_type, visited=None):
    if visited is None:
        visited = set()

    # Use id() for object identity
    node_id = id(node)
    if node_id in visited:
        return []
    visited.add(node_id)

    results = []
    if node.type == node_type:
        results.append(node)
    for child in node.children:
        results.extend(find_nodes(child, node_type, visited))
    return results

But know your domain: File systems don't cycle (unless you follow symlinks). ASTs don't cycle (they're trees by definition). Parsed JSON can't cycle, though in-memory objects built from it can. An HTML DOM is a tree too, but you will loop if you follow parent or sibling references alongside children. Add cycle protection when you need it, not by default.
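To see a case where the guard earns its place, here is a sketch using plain dicts and lists. The structure and the function name count_containers are made up for illustration; the cycle is introduced deliberately after the data is built:

```python
def count_containers(data, visited=None):
    """Count dicts and lists reachable from data, visiting each object once."""
    if visited is None:
        visited = set()
    if not isinstance(data, (dict, list)):
        return 0
    if id(data) in visited:  # already seen: a cycle or a shared reference
        return 0
    visited.add(id(data))
    children = data.values() if isinstance(data, dict) else data
    return 1 + sum(count_containers(child, visited) for child in children)

# Made-up structure: a nested config that is later wired to point back at itself.
config = {"name": "root", "children": [{"name": "leaf"}]}
config["children"][0]["parent"] = config  # introduces a cycle

print(count_containers(config))
```

Without the visited set, this call would recurse until Python raises RecursionError. Using id() is safe here because every object stays alive for the duration of the traversal.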

Not Handling Missing Data

Real-world data is messy. API responses omit fields. HTML pages have incomplete structure. Configuration files were hand-edited by someone in a hurry.

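# Brittle: raises KeyError as soon as any level is missing
name = response['user']['profile']['name']

# Fragile: survives missing keys, but raises AttributeError
# if 'user' or 'profile' is present with a None value
name = response.get('user', {}).get('profile', {}).get('name')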

# Robust
def safe_get_name(response):
    try:
        return response['user']['profile']['name']
    except (KeyError, TypeError):
        return None

# Sometimes clearest
name = None
if 'user' in response:
    if 'profile' in response['user']:
        name = response['user']['profile'].get('name')

Which to choose? It depends on your context:

I've learned to ask: "What should happen if this data is missing?" before I write the access code.
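One way to make that answer explicit is a small helper that takes the fallback as a parameter. This is a sketch; get_path is a hypothetical name and the sample response is made up:

```python
def get_path(data, *keys, default=None):
    """Follow keys/indexes through nested dicts and lists.

    Returns default as soon as any step is missing or has the wrong type.
    """
    current = data
    for key in keys:
        try:
            current = current[key]
        except (KeyError, IndexError, TypeError):
            return default
    return current

# Made-up API response with a None where a dict was expected.
response = {"user": {"profile": None, "emails": ["a@example.com"]}}

print(get_path(response, "user", "profile", "name"))
print(get_path(response, "user", "emails", 0))
print(get_path(response, "user", "emails", 5, default="n/a"))
```

The design choice is that the caller states what "missing" means at the call site, instead of the access code silently deciding.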

Performance Blindness

Your traversal works. It's slow. You add more data—it gets really slow. You profile it and discover you're re-walking the same tree branch 10,000 times.

Common performance mistakes:

# Rebuilding paths repeatedly
for node in all_nodes:
    path = get_path_from_root(node)  # Walks from root every time!

# Not stopping early
def find_first(node, target):
    # Visits EVERY node even after finding target
    found = node if matches(node, target) else None
    for child in node.children:
        result = find_first(child, target)
        if found is None:
            found = result
    return found

# Loading everything into memory
all_data = list(walk_entire_tree())  # Gigabytes!

Performance wins:

# Build paths once during traversal
def traverse_with_paths(node, path=()):
    current_path = path + (node.name,)
    yield node, current_path
    for child in node.children:
        yield from traverse_with_paths(child, current_path)

# Stop when found
def find_first(node, target):
    if matches(node, target):
        return node
    for child in node.children:
        result = find_first(child, target)
        if result is not None:
            return result
    return None

# Use generators
def walk_tree(node):
    yield node
    for child in node.children:
        yield from walk_tree(child)

The key insight: Traversal code often looks simple but runs many times. That nested loop you barely notice might execute millions of times on a large tree.
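Generators and early stopping combine well. The sketch below uses a made-up Node class and records which nodes were actually touched, to show that next() pulls from the generator only until the first match:

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)

visited = []  # records which nodes the walk actually touched

def walk_tree(node):
    visited.append(node.name)
    yield node
    for child in node.children:
        yield from walk_tree(child)

# Made-up tree: "b" and "c" come after the target in traversal order.
tree = Node("root", [Node("a", [Node("target"), Node("b")]), Node("c")])

# next() consumes the generator lazily and stops at the first match.
found = next((n for n in walk_tree(tree) if n.name == "target"), None)

print(found.name)
print(visited)
```

Nodes "b" and "c" never appear in visited. Wrapping the same generator in list() would have touched all five.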


Chapter 23: The Traversal Toolkit

You're debugging at 2 AM. The code isn't working. You don't remember all the patterns from this guide. You need a checklist.

Here it is.

The Five-Step Protocol

Step 1: Explore the Object (30 seconds)

thing = get_mysterious_object()
print(f"Type: {type(thing)}")
print(f"Dir: {[x for x in dir(thing) if not x.startswith('_')]}")

Ask yourself: Is this a built-in structure (dict, list) or a library object?

Step 2: Check Library Features (2 minutes)

Before writing any traversal code, check if the library already solved your problem:

Read the documentation for 2 minutes. Seriously. Just 2 minutes. It will save you 30 minutes of coding.
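As one example of what those two minutes can turn up: Python's own ast module ships ast.walk(), which visits every node in a parsed tree. The source string below is made up for illustration:

```python
import ast

# Made-up source to parse.
SOURCE = '''
def outer():
    def inner():
        pass
    return inner

class Greeter:
    def greet(self):
        return "hi"
'''

tree = ast.parse(SOURCE)

# ast.walk() yields every node in the tree; no hand-written recursion needed.
# Its order is unspecified, so sort for a stable result.
function_names = sorted(
    node.name for node in ast.walk(tree) if isinstance(node, ast.FunctionDef)
)
print(function_names)
```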

Step 3: Identify Problem Type (30 seconds)

Which of these are you doing?

Step 4: Choose Your Approach (Think, then code)

For search/collection, your base pattern:

def find_all(node, condition, results=None):
    if results is None:
        results = []

    # Check current
    if condition(node):
        results.append(node)

    # Recurse on children
    children = get_children(node)  # Adapt this line to your structure
    for child in children:
        find_all(child, condition, results)

    return results

Adapt get_children() for your structure:
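A minimal sketch of such an adapter, assuming you only need to cover dicts, lists, and objects that expose a .children attribute. The find_all copy is included only so the example runs on its own:

```python
def get_children(node):
    """Return the children of node for a few common structures (not exhaustive)."""
    if isinstance(node, dict):
        return list(node.values())
    if isinstance(node, (list, tuple)):
        return list(node)
    # Library objects such as tree-sitter nodes expose a .children attribute.
    children = getattr(node, "children", None)
    return list(children) if children is not None else []

def find_all(node, condition, results=None):
    """Same shape as the base pattern: check the node, then recurse on children."""
    if results is None:
        results = []
    if condition(node):
        results.append(node)
    for child in get_children(node):
        find_all(child, condition, results)
    return results

# Made-up nested data.
data = {"a": 1, "b": [2, {"c": 3}], "d": "text"}
numbers = find_all(data, lambda n: isinstance(n, int))
print(numbers)
```

Keeping the structure-specific logic in one small function means the search itself never changes when you move from JSON to an AST.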

Step 5: Handle Errors Gracefully

Add this after your first working version:

import logging

logger = logging.getLogger(__name__)

# What if children don't exist?
children = get_children(node) if has_children(node) else []

# What if the structure is malformed? (this part goes inside your traversal function)
try:
    process(node)
except (KeyError, AttributeError, TypeError) as e:
    logger.warning(f"Skipped malformed node: {e}")
    return []

Decision Tree (Printable Reference)

START: I need to work with nested data
│
├─ Do I know the exact path?
│  └─ YES → Use direct access (dict['key']['subkey'])
│     └─ Add error handling if data might be missing
│
├─ NO: I need to search for something
│  │
│  ├─ Does the library provide search?
│  │  └─ YES → Use it (os.walk, .find_all(), etc.)
│  │  └─ NO → Write recursive search
│  │
│  └─ Do I need context while searching?
│     └─ YES → Track state (path, parent, scope)
│     └─ NO → Simple recursive collection
│
└─ Is performance critical?
   ├─ Large data → Use generators, early stopping
   ├─ Memory limited → Use cursors/streaming
   └─ Otherwise → Simple working version first

Quick Patterns By Domain

File System - Finding files by extension:

import os

def find_python_files(start_path):
    for root, dirs, files in os.walk(start_path):
        for file in files:
            if file.endswith('.py'):
                yield os.path.join(root, file)

JSON - Finding all occurrences of a key:

def find_key(data, target_key):
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                yield value
            yield from find_key(value, target_key)
    elif isinstance(data, list):
        for item in data:
            yield from find_key(item, target_key)

HTML - Getting all text from elements:

def paragraph_texts(soup):
    # find_all('p') returns only tags, so there are no text nodes to skip
    for element in soup.find_all('p'):
        text = element.get_text(strip=True)
        if text:
            yield text

AST - Finding all functions:

def find_functions(node):
    if node.type == 'function_definition':
        yield node
    for child in node.children:
        yield from find_functions(child)

When Something Goes Wrong

Symptom: Nothing gets returned

Symptom: Infinite recursion

Symptom: AttributeError or KeyError

Symptom: Too slow

The Debugging Print Statement

When your traversal isn't working, add this:

def traverse(node, depth=0):
    indent = "  " * depth
    print(f"{indent}Visiting: {type(node).__name__} - {getattr(node, 'type', 'no type')}")
    # ... rest of your code

It shows you exactly what the code is seeing. Remove it when you're done.


Chapter 24: Continuing Your Journey

You understand traversal now. You recognize the pattern across different domains. You know when to use a library and when to write your own. Where do you go from here?

The Next Level: Graph Traversal

Trees are everywhere, but so are graphs—structures with cycles, multiple paths between nodes, and complex relationships. If you've mastered tree traversal, graph traversal is your next challenge.

The key differences:

Start here:

Why it matters: Database query optimization, finding dependencies in code, route planning, recommendation systems—all graph problems.
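The smallest step from trees to graphs is a breadth-first walk with a visited set. The sketch below assumes the graph is a plain dict mapping each node to a list of neighbors; the graph itself is made up and contains both a cycle and two paths to the same node:

```python
from collections import deque

def bfs_order(graph, start):
    """Return the nodes reachable from start in breadth-first order."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:  # the one line a tree traversal never needs
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# Made-up dependency graph: a -> b -> c -> a is a cycle, and d is reachable two ways.
graph = {"a": ["b", "d"], "b": ["c"], "c": ["a", "d"], "d": []}
print(bfs_order(graph, "a"))
```

Each node appears once in the output even though the graph loops back on itself.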

Performance at Scale

The patterns in this guide work for millions of nodes. But what about billions? What about real-time streams?
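One scale-related limit shows up long before billions of nodes: recursive traversals are bounded by the interpreter's recursion limit (1000 frames by default in CPython), so a very deep structure can fail even when it is small. A sketch of the usual workaround, an explicit stack, assuming nodes are dicts with a "children" list (a made-up shape for illustration):

```python
import sys

def count_nodes(root):
    """Count nodes in nested {'children': [...]} dicts using an explicit stack."""
    count = 0
    stack = [root]
    while stack:
        node = stack.pop()
        count += 1
        stack.extend(node.get("children", []))
    return count

# Build a made-up chain deeper than the interpreter's recursion limit.
depth = sys.getrecursionlimit() * 2
root = {"children": []}
current = root
for _ in range(depth):
    child = {"children": []}
    current["children"].append(child)
    current = child

print(count_nodes(root))
```

A recursive version of count_nodes would raise RecursionError on this input; the loop version only grows a list.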

Explore:

Libraries to investigate:

Transforming What You've Learned

The mental model you've built applies beyond data structures:

Debugging: Following an exception's stack trace is traversal. The stack is your tree.

System design: Understanding how a web request travels through services—each service is a node, each call is an edge.

Learning itself: When you learn a new concept, you're traversing a knowledge graph. Each idea connects to others. Some paths are direct (this chapter → next chapter). Others require backtracking (remembering earlier patterns to understand advanced ones).

Practice Projects Worth Doing

These projects force you to apply everything:

1. Code Refactoring Tool: Use tree-sitter to build a tool that finds old-style Python code and suggests modern alternatives. Hunt for patterns like:

2. Configuration File Merger: Write a tool that deep-merges nested JSON/YAML configuration files, handling conflicts intelligently. It needs to:
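As a starting point for the merger, here is a minimal sketch of the core recursion. It assumes the simplest conflict rule (the override wins, nested dicts are merged key by key); deep_merge and the sample configs are made up for illustration, and lists are replaced rather than combined:

```python
def deep_merge(base, override):
    """Return a new dict: override wins, except nested dicts are merged key by key."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# Made-up configuration layers.
defaults = {"db": {"host": "localhost", "port": 5432}, "debug": False}
local = {"db": {"host": "db.internal"}, "debug": True}

print(deep_merge(defaults, local))
```

Neither input is modified, though nested dicts that appear in only one input are shared with the result rather than copied. A real tool would need to decide on list handling and type conflicts, which is where the project gets interesting.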

3. Documentation Link Checker: Build a tool that reads Markdown documentation, follows all links (external and internal), and reports broken ones. It traverses:

4. Dependency Graph Visualizer: Parse Python files to extract all imports, then build and visualize the dependency graph. You'll traverse:
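The first step of the visualizer, pulling imports out of one file, can be sketched with the standard library's ast module. The function name extract_imports and the sample source are made up, and relative imports without a module name (from . import x) are skipped in this sketch:

```python
import ast

def extract_imports(source):
    """Return the sorted module names imported by a piece of Python source."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            modules.add(node.module)
    return sorted(modules)

# Made-up source; it is only parsed, never executed.
SOURCE = '''
import os
import json as js
from collections import deque

def load():
    import tomllib
    from . import sibling
'''

print(extract_imports(SOURCE))
```

Run this over every file in a package and you have the edges of the dependency graph; the nodes are the files themselves.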

Resources That Helped Me

I'm not going to list generic tutorials. These are resources that changed how I think about traversal:

Papers:

Code to Read:

Communities:

The Skill You've Really Learned

This guide taught you traversal, but you learned something bigger: pattern recognition across domains.

You now see that a problem in one domain (file systems) looks like a problem in another domain (JSON) once you strip away the surface details. This is transferable thinking.

When you encounter a new tree-like structure—maybe it's a game's scene graph, or a company's organizational chart, or a music notation format—you won't be lost. You'll ask the four questions:

  1. Where am I?
  2. What's here?
  3. Where can I go?
  4. What am I looking for?

Then you'll explore the object, check the library, identify the problem type, and solve it.

That skill transcends programming. You're recognizing underlying structure in complex systems. You're decomposing problems into patterns you've seen before. You're building mental models that work across contexts.

Keep doing this. Not just with data structures—with everything. Look for the patterns. Most problems have been solved before in a different form.

A Final Word

I wrote this guide because I spent years relearning the same lesson: explore before you code, recognize patterns, use what's already built. I hope it saved you some of that time.

But here's the truth: you'll still make these mistakes. You'll still write recursive functions when the library has a method. You'll still forget to handle missing data. I still do.

The difference is you'll catch yourself faster. You'll recognize the pattern. You'll fix it in minutes instead of hours.

That's mastery—not perfection, but speed of recognition and recovery.

Now go traverse something.