Part VIII: Mastery and Beyond
Chapter 22: Common Pitfalls and How to Avoid Them
You've learned the patterns. You've seen the examples. Now let's talk about the mistakes you'll make anyway—and how to recover from them quickly.
The "Write Before Looking" Mistake
Picture this: You're staring at a tree-sitter node. You need to find all function definitions. Your fingers start typing a recursive function. Twenty minutes later, you have 40 lines of code. Then you check the documentation and discover .children is already a list you can iterate over, and there's a .type field that tells you exactly what you're looking at.
The trap: We love to solve problems. The moment we see tree structure, our brains light up with recursive algorithms. But most libraries already did the hard work.
The fix: Spend two minutes before you write any code. Open a Python shell:
# When you get a new object, investigate it first
print(type(node))
print(dir(node))
help(node)
# Try obvious things
print(node.children) # Does it have children?
print(list(node)) # Is it iterable?
I still catch myself skipping this step. Last month I wrote a JSON flattener before realizing pandas.json_normalize() existed. The evidence of my hubris sits in my scratch/ folder.
Real example: A student once spent an hour writing recursive BeautifulSoup traversal to find all links. The .find_all('a') method would have taken one line. But here's the thing—they learned from writing that recursion. The mistake becomes wisdom when you recognize it early next time.
Mixing Access Patterns
This one catches everyone with tree-sitter:
# This doesn't work the way you think
cursor = root.walk()
for child in cursor.node.children:  # ❌ Confused mixing
    process(child)

# Pick one approach
# Option 1: Direct access (usually better)
for child in root.children:  # ✓ Clean and clear
    process(child)

# Option 2: Manual cursor control (when you need memory efficiency)
cursor = root.walk()
if cursor.goto_first_child():
    process(cursor.node)
    while cursor.goto_next_sibling():
        process(cursor.node)
Why this happens: The tree-sitter documentation mentions both patterns. Your brain tries to combine them. I've done this. You've probably done this. The TreeCursor looks like it should be a smart iterator, but it's actually a manual pointer.
The fix: When you start working with a new tree structure, write a small test file. Navigate it both ways. Pick the approach that feels natural for your problem, then stick with it.
Ignoring Library Features
You're searching nested JSON for all occurrences of a key. You write a beautiful recursive function with path tracking. It works. Then someone mentions, "Why didn't you use jsonpath-ng?"
The pattern: We learn to crawl before we walk, so when we see a tree, we reach for recursion. But libraries encode years of edge cases you haven't hit yet.
When custom code is actually better:
- Your traversal logic has unusual requirements (tracking grandparent context)
- The library would require more code than a simple recursive function
- Performance is critical and the library is slow for your use case
- You're learning (writing it once teaches the pattern)
When to use the library:
- It handles errors you haven't thought of
- It's tested on thousands of edge cases
- Your teammates already know it
- The documentation is clear
I have a rule: Write the simple version first. If it gets complicated (more than 20 lines, or handling more than two special cases), look for a library.
Infinite Recursion
def find_nodes(node, node_type):
    results = []
    if node.type == node_type:
        results.append(node)
    for child in node.children:
        results.extend(find_nodes(child, node_type))
    return results
This looks fine. It is fine—for trees. But what if your structure has cycles? What if you're traversing HTML with circular references? What if your JSON (through some API quirk) references itself?
The symptom: Your program hangs. You kill it. You add a print statement. You see the same path repeating forever.
The fix pattern:
def find_nodes(node, node_type, visited=None):
    if visited is None:
        visited = set()
    # Use id() for object identity
    node_id = id(node)
    if node_id in visited:
        return []
    visited.add(node_id)
    results = []
    if node.type == node_type:
        results.append(node)
    for child in node.children:
        results.extend(find_nodes(child, node_type, visited))
    return results
But know your domain: File systems don't cycle (unless you follow symlinks). ASTs don't cycle (they're trees by definition). JSON usually doesn't cycle. HTML DOM might cycle through JavaScript. Add cycle protection when you need it, not by default.
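To see the guard in action, here's a runnable sketch. The Node class is a hypothetical stand-in for a tree-sitter-style node, and the structure deliberately contains a cycle; the id()-based visited set is what keeps the search from hanging:

```python
class Node:
    """Minimal stand-in for a tree-sitter-style node (hypothetical)."""
    def __init__(self, type_, children=None):
        self.type = type_
        self.children = children if children is not None else []

def find_nodes(node, node_type, visited=None):
    if visited is None:
        visited = set()
    node_id = id(node)  # object identity, not equality
    if node_id in visited:
        return []
    visited.add(node_id)
    results = []
    if node.type == node_type:
        results.append(node)
    for child in node.children:
        results.extend(find_nodes(child, node_type, visited))
    return results

# Build a deliberately cyclic structure: the child points back at the root
root = Node('module')
func = Node('function_definition')
root.children.append(func)
func.children.append(root)  # cycle!

print(len(find_nodes(root, 'function_definition')))  # 1, and it terminates
```

Without the visited set, the last call would recurse forever; with it, each object is processed exactly once.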
Not Handling Missing Data
Real-world data is messy. API responses omit fields. HTML pages have incomplete structure. Configuration files were hand-edited by someone in a hurry.
# Brittle
name = response['user']['profile']['name']

# Fragile
name = response.get('user', {}).get('profile', {}).get('name')

# Robust
def safe_get_name(response):
    try:
        return response['user']['profile']['name']
    except (KeyError, TypeError):
        return None

# Sometimes clearest
name = None
if 'user' in response:
    if 'profile' in response['user']:
        name = response['user']['profile'].get('name')
Which to choose? It depends on your context:
- If missing data is exceptional and you want to know about it: let it raise
- If missing data is common and you have good defaults: use .get()
- If the path is long and you want one check: try-except
- If you want to handle each level differently: explicit checks
I've learned to ask: "What should happen if this data is missing?" before I write the access code.
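One answer I reach for when .get() chains get long is a small helper that walks the key path for me. This is a sketch; the safe_get name and signature are my own, not from any library:

```python
def safe_get(data, *keys, default=None):
    """Walk a chain of dict keys; return default as soon as a step is missing."""
    for key in keys:
        if isinstance(data, dict) and key in data:
            data = data[key]
        else:
            return default
    return data

response = {'user': {'profile': {'name': 'Ada'}}}
print(safe_get(response, 'user', 'profile', 'name'))  # Ada
print(safe_get({}, 'user', 'profile', 'name'))        # None
```

It reads like the brittle version but behaves like the robust one, and the default parameter makes the "what if it's missing?" decision explicit at the call site.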
Performance Blindness
Your traversal works. It's slow. You add more data—it gets really slow. You profile it and discover you're re-walking the same tree branch 10,000 times.
Common performance mistakes:
# Rebuilding paths repeatedly
for node in all_nodes:
    path = get_path_from_root(node)  # Walks from root every time!

# Not stopping early
def find_first(node, target):
    # Visits EVERY node even after finding target
    for child in node.children:
        results = find_first(child, target)
        if results:
            return results

# Loading everything into memory
all_data = list(walk_entire_tree())  # Gigabytes!
Performance wins:
# Build paths once during traversal
def traverse_with_paths(node, path=()):
    current_path = path + (node.name,)
    yield node, current_path
    for child in node.children:
        yield from traverse_with_paths(child, current_path)

# Stop when found
def find_first(node, target):
    if matches(node, target):
        return node
    for child in node.children:
        result = find_first(child, target)
        if result is not None:
            return result
    return None

# Use generators
def walk_tree(node):
    yield node
    for child in node.children:
        yield from walk_tree(child)
The key insight: Traversal code often looks simple but runs many times. That nested loop you barely notice might execute millions of times on a large tree.
Chapter 23: The Traversal Toolkit
You're debugging at 2 AM. The code isn't working. You don't remember all the patterns from this guide. You need a checklist.
Here it is.
The Five-Step Protocol
Step 1: Explore the Object (30 seconds)
thing = get_mysterious_object()
print(f"Type: {type(thing)}")
print(f"Dir: {[x for x in dir(thing) if not x.startswith('_')]}")
Ask yourself: Is this a built-in structure (dict, list) or a library object?
Step 2: Check Library Features (2 minutes)
Before writing any traversal code, check if the library already solved your problem:
- File systems: Does os.walk() or pathlib have what you need?
- JSON/dicts: Can you access it directly? Do you need jsonpath-ng?
- HTML: What BeautifulSoup methods exist? (.find_all(), .select(), .descendants)
- AST: Does tree-sitter provide the access pattern? (.children, .child_by_field_name())
Read the documentation for 2 minutes. Seriously. Just 2 minutes. It will save you 30 minutes of coding.
Step 3: Identify Problem Type (30 seconds)
Which of these are you doing?
- Path Navigation: You know exactly where the data is → Direct access
- Search/Collection: You need to find all X in the structure → Recursive traversal
- Structure Exploration: You don't know what's there → Mapping/printing
- Contextual Navigation: The meaning depends on where you are → State tracking
Step 4: Choose Your Approach (Think, then code)
For search/collection, your base pattern:
def find_all(node, condition, results=None):
    if results is None:
        results = []
    # Check current
    if condition(node):
        results.append(node)
    # Recurse on children
    children = get_children(node)  # Adapt this line to your structure
    for child in children:
        find_all(child, condition, results)
    return results
Adapt get_children() for your structure:
- Files: os.listdir()
- Dicts: value.values() or value.items()
- HTML: element.children
- AST: node.children
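As a sketch of that adaptation for built-in structures, here's a get_children that dispatches on type, plugged into the find_all pattern above. The dispatch choices are illustrative, not exhaustive:

```python
def get_children(node):
    """Return the children of common built-in structures; leaves return []."""
    if isinstance(node, dict):
        return node.values()
    if isinstance(node, (list, tuple)):
        return node
    return []  # strings, numbers, None: no children

def find_all(node, condition, results=None):
    if results is None:
        results = []
    if condition(node):
        results.append(node)
    for child in get_children(node):
        find_all(child, condition, results)
    return results

data = {'a': [1, 2, {'b': 3}], 'c': 4}
print(find_all(data, lambda n: isinstance(n, int)))  # [1, 2, 3, 4]
```

The same find_all body works for any structure once get_children knows how to ask it for children; that separation is the whole point of Step 4.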
Step 5: Handle Errors Gracefully
Add this after your first working version:
# What if children don't exist?
children = get_children(node) if has_children(node) else []

# What if the structure is malformed?
try:
    process(node)
except (KeyError, AttributeError, TypeError) as e:
    logger.warning(f"Skipped malformed node: {e}")
    return []
Decision Tree (Printable Reference)
START: I need to work with nested data
│
├─ Do I know the exact path?
│   └─ YES → Use direct access (dict['key']['subkey'])
│        └─ Add error handling if data might be missing
│
├─ NO: I need to search for something
│   │
│   ├─ Does the library provide search?
│   │   ├─ YES → Use it (os.walk, .find_all(), etc.)
│   │   └─ NO → Write recursive search
│   │
│   └─ Do I need context while searching?
│       ├─ YES → Track state (path, parent, scope)
│       └─ NO → Simple recursive collection
│
└─ Is performance critical?
    ├─ Large data → Use generators, early stopping
    ├─ Memory limited → Use cursors/streaming
    └─ Otherwise → Simple working version first
Quick Patterns By Domain
File System - Finding files by extension:
for root, dirs, files in os.walk(start_path):
    for file in files:
        if file.endswith('.py'):
            yield os.path.join(root, file)
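If you're already using pathlib, Path.rglob does the same recursive walk in one call. A minimal sketch (python_files is a hypothetical helper name):

```python
from pathlib import Path

def python_files(start):
    """Equivalent of the os.walk loop above, via pathlib's recursive glob."""
    yield from Path(start).rglob('*.py')
```

You get Path objects back instead of joined strings, which is usually what the rest of the code wants anyway.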
JSON - Finding all occurrences of a key:
def find_key(data, target_key):
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                yield value
            yield from find_key(value, target_key)
    elif isinstance(data, list):
        for item in data:
            yield from find_key(item, target_key)
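For example, fed a sample config dict, the generator yields every value stored under the target key, at any depth. The snippet repeats the function so it runs on its own:

```python
def find_key(data, target_key):
    if isinstance(data, dict):
        for key, value in data.items():
            if key == target_key:
                yield value
            yield from find_key(value, target_key)
    elif isinstance(data, list):
        for item in data:
            yield from find_key(item, target_key)

# Hypothetical config with 'host' appearing at several depths
config = {
    'db': {'host': 'localhost', 'replicas': [{'host': 'a'}, {'host': 'b'}]},
    'cache': {'host': 'redis'},
}
print(list(find_key(config, 'host')))  # ['localhost', 'a', 'b', 'redis']
```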
HTML - Getting all text from elements:
for element in soup.find_all('p'):
    # find_all returns only Tag objects, so no text-node check is needed
    text = element.get_text(strip=True)
    if text:
        yield text
AST - Finding all functions:
def find_functions(node):
    if node.type == 'function_definition':
        yield node
    for child in node.children:
        yield from find_functions(child)
When Something Goes Wrong
Symptom: Nothing gets returned
- Check: Is your condition too strict? Print what you're checking.
- Check: Are you actually recursing? Add a print at the start of the function.
Symptom: Infinite recursion
- Check: Do you have cycles in your data? Add visited tracking.
- Check: Is your base case correct? Print the recursion depth.
Symptom: AttributeError or KeyError
- Check: Are you assuming structure that doesn't exist? Add type checks.
- Check: Did you mix access patterns? (Especially with tree-sitter)
Symptom: Too slow
- Check: Are you building paths repeatedly? Cache them or build during traversal.
- Check: Are you continuing after you found what you need? Add early returns.
- Check: Are you loading everything into memory? Use generators.
The Debugging Print Statement
When your traversal isn't working, add this:
def traverse(node, depth=0):
    indent = "  " * depth
    print(f"{indent}Visiting: {type(node).__name__} - {getattr(node, 'type', 'no type')}")
    # ... rest of your code
It shows you exactly what the code is seeing. Remove it when you're done.
Chapter 24: Continuing Your Journey
You understand traversal now. You recognize the pattern across different domains. You know when to use a library and when to write your own. Where do you go from here?
The Next Level: Graph Traversal
Trees are everywhere, but so are graphs—structures with cycles, multiple paths between nodes, and complex relationships. If you've mastered tree traversal, graph traversal is your next challenge.
The key differences:
- Graphs have cycles (you must track visited nodes)
- Graphs have multiple paths to the same node
- Order of traversal matters more (breadth-first vs depth-first)
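A minimal breadth-first traversal over an adjacency-dict graph shows all three differences at once: the visited set guards against cycles and multiple paths, and the queue fixes the visit order. A sketch under those assumptions:

```python
from collections import deque

def bfs(graph, start):
    """Breadth-first traversal of an adjacency-dict graph, cycle-safe."""
    visited = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for neighbor in graph.get(node, []):
            if neighbor not in visited:  # without this, the A→B→C→A loop never ends
                visited.add(neighbor)
                queue.append(neighbor)
    return order

# A small graph with a cycle: A → B → C → A, plus B → D
graph = {'A': ['B'], 'B': ['C', 'D'], 'C': ['A'], 'D': []}
print(bfs(graph, 'A'))  # ['A', 'B', 'C', 'D']
```

Swap the deque for a stack (append/pop) and the same code becomes depth-first; that one substitution is the whole BFS-vs-DFS distinction.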
Start here:
- Study NetworkX for graph manipulation
- Implement Dijkstra's algorithm to understand weighted graphs
- Try traversing social network connections or web link structures
Why it matters: Database query optimization, finding dependencies in code, route planning, recommendation systems—all graph problems.
Performance at Scale
The patterns in this guide work for millions of nodes. But what about billions? What about real-time streams?
Explore:
- Concurrent traversal: Using threads/async to traverse multiple branches simultaneously
- Stream processing: Handling trees that arrive piece by piece (think: parsing massive XML feeds)
- Incremental algorithms: Updating results as the tree changes instead of re-traversing
Libraries to investigate:
- Apache Beam for distributed traversal
- Tree-sitter's incremental parsing
- Graph databases (Neo4j) for persistent, queryable structures
Transforming What You've Learned
The mental model you've built applies beyond data structures:
Debugging: Following an exception's stack trace is traversal. The stack is your tree.
System design: Understanding how a web request travels through services—each service is a node, each call is an edge.
Learning itself: When you learn a new concept, you're traversing a knowledge graph. Each idea connects to others. Some paths are direct (this chapter → next chapter). Others require backtracking (remembering earlier patterns to understand advanced ones).
Practice Projects Worth Doing
These projects force you to apply everything:
1. Code Refactoring Tool Use tree-sitter to build a tool that finds old-style Python code and suggests modern alternatives. Hunt for patterns like:
- % string formatting → suggest f-strings
- open() without with → suggest context managers
- Nested if statements → suggest early returns
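As a taste of the first project, here's a rough sketch that flags old-style % formatting using Python's built-in ast module rather than tree-sitter. The heuristic only catches a string literal on the left of %, so it will miss formatting through a variable:

```python
import ast

def find_percent_format(source):
    """Return line numbers where a string literal is %-formatted (heuristic)."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.BinOp) and isinstance(node.op, ast.Mod)
                and isinstance(node.left, ast.Constant)
                and isinstance(node.left.value, str)):
            hits.append(node.lineno)
    return hits

code = 'x = "hello %s" % name\ny = 1 % 2\n'
print(find_percent_format(code))  # [1] — the integer modulo on line 2 is not flagged
```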
2. Configuration File Merger Write a tool that deep-merges nested JSON/YAML configuration files, handling conflicts intelligently. It needs to:
- Traverse both structures simultaneously
- Preserve structure while merging values
- Handle arrays, objects, and primitives differently
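Here's one possible core for the merger, a sketch that hard-codes the simplest conflict policy (override wins for anything that isn't a dict):

```python
def deep_merge(base, override):
    """Recursively merge override into base. Dicts merge key-by-key;
    everything else (lists, scalars) is replaced by the override value."""
    if isinstance(base, dict) and isinstance(override, dict):
        merged = dict(base)  # don't mutate the input
        for key, value in override.items():
            merged[key] = deep_merge(merged[key], value) if key in merged else value
        return merged
    return override  # conflict policy: override wins

base = {'db': {'host': 'localhost', 'port': 5432}, 'debug': False}
override = {'db': {'port': 5433}, 'debug': True}
print(deep_merge(base, override))
# {'db': {'host': 'localhost', 'port': 5433}, 'debug': True}
```

The project's "handle conflicts intelligently" part lives in that last return: replacing it with array concatenation, type checks, or a user-supplied callback is where the tool gets interesting.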
3. Documentation Link Checker Build a tool that reads Markdown documentation, follows all links (external and internal), and reports broken ones. It traverses:
- File system for Markdown files
- Text content for links
- Web for external URLs
- File system again for relative links
4. Dependency Graph Visualizer Parse Python files to extract all imports, then build and visualize the dependency graph. You'll traverse:
- File system for Python files
- AST for import statements
- The resulting graph to detect cycles
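For the import-extraction step, Python's built-in ast module is enough, and ast.walk does the traversal for you. A sketch:

```python
import ast

def find_imports(source):
    """Extract imported module names from Python source text."""
    modules = []
    for node in ast.walk(ast.parse(source)):  # ast.walk visits every node
        if isinstance(node, ast.Import):
            modules.extend(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            if node.module:  # 'from . import x' has module=None
                modules.append(node.module)
    return modules

source = "import os\nfrom collections import deque\nimport json as j\n"
print(find_imports(source))  # ['os', 'collections', 'json']
```

Feed each file's result into a dict of module → imports and you have the adjacency structure the cycle-detection traversal needs.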
Resources That Helped Me
I'm not going to list generic tutorials. These are resources that changed how I think about traversal:
Papers:
- "The Visitor Pattern" from the Gang of Four book explains a pattern you've been using without knowing it
- Research papers on tree differencing algorithms (how Git detects changes) show advanced traversal applications
Code to Read:
- The source code for os.walk() in CPython—it's remarkably simple
- BeautifulSoup's find_all() implementation—see how they handle edge cases
- Tree-sitter's own traversal code—beautifully optimized C with Python bindings
Communities:
- Python's official discourse (discuss.python.org) has threads on performance optimization
- Stack Overflow questions tagged recursion + python—seeing others' mistakes accelerates learning
The Skill You've Really Learned
This guide taught you traversal, but you learned something bigger: pattern recognition across domains.
You now see that a problem in one domain (file systems) looks like a problem in another domain (JSON) once you strip away the surface details. This is transferable thinking.
When you encounter a new tree-like structure—maybe it's a game's scene graph, or a company's organizational chart, or a music notation format—you won't be lost. You'll ask the four questions:
- Where am I?
- What's here?
- Where can I go?
- What am I looking for?
Then you'll explore the object, check the library, identify the problem type, and solve it.
That skill transcends programming. You're recognizing underlying structure in complex systems. You're decomposing problems into patterns you've seen before. You're building mental models that work across contexts.
Keep doing this. Not just with data structures—with everything. Look for the patterns. Most problems have been solved before in a different form.
A Final Word
I wrote this guide because I spent years relearning the same lesson: explore before you code, recognize patterns, use what's already built. I hope it saved you some of that time.
But here's the truth: you'll still make these mistakes. You'll still write recursive functions when the library has a method. You'll still forget to handle missing data. I still do.
The difference is you'll catch yourself faster. You'll recognize the pattern. You'll fix it in minutes instead of hours.
That's mastery—not perfection, but speed of recognition and recovery.
Now go traverse something.