Part VI: Advanced Patterns and Best Practices


Chapter 15: Problem Type Recognition

You've now worked with file systems, JSON, HTML, and ASTs. You've seen os.walk(), BeautifulSoup, and tree-sitter. But here's the deeper lesson: these aren't four separate skills. They're one skill wearing different hats.

The true mastery isn't memorizing each library's API. It's recognizing what kind of problem you're solving, then choosing the right tool for that problem type.

The Four Problem Types Revisited

Let's start simple. Every nested data problem falls into one of four categories:

1. Path Navigation
You know exactly where you're going. You have a map.

# You know the structure
user_email = api_response['data']['user']['contact']['email']

# You know the node type you want
function_body = func_node.child_by_field_name('body')

This is the easiest problem. Use direct access. Don't overcomplicate it.

2. Search/Collection
You're looking for something, but you don't know where it is. You need to explore.

# Find all functions named 'helper' anywhere in the code
# Find all JSON keys named 'price' at any nesting level
# Find all <a> tags with class 'external'

This needs traversal—either recursive or using a library method that does it for you.
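For JSON-like data, the recursive version of this is only a few lines. Here's a minimal sketch (the `find_keys` helper and the sample data are illustrative, not a fixed API):

```python
def find_keys(obj, target):
    """Recursively collect every value stored under `target`, at any depth."""
    found = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                found.append(value)
            found.extend(find_keys(value, target))  # Keep searching inside the value
    elif isinstance(obj, list):
        for item in obj:
            found.extend(find_keys(item, target))
    return found

data = {'items': [{'price': 9.99}, {'nested': {'price': 4.50}}]}
print(find_keys(data, 'price'))  # [9.99, 4.5]
```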

3. Structure Exploration
You don't even know what you're looking for yet. You're mapping the territory.

# What fields does this API response have?
# What node types appear in this AST?
# What's the hierarchy of this HTML?

You need to print, inspect, and understand before you can write real code.
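One quick way to do that inspection, as a sketch (`summarize_structure` is a made-up helper name): print the shape of the data, not the data itself.

```python
def summarize_structure(obj, depth=0, max_depth=3):
    """Describe the shape of nested data without dumping every value."""
    lines = []
    indent = '  ' * depth
    if isinstance(obj, dict):
        if depth > max_depth:
            return [indent + '...']
        for key, value in obj.items():
            lines.append(f"{indent}{key}: {type(value).__name__}")
            lines.extend(summarize_structure(value, depth + 1, max_depth))
    elif isinstance(obj, list) and obj:
        if depth > max_depth:
            return [indent + '...']
        # Assume list items share a shape; summarize the first one
        lines.append(f"{indent}[{len(obj)} items; first item:]")
        lines.extend(summarize_structure(obj[0], depth + 1, max_depth))
    return lines

response = {'data': {'users': [{'id': 1, 'name': 'Ada'}]}}
print('\n'.join(summarize_structure(response)))
```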

4. Contextual Navigation
The hard one. What you do depends on where you are or where you've been.

# Extract table data, but only from the "Results" section
# Count function calls, but track which function they're inside
# Process JSON differently based on parent keys

This requires tracking state as you traverse.
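The "process JSON differently based on parent keys" case is the simplest illustration: pass the context down as a parameter. A minimal sketch (`collect_with_parent` is a hypothetical name):

```python
def collect_with_parent(obj, target, parent=None):
    """Collect (parent_key, value) pairs for `target`, tracking where we are."""
    hits = []
    if isinstance(obj, dict):
        for key, value in obj.items():
            if key == target:
                hits.append((parent, value))
            # The current key becomes the parent context for the recursion
            hits.extend(collect_with_parent(value, target, parent=key))
    elif isinstance(obj, list):
        for item in obj:
            hits.extend(collect_with_parent(item, target, parent=parent))
    return hits

data = {'billing': {'amount': 10}, 'shipping': {'amount': 3}}
print(collect_with_parent(data, 'amount'))
# [('billing', 10), ('shipping', 3)]
```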

A Real Example: Let's Classify Together

Imagine you're given this task:

"Parse Python files and find all function calls to print(), but only report them if they're inside a function whose name starts with test_."

Before you write any code, let's think through this:

What's the problem type?

This is Type 4: Contextual Navigation.

What does this tell us?

We can't decide whether to report a print() call just by looking at the call itself; it depends on which function encloses it. That means recursive traversal with a state parameter carrying the name of the current function.

Now let's write it. First, let's explore what we're working with:

import tree_sitter_python as tspython
from tree_sitter import Language, Parser

# Setup (you'd do this once)
PY_LANGUAGE = Language(tspython.language())
parser = Parser(PY_LANGUAGE)

code = b"""
def test_basic():
    print("Starting test")
    result = calculate()
    print(f"Result: {result}")

def calculate():
    print("This shouldn't be reported")
    return 42
"""

tree = parser.parse(code)
root = tree.root_node

# Let's explore the structure first
def print_tree(node, depth=0):
    print("  " * depth + node.type)
    for child in node.children:
        print_tree(child, depth + 1)

print_tree(root)

Run this. Look at the output. You'll see function_definition nodes, call nodes, and the structure becomes clear.

Now we can build our solution:

def find_test_prints(node, source, current_function=None):
    """Find print() calls inside test_ functions."""
    results = []

    # Are we entering a new function?
    if node.type == 'function_definition':
        name_node = node.child_by_field_name('name')
        if name_node:
            func_name = source[name_node.start_byte:name_node.end_byte].decode('utf8')
            # Update context only if it's a test function
            if func_name.startswith('test_'):
                current_function = func_name
            else:
                current_function = None  # Don't track non-test functions

    # Are we at a print call?
    if node.type == 'call' and current_function:
        func_node = node.child_by_field_name('function')
        if func_node:
            func_text = source[func_node.start_byte:func_node.end_byte].decode('utf8')
            if func_text == 'print':
                line_num = node.start_point[0] + 1  # start_point is (row, col), zero-based
                results.append({
                    'function': current_function,
                    'line': line_num,
                    'call': source[node.start_byte:node.end_byte].decode('utf8')
                })

    # Continue traversing
    for child in node.children:
        results.extend(find_test_prints(child, source, current_function))

    return results

# Use it
prints = find_test_prints(root, code)
for p in prints:
    print(f"{p['function']} (line {p['line']}): {p['call']}")

What did we just do?

  1. Recognized the problem type (contextual navigation)
  2. Explored the structure first
  3. Built a simple version that tracks one piece of context
  4. Used recursive traversal with state passing

This is the pattern. Problem recognition → exploration → implementation.

The Recognition Matrix

Here's how to quickly classify problems:

If you need to...              Problem Type            Approach
Get a known field/path         Path Navigation         Direct access
Find all X anywhere            Search/Collection       Recursive traverse or library method
Understand structure           Structure Exploration   Print/inspect
Track "where am I?"            Contextual Navigation   Recursive + state parameter

Exercise: Classification Practice

For each scenario, identify the problem type and the approach you'd use:

  1. "Extract all email addresses from a JSON API response"
  2. "Get the username from: config['server']['auth']['username']"
  3. "Find function definitions that contain more than 3 print statements"
  4. "List all possible keys that appear in this deeply nested JSON"
  5. "Extract all links from <a> tags, but only those in the main article, not the sidebar"

Answers:

  1. Type 2 (Search) if structure unknown; Type 1 (Path) if you know emails are always at data[i]['email']
  2. Type 1 (Path Navigation) - direct access
  3. Type 4 (Contextual) - need to track "inside which function?" and count
  4. Type 3 (Structure Exploration) - you're mapping the territory
  5. Type 4 (Contextual) - need to know which section you're in

Chapter 16: Performance and Optimization

Let's talk about a mistake many developers make: they optimize too early.

Here's the truth: most traversal code doesn't need to be optimized. You're usually processing a file, an API response, or a single HTML page. These are small. Your first version will probably be fast enough.

But when it's not fast enough, you need to know what to do.

Start By Measuring

Never optimize without measuring. Here's a simple pattern:

import time

def time_it(func, *args):
    start = time.perf_counter()  # perf_counter is monotonic; better for intervals than time.time()
    result = func(*args)
    elapsed = time.perf_counter() - start
    print(f"{func.__name__} took {elapsed:.3f} seconds")
    return result

Or use Python's cProfile:

import cProfile
import pstats

profiler = cProfile.Profile()
profiler.enable()

# Your traversal code here
traverse_large_structure(data)

profiler.disable()
stats = pstats.Stats(profiler)
stats.sort_stats('cumulative')
stats.print_stats(10)  # Top 10 slowest functions

Run this first. Find out where the time actually goes. You might be surprised.

Memory: The Generator Pattern

Let's say you're finding all Python files in a large directory. Here's the obvious way:

def find_python_files_slow(root_dir):
    """Returns a list of all .py files"""
    results = []
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith('.py'):
                results.append(os.path.join(dirpath, filename))
    return results

# This loads ALL files into memory at once
files = find_python_files_slow('/large/codebase')
for f in files:
    process(f)

If there are 10,000 files, you're storing 10,000 paths in memory at once.

Now watch this:

def find_python_files_fast(root_dir):
    """Yields .py files one at a time"""
    for dirpath, dirnames, filenames in os.walk(root_dir):
        for filename in filenames:
            if filename.endswith('.py'):
                yield os.path.join(dirpath, filename)

# This processes one file at a time
for f in find_python_files_fast('/large/codebase'):
    process(f)

The only change: yield instead of append + return. But the memory difference is huge.

When to use generators:

  - The result set is large (thousands of items or more)
  - You process items one at a time
  - You might stop before consuming everything

When lists are fine:

  - The result set is small
  - You need len(), indexing, sorting, or multiple passes

Early Stopping: Don't Do More Work Than Needed

Imagine finding the first function in a file:

# Slow: traverses the entire tree
def find_first_function_slow(node):
    results = []
    if node.type == 'function_definition':
        results.append(node)
    for child in node.children:
        results.extend(find_first_function_slow(child))
    return results[0] if results else None

This visits every node, even after finding what you want.

# Fast: stops when found
def find_first_function_fast(node):
    if node.type == 'function_definition':
        return node

    for child in node.children:
        result = find_first_function_fast(child)
        if result:  # Found it! Stop searching
            return result

    return None

The fast version stops as soon as it finds a match. If the first function is at line 10, it doesn't traverse lines 11-1000.

Depth Limiting: When You Don't Need to Go Deep

Sometimes you only care about top-level structure:

def find_top_level_functions(node, max_depth=2):
    """Only search the first few levels"""
    results = []

    def search(node, depth):
        if depth > max_depth:
            return

        if node.type == 'function_definition':
            results.append(node)

        for child in node.children:
            search(child, depth + 1)

    search(node, 0)
    return results

This is especially useful for deeply nested JSON or HTML where you know your target is near the top.
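The same idea works for nested dicts. A minimal sketch, with `shallow_keys` as an illustrative name:

```python
def shallow_keys(obj, max_depth=1, depth=0):
    """Collect dict keys only down to max_depth, ignoring deeper levels."""
    keys = []
    if depth > max_depth or not isinstance(obj, dict):
        return keys  # Too deep, or nothing to descend into
    for key, value in obj.items():
        keys.append(key)
        keys.extend(shallow_keys(value, max_depth, depth + 1))
    return keys

config = {'server': {'auth': {'username': 'x'}}, 'debug': True}
print(shallow_keys(config, max_depth=1))
# ['server', 'auth', 'debug']  -- 'username' is below the depth limit
```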

Caching: When You Traverse Repeatedly

Let's say you're analyzing a codebase and need to find functions by name multiple times:

# Without caching: re-traverses every time
def find_function(root, name):
    # ... traverse tree looking for function with 'name' ...
    pass

# Called multiple times - slow!
find_function(root, 'initialize')
find_function(root, 'process')
find_function(root, 'cleanup')

Better approach: traverse once, cache results:

def build_function_map(root, source):
    """Traverse once, build a lookup dict"""
    functions = {}

    def traverse(node):
        if node.type == 'function_definition':
            name_node = node.child_by_field_name('name')
            if name_node:
                name = source[name_node.start_byte:name_node.end_byte].decode('utf8')
                functions[name] = node

        for child in node.children:
            traverse(child)

    traverse(root)
    return functions

# Build once
func_map = build_function_map(root, source)

# Look up many times - fast!
init_func = func_map.get('initialize')
process_func = func_map.get('process')
cleanup_func = func_map.get('cleanup')

The TreeCursor Memory Optimization

For very large ASTs, tree-sitter's TreeCursor can reduce memory:

def count_nodes_with_list(node):
    """Creates a list of children - uses more memory"""
    count = 1
    for child in node.children:  # Creates list of all children
        count += count_nodes_with_list(child)
    return count

def count_nodes_with_cursor(node):
    """Uses a single cursor - less memory"""
    cursor = node.walk()
    count = 0
    visited_children = False
    while True:
        if not visited_children:
            count += 1  # Count each node on its first visit
            if not cursor.goto_first_child():
                visited_children = True
        elif cursor.goto_next_sibling():
            visited_children = False
        elif not cursor.goto_parent():
            break  # Back at the root: traversal complete
    return count

But honestly? Only use TreeCursor if you're processing massive files (10,000+ line files) and profiling shows memory is an issue. The direct .children approach is clearer and fast enough for most cases.

Real-World Example: Optimizing a Slow Script

Let's walk through a real optimization:

# Original: Slow for large codebases
def find_complex_functions(directory):
    results = []

    # Issue 1: Traverses file system multiple times
    for file in find_all_python_files(directory):
        with open(file, 'rb') as f:
            code = f.read()

        tree = parser.parse(code)

        # Issue 2: Traverses AST multiple times
        functions = find_all_functions(tree.root_node, code)
        for func in functions:
            lines = count_lines(func, code)
            calls = count_function_calls(func, code)

            # Issue 3: Does work even for simple functions
            if lines > 50 or calls > 10:
                results.append({
                    'file': file,
                    'name': get_function_name(func, code),
                    'lines': lines,
                    'calls': calls
                })

    return results

Optimization 1: Single AST traversal

def analyze_function(node, source):
    """Do all analysis in one pass"""
    lines = source[node.start_byte:node.end_byte].decode('utf8').count('\n')

    # Count calls while traversing
    calls = 0
    def count_calls(n):
        nonlocal calls
        if n.type == 'call':
            calls += 1
        for child in n.children:
            count_calls(child)

    count_calls(node)

    return lines, calls

Optimization 2: Early filtering

# Inside the per-function loop: filter out simple functions before any further work
for func in functions:
    lines, calls = analyze_function(func, code)
    if lines <= 50 and calls <= 10:
        continue  # Skip simple functions early

Optimization 3: Generator for files

def find_complex_functions(directory):
    for file in find_all_python_files(directory):  # Already a generator
        # Process one file at a time; memory usage stays constant
        ...

Result: 10x faster, uses 1/10th the memory.

When Not to Optimize

Remember: premature optimization is the root of all evil. Don't optimize if:

  - Your data is small (one file, one API response, one page)
  - The code runs once or rarely
  - You haven't measured an actual bottleneck

Write clear code first. Optimize only when you measure a real problem.

Exercise: Find the Bottleneck

Here's a slow function:

def extract_all_strings(root, source):
    """Extract all string literals from AST"""
    results = []

    def traverse(node):
        if node.type == 'string':
            text = source[node.start_byte:node.end_byte].decode('utf8')
            results.append(text)

        # Get all children, then filter
        for child in node.children:
            traverse(child)

    traverse(root)

    # Remove duplicates
    unique = []
    for s in results:
        if s not in unique:
            unique.append(s)

    return sorted(unique)

Your task: Identify the performance issues and rewrite this function.

Hints:

  - The traversal itself is fine
  - The duplicate removal is O(n²)
  - Sorting at the end might not be necessary
  - Consider using a set

Chapter 17: Error Handling and Robustness

Here's a hard truth: your first version will break on real data.

It doesn't matter how carefully you write your traversal code. Real-world data is messy:

  - Keys that should exist don't
  - Values are None where you expected a dict or list
  - Files you can't read, encodings that aren't UTF-8
  - HTML with missing tags, source code with syntax errors

Let's learn to write code that doesn't crash when reality hits.

The Defensive Mindset

Compare these two approaches:

# Optimistic (crashes on messy data)
def get_user_email_bad(data):
    return data['users'][0]['contact']['email']

# Defensive (handles missing data)
def get_user_email_good(data):
    users = data.get('users')
    if not users or len(users) == 0:
        return None

    contact = users[0].get('contact')
    if not contact:
        return None

    return contact.get('email')

The good version is longer, yes. But it won't crash when:

  - 'users' is missing entirely
  - The users list is empty
  - A user has no 'contact' field

But wait: there's a cleaner way for simple path navigation:

def get_nested_value(data, *keys, default=None):
    """Safely navigate nested dicts"""
    for key in keys:
        if isinstance(data, dict):
            data = data.get(key)
        elif isinstance(data, list) and isinstance(key, int):
            data = data[key] if 0 <= key < len(data) else None
        else:
            return default

        if data is None:
            return default

    return data

# Use it
email = get_nested_value(data, 'users', 0, 'contact', 'email', default='unknown')

Try-Except: When and How

Don't use try-except to hide logic errors. This is bad:

# Bad: Catching your own mistakes
try:
    result = node.children[5].children[2].text  # What if structure differs?
except:
    result = None

Do use try-except for things outside your control:

# Good: Handling I/O errors
try:
    with open(filepath, 'rb') as f:
        content = f.read()
except PermissionError:
    print(f"Cannot access {filepath}: permission denied")
    return None
except FileNotFoundError:
    print(f"File not found: {filepath}")
    return None

# Good: Handling encoding issues
try:
    text = content.decode('utf-8')
except UnicodeDecodeError:
    # Try alternate encoding
    text = content.decode('latin-1', errors='replace')

Be specific with exceptions:

# Bad: Too broad
try:
    process_data(data)
except Exception:  # Catches everything!
    pass

# Good: Specific handling
try:
    process_data(data)
except KeyError as e:
    print(f"Missing required key: {e}")
except ValueError as e:
    print(f"Invalid value format: {e}")

Handling Missing Data in Traversals

Let's build a robust JSON traversal:

def find_all_values(data, target_key):
    """Find all values for a key, handling any data type"""
    results = []

    def traverse(obj, path="root"):
        # Handle None
        if obj is None:
            return

        # Handle dictionaries
        if isinstance(obj, dict):
            for key, value in obj.items():
                if key == target_key:
                    results.append({
                        'path': f"{path}.{key}",
                        'value': value
                    })
                # Recurse into value
                traverse(value, f"{path}.{key}")

        # Handle lists
        elif isinstance(obj, list):
            for i, item in enumerate(obj):
                traverse(item, f"{path}[{i}]")

        # Primitives (strings, numbers, booleans) - base case
        # No action needed, just return

    traverse(data)
    return results

This handles:

  - None values (returns without recursing)
  - Dicts and lists nested in any combination
  - Primitives as the base case, needing no special handling

HTML Traversal Robustness

BeautifulSoup is forgiving, but you still need to be careful:

def extract_article_text(soup):
    """Extract text, handling missing elements gracefully"""
    article = soup.find('article')
    if not article:
        print("Warning: No <article> tag found")
        return ""

    # Some elements might be text nodes, not tags
    paragraphs = []
    for elem in article.descendants:
        # Check if it's a tag (not a text node)
        if hasattr(elem, 'name') and elem.name == 'p':
            text = elem.get_text(strip=True)
            if text:  # Skip empty paragraphs
                paragraphs.append(text)

    return '\n\n'.join(paragraphs)

AST Traversal with Error Recovery

Tree-sitter is robust, but your code needs to handle edge cases:

def extract_function_info(node, source):
    """Extract function details with error handling"""
    if node.type != 'function_definition':
        raise ValueError(f"Expected function_definition, got {node.type}")

    # Name might be missing (syntax error in source)
    name_node = node.child_by_field_name('name')
    name = "unknown"
    if name_node:
        try:
            name = source[name_node.start_byte:name_node.end_byte].decode('utf-8')
        except UnicodeDecodeError:
            name = "decode_error"

    # Parameters might be missing
    params_node = node.child_by_field_name('parameters')
    param_count = 0
    if params_node:
        # Count parameter nodes
        param_count = sum(1 for child in params_node.children
                          if child.type == 'identifier')

    # Body is required, but might have errors
    body_node = node.child_by_field_name('body')
    line_count = 0
    if body_node:
        body_text = source[body_node.start_byte:body_node.end_byte].decode('utf-8', errors='replace')
        line_count = body_text.count('\n')

    return {
        'name': name,
        'params': param_count,
        'lines': line_count
    }

Logging for Debugging

When traversal fails, you need to know where and why:

import logging

logging.basicConfig(level=logging.DEBUG)
logger = logging.getLogger(__name__)

def traverse_with_logging(node, depth=0):
    """Traverse with detailed logging"""
    indent = "  " * depth
    logger.debug(f"{indent}Visiting: {node.type}")

    try:
        # Your traversal logic
        for child in node.children:
            traverse_with_logging(child, depth + 1)

    except Exception as e:
        logger.error(f"{indent}Error at {node.type}: {e}")
        logger.error(f"{indent}Position: {node.start_point}")
        raise

For file system traversal:

def traverse_directory_safe(root_dir):
    """Traverse with error reporting"""
    for dirpath, dirnames, filenames in os.walk(root_dir):
        # Handle permission errors on directories
        dirnames[:] = [d for d in dirnames
                       if os.access(os.path.join(dirpath, d), os.R_OK)]

        for filename in filenames:
            filepath = os.path.join(dirpath, filename)
            try:
                # Process file
                with open(filepath, 'r') as f:
                    content = f.read()
                    process(content)

            except PermissionError:
                logger.warning(f"Permission denied: {filepath}")
            except UnicodeDecodeError:
                logger.warning(f"Encoding error: {filepath}")
            except Exception as e:
                logger.error(f"Unexpected error in {filepath}: {e}")

Exercise: Make It Bulletproof

Here's fragile code:

def extract_prices(html):
    """Extract all prices from product listings"""
    soup = BeautifulSoup(html, 'html.parser')
    products = soup.find_all('div', class_='product')

    prices = []
    for product in products:
        price_text = product.find('span', class_='price').text
        # Expect format: "$19.99"
        price = float(price_text.replace('$', ''))
        prices.append(price)

    return prices

Your task: Rewrite this to handle:

  - Products with no price <span>
  - Price text that doesn't match the "$19.99" format
  - Empty or whitespace-only price text
  - Pages with no product listings at all

Chapter 18: Designing Reusable Traversal Functions

You've written traversal code for specific problems. Now let's talk about writing traversal code that you'll use again and again.

The goal: write a function once, use it everywhere.

The Generic Traversal Pattern

Here's the skeleton of most traversal problems:

def generic_traverse(root, condition_func, action_func):
    """
    Traverse a tree structure.

    Args:
        root: Starting node/dict/object
        condition_func: (node) -> bool - should we process this?
        action_func: (node) -> Any - what to do if condition is true

    Returns:
        List of results from action_func
    """
    results = []

    def traverse(node):
        # Check condition
        if condition_func(node):
            result = action_func(node)
            if result is not None:
                results.append(result)

        # Get children (customize per data type)
        children = get_children(node)
        for child in children:
            traverse(child)

    traverse(root)
    return results

Now you can use it for different problems:

# Find all function definitions
functions = generic_traverse(
    root=ast_root,
    condition_func=lambda n: n.type == 'function_definition',
    action_func=lambda n: extract_function_info(n, source)
)

# Find all large files
large_files = generic_traverse(
    root=root_dir,
    condition_func=lambda path: os.path.isfile(path) and os.path.getsize(path) > 1_000_000,
    action_func=lambda path: path
)

The Visitor Pattern

For more complex cases, use the visitor pattern:

class ASTVisitor:
    """Base class for AST traversal with specific handlers"""

    def visit(self, node):
        """Main entry point"""
        method_name = f'visit_{node.type}'
        visitor = getattr(self, method_name, self.generic_visit)
        return visitor(node)

    def generic_visit(self, node):
        """Default: just traverse children"""
        for child in node.children:
            self.visit(child)

    # Subclasses override specific visit_* methods

class FunctionCollector(ASTVisitor):
    def __init__(self, source):
        self.source = source
        self.functions = []

    def visit_function_definition(self, node):
        """Called automatically for function nodes"""
        name_node = node.child_by_field_name('name')
        if name_node:
            name = self.source[name_node.start_byte:name_node.end_byte].decode('utf8')
            self.functions.append(name)

        # Continue traversing children
        self.generic_visit(node)

# Use it
collector = FunctionCollector(source)
collector.visit(root)
print(collector.functions)

This scales beautifully:

class CodeAnalyzer(ASTVisitor):
    def __init__(self, source):
        self.source = source
        self.functions = []
        self.classes = []
        self.imports = []

    def visit_function_definition(self, node):
        # Handle functions
        self.functions.append(self.extract_function_info(node))
        self.generic_visit(node)

    def visit_class_definition(self, node):
        # Handle classes
        self.classes.append(self.extract_class_info(node))
        self.generic_visit(node)

    def visit_import_statement(self, node):
        # Handle imports
        self.imports.append(self.extract_import_info(node))
        # No need to traverse import children

Generator vs. List Returns

Return a list when:

  - The result set is small
  - You need len(), indexing, sorting, or multiple passes

Yield results when:

  - The result set could be large
  - The caller may stop early
  - You're feeding results into another generator

# List version: loads everything
def find_all_functions(root):
    functions = []
    # ... traverse and append ...
    return functions

# Generator version: yields one at a time
def find_all_functions(root):
    def traverse(node):
        if node.type == 'function_definition':
            yield node
        for child in node.children:
            yield from traverse(child)

    yield from traverse(root)

# Using the generator
for func in find_all_functions(root):
    if is_what_i_need(func):
        return func  # Can stop early!

Configuration Objects for Complex Traversals

When your traversal has many options, use a config object:

from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class TraversalConfig:
    """Configuration for file system traversal"""
    max_depth: Optional[int] = None
    follow_symlinks: bool = False
    include_hidden: bool = False
    file_filter: Optional[Callable[[str], bool]] = None
    error_handler: Optional[Callable[[str, Exception], None]] = None

def traverse_directory(root, config: TraversalConfig):
    """Flexible directory traversal"""
    def should_include(filename):
        if not config.include_hidden and filename.startswith('.'):
            return False
        if config.file_filter:
            return config.file_filter(filename)
        return True

    def handle_error(path, error):
        if config.error_handler:
            config.error_handler(path, error)
        else:
            print(f"Error: {path}: {error}")

    # ... implementation using config ...

# Use it
config = TraversalConfig(
    max_depth=3,
    include_hidden=False,
    file_filter=lambda f: f.endswith('.py'),
    error_handler=lambda path, err: logging.warning(f"Skipped {path}: {err}")
)

traverse_directory('/my/project', config)

Type Hints for Clarity

Type hints make your reusable functions easier to use:

from typing import TypeVar, Callable, List, Iterator, Optional, Any

T = TypeVar('T')  # Generic type variable

def find_all(
    root: Any,
    predicate: Callable[[Any], bool],
    get_children: Callable[[Any], Iterator[Any]]
) -> List[Any]:
    """
    Generic tree traversal.

    Args:
        root: The root node to start from
        predicate: Function that returns True for nodes to collect
        get_children: Function that returns an iterable of child nodes

    Returns:
        List of all nodes where predicate returns True

    Example:
        # Find all dicts with 'error' key
        errors = find_all(
            root=data,
            predicate=lambda x: isinstance(x, dict) and 'error' in x,
            get_children=lambda x: x.values() if isinstance(x, dict)
                                   else x if isinstance(x, list)
                                   else []
        )
    """
    results = []

    def traverse(node):
        if predicate(node):
            results.append(node)

        for child in get_children(node):
            traverse(child)

    traverse(root)
    return results

Documentation Matters

Good documentation turns your traversal function into a tool others (including future you) can use:

def extract_functions(
    root_node,
    source: bytes,
    include_nested: bool = True,
    min_lines: int = 0
) -> List[dict]:
    """
    Extract function information from a Python AST.

    This function traverses a tree-sitter AST and collects metadata
    about all function definitions found.

    Args:
        root_node: The root node of the AST (from tree.root_node)
        source: The original source code as bytes
        include_nested: If True, includes functions defined inside other
                       functions or classes. If False, only top-level
                       functions are returned.
        min_lines: Only return functions with at least this many lines

    Returns:
        A list of dictionaries, each containing:
            - 'name': Function name (str)
            - 'line_start': Starting line number (int)
            - 'line_end': Ending line number (int)
            - 'params': List of parameter names (List[str])
            - 'is_nested': Whether function is nested (bool)

    Example:
        >>> tree = parser.parse(source_code)
        >>> functions = extract_functions(tree.root_node, source_code)
        >>> for func in functions:
        ...     print(f"{func['name']} ({func['line_end'] - func['line_start']} lines)")
        calculate (5 lines)
        process_data (23 lines)

    Note:
        Functions with syntax errors may have incomplete information.
        The 'name' field will be 'unknown' if the name cannot be extracted.
    """
    # Implementation...

Building a Traversal Library

Let's put it all together into a reusable module:

# traversal_tools.py
"""
Reusable traversal utilities for trees, nested dicts, and file systems.
"""

from typing import Any, Callable, Iterator, List, Optional, TypeVar
from dataclasses import dataclass

T = TypeVar('T')

@dataclass
class TraversalResult:
    """Container for traversal results with metadata"""
    value: Any
    path: str
    depth: int

def traverse_dict(
    data: dict,
    predicate: Optional[Callable[[str, Any], bool]] = None,
    max_depth: Optional[int] = None
) -> Iterator[TraversalResult]:
    """
    Traverse nested dictionaries/lists yielding matching items.

    Args:
        data: The dictionary to traverse
        predicate: Function(key, value) -> bool. If None, yields all items.
        max_depth: Maximum depth to traverse. None for unlimited.

    Yields:
        TraversalResult objects for matching items
    """
    def traverse(obj, path="root", depth=0):
        if max_depth is not None and depth > max_depth:
            return

        if isinstance(obj, dict):
            for key, value in obj.items():
                current_path = f"{path}.{key}"

                if predicate is None or predicate(key, value):
                    yield TraversalResult(value, current_path, depth)

                yield from traverse(value, current_path, depth + 1)

        elif isinstance(obj, list):
            for i, item in enumerate(obj):
                current_path = f"{path}[{i}]"
                yield from traverse(item, current_path, depth + 1)

    yield from traverse(data)

def find_by_key(data: dict, target_key: str) -> List[Any]:
    """
    Find all values for a specific key anywhere in nested structure.

    This is a convenience wrapper around traverse_dict.

    Args:
        data: Nested dictionary/list structure
        target_key: The key to search for

    Returns:
        List of all values found for that key

    Example:
        >>> data = {'user': {'email': 'a@b.com'}, 'admin': {'email': 'c@d.com'}}
        >>> find_by_key(data, 'email')
        ['a@b.com', 'c@d.com']
    """
    results = []
    for result in traverse_dict(data, predicate=lambda k, v: k == target_key):
        results.append(result.value)
    return results

def find_by_value(data: dict, target_value: Any) -> List[str]:
    """
    Find all paths to a specific value in nested structure.

    Args:
        data: Nested dictionary/list structure
        target_value: The value to search for

    Returns:
        List of paths where the value was found

    Example:
        >>> data = {'a': 1, 'b': {'c': 1}}
        >>> find_by_value(data, 1)
        ['root.a', 'root.b.c']
    """
    paths = []
    for result in traverse_dict(data, predicate=lambda k, v: v == target_value):
        paths.append(result.path)
    return paths
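
Before moving on, it's worth sanity-checking what these helpers actually yield. The snippet below repeats a trimmed copy of traverse_dict so it runs on its own; the sample data is made up for illustration:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class TraversalResult:
    value: Any
    path: str
    depth: int

def traverse_dict(data, predicate=None, max_depth=None):
    # Trimmed copy of traverse_dict from traversal_tools.py, so this runs standalone
    def traverse(obj, path="root", depth=0):
        if max_depth is not None and depth > max_depth:
            return
        if isinstance(obj, dict):
            for key, value in obj.items():
                current_path = f"{path}.{key}"
                if predicate is None or predicate(key, value):
                    yield TraversalResult(value, current_path, depth)
                yield from traverse(value, current_path, depth + 1)
        elif isinstance(obj, list):
            for i, item in enumerate(obj):
                yield from traverse(item, f"{path}[{i}]", depth + 1)
    yield from traverse(data)

data = {'user': {'email': 'a@b.com', 'settings': {'theme': 'dark'}},
        'admins': [{'email': 'c@d.com'}]}

# Keys named 'email' are found even inside the list of admin dicts
emails = [r.value for r in traverse_dict(data, predicate=lambda k, v: k == 'email')]
print(emails)  # ['a@b.com', 'c@d.com']

# max_depth=0 restricts the walk to top-level keys only
top = [r.path for r in traverse_dict(data, max_depth=0)]
print(top)  # ['root.user', 'root.admins']
```

Notice the second result: with max_depth=0 only the two top-level keys are yielded, because recursion stops before entering their values.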

class TreeVisitor:
    """
    Base class for custom tree traversals using the visitor pattern.

    Subclass this and override visit_* methods to handle specific node types.
    """

    def __init__(self, get_children: Callable[[Any], Iterator[Any]]):
        """
        Args:
            get_children: Function that returns children for any node
        """
        self.get_children = get_children

    def visit(self, node, depth=0):
        """
        Visit a node and all its descendants.

        Override visit_<type> methods in subclasses for custom handling.
        """
        # Try to call specific visitor method
        node_type = getattr(node, 'type', type(node).__name__)
        method_name = f'visit_{node_type}'
        visitor = getattr(self, method_name, self.generic_visit)

        visitor(node, depth)

        # Traverse children
        for child in self.get_children(node):
            self.visit(child, depth + 1)

    def generic_visit(self, node, depth):
        """
        Default visitor. Override this or specific visit_* methods.
        """
        pass

# Example usage in comments:
"""
# Using traverse_dict:
data = {...}
for result in traverse_dict(data, max_depth=3):
    print(f"{result.path}: {result.value}")

# Using find_by_key:
emails = find_by_key(api_response, 'email')

# Using TreeVisitor:
class MyVisitor(TreeVisitor):
    def __init__(self):
        super().__init__(get_children=lambda node: node.children)
        self.function_count = 0

    def visit_function_definition(self, node, depth):
        self.function_count += 1

visitor = MyVisitor()
visitor.visit(ast_root)
print(visitor.function_count)
"""

Exercise: Build Your Own Utility

Create a reusable function that:

  1. Traverses any tree-like structure
  2. Collects statistics (node count by type, max depth, etc.)
  3. Can be configured to stop early
  4. Handles errors gracefully
  5. Has clear documentation and type hints

Here's a starting point:

from typing import Any, Callable, Dict, Optional
from dataclasses import dataclass, field

@dataclass
class TreeStats:
    """Statistics collected during tree traversal"""
    total_nodes: int = 0
    max_depth: int = 0
    node_type_counts: Dict[str, int] = field(default_factory=dict)

def analyze_tree(
    root: Any,
    get_type: Callable[[Any], str],
    get_children: Callable[[Any], list],
    max_nodes: Optional[int] = None
) -> TreeStats:
    """
    Analyze tree structure and collect statistics.

    Your implementation here...
    """
    pass

Bonus challenge: Add timing information and memory usage tracking.
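
If you get stuck, here is one possible shape for analyze_tree — a sketch, not the only correct answer. It uses an explicit stack rather than recursion so deep trees can't hit Python's recursion limit, and it stops early once max_nodes is reached:

```python
from dataclasses import dataclass, field
from typing import Dict

@dataclass
class TreeStats:
    total_nodes: int = 0
    max_depth: int = 0
    node_type_counts: Dict[str, int] = field(default_factory=dict)

def analyze_tree(root, get_type, get_children, max_nodes=None):
    # Iterative depth-first walk with an explicit stack
    stats = TreeStats()
    stack = [(root, 0)]
    while stack:
        if max_nodes is not None and stats.total_nodes >= max_nodes:
            break  # early stop: the caller asked for a bounded scan
        node, depth = stack.pop()
        stats.total_nodes += 1
        stats.max_depth = max(stats.max_depth, depth)
        node_type = get_type(node)
        stats.node_type_counts[node_type] = stats.node_type_counts.get(node_type, 0) + 1
        try:
            children = get_children(node)
        except Exception:
            children = []  # a node we can't descend into shouldn't abort the scan
        for child in children:
            stack.append((child, depth + 1))
    return stats

# Demo on a plain nested dict/list structure
doc = {'a': {'b': [1, 2]}, 'c': 3}
stats = analyze_tree(
    doc,
    get_type=lambda n: type(n).__name__,
    get_children=lambda n: list(n.values()) if isinstance(n, dict)
                           else (n if isinstance(n, list) else []),
)
print(stats.total_nodes)  # 6
print(stats.max_depth)    # 3
```

The demo counts six nodes: the outer dict, the inner dict, the list, and the three integers; the deepest integers sit at depth 3.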


Closing Thoughts on Part VI

You've learned to recognize problem types, optimize when needed, handle errors gracefully, and design reusable functions.

But here's the real lesson: these patterns are everywhere.

Once you see the traversal pattern in file systems, you'll see it in nested JSON, HTML documents, ASTs, and every other tree-shaped structure you meet.

The specifics change. The pattern doesn't.

In the next part, we'll see this pattern in action with real-world case studies—showing you how professionals approach these problems from start to finish.