Appendix E: Glossary
Core Concepts
Abstract Syntax Tree (AST)
A tree representation of the syntactic structure of source code. Each node represents a construct in the code (like a function definition, variable assignment, or expression). Unlike the raw text, an AST understands the hierarchical structure and meaning of the code.
Example: The code x = 1 + 2 becomes a tree with an assignment node at the root, containing a variable node (x) and an addition expression node (containing 1 and 2).
Ancestor
Any node above the current node in the tree hierarchy. A node's parent, grandparent, great-grandparent, etc. are all ancestors. In a file system, /home and /home/user are ancestors of /home/user/documents/file.txt.
Artifact
In Tree-sitter terminology, the parsed tree itself is the artifact. More generally, any output produced by traversal (a list of results, a transformed structure, a report) can be called an artifact.
Base Case
In recursive functions, the condition that stops recursion. Without a base case, recursion continues infinitely. Example: "if this is a file (not a directory), stop recursing."
Breadth-First Search (BFS)
A traversal strategy that explores all nodes at depth N before moving to depth N+1. Processes siblings before children. Uses a queue. Think of ripples spreading out from a stone dropped in water.
Example order: Root → All children of root → All grandchildren → etc.
Child
A node directly connected to and one level below another node. In a file system, files and subdirectories within a directory are its children. In an AST, the parameters of a function are children of the function definition node.
Cursor
In Tree-sitter, a lightweight pointer to a position in the tree. Unlike iterating through .children (which creates Python list objects), a cursor moves through the tree without extra memory allocation. Used for memory-efficient traversal of very large trees.
Cycle
A path through a structure that returns to a previously visited node. Trees by definition have no cycles, but graphs do. A symbolic link that points back to a parent directory creates a cycle in a file system.
Defensive Programming
Writing code that anticipates and handles unexpected inputs, missing data, and error conditions. In traversal: checking if a node has children before iterating, using .get() instead of direct dictionary access, wrapping operations in try-except blocks.
Depth
The distance from the root node. The root has depth 0, its children have depth 1, grandchildren have depth 2, etc. Also called "level."
Depth-First Search (DFS)
A traversal strategy that explores as deeply as possible before backtracking. Processes children before siblings. Uses recursion or a stack. Think of exploring a maze by always taking the first unexplored path.
Example order: Root → First child → First grandchild → ... (until leaf) → Backtrack to explore siblings.
Descendant
Any node below the current node in the tree hierarchy. Children, grandchildren, great-grandchildren, etc. are all descendants. In a file system, all files and folders nested within a directory are its descendants.
Document Object Model (DOM)
A tree representation of an HTML or XML document. Each HTML tag becomes a node in the tree. The DOM allows programs to navigate and manipulate web page structure.
Early Stopping
Optimization technique where traversal terminates as soon as the desired result is found, rather than visiting all nodes. Example: When searching for the first file over 1GB, stop as soon as you find one.
Edge
In graph terminology, the connection between two nodes. In trees, the parent-child relationship is an edge. In file systems, the "contains" relationship between directory and file is an edge.
Field Name
In Tree-sitter, a semantic name for a child node's role. Instead of accessing children by numeric index (node.children[0]), you can access by field name (node.child_by_field_name('name')). Makes code more robust to grammar changes.
Generator
A Python function that uses yield instead of return. Produces values lazily (one at a time) rather than building a complete list in memory. Essential for memory-efficient traversal of large structures.
Key difference: return [results] gives you everything at once; yield result gives you items one by one.
Graph
A general structure of nodes connected by edges, where cycles are allowed and multiple paths may exist between nodes. Trees are a special case of graphs (connected, acyclic).
Irregular Tree
A tree where nodes don't follow a consistent pattern. HTML is irregular: some tags have attributes, some have text children, some have element children, and the structure varies widely between documents. Contrast with a regular tree like a binary search tree where every node follows the same pattern.
Iterable
Any Python object that can be iterated over in a for loop. Lists, tuples, generators, and many custom objects (including Tree-sitter node children lists) are iterables.
Leaf Node
A node with no children. In a file system, files are leaves (they contain data but don't contain other files). In an AST, identifier tokens and literals are often leaves.
Named Node
In Tree-sitter, a node representing a semantic element of the language (function, variable, class). Contrast with unnamed nodes which represent syntax like punctuation ((, :, }). Usually you only care about named nodes when analyzing code.
Nested Structure
Any data structure that contains instances of itself. Dictionaries containing dictionaries, lists containing lists, directories containing directories, HTML tags containing HTML tags.
Node
A single element in a tree. Contains data and references to children (and sometimes parent). In different domains: a file/directory in file systems, a dictionary/value in JSON, an HTML tag in DOM, a code construct in AST.
Parent
The node directly above another node in the tree hierarchy. Every node except the root has exactly one parent. In a file system, /home/user is the parent of /home/user/documents.
Path
A sequence of nodes from one node to another, typically from root to a target node. Can be represented as a list of keys or indices. Example JSON path: ['users', 0, 'profile', 'email'] represents navigating through data['users'][0]['profile']['email'].
Recursive Function
A function that calls itself. Essential for tree traversal because trees are recursive structures (a tree contains trees). Must have a base case to prevent infinite recursion.
Pattern:
def process(node):
# Base case
if is_leaf(node):
return result
# Recursive case
for child in node.children:
process(child) # Call itself
Root Node
The topmost node in a tree, with no parent. In a file system, / or C:\. In JSON, the outermost dictionary or array. In an AST, the module or program node. All traversals begin at the root.
Schema
The expected structure of data. In JSON, the schema defines what keys exist, what types their values are, and what nesting structure exists. When you know the schema, you can use direct access (data['known']['path']). When you don't, you need search traversal.
Sibling
Nodes that share the same parent. In a file system, all files and folders in the same directory are siblings. In an AST, all statements in a function body are siblings.
Stack
A Last-In-First-Out (LIFO) data structure. Used for iterative depth-first traversal (add children to stack, pop to process). Python's function call stack also uses this structure, which is why recursive functions work for DFS.
State Tracking
Maintaining information about where you are during traversal. Examples: the current depth, the path taken to reach current node, which parent scope you're in, what section of a document you're processing. Essential for context-aware traversal.
Terminal Node
Another term for leaf node. A node that terminates a branch of the tree.
Text Node
In HTML DOM, a node containing text content rather than an HTML element. Between <p> and </p>, the text "Hello" is a text node. Text nodes don't have a .name attribute, which can cause errors if not checked.
Traversal
The process of systematically visiting nodes in a tree or graph. The core pattern involves: knowing where you are, examining what's there, deciding where to go next, and knowing when to stop.
Tree
A hierarchical data structure consisting of nodes connected by edges, where:
- There is exactly one root node
- Every node except the root has exactly one parent
- No cycles exist
- Every node is reachable from the root
Examples: File systems, JSON objects, HTML DOM, ASTs, organizational charts, family trees.
Tree-sitter
A parser generator tool and incremental parsing library. Creates ASTs from source code in many languages. Known for speed and accuracy. Used in code editors, analysis tools, and syntax highlighting.
Type Annotation
In Python, hints about what types variables and functions should have. Example: def find(node: Node) -> List[Node]:. In Tree-sitter, the .type property returns the type of a node as a string (like 'function_definition' or 'identifier').
Visitor Pattern
A design pattern for separating traversal logic from processing logic. You traverse the structure once, but "visit" nodes with different handler functions depending on node type. Example: One handler for function definitions, another for class definitions.
Walk
Common terminology for traversal. "Walking the tree" means visiting all nodes. In Python, os.walk() walks the file system tree.
Whitespace
In code parsing, spaces, tabs, and newlines. ASTs often ignore whitespace (it doesn't affect meaning), but Tree-sitter preserves it (position information is byte-accurate). Text nodes in HTML often contain only whitespace.
Domain-Specific Terms
CSS Selector
A pattern for selecting HTML elements. Examples: div.main (divs with class "main"), #content (element with id "content"), p > a (links directly inside paragraphs). BeautifulSoup's .select() method uses CSS selectors.
Docstring
Python documentation string. First statement in a function/class if it's a string literal. Used to document what code does. Tools can extract docstrings to generate documentation automatically.
Field (Tree-sitter)
A named role for a child in a specific node type. Function definitions have fields for name, parameters, body, return_type. Provides semantic access to tree structure.
Glob Pattern
A pattern for matching file paths using wildcards. Examples: *.txt (all text files), **/*.py (all Python files recursively). The * matches anything, ** matches any number of directory levels.
JSONPath
A query language for JSON, similar to XPath for XML. Example: $.store.book[*].author finds all book authors. The jsonpath-ng library implements this for Python.
Parser
A program that reads text (code, HTML, etc.) and builds a structured representation (usually a tree). Tree-sitter is a parser for code. BeautifulSoup includes parsers for HTML.
Query (Tree-sitter)
An S-expression pattern for matching nodes in an AST. More powerful than simple type checking. Example: (function_definition name: (identifier) @func_name) captures all function names.
S-expression
A notation for nested list structures using parentheses. Tree-sitter queries use S-expressions. Example: (call function: (identifier) @func_name) matches function calls and captures the function name.
Semantic
Related to meaning rather than syntax. Semantic nodes in Tree-sitter represent meaningful language constructs. Semantic access means accessing nodes by their role (what they mean) rather than position.
Symlink (Symbolic Link)
A file that points to another file or directory. Following symlinks during traversal can create cycles. os.walk(path, followlinks=True) will follow symlinks.
Tag (HTML)
An HTML element like <div>, <p>, <a>. The tag name is the string like "div" or "p". In BeautifulSoup, element.name returns the tag name.
Token
In parsing, the smallest meaningful unit. Keywords, operators, identifiers, and literals are tokens. In Tree-sitter, unnamed nodes often represent syntax tokens like punctuation.
Common Acronyms
AST - Abstract Syntax Tree
BFS - Breadth-First Search
CSS - Cascading Style Sheets (but CSS Selector is the relevant meaning here)
DFS - Depth-First Search
DOM - Document Object Model
HTML - HyperText Markup Language
JSON - JavaScript Object Notation
LIFO - Last In, First Out (stack behavior)
XML - eXtensible Markup Language
Related Concepts Not in This Guide
These terms are related to traversal but not covered in depth:
Backtracking - Returning to a previous state when current path doesn't work
Graph Algorithms - Dijkstra's, A*, topological sort, etc.
Inorder/Preorder/Postorder - Binary tree traversal orderings
Memoization - Caching results to avoid recomputation
Tree Balancing - Keeping tree height minimal (AVL, Red-Black trees)
Trie - Tree structure for prefix-based search
These are natural next topics after mastering basic tree traversal.