How the Equation Evolver Works
A deep dive into Genetic Programming — how equations are represented as trees, compete for survival, and evolve through crossover and mutation to fit your data.
The Hidden Physics Problem
Imagine you have a set of experimental measurements — raw X/Y data points scattered across a chart. You suspect there's a mathematical law governing the relationship, but you don't know what it is. Is it quadratic? Trigonometric? Some exotic combination of both?
Traditional curve fitting (like polynomial regression) requires you to guess the form of the equation first, then fit its parameters. Symbolic regression flips this on its head: it discovers both the structure and the parameters simultaneously. The Equation Evolver does this using Genetic Programming, a kind of genetic algorithm that evolves expression trees and is inspired by biological evolution.
Equations as Trees (AST)
The key insight is representing mathematical equations as Abstract Syntax Trees (ASTs). Instead of treating y = 3 * sin(x) + 2 as a flat string, we represent it as a tree:
        (+)
       /   \
     (*)    2
    /   \
   3    sin
         |
         x
Each node in the tree is either a terminal (a number like 3.2 or the variable x) or an operator (+, -, *, /, ^, sin, cos). This tree structure is what makes genetic operations possible: you can swap, mutate, and recombine branches just like DNA.
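To make the tree idea concrete, here is a minimal Python sketch of how such a tree could be stored and evaluated. The Node class and the evaluate function are hypothetical names chosen for illustration, not the Evolver's actual code. Division and exponentiation are left out because they need the guarded versions described under Protected Operators.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """One node of an expression tree (hypothetical structure, for illustration)."""
    value: object                    # an operator symbol, a float constant, or "x"
    left: Optional["Node"] = None    # only used by operators
    right: Optional["Node"] = None   # only used by binary operators


BINARY = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}
UNARY = {"sin": math.sin, "cos": math.cos}


def evaluate(node: Node, x: float) -> float:
    """Recursively compute the value of the tree for a given x."""
    if node.value == "x":
        return x
    if node.value in UNARY:
        return UNARY[node.value](evaluate(node.left, x))
    if node.value in BINARY:
        return BINARY[node.value](evaluate(node.left, x), evaluate(node.right, x))
    return float(node.value)         # anything else is a numeric constant


# y = 3 * sin(x) + 2, built exactly like the diagram above
tree = Node("+", Node("*", Node(3.0), Node("sin", Node("x"))), Node(2.0))
print(evaluate(tree, 0.0))           # 3 * sin(0) + 2
```

Because every branch is itself a valid tree, replacing tree.left with any other Node still yields an equation that can be evaluated, which is the property crossover and mutation rely on.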
The Genetic Algorithm
The algorithm follows Darwin's principles of natural selection:
- Population. We start with a population of random equation trees — hundreds of completely random mathematical expressions. Most are garbage. That's fine.
- Fitness. Each equation is evaluated against your data by computing the Mean Squared Error (MSE) — the average squared difference between what the equation predicts and the actual data points. Lower MSE = better fit = higher "fitness."
- Selection. We use Tournament Selection: pick 3 random individuals, and the fittest one wins the right to reproduce. This creates selection pressure without being too aggressive.
- Crossover. Two parent trees swap random subtrees, like genetic recombination. A subtree from Parent A replaces a subtree in Parent B, and vice versa. This is how useful building blocks (like sin(x)) get combined with others (like x²).
- Mutation. Random changes are applied: an operator might flip from + to *, a constant might be nudged, or an entire subtree might be replaced with a new random one. This prevents the population from getting stuck.
- Elitism. The top 2 equations from each generation survive unchanged into the next, ensuring we never lose our best solution.
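The six steps above can be sketched as one short Python loop. This is a simplified illustration under stated assumptions, not the Evolver's real engine: trees are stored as nested tuples such as ("+", "x", 2.0), only +, -, *, sin and cos are used, and every function name is hypothetical. The tournament size of 3, the 2 elites and the depth cap of 6 come from this article. The population size, generation count, mutation rate, 0.3 leaf probability and the size of a constant nudge are arbitrary guesses.

```python
import math
import random

BINARY = {"+": lambda a, b: a + b, "-": lambda a, b: a - b, "*": lambda a, b: a * b}
UNARY = {"sin": math.sin, "cos": math.cos}
MAX_DEPTH = 6  # depth cap quoted in the Protected Operators section


def random_tree(rng, levels=3):
    """Grow a random expression: a float, "x", or a tuple (op, child, ...)."""
    if levels == 0 or rng.random() < 0.3:
        return "x" if rng.random() < 0.5 else round(rng.uniform(-5, 5), 2)
    op = rng.choice(list(BINARY) + list(UNARY))
    arity = 2 if op in BINARY else 1
    return (op,) + tuple(random_tree(rng, levels - 1) for _ in range(arity))


def evaluate(tree, x):
    if not isinstance(tree, tuple):
        return x if tree == "x" else tree
    op, *kids = tree
    args = [evaluate(kid, x) for kid in kids]
    return BINARY[op](*args) if op in BINARY else UNARY[op](*args)


def mse(tree, data):
    """Mean squared error; expressions that blow up get the worst score."""
    try:
        total = sum((evaluate(tree, x) - y) ** 2 for x, y in data)
    except (OverflowError, ValueError):
        return math.inf
    return total / len(data) if math.isfinite(total) else math.inf


def paths(tree, prefix=()):
    """Yield the index path to every node (the root is the empty path)."""
    yield prefix
    if isinstance(tree, tuple):
        for i, kid in enumerate(tree[1:], start=1):
            yield from paths(kid, prefix + (i,))


def get_subtree(tree, path):
    for i in path:
        tree = tree[i]
    return tree


def replace_subtree(tree, path, new):
    if not path:
        return new
    i = path[0]
    return tree[:i] + (replace_subtree(tree[i], path[1:], new),) + tree[i + 1:]


def tree_depth(tree):
    return 1 + max(map(tree_depth, tree[1:])) if isinstance(tree, tuple) else 1


def crossover(a, b, rng):
    """Swap one random subtree between two parents; oversized children are rejected."""
    pa, pb = rng.choice(list(paths(a))), rng.choice(list(paths(b)))
    child_a = replace_subtree(a, pa, get_subtree(b, pb))
    child_b = replace_subtree(b, pb, get_subtree(a, pa))
    return (child_a if tree_depth(child_a) <= MAX_DEPTH else a,
            child_b if tree_depth(child_b) <= MAX_DEPTH else b)


def mutate(tree, rng):
    path = rng.choice(list(paths(tree)))
    node = get_subtree(tree, path)
    if isinstance(node, float):
        new = node + rng.gauss(0, 0.5)                  # nudge a constant
    elif isinstance(node, tuple) and node[0] in BINARY and rng.random() < 0.5:
        new = (rng.choice(list(BINARY)),) + node[1:]    # flip the operator
    else:
        new = random_tree(rng, levels=2)                # replace the whole subtree
    out = replace_subtree(tree, path, new)
    return out if tree_depth(out) <= MAX_DEPTH else tree


def tournament(pop, scores, rng, k=3):
    """Pick k random individuals and return the one with the lowest MSE."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[min(contenders, key=lambda i: scores[i])]


def evolve(data, pop_size=200, generations=30, mutation_rate=0.2, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_tree(rng) for _ in range(pop_size)]
    for _ in range(generations):
        scores = [mse(t, data) for t in pop]
        ranked = sorted(range(pop_size), key=lambda i: scores[i])
        nxt = [pop[i] for i in ranked[:elite]]          # elitism: best survive as-is
        while len(nxt) < pop_size:
            a, b = tournament(pop, scores, rng), tournament(pop, scores, rng)
            for child in crossover(a, b, rng):
                if rng.random() < mutation_rate:
                    child = mutate(child, rng)
                nxt.append(child)
        pop = nxt[:pop_size]
    return min(pop, key=lambda t: mse(t, data))


data = [(x / 2, 3 * math.sin(x / 2) + 2) for x in range(-6, 7)]
best = evolve(data)
print(best, mse(best, data))
```

Because the elites are copied over before any offspring are created, the best MSE in the population can never get worse from one generation to the next. How close the final tree gets to 3 * sin(x) + 2 depends on the seed and the settings.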
What You See on the Canvas
The visualization makes the search process visible:
- Glowing cyan dots are your raw data points.
- The bright gradient curve (violet → cyan) is the current best-fit equation — the fittest individual in the population.
- The faint white ghost curves are the top 5 runner-up equations. These show you the algorithm "searching" — trying different shapes and structures to find the right one. Watch them converge as the population evolves.
Protected Operators
Real-world genetic programming has a problem: randomly generated equations frequently produce NaN and Infinity. Dividing by zero, raising negative numbers to fractional powers, and exponential blowup can kill an entire population.
The engine uses protected operators to handle this gracefully:
- Protected division: if the denominator is near-zero, it returns 1 instead of crashing.
- Clamped exponents: the ^ operator clamps its exponent to the range [-5, 5], preventing runaway growth.
- Depth limits: trees are capped at 6 levels deep to prevent "bloat" (equations that grow enormous without improving fitness).
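A rough Python sketch of the first two safeguards might look like the following. The function names, the 1e-6 threshold for "near-zero", and the choice to return 1 when a power cannot be computed are assumptions made for this example. Only the division fallback of 1 and the [-5, 5] exponent range come from the description above.

```python
import math

EPSILON = 1e-6        # assumed threshold for a "near-zero" denominator
MAX_EXPONENT = 5.0    # exponent range [-5, 5] from the description above


def protected_div(a: float, b: float) -> float:
    """Return a / b, or 1.0 when the denominator is too close to zero."""
    return 1.0 if abs(b) < EPSILON else a / b


def protected_pow(base: float, exponent: float) -> float:
    """Raise base to a clamped exponent; fall back to 1.0 on invalid results."""
    exponent = max(-MAX_EXPONENT, min(MAX_EXPONENT, exponent))
    try:
        result = math.pow(base, exponent)
    except (ValueError, OverflowError):
        # Assumed fallback, e.g. a negative base with a fractional exponent.
        return 1.0
    return result if math.isfinite(result) else 1.0


print(protected_div(6.0, 3.0))     # ordinary division
print(protected_div(1.0, 0.0))     # guarded: returns the fallback
print(protected_pow(2.0, 10.0))    # exponent is clamped to 5 first
print(protected_pow(-8.0, 0.5))    # guarded: no real-valued result
```

Returning a harmless constant instead of raising an error keeps a single bad individual from halting the run. The individual simply scores poorly and is weeded out by selection.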
Cleaning Up the Math
Evolution is a messy process. The genetic algorithm will often evolve formulas that fit the data well but are chaotic to read, like y = 0.503 * x * (x + 0.367 * (1.61 - 5.15 / x)). Such a formula is technically accurate but nearly unreadable to a human.
To fix this, the algorithm applies a final Algebraic Expansion when it finishes. It multiplies out the products and combines like terms, so that factors which cancel (such as the x multiplying 5.15 / x) disappear. Finally, it applies a rounding heuristic: if a constant is extremely close to a clean number (like 0.4968), it snaps it to 0.5, mimicking how a human mathematician would write the final equation. This turns biological chaos into a clean, human-readable polynomial like y = 0.5 * x² + 0.3 * x - 1, complete with a step-by-step derivation trace.
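The arithmetic behind that cleanup can be checked with a few lines of Python. The expansion below is done by hand for the example formula, and the snap function is a hypothetical stand-in for the rounding step: the one-decimal grid and the 5% relative tolerance are assumptions chosen so that the example works out, not the Evolver's documented settings.

```python
def snap(value: float, decimals: int = 1, rel_tol: float = 0.05) -> float:
    """Round to a 'clean' number only when the value is already close to it."""
    candidate = round(value, decimals)
    if candidate != 0 and abs(value - candidate) <= rel_tol * abs(candidate):
        return candidate
    return value


# Expanding 0.503 * x * (x + 0.367 * (1.61 - 5.15 / x)) for x != 0:
#   0.503 * x^2  +  0.503 * 0.367 * 1.61 * x  -  0.503 * 0.367 * 5.15
a = 0.503
b = 0.503 * 0.367 * 1.61
c = -0.503 * 0.367 * 5.15
print(a, b, c)                      # raw coefficients after expansion

coeffs = [snap(v) for v in (a, b, c)]
print(coeffs)                       # → [0.5, 0.3, -1.0]
```

A value such as 0.37 is left untouched by this snap function, because rounding it to 0.4 would change it by more than the tolerance allows.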
Evolution Controls Explained
- Population Size — This is the size of our "ecosystem." A larger population means more genetic diversity and a higher chance of finding the perfect equation structure, but it takes more computational energy to simulate each generation.
- Mutation Rate — This controls the level of random radiation in the environment. If it's too low, the population might inbreed and get stuck in a "local minimum" (a mediocre equation it can't escape from). If it's too high, beneficial traits are constantly destroyed before they can be passed on.
- The Element of Chance — Because genetic algorithms rely on random mutations and probabilistic selection, they are stochastic. Running the exact same data twice might result in two different evolutionary paths, sometimes yielding two completely different (but equally valid) mathematical formulas.