← Blog

Anagram Solver

Find every valid word combination hiding inside a phrase

There are websites that find anagrams, but most of them return one result. Maybe two. The interesting question isn't "what's an anagram of this phrase" but "what are all the anagrams of this phrase, and which ones are actually good?" That's a harder problem, and it's the one I wanted to solve.

Listen equals Silent anagram
A classic anagram: LISTEN = SILENT. Wikimedia Commons

The solver takes any input phrase, tears it down to its constituent letters, and rebuilds every possible combination of real English words that uses those same letters exactly. Not just one answer. All of them, ranked by quality.

Letter frequency arrays

The core data structure is simple: a 26-element array of bytes, one slot per letter of the alphabet. Every word in the dictionary gets converted to this representation at startup. The input phrase gets the same treatment. Checking whether a word can be formed from a set of remaining letters is just a comparison of 26 integers. Subtracting a word from the remaining pool is 26 subtractions. This makes the inner loop of the search very fast, which matters when you're exploring millions of combinations.

The dictionary is embedded into the WebAssembly binary at compile time using Rust's include_str! macro. At initialization, the solver filters the full dictionary down to only words that could possibly be formed from the input phrase's letters. It then sorts those candidates by length, longest first. This ordering biases the search toward solutions with longer words early, which tend to be the interesting ones.

Recursive search with pruning

The search is a depth-first recursion. At each level, the algorithm iterates through candidate words, checks whether each can be built from the remaining letter pool, and if so, subtracts those letters and recurses. A start_idx parameter prevents the search from reconsidering words it already passed, which eliminates duplicate permutations. Without this, "star wars" and "wars star" would both appear as separate results.

There's a second layer of deduplication on top of that. The solver tracks "signatures" for each partial result, built from the set of longer words (four letters or more) found so far. If a branch would produce a signature that's already been seen, it gets pruned. This catches cases where different orderings of the search arrive at the same meaningful combination through different paths.

Quality heuristic

Finding all valid anagrams is only half the problem. A phrase like "William Shakespeare" produces thousands of results, and most of them are garbage: strings of two-letter words that technically use all the right letters but aren't interesting to anyone. The solver ranks results using a scoring function that penalizes word count heavily (1000 points per word), rewards longer average word length (100 points per character), and penalizes variance in word lengths (5 points per unit). The effect is that a two-word anagram with balanced lengths scores much higher than a seven-word anagram full of "is" and "at" and "me".

Results come back sorted by this score. The top results are almost always the ones you'd actually want to see.

Running in the browser

The solver is written in Rust and compiled to WebAssembly with wasm-pack. Everything runs client-side. There's no server processing the request, no API call, no latency beyond what your own CPU needs to chew through the search tree. For short phrases the results appear instantly. For longer phrases with many candidate words, it can take a few seconds, during which a spinner replaces the search button so you know it's working.

The UI is minimal on purpose. A text field, a button, a list of results. The interesting part is the algorithm, not the interface.