Very fast fuzzy string matching in Swift for interactive searches

At Ordo One, we needed a fast fuzzy string matcher in Swift for interactive search over 250K–1M financial instruments. Swift's case-insensitive substring matching was too slow and too rigid for a real interactive UX. We looked at existing Swift libraries (Fuse-Swift, Ifrit, FuzzyMatchingSwift), but they didn't meet our performance or quality bar, so we built our own, partly to see how far Claude Code could take it with careful guidance.

It turned out: quite far.

Why fuzzy matching?

Users make typos (“Goldamn”), type incrementally (“gol”), and expect smart matches like abbreviations (“bms” → “Bristol-Myers Squibb”). Plain substring search can’t handle this well. Fuzzy matching supports typos, prefixes, substrings, and word-boundary shortcuts — essential for any real search field.
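To make the abbreviation case concrete, here is a minimal sketch of word-boundary matching. It is not FuzzyMatch's implementation: `initials(of:)` and `matchesAcronym(_:in:)` are hypothetical names made up for this illustration, and the rule used (the query must be a prefix of the lowercased word initials) is an assumption.

```swift
// Illustrative sketch only; these helpers are NOT part of FuzzyMatch.

/// Collects the first letter or digit of every word, lowercased.
/// A "word" starts after any character that is not a letter or number.
func initials(of candidate: String) -> String {
    var result = ""
    var atWordStart = true
    for ch in candidate {
        if ch.isLetter || ch.isNumber {
            if atWordStart { result.append(contentsOf: ch.lowercased()) }
            atWordStart = false
        } else {
            atWordStart = true
        }
    }
    return result
}

/// True when the query is a prefix of the candidate's word initials.
func matchesAcronym(_ query: String, in candidate: String) -> Bool {
    let q = query.lowercased()
    return !q.isEmpty && initials(of: candidate).hasPrefix(q)
}
```

With this rule, `matchesAcronym("bms", in: "Bristol-Myers Squibb")` succeeds even though "bms" is not a substring of the name, which is exactly the case plain substring search misses.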

Scope

Built for financial instruments, but applicable anywhere with large, messy datasets: code symbols, file names, product catalogs, contacts. High performance also makes it ideal for portable devices (although we use it on the macOS desktop primarily).

What it is

FuzzyMatch is a pure Swift (6.2+) library with two modes:

  • Edit Distance (Damerau-Levenshtein) — strong typo tolerance and prefix handling
  • Smith-Waterman — local alignment (like fzf/nucleo), better for multi-word/code search
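For readers unfamiliar with the first mode, the following is a textbook sketch of the restricted Damerau-Levenshtein distance (also called optimal string alignment), where a swap of two adjacent characters counts as a single edit. It is an assumption-laden illustration rather than the library's code: the name `editDistance` is hypothetical, it works on `Character` arrays instead of UTF-8 bytes, and it allocates freely.

```swift
// Illustrative sketch only; NOT FuzzyMatch's implementation.

/// Restricted Damerau-Levenshtein (optimal string alignment) distance:
/// insertions, deletions, substitutions and adjacent transpositions each cost 1.
func editDistance(_ a: String, _ b: String) -> Int {
    let s = Array(a), t = Array(b)
    if s.isEmpty { return t.count }
    if t.isEmpty { return s.count }

    // Three rolling rows of the DP table: i-2, i-1 and i.
    var prev2 = [Int](repeating: 0, count: t.count + 1)
    var prev = Array(0...t.count)
    var curr = [Int](repeating: 0, count: t.count + 1)

    for i in 1...s.count {
        curr[0] = i
        for j in 1...t.count {
            let cost = s[i - 1] == t[j - 1] ? 0 : 1
            var best = min(prev[j] + 1,         // deletion
                           curr[j - 1] + 1,     // insertion
                           prev[j - 1] + cost)  // substitution or match
            // Adjacent transposition, e.g. "am" <-> "ma".
            if i > 1, j > 1, s[i - 1] == t[j - 2], s[i - 2] == t[j - 1] {
                best = min(best, prev2[j - 2] + 1)
            }
            curr[j] = best
        }
        // Rotate rows; the old prev2 is reused as scratch space.
        (prev2, prev, curr) = (prev, curr, prev2)
    }
    return prev[t.count]
}
```

Under this definition the typo "goldamn" is one edit away from "goldman", because the swapped "am" counts as a single transposition rather than two substitutions.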

Both share a single API, zero-allocation hot path, multi-stage prefiltering, Span-based zero-copy access in the internal implementation, and strict Sendable conformance.
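As a rough idea of what a cheap prefilter stage can look like, here is a sketch of a character-presence bitmask check that rejects candidates without running the full algorithm and without allocating. This is an assumption about the general technique, not a description of FuzzyMatch's actual stages; `charMask` and `mayMatch` are hypothetical names. The filter is conservative: every query character that is absent from the candidate costs at least one edit, so a candidate is only rejected when it provably cannot be within the edit budget.

```swift
// Illustrative sketch only; NOT FuzzyMatch's actual prefilter.

/// Folds each UTF-8 byte (ASCII upper case mapped to lower case) onto one of 64 bits.
/// Different bytes may share a bit, which only makes the filter more permissive.
func charMask(_ utf8: String.UTF8View) -> UInt64 {
    var mask: UInt64 = 0
    for byte in utf8 {
        let folded = (byte >= 65 && byte <= 90) ? byte | 0x20 : byte
        mask |= UInt64(1) << UInt64(folded & 63)
    }
    return mask
}

/// Returns false only when the candidate cannot match within `maxEdits` edits.
/// The query mask and length are computed once per query, not per candidate.
func mayMatch(queryMask: UInt64, queryLength: Int,
              candidate: String, maxEdits: Int) -> Bool {
    let utf8 = candidate.utf8
    // The query cannot be longer than the candidate by more than the edit budget.
    if queryLength - utf8.count > maxEdits { return false }
    // Count query bits with no counterpart in the candidate.
    let missing = (queryMask & ~charMask(utf8)).nonzeroBitCount
    return missing <= maxEdits
}
```

Only candidates that pass such a check would go on to the more expensive scoring step.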

Performance

On Apple Silicon (M3 Max, single-threaded):

  • 18M candidates/sec (Edit Distance)
  • 31M candidates/sec (Smith-Waterman)

This outperforms lowercased().contains() by almost 10x while also allowing fuzzy matches.

1M dataset: ~55 ms single-threaded, ~5 ms with 16 workers, so it scales nicely.
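The multi-worker numbers come from splitting the candidate list across workers. A minimal sketch of that pattern with Swift structured concurrency is shown here; `countMatches` and its `predicate` parameter are hypothetical stand-ins (a real setup would call the matcher in place of the closure), and the even chunking strategy is an assumption.

```swift
// Illustrative sketch only; the predicate stands in for a real fuzzy scorer.

/// Splits `candidates` into roughly equal chunks, scans each chunk in its own
/// child task, and sums the per-chunk match counts.
func countMatches(in candidates: [String],
                  workers: Int,
                  matching predicate: @escaping @Sendable (String) -> Bool) async -> Int {
    let workerCount = max(1, workers)
    let chunkSize = max(1, (candidates.count + workerCount - 1) / workerCount)

    return await withTaskGroup(of: Int.self) { group in
        var start = 0
        while start < candidates.count {
            let end = min(start + chunkSize, candidates.count)
            let slice = candidates[start..<end]
            group.addTask {
                slice.reduce(0) { $0 + (predicate($1) ? 1 : 0) }
            }
            start = end
        }
        var total = 0
        for await partial in group { total += partial }
        return total
    }
}
```

Each worker only reads its own slice and returns a single integer, so no locking is needed.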

Quality

Benchmarked against nucleo (Rust), fzf (Go), and RapidFuzz (C++/Python) using 271K instruments and 197 queries:

  • 98% overall accuracy (Edit Distance) — best in comparison
  • 100% typo handling (41/41)
  • 100% prefix, substring, exact name, and ISIN categories
  • Smith-Waterman trades some typo accuracy for ~1.7× throughput

The reference systems scored ~79–82%.

Includes 500+ tests, 95%+ coverage, fuzzing via LLVM, and structured quality and performance benchmarks vs the other libraries on a reference data set.

The Claude Code experiment

This started as a test: could Claude Code (Opus 4.6) build a non-critical but real feature quickly?

From first line to open source release took ~3–4 days:

  • Two algorithm modes with prefiltering
  • 22 test files with 500+ tests
  • Simple fuzz harness
  • Benchmark suite
  • Algorithm docs
  • DocC API docs
  • Trivial sample app (fuzzygrep)

Claude wrote nearly everything (code, tests, docs, benchmarks, sample app). We steered architecture, reviewed changes, and validated against real datasets. This isn’t “one prompt → library” — it required tight guidance — but the breadth achieved in days would’ve taken much longer manually.

fuzzygrep can even fuzzy-search a 1B-line file in just over a minute — mainly as a showcase.

We also defined a release workflow in CLAUDE.md: benchmarks and quality comparisons are re-run, docs are reconciled with implementation, and performance numbers refreshed — keeping everything in sync.

We wouldn’t use this approach for core systems. And LLMs won’t replace solid engineering anytime soon. But for testable, non-critical, black-box components, it’s pragmatic — like adding a dependency, except tailored exactly to your needs.

In our case: typo-tolerant, prefix-aware fuzzy matching with performance suitable for interactive search on large datasets.

Why open source?

The results exceeded expectations. The Swift ecosystem lacked high-performance fuzzy matching, so we open sourced it: zero dependencies, fully Sendable, Swift 6.2+, documented with DocC.

If you need fast, interactive fuzzy search — give it a try.

GitHub - ordo-one/FuzzyMatch: Fuzzy string matches at full speed

Joakim


This is super cool!


Very interesting. I reckon the excellent performance didn't come by accident - how did you steer it towards maximum performance?

In the same way I would coach a fresh engineer, basically: by reviewing the design plan, ensuring there are benchmarks in place, and pushing for reasonable algorithms (like the aggressive prefiltering, and a dynamic allowance that permits more typos for longer query strings to avoid an explosion of matches for short ones). There were multiple optimization passes contrasting it with the fastest "competitor" (Rust), and many failed optimization passes that were rolled back (a handful of comments were left there for posterity), e.g. we tried SIMD unsuccessfully.
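The dynamic typo allowance can be pictured as a small step function of the query length. The thresholds below are invented purely for illustration (the actual values used by FuzzyMatch are not stated here), and `allowedEdits(forQueryLength:)` is a hypothetical name.

```swift
// Illustrative sketch only; the thresholds are assumptions, not FuzzyMatch's values.

/// Short queries get no typo budget, because almost anything is within one
/// edit of a two or three letter string; longer queries tolerate more edits.
func allowedEdits(forQueryLength length: Int) -> Int {
    switch length {
    case ..<4: return 0
    case 4..<8: return 1
    default: return 2
    }
}
```

With a budget like this, "gol" only matches exactly (as a prefix or substring), while a seven-letter query such as "goldamn" can still tolerate one typo.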

But overall, you need to be rigorous and push it in the right direction, just as with a fresh engineer. I also added a comment to the CLAUDE.md instructions that any optimization attempt that fails must be rolled back.

But yes, you are right, it wasn't an accident. I probably spent about 1/3 of the time on performance-related aspects, as it was a key feature for us.


So I just reran the benchmarks on an M4 Max instead of an M3 Max.

Must just say, hats off to the Apple Silicon team, the memory bandwidth improvement just crushes it!

Microbenchmarks (272K financial instruments corpus):

| Scenario | M3 Max | M4 Max | Improvement |
|---|---|---|---|
| Query preparation | ~3.3μs | ~2.2μs | ~1.5x |
| 1M dataset (single-threaded) | ~55ms | ~36ms | ~1.5x |
| 1M dataset (8 workers) | ~7ms | ~4.8ms | ~1.5x |
| 1M dataset (16 workers) | ~5.3ms | ~3.8ms | ~1.4x |

Corpus throughput (272K candidates, candidates/sec):

| Matcher | M3 Max | M4 Max | Improvement |
|---|---|---|---|
| FuzzyMatch (ED) | ~18M | ~26M | ~1.4x |
| FuzzyMatch (SW) | ~31M | ~44M | ~1.4x |
| nucleo (Rust) | ~58M | ~86M | ~1.5x |
| contains() baseline | ~2M | ~3M | ~1.5x |

fuzzygrep (parallel grep, 16 cores, query 1235321 -score 0.5):

| Input | M3 Max Wall | M4 Max Wall | Speedup | M4 ED CPU time | M4 SW CPU time |
|---|---|---|---|---|---|
| 10M lines | 0.64s | 0.24s | 2.7x | 2.1s | 0.55s |
| 100M lines | 6.7s | 2.4s | 2.8x | 24s | 6.2s |
| 1B lines | 68s | 25s | 2.7x | 267s | 74s |

The String contains() baseline is a bit painful. A quick check seems to point to the lowercasing; perhaps it doesn't have a fast path for when the string is known to be ASCII...
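To illustrate what an ASCII fast path could look like, here is a sketch that folds case byte by byte while scanning the UTF-8 storage, so no lowercased copy of the haystack is created. This is only a guess at the kind of shortcut meant above, not a measured fix: `asciiLower` and `asciiCaseInsensitiveContains` are hypothetical names, the needle is assumed to be lowercased once up front, and non-ASCII bytes are compared verbatim (so only ASCII letters are treated case-insensitively).

```swift
// Illustrative sketch only; case folding here covers ASCII letters and nothing else.

/// Maps ASCII "A"..."Z" to "a"..."z" and leaves every other byte unchanged.
@inline(__always)
func asciiLower(_ byte: UInt8) -> UInt8 {
    (byte >= 65 && byte <= 90) ? byte | 0x20 : byte
}

/// Naive substring scan over UTF-8 bytes with on-the-fly ASCII case folding.
/// `lowercasedNeedle` must already be lowercased by the caller.
func asciiCaseInsensitiveContains(_ haystack: String, _ lowercasedNeedle: [UInt8]) -> Bool {
    if lowercasedNeedle.isEmpty { return true }
    var haystack = haystack  // withUTF8 is a mutating method
    return haystack.withUTF8 { bytes in
        let n = lowercasedNeedle.count
        guard bytes.count >= n else { return false }
        for start in 0...(bytes.count - n) {
            var j = 0
            while j < n && asciiLower(bytes[start + j]) == lowercasedNeedle[j] { j += 1 }
            if j == n { return true }
        }
        return false
    }
}
```

Whether this beats lowercased().contains() on a given corpus would need to be benchmarked; the sketch only shows the shape of the idea.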


This is fascinating! Thank you for sharing this!!!
Coincidentally, I was vibe porting fzf to Swift as a hobby and hit exactly the same problem over the last few weeks. I will definitely try using this in my project.

GitHub - ainame/fltr: Vibe-ported fzf - fuzzy finder cli in Swift (vibe coded hobby project)


Cool, please share how it works out!


That’s pretty nice.

I personally have engineered a toy fuzzy matcher using the Smith-Waterman algorithm with a simple generalization: Character is replaced with T: Equatable & Hashable & Comparable.
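A generalization along those lines can be sketched as a generic Smith-Waterman local alignment score. This is a textbook version written for illustration, not the poster's matcher nor FuzzyMatch's SW mode: `smithWatermanScore` is a hypothetical name, the scoring values (match 2, mismatch -1, gap -1) are arbitrary, and only `Equatable` is required since the sketch never hashes or orders elements.

```swift
// Illustrative sketch only; scoring constants are arbitrary assumptions.

/// Best local alignment score between two sequences of any Equatable element.
/// Cell values are clamped at zero, so an alignment may start and end anywhere.
func smithWatermanScore<C: Collection>(
    _ a: C, _ b: C,
    match: Int = 2, mismatch: Int = -1, gap: Int = -1
) -> Int where C.Element: Equatable {
    let s = Array(a), t = Array(b)
    guard !s.isEmpty, !t.isEmpty else { return 0 }

    // Two rolling rows of the DP table.
    var prev = [Int](repeating: 0, count: t.count + 1)
    var curr = prev
    var best = 0

    for i in 1...s.count {
        curr[0] = 0
        for j in 1...t.count {
            let diagonal = prev[j - 1] + (s[i - 1] == t[j - 1] ? match : mismatch)
            let score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr[j] = score
            best = max(best, score)
        }
        swap(&prev, &curr)
    }
    return best
}
```

The same function works for strings and for arbitrary token arrays, e.g. `smithWatermanScore([1, 2, 3], [9, 1, 2, 3, 9])`.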

The work here is a very nice reference.

I did benchmark it. Mine (SW-ish) had throughput between FuzzyMatch (ED) and FuzzyMatch (SW), so I have now switched the matching part to FuzzyMatch! Again, thank you for sharing!


Nice to hear. We actually use ED mode almost exclusively even though it is slower, as it has superior matching and typo handling, which gives a nicer user experience overall IMHO. It all depends on the size of the data set you are working with of course, but it's fast enough for nice interactive use for us.


Got a number of requests to support older operating systems, so I backed out the use of spans for now and will restore it later when 6.0/6.1 support is eventually dropped. This new version supports macOS 14 era operating systems.

Added a dedicated public UTF-8 API to retain performance under those conditions.

If you run it on older OS versions, let me know how it goes for you!


For anyone who uses it for matching and wants to display stylised attributed strings in the search results, that is now easy with the latest 1.3.0 release.

(the highlight matches what the scorer would actually match, which may not always be exactly what a human expects, but overall it seems to look more than OK)
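For a feel of how match positions turn into highlight ranges, here is a small self-contained sketch. It does not use the 1.3.0 highlight API: `matchedRanges(of:in:)` and `highlighted(_:ranges:)` are hypothetical helpers, the matching is a plain greedy case-insensitive subsequence scan rather than the real scorer, and it assumes lowercasing does not change the number of characters (true for ASCII-style names).

```swift
// Illustrative sketch only; NOT the FuzzyMatch highlight API.

/// Greedy case-insensitive subsequence match. Returns character-offset ranges
/// of the matched runs in `candidate`, or nil if the query does not match.
func matchedRanges(of query: String, in candidate: String) -> [Range<Int>]? {
    let q = Array(query.lowercased()), c = Array(candidate.lowercased())
    var ranges: [Range<Int>] = []
    var qi = 0
    for (ci, ch) in c.enumerated() where qi < q.count && ch == q[qi] {
        if let last = ranges.last, last.upperBound == ci {
            // Extend the current run instead of starting a new one.
            ranges[ranges.count - 1] = last.lowerBound..<(ci + 1)
        } else {
            ranges.append(ci..<(ci + 1))
        }
        qi += 1
    }
    return qi == q.count ? ranges : nil
}

/// Renders the candidate with matched runs wrapped in square brackets.
/// `ranges` must be sorted, non-overlapping and within bounds.
func highlighted(_ candidate: String, ranges: [Range<Int>]) -> String {
    let chars = Array(candidate)
    var out = ""
    var pos = 0
    for r in ranges {
        out += String(chars[pos..<r.lowerBound])
        out += "[" + String(chars[r]) + "]"
        pos = r.upperBound
    }
    out += String(chars[pos...])
    return out
}
```

For the query "gsa" against "Goldman Sachs" this yields "[G]oldman [Sa]chs". In a UI, the brackets would be replaced by styled runs of an attributed string.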


Added a sample app in Release 1.3.1 · ordo-one/FuzzyMatch · GitHub which you can use to try out highlighting and understand the matching better.

You can also load any dataset of your own from a plain newline-delimited file (File -> Open) and try it ad hoc on that.


Also added live editing of the fuzzy matching parameters, to make it easier to understand how they impact search results.


Very cool work! This matches my experience with such tools: careful guidance is quite a force multiplier and can help achieve results quicker, if the developer knows exactly what they're pushing for. Thanks for sharing, the lib will definitely come in handy, and thanks for open sourcing it :)
