Hey folks,
I'd like to share kaleidoscope-lexer, a Swift package that helps you generate lexer from regex patterns and strings. The project first started as a proof of concept when Swift macro was announced in 2023, and is recently made into a more robust library.
This package is inspired by logos and rust-automata.
The package provides a set of macros that can be used on an enum:
import KaleidoscopeLexer
private let parseNumber = { @Sendable (machine: inout LexerMachine<Token>) -> Int in
Int(machine.slice())!
}
private let parseIdentifier = { @Sendable (machine: inout LexerMachine<Token>) -> String in
String(machine.slice())
}
private let printer = { @Sendable (machine: inout LexerMachine<Token>) in
print(machine.slice())
}
@Kaleidoscope
@skip(/[ \t\n]/) // Skip whitespace
enum Token: Equatable {
@token("invalid", callback: printer)
case `invalid`
@token("private")
case `private`
@token("public")
case `public`
@regex(/[a-zA-Z_][a-zA-Z0-9_]*?/, callback: parseIdentifier)
case identifier(String)
@regex(/[0-9]+?/, callback: parseNumber)
case number(Int)
@token(".")
case dot
@token("(")
case parenOpen
@token(")")
case parenClose
}
Then given a string private foo(123), such as the following,
@main
enum Tokenizer {
static func main() {
let source = "private foo(123)"
for result in Token.lexer(source: source) {
switch result {
case let .success(token):
print(token)
case let .failure(error):
print("Error: \(error)")
}
}
}
}
the output will be
private
identifier("foo")
parenOpen
number(123)
parenClose
Internally the package implements a regex engine and transforms regex patterns into DFAs. Macro code generation is based on jumping tables derived from the DFA states.
There are limitations to the existing implementations, mostly with the range of regex syntax it supports, such as inline flags, and potential bugs in the regex engine due to my limited knowledge in this field. But I'm hoping to add supports to more regex expressions in the future.
Thanks for reading! I'd really appreciate your thoughts, use cases, and ideas for improvements.