Hi everyone,
I'm Jakub, and this summer during GSoC, I've worked with @Douglas_Gregor on a new swift-syntax library: SwiftLexicalLookup
.
It consists of various lexical queries, answering questions like "Where is an error thrown here handled?" or "Which switch
cases does this fallthrough
keyword refer to?". Most importantly though, the library implements a new unqualified lookup API, which aims to replace the current ASTScope
implementation in the long run.
In this brief post, I'd like to show you how to use the library, explain how it works, and mention some fun corner cases we've encountered along the way.
Unqualified lookup
Let's first briefly explain what unqualified lookup actually is. It's a query used by IDEs or the compiler to determine the availability of names at a given position in a source file. It traverses lexical scopes, gathers names, filters them by position, and returns the results to the client. Let's consider a simple example:
class Foo { // Class Decl Scope (Nominal Type Decl Scope)
// Member Block Scope
let someVar = 0
func bar(a: Int, b: Int?) { // Function Decl Scope
// Code Block Scope
let a = 42
if let b { // If Stmt Scope
// Code Block Scope
let c = 123
print(b) // <-- 1. lookup here
}
print(d) // <-- 2. lookup here
let d = 1
}
}
For the first lookup, the result would consist of: let c
, the conditional declaration b
, let a
, the parameters a
and b
, the implicit self
introduced at the function boundary, and Self
introduced by the nominal type declaration, in this exact order. For the second query, the result would largely stay the same, with the first two names being absent due to their nested introduction in the if
statement.
It's important to note that the lookup only considers names that can be inferred from enclosing lexical scopes, so it's unable to reason about, for example, nominal type members. Handling those is part of the qualified lookup (one could say it's out of scope for this library ;)). However, this doesn't mean unqualified lookup ignores the existence of members. In fact, it can actually prompt clients where to perform such lookup. In the previous two examples, this would be indicated by a special result flag associated with class Foo
.
SwiftLexicalLookup
Now let's dive into the SwiftLexicalLookup
lookup API.
Usage
The API is stateless, traverses the syntax tree bottom-up, and collects names matching the lookup. It partitions results according to the scopes they were introduced in, which, for example, guides shadowing. Any syntax node can be an entry point for the query. Here's an example call to the API from an arbitrary syntax node:
someNode.lookup(myIdentifier, with: myConfig)
myIdentifier
is an identifier that should be used to filter introduced (match names). It can be set to nil
to skip name matching (as in the earlier example). myConfig
is a LookupConfig
object that allows clients to specify particular behavior during lookup (e.g., how to handle #if
clauses). It can also be left out to provide default behavior.
The result is an array of LookupResult
enums that represent different result kinds and flags (such as prompting where to perform member lookup). It associates names with the scopes where they were introduced. Speaking of names, they are also represented by an array of LookupName
enums, which correspond to specific syntax nodes as queried names. These can be of specific kinds, such as: identifier
, declaration
, implicit
, and more. LookupName
also enables easy name extraction through LookupName.getNames(from:)
which collects all syntax nodes that can be represented as a LookupName
in the given syntax node.
This result data structure retains a good bit of information useful for clients and ensures flexibility for future maintenance and extensions.
Under the hood
Scopes are represented as extensions to specific syntax nodes with a new protocol, ScopeSyntax
. Scopes reference each other through the parentScope
property, which (by default) points to the closest scope ancestor in the syntax tree. The lookup
method performs a lookup at a particular scope and passes the lookup to the parentScope
. Inside the defaultIntroducedNames
computed property, scopes can define names available within themselves regardless of lookup configurations or starting position (such names could include function parameters, variable declarations, and more). Here's the implementation of the for
scope. Inside lookup
, you can see how the API avoids names introduced at this scope based on the starting position.
@_spi(Experimental) extension ForStmtSyntax: ScopeSyntax {
/// Names introduced in the `for` body.
@_spi(Experimental) public var defaultIntroducedNames: [LookupName] {
LookupName.getNames(from: pattern)
}
@_spi(Experimental) public var scopeDebugName: String {
"ForStmtScope"
}
/// Returns results with names matching lookup.
/// Doesn't include names introduced at this scope
/// if lookup started inside it's `pattern` or `sequence`.
@_spi(Experimental) public func lookup(
_ identifier: Identifier?,
at lookUpPosition: AbsolutePosition,
with config: LookupConfig
) -> [LookupResult] {
if pattern.range.contains(lookUpPosition) || sequence.range.contains(lookUpPosition) {
return lookupInParent(identifier, at: lookUpPosition, with: config)
} else {
return defaultLookupImplementation(identifier, at: lookUpPosition, with: config)
}
}
}
Some scopes are more complex than others, providing shared functionality or closely collaborating with other scopes to offer more advanced behavior. Let's see an example:
func foo<A>(a: A) {
let b = 42
x // <-- lookup here
}
Functions and nominal types can have generic parameter clauses that themselves introduce names. To generalize this behavior and simplify lookup from inside of it, the generic parameter clause is a scope on its own, but this introduces a bit of complexity. In the syntax tree representation, it's not an ancestor of the function body, it's a sibling! That's why there are two specialized scope kinds that provide alternative lookup routing: GenericParameterScopeSyntax
and WithGenericParameterScopeSyntax
. They ensure that the lookup passes through the generic parameter clause and doesn't create an infinite cycle on return.
Testing
One of the biggest challanges of this project, is properly testing and ensuring quality of the results. Integration of changes in a system that's supposed to replace an important piece in the compiler should be a top priority. That's why we came up with two ways of testing SwiftLexicalLookup
.
Cheap "unit" tests
Inside swift-syntax
, we've defined a suite of tests that are meant to test individual scope kinds and behaviors in isolation. These are standard XCTest
s with a custom harness function that are easy to write and inexpensive to run. Each test takes a snippet of code marked with markers and defines entry points for lookup along with the respective expected results.
Result expectations are defined with a special data structure that closely resembles the results produced by an unqualified lookup query. As of now, it has to be hard-coded, so testing a newly added result or name kind requires some manual effort. I hope we will be able to rewrite it into a nice macro in the future.
func testNameLookupForNilParameter() {
assertLexicalNameLookup(
source: """
🔟class foo {
let 1️⃣a = 0
let 2️⃣b = 0
3️⃣func foo() {
let 4️⃣a = 0
let 5️⃣c = 0
if let 6️⃣a = 7️⃣x {
let (8️⃣a, 9️⃣b) = (0, 0)
0️⃣x
}
}
}
""",
references: [
"7️⃣": [
.fromScope(CodeBlockSyntax.self, expectedNames: ["5️⃣", "4️⃣"]),
.fromScope(
FunctionDeclSyntax.self,
expectedNames: [NameExpectation.implicit(.self("3️⃣"))]
),
.lookInMembers(ClassDeclSyntax.self),
],
"0️⃣": [
.fromScope(CodeBlockSyntax.self, expectedNames: ["8️⃣", "9️⃣"]),
.fromScope(IfExprSyntax.self, expectedNames: ["6️⃣"]),
.fromScope(CodeBlockSyntax.self, expectedNames: ["5️⃣", "4️⃣"]),
.fromScope(
FunctionDeclSyntax.self,
expectedNames: [NameExpectation.implicit(.self("3️⃣"))]
),
.lookInMembers(ClassDeclSyntax.self),
],
],
expectedResultTypes: .all(
IdentifierPatternSyntax.self,
except: [
"3️⃣": FunctionDeclSyntax.self,
"🔟": ClassDeclSyntax.self,
]
),
useNilAsTheParameter: true
)
}
The assertLexicalNameLookup
test harness provides a number of additional optional assertions that further help in defining the expected result. On top of that, the data structure used for representing expected results offers additional assertions as well, such as explicitly specifying the name kind.
Integration tests
So far, we've discussed SwiftLexicalLookup
in isolation from the current compiler ASTScope
implementation. This is also how we initially developed the query, based on our understanding, simple empirical tests, and a mental model of Swift lexical scope semantics.
Of course, by only following this approach, there are many corner cases and differences that go unnoticed. That's why we also created a special function that could automatically perform a comparison between the results from ASTScope
and SwiftLexicalLookup
. Here's a sample output of the validation before a fix for one of the files from the Swift standard library:
-----> Lookup started at: 1706:7 ("debugPrint") finishInSequentialScope: false
| ASTScope | SwiftLexicalLookup
> ✅ | k 1699:10 | k 1699:10
> ✅ | v 1699:13 | v 1699:13
> ❌ | first 1698:9 | type 1690:18
> ❌ | result 1697:9 | self 1689:17
> ❌ | type 1690:18 | ↕️ V 1689:49
> ❌ | self 1689:17 | ↕️ K 1689:46
> ℹ️ | Omitted SwiftLexicalLookup name: Self 1683:1
> ❌ | V 1689:49 | Look memb: 1684:11
> ❌ | K 1689:46 | -----
> ℹ️ | Omitted ASTScope name: K 1689:46
> ℹ️ | Omitted ASTScope name: V 1689:49
> ❌ | End here: Look memb: 1684:11 | -----
indicate valid result matches, while indicate the exact opposite. There are also partial matches indicated by . Needless to say, after initially running validation for a couple of relatively small files, my console got absolutely filled with red color :).
You might also notice gray markers, which indicate that some results got reordered or omitted. Sometimes, when performing real-world tests, we've encountered rather... weird ASTScope
behavior. In this case, you can see that K
and V
generic parameters are introduced twice in different orders, leading to filtering on the ASTScope
side and reordering in the SwiftLexicalLookup
result. Also, nominal types consistently do not introduce the Self
keyword in ASTScope
(which is then accounted for in the type checker) that's why here, it's filtered on the SwiftLexicalLookup
side.
Real-world testing is also the current state of SwiftLexicalLookup
development. Currently, validation can be successfully run for the entire source code of SwiftIfConfig
, and I'm working to fix all the discrepancies in the files of the Swift standard library. At this point, it's mainly about refining the validation function and making tweaks in the name representation (such as the exact position the name is introduced at which is especially problematic with implicit names), with a few major discrepancies popping up every now and then.
The friends we've made along the way...
When working on SwiftLexicalLookup
we've encountered and documented some really fun scope behaviour. Some of them gave us a bit of a headache when implementing and sometimes it prompted us to slightly rethink our approach.
guard
All of us know and love guard
! It's really powerful when combined with unwrapping optionals. It's also really simple in the way it works: If the variables are not nil
, unwrap and continue execution of this code block, otherwise perform some action and exit the scope this guard
is part of.
Interestingly, this simplicity doesn't translate well to our scope representation. Here we have a scope that introduces names to it's parent scope, but completely ignores those names when looking up from the body. Also the introduced names are visible in subsequent introduced names in the same guard
. Oh and also we need to somehow interleave the partitioned results with results from parent code block scope so clients have a way to figure out shadowing etc...
This was the first major obstacle we've faced when developing this library and it prompted us to create the first specialized scope kind pair: SequentialScope
and IntroducingToSequentialParentScopeSyntax
. First one provides an algorithm that "sequentially" introduces names from inside code block items and interleaves them with the results found inside the second one. So as an example:
func foo() {
let a = 1
let b = 2
guard let b = a else { return }
x // <-- lookup here [ guard(b), codeBlock(b, a) ]
let c = 3
let d = 4
x // <-- lookup here [ codeBlock(d, c), guard(b), codeBlock(b, a) ]
guard let a = b else { return }
x // <-- lookup here [ guard(a), codeBlock(d, c), guard(b), codeBlock(b, a) ]
}
You can see how, in this pseudo-representation, results from guard
s are combined with the ones from the code block. Another added complexity to sequential scopes, are types declared inside of them which should be visible at any point of the scope. That's why they need to be hoisted to the top of a code block (so that they are available in the entire scope).
subscript
This one was a bit surprising to me. I've initially thought we could handle it similarly to a function declaration, but there's one big catch here, and it's the self
keyword.
A simple read-only subscript
can be written like this:
class Y {
var arr = ["AB", "CD"]
subscript(i: Int) -> String {
return arr[i]
}
}
And here's the output of the validating function:
-----> Lookup started at: 4:12 ("arr") finishInSequentialScope: false
| ASTScope | SwiftLexicalLookup
> ✅ | i 3:13 | i 3:13
> ✅ | self 3:31 | self 3:31
> ✅ | End here: Look memb: 1:7 | Look memb: 1:7
You can notice that self
is being introduced at the boundary of the accessor block i.e. the opening curly brace of the subscript
. That's a similar behavior to function declarations which also introduce self
at their boundaries.
Now let's consider a subscript
with getters and setters:
class X {
var arr = ["AB", "CD"]
subscript(i: Int) -> String {
get {
return arr[i]
}
set(newValue) {
arr[i] = newValue
}
}
}
And when we run the validating function...
-----> Lookup started at: 5:14 ("arr") finishInSequentialScope: false
| ASTScope | SwiftLexicalLookup
> ✅ | i 3:13 | i 3:13
> ⚠️ | self 4:5 | self 3:31
> ✅ | Look memb: 1:7 | Look memb: 1:7
There's a discrepency! In this case, turns out the compiler is introducing self
at the boundary of individual accessors, which are inside the accessor block, but does it after the parameters. That greatly complicates handling of this scope as we need to check what is the exact kind of a subscript
, in which accessor declaration did the lookup start and also we need to associate the result with that accessor.
Invalid macro parameter ordering
In the process, we haven't just discovered bugs in SwiftLexicalLookup
. Out of those, the most interesting one, which we also decided to report as an issue 77141, is about an incorrect parameter order in macro declarations within ASTScope
.
We assume that most discrepancies between implementations, which we account for during validation, are likely filtered by clients. However, this particular issue could potentially lead to unexpected behavior. Let’s look at this macro declaration from the Swift standard library:
@freestanding(expression)
public macro externalMacro<T>(module: String, type: String) -> T = Builtin.ExternalMacro
SwiftLexicalLookup
models macro declarations similarly to function, initializer, and nominal type declarations i.e. it first returns the regular parameters, followed by the generic parameters. This ordering makes sense when, for example, looking up from within the parameter clause. Let's run the validation now and...
-----> Lookup started at: 2:39 ("String") finishInSequentialScope: false
| ASTScope | SwiftLexicalLookup
> ❌ | T 2:28 | module 2:31
> ℹ️ | Omitted ASTScope name: T 2:28
> ❌ | module 2:31 | type 2:47
> ❌ | type 2:47 | T 2:28
... see that ASTScope
introduces generic parameters first, followed by regular parameters! This is rather unexpected behavior, which could cause some issues. For this reason, instead of just accounting for it during validation, we’ve also reported the bug.
I think this example nicely illustrates the current process of bringing the two implementations closer together: running tests on lots of code, identifying issues, and determining how to address them. This may involve either updating SwiftLexicalLookup
to explicitly model these cases or developing a new representation and adjusting validation accordingly. We could also assume erroneous ASTScope
behavior, which could mean simply filtering certain cases during validation, or, in rare cases like this one, filing an issue.
Future of the library
There's still plenty of work to be done with SwiftLexicalLookup
. The top priority is currently to verify the implementation with a large amount of tests. The validation function I've been discussing in this post is part of my recent PR 77140. Concurrently, I'm also distilling a strong suite of integration tests that could possibly be later incorporated in the CI pipeline.
There's also one thing that remains to be modeled by the new unqualified lookup, namely "almost visible" names. They are used by the compiler to produce diagnostics and those could be e.g. variable names daclared later in a code block scope.
func foo() {
print(a) // Error: Use of local variable 'a' before its declaration
let a = 123
}
After fixing remaining discrepencies and building a strong suite of unit and integration tests, I'd also like to try to refractor some parts of the unqualified lookup API. Especially inside particular scope lookup
method implementations, I was thinking about incorporating some kind of result builder functions to make them more readable and maintainable, but that's definitely low priority right now.
Beyond unqualified lookup, SwiftLexicalLookup
also houses some simple lexical queries (that I've briefly mentioned at the start of this post) which remain to be tested against the compiler. We could possibly make the current validation method more flexible and reuse it in this case. Also there's a potential for adding more lexical queries like: "Where does this return
return from?". I think that could actually make a great "Good First Issue".
The ultimate goal of SwiftLexicalLookup
is to be incorporated in the compiler itself and replace ASTScope
. That goal would further advance introduction of Swift code in the compiler and bring the language closer to being self-hosted. SwiftLexicalLookup
could be also incorporated with the RequestEvaluator
central infrastructure (which should be easier due to it's stateless nature!) to properly cache the results.
Beyond the compiler, SwiftLexicalLookup
also enables a whole universe of scope based queries that could be used by IDEs and could be implemented in SourceKit-LSP
. Furthermore, such queries would be guaranteed to be correct, as the same unqualified lookup API would be used by the compiler!
How to currently use the library?
Right now, the easiest way to try out the library is to use the current implementation in swift-syntax
. After adding the package as a dependency to your project, simply use @_spi(Experimental) import SwiftLexicalLookup
to access the new APIs. There's also currently open PR 2883 that introduces handling of #if
clauses (thanks to SwiftIfConfig
).
If you'd like to try out the validation, you can download this toolchain and use -enable-experimental-feature UnqualifiedLookupValidation
.
Wrapping up...
I hope you liked and had fun reading this brief post about SwiftLexicalLookup
as did I this summer. So far I'm really proud and happy how the library turned out and, even though there's still quite a bit of work left to do, I can already think of some use cases the library could be used for.
From here, I’d also like to thank my mentor @Douglas_Gregor again for his extensive knowledge, guidance throughout the summer, and for continuing to work with me on SwiftLexicalLookup
. It means a lot to me to have you as a mentor and I really appreciate your patience with my many questions. :)
I’d also like to specifically thank @ahoppen for his consistently thorough and helpful code reviews. Thank you for always ensuring the quality of my PRs and for your valuable input on the implementation.
Additionally, I’d like to thank everyone else from the Swift community who helped me along the way. It wouldn’t have been possible without your support!
Lastly, I’m really looking forward to continuing my work on this project, bringing the library to a production-ready state, and unlocking its full potential. I’d love to hear your thoughts and ideas for the library and hope I’ve convinced some of you to already give it a try.
Jakub