Where is for-in loop handling done in the compiler?

CTMacUser · December 22, 2018, 1:46am

I was looking at the top-level directory of Apple's Swift GitHub site and realized that I have no idea where the compiler's own code is kept. Specifically, I'm wondering where the code that handles for-in loops is. I want to see how feasible it is to increase the number of protocols that for-in loops support (which is the real solution to the "some iterable things aren't actually sequences" problem).

(For those who wonder, an indexable object doesn't have to be a sequence if its set of indices is a sequence. The new protocol would add a layer of indirection during iteration.)
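To make the idea concrete, here is a minimal sketch of what such a protocol might look like, assuming it exposes a sequence of indices plus a subscript. The names IndexIterable, forEachElement and Slots are hypothetical and do not exist in the standard library; forEachElement stands in for what an extended for-in loop might do.

```swift
// Hypothetical protocol: iterable through its indices, without being a Sequence.
protocol IndexIterable {
    associatedtype Index
    associatedtype Element
    associatedtype Indices: Sequence where Indices.Element == Index
    var indices: Indices { get }
    subscript(position: Index) -> Element { get }
}

// Hypothetical stand-in for an extended for-in: walk the indices, then
// subscript each one (the extra layer of indirection).
func forEachElement<C: IndexIterable>(in container: C, _ body: (C.Element) throws -> Void) rethrows {
    for index in container.indices {
        try body(container[index])
    }
}

// Hypothetical conforming type that deliberately does not conform to Sequence.
struct Slots: IndexIterable {
    typealias Index = Int
    typealias Element = String
    typealias Indices = Dictionary<Int, String>.Keys

    private let storage: [Int: String]

    init(_ storage: [Int: String]) {
        self.storage = storage
    }

    var indices: Dictionary<Int, String>.Keys {
        return storage.keys
    }

    subscript(position: Int) -> String {
        return storage[position]!
    }
}

let slots = Slots([1: "a", 2: "b", 3: "c"])
var seen: [String] = []
forEachElement(in: slots) { seen.append($0) }
```

Note that the visiting order here is whatever Dictionary.Keys yields, which is exactly the kind of detail each conforming type would have to define.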

beccadax · December 22, 2018, 3:26am

"The compiler" is actually several different libraries whose headers are in the include/swift directory and whose implementations are in the lib directory. Each library is responsible for a different function, such as parsing, semantic analysis, or optimizing SIL. Some parts are more closely related to other tools; for instance, the IDE library is essentially a set of APIs for SourceKit to talk to the compiler.

For your task, the most important type will be ForEachStmt, the abstract syntax tree node representing a for-in loop. You can search for uses in the repository, but here are a few of the most important landmarks:

Like all AST nodes, ForEachStmt is declared in the AST library, and specifically in include/swift/AST/Stmt.h.
Type checking is part of semantic analysis, which is the job of the Sema library; specifically, ForEachStmt is type-checked in lib/Sema/TypeCheckConstraints.cpp.
AST nodes are lowered into SIL (a sort of pseudo-assembly language which still expresses many Swift behaviors) in the SILGen library. Specifically, the instructions for ForEachStmt are emitted in lib/SILGen/SILGenStmt.cpp. At this point, the code ceases to be a for-in loop and becomes just a set of instructions which happen to loop until IteratorProtocol.next() returns nil.
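As a rough source-level picture of that lowering, the two loops below compute the same total; the second spells out the makeIterator()/next() protocol by hand. This is an approximation for illustration, not the SIL that SILGenStmt.cpp actually emits.

```swift
let numbers = [10, 20, 30]

// Ordinary for-in loop.
var totalFromForIn = 0
for n in numbers {
    totalFromForIn += n
}

// Roughly equivalent hand-written loop: make an iterator, then call
// next() until it returns nil.
var totalFromIterator = 0
var iterator = numbers.makeIterator()
while let n = iterator.next() {
    totalFromIterator += n
}

print(totalFromForIn, totalFromIterator) // prints "60 60"
```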
Although not strictly part of the compiler, the standard library is in the stdlib/public/core directory. Types interface with for-in loops through the Sequence and IteratorProtocol protocols, which are in stdlib/public/core/Sequence.swift.
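A minimal sketch of a custom type that opts into for-in through those two protocols; Countdown and CountdownIterator are illustrative names, not standard library types.

```swift
// A minimal type that becomes usable in for-in by conforming to Sequence.
struct Countdown: Sequence {
    let start: Int

    // Sequence's one essential requirement: produce an iterator.
    func makeIterator() -> CountdownIterator {
        return CountdownIterator(current: start)
    }
}

struct CountdownIterator: IteratorProtocol {
    var current: Int

    // Return the next element, or nil to end the loop.
    mutating func next() -> Int? {
        guard current > 0 else { return nil }
        defer { current -= 1 }
        return current
    }
}

var values: [Int] = []
for value in Countdown(start: 3) {
    values.append(value)
}
print(values) // prints "[3, 2, 1]"
```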

If you do try to tackle this project, your next question will probably be "How do I get started working on Swift?" I recommend watching this talk by @harlanhaskins and @codafi, which explains how to build, navigate, and work on the compiler.

Good luck!

jawbroken · December 22, 2018, 9:23am

If it's an indexable object with indices that are in a sequence, and you can use it in a for-in loop, in what way is it not a sequence?

CTMacUser · December 26, 2018, 11:11pm

As I imagine the concept, the indices for the elements of an unordered container aren't ordered either! (Index would conform only to Equatable.) But an updated for-in statement still needs a stable interface that lets it visit each index (and therefore each corresponding element) exactly once. The path that the indices property takes to reach every index has to be defined by each conforming type.
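Here is how that might look in today's language, assuming a hypothetical Bag type whose index type (BagIndex) is only Equatable. Because Bag is not a Sequence, the caller has to loop over its indices and subscript by hand, which is the indirection a new protocol would let for-in perform automatically. Both names are illustrative only.

```swift
// Hypothetical opaque index: only Equatable, so indices cannot be
// compared with < and carry no ordering of their own.
struct BagIndex: Equatable {
    fileprivate let slot: Int
}

// Hypothetical unordered container that does not conform to Sequence.
struct Bag<Element> {
    private var slots: [Element] = []

    init() {}

    mutating func insert(_ element: Element) {
        slots.append(element)
    }

    // The conforming type decides the path that visits every index
    // exactly once; here it is simply storage order.
    var indices: [BagIndex] {
        return slots.indices.map { BagIndex(slot: $0) }
    }

    subscript(position: BagIndex) -> Element {
        return slots[position.slot]
    }
}

var bag = Bag<String>()
bag.insert("pear")
bag.insert("fig")

// Today's workaround: loop over the indices, then subscript each one.
var visited: [String] = []
for index in bag.indices {
    visited.append(bag[index])
}
```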

codafi · December 27, 2018, 1:45am

As long as you can yield an iterator, you’re a sequence, and that is what is powering for-in loops. The concept of ordered indices as an internal property doesn’t make sense, as you note, except for ordered collections, so it’s not a requirement; Set and Dictionary still conform to Sequence. Is there some other essential property of “iterability” you’re looking to capture?

CTMacUser · December 27, 2018, 3:07am

I'm proposing to change what is powering for-in loops. Keep the current Sequence support, but add another protocol for indexable containers that want to be covered by for-in loops without getting general sequencing operations. What we have now with our sequence and container hierarchy is too coarse; that's why Set and Dictionary have to be Sequences, although many consider that a "bug."

Ben_Cohen · December 27, 2018, 5:51am

But Sequence really just means “for-innable”, not much more. Most sequence algorithms are just building on for-in loops (to filter, map, compare, etc). So it’s not clear what you’d actually be achieving. The problem with people misinterpreting Set as having meaningful order is mainly about being able to for-in over it, rather than because it is a sequence. So adding more things that can be for-inned wouldn’t fix the issue.
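To illustrate that sequence algorithms are thin layers over for-in, here is a hand-written filter, called myFilter only to avoid clashing with the standard library's filter. Because it needs nothing but a for-in loop, it works on an Array and a Set alike; on a Set, its result simply follows Set's unspecified iteration order.

```swift
extension Sequence {
    // A hand-written filter: a for-in loop plus an array that collects
    // the elements satisfying the predicate.
    func myFilter(_ isIncluded: (Element) throws -> Bool) rethrows -> [Element] {
        var result: [Element] = []
        for element in self {
            if try isIncluded(element) {
                result.append(element)
            }
        }
        return result
    }
}

let evens = [1, 2, 3, 4, 5, 6].myFilter { $0 % 2 == 0 }
print(evens) // prints "[2, 4, 6]"

// Works on Set too; the order of `small` is whatever Set's iterator yields.
let small = Set([1, 5, 10]).myFilter { $0 < 6 }
```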

jawbroken · December 27, 2018, 5:59am

These are my thoughts here too. I've never understood the viewpoint that for-in loop functionality is somehow safe from misinterpretation and confusion but all the other sequence methods, which are trivial to write yourself using for-in loops, are confusing or useless. This is why the best solution that I've heard for this issue is @dabrahams's idea to move Sequence conformance to a view property for these types where the iteration order isn't meaningful. I'm not sure if that solution would be acceptable at this point because of the source compatibility break, though.
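One way the view-property idea might look, as a rough sketch under assumptions: the names UnorderedSet, Elements and elements are hypothetical, and this is not a concrete design from the thread. The container itself is not a Sequence, so for-in only works after explicitly asking for the elements view.

```swift
// Hypothetical set-like type that does NOT conform to Sequence itself.
struct UnorderedSet<Member: Hashable> {
    private var storage: Set<Member>

    init<S: Sequence>(_ elements: S) where S.Element == Member {
        storage = Set(elements)
    }

    func contains(_ member: Member) -> Bool {
        return storage.contains(member)
    }

    // The view: iteration is still available, but the caller has to ask
    // for it explicitly, which signals that the order is arbitrary.
    var elements: Elements {
        return Elements(base: storage)
    }

    struct Elements: Sequence {
        fileprivate let base: Set<Member>

        func makeIterator() -> Set<Member>.Iterator {
            return base.makeIterator()
        }
    }
}

let primes = UnorderedSet([2, 3, 5, 7])
// for p in primes { }  // would not compile: UnorderedSet is not a Sequence
var sum = 0
for p in primes.elements {
    sum += p
}
print(sum) // prints "17"
```

The source-compatibility concern follows directly: existing code that writes `for x in someSet` would have to change to iterate a view like this instead.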

CTMacUser · December 31, 2018, 4:45am

AIUI, there would be two categories of types:

Types where element deployment in a linear order is inherently part of the concept
Types that don't have a linearly ordered deployment as part of their concept, but need some way for users to visit each element for practicality reasons.

Right now, anything in the second category would need to be forced to conform to (at least) Sequence in order to be useful, even though that protocol looks like it's for the first category.