If you're really motivated, I think you can contribute by the end of the of 2018.
I'll start off by saying that different people learn differently than others, so my way of teaching this may not help you, but it may help someone! It's also worth noting that there are many different moving parts of the Swift compiler. By this I mean that there are a lot of different projects working together like the actual compiler, the standard library, foundation, dispatch, xctest, the package manager, the new SourceKit-LSP, and a lot more projects that are all working hard to make Swift the best it can be. A lot of these projects are in Swift itself, so it's really easy to contribute and give back.
Project Links
Swift Standard Library - https://github.com/apple/swift/tree/master/stdlib/public/core
Swift Package Manager - https://github.com/apple/swift-package-manager
Foundation - https://github.com/apple/swift-corelibs-foundation
SourceKit-LSP - https://github.com/apple/sourcekit-lsp
You can find more projects here - https://github.com/apple
So you want to contribute to the actual Swift compiler?
Before I start, I think it's very important that we understand the general pipeline of the compiler. This is useful to understand where new changes or where to fix things should go. In each section, I'll show you an example of what the process is doing, and at the end of each section I'll give you an error that happens in that stage. Please bear in mind that I'm still relatively new to compiler development as well and that mostly everything here is being explained at a high level. Shall we begin?
Lexer
This is the first part in the compilation process where the compiler breaks down your source code into what we call tokens.
// Source code
let x = 16
// The Lexer turns this into tokens
[kw_let, identifier("x"), equal, integer_literal(16)]
// kw means keyword
The goal of the lexer is to be able to produce this sequence of tokens for the parser to build an AST (Abstract Syntax Tree) from it.
An example error that can happen during this stage:
// error: unterminated string literal
let x = "
^
The lexer wants to make a string_literal
token, but notices that there is not a matching "
Links
Lexer.h (Header file) - https://github.com/apple/swift/blob/master/include/swift/Parse/Lexer.h
Lexer.cpp (Implementation) - https://github.com/apple/swift/blob/master/lib/Parse/Lexer.cpp
Lexer Diagnostics (errors, warnings, notes) - https://github.com/apple/swift/blob/master/include/swift/AST/DiagnosticsParse.def#L40-L197
Lexical Analysis - https://en.wikipedia.org/wiki/Lexical_analysis
Parser
The Parser takes this tokens and tries build the AST (Abstract Syntax Tree). This is how the compiler actually is able to understand your source code and make sense of it.
// Tokens from lexer
[kw_let, identifier("x"), equal, integer_literal(16)]
// The Parser transforms this into the AST
let x = 16
Here we can see that we actually understand what the parser formed here. At this stage, we can also start adding some flags to the compiler to see some more of how the compiler understands and lowers our source code.
$ swiftc -dump-parse example.swift
This will dump the basic pre-typechecked AST that the parser has built from our source code:
(source_file
(top_level_code_decl
(brace_stmt
(pattern_binding_decl
(pattern_named 'x')
(integer_literal_expr type='<null>' value=16))))
(var_decl "x" type='<null type>' let storage_kind=stored))
We can see our variable declaration for x
and we can also see that our integer_literal(16)
has turned into an expression for something called a pattern binding declaration. (For now we don't really need to understand what that means to really get the gist of what is happening here. But, a pattern binding declaration is basically binding an expression to a pattern. In this case that pattern is x
).
An example error that happen during this stage:
// Tokens produced from lexer for this source code
// [kw_struct, l_brace, r_brace]
// error: expected identifier in struct declaration
struct {}
^
The parser sees this kw_struct
token from the lexer and wants to make a struct declaration from it, but the next token is not an identifier to name the structure, but rather the next token is l_brace
(left brace).
Links
Parse Headers - https://github.com/apple/swift/tree/master/include/swift/Parse
Parse Implementation files - https://github.com/apple/swift/tree/master/lib/Parse
Parse diagnostics - https://github.com/apple/swift/blob/master/include/swift/AST/DiagnosticsParse.def
Parsing - https://en.wikipedia.org/wiki/Parsing
Abstract Syntax Tree - https://en.wikipedia.org/wiki/Abstract_syntax_tree
Sema (Semantic Analysis)
Sema is our next step in the compilation process. This includes things like the Type Checker and the Constraint System. There are a lot of goals that this step has to accomplish like assign types to expressions, ensure our structs are conforming to protocols properly, automatically deriving protocol conformance for types, or synthesizing code like the memberwise initializer for structs, and many more.
// AST from the parser
let x = 16
// Sema refined AST
internal let x: Int = 16
I know, it doesn't seem like Sema did much in this step, but there is another flag we can pass to the compiler now!
$ swiftc -dump-ast example.swift
This will dump the typechecked AST.
(source_file
(top_level_code_decl
(brace_stmt
(pattern_binding_decl
(pattern_named type='Int' 'x')
(call_expr implicit type='Int' location=<source>:1:9 range=[<source>:1:9 - line:1:9] nothrow arg_labels=_builtinIntegerLiteral:
(constructor_ref_call_expr implicit type='(_MaxBuiltinIntegerType) -> Int' location=<source>:1:9 range=[<source>:1:9 - line:1:9] nothrow
(declref_expr implicit type='(Int.Type) -> (_MaxBuiltinIntegerType) -> Int' location=<source>:1:9 range=[<source>:1:9 - line:1:9] decl=Swift.(file).Int.init(_builtinIntegerLiteral:) function_ref=single)
(type_expr implicit type='Int.Type' location=<source>:1:9 range=[<source>:1:9 - line:1:9] typerepr='Int'))
(tuple_expr implicit type='(_builtinIntegerLiteral: Int2048)' location=<source>:1:9 range=[<source>:1:9 - line:1:9] names=_builtinIntegerLiteral
(integer_literal_expr type='Int2048' location=<source>:1:9 range=[<source>:1:9 - line:1:9] value=16))))))
(var_decl "x" type='Int' interface type='Int' access=internal let storage_kind=stored))
Woah! There's a lot happening here! Hang in there with me, lets try to decipher this a little bit. Lets start simple:
(var_decl "x" type='Int' interface type='Int' access=internal let storage_kind=stored))
We can see our variable declaration for x
still and we notice that there is now a type, Int
, and we see the access control level for it as well, internal
. That wasn't too bad...
Now for the good stuff!
(pattern_binding_decl
(pattern_named type='Int' 'x')
(call_expr implicit type='Int' location=<source>:1:9 range=[<source>:1:9 - line:1:9] nothrow arg_labels=_builtinIntegerLiteral:
(constructor_ref_call_expr implicit type='(_MaxBuiltinIntegerType) -> Int' location=<source>:1:9 range=[<source>:1:9 - line:1:9] nothrow
(declref_expr implicit type='(Int.Type) -> (_MaxBuiltinIntegerType) -> Int' location=<source>:1:9 range=[<source>:1:9 - line:1:9] decl=Swift.(file).Int.init(_builtinIntegerLiteral:) function_ref=single)
(type_expr implicit type='Int.Type' location=<source>:1:9 range=[<source>:1:9 - line:1:9] typerepr='Int'))
(tuple_expr implicit type='(_builtinIntegerLiteral: Int2048)' location=<source>:1:9 range=[<source>:1:9 - line:1:9] names=_builtinIntegerLiteral
(integer_literal_expr type='Int2048' location=<source>:1:9 range=[<source>:1:9 - line:1:9] value=16))))))
We see that we our simple integer_literal_expr
from parsing has now transformed into what appears to be a call_expr
. I'll help you out a bit and spell this out in Swift and hopefully you can look at this and kind of understand the structure.
internal let x: Int = Int(_builtinIntegerLiteral: 16)
This is where we see the tight relationship between the compiler and the standard library. You can actually find this initializer for Int
here: https://github.com/apple/swift/blob/master/stdlib/public/core/IntegerTypes.swift.gyb#L1111 (forgive the gyb syntax, but trust me when I say this does what it should
)
This is a trivial example about what Sema does during this stage, but I encourage you to check it out further as this is my favorite step in the pipeline!
An example error during this stage:
// error: cannot convert value of type 'Bool' to specified type 'Int'
let x: Int = true
^~~~
We assigned a Bool
value to a variable with the Int
type and thankfully the type checker caught this for us!
Links
Sema - https://github.com/apple/swift/tree/master/lib/Sema
Semantic Analysis - https://en.wikipedia.org/wiki/Semantic_analysis_(compilers) (This wiki isnt too helpful, but it includes some useful links to things like type checker)
SIL (Swift Intermediate Language)
If you've made it this far, you're in for a wild ride next! SIL provides things like high level optimizations for our Swift code, emitting dataflow diagnostics, and a many more.
// Type checked AST from Sema
internal let x: Int = 16
// SIL
sil_stage canonical
import Builtin
import Swift
import SwiftShims
let x: Int
sil_global hidden [let] @$S6output1xSivp : $Int
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32 {
bb0(%0 : $Int32, %1 : $UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>):
alloc_global @$S6output1xSivp // id: %2
%3 = global_addr @$S6output1xSivp : $*Int // user: %6
%4 = integer_literal $Builtin.Int64, 16 // user: %5
%5 = struct $Int (%4 : $Builtin.Int64) // user: %6
store %5 to %3 : $*Int // id: %6
%7 = integer_literal $Builtin.Int32, 0 // user: %8
%8 = struct $Int32 (%7 : $Builtin.Int32) // user: %9
return %8 : $Int32 // id: %9
} // end sil function 'main'
sil public_external [transparent] [serialized] @$SSi22_builtinIntegerLiteralSiBi2048__tcfC : $@convention(method) (Builtin.Int2048, @thin Int.Type) -> Int {
bb0(%0 : $Builtin.Int2048, %1 : $@thin Int.Type):
%2 = builtin "s_to_s_checked_trunc_Int2048_Int64"(%0 : $Builtin.Int2048) : $(Builtin.Int64, Builtin.Int1) // user: %3
%3 = tuple_extract %2 : $(Builtin.Int64, Builtin.Int1), 0 // user: %4
%4 = struct $Int (%3 : $Builtin.Int64) // user: %5
return %4 : $Int // id: %5
} // end sil function '$SSi22_builtinIntegerLiteralSiBi2048__tcfC'
My oh my there is a lot to cover here. There's a lot of interesting parts that we could go over, but for the sake of having a basic understanding of what is happening here I'll go over just the parts in the main function.
sil @main : $@convention(c) (Int32, UnsafeMutablePointer<Optional<UnsafeMutablePointer<Int8>>>) -> Int32
If this function signature looks similar to anyone, you know exactly what this looks like.
int main(int argc, char **argv)
for the body of the main function:
alloc_global @$S6output1xSivp // id: %2
%3 = global_addr @$S6output1xSivp : $*Int // user: %6
%4 = integer_literal $Builtin.Int64, 16 // user: %5
%5 = struct $Int (%4 : $Builtin.Int64) // user: %6
store %5 to %3 : $*Int // id: %6
%7 = integer_literal $Builtin.Int32, 0 // user: %8
%8 = struct $Int32 (%7 : $Builtin.Int32) // user: %9
return %8 : $Int32 // id: %9
Hopefully I can translate directly what this does for a quick overview. First we allocate a global variable, x
, and then we store the address of it in %3
. We then construct an Int
with the value of 16
which is stored in %5
. After we construct the integer, we store the value of it to the address of the global variable x
. Then the last three lines are simply return 0
for the main function.
It's important to note that there are two steps in this SIL step, SILGen and SIL. SILGen takes the AST from Sema and produces a raw version of SIL that is handed off to SIL for mandatory optimizations and diagnostics which then produces a canonical version of the SIL.
We also get new flags we can pass here as well!
// This emits the raw version of SIL from SILGen
$ swiftc -emit-silgen example.swift
// This emits the canonical version of SIL
$ swiftc -emit-sil example.swift
An example error at this stage:
// error: missing return in a function expected to return 'Int'
func add(_ x: Int, _ y: Int) -> Int {
let sum = x + y
}
^
This is part of the dataflow diagnostics that I mentioned earlier. SIL determines that there is no "terminator", or return instruction in this case, for this function and emits an error letting the developer know.
Links
SIL Header files - https://github.com/apple/swift/tree/master/include/swift/SIL
SIL Diagnostics - https://github.com/apple/swift/blob/master/include/swift/AST/DiagnosticsSIL.def
SIL Optimizer Header files - https://github.com/apple/swift/tree/master/include/swift/SILOptimizer
SILGen Implementation files - https://github.com/apple/swift/tree/master/lib/SILGen
SIL Implementation files - https://github.com/apple/swift/tree/master/lib/SIL
SIL Optimizer Implementation files - https://github.com/apple/swift/tree/master/lib/SILOptimizer
SIL Documentation - https://github.com/apple/swift/blob/master/docs/SIL.rst (I highly recommend reading this as it will do a far better job explaining SIL and its purpose)
SIL Programmer's Manual (Work in Progress) - https://github.com/apple/swift/blob/master/docs/SILProgrammersManual.md
Video with Joe Groff and Chris Lattner at LLVM conf - https://www.youtube.com/watch?v=Ntj8ab-5cvE
IR (LLVM IR Generation)
I promise we're almost done! This is the final step (at least for the Swift compiler!) in the compilation process. The goal of this step is to take our refined and optimized SIL and emit LLVM IR (LLVM Intermediate Representation).
I won't copy the SIL from the last step here, but I will show you the IR produced from this stage (not all of the IR, but the relevant stuff).
%TSi = type <{ i64 }>
define protected i32 @main(i32, i8**) #0 !dbg !25 {
%2 = bitcast i8** %1 to i8*
store i64 16, i64* getelementptr inbounds (%TSi, %TSi* @"$S6output1xSivp", i32 0, i32 0), align 8, !dbg !30
ret i32 0, !dbg !30
}
This isn't too bad! First we define %TSi
as a type containing i64
, which is a 64 bit integer in LLVM. I'll spell out what exactly what %TSi
is in Swift for a better understanding:
public struct Int {
var _value: Builtin.Int
}
%TSi
is just our Int
type in Swift!
Next, we define our main method and store i64 16
into our global variable x
and finally return i32 0
(return 0).
And, as always we get new flags to emit llvm ir for us:
$ swiftc -emit-ir example.swift
From here, we pass this IR to LLVM to emit a native binary for us. If you want to see the assembly produced from LLVM:
$ swiftc -emit-assembly example.swift
Links
IRGen - https://github.com/apple/swift/tree/master/lib/IRGen
IRGen Diagnostics - https://github.com/apple/swift/blob/master/include/swift/AST/DiagnosticsIRGen.def
So we know the pipeline now, but now what? How do I actually contribute?
Yes, you will need to know some C++ to get started with contributing. Don't panic! The C++ in the compiler isn't overly complex just to be complex, it's actually pretty down to Earth.
This is where my ways of learning how to contribute can be different for folks.
For me, I like to implement a lot of cool features and the best way for me is to look at some of the errors that prevent these features and find them in the diagnostic files. Once I find the specific diagnostic, I find the name of it and look for it in the project. It might be in the parser or Sema where it emits this error and I start from there. I look at the code around it and try to make sense of why its producing an error. This is how I learned most of my C++ by just looking around at the compiler. This method is pretty effective for people because you see an error that bothers you sometimes, and look at the code behind it. You have a relationship (I couldn't find the exact wording for this at the moment) with the error and you can relate to the code that produces it. By learning the compiler this way, you'll notice a lot of helpful comments around the code explaining why its making an error or what in the world its doing. (There are a lot of places without comments, so this is also a fantastic contribution that you can make for future contributors).
For others, it may require you to study C++ for a while to feel comfortable looking at compiler source code. This is totally fine! There are plenty of great material online for you to learn C++.
I almost forgot to mention, the Swift compiler C++ is very closely tied to LLVM C++ which is a lot easier to work in. LLVM C++ isn't an actual language, but its the APIs that help make it more down to Earth.
There are plenty of Starter bugs on the Swift bugs website (https://bugs.swift.org/browse/SR-9218?jql=labels%20%3D%20StarterBug) where you can give squashing bugs a shot.
Harlan Haskins and Robert Widmann gave an excellent presentation about this very topic here - https://www.youtube.com/watch?v=oGJKsp-pZPk
Of course I just gave a brief overview on the whole compilation process, but I hope you learn the various places where errors or bugs can occur. It's also very helpful when you want to implement a new feature because now you hopefully have a rough idea where to start (For example, if you want a new automatic derived conformance, you can look in Sema which has a lot of derived conformance files for you to look at). The Swift Readme can guide you through building the project and getting a project structure ready for you to start contributing effectively.
Hopefully this wasn't too much at once, but I look forward to everyone contributing to Swift!