On the behavior of variables in top-level code

The variables in top-level code behave weirdly. They are declared in the global scope, but are initialized serially, like local variables in a normal function. This allows some interesting pieces of code to compile, such as:

print(x)
var x = 32

This program will print 0. The basic value types are automatically initialized, so they have a value. However, if x is a class, this results in a crash. Clearly, this is not an ideal situation. You shouldn't be able to use variables before they are initialized, let alone before they are declared.

My thoughts are to make the top-level variables behave like local variables inside of an implicit function.
Before prescribing a change in the form of a pitch, I have a few questions that I would like to ask the forum.

  • Are you relying on variables in top-level code behaving as global variables? (How and why?)

  • In wrapping the top-level code in an implicit main function, are any functions declared in that space nested inside of this main function, or are they global?

For example, consider this top-level code and the two ways it could be wrapped:

var x = 32
func foo() {
   print("Hello World \(x)")
}

In the nested form:

@main struct Main {
  static func main() {
    var x = 32
    func foo() {
       print("Hello World \(x)")
    }
  }
}

Un-nested:

func foo(_ x: Int) {
  print("Hello World \(x)")
}

@main struct Main {
  static func main() {
    var x = 32
    foo(x)
  }
}

Note that in the un-nested form, x is not visible from foo, so you would need to pass it in directly.
This is a departure from how top-level code has behaved previously, but is easier to reason about once concurrency is a factor.

Before prescribing a change, I would like to get your thoughts on the matter and hear if anyone is relying on this behaviour. If you are, how and why?

Given that this is source breaking, changes here would be a change for Swift 6 at the earliest.

2 Likes

I don't have specific thoughts on the global part of things, but whatever solution ends up being selected here needs to account for other kinds of declarations and their differences in behavior between being at file scope vs at function scope.

For example, consider types, where it gets more complicated.

var x: Int = 3
struct S { var y: Int = x } // OK

func f() {
  var w: Int = 3
  struct S {
    var v: Int = w // error
  }
}
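
To make the function-scope case concrete, here is a minimal sketch of one way to restructure such code if top-level variables became local: the local type receives the value through its memberwise initializer instead of reading the enclosing variable. The names makeValue and localResult are hypothetical and only serve the illustration.

```swift
// A local type cannot read a local variable of the enclosing function,
// so the value is handed over explicitly through the initializer.
func makeValue() -> Int {
    let w = 3
    struct S {
        var v: Int          // no default value that refers to `w`
    }
    return S(v: w).v        // pass `w` in when constructing the value
}

let localResult = makeValue()
print(localResult)
```

This keeps the type local at the cost of threading the value through by hand, which is exactly the kind of source change existing scripts would need.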

There's also the long standing issue where conformances for types declared at function scope do not get looked up correctly.

You've also got extensions, which are not allowed inside functions:

var x: Int = 3
extension Int { var y: Int { x } } // OK (extensions can only hold computed properties)

func f() {
  var w: Int = 3
  extension Int { // error
    var v: Int { w }
  }
}
3 Likes

Yikes! I’d consider this a bug — you should probably report it at bugs.swift.org (after checking for duplicates).

As for how variables in executable top-level code should behave, I’d expect them to be local variables. Functions in executable top-level code should be global as long as they don’t capture any local variables: there’s no reason why they can’t be global, and being global is more in line with how struct / enum / class / actor / protocol / typealias / extension declarations work in executable top-level code. We could add a warning for global code that tries to access a function that captures local variables. Swift already recognizes a distinction between functions that capture local variables and functions that don’t when it comes to conversions to C function pointers.
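
A small sketch of the existing distinction mentioned above, using @convention(c). The code sits inside a function so that offset is a true local variable; at the top level of main.swift it would currently be a global and the difference would not show. The names demo, addOne, addOffset and pointerResult are made up for this example.

```swift
func demo() -> Int32 {
    let offset: Int32 = 10

    // A closure that captures nothing can be converted to a C function pointer.
    let addOne: @convention(c) (Int32) -> Int32 = { $0 + 1 }

    // A closure that captures the local `offset` cannot; the compiler rejects
    // this line if it is uncommented.
    // let addOffset: @convention(c) (Int32) -> Int32 = { $0 + offset }

    return addOne(offset)
}

let pointerResult = demo()
print(pointerResult)
```

The same analysis is what a "global unless it captures locals" rule for top-level functions would have to rely on.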

Alternatively, we could consider making all variables in top-level executable code lazy, but I imagine that would make it too easy to accidentally create a circular reference. On the other hand, it would avoid the possibility of breaking existing code.

We could also instead consider making variables in executable top-level code global as long as their initialization expressions don’t capture any local variables, like I’m proposing with functions. However, I think this behavior would be unexpected for most programmers and could lead to unexpected problems with many variables being unnecessarily lazy and variables being initialized too early. Functions don’t have this problem since they don’t need to be initialized and the use of local functions is rare anyway.

If we do find that people are doing this, we could treat top-level static var and static let (which are currently illegal) as globals in main.swift, with the usual lazy-initialization behavior expected for globals. This would not allow you to initialize them based on sequential code, but that’s precisely the thing that causes safety problems, so that seems like a feature.

I like this (in reference to Becca's suggestion of using static). It would certainly save us from needing additional keywords.

Pragmatically, I think this will probably be the route we take. On a purely philosophical level, I kind of don't like static because the word doesn't really communicate what it's doing. Would we make the static variables behave like static variables in C and not export them, or would we just have static mean global? Like you say though, getting the free semantic checking is a definite feature.

Alternatively, what do you think about requiring access modifiers on globals and treating unmodified variables as local? A public let foo = 32 would mean that foo is an exported global, while private let bar = 42 would be the equivalent of a static variable in C. I'm not entirely sure how important this is because I don't know how often folks import top-level code. The only case I can think of is in the REPL (as an implementation detail), which will probably need a language mode anyway until it can be cleaned up.

Sorry, brain went to static global functions instead of static global variables. I’ve clearly pumpkinized for the night. The keyword makes more sense in the context of static global variables. :slight_smile:

1 Like

I think this is a great idea. Local variables are the semantics we expect here, and this also nicely addresses any concerns about the interaction with concurrency because that behavior is well-defined for local variables.

I'd prefer not to use static here, because static var and static let already have a meaning in local functions, and it's different from "this is visible outside of the function".

This is my preferred solution...

because access control doesn't exist for local declarations at all. By explicitly putting an access control modifier on the var or let you're saying it has non-local visibility and also what visibility it has. I guess what we lose relative to static is that static more strongly implies lazy initialization.

It occurs to me that we should decide whether we need to support declaring local types and local functions in top-level code. Local types in top-level code are probably not useful at all, because a fileprivate type can do everything that a local type of a non-generic function can do. Local functions could be useful if you want to capture local variables in top-level code. However, we'd probably need to burn a keyword on this (local func f()...), and it really doesn't feel like it's that important.

Doug

3 Likes

You can get rational top-level code behavior today by always wrapping it in a do block, which will force all the declarations inside to be treated as purely local declarations. That seems sufficient to me as a way of explicitly forming local declarations in a future design too. Alternatively, since we have to do capture analysis anyway, we could potentially DWIM and make functions behave as local functions when they refer to top-level local variables of the script.
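
As a rough sketch of the do-block approach, here is the earlier example rewritten so that x and foo are local. The variable greeting is an addition for illustration: it is declared outside the block only so the result can be inspected afterwards.

```swift
// Declared outside the block only to make the result observable.
var greeting = ""

do {
    // Inside `do`, declarations are ordinary locals: using `x` before this
    // line would be a compile-time error instead of silently reading 0.
    let x = 32

    // A local function, free to capture the local `x`.
    func foo() -> String {
        "Hello World \(x)"
    }

    greeting = foo()
}

print(greeting)
```

Nothing declared inside the block is visible to other files or to code after the closing brace, which is the local behavior being asked for.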

3 Likes

As others have said, this is definitely a bug. Report it. I just found an example with more bizarre behavior:

print(variable)
let variable = 5
print(variable)

Output:

0
5

A let constant with two different values!

This only applies to top-level code in the main.swift file. Otherwise, top-level and static variables are initialized lazily (on first access).
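
For contrast, a short sketch of the lazy behavior described above, using a static property (static properties are initialized on first access regardless of which file they are in). Config and firstRead are hypothetical names chosen for the example.

```swift
enum Config {
    // The initializer closure runs once, the first time `value` is read.
    static let value: Int = {
        print("initializing Config.value")
        return 5
    }()
}

print("before first access")
let firstRead = Config.value      // triggers the initializer here
print("after first access: \(firstRead)")
```

The "initializing" message appears between the other two lines, so the constant can never be observed in a half-initialized state the way the top-level let above can.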

2 Likes

The do block is a good idea for getting local semantics for functions and such; much better than my local func suggestion.

I'd rather we not go down the capture-analysis route, because it means we would not be able to tell whether a given function is visible to another translation unit without performing type checking on its body.

Doug

3 Likes

Another consistent model might be to say that all declarations, not only properties, in a top-level code file are local by default, and use explicit private/internal/public visibility to mark the ones that should actually be treated as global declarations.

1 Like

I used to think the access control thing was a good idea, but now I think it just adds complexity. I’d like to see if Evan’s original idea can stand without breaking too much code in practice, and then we don’t need any sort of capture analysis.

1 Like

The REPL, and hence Xcode playgrounds, currently rely on this behavior (global) to allow new code blocks to refer to variables declared at the top level in previous blocks (that may already be compiled and executed).

However, the user doesn't experience this as global, so a different implementation (e.g. like async continuations) of the same effect (access to variables in a prior continuation) would probably be a good substitute.

Capture analysis is unlikely to cut it in this case; we should assume all variables and functions are captured.

Note that some values not currently visible as "globals" in this fashion ought to be, such as x in guard let x.

The out-of-order behavior you demonstrate is actually bad for playgrounds because it creates potential cut points in the code which create illegal programs (declarations after the cut are not visible yet).

In the choice between “variables are lazy but do not make a cut” and “variables are eager and do make a cut” it’s a compromise both ways, but I strongly think the current behavior there is the better choice. It would be good to diagnose it better, though, which would suggest changing how the cut is modeled in the compiler.

(I implemented the global representation, to replace an increasingly hacky model of “type-check some of a SourceFile but not all of it”. The model for users is more important than the model for the compiler, though.)

EDIT: To put it another way, function bodies already work like this, and complaints about that are rare.

1 Like