Initializers of global variables

Erik_Eckstein · March 19, 2020, 8:37am

I recently created the pull request "GlobalOpt: don't speculatively execute initializers of global variables" (apple/swift, Pull Request #30445), which triggered a discussion with @Andrew_Trick.

There is an undocumented language rule that global variables can be initialized anywhere between the entry point of main() and the first use of a variable. That means that programmers cannot rely on the actual point in the program where a global is initialized. This might result in surprising effects if an initializer of a global has side effects.

While my PR does not change anything about the language rule (it's just about the optimization), it raises the question of whether this language rule makes sense at all.

If anyone has an opinion on this, please let me know.
See also the discussion in the PR.

Avi · March 19, 2020, 10:10am

Are you sure it's not documented? I have a recollection of learning back in the Swift 1 days that static and global variables are lazily initialized.

Erik_Eckstein · March 19, 2020, 10:11am

That's what I can remember, too. But AFAIK, it's not documented that the time of initialization is undefined.

Avi · March 19, 2020, 10:13am

lazy var five = {5}()

<path to file>: 'lazy' must not be used on an already-lazy global

Joe_Groff · March 19, 2020, 3:19pm

Because global initializers are invoked at the latest on the first load from the global variable, programmers cannot rely on the actual point in the program where a global is initialized, regardless of whether we optimize. You don't control the code that runs before you, and that code may or may not have accessed the variable previously, so you can't reliably point at any one access as the point where the initialization occurs.

benrimmington · March 19, 2020, 5:58pm

It is documented in:

§Global and Local Variables of the language guide.

"Global constants and variables are always computed lazily, in a similar manner to Lazy Stored Properties. Unlike lazy stored properties, global constants and variables do not need to be marked with the lazy modifier."

§Global Variables of the Files and Initialization blog post.

"The lazy initializer for a global variable (also for static members of structs and enums) is run the first time that global is accessed, and is launched as dispatch_once to make sure that the initialization is atomic. This enables a cool way to use dispatch_once in your code: just declare a global variable with an initializer and mark it private."

§2013-12-18 of the CHANGELOG.

"Global variables and static properties are now lazily initialized on first use. Where you would use dispatch_once to lazily initialize a singleton object in Objective-C, you can simply declare a global variable with an initializer in Swift. Like dispatch_once, this lazy initialization is thread safe."
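A minimal sketch of the singleton pattern the CHANGELOG entry describes, assuming Swift 5 language mode and a single-file program. `Logger` and `initCount` are hypothetical names used only for illustration; the counter is there just to show that the initializer closure runs once, no matter how often the property is read.

```swift
// Hypothetical singleton; the names are illustrative, not from any real API.
final class Logger {
    // Illustration only: counts how many times the initializer closure runs.
    static var initCount = 0

    // Static stored properties are lazily initialized on first access,
    // in the same way as global variables.
    static let shared: Logger = {
        Logger.initCount += 1
        return Logger()
    }()

    private init() {}
}

let a = Logger.shared   // first access: runs the initializer closure
let b = Logger.shared   // later accesses reuse the stored value
print(a === b, Logger.initCount)  // prints "true 1"
```

A static property is used here rather than a top-level variable because top-level variables in main.swift are initialized eagerly, in source order, and would not show the lazy behavior.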

jrose · March 19, 2020, 6:51pm

I think there are three questions here, and this only covers the second one:

1. If access X goes unused, can the compiler skip the initialization?
2. If access X does not occur (e.g. in a loop that dynamically executes 0 times), and the global has been initialized, can the compiler delete the code that would check if it's been initialized?
3. If access X does not occur, and the global hadn't been initialized, can the compiler perform the initialization?

I think we've held that the answer to #1 is "no, it has side effects", and real programs depend on that (hence the note about dispatch_once in the docs). In my interpretation, Joe's response suggests that the answer to #2 is "yes", but since that doesn't change any observable behavior I'm not sure it matters.

It's #3 I'm concerned about. If there's a global variable for "database connection" and it's in a (badly-designed) loop that makes changes to the database, it seems wrong to me to have that connection initialization hoisted out of the loop if there are no changes to make.
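A rough sketch of the situation in #3, assuming Swift 5 language mode and a single-file program. `Database`, `connection`, `apply(changes:)`, and `events` are hypothetical names, and the "connection" is just a string standing in for something with real side effects. A static property is used because top-level variables in main.swift are initialized eagerly rather than lazily.

```swift
// Records side effects so they can be inspected afterwards.
var events: [String] = []

enum Database {
    // Hypothetical stand-in for an expensive connection with side effects.
    static let connection: String = {
        events.append("connect")
        return "db-handle"
    }()
}

func apply(changes: [String]) {
    for change in changes {
        // The only access to Database.connection is inside the loop body,
        // so with an empty array this access never occurs dynamically.
        let conn = Database.connection
        events.append("\(conn): \(change)")
    }
}

apply(changes: [])  // zero loop iterations
print(events)
```

In an unoptimized build the array should stay empty after the call. The question in #3 is whether an optimized build may hoist the initialization out of the loop and record "connect" anyway.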

(It's also not true that a programmer never controls what code runs before a particular access, because a global still has access control and therefore may have only one use site. Code in other modules cannot access it if it's not public.)

Joe_Groff · March 19, 2020, 7:06pm

That sounds about right. #3 is also the situation that prompted Erik's change. Saying that a loop must definitely have a nonzero trip count would be unfortunate, though, from the perspective of being able to take advantage of eliminating side effects from the loop.

Modules aren't set in stone, though, and maintenance or refactoring can introduce new uses of the global. Besides that, there's no way to know for certain whether any individual invocation of a function that accesses a private global is the first invocation, so the logic in the function should not rely on the initialization happening or not.

jrose · March 19, 2020, 7:09pm

I think my point wasn't that you shouldn't rely on a particular access being the one to initialize, but that the lack of access should be something you can reason about. (So I guess we're in agreement.)

Andrew_Trick · March 19, 2020, 11:54pm

As Erik explained, in the compiler's current model, initializers can run any time between program start and the first access to the global. There are some problems with that model that I still don't have answers to. I think we either need to come up with good answers to those problems, or implement a more conservative model as done in the PR above.

Let's just be honest that, if the compiler implements a conservative model of global initializer order, it cannot realistically be changed later in a way that violates that model. The source breakage would be unacceptable.

Problems:

1. Crash on reentrant initialization

This results from cyclic dependencies on global initializers. This can only be solved by diagnostics. Static diagnostics can't catch every case, so we need a way to dynamically diagnose reentrant initialization.

The compiler's aggressive model makes this worse by exposing reentrant initialization on an access that is dynamically unreachable (the "access X does not occur" case). Is it acceptable to diagnose even these cases as errors, just as the compiler should for regular cyclic dependencies? Would that be too confusing? Would it prohibit any valid programming technique?

2. Unspecified behavior resulting from unreachable access

Again, this is the "access X does not occur" case. Is it acceptable for initializer side effects to be observed even if the global is never dynamically accessed? If the answer is "no", then I suspect the answer to Problem #1 should similarly be that the compiler cannot diagnose unreachable reentrant initialization as an error.

This is especially problematic because -Onone and -O behavior will diverge, and I cannot envision any reliable diagnostic that could warn the programmer about that divergent behavior.

3. Unspecified behavior resulting from reordering initialization

let a: Int = { print("a", terminator: ","); return 1 }()

func foo() {
  print("foo", terminator: ",")
  _ = a
}

-Onone always prints "foo,a"
-O could print "a,foo"

Use your imagination for how this might do something unexpected in a real application.
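A self-contained variant of the snippet above that can be run as a single file, assuming Swift 5 language mode. `Globals` and `trace` are hypothetical names added for illustration. The global is moved into a static property because a top-level `let` in main.swift is initialized eagerly in source order, which would hide the effect.

```swift
// Records the order of side effects instead of printing them directly.
var trace: [String] = []

enum Globals {
    // Lazily initialized on first access, like a global in a non-main file.
    static let a: Int = {
        trace.append("a")
        return 1
    }()
}

func foo() {
    trace.append("foo")
    _ = Globals.a   // first call triggers the initializer
}

foo()
foo()
print(trace.joined(separator: ","))
```

An unoptimized build is expected to print foo,a,foo. Under the model discussed here, an optimized build would be allowed to run the initializer earlier and print a,foo,foo instead; in both cases the initializer runs only once.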

I don't like the idea of programmers relying on the order of initializer side effects.

I also don't like the idea of the compiler reordering side effects at -O unless it's possible to diagnose most cases where it would matter as a warning. Does anyone think it would ever be practical to enable such a diagnostic?

Erik_Eckstein · March 20, 2020, 10:56am

ad 1. Actually there is a runtime check for the reentrant initialization case - at least on Darwin. It's just not easy to "discover" the message:

* thread #1, queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007fff6ac8e155 libdispatch.dylib`_dispatch_once_wait.cold.1 + 14
libdispatch.dylib`_dispatch_once_wait.cold.1:
->  0x7fff6ac8e155 <+14>: ud2    

libdispatch.dylib`_dispatch_gate_broadcast_slow.cold.1:
    0x7fff6ac8e157 <+0>:  movl   %edi, %eax
    0x7fff6ac8e159 <+2>:  leaq   0x5daa(%rip), %rcx        ; "BUG IN CLIENT OF LIBDISPATCH: lock not owned by current thread"
    0x7fff6ac8e160 <+9>:  movq   %rcx, 0x29ed33a9(%rip)    ; gCRAnnotations + 8
Target 0: (a.out) stopped.

So it's not crashing because of an infinite recursion.

Michael_Gottesman · March 21, 2020, 2:28am

@Erik_Eckstein file a bug?