Compiler bug or feature?

#1

Why can I do this?

main.swift:

class A {
    init() { }
}

b() //crashes at runtime of course, because c is uninitialized

var c = A()

some other file:

func b() {
    print(c)
}

Why is this allowed?

(Jens Persson) #2

I think this old thread is related.

It includes similar (single file) examples like:

func foo() -> Int {
    return x
}
let x = foo()
print(x)

which still compiles, and prints 0.
And if you change Int to [Int] it will crash at runtime.

Here's one of the replies in that old thread:


Some possibly related bugs:
https://bugs.swift.org/browse/SR-2730
https://bugs.swift.org/browse/SR-2727
https://bugs.swift.org/browse/SR-284

(John McCall) #3

Right. What we should probably do is recognize that script variables are special and either

  • do a complex cross-file interprocedural definitive-initialization analysis to statically prove that uses of script variables are only evaluable after they are initialized by the main script flow,
  • unconditionally ban using script variables (or anything which might reference one) from secondary files, or
  • make references from outside of the main script flow (i.e. from secondary files or non-top-level code in the script file) fail dynamically if evaluated before the script has initialized the variable according to normal definitive initialization rules.

The first feels like it'd be a very poor use of implementation and maintenance effort. The second is probably excessively onerous. The last sounds pretty reasonable.
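To make the third option concrete, here is a minimal sketch of what a dynamic initialization check amounts to, written as ordinary library code. The `ScriptVariable` wrapper and the `UninitializedScriptVariable` error are hypothetical names used only for illustration; in the option described above the check would be emitted by the compiler, and would presumably trap rather than throw.

```swift
// Hypothetical error type: reports a read that happened too early.
struct UninitializedScriptVariable: Error {
    let name: String
}

// Hypothetical stand-in for a script variable with a dynamic check.
final class ScriptVariable<Value> {
    private let name: String
    private var storage: Value?

    init(name: String) {
        self.name = name
    }

    // Called by the main script flow when it reaches the declaration.
    func initialize(_ value: Value) {
        storage = value
    }

    // Every read from outside the main script flow goes through the check.
    func read() throws -> Value {
        guard let value = storage else {
            throw UninitializedScriptVariable(name: name)
        }
        return value
    }
}

let c = ScriptVariable<String>(name: "c")

// Plays the role of b() from the secondary file.
func b() throws -> String {
    return try c.read()
}

// The script calls b() before it has initialized c: the read fails
// instead of silently observing uninitialized storage.
var earlyReadFailed = false
do {
    _ = try b()
} catch {
    earlyReadFailed = true
}

// Once the script flow has initialized c, the same call succeeds.
c.initialize("initialized")
let lateRead = try? b()
```

Note that the original example would still compile under this scheme; the difference is that the early call fails at a well-defined point rather than reading garbage.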

#4

But wouldn't the last option still let my example compile? I think I like your second option more.

(John McCall) #5

The problem with the second one is that it means you can only break a script up into totally self-contained chunks because nothing can refer back into the script. In practice, I think that would discourage breaking up scripts at all.

For example, it's pretty common for script files to begin with a lot of configuration code, then define some types and functions that implement most of the core logic, then end with something that kicks off the basic behavior of the script. Something like this:

let useCrustyMagic = commandLine.has("--crusty")

...

func coreLogic() {
  if useCrustyMagic {
    // the crusty way
  } else {
    // the normal way
  }
}

...

coreLogic()

The most common reason that scripts start getting split across multiple files is that the core logic has gotten too big. But that core logic almost certainly depends on the configuration variables, so the second option means that all the global configuration has to be split into separate files first. And since that means the global configuration options have to be defined in a non-script file, they can't be initialized script-style, which means they all have to be declared separately from the parsing code and they all have to have default values. So while I don't love the idea of script variables being usable from secondary files, I'm not sure the alternative is better.
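A small sketch of what that restructuring might look like, shown as one block for brevity. The file split in the comments, the `parseConfiguration(from:)` helper, and the `String` return value of `coreLogic()` are assumptions made for illustration; only `useCrustyMagic` and `coreLogic` come from the example above.

```swift
// Hypothetical contents of a non-script file such as Config.swift.
// A global outside main.swift needs an initial value at its declaration,
// so a placeholder default ends up here, away from the parsing code.
var useCrustyMagic = false

// Hypothetical helper that the script would call first.
func parseConfiguration(from arguments: [String]) {
    useCrustyMagic = arguments.contains("--crusty")
}

// The core logic, which could now live in any secondary file.
func coreLogic() -> String {
    if useCrustyMagic {
        return "the crusty way"
    } else {
        return "the normal way"
    }
}

// main.swift: fill in the real values, then kick off the script.
parseConfiguration(from: CommandLine.arguments)
print(coreLogic())
```

The declaration, its default, and the code that computes the real value are now in three different places, which is the cost described above.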

(Matthew Johnson) #6

Is there a reason it wouldn’t be acceptable to explicitly pass the configuration into the core logic?
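For reference, a minimal sketch of what explicitly passing the configuration could look like. The `Configuration` struct and the `String` return value are hypothetical, introduced only to make the example self-contained.

```swift
// Hypothetical value type that bundles the script's configuration.
struct Configuration {
    var useCrustyMagic: Bool
}

// The core logic takes its configuration as a parameter, so it can live
// in any file without referring back to script variables.
func coreLogic(_ configuration: Configuration) -> String {
    if configuration.useCrustyMagic {
        return "the crusty way"
    } else {
        return "the normal way"
    }
}

// main.swift: parse script-style, then hand the result to the core logic.
let configuration = Configuration(
    useCrustyMagic: CommandLine.arguments.contains("--crusty")
)
print(coreLogic(configuration))
```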

(Jordan Rose) #7

I'm still hopeful for doing the second option, but I agree that it might be too onerous.

(John McCall) #8

Of course that’s the better-abstracted library design, but we’re talking about a script here. I’m saying there needs to be a middle-ground between “I have a single-file script” and “I have a set of well-composed libraries.”

(Andrew Trick) #9

This isn't just a problem with variable initialization. It also causes tremendous confusion whenever developers write small scripts or playgrounds to experiment with Swift semantics or performance without realizing that script globals differ from locals in many subtle ways.

The obvious expectation of casual scripters, who write the vast majority of Swift scripts, is that script variables are local variables.

In the much less common case that scripts become bona fide software projects, it makes perfect sense to explicitly declare any configuration variables as global variables using "internal", "public", or some other qualifier of choice.

This problem is only going to get worse over time. Globals are a huge thorn in the side of compiler diagnostics and optimization logic.

(John McCall) #10

If we can make script files totally independent without seriously regressing the scripting experience, that would be great. I'm just worried about talking as if we're going to spend a huge amount of time refining this area. A dynamic check for initialization is a comparatively self-contained way of eliminating a major soundness hole.

(Chéyo Jiménez) #11

This totally took me back to JavaScript land and variable hoisting. Would you say that the solution is to wrap all script files in do blocks?

//main.swift
do {

//.... my script scope

}
(John McCall) #12

The current semantics of scripts are that top-level variables from the script are visible in secondary files. That's just how it is. We can consider changing those semantics to make top-level script variables local to the script file, and that very well might be the right thing to do, although I think it's a more complex question than it's getting credit for. But I don't think it's justifiable to do that as a bug fix, which is to say, without going through evolution and applying it unconditionally in all language modes. It's a potentially serious source-compatibility break and needs to be handled with the normal evolution process, which is to say, it needs a proposal, and that proposal will only take full effect in a future language mode.

But we don't need an evolution proposal to fix the implementation so that the current rules are at least dynamically sound.

(Cassie Jones) #13

Could the semantics be specified such that all top-level variables are set up as globals which will be lazily initialized, and the main function simply forces their initialization in the order they're defined? The example above:

class A {
    init() { }
}

b() //crashes at runtime of course, because c is uninitialized

var c = A()

could be defined as something like:

class A {
    init() { }
}

var c = A()

// just to illustrate that c acts like global
// not the current way script variables work
main {
    b() // makes c initialize early
    let _ = c // would initialize c if it wasn't already initialized
}

This would leave all currently correct programs behaving the same way (since they never access variables out-of-order) but would make these variables behave much more consistently like other globals do.

The downside being that if you don't want to do this, you can still accidentally write code with confusing execution order.

(I like this solution because I think it would let me compile a script file as a bundle and just ignore its main function and have everything still work well.)
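The lazy behaviour described here can be approximated today in a single file, as a rough sketch. Static stored properties are initialized lazily on first access, so the hypothetical `Globals.c` below stands in for a lazily initialized top-level `c`; the `events` array is only there to record the order in which things run.

```swift
// Records the order of events, for illustration only.
var events: [String] = []

enum Globals {
    // Static stored properties are initialized lazily, on first access,
    // which is the behaviour proposed here for top-level script variables.
    static let c: String = {
        events.append("c initialized")
        return "c"
    }()
}

func b() {
    let value = Globals.c   // first access forces the initializer to run
    events.append("b read \(value)")
}

b()              // runs before the point where c is "declared" below
_ = Globals.c    // already initialized, so the initializer does not run again
print(events)
```

The initializer runs exactly once, at the first use inside `b()`, rather than at the position of the declaration in the script.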

(John McCall) #14

Variables aren't necessarily initialized directly, and even if they are, the initializer might have side-effects, or its correct value might depend on values that would require running the script up to that point to compute. So actually using those semantics could very easily be extremely confusing.
