Compiler bug or feature?

Why can I do this?

main.swift:

class A {
    init() { }
}

b() //crashes at runtime of course, because c is uninitialized

var c = A()

some other file:

func b() {
    print(c)
}

Why is this allowed?

I think this old thread is related.

It includes similar (single file) examples like:

func foo() -> Int {
    return x
}
let x = foo()
print(x)

which still compiles, and prints 0.
And if you change Int to [Int] it will crash at runtime.

Here's one of the replies in that old thread:


Some possibly related bugs:

Right. What we should probably do is recognize that script variables are special and either

  • do a complex cross-file interprocedural definitive-initialization analysis to statically prove that uses of script variables are only evaluable after they are initialized by the main script flow,
  • unconditionally ban using script variables (or anything which might reference one) from secondary files, or
  • make references from outside of the main script flow (i.e. from secondary files or non-top-level code in the script file) fail dynamically if evaluated before the script has initialized the variable according to normal definitive initialization rules.

The first feels like it'd be a very poor use of implementation and maintenance effort. The second is probably excessively onerous. The last sounds pretty reasonable.
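
To make the third option concrete, here is a minimal sketch of what a dynamic initialization check amounts to. It is only an emulation: in the actual proposal the compiler would emit the check itself, and the type `CheckedScriptVariable` and its members are hypothetical names invented for this illustration.

```swift
// Hypothetical stand-in for a script variable with a dynamic
// "initialized yet?" check. Not a real compiler or library feature.
struct CheckedScriptVariable<Value> {
    private var storage: Value?   // nil until the main script flow initializes it
    let name: String

    init(name: String) {
        self.name = name
    }

    var isInitialized: Bool {
        return storage != nil
    }

    // Called where the main script flow reaches `var name = value`.
    mutating func initialize(_ value: Value) {
        storage = value
    }

    // Called for references from outside the main script flow.
    // Traps with a clear message instead of reading uninitialized memory.
    func read() -> Value {
        guard let value = storage else {
            fatalError("script variable '\(name)' read before initialization")
        }
        return value
    }
}
```

With a check of this kind, the original example would still compile, but it would stop with a deterministic diagnostic rather than crash unpredictably or silently read a zero.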

But wouldn't the last option still let my example compile? I think I like your second option more.

The problem with the second one is that it means you can only break a script up into totally self-contained chunks because nothing can refer back into the script. In practice, I think that would discourage breaking up scripts at all.

For example, it's pretty common for script files to begin with a lot of configuration code, then define some types and functions that implement most of the core logic, then end with something that kicks off the basic behavior of the script. Something like this:

let useCrustyMagic = CommandLine.arguments.contains("--crusty")

...

func coreLogic() {
  if useCrustyMagic {
    // the crusty way
  } else {
    // the normal way
  }
}

...

coreLogic()

The most common reason that scripts start getting split across multiple files is that the core logic has gotten too big. But that core logic almost certainly depends on the configuration variables, so the second option means that all the global configuration has to be split into separate files first. And since that means the global configuration options have to be defined in a non-script file, they can't be initialized script-style, which means they all have to be declared separately from the parsing code and they all have to have default values. So while I don't love the idea of script variables being usable from secondary files, I'm not sure the alternative is better.

Is there a reason it wouldn’t be acceptable to explicitly pass the configuration into the core logic?
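
For reference, explicitly passing the configuration might look roughly like the sketch below. The `Configuration` type and the string return value are assumptions made for illustration; only the `--crusty` flag and the `coreLogic` name come from the example above.

```swift
// Hypothetical configuration type that a secondary file could define.
struct Configuration {
    let useCrustyMagic: Bool

    // Parse the flags from an argument list, e.g. CommandLine.arguments.
    init(arguments: [String]) {
        useCrustyMagic = arguments.contains("--crusty")
    }
}

// The core logic receives its configuration as a parameter instead of
// reading a script variable declared in main.swift.
func coreLogic(_ config: Configuration) -> String {
    if config.useCrustyMagic {
        return "the crusty way"
    } else {
        return "the normal way"
    }
}

// In main.swift the script would then end with something like:
//   print(coreLogic(Configuration(arguments: CommandLine.arguments)))
```

This removes any dependency on the initialization order of script variables, at the cost of threading the configuration through every call.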

I'm still hopeful for doing #2, but I agree that it might be too onerous.

Of course that’s the better-abstracted library design, but we’re talking about a script here. I’m saying there needs to be a middle-ground between “I have a single-file script” and “I have a set of well-composed libraries.”

This isn't just a problem with variable initialization. It also causes tremendous confusion whenever developers write small scripts or playgrounds to experiment with Swift semantics or performance without realizing that script globals differ from locals in many subtle ways.

The obvious expectation of casual scripters, who write the vast majority of Swift scripts, is that script variables are local variables.

In the much less common case that scripts become bona fide software projects, it makes perfect sense to explicitly declare any configuration variables as global variables using "internal", "public", or some other qualifier of choice.

This problem is only going to get worse over time. Globals are a huge thorn in the side of compiler diagnostics and optimization logic.

If we can make script files totally independent without seriously regressing the scripting experience, that would be great. I'm just worried about talking as if we're going to spend a huge amount of time refining this area. A dynamic check for initialization is a comparatively self-contained way of eliminating a major soundness hole.

This totally took me back to JavaScript land and variable hoisting. Would you say that the solution is to wrap all file scripts into do blocks?

//main.swift
do {

//.... my script scope

}

The current semantics of scripts are that top-level variables from the script are visible in secondary files. That's just how it is. We can consider changing those semantics to make top-level script variables local to the script file, and that very well might be the right thing to do, although I think it's a more complex question than it's getting credit for. But I don't think it's justifiable to do that as a bug fix, which is to say, without going through evolution and applying it unconditionally in all language modes. It's a potentially serious source-compatibility break and needs to be handled with the normal evolution process, which is to say, it needs a proposal, and that proposal will only take full effect in a future language mode.

But we don't need an evolution proposal to fix the implementation so that the current rules are at least dynamically sound.

Could the semantics be specified such that all top-level variables are set up as globals which will be lazily initialized, and the main function simply forces their initialization in the order they're defined? The example above:

class A {
    init() { }
}

b() //crashes at runtime of course, because c is uninitialized

var c = A()

could be defined as something like:

class A {
    init() { }
}

var c = A()

// just to illustrate that c acts like global
// not the current way script variables work
main {
    b() // makes c initialize early
    let _ = c // would initialize c if it wasn't already initialized
}

This would leave all currently correct programs behaving the same way (since they never access variables out-of-order) but would make these variables behave much more consistently like other globals do.

The downside being that if you don't want to do this, you can still accidentally write code with confusing execution order.

(I like this solution because I think it would let me compile a script file as a bundle and just ignore its main function and have everything still work well.)

Variables aren't necessarily initialized directly, and even if they are, the initializer might have side-effects, or its correct value might depend on values that would require running the script up to that point to compute. So actually using those semantics could very easily be extremely confusing.
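
The reordering concern can be sketched by emulating the proposed lazy semantics by hand. This is only a model: `LazyBox` and `runScript` are hypothetical names, and the compiler does not treat script variables this way today.

```swift
// Hand-written model of a lazily initialized variable.
final class LazyBox<Value> {
    private var value: Value?
    private let make: () -> Value

    init(_ make: @escaping () -> Value) {
        self.make = make
    }

    // Runs the initializer on first access and caches the result.
    func get() -> Value {
        if let existing = value {
            return existing
        }
        let created = make()
        value = created
        return created
    }
}

// Models a script whose variable `c` has a side-effecting initializer.
// Returns the order in which the side effects happened.
func runScript(accessEarly: Bool) -> [String] {
    var log: [String] = []
    let c = LazyBox<Int> {
        log.append("initialize c")   // side effect of the initializer
        return 42
    }
    if accessEarly {
        _ = c.get()                  // plays the role of b() touching c early
    }
    log.append("code before the declaration of c")
    _ = c.get()                      // main forcing c at its declaration point
    return log
}
```

Under this model `runScript(accessEarly: false)` logs the preceding code first and the initializer second, while `runScript(accessEarly: true)` logs them in the opposite order, so an early access silently moves the initializer's side effects ahead of code that was written before it.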

@Paul_Hudson CC https://twitter.com/twostraws/status/1276590212844597248?s=20

Was bitten by this today:

func mainProc() { bar() }
func bar() { baz() }
func baz() { foo() }

mainProc()

var x: [Int] = []

func foo() {
    print(x.count) // 💣 crash: Thread 1: EXC_BAD_ACCESS (code=1, address=0x10)
}

I thought that so long as "var x = ..." is declared above its usage in "foo" (in this case this is the only usage of "x") it should be fine, but, alas, "var x" has to be declared before the "mainProc()" call. Not very obvious IMHO. FWIW this was tested in Debug.

Huh, that smells more like Python behaviour than Swift. I'm aware that global variables have some weird behaviours in Swift, but I thought these kinds of basic issues were fixed already.

Note that the order of declarations should never matter: Swift very deliberately chooses not to care about order, unlike its predecessors (C/C++ et al.), because it's so much nicer that way and eliminates a whole class of source code structuring problems (A is referenced by B, which is referenced by C, which is referenced by A, so now I have to play games with type / function declarations instead of definitions, thereby complicating the language with the ability to even have that distinction, etc.).

The same example with a simple type is even more dangerous: in this case there's no crash or other warning, and the app silently uses a wrong value for the variable:

func mainProc() { bar() }
func bar() { baz() }
func baz() { foo() }

mainProc()

var x: Int = 42

struct S {
    static var y: Int = 42
}

func foo() {
    print(x)    // 0
    print(S.y)  // 42
}

If I split the declaration and initialisation:

...
var x: Int
x = 42
func foo() {
    print(x)    // 0
}

The result is the same. If I move x = 42 to after func foo:

...
var x: Int
func foo() {
    print(x)    // 🛑 Variable 'x' used by function definition before being initialized
}
x = 42

The compiler complains that x is used before being initialised, meaning that in the previous examples the compiler thinks the variable is initialised, yet the wrong value is used.

and here:

func foo() {
    print(x)    // 0
}
var x: Int = 42

there's no complaint about variable being used before being initialised, just using the wrong value.

I'd say if we can't fix it properly we should consider detecting such instances and issuing a "this is not currently supported" compilation error.
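
Until then, the `S.y` result above suggests one possible workaround: static stored properties are initialized lazily on first access, so they do not depend on how far the main script flow has run. A minimal sketch, with `ScriptGlobals` as a hypothetical name and `static let` used in place of the `static var` from the example to keep it free of concurrency diagnostics:

```swift
// Hypothetical namespace for state that functions need regardless of
// where in main.swift they are called from.
enum ScriptGlobals {
    // Initialized on first access, not when the script flow reaches this line.
    static let x: [Int] = [1, 2, 3]
}

func foo() -> Int {
    return ScriptGlobals.x.count
}

func baz() -> Int { return foo() }
func bar() -> Int { return baz() }
func mainProc() -> Int { return bar() }
```

Here `mainProc()` can be called from any point in the script and `foo` still sees the fully initialized array.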
