Design and performance of Vector2/3/4 and Matrix

I'm working on a math library for OpenGL. At the core of this are, of
course, scalar types. The scalars are grouped into vectors of length 2, 3,
and 4. The vectors are used to build matrices of all variations from 2x2 to
4x4.

In order to be performant, scalars, vectors, and matrices must all be
value types, aka structs. This way, for example, an Array<Vector3<Float>>
can be passed directly to OpenGL without any copying. In my testing so
far, Swift does this quite well.

Ideally, I'd do something like this:

public struct Vector2<T:ScalarType> : Array<T, 2> {
    public var x:T { get {return self[0]} set {self[0] = newValue} }
    public var y:T { get {return self[1]} set {self[1] = newValue} }
}

But there's so much wrong with that. You can't use inheritance with
structs. Array isn't really a struct; the docs say it is but really it's a
reference to a special copy-on-write value type. Array can't be a fixed
size. You can't use literals with generic placeholders. Ok, fine, I accept
this isn't C++, let's move on to something Swifty.

public struct Vector2<T:ScalarType> {
    public var x:T, y:T

    public var r:T { get {return x} set {x = newValue} }
    public var g:T { get {return y} set {y = newValue} }

    public var s:T { get {return x} set {x = newValue} }
    public var t:T { get {return y} set {y = newValue} }

    public subscript(i: Int) -> T {
        get {
            switch i {
            case 0: return x
            case 1: return y
            default: fatalError()
            }
        }
        set {
            switch i {
            case 0: x = newValue
            case 1: y = newValue
            default: fatalError()
            }
        }
    }
}

Functionally, this works fine. The x and y properties are the struct data.
I can access vector components by subscript, by coordinate properties, and
by color properties. It's exactly what someone using an OpenGL vector type
would expect. You can make these into arrays, use them in other structs
which are made into arrays, and pass them to OpenGL just fine.

But I hit some performance issues. Let's use myvec.x as a baseline. This is
always inlined and as fast as C.

You might expect myvec.r to have the same performance. It does in other
languages but not Swift. The performance penalty is 20X. Yes, twenty times
slower than myvec.x. The performance hit comes entirely from dynamic
dispatch. 10X because it's not inlined, and another 10X because Vector2 is
a template.

I can get rid of 10X of that by writing my own preprocessor for the
template. There are only four scalar types that are valid for OpenGL so this
really isn't that hard. But it's not a complete solution, and preprocessing
core language features only to gain performance is an indication the
compiler isn't doing optimization as well as it could.

Subscript access is the same 20X slower for the same reasons. The switch
disappears into the noise. But I'm still tempted to use a precondition and
cast to an UnsafePointer.
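
For reference, here is a minimal sketch of what the precondition-plus-unsafe-pointer subscript might look like. It is not from SwiftGL: Vec2f is a hypothetical non-generic stand-in, and the code assumes the two Float fields are stored back to back in declaration order, which Swift does in practice for a struct like this but does not formally promise.

```swift
// Hypothetical non-generic stand-in for Vector2<Float>.
struct Vec2f {
    var x: Float, y: Float

    subscript(i: Int) -> Float {
        get {
            precondition(i >= 0 && i < 2, "Vec2f index out of range")
            // Assumption: x and y are contiguous, so element i sits at
            // byte offset i * stride.
            return withUnsafeBytes(of: self) { raw in
                raw.load(fromByteOffset: i * MemoryLayout<Float>.stride, as: Float.self)
            }
        }
        set {
            precondition(i >= 0 && i < 2, "Vec2f index out of range")
            withUnsafeMutableBytes(of: &self) { raw in
                raw.storeBytes(of: newValue,
                               toByteOffset: i * MemoryLayout<Float>.stride,
                               as: Float.self)
            }
        }
    }
}

var v = Vec2f(x: 1, y: 2)
v[1] = 5
print(v[0], v[1], v.y)
```

Whether this beats the switch is something to measure rather than assume; the layout assumption is the price for dropping the branch.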

I'm aware you can mark a class final and a method private to enable
inlining. Except this isn't a class and making the API private, well, it's
not an API then. Forcing @inline(__always) doesn't seem to do anything.

Perhaps I could just not make this a module and leave everything internal.
Supposedly it'd be inlined when whole module optimization is enabled.
Except that doesn't happen.

How can I get these property aliases to be inlined? Is it possible today?
Will it be possible in the future? Is there a different pattern for vectors
that's better suited to the task?

-david (https://github.com/AE9RB/SwiftGL)

Do you have the optimizer enabled (using -O)? I see inlining happening as you'd expect. This:

let x = Vector2(x: 1, y: 1)
foo(x.x)
foo(x.r)

optimizes down to:

  %15 = integer_literal $Builtin.Int64, 1 // user: %16
  %16 = struct $Int (%15 : $Builtin.Int64) // users: %17, %17, %20, %21
  %17 = struct $Vector2<Int> (%16 : $Int, %16 : $Int) // user: %18
  // function_ref foo.foo (Swift.Int) -> ()
  %19 = function_ref @_TF3foo3fooFSiT_ : $@convention(thin) (Int) -> () // users: %20, %21
  %20 = apply %19(%16) : $@convention(thin) (Int) -> ()
  %21 = apply %19(%16) : $@convention(thin) (Int) -> ()

forwarding the constant 1 from the Vector2 constructor to foo as one would hope the optimizer would.
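
For anyone who wants to repeat the check, a self-contained sketch follows. The file name is made up, the ScalarType constraint is dropped because its definition isn't shown in the thread, and foo is marked @inline(never) only so that both calls stay visible in the output.

```swift
// File: foo.swift (hypothetical name). One way to inspect the optimized SIL:
//   swiftc -O -emit-sil foo.swift
struct Vector2<T> {
    var x: T, y: T
    var r: T { get { return x } set { x = newValue } }
    var g: T { get { return y } set { y = newValue } }
}

// Kept out of line so the two calls remain visible after optimization.
@inline(never)
func foo(_ value: Int) {
    print(value)
}

let v = Vector2(x: 1, y: 1)
foo(v.x)   // stored property
foo(v.r)   // computed alias; should reduce to the same value when inlined
```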

-Joe

···

On Dec 18, 2015, at 12:07 PM, David Turnbull via swift-users <swift-users@swift.org> wrote:

I'm working on a math library for OpenGL. At the core of this are, of course, scalar types. The scalars are grouped into vectors of length 2, 3, and 4. The vectors are used to build matrices of all variations from 2x2 to 4x4.

_______________________________________________
swift-users mailing list
swift-users@swift.org
https://lists.swift.org/mailman/listinfo/swift-users

What about using tuples? You can name the elements of a tuple, and IIRC you can access them using subscripts too.
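
A small sketch of the tuple idea, with a hypothetical Vec2 alias. One caveat worth noting: tuple elements can be named, but positional access is written .0 and .1 with compile-time constant indices, so there is no v[i] with a run-time index. Tuples also can't be extended, so aliases like r and g can't be added to them.

```swift
// Named tuple elements work like stored properties.
typealias Vec2 = (x: Float, y: Float)

var v: Vec2 = (x: 1, y: 2)
v.x = 3

// Positional access uses .0 and .1; the index is fixed at compile time.
let first = v.0
let second = v.1
print(first, second)
```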

—Jens

···

On Dec 18, 2015, at 12:07 PM, David Turnbull via swift-users <swift-users@swift.org> wrote:

In order to be performant, scalars, vectors, and matrices must all be value types, aka structs. This way, for example, an Array<Vector3<Float>> can be passed directly to OpenGL without any copying. In my testing so far, Swift does this quite well.

Thanks for looking at this. There's nobody else here who can check my work.

Last time I tested not-as-a-module it wasn't inlining. Perhaps in that case
I didn't have -O because of something I didn't understand about Xcode. This
time around I added -O to the debug build instead of trying to build for
release. It did inline all my test cases.

When it does inline it's making nice assembly. Even the subscript turns
into exactly the same as myvec.x. So that's pretty awesome.

But it still doesn't inline for me across modules even with -O
-whole-module-optimization. Only the myvec.x cases get inlined. I suspect
this just isn't working and might come in time.

It's good enough for now. At least I don't have to write the internals with
the clumsy mymatrix.w.y style instead of mymatrix[3][1].

-david (https://github.com/AE9RB/SwiftGL)

···

On Fri, Dec 18, 2015 at 1:36 PM, Joe Groff <jgroff@apple.com> wrote:

Do you have the optimizer enabled (using -O)? I see inlining happening as
you'd expect.

You will also want to have this code in the same module that is using this type.

If you're using these types from another module you're limited to unspecialized generics which are (unsurprisingly) very slow.

I assume these types are intended for your SwiftGL library. If they are only for internal use then these generic types should be fine but if they should ultimately be public I would recommend you use non-generic types for now.
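
As a rough illustration of the non-generic route, here is a hand-specialized Float variant. Vector2f is a hypothetical name, not an existing SwiftGL type, and the sketch only shows the shape of the type; it makes no claim about how a given compiler version treats the computed aliases across a module boundary.

```swift
// Hypothetical hand-specialized variant of the thread's Vector2<T> for Float.
public struct Vector2f {
    public var x: Float, y: Float

    public init(x: Float, y: Float) {
        self.x = x
        self.y = y
    }

    // Color aliases forward to the stored coordinates.
    public var r: Float { get { return x } set { x = newValue } }
    public var g: Float { get { return y } set { y = newValue } }
}

var color = Vector2f(x: 0.25, y: 0.5)
color.g = 1
print(color.x, color.y)
```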

And since you probably need to drop generics anyway it might make sense to simply wrap the respective GLKit types on OS X and iOS for the GLfloat variants as they are already highly optimized. I have some wrappers for the GLKMatrix and GLKVector types lying around in my own OpenGL wrapper (incidentally also named SwiftGL ;-)) which might save you some typing if you're interested...

- Janosch

···

On 18 Dec 2015, at 22:36, Joe Groff via swift-users <swift-users@swift.org> wrote:

Do you have the optimizer enabled (using -O)? I see inlining happening as you'd expect.

You will also want to have this code in the same module that is using this
type.
If you're using these types from another module you're limited to
unspecialized generics which are (unsurprisingly) very slow.

This is becoming clear. Hopefully these patterns can be optimized across
modules eventually. It's easy enough to write a preprocessor that expands
the generics into four specializations. But it doesn't solve everything, so
it's not a priority.

And since you probably need to drop generics anyway it might make sense to
simply wrap the respective GLKit types on OS X and iOS for the GLfloat
variants as they are already highly optimized. I have some wrappers for the
GLKMatrix and GLKVector types lying around in my own OpenGL wrapper
(incidentally also named SwiftGL ;-)) which might save you some typing if
you're interested...

I'm trailblazing cross-platform OpenGL in Swift. Given there's only one
other platform, the key question is, "Does it work on Linux?"

Is your SwiftGL online somewhere? A cursory search didn't yield anything.

-david (https://github.com/AE9RB/SwiftGL)

···

On Fri, Dec 18, 2015 at 2:31 PM, Janosch Hildebrand via swift-users <swift-users@swift.org> wrote:

I know that cross-module whole-program optimization is a long term goal for the build system; at what priority, I'm not sure. We do have some totally not-ready-for-primetime functionality for cross-module inlining and optimization that's used by the standard library and overlays; these libraries use special flags to serialize SIL bytecode into the module file. You can poke around the cmake files to see how the stdlib is built, though you're off the supported path if you try to mimic it. Since you're going down the code generation path already, though, you might also take a peek at the guts of the `simd` overlay:

https://github.com/apple/swift/blob/master/stdlib/public/SDK/simd/simd.swift.gyb

Though there isn't a portable `simd` module supporting it on Linux, it shows how LLVM vector types are exported as Swift builtins that can be used to implement vector types in terms of machine SIMD types.
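
For readers on newer toolchains: as far as I know, the standard library itself has shipped portable SIMD vector types since Swift 5, so no `import simd` is needed for the basics. A minimal sketch, assuming such a toolchain:

```swift
// SIMD2<Float> comes from the standard library on Swift 5 and later.
var v = SIMD2<Float>(1, 2)
v.x += 10            // named component access
v[1] = 5             // subscript access
let doubled = v * 2  // element-wise multiply by a scalar
print(v.x, v.y, doubled)
```

These types cover x/y/z/w and subscripts; color-style aliases such as r and g would still have to be added in an extension.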

-Joe

···

On Dec 18, 2015, at 3:13 PM, David Turnbull via swift-users <swift-users@swift.org> wrote:

On Fri, Dec 18, 2015 at 2:31 PM, Janosch Hildebrand via swift-users <swift-users@swift.org> wrote:
You will also want to have this code in the same module that is using this type.
If you're using these types from another module you're limited to unspecialized generics which are (unsurprisingly) very slow.

This is becoming clear. Hopefully these patterns can be optimized across modules eventually. It's easy enough to write a pre-processor that expands the generics into four specializations. But it doesn't solve everything so not a priority.


We do want to be able to perform these sorts of optimizations (among others) across module boundaries. The reason that it has not been implemented yet is that the model for when/how one could (for instance) inline across module boundaries is not finalized. This will be possible once the resilience feature is complete.
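
For context, later Swift releases exposed this through attributes: @inlinable (Swift 4.2) makes a function body available to client modules, and @frozen (Swift 5.1) commits to a struct's stored layout. The sketch below assumes such a toolchain and reuses the thread's Vector2 shape without the ScalarType constraint; it shows the annotations only and makes no performance claim.

```swift
// Library-side declaration; requires a toolchain with @inlinable and @frozen.
@frozen
public struct Vector2<T> {
    public var x: T, y: T

    @inlinable
    public init(x: T, y: T) {
        self.x = x
        self.y = y
    }

    // The accessor bodies become visible to clients of the module.
    @inlinable
    public var r: T {
        get { return x }
        set { x = newValue }
    }

    @inlinable
    public subscript(i: Int) -> T {
        get {
            switch i {
            case 0: return x
            case 1: return y
            default: fatalError("Vector2 index out of range")
            }
        }
        set {
            switch i {
            case 0: x = newValue
            case 1: y = newValue
            default: fatalError("Vector2 index out of range")
            }
        }
    }
}

var p = Vector2(x: 1, y: 2)
p.r = 9
print(p[0], p[1])
```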

···

On Dec 18, 2015, at 5:13 PM, David Turnbull via swift-users <swift-users@swift.org> wrote:
On Fri, Dec 18, 2015 at 2:31 PM, Janosch Hildebrand via swift-users <swift-users@swift.org> wrote:

And since you probably need to drop generics anyway it might make sense to simply wrap the respective GLKit types on OS X and iOS for the GLfloat variants as they are already highly optimized. I have some wrappers for the GLKMatrix and GLKVector types lying around in my own OpenGL wrapper (incidentally also named SwiftGL ;-)) which might save you some typing if you're interested...

I'm trailblazing cross-platform OpenGL in Swift. Given there's only one other platform, the key question is, "Does it work on Linux?"

Is your SwiftGL online somewhere? A cursory search didn't yield anything.

-david (https://github.com/AE9RB/SwiftGL)


Multiplying two mat4x4 is one of the most complex and common operations.
Doing 2 million of these...

C++: 1.3 seconds
Swift: 1.9 seconds

The C++ code is glm using packed SIMD operations. Swift is just plain
whatever LLVM gave me with -O.

While implementing the multiplication I darn near fell out of my chair when
the compiler said, "Expression was too complex to be solved in reasonable
time; consider breaking up the expression into distinct sub-expressions." I
had to break it in half twice before the compiler stopped complaining about
the math being too hard.

-david (https://github.com/AE9RB/SwiftGL)

···

On Fri, Dec 18, 2015 at 4:34 PM, Janosch Hildebrand <jnosh@jnosh.com> wrote:

If you only care about having a simple cross-platform library, doing a
simple implementation by yourself is fine. Swift (well, LLVM) is also
pretty good at auto-vectorization so you get decent results for custom
vector/matrix types.

However OpenGL-related code is usually pretty performance sensitive and
the math parts doubly so, so I'd recommend wrapping some appropriate
libraries instead of writing your own...

Looks like I was too late :-)
Also thanks for the very informative answers! I had wanted to ask what the plans were in this regard anyway.

- Janosch

···

On 19 Dec 2015, at 00:46, Michael Gottesman via swift-users <swift-users@swift.org> wrote:

This is becoming clear. Hopefully these patterns can be optimized across modules eventually. It's easy enough to write a pre-processor that expands the generics into four specializations. But it doesn't solve everything so not a priority.

We do want to be able to perform these sorts of optimizations (among others) across module boundaries. The reason that it has not been implemented yet is that the model for when/how one could (for instance) inline across module boundaries is not finalized. This will be possible once the resilience feature is complete.

On 19 Dec 2015, at 00:30, Joe Groff via swift-users <swift-users@swift.org> wrote:

This is becoming clear. Hopefully these patterns can be optimized across modules eventually. It's easy enough to write a pre-processor that expands the generics into four specializations. But it doesn't solve everything so not a priority.

I know that cross-module whole-program optimization is a long term goal for the build system; at what priority, I'm not sure. We do have some totally not-ready-for-primetime functionality for cross-module inlining and optimization that's used by the standard library and overlays; these libraries use special flags to serialize SIL bytecode into the module file. You can poke around the cmake files to see how the stdlib is built, though you're off the supported path if you try to mimic it.
-Joe

If you only care about having a simple cross-platform library, doing a simple implementation by yourself is fine. Swift (well, LLVM) is also pretty good at auto-vectorization so you get decent results for custom vector/matrix types.

However OpenGL-related code is usually pretty performance sensitive and the math parts doubly so, so I'd recommend wrapping some appropriate libraries instead of writing your own...

Multiplying two mat4x4 is one of the most complex and common operations. Doing 2 million of these...

C++: 1.3 seconds
Swift: 1.9 seconds

The C++ code is glm using packed SIMD operations. Swift is just plain whatever LLVM gave me with -O.

While implementing the multiplication I darn near fell out of my chair when the compiler said, "Expression was too complex to be solved in reasonable time; consider breaking up the expression into distinct sub-expressions." I had to break it in half twice before the compiler stopped complaining about the math being too hard.

:-) It's not the math that's bothering the compiler but type inference which can get pretty nasty if you have many operations in a single statement.
The compiler emits that error to help you reduce compile times.

Depending on the code, specifying the resulting type can also help (e.g. var foo: T = <lots of math here>), but breaking up the code into smaller pieces will always work.
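
To make the two suggestions concrete, here is a small sketch using one element of a 4x4 product. The arrays and names are made up for illustration; an expression this short type-checks fine either way, and the point is only the shape of the rewrite.

```swift
// Hypothetical row and column for one element of a matrix product.
let a: [Float] = [1, 2, 3, 4]   // row of the left matrix
let b: [Float] = [5, 6, 7, 8]   // column of the right matrix

// Option 1: annotate the result type so inference has less to search.
let annotated: Float = a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]

// Option 2: split the sum into smaller, separately typed sub-expressions.
let lowerHalf: Float = a[0] * b[0] + a[1] * b[1]
let upperHalf: Float = a[2] * b[2] + a[3] * b[3]
let split = lowerHalf + upperHalf

print(annotated, split)
```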

-david (https://github.com/AE9RB/SwiftGL)

- Janosch

···

On 19 Dec 2015, at 06:19, David Turnbull <dturnbull@gmail.com> wrote:
On Fri, Dec 18, 2015 at 4:34 PM, Janosch Hildebrand <jnosh@jnosh.com> wrote: