[ANN] CeedNumerics released (SN's ShapedArray API discussion)

At this stage, this remains an assertion not backed by evidence. In any case, this is a point where reasonable people absolutely may disagree, and the great thing about OSS is you're allowed to do whatever you want!

I will say: if you are interested in having views on shared data, I would strongly consider having the views be non-mutable. Spooky action at a distance through mutation is a really tricky thing to debug.

Oh gosh I'm sorry, my mistake!

It's worth noting that mutable views on shared data are almost¹ necessary for efficient implementation of a lot of numerical computations. Consider a multi-threaded algorithm where each worker is updating a tile of a matrix; no two threads write to the same elements (and ideally not even to the same cachelines), but they are writing to the same allocation.

However, I think it's appropriate to put operations like this behind an API that follows the basic .withUnsafeMutable... patterns, as they should mostly be implementation details rather than the primary way users interact with such a module.
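
For concreteness, here's a rough sketch of that pattern (the names and the fixed four-way split are purely illustrative): each task works on a disjoint row range of the same allocation through an unsafe-buffer API, so nothing is copied and no two tasks touch the same elements.

import Dispatch

/// Adds `value` to every element of a rows × columns buffer, processing four
/// disjoint row tiles concurrently. All tasks write into the *same* allocation,
/// but never into the same elements.
func addConcurrently(_ value: Double,
                     to buffer: UnsafeMutableBufferPointer<Double>,
                     rows: Int, columns: Int) {
    let tileCount = 4
    let rowsPerTile = rows / tileCount
    DispatchQueue.concurrentPerform(iterations: tileCount) { tile in
        let firstRow = tile * rowsPerTile
        let lastRow = (tile == tileCount - 1) ? rows : firstRow + rowsPerTile
        for i in (firstRow * columns)..<(lastRow * columns) {
            buffer[i] += value
        }
    }
}

var elements = [Double](repeating: 0, count: 16 * 16)
elements.withUnsafeMutableBufferPointer { buffer in
    addConcurrently(1.0, to: buffer, rows: 16, columns: 16)
}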

¹ There's some exciting progress on ideas like fractional ownership that might give a way out in the long-term, but it's still pretty early-stages.

1 Like

Strongly agreed @scanon, but it makes me nervous to have those be the primary interface by which one accesses the data. Sharing data is fine, but it's usually good to be able to express the difference between "I need to compute on shared data" and "I don't want people to change this data underneath me".

1 Like

Yes, RO/RW views would make sense.
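
For instance, something along these lines (purely illustrative, not CeedNumerics API): two thin view types over the same storage, only one of which offers a setter, so the intent to mutate shared data is visible in the type.

final class SharedStorage<Element> {
    var elements: [Element]
    init(_ elements: [Element]) { self.elements = elements }
}

/// A read-only window onto shared storage: the holder can observe updates made
/// through other views, but this type offers no way to write.
struct ReadOnlyView<Element> {
    let storage: SharedStorage<Element>
    let range: Range<Int>

    subscript(i: Int) -> Element {
        precondition(range.contains(i))
        return storage.elements[i]
    }
}

/// A read-write window onto the same storage: having to obtain this type makes
/// "I intend to mutate shared data" explicit at the use site.
struct MutableView<Element> {
    let storage: SharedStorage<Element>
    let range: Range<Int>

    subscript(i: Int) -> Element {
        get {
            precondition(range.contains(i))
            return storage.elements[i]
        }
        nonmutating set {
            precondition(range.contains(i))
            storage.elements[i] = newValue
        }
    }
}

let shared = SharedStorage([1.0, 2.0, 3.0, 4.0])
let reader = ReadOnlyView(storage: shared, range: 0..<4)
let writer = MutableView(storage: shared, range: 0..<2)
writer[0] = 10.0
print(reader[0])   // 10.0: the read-only view observes the shared mutation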

To be grounded in reality, I'd be interested to see you go through this simple example and show how it could be implemented with a value type. This is a very common use case, well handled in NumPy, easy to read, easy to write. The matrix slicing here returns another matrix backed by the same shared data, and work is performed on it. No allocation, simple lightweight types. I'd be interested to determine whether it is possible with value semantics, which types would be involved (the return type of the subscript, in particular), and whether allocations can be avoided to achieve it.

let mat = Matrix<Double>(5000, 5000)

mat[1000~4000, 1000~4000] += 1.0

Unfortunately you picked one of the easier cases to achieve with CoW types. Here's some code that demonstrates the behaviour. Please ignore the use of ManagedBuffer, which I did solely to avoid futzing around with allocations (though it does give us a nice tail-allocation property you may want to investigate in your code), and the fact that these matrix types are deeply useless. This isn't intended as a fully-fledged replacement, just a demonstration of the principle.

internal class MatrixStorage<Type>: ManagedBuffer<MatrixHeader, Type> where Type: Numeric {
    /// Builds an x by y matrix.
    class func buildZeroing(xSize: Int, ySize: Int) -> MatrixStorage<Type> {
        let header = MatrixHeader(xSize: xSize, ySize: ySize)
        let newObject = self.create(minimumCapacity: header.totalBufferSize) { _ in
            return header
        } as! MatrixStorage<Type>

        newObject.withUnsafeMutableBufferPointerToElements { elements in
            elements.initialize(repeating: .zero)
        }
        return newObject
    }

    class func buildCopying(original: MatrixStorage<Type>) -> MatrixStorage<Type> {
        let newHeader = original.withUnsafeMutablePointerToHeader { $0.pointee }
        let newObject = self.create(minimumCapacity: newHeader.totalBufferSize) { _ in
            return newHeader
        } as! MatrixStorage<Type>

        newObject.withUnsafeMutableBufferPointerToElements { newElementBuffer in
            original.withUnsafeMutableBufferPointerToElements { originalElementBuffer in
                _ = newElementBuffer.initialize(from: originalElementBuffer)
            }
        }

        return newObject
    }

    var xSize: Int {
        return self.withUnsafeMutablePointerToHeader { $0.pointee.xSize }
    }

    var ySize: Int {
        return self.withUnsafeMutablePointerToHeader { $0.pointee.ySize }
    }

    var totalBufferSize: Int {
        return self.withUnsafeMutablePointerToHeader { $0.pointee.totalBufferSize }
    }

    func withUnsafeMutableBufferPointerToElements<T>(_ body: (UnsafeMutableBufferPointer<Type>) throws -> T) rethrows -> T {
        return try self.withUnsafeMutablePointers { (header, elements) in
            let size = header.pointee.totalBufferSize
            let elementBuffer = UnsafeMutableBufferPointer(start: elements, count: size)
            return try body(elementBuffer)
        }
    }
}


internal struct MatrixHeader {
    var xSize: Int
    var ySize: Int

    var totalBufferSize: Int {
        return xSize * ySize
    }
}

public struct Matrix<Element> where Element: Numeric {
    private(set) var xRange: Range<Int>

    private(set) var yRange: Range<Int>

    private var storage: MatrixStorage<Element>

    public init(_ xDimension: Int, _ yDimension: Int) {
        self.xRange = 0..<xDimension
        self.yRange = 0..<yDimension
        self.storage = .buildZeroing(xSize: xDimension, ySize: yDimension)
    }

    private init(xRange: Range<Int>, yRange: Range<Int>, storage: MatrixStorage<Element>) {
        precondition(xRange.lowerBound >= 0)
        precondition(xRange.upperBound <= storage.xSize)
        precondition(yRange.lowerBound >= 0)
        precondition(yRange.upperBound <= storage.ySize)

        self.xRange = xRange
        self.yRange = yRange
        self.storage = storage
    }
}

extension Matrix {
    // Note that we are not zero indexed here.
    public subscript(_ xRange: Range<Int>, _ yRange: Range<Int>) -> Matrix<Element> {
        get {
            precondition(xRange.lowerBound >= self.xRange.lowerBound)
            precondition(xRange.upperBound <= self.xRange.upperBound)
            precondition(yRange.lowerBound >= self.yRange.lowerBound)
            precondition(yRange.upperBound <= self.yRange.upperBound)

            return Matrix(xRange: xRange, yRange: yRange, storage: self.storage)
        }

        _modify {
            precondition(xRange.lowerBound >= self.xRange.lowerBound)
            precondition(xRange.upperBound <= self.xRange.upperBound)
            precondition(yRange.lowerBound >= self.yRange.lowerBound)
            precondition(yRange.upperBound <= self.yRange.upperBound)

            // We (the struct) are uniquely owned here, so we can temporarily modify our range.
            let originalXRange = self.xRange
            let originalYRange = self.yRange
            defer {
                self.xRange = originalXRange
                self.yRange = originalYRange
            }

            self.xRange = xRange
            self.yRange = yRange

            yield &self
        }
    }
}


extension Matrix {
    public static func +=(lhs: inout Matrix, rhs: Element) {
        if !isKnownUniquelyReferenced(&lhs.storage) {
            print("copying")
            lhs.storage = .buildCopying(original: lhs.storage)
        }

        // We stride over by the y index.
        let yIndexStride = lhs.storage.xSize
        let (xRange, yRange) = (lhs.xRange, lhs.yRange)

        lhs.storage.withUnsafeMutableBufferPointerToElements { elements in
            for y in yRange {
                for x in xRange {
                    elements[x + (yIndexStride * y)] += rhs
                }
            }
        }
    }
}

If you have a main.swift that imports this code and looks like this:

var mat = Matrix<Double>(5000, 5000)

mat[1000..<4000, 1000..<4000] += 1.0

Then this code will never print "copying": we don't have to copy the matrix in order to achieve the goal.

The magic here is the _modify accessor on the subscript, which allows us to tell the compiler that direct modifications through the subscript operation do not need to leave the original object intact. Normally a subscript operation like mat[1000..<4000, 1000..<4000] += 1.0 will call get, then modify the object returned from get, then set it back. With _modify we can yield out an object with a temporary lifetime that exists just long enough for the modification to occur.

This allows us to avoid a CoW operation here. We can nest these modifications arbitrarily, as well.
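
To see the accessor in isolation, here's a minimal standalone sketch (deliberately much simpler than the Matrix above, and not intended as real API):

final class Storage {
    var values: [Double]
    init(_ values: [Double]) { self.values = values }
}

struct Wrapper {
    private var storage = Storage([1, 2, 3])

    private mutating func ensureUniqueStorage() {
        if !isKnownUniquelyReferenced(&storage) {
            storage = Storage(storage.values)
        }
    }

    subscript(i: Int) -> Double {
        get { storage.values[i] }
        // With a plain get/set pair, `w[0] += 1` would call `get`, mutate the
        // returned temporary, then call `set` (another uniqueness check and
        // store). With `_modify`, the element is yielded in place for exactly
        // the duration of the mutation.
        _modify {
            ensureUniqueStorage()
            yield &storage.values[i]
        }
    }
}

var w = Wrapper()
w[0] += 1   // mutates in place through the yielded element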

4 Likes

I read about _modify some time ago, I think before it was introduced, but I've never used it so far (and I'm not too familiar with what it allows). Thank you for demoing it.

What I was interested in seeing is how you could work on some part of a type, and whether something similar to ArraySlice is needed, like iterating (R/O or R/W) over the columns of a matrix and using/modifying the matrix data itself. For instance:

var mat = Matrix<Double>(5000, 5000)
for column in 0 ..< mat.size.column {
    var col = mat[~,column]
    col += Double(column)
}

There's some simplicity in knowing that you always access the same, single piece of data, unlike with value types, where some duplication might occur (which, to the beginner, might look like non-determinism). I have this example in mind: in a notebook (think Jupyter), some variable gets assigned to a second let constant, then the original variable is modified, which triggers a CoW allocation, which crashes the notebook kernel because the variable was referencing 1 GB of data. Such variable manipulations are very common in practice, yet a crash would never occur under NumPy, which makes it well suited to experimentation. This might be hard to replicate with value types.
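
In Swift terms, the pattern I have in mind is roughly this (sizes shrunk here; imagine the array is the 1 GB buffer):

var a = [Double](repeating: 0, count: 1_000_000)  // stand-in for the 1 GB buffer
let b = a        // cheap: both values share the same allocation
a[0] = 1.0       // first write to `a` triggers CoW: a second full allocation
_ = b            // `b` still holds the original, untouched data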

Is there a consensus that value type is the best approach here?

At a certain point we get stuck in a conversation about preferred style, and then we enter territory without defined true or false answers. Considering your program above: that program naturally does not do what you want when mat is a value type, because col is a separate value from mat, and you never assign it back. This means it must have a different result than the one where col is a view on a reference-typed mat object.

The equivalent value-typed program is:

var mat = Matrix<Double>(5000, 5000)
for column in 0 ..< mat.size.column {
    mat[~,column] += Double(column)
}

And yes, in that program we will not allocate or CoW, as long as you are careful to use the _modify accessor. The risks begin if you create too many intermediaries, as the program below may CoW:

var mat = Matrix<Double>(5000, 5000)
for column in 0 ..< mat.size.column {
    var col = mat[~,column]
    col += Double(column)
    mat[~,column] = col
}

Whether this triggers a CoW is down to whether the optimiser is capable of observing that this program could use the _modify accessor. I am honestly not sure whether it can today. The only way I know to guarantee that the _modify accessor will be used is to write everything in a single logical load/store statement. I view this as a non-permanent optimisation limitation, but you may reasonably view it differently.

One caveat, though, is that an inout access that replaces the storage could cause a problem.

func foo(_ value: inout Matrix<Int>) {
  value = newMatrix
}

var a = ...
foo(&a[...]) // Now `a` lost its storage after `yield`.

If that happens, we will lose the original storage, since the yielded value is the only one that has it.

And I couldn't figure out a way to both avoid copying and have it work in this case. Maybe _modify could be changed to handle this during the proposal. :thinking:

Yes, good catch, that would be problematic. We'd have to update the _modify code above to check whether that happened and, if it did, to copy the bytes back into the original storage.

At that point we'd have no strong reference to the original storage, right? Since yield &self needs self to be the only one holding the storage to avoid copying on any mutation during the yield.

Yeah, that's right. I think that implementation simply doesn't work: it's not really possible to pass the slice out that way without either retaining the original storage (which will require mutation to CoW) or accepting that you could lose data.

I think we could still get this to work, but it requires more caution in the implementation.

I tried to do it yesterday; the closest I can think of is to have an UnownedMatrix that uses an unowned reference, but that risks exposing the implementation details, and could lead to scenarios with a dangling reference inside UnownedMatrix.
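
Very roughly, something of this shape (simplified; reusing the MatrixStorage class from the example above):

// A view that holds its storage `unowned`: it adds no strong reference, so
// isKnownUniquelyReferenced on the owning Matrix's storage still returns true
// and no CoW is triggered…
struct UnownedMatrixView<Element> where Element: Numeric {
    unowned let storage: MatrixStorage<Element>
    var xRange: Range<Int>
    var yRange: Range<Int>
}
// …but if the owning Matrix is reassigned or deallocated while such a view is
// still alive, `storage` becomes a dangling reference and any access traps.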

The conclusion I came to yesterday is that ARC needs to get smarter, WAY smarter, or we need to push the ownership design further, or something similar.

Yeah, this is definitely tricky.

Definitely. Value semantics are pretty fundamental to Swift - in fact, off the top of my head, I can't think of any classes in the standard library :thinking:. If they don't work, all of Swift falls down. And if you don't like value semantics, can't accept them, find them confusing or whatever: you're not going to have much fun with Swift in general. This stuff has to work.

That said, I find the fact that the example above needs to use non-public features like _modify and yield pretty troubling (I wouldn't call it "one of the easier cases to achieve with CoW types" on that basis alone). Points like the ones you raised about the trickiness of the design are also worrying; we should have figured this out by now, so that we have a simple story to tell when somebody (like the OP) comes by, sceptical that value semantics will give them the performance they need.

1 Like

Let me be clear, I absolutely love value semantics. It's smart, it's clean. I've been using it for > 4 years, and it works very well for all kinds of use cases I've been through so far.

Here, though, from my experience with NumPy/DNN and in-app buffer allocations, I fear that using value semantics for vector/matrix/tensor types could be a problem: it could lead to memory crashes, user strategies to avoid CoW, and dependence on compiler optimizations.

There's probably no point in discussing it further, though. If value types are picked, I'd definitely be curious to see how that plays out and how the problems are solved.

Just want to add that it is indeed an interesting case.

In other places the subscript drops a whole level, from collection -> element, so read-modify does help avoid a copy. This one is peculiar in that it drops only half a level, from collection -> sequence (or not at all: collection -> collection), which is a legit scenario.
Still, it does seem that the interplay between

  • the get-set (read-modify) semantics,
  • the difference between in-place modification (+=) and replacement (=), and
  • ARC

does force the copy. Even Array -> ArraySlice simply uses get-set.

Definitely something Swift could improve upon.

Hi all!

I just wanted to chime in that Swift for TensorFlow's Tensor type today has value semantics. We've been exploring this point in the design space for a little while and have some experience. A couple of high-level notes:

  1. When working with automatic differentiation, value semantics compose quite nicely. From my perspective, value semantics is independent of whether the underlying data is "large" (or lives on an accelerator with limited memory capacity) or not. (That said, we're still working on gathering more use cases (both internal and external to Alphabet) to validate this mental model.)
  2. We have encountered instances where mutation of tensors within a composite data structure (e.g. a DNN model) is important, and being able to mutate an underlying buffer is a convenient way to think about things. For now, we use key paths to simulate the multiple pointers to an underlying buffer (there's a rough sketch of the idea after this list). Unfortunately, Swift's current implementation of key paths is not the most friendly to work with, and is a bit of a sore spot.
  3. I'd encourage folks to check out the recent open design review [deep link] on SwiftRT by Ed Connell, which covers important aspects including views, shared mutable references (to disjoint subsets of a Tensor) for multithreaded computation, and multiple devices (e.g. accelerators like GPUs and TPUs). (Alas, I believe the video for the meeting has been eaten by cyberspace, but the design docs & code are available. Please do consider joining swift@tensorflow.org to get the calendar invitation for all future S4TF open design meetings.)
  4. Just FYI: We're in the process of reworking significant aspects of Swift for TensorFlow (S4TF) to improve performance. (S4TF's performance today is not representative of where it will be soon, nor of value semantics for Tensors in general.)
  5. We're excited for a more powerful ownership model, more reliable semantics (e.g. across different optimization levels), and more! :-)
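
To make point 2 a bit more concrete, here's a rough sketch of the key-path idea using a plain struct and [Double] in place of Tensor (illustrative only, not the actual S4TF code):

// Writable key paths act as "multiple pointers" into the model's parameter
// buffers without giving up value semantics on the model itself.
struct DenseLayer {
    var weights: [Double]
    var bias: [Double]
}

struct Model {
    var layer1 = DenseLayer(weights: [0.1, 0.2], bias: [0.0])
    var layer2 = DenseLayer(weights: [0.3, 0.4], bias: [0.0])
}

// Every parameter buffer we want to update, addressed by key path.
let parameterPaths: [WritableKeyPath<Model, [Double]>] = [
    \Model.layer1.weights, \Model.layer1.bias,
    \Model.layer2.weights, \Model.layer2.bias,
]

var model = Model()

// Apply the same in-place update through each key path; the model stays a
// value type and is mutated exclusively through `model`.
for path in parameterPaths {
    for i in model[keyPath: path].indices {
        model[keyPath: path][i] *= 0.9
    }
}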

Happy to chat more!

All the best,
-Brennan

6 Likes

What problems are you having with key paths?

3 Likes

I have advice for you. Name it "Numiracle" :slightly_smiling_face:

1 Like

Great question @Joe_Groff! (And sorry about saying something without providing useful details.) KeyPaths themselves are reasonable for their functionality, but in order to build more powerful abstractions on top of them, we needed to extend KeyPath functionality (e.g. with KeyPathIterable). There are definitely things I think we can improve in the design and implementation of the key path extensions, but I also think we could benefit from improvements to KeyPaths themselves, such as the ability to debug what a constructed KeyPath refers to more easily (e.g. the ability to print out a representation of the path for debugging). We are currently focused on improving the performance of S4TF (due to TF-side / C++-side inefficiencies), so we aren't pushing on this at the moment. I hope this helps, and please feel free to reach out with additional questions if you have them! :-)

All the best,
-Brennan
