[Discussion] Swift for Data Science / ML / Big Data analytics

Since you bring it up, Python exceptions will be annoying - As with other languages, Python can throw from an arbitrary expression. Modeling everything as throws in Swift would be super-annoying and unergonomic for the programmer, because we'd require 'try' everywhere. Thoughts on what to do about that are welcome!

Requiring ‘try’ on every statement is annoying, but not having the ability to catch python exceptions is annoying too. We could probably make python exception handling an opt-in feature. For example:

try Python.do {
    let a = np.array([1, 2, 3])
    let b = np.array([[2], [4]])
    print(a.dot(b)) // matrix mul with incompatible shapes
}
catch let error as PythonException {
    // Handle PythonError.valueError(“objects are not aligned”)
}

To correct my example:

do {
    try Python.do {
        let a = np.array([1, 2, 3])
        let b = np.array([[2], [4]])
        print(a.dot(b)) // matrix mul with incompatible shapes
    }
}
catch let error as PythonException {
    // Handle PythonError.valueError(“objects are not aligned”)
}

Maybe ‘Python.do {}’ should be called something like ‘Python.safely {}’.

That’s a super interesting way to model this. I’ll need to ponder on it more, but it is certainly a nice ergonomic solution.

Question though: how does it work? Say the first np.array call threw a python exception:

try Python.do {
    let a = np.array([1, 2, 3])
    let b = np.array([[2], [4]])
    print(a.dot(b)) // matrix mul with incompatible shapes
}

We can definitely make the python glue code notice it, catch it and squirrel it away somewhere, but without compiler hacks we couldn’t make it jump out of the closure. This means that np.array would have to return something, and the calls below it would still execute, or am I missing something?

-Chris

···

On Nov 1, 2017, at 3:20 AM, Richard Wei <rxrwei@gmail.com> wrote:

We make PythonObjects internally nullable (only in the exception-caught state). The second np.array would just return a null PythonObject.

To be specific, we define three states in the python overlay:
- Normal state: PythonObjects are guaranteed to be non-null. Any exception traps.
- Exception-catching state: PythonObjects are still guaranteed to be non-null. Any exception triggers the exception-caught state.
- Exception-caught state: PythonObjects are nullable — all python expressions return a null PythonObject.

The exception-catching state is entered during the execution of Python.do’s body.
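
The three states above can be tried out without a Python runtime. What follows is only a sketch under loud assumptions: `PythonOverlay`, `OverlayState`, `call(_:raising:)` and this toy `PythonObject` are hypothetical stand-ins invented for illustration, and `safely` borrows the name floated earlier in the thread. It is not how any real overlay is implemented.

```swift
// Hypothetical mock: none of these types come from a real Python overlay.
struct PythonException: Error {
    let message: String
}

enum OverlayState {
    case normal    // any exception traps
    case catching  // inside `safely`; the first exception is recorded
    case caught    // an exception was recorded; calls return null objects
}

struct PythonObject {
    // nil models the "null PythonObject" of the exception-caught state.
    let value: Int?
    var isNull: Bool { return value == nil }
}

final class PythonOverlay {
    private(set) var state = OverlayState.normal
    private var pending: PythonException?

    // Simulates one Python call that yields `value` or raises `failure`.
    func call(_ value: Int, raising failure: String? = nil) -> PythonObject {
        switch state {
        case .caught:
            return PythonObject(value: nil)
        case .catching:
            guard let failure = failure else { return PythonObject(value: value) }
            pending = PythonException(message: failure)
            state = .caught
            return PythonObject(value: nil)
        case .normal:
            if let failure = failure {
                fatalError("uncaught Python exception: \(failure)")
            }
            return PythonObject(value: value)
        }
    }

    // Runs `body` in the exception-catching state and rethrows what it recorded.
    func safely(_ body: () -> Void) throws {
        state = .catching
        defer {
            state = .normal
            pending = nil
        }
        body()
        if let error = pending { throw error }
    }
}

let python = PythonOverlay()
var swiftSideEffects = 0
var caughtMessage: String?

do {
    try python.safely {
        let a = python.call(1, raising: "objects are not aligned")
        let b = python.call(2)     // still executes, returns a null object
        swiftSideEffects += 1      // plain Swift code still executes too
        print(a.isNull, b.isNull)  // prints "true true"
    }
} catch let error as PythonException {
    caughtMessage = error.message
} catch {
    caughtMessage = "unexpected error"
}
```

Note that the body keeps running after the first failure: the Swift-side counter is still incremented, and the error only surfaces once the closure returns.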

-Richard

···

On Nov 1, 2017, at 21:13, Chris Lattner via swift-evolution <swift-evolution@swift.org> wrote:

I don’t think that will work well in practice: if you’re intermixing Swift and Python calls, it would be extremely surprising to see computations run but produce no value. This seems similar to the ObjC “messaging nil” problem.

That said, I’ve been experimenting with another approach that seems to work well in practice. Due to the language mismatch, a Swift programmer is going to have to decide whether they care about a throwing call or not. As such, I think it makes sense to make this explicit with some kind of syntax - either a postfix operator like ^ or a method like “throwing”. This allows you to write (when we have language sugar) either “a.foo()” or “a.throwing.foo()” if you want to handle the case when foo throws an exception.

This sounds yucky, but actually works pretty well in practice because the “try” warnings (about the absence/excess of try) dovetail really well. Without language support we get something like this:

  // import pickle
  let pickle = Python.import("pickle")

  // file = open(filename)
  guard let file = try? Python.open.throwing.call(args: filename) else {
    fatalError("""
       Didn't find data file at \(filename)!
       Update the DataPath at the top of the file.\n
       """)
  }

  // blob = file.read()
  let blob = file.get(member: "read").call(args: [])

  // a, b, c = pickle.loads(blob)
  let (a, b, c) = pickle.get(member: "loads").call(args: blob).get3Tuple()

When we allow sugaring ‘x.get(member: "foo")’ into ‘x.foo’ and sugaring ‘x.call(args: a, b, c)’ into ‘x(a, b, c)’, we’ll get this code:

  // import pickle
  let pickle = Python.import("pickle")

  // file = open(filename)
  guard let file = try? Python.open.throwing(filename) else {
    fatalError("""
       Didn't find data file at \(filename)!
       Update the DataPath at the top of the file.\n
       """)
  }

  // blob = file.read()
  let blob = file.read()

  // a, b, c = pickle.loads(blob)
  let (a, b, c) = pickle.loads(blob).get3Tuple()

Which is pretty nice. We can talk about making tuple destructuring extensible later. :-)
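
For readers who want to see the two bits of sugar in isolation: Swift later gained the `@dynamicMemberLookup` and `@dynamicCallable` attributes, which do the ‘x.foo’ and ‘x(a, b, c)’ rewrites. Here is a minimal sketch, assuming a toy `DynamicValue` type backed by a dictionary and a closure. That type and its fields are hypothetical; a real bridge would forward to the Python C API instead.

```swift
// Hypothetical mock of a dynamic object, not the prototype's PyRef.
@dynamicMemberLookup
@dynamicCallable
struct DynamicValue {
    var members: [String: DynamicValue] = [:]
    var function: (([Int]) -> Int)? = nil
    var number: Int = 0

    // The compiler rewrites x.foo into x[dynamicMember: "foo"].
    subscript(dynamicMember name: String) -> DynamicValue {
        guard let member = members[name] else {
            fatalError("no member named \(name)")
        }
        return member
    }

    // The compiler rewrites x(a, b, c) into
    // x.dynamicallyCall(withArguments: [a, b, c]).
    func dynamicallyCall(withArguments args: [Int]) -> DynamicValue {
        guard let function = function else {
            fatalError("value is not callable")
        }
        return DynamicValue(number: function(args))
    }
}

let sum = DynamicValue(function: { $0.reduce(0, +) })
let module = DynamicValue(members: ["sum": sum])

// Reads like Python's module.sum(1, 2, 3), resolved at run time.
let total = module.sum(1, 2, 3)
print(total.number) // prints "6"
```

Declared properties such as `number` still win over the dynamic lookup, so only names the type does not declare go through the subscript.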

Here’s the prototype that backs the code above - it is hacky, has known suboptimality, and probably isn’t self-contained. I’ll clean this up and write up a proposal for the two bits of sugar above when I have time.

-Chris

/// This represents the result of a failable operation when working with
/// python values.
public enum PythonError : Error {
  /// This represents a null IUO being passed into a PyRef. This should only
  /// occur when working with C APIs.
  case nullValue
  /// This represents an exception thrown out of a Python API. This can occur
  /// on calls.
  case exception(_ : PyRef)
}

/// Reference to a Python value. This is always non-null and always owning of
/// the underlying value.
public final class PyRef : PythonObjectionable {
  private var state : UnsafeMutablePointer<PyObject>

  var borrowedPyObject : UnsafeMutablePointer<PyObject> {
    return state
  }
  var ownedPyObject : UnsafeMutablePointer<PyObject> {
    return py_INCREF(state)
  }
  public init(ownedThrowing: UnsafeMutablePointer<PyObject>!) throws {
    if let owned = ownedThrowing {
      state = owned
    } else {
      throw PythonError.nullValue
    }
  }
  public convenience
  init(borrowedThrowing: UnsafeMutablePointer<PyObject>!) throws {
    try self.init(ownedThrowing: borrowedThrowing)
    py_INCREF(state)
  }
  deinit {
    py_DECREF(state)
  }

  public convenience init(owned: UnsafeMutablePointer<PyObject>!) {
    try! self.init(ownedThrowing: owned)
  }
  public convenience init(borrowed: UnsafeMutablePointer<PyObject>!) {
    try! self.init(borrowedThrowing: borrowed)
  }
  public convenience init?(python: UnsafeMutablePointer<PyObject>) {
    self.init(borrowed: python)
  }
  public func toPythonObject() -> PyRef {
    return self
  }

  /// Return a version of this value that throws when an error occurs on its
  /// next use.
  public var throwing : ThrowingPyRef {
    return ThrowingPyRef(self)
  }

  public func throwingGet(member: String) throws -> PyRef {
    return try PyRef(borrowedThrowing: PyObject_GetAttrString(state, member))
  }
  public func get(member: String) -> PyRef {
    return try! throwingGet(member: member)
  }
  public func throwingGet(dictMember: PythonObjectionable) throws -> PyRef {
    return try PyRef(borrowedThrowing:
      PyDict_GetItem(state, dictMember.toPythonObject().borrowedPyObject))
  }

  public func get(dictMember: PythonObjectionable) -> PyRef {
    return try! throwingGet(dictMember: dictMember)
  }
  /// Swift subscripts cannot throw yet, so model this as returning an optional
  /// reference.
  public subscript(throwing idx : PythonObjectionable) -> PyRef? {
    let item = PyObject_GetItem(self.state,
                                idx.toPythonObject().borrowedPyObject)
    return try? PyRef(borrowedThrowing: item)
  }

  public subscript(idx : PythonObjectionable) -> PyRef {
    return self[throwing: idx]!
  }

  public func throwingGet(tupleItem: Int) throws -> PyRef {
    return try PyRef(borrowedThrowing: PyTuple_GetItem(state, tupleItem))
  }

  public func get(tupleItem: Int) -> PyRef {
    return try! throwingGet(tupleItem: tupleItem)
  }
  // Helpers for destructuring tuples
  public func get2Tuple() -> (PyRef, PyRef) {
    return (get(tupleItem: 0), get(tupleItem: 1))
  }
  public func get3Tuple() -> (PyRef, PyRef, PyRef) {
    return (get(tupleItem: 0), get(tupleItem: 1), get(tupleItem: 2))
  }

  /// Call self, which must be a Python Callable.
  public
  func throwingCall(args: [PythonObjectionable],
                    kwargs: [(PythonObjectionable,PythonObjectionable)] = [])
    throws -> PyRef {
    // Make sure stale errors are not around.
    assert(PyErr_Occurred() == nil, "Python threw an error but wasn't handled")

    let kwdict = kwargs.isEmpty ? nil : pyDict(kwargs)

    // Python calls always return a non-null value when successful. If the
    // Python function produces the equivalent of C "void", it returns the None
    // value. A null result of PyObject_Call happens when there is an error,
    // like 'self' not being a Python callable.
    let result = try PyRef(ownedThrowing:
      PyObject_Call(state, pyTuple(args), kwdict))

    // Translate a Python exception into a Swift error if one was thrown.
    if let exception = PyErr_Occurred() {
      PyErr_Clear()
      throw PythonError.exception(PyRef(borrowed: exception))
    }

    return result
  }

  /// Call self, which must be a Python Callable.
  public
  func call(args: [PythonObjectionable],
            kwargs: [(PythonObjectionable,PythonObjectionable)] = []) -> PyRef {
    return try! throwingCall(args: args, kwargs: kwargs)
  }
  /// Call self, which must be a Python Callable.
  public
  func call(args: PythonObjectionable...,
            kwargs: [(PythonObjectionable,PythonObjectionable)] = []) -> PyRef {
    return try! throwingCall(args: args, kwargs: kwargs)
  }

  // Run the specified closure on the borrowed function guaranteeing the pointer
  // isn't deallocated while the closure runs.
  public func borrowedMap<T>(_ fn: (UnsafeMutablePointer<PyObject>)->T) -> T {
    return withExtendedLifetime(self) {
      return fn(borrowedPyObject)
    }
  }
}

/// Reference to a Python value. This always throws when handed a null object
/// or when a call produces a Python exception.
public struct ThrowingPyRef {
  private var state : PyRef

  public init(_ value : PyRef) {
    state = value
  }
  public init(owned: UnsafeMutablePointer<PyObject>!) throws {
    state = try PyRef(ownedThrowing: owned)
  }
  public init(borrowed: UnsafeMutablePointer<PyObject>!) throws {
    state = try PyRef(borrowedThrowing: borrowed)
  }

  public func get(member: String) throws -> PyRef {
    return try state.throwingGet(member: member)
  }

  public func get(dictMember: PythonObjectionable) throws -> PyRef {
    return try state.throwingGet(dictMember: dictMember)
  }
  /// Swift subscripts cannot throw yet, so model this as returning an optional
  /// reference.
  public subscript(idx : PythonObjectionable) -> PyRef? {
    return state[throwing: idx]
  }

  public func get(tupleItem: Int) throws -> PyRef {
    return try state.throwingGet(tupleItem: tupleItem)
  }

  public func get2Tuple() throws -> (PyRef, PyRef) {
    return try (get(tupleItem: 0), get(tupleItem: 1))
  }
  public func get3Tuple() throws -> (PyRef, PyRef, PyRef) {
    return try (get(tupleItem: 0), get(tupleItem: 1), get(tupleItem: 2))
  }

  /// Call self, which must be a Python Callable.
  public func call(args: [PythonObjectionable],
                   kwargs: [(PythonObjectionable,PythonObjectionable)] = [])
                   throws -> PyRef {
    return try state.throwingCall(args: args, kwargs: kwargs)
  }

  /// Call self, which must be a Python Callable.
  public func call(args: PythonObjectionable...,
                   kwargs: [(PythonObjectionable,PythonObjectionable)] = [])
                   throws -> PyRef {
    return try state.throwingCall(args: args, kwargs: kwargs)
  }
}

extension ThrowingPyRef : PythonObjectionable {
  public init?(python: UnsafeMutablePointer<PyObject>) {
    self.init(PyRef(python: python)!)
  }
  public func toPythonObject() -> PyRef {
    return state
  }
}

let builtinsObject = PyEval_GetBuiltins()!

public enum Python {
  public static func `import`(_ name: String) -> PyRef {
    return PyRef(owned: PyImport_ImportModule(name)!)
  }

  public static var builtins : PyRef {
    return PyRef(borrowed: builtinsObject)
  }

  // TODO: Make the Python type itself dynamically callable, so that things like
  // "Python.open" naturally resolve to Python.get(member: "open") and all the
  // builtin functions are therefore available naturally and don't have to be
  // enumerated here.
  public static var open : PyRef { return builtins["open"] }
  public static var repr : PyRef { return builtins["repr"] }
}

/// Make "print(pyref)" print a pretty form of the tensor.
extension PyRef : CustomStringConvertible {
  public var description: String {
    return String(python: self.call(member: "__str__"))!
  }
}

// Make PyRefs show up nicely in the Xcode Playground results sidebar.
extension PyRef : CustomPlaygroundQuickLookable {
  public var customPlaygroundQuickLook: PlaygroundQuickLook {
    return .text(description)
  }
}

//===----------------------------------------------------------------------===//
// Helpers working with PyObjects
//===----------------------------------------------------------------------===//

// Create a Python tuple object with the specified elements.
public func pyTuple(_ vals : [PythonObjectionable])
  -> UnsafeMutablePointer<PyObject> {
  let t = PyTuple_New(vals.count)!
  for (idx, elt) in vals.enumerated() {
    PyTuple_SetItem(t, idx, elt.toPythonObject().ownedPyObject)
  }
  return t
}

public func pyTuple(_ vals : PythonObjectionable...)
  -> UnsafeMutablePointer<PyObject> {
  return pyTuple(vals)
}

public func pyList(_ vals : PythonObjectionable...)
  -> UnsafeMutablePointer<PyObject> {
  return pyList(vals)
}
public func pyList(_ vals : [PythonObjectionable])
  -> UnsafeMutablePointer<PyObject> {
  let list = PyList_New(vals.count)!
  for (idx, elt) in vals.enumerated() {
    PyList_SetItem(list, idx, elt.toPythonObject().ownedPyObject)
  }
  return list
}

private func pyDict(_ elts : [(PythonObjectionable,PythonObjectionable)])
  -> UnsafeMutablePointer<PyObject> {
  let dict = PyDict_New()!
  for (key, val) in elts {
    PyDict_SetItem(dict, key.toPythonObject().ownedPyObject,
                   val.toPythonObject().ownedPyObject)
  }
  return dict
}

public func pySlice(_ start: PythonObjectionable,
                    _ end: PythonObjectionable,
                    _ step : PythonObjectionable? = nil)
  -> UnsafeMutablePointer<PyObject> {
  let stepv = step.flatMap { $0.toPythonObject().ownedPyObject }

  return PySlice_New(start.toPythonObject().ownedPyObject,
                     end.toPythonObject().ownedPyObject, stepv)!
}
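
The prototype above pairs each trapping method with a throwing twin and exposes the twin through a `throwing` wrapper. That pattern can be tried without linking against Python. The sketch below assumes a toy `MockRef` that stores integers in a dictionary; `MockRef`, `ThrowingMockRef` and `MockError` are hypothetical names for illustration, not part of the prototype.

```swift
// Hypothetical stand-ins that mirror the PyRef / ThrowingPyRef split.
enum MockError: Error, Equatable {
    case missingMember(String)
}

struct MockRef {
    let members: [String: Int]

    // Throwing core: the only place that detects the failure.
    func throwingGet(member: String) throws -> Int {
        guard let value = members[member] else {
            throw MockError.missingMember(member)
        }
        return value
    }

    // Trapping convenience: no `try` at the call site, crashes on failure.
    func get(member: String) -> Int {
        return try! throwingGet(member: member)
    }

    // Opt in to error handling for the next operation.
    var throwing: ThrowingMockRef {
        return ThrowingMockRef(state: self)
    }
}

struct ThrowingMockRef {
    let state: MockRef

    func get(member: String) throws -> Int {
        return try state.throwingGet(member: member)
    }
}

let ref = MockRef(members: ["answer": 42])

// Default path: reads like ordinary code, traps if the member is missing.
let direct = ref.get(member: "answer")

// Opt-in path: the compiler now insists on `try`, and failure becomes nil.
let missing = try? ref.throwing.get(member: "nope")

print(direct, missing as Any) // prints "42 nil"
```

Because the throwing variant is a separate type, forgetting `try` after `.throwing` is a compile-time diagnostic, which is the dovetailing with `try` described earlier in this message.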

···

On Nov 2, 2017, at 4:39 AM, Richard Wei <rxrwei@gmail.com> wrote:

We make PythonObjects internally nullable (only in the exception-caught state). The second np.array would just return a null PythonObject.

To be specific, we define three states in the python overlay:
- Normal state: PythonObjects are guaranteed to be non-null. Any exception traps.
- Exception-catching state: PythonObjects are still guaranteed to be non-null. Any exception triggers the exception-caught state.
- Exception-caught state: PythonObjects are nullable — all python expressions return a null PythonObject.

The exception-catching state is entered during the execution of Python.do’s body.

Looks like there is an update to this topic:
https://www.tensorflow.org/community/swift
(but time zones are hard, maybe this was posted somewhere where March is already over ;-)

First, I’m really supportive of the effort to allow interoperability with other languages, which is already one of Swift’s strong points.

I do however have a number of concerns:

  1. The proposal concentrates on Python; would it make sense to investigate other languages in parallel? (Obviously the immediate application is TensorFlow, which is in Python, so this is understandable.)
  2. How would it work with a GC language, e.g. JVM: Kotlin, Scala, Java, etc.?
  3. What about another compiled language like C++?

Which proposal? There's one proposal for Python interop that's already accepted (Proposal: Introduce User-defined "Dynamic Member Lookup" Types), and I don't know the status of [Pitch] Introduce user-defined dynamically "callable" types.
The TensorFlow thingy is a different beast, though.
There's not much information available now, but my impression is that this could call into question the whole concept of Swift Evolution in its current form...

@Chris_Lattner3 and I introduced Swift for TensorFlow at the TensorFlow Dev Summit yesterday. It's a whole new approach to machine learning frameworks, and the beginning of Swift being a differentiable programming language with the ability to target supercomputers such as TPU.

In the demo, we showed off some DynamicMemberLookup and DynamicCallable stuff :)

Swift for TensorFlow video | TensorFlow Dev Summit 2018

Can we expect some Swift-Evolution proposals on this?

I wouldn't be surprised if the majority of all Swift proposals in 2018 will originate from Swift for TensorFlow ;-)

I've just watched the presentation, really interesting, a big push for Swift for Data Science and for TensorFlow as a whole. Thank you, guys!

Absolutely. As mentioned in the talk, all of this needs to go through the evolution process!

Yes. We aim to get everything upstream to swift.org through the usual processes. Whether or not auto-diff (for example) is accepted will be up to the community of course.

-Chris

If Swift for TensorFlow gains traction and the Swift community does not embrace its key differentiators (no pun intended) enough, or does not add new and different features that Google’s approach can switch to, we may be seeing a first fork of the language. I’m not trying to create FUD here or to ruin the celebration, just thinking out loud.

Then again, when a language evolves to cover devices from toasters to supercomputers, it is perhaps the only thing that can be expected.

I'm a little concerned about this as well. I don't have experience with TensorFlow, but the idea of building a TensorFlow graph from Swift code (that can then run on a GPU cluster over the network transparently) sounds like it would require pretty deep integration with the compiler.

If some of these changes (the ones that can't be built as an external package) don't get accepted into Swift, we'd then have two dialects of the language, which might be detrimental to growing the Swift community.

I guess until we see how this all works we can't really say much, but maybe (just thinking out loud) there is a case to be made for some kind of compiler plugin system that would allow projects like this to work without forking the compiler?

That being said the capabilities this supposedly provides sound awesome, and Swift and TensorFlow sound like a great match and a good opportunity for Swift to grow and expand into new communities, so I'm looking forward to playing around with this (and also to the enhancements that do make it into Swift proper).

I started a new thread to talk about Swift for TensorFlow, so it is logical to move this discussion over there (allowing this one to remain true to its more general subject line).

That said, there is no reason or need to speculate about forks. No one involved in Swift for TensorFlow is interested in such a thing; we are very much interested in working with the Swift community to make things fit. I realize that there is possibly no way for my words to satisfy whatever concerns you have, but there is also no need to speculate about something when you don't know what changes are involved :-)

-Chris

Your words fully satisfied my concerns: you are not interested in forking Swift. Perhaps I am jumping to conclusions from what you said, but if the changes needed for Swift for TensorFlow were not accepted into Swift, then Swift for TensorFlow would adapt to use solutions that would be accepted into Swift mainline, and hence Swift will not be forked.

@rxwei @Chris_Lattner3 @masters3d
All available for Swift 4.0/4.1 macOS / Ubuntu 16.04, no extra dependencies:

If anything there is a valuable contribution for TensorFlow 1.8 + Swift 4.2, please just grab it and go; my pleasure.

Rocky Wei.