[Discussion] Swift for Data Science / ML / Big Data analytics

Hi Chris and everyone else,

So PerfectlySoft made Perfect-Python months ago to help Swift import Python objects and libraries. However, it may weaken Swift's strong type checking. Any ideas on how to avoid that?

https://github.com/PerfectlySoft/Perfect-Python

You can also try Perfect-TensorFlow, the TensorFlow Swift binding, which supports the latest release, TensorFlow 1.4.0, released today.

https://github.com/PerfectlySoft/Perfect-TensorFlow

Rocky

On Oct 28, 2017, at 9:45 AM, Maxim Veksler via swift-evolution <swift-evolution at swift.org <https://lists.swift.org/mailman/listinfo/swift-evolution>> wrote:
>
> Hey Guys,
>
> The big data and machine learning world is dominated by Python, Scala and R.
>
> I'm a Swifter at heart, but not so much by the tools of the trade.

Hi Max,

I’m very interested in this topic, with a specific focus on Python. It isn’t the immediate thing on my priority list to deal with, but I hope that we get to push on this.

In short, I think we should build a simple Swift/Python interop story. This sort of thing has been built numerous times for many languages (owing to Python’s great support for embed-ability), including things like PyObjC, boost.python, and many others.

In Swift, it is straightforward to make this example (Python Numpy Tutorial (with Jupyter and Colab)) look something like this:

  let np = Python.import("numpy") // Returns a value of type Python.Object.
  let a = np.array([1, 2, 3])
  print(type(a)) // Whether we want to support type(x) or use the Swift equivalent would be up for discussion of course!
  print(a.shape)
  print(a[0], a[1], a[2])
  a[0] = 5
  print(a)

  let b = np.array([[1,2,3],[4,5,6]])
  print(b.shape)
  print(b[0, 0], b[0, 1], b[1, 0])

… which is to say, exactly identical to the Python version except that new variables need to be declared with let/var. This can be done by blessing Python.Object (which is identical to “PyObject*” at the machine level) with some special dynamic name lookup behavior: Dot syntax turns into a call to PyObject_GetAttrString, subscripts turn into PyObject_GetItem, calls turn into PyObject_Call, etc. ARC would be implemented with INCREF etc.
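The mapping described above can be sketched without linking Python at all. This is only an illustration, assuming a compiler that provides the `@dynamicMemberLookup` and `@dynamicCallable` attributes (Swift 5 or later). `MockPyObject` is a hypothetical stand-in for Python.Object that records which C API call each operation would turn into, rather than making the call:

```swift
// Hypothetical stand-in for Python.Object. It does not call into Python;
// it only records the CPython C API call each Swift operation would map to.
@dynamicMemberLookup
@dynamicCallable
struct MockPyObject {
    // C API calls this expression would have made, in order.
    let trace: [String]

    init(trace: [String] = []) {
        self.trace = trace
    }

    // obj.name  would become  PyObject_GetAttrString(obj, "name")
    subscript(dynamicMember name: String) -> MockPyObject {
        return MockPyObject(trace: trace + ["PyObject_GetAttrString(\(name))"])
    }

    // obj[i]  would become  PyObject_GetItem(obj, i)
    subscript(index: Int) -> MockPyObject {
        return MockPyObject(trace: trace + ["PyObject_GetItem(\(index))"])
    }

    // obj(a, b, c)  would become  PyObject_Call(obj, args, nil)
    func dynamicallyCall(withArguments args: [Int]) -> MockPyObject {
        return MockPyObject(trace: trace + ["PyObject_Call(\(args.count) args)"])
    }
}

let np = MockPyObject()
// Reads like the Python expression np.array(1, 2, 3)[0].
let element = np.array(1, 2, 3)[0]
print(element.trace)
```

A real implementation would hold a PyObject* and perform the calls (plus the INCREF/DECREF bookkeeping) instead of appending strings.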

If we do this, the vast majority of the Python ecosystem should be directly usable from within Swift code, with only a few major syntactic differences (e.g. ranges work differently). We would add failable inits to the primitive datatypes like Int/String/etc to convert Python.Object values into them, and add the corresponding non-failable conversions from those primitives to Python.Object.
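As a rough sketch of the two conversion directions, here is a toy version in which `MockPyValue` (a made-up enum, not a real PyObject* wrapper) plays the role of Python.Object. Going from Swift to Python always succeeds; going from Python to Swift depends on the dynamic type, so it returns an optional:

```swift
// Hypothetical stand-in for Python.Object: a small enum, not a real PyObject*.
enum MockPyValue {
    case int(Int)
    case string(String)
}

// Swift -> "Python": always succeeds, so these inits are non-failable.
extension MockPyValue {
    init(_ value: Int) { self = .int(value) }
    init(_ value: String) { self = .string(value) }
}

// "Python" -> Swift: the dynamic type may not match, so these inits are failable.
extension Int {
    init?(_ value: MockPyValue) {
        guard case .int(let i) = value else { return nil }
        self = i
    }
}

extension String {
    init?(_ value: MockPyValue) {
        guard case .string(let s) = value else { return nil }
        self = s
    }
}

let five = MockPyValue(5)
let name = MockPyValue("numpy")

let asInt = Int(five)        // Int?, holds a value because `five` wraps an int
let notAnInt = Int(name)     // Int?, nil because `name` wraps a string
let asString = String(name)  // String?
```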

Overall, I think it will provide a really nice experience, and allow us to leverage the vast majority of the Python ecosystem directly in Swift code. This project would also have a much narrower impact on the Swift compiler than the ObjC importer (since it works completely differently). For a first cut, I don’t think we would have to worry about Swift classes subclassing Python classes, for example.

-Chris

>
> I'd appreciate a constructive discussion on how that could be changed.
>
> While R is a non-goal for obvious reasons, I'd argue that since both Scala and Python are general purpose languages, taking them on head to head might be low-hanging fruit.
>
> To make the claim I'd like to refer to projects such as
>
> - Hadoop, Spark, Hive are all huge eco-systems which are entirely JVM based.
> - Apache Parquet, a highly efficient column based storage format for big data analytics which was implemented in Java, and C++.
> - Apache Arrow, a physical memory spec that big data systems can use to allow zero transformations on data transferred between systems, and which (for obvious reasons) focuses on JVM-to-C interoperability.
>
> Python's Buffer Protocol, which ensures its predominance (for the time being) as a prime candidate for data science related projects: Python is the fastest growing programming language due to a feature you've never heard of <https://jeffknupp.com/blog/2017/09/15/python-is-the-fastest-growing-programming-language-due-to-a-feature-youve-never-heard-of/>
>
> While Swift's Memory Ownership manifesto touches similar turf, discussing copy on write and optimizing memory access overhead, it IMHO takes a system level perspective targeting projects such as kernel code. I'd suggest that viewing the problem from an efficient CPU/GPU data crunching machine perspective might shed a different light on the requirements and use cases.
>
>
> I'd be happy to learn more, and have a constructive discussion on the subject.
>
>
> Thank you,
> Max.
>
>
> --
> puıɯ ʎɯ ɯoɹɟ ʇuǝs
> _______________________________________________
> swift-evolution mailing list
> swift-evolution at swift.org <https://lists.swift.org/mailman/listinfo/swift-evolution>
> https://lists.swift.org/mailman/listinfo/swift-evolution

Cool, I wasn’t aware of this. It looks like a straightforward wrapper for the Python C API.

Here are some random questions:

Why do you make the conversions from Python types to Swift types non-failable? This init I’d expect to be failable, for example:

I understand that you’re wrapping PyObject* with a class to get ARC behavior, but why make it “open”? What does subclassability mean for your PyObj type?

Why do you print errors when you throw, instead of including the details in the error that gets thrown?

Why include a fixed list of supported types, instead of using a protocol to make it extensible?

What’s this defer doing?
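To make two of the questions above concrete (details carried in the thrown error, and a protocol instead of a fixed type list), here is a rough sketch. Every name in it (`MockPyValue`, `MockPythonError`, `ConvertibleFromMockPython`, `UserID`) is made up for illustration and is not Perfect-Python's actual API:

```swift
// Hypothetical stand-in for a wrapped PyObject*; not Perfect-Python's real type.
enum MockPyValue {
    case int(Int)
    case string(String)

    // Python-style type name, used only to describe failures.
    var typeName: String {
        switch self {
        case .int: return "int"
        case .string: return "str"
        }
    }
}

// The error carries the details, so nothing needs to be printed at the throw site.
enum MockPythonError: Error, Equatable {
    case typeMismatch(expected: String, actual: String)
}

// Any type can opt in to conversion by conforming; there is no fixed list.
protocol ConvertibleFromMockPython {
    init(mockPython value: MockPyValue) throws
}

extension Int: ConvertibleFromMockPython {
    init(mockPython value: MockPyValue) throws {
        guard case .int(let i) = value else {
            throw MockPythonError.typeMismatch(expected: "int", actual: value.typeName)
        }
        self = i
    }
}

// A user-defined type joins in without any change to the wrapper itself.
struct UserID: ConvertibleFromMockPython {
    let raw: Int

    init(mockPython value: MockPyValue) throws {
        raw = try Int(mockPython: value)
    }
}

var caught: MockPythonError? = nil
do {
    _ = try UserID(mockPython: .string("numpy"))
} catch let error as MockPythonError {
    caught = error  // The caller decides whether to log, recover, or rethrow.
} catch {
    // Other error types are not expected in this sketch.
}

let ok = try? UserID(mockPython: .int(42))
```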

-Chris

On Nov 2, 2017, at 1:58 PM, Rocky Wei via swift-evolution <swift-evolution@swift.org> wrote:

Hi Chris and everyone else,

So PerfectlySoft made Perfect-Python months ago to help Swift import Python objects and libraries. However, it may weaken Swift's strong type checking. Any ideas on how to avoid that?

GitHub - PerfectlySoft/Perfect-Python: An expressway to import Python 2.7 modules into Server Side Swift

If you are here, you probably want to go here: [Discussion] Swift for Data Science / ML / Big Data analytics

Not sure how this split out into its own thread but I wanted to point out this related commit which addresses the comments.