When you talk about GPU acceleration, will it just be CUDA (NVIDIA Jetson), or also OpenCL? I'm not a big fan of NVIDIA for making GPGPU synonymous with "NVIDIA-only" pretty much everywhere. It took a while for TensorFlow and PyTorch to open up to Apple GPUs, and even now GPU acceleration is restricted to a small subset of platforms for the average consumer. Are you planning to use OpenCL anywhere for kernels? If so, the SwiftOpenCL repository I'm currently planning/prototyping might be insightful.
I have a diagram of how this fits into hardware-accelerating Swift numerical computing frameworks. OpenCL is a key component of that. I plan to present this to the recently established Swift numerics working group, and the "Swift Matrix Library" is a framework we are planning for the linear algebra capabilities that S4TF lacks.
@Chris_Lattner3 I share the feeling that TensorFlow's C++ code base is not that flexible, although the work Google put into X10 made it easier to generate graphs from eager-like code. My bigger problem with CTensorFlow is that it can't compile for iOS (except for inference-only with TFLite), and PluggableDevice only works if you build the Python bindings. Making a more cooperative backend from scratch allows it to be compiled much faster, and through SwiftPM instead of Bazel. Also, since the new backends are written in Swift, they should be more maintainable and have a lower barrier to entry for anyone wanting to contribute. What are your thoughts on this? Does that "lower barrier to entry for contributors" align with the vision you had for Swift when initially creating the language?