We have on our roadmap to bring the nicely designed S4TF NN APIs back to life. But it hasn't been a high priority since we are focused on graph compute. For the same reason general purpose Matrix and BLAS hasn't been our current focus. Algebraic graph compute is more flexible and generic, but not as optimized for the tensor use cases.
One interesting data set we have is a specialized thermodynamic simulator we've written. We had a fast hand tuned C version. Our Swift version as of 3 years ago was 98% the speed of c fixed arrays, using Swift contiguous array types. Then we optimized it further using Swift SIMD types, at a 3X speed gain over the C version. Recently we tested a standard Swift array version, and it equalled the performance of the hand tuned Swift SIMD version. Compiler improvements have replaced months of custom optimizations. Not that this eliminates the need for fixed array types, but for CPU dispatch the compiler is now doing much of the heavy lifting.