Are custom allocators possible in Swift?

Is it possible to use custom allocation for certain types of objects?

I'm thinking mainly of Region-based memory management, and whether it's possible to use this kind of strategy in Swift in certain situations.

Not with safe code currently. You can use unsafe constructs to do this though. It would be interesting to consider language extensions for this: it conceptually makes sense to mark a root class as having custom alloc/dealloc hooks, similar in theoretical spirit to C++ operator new implementations. We should obviously not make all the same mistakes C++ did though :-)

Yes, obviously it's unsafe, because you're making strong claims about what happens to objects and these claims cannot be verified by the compiler.

You can use unsafe constructs to do this though.

Could you please elaborate? What kind of unsafe constructs can possibly facilitate this?

I'm thinking mainly of two points:

  • Take a reference outside the ARC system. This is probably achievable using unowned, am I right?

  • Interpret a raw pointer (void* type of thing) as the address of a struct or a class, and run all the initialization code that typically happens during allocation (including setting virtual tables, if any - I'm not sure how Swift implements polymorphism). Is this possible?
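
For value types, the second point can be sketched with the standard unsafe pointer APIs. This is a minimal sketch, not an official recipe: `Particle` is a hypothetical struct invented for the illustration, and the approach covers structs only. It does not place a class instance at a chosen address.

```swift
// Hypothetical value type used only for illustration.
struct Particle {
    var x: Double
    var y: Double
}

// Allocate raw, untyped memory large enough for four Particle values.
let count = 4
let raw = UnsafeMutableRawPointer.allocate(
    byteCount: MemoryLayout<Particle>.stride * count,
    alignment: MemoryLayout<Particle>.alignment
)

// Bind the raw bytes to Particle and run the initializer in place.
let particles = raw.bindMemory(to: Particle.self, capacity: count)
for i in 0..<count {
    (particles + i).initialize(to: Particle(x: Double(i), y: 0))
}

// Read and write through the typed pointer like an unchecked array.
particles[2].y = 7.5
let third = particles[2]

// Manual lifetime: deinitialize, then free. The compiler verifies none of this.
particles.deinitialize(count: count)
raw.deallocate()
```

The memory could just as well come from a region you manage yourself, as long as it is large enough and correctly aligned for the type.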

I am not an expert on this part of Swift, but I believe that you can use Unmanaged and UnsafePointer to produce explicitly allocated references yourself.
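
A minimal sketch of what taking a reference out of automatic ARC bookkeeping with `Unmanaged` can look like. `Node` is a hypothetical class made up for the example. Note that this only changes who balances the retains and releases; the object is still allocated by the normal Swift allocator.

```swift
// Hypothetical class used only for illustration.
final class Node {
    let value: Int
    init(value: Int) { self.value = value }
}

// passRetained adds a +1 retain that ARC will not balance for us.
let unmanaged = Unmanaged.passRetained(Node(value: 42))

// Round-trip through an opaque pointer, as one might when storing it in C data.
let opaque: UnsafeMutableRawPointer = unmanaged.toOpaque()
let recovered = Unmanaged<Node>.fromOpaque(opaque)

// takeUnretainedValue reads the object without changing its retain count.
let value = recovered.takeUnretainedValue().value

// Balance the manual retain; the object may be destroyed after this call.
recovered.release()
```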

Thanks. I just started reading the docs for those. Seems promising. I'll try to play around with them a little bit.

EDIT:

So it seems there's an entire section in the docs dedicated to manual memory management: Apple Developer Documentation

It's not currently possible to instantiate a Swift class instance at an arbitrary location in memory*, but I could see that being a reasonable thing to add to the standard library (under some appropriately scary "unsafe" name). Please file a bug if you're interested in that.

* with the possible exception of going through the ObjC runtime. I haven't tried it, though, and I wouldn't consider that supported for a Swift class that doesn't inherit from NSObject.

Did this ever get added to the standard library, or is there a hacky way to do it for some experiments?

Doable via an Obj-C helper; you'll have to declare your type as a subclass of NSObject or of one of its subclasses. Good enough for experiments?

We want to run on Linux with a lot of cores.

Any other ideas? Happy to jump through C/C++ with symbols to the standard library if necessary. We would like to expand our pooled allocator, and this plus immortalize lets us make banks of memory to use, and gives collections of same-typed objects more locality of reference.

I can’t see how to do this with structs, because you get retain/release activity when calling through protocol inheritance and can’t turn it off like you can using immortalize.

Also - think of our pools as what we used to call frame memory on the Quartz team at Apple. The idea is to have a bank of memory for high-frequency allocation that’s used during rendering of a frame, separate from long-lived objects and the stack.

In Quartz we also had it aligned so hardware accelerators are happy.

What you find when you move the locks and atomics out of the way is that your code has a very different performance profile and starts to actually use the L2/L3 caches effectively, which you can measure in Instruments, and you quickly get to 10x or 20x speedups in many cases.

But you have to see the case without the locks and atomics first.

This works for heavy compute code like Quartz, or audio filters, or some other large transformation or search stack that isn’t encumbered by IO.

Even with IO encumbrance, with the right streaming and caching you still get big wins.

So that’s what we are doing, except our pools currently cost a lot to allocate (normal alloc) and are either eternal (immortalize) or some other special case that we handle with same-sized struct tricks, etc.

This gives us a ton more, and we can report back with our findings.

I don't think custom allocators for class instances are necessarily a bad idea, but I do think that what you're talking about doesn't need to be a class at all. You're allocating memory that's uniform and that doesn't need reference counting. Why not use UnsafeMutableBufferPointer?
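
A minimal sketch of that suggestion, assuming the elements are uniform and contain no references. `Sample` is a hypothetical element type, not something from the project being discussed.

```swift
// Hypothetical element type: uniform, trivially copyable, no references inside.
struct Sample {
    var id: Int32
    var weight: Float
}

// One allocation holds every element; there are no per-element headers or refcounts.
let capacity = 1_000
let pool = UnsafeMutableBufferPointer<Sample>.allocate(capacity: capacity)
pool.initialize(repeating: Sample(id: 0, weight: 0))

// Fill the pool in place through the subscript.
for i in 0..<capacity {
    pool[i] = Sample(id: Int32(i), weight: Float(i) * 0.5)
}

// Walk the contiguous storage by index.
var total: Float = 0
for i in 0..<capacity {
    total += pool[i].weight
}
let last = pool[capacity - 1]

// The buffer is not managed by ARC, so it must be torn down explicitly.
pool.baseAddress?.deinitialize(count: capacity)
pool.deallocate()
```

Because the storage is a single contiguous block of plain values, neighbouring elements sit next to each other in memory, which is the locality property mentioned earlier in the thread.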

Agreed. Nothing described requires classes, and UMBP seems quite suitable for the task.

Do you need containers (Array / Dictionary, etc), strings, unicode, etc? What's on the input and output of your component?

If your "input" is coming from normal Swift data structures / the OS and the "output" is going to feed into, say, UI, that would be one case. A totally different case would be if your code is more self-contained and doesn't need (frequent) conversion back and forth between the data structures it uses internally and the data structures imposed on your component's input/output "ends". For example, you can write your own tailored implementations of array and dictionary (if you need those) and they can be very quick, but if you need to convert between those and Swift / OS types often, the overall result would be slow. If you want more insights, tell us more about your project.

Our system has only low-frequency connections to the UI via gRPC. It’s a Siri-for-enterprise system that runs instantly on any recent mobile device, so it does a lot of compute in 200ms. (For example, when you say “play Madonna”, the word Madonna is in a model of tens of millions of artists, and that’s just one “domain”.)

Our agent architecture uniquely has a many-minds design fusing knowledge, grammar, and vocabulary like a human brain. You can Google the words “pinker thinking machine” and look for books by Professor Pinker of Harvard to get an inkling. (I used to work on Siri.)

Like Viv, our agent can handle sentences like “take me to a place with wine that goes well with Pasta with peas”, but without training the language model.

By using gRPC as our API, our clients connect from anywhere on the internet (including locally or in process) and can be written in almost any language or run on almost any platform. And our public API is only what’s available in gRPC, so we don’t have the typical issues around versioning and public classes.

This also lets us use Flutter instead of SwiftUI and have the same pretty UI on almost any platform. Flutter's code-compile-run loop is also quite a bit faster than Swift's, so we like that combination. I like SwiftUI, but we want to do our work on Apple platforms and deploy anywhere. I have been at Apple more than once and love what we Apple people do for innovation and progress!

We have fast arrays, a reusable pool of instances, immortalize, and fine-grained concurrency. We really just would like to allocate instances from our own memory.

We were using structs for more of our instances and fast arrays, but using protocol inheritance incurs unavoidable ARC traffic that we can’t afford. We solved that with a reusable immortalized pool of class instances, where both the containers and elements are pooled. We don’t use iterators either - more ARC traffic. And we are very careful with our use of actors and async, because using async tends to remove a lot of inlining, which tends to exacerbate ARC traffic. We’ve found we can use structs only without protocols and only with no mutability (too much ARC traffic otherwise), and we use them in the cases where we can.

The thing about pools is that they let us do the same cheap alloc as on the stack, but with more control around generations and usage patterns, and without worrying about blowing the stack. And allocating our instances in our own memory lets us get more locality and performance.
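
To make the "cheap alloc with generations" idea concrete, here is a minimal bump-arena sketch for trivial value types. Every name in it (`FrameArena`, `allocate`, `reset`, `destroy`) is hypothetical, the 64-byte base alignment is an assumed cache-line size, and it is single-threaded by design. It is not the pool described in this thread, and it does not allocate class instances.

```swift
// A minimal bump ("frame") arena. All names are hypothetical.
struct FrameArena {
    private let base: UnsafeMutableRawPointer
    private let capacity: Int
    private(set) var offset = 0

    init(capacity: Int, alignment: Int = 64) {
        // 64 is an assumed cache-line size, not a value from the thread.
        self.capacity = capacity
        self.base = UnsafeMutableRawPointer.allocate(byteCount: capacity, alignment: alignment)
    }

    // Reserve uninitialized space for `count` values of a trivial type T.
    // Assumes T's alignment does not exceed the arena's base alignment.
    mutating func allocate<T>(_ type: T.Type, count: Int = 1) -> UnsafeMutablePointer<T> {
        let align = MemoryLayout<T>.alignment
        let start = (offset + align - 1) & ~(align - 1)  // round up to T's alignment
        let end = start + MemoryLayout<T>.stride * count
        precondition(end <= capacity, "arena exhausted")
        offset = end
        return (base + start).bindMemory(to: T.self, capacity: count)
    }

    // Start a new generation: every earlier allocation becomes invalid at once.
    mutating func reset() { offset = 0 }

    // Return the whole bank to the system.
    func destroy() { base.deallocate() }
}

var arena = FrameArena(capacity: 1024)

let a = arena.allocate(Int32.self, count: 3)   // occupies bytes 0..<12
a.initialize(repeating: 7, count: 3)
let b = arena.allocate(Double.self)            // aligned up to byte 16
b.initialize(to: 1.5)
let usedBeforeReset = arena.offset

arena.reset()                                  // next generation reuses the bank
let c = arena.allocate(Int32.self)
let reusedSameMemory = UnsafeRawPointer(c) == UnsafeRawPointer(a)

arena.destroy()
```

Allocation is an add and a compare, and freeing a whole generation is a single store, which is what makes this pattern comparable in cost to stack allocation. The price is that nothing stops you from using a pointer after `reset()`.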

We like performance! Our engine is already very scalable, but we are building a next-gen AI system, where more performance means “smarter”.

The more nodes we walk, the better our thinking. The wider our Viterbi searches, the less aggressive our pruning, the better our robustness in noise.

We like our system! And we love Swift. And we want more performance!

Please send our custom allocator :-)

What a self promotion and a half... ok, that aside,

Just as a point of clarification: did you check that ARC leads to real performance issues significant for your project (say, a measurable slowdown of 50%), or did you simply postulate that ARC is a show stopper and a no-go, even if it leads to an insignificant performance degradation? Was that on Intel or ARM?

On your struct + protocol remark: you mean that struct + protocol + protocol inheritance leads to ARC traffic? And you mean nothing fancy like a struct with reference type members? Can you show a short snippet representing this?

I semi-pitched a purity grades system some time ago that would be an ultimate solution for problems like this: code marked with a certain purity grade would simply not compile if it contained anything impure for that grade (realtime-unsafe, ARC-dependent, etc.), or the compiler would refrain from using constructs that violate the required grade. Until this is implemented, restraining yourself from certain constructs and constantly checking the resulting assembly seems the only way to go.

I also remember a semi official immortality capability already in the language which might be of help here, will leave this for others to comment.

Yes, use Instruments. I helped design Shark, which is what Instruments is based on. Let's remember too that sample-based profilers catch where code is executing at the time of the sample, and you have to go deeper to find answers to these things, like removing the offending entry and reprofiling. You will find that things like locks and atomics are quite a bit more expensive than the profiler says, because they largely defeat normal processor lookahead, caching, etc. You want compute-heavy code to run wide open, taking full advantage of the latest in processor tech, not stall constantly because of high-frequency atomics, which is what ARC does to your codebase.

We use immortality, yeah. What we want to do is take complete control of certain kinds of instances, place them in a bounds-checked buffer (checking during DEBUG only), and remove ARC from certain classes of objects. (Immortalize is our screen so ARC looks away.) We then use ARC objects for those things that live beyond our generations or jump between executors/threads. By having our own buffer we can reuse the same memory for many generations without making those instances permanent, so we can shrink our memory later if we like.

We've been considering building Swift from source and flipping the "non-atomic ARC" switch, which solves most of the problem, but we would rather the whole community benefit from a performance-competitive Swift.

Sounds like you don't have a way to do class instances from arbitrary memory, but would like to chat about things instead ;-). Happy to do that and also actually would like the allocator. Bueller?

We have a tree of instances, and the elements of the tree are members of a type hierarchy. Very class-y actually. The elements are small, but there is a hierarchy. Trees typically have max 10 levels, and max 10-15 elements. But the number of trees generated is > 10K, and each tree is in effect immutable, but other trees are generated from them.

When AIs do work, they typically generate lattices, trees, and similar structures in very large numbers. Think of these as theories. Last-gen systems are written in C++, because you can express these things simply there while controlling your relationship with the memory subsystem, locks, etc., and hence preserve speed.

Just hoping for the same with Swift.

What do you guys think about us just building Swift from source and flipping the "make ARC non-atomic" switch?

You may emulate inheritance with composition. And btw, even graphs you can do without reference types (there are tradeoffs, so not necessarily your case).
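
A minimal sketch of both ideas, under the assumption that the node kinds can be expressed as a closed set. An enum payload stands in for the small class hierarchy, and parent/child links are array indices rather than references. `NodeKind`, `TreeNode`, `Tree`, and the ID fields are all hypothetical names.

```swift
// Hypothetical node kinds; an enum replaces a small class hierarchy.
enum NodeKind {
    case leaf(tokenID: Int32)
    case phrase(ruleID: Int32)
}

// Links are indices into Tree.nodes, so a node holds no references at all.
struct TreeNode {
    var kind: NodeKind
    var firstChild: Int32 = -1   // -1 means "none"
    var nextSibling: Int32 = -1
}

struct Tree {
    private(set) var nodes: [TreeNode] = []

    // Append a node and, if a parent is given, prepend it to that parent's child list.
    mutating func add(_ kind: NodeKind, parent: Int32? = nil) -> Int32 {
        let index = Int32(nodes.count)
        nodes.append(TreeNode(kind: kind))
        if let p = parent {
            nodes[Int(index)].nextSibling = nodes[Int(p)].firstChild
            nodes[Int(p)].firstChild = index
        }
        return index
    }

    // Iterative depth-first walk that counts the leaves below `root`.
    func leafCount(under root: Int32) -> Int {
        var count = 0
        var stack = [root]
        while let i = stack.popLast() {
            if case .leaf = nodes[Int(i)].kind { count += 1 }
            var child = nodes[Int(i)].firstChild
            while child >= 0 {
                stack.append(child)
                child = nodes[Int(child)].nextSibling
            }
        }
        return count
    }
}

var tree = Tree()
let root = tree.add(.phrase(ruleID: 1))
let np = tree.add(.phrase(ruleID: 2), parent: root)
_ = tree.add(.leaf(tokenID: 10), parent: np)
_ = tree.add(.leaf(tokenID: 11), parent: np)
_ = tree.add(.leaf(tokenID: 12), parent: root)
let leaves = tree.leafCount(under: root)
```

The tradeoffs are real: indices are not checked for staleness the way references are, and adding a new node kind means touching the enum and every switch over it. In exchange, the only reference-counted object is the array's single storage buffer, not each node.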

Do you have an answer to my question above about struct + protocol + protocol inheritance == ARC?

I would love to have this option WITHOUT building Swift from source, e.g. by calling some method very early during app startup, flipping a switch in the app plist, or linking to an alternative library.

Sorry to not offer you concrete ideas, hopefully others would.

I'm going to move this to a pseudo-pitch, so it's more succinct.