Explicit versus inferred ASTContexts?

Hello,

I've noticed that some AST APIs infer the ASTContext from a parameter, while others take an explicit ASTContext even though it could be inferred from one or more of their parameters. Why is this? What is the preferred pattern? Realistically, how are multiple ASTContexts supposed to work together, given how much ASTContext inference is going on?

I'm also asking because I vaguely remember @Douglas_Gregor commenting that the ASTContext is effectively the compiler's per-thread/per-instance "global" data. Is this more or less accurate, and how likely is it to change?

Thanks!
Dave

Once upon a time, Decl didn't make the ASTContext easily accessible; even now, it still takes several pointer-chasing hops to get to one from a Decl. (With Type, interestingly, it's never more than two hops.) At this point, passing the ASTContext around explicitly is probably more of a micro-optimization in most cases.
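As a rough illustration of that hop count, here is a simplified sketch, not the actual Swift sources; the field names are made up, and the real AST packs things more tightly (as noted further down, the root DeclContext reuses its parent-pointer slot for this):

class ASTContext;

class DeclContext {
  DeclContext *Parent = nullptr;   // null at the root
  ASTContext *Ctx = nullptr;       // sketch only; the real AST folds this into the root's parent slot
public:
  ASTContext &getASTContext() const {
    const DeclContext *DC = this;
    while (DC->Parent)             // potentially several hops for nested decls
      DC = DC->Parent;
    return *DC->Ctx;
  }
};

class Decl {
  DeclContext *Context;            // hop 1: Decl to its DeclContext
public:
  ASTContext &getASTContext() const { return Context->getASTContext(); }
};

// A Type, by contrast, stores either its canonical type or (if it is itself
// canonical) the ASTContext in a single slot, so it is never more than two hops.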

We definitely have no plans to mix ASTContexts. We may some day want to figure out how to make it thread-safe to access one, or at least part of one, but it's unlikely that we'll ever have AST nodes from different ASTContexts interacting in any meaningful way.

I think it's about convenience and possibly performance. Getting the ASTContext from a Type or Decl can sometimes require multiple pointer indirections.

This is correct. You'll never mix objects from different ASTContexts; for APIs that take an explicit one, there's only going to be one correct choice.

It might make sense to turn ASTContext into a thread-local, for instance. Then you'd never need to pass it around or store it anywhere.

1 Like

Ah, we shouldn't do this one because it would force interface-to-compiled-module building to happen on another thread. (It currently happens to, but it shouldn't have to.)

1 Like

You could still "push" and "pop" your new ASTContext when building the module. (Are you doing this because build flags like the language version are globally per-context, and might be different for each module you build?)
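To make the push/pop idea concrete, here is a minimal sketch of what a thread-local "current context" with a RAII push/pop guard could look like. None of these names exist in the compiler today; this is purely hypothetical:

#include <cassert>

class ASTContext;

// Hypothetical per-thread "current context"; not an existing compiler API.
static thread_local ASTContext *CurrentASTContext = nullptr;

// RAII guard: push a context while building a module (e.g. building a
// compiled module from an interface), and pop back to the previous one.
class ASTContextScope {
  ASTContext *Previous;
public:
  explicit ASTContextScope(ASTContext &Ctx) : Previous(CurrentASTContext) {
    CurrentASTContext = &Ctx;
  }
  ~ASTContextScope() { CurrentASTContext = Previous; }
};

ASTContext &getCurrentASTContext() {
  assert(CurrentASTContext && "no ASTContext pushed on this thread");
  return *CurrentASTContext;
}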

1 Like

Great. Thanks for confirming what I suspected.

As a tangent, if people are open to a custom AST allocator that isn't llvm::BumpPtrAllocator, then we could derive the ASTContext pointer from any pointer to an AST node (either via a bitwise AND, or a bitwise AND plus a load, depending on address-space allocation tradeoffs). That would let many AST nodes be smaller, and the pointer chasing to find the ASTContext could be eliminated entirely.

That's clever… but at the same time I'm concerned that it may increase overall memory usage. Right now the ASTContext pointers are stored in places where we'd need a pointer anyway (a Type's canonical-type pointer when it's already canonical, and a DeclContext's parent pointer when it's a root).

The only way this trick could increase memory usage is if the kernel stops lazily allocating physical pages to back virtual memory mappings. To use the TypeBase example, I'd switch to something like the "hasClangNode" trick, where the pointer to the canonical type is allocated as an optional preface to the TypeBase allocation, and canonical types would simply lack that preface.
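For what it's worth, here is a minimal sketch of that optional-preface idea, assuming a bump-style allocator. The names (TypeBaseSketch, bumpAllocate) are hypothetical, and this is not how TypeBase is actually laid out:

#include <cstddef>
#include <new>

// Hypothetical stand-in for an arena/bump allocator; a real one would carve
// from a power-of-two-aligned arena as in the snippet later in the thread.
void *bumpAllocate(std::size_t Bytes, std::size_t Alignment) {
  return ::operator new(Bytes, std::align_val_t(Alignment));
}

struct TypeBaseSketch {
  bool IsCanonical;

  // Non-canonical types get one extra pointer-sized word allocated
  // immediately *before* the object, holding the canonical type; canonical
  // types skip that preface word entirely.
  static TypeBaseSketch *create(bool Canonical, TypeBaseSketch *CanType) {
    std::size_t Preface = Canonical ? 0 : sizeof(TypeBaseSketch *);
    // Sketch assumes the object's alignment is no stricter than a pointer's.
    char *Memory = static_cast<char *>(
        bumpAllocate(Preface + sizeof(TypeBaseSketch), alignof(TypeBaseSketch *)));
    if (!Canonical)
      *reinterpret_cast<TypeBaseSketch **>(Memory) = CanType;
    return new (Memory + Preface) TypeBaseSketch{Canonical};
  }

  TypeBaseSketch *getCanonicalType() {
    if (IsCanonical)
      return this;
    // The preface word sits immediately before this object.
    return *reinterpret_cast<TypeBaseSketch **>(
        reinterpret_cast<char *>(this) - sizeof(TypeBaseSketch *));
  }
};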

Maybe I don't fully understand the BumpPtrAllocator replacement idea, but I think you'd be taking one word out of every page to point back to the ASTContext. That's probably not a huge cost, but it's still worth checking, because we allocate a lot of things.

Not per page; per power-of-two-aligned mapping/arena. For example, the allocator on 64-bit platforms could allocate power-of-two-aligned, data-TLB-friendly 1 GiB mappings, so:

ASTContext *getASTContext() const {
#if ONE_BIG_POWER_OF_TWO_ALIGNED_ARENA_IS_GOOD_ENOUGH
  // Single arena: the ASTContext sits at the arena base, so masking the low
  // bits of this node's address recovers it directly.
  return reinterpret_cast<ASTContext *>(
      reinterpret_cast<uintptr_t>(this) & ~uintptr_t(ARENA_SIZE - 1));
#else
  // Multiple arenas: the first word of each arena stores a pointer back to
  // the owning ASTContext, so mask and then load.
  return *reinterpret_cast<ASTContext **>(
      reinterpret_cast<uintptr_t>(this) & ~uintptr_t(ARENA_SIZE - 1));
#endif
}
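One way to obtain such a power-of-two-aligned reservation from plain mmap is to over-reserve and trim the misaligned ends. A rough sketch, assuming POSIX and a hypothetical helper name; for the mask-and-load variant above, the first word of each arena is where the ASTContext pointer would be stored:

#include <cstddef>
#include <cstdint>
#include <sys/mman.h>

// Reserve ArenaSize bytes aligned to ArenaSize (a power of two) by
// over-reserving and unmapping the misaligned head and tail. The reservation
// costs address space, not physical memory, until pages are touched.
void *reserveAlignedArena(std::size_t ArenaSize) {
  std::size_t Reserve = ArenaSize * 2;
  void *Raw = mmap(nullptr, Reserve, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  if (Raw == MAP_FAILED)
    return nullptr;

  std::uintptr_t Begin = reinterpret_cast<std::uintptr_t>(Raw);
  std::uintptr_t Aligned =
      (Begin + ArenaSize - 1) & ~(std::uintptr_t(ArenaSize) - 1);

  if (Aligned != Begin)                      // trim the unaligned head
    munmap(Raw, Aligned - Begin);
  std::size_t Tail = (Begin + Reserve) - (Aligned + ArenaSize);
  if (Tail)                                  // trim the tail
    munmap(reinterpret_cast<char *>(Aligned + ArenaSize), Tail);

  // For the mask-and-load variant, store the owning ASTContext pointer in
  // the first word of the arena before handing out allocations from it.
  return reinterpret_cast<void *>(Aligned);
}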

Ah, clever. I'm not sure whether that works on iOS, though, or whether the OS will count it as allocated memory. If it doesn't, that may be worth it!

It should work on iOS. For better and for worse, software intentionally and accidentally relies on the fact that virtual address space is "free" until touched, which forces the kernel to overcommit physical memory.

Thread-local storage is a fine way to do this, and the cost (a function call, unless you use a reserved key) is either balanced by the savings or at least insignificant relative to the other advantages.

It's tragic that LLVM and Swift compiler internal interface design is dominated by figuring out how to pass context around and get back to it. Whenever I'm working on the implementation, the majority of my time is spent on this. Refactoring code and making cross-cutting changes is at least 10x harder because we don't have idiomatic access to thread context. For example, when I introduced SILFunctionConventions, what should have been a mostly mechanical change taking only a few days ended up taking several weeks because I had to rewrite each of a few hundred unfamiliar routines doing their own ad hoc SILFunction access. Likewise, any time we need to refactor pass logic into utilities, what should be a copy-paste of some code blocks ends up requiring rewriting the code and any referenced function and method declarations (e.g. as instance variables become function arguments and vice versa).

Back to compile time... there are plenty of opportunities to reduce the amount of redundant work being done by the compiler, and plenty of foolishness in the choice of high-level data structures and algorithms. The real barrier to improving compile time is complexity that obscures understanding and discourages rewriting the code from the top down where it's really needed. That's why I'm opposed to micro-optimizations that increase complexity.

2 Likes