Shrinking the heap object header


(Joe Groff) #1

Swift heap object headers are fairly large—16 bytes on 64-bit, and 12 bytes on 32-bit. Into this space we pack:

- the 'isa' pointer for the object, pointing to its heap metadata/class object,
- the strong and unowned reference counts,
- 'pinned' and 'deallocating' flags.
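For a concrete picture of these sizes, here is a hypothetical C model of the current header: one pointer-sized isa followed by two 32-bit counts. The struct name and flag macros are illustrative assumptions, not the runtime's actual declarations, and the sketch assumes the 'pinned' and 'deallocating' flags live in the low bits of the strong count word.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of today's header; not the runtime's real declaration. */
struct heap_object_header {
    const void *isa;      /* heap metadata / class object */
    uint32_t strong_rc;   /* strong count; assumed to carry the flag bits below */
    uint32_t unowned_rc;  /* unowned count */
};

/* Assumed flag positions inside strong_rc (illustrative only). */
#define RC_PINNED_FLAG       UINT32_C(0x1)
#define RC_DEALLOCATING_FLAG UINT32_C(0x2)

/* 8 + 4 + 4 = 16 bytes with 64-bit pointers; 4 + 4 + 4 = 12 with 32-bit pointers. */
static_assert(offsetof(struct heap_object_header, strong_rc) == sizeof(void *),
              "counts follow the isa pointer directly");
static_assert(sizeof(struct heap_object_header) == sizeof(void *) + 2 * sizeof(uint32_t),
              "no padding in the header");
```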

We've also discussed taking a flag bit for 'not refcounted' objects, such as statically-allocated globals and/or stack-promoted objects that need to be ABI-compatible with heap objects, and potentially another for thread-local objects, to avoid barriers when refcounting objects we dynamically know are not referenced from multiple threads. We should consider whether we can reduce the header size. Two ideas come to mind:

Dropping the unowned reference count

If we adopt a sufficiently fast implementation for normal weak references, such as the activity count implementation suggested by Kevin and Mike, the unowned reference count might not be worth the expense. If we dropped it, that would be enough to bring the 32-bit object header down to 8 bytes. The tradeoff would be that unowned references become fatter, like weak references would, which might complicate our plans to eventually allow unowned to transparently become unowned(unsafe) in unchecked builds.
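The 32-bit saving is simple arithmetic. This hypothetical model uses `uint32_t` in place of a 32-bit isa pointer so the sizes come out the same on any host; the struct names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit layouts. The isa is modeled as uint32_t so the
   sizes match a 32-bit target even when compiled on a 64-bit host. */
struct header32_current {
    uint32_t isa;         /* metadata pointer */
    uint32_t strong_rc;   /* strong count and flags */
    uint32_t unowned_rc;  /* the count this idea would drop */
};

struct header32_without_unowned {
    uint32_t isa;
    uint32_t strong_rc;
};

static_assert(sizeof(struct header32_current) == 12, "three 32-bit words");
static_assert(sizeof(struct header32_without_unowned) == 8, "two 32-bit words");
```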

Non-pointer isa for 64-bit platforms

Neither x86-64 nor ARM64 uses the full 64 bits of address space. Contemporary x86-64 uses only 48 bits (sign-extended, so effectively 47 bits for userspace), and Apple ARM64 platforms use fewer bits still, with the exact number depending on the OS version. If we were willing to drop the unowned refcount and declare that "64Ki retains ought to be enough for anyone", letting an overflowing retain count carry into the "not refcounted" bit so that over-retained objects are leaked, we could use a layout like this one to pack the remaining information into 8 bytes:

bits    meaning
-----   ----------------
63      not refcounted
47…62   strong refcount
03…46   metadata pointer
02      (reserved)
01      deallocating
00      pinned
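A minimal sketch of this proposed layout, using hypothetical helper names (`hdr_make`, `hdr_retain`, `hdr_release`) and plain non-atomic arithmetic where a real runtime would need atomic read-modify-write. It shows the overflow trick: a retain adds `1 << 47` to the whole word, so an object whose count is already 0xFFFF carries into bit 63 and becomes "not refcounted" (leaked) instead of wrapping back to zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments from the proposed table (a proposal, not a shipped ABI). */
#define HDR_PINNED          (UINT64_C(1) << 0)
#define HDR_DEALLOCATING    (UINT64_C(1) << 1)
#define HDR_RESERVED        (UINT64_C(1) << 2)
#define HDR_METADATA_MASK   UINT64_C(0x00007FFFFFFFFFF8)       /* bits 3..46 */
#define HDR_RC_SHIFT        47
#define HDR_RC_ONE          (UINT64_C(1) << HDR_RC_SHIFT)
#define HDR_RC_MASK         (UINT64_C(0xFFFF) << HDR_RC_SHIFT) /* bits 47..62 */
#define HDR_NOT_REFCOUNTED  (UINT64_C(1) << 63)

static inline uint64_t hdr_make(uint64_t metadata, uint64_t strong_count) {
    assert((metadata & ~HDR_METADATA_MASK) == 0);  /* 8-byte aligned, below 2^47 */
    assert(strong_count <= 0xFFFF);
    return metadata | (strong_count << HDR_RC_SHIFT);
}

static inline uint64_t hdr_metadata(uint64_t h) { return h & HDR_METADATA_MASK; }
static inline uint64_t hdr_strong_count(uint64_t h) { return (h & HDR_RC_MASK) >> HDR_RC_SHIFT; }
static inline bool hdr_is_refcounted(uint64_t h) { return (h & HDR_NOT_REFCOUNTED) == 0; }

/* A count of 0xFFFF plus one carries out of bits 47..62 into bit 63,
   so the object becomes "not refcounted" and is leaked, never freed early. */
static inline uint64_t hdr_retain(uint64_t h) {
    return hdr_is_refcounted(h) ? h + HDR_RC_ONE : h;
}

/* Objects marked "not refcounted" ignore releases. A refcounted result
   with a strong count of zero means the caller should deallocate. */
static inline uint64_t hdr_release(uint64_t h) {
    if (!hdr_is_refcounted(h))
        return h;
    assert(hdr_strong_count(h) > 0);
    return h - HDR_RC_ONE;
}
```

Because bit 63 is only ever set by that carry, a single addition both increments the count and records overflow; an atomic version would presumably wrap the same arithmetic in a compare-and-swap loop.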

There are of course some costs and complications. For classes, we look up vtable entries and resilient ivar offsets through the isa pointer, and masking the isa costs an extra instruction per object. That cost can at least be shared across multiple method calls on the same object, since we assume objects don't change class (at least not in ways that would change Swift method implementations or ivar layout). We already pay this cost on Apple platforms for NSObject subclasses. More interestingly, Objective-C already uses non-pointer isa on ARM64, but not on x86-64. I'm not sure how flexible the ObjC implementation is here. Could Swift use non-pointer isas on platforms where ObjC doesn't? Could it ascribe different meanings to the bits than ObjC does?
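The masking cost, and the way it can be shared, can be sketched as follows. The class metadata, vtable, and object types are hypothetical, and the sketch assumes (as the proposed layout does) that metadata addresses are 8-byte aligned and fit in 47 bits, which holds for x86-64 userspace. The header is loaded and masked once, and the resulting metadata pointer serves two vtable lookups.

```c
#include <assert.h>
#include <stdint.h>

#define METADATA_MASK UINT64_C(0x00007FFFFFFFFFF8) /* bits 3..46, as in the table */

/* Hypothetical class metadata with a two-entry vtable. */
struct class_metadata {
    void (*vtable[2])(void *self);
};

/* Hypothetical object: packed header word, then one stored property. */
struct object {
    uint64_t header;  /* metadata pointer plus refcount/flag bits */
    int calls;
};

static void method_a(void *self) { ((struct object *)self)->calls += 1; }
static void method_b(void *self) { ((struct object *)self)->calls += 10; }

static const struct class_metadata example_class = { { method_a, method_b } };

/* Combine a metadata pointer with count/flag bits into one header word. */
static inline uint64_t make_header(const struct class_metadata *meta, uint64_t other_bits) {
    uint64_t bits = (uint64_t)(uintptr_t)meta;
    assert((bits & ~METADATA_MASK) == 0);       /* aligned and below 2^47 */
    assert((other_bits & METADATA_MASK) == 0);  /* must not overlap the pointer */
    return bits | other_bits;
}

/* Two virtual calls on one object: one load and one AND, shared by both lookups.
   This relies on the object not changing class between the calls. */
static inline void call_both(struct object *obj) {
    const struct class_metadata *meta =
        (const struct class_metadata *)(uintptr_t)(obj->header & METADATA_MASK);
    meta->vtable[0](obj);
    meta->vtable[1](obj);
}
```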

-Joe


(John McCall) #2

Dropping the unowned reference count

If we adopt a sufficiently fast implementation for normal weak references, such as the activity count implementation suggested by Kevin and Mike, the unowned reference count might not be worth the expense. If we dropped it, that would be enough to bring the 32-bit object header down to 8 bytes. The tradeoff would be that unowned references become fatter, like weak references would, which might complicate our plans to eventually allow unowned to transparently become unowned(unsafe) in unchecked builds.

Unless I’m misunderstanding something, the activity count implementation doesn’t actually change anything about the need to register the existence of the weak/unowned reference with the object.

We could compact the two reference counts into a single 32-bit header; that would help a lot on 32-bit targets. However, doing so would shrink both reference counts to the point that leaking on overflow would become unacceptable; we would definitely need some ability to spill over into a side table.
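A rough, single-threaded sketch of that idea under invented parameters: a hypothetical 32-bit word with 19 bits of strong count, 12 bits of unowned count, and a flag meaning both counts have moved to a side table, at which point the remaining bits hold a side-table index. The bit split, the toy side table (fixed size, no locking, no reuse), and all names are assumptions for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 32-bit refcount word (not a real runtime layout):
     bit 31      set once both counts have moved to the side table
     bits 12..30 strong count (19 bits)
     bits 0..11  unowned count (12 bits)
   When bit 31 is set, the low 31 bits hold a side-table index instead. */
#define RC_SPILLED       UINT32_C(0x80000000)
#define RC_STRONG_SHIFT  12
#define RC_STRONG_MAX    ((UINT32_C(1) << 19) - 1)
#define RC_UNOWNED_MAX   ((UINT32_C(1) << 12) - 1)
#define SIDE_TABLE_SIZE  1024

struct side_entry { uint64_t strong, unowned; };

/* Toy side table: no locking and no reuse of entries. */
static struct side_entry side_table[SIDE_TABLE_SIZE];
static uint32_t side_table_used;

static inline struct side_entry *rc_side(uint32_t word) {
    assert(word & RC_SPILLED);
    return &side_table[word & ~RC_SPILLED];
}

static inline uint32_t rc_make(uint32_t strong, uint32_t unowned) {
    assert(strong <= RC_STRONG_MAX && unowned <= RC_UNOWNED_MAX);
    return (strong << RC_STRONG_SHIFT) | unowned;
}

/* Copy both inline counts into a fresh side-table entry. */
static inline uint32_t rc_spill(uint32_t word) {
    assert(!(word & RC_SPILLED) && side_table_used < SIDE_TABLE_SIZE);
    uint32_t index = side_table_used++;
    side_table[index].strong = (word >> RC_STRONG_SHIFT) & RC_STRONG_MAX;
    side_table[index].unowned = word & RC_UNOWNED_MAX;
    return RC_SPILLED | index;
}

static inline uint64_t rc_strong_count(uint32_t word) {
    return (word & RC_SPILLED) ? rc_side(word)->strong
                               : (word >> RC_STRONG_SHIFT) & RC_STRONG_MAX;
}

static inline uint64_t rc_unowned_count(uint32_t word) {
    return (word & RC_SPILLED) ? rc_side(word)->unowned : word & RC_UNOWNED_MAX;
}

static inline void rc_retain_strong(uint32_t *word) {
    if (!(*word & RC_SPILLED) && ((*word >> RC_STRONG_SHIFT) & RC_STRONG_MAX) == RC_STRONG_MAX)
        *word = rc_spill(*word);  /* inline field is full: move to the side table */
    if (*word & RC_SPILLED)
        rc_side(*word)->strong++;
    else
        *word += UINT32_C(1) << RC_STRONG_SHIFT;
}

static inline void rc_retain_unowned(uint32_t *word) {
    if (!(*word & RC_SPILLED) && (*word & RC_UNOWNED_MAX) == RC_UNOWNED_MAX)
        *word = rc_spill(*word);
    if (*word & RC_SPILLED)
        rc_side(*word)->unowned++;
    else
        *word += 1;
}
```

Giving the strong count more bits than the unowned count, as here, means the slower side-table path is taken much sooner for unowned references than for strong ones.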

John.

On Jan 15, 2016, at 10:58 AM, Joe Groff via swift-dev <swift-dev@swift.org> wrote:

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev


(Greg Parker) #3

Non-pointer isa for 64-bit platforms

Neither x86-64 nor ARM64 uses the full 64 bits of address space. Contemporary x86-64 uses only 48 bits (sign-extended, so effectively 47 bits for userspace), and Apple ARM64 platforms use fewer bits still, with the exact number depending on the OS version. If we were willing to drop the unowned refcount and declare that "64Ki retains ought to be enough for anyone", letting an overflowing retain count carry into the "not refcounted" bit so that over-retained objects are leaked,

64K retains is not enough for everybody. NSParagraphStyle had a 19-bit inline retain count with no overflow protection (i.e. it incorrectly deallocated if you retained it too much and then called some releases). This occasionally crashed in Xcode (rdar://16008112).
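The failure mode is easy to reproduce with a toy model. These helpers are hypothetical, not Foundation's code: a 19-bit count that wraps silently reads 1 after 2^19 extra retains, so a single release takes it to zero while more than half a million references are still live. A "sticky" count that saturates and then ignores releases leaks the object instead, which is the trade the "not refcounted" overflow bit makes.

```c
#include <assert.h>
#include <stdint.h>

#define RC19_MAX ((UINT32_C(1) << 19) - 1)  /* largest value a 19-bit field holds */

/* No overflow protection: the count silently wraps modulo 2^19. */
static inline uint32_t wrapping_retain(uint32_t rc)  { return (rc + 1) & RC19_MAX; }
static inline uint32_t wrapping_release(uint32_t rc) { return (rc - 1) & RC19_MAX; }

/* Sticky: once the field saturates, the object is treated as immortal. */
static inline uint32_t sticky_retain(uint32_t rc) { return rc == RC19_MAX ? rc : rc + 1; }
static inline uint32_t sticky_release(uint32_t rc) {
    if (rc == RC19_MAX)
        return rc;  /* saturated: leak rather than risk a premature free */
    assert(rc > 0);
    return rc - 1;
}

/* Apply `retain` n times, starting from rc. */
static inline uint32_t retain_n(uint32_t rc, uint32_t n, uint32_t (*retain)(uint32_t)) {
    for (uint32_t i = 0; i < n; i++)
        rc = retain(rc);
    return rc;
}
```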

What is the difference between "pinned" and "not refcounted"? I would expect that you only need one bit to mark objects that are constant or whose refcount has overflowed.

Note that the memory analysis folks really want a few bits reserved with a constant value. That improves their reliability when distinguishing real objects from non-object memory that happens to have an isa-like field in front. ObjC currently gives them 6 bits on all architectures.

There are of course some costs and complications. For classes, we look up vtable entries and resilient ivar offsets through the isa pointer, and masking the isa costs an extra instruction per object. That cost can at least be shared across multiple method calls on the same object, since we assume objects don't change class (at least not in ways that would change Swift method implementations or ivar layout). We already pay this cost on Apple platforms for NSObject subclasses. More interestingly, Objective-C already uses non-pointer isa on ARM64, but not on x86-64.

Objective-C now uses non-pointer isa on x86_64 (as of OS X 10.11 iirc).

I'm not sure how flexible the ObjC implementation is here. Could Swift use non-pointer isas on platforms where ObjC doesn't? Could it ascribe different meanings to the bits than ObjC does?

For backwards deployment, no. libobjc currently assumes that every isa field is either a raw class pointer or libobjc's packed representation.

With libobjc's cooperation, maybe. You would almost certainly need everybody to use the same mask value. If the other bits mean different things in Swift vs ObjC then you would also need to use a bit to distinguish the representations.
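One way to picture that arrangement, with every bit position and name invented for illustration (they are not libobjc's actual encoding): all runtimes share one class mask, one bit marks a packed isa versus a raw class pointer, and a second bit says whether the remaining bits follow the Swift or the ObjC layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical shared convention, not libobjc's real bit assignments. */
#define CLASS_MASK      UINT64_C(0x00007FFFFFFFFFF8) /* agreed on by every runtime */
#define ISA_NONPOINTER  (UINT64_C(1) << 0)           /* 0: raw class pointer */
#define ISA_IS_SWIFT    (UINT64_C(1) << 1)           /* meaningful only if packed */

enum isa_kind { ISA_RAW_POINTER, ISA_OBJC_PACKED, ISA_SWIFT_PACKED };

/* Decide whose encoding the non-class bits use. */
static inline enum isa_kind isa_classify(uint64_t isa) {
    if (!(isa & ISA_NONPOINTER))
        return ISA_RAW_POINTER;
    return (isa & ISA_IS_SWIFT) ? ISA_SWIFT_PACKED : ISA_OBJC_PACKED;
}

/* Every runtime finds the class the same way, whatever the encoding. */
static inline uint64_t isa_class_bits(uint64_t isa) {
    return isa & CLASS_MASK;
}
```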

How does the object header size interact with resilience? If we squeeze into an 8-byte header and then regret it later, will we have any recourse?

On Jan 15, 2016, at 10:58 AM, Joe Groff <jgroff@apple.com> wrote:

--
Greg Parker gparker@apple.com Runtime Wrangler


(Joe Groff) #4

I probably misunderstood too. If that's the case, we could still allocate more bits to the strong refcount than to the unowned one, since spilling into a side table more often is probably more acceptable for weak references than for strong ones.

-Joe

On Jan 15, 2016, at 11:51 AM, John McCall <rjmccall@apple.com> wrote:

Unless I’m misunderstanding something, the activity count implementation doesn’t actually change anything about the need to register the existence of the weak/unowned reference with the object.

We could compact the two reference counts into a single 32-bit header; that would help a lot on 32-bit targets. However, doing so would shrink both reference counts to the point that leaking on overflow would become unacceptable; we would definitely need some ability to spill over into a side table.


(John McCall) #5

What is the difference between "pinned" and "not refcounted"? I would expect that you only need one bit to mark objects that are constant or whose refcount has overflowed.

“pinned” is the “this object is undergoing mutation, you don’t need to copy it” bit and has nothing to do with refcount overflow. I can understand the confusion, though.
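To make the copy-on-write role concrete, here is a hypothetical sketch of the check that "pinned" relaxes. The buffer type and function names are invented, and the count and flag are plain struct fields rather than bits in a packed header: a write copies the buffer first unless it is uniquely referenced or pinned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical copy-on-write buffer. */
struct buffer {
    uint32_t strong_count;  /* number of owners */
    bool pinned;            /* mutation allowed even when shared */
    size_t length;
    int elements[];         /* flexible array member */
};

static inline struct buffer *buffer_new(size_t length) {
    struct buffer *b = calloc(1, sizeof *b + length * sizeof(int));
    assert(b != NULL);
    b->strong_count = 1;
    b->length = length;
    return b;
}

/* In-place mutation is safe if nobody else can observe it, or if the
   buffer is pinned (e.g. slices mutating disjoint parts of it). */
static inline bool can_mutate_in_place(const struct buffer *b) {
    return b->strong_count == 1 || b->pinned;
}

/* Write one element, copying first if needed. Returns the buffer the
   caller should use from now on. */
static inline struct buffer *buffer_set(struct buffer *b, size_t i, int value) {
    assert(i < b->length);
    if (!can_mutate_in_place(b)) {
        struct buffer *copy = buffer_new(b->length);
        memcpy(copy->elements, b->elements, b->length * sizeof(int));
        b->strong_count--;  /* this owner moves to the copy */
        b = copy;
    }
    b->elements[i] = value;
    return b;
}
```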

How does the object header size interact with resilience? If we squeeze into an 8-byte header and then regret it later, will we have any recourse?

If we ever want to be able to use static fixed offsets, we have to assume a header size. We currently use static fixed offsets on native Swift objects all the time.

John.

On Jan 15, 2016, at 4:21 PM, Greg Parker via swift-dev <swift-dev@swift.org> wrote:


(Joe Groff) #6

What is the difference between "pinned" and "not refcounted"? I would expect that you only need one bit to mark objects that are constant or whose refcount has overflowed.

"Pinned" is used by copy-on-write buffers to say "mutation is allowed, even if this isn't uniquely referenced", for situations like parallel array slices mutating different parts of the same array.

How does the object header size interact with resilience? If we squeeze into an 8-byte header and then regret it later, will we have any recourse?

It's unlikely we'd be able to grow the header size after committing to an ABI, since we'll want to be able to hardcode offsets into closure contexts and root classes. If we're not confident we can fit everything we want in 8 bytes, we shouldn't try to.
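A small illustration of why the header size becomes part of the ABI, using hypothetical root-class layouts on an assumed 64-bit host: code compiled against an 8-byte header bakes in offset 8 for the first stored property, and that constant reads the wrong bytes once the header is 16 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical headers of two sizes (64-bit host assumed). */
struct header_8  { uint64_t isa_and_counts; };
struct header_16 { const void *isa; uint32_t strong_rc, unowned_rc; };

/* The same root class with two stored properties, under each header. */
struct point_8  { struct header_8 h;  double x, y; };
struct point_16 { struct header_16 h; double x, y; };

/* The offset a client compiled against the 8-byte header would hardcode. */
enum { COMPILED_X_OFFSET = offsetof(struct point_8, x) };

/* Reads property x the way compiled client code would: at a fixed offset. */
static inline double load_x_fixed(const void *object) {
    double x;
    memcpy(&x, (const char *)object + COMPILED_X_OFFSET, sizeof x);
    return x;
}
```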

-Joe

On Jan 15, 2016, at 4:21 PM, Greg Parker <gparker@apple.com> wrote:


(Greg Parker) #7

To clarify: analysis tools like `heap` and `leaks` need to distinguish real objects from other heap allocations. The false positive rate is tolerable if the isa field is a raw pointer, as long as other code, such as libobjc itself, deliberately disguises those of its own data structures that would otherwise look like false positives.

Non-pointer isa greatly increases the false positive rate. There is too much non-object data that happens to match some real class pointer when only the class bits can be examined. Adding some constant non-zero bits brings it back down again.
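A toy version of the check such a tool might perform, with an invented magic value, bit positions, and function name (six bits to mirror the count ObjC reserves, but not its actual positions or value). Without the magic test, any word whose class bits happen to equal a known class address counts as an object; also requiring six fixed bits should reject roughly 63 of every 64 such accidental matches, if the stray bits behave like random data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical scheme: six high bits of every packed isa hold a fixed
   pattern that the runtime always writes and a heap scanner can check. */
#define MAGIC_SHIFT  57
#define MAGIC_MASK   (UINT64_C(0x3F) << MAGIC_SHIFT)
#define MAGIC_VALUE  (UINT64_C(0x2B) << MAGIC_SHIFT)
#define CLASS_MASK   UINT64_C(0x00007FFFFFFFFFF8)

/* "Might this allocation be an object?" Both the magic bits and the
   class bits must match. */
static inline bool looks_like_object(uint64_t first_word,
                                     const uint64_t *known_classes, size_t count) {
    if ((first_word & MAGIC_MASK) != MAGIC_VALUE)
        return false;
    uint64_t cls = first_word & CLASS_MASK;
    for (size_t i = 0; i < count; i++)
        if (known_classes[i] == cls)
            return true;
    return false;
}
```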

On Jan 15, 2016, at 4:21 PM, Greg Parker via swift-dev <swift-dev@swift.org> wrote:

Note that the memory analysis folks really want a few bits reserved with a constant value. That improves their reliability when distinguishing real objects from non-object memory that happens to have an isa-like field in front. ObjC currently gives them 6 bits on all architectures.


(John McCall) #8

Agreed. I just mean that it probably still shrinks the strong reference count to the point (~20 bits?) where leaking on overflow isn't obviously acceptable.

John.

On Jan 15, 2016, at 1:08 PM, Joe Groff <jgroff@apple.com> wrote:

I probably misunderstood too. If that's the case, we could still allocate more bits to the strong refcount than to the unowned one, since spilling into a side table more often is probably more acceptable for weak references than for strong ones.


(Joe Groff) #9

The tools team is probably going to hate us as soon as we start using non-zero address points for objects too. We might need a different approach for Swift to reliably recognize our heap object allocations.

-Joe

On Jan 15, 2016, at 4:36 PM, Greg Parker <gparker@apple.com> wrote:

To clarify: analysis tools like `heap` and `leaks` need to distinguish real objects from other heap allocations. The false positive rate is tolerable if the isa field is a raw pointer, as long as other code, such as libobjc itself, deliberately disguises those of its own data structures that would otherwise look like false positives.

Non-pointer isa greatly increases the false positive rate. There is too much non-object data that happens to match some real class pointer when only the class bits can be examined. Adding some constant non-zero bits brings it back down again.