Question about size of Character type


(Ole Begemann) #1

The Character type is essentially defined like this [1]:

struct Character {
    enum Representation {
        // A _StringBuffer whose first grapheme cluster is self.
        // NOTE: may be more than 1 Character long.
        case large(_StringBuffer._Storage)
        case small(Builtin.Int63)
    }

    var _representation: Representation
}

Note the type of the associated value for `case .small`, `Builtin.Int63`. Presumably, this is so that the type fits into a single word, including the single bit that is needed for the enum case.

However, `MemoryLayout<Character>.size` returns 9. Given the above, I would have expected 8 bytes.

Why? Is this a potential optimization that hasn't been implemented? Or am I missing something?
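
For comparison, here is a minimal sketch of the same two-case shape built only from public types. `Builtin.Int63` is not available outside the standard library, so the small case carries a full `Int` instead; `Storage` and `Representation` below are hypothetical stand-ins, not the stdlib types. A full-width `Int` leaves no unused bit for the case tag, so this enum needs an extra byte, which is the outcome the 63-bit payload was presumably meant to avoid.

```swift
// Hypothetical stand-in for a heap-allocated storage class.
final class Storage {}

// Same two-case shape as Character.Representation, but with a full Int payload.
enum Representation {
    case large(Storage)
    case small(Int)
}

print(MemoryLayout<Int>.size)                  // one machine word
print(MemoryLayout<Representation>.size)       // one word plus a tag byte
print(MemoryLayout<Representation>.stride)     // size rounded up to the alignment
print(MemoryLayout<Representation>.alignment)
```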

Thanks
Ole

[1]: https://github.com/apple/swift/blob/master/stdlib/public/core/Character.swift


(Jordan Rose) #2

We have an old Radar about this, rdar://problem/16754935. It's probably just a case we're missing in enum layout. My guess is that it's because we don't have a whole spare bit in a RawPointer, but we should be able to pick some up either from alignment or from ABI knowledge.

Jordan

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev


(Slava Pestov) #3

Hi Jordan,

I asked about a related issue, which is that RawPointer only has 1 extra inhabitant instead of 4096. You guys said you wanted non-zero integers to round-trip through RawPointer. It seems that declaring the high bits of a RawPointer as spare bits would cause the same problem as allowing more extra inhabitants.

Also, I don’t think alignment is the answer here: RawPointer should be able to represent a char *, where you have no low spare bits.
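
The extra-inhabitant budget can be observed from user code through `Optional`, which stores `.none` in an unused bit pattern of its payload when one exists. A minimal sketch, using the public `UnsafeRawPointer` (which wraps a `Builtin.RawPointer`) as the payload; the size of the doubly optional pointer is printed rather than asserted, since it depends on how many extra inhabitants the compiler assigns:

```swift
// Int uses every bit pattern, so Optional<Int> needs a separate tag byte.
print(MemoryLayout<Int>.size, MemoryLayout<Int?>.size)

// A raw pointer has one invalid pattern (null), which Optional reuses for .none.
print(MemoryLayout<UnsafeRawPointer>.size, MemoryLayout<UnsafeRawPointer?>.size)

// A second level of Optional needs another unused pattern; whether it still
// fits in one word depends on how many extra inhabitants the payload has left.
print(MemoryLayout<UnsafeRawPointer??>.size)
```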

Slava



(Ole Begemann) #4

Thanks Jordan!



(Chris Lattner) #5

On 64-bit systems, you can steal the top bits of pointers for other uses (but the value needs to be sign-extended or masked before it is used as an address), since the virtual address space is smaller than the full 64 bits.

ARM64 even has a hardware feature for this, called “top byte ignored” (TBI) which means you don’t even have to do the masking.
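
The masking approach can be sketched with plain integer arithmetic. This is only an illustration: it assumes addresses fit in the low 48 bits, which is an assumption about the platform rather than something Swift guarantees, and `tag`, `isTagged`, and `untag` are hypothetical helper names.

```swift
// Assumption: valid addresses only use the low 48 bits of the word.
let payloadMask: UInt64 = (1 << 48) - 1
// The top bit is borrowed as a one-bit tag.
let tagBit: UInt64 = 1 << 63

// Store the tag in the otherwise unused top bit.
func tag(_ address: UInt64) -> UInt64 {
    precondition(address & ~payloadMask == 0, "address uses the high bits")
    return address | tagBit
}

// Check whether the borrowed bit is set.
func isTagged(_ word: UInt64) -> Bool {
    return word & tagBit != 0
}

// Mask the high bits off again before treating the word as an address.
func untag(_ word: UInt64) -> UInt64 {
    return word & payloadMask
}

let address: UInt64 = 0x0000_7fff_dead_beef
let word = tag(address)
print(isTagged(word), untag(word) == address)  // prints "true true"
```

With a hardware feature such as TBI, the `untag` step would not be needed for bits that fall inside the ignored top byte.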

-Chris



(Jordan Rose) #6

Ah, yeah, sorry, I didn't really mean RawPointer here. I do think Builtin.RawPointer should continue to be able to round-trip with Int (except 0) because of the things people do in C. I should have said "known non-tagged object pointer", which has to be a valid address, and which _StringBuffer._Storage certainly should be.

I dug into this a little, and it looks like we've got this nesting:

case large(_StringBuffer._Storage)
typealias _StringBuffer._Storage = _HeapBuffer<_StringBufferIVars, UTF16.CodeUnit>
struct _HeapBuffer<Value, Element> {
  internal var _storage: Builtin.NativeObject?
}

So because _HeapBuffer can be empty, we get into trouble. We don't have a _NonEmptyHeapBuffer, but I suppose we could store a _StringBuffer._Storage.Storage instead.
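
The effect of the optional reference can be explored with public types. `Object`, `Buffer`, and `NonEmptyBuffer` below are hypothetical stand-ins, and `UInt32` stands in for a payload narrower than a word, since `Builtin.Int63` is unavailable outside the standard library. The printed sizes depend on the compiler version and target, so the sketch only prints them for comparison.

```swift
final class Object {}

// Like _HeapBuffer: the reference may be nil, which uses up the null pattern.
struct Buffer {
    var storage: Object?
}

// Hypothetical non-empty variant: the reference is always a valid object.
struct NonEmptyBuffer {
    var storage: Object
}

enum WithOptional {
    case large(Buffer)
    case small(UInt32)
}

enum WithNonOptional {
    case large(NonEmptyBuffer)
    case small(UInt32)
}

print(MemoryLayout<WithOptional>.size)
print(MemoryLayout<WithNonOptional>.size)
```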

Jordan



(Quinn “The Eskimo!”) #7

Urgh, horrible flashbacks to the “32-bit clean” effort during the late 80s:

    Currently the Macintosh OS runs in a 24-bit world, where the hardware ignores
    the high byte of all memory addresses (including pointers and handles).
    -- Technote OV11 “The Joy Of Being 32-Bit Clean”

<https://developer.apple.com/legacy/library/technotes/ov/ov_11.html#//apple_ref/doc/uid/DTS10002609>

(-:

Share and Enjoy

···

On 21 Aug 2016, at 02:25, Chris Lattner via swift-dev <swift-dev@swift.org> wrote:

ARM64 even has a hardware feature for this, called “top byte ignored” (TBI) which means you don’t even have to do the masking.

--
Quinn "The Eskimo!" <http://www.apple.com/developer/>
Apple Developer Relations, Developer Technical Support, Core OS/Hardware


(gian enrico conti) #8

Being as old as Quinn, I got the exact SAME impression!

:-)
