Changing how nominal type descriptors store type names


(Joe Groff) #1

Nominal type descriptors currently store the mangled name of the type they represent, but I don't think this is necessary or optimal. I don't believe the string is directly used for uniquing, and it's mostly used for reflective purposes. Since we've decided we want to use a human-parsable syntax for runtime type lookup, a more structured representation would likely be easier to match. Furthermore, since every mangled name includes the containing module name, there's a lot of redundancy in the mangled strings for every type. If we instead store only the unqualified type name, and a reference to the module name as a separate string, that's likely to be more space-efficient, since all types can share the same module name string, and most module names are longer than the four bytes necessary for a relative reference. So instead of:

@"nominal type descriptor for Foo.Bar" = {
  .name = relative reference to "C3Foo3Bar",
}

we'd have:

@"nominal type descriptor for Foo.Bar" = {
  .name = relative reference to "Bar",
  .moduleName = relative reference to "Foo"
}

The one exception is the standard library, which gets a special one-character mangling; we could maybe pack an "is in standard library" bit somewhere so we avoid storing an extra reference in that case. Any objections or concerns?
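
To put rough numbers on the space argument, here is a minimal C++ sketch that counts string and reference bytes under both layouts. The `mangle` helper only imitates the shape of the `C3Foo3Bar` example above (a kind character plus length-prefixed names); it is not the real Swift mangler. The function names, the NUL terminators, and the absence of padding are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Size of one 32-bit relative reference, as stated in the post.
constexpr std::size_t kRelativeRefSize = 4;

// Hypothetical, simplified mangling shaped like "C3Foo3Bar":
// kind character, then length-prefixed module and type names.
std::string mangle(const std::string& module, const std::string& type) {
    assert(!module.empty() && !type.empty());
    return "C" + std::to_string(module.size()) + module +
           std::to_string(type.size()) + type;
}

// Current layout: every descriptor references its own full mangled name.
std::size_t mangledSchemeBytes(const std::string& module,
                               const std::vector<std::string>& types) {
    std::size_t total = 0;
    for (const auto& type : types) {
        total += kRelativeRefSize + mangle(module, type).size() + 1;  // +1 for NUL
    }
    return total;
}

// Proposed layout: two references per descriptor, one unqualified name per
// type, and a single module name string shared by all types.
std::size_t splitSchemeBytes(const std::string& module,
                             const std::vector<std::string>& types) {
    std::size_t total = module.size() + 1;  // shared module name, stored once
    for (const auto& type : types) {
        total += 2 * kRelativeRefSize + type.size() + 1;
    }
    return total;
}
```

Under these assumptions, a single type `Bar` in module `Foo` costs 14 bytes with the mangled layout and 16 with the split layout. With three types (`Bar`, `Baz`, `Qux`) the totals are 42 and 40, so the split layout pulls ahead once a module defines a few types, and sooner for longer module names.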

-Joe


(Dmitri Gribenko) #2

Since a relative `moduleName` reference can never be 0, you can define 0 to mean
the "Swift" module. You can also reserve a few low integers, say 0..<1024, for
other special names, like __C. Maybe others will come up in the future.
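
A minimal C++ sketch of how a reader of the field might decode it under this scheme. Only "0 means Swift" and the 0..<1024 range come from the suggestion above; the code assigned to `__C`, the function names, and the rule that the emitter never produces a real offset inside the reserved range are assumptions for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Field values in [0, kReservedLimit) are well-known module codes, not offsets.
constexpr std::int32_t kReservedLimit = 1024;

// Hypothetical code assignments; only 0 == "Swift" is from the thread.
constexpr std::int32_t kSwiftModule = 0;
constexpr std::int32_t kClangModule = 1;  // assumed code for "__C"

// Decode a moduleName field: either a reserved code or a self-relative offset.
const char* resolveModuleName(const std::int32_t* field) {
    const std::int32_t value = *field;
    if (value >= 0 && value < kReservedLimit) {
        switch (value) {
            case kSwiftModule: return "Swift";
            case kClangModule: return "__C";
            default:           return nullptr;  // reserved but unassigned
        }
    }
    // Otherwise the value is a byte offset from the field's own address.
    return reinterpret_cast<const char*>(field) + value;
}

// Encode a real reference, refusing offsets that collide with reserved codes.
void storeModuleRef(std::int32_t* field, const char* target) {
    const std::intptr_t delta = reinterpret_cast<std::intptr_t>(target) -
                                reinterpret_cast<std::intptr_t>(field);
    assert(delta >= INT32_MIN && delta <= INT32_MAX);
    assert(delta < 0 || delta >= kReservedLimit);
    *field = static_cast<std::int32_t>(delta);
}
```

The cost of the trick is that a genuine forward offset smaller than 1024 can no longer be represented, so whatever emits the descriptors would have to place module name strings accordingly (or before the referencing field, where offsets are negative).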

Dmitri

···

On Fri, Jan 22, 2016 at 6:21 PM, Joe Groff via swift-dev <swift-dev@swift.org> wrote:

@"nominal type descriptor for Foo.Bar" = {
  .name = relative reference to "Bar",
  .moduleName = relative reference to "Foo"
}

The one exception is the standard library, which gets a special one-character mangling; we could maybe pack an "is in standard library" bit somewhere so we avoid storing an extra reference in that case. Any objections or concerns?

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(John McCall) #3

Maybe we should just make a module descriptor and make this a relative reference to that. There are a lot of things that naturally make sense to have in a module descriptor, e.g. a list of the public types; even if we don’t want to record that explicitly, it makes sense to have a unique structure representing it.

For now, I guess the easiest thing would be to emit it with shared linkage. In the long run, it’s reasonable to ask the build system to declare some file to be the primary file of the module; for a dylib, the choice would be arbitrary.
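
As a rough illustration of the module descriptor idea, here is a C++ sketch in which type descriptors point at one shared module descriptor rather than at a module name string. All type and field names (`RelativeRef`, `ModuleDescriptor`, `parentModule`, `Image`) are hypothetical, and the `Image` struct merely stands in for data the compiler would emit into a binary; nothing here reflects the actual runtime layout.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// A 32-bit offset measured from the field's own address.
template <typename T>
struct RelativeRef {
    std::int32_t offset = 0;

    RelativeRef() = default;
    // Copying would silently change what the offset points at.
    RelativeRef(const RelativeRef&) = delete;
    RelativeRef& operator=(const RelativeRef&) = delete;

    void set(const T* target) {
        const std::intptr_t delta = reinterpret_cast<std::intptr_t>(target) -
                                    reinterpret_cast<std::intptr_t>(this);
        assert(delta >= INT32_MIN && delta <= INT32_MAX);
        offset = static_cast<std::int32_t>(delta);
    }
    const T* get() const {
        return reinterpret_cast<const T*>(
            reinterpret_cast<const char*>(this) + offset);
    }
};

struct ModuleDescriptor {
    RelativeRef<char> name;
    // Other per-module data (e.g. a list of public types) could live here.
};

struct NominalTypeDescriptor {
    RelativeRef<char> name;                      // unqualified type name
    RelativeRef<ModuleDescriptor> parentModule;  // shared, unique per module
};
static_assert(sizeof(NominalTypeDescriptor) == 8, "two 32-bit references");

// Stand-in for one binary image: descriptors and strings laid out together,
// so every offset comfortably fits in 32 bits.
struct Image {
    ModuleDescriptor foo;
    NominalTypeDescriptor bar;
    NominalTypeDescriptor baz;
    char strings[12] = "Foo\0Bar\0Baz";

    Image() {
        foo.name.set(&strings[0]);
        bar.name.set(&strings[4]);
        baz.name.set(&strings[8]);
        bar.parentModule.set(&foo);
        baz.parentModule.set(&foo);
    }
};

// With a unique descriptor per module, module identity is a pointer compare.
bool sameModule(const NominalTypeDescriptor& a, const NominalTypeDescriptor& b) {
    return a.parentModule.get() == b.parentModule.get();
}
```

In this sketch the module name is reached through the descriptor (`bar.parentModule.get()->name.get()`), which costs one extra indirection compared with referencing the string directly but gives each module a single identity to hang further data on.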

John.



(Joe Groff) #4

ld64 also doesn't seem to like relative references to undefined symbols, so we'd need to fix that too before getting away from emitting a module descriptor as shared.

-Joe

···

On Jan 25, 2016, at 12:06 PM, John McCall <rjmccall@apple.com> wrote:

Maybe we should just make a module descriptor and make this a relative reference to that. There are a lot of things that naturally make sense to have in a module descriptor, e.g. a list of the public types; even if we don’t want to record that explicitly, it makes sense to have a unique structure representing it.

For now, I guess the easiest thing would be to emit it with shared linkage. In the long run, it’s reasonable to ask the build system to declare some file to be the primary file of the module; for a dylib, the choice would be arbitrary.


(John McCall) #5

Ah, okay.

John.

···

On Jan 25, 2016, at 12:39 PM, Joe Groff <jgroff@apple.com> wrote:

ld64 also doesn't seem to like relative references to undefined symbols, so we'd need to fix that too before getting away from emitting a module descriptor as shared.