Weird dyn_cast code gen / performance problem

Hello,

I noticed recently that the code gen of CanType::getClassOrBoundGenericClass() could be better and along the way I found a clang/LLVM bug. Where exactly, I do not know, although my bet is the LLVM optimizer.

When more than one dyn_cast() happens in a row, LLVM/clang emits redundant and pointless nullptr checks. Both Apple clang-900.0.39.2 and clang/llvm top-of-tree generate essentially the same code:

<+35>: movb 0x8(%rbx), %cl ; getKind()
<+38>: testq %rbx, %rbx ; XXX - nullptr check after deref is pointless
<+41>: je 0x1377df6 ; <+54>
<+43>: cmpb $0x12, %cl ; isa<ClassType>()
<+46>: jne 0x1377df6 ; <+54>
<+48>: addq $0x10, %rbx ; (void*)this + offsetof(ClassType, TheDecl)
<+52>: jmp 0x1377e06 ; <+70>
<+54>: xorl %eax, %eax ; the default return value (nullptr)
<+56>: testq %rbx, %rbx ; XXX - another pointless nullptr check?
<+59>: je 0x1377e09 ; <+73>
<+61>: cmpb $0x29, %cl ; isa<BoundGenericClassType>()
<+64>: jne 0x1377e09 ; <+73>
<+66>: addq $0x18, %rbx ; (void*)this + offsetof(BoundGenericClassType, TheDecl)
<+70>: movq (%rbx), %rax ; load the decl pointer
<+73>: popq %rbx
<+74>: retq

I’ve tried adding different “nonnull” spellings in various parts of both Swift and LLVM’s casting machinery, but with no luck. The only thing that seems to work is to create a free function that takes a non-null “const TypeBase *” parameter and then have CanType::getClassOrBoundGenericClass() call that.

FWIW – I *suspect* this is because LLVM’s casting machinery internally converts traditional pointers into C++ references before ultimately calling classof(&Val).

Before I file a bug against clang/llvm, might I be missing something? Can anybody think of a good workaround?

Dave

Do you have the llvm-ir handy?

···

On Jan 1, 2018, at 11:30 AM, David Zarzycki via swift-dev <swift-dev@swift.org> wrote:

Hello,

I noticed recently that the code gen of CanType::getClassOrBoundGenericClass() could be better and along the way I found a clang/LLVM bug. Where exactly, I do not know, although my bet is the LLVM optimizer.

When more than one dyn_cast() happens in a row, LLVM/clang emits redundant and pointless nullptr checks. Both Apple clang-900.0.39.2 and clang/llvm top-of-tree generate essentially the same code:

<+35>: movb 0x8(%rbx), %cl ; getKind()
<+38>: testq %rbx, %rbx ; XXX - nullptr check after deref is pointless
<+41>: je 0x1377df6 ; <+54>
<+43>: cmpb $0x12, %cl ; isa<ClassType>()
<+46>: jne 0x1377df6 ; <+54>
<+48>: addq $0x10, %rbx ; (void*)this + offsetof(ClassType, TheDecl)
<+52>: jmp 0x1377e06 ; <+70>
<+54>: xorl %eax, %eax ; the default return value (nullptr)
<+56>: testq %rbx, %rbx ; XXX - another pointless nullptr check?
<+59>: je 0x1377e09 ; <+73>
<+61>: cmpb $0x29, %cl ; isa<BoundGenericClassType>()
<+64>: jne 0x1377e09 ; <+73>
<+66>: addq $0x18, %rbx ; (void*)this + offsetof(BoundGenericClassType, TheDecl)
<+70>: movq (%rbx), %rax ; load the decl pointer
<+73>: popq %rbx
<+74>: retq

I’ve tried adding different “nonnull” spellings in various parts of both Swift and LLVM’s casting machinery, but with no luck. The only thing that seems to work is to create a free function that takes a non-null “const TypeBase *” parameter and then have CanType::getClassOrBoundGenericClass() call that.

FWIW – I *suspect* this is because LLVM’s casting machinery internally converts traditional pointers into C++ references before ultimately calling classof(&Val).

Before I file a bug against clang/llvm, might I be missing something? Can anybody think of a good workaround?

Dave
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

I don’t have the IR handy. You can easily generate it for yourself though. Just drop the following into any file (I use swift/lib/AST/Type.cpp) and recompile swift.

Decl *my_test_function(Type t) {
  return t->getClassOrBoundGenericClass();
}

···

On Jan 1, 2018, at 12:53, Michael Gottesman <mgottesman@apple.com> wrote:

Do you have the llvm-ir handy?

On Jan 1, 2018, at 11:30 AM, David Zarzycki via swift-dev <swift-dev@swift.org> wrote:

Hello,

I noticed recently that the code gen of CanType::getClassOrBoundGenericClass() could be better and along the way I found a clang/LLVM bug. Where exactly, I do not know, although my bet is the LLVM optimizer.

When more than one dyn_cast() happens in a row, LLVM/clang emits redundant and pointless nullptr checks. Both Apple clang-900.0.39.2 and clang/llvm top-of-tree generate essentially the same code:

<+35>: movb 0x8(%rbx), %cl ; getKind()
<+38>: testq %rbx, %rbx ; XXX - nullptr check after deref is pointless
<+41>: je 0x1377df6 ; <+54>
<+43>: cmpb $0x12, %cl ; isa<ClassType>()
<+46>: jne 0x1377df6 ; <+54>
<+48>: addq $0x10, %rbx ; (void*)this + offsetof(ClassType, TheDecl)
<+52>: jmp 0x1377e06 ; <+70>
<+54>: xorl %eax, %eax ; the default return value (nullptr)
<+56>: testq %rbx, %rbx ; XXX - another pointless nullptr check?
<+59>: je 0x1377e09 ; <+73>
<+61>: cmpb $0x29, %cl ; isa<BoundGenericClassType>()
<+64>: jne 0x1377e09 ; <+73>
<+66>: addq $0x18, %rbx ; (void*)this + offsetof(BoundGenericClassType, TheDecl)
<+70>: movq (%rbx), %rax ; load the decl pointer
<+73>: popq %rbx
<+74>: retq

I’ve tried adding different “nonnull” spellings in various parts of both Swift and LLVM’s casting machinery, but with no luck. The only thing that seems to work is to create a free function that takes a non-null “const TypeBase *” parameter and then have CanType::getClassOrBoundGenericClass() call that.

FWIW – I *suspect* this is because LLVM’s casting machinery internally converts traditional pointers into C++ references before ultimately calling classof(&Val).

Before I file a bug against clang/llvm, might I be missing something? Can anybody think of a good workaround?

Dave
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Hi Michael,

I reduced it down to a simple test case. I was wrong about this requiring two or more dyn_casts. This actually affects any C++ code that uses the “if (auto x = y(z))” convention. What follows is the reduction (compiled with “clang++ -O3 -c” if it matters):

// Uncomment the next line to see the expected code gen (albeit not inlined)
//__attribute__((used,noinline))
int *x(void *arg) {
  return ((long long)arg & 1) ? (int *)arg : nullptr;
}

int test(void *arg) {
  if (auto y = x(arg))
    return *y;
  return 42;
}

It seems like inlining ‘x’ causes the compiler to effectively generate the following pseudo-code:

int test(void *arg) {
  if (arg != nullptr)
    if (arg & 1)
      return *arg;
  return 42;
}

Which is surprising in multiple ways and (as far as I can tell) difficult to workaround without lots of source churn.

Where should I file a bug?

Dave

···

On Jan 1, 2018, at 13:10, David Zarzycki via swift-dev <swift-dev@swift.org> wrote:

I don’t have the IR handy. You can easily generate it for yourself though. Just drop the following into any file (I use swift/lib/AST/Type.cpp) and recompile swift.

Decl *my_test_function(Type t) {
  return t->getClassOrBoundGenericClass();
}

On Jan 1, 2018, at 12:53, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

Do you have the llvm-ir handy?

On Jan 1, 2018, at 11:30 AM, David Zarzycki via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

Hello,

I noticed recently that the code gen of CanType::getClassOrBoundGenericClass() could be better and along the way I found a clang/LLVM bug. Where exactly, I do not know, although my bet is the LLVM optimizer.

When more than one dyn_cast() happens in a row, LLVM/clang emits redundant and pointless nullptr checks. Both Apple clang-900.0.39.2 and clang/llvm top-of-tree generate essentially the same code:

<+35>: movb 0x8(%rbx), %cl ; getKind()
<+38>: testq %rbx, %rbx ; XXX - nullptr check after deref is pointless
<+41>: je 0x1377df6 ; <+54>
<+43>: cmpb $0x12, %cl ; isa<ClassType>()
<+46>: jne 0x1377df6 ; <+54>
<+48>: addq $0x10, %rbx ; (void*)this + offsetof(ClassType, TheDecl)
<+52>: jmp 0x1377e06 ; <+70>
<+54>: xorl %eax, %eax ; the default return value (nullptr)
<+56>: testq %rbx, %rbx ; XXX - another pointless nullptr check?
<+59>: je 0x1377e09 ; <+73>
<+61>: cmpb $0x29, %cl ; isa<BoundGenericClassType>()
<+64>: jne 0x1377e09 ; <+73>
<+66>: addq $0x18, %rbx ; (void*)this + offsetof(BoundGenericClassType, TheDecl)
<+70>: movq (%rbx), %rax ; load the decl pointer
<+73>: popq %rbx
<+74>: retq

I’ve tried adding different “nonnull” spellings in various parts of both Swift and LLVM’s casting machinery, but with no luck. The only thing that seems to work is to create a free function that takes a non-null “const TypeBase *” parameter and then have CanType::getClassOrBoundGenericClass() call that.

FWIW – I *suspect* this is because LLVM’s casting machinery internally converts traditional pointers into C++ references before ultimately calling classof(&Val).

Before I file a bug against clang/llvm, might I be missing something? Can anybody think of a good workaround?

Dave
_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Hi Michael,

I reduced it down to a simple test case. I was wrong about this requiring two or more dyn_casts. This actually affects any C++ code that uses the “if (auto x = y(z))” convention. What follows is the reduction (compiled with “clang++ -O3 -c” if it matters):

// Uncomment the next line to see the expected code gen (albeit not inlined)
//__attribute__((used,noinline))
int *x(void *arg) {
  return ((long long)arg & 1) ? (int *)arg : nullptr;
}

int test(void *arg) {
  if (auto y = x(arg))
    return *y;
  return 42;
}

It seems like inlining ‘x’ causes the compiler to effectively generate the following pseudo-code:

int test(void *arg) {
  if (arg != nullptr)
    if (arg & 1)
      return *arg;
  return 42;
}

Which is surprising in multiple ways and (as far as I can tell) difficult to workaround without lots of source churn.

Where should I file a bug?

bugs.llvm.org <http://bugs.llvm.org/&gt; would be best. Including both your reduced test case and the fact that it was reduced from dyn_cast patterns should make them sit up and take notice.

John.

···

On Jan 1, 2018, at 5:51 PM, David Zarzycki via swift-dev <swift-dev@swift.org> wrote:

Dave

On Jan 1, 2018, at 13:10, David Zarzycki via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

I don’t have the IR handy. You can easily generate it for yourself though. Just drop the following into any file (I use swift/lib/AST/Type.cpp) and recompile swift.

Decl *my_test_function(Type t) {
  return t->getClassOrBoundGenericClass();
}

On Jan 1, 2018, at 12:53, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

Do you have the llvm-ir handy?

On Jan 1, 2018, at 11:30 AM, David Zarzycki via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

Hello,

I noticed recently that the code gen of CanType::getClassOrBoundGenericClass() could be better and along the way I found a clang/LLVM bug. Where exactly, I do not know, although my bet is the LLVM optimizer.

When more than one dyn_cast() happens in a row, LLVM/clang emits redundant and pointless nullptr checks. Both Apple clang-900.0.39.2 and clang/llvm top-of-tree generate essentially the same code:

<+35>: movb 0x8(%rbx), %cl ; getKind()
<+38>: testq %rbx, %rbx ; XXX - nullptr check after deref is pointless
<+41>: je 0x1377df6 ; <+54>
<+43>: cmpb $0x12, %cl ; isa<ClassType>()
<+46>: jne 0x1377df6 ; <+54>
<+48>: addq $0x10, %rbx ; (void*)this + offsetof(ClassType, TheDecl)
<+52>: jmp 0x1377e06 ; <+70>
<+54>: xorl %eax, %eax ; the default return value (nullptr)
<+56>: testq %rbx, %rbx ; XXX - another pointless nullptr check?
<+59>: je 0x1377e09 ; <+73>
<+61>: cmpb $0x29, %cl ; isa<BoundGenericClassType>()
<+64>: jne 0x1377e09 ; <+73>
<+66>: addq $0x18, %rbx ; (void*)this + offsetof(BoundGenericClassType, TheDecl)
<+70>: movq (%rbx), %rax ; load the decl pointer
<+73>: popq %rbx
<+74>: retq

I’ve tried adding different “nonnull” spellings in various parts of both Swift and LLVM’s casting machinery, but with no luck. The only thing that seems to work is to create a free function that takes a non-null “const TypeBase *” parameter and then have CanType::getClassOrBoundGenericClass() call that.

FWIW – I *suspect* this is because LLVM’s casting machinery internally converts traditional pointers into C++ references before ultimately calling classof(&Val).

Before I file a bug against clang/llvm, might I be missing something? Can anybody think of a good workaround?

Dave
_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

Hi Michael,

I reduced it down to a simple test case. I was wrong about this requiring two or more dyn_casts. This actually affects any C++ code that uses the “if (auto x = y(z))” convention. What follows is the reduction (compiled with “clang++ -O3 -c” if it matters):

// Uncomment the next line to see the expected code gen (albeit not inlined)
//__attribute__((used,noinline))
int *x(void *arg) {
  return ((long long)arg & 1) ? (int *)arg : nullptr;
}

int test(void *arg) {
  if (auto y = x(arg))
    return *y;
  return 42;
}

It seems like inlining ‘x’ causes the compiler to effectively generate the following pseudo-code:

int test(void *arg) {
  if (arg != nullptr)
    if (arg & 1)
      return *arg;
  return 42;
}

Which is surprising in multiple ways and (as far as I can tell) difficult to workaround without lots of source churn.

Where should I file a bug?

bugs.llvm.org <http://bugs.llvm.org/&gt; would be best. Including both your reduced test case and the fact that it was reduced from dyn_cast patterns should make them sit up and take notice.

Thanks John,

In case anybody wants to cherry-pick the fix once it is found:
  35790 – if (auto x = y(z)) generates bad/slow code (found via analysis of dyn_cast)

···

On Jan 1, 2018, at 18:20, John McCall <rjmccall@apple.com> wrote:

On Jan 1, 2018, at 5:51 PM, David Zarzycki via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

John.

Dave

On Jan 1, 2018, at 13:10, David Zarzycki via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

I don’t have the IR handy. You can easily generate it for yourself though. Just drop the following into any file (I use swift/lib/AST/Type.cpp) and recompile swift.

Decl *my_test_function(Type t) {
  return t->getClassOrBoundGenericClass();
}

On Jan 1, 2018, at 12:53, Michael Gottesman <mgottesman@apple.com <mailto:mgottesman@apple.com>> wrote:

Do you have the llvm-ir handy?

On Jan 1, 2018, at 11:30 AM, David Zarzycki via swift-dev <swift-dev@swift.org <mailto:swift-dev@swift.org>> wrote:

Hello,

I noticed recently that the code gen of CanType::getClassOrBoundGenericClass() could be better and along the way I found a clang/LLVM bug. Where exactly, I do not know, although my bet is the LLVM optimizer.

When more than one dyn_cast() happens in a row, LLVM/clang emits redundant and pointless nullptr checks. Both Apple clang-900.0.39.2 and clang/llvm top-of-tree generate essentially the same code:

<+35>: movb 0x8(%rbx), %cl ; getKind()
<+38>: testq %rbx, %rbx ; XXX - nullptr check after deref is pointless
<+41>: je 0x1377df6 ; <+54>
<+43>: cmpb $0x12, %cl ; isa<ClassType>()
<+46>: jne 0x1377df6 ; <+54>
<+48>: addq $0x10, %rbx ; (void*)this + offsetof(ClassType, TheDecl)
<+52>: jmp 0x1377e06 ; <+70>
<+54>: xorl %eax, %eax ; the default return value (nullptr)
<+56>: testq %rbx, %rbx ; XXX - another pointless nullptr check?
<+59>: je 0x1377e09 ; <+73>
<+61>: cmpb $0x29, %cl ; isa<BoundGenericClassType>()
<+64>: jne 0x1377e09 ; <+73>
<+66>: addq $0x18, %rbx ; (void*)this + offsetof(BoundGenericClassType, TheDecl)
<+70>: movq (%rbx), %rax ; load the decl pointer
<+73>: popq %rbx
<+74>: retq

I’ve tried adding different “nonnull” spellings in various parts of both Swift and LLVM’s casting machinery, but with no luck. The only thing that seems to work is to create a free function that takes a non-null “const TypeBase *” parameter and then have CanType::getClassOrBoundGenericClass() call that.

FWIW – I *suspect* this is because LLVM’s casting machinery internally converts traditional pointers into C++ references before ultimately calling classof(&Val).

Before I file a bug against clang/llvm, might I be missing something? Can anybody think of a good workaround?

Dave
_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org <mailto:swift-dev@swift.org>
https://lists.swift.org/mailman/listinfo/swift-dev

I posted a quick comment there + the llvm ir. It seems like something that could be a simple SimplifyInstructions transformation.

Michael

···

On Jan 1, 2018, at 6:37 PM, David Zarzycki <dave@znu.io> wrote:

On Jan 1, 2018, at 18:20, John McCall <rjmccall@apple.com> wrote:

On Jan 1, 2018, at 5:51 PM, David Zarzycki via swift-dev <swift-dev@swift.org> wrote:

Hi Michael,

I reduced it down to a simple test case. I was wrong about this requiring two or more dyn_casts. This actually affects any C++ code that uses the “if (auto x = y(z))” convention. What follows is the reduction (compiled with “clang++ -O3 -c” if it matters):

// Uncomment the next line to see the expected code gen (albeit not inlined)
//__attribute__((used,noinline))
int *x(void *arg) {
  return ((long long)arg & 1) ? (int *)arg : nullptr;
}

int test(void *arg) {
  if (auto y = x(arg))
    return *y;
  return 42;
}

It seems like inlining ‘x’ causes the compiler to effectively generate the following pseudo-code:

int test(void *arg) {
  if (arg != nullptr)
    if (arg & 1)
      return *arg;
  return 42;
}

Which is surprising in multiple ways and (as far as I can tell) difficult to workaround without lots of source churn.

Where should I file a bug?

bugs.llvm.org would be best. Including both your reduced test case and the fact that it was reduced from dyn_cast patterns should make them sit up and take notice.

Thanks John,

In case anybody wants to cherry-pick the fix once it is found:
  35790 – if (auto x = y(z)) generates bad/slow code (found via analysis of dyn_cast)

John.

Dave

On Jan 1, 2018, at 13:10, David Zarzycki via swift-dev <swift-dev@swift.org> wrote:

I don’t have the IR handy. You can easily generate it for yourself though. Just drop the following into any file (I use swift/lib/AST/Type.cpp) and recompile swift.

Decl *my_test_function(Type t) {
  return t->getClassOrBoundGenericClass();
}

On Jan 1, 2018, at 12:53, Michael Gottesman <mgottesman@apple.com> wrote:

Do you have the llvm-ir handy?

On Jan 1, 2018, at 11:30 AM, David Zarzycki via swift-dev <swift-dev@swift.org> wrote:

Hello,

I noticed recently that the code gen of CanType::getClassOrBoundGenericClass() could be better and along the way I found a clang/LLVM bug. Where exactly, I do not know, although my bet is the LLVM optimizer.

When more than one dyn_cast() happens in a row, LLVM/clang emits redundant and pointless nullptr checks. Both Apple clang-900.0.39.2 and clang/llvm top-of-tree generate essentially the same code:

<+35>: movb 0x8(%rbx), %cl ; getKind()
<+38>: testq %rbx, %rbx ; XXX - nullptr check after deref is pointless
<+41>: je 0x1377df6 ; <+54>
<+43>: cmpb $0x12, %cl ; isa<ClassType>()
<+46>: jne 0x1377df6 ; <+54>
<+48>: addq $0x10, %rbx ; (void*)this + offsetof(ClassType, TheDecl)
<+52>: jmp 0x1377e06 ; <+70>
<+54>: xorl %eax, %eax ; the default return value (nullptr)
<+56>: testq %rbx, %rbx ; XXX - another pointless nullptr check?
<+59>: je 0x1377e09 ; <+73>
<+61>: cmpb $0x29, %cl ; isa<BoundGenericClassType>()
<+64>: jne 0x1377e09 ; <+73>
<+66>: addq $0x18, %rbx ; (void*)this + offsetof(BoundGenericClassType, TheDecl)
<+70>: movq (%rbx), %rax ; load the decl pointer
<+73>: popq %rbx
<+74>: retq

I’ve tried adding different “nonnull” spellings in various parts of both Swift and LLVM’s casting machinery, but with no luck. The only thing that seems to work is to create a free function that takes a non-null “const TypeBase *” parameter and then have CanType::getClassOrBoundGenericClass() call that.

FWIW – I *suspect* this is because LLVM’s casting machinery internally converts traditional pointers into C++ references before ultimately calling classof(&Val).

Before I file a bug against clang/llvm, might I be missing something? Can anybody think of a good workaround?

Dave
_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev

_______________________________________________
swift-dev mailing list
swift-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-dev