[Proposal] Decouple definition of Int8 from target char type


(William Dillon) #1

Hi All,

(For best viewing, use a fixed-width font with this email)

While porting Swift to ARM/Linux (and while observing other ports, such as SwiftAndroid), several of my collaborators and I have noticed inconsistencies in the handling of char. I’ve taken the liberty of producing a table to summarize these differences that will hopefully illuminate the problem.

char        ARM          mips         ppc          ppc64        i386       x86_64
Linux/ELF   unsigned[1]  unsigned[2]  unsigned[3]  unsigned[4]  signed[5]  signed[6]
Mach-O      signed[7]    N/A          signed[7]    signed[7]    signed[7]  signed[7]
Windows     signed[8]    signed[8]    signed[8]    signed[8]    signed[8]  signed[8]

Swift currently maps the Int8 type to be equal to the char type of the target platform. On targets where char is unsigned by default, Int8 becomes an unsigned 8-bit integer, which is a clear violation of the Principle of Least Astonishment. Furthermore, it is impossible to specify a signed 8-bit integer type on platforms with unsigned chars.

This proposal aims to address the problem by defining the CChar type to equal whatever type the target defines as char. Further, Int8 and UInt8 will always have their advertised signedness. The status quo requires extensive special-casing (or casting) merely for compilation to succeed, and even then the tests must be specialized to the underlying char type of the target device.

We have come across this issue several times during code review of our patches, and I’ve included a sampling of these to provide some context to the discussion:

https://github.com/apple/swift/pull/1103
https://github.com/apple/swift-corelibs-foundation/pull/265

In these discussions we clearly struggled to adequately solve the issues at hand without introducing the changes proposed here. Furthermore, short of disabling the test entirely, there is no satisfactory way to make the last remaining test failure on Linux/ARM (c_layout.sil) pass.

The implementation of this proposal requires changes to the way Swift imports these types. The impact of these changes will be fully evaluated, and a sample implementation will likely be complete, prior to the submission of this proposal for scheduling and consideration. We anticipate that some changes to user code, the standard library, and Foundation may be required.

These changes should happen during a major release. Considering them for Swift 3 will enable us to move forward efficiently while constraining any source incompatibilities to transitions where users expect them. A stated goal of Swift 3 is to improve portability, and these changes are, to us, necessary to consider that goal a success. Code that currently works properly on each of these platforms is already resilient to changes in the implementation of char, and should continue to work. Further, the implementation of this proposal will surface cases where such a problem exists but has not yet produced visible symptoms.

The proposed new mapping of types is as follows:

       C type | Swift type
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
         char | CChar
unsigned char | UInt8
  signed char | Int8

I appreciate any comments, concerns, or questions.
Thanks,
- Will

[1]: http://www.eecs.umich.edu/courses/eecs373/readings/ARM-AAPCS-EABI-v2.08.pdf
[2]: http://math-atlas.sourceforge.net/devel/assembly/mipsabi32.pdf
[3]: https://uclibc.org/docs/psABI-ppc.pdf
[4]: http://refspecs.linuxfoundation.org/ELF/ppc64/PPC-elf64abi.html
[5]: http://www.sco.com/developers/devspecs/abi386-4.pdf
[6]: http://www.x86-64.org/documentation/abi.pdf
[7]: *proof by construction* (is it signed by convention?)
$ cat test.c
char _char(char a) { return a; }
signed char _schar(signed char a) { return a; }
unsigned char _uchar(unsigned char a) { return a; }

$ clang -S -emit-llvm -target <arch>-unknown-{windows,darwin} test.c

and look for “signext” or “zeroext” in the @_char definition

[8]: Windows char is signed by convention.


(Dmitri Gribenko) #2

Hi All,

(For best viewing, use a fixed-width font with this email)

While porting Swift to ARM/Linux (and while observing other ports, such as SwiftAndroid), several of my collaborators and I have noticed inconsistencies in the handling of char. I’ve taken the liberty of producing a table to summarize these differences that will hopefully illuminate the problem.

char        ARM          mips         ppc          ppc64        i386       x86_64
Linux/ELF   unsigned[1]  unsigned[2]  unsigned[3]  unsigned[4]  signed[5]  signed[6]
Mach-O      signed[7]    N/A          signed[7]    signed[7]    signed[7]  signed[7]
Windows     signed[8]    signed[8]    signed[8]    signed[8]    signed[8]  signed[8]

Swift currently maps the Int8 type to be equal to the char type of the target platform. On targets where char is unsigned by default, Int8 becomes an unsigned 8-bit integer, which is a clear violation of the Principle of Least Astonishment. Furthermore, it is impossible to specify a signed 8-bit integer type on platforms with unsigned chars.

I'm probably misunderstanding you, but are you sure that's what is
happening? I can't imagine how the standard library would just
silently make Int8 unsigned on Linux arm.

What I would expect to happen is that on Linux arm the Clang importer
would map 'char' to UInt8, instead of mapping it to Int8 like it does
on x86_64.

       C type | Swift type
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
         char | CChar
unsigned char | UInt8
  signed char | Int8

This brings in the notion of the CChar type, and requires us to define
(hopefully!) some rules for type-based aliasing, since you want to be
able to freely cast UnsafePointer<CChar> to UnsafePointer<UInt8> or
UnsafePointer<Int8>.

What about a proposal where we would always map 'char' to Int8,
regardless of the C's idea of signedness?

Dmitri


On Thu, Feb 25, 2016 at 7:47 PM, William Dillon via swift-evolution <swift-evolution@swift.org> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(William Dillon) #3

Swift currently maps the Int8 type to be equal to the char type of the target platform. On targets where char is unsigned by default, Int8 becomes an unsigned 8-bit integer, which is a clear violation of the Principle of Least Astonishment. Furthermore, it is impossible to specify a signed 8-bit integer type on platforms with unsigned chars.

I'm probably misunderstanding you, but are you sure that's what is
happening? I can't imagine how the standard library would just
silently make Int8 unsigned on Linux arm.

I think the best way to demonstrate this is through an example. Here is a sample swift program:

import Foundation
print(NSNumber(char: Int8.min).shortValue)

Compile and run this on Darwin and you get what you would expect:

Falcon:~ wdillon$ ./example; uname -a
-128
Darwin Falcon.local 15.3.0 Darwin Kernel Version 15.3.0: Thu Dec 10 18:40:58 PST 2015; root:xnu-3248.30.4~1/RELEASE_X86_64 x86_64

On Linux/ARM you’ll get something entirely unexpected:

wdillon@tegra-ubuntu:~$ ./example; uname -a
128
Linux tegra-ubuntu 3.10.40-gdacac96 #1 SMP PREEMPT Thu Jun 25 15:25:11 PDT 2015 armv7l armv7l armv7l GNU/Linux

What I would expect to happen is that on Linux arm the Clang importer
would map 'char' to UInt8, instead of mapping it to Int8 like it does
on x86_64.

That would approach a satisfactory solution, except that it would lead to frustration in the long term, and ultimately to an expansion of the number of special cases. Any API that relies upon the definition of char would be bifurcated. The user would have to either bracket their code with #if blocks (and know which platform specifies what), or explicitly cast to a consistent type at every entry point where char is used. And, when providing values to C, the reverse is true: the user would have to know which platforms do what, and explicitly cast their internally-used type into the correct type for char first.

By using CChar, the user isn’t required to maintain this knowledge and list of platforms in countless locations in their code. All they would have to do is cast from CChar to whatever type they want to use within Swift. When going the other way, to get a value from Swift to C, they just cast to CChar and the correct action is taken. In cases where the Swift type is the same as CChar the cast is basically a no-op.

Another benefit is that the process brings awareness to the fact that char is not defined consistently across platforms. I believe that it is worthwhile for people to understand the implications of the code they write, and the cast from CChar to something else provides an opportunity for a moment of reflection: “what am I doing here, and what do I want?”

       C type | Swift type
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
         char | CChar
unsigned char | UInt8
  signed char | Int8

This brings in the notion of the CChar type, and requires us to define
(hopefully!) some rules for type-based aliasing, since you want to be
able to freely cast UnsafePointer<CChar> to UnsafePointer<UInt8> or
UnsafePointer<Int8>.

Swift already has a CChar; it’s defined in https://github.com/apple/swift/blob/master/stdlib/public/core/CTypes.swift#L19. In its usage in CTypes.swift, the fact that Int8 has this dual meaning is relied upon. I agree that the ability to cast between UnsafePointers specialized to each type is desirable.

What about a proposal where we would always map 'char' to Int8,
regardless of the C's idea of signedness?

In a very real sense this is exactly what is happening currently.

Thanks for sharing your thoughts,
- Will


(Dmitri Gribenko) #4

Swift currently maps the Int8 type to be equal to the char type of the target platform. On targets where char is unsigned by default, Int8 becomes an unsigned 8-bit integer, which is a clear violation of the Principle of Least Astonishment. Furthermore, it is impossible to specify a signed 8-bit integer type on platforms with unsigned chars.

I'm probably misunderstanding you, but are you sure that's what is
happening? I can't imagine how the standard library would just
silently make Int8 unsigned on Linux arm.

I think the best way to demonstrate this is through an example. Here is a sample swift program:

import Foundation
print(NSNumber(char: Int8.min).shortValue)

There is a lot happening in this snippet of code (including importing
two completely different implementations of Foundation, and the pure
swift one not being affected by Clang importer at all). Could you
provide AST dumps for both platforms for this code?

What I would expect to happen is that on Linux arm the Clang importer
would map 'char' to UInt8, instead of mapping it to Int8 like it does
on x86_64.

That would approach a satisfactory solution, except that it would lead to frustration in the long term, and ultimately to an expansion of the number of special cases. Any API that relies upon the definition of char would be bifurcated. The user would have to either bracket their code with #if blocks (and know which platform specifies what), or explicitly cast to a consistent type at every entry point where char is used. And, when providing values to C, the reverse is true: the user would have to know which platforms do what, and explicitly cast their internally-used type into the correct type for char first.

By using CChar, the user isn’t required to maintain this knowledge and list of platforms in countless locations in their code. All they would have to do is cast from CChar to whatever type they want to use within Swift. When going the other way, to get a value from Swift to C, they just cast to CChar and the correct action is taken. In cases where the Swift type is the same as CChar the cast is basically a no-op.

Another benefit is that the process brings awareness to the fact that char is not defined consistently across platforms. I believe that it is worthwhile for people to understand the implications of the code they write, and the cast from CChar to something else provides an opportunity for a moment of reflection: “what am I doing here, and what do I want?”

I agree it makes sense to do what you are proposing, if we agree on
the premise that having 'char' preserve the platform's C idea of
signedness is a good thing. This is what we need to figure out,
whether we want to preserve that difference or erase it completely.
What I'm seeing now is that we are failing to erase the difference.

       C type | Swift type
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
         char | CChar
unsigned char | UInt8
  signed char | Int8

This brings in the notion of the CChar type, and requires us to define
(hopefully!) some rules for type-based aliasing, since you want to be
able to freely cast UnsafePointer<CChar> to UnsafePointer<UInt8> or
UnsafePointer<Int8>.

Swift already has a CChar.

I agree, but it is not set in stone. We can change it, or remove it,
if it would make sense.

It’s defined in https://github.com/apple/swift/blob/master/stdlib/public/core/CTypes.swift#L19. In its usage in CTypes.swift, the fact that Int8 has this dual meaning is relied upon. I agree that the ability to cast between UnsafePointers specialized to each type is desirable.

What about a proposal where we would always map 'char' to Int8,
regardless of the C's idea of signedness?

In a very real sense this is exactly what is happening currently.

Sorry, I don't see that yet -- it is still unclear to me what is happening.

Dmitri


On Thu, Feb 25, 2016 at 9:58 PM, William Dillon <william@housedillon.com> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(William Dillon) #5

Swift currently maps the Int8 type to be equal to the char type of the target platform. On targets where char is unsigned by default, Int8 becomes an unsigned 8-bit integer, which is a clear violation of the Principle of Least Astonishment. Furthermore, it is impossible to specify a signed 8-bit integer type on platforms with unsigned chars.

I'm probably misunderstanding you, but are you sure that's what is
happening? I can't imagine how the standard library would just
silently make Int8 unsigned on Linux arm.

I think the best way to demonstrate this is through an example. Here is a sample swift program:

import Foundation
print(NSNumber(char: Int8.min).shortValue)

There is a lot happening in this snippet of code (including importing
two completely different implementations of Foundation, and the pure
swift one not being affected by Clang importer at all). Could you
provide AST dumps for both platforms for this code?

Of course. Here’s the AST on ARM:

wdillon@tegra-ubuntu:~$ swiftc -dump-ast example.swift
(source_file
  (import_decl 'Foundation')
  (top_level_code_decl
    (brace_stmt
      (call_expr type='()' location=example.swift:2:1 range=[example.swift:2:1 - line:2:42] nothrow
        (declref_expr type='(Any..., separator: String, terminator: String) -> ()' location=example.swift:2:1 range=[example.swift:2:1 - line:2:1] decl=Swift.(file).print(_:separator:terminator:) specialized=no)
        (tuple_shuffle_expr implicit type='(Any..., separator: String, terminator: String)' location=example.swift:2:32 range=[example.swift:2:6 - line:2:42] sourceIsScalar elements=[-2, -1, -1] variadic_sources=[0]
          (paren_expr type='Any' location=example.swift:2:32 range=[example.swift:2:6 - line:2:42]
            (erasure_expr implicit type='Any' location=example.swift:2:32 range=[example.swift:2:7 - line:2:32]
              (member_ref_expr type='Int16' location=example.swift:2:32 range=[example.swift:2:7 - line:2:32] decl=Foundation.(file).NSNumber.shortValue
                (call_expr type='NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:30] nothrow
                  (constructor_ref_call_expr type='(char: Int8) -> NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] nothrow
                    (declref_expr implicit type='NSNumber.Type -> (char: Int8) -> NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] decl=Foundation.(file).NSNumber.init(char:) specialized=no)
                    (type_expr type='NSNumber.Type' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] typerepr='NSNumber'))
                  (tuple_expr type='(char: Int8)' location=example.swift:2:15 range=[example.swift:2:15 - line:2:30] names=char
                    (member_ref_expr type='Int8' location=example.swift:2:27 range=[example.swift:2:22 - line:2:27] decl=Swift.(file).Int8.min
                      (type_expr type='Int8.Type' location=example.swift:2:22 range=[example.swift:2:22 - line:2:22] typerepr='Int8'))))))))))))

And Darwin:

Falcon:~ wdillon$ xcrun -sdk macosx swiftc -dump-ast example.swift
(source_file
  (import_decl 'Foundation')
  (top_level_code_decl
    (brace_stmt
      (call_expr type='()' location=example.swift:2:1 range=[example.swift:2:1 - line:2:42] nothrow
        (declref_expr type='(Any..., separator: String, terminator: String) -> ()' location=example.swift:2:1 range=[example.swift:2:1 - line:2:1] decl=Swift.(file).print(_:separator:terminator:) specialized=no)
        (tuple_shuffle_expr implicit type='(Any..., separator: String, terminator: String)' location=example.swift:2:32 range=[example.swift:2:6 - line:2:42] sourceIsScalar elements=[-2, -1, -1] variadic_sources=[0]
          (paren_expr type='Any' location=example.swift:2:32 range=[example.swift:2:6 - line:2:42]
            (erasure_expr implicit type='Any' location=example.swift:2:32 range=[example.swift:2:7 - line:2:32]
              (member_ref_expr type='Int16' location=example.swift:2:32 range=[example.swift:2:7 - line:2:32] decl=Foundation.(file).NSNumber.shortValue
                (call_expr type='NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:30] nothrow
                  (constructor_ref_call_expr type='(char: Int8) -> NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] nothrow
                    (declref_expr implicit type='NSNumber.Type -> (char: Int8) -> NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] decl=Foundation.(file).NSNumber.init(char:) specialized=no)
                    (type_expr type='NSNumber.Type' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] typerepr='NSNumber'))
                  (tuple_expr type='(char: Int8)' location=example.swift:2:15 range=[example.swift:2:15 - line:2:30] names=char
                    (member_ref_expr type='Int8' location=example.swift:2:27 range=[example.swift:2:22 - line:2:27] decl=Swift.(file).Int8.min
                      (type_expr type='Int8.Type' location=example.swift:2:22 range=[example.swift:2:22 - line:2:22] typerepr='Int8')))))))))))

I want to point out that these are identical, as far as I can tell. This makes sense because the behavior is set within the standard library. This also implies that it is not possible to change the behavior while compiling user code. As another exercise, you can tell clang to use signed or unsigned chars and there will be no change:

wdillon@tegra-ubuntu:~$ swiftc example.swift -Xcc -funsigned-char
wdillon@tegra-ubuntu:~$ ./example
128
wdillon@tegra-ubuntu:~$ swiftc example.swift -Xcc -fsigned-char
wdillon@tegra-ubuntu:~$ ./example
128

What about a proposal where we would always map 'char' to Int8,
regardless of the C's idea of signedness?

In a very real sense this is exactly what is happening currently.

Sorry, I don't see that yet -- it is still unclear to me what is happening.

That’s ok. We’ll keep working on it until I’ve proven to everyone’s satisfaction that there really is a problem.

Cheers,
- Will


On Feb 25, 2016, at 11:13 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:
On Thu, Feb 25, 2016 at 9:58 PM, William Dillon <william@housedillon.com> wrote:


(Dmitri Gribenko) #6

Swift currently maps the Int8 type to be equal to the char type of the target platform. On targets where char is unsigned by default, Int8 becomes an unsigned 8-bit integer, which is a clear violation of the Principle of Least Astonishment. Furthermore, it is impossible to specify a signed 8-bit integer type on platforms with unsigned chars.

I'm probably misunderstanding you, but are you sure that's what is
happening? I can't imagine how the standard library would just
silently make Int8 unsigned on Linux arm.

I think the best way to demonstrate this is through an example. Here is a sample swift program:

import Foundation
print(NSNumber(char: Int8.min).shortValue)

There is a lot happening in this snippet of code (including importing
two completely different implementations of Foundation, and the pure
swift one not being affected by Clang importer at all). Could you
provide AST dumps for both platforms for this code?

Of course. Here’s the AST on ARM:

wdillon@tegra-ubuntu:~$ swiftc -dump-ast example.swift
(source_file
  (import_decl 'Foundation')
  (top_level_code_decl
    (brace_stmt
      (call_expr type='()' location=example.swift:2:1 range=[example.swift:2:1 - line:2:42] nothrow
        (declref_expr type='(Any..., separator: String, terminator: String) -> ()' location=example.swift:2:1 range=[example.swift:2:1 - line:2:1] decl=Swift.(file).print(_:separator:terminator:) specialized=no)
        (tuple_shuffle_expr implicit type='(Any..., separator: String, terminator: String)' location=example.swift:2:32 range=[example.swift:2:6 - line:2:42] sourceIsScalar elements=[-2, -1, -1] variadic_sources=[0]
          (paren_expr type='Any' location=example.swift:2:32 range=[example.swift:2:6 - line:2:42]
            (erasure_expr implicit type='Any' location=example.swift:2:32 range=[example.swift:2:7 - line:2:32]
              (member_ref_expr type='Int16' location=example.swift:2:32 range=[example.swift:2:7 - line:2:32] decl=Foundation.(file).NSNumber.shortValue
                (call_expr type='NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:30] nothrow
                  (constructor_ref_call_expr type='(char: Int8) -> NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] nothrow
                    (declref_expr implicit type='NSNumber.Type -> (char: Int8) -> NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] decl=Foundation.(file).NSNumber.init(char:) specialized=no)
                    (type_expr type='NSNumber.Type' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] typerepr='NSNumber'))
                  (tuple_expr type='(char: Int8)' location=example.swift:2:15 range=[example.swift:2:15 - line:2:30] names=char
                    (member_ref_expr type='Int8' location=example.swift:2:27 range=[example.swift:2:22 - line:2:27] decl=Swift.(file).Int8.min
                      (type_expr type='Int8.Type' location=example.swift:2:22 range=[example.swift:2:22 - line:2:22] typerepr='Int8'))))))))))))

And Darwin:

Falcon:~ wdillon$ xcrun -sdk macosx swiftc -dump-ast example.swift
(source_file
  (import_decl 'Foundation')
  (top_level_code_decl
    (brace_stmt
      (call_expr type='()' location=example.swift:2:1 range=[example.swift:2:1 - line:2:42] nothrow
        (declref_expr type='(Any..., separator: String, terminator: String) -> ()' location=example.swift:2:1 range=[example.swift:2:1 - line:2:1] decl=Swift.(file).print(_:separator:terminator:) specialized=no)
        (tuple_shuffle_expr implicit type='(Any..., separator: String, terminator: String)' location=example.swift:2:32 range=[example.swift:2:6 - line:2:42] sourceIsScalar elements=[-2, -1, -1] variadic_sources=[0]
          (paren_expr type='Any' location=example.swift:2:32 range=[example.swift:2:6 - line:2:42]
            (erasure_expr implicit type='Any' location=example.swift:2:32 range=[example.swift:2:7 - line:2:32]
              (member_ref_expr type='Int16' location=example.swift:2:32 range=[example.swift:2:7 - line:2:32] decl=Foundation.(file).NSNumber.shortValue
                (call_expr type='NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:30] nothrow
                  (constructor_ref_call_expr type='(char: Int8) -> NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] nothrow
                    (declref_expr implicit type='NSNumber.Type -> (char: Int8) -> NSNumber' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] decl=Foundation.(file).NSNumber.init(char:) specialized=no)
                    (type_expr type='NSNumber.Type' location=example.swift:2:7 range=[example.swift:2:7 - line:2:7] typerepr='NSNumber'))
                  (tuple_expr type='(char: Int8)' location=example.swift:2:15 range=[example.swift:2:15 - line:2:30] names=char
                    (member_ref_expr type='Int8' location=example.swift:2:27 range=[example.swift:2:22 - line:2:27] decl=Swift.(file).Int8.min
                      (type_expr type='Int8.Type' location=example.swift:2:22 range=[example.swift:2:22 - line:2:22] typerepr='Int8')))))))))))

I want to point out that these are identical, as far as I can tell.

I agree. Then, the difference in behavior should be contained in the
NSNumber implementation. As far as this piece of code is concerned,
it correctly passes the value as Int8. Could you debug what's
happening in the corelibs Foundation, to find out why it is not
printing a negative number?

As another exercise, you can tell clang to use signed or unsigned chars and there will be no change:

wdillon@tegra-ubuntu:~$ swiftc example.swift -Xcc -funsigned-char
wdillon@tegra-ubuntu:~$ ./example
128
wdillon@tegra-ubuntu:~$ swiftc example.swift -Xcc -fsigned-char
wdillon@tegra-ubuntu:~$ ./example
128

And it makes sense, since the program you provided does not compile
any C code. It is pure-swift (though it calls into C via corelibs
Foundation).

What about a proposal where we would always map 'char' to Int8,
regardless of the C's idea of signedness?

In a very real sense this is exactly what is happening currently.

Sorry, I don't see that yet -- it is still unclear to me what is happening.

That’s ok. We’ll keep working on it until I’ve proven to everyone’s satisfaction that there really is a problem.

Given what you showed with corelibs Foundation, I agree there's a
problem. I'm just trying to understand how much of that behavior was
intended, whether there are any bugs in the compiler (in implementing
our intended behavior), whether there are any bugs in Foundation, and
what the behavior would be if we fixed those bugs. When we have that,
we can analyze our model (as if it were implemented as intended) and
make a judgement about whether it works, and whether it is a good one.

For example, if it turns out that the issue above is due to a bug in
the C parts of CoreFoundation that assumes signed char on arm (because
of iOS, say), then there's nothing that a language change in Swift
could do.

Dmitri


On Fri, Feb 26, 2016 at 9:01 AM, William Dillon <william@housedillon.com> wrote:

On Feb 25, 2016, at 11:13 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:
On Thu, Feb 25, 2016 at 9:58 PM, William Dillon <william@housedillon.com> wrote:

--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(William Dillon) #7

Swift currently maps the Int8 type to be equal to the char type of the target platform. On targets where char is unsigned by default, Int8 becomes an unsigned 8-bit integer, which is a clear violation of the Principle of Least Astonishment. Furthermore, it is impossible to specify a signed 8-bit integer type on platforms with unsigned chars.

I'm probably misunderstanding you, but are you sure that's what is
happening? I can't imagine how the standard library would just
silently make Int8 unsigned on Linux arm.

I think the best way to demonstrate this is through an example. Here is a sample swift program:

import Foundation
print(NSNumber(char: Int8.min).shortValue)

There is a lot happening in this snippet of code (including importing
two completely different implementations of Foundation, and the pure
swift one not being affected by Clang importer at all). Could you
provide AST dumps for both platforms for this code?

Of course. Here’s the AST on ARM:

wdillon@tegra-ubuntu:~$ swiftc -dump-ast example.swift
(source_file
...

And Darwin:

Falcon:~ wdillon$ xcrun -sdk macosx swiftc -dump-ast example.swift
(source_file
...

I want to point out that these are identical, as far as I can tell.

I agree. Then, the difference in behavior should be contained in the
NSNumber implementation. As far as this piece of code is concerned,
it correctly passes the value as Int8. Could you debug what's
happening in the corelibs Foundation, to find out why it is not
printing a negative number?

I want to be clear that this isn’t a problem specific to NSNumber. I chose that example because I wanted something that was trivial to check on your own, and limited to Swift project code, that demonstrates the issue. This behavior will occur in any case where a char is imported into Swift from C. Fixing NSNumber will address the issue in only that one place. Even if all of stdlib and CoreFoundation were modified to hide this problem, any user code that interfaces with C will have issues, and require fixes of its own.

I don’t think it’s reasonable to expect that the issue be known and addressed in literally thousands of places where chars from C APIs are present, especially as the issue is hidden from view by the nature of mapping char into Int8. An implementor would have to know that a given API returns char, that it’ll be imported as Int8, and that it might be an Int8 that was intended to be unsigned, then do the right thing.

In contrast, if C char is imported as CChar, it’s very clear what’s happening, and leads the user toward a course of action that is more likely to be appropriate.

I’ve created a GitHub project that demonstrates this problem without using Foundation or CoreFoundation at all. This code creates a small C-based object with three functions that return a char; one returns -1, one returns 1, and the last returns 255.

On signed-char platforms:
From Swift: Type: Int8
From Swift: Negative value: -1, positive value: 1, big positive value: -1
From clang: Negative value: -1, positive value: 1, big positive value: -1

On unsigned-char platforms:
From Swift: Type: Int8
From Swift: Negative value: -1, positive value: 1, big positive value: -1
From clang: Negative value: 255, positive value: 1, big positive value: 255

Code: https://github.com/hpux735/badCharExample.git

It’s clear that Swift is interpreting the bit pattern of the input value as a signed 8-bit integer regardless of how it’s defined in the target platform.

As another exercise, you can tell clang to use signed or unsigned chars and there will be no change:

wdillon@tegra-ubuntu:~$ swiftc example.swift -Xcc -funsigned-char
wdillon@tegra-ubuntu:~$ ./example
128
wdillon@tegra-ubuntu:~$ swiftc example.swift -Xcc -fsigned-char
wdillon@tegra-ubuntu:~$ ./example
128

And it makes sense, since the program you provided does not compile
any C code. It is pure-swift (though it calls into C via corelibs
Foundation).

Yep, that’s right.

What about a proposal where we would always map 'char' to Int8,
regardless of the C's idea of signedness?

In a very real sense this is exactly what is happening currently.

Sorry, I don't see that yet -- it is still unclear to me what is happening.

That’s ok. We’ll keep working on it until I’ve proven to everyone’s satisfaction that there really is a problem.

Given what you showed with corelibs Foundation, I agree there's a
problem. I'm just trying to understand how much of that behavior was
intended, whether there are any bugs in the compiler (in implementing
our intended behavior), whether there are any bugs in Foundation, and
what the behavior would be if we fixed those bugs. When we have that,
we can analyze our model (as if it were implemented as intended) and
make a judgement about whether it works, and whether it is a good one.

I believe that, based on the comments in CTypes.swift

/// This will be the same as either `CSignedChar` (in the common
/// case) or `CUnsignedChar`, depending on the platform.
public typealias CChar = Int8

that the dual meaning of Int8 is expected and intended; otherwise the authors of this comment and code (Ted and Jordan, respectively) didn't understand the intended behavior, and I find that hard to believe.

For example, if it turns out that the issue above is due to a bug in
the C parts of CoreFoundation that assumes signed char on arm (because
of iOS, say), then there's nothing that a language change in Swift
could do.

Hopefully I’ve been able to demonstrate that CoreFoundation is not a party to this issue, per se. Really, any time char gets imported into Swift there is the possibility of unintended (and potentially very frustrating to diagnose) behavior.

Cheers,
- Will

···

On Feb 26, 2016, at 9:09 AM, Dmitri Gribenko <gribozavr@gmail.com> wrote:
On Fri, Feb 26, 2016 at 9:01 AM, William Dillon <william@housedillon.com> wrote:

On Feb 25, 2016, at 11:13 PM, Dmitri Gribenko <gribozavr@gmail.com> wrote:
On Thu, Feb 25, 2016 at 9:58 PM, William Dillon <william@housedillon.com> wrote:


(Dmitri Gribenko) #8

No, it isn't intended. This typealias should be conditionally defined
as either Int8 or UInt8, depending on the platform. That's what the
comment says, but it is not implemented in code, because we didn't
have a port of Swift to a platform where that code was incorrect.
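A conditional definition along the lines the comment describes might look like this sketch (the `#if` condition shown is purely illustrative; a real fix would have to enumerate every unsigned-char target from the table at the top of the thread):

```swift
// Sketch only: choose UInt8 on targets whose C `char` is unsigned.
// The platform/arch condition below is a hypothetical placeholder,
// not the actual set of unsigned-char targets.
#if os(Linux) && arch(arm)
public typealias CChar = UInt8
#else
public typealias CChar = Int8
#endif
```

This is a declaration fragment for the standard library's CTypes.swift, not a standalone program.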

Dmitri

···

On Fri, Feb 26, 2016 at 2:31 PM, William Dillon <william@housedillon.com> wrote:


--
main(i,j){for(i=2;;i++){for(j=2;j<i;j++){if(!(i%j)){j=0;break;}}if
(j){printf("%d\n",i);}}} /*Dmitri Gribenko <gribozavr@gmail.com>*/


(William Dillon) #9

that the dual meaning of Int8 is expected and intended; otherwise the authors of this comment and code (Ted and Jordan, respectively) didn't understand the intended behavior, and I find that hard to believe.

No, it isn't intended. This typealias should be conditionally defined
as either Int8 or UInt8, depending on the platform. That's what the
comment says, but it is not implemented in code, because we didn't
have a port of Swift to a platform where that code was incorrect.

Ok, if that’s the case, I’ll write up the draft of the proposal. I think we agree on 80% of the concepts at this point.

- Will


(Dmitri Gribenko) #10

I'm glad that we reached an understanding :) I think fixing the bug and
porting the code to the new API would be a good test for this change
(both from the implementation standpoint, and API-wise).

But there's still a question of what is a better user model. In the
other thread, Joe noted that we had the same issue with CGFloat. It
used to be a typealias to Double on some platforms, and Float on other
platforms. We found that in practice, it was hard for developers to
write cross-platform code. Developers followed the guidance from the
type checker, and made the code compile for their primary platform,
relying on CGFloat to be identical to Double or Float. The code
typically didn't compile for other platforms.

It would be great if we could try fixing the bug with the CChar
typealias (to make its definition platform-dependent), and see how bad
the problem with cross-platform code is in practice. Maybe CChar is
not like CGFloat at all.

Another possible model that I would like to suggest is to make 'char'
always import as Int8 or UInt8, consistently across all platforms.
The rationale is that if the C code is relying on numeric values
outside of the portable 0..<128 range, and is requiring the API
clients to perform numeric operations on those values (that is, the
numeric values are important, not just the bit pattern), then the
corresponding Swift code won't be portable anyway, and introducing
CChar won't fix anything (since CChar won't be able to provide
arithmetic operations).

Dmitri

···

On Mon, Feb 29, 2016 at 10:43 AM, William Dillon <william@housedillon.com> wrote:
