Foundation: NSString init detect encoding


(Andy Best) #1

Hey,

I've been looking at the init(contentsOfFile, usedEncoding) initializer for
NSString in corelibs-foundation.

Am I right in thinking that this method should use some method to attempt
to detect the character encoding of the file before returning a decoded
String?

If so, I've been working on a pure Swift library to detect string
encodings, and wondered if continued work on it might be useful for
implementing this missing method?

Andy


(Tony Parker) #2

Hi Andy,

In this case, the Foundation implementation just looks at an extended attribute of the file to see if it contains the encoding. If it doesn’t have the xattr then we don’t attempt to guess (name of xattr is “com.apple.TextEncoding”).
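As an illustration of what such an attribute might carry, here is a minimal sketch that parses a "com.apple.TextEncoding" value. It assumes the commonly seen form "<IANA name>;<CFStringEncoding number>", for example "utf-8;134217984". That format is an assumption, not something stated in this thread, and the names TextEncodingAttribute and parseTextEncodingAttribute are hypothetical. Reading the attribute from disk (getxattr) is left out because its signature differs between platforms.

```swift
/// Parsed form of a "com.apple.TextEncoding" attribute value (hypothetical type).
struct TextEncodingAttribute: Equatable {
    var ianaName: String           // lowercased IANA charset name, e.g. "utf-8"
    var cfStringEncoding: UInt32?  // numeric CFStringEncoding, if one was present
}

/// Splits a value such as "utf-8;134217984" into its two parts.
/// Assumption: the value is "<name>;<number>", and either part may be missing.
func parseTextEncodingAttribute(_ value: String) -> TextEncodingAttribute? {
    // Keep empty fields so that ";134217984" still yields two parts.
    let parts = value.split(separator: ";", maxSplits: 1,
                            omittingEmptySubsequences: false)
    let name = String(parts[0]).lowercased()
    let number: UInt32? = parts.count > 1 ? UInt32(parts[1]) : nil

    // Nothing usable: behave like a file with no attribute at all.
    if name.isEmpty && number == nil { return nil }
    return TextEncodingAttribute(ianaName: name, cfStringEncoding: number)
}
```

A caller would fall back to "no encoding known" whenever this returns nil, which matches the behaviour described above of not guessing when the xattr is absent.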

Foundation has another API which attempts to guess the encoding of a data blob, but I think we left it out of the swift-corelibs stubs:

+ (NSStringEncoding)stringEncodingForData:(NSData *)data
                          encodingOptions:(nullable NSDictionary<NSStringEncodingDetectionOptionsKey, id> *)opts
                          convertedString:(NSString * _Nullable * _Nullable)string
                      usedLossyConversion:(nullable BOOL *)usedLossyConversion API_AVAILABLE(macos(10.10), ios(8.0), watchos(2.0), tvos(9.0));

- Tony

···

On Jun 21, 2017, at 7:39 AM, Andy Best via swift-corelibs-dev <swift-corelibs-dev@swift.org> wrote:

_______________________________________________
swift-corelibs-dev mailing list
swift-corelibs-dev@swift.org
https://lists.swift.org/mailman/listinfo/swift-corelibs-dev


(Andy Best) #3

Is the preferred approach to mirror Foundation as closely as possible (e.g.
under Linux basically do nothing), or is implementing something like
stringEncodingForData under the hood preferable in this case?

···

On 21 June 2017 at 17:43, Tony Parker <anthony.parker@apple.com> wrote:



(Tony Parker) #4

Our preferred approach so far is to mirror Foundation as closely as possible.

I don’t know if we want to implement stringEncodingForData as part of swift-corelibs-foundation. In any case, we are trying to bring in as few dependencies from outside the Swift project itself as possible, to keep Foundation as low-level as possible for stability, ease of use, and ease of portability.

- Tony

···

On Jun 21, 2017, at 9:51 AM, Andy Best <andybest.net@gmail.com> wrote:



(Tony Parker) #5

Someone on the team here just reminded me that we do have a very basic form of encoding detection here as well: just looking for the BOM at the beginning of the data.
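For readers unfamiliar with BOM sniffing, a minimal sketch of the idea follows. It is not Foundation's actual implementation; the names BOMEncoding and detectBOM are hypothetical, and the set of marks checked (UTF-8, UTF-16, UTF-32) is an assumption about what a basic check would cover.

```swift
/// Encodings that can be recognised from a leading byte-order mark (hypothetical type).
enum BOMEncoding: Equatable {
    case utf8
    case utf16BigEndian, utf16LittleEndian
    case utf32BigEndian, utf32LittleEndian
}

/// Returns the encoding implied by a BOM at the start of `bytes`,
/// or nil when no known mark is present.
func detectBOM(in bytes: [UInt8]) -> BOMEncoding? {
    // Test the 4-byte UTF-32 marks first: the UTF-32 LE mark (FF FE 00 00)
    // begins with the UTF-16 LE mark (FF FE). Note that FF FE 00 00 could
    // also be UTF-16 LE text whose first character is NUL; this sketch
    // resolves that ambiguity in favour of UTF-32.
    if bytes.starts(with: [0x00, 0x00, 0xFE, 0xFF]) { return .utf32BigEndian }
    if bytes.starts(with: [0xFF, 0xFE, 0x00, 0x00]) { return .utf32LittleEndian }
    if bytes.starts(with: [0xEF, 0xBB, 0xBF]) { return .utf8 }
    if bytes.starts(with: [0xFE, 0xFF]) { return .utf16BigEndian }
    if bytes.starts(with: [0xFF, 0xFE]) { return .utf16LittleEndian }
    return nil
}
```

A nil result only means that no mark was found. Most UTF-8 files carry no BOM, so a check like this cannot replace content-based detection.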

- Tony

···

On Jun 21, 2017, at 9:55 AM, Tony Parker via swift-corelibs-dev <swift-corelibs-dev@swift.org> wrote:
