Pitch: Wrap calls to NSFileHandle and NSData in autorelease pools


(Charles Srstka) #1

MOTIVATION:

Meet Bob. Bob is a developer with mostly C++ and Java experience, but who has been learning Swift. Bob needs to write an app to parse some proprietary binary data format that his company requires. Bob’s written this app, and it’s worked pretty well on Linux:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = handle.readData(ofLength: bufsize)
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Later, Bob needs to port this same app to macOS. All seems to work well, until Bob tries opening a large file of many gigabytes in size. Suddenly, the simple act of running the app causes Bob’s Mac to completely lock up, beachball, and finally pop up with the dreaded “This computer is out of system memory” message. If Bob’s particularly unlucky, things will locked up tight enough that he can’t even recover from there, and may have to hard-reboot the machine.

What happened?

Experienced Objective-C developers will spot the problem right away; the Foundation APIs that Bob used generated autoreleased objects, which would never be released until Bob’s loop finished. However, Bob’s never programmed in Objective-C, and to him, this behavior is completely undecipherable.

After a copious amount of time spent Googling for answers and asking for help on various mailing lists and message boards, Bob finally gets the recommendation from someone to try wrapping the file handle read in an autorelease pool. So he does:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = autoreleasepool { handle.readData(ofLength: bufsize) }
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Unfortunately, Bob’s program still eats RAM like Homer Simpson in an all-you-can-eat buffet. Turns out the data.withUnsafeBytes call *also* causes the data to be autoreleased. What Bob really needs to do is to wrap the whole thing in an autorelease pool, creating a Pyramid of Doom:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        autoreleasepool {
            let data = handle.readData(ofLength: bufsize)
            
            if data.isEmpty {
                break // error: ‘break’ is allowed only inside a loop, if, do, or switch
            }
            
            data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
                // do something with bytes
            }
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

However, when Bob tries to run this, he now gets a compile error on the ‘break’ statement; it’s no longer possible to break out of the loop, since everything inside the autorelease block is in a closure.

Bob is now regretting his decision not to become an insurance adjuster instead.

Bob’s problem, of course, can be solved by using *two* autorelease pools, one when getting the data, and the next when working with it. But this situation is confusing to newcomers to the language, since autorelease pools are not really part of Swift’s idiom, and aren’t mentioned anywhere in the usual Swift documentation. Thus, without Objective-C experience, autorelease-related issues are completely mysterious and baffling, particularly since, as a struct, it isn’t obvious that Objective-C will be involved at all when using the Data type. Even to experienced Objective-C developers, autorelease pools in Swift can become awkward since, unlike with Objective-C, they can’t simply be tacked onto a loop without losing flow control via break and continue.

PROPOSED SOLUTION:

In the Foundation overlay, wrap calls to Objective-C NSFileHandle and NSData APIs that generate autoreleased objects in an autorelease pool, so that they behave the way a user new to the language would expect, and in a manner consistent with how they likely behave on other platforms which lack the Objective-C bridge.

This would likely add a small performance overhead, but this should be negligible compared to the overhead involved in reading from the disk which will occur when using a FileHandle. In addition, if Data objects are being accessed frequently enough for performance to be an issue, it’s likely that enough of them to be generated to make memory overhead an issue if an autorelease pool is not used.

IMPACT ON EXISTING CODE:

Code that currently works around these issues with an autorelease pool may end up double-wrapping until these manual workarounds are removed.
  
Charles


(John McCall) #2

MOTIVATION:

Meet Bob. Bob is a developer with mostly C++ and Java experience, but who has been learning Swift. Bob needs to write an app to parse some proprietary binary data format that his company requires. Bob’s written this app, and it’s worked pretty well on Linux:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = handle.readData(ofLength: bufsize)
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Later, Bob needs to port this same app to macOS. All seems to work well, until Bob tries opening a large file of many gigabytes in size. Suddenly, the simple act of running the app causes Bob’s Mac to completely lock up, beachball, and finally pop up with the dreaded “This computer is out of system memory” message. If Bob’s particularly unlucky, things will locked up tight enough that he can’t even recover from there, and may have to hard-reboot the machine.

What happened?

Experienced Objective-C developers will spot the problem right away; the Foundation APIs that Bob used generated autoreleased objects, which would never be released until Bob’s loop finished. However, Bob’s never programmed in Objective-C, and to him, this behavior is completely undecipherable.

After a copious amount of time spent Googling for answers and asking for help on various mailing lists and message boards, Bob finally gets the recommendation from someone to try wrapping the file handle read in an autorelease pool. So he does:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = autoreleasepool { handle.readData(ofLength: bufsize) }
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Unfortunately, Bob’s program still eats RAM like Homer Simpson in an all-you-can-eat buffet. Turns out the data.withUnsafeBytes call *also* causes the data to be autoreleased.

This seems like a bug that should be fixed. I don't know why the other one would cause an unreclaimable autorelease.

John.

···

On Jul 14, 2017, at 1:12 PM, Charles Srstka via swift-evolution <swift-evolution@swift.org> wrote:

What Bob really needs to do is to wrap the whole thing in an autorelease pool, creating a Pyramid of Doom:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        autoreleasepool {
            let data = handle.readData(ofLength: bufsize)
            
            if data.isEmpty {
                break // error: ‘break’ is allowed only inside a loop, if, do, or switch
            }
            
            data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
                // do something with bytes
            }
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

However, when Bob tries to run this, he now gets a compile error on the ‘break’ statement; it’s no longer possible to break out of the loop, since everything inside the autorelease block is in a closure.

Bob is now regretting his decision not to become an insurance adjuster instead.

Bob’s problem, of course, can be solved by using *two* autorelease pools, one when getting the data, and the next when working with it. But this situation is confusing to newcomers to the language, since autorelease pools are not really part of Swift’s idiom, and aren’t mentioned anywhere in the usual Swift documentation. Thus, without Objective-C experience, autorelease-related issues are completely mysterious and baffling, particularly since, as a struct, it isn’t obvious that Objective-C will be involved at all when using the Data type. Even to experienced Objective-C developers, autorelease pools in Swift can become awkward since, unlike with Objective-C, they can’t simply be tacked onto a loop without losing flow control via break and continue.

PROPOSED SOLUTION:

In the Foundation overlay, wrap calls to Objective-C NSFileHandle and NSData APIs that generate autoreleased objects in an autorelease pool, so that they behave the way a user new to the language would expect, and in a manner consistent with how they likely behave on other platforms which lack the Objective-C bridge.

This would likely add a small performance overhead, but this should be negligible compared to the overhead involved in reading from the disk which will occur when using a FileHandle. In addition, if Data objects are being accessed frequently enough for performance to be an issue, it’s likely that enough of them to be generated to make memory overhead an issue if an autorelease pool is not used.

IMPACT ON EXISTING CODE:

Code that currently works around these issues with an autorelease pool may end up double-wrapping until these manual workarounds are removed.
  
Charles

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Charles Srstka) #3

Sticking a break at the end of my original code snippet so the loop runs only once and then running it through Instruments, it seems the NSConcreteData instance gets autoreleased… 32,769 times. o_O

First autorelease occurs inside -[NSConcreteFileHandle readDataOfLength:], the next 32,768 occur in Data.Iterator.next(), which is called by specialized RangeReplaceableCollection.init<A>.

I can send you the trace off-list if you’d like.

Charles

···

On Jul 14, 2017, at 2:35 PM, John McCall <rjmccall@apple.com> wrote:

On Jul 14, 2017, at 1:12 PM, Charles Srstka via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
MOTIVATION:

Meet Bob. Bob is a developer with mostly C++ and Java experience, but who has been learning Swift. Bob needs to write an app to parse some proprietary binary data format that his company requires. Bob’s written this app, and it’s worked pretty well on Linux:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = handle.readData(ofLength: bufsize)
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Later, Bob needs to port this same app to macOS. All seems to work well, until Bob tries opening a large file of many gigabytes in size. Suddenly, the simple act of running the app causes Bob’s Mac to completely lock up, beachball, and finally pop up with the dreaded “This computer is out of system memory” message. If Bob’s particularly unlucky, things will locked up tight enough that he can’t even recover from there, and may have to hard-reboot the machine.

What happened?

Experienced Objective-C developers will spot the problem right away; the Foundation APIs that Bob used generated autoreleased objects, which would never be released until Bob’s loop finished. However, Bob’s never programmed in Objective-C, and to him, this behavior is completely undecipherable.

After a copious amount of time spent Googling for answers and asking for help on various mailing lists and message boards, Bob finally gets the recommendation from someone to try wrapping the file handle read in an autorelease pool. So he does:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = autoreleasepool { handle.readData(ofLength: bufsize) }
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Unfortunately, Bob’s program still eats RAM like Homer Simpson in an all-you-can-eat buffet. Turns out the data.withUnsafeBytes call *also* causes the data to be autoreleased.

This seems like a bug that should be fixed. I don't know why the other one would cause an unreclaimable autorelease.


(John McCall) #4

We should absolutely not need to do the later autoreleases. We have logic to autorelease objects when calling returns-inner-pointer objects on them, but we shouldn't need to do that in safe patterns like what Data does here by scoping the pointer to the closure. We probably just don't actually have a way to turn that logic off, i.e. an equivalent of objc_precise_lifetime in ObjC ARC.

I have no idea why the first autorelease wouldn't be reclaimed. There's a well-known issue with micro-reductions involving autoreleases on x86, where the first autorelease from the executable doesn't get reclaimed because the dynamic linker's lazy-binding stub interferes somehow. Can you verify that you still see that initial autorelease on subsequent Data creations?

John.

···

On Jul 14, 2017, at 4:15 PM, Charles Srstka <cocoadev@charlessoft.com> wrote:

On Jul 14, 2017, at 2:35 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

On Jul 14, 2017, at 1:12 PM, Charles Srstka via swift-evolution <swift-evolution@swift.org <mailto:swift-evolution@swift.org>> wrote:
MOTIVATION:

Meet Bob. Bob is a developer with mostly C++ and Java experience, but who has been learning Swift. Bob needs to write an app to parse some proprietary binary data format that his company requires. Bob’s written this app, and it’s worked pretty well on Linux:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = handle.readData(ofLength: bufsize)
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Later, Bob needs to port this same app to macOS. All seems to work well, until Bob tries opening a large file of many gigabytes in size. Suddenly, the simple act of running the app causes Bob’s Mac to completely lock up, beachball, and finally pop up with the dreaded “This computer is out of system memory” message. If Bob’s particularly unlucky, things will locked up tight enough that he can’t even recover from there, and may have to hard-reboot the machine.

What happened?

Experienced Objective-C developers will spot the problem right away; the Foundation APIs that Bob used generated autoreleased objects, which would never be released until Bob’s loop finished. However, Bob’s never programmed in Objective-C, and to him, this behavior is completely undecipherable.

After a copious amount of time spent Googling for answers and asking for help on various mailing lists and message boards, Bob finally gets the recommendation from someone to try wrapping the file handle read in an autorelease pool. So he does:

import Foundation

do {
    let url = ...
    
    let handle = try FileHandle(forReadingFrom: url)
    let bufsize = 1024 * 1024 // read 1 MiB at a time
    
    while true {
        let data = autoreleasepool { handle.readData(ofLength: bufsize) }
        
        if data.isEmpty {
            break
        }
        
        data.withUnsafeBytes { (bytes: UnsafePointer<UInt8>) in
            // do something with bytes
        }
    }
} catch {
    print("Error occurred: \(error.localizedDescription)")
}

Unfortunately, Bob’s program still eats RAM like Homer Simpson in an all-you-can-eat buffet. Turns out the data.withUnsafeBytes call *also* causes the data to be autoreleased.

This seems like a bug that should be fixed. I don't know why the other one would cause an unreclaimable autorelease.

Sticking a break at the end of my original code snippet so the loop runs only once and then running it through Instruments, it seems the NSConcreteData instance gets autoreleased… 32,769 times. o_O

First autorelease occurs inside -[NSConcreteFileHandle readDataOfLength:], the next 32,768 occur in Data.Iterator.next(), which is called by specialized RangeReplaceableCollection.init<A>.

I can send you the trace off-list if you’d like.


(Charles Srstka) #5

Changing the loop to run five times, I end up with five NSConcreteDatas in Instruments. For the first one, I get the initial autorelease in -[NSConcreteFileHandle readDataOfLength], as well as the subsequent 32,768 autoreleases (it actually seems to be one autorelease per every 32 bytes in the data; if I adjust the buffer size the number of autoreleases changes accordingly).

For the other four… it looks exactly the same, actually. :-/

Instruments .trace file is available off-list by request.

Charles

···

On Jul 14, 2017, at 3:24 PM, John McCall <rjmccall@apple.com> wrote:

We should absolutely not need to do the later autoreleases. We have logic to autorelease objects when calling returns-inner-pointer objects on them, but we shouldn't need to do that in safe patterns like what Data does here by scoping the pointer to the closure. We probably just don't actually have a way to turn that logic off, i.e. an equivalent of objc_precise_lifetime in ObjC ARC.

I have no idea why the first autorelease wouldn't be reclaimed. There's a well-known issue with micro-reductions involving autoreleases on x86, where the first autorelease from the executable doesn't get reclaimed because the dynamic linker's lazy-binding stub interferes somehow. Can you verify that you still see that initial autorelease on subsequent Data creations?


(John McCall) #6

Would you mind just filing a bug with a reduced test case?

John.

···

On Jul 14, 2017, at 4:31 PM, Charles Srstka <cocoadev@charlessoft.com> wrote:

On Jul 14, 2017, at 3:24 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

We should absolutely not need to do the later autoreleases. We have logic to autorelease objects when calling returns-inner-pointer objects on them, but we shouldn't need to do that in safe patterns like what Data does here by scoping the pointer to the closure. We probably just don't actually have a way to turn that logic off, i.e. an equivalent of objc_precise_lifetime in ObjC ARC.

I have no idea why the first autorelease wouldn't be reclaimed. There's a well-known issue with micro-reductions involving autoreleases on x86, where the first autorelease from the executable doesn't get reclaimed because the dynamic linker's lazy-binding stub interferes somehow. Can you verify that you still see that initial autorelease on subsequent Data creations?

Changing the loop to run five times, I end up with five NSConcreteDatas in Instruments. For the first one, I get the initial autorelease in -[NSConcreteFileHandle readDataOfLength], as well as the subsequent 32,768 autoreleases (it actually seems to be one autorelease per every 32 bytes in the data; if I adjust the buffer size the number of autoreleases changes accordingly).

For the other four… it looks exactly the same, actually. :-/

Instruments .trace file is available off-list by request.


(Philippe Hausler) #7

Would it perhaps be reasonable for the Data backing to manually managed the storage via Unmanaged? Since we know the lifespan of the container and are the sole controllers in most places.

···

On Jul 14, 2017, at 1:40 PM, John McCall via swift-evolution <swift-evolution@swift.org> wrote:

On Jul 14, 2017, at 4:31 PM, Charles Srstka <cocoadev@charlessoft.com <mailto:cocoadev@charlessoft.com>> wrote:

On Jul 14, 2017, at 3:24 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

We should absolutely not need to do the later autoreleases. We have logic to autorelease objects when calling returns-inner-pointer objects on them, but we shouldn't need to do that in safe patterns like what Data does here by scoping the pointer to the closure. We probably just don't actually have a way to turn that logic off, i.e. an equivalent of objc_precise_lifetime in ObjC ARC.

I have no idea why the first autorelease wouldn't be reclaimed. There's a well-known issue with micro-reductions involving autoreleases on x86, where the first autorelease from the executable doesn't get reclaimed because the dynamic linker's lazy-binding stub interferes somehow. Can you verify that you still see that initial autorelease on subsequent Data creations?

Changing the loop to run five times, I end up with five NSConcreteDatas in Instruments. For the first one, I get the initial autorelease in -[NSConcreteFileHandle readDataOfLength], as well as the subsequent 32,768 autoreleases (it actually seems to be one autorelease per every 32 bytes in the data; if I adjust the buffer size the number of autoreleases changes accordingly).

For the other four… it looks exactly the same, actually. :-/

Instruments .trace file is available off-list by request.

Would you mind just filing a bug with a reduced test case?

John.
_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution


(Charles Srstka) #8

Radar or bugs.swift.org <http://bugs.swift.org/>?

Charles

···

On Jul 14, 2017, at 3:40 PM, John McCall <rjmccall@apple.com> wrote:

Would you mind just filing a bug with a reduced test case?

John.


(John McCall) #9

Either would be fine, assuming your test case doesn't need to be confidential.

John.

···

On Jul 14, 2017, at 4:43 PM, Charles Srstka <cocoadev@charlessoft.com> wrote:

On Jul 14, 2017, at 3:40 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

Would you mind just filing a bug with a reduced test case?

John.

Radar or bugs.swift.org <http://bugs.swift.org/>?


(Charles Srstka) #10

Done! Three test cases, all with silly names. Enjoy!

https://bugs.swift.org/browse/SR-5463

Charles

···

On Jul 14, 2017, at 3:56 PM, John McCall <rjmccall@apple.com> wrote:

On Jul 14, 2017, at 4:43 PM, Charles Srstka <cocoadev@charlessoft.com <mailto:cocoadev@charlessoft.com>> wrote:

On Jul 14, 2017, at 3:40 PM, John McCall <rjmccall@apple.com <mailto:rjmccall@apple.com>> wrote:

Would you mind just filing a bug with a reduced test case?

John.

Radar or bugs.swift.org <http://bugs.swift.org/>?

Either would be fine, assuming your test case doesn't need to be confidential.

John.