Memory initialization and deinitialization best practices

I understand that initialization is important for reference types, and that’s why the core team says to always perform it, even though it turns into a no-op for value types. But if only some of the elements in a buffer are initialized, it does not make sense to deinitialize them all:

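let storage:UnsafeMutablePointer<T> = .allocate(capacity: 256)

// initialize only a random-length prefix of the buffer
for i in 0 ..< Int.random(in: 0 ... 256)
{
    (storage + i).initialize(to: getValue(i))
}

...

// wrong: this also deinitializes elements that were never initialized
storage.deinitialize(count: 256)
storage.deallocate()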

Clearly this code will fail if T is a reference type. Since I’m using UInt8 as my T, I would just take out all the initialize and deinitialize calls and write directly to uninitialized memory, but people said to include the initialization calls for good form. But “good form” in this case is actually incorrect. So is it better to just remove the initialization calls?

(for maximum safety I wish Swift had a way to constrain a type to a pure-value type)

+1. It would help avoid mistakes (which one often makes), document the intent of the design, and give the compiler more information about the program (which is always good).


I think the assumption is that, somewhere, the number of elements that are actually initialised is tracked: that is, that your code looks more like:

let storage:UnsafeMutablePointer<T> = .allocate(capacity: 256)

// remember how many elements get initialized
let count = Int.random(in: 0 ... 256)
for i in 0 ..< count
{
    (storage + i).initialize(to: getValue(i))
}

...

// deinitialize exactly the elements that were initialized
storage.deinitialize(count: count)
storage.deallocate()
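To see why the tracked count matters for a reference type, here is a minimal sketch. The `LiveCount` and `Tracked` classes, the capacity of 256, and the count of 10 are all hypothetical and exist only for illustration; the sketch counts live instances to check that every initialized element is released again.

```swift
// Hypothetical helper: counts how many Tracked instances are alive.
final class LiveCount { var value = 0 }

// Hypothetical reference type standing in for T.
final class Tracked {
    let live: LiveCount
    init(live: LiveCount) { self.live = live; live.value += 1 }
    deinit { live.value -= 1 }
}

let live = LiveCount()
let capacity = 256
let count = 10 // only a prefix of the buffer is initialized

let storage = UnsafeMutablePointer<Tracked>.allocate(capacity: capacity)
for i in 0 ..< count {
    (storage + i).initialize(to: Tracked(live: live))
}
let liveAfterInit = live.value // 10 instances are held by the buffer

// Deinitialize exactly the initialized prefix, then free the block.
storage.deinitialize(count: count)
storage.deallocate()
let liveAfterCleanup = live.value // all instances have been released
```

Passing `capacity` instead of `count` to `deinitialize(count:)` here would release objects that were never stored, which is the failure described in the original question.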

Could you say more about what you're doing, where you don't have something like count available?

It’s for a 2-level Huffman tree/table representation. The count is not needed, except for the memory initialization.

I've also felt this need.

That might also be useful when writing generic routines for serialization/deserialization (to files, byte buffers, etc).

I have the feeling that a protocol to which all “pure value types” implicitly conform has been suggested/discussed before. Does somebody have a link?

Perhaps if somebody could provide a precise definition of what “pure-value” means we could write a proposal that would be acceptable. I spent quite a bit of time trying to do this. Unfortunately it turns out to be quite difficult and I was unsuccessful.

This thread, continued here, contains a lot of discussion about that. It's the longest discussion about value types/semantics that I can recall from the mailing list days.

Looking into it deeper, I believe that it is fine to do what you wish for UInt8, since the current docs for deallocate() say:

This pointer must be a pointer to the start of a previously allocated memory block. The memory must not be initialized or Pointee must be a trivial type.

and those for pointee:

When pointee is used as the left side of an assignment, the instance must be initialized or this pointer’s Pointee type must be a trivial type.

Do not assign an instance of a nontrivial type through pointee to uninitialized memory. Instead, use an initializing method, such as initialize(to:count:).
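As a rough sketch of what those two passages permit for a trivial `Pointee` such as UInt8: only part of the block is written, nothing is deinitialized, and the block is deallocated directly. The capacity, the `used` count, and the stored values are made up for illustration.

```swift
let capacity = 256
let used = 16 // hypothetical number of entries that actually get written

let table = UnsafeMutablePointer<UInt8>.allocate(capacity: capacity)

// Write only the entries that are needed; the rest stays uninitialized.
for i in 0 ..< used {
    (table + i).initialize(to: UInt8(i))
}

// Reading is only valid for entries that were written above.
let lastWritten = table[used - 1]

// UInt8 is trivial, so no deinitialize call is required before this.
table.deallocate()
```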

Of course, it's fair to point out that there is currently no way to constrain an arbitrary generic parameter to be trivial, so one can't write statically correct generics in this manner. One could use a "closed protocol" hack (à la the "Emulating closed protocols in Swift" gist on GitHub), or a half-static, half-dynamic approach: an (open) protocol accompanied by preconditions that a given type parameter is one of the supported types. Users then get the help of static checking, while the library still defends against "malicious" conformances.
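The half-static, half-dynamic variant might look roughly like the sketch below. All names here (`TrivialElement`, `isSupported`, `RawTable`, `Rogue`) are hypothetical, and the set of supported types is an arbitrary choice for illustration.

```swift
// Open marker protocol: gives callers static checking.
protocol TrivialElement {}
extension UInt8: TrivialElement {}
extension UInt16: TrivialElement {}

// Dynamic check against the closed list of types the library supports.
func isSupported<T: TrivialElement>(_ type: T.Type) -> Bool {
    return type == UInt8.self || type == UInt16.self
}

struct RawTable<Element: TrivialElement> {
    let storage: UnsafeMutablePointer<Element>
    let capacity: Int

    init(capacity: Int) {
        // Defends against conformances added outside the library.
        precondition(isSupported(Element.self), "unsupported element type")
        self.capacity = capacity
        self.storage = .allocate(capacity: capacity)
    }

    // Elements are trivial, so no deinitialize call is needed.
    func free() { storage.deallocate() }
}

// A "malicious" conformance: compiles, but is rejected at run time.
final class Rogue: TrivialElement {}

let table = RawTable<UInt8>(capacity: 256)
table.free()
```

With this shape, `RawTable<Rogue>(capacity: 1)` type-checks but traps in the precondition before any memory is allocated.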


In any case, for your specific use-case: even just reading an uninitialised byte can lead to unsafe behaviour in seemingly safe code (you may be aware of this, but it was definitely surprising to me when I first encountered it), e.g.

let uninitialised = UnsafeMutablePointer<UInt8>.allocate(capacity: 1)
if uninitialised.pointee == 0 { print("zero") }
if uninitialised.pointee != 0 { print("nonzero") }

might print nothing at all. This means that your tree/table needs to be safe by construction (I guess the first level is constructed to direct values to arrays of exactly the right size in the second level?), or else someone may be able to trigger undefined behaviour with an appropriate input, since it sounds like there's no way at all to compute the size of the arrays to precondition that the accesses are valid.
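If the construction cannot be trusted completely, one defensive option is to fill the whole block with a known value before writing the real entries, so that an accidental read of padding is at least well defined. This is only a sketch; the capacity, the stride of 16, and the entry value are made up.

```swift
let capacity = 256
let table = UnsafeMutablePointer<UInt8>.allocate(capacity: capacity)

// Give every byte, including padding, a defined value.
table.initialize(repeating: 0, count: capacity)

// Hypothetical "relevant" entries, one every 16 slots.
for i in stride(from: 0, to: capacity, by: 16) {
    table[i] = 1
}

let padding = table[1] // defined: still the fill value
let entry = table[16]  // defined: one of the real entries

table.deallocate()
```

The cost is one extra pass over the buffer at construction time.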

Yes, the uninitialized bytes are used as padding so the relevant entries can be indexed in constant time. They are never read if the traversal code is working correctly.

A pure value and a trivial value aren't the same thing. Huon's reference to trivial values is correct. IIUC, pure values have to do with substitutability: a value can always be replaced by a copy of that value. Its implementation may still use a CoW storage reference, so it's not trivial. We don't normally think about the semantics of trivial values in the language, but they do have important properties that affect Unsafe APIs.

It's useful to constrain types to be trivial for performance. The compiler already has the ability to support the "_Trivial" constraint. See Generics.rst: Specialization. This use case is an interesting argument for exposing that as an official part of the type system. I wouldn't have thought that was very high priority, but it sounds like a few power users are bumping into this.


I understand trivial types as types whose whole value lives inline in one contiguous memory area, with no references to storage elsewhere, i.e.:

struct Struct {}
class Class {}

// Trivial type: stores no references
struct TrivialFoo {
    let foo: Struct = Struct()
}

// Non-trivial value type: stores a class reference
struct NonTrivialFoo {
    let foo: Class = Class()
}
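One way to check this distinction is the standard library's `_isPOD` function. It is underscored, so treat it as an unofficial helper that may change; the type names below are hypothetical.

```swift
struct Empty {}
final class Box {}

// Stores only another trivial struct.
struct HoldsStruct {
    let value = Empty()
}

// Stores a class reference, which must be retained and released.
struct HoldsClass {
    let value = Box()
}

print(_isPOD(HoldsStruct.self)) // true
print(_isPOD(HoldsClass.self))  // false
print(_isPOD(UInt8.self))       // true
print(_isPOD([UInt8].self))     // false: Array holds a storage reference
```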

I thought pure value type meant a type that does not wrap any reference types.

Ditto.
Hmm. I wonder how common trivial types are versus non-trivial value types. We could probably survey popular projects with some automated tool (can libSyntax make sense of “value type” and “reference type”?). My guess is that in Swift, trivial types are by far the most common, followed by non-trivial value types and then by reference types.

Reading the aforementioned thread, I can see where the confusion came from. There's no accepted definition of a "pure" value though, only a pure function. I can think of three properties that are interesting:

(1) Trivial values don't contain any references. They can be bitwise copied and don't need to be deinitialized.

(2) Values that don't reference shared mutable state (PureValue??). Such a value may contain a reference, but we have a guarantee that writes to the referenced memory aren't visible via another reference. This is really a statement about mutability that could allow the compiler to infer function purity.

(3) The substitutable property of copies. I think this is a tautology from the definition of a value. The CoW storage is part of the value, any other references to shared objects are not.
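A small sketch of how properties (2) and (3) differ in practice, using a hypothetical `Counter` class and `SharesState` struct: an array's CoW storage behaves as part of the value, while a stored class reference exposes shared mutable state.

```swift
// (3) CoW storage is part of the value: the copy is independent.
let original = [1, 2, 3]
var copy = original
copy[0] = 99
// original is still [1, 2, 3]

// Violates (2): both copies reference the same mutable object.
final class Counter { var value = 0 }

struct SharesState {
    let counter = Counter()
}

let first = SharesState()
let second = first
second.counter.value += 1
// The write through `second` is visible through `first`.
let seenThroughFirst = first.counter.value
```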

This is an extremely narrow definition to the degree that I don't think it is very useful. For example, many (most?) of the value-semantic types vended by the standard library and Foundation use object references in their implementation.


Do you have a specific design for this in mind?

I don't have a specific proposal. The idea is simply that if I have some generic code:

protocol P : PureValue {
  associatedtype R
  func method() -> R
}

func foo<T : P>(t: T) -> T.R {
  return t.method()
}

The compiler can infer that foo is a pure function without fully specializing it.

You could probably accomplish the same by qualifying all of the protocol's methods as "pure" instead.


This sounds to me a lot like some parts of what Ownership Semantics allows you to declare. Does this have some overlap with that, perhaps?

Introducing MoveOnly types would be an alternative to CoW as another way to ensure that a reference is unique. So that would be another way that a type could contain a reference to a storage object that is semantically part of the value.
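For illustration only, here is roughly what that could look like with the noncopyable types that later shipped in Swift 5.9 (written `~Copyable`). The sketch assumes Swift 5.9 or later; the `UniqueBytes` type and the `roundTrip` function are hypothetical names.

```swift
// Assumes Swift 5.9 or later. The struct cannot be copied, so its
// storage pointer is unique without any CoW bookkeeping.
struct UniqueBytes: ~Copyable {
    private let storage: UnsafeMutablePointer<UInt8>
    let capacity: Int

    init(capacity: Int) {
        self.capacity = capacity
        storage = .allocate(capacity: capacity)
        storage.initialize(repeating: 0, count: capacity)
    }

    subscript(index: Int) -> UInt8 {
        get {
            precondition(index >= 0 && index < capacity)
            return storage[index]
        }
        set {
            precondition(index >= 0 && index < capacity)
            storage[index] = newValue
        }
    }

    // Runs once, when the unique owner goes away.
    deinit {
        storage.deinitialize(count: capacity)
        storage.deallocate()
    }
}

// Hypothetical usage: write one byte and read it back.
func roundTrip() -> UInt8 {
    var bytes = UniqueBytes(capacity: 4)
    bytes[0] = 7
    return bytes[0]
}
```

Because the value cannot be copied, the storage object is semantically part of the value without a uniqueness check on each write.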
