RFC: Data-race safety chapter in the TSPL Language Reference

Hello!

I'd like to add a comprehensive chapter on data-race safety to the TSPL Language Reference. The Language Reference is meant to describe every aspect of the Swift language in complete detail, but it makes no attempt to be an instructional text. This makes it the perfect place for an exhaustive list of data-race safety rules. There is currently no single place where all data-race safety rules are documented together, and it's extremely difficult to piece together the model based on Swift Evolution proposals because later proposals change the rules from earlier proposals. This comprehensive documentation can also be used as a reference when writing introductory content for the Language Guide on recent SE proposals.

I have a very, very rough and incomplete draft of the chapter in a swift-book PR here: Add a chapter to the Language Reference for data-race safety. by hborla · Pull Request #372 · swiftlang/swift-book · GitHub. I would love some early feedback on any of the following topics:

  1. The writing style and depth level. This is not an introduction to concurrency, but is it understandable to someone who is experienced with Swift's concurrency model?
  2. The balance of examples to text. I personally think there are not enough examples right now, but the Language Reference tends to not include many examples in its current content (leaving that primarily to the Language Guide).
  3. Missing content that I did not mention in the PR description.
  4. Any other feedback you might have!

Feel free to leave feedback in this discussion thread, or if you have more specific/editorial feedback, in a code review on my PR!

-Holly

identifying and diagnosing risk of concurrent access to shared state

At the risk of performing the grammatical equivalent of bikeshedding, should that be "diagnosing risks"?

(I'm very interested in this chapter and will read it over as someone who half understood this stuff from picking through evolution documents.)

Thanks for taking on this effort!

I haven’t read through it yet, but I want to mention something I’ve experienced previously with other write-ups about concurrency.

And that is, oftentimes there will be lots of examples showing what not to do: “Don’t do ABC, it causes a data race”, “Don’t do XYZ, it involves overlapping access to memory”, etc.

That’s all fine as far as it goes, but speaking for myself, when I read documents like that I find myself thinking, “Okay, now I know lots of things not to do, but I still don’t know the right thing to do.”

So (and maybe you’ve already done this, like I said I haven’t read through it yet) I think it would be most beneficial to include several examples of how to properly use Swift concurrency features to achieve common goals, especially when it may not be obvious how to maintain data-race safety while doing so.

In other words, take some common scenarios where people tend to violate data-race safety, and rather than just saying “This is against the rules, don’t do it”, instead demonstrate what a proper idiomatic solution looks like to achieve the desired outcome while following the rules.
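As one illustration of the kind of example I mean, here is a minimal sketch of a common scenario: several concurrent tasks updating a shared counter. Rather than sharing a mutable variable across tasks, the state lives inside an actor. The names `Counter` and `countConcurrently(times:)` are hypothetical and are not taken from the draft chapter.

```swift
// Hypothetical names, for illustration only.
// The mutable state is isolated to the actor, so concurrent tasks
// cannot access it at the same time.
actor Counter {
    private var value = 0

    func increment() {
        value += 1
    }

    func current() -> Int {
        value
    }
}

// Starts `times` child tasks that each increment the same counter.
func countConcurrently(times: Int) async -> Int {
    let counter = Counter()
    await withTaskGroup(of: Void.self) { group in
        for _ in 0..<times {
            // Each child task has to `await` its turn on the actor.
            group.addTask { await counter.increment() }
        }
        // The group waits for all child tasks before returning.
    }
    return await counter.current()
}
```

Because every increment runs on the actor, the final count should equal the number of child tasks, whatever order they happen to run in.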


I completely agree that this is an issue with a lot of concurrency write-ups, and that it can make it difficult to understand how to properly use concurrency features. That said, the reference that I wrote definitely is "just" a comprehensive list of the data-race safety rules. This is what I wrote in my PR description about the intended purpose of this reference:

This is meant to be an exhaustive list of semantic rules that define away low-level data races in Swift code. This is not meant to be an introduction to concurrency -- that's the job of the Concurrency chapter of the Language Guide -- and it's not meant to contain all semantic rules about the concurrency model (such as when async functions are guaranteed to not suspend dynamically, etc). This reference will subsume the data-race safety reference in the Swift migration guide, and is largely inspired by the content there, originally written by @mattmassicotte (thank you!)

I think this documentation is appropriate for the TSPL Language Reference based on the statement in the style guide on the purpose of the reference:

Language Reference, commonly referred to as “the reference”, describes every aspect of the Swift language in complete detail, but it makes no attempt to be an instructional text. Its material is ordered according to the shape of the formal grammar, and it hand-waves over examples and applications. Several places explicitly link back to the guide for examples. It doesn't need to be as approachable for beginners, because the guide handles that, but it does need to be accurate and unambiguous, shining its flashlight into infrequently explored areas of the language. To accomplish that, it sometimes must sacrifice approachability or user-friendliness. That's ok — many readers won't even need the reference, but if the reference is unclear, the readers who need an answer have nowhere else to go.

I think there are two places that should serve as the more instructional resources on how to make the best use of concurrency features and how to resolve data races:

  1. The Concurrency chapter in the Language Guide
  2. The documentation for understanding and resolving data-race safety errors (at Documentation)

I think there is still opportunity to incorporate some of that into the Language Reference. At the very least, every data-race safety error shown in the reference could link out to the documentation about resolving the specific error, which will contain various strategies for changing your code to eliminate the data race.


I have two questions/comments on the Isolation Domains section (a small section). They're not about the actual rules, but rather about the accuracy of statements related to the term "isolation domain".

Q1) Does a task provide an isolation domain?

Tasks and actors provide isolation domains

I haven't read before that a task provides an isolation domain. In my understanding, a task is associated with an isolation domain at a specific time, but it doesn't provide one. It's actors (more accurately, executors, including the global executor) that do. So I searched the SE proposals. The only one that implies a task has an isolation domain is SE-0414:

[{(a), Task1}]: A single region that is part of Task1's isolation domain.

So, does it refer to the isolation provided by the global executor?

Q2) Is Non-isolated an isolation domain?

Every declaration in Swift code has a specific isolation domain. There are three kinds of isolation domains:

  1. Non-isolated

In my understanding, while Non-isolated describes a value's isolation status, it isn't an isolation domain itself.

Two more comments.

  1. The "Isolation Regions" section has the following description of how regions are merged by function calls:

A function implementation can create references and access paths between its argument and result values. By default, a function call causes all non-Sendable arguments and result values to merge into one region. If the function is isolated to an actor, the values are merged into the actor's region. If the function is non-isolated, the values are merged into a larger region that is disconnected from any actor.

The description focuses on the function's isolation and skips the isolation of the parameters and return value. Could that mislead users? For example, if a non-isolated synchronous function takes an actor's property as a parameter, the merged region is an actor-isolated region rather than a disconnected region. The original proposal defines the rules based only on the isolation of the parameters and return value, which I think is a better approach because it covers the difference between an actor-isolated function and a non-isolated function (the implicit self parameter is in the actor's isolation).
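To make the two cases concrete, here is a minimal sketch of how I read the rules, assuming Swift 6 region-based isolation checking (SE-0414). `Box`, `combine(_:_:)`, `Ledger`, and `demo()` are hypothetical names, and the region annotations in the comments reflect my understanding rather than anything stated in the draft.

```swift
// Hypothetical non-Sendable type, for illustration only.
final class Box {
    var n: Int
    init(_ n: Int) { self.n = n }
}

// A non-isolated synchronous function. After a call, its non-Sendable
// arguments are treated as belonging to one region.
func combine(_ a: Box, _ b: Box) {
    a.n += b.n
}

actor Ledger {
    private var stored = Box(10)

    // `stored` belongs to the actor's region, so this call ties `incoming`
    // to the actor as well, even though `combine` is non-isolated.
    func absorb(_ incoming: Box) -> Int {
        combine(stored, incoming)
        return stored.n
    }
}

func demo() async -> Int {
    let a = Box(1)
    let b = Box(2)
    combine(a, b)   // a.n is now 3; `a` and `b` share one disconnected region
    let ledger = Ledger()
    // Sending `a` to the actor sends its whole region, including `b`.
    // Using `a` or `b` after this call would be diagnosed in Swift 6 mode.
    return await ledger.absorb(a)
}
```

In this sketch the same non-isolated function produces a disconnected region in `demo()` and an actor-isolated region inside `Ledger`, which is why I think the rule reads better when stated in terms of the arguments' regions.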

  2. The "Non-isolated" section has the following rules:

It is safe to mark a variable as non-isolated in the following cases:

The variable is a let-constant with a type that conforms to Sendable.
The variable is a property of a non-Sendable type.
The variable is a property of a struct and the type of the property conforms to Sendable.

The result of the code below doesn't completely match item 2. Is it a bug in the implementation, or does the text need to be modified to be complete?

@MainActor
class NonSendable {
    var value: Int = 0
    nonisolated var id: Int = 0 // This doesn't compile; it has to be a "let" constant.
}
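
For comparison, the first rule in the list (a let-constant with a type that conforms to Sendable) can be sketched as follows. This is a minimal sketch that uses an actor rather than a @MainActor class, and `Inventory`, `id`, `count`, and `label` are hypothetical names that do not appear in the draft.

```swift
// Hypothetical type and names, for illustration only.
actor Inventory {
    // An immutable stored property whose type conforms to Sendable.
    nonisolated let id: Int
    // Mutable state stays isolated to the actor.
    var count: Int = 0

    init(id: Int) {
        self.id = id
    }

    // A nonisolated computed property can read `id`,
    // because `id` is itself nonisolated.
    nonisolated var label: String {
        "inventory-\(id)"
    }
}

let inventory = Inventory(id: 7)
// Both reads are synchronous; no `await` is needed for nonisolated members.
print(inventory.id, inventory.label)
```

Reading `count` from the same place would require `await`, since it remains isolated to the actor.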