Circular Buffer

I don't have much experience with circular buffers, but from what I've gathered from reading, it's a double-ended queue that dequeues elements when it's at capacity. Is that correct?

When implementing an AudioUnit plug-in, at least in the past, your code would be called with a source pointer, a destination pointer, and a number of samples to process. So I'm not sure producer/consumer sequencing was an issue you had to deal with: you were explicitly 'mapping' from a source block to a destination block, and the circular buffer you used to do the actual work was isolated from both.

I've written a number of AudioUnit plugins that use a circular buffer internally. The 'write' position always advances one sample at a time, but depending on the effect, the position of the 'read' head can move by any number of steps, or even just be indexed directly.
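
To make that concrete, here's a minimal sketch of the pattern described above (not code from any actual plug-in): a fixed-size delay line whose write index advances one sample per call while the read index trails it by a variable offset.

    // Illustrative only: a fixed-size delay line, the simplest audio use of a
    // circular buffer. The write head advances one sample at a time; the read
    // head is indexed directly, `delay` samples behind the write head.
    struct DelayLine {
        private var storage: [Float]
        private var writeIndex = 0

        init(capacity: Int) {
            storage = Array(repeating: 0, count: capacity)
        }

        /// Writes one input sample and returns the sample `delay` steps behind it.
        mutating func process(_ sample: Float, delay: Int) -> Float {
            storage[writeIndex] = sample
            // Wrap the read position around the end of the buffer.
            let readIndex = ((writeIndex - delay) % storage.count + storage.count) % storage.count
            writeIndex = (writeIndex + 1) % storage.count
            return storage[readIndex]
        }
    }

    var line = DelayLine(capacity: 44_100)            // one second at 44.1 kHz
    print(line.process(0.5, delay: 2_205))            // reads ~50 ms behind the write head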

Sure, a closed pipeline could easily be a much better approach. Nonetheless, my other point still stands:

Maybe this should be separated into two distinct types: a CircularArray, which automatically grows, and a CircularBuffer, which overwrites old elements when full.
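
If that split were made, the difference would show up in append: one type grows its storage like Array, the other overwrites the oldest element once full. Here is a self-contained sketch of the overwriting behaviour (the type and member names are hypothetical, not a proposed API):

    // Hypothetical sketch of the "overwrites when full" semantics; a growing
    // CircularArray would instead keep behaving like Array.append past capacity.
    struct OverwritingRing<Element> {
        private var storage: [Element] = []
        private var head = 0            // index of the oldest element
        let capacity: Int

        init(capacity: Int) { self.capacity = capacity }

        mutating func append(_ element: Element) {
            if storage.count < capacity {
                storage.append(element)      // still room: behave like Array
            } else {
                storage[head] = element      // full: overwrite the oldest element
                head = (head + 1) % capacity
            }
        }

        /// Elements from oldest to newest.
        var elements: [Element] {
            Array(storage[head...] + storage[..<head])
        }
    }

    var ring = OverwritingRing<Int>(capacity: 3)
    (1...5).forEach { ring.append($0) }
    print(ring.elements)   // [3, 4, 5]; 1 and 2 were overwritten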

3 Likes

Hi @kitaisreal,

Thanks for looking into this!

A circular buffer is definitely something that the standard library needs, as it's a good foundation for many other data structures. It should be the first of several data structures that belong in the standard library, since as you mention it would be useful in the implementation of the standard library itself.

That said, I think we need a slightly different approach to the development of data structures than the one we've traditionally had with new algorithms when proposing additions to the standard library. Specifically, they need a more detailed design phase before moving to a pitch, probably as a stand-alone package. The prototype needs to be coupled with a design document outlining exactly how the implementation will be achieved and the design trade-offs made. Discussion of the design should probably happen somewhere other than the evolution pitches forum topic (possibly the Standard Library developer topic, or maybe on GitHub).

A big reason for this is the way data structures stabilize their ABI, and how this is very different to how algorithms do so. Since the standard library is ABI-stable, any additions will need care taken in stabilizing their ABI.

Right now, unless you want to keep your data structure fully resilient, implementing an ABI-stable data structure is a bit of a dark art. You need to spend a lot of time thinking about which parts of your data structure are going to be frozen and which aren't, and which methods should be inlinable and touch those frozen internals. This means spending time trading off future enhancements against performance today. It also involves trade-offs around the size of the user's binary: sometimes if you inline too much it has really bad consequences for code size.
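
For a sense of what that looks like in source, here is a purely illustrative declaration (not taken from the standard library; the attributes only take effect when the module is built with library evolution enabled). Once a struct is @frozen its stored layout is fixed forever, and once a method is @inlinable its body is compiled into client binaries and can't be fixed for clients that have already shipped:

    // Purely illustrative; not a standard-library declaration. These attributes
    // only take effect when building with library evolution enabled.
    @frozen
    public struct RingIndex {
        // The frozen layout exposes this stored property's existence and order;
        // it can never be removed or reordered without breaking ABI.
        @usableFromInline
        internal var rawValue: Int

        @usableFromInline
        internal init(rawValue: Int) { self.rawValue = rawValue }

        // The body below is emitted into client binaries, so a bug fix here
        // never reaches clients compiled against the old version.
        @inlinable
        public func advanced(by n: Int, wrappingAt capacity: Int) -> RingIndex {
            RingIndex(rawValue: (rawValue &+ n) % capacity)
        }
    }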

Very little of this is currently documented because the techniques needed are still emerging: the engineers who work on the standard library have only been doing it in an ABI-stable context for the past year. Hopefully the design of new data structures will be an opportunity for those engineers to share what they've learned with the rest of the community.

Packages, by contrast, have none of these problems (at least until we start distributing binary packages). They are never ABI-stable and get shipped with the user's binary (just like the standard library used to be). This gives you a lot more freedom to iterate, even after you release a 1.0 that you stand behind and that people are using. With the standard library, on the other hand, you may only get one shot to ever ship a version, because performance for a type like this is critical, so enough of it gets marked fragile (@frozen, @inlinable, etc.) in the first version that you can never change it later.

For these reasons, I'd recommend iterating on some possible implementations before moving to the pitch-and-PR-against-the-standard-library phase. Ideally this would start from a battle-tested base like the one found in NIO. It should also demonstrate performance, where possible, versus equivalent types in the standard library. For example, appending to a circular buffer could be benchmarked against appending to Swift's Array type.
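
As a rough illustration of that kind of comparison (plain wall-clock timing only, not the official benchmark suite; the circular-buffer half is left commented out because the concrete type and its initializer depend on which implementation is being measured):

    import Dispatch

    // Rough wall-clock timing only; not a substitute for a proper benchmark harness.
    func measure(_ label: String, _ body: () -> Void) {
        let start = DispatchTime.now().uptimeNanoseconds
        body()
        let elapsed = DispatchTime.now().uptimeNanoseconds - start
        print("\(label): \(Double(elapsed) / 1_000_000) ms")
    }

    let n = 1_000_000

    measure("Array.append") {
        var array: [Int] = []
        array.reserveCapacity(n)
        for i in 0..<n { array.append(i) }
    }

    // Hypothetical: substitute whichever circular-buffer implementation is being
    // evaluated (for example NIO's CircularBuffer, which takes initialCapacity):
    // measure("CircularBuffer.append") {
    //     var ring = CircularBuffer<Int>(initialCapacity: n)
    //     for i in 0..<n { ring.append(i) }
    // }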

Hope this helps. I, or some of the other engineers who've worked on this, would be happy to go into more detail about any of this if needed.

Cheers,
Ben

18 Likes

Has the core team given consideration to making a "staging ground" portion of the standard library that would not be ABI-stable, and thus would be included within apps? (It would also, emphatically, still be part of the standard library.)

If we had such a component, then data structures like this could have their API designed, pitched, implemented, and accepted, without needing to lock down the ABI immediately. They could be incorporated into the staging ground portion of the standard library, whereupon the ABI experts could go to work improving and fine-tuning the details.

Developers could use these data structures just fine since they would be part of the standard library, and the staging ground would become part of their apps just like the standard library used to.

Then, when the ABI experts are satisfied that they have made all the right decisions and finalized any necessary tradeoffs for a data structure, it could be moved into the main, ABI-stable, portion of the standard library. This move may or may not require a separate proposal.

• • •

I know this idea has been discussed on the forums in the past (I suggested prior to ABI stability that Mirror could be given this treatment, since Chris Lattner and others had expressed that it was already known we would eventually want to overhaul it), but I don't know if the core team has weighed in on it.

10 Likes

Smalltalk, Objective-C, and Swift have core concrete collection types Array, Dictionary, and Set (the former two also have a Bag type), all named with a single word. Each language also has concrete variations with prefixes such as Weak…, Mutable…, and Contiguous…, which aren't unwieldy because the type name is then just "AdjectiveNoun".

If a circular buffer concrete type is added to Swift's stdlib, I suggest it be named with a single noun, such as Ring.

Further, I would avoid the word Buffer since every Swift stdlib type that currently has the word Buffer in it deals with raw memory.

16 Likes

Is there some kind of documentation of future directions for the standard library? It seems odd to me that we would be discussing adding new containers one at a time. I would expect to see a higher-level discussion of the kinds of containers that are desirable to add, resulting in a prioritized list of containers. After that it would make sense to entertain discussions of the individual implementations.

There are dozens of Swift containers available on GitHub and elsewhere, including at least a dozen circular buffer implementations. There is also the Swift Algorithm Club, which has a large library of containers and other algorithms.

Because things seem to be moving so slowly with adding new containers to the Swift standard library, I had assumed that either new containers weren't desired (we just need the containers from Foundation) or something is wrong with the process.

Anyway, what's going on?

2 Likes

Packages are the preferred solution to this problem.

2 Likes

Just looked through this thread and the current code.

In some of the early posts, the posters seemed confused about the semantics of the type, as well as some of the naming. I think part of the problem is that the author is also confused about some of the semantics, which makes the result a muddled mess.

As I understand it, the capacity parameter for a RangeReplaceableCollection type is for internal use only; it has no observable effect. Even if the type actually uses that capacity, it has to ignore it and grow whenever clients request more elements than the starting capacity allows. So you have two concepts of capacity: the internal one already mentioned, and a public capacity. Your public capacity limits the number of elements an instance can have. The problem is that you're trying to make the internal-use capacity do double duty as the public capacity, which makes everything confusing.

Suggestions:

  1. Rename the public capacity as a "maxCount." This value is kept independent of the existing RRC capacity. If the RRC capacity is set lower than maxCount, that request is ignored and maxCount is used instead. If maxCount is set to a higher value, increase the RRC capacity to match as necessary.

  2. The default initializer delegates to the initializer that takes a maxCount, with the value set to Int.max.

  3. Add prepend methods, both single element and sequence-sourced.

  4. For now assume that the new prepend methods and all the existing non-capacity RRC methods channel into replaceSubrange(_: with:). Your implementations don't actually have to, we'll work with an "as-if" rule to make everything easier.

  5. Make the type its own SubSequence.

  6. These would be the new members:

     public init(maxCount: Int)
     public var isFull: Bool { get }
     public var maxCount: Int { get }
     public mutating func resize(newMaxCount: Int)
     public mutating func subsumeAsNewFirst(_ newElement: Element)
     public mutating func subsumeEachIntoFirst<S: Sequence>(_ newElements: S) where S.Element == Element
     public mutating func subsumeAsNewLast(_ newElement: Element)
     public mutating func subsumeEachIntoLast<S: Sequence>(_ newElements: S) where S.Element == Element
    

    (There are no "pop" methods because of #5.) Note that I used "first" and "last" to maintain the existing Collection terminology.

  7. You have to document which elements are obliterated if resize is called with a count smaller than the current count. You also have to document the order in which the "subsumeEach..." methods feed the new elements, as-if, into the corresponding single-element method; that order implies which of the new elements are kept if the new sequence is longer than maxCount, and determines whether they end up backwards or forwards relative to their original order. But I feel like the sequence-based variants should be dropped, because of the potentially confusing semantics.

  8. When replaceSubrange(_: with:) ends up making a net gain of elements, the maxCount increases by the difference! There is no change to maxCount upon decreases or length-neutral events. The overrunning behavior only occurs when calling the "subsume..." methods.

  9. Implement removeAll(keepingCapacity:) to make sure a reset always preserves the pre-call maxCount.

Separating the two kinds of capacity should make the semantics clearer.
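
To make the suggested split concrete, here is a rough, Array-backed sketch of points 1, 2, 6, 8, and 9 above (the names come from the suggestion, not a pitched API; a real implementation would use ring storage rather than removeFirst and would conform to RangeReplaceableCollection):

    // Rough sketch only; Array-backed for brevity, so removeFirst is O(n).
    struct BoundedRing<Element> {
        private var storage: [Element] = []
        private(set) var maxCount: Int

        /// Point 2: the default initializer delegates with maxCount = Int.max.
        init(maxCount: Int = .max) { self.maxCount = maxCount }

        var isFull: Bool { storage.count == maxCount }

        /// Point 6: appends a new last element, discarding the current first
        /// element if the ring is already full.
        mutating func subsumeAsNewLast(_ newElement: Element) {
            if isFull { storage.removeFirst() }
            storage.append(newElement)
        }

        /// Point 8: the RangeReplaceableCollection-style append grows maxCount
        /// on a net gain of elements instead of overwriting anything.
        mutating func append(_ newElement: Element) {
            if isFull { maxCount += 1 }
            storage.append(newElement)
        }

        /// Point 7: shrinking drops the oldest elements (a choice that must be documented).
        mutating func resize(newMaxCount: Int) {
            if storage.count > newMaxCount {
                storage.removeFirst(storage.count - newMaxCount)
            }
            maxCount = newMaxCount
        }

        /// Point 9: a reset preserves the pre-call maxCount.
        mutating func removeAll(keepingCapacity keepCapacity: Bool = false) {
            storage.removeAll(keepingCapacity: keepCapacity)
        }
    }

    var ring = BoundedRing<Int>(maxCount: 3)
    (1...3).forEach { ring.subsumeAsNewLast($0) }
    ring.subsumeAsNewLast(4)     // drops 1; count and maxCount stay at 3
    ring.append(5)               // net gain: maxCount grows to 4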

2 Likes

Big thanks for the review. I definitely agree that new data structures require careful design for ABI stability.

I plan to add benchmarks for popular implementations (see below) of this kind of data structure, covering common cases such as a queue implementation, and also to test them against existing stdlib data structures where possible. As I understand it, benchmarks can be implemented on top of https://github.com/apple/swift/tree/master/benchmark.

The most common concern with a RingBuffer is whether it should provide the ability to overwrite old elements when it is at capacity. If this feature can be eliminated, the implementation and design become much simpler. I am not sure whether such questions should be discussed in standard library pitches or whether this is just an implementation detail.

As existing solid and tested implementations, I found these:

  1. https://github.com/apple/swift-nio/blob/master/Sources/NIO/CircularBuffer.swift
  2. https://github.com/attaswift/Deque (a double-ended queue type in pure Swift)

Has there been any movement on this since March? I searched the forums and couldn't find any other references to this.

1 Like