Circular Buffer

kitaisreal · March 14, 2020, 10:52am

Circular Buffer

Proposal: SE-CircularBuffer
Authors: Maksim Kita
Review Manager: TBD
Status: Awaiting implementation
Implementation: apple/swift#30242, apple/swift-evolution-staging#1
Bugs: SR-8190, SR-6868, SR-11956

Introduction

Introduce the CircularBuffer collection type conforming to the RandomAccessCollection,
RangeReplaceableCollection and MutableCollection protocols. With random element
access and support for constant back and front element insertion and deletion.

Motivation

Swift currently does not have collection with both element random access
and constant O(1) elements back and front insertion and deletion. A good
usage examples for such collection are queue, deque, fixed length buffers.
Such abstractions cannot effectively be build on Array because of O(n) first element deletion.

Proposed solution

This proposal adds a CircularBuffer type to standard library.

Detailed design

CircularBuffer is generic collection with random element access and
constant O(1) elements back and front insertion and deletion.

public struct CircularBuffer<Element> : RandomAccessCollection, RangeReplaceableCollection {
    public init(capacity: Int)
    /// A boolean value indicating that CircularBuffer is full.
    public var isFull: Bool { get }
    
    /// A total number of elements that CircularBuffer can contain.
    public var capacity: Int { get }
    /// Resizes CircularBuffer capacity.
    public mutating func resize(newCapacity: Int)
    /// Pushes element to the back
    public mutating func pushBack(_ newElement: Element)
    /// Pushes elements to the back
    public mutating func pushBack<S>(contentsOf newElements: S) where Element == S.Element, S : Sequence
    /// Pushes element to the front
    public mutating func pushFront(_ newElement: Element)
    /// Pushed elements to the front
    public mutating func pushFront<S>(contentsOf newElements: S) where Element == S.Element, S : Sequence
    /// Removes and returns element from the back
    public mutating func popBack() -> Element
    /// Removes and returns element from the front
    public mutating func popFront() -> Element
}

CircularBuffer conforms to RandomAccessCollection, RangeReplaceableCollection, CustomStringConvertible, CustomDebugStringConvertible,
ExpressibleByArrayLiteral and also to Equatable and Hashable when its Element type conforms to it.

Capacity

Capacity is essential part of RingBuffer, not just for holding elements but also for
implementing behaviour of rewriting old data when buffer is full. These methods:

public mutating func pushBack(_ newElement: Element)
public mutating func pushBack<S>(contentsOf newElements: S) where Element == S.Element, S : Sequence
public mutating func pushFront(_ newElement: Element)
public mutating func pushFront<S>(contentsOf newElements: S) where Element == S.Element, S : Sequence

does not increase capacity automatically. When CircularBuffer is full, new data will be written
to the beginning of the buffer so old data will be overwritten.

var circularBuffer = CircularBuffer<Int>(capacity: 2)
circularBuffer.pushBack(1)
circularBuffer.pushBack(2)
circularBuffer.pushBack(3)
print(circularBuffer)
// Prints "[2, 3]"

Circular Buffer capacity can be increased with manual call
to resize(newCapacity:).

Also CircularBuffer support RangeReplaceableCollection so client can
increase capacity using methods that increase element count of
generic RangeReplaceableCollection collection append, append(contentsOf:),
replaceSubrange(subrange:, with:). CircularBuffer will grow exponentially
using grow factor 1.5.

Example of usage

var circularBuffer = CircularBuffer<Int>(capacity: 2)
circularBuffer.pushBack(1)
circularBuffer.pushFront(2)
print(circularBuffer)
// Prints "[2, 1]"
// Now buffer isFull so next writes will overwrite data at beggining
circularBuffer.pushFront(4)
print(circularBuffer)
// Prints "[4, 2]"
circularBuffer.pushBack(3)
print(circularBuffer)
// Prints "[2, 3]"
print(circularBuffer.popFront())
// Prints "2"
print(circularBuffer)
// Prints "[3]"
circularBuffer.popBack()
print(circularBuffer)
// Prints "[]"

Source compatibility

N/A

Effect on ABI stability

This proposal only makes additive changes to the existing ABI.

Effect on API resilience

All the proposed additions are versioned.

Alternatives considered

Naming

There can be a lot of possible names for this kind of structures like circular buffer,
circular queue, cyclic buffer or ring buffer. The name CircularBuffer is one that is used
in most of the sources so it was picked.

Lantua · March 14, 2020, 12:55pm

It’d be better to use append for pushBack and add prepend for pushFront, and not to override the existing data.

CircularBuffer is used only for fixed capacity anyway, adding elements beyond the capacity should be deemed as logic error.

cukr · March 14, 2020, 2:29pm

I am not against this pitch/proposal, I have just a few questions

Is it by design or a bug that reserveCapacity works differently from an Array?

var x: Array = [1, 2, 3]
var y: CircularBuffer = [1, 2, 3]
x.reserveCapacity(10)
y.reserveCapacity(10)
print(x.capacity) // 10
print(y.capacity) // 13

I don't like the fact that method changing the capacity is called resize because it implies that you change the size of CircularBuffer's size, but in reality it changes the size of CircularBuffer's buffer (another unfortunate name)

Could you share your thoughts about removeFirst and removeLast being duplicated with popBack and popBack? Was it just to preserve symmetry of names with pushFront pushBack? If so, why is there no front and back properties?

kitaisreal · March 14, 2020, 3:14pm

Hi, big thanks for review.

reserveCapacity is bug, I mistakenly implemented it like reserve additional capacity. Already updated CircularBuffer by kitaisreal · Pull Request #1 · apple/swift-evolution-staging · GitHub.
Yes, I also had some concerns with this method name, I added newCapacity argument name to make it less confusing for client resize(newCapacity: Int). So it will be used like:

var x: CircularBuffer = [1, 2, 3]
x.resize(newCapacity: 5)

There is also option to add it into stored property capacity setter, but It seems like this action doing a lot of work, and client should be aware of it. There also could be an option to name this method setCapacity() but it is not swift style.
3. Related to CircularBufferBuffer name its private and it looks like default naming convention for COW like collections in swift, they all having underlying buffer with (CollectionName)Buffer name.
4. Yes. We definitely need pushFront or prepend like @Lantua suggested because RangeReplaceableCollection does not provide method for insert to beginning. Then I added pushBack to be consistent with it and also to have a method of inserting to the end of the buffer without growing its capacity. I does not wanted to duplicate all RangeReplaceableCollection properties and methods, popBack and popFront methods were added for consistency with pushBack and pushFront methods.

After reading @Lantua comment it makes sense for CircularBuffer not to rewrite its old data, I could not find example where this feature is actually necessary. In such case we can have only prepend as additional method.

xwu · March 14, 2020, 5:05pm

Overall, I think there's merit to considering whether additional fundamental collection types are appropriate to add to the standard library, and certainly a circular buffer is one such candidate.

This is one of those types where there are several viable implementations already out there. As we get into design details, nothing can beat the lived experience of using these implementations. My opinion would be that it would be probably wise to iterate on any implementation over time, and that the standard library should ultimately adopt the product of such iteration rather than reinventing it anew. Of course, there is now the preview library, but some level of maturity even before we get to that stage can only be a bonus.

jims · March 14, 2020, 5:06pm

First, although I mention a bunch of things to think about, I like this and want one!

The mutating and accessing functions' names seem mostly redundant and confusing. They are clear in a vacuum, but since this conforms to RandomAccessCollection it already has…

func popFirst() -> Self.Element?
func removeFirst() -> Self.Element
func removeFirst(Int)
func popLast() -> Self.Element?
func removeLast() -> Self.Element
func removeLast(Int)

These are available when the collection is a SubSequence which raises the question… "Why isn't this a SubSequence?". It appears from your example that ordering is guaranteed. I might want to pass this to something which takes a Sequence.

For resizing, it appears some ways of adding elements will trigger a resize and some fail. That seems odd. Maybe double to push API and a push without a throw resizes and one with throws an error?

Notice the variants of popFirst and removeFirst? You get to choose if empty should return a nil or, well the documents don't really say what removeFirst does if the collection is empty, there is a "must not call" phrase. I assume it panics and gets you a 1 star review.

Whatever happens for naming, please think ahead to a future ConcurrencySafeCircularBuffer where you might want to block if the buffer is full when you push or empty when you pop. Maybe that means that CircularBuffer is a protocol which can be adopted by ConcurrencySafeCircularBuffer. When thinking ahead to concurrency, no common use should require more than one API call. Any time code does a "check if would work" followed by a "do it" that requires the code to also add explicit locking around the calls which turns the code into a mess. In that vein, the push functions which take sequences are great, because although they could just be an extension to a bunch of single element pushes, putting them in the API lets a concurrent implementation correctly get them all adjacent.

The optional results on pop handle the empty buffer well, for the "push on full" case I'd like a throwing version. That's easy to add with an extension, but would be better in the protocol because that future concurrent safe version I imagined would not be safe with an extension that does a "check for space then push" type of logic, it would be a race.

kitaisreal · March 14, 2020, 5:39pm

Hi, thanks for review.
Related to names we only need to provide method for inserting element to the beginning because RangeReplaceableCollection does not provide such method, so pushFront was introduced. Related to other pushBack war introduced for cases when we need to override old data when buffer is full. @Lantua already mentioned that this case is almost never exists, and it will be better to just introduce prepend for adding elements to beginning. In such case we would make interface much easier without confusing client with method that can increase capacity and other that will overwrite old data. I also waiting for more feedback related to this point, because maybe there are cases where such behaviour can be usefull.

Maybe I didn't get that question but related to conform RandomAccessCollection, this methods:

func popFirst() -> Self.Element?
func removeFirst() -> Self.Element
func removeFirst(Int)
func popLast() -> Self.Element?
func removeLast() -> Self.Element
func removeLast(Int)

Have default implementations only when Collection.Subsequence = Self but default implementation is not performance optimal when we are implementing for example Array or CircularBuffer
swift/Collection.swift at main · apple/swift · GitHub. For this methods implementations are provided by CircularBuffer to be most efficient.

Related to popFirst and removeFirst methods. Documentation for standard library removeFirst says that The collection must not be empty. it does not specify what will be if you execute it with empty collection, there are additional method to get first elements safe popFirst.

I am not sure that I correctly understand points related to ConcurrentCircularBuffer version. As I understand we can make ConcurrentSafeCircularBuffer on top of this structure using some synchronization primitives Locks, ConditionalVariables, etc. without making spefic changes CircularBuffer api.

johannesweiss · March 15, 2020, 11:42am

Just to mention that SwiftNIO contains a CircularBuffer data structure too (API docs here).

SwiftNIO's has the following properties:

self-resizing (ie. never overrides existing elements)
opaque indices
self-slicing (ie. a slice of CircularBuffer<Element> is again a CircularBuffer<Element>)
implements Collection, MutableCollection, RandomAccessCollection, RangeReplaceableCollection
needless to say but a value type with copy-on-write

Lantua · March 15, 2020, 1:32pm

Also about the index.

Unless you always start with 0 (except for subsequence), and always increment by 1, you shouldn’t use Int. People easily assume that Int index has that behavior, despite it never being documented elsewhere.

Still, given how easy you can shift all indices by removing first elements, maybe avoid Int still.

Joseph_Heck · March 15, 2020, 2:50pm

I recently noticed/found it - and have been using it locally. It had a few surprising bits to it - extracting it (to use directly in my own project) from NIO required dealing with it's internal usage of Int24. I'm assuming that's part of it's optimizations specifically for NIO, but I've really enjoyed using it. (Admittedly, I'm using only a tiny, tiny portion of it's capabilities).

It has absolutely lovely tests as well, although I had to comment a number of them out (I pulled them into my project as well) since they rely on NIO specific helpers that I hadn't tracked down and pulled into my project as well.

johannesweiss · March 15, 2020, 3:24pm

Ha, that makes sense. You can actually entirely remove everthing that deals with the _UInt24. So you can remove the _backingCheck variable and delete all lines that touch it.

Let me quickly explain to you why it's there: Our (opaque) index type uses 40 bits in total meaning that on 64 bit platforms, we've got 24 spare bits at the moment. Given that one of the most frequent misuses of Swift collections I see is index re-use after adding/removing elements, I thought maybe we can use those 24 bits for something useful. Given that NIO already contained an (internal) _UInt24 type, I just added one of those to the (opaque) index and where we store the capacity of the buffer at the time when that index was valid if that fits in 24 bits. If there are more elements than a 24 bit integer can hold, we just set it to _UInt24.max which is a special value meaning "no checks". Then in debug mode, whenever an index is used, we assert that the capacity stored in the index is still the capacity of the buffer (or set to _UInt24.max which means no checking) . That means we can catch some indexing violations in debug builds.

anandabits · March 15, 2020, 3:30pm

This is great!

Joseph_Heck · March 15, 2020, 4:58pm

That's super cool. I did exactly that when I pulled it into my project.

I've got the say, I'm very tempted to go hunt down your test helpers for validating async/concurrent usage though - they looked interesting (at least from how you use them in the test cases)!

Pampel · March 16, 2020, 11:05am

Sorry if I'm misunderstanding, but there are plenty of cases where you want a CircularBuffer to overwrite it's data - lots of audio DSP algorithms for example make use of circular buffers - for example, a tape delay simulation with feedback, or lots of 'glitching' effects - and need to be able to overwrite old values and write around 'past' the read location.

There's also obviously plenty of cases where this is an error, perhaps we need to handle both cases?

Lantua · March 16, 2020, 12:20pm

I’m most certain that those cases should still invalidate the old data first (which is a matter of changing a few variables). And if you need fine control over where in the buffer to start the data, standard library solution might not be what you want to use.

drewster99 · March 16, 2020, 1:51pm

I don’t understand why you would want a ring buffer without the ability to overwrite the existing data. I’m sure there’s a use case for not overwriting, but every circular buffer I’ve ever implemented has always needed the ability to overwrite the existing data — audio processing is a real obvious case here.

For this to be useful, overwriting would definitely need to be a feature, and it would also need to be highly performant in that use case.

wiresoft · March 16, 2020, 1:55pm

I’m also curious what is the use-case for an automatically growing ring buffer. Isn’t that just an array? Every application where I’ve needed a ring buffer was specifically to store a fixed count of items and be highly performant in the case where old data is overwritten with new data.

johannesweiss · March 16, 2020, 1:58pm

It has O(1) removeFirst, that's really it .

Lantua · March 16, 2020, 2:43pm

Well yes, one of the main point of circular buffer is to overwrite old data after discarding them.

The audio processing is a really good example here. You overwrite the old data after ensuring that they are no longer used by the consumer, either via a proper synchronisation between producer and consumer, or just the sheer buffer size. One common bug is also to assume such synchronisation when there is no guarantee in place. You may expect the call pattern PCPCPC because you set the callback frequency to be the same on both. But if you sometimes get PPCCPCPC, you can potentially lost data. The effect on data lost may vary, from slight audio glitch up to consistent artefact or even poorly encoded audio file.

Still, even if you guarantee that data is overwritten properly, whenever that happens, you still need to advance the startIndex, and the endIndex. I don't see the reason you want to do this

buffer.appendAndOverride(...)

rather than this

buffer.removeFirst(...)
buffer.append(...)

especially when the latter already has a clear semantic within the language. Potentially there could be some overhead for bound checking, but I don't think it'd pose too much of a problem, even assuming compiler can't optimise across both calls.

cukr · March 16, 2020, 3:21pm

My intuition says that CircularBuffer overwrites elements when there is no more space, and Deque grows. Which one do we want to add to the stdlib? Maybe both?