[Discussion] NIORedis: NIO-based Redis Driver

Alright, I've seen the reasoning for changing from [UInt8] for storage to ByteBuffer. I've written issue #31 to track this

2 Likes

As chatted about in the Vapor Discord, I'd like to add some feedback for RedisPipeline. So if I understand correctly, the main task of RedisPipeline is batching up the commands so you can get back an array of responses. So whilst the implementation of this feature might make use of pipelining, that's not what the user is after when using the API. A user really wants to send a bunch of commands and get the responses back as an array. Therefore, if we leave everything else as is, I think we should rename this RedisCommandBatcher or something.

RedisPipeline is also confusing in other ways: First, the type doesn't create a pipeline but may use pipelin ing. The more important thing is that a user might assume that the library doesn't make use of pipelining if the user doesn't use the RedisPipeline type but that's not the case, if the user enqueues two commands with a single flush, it will still be pipelined, the only difference is that their results don't come back as an array.

Quick question regarding this: I really like the above way of having a declarative way for a value. Is there are reason we shouldn't also have a declarative way for the command themselves? Something like

public enum RESPCommand {
    case get(RESPValueConvertible)
    case increment(RESPValueConvertible)
    case set(RESPValueConvertible, RESPValueConvertible)
    // ...
}

That way, batching of requests can become more natural and we don't need

public func enqueue<T>(operation: (RedisClient) -> EventLoopFuture<T>) -> RedisPipeline

where we have this weird operation: (RedisClient) -> EventLoopFuture<T> which I commented about before. Correct me if I'm wrong but I believe the T is just there because we need support all of the singular operations like $0.get(...) which already return an EventLoopFuture but of different types... If we created an enum RESPCommand we could have

connection.sendBatch([.get("foo"), .set("bar", 1), .increment("buz")])

and sendBatch would just take [RESPCommand].

1 Like

I've thought about this too - but held off until now because I didn't see a huge use case for it.

This discussion has shown that use case. I'll write an issue for this and update this comment with a link to it for tracking.

EDIT: NIORedis#33

1 Like

That is IMO not the idea of the Redis pipeline. The Redis pipeline kinda directly maps to Channel.write vs Channel.flush (i.e. it is indeed the same like HTTP/1.1 pipelining).

I could imagine something like this:

connection.withAutoflushDisabled {
  // regular client calls
} // explicitly flushes at the end

This way NIO can decide if it wants to flush earlier (send buffer full), or we do it explicit once we wrote the requests we want can bundle together in a TCP packet.

Combining the results (whether pipelined calls or not) can already be done by using a NIO promise (cascade or something? don't remember), right? That is completely unrelated to pipelining.

In the user facing API you don't want:

connection.do(.get("key")) // what about the result?

But just:

connection.get("key") // => promise or callback

The enum dupes quite a bit. Though the same is true for NIO HTTP :->

A Redis command call is just a regular RESP array where the first element is the command key.
For queuing commands I have this RedisCommand object which is a little like your enum. I guess one might potentially save an (array) alloc because your enum might directly encode the parameters for short commands. So maybe it is indeed a good idea despite all the extra code!

For queuing I have this thing: RedisCommandCall which bundles the command with the associated promise. /shrug

BTW: If the enum is done, this:

public enum RESPCommand {
    case get(RESPValueConvertible)
}

should probably be

public enum RESPCommand {
    case get(RESPStringConvertible) // ???
}

because the various commands do expect certain types, not just arbitrary RESP representable values.

P.S.: ... just ideas as usual.

No, pipelining doesn't map to write/flush. Even if you do (for brevity written as a synchronous program)

write(request1); flush(); write(request2); flush(); write(request3); flush()

from the perspective of Redis, you still pipelined the requests. The exact TCP packets may or may not (depending on the TCP_NODELAY setting) differ.

If you truly don't want pipelining the program would look like

write(request1); flush(); waitForResponse(); write(request2); flush(); waitForResponse(); write(request3); flush();

Which means in an asynchronous world, you would need to implement some extra stuff to hold onto the N+1st command until we have the Nth command's response.

So whether we use pipelining is not determined by when we flush but rather if we wait for a response before sending the next request.

Said that,

write(request1); write(request2); write(request3); flush()

is clearly more efficient than

write(request1); flush(); write(request2); flush(); write(request3); flush()

But that's a separate issue. The only thing I was saying is that the main purpose of RedisPipeline is not whether to pipeline or not but rather whether to batch. And if we have a batching mode we can also make sure we don't flush more often than necessary.

Why not both APIs? I would absolutely recommend users to use the latter one (connection.get("key")).

Indeed, in NIO's case I think that's fine because NIO is a low-level tool built for people who write networking libraries which then offer the syntactical sugar on top. For the Redis library I think we should offer the nicest possible API, ie. enums internally but not in the simple user-facing APIs. If however the enums solve a particular issue that users might face too, we might just expose them too in more advanced APIs. So maybe

// nice convenience API
connection.get("key") -> EventLoopFuture<RespValue>
// more complicated API, used to implement the simple API (maybe `internal` only?)
connection.singular(.get("key")) ->  EventLoopFuture<RespValue>
// the underlying API that implements all others, may be public or not
connection.batched([.get("key"), .get("other-key"), .set("foo", "bar")])

For the Redis batching API case however, the enum could work. Just because the user might want to batch multiple thing that don't have the same return type and with the enum we can return them in one array without losing any type-safety.

You are right. Using RESPValue as a key for GET is wrong because Redis won't accept all RESP values as keys. We should solve this with an explicit RESPKey type, maybe something like below.

The enum RESPCommand itself, I would keep as closely aligned to the wire-protocol as possible, as type-safe as possible. Therefore, below I suggest a RESPKey which can represent all RESP keys. It could be backed by either a String or a ByteBuffer, whatever's more correct in the protocol. For convenience (users want String literals we should implement the various ExpressibleBy* protocols). So RESPKey would feel pretty much like a String but we could do some extra validation (ie. no \r\ns in there)

public struct RESPKey: ExpressibleByStringLiteral, CustomStringConvertible, ExpressibleByStringInterpolation {
    // similar to https://github.com/apple/swift-log/blob/master/Sources/Logging/Logging.swift#L433-L448
    [...]
}

public enum RESPCommand {
    case get(RESPKey)
    case set(RESPKey, RESPValue)
}

// because some users might just want to query with a String
extension RESPCommand {
    public static func get(_ string: String) -> RESPCommand {
        // warning: needlessly inefficient implementation because RESPKey will probably offer
        // a constructor directly from String. Just to show-case 
        return .get("\(string)" as RESPKey)
    }
}

let connection = ...

// convenience API
let result1: EventLoopFuture<RESPValue> = connection.get("foo")

 // directly construct a RESPKey using ExpressibleByStringLiteral/ExpressibleByStringInterpolation
let result2: EventLoopFuture<RESPValue> = connection.singular(.get("foo"))

// via "public static func get(_ string: String) -> RESPCommand"
let explicitlyStringKey: String = "foo"
let result3: EventLoopFuture<RESPValue> = connection.singular(.get(explicitlyStringKey)) 

Nitpicking. This is interdependent. You have to flush, otherwise you would wait forever for a result. Sticking to your example:

write(request1); waitForResponse()

Won't do anything. The Redis document about pipelining above explicitly outlines that they do it "for the flush" :toilet: :smile: The whole point is bundling many small requests into one big packet, specifically not to accomplish interleaving because the backend has to do work (like in an HTTP pipeline). The backing in Redis is essentially instant (in memory hash table).

P.S.: That doesn't have much to do with the discussion, but Redis operations are atomic. By receiving many requests in a single batch, a lock only needs to be acquired once (in the case of Redi/S, Redis itself is single threaded).

I'm not convinced about the need for this one, it should not have different API because the semantics of the the calls are not different (they do not form a transaction of any sort!).
The sole thing required is a control point for "auto flush". As mentioned, I'd rather imagine this:

// singular:
client.get("a") -> promise // autoflush
// plural:
client.withPipelining {
  client.get("a") -> promise
  client.get("b") -> promise
  // combine the promises if you want using `cascadeAll`
} // flushed at this point

Same API. Kinda similar to your autoread flag (or whatever it is called) in the channel. Again, the sole difference is when flush is called.

BTW: Connection handling also ties into this - if this is supposed to be an API used by an end user. Any call might be queued and resent. (I think I already do this in my client object, you can essentially queue requests independent of the connection state)

RESPKeys have no such constraints. You can use an image as your key if you feel like it. They are "Data" objects ;-)

Don't be confused by the .simpleString/.bulkString protocol frames, they map to exactly the same data type and can be used interchangably.

EDIT: Sorry everybody, whilst I totally stand by the points made in this message I don't think it adds much to the API discussion at all, so you might just skip reading it :slight_smile:. The core message here is: First, we should design the best API, then we should look how to implement this in the most efficient way. I believe that NIO is flexible enough that any reasonable API can be implemented in the most flexible way, of course using things like Redis pipelining without surfacing the word/concept of 'pipelining' to the user.


Is it? If you don't set TCP_NODELAY, write; flush; write; flush if followed very quickly after each other will still end up in one packet (if it fits). If you don't disable Nagle's Algorithm by setting TCP_NODELAY the kernel will try to fit as much as possible into one packet. With NIO, you can disable TCP_NODELAY and more-or-less take control of the packets that get sent out. There's never a guarantee, TCP is a stream and not a message/datagram based protocol.

What you're missing is that even if you send it out in 3 packets very quickly followed by each other, Redis might still see this in one read. Redis (or any other TCP software) cannot reliably distinguish the number of write/send/... calls or the number of flush calls made with NIO, TCP is not framed. Making one write doesn't mean mean it comes out in one packet, making two writes doesn't mean it comes out in two packets (may be 1, may be 2, may be more). And on the receiving end, receiving something in one packet doesn't necessarily mean it comes out in one read and receiving something as two packets doesn't necessarily mean it doesn't come out in just one read.

Doesn't my example have a flush in between the write and the waitForResponse? I apologise if I missed it, yes, in NIO you need to flush for any data to be sent.

I understand what Redis is. And I understand why it's beneficial to make use of pipelining because it can reduce the amount of work Redis needs to do. But even write; flush; write; flush; write; flush in very quick succession is very likely to make use of Redis' pipelining support. Without TCP_NODELAY (which is the default) it's pretty much guaranteed, with TCP_NODELAY I'd still think it's likely. And again, I'm not arguing to not do the more efficient write; write; write; flush, I'm just saying this is not about pipelining being used or not. As I understand the main point of @Mordil's 'pipeline' API is to batch the results up in an array. But @Mordil will be able to tell us more.

Regarding pipelining more generally, check this image, pipelining means sending a subsequent request before receiving the previous response as outlined before.

I'm 100% happy if we don't need it.

Again, I don't think withPipelining is a great name because it implies that not using withPipelining does never pipeline (which isn't true) but otherwise, sure we can have such an API. But I still believe that the main point of the original RedisPipeline API is the returned array and not some potential efficiency gains. Sure, you can manually create that array (eg. using whenAllComplete) but it's less convenient for users that just getting the array in the first place. @Mordil will be able to tell us if it's about the returned array or some potential efficiency gains.

If it was just about when to flush you could also just add an optional autoFlush: Bool = true parameter to each command and write:

client.get("foo1", autoFlush: false) -> future
client.get("foo2", autoFlush: false) -> future
client.get("foo3", autoFlush: false) -> future
client.get("foo4") -> future

but I really don't think it's about the flushes :slight_smile:.

Good point, connection pooling is interesting for all sorts of client APIs: Redis, MySQL, Postgres, HTTP, ...

Awesome, then RESPKey should just be backed by a ByteBuffer but for convenience still be ExpressibleByString*.

The problems RedisPipeline (the type) was intended to solve was two fold:

  1. Sending a series of commands to Redis
  2. Reducing the overhead of an intended batch of commands as much as possible
    a. This solution ideally would be easily extensible to later have a public facing API for Redis' "transactions" by inserting the necessary commands before and after the user's.

I am leaning more towards the following:

class RedisConnection {
    func send(command: String, with: [RESPValueConvertible]) -> EventLoopFuture<RESPValue>
    func send(batch: (RedisClient) -> Void) -> EventLoopFuture<[RESPValue]>
}

// `RedisPipeline` would be deleted

This would remove the need for 3 types (RedisPipeline and the proposed RESPKey and RESPCommand), solve some of the concerns raised about the enqueuing syntax, and still supports the 2 1/2 points I listed above.

All that, while still not exposing much of the NIO implementation details such as "auto flush", ByteBuffer to execute commands, or other conversions.

fwiw, one thing we did when implementing redis clients in the past (not in swift) was to create a basic api that is "command" based, then generate command specific wrappers (eg get, set, inc, etc) since they can be generated using metadata. we arrived at this design because there are 300+ commands

1 Like

Okay, this sounds great. I would even go further and say: Let's design the best API first, only introducing the concepts a user knows they want to use. I believe that NIO is flexible enough so that we can turn something that the user knows into the most efficient way of sending the bytes (for example leveraging pipelining, without the user needing to specify this).

These "transactions" and batching are great examples of something that should be part of the API: The user knows if they want transactions (if they want 'isolation') and the a user knows when they want to batch requests (if they'd like to execute a few things and get the results back in one array).

Cool! I do however have a question how do use the batched API. If I understand correctly, the idea is that I do

let connection = ...
// singular API
let getResult = connection.send(command: "GET", ["foo"])
// batched API
connection.send { client in
    client.send(command: "GET", ["bar"]) /* what happens with the result ? */
    client.send(command: "SET", ["foo", "bar"])
    client.send(command: "GET", ["foo"])
}

Whilst that looks great API-wise I don't quite understand how the API works. In your first post, you show that the type of RedisClient.send is:

public protocol RedisClient {
    func send(command: String, with arguments: [RESPValueConvertible]) -> EventLoopFuture<RESPValue>
}

ie. each send operation returns an EventLoopFuture<RESPValue> which makes a lot of sense. What I don't get is how that is fitting within your batched API. How are you collecting all the results of the individual send calls that you make in the closure? They all return a future of their values and you need to 'collect' them. Did you maybe mean to write

class RedisConnection {
    /* [...] */
    func send(batch: (RedisClient) -> [EventLoopFuture<RESPValue>]) -> EventLoopFuture<[RESPValue]>
    //                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    //                                make the user return the RESPValues
}

? If yes, then do you want this API to be used this way?

let connection = ...
// batched API
let arrayOfResults: EventLoopFuture<[RESPValue]> = connection.send { client -> [EventLoopFuture<RESPValue>] in
    return [client.send(command: "GET", ["bar"]),
            client.send(command: "SET", ["foo", "bar"]),
            client.send(command: "GET", ["foo"])]
}

Please note, I'm not suggesting to change anything, I'm just trying to understand how this was intended.

And quick secondary question: What's the difference between RedisClient and RedisConnection, are they the same?

That sounds like a very reasonable plan to design to me given that it's 300+ commands :flushed:

Shoot, you caught a great mistake I made.

Below is an example that I'd be OK with:

class RedisConnection {
    func sendBatch(_ commands: (RedisClient) -> [EventLoopFuture<RESPValueConvertible>] -> EventLoopFuture<[RESPValueConvertible]>
    //                          ^^^^^^^^^^^
    //                       It could be argued that this should be `RedisConnection`,
    //                       since `RedisClient` has a weaker use case now,
    //                       and this isn't a protocol requirement
}

let connection = ...
// batched API
let results = connection.sendBatch { client in
    let first = client.sadd([1, 2, 3, 4, 5], to: "first")
    let second = client.sadd([3, 4, 5, 6], to: "second")
    return [first, second, client.sdiffstore(as: "third", sources: ["first", "second"])]
}
...
let first = results[0] as? Int
let second = results[1] as? Int
let third = results[2] as? Int

This will technically make it possible to chain futures to do their own transformations on data to return anything that is RESPValueConvertible, actually giving more freedom to the user without hidden implementation details, but it also makes it easier to work with the results.

RedisClient is a base protocol for sending commands through Redis, that should be used for generics and other function signatures as a form of type erasure in case you might be passed some other delegate object or implementation.

RedisConnection is just an "out of the box" implementation of RedisClient with other capabilities.

Originally, it was to allow RedisPipeline to be implemented as it has been, but now that it is slated for removal, there's less of a use case for the RedisClient defined this low level. If I was to split the current module into two, as NozeIO does, these types would be in the very base package:

  • RESPEncoder
  • RESPDecoder
  • RedisCommandHandler
    • RedisCommandContext
  • RESPValue
  • RESPValueConvertible
  • RedisError
  • RedisClient

then the "out of the box" implementation module would contain:

  • RedisConnection

Hi All, it's been a few days and I thought I should give a status update:

  • Migrating RESPValue .bulkString and .simpleString to use ByteBuffers has been completed: #34
    • Also with this, I took the time to re-evaluate implementations of the RESPEncoder and RESPDecoder
  • I've started work on re-implementing the feature of sending batch commands
  • These next two weeks are going to be busy for me personally, as for one week I'll be on vacation.

Barring further feedback, which is 100% welcomed, my goal is to have NIORedis ready for formal review by May - with the SSWG voting on it May 16th.

3 Likes

Post-vacation update:

I've re-evaluated the implementation of RedisPipeline for the purpose of controlling how frequently flushes were done in addition to batching up commands to receive all in a single EventLoopFuture.

This is the direction I have decided to head towards (PR #36):

  1. In my opinion, the feature to batch up commands into a single EventLoopFuture is a great one - but not a necessary one for this low level of a driver.
    A higher level library such as RedisKit could attempt to implement this, or it can be re-implemented at a later date if deemed necessary to be at this level.
    • To this effect, RedisPipeline has been deleted
  2. A new sendCommandsImmediately property has been added to RedisConnection to control the “autoflush” nature of send(command:with:)
    • When it is set to false, write will be the method called, with the timing of flushes handled by the host machine / NIO.
    • When set to true, writeAndFlush will be called instead, and setting this will trigger a channel.flush().

As always, I'm open to discussing to this decision.

Implement feedback is welcome through GitHub.

1 Like

To address this more wider ranging namespacing problem (that we might or might not try to solve now) I created a separate thread.

The only relevant bit to this proposal is that the NIORedis module name is already in use by a package that has been tagged since at least a year ago. Given that Swift does not support two modules with the same name in a given binary, I strongly believe that we have to re-name this module to something else. Sure, module clashes will happen anyway but the SSWG must do everything it can to prevent module clashes from happening wherever possible. Btw, I just pitched a tightening of the uniqueness rules for SSWG packages.

Do I have a good name-proposal? I know that NIORedis and RedisClient are already in use, so maybe RedisClientNIO or AsyncRedisClient?

Btw, this Github advanced search to work quite well to search for modules:

https://github.com/search?l=&q=NAME_OF_THE_MODULE+filename%3APackage.swift+language%3ASwift&type=Code

Which will search only the 'code' of Package.swift files for a mention of NAME_OF_THE_MODULE. For the repo name nio-redis seems to still be available, so we could leave it like that or make it nio-redis-client?

feedback thread: [Feedback] RedisNIO: A NIO-Based Redis Driver