[Discussion] NIORedis: NIO-based Redis Driver

sswg
redis

(Helge Heß) #21

I'm not convinced about the need for this one; it should not have a different API, because the semantics of the calls are not different (they do not form a transaction of any sort!).
The sole thing required is a control point for "auto flush". As mentioned, I'd rather imagine this:

// singular:
client.get("a") -> promise // autoflush
// plural:
client.withPipelining {
  client.get("a") -> promise
  client.get("b") -> promise
  // combine the promises if you want using `cascadeAll`
} // flushed at this point

Same API. Kinda similar to your autoread flag (or whatever it is called) in the channel. Again, the sole difference is when flush is called.

BTW: Connection handling also ties into this - if this is supposed to be an API used by an end user. Any call might be queued and resent. (I think I already do this in my client object, you can essentially queue requests independent of the connection state)


(Helge Heß) #22

RESPKeys have no such constraints. You can use an image as your key if you feel like it. They are "Data" objects ;-)

Don't be confused by the .simpleString/.bulkString protocol frames, they map to exactly the same data type and can be used interchangeably.


(Johannes Weiss) #23

EDIT: Sorry everybody, whilst I totally stand by the points made in this message I don't think it adds much to the API discussion at all, so you might just skip reading it :slight_smile:. The core message here is: First, we should design the best API, then we should look at how to implement it in the most efficient way. I believe that NIO is flexible enough that any reasonable API can be implemented in the most efficient way, of course using things like Redis pipelining without surfacing the word/concept of 'pipelining' to the user.


Is it? If you don't set TCP_NODELAY, a write; flush; write; flush issued in very quick succession will still end up in one packet (if it fits). If you don't disable Nagle's algorithm by setting TCP_NODELAY, the kernel will try to fit as much as possible into one packet. With NIO, you can disable TCP_NODELAY and more or less take control of the packets that get sent out. There's never a guarantee though: TCP is a stream, not a message/datagram-based protocol.
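For completeness, disabling Nagle's algorithm on a NIO channel is a one-line bootstrap option. This is a sketch assuming NIO 2 option names (`ChannelOptions.socketOption(.tcp_nodelay)`); module layout differs between NIO releases:

```swift
import NIO
import NIOPosix  // ClientBootstrap lives here on newer NIO releases

// Sketch: set TCP_NODELAY on outbound connections so each flush maps
// more directly to a packet (still no guarantee — TCP is a stream).
let group = MultiThreadedEventLoopGroup(numberOfThreads: 1)
let bootstrap = ClientBootstrap(group: group)
    .channelOption(ChannelOptions.socketOption(.tcp_nodelay), value: 1)
```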

What you're missing is that even if you send it out in 3 packets in very quick succession, Redis might still see all of it in one read. Redis (or any other TCP software) cannot reliably distinguish the number of write/send/... calls or the number of flush calls made with NIO; TCP is not framed. Making one write doesn't mean it comes out in one packet, and making two writes doesn't mean it comes out in two packets (it may be 1, 2, or more). On the receiving end, something sent as one packet doesn't necessarily come out in one read, and something sent as two packets may well come out in just one read.

Doesn't my example have a flush between the write and the waitForResponse? I apologise if I missed it. Yes, in NIO you need to flush for any data to be sent.

I understand what Redis is. And I understand why it's beneficial to make use of pipelining, because it can reduce the amount of work Redis needs to do. But even write; flush; write; flush; write; flush in very quick succession is very likely to make use of Redis' pipelining support. Without TCP_NODELAY (the default), it's pretty much guaranteed; with TCP_NODELAY, I'd still think it likely. And again, I'm not arguing against the more efficient write; write; write; flush, I'm just saying this is not about whether pipelining is used or not. As I understand it, the main point of @Mordil's 'pipeline' API is to batch the results up in an array. But @Mordil will be able to tell us more.

Regarding pipelining more generally, check this image, pipelining means sending a subsequent request before receiving the previous response as outlined before.

I'm 100% happy if we don't need it.

Again, I don't think withPipelining is a great name because it implies that not using withPipelining never pipelines (which isn't true), but otherwise, sure, we can have such an API. But I still believe that the main point of the original RedisPipeline API is the returned array and not some potential efficiency gains. Sure, you can manually create that array (e.g. using whenAllComplete) but it's less convenient for users than just getting the array in the first place. @Mordil will be able to tell us if it's about the returned array or some potential efficiency gains.
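For reference, here is a minimal sketch of collecting per-command futures into one array with NIO's `EventLoopFuture.whenAllSucceed` (`whenAllComplete` works similarly but yields `Result`s). The client protocol shown is hypothetical, with `String` standing in for RESPValue to keep the example self-contained:

```swift
import NIO

// Hypothetical client shaped like the API under discussion.
protocol RedisClient {
    var eventLoop: EventLoop { get }
    func send(command: String, with arguments: [String]) -> EventLoopFuture<String>
}

// Manually fold the per-command futures into one future array —
// the convenience a pipeline/batch API would otherwise provide.
func getMany(_ client: RedisClient, keys: [String]) -> EventLoopFuture<[String]> {
    let futures = keys.map { client.send(command: "GET", with: [$0]) }
    return EventLoopFuture.whenAllSucceed(futures, on: client.eventLoop)
}
```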

If it was just about when to flush you could also just add an optional autoFlush: Bool = true parameter to each command and write:

client.get("foo1", autoFlush: false) -> future
client.get("foo2", autoFlush: false) -> future
client.get("foo3", autoFlush: false) -> future
client.get("foo4") -> future

but I really don't think it's about the flushes :slight_smile:.

Good point, connection pooling is interesting for all sorts of client APIs: Redis, MySQL, Postgres, HTTP, ...

Awesome, then RESPKey should just be backed by a ByteBuffer but for convenience still be ExpressibleByString*.
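A ByteBuffer-backed key that stays ergonomic for string callers could look roughly like this. This is a sketch, not the actual NIORedis declaration:

```swift
import NIO

// Sketch: keys are raw bytes, but string literals still work for the
// common case via ExpressibleByStringLiteral.
struct RESPKey: ExpressibleByStringLiteral {
    var storage: ByteBuffer

    init(storage: ByteBuffer) {
        self.storage = storage
    }

    init(stringLiteral value: String) {
        var buffer = ByteBufferAllocator().buffer(capacity: value.utf8.count)
        buffer.writeString(value)
        self.storage = buffer
    }
}

let key: RESPKey = "user:42"  // convenient string spelling of a bytes key
```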


(Nathan Harris) #24

The problems RedisPipeline (the type) was intended to solve were twofold:

  1. Sending a series of commands to Redis
  2. Reducing the overhead of an intended batch of commands as much as possible
    a. This solution ideally would be easily extensible to later have a public facing API for Redis' "transactions" by inserting the necessary commands before and after the user's.

I am leaning more towards the following:

class RedisConnection {
    func send(command: String, with: [RESPValueConvertible]) -> EventLoopFuture<RESPValue>
    func send(batch: (RedisClient) -> Void) -> EventLoopFuture<[RESPValue]>
}

// `RedisPipeline` would be deleted

This would remove the need for 3 types (RedisPipeline and the proposed RESPKey and RESPCommand), solve some of the concerns raised about the enqueuing syntax, and still supports the 2 1/2 points I listed above.

All that, while still not exposing much of the NIO implementation details such as "auto flush", ByteBuffer to execute commands, or other conversions.


#25

fwiw, one thing we did when implementing redis clients in the past (not in swift) was to create a basic api that is "command" based, then generate command specific wrappers (eg get, set, inc, etc) since they can be generated using metadata. we arrived at this design because there are 300+ commands
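To illustrate, each generated wrapper is just a thin, type-safe shim over the single command-based entry point (the protocol and method names here are hypothetical):

```swift
import NIO

// Hypothetical command-based core.
protocol RedisClient {
    func send(command: String, with arguments: [String]) -> EventLoopFuture<String?>
}

// A metadata-driven generator would emit hundreds of wrappers like
// these, one per Redis command, all delegating to `send`.
extension RedisClient {
    func get(_ key: String) -> EventLoopFuture<String?> {
        return send(command: "GET", with: [key])
    }

    func set(_ key: String, to value: String) -> EventLoopFuture<String?> {
        return send(command: "SET", with: [key, value])
    }
}
```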


(Johannes Weiss) #26

Okay, this sounds great. I would even go further and say: Let's design the best API first, only introducing the concepts a user knows they want to use. I believe that NIO is flexible enough so that we can turn something that the user knows into the most efficient way of sending the bytes (for example leveraging pipelining, without the user needing to specify this).

These "transactions" and batching are great examples of something that should be part of the API: The user knows if they want transactions (if they want 'isolation') and the user knows when they want to batch requests (if they'd like to execute a few things and get the results back in one array).

Cool! I do however have a question about how to use the batched API. If I understand correctly, the idea is that I do

let connection = ...
// singular API
let getResult = connection.send(command: "GET", with: ["foo"])
// batched API
connection.send { client in
    client.send(command: "GET", with: ["bar"]) /* what happens with the result? */
    client.send(command: "SET", with: ["foo", "bar"])
    client.send(command: "GET", with: ["foo"])
}

Whilst that looks great API-wise I don't quite understand how the API works. In your first post, you show that the type of RedisClient.send is:

public protocol RedisClient {
    func send(command: String, with arguments: [RESPValueConvertible]) -> EventLoopFuture<RESPValue>
}

i.e. each send operation returns an EventLoopFuture<RESPValue>, which makes a lot of sense. What I don't get is how that fits within your batched API. How are you collecting all the results of the individual send calls that you make in the closure? They all return a future of their values and you need to 'collect' them. Did you maybe mean to write

class RedisConnection {
    /* [...] */
    func send(batch: (RedisClient) -> [EventLoopFuture<RESPValue>]) -> EventLoopFuture<[RESPValue]>
    //                                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    //                                make the user return the RESPValues
}

? If yes, then do you want this API to be used this way?

let connection = ...
// batched API
let arrayOfResults: EventLoopFuture<[RESPValue]> = connection.send { client -> [EventLoopFuture<RESPValue>] in
    return [client.send(command: "GET", with: ["bar"]),
            client.send(command: "SET", with: ["foo", "bar"]),
            client.send(command: "GET", with: ["foo"])]
}

Please note, I'm not suggesting to change anything, I'm just trying to understand how this was intended.

And quick secondary question: What's the difference between RedisClient and RedisConnection, are they the same?

That sounds like a very reasonable plan to design to me given that it's 300+ commands :flushed:


(Nathan Harris) #27

Shoot, you caught a great mistake I made.

Below is an example that I'd be OK with:

class RedisConnection {
    func sendBatch(_ commands: (RedisClient) -> [EventLoopFuture<RESPValueConvertible>]) -> EventLoopFuture<[RESPValueConvertible]>
    //                          ^^^^^^^^^^^
    //                       It could be argued that this should be `RedisConnection`,
    //                       since `RedisClient` has a weaker use case now,
    //                       and this isn't a protocol requirement
}

let connection = ...
// batched API
let results = connection.sendBatch { client in
    let first = client.sadd([1, 2, 3, 4, 5], to: "first")
    let second = client.sadd([3, 4, 5, 6], to: "second")
    return [first, second, client.sdiffstore(as: "third", sources: ["first", "second"])]
}
...
let first = results[0] as? Int
let second = results[1] as? Int
let third = results[2] as? Int

This technically makes it possible to chain futures that run their own transformations on the data and return anything that is RESPValueConvertible, giving the user more freedom without hidden implementation details, and it also makes the results easier to work with.
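A possible shape for `sendBatch` internally is to run the closure to enqueue everything, then fold the returned futures with `whenAllSucceed`. This is a hypothetical sketch, not the actual NIORedis implementation:

```swift
import NIO

final class RedisConnection {
    let eventLoop: EventLoop
    init(eventLoop: EventLoop) { self.eventLoop = eventLoop }

    // Generic over the element type for the sketch; the proposal uses
    // RESPValueConvertible. A real implementation would suppress
    // flushing while the closure runs and flush once afterwards.
    func sendBatch<T>(_ commands: (RedisConnection) -> [EventLoopFuture<T>]) -> EventLoopFuture<[T]> {
        let futures = commands(self)
        return EventLoopFuture.whenAllSucceed(futures, on: self.eventLoop)
    }
}
```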

RedisClient is a base protocol for sending commands to Redis; it should be used in generics and other function signatures as a form of type erasure, in case you're passed some other delegate object or implementation.

RedisConnection is just an "out of the box" implementation of RedisClient with other capabilities.

Originally, it was there to allow RedisPipeline to be implemented as it has been, but now that RedisPipeline is slated for removal, there's less of a use case for defining RedisClient at this low a level. If I were to split the current module into two, as NozeIO does, these types would be in the very base package:

  • RESPEncoder
  • RESPDecoder
  • RedisCommandHandler
    • RedisCommandContext
  • RESPValue
  • RESPValueConvertible
  • RedisError
  • RedisClient

then the "out of the box" implementation module would contain:

  • RedisConnection

(Nathan Harris) #28

Hi All, it's been a few days and I thought I should give a status update:

  • Migrating RESPValue .bulkString and .simpleString to use ByteBuffers has been completed: #34
    • Also with this, I took the time to re-evaluate implementations of the RESPEncoder and RESPDecoder
  • I've started work on re-implementing the feature of sending batch commands
  • These next two weeks are going to be busy for me personally, as for one week I'll be on vacation.

Barring further feedback, which is 100% welcomed, my goal is to have NIORedis ready for formal review by May - with the SSWG voting on it May 16th.