Using CTE's (WITH clause)

remcopoelstra · August 12, 2020, 2:25pm

Hi,

I'm trying to fill a structure of which one of the properties consists of data from multiple rows.
I've been able to manually build a SQL query that uses CTE's to group the required rows.

The problem now is, how do I translate that to GRDB? I can't find any reference to CTE's or the WITH clause in the documentation. I'm fine with directly using the SQL I've written if the query interface is not useable here, but I've no idea where to start.
My query is currently:

WITH 
    subPolygons AS (SELECT elementId, subRegion, subPolygon, json_group_array(json_array(x1,x2,x3,y1,y2,y3)) as subPolyData FROM polygonEdge GROUP by elementId, subRegion, subPolygon ORDER BY 'order'),
    polygons AS (SELECT elementId,subRegion, json_group_array(json(subPolyData)) as polyData FROM subPolygons GROUP BY elementId, subRegion)
SELECT elementId, json_group_array(json(polyData)) as regionData FROM polygons ;

The last SELECT should probably be converted to a CTE as well, to be usable in another query, but this allowed me to test it stand alone.

This should fill a structure like:

    struct ElementWithApertureAndRegionData: Decodable, FetchableRecord {
        let element: Element
        let apertureData: Data?
        let regionData: Data?
    }

The current query I use to fill the first two properties is:

        let apertureAlias=TableAlias()
        let request = Element
            .annotated(with: apertureAlias[Column("apertureData")])
            .joining(optional: Element.aperture.aliased(apertureAlias))
            .filter(keys: insertedElementIDs)
            .order(Element.Columns.polarityIndex.desc)
        let rows = try ElementWithApertureAndRegionData.fetchAll(db, request)

I think I need another TableAlias() as well as an .annotated(), but how do I fit in the CTE?

Thanks in advance.

Kind regards,

Remco Poelstra

gwendal.roue · August 12, 2020, 5:08pm

Hi! I'm having a few days off, so I let you or other readers elaborate around the wide topic of CTE (WITH clauses), and which kind of feature with a general purpose could be added to the query builder :-)

remcopoelstra · August 13, 2020, 6:27am

Hi,

I think you want to be able to do something like:

Row.with(sql: "SELECT ....").select().filter().fetchAll()
Row.with(Author.select().filter()).select().filter().fetchOne()

or maybe also:

let withAlias=TableAlias()
let request=Element.annotated(with: withAlias[Column("columnName")])
.joining(with: Element.select().filter().aliased(withAlias))
.filter().order()

This should also be possible (or something like this):

Row.with().with().with().select().fetch()

where the second with() fetches from the first with(), which is what I do in my case.
I don't have enough understanding of Swift's type system to formalise this. But it might give a starting point.

Remco

gwendal.roue · August 17, 2020, 6:31pm

I see what you mean.

Now I have a problem to solve, which is the ability for users to define their own composable requests api.

Let's put CTE on the side for now, and define what "composable requests api" means. An example will help:

// All lead players from French teams of 1st category ordered by name:
Player.all()
    .filter(role: .lead)
    .filter(teamCountry: "FR")
    .filter(teamCategory: .first)
    .orderedByName()

All those methods are application-defined apis:

App code

extension Player {
    static let team = belongsTo(Team.self)
}

extension DerivableRequest where RowDecoder == Player {
    func filter(role: Player.Role) -> Self {
        filter(Player.Columns.role == role)
    }
    
    func filter(teamCountry: String) -> Self {
        joining(required: Player.team.filter(country: teamCountry))
    }
    
    func filter(teamCategory: Team.Category) -> Self {
        joining(required: Player.team.filter(category: teamCategory))
    }
    
    func orderedByName() -> Self {
        order(Player.Columns.name.collating(.localizedCaseInsensitiveCompare))
    }
}

What makes those apis freely composable is the ability to "merge" common elements. See for example how the team table is mentionned twice in the Swift request, but only once in SQL:

// SELECT player.* FROM player
// JOIN team
//   ON team.id = player.teamId
//   AND team.coutry = 'FR'
//   AND team.category = 'first'
Player.all()
    .filter(teamCountry: "FR")
    .filter(teamCategory: .first)

This is no magic of course. All associations have a "key" (basically, a Swift string). Those keys serve two purposes:

Association keys allow the decoding of complex record types:

extension Player {
    // Key is "team" (the default)
    static let team = belongsTo(Team.self)
}

struct PlayerInfo: Decodable, FetchableRecord {
    var player: Player
    
    // The "team" property name matches the association key
    var team: Team 
}

// Yeah, it works
let infos /*: [PlayerInfo] */ = try Player
    .including(required: Player.team)
    .asRequest(of: PlayerInfo.self)
    .fetchAll(db)

Association keys allow request composition.

They make it possible to recognize that the team table should be joined only once at the SQL level:

// SELECT player.* FROM player
// JOIN team
//   ON team.id = player.teamId
//   AND team.coutry = 'FR'
//   AND team.category = 'first'
Player
    // Joins association with key "team"
    .joining(required: Player.team.filter(country: "FR"))
    // Joins association with key "team" again
    .joining(required: Player.team.filter(category: .first))

Note how "French teams" and "Teams of 1st category" are not equal relations. We really need something else in order to recognize a common identity. This identity is the association key.

Summary:

GRDB requests are composable.
What makes requests involving associations composable are their association keys.

Back to CTE. Should support for them eventually come to the query builder, I want requests to remain composable.

This means that it must be possible to write a Swift request that mentions a CTE several times, just like we did with associations above, and yet generate a single WITH clause.

Let's look at your sample code:

try Row
    .with(/* some CTE */)
    .filter(...)
    .select(...)
    .fetchOne(db)

It must be possible to use the same CTE several times:

try Row
    .with(/* some CTE */)
    .filter(...)
    .with(/* the same CTE */)
    .filter(...)
    .fetchOne(db)

The generated SQL would have a single WITH clause. The two filter expressions would be joined with a simple AND.

So, how do we recognize that two CTEs are the same?

Problem is that a CTE like Author.select(...).filter(...) has no identity. It has nothing like an association key. And GRDB requests do not conform to Equatable.

Requests are not Equatable

The reason is that requests are generally not defined with values, but with "value promises", which are functions that accept a database connection. Swift functions can not be compared.

For example, Player.orderByPrimaryKey() is a request that can't generate SQL until a database connection is available, the schema can be queried, and the primary key is known:

try dbQueue.read { db in
    // SELECT * FROM player ORDER BY id
    try Player.orderByPrimaryKey().fetchAll(db)
    // SELECT * FROM country ORDER BY code
    try Country.orderByPrimaryKey().fetchAll(db)
}

I don't think requests will become Equatable. But maybe we can define CTEs in a way similar to associations, as static properties of the record types that need them, in a way that makes them identifiable.

That's my suggested exploration track.

gwendal.roue · August 18, 2020, 11:57am

I'm no CTE expert myself. I see them as:

temporary views
counters
helpers for tree walking
other use cases. You seem to know the topic better than I do.

Among other use cases, I myself have played with CTEs a little bit when I last tried to let GRDB users define associations like "an author has one latest book", in order to build requests like "give me all authors with their latest book". I gave up due to technical difficulties, but I hope I'll be able to ship them once. Those are more frequently requested than support for CTEs, after all

Views and counters are not really tied to any particular table: my suggestion of having CTEs defined as static properties of some record types does not hold any water.

But I maintain the need for composability, hence some notion of CTE identity.

You may feel that there is a tension between your feature request and my answers. Maybe you just want to build the requests you need, wish you could just throw some snippets in, and have the query builder generate the SQL. A query builder is just a concatenation machine of SQL chunks, after all. Well, this has never been quite the case of GRDB requests. On the contrary, they know a great deal of the structure of the request, and this allows a lot of cool features that a blind concatenation of meaningless strings could just not achieve. This makes the implementation slightly more difficult. But in the end you have this API where you just throw stuff in, and the expected SQL comes out. Even in non trivial cases - that's the point.

gwendal.roue · August 18, 2020, 12:16pm

I also know that my goal of universal composability is... bounded to fail, due to SQL itself. For a critique of SQL on this topic, see for example We Can Do Better Than SQL | EdgeDB Blog

But this is one of the most interesting aspects of the GRDB project. Providing a statically sound request derivation system, backed by the database schema itself, and with great support for raw SQL, is a very satisfying experiment, once inspired by the ActiveRecord Ruby library :-)

remcopoelstra · August 18, 2020, 5:53pm

I'm not a CTE expert either :) I was just looking for an alternative to the ROLLUP feature of MySQL and realised that the rollup level was just three. Few enough to write out in separate queries by hand, for which CTE's provides nice views. So for me, CTE's are also just temporary views...

Can't CTE's not be made identifiable if there is a requirement to use them in conjunction with a TableAlias()? Maybe something along the lines of

let cte = TableAlias().with(Author.select().filter())

gwendal.roue · August 18, 2020, 6:13pm

Yes, a TableAlias is a reference type, and can totally be used as the identity for the CTE. And since the alias is mandatory anyway, in order to use the CTE in the main request, it sounds like a very natural solution :-) I can look at it some day. I'm back from vacations, and still not quite at 100% of my capabilities, so don't hold your breath. But it may reveal a quick win!

remcopoelstra · August 18, 2020, 6:24pm

I can hold my breath quite long, so I'll give it a try!