Implementation of a DSL with inter-relationships

RapidZinc · December 30, 2023, 8:16pm

I'm making a directory app, where the data will only be updated occasionally, but the dataset is complex and full of inter-relationships.

To seed the data, I implemented a domain specific language, but I want to expand upon it. I want to both:

Easily collect model instances (in an array, for example).
Specify relationships between model instances.

After the initial creation of these model instances, they might be persisted using SwiftData, or similar. However, for now I'm interested in the initial data entry.

Initial implementation

Let's say I have builder structs Teacher and School, and several result builders, that I use like this:

@ModelsBuilder
func seedData() -> [ModelBuilder] {

    Teacher("John", "Smith")
        .contacts {
            Address("1 The Road", "London")
            Tel("020 1234 5678")
            Email("johnsmith@school.com")
        }
        .payCategory(.newlyQualified)

    Teacher("Shirley Anne", "Waters")
        .id("shirleywaters")
        .payCategory(.headteacher)

    School("Trinity")
        .staff {
            Role(department: "administration", id: "shirleywaters") // <--- String IDs used here!
            Role(department: "science", id: "johnsmith") // <--- String IDs used here!
        }
}

This is a contrived example, but analogous to my current implementation. The idea is that the data entry is readable but succinct.

Processing the builders proceeds as follows:

The builders are collected in an array by the @ModelsBuilder result builder.
The array of builders is iterated to create the model instances.
Each model instance either has a string ID specified ("shirleywaters" in the example above), or infers a string ID ("johnsmith" above).
A dictionary is used to map string IDs to model instances.
The array of builders is iterated again to resolve string ID references.
For each string ID reference, the dictionary is used to find the target instance and the relationship is created.

This is all particularly error prone, because of the size and complexity of the data.
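
As a rough sketch of the ID inference and dictionary steps above (TeacherBuilder, resolvedID and indexByID are hypothetical, simplified stand-ins, not the real implementation; the inference rule of lowercasing the name and keeping only letters is an assumption):

```swift
// Hypothetical, simplified stand-in for the Teacher builder in the post.
struct TeacherBuilder {
    let first: String
    let last: String
    var explicitID: String? = nil

    init(_ first: String, _ last: String) {
        self.first = first
        self.last = last
    }

    /// Returns a copy carrying an explicit string ID.
    func id(_ id: String) -> TeacherBuilder {
        var copy = self
        copy.explicitID = id
        return copy
    }

    /// The explicit ID if one was given; otherwise inferred from the name
    /// by lowercasing and dropping anything that is not a letter (assumed rule).
    var resolvedID: String {
        explicitID ?? (first + last).lowercased().filter { $0.isLetter }
    }
}

/// Maps string IDs to builders; also reports any ID that occurred more than once.
func indexByID(_ builders: [TeacherBuilder]) -> (byID: [String: TeacherBuilder], duplicates: [String]) {
    var byID: [String: TeacherBuilder] = [:]
    var duplicates: [String] = []
    for builder in builders {
        let id = builder.resolvedID
        if byID[id] != nil {
            duplicates.append(id)
        } else {
            byID[id] = builder
        }
    }
    return (byID, duplicates)
}

let teachers = [
    TeacherBuilder("John", "Smith"),
    TeacherBuilder("Shirley Anne", "Waters").id("shirleywaters"),
]
let (byID, duplicates) = indexByID(teachers)

// Second pass: every string reference is looked up in the dictionary.
let references = ["shirleywaters", "johnsmith", "jonsmith"]
let unresolved = references.filter { byID[$0] == nil }
print(unresolved) // the misspelled "jonsmith" is only found here, at run time
```

The misspelled reference compiles without complaint, which is the weakness of string IDs that the options below try to address.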

Option 1

I'm thinking of leveraging Swift's type system to refer to model instances at compile-time, to avoid runtime errors with missing or incorrect string IDs. Maybe to allow a syntax something like this:

    School("Trinity")
        .staff {
            Role(department: DeptAdministration.self, id: ShirleyWaters.self)
            Role(department: DeptScience.self, id: JohnSmith.self)
        }

I wondered about adopting the model used by SwiftUI for my builders and using macros to avoid boilerplate. Maybe something like this (if it's even possible):

    @Teacher("John", "Smith")
        .contacts {
            Address("1 The Road", "London")
            Tel("020 1234 5678")
            Email("johnsmith@school.com")
        }
        .payCategory(.newlyQualified)

Expanding to:

struct JohnSmith : Teacher {

    init() {
        super.init("John", "Smith")
    }

    var model: some ModelBuilder {
        self
            .contacts {
                Address("1 The Road", "London")
                Tel("020 1234 5678")
                Email("johnsmith@school.com")
            }
            .payCategory(.newlyQualified)
    }
}

So every builder instance would become a separate type, allowing compile-time checking of relationships. Circular references (and there are many in the real dataset) would be handled, too.

Problems:

I'd need a new way to collect the builders, as these could not be defined in a result builder. I definitely wouldn't want to curate a list of builder types manually, because that would create a new source of errors (models missing from the list could be referenced at compile-time, but wouldn't actually be created at runtime).
I don't know if Swift has a practical limit for the number of types defined... this could potentially run to approximately 20,000 types. That might brick the compiler.
Maybe it harms readability to use Swift macros to create underlying code so different from the code as written.

Questions:

Is there a way to use reflection to iterate over all implementations of a protocol (of Teacher, say)? That would avoid the need for a result builder to collect model instances.
Alternatively, could the macro create some peer definition or statement that would append that type to a global list, or allow reflection or similar?

Option 2

I wondered about the possibility of a "ModelID" macro, just for those model instances that will be referenced by other instances. Maybe to allow something like this:

@ModelsBuilder
func seedData() -> [Model] {

    @ModelID
    Teacher("John", "Smith")
        .contacts {
            Address("1 The Road", "London")
            Tel("020 1234 5678")
            Email("johnsmith@school.com")
        }
        .payCategory(.newlyQualified)

    @ModelID("ShirleyWaters")
    Teacher("Shirley", "Waters")
        .payCategory(.headteacher)

    ...
}

Maybe the macro would create a helper type that could be used in references, something like this:

struct JohnSmith : ModelReference {
    let id = "johnsmith"
}

Problems:

I don't think it's possible for a macro here to create a type that would be visible globally.

Other options

Maybe I'm going about this in the wrong way? I could persist the data in JSON, I suppose, and write a parser, but that seems verbose and error-prone. I like the idea of leveraging the compiler to maximise compile-time data input. I'm the one creating the dataset, and it's already very labour intensive. Are there other ways of seeding complex data?

This question is too long and probably too vague, for which I apologise. I'm not sure my contrived example does justice to the problem.

I wondered if the named references in the RegexBuilder DSL provide an analogous solution, but I can't work out how that helps me.

Any thoughts or suggestions very gratefully received. Thank you.

tera · December 31, 2023, 3:39am

Who will perform the "initial data entry", yourself? Other people? Will they have to use Xcode?

I'd just go with JSON to be honest.

rayx · December 31, 2023, 3:26pm

IIUC, you are asking how to use a macro inside a result builder to help set up references in a neat way. I don't have practical experience with result builders, but I think you are heading in the wrong direction. First, an attached macro must be applied to a declaration, which means you can't use it on an expression inside a result builder. Second, and more importantly, setting up references usually requires access to all the available data, which conflicts with how a result builder works. For example, what would you do if the result builder processes a School value that references a Teacher's ID, but that teacher hasn't been processed yet?

So, why not just use a result builder to save the input values first and set up the references after that? You can apply macros to structs (or classes) to help generate CRUD methods (e.g. cascading deletes). BTW, if you use structs, I don't think you need to do much about "setting up references"; if you use classes, you'll need to set up pointers to construct an object graph, which is tedious, and a macro can help generate the boilerplate code.
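
A minimal sketch of a result builder that only saves the input values and defers all reference resolution. Every name here (Entry, EntriesBuilder, staffIDs) is hypothetical and much simpler than the builders in the original post:

```swift
// Hypothetical value types: references are stored as plain IDs, not resolved yet.
struct Teacher { let id: String; let name: String }
struct School { let name: String; let staffIDs: [String] }

/// One case per kind of entry the builder accepts.
enum Entry {
    case teacher(Teacher)
    case school(School)
}

@resultBuilder
enum EntriesBuilder {
    // Lift each supported statement into a list of entries.
    static func buildExpression(_ teacher: Teacher) -> [Entry] { [.teacher(teacher)] }
    static func buildExpression(_ school: School) -> [Entry] { [.school(school)] }
    // Concatenate the statements in source order.
    static func buildBlock(_ parts: [Entry]...) -> [Entry] { parts.flatMap { $0 } }
}

@EntriesBuilder
func seedData() -> [Entry] {
    // The school appears before the teacher it refers to. That is fine,
    // because nothing is resolved while the builder runs.
    School(name: "Trinity", staffIDs: ["johnsmith"])
    Teacher(id: "johnsmith", name: "John Smith")
}

let entries = seedData()
print(entries.count) // prints "2"
```

Because the builder never looks anything up, the order of the entries does not matter; resolving staffIDs is left to a later pass over the collected array.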

RapidZinc · January 2, 2024, 10:58am

Thank you so much for the reply. The data entry will mostly be by me, at least at first. I already have builder structs and result builders. They are great for succinct, readable data and for code completion in Xcode. Although it was a lot of work to get to that stage, it has probably saved time (and errors) overall. The key question, I suppose, was whether there is a clever way to allow cross-referencing, again leveraging code completion and compile-time errors. It looks like that isn't easy to accomplish. Thanks again.

RapidZinc · January 2, 2024, 11:07am

Thanks @rayx, I think you're right that macros don't really help me to achieve what I hoped.

I used the term "reference" without enough care... you are right that actual run-time relationships would need to be established after the creation of the individual model instances. In the initial data entry, I was really looking for an alternative to using string IDs for cross-referencing, because of the number of errors I introduced that didn't become apparent until run-time. I had hoped to use Xcode code completion and compile-time type checking to avoid some errors in the string IDs, hence the thought about dummy objects. Whatever the solution, these might well be resolved back to string IDs ultimately, and then used to create the actual references. It was just to avoid errors in data entry. As you say, I think I'm looking in the wrong direction.

Thanks again for looking at the question and for your helpful reply.

rayx · January 2, 2024, 11:54am

I see what you mean now. You're looking for a way to guarantee data consistency in terms of references, and you want the check to be performed at compile time rather than at run time. I don't think there is such a way. You have to implement it yourself, which means it's performed at run time. Have you considered using an SQL database? If you are looking for a foreign-key-like feature, I'd suggest you take a look at typed IDs. See a tutorial here. Note that it can only help to catch type errors. If you mistake teacher A's ID for teacher B's in the data, the only way to catch it is perhaps testing.
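
To illustrate the typed ID idea, here is a minimal sketch. TypedID and the static constants are hypothetical names, not taken from the linked tutorial, and the models are cut down from the example in the original post:

```swift
/// A string ID tagged with the model type it refers to (a "phantom" type parameter).
struct TypedID<Model>: Hashable {
    let raw: String
    init(_ raw: String) { self.raw = raw }
}

// Hypothetical models, loosely following the example in the original post.
struct Teacher { let id: TypedID<Teacher>; let name: String }
struct Department { let id: TypedID<Department>; let name: String }

struct Role {
    let department: TypedID<Department>
    let teacher: TypedID<Teacher>
}

// Declaring each ID once gives code completion (".johnSmith") and turns a
// misspelled reference into a compile-time error. Computed properties are used
// because generic types cannot have static stored properties.
extension TypedID where Model == Teacher {
    static var johnSmith: Self { Self("johnsmith") }
    static var shirleyWaters: Self { Self("shirleywaters") }
}
extension TypedID where Model == Department {
    static var science: Self { Self("science") }
    static var administration: Self { Self("administration") }
}

let role = Role(department: .science, teacher: .johnSmith)
// Role(department: .science, teacher: .science)   // does not compile: wrong ID type
// Role(department: .science, teacher: .jonSmith)  // does not compile: no such member
print(role.teacher.raw) // prints "johnsmith"
```

This only checks that a reference names a declared ID of the right type. Whether a Teacher with that ID is actually created still has to be verified at run time, and picking the wrong (but valid) teacher is not caught at all.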

ptomaselli · January 4, 2024, 7:13pm

I didn’t read the original question super closely, but I have run into this general problem before.

I agree with others in this thread that you should start with a simpler approach. One pattern that exists is to basically think of what you are doing as being done in “two passes” — one step which is a relatively straightforward result builder where a non-cyclic data structure is built. And then another step where the cycles or references in that structure are resolved (and where perhaps that original data structure is converted into a final representation).

If you squint this is kind of what similar APIs such as SwiftUI and SwiftData are already doing — the result builder builds some convenient initial representation of the data, but in order to use the thing, it needs to be turned into a final representation (which can even use classes, thus avoiding the need for you to re-invent some concept of references for structs).

The step that converts from initial to final can be throwing, return an optional, etc. and this becomes your “validation” step where you make sure all your references resolve, etc.
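
A minimal sketch of such a throwing conversion step, assuming hypothetical draft structs for the initial representation and classes for the final one (all names are made up for illustration):

```swift
// Initial representation: plain values, references are still string IDs.
struct TeacherDraft { let id: String; let name: String }
struct SchoolDraft { let name: String; let staffIDs: [String] }

// Final representation: classes, so cycles (school <-> teacher) are possible.
final class Teacher {
    let name: String
    weak var school: School? // back-reference, weak to avoid a retain cycle
    init(name: String) { self.name = name }
}
final class School {
    let name: String
    var staff: [Teacher] = []
    init(name: String) { self.name = name }
}

struct UnresolvedReferences: Error {
    let ids: [String]
}

/// Second pass: converts drafts into the final object graph, or throws with
/// every ID that failed to resolve (not just the first one).
func resolve(teachers: [TeacherDraft], schools: [SchoolDraft]) throws -> [School] {
    var teacherByID: [String: Teacher] = [:]
    for draft in teachers {
        teacherByID[draft.id] = Teacher(name: draft.name)
    }

    var missing: [String] = []
    var result: [School] = []
    for draft in schools {
        let school = School(name: draft.name)
        for id in draft.staffIDs {
            if let teacher = teacherByID[id] {
                school.staff.append(teacher)
                teacher.school = school
            } else {
                missing.append(id)
            }
        }
        result.append(school)
    }
    guard missing.isEmpty else { throw UnresolvedReferences(ids: missing) }
    return result
}

let teachers = [TeacherDraft(id: "johnsmith", name: "John Smith")]
let good = [SchoolDraft(name: "Trinity", staffIDs: ["johnsmith"])]
let bad = [SchoolDraft(name: "Trinity", staffIDs: ["johnsmith", "jonsmith"])]

let schools = try resolve(teachers: teachers, schools: good)
print(schools[0].staff[0].name) // prints "John Smith"

do {
    _ = try resolve(teachers: teachers, schools: bad)
} catch let error as UnresolvedReferences {
    print("Unresolved:", error.ids)
} catch {
    print("Unexpected error:", error)
}
```

Collecting every missing ID before throwing means a single run reports all the broken references in a large dataset, rather than one per run.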

li3zhen1 · January 4, 2024, 9:52pm

I think it's possible to check IDs with a macro. You could use a macro that only performs syntactic diagnosis and does not expand to anything. The outer attached macro @SomeMacroThatPerfomsIDChecks can visit all the syntax nodes inside the result builder. For convenience, you can create an inner macro, say #model, expand it, and raise diagnostics for incorrect IDs after visiting all the nodes.

But I think it is generally not a good idea to use a macro for checking IDs. Introducing macro packages will significantly slow down your builds (especially release builds).

@SomeMacroThatPerfomsIDChecks
struct SomeModel {

    @ModelBuilder
    var body: [Model] {

        #model("John") {
             ...
        }

        #model("ShirleyWaters")
        ...
    }
}

tera · January 4, 2024, 10:39pm

Maybe this?

enum Department { case administration, science }
protocol Person {}
struct Teacher: Person {
    let name: String
    let surname: String
}
struct Role {
    let department: Department
    let person: Person
}
struct School {
    let name: String
    let staff: [Role]
}

let shirleyWaters = Teacher(name: "Shirley Anne", surname: "Waters")
let johnSmith = Teacher(name: "John", surname: "Smith")

let trinitySchool = School(name: "Trinity", staff: [
    Role(department: .administration, person: shirleyWaters),
    Role(department: .science, person: johnSmith)
])

That's if you need to use the teacher "handle" elsewhere, otherwise just have it defined in-place:

let trinitySchool = School(name: "Trinity", staff: [
    Role(
        department: .administration,
        person: Teacher(name: "Shirley Anne", surname: "Waters")
    ),
    ...
])

Or the same but using a class hierarchy (Person being a base class, Teacher a subclass).

rayx · January 5, 2024, 10:12am

I like @tera's approach. I use the same approach in my app too. The idea is to use proper data structures to avoid explicit references. However, it doesn't work in all scenarios. For example, how would you represent a transfer between two bank accounts using this approach? I don't think it's possible.

tera · January 5, 2024, 12:23pm

Then I'd use either IDs:

struct AccountNumber {
    let value: String
    init(_ value: String) { self.value = value }
}
struct Account {
    var name: String
    let no: AccountNumber
}
struct Transfer {
    let from: AccountNumber
    let to: AccountNumber
}
var currentAccount = Account(name: "Current", no: .init("1234"))
var savingsAccount = Account(name: "Savings", no: .init("5678"))
let transfer = Transfer(from: currentAccount.no, to: savingsAccount.no)

or references:

struct AccountNumber {
    let value: String
    init(_ value: String) { self.value = value }
}
class Account {
    var name: String
    let no: AccountNumber
    init(name: String, no: AccountNumber) { self.name = name; self.no = no }
}
class Transfer {
    let from: Account
    let to: Account
    init(from: Account, to: Account) { self.from = from; self.to = to }
}
let currentAccount = Account(name: "Current", no: .init("1234"))
let savingsAccount = Account(name: "Savings", no: .init("5678"))
let transfer = Transfer(from: currentAccount, to: savingsAccount)