How to parse dates formatted like "2020-09-16 23:44:47 +0200" for use in RegexBuilder

I tried to parse a date in the following format and failed miserably:

2020-09-16 23:44:47 +0200

I wanted to parse a file containing dates in this format using the new RegexBuilder.
I was not able to generate or parse this format using the Swift format styles introduced last year. I spent two hours looking for matching formatting options.

I had to fall back on DateFormatter to make it work. Is there a better way?

let RFC3339DateFormatter: DateFormatter = {
    let formatter = DateFormatter()
    formatter.locale = Locale(identifier: "en_US_POSIX")
    formatter.dateFormat = "yyyy-MM-dd HH:mm:ss ZZZZZ"
    return formatter
}()

let regexp = Regex {
    ...
    Capture {
        ZeroOrMore(CharacterClass(.anyOf("+-: "), .digit))
    } transform: { RFC3339DateFormatter.date(from: String($0)) }
    ...
}

The exact parser for these dates looks like this:

import RegexBuilder

let datePattern = Regex {
    Anchor.wordBoundary
    Repeat(.digit, count:4)
    "-"
    Repeat(.digit, count:2)
    "-"
    Repeat(.digit, count:2)
    " "
    Repeat(.digit, count:2)
    ":"
    Repeat(.digit, count:2)
    ":"
    Repeat(.digit, count:2)
    " "
    CharacterClass.anyOf("+-")
    Repeat(.digit, count:4)
    Anchor.wordBoundary
}
.asciiOnlyDigits()

But depending on your input file, it could be simplified.
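
As one possible simplification, assuming each line contains at most one timestamp and nothing else that resembles one, the anchors can be dropped and the pattern written as a regex literal. The sample line and the names sampleLine, simpleDatePattern and timestampText are made up for illustration.

```swift
// Hypothetical input line; the real file layout is not shown in this thread.
let sampleLine = "entry 42 2020-09-16 23:44:47 +0200 done"

// Same shape as the builder above, without the word-boundary anchors.
// The #/.../# delimiters avoid needing the bare-slash compiler flag.
let simpleDatePattern = #/\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}/#

// firstMatch(of:) returns nil when the line contains no timestamp.
let timestampText = sampleLine.firstMatch(of: simpleDatePattern).map { String($0.output) }
print(timestampText ?? "no timestamp found")
```

This only extracts the matching text; turning it into a Date is a separate step.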

Hi Pavel,

thank you for your code example.

Unfortunately my question was not very precise: I was looking for a way to parse this kind of date format into a Date struct by using the corresponding Swift format style inside a RegexBuilder, as shown in the WWDC video "Swift Regex: Beyond the basics".

Oddly enough, print(Date.now) generates exactly this format in its output, but I could not find corresponding options on Date.formatted(...) to generate or parse this kind of output.

The easiest would be

Capture {
    Date.ISO8601FormatStyle(dateTimeSeparator: .space)
}

Thank you again. Did you try it? Unfortunately this ignores the +0200 part, and I was not able to find the correct options to include it.

It works for me, yes, on your examples.

Hi Pavel,

a BIG thank you for your patience!

You are right

Capture {
    Date.ISO8601FormatStyle(dateTimeSeparator: .space)
}

really works!
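
For anyone landing here later, a minimal end-to-end sketch of that Capture in use. The "modified: " prefix, the sample line, and the names logLine, linePattern and parsedDate are assumptions made up for the example; only the ISO8601FormatStyle call comes from this thread.

```swift
import Foundation
import RegexBuilder

// Hypothetical input line; the actual file format is not shown in this thread.
let logLine = "modified: 2020-09-16 23:44:47 +0200"

let linePattern = Regex {
    "modified: "
    Capture {
        // The format style acts as a regex component and yields a Date.
        Date.ISO8601FormatStyle(dateTimeSeparator: .space)
    }
}

// Output is (Substring, Date); element 1 is the captured Date.
let parsedDate: Date? = logLine.firstMatch(of: linePattern)?.output.1

if let parsedDate {
    print(parsedDate.timeIntervalSince1970)
} else {
    print("line did not match")
}
```

Because the capture is already typed as Date, no transform closure or DateFormatter is needed.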

I let myself be discouraged by the fact that

print(Date().formatted(.iso8601.dateTimeSeparator(.space)))

does not print
2020-09-16 23:44:47 +0200
or even
2020-09-16 21:44:47 +0000
but
2020-09-16 21:44:47Z
instead.

So I never really tried to use it in a parser after trying for hours to first get the correct output.

Edit:

21:44:47Z and 21:44:47 +0000 denote the same time (Z is the ISO 8601 designator for a zero UTC offset), which caused my confusion.
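
A small check of that equivalence, as a sketch: it parses the +0200 string and the Z string from the posts above with the same style and compares the results. The variable names are made up for the example.

```swift
import Foundation

let style = Date.ISO8601FormatStyle(dateTimeSeparator: .space)

// Both inputs are taken from the posts above.
let local = try? style.parse("2020-09-16 23:44:47 +0200")
let utc = try? style.parse("2020-09-16 21:44:47Z")

if let local, let utc {
    // Both strings should describe the same instant.
    print(local == utc)
    // Formatting uses the style's own time zone (UTC by default),
    // so the original +0200 offset is not reproduced on output.
    print(local.formatted(style))
} else {
    print("at least one string failed to parse")
}
```

A Date stores only the instant, not the offset it was parsed from, which is why the formatted output cannot round-trip the +0200 form.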