How to load components of `IPv4Address`?

taylorswift · October 8, 2023, 10:51pm

SocketAddress.IPv4Address is ill-suited as a currency type:

it is reference-counted and heap-allocated.
it includes extra metadata about the connection, which means it is not a pure value type and cannot implement Equatable, etc.
it cannot be serialized or deserialized.

therefore, there is good reason to copy the contents of SocketAddress.IPv4Address to a value type like:

extension IP
{
    @frozen public
    struct V4:Equatable, Hashable, Sendable
    {
        public
        var a:UInt8
        public
        var b:UInt8
        public
        var c:UInt8
        public
        var d:UInt8

        @inlinable public
        init(_ a:UInt8, _ b:UInt8, _ c:UInt8, _ d:UInt8)
        {
            self.a = a
            self.b = b
            self.c = c
            self.d = d
        }
    }
}

but converting a SocketAddress to this type requires a bit of reverse-engineering to deduce the layout of SocketAddress:

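        switch ip
        {
        case .v4(let ip)?:
            // s_addr is stored in network byte order (big-endian); convert to host order first.
            let bytes:UInt32 = .init(bigEndian: ip.address.sin_addr.s_addr)
            // peel off the four octets, most significant first.
            let value:IP.V4 = .init(
                .init((bytes >> 24) & 0xFF),
                .init((bytes >> 16) & 0xFF),
                .init((bytes >>  8) & 0xFF),
                .init( bytes        & 0xFF))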

is there a better way to load the components of IPv4Address? what about IPv6Address?

wadetregaskis · October 8, 2023, 11:04pm

Is there a particular reason you want to store the address across four fields, rather than just as a single UInt32?

I find that a lot of C APIs expect the address in either "raw" form (UInt32, sometimes host-endian and sometimes network-endian) or sockaddr / in_addr form. So while there's no 'perfect' representation, native-endian UInt32 is a good middle ground.

taylorswift · October 8, 2023, 11:08pm

no good reason, other than that it was convenient while i was reverse-engineering the layout of in_addr in order to implement CustomStringConvertible.description for logging.

for serialization/deserialization, a UInt32-backed representation would probably be better. but it would still be necessary to know the byte-order of the UInt32, to support lookups and logging.
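A minimal sketch of what such a UInt32-backed type could look like, assuming the value is stored in host byte order and converted only at the boundary with `s_addr`. The name `IPv4Value` and its members are hypothetical and not part of NIO or any other library mentioned here.

```swift
/// Hypothetical UInt32-backed IPv4 value type (illustration only).
/// `value` is kept in host byte order, so 127.0.0.1 is 0x7F00_0001.
struct IPv4Value: Equatable, Hashable, Sendable, CustomStringConvertible {
    var value: UInt32

    /// Builds the address from its four dotted-quad components.
    init(_ a: UInt8, _ b: UInt8, _ c: UInt8, _ d: UInt8) {
        self.value = (UInt32(a) << 24) | (UInt32(b) << 16) | (UInt32(c) << 8) | UInt32(d)
    }

    /// Builds the address from a network-order (big-endian) integer,
    /// which is how `in_addr.s_addr` stores it.
    init(networkOrder raw: UInt32) {
        self.value = UInt32(bigEndian: raw)
    }

    /// The address as a network-order integer, suitable for `s_addr`.
    var networkOrder: UInt32 { value.bigEndian }

    /// The four components, most significant first.
    var octets: (UInt8, UInt8, UInt8, UInt8) {
        (UInt8(truncatingIfNeeded: value >> 24),
         UInt8(truncatingIfNeeded: value >> 16),
         UInt8(truncatingIfNeeded: value >> 8),
         UInt8(truncatingIfNeeded: value))
    }

    /// Dotted-quad text for logging.
    var description: String {
        let (a, b, c, d) = octets
        return "\(a).\(b).\(c).\(d)"
    }
}
```

Keeping the stored integer in host order makes range lookups and comparisons behave numerically; the byte-order question then only arises in the two `networkOrder` conversions.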

lukasa · October 9, 2023, 6:35am

The canonical representation is UInt32 in network byte order (i.e. big-endian), but as you say, it doesn't really matter so long as you are consistent about it.

As for the flaws with SocketAddress, yep, all of these and more. An important note is that SocketAddress.IPv4Address, despite its name, doesn't even represent an IPv4 address: instead, it represents an IPv4 address and a port.
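To illustrate the address-plus-port point at the C level, here is a small sketch that splits a `sockaddr_in` into its two parts. The helper name `splitAddressAndPort` is hypothetical; it assumes only the standard `sockaddr_in` fields, both of which are stored in network byte order.

```swift
import Foundation  // re-exports sockaddr_in, in_addr and AF_INET from Darwin or Glibc

/// Hypothetical helper: splits a `sockaddr_in` into host-order address and port.
func splitAddressAndPort(_ sa: sockaddr_in) -> (address: UInt32, port: UInt16) {
    // Both fields are stored big-endian, so convert each to host order.
    (UInt32(bigEndian: sa.sin_addr.s_addr), UInt16(bigEndian: sa.sin_port))
}

// Build 127.0.0.1:8080 by hand for the demonstration.
var sa = sockaddr_in()
sa.sin_family = sa_family_t(AF_INET)
sa.sin_port = UInt16(8080).bigEndian
sa.sin_addr.s_addr = UInt32(0x7F00_0001).bigEndian

let (address, port) = splitAddressAndPort(sa)
```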

We have a now defunct PR open that aimed to add IP addresses as a proper type in NIO. We'd still love to find a way to get that over the line to resolve some of this mess, but for now the implementation may be useful for you.

tera · October 9, 2023, 10:13am

Can't you use in_addr / in6_addr directly? Example:

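import Foundation // provides in_addr, in6_addr, inet_pton and inet_ntop

extension in_addr {
    // Parses dotted-quad text. The inet_pton result is ignored here,
    // so malformed input is not reported to the caller.
    init(_ string: String) {
        var address = in_addr()
        inet_pton(AF_INET, string, &address)
        self = address
    }
    // Formats the address as dotted-quad text.
    var string: String {
        var address = self
        var str = [CChar](repeating: 0, count: Int(INET_ADDRSTRLEN))
        inet_ntop(AF_INET, &address, &str, socklen_t(INET_ADDRSTRLEN))
        return String(cString: str)
    }
}

extension in6_addr {
    // Parses IPv6 text; as above, a parse failure is not reported.
    init(_ string: String) {
        var address = in6_addr()
        inet_pton(AF_INET6, string, &address)
        self = address
    }
    // Formats the address in compressed IPv6 notation.
    var string: String {
        var address = self
        var string = [CChar](repeating: 0, count: Int(INET6_ADDRSTRLEN))
        inet_ntop(AF_INET6, &address, &string, socklen_t(INET6_ADDRSTRLEN))
        return String(cString: string)
    }
}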

Usage example:

print(in_addr("128.0.0.1").string) // 128.0.0.1
print(in_addr("1.2.3.4").string) // 1.2.3.4
print(in6_addr("::1").string) // ::1
print(in6_addr("2001:0000:1111:0000:0000:9999:8888:2222").string) // 2001:0:1111::9999:8888:2222

outputs:

128.0.0.1
1.2.3.4
::1
2001:0:1111::9999:8888:2222
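The initialisers above discard the result of `inet_pton`, so a malformed string is indistinguishable from a valid one. A hedged sketch of a failable variant follows; the function name `parseIPv4` is hypothetical, and the only assumption is the documented `inet_pton` contract (1 on success, 0 for unparseable text, -1 for an unsupported family).

```swift
import Foundation

/// Hypothetical failable parser: returns nil when `inet_pton` rejects the text.
func parseIPv4(_ string: String) -> in_addr? {
    var address = in_addr()
    // inet_pton returns 1 only when the text was a valid dotted-quad address.
    guard inet_pton(AF_INET, string, &address) == 1 else { return nil }
    return address
}
```

With this shape, `parseIPv4("1.2.3.4")` yields an address while `parseIPv4("not an address")` yields `nil`, which callers can handle explicitly.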

Karl · October 9, 2023, 10:29am

WebURL has excellent IPv4Address and IPv6Address types, FWIW.

No heap allocations, well-defined system-independent parsing (both inet_aton and inet_pton styles) both from (sub)strings and generic collections of bytes, RFC-compliant serialisation, full documentation, and it's written in pure Swift with no platform dependencies (fully portable).

The documentation explains how to convert to C's in_addr/in6_addr, or to a NIO SocketAddress. Similar techniques can be used to convert in the other direction.

lukasa · October 9, 2023, 10:53am

+1 to the above, those are also great choices for types.

taylorswift · October 9, 2023, 9:29pm

that looks like exactly what i need! i also commend the swift-url package for keeping the main WebURL Foundation-free; in my view, there are not enough libraries out there that pay sufficient attention to server-side requirements.

i have opened an issue on GitHub to discuss the size of the WebURL module itself, it would be really great if the IP address-related types lived in their own module.

tera · October 10, 2023, 10:11am

Continuing the idea of using C's in_addr / in6_addr as currency types, here's a possible implementation of the required Equatable / Hashable / Codable conformances:

extension in_addr: Hashable, StringCodable {
    static public func == (lhs: Self, rhs: Self) -> Bool {
        lhs.s_addr == rhs.s_addr
    }
    public func hash(into hasher: inout Hasher) {
        hasher.combine(s_addr)
    }
}

extension in6_addr: Hashable, StringCodable {
    static public func == (lhs: Self, rhs: Self) -> Bool {
        lhs.value128 == rhs.value128
    }
    public func hash(into hasher: inout Hasher) {
        hasher.combine(value128)
    }
}

utilising these helpers:

extension in6_addr {
    var components: (UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16, UInt16) {
        __u6_addr.__u6_addr16
    }
    var components32: (UInt32, UInt32, UInt32, UInt32) {
        __u6_addr.__u6_addr32
    }
    var components64: (UInt64, UInt64) {
        ((UInt64(components32.0) << 32) + UInt64(components32.1), (UInt64(components32.2) << 32) + UInt64(components32.3))
    }
    var value128: UInt128 {
        UInt128(first: components64.0, second: components64.1)
    }
}

and:

struct UInt128: Hashable {
    var first: UInt64
    var second: UInt64
}

protocol StringCodable: Codable {
    init(_ string: String)
    var string: String { get }
}

extension StringCodable {
    public func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        try container.encode(string)
    }
    public init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let string = try container.decode(String.self)
        self.init(string)
    }
}

As for Codable, this implementation uses the IP address's string representation.
Example JSON:

    { "ipAddress" : "127.0.0.1" }
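One caveat about the `components` helpers above: the union member name `__u6_addr` is, as far as I know, the Darwin spelling, and other platforms name it differently. A sketch of a way to read the 16 address bytes without naming the union at all is below; the property name `bytes` is hypothetical, and it relies only on `in6_addr` being a plain 16-byte struct.

```swift
import Foundation

extension in6_addr {
    /// The 16 address bytes in network order, read from the struct's raw memory.
    var bytes: [UInt8] {
        withUnsafeBytes(of: self) { Array($0) }
    }
}
```

Because the bytes come out in network order, the array is the same on little-endian and big-endian hosts, which also makes it a reasonable input for hashing or serialisation.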

taylorswift · October 11, 2023, 9:42pm

i would center the design around the serialization requirement, rather than just the Equatable/Hashable aspects.

encoding IPs as strings is not appealing to me, because i want to log and persist a lot of IP addresses for a long period of time, as is necessary to secure and defend a service that receives a lot of traffic from a lot of different IP addresses.

tera · October 11, 2023, 9:48pm

What would be your preferred serialisation for IPv4 and IPv6 addresses?

taylorswift · October 11, 2023, 9:50pm

for IPv4, an Int32, which has a 4-byte native BSON representation.

for IPv6, a fixed-size 16-byte array, which has a 21-byte BSON representation.

both of these representations are much smaller than a BSON string.
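The size figures can be checked with a small sketch. It assumes the BSON layout for a binary value (a little-endian int32 length, one subtype byte, then the payload) and uses no BSON library; all three function names are hypothetical.

```swift
/// IPv4: BSON has no unsigned 32-bit type, so reinterpret the address bits as Int32.
func bsonInt32(fromIPv4 value: UInt32) -> Int32 {
    Int32(bitPattern: value)
}

/// Recovers the unsigned address bits from the stored Int32.
func ipv4(fromBSONInt32 stored: Int32) -> UInt32 {
    UInt32(bitPattern: stored)
}

/// IPv6: the value bytes of a BSON binary element holding the 16 address bytes.
func bsonBinaryValue(fromIPv6 bytes: [UInt8]) -> [UInt8] {
    precondition(bytes.count == 16, "an IPv6 address is exactly 16 bytes")
    // 4-byte little-endian length prefix.
    let length = withUnsafeBytes(of: Int32(bytes.count).littleEndian) { Array($0) }
    // 0x00 is the generic binary subtype.
    return length + [0x00] + bytes
}
```

The binary value comes to 4 + 1 + 16 = 21 bytes, matching the figure above. Note that addresses at or above 128.0.0.0 become negative once reinterpreted as Int32, which matters for range queries on the stored field.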

tera · October 11, 2023, 11:38pm

I see. Then, unless you are in full control of the whole app and all its dependencies, it would be unwise to conform "currency types" such as SocketAddress.IPv4Address / WebURL.IPv4Address / in_addr to Codable † with your own representation, as that might be at odds with the wishes of other components of the app ("do not conform types you don't own to protocols you don't own"). I'd use a custom-made wrapper type, e.g.:

struct MyIP4Address: Hashable, Codable {
    var wrapped: in_addr // or whatever you end up using
    public func encode(to encoder: Encoder) throws {
        // custom implementation
    }
    public init(from decoder: Decoder) throws {
        // custom implementation
    }
}

† Strictly speaking that would be a concern with Hashable / Equatable as well!
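For illustration, here is one way the two placeholders could be filled in, assuming the Int32 representation preferred earlier in the thread. This is a sketch, not a recommendation from any of the libraries discussed; the choice to store the host-order bits is an assumption.

```swift
import Foundation

/// Wrapper that owns its Codable representation instead of conforming in_addr.
struct MyIP4Address: Hashable, Codable {
    var wrapped: in_addr

    init(wrapped: in_addr) { self.wrapped = wrapped }

    // Equality and hashing are defined on the wrapper, not on in_addr itself.
    static func == (lhs: Self, rhs: Self) -> Bool {
        lhs.wrapped.s_addr == rhs.wrapped.s_addr
    }
    func hash(into hasher: inout Hasher) {
        hasher.combine(wrapped.s_addr)
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.singleValueContainer()
        // Assumption: store the host-order address bits as a signed 32-bit integer.
        try container.encode(Int32(bitPattern: UInt32(bigEndian: wrapped.s_addr)))
    }
    init(from decoder: Decoder) throws {
        let container = try decoder.singleValueContainer()
        let stored = try container.decode(Int32.self)
        // Convert back to the network-order value that s_addr expects.
        self.wrapped = in_addr(s_addr: UInt32(bitPattern: stored).bigEndian)
    }
}
```

Because the conformances live on the wrapper, other components remain free to give `in_addr` a different representation without conflict.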

Karl · October 12, 2023, 2:10am

The implementation is more-or-less contained to one file. I offered to contribute the implementation to swift-system a while ago, but I think we agreed that things such as IP addresses should go in some kind of lower-level, platform-independent library.

That said, even if there were a common data type defined somewhere, WebURL would still implement its own parsing and serialisation routines because these are defined by the URL standard.

--

As for splitting things into separate modules:

I split IDNA into its own package product, because it's huge and implements a self-contained standard. It's fully-documented, and you can use it via the "_WebURLIDNA" product, although I'm not promising a stable API for it at the moment (hence the leading underscore).

When it comes to IP addresses, percent-encoding, and other "utilities" like that, the problem is that I can see that quickly expanding to almost all of the library.

For instance, the host parser - a user gives you some string (e.g. on the command line or via a configuration file), and you want to interpret it the same way the URL parser does, to find out whether it's a domain/IPv4/IPv6, and if it's a domain, to perform IDNA compatibility processing in order to normalise it. You seem to know that you want IP addresses, but I can imagine a lot of other applications wanting a general host parser.

It's a non-trivial operation, so it's good for a library to handle that for you -- for instance, the hostname "0x𝟕f.1" contains U+1D7D5 MATHEMATICAL BOLD DIGIT SEVEN. The way the URL standard's host parser algorithm works, the string goes through IDNA before it is parsed as an IP address, so this gets mapped to "0x7f.1" (with a normal "7"), which then gets successfully parsed as 127.0.0.1. Check it in the reference parser.
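To make the last step concrete, here is a simplified sketch of the "classic" (inet_aton-style) IPv4 notation, in which each part may be decimal, octal or hexadecimal and the final part fills all remaining bytes. It is not the URL standard's algorithm and not WebURL's implementation; the function name `parseLegacyIPv4` is hypothetical, and edge cases such as trailing dots are simply rejected.

```swift
/// Simplified sketch of inet_aton-style IPv4 parsing; returns the host-order value.
func parseLegacyIPv4(_ text: String) -> UInt32? {
    let parts = text.split(separator: ".", omittingEmptySubsequences: false)
    guard (1...4).contains(parts.count) else { return nil }

    var numbers: [UInt32] = []
    for part in parts {
        let lower = part.lowercased()
        let digits: Substring
        let radix: Int
        if lower.hasPrefix("0x") {
            digits = lower.dropFirst(2); radix = 16   // hexadecimal part
        } else if lower.count > 1, lower.hasPrefix("0") {
            digits = lower.dropFirst(); radix = 8     // octal part
        } else {
            digits = Substring(lower); radix = 10     // decimal part
        }
        // Reject signs and anything else that is not a digit before converting.
        guard digits.allSatisfy({ $0.isHexDigit }),
              let number = UInt32(digits, radix: radix) else { return nil }
        numbers.append(number)
    }

    // Every part except the last is one byte; the last fills the remaining bytes.
    let last = numbers.removeLast()
    guard numbers.allSatisfy({ $0 <= 0xFF }) else { return nil }
    let remainingBits = UInt32(8 * (4 - numbers.count))
    if remainingBits < 32, last >> remainingBits != 0 { return nil }

    var value: UInt32 = 0
    for (index, number) in numbers.enumerated() {
        value |= number << UInt32(24 - 8 * index)
    }
    return value | last
}
```

Fed the already-mapped string "0x7f.1", this yields the same value as "127.0.0.1". Fed the unmapped string with the mathematical 7, it returns `nil`, which is why the mapping step has to run first.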

In fact, it gets even worse - you can percent-encode that mathematical 7, and it still parses to 127.0.0.1 (Reference parser). So it goes through this enormous journey where the host string "0x%f0%9d%9f%95f.1" needs to be percent-decoded, and then we figure out that the result contains Unicode text, so we send it through IDNA, then we figure out that the result is some ancient form of IPv4 address and parse that. That's why I say it's really not trivial.
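The first stage of that journey can be sketched in a few lines. This is a minimal percent-decoder written for illustration only; the helper names `hexValue` and `percentDecode` are hypothetical, and WebURL's own percent-decoding API is not shown here.

```swift
/// Value of one ASCII hex digit, or nil if the byte is not a hex digit.
func hexValue(_ byte: UInt8) -> UInt8? {
    switch byte {
    case UInt8(ascii: "0")...UInt8(ascii: "9"): return byte - UInt8(ascii: "0")
    case UInt8(ascii: "a")...UInt8(ascii: "f"): return byte - UInt8(ascii: "a") + 10
    case UInt8(ascii: "A")...UInt8(ascii: "F"): return byte - UInt8(ascii: "A") + 10
    default: return nil
    }
}

/// Replaces each well-formed %XX escape with the byte it names,
/// then decodes the resulting bytes as UTF-8.
func percentDecode(_ text: String) -> String {
    let input = Array(text.utf8)
    var output: [UInt8] = []
    var i = 0
    while i < input.count {
        if input[i] == UInt8(ascii: "%"), i + 2 < input.count,
           let high = hexValue(input[i + 1]), let low = hexValue(input[i + 2]) {
            output.append(high << 4 | low)
            i += 3
        } else {
            // Not an escape: copy the byte through unchanged.
            output.append(input[i])
            i += 1
        }
    }
    return String(decoding: output, as: UTF8.self)
}
```

Decoding "0x%f0%9d%9f%95f.1" this way produces a string containing U+1D7D5, since F0 9D 9F 95 is that scalar's UTF-8 encoding. The result is no longer ASCII, which is the signal that the remaining stages (IDNA mapping, then IPv4 parsing) still have work to do.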

That's all lots of fun, but the part where it gets serious is that, if a browser (or any other conforming URL implementation) is going to see the string 0x𝟕f.1 (with the mathematical 7), or the percent-encoded version, and think "oh, that's 127.0.0.1", then that's a really valuable thing for applications to know. There is a whole class of vulnerabilities known as Server-Side Request Forgery which involves tricking servers into making requests to internal services.

But all of this processing? It's not part of the IP address parser; you need the full URL host parser for that. That's why I think it would quickly grow to encompass the host parser as well. Which is like basically half the library.

By the way, WebURL already does expose the host parser as API, so you can parse hosts from anywhere exactly like the URL parser would:

WebURL.Host("0x%f0%9d%9f%95f.1", scheme: "http")
// .ipv4Address(127.0.0.1)

It's part of the WebURL module, and I don't see a problem with that.

So at the moment I'm not inclined to split things like IP addresses and percent-encoding out into separate modules. I try to expose as many things as I can as API, with generics, so you can use the parts that you need without costly and annoying data type conversions. In this particular case, I've tried to make the file more-or-less self-contained, but I'm not inclined to break the library apart for a fully à la carte experience.

taylorswift · October 12, 2023, 3:03am

right, the question i would ask is: does the parsing logic have to go with the type definition?

a favorite pattern of mine is to ship a module that contains the type and perhaps its formatter:

@frozen public
struct SchlondPoofa:Equatable, Hashable, Sendable
{
    ...
}
extension SchlondPoofa:CustomStringConvertible
{
    ...
}

and then have a second module that contains the parser:

import Mechanics

extension SchlondPoofa:LosslessStringConvertible
{
    ...
}

could a similar concept be applied to WebURL?

Karl · October 12, 2023, 3:20am

It could, but I'm not sure I see the value in it, to be honest.