How to parse XML in Swift 3 - Complex XML Construction

Hi everyone. I have to parse an XML file in my application. I get the XML data from a web service using a token. I cannot extract the right tag's data because the XML has a complex structure and some tags repeat several times. For example, the XML file has 17 "cbc:ID" tags. I need the first cbc:ID tag, the one inside the "invoice" tag. This is an example of my XML data. I use this function:

func getXMLDataFromServer(authorization:String){
    let url = NSURL(string: self.urlString)
    let request = NSMutableURLRequest(url: url! as URL)
    request.httpMethod = "GET"
    request.addValue(authorization, forHTTPHeaderField: "Authorization")
    let task = URLSession.shared.dataTask(with: request as URLRequest) { data,response,error in

        if error != nil {
            //print("\(error)")
        }

        else {

            let htmlContent = NSString(data: data!, encoding: String.Encoding.utf8.rawValue)
            //print("\(htmlContent)")

        }
        if let receivedData = data {
            if let aString = String(data: receivedData, encoding: String.Encoding.utf8) {
                if let percentage = aString.addingPercentEncoding(withAllowedCharacters:CharacterSet(charactersIn:"ıİöÖçÇüÜğĞşŞ").inverted) {
                    if let aData = percentage.data(using: String.Encoding.utf8) {
                        let parser = XMLParser(data: aData)
                        parser.delegate = self
                        parser.parse()
                    }
                }
            }
        }
    }
    task.resume()
}

I have tried that tutorial to parse my XML data. However, I cannot get the right tags out of my XML data.

I need the data from these tags: the "cbc:ID" tag inside the "invoice" tag, the "cbc:Name" tag inside "cac:AccountingCustomerParty", the "cbc:ID" tag that is also inside "cac:AccountingCustomerParty", and the "cbc:PayableAmount" tag inside "cac:LegalMonetaryTotal".

Does anyone have an idea how to parse the right tags' data for my application?

There are two general flavours of XML APIs:

  • Streaming APIs, which parse the XML from start to end and give you a callback for the start and end of each construct it contains.

  • Document APIs, which parse an entire document into a tree and then let you query and manipulate that tree.

Document APIs usually come with a query mechanism (XPath) which lets you run queries against the document. For example, you could run a query for all cbc:id elements within a cac:invoiceline element.
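To make the document-API idea concrete, here is a minimal sketch using Foundation's XMLDocument, which is available on macOS (and on Linux via FoundationXML) but not on iOS. The function name lineIDs(inXML:), the sample document, and the un-namespaced element names InvoiceLine and ID are assumptions made for illustration; real invoice XML uses namespace prefixes, which need extra care in XPath queries.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLDocument lives in this module on Linux
#endif

// Hypothetical helper: parses the whole document into a tree, then
// runs one XPath query for every ID element directly inside an InvoiceLine.
func lineIDs(inXML xml: String) throws -> [String] {
    let document = try XMLDocument(xmlString: xml, options: [])
    let nodes = try document.nodes(forXPath: "//InvoiceLine/ID")
    return nodes.compactMap { $0.stringValue }
}

// Made-up sample data, without namespaces to keep the query simple.
let plainXML = """
<Invoice>
  <ID>INV-1</ID>
  <InvoiceLine><ID>1</ID></InvoiceLine>
  <InvoiceLine><ID>2</ID></InvoiceLine>
</Invoice>
"""

let queriedIDs = (try? lineIDs(inXML: plainXML)) ?? []
```

The invoice-level ID is skipped because the query only matches ID elements whose parent is an InvoiceLine.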

For historical reasons, Swift’s Foundation framework only contains a streaming API (Foundation.XMLParser). So you have one big choice to deal with initially:

  • Do you want to stick with the built-in streaming API?

  • Do you want to take a dependency on a third-party document API?


If you decide to stick with the streaming API then the basic strategy for extracting data is for you to maintain state about the XML as you get callbacks and use that to guide your data extraction. For example, let’s say you want to extract all the text from the cbc:id elements within a cac:invoiceline element. You could do this as follows:

  1. When you get the ‘did start’ callback for cbc:id, set a flag that causes you to start accumulating data.

  2. When you get the ‘did end’ callback for cbc:id, stop accumulating data and save the final string.

  3. When you get the ‘did end’ callback for cac:invoiceline, save that string to your model.
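The three steps above can be sketched with XMLParser roughly as follows. This is only an illustration, not the SeismicXML approach: the class name InvoiceLineIDCollector, the helper invoiceLineIDs(in:), the sample document, and the placeholder namespace URIs are all made up, and the element names are assumed to be spelled cbc:ID and cac:InvoiceLine as in the question. It relies on the parser's default setting (shouldProcessNamespaces is false), where element names are reported with their prefix.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML  // XMLParser lives in this module on Linux
#endif

// Hypothetical delegate: collects the text of each cbc:ID that is a
// direct child of a cac:InvoiceLine.
final class InvoiceLineIDCollector: NSObject, XMLParserDelegate {
    private(set) var lineIDs: [String] = []  // stands in for "your model"
    private var elementStack: [String] = []  // names of the currently open elements
    private var isAccumulating = false
    private var buffer = ""
    private var pendingID: String?

    func parser(_ parser: XMLParser, didStartElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?,
                attributes attributeDict: [String: String] = [:]) {
        // Step 1: start accumulating only when the parent is an invoice line.
        if elementName == "cbc:ID", elementStack.last == "cac:InvoiceLine" {
            isAccumulating = true
            buffer = ""
        }
        elementStack.append(elementName)
    }

    func parser(_ parser: XMLParser, foundCharacters string: String) {
        // Text can arrive in several chunks, so append rather than assign.
        if isAccumulating { buffer += string }
    }

    func parser(_ parser: XMLParser, didEndElement elementName: String,
                namespaceURI: String?, qualifiedName qName: String?) {
        elementStack.removeLast()
        if elementName == "cbc:ID", isAccumulating {
            // Step 2: stop accumulating and keep the final string.
            isAccumulating = false
            pendingID = buffer.trimmingCharacters(in: .whitespacesAndNewlines)
        } else if elementName == "cac:InvoiceLine", let id = pendingID {
            // Step 3: the invoice line is complete, so save to the model.
            lineIDs.append(id)
            pendingID = nil
        }
    }
}

// Hypothetical convenience wrapper around the delegate.
func invoiceLineIDs(in xml: Data) -> [String] {
    let collector = InvoiceLineIDCollector()
    let parser = XMLParser(data: xml)
    parser.delegate = collector
    parser.parse()
    return collector.lineIDs
}

// Made-up sample data; the namespace URIs are placeholders.
let sampleInvoice = """
<Invoice xmlns:cbc="urn:example:cbc" xmlns:cac="urn:example:cac">
  <cbc:ID>INV-1</cbc:ID>
  <cac:InvoiceLine>
    <cbc:ID>1</cbc:ID>
    <cac:Item><cbc:ID>SKU-9</cbc:ID></cac:Item>
  </cac:InvoiceLine>
  <cac:InvoiceLine>
    <cbc:ID>2</cbc:ID>
  </cac:InvoiceLine>
</Invoice>
"""

let collectedIDs = invoiceLineIDs(in: Data(sampleInvoice.utf8))
```

Tracking the stack of open elements is what lets the delegate tell the invoice-level cbc:ID and the item-level cbc:ID apart from the one that belongs to the invoice line itself. The same idea extends to the other tags in the question by checking for a different parent name.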

This is a very simple example and things can get more complex as you try to extract more information from the XML. Depending on how complex your requirements are, you might want to build a general structure for this or look at using a document API.

When folks ask me questions about this I usually point them to the SeismicXML sample code. While it’s in Objective-C, the core technique it illustrates is applicable to all languages.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

Here is a Swift example of using XMLParser to parse a GPX file. GPX is an XML standard for representing location data over time.

How come Swift doesn't have "Codable" support for XML like it does for JSON? Is XML too complex? Or is it just no one has gotten around to it...?

To sort of answer my own question, I suppose the main reason is that Foundation in Objective-C doesn't support it. Though that doesn't answer the underlying question: why not?

It supports plists, which are pretty much XML.

There's also Cocoa's (NS)XMLDocument, which should be fine for a document-based approach to querying the XML?

XMLDocument lacks a lot of basic functionality (see current status). For example, I need to get the child elements with a certain tag name (even without namespace support).

In my understanding, XMLParser from the Swift Foundation library (at least in the form of the SAX-like, i.e. streaming, parser; see Apple Developer Documentation) is actually unusable as a general XML tool, as it cannot process internal entity references properly (except the predefined XML ones such as &lt;). This is a severe limitation that has existed for a very long time (already in the Objective-C API), and nobody at Apple seems to be interested in fixing it.

As an alternative, you could use libxml2, which comes preinstalled on macOS and iOS, but you need to use a wrapper to this C library, see Wrapping libxml2 for Swift – The Red Queen Coder and GitHub - SonoPlot/Swift-libxml: This is a repository that contains Swift wrapper classes for libxml2.

libxml2 has limitations in its support for W3C Schema. So if W3C Schema has to be supported, Xerces-C++ should be used; Xerces is the "gold standard" for XML (the Java version of Xerces is part of the standard Java libraries). As C++ interop seems to be coming (see https://mobile.twitter.com/jeremyphoward/status/1154974115893149696), using Xerces-C++ from Swift without an intermediate C wrapper may soon be possible.

Both libxml2 and Xerces have a document interface and a SAX-like (i.e. streaming) interface. For Swift, I would prefer using the SAX-like interface of one of those libraries and building the document structure in pure Swift (XPath from these libraries would then not work, but the Swift language should give something better than XPath at least for internal code).

For full support of current important XML standards (besides maybe XPath, see above), the best choice would be a wrapper for Xerces (at least for the SAX interface). (libxml2 would be a better choice if you want to validate documents using Relax NG, but even though Relax NG is really great, W3C Schema is much more popular.)

Conclusion: Working with XML in Swift is in a sad state.

Update: Something has changed here. The problem described above is gone, and XMLParser seems to be quite good now (it even reports comments, and reports CDATA sections as CDATA sections). I think I will use this parser in a project and use C++-based preprocessing steps to validate XML documents via Xerces-C++ and/or libxml2.

Uh-oh, I got an error with the XMLParser from Foundation, see there (see that other topic for a possible solution, so I do not have to keep updating this thread, which would be too much noise).