With the following code and XML document I get the error Failure to process entity ent1:
ERROR: Error Domain=NSXMLParserErrorDomain Code=104 "(null)" UserInfo={NSXMLParserErrorColumn=13, NSXMLParserErrorLineNumber=5, NSXMLParserErrorMessage=Failure to process entity ent1
This seems to be a serious bug in the library.
Note: When I set xmlParser.shouldResolveExternalEntities = false, I still get the same error (I have to set it to true to be informed about the entity declaration). The value for xmlParser.externalEntityResolvingPolicy also does not seem to matter here. I also tried to use xmlParser.allowedExternalEntityURLs = [ URL(string: "myentity1.ent")! ]. The file myentity1.ent exists.
The file:
<!DOCTYPE test
[
<!ENTITY ent1 PUBLIC "MYPUBLICID" "myentity1.ent">
]>
<test>&ent1;</test>
The code:
func parse(
    fileURLWithPath: String
) {
    let url = URL(fileURLWithPath: fileURLWithPath)
    if let xmlParser = XMLParser(contentsOf: url) {
        xmlParser.shouldResolveExternalEntities = true // has to be set to true to get the declaration
        xmlParser.externalEntityResolvingPolicy = XMLParser.ExternalEntityResolvingPolicy.never
        let myParserDelegate = MyParserDelegate() // just prints
        xmlParser.delegate = myParserDelegate
        xmlParser.parse()
    }
}
Full output:
did start document
external entity declaration: name ent1, public id "MYPUBLICID", system ID "myentity1.ent"
start of element: name "test", namespace URI "", qualified name ""
ERROR: Error Domain=NSXMLParserErrorDomain Code=104 "(null)" UserInfo={NSXMLParserErrorColumn=13, NSXMLParserErrorLineNumber=5, NSXMLParserErrorMessage=Failure to process entity ent1
}
external entity: name ent1, system ID ""
Error 104 is XML_ERR_ENTITY_PROCESSING, which is a catch-all for “something went wrong loading an external entity”, so that doesn’t help much.
I’m curious as to your expectations here. You wrote:
"The file myentity1.ent exists."
Exists where? Relative to the current working directory? Or are you expecting it to resolve it relative to the base URL you passed in? AFAICT it actually works relative to the current directory. However, it’s hard to tell without a complete example to test with.
Regardless, if I were in your shoes I’d take complete control over this by implementing the parser(_:resolveExternalEntityName:systemID:) delegate callback.
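For reference, a minimal sketch of what such a delegate could look like. The names EntityResolvingDelegate, declaredSystemIDs and entityURL(forSystemID:) are made up for illustration, and the sketch assumes the entity file sits next to the document being parsed. Because the output above shows an empty system ID arriving in the resolve callback, the sketch also records the system ID from the declaration callback and falls back to it. Whether the parser then accepts the returned data may depend on the platform and Foundation version.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLParser lives here in swift-corelibs-foundation
#endif

// Hypothetical delegate: resolves external entities relative to the
// directory of the document (an assumption, not documented behavior).
final class EntityResolvingDelegate: NSObject, XMLParserDelegate {
    let baseDirectory: URL
    // System IDs remembered from the entity declarations, keyed by entity name.
    private(set) var declaredSystemIDs: [String: String] = [:]

    init(documentURL: URL) {
        baseDirectory = documentURL.deletingLastPathComponent()
        super.init()
    }

    // Maps a system ID such as "myentity1.ent" to a file next to the document.
    func entityURL(forSystemID systemID: String) -> URL {
        return baseDirectory.appendingPathComponent(systemID)
    }

    func parser(_ parser: XMLParser,
                foundExternalEntityDeclarationWithName name: String,
                publicID: String?,
                systemID: String?) {
        if let systemID = systemID, !systemID.isEmpty {
            declaredSystemIDs[name] = systemID
        }
    }

    func parser(_ parser: XMLParser,
                resolveExternalEntityName name: String,
                systemID: String?) -> Data? {
        // Prefer the system ID passed in; fall back to the recorded declaration.
        let candidate = (systemID?.isEmpty == false) ? systemID : declaredSystemIDs[name]
        guard let id = candidate else { return nil }
        // Returns nil if the file cannot be read.
        return try? Data(contentsOf: entityURL(forSystemID: id))
    }
}
```

To try it, create the delegate with the same URL that is passed to XMLParser(contentsOf:), assign it to xmlParser.delegate, and keep a strong reference to it until parse() returns.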
Oh, and what platform are you testing this on? I assumed you’re working on the Mac but, thinking about it, that’s not based on any evidence.
I created a GitHub repository with my sample and code: SwiftXMLParserExamples. See the README there for a more detailed description.
macOS 11.3 on Apple Silicon with Xcode 12.5. xcrun swift -version returns Swift version 5.4. Please note that I would like to use the XMLParser on other platforms, too (Windows and Red Hat).
Unfortunately, implementing parser(_:resolveExternalEntityName:systemID:) does not solve the problem; see the mentioned repository. Maybe it would be good for my purpose to deactivate validation? How should I do that? I had some difficulty getting a complete overview of the parser behavior from the documentation. Is this parser based on libxml2?
I don't know anything about the external entity issue, but you could try the XMLDocument class — if you don't need to support the iOS, tvOS, or watchOS platforms.
Most of the APIs are also available in swift-corelibs-foundation (Windows and Linux), except for the XQuery and XSLT methods; however I've only tested on macOS.
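A minimal sketch of the tree-based route, for comparison. The helper name rootElementSummary(of:) is made up for illustration, and the input deliberately contains no external entity, so this says nothing about whether XMLDocument avoids the error above.

```swift
import Foundation
#if canImport(FoundationXML)
import FoundationXML // XMLDocument lives here in swift-corelibs-foundation
#endif

// Hypothetical helper: parses a string into a tree and returns the
// name and text content of the root element.
func rootElementSummary(of xml: String) throws -> (name: String?, text: String?) {
    let document = try XMLDocument(xmlString: xml, options: [])
    let root = document.rootElement()
    return (root?.name, root?.stringValue)
}
```

XMLNode.Options also has flags for external entity loading, such as .nodeLoadExternalEntitiesNever; whether they change the behavior for the document in question is untested here.")
assert(s.name == "test")
assert(s.text == "hello")
</test>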
Thanks for the suggestion @benrimmington, but I do not want to have a tree structure, and I guess this is using the same XML parser in the background anyway.
I now get the document parsed with a workaround: a first pass reads only the entity declarations, I then make corresponding textual replacements in the document, and a second pass parses the modified document, restoring the original entity names when "resolving" the corresponding entities; see the repository SwiftXMLCorrectingParserExamples. It is a rather "dirty" approach, so I still hope there is a better way or that there will be a corresponding change in the library, but until then I can at least work with it.
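To illustrate the textual replacement step only: the following is not the code from the repository, just one way such a substitution could look. The function name and the processing-instruction placeholder format are assumptions made for this sketch. It takes the entity names collected in the first pass and swaps each reference for a placeholder that a second pass could map back to the original name.

```swift
import Foundation

// Hypothetical helper: replaces references like "&ent1;" with a
// processing-instruction placeholder "<?external-entity ent1?>".
// Caveat: this is a plain string replacement, so it would also touch
// references inside attribute values or CDATA sections, where a
// processing instruction is not valid.
func replacingEntityReferences(in xml: String, names: Set<String>) -> String {
    var result = xml
    for name in names.sorted() {
        result = result.replacingOccurrences(
            of: "&\(name);",
            with: "<?external-entity \(name)?>"
        )
    }
    return result
}
```

In the second pass, a delegate could watch for the placeholder in parser(_:foundProcessingInstructionWithTarget:data:) and report the original entity name from there.", names: ["ent1"]) == "<test><?external-entity ent1?></test>")
assert(replacingEntityReferences(in: "<test>&amp;</test>", names: ["ent1"]) == "<test>&amp;</test>")
</test>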
...Please also note, as described in SwiftXMLParserExamples, that undeclared entities are not reported if we do not implement parser(_:resolveExternalEntityName:systemID:), and no error message is produced either. Content should never get lost without an error message. There is also the problem of external parsed entities being resolved twice. I created the bug reports SR-14581, SR-14582, SR-14583, and SR-14584.