I have been playing with OpenAI's ChatGPT and Anthropic's Claude to get suggestions on how to accomplish various programming tasks in Swift. I haven't used them for anything complex, but so far both models have provided similar results. They also do a good job of providing code examples for Apple-specific frameworks that are lacking in documentation. So I was wondering: what LLMs have you all used for Swift programming, and which one(s) do you get the best results with?
We’ve used both ChatGPT and Claude and in general get better results with Claude.
Wouldn’t trust it for anything blindly, but can be a good sounding board, provide a POC and perform refactoring. Also useful for exploring different approaches.
That being said, the output needs careful review - it's just one more nice tool.
Also unexpectedly useful as a debugger, actually - we have many examples of pasting in SwiftUI code, explaining a problem we're seeing, asking for possible root causes, and having it actually point out the issue.
That said, it needs careful iterative prompting for best results, and definitely don't trust it blindly - I usually use a preamble to tell it to use Swift 6, avoid Dispatch, use Observation, and so on, since much of the LLMs' training was done on old APIs.
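For example, a rough (entirely hypothetical) preamble along these lines:

```text
Write Swift 6 code with strict concurrency enabled. Prefer async/await and
actors over DispatchQueue. Use the Observation framework (@Observable) rather
than ObservableObject. If you are not sure an API exists, say so instead of guessing.
```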
Just my $0.02.
When I last used Claude (the 3.7 version bundled with a Copilot subscription), it literally started inventing new syntax. I've had similar results with Gemini and Grok. So far, from what I've seen, they seem to struggle with recent features introduced in the language, including ownership (proper usage of `consume`, `consuming`, `borrowing`, `~Copyable`, `~Escapable`, etc.), with explanations often being just plain wrong.
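For anyone unfamiliar with those features, here is a minimal sketch (hypothetical type and names) of the kind of ownership code in question - roughly what the models tend to get wrong:

```swift
// A non-copyable type with a consuming method and a borrowing function (Swift 5.9+).
struct FileDescriptor: ~Copyable {
    let fd: Int32

    deinit {
        // normally you would close(fd) here
    }

    // `consuming` takes ownership of self; the value cannot be used after this call.
    consuming func closeEarly() {
        // ... close fd manually ...
        discard self  // skip the deinit, since we cleaned up ourselves
    }
}

// `borrowing` reads the value without copying or consuming it.
func describe(_ file: borrowing FileDescriptor) -> String {
    "fd #\(file.fd)"
}
```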
To me - it feels like StackOverflowing + Search + Autocomplete on steroids, which is very powerful, but has its limitations.
Did you use the paid or free version of Claude? Regardless of which tier you used, I agree that the generated code should be reviewed. But even when the code is wrong I still find it useful for getting new ideas and I can always iterate with the prompt to improve results.
I haven't tried using it with new Swift features but I'm not surprised that it would struggle with recent features. There are fewer resources available to train the model on new Swift code, so I would expect more errors in that regard. But it seems to do fine with established Swift concepts.
Using the combination of Visual Studio Code and Claude Code is very powerful. Once you teach Claude how to compile and run tests, it can go into a very productive „code, build, fix, test“ loop and solve even more challenging problems. You can even teach it to check the sources of packages to understand their syntax. Of course it will still make mistakes, and yes, concurrency is a challenge, as are advanced newer features of the language. But if you take over the difficult parts, it will happily fill in the boring stuff.
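One way to "teach" it is a CLAUDE.md at the repo root. A minimal, hypothetical sketch for a SwiftPM package (the project-specific details are assumptions):

```markdown
# Project notes for Claude

## Build and test
- Build: `swift build`
- Run all tests: `swift test`
- Run one test target: `swift test --filter MyFeatureTests`

## Conventions
- Swift 6 language mode, strict concurrency.
- After every change, build and run the tests before reporting back.
- Dependency sources live under `.build/checkouts/` if you need to read a
  package's actual API instead of guessing.
```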
I found this article by Indragie Karunaratne contained some interesting information.
Use the paid "Max" version, otherwise it just runs out of usage quickly.
I'd probably try attaching the relevant evolution proposals if using e.g. the ownership annotations - to get decent results one needs to be very careful with the prompting.
Agree with the other poster about using the tools to do "boring" bulk work.
We have our internal GPT instance, but it's based on o3-mini. I use it mostly as a way of 'navigating' Apple's documentation. Typically it's able to provide a reasonable explanation, an example, and most importantly roads into further exploration.
I also have GitHub Copilot running when using VS Code, though I notice that my accept rate for suggestions is pretty low. It's like it's not very well aligned with my workflow.
And there's of course the built-in model in Xcode. Not sure if that one counts, though.
After being somewhat skeptical about AI in general (I don't want to get 'political' here, but some cultural aspects and concerns around it kept me away), I decided to approach the topic with an open mind. Most of my experience is based on the Sonnet models 3.7 and 4, driven by `claude-cli`, `cline`, or Xcode 26 (via a LiteLLM proxy). I spent some time with context engineering to optimize my results and also hooked up some MCP servers for tighter feedback loops and additional context.
My short summary is that LLMs are very useful for short bursts of work that follow an easy-to-predict pattern. The quality of this short burst of work depends heavily on the quality of the planning that you perform together with the agent and the context that you're able to provide. The time to set everything up properly should not be underestimated.
It's better at this point not to expect miracles. All of my sessions followed a similar pattern. First, I got some impressive results because I was often presented with something that works (for example, a simple game), but on closer inspection the code was not that great; optimizing it together with the agent started feeling a bit like gambling, and the initial time savings began to melt away.
I'm aware that this is a fast-moving field. The progress might be exponential, linear, or follow the law of diminishing returns - it's hard to say. So for now, I keep coding like before, but things like cleaning up my code and documentation, filling in the body of a function that does something well-known, or creating custom conformances to protocols like `Codable` (see the sketch below) are things I will not write by hand anymore, and I will also explore more meaningful contributions. However, creating whole projects from scratch or letting the agent perform meaningful changes on large codebases is something that I have not had much luck with. Your mileage may vary.
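To illustrate the Codable point, this is roughly the kind of boilerplate conformance I mean (hypothetical type, made-up keys):

```swift
import Foundation

// A type whose JSON keys don't match its property names, so the
// conformance has to be written out instead of being synthesized.
struct User: Codable {
    var name: String
    var signupDate: Date

    enum CodingKeys: String, CodingKey {
        case name = "user_name"
        case signupDate = "signup_date"
    }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        name = try container.decode(String.self, forKey: .name)
        signupDate = try container.decode(Date.self, forKey: .signupDate)
    }

    func encode(to encoder: Encoder) throws {
        var container = encoder.container(keyedBy: CodingKeys.self)
        try container.encode(name, forKey: .name)
        try container.encode(signupDate, forKey: .signupDate)
    }
}
```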
But there is an additional angle here, and this is what could be done to make the lives of developers who code together with an AI agent easier. Should there maybe be a way to convert a DocC archive to Markdown, or should the Swift Package Manager have a built-in MCP server? I feel like there are many aspects like this that could be explored.
What do you mean by "navigating Apple's documentation"? This sounds like you ask the model a question and give it some links to the Apple docs as reference. Or is there some other way you utilize the docs?
Hi @wigging,
I should have been clearer: I use an LLM when I need to accomplish a certain goal, e.g. adding a bloom effect to a RealityKit scene. This is actually non-trivial to find in the RealityKit documentation, but when you use an LLM, you get a piece of code that helps find the relevant APIs. And from there it's easier for me to dive deeper.
I've been using ChatGPT to help proofread a book I'm working on: "Introduction to Programming Using Swift." GPT code output has been about evenly divided into four categories:
- Absolutely amazing code, doing things I never would have thought of
- Largely matching the same code I would have written myself.
- Producing slightly inferior code to what I would have written.
- Going off into left field and producing unusable junk (hallucinations).
Without question, the biggest issue I have found with using GPT is that it has been trained on a wide variety of examples on the net, some using older Swift syntax that doesn't compile on modern versions. In particular, I'm using Swift 6.2 (beta) for the code in my book and GPT-generated code often fails to compile (it really gets confused when using Date & Time stuff, which has conflicts with RegEx builder). Even when it compiles, the code sometimes doesn't do quite what you asked for (though going through several iterations specifying tighter and tighter requirements helps). I recently read an article where coders were claiming that AI sped up their coding process, but in reality they were seeing 30% less productivity using it. Based on my experiences with Swift and ChatGPT, I can believe this.
I have also found that ChatGPT does a good job on very simple requests (like "give me an example of using String.allSatisfy()") but does a mediocre job dealing with more complex programming tasks ("write a simple database application that maintains names and addresses and allows set theoretic operations on the records" [granted, with a lot more requirements than just this, attached]).
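For reference, that kind of simple request boils down to a one-liner along these lines:

```swift
let pin = "482619"

// allSatisfy comes from Sequence, so on a String it checks every Character.
let isAllDigits = pin.allSatisfy { $0.isNumber }
print(isAllDigits)  // true
```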
I've also found ChatGPT nearly worthless for reviewing existing code.
I'm currently planning on self-publishing "An Introduction to Programming Using Swift", much like I did my original 16-bit "Art of Assembly Language Programming." I have been using ChatGPT as an "editor" to proofread the English for me (hiring an editor would be cost-prohibitive for such a release). I've found that asking ChatGPT to "rewrite this section for grammar and clarity" works well when asked to process one section at a time in a short chapter. It does not do well when asked to proof-read an entire chapter (typically 40-100 pages):
- It will take a 50-page chapter and turn it into a 5-page outline, with the obvious loss of information.
- It will often ignore the existing material in a given section and write its own version of the subject, often missing the main point of the section.
Even working on one section at a time, ChatGPT has problems:
- It ignores diagrams, removing them from the section and not even doing a good job of trying to describe what the diagram was showing.
- For longer chapters, ChatGPT does a great job on earlier sections, but somewhere around 40-50 pages into the chapter it stops reading the actual text and switches into "I will write this section on my own" (ignoring the existing written material).
I am really surprised about the issue with diagrams, because ChatGPT does a great job of creating Alt-Text from diagrams for my other books (No Starch Press). [Alt-Text is a special text description of diagrams provided in electronic editions of a book for sight-impaired readers who are using text-to-speech synthesis to "read" the book.]
In any case, just my two cents.
Randy Hyde
Did you try using other models like Claude or Gemini?
This. I find the careful engineering of a `copilot-instructions.md` file makes all the difference in the world. It's far from perfect though. Once I've gotten this loop working, I've begun running into other annoying issues. For example, Copilot for Xcode seems to use far more tokens when building than Copilot in VS Code. I'm constantly exceeding token limits in Xcode Copilot.
This of course is one of the reasons I built a ton of custom automation and orchestration scripting ahead of time - I knew there's a chance I might need to swap it in at some point.
There are still several major pain points, which is actually why I came to this thread in the first place.... Lemme get back on topic and leave the post I came here to leave
What I came here to ask about:
How much of the functionality of the `swift/tools` folder is exposed through SourceKit and the SourceKit MCPs?
One of the pain points I see repeatedly is that these LLMs don't seem to have enough insight into the ASTs (if any at all) to really reason about complex refactoring.
Some of them, like swift-grep (`swift-ast-script.cpp`), would be helpful for models that are struggling to understand visibility and modularity without consuming entire files. Others, like `swift-refactor`, might be way more likely to get refactoring tasks right the first time, every time, than hoping an LLM can do it via text replacement. (These are more common than I expected.)
This dovetails nicely with the strategy that I'm employing more and more - I'm getting better results overall by having LLMs write proper scripting or build tooling to execute a task and then using that, rather than asking it in instructions to carry out the task itself. It's much more predictable, obviously, and also inherently less generalizable, but it allows much better results for the LLM. Finally - the bit that satisfies my inner hibernating C++ programmer - said scripts or tooling execute a ton faster than a general-purpose billion-parameter large language model.
I have theories here as to why, namely that a properly behaving tool doing its work in machine code gets that work done without consuming tokens, and thus without diluting the context window or the model's attention.
Heh, as a much smaller-scale example from the other day: once I created a little document in the repo context that listed the available commands in the system environment, Claude in Copilot started using `jq` to extract JSON, instead of trying to wrangle values out of files by selecting the line or writing and executing inline Python.
As a side note: one of the things that's been surprisingly helpful has been including the right incantations in my instructions file to get agentic models to leave notes for themselves in the repo. I'm building up a nice knowledge base of `copilot_notes` that it does occasionally reference and do smart things with. It's a bit like a low-octane memory simulation. I have yet to figure out whether it's more effective than the 497 solutions that exist nowadays advertising themselves as "memory MCPs" or anything like them. But it's easier than training an embedding or building a proper agent ecosystem.
I am entertained that I have independently discovered how useful proper priming is.
If you want to take this to the next level and give your models real superpowers, the secret sauce for me has been including a short section on the overall business goals of the project at the beginning, and dumping the entire (short) design/business planning document at the end. This has it sometimes like 4 steps ahead of me.
i find that Google Gemini outperforms Claude on Swift tasks by a large margin. this surprises me, as Google is not known for being an organization that uses a lot of Swift.
it still struggles with being up to date with the “latest” Swift though. i cannot really use Gemini to easily create anything that uses SwiftSyntax, because it tends to generate code that worked with SwiftSyntax 509, and it just doesn’t have much experience with 600, 601, etc.
Gemini seems to do alright with ownership. Gemini just does not understand parameter packs. though i can hardly blame it, i imagine the set of humans who understand parameter packs is also quite small.
i tried giving Gemini 2.5 Pro the following challenge:
Please use Swift parameter packs to create a function that takes a `String` key, N dictionaries of type `[String: T1]`, `[String: T2]`, `[String: T3]`, etc., and N closures `(T1) -> Int`, `(T2) -> Int`, `(T3) -> Int`, etc., and looks up the string key in each dictionary from first to last, returning the closure result for the first dictionary value that matches.
it gave me this, which does not compile.
func findFirstValue<each T>(
forKey key: String,
in dictionaries: repeat [String: each T],
using closures: repeat (each T) -> Int
) -> Int? {
// We create an array of closures, where each closure encapsulates the logic
// for one dictionary-closure pair. This is a powerful pattern for working with
// parameter packs. The `repeat` keyword expands the packs in lockstep.
let lookups: [() -> Int?] = [
repeat {
// For each pair in the packs, we create and immediately return a new closure.
{
// `each dictionaries` refers to the current dictionary in the expansion.
if let value = (each dictionaries)[key] {
// If the key is found, `each closures` refers to the corresponding
// transformation closure, which we call with the found value.
return (each closures)(value)
}
// If the key is not found in the current dictionary, this lookup returns nil.
return nil
}
}
]
// Now, we can simply iterate through our homogenized array of lookup closures.
for lookup in lookups {
// We execute each lookup. If it returns a non-nil value, we've found our match.
if let result = lookup() {
// Return the result from the first successful lookup and stop processing.
return result
}
}
// If the loop completes, it means the key was not present in any of the
// dictionaries. We return nil to indicate that no value was found.
return nil
}
For comparison, here is the solution:
func findFirstValue<each T>(
forKey key: String,
in dictionary: repeat [String: each T],
using closure: repeat (each T) -> Int
) -> Int? {
for case let i? in repeat (each dictionary)[key].map(each closure) {
return i
}
return nil
}
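And a quick usage sketch of that function (hypothetical data, assuming a toolchain new enough for pack iteration):

```swift
let names: [String: String] = ["a": "apple"]
let scores: [String: Int] = ["b": 2]

// "b" is only in the second dictionary, so the second closure runs: 2 * 10 == 20.
let result = findFirstValue(
    forKey: "b",
    in: names, scores,
    using: { $0.count }, { $0 * 10 }
)
print(result as Any)  // Optional(20)
```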
We are not supposed to compare humans to computers. We sort of use LLMs and similar tools to understand features "most humans" don't know.