Implementing a Jupyter kernel for Swift using LLDB; questions about declaration behavior and code completion

marcrasi · July 16, 2018, 10:22pm

I've started experimenting with a Jupyter kernel for Swift that works by calling LLDB's APIs to evaluate code. (If you're not familiar with Jupyter, it's like a REPL but with big boxes that make it easy to enter large multiline snippets of code to execute all at once. It also has a few other features like graphics output that make it really popular in machine learning and data science communities.) You can see it here: google/swift-jupyter on GitHub.

I have one general question: Does calling LLDB's APIs (specifically, SBTarget.EvaluateExpression) to evaluate code sound like a sensible approach for implementing a Jupyter kernel?

And I have a few specific questions that have arisen from my experimentation.

duplicate declarations

When you declare something twice in the Swift REPL, the second declaration replaces the first one:

  1> let x = 1
x: Int = 1
  2> let x = 2
x: Int = 2
  3> x
$R0: Int = 2

However, if you use your declaration in the same line where you re-declare it, the REPL gives you an error:

  1> let x = 1; print(x)
1
x: Int = 1
  2> let x = 2; print(x)
error: repl.swift:2:18: error: ambiguous use of 'x'
let x = 2; print(x)
                 ^

repl.swift:2:5: note: found this candidate
let x = 2; print(x)
    ^

Is this something that can and should be fixed? I think that line 2 should succeed and print 2.

People using Jupyter will run into this a lot, because it's very common to write "cells" (single blocks of code that get executed all at once) that declare and use variables.

I'm happy to work on a fix if this is indeed something that can and should be fixed.

code completion

The Swift REPL has some nice code completion behavior that is not exposed through the LLDB API. I would love to be able to use this in the Jupyter kernel. Would it be reasonable for me to expose the code completion through the LLDB API somewhere?

I nearly have basic code completion working using the SourceKit API, but I think using the Swift REPL's implementation would be better in the long term because:

The Swift REPL provides completions in more contexts. For example, if you type half a function name, the Swift REPL will complete it but SourceKit won't. (I found a previous discussion of this here: Completion limitations · Issue #113 · jpsim/SourceKitten · GitHub).
I need to pass the code-completer a "history" of all the code that has already been executed, so that it knows how to complete things that the user has declared in previously-executed code. The Swift REPL completion seems to be happy to accept duplicate declarations in the history. SourceKit errors out when there is a duplicate declaration in the history.
Overall, SourceKit seems intended for operating on files rather than interactive REPL sessions.
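
To make the "history" point above concrete, here is a minimal sketch of how a kernel might assemble the input for a file-oriented completer. It assumes the completer wants one source text plus a UTF-8 byte offset for the cursor; the function name and parameters are hypothetical and not taken from swift-jupyter or SourceKit.

```python
def build_completion_input(executed_cells, current_cell, cursor):
    """Return (source_text, byte_offset) for a file-oriented completer.

    executed_cells: code strings that were already run, oldest first.
    current_cell:   the code currently being edited.
    cursor:         character index of the cursor within current_cell.
    """
    # Normalize every executed cell to end with exactly one newline.
    prefix = "".join(cell.rstrip("\n") + "\n" for cell in executed_cells)
    source_text = prefix + current_cell
    # Count bytes, not characters, up to the cursor position.
    byte_offset = len((prefix + current_cell[:cursor]).encode("utf-8"))
    return source_text, byte_offset


# Two cells that both declare x, then a partially typed expression.
source, offset = build_completion_input(
    ["let x = 1", "let x = 2"], "x.des", cursor=5)
print(repr(source), offset)
```

Note that the history is passed through unchanged, so the duplicate declaration of x stays in the text. That is exactly the situation described above where a file-oriented completer errors out and a REPL-aware one does not.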

Thanks!!

Jim_Ingham · July 18, 2018, 5:34pm

Just a few quick responses...

Duplicate definitions. That sounds like a bug for sure. It's been a while since I've looked at the REPL code in detail, but in sketch: we treat each REPL compile as an independent unit, which we then import into the environment used to compile the current statement. That import has some custom lookup rules that resolve conflicts in favor of the most recent definition. That mechanism doesn't intervene for new definitions in the statement being compiled, which is why we aren't resolving the conflict at a use site. So we'll need to find a hook during type checking to resolve duplicate definitions in favor of the one being defined right now. Should be doable, though I haven't looked at that code for a while, so I can't say how hard it would be.
Code completion. The lldb REPL's code completion uses swift::REPLCompletions. That seems to use the same completion as SourceKit under the covers, so I'm not sure why it is behaving differently for you. If you want to look at the code, the lldb side is in:

source/Plugins/ExpressionParser/Swift/SwiftREPL.cpp.

LLDB's use of it requires that you have access to the chain of definitions that the REPL has accumulated. It would certainly be possible to add some API to "complete a string in the context of the current REPL". Would that suit your purposes? I'm not sure how to make it a more general facility than that.
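
The lookup behavior sketched for duplicate definitions above can be illustrated with a toy model. This is not LLDB's or the Swift compiler's implementation; every class and parameter name here is hypothetical. It only mirrors the description: earlier lines are imported units where the most recent definition wins, while a name defined in the current line and also in an imported unit is reported as ambiguous unless the current line is preferred.

```python
class AmbiguousUse(Exception):
    """Raised when a name is defined both in the current line and earlier."""


_MISSING = object()


class ToyRepl:
    def __init__(self, prefer_current_unit=False):
        self.units = []  # one {name: value} dict per previously run line
        self.prefer_current_unit = prefer_current_unit

    def _lookup_imported(self, name):
        # Among imported units, the most recent definition wins.
        for unit in reversed(self.units):
            if name in unit:
                return unit[name]
        return _MISSING

    def lookup(self, name, current_unit):
        imported = self._lookup_imported(name)
        if name in current_unit:
            # The "most recent wins" rule does not cover the current line.
            if imported is not _MISSING and not self.prefer_current_unit:
                raise AmbiguousUse(name)
            return current_unit[name]
        if imported is _MISSING:
            raise NameError(name)
        return imported

    def run_line(self, declarations, uses=()):
        """Declare names, read `uses` on the same line, then record the line."""
        current_unit = dict(declarations)
        values = [self.lookup(name, current_unit) for name in uses]
        self.units.append(current_unit)
        return values


repl = ToyRepl()
repl.run_line({"x": 1}, uses=["x"])      # like: let x = 1; print(x)
try:
    repl.run_line({"x": 2}, uses=["x"])  # like: let x = 2; print(x)
except AmbiguousUse as error:
    print("ambiguous use of", error)
```

Constructing the model with prefer_current_unit=True corresponds to the proposed fix: the second line then resolves x to the definition on that same line.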

marcrasi · July 20, 2018, 11:58pm

Thanks for the responses!

On duplicate definitions: makes sense! Should I file a bug on JIRA? I'll take a look eventually, but if anyone else has time to try to fix it, that would be great too :)

On code completion: interesting, I'll trace through things and see if I can find the place where the behavior differs.

I thought a bit more about the API question. I notice that LLDB has a general "REPL" class that has subclasses for a few different languages. They implement things like initializing the debugger to the right state, and code completion. LLDB does not expose "REPL" through the Python API. Would it make sense to expose that?

If that's exposed, then the Jupyter kernel could reuse all the REPL's debugger-initialization code (right now my kernel duplicates it: it has to instantiate a debugger, point it at the repl_swift stub executable, set up a breakpoint, etc).
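
For readers who haven't used LLDB's Python API, the initialization steps listed above might look roughly like the sketch below. It is not the kernel's actual code. The breakpoint name repl_main is an assumption about the stub executable, and the lldb module is passed in as a parameter so the sketch does not depend on where a given toolchain installs it.

```python
import os


def evaluate_swift(lldb, stub_path, code, breakpoint_name="repl_main"):
    """Run `code` in a process launched from `stub_path`; return the SBValue.

    lldb:            the LLDB Python module, supplied by the caller.
    stub_path:       path to the repl_swift stub executable.
    breakpoint_name: ASSUMED name of a function in the stub to stop at.
    """
    # 1. Instantiate a debugger; synchronous mode makes calls block.
    debugger = lldb.SBDebugger.Create()
    debugger.SetAsync(False)

    # 2. Point it at the stub executable.
    target = debugger.CreateTargetWithFileAndArch(
        stub_path, lldb.LLDB_ARCH_DEFAULT)
    if not target.IsValid():
        raise RuntimeError("could not create target: " + stub_path)

    # 3. Set a breakpoint so the process stops with a frame to evaluate in.
    main_bp = target.BreakpointCreateByName(breakpoint_name)
    if not main_bp.IsValid():
        raise RuntimeError("could not set breakpoint: " + breakpoint_name)

    # 4. Launch; in synchronous mode this returns once the process stops.
    process = target.LaunchSimple(None, None, os.getcwd())
    if not process.IsValid():
        raise RuntimeError("could not launch process")

    # 5. Evaluate the code as Swift and hand the result back to the caller.
    options = lldb.SBExpressionOptions()
    options.SetLanguage(lldb.eLanguageTypeSwift)
    return target.EvaluateExpression(code, options)
```

A caller would import lldb, call evaluate_swift(lldb, path_to_stub, "1 + 1"), and inspect GetError() and GetValue() on the returned value. A real kernel would keep the debugger and target alive between cells and would also need to configure the expression options so that declarations persist across evaluations; both are left out here.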

Jim_Ingham · July 21, 2018, 12:09am

I have no objections to exposing the REPL through the SB APIs. You can get a little of its functionality by running expressions with SetREPLEnabled(true) in the SBExpressionOptions. But that doesn't encompass the startup code and, as you say, doesn't give you a way to test an expression for completions.

If you decide to work on this, note that the REPL is a bit of a funny beast. It exists in generic code which we draw from the llvm.org lldb repository. However, that repository doesn't have Swift support, so the code always seems a little unmotivated there... Regardless, changes to generic code should go there first, then be merged into the swift-lldb code, where they actually stand a chance of doing something.

That will remain the state until somebody decides to write a C++ REPL, which would also be pretty cool, but not something I'm likely to take on soon.