Fuzzing/stress-testing tool | GSOC2018

nanosem · February 20, 2018, 6:58pm

Hi, I'm a Swift programmer from Ukraine, and I want to make a stress-testing tool, but I don't know how to get started.
Can I find out more about this?

Thanks.

tkremenek · March 6, 2018, 11:55pm

cc @akyrtzi

akyrtzi · March 7, 2018, 12:27am

cc @Nathan_Hawes

Nathan_Hawes · March 7, 2018, 2:40am

Sorry for the late reply @nanosem, and thanks for getting in touch!

A good starting point would be getting familiar with libSyntax if you're not already. The README.md file there gives a good overview of the library itself, and at the very end includes instructions on making use of it from Swift in an Xcode project. I'd recommend playing around with the example shown there to do something more interesting, and also try building up your own syntax pieces from scratch via the SyntaxFactory APIs. Both are likely to be helpful for the fuzzing aspect of the tool, and should give you an idea of what's currently possible with the library.

If you have any questions or need help with any issues you run into just let us know!

ortem · March 11, 2018, 9:26pm

Hi everybody!

My name is Artem Mukhin, I'm a third-year student at St. Petersburg State University (department of software engineering), Russia.

I'm interested in modern programming languages and tools and I would like to participate in the development of the fuzzer.

Last year I developed an Objective-C to Swift converter for AppCode during a JetBrains internship, so I have some experience working with syntax trees and a basic knowledge of Swift.
I am now learning libSyntax and have already tried building code via SyntaxFactory.

I also have some questions:

Is this project related to this?
Which fuzzers do developers of Swift language use at the present time?
Is there any starter task for this project? What should I do now to become more familiar with the problem?

cc @Nathan_Hawes

Nathan_Hawes · March 12, 2018, 11:51pm

Great to hear from you Artem!

Is this project related to this?

Not directly. That one is a general mechanism for fuzzing the APIs of a program written in Swift, while this is about creating and using a tool written in Swift to fuzz and stress test the Swift compiler (and SourceKit) itself.

While you could use libFuzzer to fuzz the Swift compiler and SourceKit, its mutations aren't syntax aware, so the resulting input often doesn't progress past lexing/parsing when compiled due to syntactic errors. This means the type checker and lower levels of the compiler wouldn't be exercised very often. The aim of this project is to use Swift's libSyntax, which lets us easily perform structured mutations on an existing Swift file, to generate inputs that are well-formed enough to progress beyond lexing/parsing in most cases and find issues in the later phases of compilation.
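To make the contrast concrete, here is a toy sketch of the two kinds of mutation. It does not use libSyntax; the `Token` enum and the helper functions are hypothetical stand-ins invented for this illustration, and a real tool would obtain its tokens and nodes from libSyntax instead.

```swift
/// A toy token type (hypothetical, not part of libSyntax).
enum Token: Equatable {
    case keyword(String)
    case identifier(String)
    case integer(Int)
    case punctuation(String)

    /// Source text of this token.
    var text: String {
        switch self {
        case .keyword(let s), .identifier(let s), .punctuation(let s): return s
        case .integer(let n): return String(n)
        }
    }
}

/// Joins tokens back into source text, separated by single spaces.
func render(_ tokens: [Token]) -> String {
    return tokens.map { $0.text }.joined(separator: " ")
}

/// Structure-aware mutation: swap every integer literal for another
/// integer literal and leave all other tokens untouched.
func replacingIntegers(in tokens: [Token], with value: Int) -> [Token] {
    return tokens.map { token -> Token in
        if case .integer = token { return .integer(value) }
        return token
    }
}

/// Byte-level mutation: overwrite one byte with no regard for what it means.
func overwritingByte(in source: String, at index: Int, with byte: UInt8) -> String {
    var bytes = Array(source.utf8)
    bytes[index] = byte
    return String(decoding: bytes, as: UTF8.self)
}

let tokens: [Token] = [.keyword("let"), .identifier("x"), .punctuation("="), .integer(12)]
let structured = render(replacingIntegers(in: tokens, with: Int.max))
let blind = overwritingByte(in: render(tokens), at: 0, with: UInt8(ascii: "("))
print(structured)
print(blind)
```

The structured result is still a well-formed declaration with an extreme literal, so it reaches the later compiler phases. The byte-level result has lost its `let` keyword and would most likely be rejected by the parser.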

Kostya Serebryany gave a talk at last year's LLVM dev meeting about overcoming this issue when fuzzing the Clang compiler.

Which fuzzers do developers of Swift language use at the present time?

I'm not sure, sorry! Some contributors do seem to be using them, but I'm not sure what specific tools they're using.

Is there any starter task for this project? What should I do now to become more familiar with the problem?

Are you familiar with C++ at all? If so, libSyntax isn't actually complete yet, so your help there would be appreciated! See here for how to implement missing syntax nodes, and here for contributing to Swift in general.

In terms of familiarization, the mutated inputs will need to be fed to the Swift compiler and SourceKit, so see if you can do so. For the compiler, this will likely just be launching an external process from Swift and detecting whether it crashes. For SourceKit you can try our work-in-progress wrapper around the SourceKit service, SwiftLang. Instructions are the same as for libSyntax here except you need to add $(TOOLCHAIN_DIR)/usr/lib to your framework and runpath search paths and import SwiftLang at the top of your file. Here's some example usage that makes a code completion request to get you started:

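import SwiftLang

// Connect to the SourceKit service.
let connection = SourceKitdService()

// Build a code completion request for an in-memory source text.
let request = SourceKitdRequest(uid: SourceKitdUID.request_CodeComplete)
request.addParameter(.key_SourceText, value: """
  struct Foo {
    let x = 12
  }
  Foo().
  """)
request.addParameter(.key_Name, value: "something_unique")
// Offset 34 is the position immediately after "Foo()." in the text above.
request.addParameter(.key_Offset, value: 34)
let compilerArgs = request.addArrayParameter(.key_CompilerArgs)
for arg in ["<input>"] { compilerArgs.add(arg) }

// Send the request synchronously and print the response.
let response = connection.sendSyn(request: request)

print(response.description)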

You can also use .key_SourceFile and supply a path to read the input from disk. SourceKit's API isn't particularly well documented as far as I'm aware, so the best reference is probably the implementation and tests, or just ask here.
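For the compiler side mentioned above, launching an external process and checking whether it crashed could look roughly like the sketch below. It uses Foundation's `Process`; the `RunOutcome` type and the `run` helper are names made up for this example, and the compiler path and flags are left to the caller.

```swift
import Foundation

/// How one invocation ended, as far as a fuzzer cares (hypothetical type).
enum RunOutcome: Equatable {
    case exited(status: Int32)   // normal exit; non-zero usually just means diagnostics
    case crashed(signal: Int32)  // terminated by a signal such as SIGSEGV or SIGABRT
}

/// Runs `executable` with `arguments`, discards its output,
/// and reports how the process terminated.
func run(_ executable: String, _ arguments: [String]) throws -> RunOutcome {
    let process = Process()
    process.executableURL = URL(fileURLWithPath: executable)
    process.arguments = arguments
    process.standardOutput = FileHandle.nullDevice
    process.standardError = FileHandle.nullDevice

    try process.run()
    process.waitUntilExit()

    // A signal means the process crashed; anything else is an ordinary exit.
    if process.terminationReason == .uncaughtSignal {
        return .crashed(signal: process.terminationStatus)
    }
    return .exited(status: process.terminationStatus)
}
```

A call such as `try run("/usr/bin/swiftc", ["-typecheck", "mutated.swift"])` (path and file name are assumptions) would then separate inputs the compiler merely rejects from inputs that crash it. A real tool would probably also want a timeout to catch hangs.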

ortem · March 13, 2018, 10:12pm

Thank you for your quick reply!

I got the point. libFuzzer can only generate random data to test programs written in Swift (or in other languages), but it can’t generate random programs (or mutate existing programs) to find bugs in the compiler.
OK, then, creating a fuzzer for Swift is indeed quite a topical task :)
I am familiar with C++, so I will gladly try to contribute to Swift. As I understand it, support for Prefix(Postfix)OperatorDecl has not been added yet (and, judging by the bug tracker, nobody seems to be working on it), so I’ve created an issue and will try to solve it in my forked repository.

I also have a question:
Here the following is said about kind: "This must have a corresponding Node with that kind." But do I understand correctly that for kind=‘<word>Token’ (for example, StructToken) there is no corresponding Node, only corresponding elements of SYNTAX_TOKENS (and definitions in TokenKinds.def)? If so, I think this is somewhat misleading.

Xi_Ge · March 13, 2018, 10:39pm

Hey @ortem,
Thank you for filing this task! We're looking forward to seeing Prefix(Postfix)OperatorDecl implemented in the parser.

To give you an example of how to add such a node, please see this commit, which adds import decl to the supported syntax nodes.

Yeah, you're right. I think the documentation is not sufficiently up-to-date. Right now, we've converged all TokenSyntax into one node, with the token kind as its subkind. Could you try to correct the documentation?

ortem · March 14, 2018, 8:35pm

Hi Xi Ge!

I explored the structure of the project today and did the following:

Added Nodes for Prefix(Postfix)OperatorDecl to DeclNodes.py
Added a test to round_trip_parse_gen.py
Added DECL_KEYWORD(prefix), DECL_KEYWORD(postfix) to TokenKinds.def
Added Keyword('Prefix', 'prefix'), Keyword('Postfix', 'postfix') to Token.py
Tried to correct the documentation with "This must have a corresponding Node with that kind (or corresponding Token in both include/swift/Syntax/TokenKinds.def and SYNTAX_TOKENS)"

But I haven't run the tests (and haven't updated round_trip_parse_gen.swift.withkinds) yet, because the project takes a very long time to build on my machine, so I will do it tomorrow.

Could you please check my current progress?

Xi_Ge · March 14, 2018, 9:04pm

@ortem that's great progress! Could you open a pull request to the Swift repo so that we can review your change?

ortem · March 14, 2018, 9:59pm

Yes, sure. I have already opened a pull request with the documentation fixes, and will open a pull request with Prefix(Postfix)OperatorDecl once I have run the tests.

ortem · March 15, 2018, 8:50pm

@Xi_Ge I realized I was wrong: prefix and postfix are declaration modifiers, not declaration keywords, and this mistake restricts, for example, the ability to create a function named prefix or postfix.

But when I explored keywords and modifiers in the Node definitions, I found out that the ClassDecl Node doesn't contain an optional final modifier (in contrast to the specification of the Class Declaration in the Swift Language Reference).

Is it a mistake? And is it correct to define the final modifier in the ClassDecl Node as Child('FinalModifier', kind='Token', text_choices=['final'], is_optional=True) (as well as defining the prefix modifier as Child('PrefixModifier', kind='Token', text_choices=['prefix']))?

Xi_Ge · March 16, 2018, 11:13pm

Yeah @ortem, you're right. You can find the decl modifier declarations for these words here. I think our current token structure is mostly complete, so we may not need to add new token kinds any more.

Good catch, @ortem! I think we haven't implemented a modifier list for class decls. Could you try to fix this issue? The starting point is to replace AccessLevelModifier with ModifierList. This allows us to enclose all kinds of modifiers, like private, internal or final.
Thank you again for the documentation fix! Could you open another PR to fix this modifier issue?

ortem · March 19, 2018, 8:14pm

@Xi_Ge It is my pleasure to contribute to Swift. I opened a new PR.

But it's not clear to me that I should use ModifierList in this case, because then it will be possible to declare a class like unowned override class A {...} (as unowned and override are valid elements of ModifierList). Am I right?

harlanhaskins · March 19, 2018, 8:54pm

(Hi Artem!)
libSyntax makes an effort to represent invalid or in-progress syntactic constructs, because it’s designed with editing in mind. Since final public class A {} is valid as well as public final class A {}, we need to use ModifierList which can represent them in any order.
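A rough way to picture this trade-off is a tree that stores modifiers as a plain ordered list, with validity checked separately. The sketch below is a toy model only: `ClassDeclSketch`, `modifiersAllowedOnClass`, and `rejectedModifiers` are hypothetical names rather than libSyntax's generated API, and the allowed set is illustrative, not exhaustive.

```swift
/// Toy model of a class declaration whose modifiers are an ordered list.
struct ClassDeclSketch {
    var modifiers: [String]
    var name: String

    /// Source text, with modifiers in exactly the order they were stored.
    var text: String {
        return (modifiers + ["class", name, "{}"]).joined(separator: " ")
    }
}

// Illustrative subset of modifiers accepted on a class declaration.
let modifiersAllowedOnClass: Set<String> =
    ["open", "public", "internal", "fileprivate", "private", "final"]

/// Validity is a separate question from representability: the tree can hold
/// any modifier, and a later check reports the ones that do not belong.
func rejectedModifiers(of decl: ClassDeclSketch) -> [String] {
    return decl.modifiers.filter { !modifiersAllowedOnClass.contains($0) }
}

let finalFirst = ClassDeclSketch(modifiers: ["final", "public"], name: "A")
let publicFirst = ClassDeclSketch(modifiers: ["public", "final"], name: "A")
let inProgress = ClassDeclSketch(modifiers: ["unowned", "override"], name: "A")

print(finalFirst.text)
print(publicFirst.text)
print(rejectedModifiers(of: inProgress))
```

Both orderings round-trip as written, and the invalid `unowned override class A {}` can still be stored, which is what an editing-oriented tree (and a fuzzer producing deliberately odd input) needs.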

ortem · March 19, 2018, 9:03pm

Hi Harlan!
I got the point, thank you! So should I define the prefix modifier (in PrefixOperatorDecl) as Child('PrefixModifier', kind='DeclModifier'), because the prefix token is among DeclModifier's children?

harlanhaskins · March 19, 2018, 9:11pm

I think you can represent it with kind=‘PrefixToken’, because the grammar doesn’t allow any other kind of DeclModifier in that position.

ortem · March 19, 2018, 9:37pm

Do I understand correctly that using PrefixToken means there is a Token('prefix', ...) definition in Token.py? The other <Something>Token kinds in DeclNodes.py each have a related Token in Token.py. But I can't define Token('prefix', ...), because the word prefix isn't actually a Swift keyword (e.g. it can be used as a function name).

harlanhaskins · March 19, 2018, 9:43pm

Oh! Yes, you’re right. You can use Token, then, with the understanding that the only valid text is prefix. DeclModifier contains that extra Detail entry to accommodate private(set), which we don’t necessarily need here.

ortem · March 19, 2018, 9:49pm

I got the point! I think I should define a new Node like
Node('PrefixModifier', kind='Syntax', children=[Child('Name', kind='Token', text_choices=['prefix'])])
and then use it as the kind of a Child inside PrefixOperatorDecl.