Surveying how Swift evolves

armcknight · May 18, 2018, 11:06pm

OK, I think I'll start a new pitch page for this (cool, @davedelong?), as well as one for String.trim() (cc @Erica_Sadun) just to turn down the chatter in this thread.

armcknight · May 18, 2018, 11:12pm

Erica started one for strings, and looking over my data, I don't see many people actually defining a function (or anything else) to grab the current instant in time, so I'm not going to start that topic myself.

davedelong · May 19, 2018, 12:06am

Cool

kiel · May 20, 2018, 2:55am

Will there be a pitch thread to follow this up in?

pvieito · May 20, 2018, 3:24pm

For now I have opened a radar about Date.now (rdar://40400849 if anyone wants to help).

The other option would be to add it to Swift Foundation and the SDK overlays through Swift Evolution, I suppose.

Jon_Shier · May 20, 2018, 5:03pm

Is it just because Date() doesn’t match other libraries? Just seems unnecessary.

Alexandre_Lopoukhine · May 20, 2018, 5:24pm

I assume that the main advantage is doing things like someDate.timeIntervalSince(.now) as opposed to someDate.timeIntervalSince(Date()). Obviously not this exact function, but other functions that take dates become more readable.
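To make the readability argument concrete, here is a minimal sketch. The property name `rightNow` is hypothetical, chosen so the sketch cannot collide with any `now` member a given Foundation version might already provide; it only forwards to `Date()`.

```swift
import Foundation

extension Date {
    /// Hypothetical convenience (not part of the pitch text): forwards to `Date()`,
    /// which already returns the current instant.
    static var rightNow: Date { Date() }
}

let anHourAgo = Date(timeIntervalSinceNow: -3600)

// Both calls measure the same interval; the second reads as "since now"
// thanks to implicit member syntax.
let viaInit = anHourAgo.timeIntervalSince(Date())
let viaStatic = anHourAgo.timeIntervalSince(.rightNow)
```

The two results differ only by the time elapsed between the two calls.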

pvieito · May 20, 2018, 8:35pm

It is mainly in the name of clarity and discoverability, which are among the main points of the Swift API Guidelines. Date.now makes it clear that it represents the current Date, while Date() seems to initialize an empty or default Date instance.

That default date could be the ReferenceDate or 1970 instead of Now, for example.

Also, there are already Date.distantPast and Date.distantFuture properties, so adding Date.now seems in line with the API.

davedelong · May 21, 2018, 2:04am

Let's take the discussion of Date to this thread: Date.now() and other calendar thoughts

armcknight · July 24, 2018, 3:56pm

Hey, sorry to bump an old thread... expanding the search space, coupled with travel, delayed my update far longer than promised.

I expanded the search to just over 9K repos, and it took a couple days of processing time to clone and analyze them all. On top of that I had to implement some MapReduce-like scripts because there was too much data for jq to handle at once!
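The actual scripts aren't shown in this thread, but the MapReduce-like idea can be sketched roughly as follows: count per repository first, then fold the small per-repository results together, so the full data set never has to be held at once. The function names and sample data are hypothetical.

```swift
/// "Map" step: count extensions per extended type within a single repository.
func countExtensions(in extendedTypes: [String]) -> [String: Int] {
    var counts: [String: Int] = [:]
    for name in extendedTypes {
        counts[name, default: 0] += 1
    }
    return counts
}

/// "Reduce" step: fold per-repository counts into one running total.
func mergeCounts(_ partials: [[String: Int]]) -> [String: Int] {
    partials.reduce(into: [:]) { total, partial in
        total.merge(partial, uniquingKeysWith: +)
    }
}

// Hypothetical sample data: each inner array lists the types extended in one repo.
let repos = [
    ["String", "String", "Date"],
    ["String", "Int"],
    ["Date", "Array", "Int"],
]
let totals = mergeCounts(repos.map(countExtensions(in:)))
// totals["String"] is 3, totals["Date"] is 2
```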

Ok, here's the top 10 extended non-Cocoa APIs (the number is the count of extensions on that API across all repos):

2597 String
720 Int
647 Array
643 Date
347 Double
325 Sequence
322 Dictionary where Key : Hashable
310 Request
227 Data
219 Collection

Some things moved around in the list, and Request bumped URL.

I automated the data munging I was doing to come up with all the lists sprinkled throughout this thread. I'll just skip straight to the latest work on function families, as @Erica_Sadun had asked for previously. I tokenized function names by '_', camelcase and digit boundaries (so 'foo_int64arrayPlease' => ['foo', 'int', '64', 'array', 'please']), and indexed all extension functions by those lists of keywords within an API. Currently, this only uses function names, not parameter labels that appear at the call site.
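A minimal sketch of that tokenization, assuming plain character-class checks; it is not the code from the repository. The extra rule that splits an uppercase run before its last capital (so `decodeCString` yields `string`) is an assumption, inferred from `decodeCString` showing up under the `string` keyword.

```swift
/// Splits an identifier into lowercase keywords at '_', camelCase, and digit boundaries.
func keywords(in name: String) -> [String] {
    let chars = Array(name)
    var tokens: [String] = []
    var current = ""

    for (i, ch) in chars.enumerated() {
        if ch == "_" {
            // Underscores separate tokens and are dropped.
            if !current.isEmpty { tokens.append(current.lowercased()); current = "" }
            continue
        }
        if let prev = current.last {
            let next: Character? = i + 1 < chars.count ? chars[i + 1] : nil
            let digitBoundary = prev.isNumber != ch.isNumber
            let camelBoundary = prev.isLowercase && ch.isUppercase
            // Assumed rule: "CString" splits into "C" + "String".
            let acronymEnd = prev.isUppercase && ch.isUppercase && (next?.isLowercase ?? false)
            if digitBoundary || camelBoundary || acronymEnd {
                tokens.append(current.lowercased())
                current = ""
            }
        }
        current.append(ch)
    }
    if !current.isEmpty { tokens.append(current.lowercased()) }
    return tokens
}

let example = keywords(in: "foo_int64arrayPlease")
// ["foo", "int", "64", "array", "please"]
```

With these rules `subString` tokenizes to `["sub", "string"]` while `substring` stays a single keyword.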

Top 5 function family keywords in String (the number is the count of function names containing the keyword across all String extensions in all repos):

487 string
156 substring
138 date
128 index
110 replace
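One way such a ranking could be produced is an index from API to keyword to function names. This is only a sketch under assumptions: the `ExtensionFunction` type, the sample data, and the choice to count each declaration once per keyword are all hypothetical.

```swift
/// One extension function declaration, already tokenized (hypothetical shape).
struct ExtensionFunction {
    let api: String        // the extended type, e.g. "String"
    let name: String
    let keywords: [String]
}

/// Builds api -> keyword -> names of functions whose name contains that keyword.
func buildIndex(_ functions: [ExtensionFunction]) -> [String: [String: [String]]] {
    var index: [String: [String: [String]]] = [:]
    for function in functions {
        // A Set keeps one declaration from counting twice under the same keyword.
        for keyword in Set(function.keywords) {
            index[function.api, default: [:]][keyword, default: []].append(function.name)
        }
    }
    return index
}

/// Ranks the keywords of one API by declaration count, ties broken alphabetically.
func topKeywords(for api: String,
                 in index: [String: [String: [String]]],
                 limit: Int) -> [(keyword: String, count: Int)] {
    let families = index[api] ?? [:]
    let ranked = families
        .map { (keyword: $0.key, count: $0.value.count) }
        .sorted { a, b in
            a.count != b.count ? a.count > b.count : a.keyword < b.keyword
        }
    return Array(ranked.prefix(limit))
}

// Hypothetical sample data.
let sampleFunctions = [
    ExtensionFunction(api: "String", name: "toDate", keywords: ["to", "date"]),
    ExtensionFunction(api: "String", name: "dateValue", keywords: ["date", "value"]),
    ExtensionFunction(api: "String", name: "trimmed", keywords: ["trimmed"]),
    ExtensionFunction(api: "Date", name: "toString", keywords: ["to", "string"]),
]
let topForString = topKeywords(for: "String", in: buildIndex(sampleFunctions), limit: 2)
```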

trim, the first function family keyword for String last time around, has been bumped to 16th place. Although, I'm not totally satisfied with the new #1 keyword being string, because it is not at all a cohesive group of functions. Here are the top 5 signatures:

18 stringByAppendingPathComponent(path: String) -> String
12 decodeCString(_ cString: UnsafePointer<Encoding.CodeUnit>?, as encoding: Encoding.Type, repairingInvalidCodeUnits isRepairing: Bool) -> (result: String, repairsMade: Bool)? where Encoding : UnicodeCodec
10 stringByAppendingPathExtension(ext: String) -> String?
7 withCString(_ body: (UnsafePointer) throws -> Result) rethrows -> Result
6 subString(startIndex: Int, length: Int) -> String

subString, matched by string, didn't match the #2 keyword substring, but does match the #12 keyword 'sub'. This is a tricky thing to discern currently in my analysis: I can either throw some real matches away, or potentially bring in lots of false positives. Clustering on word roots is probably needed (i.e. how are 'substring', 'string' and 'sub' related?). This feels like it's getting into NLP territory, which is not my forte, un-forte-unately. Happy to hear suggestions on a package I could use for this.

Here are the top 10 families for the next few most-extended APIs:

Int: random, times, string, format, overflow, clamp, time, up, formatted, gcd
Array: index, remove, first, object, each, find, map, array, last, json
Date: date, string, time, day, month, days, week, adding, year, jjs (I have no idea what jjs is, currently)
Sequence: filter, map, first, group, find, each, contains, reduce, sorted, index
Dictionary where Key : Hashable: map, key, value, string, json, merge, filter, values, dictionary, keys

I am also toying with filtering out common 2/3 letter words, prepositions etc from the keywords, and perhaps redundant type information like string for String should be treated specially, like finding function families within that keyspace.

I'd also like to normalize by repository count, because that 2nd string signature:

decodeCString(_ cString: UnsafePointer<Encoding.CodeUnit>?, as encoding: Encoding.Type, repairingInvalidCodeUnits isRepairing: Bool) -> (result: String, repairsMade: Bool)? where Encoding : UnicodeCodec

only appears in a single repository all 12 times.
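A rough sketch of what that normalization could look like: keep the raw occurrence count, but also track the set of distinct repositories per signature. The `Occurrence` type and the sample data are hypothetical.

```swift
/// One sighting of a signature in a repository (hypothetical shape).
struct Occurrence {
    let repo: String
    let signature: String
}

/// For each signature, returns the raw occurrence count and the number of distinct repos.
func tally(_ occurrences: [Occurrence]) -> [String: (total: Int, repos: Int)] {
    var totals: [String: Int] = [:]
    var reposBySignature: [String: Set<String>] = [:]
    for occurrence in occurrences {
        totals[occurrence.signature, default: 0] += 1
        reposBySignature[occurrence.signature, default: []].insert(occurrence.repo)
    }
    var result: [String: (total: Int, repos: Int)] = [:]
    for (signature, total) in totals {
        result[signature] = (total: total, repos: reposBySignature[signature]?.count ?? 0)
    }
    return result
}

// Hypothetical sample data: one signature repeated inside a single repo,
// another seen once each in two repos.
let sightings = [
    Occurrence(repo: "repoA", signature: "decodeCString"),
    Occurrence(repo: "repoA", signature: "decodeCString"),
    Occurrence(repo: "repoA", signature: "decodeCString"),
    Occurrence(repo: "repoA", signature: "trimmed"),
    Occurrence(repo: "repoB", signature: "trimmed"),
]
let tallied = tally(sightings)
```

Ranking by `repos` instead of `total` would put `trimmed` (2 repos) ahead of `decodeCString` (1 repo) in this sample, even though the latter has more raw occurrences.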

Happy to hear other questions to be answered by this analysis or other considerations! I put all the aggregation results on Dropbox if you'd like to have a look, and the latest code is on GitHub.

taylorswift · July 24, 2018, 6:54pm

i feel like random will subside over time since it’s in the standard library now

armcknight · July 24, 2018, 6:56pm

Totally, if/when I ever complete what I want the analysis to be, the next step will be tracking changes to the results over time.

It could be helpful to point out things like that to folks who still roll their own, when they no longer have to.
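For reference, these are the standard library calls (Swift 4.2 and later) that a hand-rolled `random` extension can usually be replaced with. The variable names are only illustrative.

```swift
// Uniformly distributed integer in a closed range.
let roll = Int.random(in: 1...6)

// Random floating-point value in a half-open range.
let fraction = Double.random(in: 0..<1)

// Random Boolean.
let coinFlip = Bool.random()

// Random element of a collection; nil only when the collection is empty.
let pick = ["a", "b", "c"].randomElement()

// A shuffled copy of a sequence.
let shuffledDeck = Array(1...5).shuffled()
```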

nick.keets · July 25, 2018, 6:55am

Do you really need something so general? What if you had a few manual rules for the top-20 things?

armcknight · July 25, 2018, 4:19pm

It's a good point, and I had originally tried to keep a list of keywords. I worried that I wouldn't be able to predict a good set though, and the goal became to let the code tell how Swift might be extended.

Are there other angles I'm not seeing, in terms of more mechanical ways to find patterns that convey the intent of the extensions? Keep in mind, too, this is only analysis of extension functions; analyzing protocols might provide some good clues or show different ways to pull apart the semantics in aggregate. Class, struct and enum declarations probably must be handled completely differently, and I haven't gotten around to brainstorming that, but it could also shine new light.

Here's the state of my keyword brainstorm before I decided to can the approach:

image operations
colors
core graphics
frame logic
autolayout
core data
dictionaries
arrays
dates
gestures
sets
files
serialization
plist
json
xml
strings
hashing
notifications
kvo
user defaults
webkit and webviews
maps
bundles
networking
uikit
device stuff
alerts
collection views
table views
buttons
modals

You can see I didn't even bother actually guessing keywords; I realized it was too big a task when I doubted I'd even be able to list all the parts of the ecosystem. But I'm also happy to collaborate on this list and post the results back here, if anyone cares to comment (I can also switch it to edit): Colloquial Swift - apple ecosystem keyspace brainstorm (Google Docs).

ethanjdiamond · July 26, 2018, 3:45pm

This is a great way of getting real data. Great job Andrew.

Aside from extensions, another good thing to follow is what people are generating with either Sourcery or gyb. Sourcery's default templates called out the need for auto-equatable, auto-hashable, auto-codable, and autogenned enum cases a while before they were discussed here, and I certainly know that an easier way of mutating nested structs (Lenses) and auto-generating mocks would be a big lift to my projects. You could search for files ending in .stencil in majority-Swift projects.
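As a small illustration of the first four items: the compiler can now synthesize these conformances when a type simply declares them, which is roughly what those templates used to generate. The `Point` and `Suit` types are made up for the example.

```swift
// Equatable, Hashable, and Codable are synthesized because every stored
// property already conforms; no hand-written or generated code is needed.
struct Point: Equatable, Hashable, Codable {
    var x: Int
    var y: Int
}

// CaseIterable synthesizes `allCases` for enums without associated values.
enum Suit: CaseIterable {
    case clubs, diamonds, hearts, spades
}

let origin = Point(x: 0, y: 0)
let samePoint = Point(x: 0, y: 0)
let uniquePoints = Set([origin, samePoint])
let suitCount = Suit.allCases.count
```

Lenses and mock generation, the other two items mentioned, have no such built-in equivalent in this sketch.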

armcknight · July 27, 2018, 11:39pm

Fantastic idea, I had not considered this. Thank you, that's exactly the kind of thing I am looking for. It's in the README so I don't forget! Are there any linters for stencil definitions, to pare down the variability of the template files? I couldn't find anything on a quick search.

dennisvennink · July 28, 2018, 2:21am

Awesome work!

Here's another idea. Your analysis primarily focuses on nominal types that can be extended. It would be interesting to see an analysis done on global functions and operators. I'm suspecting we'll see a lot of operations related to non-nominal types like functions, e.g., curry, uncurry, apply (partial application) and compose (function composition), and tuples, e.g., zipLongest, zip(Longest)With and product.

(This will likely tie in with the results of template generated code as these operations can be specified for multiple arities.)
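For readers who haven't met these helpers, here is a sketch of what such global functions typically look like. These are common hand-written shapes, not declarations taken from the analyzed repositories.

```swift
/// Turns a two-argument function into a chain of one-argument functions.
func curry<A, B, C>(_ f: @escaping (A, B) -> C) -> (A) -> (B) -> C {
    return { a in { b in f(a, b) } }
}

/// Left-to-right function composition: apply `f`, then `g`.
func compose<A, B, C>(_ f: @escaping (A) -> B, _ g: @escaping (B) -> C) -> (A) -> C {
    return { a in g(f(a)) }
}

let add: (Int, Int) -> Int = { $0 + $1 }
let addTwo = curry(add)(2)            // partial application: fixes the first argument

let increment: (Int) -> Int = { $0 + 1 }
let double: (Int) -> Int = { $0 * 2 }
let incrementThenDouble = compose(increment, double)
```

A three-argument `curry` needs its own overload, which is why these helpers tend to be template-generated once per arity.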

armcknight · July 31, 2018, 9:42pm

Thanks for the ideas! I currently extract global function declarations, but as you've pointed out I haven't done aggregation on them yet. Stay tuned!

ole · August 1, 2018, 3:59pm

I did a manual search for "jjs" on GitHub and all matches on the first few pages were where somebody used "jjs" as their custom prefix for all kinds of extensions, e.g. func jjs_today or func jjs_dateBySubtractingMonths. Could that be it?

taylorswift · August 1, 2018, 7:32pm

i think that person just likes their initials