Surveying how Swift evolves

OK, I think I'll start a new pitch page for this (cool, @davedelong?), as well as one for String.trim() (cc @Erica_Sadun), just to turn down the chatter in this thread.

Erica started one for strings and looking over my data, I don't see many people actually defining a function or otherwise to grab the current instant in time, so I'm not going to start that topic myself.

Cool

Will there be a pitch thread to follow this up in?

For now I have opened a radar about Date.now (rdar://40400849 if anyone wants to help :wink:).

The other option would be to add it to Swift Foundation and the SDK overlays through Swift Evolution, I suppose.

Is it just because Date() doesn’t match other libraries? Just seems unnecessary.

I assume that the main advantage is doing things like someDate.timeIntervalSince(.now) as opposed to someDate.timeIntervalSince(Date()). Obviously not this exact function, but other functions that take dates become more readable.
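
To make the readability argument concrete, here is a minimal sketch. The property name `currentInstant` is hypothetical and chosen only so it cannot clash with anything an SDK might already provide; it is not the API being proposed.

```swift
import Foundation

extension Date {
    /// Hypothetical stand-in for the pitched `Date.now`; simply wraps `Date()`.
    static var currentInstant: Date { Date() }
}

// An arbitrary fixed date in the past (the Foundation reference date).
let someDate = Date(timeIntervalSinceReferenceDate: 0)

// Today: the reader has to know that `Date()` means "now".
let viaInitializer = someDate.timeIntervalSince(Date())

// With a static property, the call site says what it means.
let viaProperty = someDate.timeIntervalSince(.currentInstant)
```

Both calls compute the same interval; the only difference is how the call site reads.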

It is mainly in the name of clarity and discoverability, which are among the main points of the Swift API Guidelines. Date.now makes it clear that it represents the current date, while Date() looks like it initializes an empty or default Date instance.

That default date could be the ReferenceDate or 1970 instead of Now, for example.

Also, there are already Date.distantPast and Date.distantFuture properties, so adding Date.now seems in line with the API.

Let's take the discussion of Date to this thread: Date.now() and other calendar thoughts

Hey, sorry to bump an old thread... expanding the search space, coupled with travel, delayed my update much later than promised :sweat_smile:

I expanded the search to just over 9K repos, and it took a couple days of processing time to clone and analyze them all. On top of that I had to implement some MapReduce-like scripts because there was too much data for jq to handle at once!

Ok, here's the top 10 extended non-Cocoa APIs (the number is the count of extensions on that API across all repos):

2597 String
720 Int
647 Array
643 Date
347 Double
325 Sequence
322 Dictionary where Key : Hashable
310 Request
227 Data
219 Collection

Some things moved around in the list, and Request bumped URL.

I automated the data munging I was doing to come up with all the lists sprinkled throughout this thread. I'll just skip straight to the latest work on function families, as @Erica_Sadun had asked for previously. I tokenized function names by '_', camelCase and digit boundaries (so 'foo_int64arrayPlease' => ['foo', 'int', '64', 'array', 'please']), and indexed all extension functions by those lists of keywords within an API. Currently, this only uses function names, not the parameter labels that appear at the call site.
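
For anyone who wants to play with the tokenization, here is a rough sketch of the splitting rule described above. It is not the actual analysis code: the function name `keywords(in:)` is made up, and the handling of uppercase runs (treating "CString" as "c" + "string") is an assumption, since the post only specifies '_', camelCase and digit boundaries.

```swift
/// Splits a function name into lowercase keywords at '_', camelCase and digit boundaries.
/// Hypothetical helper; acronym handling is an assumption, not taken from the analysis.
func keywords(in name: String) -> [String] {
    let chars = Array(name)
    var tokens: [String] = []
    var current = ""

    // Moves the token being built (if any) into the result list.
    func flush() {
        if !current.isEmpty {
            tokens.append(current.lowercased())
            current = ""
        }
    }

    for (i, ch) in chars.enumerated() {
        if ch == "_" {
            flush()
            continue
        }
        if let prev = current.last {
            let next: Character? = i + 1 < chars.count ? chars[i + 1] : nil
            // Letter/digit transition in either direction.
            let digitBoundary = prev.isNumber != ch.isNumber
            // lowercase followed by uppercase, e.g. "array|Please".
            let camelBoundary = ch.isUppercase && prev.isLowercase
            // End of an uppercase run, e.g. "C|String" (assumed rule).
            let acronymEnd = ch.isUppercase && prev.isUppercase && (next?.isLowercase ?? false)
            if digitBoundary || camelBoundary || acronymEnd {
                flush()
            }
        }
        current.append(ch)
    }
    flush()
    return tokens
}

let example = keywords(in: "foo_int64arrayPlease")
// example == ["foo", "int", "64", "array", "please"]
```

Under these rules `subString` splits into "sub" and "string", so it never produces the keyword "substring".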

Top 5 function family keywords in String (number is amount of function names containing the keyword across all String extensions in all repos):

487 string
156 substring
138 date
128 index
110 replace

trim, the first function family keyword for String last time around, has been bumped to 16th place. Although, I'm not totally satisfied with the new #1 keyword being string, because it is not at all a cohesive group of functions. Here are the top 5 signatures:

18 stringByAppendingPathComponent(path: String) -> String
12 decodeCString(_ cString: UnsafePointer<Encoding.CodeUnit>?, as encoding: Encoding.Type, repairingInvalidCodeUnits isRepairing: Bool) -> (result: String, repairsMade: Bool)? where Encoding : UnicodeCodec
10 stringByAppendingPathExtension(ext: String) -> String?
7 withCString(_ body: (UnsafePointer) throws -> Result) rethrows -> Result
6 subString(startIndex: Int, length: Int) -> String

subString, matched by string, didn't match to the #2 keyword substring, but does match the #12 keyword 'sub'. This is a tricky thing to discern currently in my analysis–I can either throw some real matches away, or potentially bring in lots of false positives. Clustering on word roots is probably needed (i.e. how are 'substring', 'string' and 'sub' related?). This feels like it's getting into NLP territory, which is not my forte, un-forte-unately. Happy to hear suggestions on a package I could use for this.

Here are the top 10 families for the next few most-extended APIs:

Int: random, times, string, format, overflow, clamp, time, up, formatted, gcd
Array: index, remove, first, object, each, find, map, array, last, json
Date: date, string, time, day, month, days, week, adding, year, jjs (I have no idea what jjs is, currently)
Sequence: filter, map, first, group, find, each, contains, reduce, sorted, index
Dictionary where Key : Hashable: map, key, value, string, json, merge, filter, values, dictionary, keys

I am also toying with filtering out common 2/3 letter words, prepositions etc from the keywords, and perhaps redundant type information like string for String should be treated specially, like finding function families within that keyspace.

I'd also like to normalize by repository count, because that 2nd string signature:

decodeCString(_ cString: UnsafePointer<Encoding.CodeUnit>?, as encoding: Encoding.Type, repairingInvalidCodeUnits isRepairing: Bool) -> (result: String, repairsMade: Bool)? where Encoding : UnicodeCodec

only appears in a single repository all 12 times.
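
One way the normalization could look, as a sketch only: count the distinct repositories a signature appears in rather than its raw occurrences. The `Occurrence` type, the function name and the repository names in the sample data are all hypothetical.

```swift
/// One sighting of an extension function signature in a repository (hypothetical model).
struct Occurrence {
    let repository: String
    let signature: String
}

/// Maps each signature to the number of distinct repositories that declare it.
func repositoryCounts(_ occurrences: [Occurrence]) -> [String: Int] {
    var reposBySignature: [String: Set<String>] = [:]
    for occurrence in occurrences {
        reposBySignature[occurrence.signature, default: []].insert(occurrence.repository)
    }
    return reposBySignature.mapValues { $0.count }
}

// Made-up sample: one signature repeated inside a single repo, another spread over two.
let sample = [
    Occurrence(repository: "repoA", signature: "decodeCString"),
    Occurrence(repository: "repoA", signature: "decodeCString"),
    Occurrence(repository: "repoA", signature: "subString"),
    Occurrence(repository: "repoB", signature: "subString"),
]
let counts = repositoryCounts(sample)
// counts["decodeCString"] == 1, counts["subString"] == 2
```

With this kind of count, a signature repeated many times inside one repository would no longer outrank one that shows up across several.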

Happy to hear other questions to be answered by this analysis or other considerations! I put all the aggregation results here if you'd like to have a look: Dropbox - File Deleted - Simplify your life, and the latest code is on GitHub.

i feel like random will subside over time since it’s in the standard library now

Totally, if/when I ever complete what I want the analysis to be, the next step will be tracking changes to the results over time :chart_with_upwards_trend:

It could be helpful to point out things like that to folks who still roll their own, when they no longer have to.

Do you really need something so general? What if you had a few manual rules for the top-20 things?

It's a good point, and I had originally tried to keep a list of keywords. I worried that I wouldn't be able to predict a good set though, and the goal became to let the code tell how Swift might be extended.

Are there other angles I'm not seeing, in terms of more mechanical ways to find patterns that convey the intent of the extensions? Keep in mind, too, this is only analysis of extension functions; analyzing protocols might provide some good clues or show different ways to pull apart the semantics in aggregate. Class, struct and enum declarations probably must be handled completely differently, and I haven't gotten around to brainstorming that, but it could also shine new light.

Here's the state of my keyword brainstorm before I decided to can the approach:

  • image operations
  • colors
  • core graphics
  • frame logic
  • autolayout
  • core data
  • dictionaries
  • arrays
  • dates
  • gestures
  • sets
  • files
  • serialization
    • plist
    • json
    • xml
  • strings
  • hashing
  • notifications
  • kvo
  • user defaults
  • webkit and webviews
  • maps
  • bundles
  • networking
  • uikit
    • device stuff
    • alerts
    • collection views
    • table views
    • buttons
    • modals

You can see I didn't even bother actually guessing keywords, I just realized it was too big a task when I doubted I'd even be able to list all the parts of the ecosystem :sweat_smile: But, I'm also happy to collaborate on this list and post the results back here, if anyone cares to comment (I can also switch it to edit): Colloquial Swift - apple ecosystem keyspace brainstorm - Google Docs

This is a great way of getting real data. Great job Andrew.

Aside from extensions, another good thing to follow is what people are generating with either Sourcery or gyb. Sourcery's default templates called out the need for auto-equatable, auto-hashable, auto-codable, and autogenned enum cases a while before they were discussed here, and I certainly know that an easier way of mutating nested structs (Lenses) and auto-generating mocks would be a big lift to my projects. You could search for files ending in .stencil in majority-Swift projects.

Fantastic idea, I had not considered this. Thank you, that's exactly the kind of thing I am looking for :slight_smile: It's in the README so I don't forget! Are there any linters for stencil definitions, to pare down the variability of the template files? I couldn't find anything on a quick search.

Awesome work!

Here's another idea. Your analysis primarily focuses on nominal types that can be extended. It would be interesting to see an analysis done on global functions and operators. I'm suspecting we'll see a lot of operations related to non-nominal types like functions, e.g., curry, uncurry, apply (partial application) and compose (function composition), and tuples, e.g., zipLongest, zip(Longest)With and product.
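
For readers who haven't run into these, a minimal sketch of what such global functions typically look like. These are illustrative two-argument versions written for this post, not signatures taken from any particular repository.

```swift
/// Turns a two-argument function into a chain of one-argument functions.
func curry<A, B, C>(_ f: @escaping (A, B) -> C) -> (A) -> (B) -> C {
    return { a in { b in f(a, b) } }
}

/// Left-to-right composition: applies `f`, then `g`.
func compose<A, B, C>(_ f: @escaping (A) -> B, _ g: @escaping (B) -> C) -> (A) -> C {
    return { a in g(f(a)) }
}

// Partial application through currying.
let addThree = curry({ (a: Int, b: Int) -> Int in a + b })(3)

// Composition of two unary functions.
let addThreeThenDescribe = compose(addThree, { (n: Int) -> String in "value: \(n)" })
```

Since each arity needs its own overload, it is easy to see why people reach for code generation here.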

(This will likely tie in with the results of template generated code as these operations can be specified for multiple arities.)

Thanks for the ideas! I currently extract global function declarations, but as you've pointed out I haven't done aggregation on them yet. Stay tuned!

I did a manual search for "jjs" on GitHub and all matches on the first few pages were where somebody used "jjs" as their custom prefix for all kinds of extensions, e.g. func jjs_today or func jjs_dateBySubtractingMonths. Could that be it?

i think that person just likes their initials