OK, I think I'll start a new pitch page for this (cool, @davedelong?), as well as one for String.trim() (cc @Erica_Sadun), just to turn down the chatter in this thread.
Erica started one for strings and looking over my data, I don't see many people actually defining a function or otherwise to grab the current instant in time, so I'm not going to start that topic myself.
Cool
Will there be a pitch thread to follow this up in?
For now I have opened a radar about Date.now (rdar://40400849, if anyone wants to help).
The other option would be to add it to Swift Foundation and the SDK overlays through Swift Evolution, I suppose.
Is it just because Date() doesn't match other libraries? Just seems unnecessary.
I assume that the main advantage is doing things like someDate.timeIntervalSince(.now) as opposed to someDate.timeIntervalSince(Date()). Obviously not this exact function, but other functions that take dates become more readable.
It is mainly in the name of clarity and discoverability, which are among the main points of the Swift API Guidelines. Date.now clearly represents the current date, while Date() seems to initialize an empty or default Date instance.
That default date could be the ReferenceDate or 1970 instead of Now, for example.
Also, there are already Date.distantPast and Date.distantFuture properties, so adding Date.now seems in line with the API.
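The readability argument above can be sketched with a small extension. The property name `pitchedNow` is hypothetical, picked only so this sketch cannot collide with anything Foundation itself declares; the pitch is for `Date.now`.

```swift
import Foundation

extension Date {
    /// Hypothetical stand-in for the pitched `Date.now`; it simply wraps `Date()`.
    static var pitchedNow: Date { Date() }
}

let someDate = Date(timeIntervalSinceNow: -60)  // roughly one minute ago

// Both calls measure the same thing; the second one names what the argument means.
let viaInit = someDate.timeIntervalSince(Date())
let viaStatic = someDate.timeIntervalSince(.pitchedNow)
```

The implicit member syntax (`.pitchedNow`) works because the parameter type is already known to be `Date`, the same way `.distantPast` and `.distantFuture` can be used.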
Let's take the discussion of Date to this thread: Date.now() and other calendar thoughts
Hey, sorry to bump an old thread... expanding the search space, coupled with travel, delayed my update much later than promised
I expanded the search to just over 9K repos, and it took a couple of days of processing time to clone and analyze them all. On top of that I had to implement some MapReduce-like scripts because there was too much data for jq to handle at once!
Ok, here are the top 10 most-extended non-Cocoa APIs (the number is the count of extensions on that API across all repos):
2597 String
720 Int
647 Array
643 Date
347 Double
325 Sequence
322 Dictionary where Key : Hashable
310 Request
227 Data
219 Collection
Some things moved around in the list, and Request bumped URL.
I automated the data munging I was doing to come up with all the lists sprinkled throughout this thread. I'll just skip straight to the latest work on function families, as @Erica_Sadun had asked for previously. I tokenized function names by '_', camelcase and digit boundaries (so 'foo_int64arrayPlease' => ['foo', 'int', '64', 'array', 'please']), and indexed all extension functions by those lists of keywords within an API. Currently, this only uses function names, not parameter labels that appear at the call site.
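For anyone wanting to reproduce the tokenization, here is a minimal sketch of the splitting rule described above. It is not the code from the analysis; the function name `tokenize` is hypothetical, and the handling of acronym runs (so that `decodeCString` yields `c` and `string`) is an assumption about how such names should split.

```swift
/// Splits an identifier into lowercase keywords at underscores,
/// camelCase humps, and letter/digit boundaries.
func tokenize(_ name: String) -> [String] {
    let chars = Array(name)
    var tokens: [String] = []
    var current = ""

    // Moves the token being built (if any) into the result list.
    func flush() {
        if !current.isEmpty {
            tokens.append(current.lowercased())
            current = ""
        }
    }

    for (i, ch) in chars.enumerated() {
        if ch == "_" {
            flush()
            continue
        }
        if let prev = current.last {
            let next: Character? = i + 1 < chars.count ? chars[i + 1] : nil
            // "t" -> "6" or "4" -> "a"
            let digitBoundary = prev.isNumber != ch.isNumber
            // "y" -> "P" in "arrayPlease"
            let humpStart = ch.isUppercase && prev.isLowercase
            // Assumed rule: "C" -> "S" in "CString", because "S" starts a new word
            let acronymEnd = ch.isUppercase && prev.isUppercase
                && (next?.isLowercase ?? false)
            if digitBoundary || humpStart || acronymEnd {
                flush()
            }
        }
        current.append(ch)
    }
    flush()
    return tokens
}

let example = tokenize("foo_int64arrayPlease")
// ["foo", "int", "64", "array", "please"]
```

With these rules `subString` splits into `sub` and `string`, while `substring` stays a single keyword.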
Top 5 function family keywords in String (the number is the count of function names containing the keyword across all String extensions in all repos):
487 string
156 substring
138 date
128 index
110 replace
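A ranking like the one above can be sketched as a count of how many function names contain each keyword. This is an illustration rather than the actual aggregation scripts: `keywordCounts` is a hypothetical name, the input is assumed to be already tokenized, and the sample names are made up.

```swift
/// Counts, for each keyword, how many function names contain it.
/// A keyword repeated within one name is counted once for that name.
func keywordCounts(_ tokenizedNames: [[String]]) -> [(keyword: String, count: Int)] {
    var counts: [String: Int] = [:]
    for tokens in tokenizedNames {
        for keyword in Set(tokens) {
            counts[keyword, default: 0] += 1
        }
    }
    // Highest count first; ties broken alphabetically for stable output.
    return counts
        .map { (keyword: $0.key, count: $0.value) }
        .sorted { a, b in
            a.count != b.count ? a.count > b.count : a.keyword < b.keyword
        }
}

// Made-up, already-tokenized names; not taken from the real data set.
let sample = [
    ["string", "by", "appending", "path", "component"],
    ["sub", "string"],
    ["substring", "from"],
    ["to", "date"],
]
let ranked = keywordCounts(sample)
// ranked[0] is (keyword: "string", count: 2); every other keyword has count 1
```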
trim, the first function family keyword for String last time around, has been bumped to 16th place. Although, I'm not totally satisfied with the new #1 keyword being string, because it is not at all a cohesive group of functions. Here are the top 5 signatures:
18 stringByAppendingPathComponent(path: String) -> String
12 decodeCString(_ cString: UnsafePointer<Encoding.CodeUnit>?, as encoding: Encoding.Type, repairingInvalidCodeUnits isRepairing: Bool) -> (result: String, repairsMade: Bool)? where Encoding : UnicodeCodec
10 stringByAppendingPathExtension(ext: String) -> String?
7 withCString(_ body: (UnsafePointer) throws -> Result) rethrows -> Result
6 subString(startIndex: Int, length: Int) -> String
subString, matched by string, didn't match the #2 keyword substring, but does match the #12 keyword 'sub'. This is a tricky thing to discern currently in my analysis: I can either throw some real matches away, or potentially bring in lots of false positives. Clustering on word roots is probably needed (i.e. how are 'substring', 'string' and 'sub' related?). This feels like it's getting into NLP territory, which is not my forte, un-forte-unately. Happy to hear suggestions on a package I could use for this.
Here are the top 10 families for the next few most-extended APIs:
Int: random, times, string, format, overflow, clamp, time, up, formatted, gcd
Array: index, remove, first, object, each, find, map, array, last, json
Date: date, string, time, day, month, days, week, adding, year, jjs (I have no idea what jjs is, currently)
Sequence: filter, map, first, group, find, each, contains, reduce, sorted, index
Dictionary where Key : Hashable: map, key, value, string, json, merge, filter, values, dictionary, keys
I am also toying with filtering out common 2/3-letter words, prepositions, etc. from the keywords. Perhaps redundant type information, like string for String, should be treated specially, for example by finding function families within that keyspace.
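A rough sketch of that filtering idea follows. The function name `filterKeywords` and the stop-word list are hypothetical, and simply dropping the keyword that repeats the type name is only the crudest possible "special treatment", assumed here for illustration.

```swift
/// Drops noise words and the keyword that merely repeats the extended type's name.
func filterKeywords(_ tokens: [String], extending typeName: String,
                    stopWords: Set<String>) -> [String] {
    let redundant = typeName.lowercased()
    return tokens.filter { token in
        token != redundant && !stopWords.contains(token)
    }
}

// Hypothetical stop-word list; a real one would need tuning against the data.
let stopWords: Set<String> = ["by", "to", "of", "with", "from", "for", "in"]
let kept = filterKeywords(["string", "by", "appending", "path", "component"],
                          extending: "String", stopWords: stopWords)
// kept == ["appending", "path", "component"]
```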
I'd also like to normalize by repository count, because that 2nd string signature:
decodeCString(_ cString: UnsafePointer<Encoding.CodeUnit>?, as encoding: Encoding.Type, repairingInvalidCodeUnits isRepairing: Bool) -> (result: String, repairsMade: Bool)? where Encoding : UnicodeCodec
appears all 12 times in a single repository.
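Normalizing by repository could look something like the sketch below, which counts distinct repositories per signature instead of raw occurrences. The `Occurrence` type, the function name, and the sample rows are all made up for illustration.

```swift
struct Occurrence {
    let repository: String
    let signature: String
}

/// Counts distinct repositories per signature rather than raw occurrences.
func repositoryCounts(_ occurrences: [Occurrence]) -> [String: Int] {
    var repos: [String: Set<String>] = [:]
    for occurrence in occurrences {
        repos[occurrence.signature, default: []].insert(occurrence.repository)
    }
    return repos.mapValues { $0.count }
}

// Made-up sample: one signature repeated inside a single repository,
// another appearing once in each of two repositories.
let occurrences = [
    Occurrence(repository: "repoA", signature: "decodeCString"),
    Occurrence(repository: "repoA", signature: "decodeCString"),
    Occurrence(repository: "repoA", signature: "decodeCString"),
    Occurrence(repository: "repoA", signature: "subString"),
    Occurrence(repository: "repoB", signature: "subString"),
]
let byRepo = repositoryCounts(occurrences)
// decodeCString: 3 raw occurrences but 1 repository; subString: 2 and 2
```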
Happy to hear other questions to be answered by this analysis or other considerations! I put all the aggregation results on Dropbox if you'd like to have a look (the link now reports the file as deleted), and the latest code is on GitHub.
I feel like random will subside over time since it's in the standard library now
Totally, if/when I ever complete what I want the analysis to be, the next step will be tracking changes to the results over time
It could be helpful to point out things like that to folks who still roll their own, when they no longer have to.
Do you really need something so general? What if you had a few manual rules for the top-20 things?
It's a good point, and I had originally tried to keep a list of keywords. I worried that I wouldn't be able to predict a good set though, and the goal became to let the code tell how Swift might be extended.
Are there other angles I'm not seeing, in terms of more mechanical ways to find patterns that convey the intent of the extensions? Keep in mind, too, this is only analysis of extension functions; analyzing protocols might provide some good clues or show different ways to pull apart the semantics in aggregate. Class, struct and enum declarations probably must be handled completely differently, and I haven't gotten around to brainstorming that, but it could also shine new light.
Here's the state of my keyword brainstorm before I decided to can the approach:
- image operations
- colors
- core graphics
- frame logic
- autolayout
- core data
- dictionaries
- arrays
- dates
- gestures
- sets
- files
- serialization
- plist
- json
- xml
- strings
- hashing
- notifications
- kvo
- user defaults
- webkit and webviews
- maps
- bundles
- networking
- uikit
- device stuff
- alerts
- collection views
- table views
- buttons
- modals
You can see I didn't even bother actually guessing keywords; I just realized it was too big a task when I doubted I'd even be able to list all the parts of the ecosystem. But I'm also happy to collaborate on this list and post the results back here, if anyone cares to comment (I can also switch it to edit): Colloquial Swift - apple ecosystem keyspace brainstorm - Google Docs
This is a great way of getting real data. Great job Andrew.
Aside from extensions, another good thing to follow is what people are generating with either Sourcery or gyb. Sourcery's default templates called out the need for auto-equatable, auto-hashable, auto-codable, and autogenned enum cases a while before they were discussed here, and I certainly know that an easier way of mutating nested structs (Lenses) and auto-generating mocks would be a big lift to my projects. You could search for files ending in .stencil in majority-Swift projects.
Fantastic idea, I had not considered this. Thank you, that's exactly the kind of thing I am looking for. It's in the README so I don't forget! Are there any linters for stencil definitions, to pare down the variability of the template files? I couldn't find anything on a quick search.
Awesome work!
Here's another idea. Your analysis primarily focuses on nominal types that can be extended. It would be interesting to see an analysis done on global functions and operators. I suspect we'll see a lot of operations related to non-nominal types like functions, e.g., curry, uncurry, apply (partial application) and compose (function composition), and tuples, e.g., zipLongest, zip(Longest)With and product.
(This will likely tie in with the results of template generated code as these operations can be specified for multiple arities.)
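To make the point concrete: function types are non-nominal and cannot be extended, so helpers like these end up as global functions. The following is only a sketch of what such declarations commonly look like, not code found in the analyzed repos; the names and signatures are assumptions.

```swift
/// Function composition: apply `f`, then `g`.
func compose<A, B, C>(_ f: @escaping (A) -> B,
                      _ g: @escaping (B) -> C) -> (A) -> C {
    return { a in g(f(a)) }
}

/// Currying for the two-argument arity only; each further arity needs
/// its own overload, which is where template generation tends to come in.
func curry<A, B, C>(_ f: @escaping (A, B) -> C) -> (A) -> (B) -> C {
    return { a in { b in f(a, b) } }
}

let add: (Int, Int) -> Int = { $0 + $1 }
let increment = curry(add)(1)  // partial application: (Int) -> Int
let incrementThenDescribe = compose(increment, { (n: Int) -> String in "n = \(n)" })

let described = incrementThenDescribe(41)
// described == "n = 42"
```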
Thanks for the ideas! I currently extract global function declarations, but as you've pointed out I haven't done aggregation on them yet. Stay tuned!
I did a manual search for "jjs" on GitHub and all matches on the first few pages were where somebody used "jjs" as their custom prefix for all kinds of extensions, e.g. func jjs_today or func jjs_dateBySubtractingMonths. Could that be it?