Surveying how Swift evolves

Hello!

I am developing a survey of Swift utility libraries, with the goal of observing how the language is evolving in the wild. It was rightly pointed out that this could be a fantastic tool to generate new Evolution Proposals. (Writeup, scripts, etc. are on GitHub at armcknight/colloquial-swift. The README is a work in progress!)

Details on the survey:

Things I have automated so far:

  • search GitHub for Swift repositories matching certain criteria (e.g., name contains "util" or "extension"): 5,810 unique repositories currently returned
  • focus on repos with podspecs (maybe Carthage and SwiftPM one day): 1,357 repositories
  • remove example/test/dependency directories: removed about half of over 500K files
  • focus on declarations using swiftc -print-ast (avoids searching through comments/implementation)
  • classify by type of declaration: func, enum, etc
  • especially focus on extension declarations and group all declarations they contain
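
To make the classification step concrete, here is a minimal sketch of how printed declarations could be bucketed by kind. It assumes each declaration starts on its own line in source-like form, which is my assumption about the `swiftc -print-ast` output rather than a documented guarantee. The names `declarationKind(of:)` and `countKinds(in:)` are hypothetical and are not taken from the survey's scripts.

```swift
// Keywords that introduce a declaration we want to count (illustrative subset).
let declarationKeywords: Set<String> = [
    "func", "enum", "struct", "class", "protocol",
    "extension", "var", "let", "typealias", "init",
]

// Modifiers that may precede the keyword and should be skipped (illustrative subset).
let modifiers: Set<String> = [
    "public", "internal", "private", "fileprivate", "open",
    "static", "final", "mutating", "override", "convenience",
]

/// Returns the declaration keyword that introduces `line`, or nil if the
/// first non-modifier token is not a known declaration keyword.
func declarationKind(of line: String) -> String? {
    for token in line.split(separator: " ") {
        let word = String(token)
        // Skip access/behavior modifiers and attributes such as @objc.
        if modifiers.contains(word) || word.hasPrefix("@") { continue }
        // `init(...)` arrives as a single token, so compare only the part before "(".
        let head = String(word.prefix(while: { $0 != "(" }))
        return declarationKeywords.contains(head) ? head : nil
    }
    return nil
}

/// Tallies how many lines of each declaration kind appear in `lines`.
func countKinds(in lines: [String]) -> [String: Int] {
    var counts: [String: Int] = [:]
    for line in lines {
        if let kind = declarationKind(of: line) {
            counts[kind, default: 0] += 1
        }
    }
    return counts
}

// Hand-written sample input, not real survey data.
let sampleLines = [
    "extension String {",
    "  public func trim() -> String",
    "  mutating func trim()",
    "}",
    "internal enum Mode {",
]
let kindCounts = countKinds(in: sampleLines)
```

A real pipeline would also need to track which extension each member belongs to; this sketch only covers the per-line classification.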

From there I was able to do some quick command-line processing to aggregate and look for patterns. That's the next step that needs to be automated, as well as deciding which questions are actually important (and which ones aren't being asked yet). Here are some early things I found, which I presented recently at iOSDevCampCO.
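
The aggregation itself is essentially a frequency count sorted in descending order. A rough Swift equivalent, as a sketch only (the actual processing was done with command-line tools, and `frequencyTable` is a hypothetical name), might look like this:

```swift
/// Counts how often each string occurs and returns rows sorted by
/// descending count, breaking ties alphabetically.
func frequencyTable(_ items: [String]) -> [(count: Int, item: String)] {
    var counts: [String: Int] = [:]
    for item in items {
        counts[item, default: 0] += 1
    }
    return counts
        .map { (count: $0.value, item: $0.key) }
        .sorted { lhs, rhs in
            lhs.count != rhs.count ? lhs.count > rhs.count : lhs.item < rhs.item
        }
}

// Hand-written sample input, not real survey data.
let signatures = [
    "trim() -> String",
    "trimmed() -> String",
    "trim() -> String",
    "trim()",
]

// Print in the same "count, then quoted signature" layout used in the lists.
for row in frequencyTable(signatures) {
    print(row.count, "\"\(row.item)\"")
}
```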

Example findings:

Note: for all the lists, the first column of numbers is the frequency with which something was encountered.

  • almost 13,000 extension declarations
  • top 10 Swift stdlib APIs extended, with the number of extensions declared on each:

608 String
206 Array
204 Date
160 Int
119 Dictionary where Key : Hashable
90 Double
86 Data
68 Sequence
64 URL
62 Array where Element : Equatable

I drilled down into String to see how people are extending it. Here are the top 10 function declarations from extensions on String, in their canonical form from swiftc:

24 "trim() -> String"
13 "substring(from: Int) -> String"
12 "substring(to: Int) -> String"
11 "isValidEmail() -> Bool"
10 "trimmed() -> String"
10 "toBool() -> Bool?"
10 "height(withConstrainedWidth width: CGFloat, font: UIFont) -> CGFloat"
9 "trim()"
9 "toDouble() -> Double?"
9 "isNumber() -> Bool"

Trimming is not only the most popular function declaration, but appears many times in varied forms (84 total functions dealing with trimming):

24 "trim() -> String"
10 "trimmed() -> String"
9 "trim()"
3 "trimPhoneNumberString() -> String"
3 "trimNewLine() -> String"
3 "trimForNewLineCharacterSet() -> String"
2 "trimmedRight(characterSet set: NSCharacterSet = default) -> String"
2 "trimmedLeft(characterSet set: NSCharacterSet = default) -> String"
1 "trimmingWhitespacesAndNewlines() -> String"
1 "trimmedStart(characterSet set: CharacterSet = default) -> String"
1 "trimmedRight() -> String"
1 "trimmedLeft() -> String"
1 "trimmedEnd(characterSet set: CharacterSet = default) -> String"
1 "trimWhitespace() -> String"
1 "trimPrefix(prefix: String)"
1 "trimInside() -> String"
1 "trimDuplicates() -> String"
1 "trim(trim: String) -> String"
1 "trim(_ characters: String) -> String"
1 "trim(_ characterSet: CharacterSet) -> <>"
1 "stringByTrimmingTailCharactersInSet(_ set: CharacterSet) -> String"
1 "sk4TrimSpaceNL() -> String"
1 "sk4TrimSpace() -> String"
1 "sk4Trim(str: String) -> String"
1 "sk4Trim(charSet: NSCharacterSet) -> String"
1 "prefixTrimmed(prefix: String) -> String"
1 "omTrim()"
1 "m_trimmed() -> String"
1 "m_trim()"
1 "jjs_trimWhitespaceAndNewline() -> String"
1 "jjs_trimWhitespace() -> String"
1 "jjs_trimNewline() -> String"
1 "jjs_emptyOrStringAndTrim(str: String?) -> String"
1 "hyb_trimRight(trimNewline: Bool = default) -> String"
1 "hyb_trimLeft(trimNewline: Bool = default) -> String"
1 "hyb_trim(trimNewline: Bool = default) -> String"

I then looked at implementations just for the #1 form, trim() -> String:

8 return self.trimmingCharacters(in: NSCharacterSet.whitespaces)
6 return self.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceAndNewlineCharacterSet())
5 return self.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
5 return self.stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
3 return trimmingCharacters(in: CharacterSet.whitespacesAndNewlines)
3 return stringByTrimmingCharactersInSet(NSCharacterSet.whitespaceCharacterSet())
3 return self.stringByTrimmingCharactersInSet(.whitespaceCharacterSet())
3 return self.stringByTrimmingCharactersInSet(.whitespaceAndNewlineCharacterSet())
2 return trimmingCharacters(in: CharacterSet.whitespaces)
2 return self.trimmingCharacters(in: CharacterSet.whitespaces)
2 return self.trimmingCharacters(in: .whitespacesAndNewlines)
2 return self.trimmingCharacters(in: .whitespaces)
1 return trimmingCharacters(in: .whitespacesAndNewlines)
1 return trimmed
1 return stringByReplacingOccurrencesOfString(" ", withString: "")
1 return strTrimmed
1 return self.trimmingCharacters(in: characterSet)
1 return self.trimmingCharacters(in: NSMutableCharacterSet.whitespaceAndNewline() as CharacterSet)
1 return self.trimmingCharacters(in: NSCharacterSet.whitespacesAndNewlines)
1 return self.trimmingCharacters(in: NSCharacterSet.whitespacesAndNewlines())
1 return self.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines).trimmingCharacters(in: CharacterSet.whitespaces).trimmingCharacters(in: CharacterSet.whitespaces)
1 return (self as String).trimmingCharacters(in: CharacterSet.whitespaces)
1 return (self as NSString).trimmingCharacters(in: NSCharacterSet.whitespacesAndNewlines) as String

This tells me that lots of people are solving the same problem the same way, but not all of them. There are enough differences here to warrant concern that behaviors could diverge now, or later if things like NSCharacterSet.whitespaceCharacterSet() and NSCharacterSet.whitespaceAndNewlineCharacterSet() one day diverge for whatever reason. Maybe not those specifically, but this is the general idea of seemingly-same-yet-different functions, and the more complicated the problem, the higher the impact of this divergence.
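
As a small illustration of how these same-named functions can behave differently, here is a sketch that runs three of the implementation styles from the list above on one input. The sample string and variable names are mine; the Foundation calls are the modern spellings of the ones in the list.

```swift
import Foundation

let input = "  hello world \n"

// Style 1: trims spaces and tabs at both ends, but stops at the trailing newline.
let spacesOnly = input.trimmingCharacters(in: .whitespaces)

// Style 2: trims spaces, tabs, and newlines at both ends.
let spacesAndNewlines = input.trimmingCharacters(in: .whitespacesAndNewlines)

// Style 3: removes every space, including the one between the words.
let allSpacesRemoved = input.replacingOccurrences(of: " ", with: "")

print(spacesOnly.debugDescription)        // "hello world \n"
print(spacesAndNewlines.debugDescription) // "hello world"
print(allSpacesRemoved.debugDescription)  // "helloworld\n"
```

All three would sit behind a `trim() -> String` signature, so a caller switching between libraries gets three different results for the same input.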

Conclusion

Lots of extensions out there solve the same problems, albeit with different implementations. We could think of those implementations on spectra of stability, usability, and others. It might then stand to reason that the most common tasks with the best implementations should be incorporated into the Swift stdlib (maybe with even better implementations).

42 Likes

This is awesome stuff. Do you have the top 3-10 method-families for any of these extensions?

Wow, this is a really interesting way to do this analysis. Thank you for doing this! We clearly need a pitch for a trimming API :-)

9 Likes

This is fun!

Here are the top function families (ps thanks for that bit of vocab!) for the top 4 extended APIs:

  • top 4 families for String: trim, substring, email, height (for layout)
  • String conversions: after whittling away those top 4 families, I noticed lots of "toBool" etc. functions
  • top 4 Array: shuffle, random, pop/push, shift
  • top 3 Date: today, format, offset (various calculations for elapsed time in/between different units)
  • top 4 Int: random, digits, times/upto/downto (looping), bytes

The complete lists are currently in the function_families directory of armcknight/colloquial-swift on GitHub, so you can see the breakdown of declarations.

2 Likes

Awesome useful work!

2 Likes
  • Trim seems like a win.
  • Random (Int and Array): SE-0202
  • Array has popLast, but queue functions often come up in discussions and wish lists.
  • Date really needs @DaveDeLong's input.
  • Times, up to and down to, are somewhat covered by cycles, batches, and concatenation
  • Digits might have some opportunity (along with radix) although there's already some good stuff in stdlib.
  • Not quite sure what the "bytes" covers? You mean like 4 sequential bytes (0x00, 0xFF, 0x00, 0x23) from an Int regardless of endianness?
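
For reference, several of the families in this list map onto calls the standard library already provides. A minimal sketch, assuming Swift 5 or later (the random APIs come from SE-0202). The variable names are illustrative, and the last line shows only one possible reading of "bytes", not necessarily what the surveyed extensions do.

```swift
var queue = [1, 2, 3]

// Random (SE-0202): random integers, shuffling, and random elements.
let roll = Int.random(in: 1...6)
let shuffled = queue.shuffled()
let pick = queue.randomElement()

// Stack-style pop exists; queue-style removal needs removeFirst(), which is O(n).
let last = queue.popLast()
let first: Int? = queue.isEmpty ? nil : queue.removeFirst()

// Digits and radix conversions that are already in the stdlib.
let hex = String(255, radix: 16)
let parsed = Int("ff", radix: 16)
let digits = String(1234).compactMap { $0.wholeNumberValue }

// One reading of "bytes": the big-endian byte sequence of a fixed-width integer.
let bytes = withUnsafeBytes(of: UInt32(0x00FF0023).bigEndian) { Array($0) }
```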

Also, did you see much about left and right padding? I remember @beccadax pitching some formatting stuff a while back.

I think this is fantastic. I really appreciate the data-oriented approach!

One thought about this:

I think it would also be interesting to relax the criteria, and look at GitHub projects containing apps, etc. The difference with those projects is that these will be codebases that implement some larger feature or purpose, and the extensions you will find are the things that were needed to get core work done. There may be some selection bias here with looking at the repositories that match the restricted criteria because the whole purpose of those repositories might be to vend utility APIs. That doesn't mean the signal here is wrong, but there may be some over-counting and thus over-weighing of the relative importance of some of these.

14 Likes

+1

1 Like

With respect to Date, today is a bit of a misnomer since Date is a moment in time and not a range. That said, we do have Date.now. It also has the usual mathematical operations for calculations about absolute time (+, -, etc).

Oh, I guess I misremembered about Date.now (which we could probably add), but it is the same as Date().
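
A quick sketch of what that looks like with the Foundation API as I understand it: `Date()` captures the current instant, and the arithmetic operators take a `TimeInterval` in seconds. The variable names are just for illustration.

```swift
import Foundation

// The current moment in time.
let now = Date()

// Date + TimeInterval and Date - TimeInterval shift an instant by seconds.
let inOneHour = now + 3600
let oneMinuteAgo = now - 60

// The distance between two instants is a TimeInterval (seconds as Double).
let elapsed = inOneHour.timeIntervalSince(now)

// Dates are Comparable, so ordering instants works directly.
let isOrdered = oneMinuteAgo < now && now < inOneHour
```

Note that none of this involves calendars or time zones, which is where the "today" style extensions come in.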

2 Likes

Thanks! I agree, there's no reason to exclude extension/other declarations from apps and other codebases. I think I went too far down the "utility lib" route, before settling on extension declarations, and didn't look back to see what I'd missed :)

I'll tweak a few filters and rerun the analysis to see how it changes the numbers. I'll check back maybe Monday or Tuesday with some results. My guess is that, like you said, this is going to extend the long tails of the distributions by adding more uniquely specialized functions, even if it also notches up a few more String.trim()s :wink:

I haven't figured out good questions to ask about those long tails yet, or good strategies for automated analysis. I was thinking of tokenizing the identifiers and doing some clustering analysis on the words this extracts.
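
Here is a rough sketch of the tokenizing idea, assuming identifiers are split on underscores and lowercase-to-uppercase boundaries. `words(inIdentifier:)` is a hypothetical helper, and the clustering step itself is not shown.

```swift
/// Splits an identifier into lowercase words at underscores and before
/// each uppercase letter. Acronyms are split letter by letter, which a
/// real tokenizer would probably want to handle differently.
func words(inIdentifier identifier: String) -> [String] {
    var result: [String] = []
    var current = ""
    for character in identifier {
        if character == "_" {
            // An underscore ends the current word and is dropped.
            if !current.isEmpty {
                result.append(current)
                current = ""
            }
        } else if character.isUppercase, !current.isEmpty {
            // An uppercase letter starts a new word.
            result.append(current)
            current = String(character)
        } else {
            current.append(character)
        }
    }
    if !current.isEmpty { result.append(current) }
    return result.map { $0.lowercased() }
}

// Count words across a few function names taken from the trim list.
let identifiers = ["trimWhitespace", "jjs_trimNewline", "trimmedLeft"]
var wordCounts: [String: Int] = [:]
for identifier in identifiers {
    for word in words(inIdentifier: identifier) {
        wordCounts[word, default: 0] += 1
    }
}
```

Library prefixes such as `jjs` fall out as their own tokens, so they could be filtered before clustering.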

1 Like

This may not be the right thread for it, but IMO we shouldn't encourage Date usage, since it is fundamentally the wrong way to express calendrical ideas (calendar values are ranges, not instants).

If we do want some sort of "now()" concept, we should consider adding a Clock API. I've got that built in to Chronology and it is awesome.

I disagree that Date is fundamentally broken, but we can start a new thread about it if you like.

2 Likes

Dave didn't say it was broken. I read it as Date is misused. I would love a new thread for this topic. Date and time handling in Foundation (essentially inherited by Swift) is powerful, but it has some blind spots with regard to ergonomics.

1 Like

Ask and receive. (cc @armcknight)

6 Likes

I guess public func trimmed() -> String should not be mutating?
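
To spell out the naming convention in question: under the Swift API Design Guidelines, the "-ed" form returns a new value and the imperative form mutates in place. A minimal sketch, where both methods are hypothetical and not an existing or proposed stdlib API:

```swift
import Foundation

extension String {
    /// Non-mutating: returns a copy with leading and trailing
    /// whitespace and newlines removed (hypothetical API).
    func trimmed() -> String {
        return trimmingCharacters(in: .whitespacesAndNewlines)
    }

    /// Mutating counterpart: trims the receiver in place (hypothetical API).
    mutating func trim() {
        self = trimmed()
    }
}

let original = "  swift  "
let copy = original.trimmed()   // `original` is left untouched

var buffer = original
buffer.trim()                   // `buffer` itself is changed
```

By that convention, the popular `trim() -> String` shape from the survey would more naturally be spelled `trimmed()`.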

2 Likes

I think this should wait for (and use) the Character Properties pitch, and their definition for whitespace characters (of which they propose 2 options for debate):

4 Likes

I think that long-term, we will definitely want some kind of whitespace control / formatting / document-building solution.

In the near(er) term, we should deliver trim/pad/center/etc., which should compose well with a String.lines property (which we should also do soon). I was planning on tackling these after wrapping up more ABI-critical work, but if anyone else is interested in exploring this, let's spin off a thread and collaborate!
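
As a starting point for discussion, here is a rough sketch of what pad/center helpers and a line split could look like. Every name here (`leftPadded`, `rightPadded`, `centered`, `lineList`) is hypothetical, the whitespace semantics are deliberately naive, and none of this reflects an actual design.

```swift
extension String {
    /// Pads on the left until the string has `width` characters.
    func leftPadded(to width: Int, with pad: Character = " ") -> String {
        // Swift.max is spelled out because `max` alone resolves to Sequence.max() here.
        return String(repeating: pad, count: Swift.max(0, width - count)) + self
    }

    /// Pads on the right until the string has `width` characters.
    func rightPadded(to width: Int, with pad: Character = " ") -> String {
        return self + String(repeating: pad, count: Swift.max(0, width - count))
    }

    /// Centers the string in `width` characters; any odd extra pad goes on the right.
    func centered(to width: Int, with pad: Character = " ") -> String {
        let total = Swift.max(0, width - count)
        let left = total / 2
        return String(repeating: pad, count: left)
            + self
            + String(repeating: pad, count: total - left)
    }

    /// Naive line split on "\n" only; "\r\n" is a single Character in Swift
    /// and is not treated as a separator here.
    var lineList: [String] {
        return split(separator: "\n", omittingEmptySubsequences: false).map(String.init)
    }
}

// Composing the pieces: right-align every line of a small block of text.
let block = "a\nbcd\nef"
let aligned = block.lineList.map { $0.leftPadded(to: 3) }
```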

3 Likes

Yes, Date.now is really needed for clarity. I remember trying Date(timeIntervalSinceNow: 0) before realizing there had to be a simpler way.

Could this API be added through Swift Evolution?

1 Like