Swift performance while manipulating String

You’ve gotten some good performance tips in this thread. One more:

components(separatedBy:) is part of Foundation, so this will require a bunch of bridging. Try using split(separator:) instead—it operates directly on native Swift types.
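For anyone comparing the two calls, here is a minimal sketch of how their results differ. The input string is made up for illustration; the point is the return types and the handling of empty fields, not performance.

```swift
import Foundation

let line = "a,,b,c"

// Foundation: returns [String] and keeps the empty field between the two commas.
let viaFoundation: [String] = line.components(separatedBy: ",")

// Standard library: returns [Substring] (slices of `line`) and, by default,
// drops empty subsequences.
let viaSplit: [Substring] = line.split(separator: ",")

// Passing omittingEmptySubsequences: false keeps empty fields, like Foundation does.
let viaSplitKeepingEmpty = line.split(separator: ",", omittingEmptySubsequences: false)

print(viaFoundation)
print(viaSplit)
print(viaSplitKeepingEmpty)
```

So the two are not drop-in replacements for each other when the input can contain empty fields.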

Swift respects Unicode canonical equivalence, whereas the languages you are comparing against do not. That means Swift considers "a" + "\u{300}" ( a + ◌̀ ) and "\u{E0}" ( à ) to be equal.
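A small sketch of that equivalence, using only the standard library. The variable names are just for illustration.

```swift
let decomposed = "a" + "\u{300}"   // "a" followed by COMBINING GRAVE ACCENT
let precomposed = "\u{E0}"         // LATIN SMALL LETTER A WITH GRAVE

// String equality uses canonical equivalence, so these compare equal...
print(decomposed == precomposed)

// ...even though the underlying Unicode scalars differ.
let decomposedScalars = decomposed.unicodeScalars.map { $0.value }
let precomposedScalars = precomposed.unicodeScalars.map { $0.value }
print(decomposedScalars, precomposedScalars)

// Each one is a single Character (one extended grapheme cluster).
print(decomposed.count, precomposed.count)
```

This extra work on every comparison is one plausible reason a sort over Swift strings can cost more than a sort over raw code units.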

I ran the test in Kotlin, based on your little example.

val composed = '\u00E0'.toString()

val decomposed = "a" + '\u0300'.toString()

val strings = listOf<String>(composed, decomposed, composed, decomposed, "a", "b", "c")
for (e in strings) {
    println(e)
}
println(strings)

val sorted = strings.sorted()

println(sorted)

val descriptions = sorted.map { string ->
    // List the UTF-16 code unit value of each Char in the string
    val scalars = string
        .map { it.code }
        .joinToString(separator = ", ")
    "${string}: ${scalars}"
}

for (entry in descriptions) {
    println(entry)
}

Results
The sorted array = [a, à, à, b, c, à, à]
Array Value output =
a: 97
à: 97, 768
à: 97, 768
b: 98
c: 99
à: 224
à: 224

As you said, the sorting is not correct.

It does sort them correctly as arrays of UTF-16 code units; it's just not a Unicode-correct way to treat strings. Many languages' standard libraries still treat strings like that, so it's probably also useful to know, especially if you need to interop between languages.
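If you need to reproduce that code-unit ordering on the Swift side for interop, here is a minimal sketch. It assumes the other platform compares strings by UTF-16 code units, which is consistent with the Kotlin output above; the comparison closure is my own illustration, not something from the thread.

```swift
let composed = "\u{E0}"       // à as a single scalar
let decomposed = "a\u{300}"   // a + combining grave accent
let strings = [composed, decomposed, composed, decomposed, "a", "b", "c"]

// Order by raw UTF-16 code units, ignoring canonical equivalence.
let byCodeUnits = strings.sorted { $0.utf16.lexicographicallyPrecedes($1.utf16) }

for s in byCodeUnits {
    let units = s.utf16.map { String($0) }.joined(separator: ", ")
    print("\(s): \(units)")
}
```

With this ordering the decomposed form sorts right after "a" and the precomposed form sorts after "c", as in the Kotlin result.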

You're right. It was a simple perf test, but it has taught me more about how strings work, which is useful because I'll have to interop with projects on different platforms in a few months.

  1. tried using MutableCollection's swapAt

I tried it to see: it saves about 3 seconds (35 s down to 32 s).

split(separator:) returns Substrings, so you must map over the result to convert each element to a String, no?

Substring is also Comparable, so you can do pretty much the same thing and defer the conversion (to String) as late as possible.
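A minimal sketch of what deferring the conversion could look like. The input is made up, and whether this is actually faster than converting up front is something to measure rather than assume.

```swift
let line = "pear,apple,fig"

// split(separator:) yields [Substring]; no String copies are made here.
let fields = line.split(separator: ",")

// Substring is Comparable, so the slices can be sorted directly.
let sortedFields = fields.sorted()

// Convert to String only at the end, where owned strings are actually needed.
let result = sortedFields.map(String.init)
print(result)
```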

I ran the test.

The result is disastrous: I stopped the execution after 5 minutes, with my fan making a lot of noise.

Even though Substring conforms to Comparable, it doesn't look very efficient.

I tried your suggestion, but I only wanted to compare the performance of components(separatedBy:) and split(separator:). In my results, components(separatedBy:) is faster than split(separator:), not the other way around as you said.
I tested by creating a new iOS project and writing the test functions in an XCTestCase. Here are the code and the results:

func splitString(string: String) {
    let list = string.split(separator: ",")
    for obj in list {
        let prefix = "Pre-"
        let newString = prefix + obj
        print(newString)
    }
}

func foundationSplit(string: String){
    let list = string.components(separatedBy: ",")
    for obj in list {
        let prefix = "Pre-"
        let newString = prefix + obj
        print(newString)
    }
}

let string = """
Việt Nam cần xem xét năng lực xét nghiệm. Để kiểm soát được nguy cơ ca nhiễm nCoV tăng khi mở đường bay quốc tế, Việt Nam nên thực hiện xét nghiệm trên diện rộng để có con số chính xác ở cấp độ địa phương và trung ương. Khi đó, cả chính phủ và người dân đều biết nơi nào có ca nhiễm tăng hay có cụm nhiễm. Tại các sân bay, nhà chức trách cần có hệ thống xét nghiệm nhanh đáng tin cậy, để các hành khách biết tình trạng sức khoẻ của mình trước khi di chuyển. Hawkins lưu ý quy định hành khách có giấy chứng nhận âm tính với nCoV trước khi đến sân bay một vài ngày không giúp bảo đảm an toàn, vì họ có thể có kết quả dương tính ngay sau đó. Hôm 9/8, Phó thủ tướng Vũ Đức Đam khẳng định năng lực xét nghiệm Covid-19 của Việt Nam đã tốt hơn nhiều so với trước cả về sản xuất kit thử và máy móc. Tuy nhiên, giải pháp chống dịch hiệu quả vẫn là phát hiện sớm, truy vết nhanh, xét nghiệm theo nhóm.
"""

func testPerformanceSplit() throws {
    // This is an example of a performance test case.
    self.measure {
        // Put the code you want to measure the time of here.
        for _ in 0...100 {
            splitString(string: string)
        }
    }
}

func testPerformanceFoundation() throws {
    // This is an example of a performance test case.
    self.measure {
        // Put the code you want to measure the time of here.
        for _ in 0...100 {
            foundationSplit(string: string)
        }
    }
}

I ran each test 6 times with Xcode 12.0 beta 5 (12A8189h) on macOS 10.15.5 (MacBook Pro 13-inch, 2017, 2.3 GHz Dual-Core Intel Core i5, 16 GB 2133 MHz LPDDR3).
Here are the results of testPerformanceSplit, in seconds:

  • 0.049
  • 0.054
  • 0.047
  • 0.045
  • 0.052
  • 0.049

And here are the results of testPerformanceFoundation, in seconds:

  • 0.044
  • 0.040
  • 0.039
  • 0.047
  • 0.040
  • 0.045

These results show components(separatedBy:) to be faster than split(separator:): about 0.043 s versus 0.049 s on average.
Please correct me if I did something wrong :))
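One thing that might be worth ruling out: both test functions call print inside the loop, so console output may account for a large share of the measured time. Below is a rough sketch that times only the splitting. The elapsedSeconds helper and the stand-in input are hypothetical, not from the test above, and a Date-based timer is only a coarse approximation of what XCTest's measure does.

```swift
import Foundation

// Hypothetical helper (not from the thread): wall-clock seconds taken by `body`.
func elapsedSeconds(_ body: () -> Void) -> Double {
    let start = Date()
    body()
    return Date().timeIntervalSince(start)
}

// Stand-in input; substitute the text you actually want to measure.
let input = Array(repeating: "alpha,beta,gamma", count: 1_000).joined(separator: ",")

var splitFieldCount = 0
let splitTime = elapsedSeconds {
    for _ in 0..<100 {
        // Accumulate a result so the work is used; there is no print in the loop.
        splitFieldCount += input.split(separator: ",").count
    }
}

var componentsFieldCount = 0
let componentsTime = elapsedSeconds {
    for _ in 0..<100 {
        componentsFieldCount += input.components(separatedBy: ",").count
    }
}

print("split(separator:):", splitTime, "s for", splitFieldCount, "fields")
print("components(separatedBy:):", componentsTime, "s for", componentsFieldCount, "fields")
```

Timings will vary by machine and build configuration, so compare the two numbers from the same run of an optimized build rather than against the figures above.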

I was not talking about the performance of split(separator:) versus components(separatedBy:), but about the comparison of String versus Substring.

Perhaps this is because Foundation doesn’t take into account combining characters. Try splitting on “é” with two kinds of “é” in the string (precomposed and combining) to see if you get different results. You could also insert a combining character after any separator in the long string to get a different result.
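A minimal sketch of that experiment. The test string is made up, and I only state the expected result for the standard library call; what Foundation returns is left for you to observe on your platform.

```swift
import Foundation

let precomposed = "\u{E9}"   // é as a single scalar
let combining = "e\u{301}"   // e + COMBINING ACUTE ACCENT

let text = "1\(precomposed)2\(combining)3"

// Standard library: the separator is a Character, compared by canonical
// equivalence, so both spellings of é act as separators.
let nativeParts = text.split(separator: Character(precomposed))
print(nativeParts)

// Foundation: print whatever it returns; no particular output is assumed here.
let foundationParts = text.components(separatedBy: precomposed)
print(foundationParts)
```

If the two printed arrays differ, the two APIs are not doing the same amount of work, which would matter when comparing their timings.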

This is very interesting, and it would be great if you could file a report at bugs.swift.org with a self-contained reproducer, as it looks very much like we have some missing optimizations here.

The bugs subdomain requires an account that seems to be separate from the forums one. Is that right?

Yes, it's a separate login.

Could you also share the link to the bug report once you've filed it?

Yes, I'll do that.

I filed a report: [SR-13544] Comparing String or SubString performance · Issue #55981 · apple/swift · GitHub

//————
I ran a complementary sorting test based on randomized strings.

There are 2 kinds of strings, generated from 2 character sets:
1: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
2: "aáàâbcdeéèfghiíìjklmnñoóòôpqrstuúùûvwxyzAÁÀÂBCDEÉÈÊFGHIíÌÎJKLMNOÓÒÔPQRSTUÚÙÛVWXYZ"

Simple string w/ components(separatedBy:) => avg. 4.73

Simple string w/ split(separator:) => avg. 41.8

Complex string w/ components(separatedBy:) => avg. 3.05

Complex string w/ split(separator:) => avg. 4.06

I added these results to my bug report as well.
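For anyone who wants to try something similar, here is a rough sketch of generating randomized input from a character set and sorting the split result. The randomString helper, the string length, and the number of strings are all assumptions of mine; the generator actually used for the numbers above is not shown in this thread.

```swift
// Hypothetical generator (the thread does not show the original one).
func randomString(length: Int, from alphabet: [Character]) -> String {
    precondition(!alphabet.isEmpty, "alphabet must not be empty")
    return String((0..<length).map { _ in alphabet.randomElement()! })
}

let simpleAlphabet = Array("abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ")

// 1,000 random 10-character strings, joined into one comma-separated line.
let line = (0..<1_000)
    .map { _ in randomString(length: 10, from: simpleAlphabet) }
    .joined(separator: ",")

// Split, then sort the resulting Substrings.
let sortedFields = line.split(separator: ",").sorted()
print(sortedFields.prefix(5))
```

Swapping simpleAlphabet for the accented character set gives the "complex" variant of the same input.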
