Fast String Comparison From String Arrays

Hi,

I wonder whether there is already a way in Swift to compare a string against a large string array quickly without using the traditional ways of comparison.

Say we have ["a", "b", "c", "d"] and we want to find whether this array contains "a", and then check whether the same array contains "b". Don't you think there is a way to represent the array differently and make these comparisons a lot quicker?

I know there are recurrent neural networks etc., but I am talking here about a solution that involves no learning at all: just representing the array differently so we can do better than that O(N) scan.

I have developed an algorithm that is doing pretty well so far, and I am proposing it here to see whether it would be accepted and whether it is interesting from your perspective.

I developed a JavaScript version here: https://omarshariffathi.github.io/quickhint/

If you think this would be welcome in Swift Foundation, I am ready to open a pull request.
Thanks for reading.

This sounds like an interesting idea. The best way to promote it would be to make a Swift package first, so that Swift users can use it right away and we can see how it works in practice. If it has sufficient merit and general applicability, then it makes sense to propose including it in the standard library. There are tradeoffs to increasing the API surface area of the standard library, so having experience with a SwiftPM package first can help drive the discussion.

In reply to Omar Charif via swift-evolution <swift-evolution@swift.org>, Jul 28, 2017, at 5:54 AM.

_______________________________________________
swift-evolution mailing list
swift-evolution@swift.org
https://lists.swift.org/mailman/listinfo/swift-evolution

If you’re doing only direct containment checks, the built-in Set gives (expected) O(1) lookups.
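
For the plain containment case in the original question, a minimal sketch (variable names are illustrative) of building the Set once and then doing repeated lookups:

```swift
let words = ["a", "b", "c", "d"]

// Array.contains scans the elements one by one: O(N) per query.
let linearHit = words.contains("a")

// Building a Set costs O(N) once; each contains is then an expected O(1)
// hash lookup, which pays off when the same collection is queried repeatedly.
let wordSet = Set(words)
let hasA = wordSet.contains("a")   // true
let hasB = wordSet.contains("b")   // true
let hasZ = wordSet.contains("z")   // false
```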

If you’re looking for, say, all values with a given prefix, then a trie might be appropriate, and if you’re trying to do more interesting things (e.g. fuzzy search) there are techniques like “finite state transducers” (http://blog.burntsushi.net/transducers/). I don’t believe either of these has anything built-in, and I suspect they (especially FSTs) are too specialized, or have too many possible variations, to be worth including directly in the current standard library; a SwiftPM package, as others have suggested, would work almost as well.

Huon
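
As Huon says, there is no built-in trie. A minimal, hypothetical sketch of prefix lookup, assuming one node per Character with children kept in a Dictionary (`Trie` and `TrieNode` are illustrative names, not an existing API):

```swift
// Minimal trie (prefix tree) sketch: one node per Character.
final class TrieNode {
    var children: [Character: TrieNode] = [:]
    var isWord = false      // true if a stored word ends at this node
}

final class Trie {
    private let root = TrieNode()

    // Inserts `word`, creating nodes along its path as needed.
    func insert(_ word: String) {
        var node = root
        for ch in word {
            if let next = node.children[ch] {
                node = next
            } else {
                let next = TrieNode()
                node.children[ch] = next
                node = next
            }
        }
        node.isWord = true
    }

    // Returns every stored word starting with `prefix`, sorted.
    func words(withPrefix prefix: String) -> [String] {
        var node = root
        for ch in prefix {
            guard let next = node.children[ch] else { return [] }
            node = next
        }
        var results: [String] = []
        collect(from: node, path: prefix, into: &results)
        return results.sorted()
    }

    // Depth-first walk that gathers all words stored below `node`.
    private func collect(from node: TrieNode, path: String, into results: inout [String]) {
        if node.isWord { results.append(path) }
        for (ch, child) in node.children {
            collect(from: child, path: path + String(ch), into: &results)
        }
    }
}

let trie = Trie()
for word in ["car", "card", "care", "cat", "dog"] { trie.insert(word) }
let carWords = trie.words(withPrefix: "car")   // ["car", "card", "care"]
```

A prefix query only walks the prefix's path and the subtree below it, so its cost does not depend on how many unrelated words are stored.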

If you're doing something like that, you probably want to end up with some kind of Bloom filter.

Alex
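
A Bloom filter answers either “definitely not present” or “possibly present” from a fixed-size bit array, so it suits a fast pre-check before a slower exact lookup. A minimal sketch, assuming Swift's `Hasher` seeded with the hash-function index stands in for k independent hash functions; `BloomFilter` and the sizes used are illustrative:

```swift
// Minimal Bloom filter sketch. Answers are "definitely absent" (false) or
// "possibly present" (true): false positives can occur, false negatives cannot.
struct BloomFilter {
    private var bits: [Bool]
    private let hashCount: Int

    init(bitCount: Int, hashCount: Int) {
        precondition(bitCount > 0 && hashCount > 0, "sizes must be positive")
        self.bits = Array(repeating: false, count: bitCount)
        self.hashCount = hashCount
    }

    // Bit position for `element` under the hash function numbered `seed`.
    // Hasher is randomly seeded per process, so this filter is only valid
    // within a single run and must not be persisted.
    private func bitIndex(for element: String, seed: Int) -> Int {
        var hasher = Hasher()
        hasher.combine(seed)
        hasher.combine(element)
        let hash = UInt(bitPattern: hasher.finalize())
        return Int(hash % UInt(bits.count))
    }

    mutating func insert(_ element: String) {
        for seed in 0..<hashCount {
            let i = bitIndex(for: element, seed: seed)
            bits[i] = true
        }
    }

    func mightContain(_ element: String) -> Bool {
        (0..<hashCount).allSatisfy { bits[bitIndex(for: element, seed: $0)] }
    }
}

var filter = BloomFilter(bitCount: 1024, hashCount: 3)
for word in ["a", "b", "c", "d"] { filter.insert(word) }
let maybeA = filter.mightContain("a")   // always true: "a" was inserted
let maybeZ = filter.mightContain("z")   // usually false; true would be a false positive
```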

This is a bit off topic, but does anybody know the data structure that supports Xcode’s fabulous case-insensitive-in-order-yet-disjoint-substring search?

-Kenny
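
The thread does not say which data structure Xcode uses. The matching rule Kenny describes (case-insensitive, characters in order, possibly non-contiguous) is commonly called subsequence or fuzzy matching; a minimal sketch of just that predicate, with a hypothetical `fuzzyMatches` helper and a plain linear scan over candidates rather than any index:

```swift
// True if every character of `query` appears in `candidate` in the same
// order (not necessarily adjacent), ignoring case.
func fuzzyMatches(_ query: String, _ candidate: String) -> Bool {
    var remaining = Substring(query.lowercased())
    for ch in candidate.lowercased() {
        guard let next = remaining.first else { break }
        // Greedily consume the next query character when it appears.
        if ch == next { remaining = remaining.dropFirst() }
    }
    return remaining.isEmpty
}

let symbols = ["viewDidLoad", "viewWillAppear", "tableView(_:cellForRowAt:)", "loadView"]
let hits = symbols.filter { fuzzyMatches("vdl", $0) }   // ["viewDidLoad"]
```

Ranking the matches is left out; this only decides whether a candidate matches at all.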

In reply to Huon Wilson via swift-evolution <swift-evolution@swift.org>, Jul 28, 2017, at 1:57 PM.

The main advantage I introduce is actually not just substring search; my main goal is high-performance repeated search on the same array of strings. I have tested my algorithm against something like Spotlight, for example when searching for files. My code is a bit faster for a single one-off search, but if you go further and search for 2, 3, or more substrings in the same array, my implementation will be even faster. It is also simple and intuitive to use, as you will see once you try it. I think this is basic String functionality that should already come with the Swift Foundation library, and it is purely additive, with no conflicts with the existing Foundation library. One last thing: I have been testing this for 5 months, so it is not new and I am not just experimenting here 🙂
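
The algorithm itself is only in the linked JavaScript project and is not reproduced here. For comparison, one well-known way to amortize repeated substring searches over the same array is an n-gram inverted index built once up front. A minimal sketch of that swapped-in technique (not Omar's pyramid encoding), assuming 2-character grams and case-insensitive matching; `BigramIndex` is a hypothetical name:

```swift
import Foundation

// Bigram (2-gram) inverted index: built once over the array, then reused for
// many substring queries. The index narrows the candidates; a real substring
// check then verifies each one.
struct BigramIndex {
    let words: [String]
    private var postings: [String: Set<Int>] = [:]

    init(_ words: [String]) {
        self.words = words
        for (i, word) in words.enumerated() {
            for gram in Self.bigrams(of: word.lowercased()) {
                postings[gram, default: []].insert(i)
            }
        }
    }

    // All overlapping 2-character substrings of `s`.
    static func bigrams(of s: String) -> Set<String> {
        let chars = Array(s)
        guard chars.count >= 2 else { return [] }
        return Set((0..<chars.count - 1).map { String(chars[$0...$0 + 1]) })
    }

    // Words containing `query` (case-insensitively), in original order.
    // Queries shorter than 2 characters have no bigrams and fall back to a scan.
    func search(_ query: String) -> [String] {
        let q = query.lowercased()
        var candidates = Set(words.indices)
        for gram in Self.bigrams(of: q) {
            candidates.formIntersection(postings[gram] ?? [])
        }
        return candidates.sorted()
            .map { words[$0] }
            .filter { $0.lowercased().contains(q) }
    }
}

let index = BigramIndex(["make", "lake", "fake", "take", "cook", "book", "took"])
let akeWords = index.search("ake")   // ["make", "lake", "fake", "take"]
```

The indexing cost is paid once, which is why this kind of structure benefits most when the same array is searched many times.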

In reply to Kenny Leung via swift-evolution <swift-evolution@swift.org>, Jul 30, 2017, at 7:11 PM.

Hi Omar,

Could you talk in a little more detail about your algorithm, including space/time complexity? It sounds interesting, but I am having trouble finding the relevant portions in the source code...

Thanks,
Jon

In reply to Omar Charif via swift-evolution <swift-evolution@swift.org>, Jul 31, 2017, at 1:26 AM.

Hello Jon,

My algorithm prioritizes time complexity at the expense of space complexity.

Time Complexity
————————————
Time complexity is O(C), where C = number of matching string pyramids.
We usually only compare string pyramids that are very similar to each other, and the comparison is usually a float-to-float comparison, which is much quicker than comparing two strings character by character. For example, given:
[“make”, “lake”, “fake”, “take”, “cook”, “book”, “took”]

If we ask for words ending in “ke”, we only compare “ke” against [“make”, “lake”, “fake”, “take”], simply because each of these words has the same (or almost the same) blur value; in other words, they have the same coarse string-pyramid value.

After checking, we return the result, which is [“make”, “lake”, “fake”, “take”].

So we get close to O(1), since we don’t waste any time checking dissimilar items and jump directly to the similar ones.
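
The message does not define how the blur value is computed. As a rough, hypothetical stand-in that reproduces the “ke” example, words can be bucketed by their last k characters when the array is built, so an end-match of length k touches only one bucket. This only handles queries of exactly length k and is not the pyramid encoding itself:

```swift
// Group words by their final `k` characters once; an end-match query of
// length k then becomes a single dictionary lookup instead of a full scan.
// Words shorter than k are skipped.
func suffixBuckets(_ words: [String], k: Int) -> [String: [String]] {
    var buckets: [String: [String]] = [:]
    for word in words where word.count >= k {
        buckets[String(word.suffix(k)), default: []].append(word)
    }
    return buckets
}

let words = ["make", "lake", "fake", "take", "cook", "book", "took"]
let byLastTwo = suffixBuckets(words, k: 2)
let endsWithKe = byLastTwo["ke", default: []]   // ["make", "lake", "fake", "take"]
```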

Space Complexity
————————————
The space needed is about 2 × C per string (O(C) asymptotically), where C = number of characters in a given string.
We build a pyramid for each given string with number of levels = C / 2.

We can cut the space roughly in half, to about C, if we remove the base level of the pyramid. Let me give you an example.

If we were to encode the word “California”, we would get a pyramid like this:

“California” <———— coarse
“Cali forn ia__”
“Ca li fo rn ia”
“C a l i f o r n i a” <—— base

If we remove the base level, we save half of the space we need, but we can then no longer match any query shorter than 2 characters (a query needs at least 2 characters to reach the first level above the removed base).
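
The exact construction rule is not fully specified (the text gives C / 2 levels, while the “California” example shows four). The sketch below only reproduces the example, assuming chunk sizes 4, 2 and 1 under the whole word and `_` padding as in “ia__”; `pyramid(_:chunkSizes:)` is a hypothetical helper, and the numeric blur encoding used for comparisons is not modelled:

```swift
// Builds the pyramid levels, coarse level first: the whole string, then the
// string split into chunks of each size in `chunkSizes`, padding the last
// chunk of a level with "_" to full width.
func pyramid(_ s: String, chunkSizes: [Int] = [4, 2, 1]) -> [[String]] {
    let chars = Array(s)
    var levels: [[String]] = [[s]]
    for size in chunkSizes {
        precondition(size > 0, "chunk sizes must be positive")
        var level: [String] = []
        var start = 0
        while start < chars.count {
            let end = min(start + size, chars.count)
            var chunk = String(chars[start..<end])
            chunk += String(repeating: "_", count: size - (end - start))
            level.append(chunk)
            start += size
        }
        levels.append(level)
    }
    return levels
}

let levels = pyramid("California")
for level in levels { print(level.joined(separator: " ")) }
// California
// Cali forn ia__
// Ca li fo rn ia
// C a l i f o r n i a
```

Passing `chunkSizes: [4, 2]` drops the base level, which is the space-saving trade-off described above.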

That’s why the more items we feed into an array, the more we benefit from the fact that they are already grouped together, and the less time we waste checking the wrong items.
This is also better than a Set: a Set does give O(1) lookups, but it can’t provide any information beyond straightforward equality.
My algorithm encodes each string with its features and character-chaining information, which helps when doing completion or substring matching.

I will leave it here for now; my message is getting long 😃

regards,
Omar

In reply to Jonathan Hull <jhull@gbis.com>, Aug 1, 2017, at 2:40 AM.
