LZW without pre-populated string table / alphabet

jjrscott · September 11, 2022, 12:20pm

This isn't really a project for other's to use but I wanted to share because of what Swift allowed, nay assisted me, to do.

Turns out I've made hobby of understanding and simplifying algorithms for quite a while now and wanted to try my hand at compression. So I took Rosetta Code's LZW compression implementation and set out to understand it. First I used Swift generics to abstract from String to [Character], then to [Hashable] arrays, and finally to [Equatable] arrays.

All normal Swift stuff these days. However, while going through this process I realised that the normal LZW algorithm can be made even simpler through the use of enums to render the pre-populated string table / alphabet redundant.

I'm sure this implicit coding step (that isn't part of the compression algorithm) has been noticed sometime between 1978 and now, but I found Swift extremely helpful in teasing LZW apart; I'm doubtful that other earlier generation languages would have helped me make so much progress.

Here's the normal LZW using Swift generics and enums:

import LZW

func compress(_ original: String) -> [UInt] {
    LZW.compress(original.utf8.map({$0})).map { unit in
        switch unit {
        case .element(let value):
            return UInt(value)
        case .index(let index):
            return UInt(UInt8.max) + UInt(index)
        }
    }
}

let compressed = compress("TOBEORNOTTOBEORTOBEORNOT")
print(compressed)

Along with the project should you wish to break it: GitHub - jjrscott/LZW: LZW without pre-populated string table / alphabet.

PS I was actually trying to construct an integer only implementation of LZW al la Index based binary search but enums just shouted at me .