Itai, when you say "memory-mapped from disk" data, what exactly do you mean?
e.g. Data(bytesNoCopy:) has its own set of issues related to "value type-ness".
otherwise i can see your point. i'd say that:
- it's obvious and understandable that even if Set/Dictionary itself guarantees O(1) behaviour, we can easily fool it by supplying our own implementation of hashValue / == that has, say, O(n^2) behaviour, thus ruining the Set/Dictionary guarantees.
- in practice developers don't use multi-megabyte (or gigabyte!) Data as Set elements or dictionary keys. (in those extremely rare cases where they do, if hashValue was "true", i.e. covered all the bytes, they'd quickly notice the slowdown and do it differently.)
- the old PICT file format had a 512-byte header of arbitrary data (commonly all zeroes). seriously, it's quite easy to encounter data formats (even audio-visual cases like "PCM data in silent conditions" or "yuv data in a dark room", but there are a zillion other non-AV cases of course) that have some common / constant data at the beginning.
- probably it makes sense to have a tree based implementation of dictionary / set in addition to the current hash based.
- it might well be the case that even shorter, under megabyte Strings could benefit speed wise from a tree based set / dictionary implementation.
- if nothing else, this magic 80 constant should be in the documentation.
- there might be current or future hash collision attacks targeting this 80 byte hashValue limit! this is quite scary actually...
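
to make the "common data at the beginning" point concrete, here is a minimal sketch. `PrefixHashedBlob` is a hypothetical type i made up for illustration, not Foundation's actual implementation: it hashes only the first 80 bytes (the constant discussed in this thread) but compares all bytes in ==. every blob gets a 512-byte all-zero header, PICT style, followed by one distinct payload byte.

```swift
// Hypothetical key type (not Foundation's Data): hashes only a fixed-size
// prefix of its bytes, but compares the full contents for equality.
struct PrefixHashedBlob: Hashable {
    // Assumption: the 80-byte limit mentioned in this thread.
    static let hashedPrefixLength = 80

    let bytes: [UInt8]

    static func == (lhs: PrefixHashedBlob, rhs: PrefixHashedBlob) -> Bool {
        lhs.bytes == rhs.bytes
    }

    func hash(into hasher: inout Hasher) {
        // Only the first 80 bytes feed the hasher; the rest is ignored.
        for byte in bytes.prefix(Self.hashedPrefixLength) {
            hasher.combine(byte)
        }
    }
}

// Builds a blob with a 512-byte zero header followed by one payload byte.
func makeBlob(payload: UInt8) -> PrefixHashedBlob {
    var bytes = [UInt8](repeating: 0, count: 512)
    bytes.append(payload)
    return PrefixHashedBlob(bytes: bytes)
}

let blobs = (0..<100).map { makeBlob(payload: UInt8($0)) }

// All blobs share the same first 80 bytes, so they all hash identically.
let distinctHashes = Set(blobs.map { $0.hashValue })

// The set still stores 100 distinct elements, but they all collide,
// so each insert or lookup has to fall back on == against the others.
let distinctBlobs = Set(blobs)

print(distinctHashes.count, distinctBlobs.count)  // prints "1 100"
```

with a single hash value shared by all elements, the set degrades to a linear scan using ==, which is exactly the situation a collision attack would try to create on purpose.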