How do I get all the emoji scalars? Why is there no `Character.isEmoji`?

There is Unicode.Scalar.Properties.isEmoji, but an emoji character can be composed of multiple scalars, so it seems there should also be a Character.isEmoji. According to the documentation comment on Unicode.Scalar.Properties.isEmoji, determining whether a Character is an emoji is not simple:

testing isEmoji alone on a single scalar is insufficient to determine if a unit of text is rendered as an emoji; a correct test requires inspecting multiple scalars in a Character. In addition to checking whether the base scalar has isEmoji == true, you must also check its default presentation (see isEmojiPresentation) and determine whether it is followed by a variation selector that would modify the presentation.

Shouldn't the logic discussed above be encapsulated in the missing Character.isEmoji? That would save a lot of people from having to learn the details of emoji in Unicode.
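As a rough illustration, the three checks named in that documentation comment could be combined like this. This is a minimal sketch, not a standard library API: `requestsEmojiPresentation` is a hypothetical name, and it only looks at the base scalar and the scalar right after it, so it reports what the text asks for rather than what a given app will draw.

```swift
extension Character {
    /// Hypothetical helper (not part of the standard library).
    /// Applies only the checks listed in the isEmoji documentation comment.
    var requestsEmojiPresentation: Bool {
        // 1. The base scalar must be able to appear as an emoji at all.
        guard let base = unicodeScalars.first, base.properties.isEmoji else {
            return false
        }
        // 2. A variation selector directly after the base overrides the default.
        let selector = unicodeScalars.dropFirst().first?.value
        if selector == 0xFE0E { return false }  // text presentation selector
        if selector == 0xFE0F { return true }   // emoji presentation selector
        // 3. Otherwise fall back to the default presentation of the base scalar.
        return base.properties.isEmojiPresentation
    }
}

print(Character("1").requestsEmojiPresentation)
print(Character("1\u{FE0F}\u{20E3}").requestsEmojiPresentation)
```

A renderer is still free to ignore this result, so it is best treated as a hint.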

On to how to get all the emoji scalars. The NSHipster article on CharacterSet does it this way:

import Foundation

var emoji = CharacterSet()

for codePoint in 0x0000...0x1F0000 {
    guard let scalarValue = Unicode.Scalar(codePoint) else {
        continue
    }

    // Implemented in Swift 5 (SE-0221)
    // https://github.com/apple/swift-evolution/blob/master/proposals/0221-character-properties.md
    if scalarValue.properties.isEmoji {
        emoji.insert(scalarValue)
    }
}

So it brute-force tests every value in an overly large code point range (the loop runs to 0x1F0000, past the Unicode maximum of 0x10FFFF). Isn't that inefficient?
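For comparison, here is a minimal sketch of the same brute-force scan bounded at U+10FFFF, the largest Unicode code point, using only the standard library. The function name `allEmojiScalars` is made up for this example. The scan visits roughly 1.1 million values once, so the result can be computed a single time and cached.

```swift
/// Hypothetical helper: collects every scalar whose isEmoji property is true.
func allEmojiScalars() -> [Unicode.Scalar] {
    var result: [Unicode.Scalar] = []
    // U+10FFFF is the last valid code point; the failable initializer
    // returns nil for the surrogate range, so those values are skipped.
    for value in UInt32(0)...UInt32(0x10FFFF) {
        guard let scalar = Unicode.Scalar(value) else { continue }
        if scalar.properties.isEmoji {
            result.append(scalar)
        }
    }
    return result
}

let emojiScalars = allEmojiScalars()
print(emojiScalars.count)  // depends on the Unicode version of the runtime
```

Note that the result includes scalars such as the digit "0", because isEmoji is also true for scalars that only become emoji as part of a sequence.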

I found this on Stack Overflow:

extension Character {
    /// A simple emoji is one scalar and presented to the user as an Emoji
    var isSimpleEmoji: Bool {
        guard let firstScalar = unicodeScalars.first else { return false }
        return firstScalar.properties.isEmoji && firstScalar.value > 0x238C
    }

    /// Checks if the scalars will be merged into an emoji
    var isCombinedIntoEmoji: Bool { unicodeScalars.count > 1 && unicodeScalars.first?.properties.isEmoji ?? false }

    var isEmoji: Bool { isSimpleEmoji || isCombinedIntoEmoji }
}

It does not do all the logic mentioned in the Unicode.Scalar.Properties.isEmoji documentation, so it may not be completely correct. And I don't know why it checks:

&& firstScalar.value > 0x238C
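One way to investigate the cutoff is to list the scalars it excludes, that is, those with isEmoji == true and a value at or below 0x238C. The sketch below does only that; `emojiScalars(upTo:)` is a hypothetical helper name, and the reason the Stack Overflow answer chose this exact value is not something the code can confirm.

```swift
/// Hypothetical helper: the isEmoji scalars at or below a given value.
func emojiScalars(upTo cutoff: UInt32) -> [Unicode.Scalar] {
    var result: [Unicode.Scalar] = []
    for value in UInt32(0)...cutoff {
        guard let scalar = Unicode.Scalar(value),
              scalar.properties.isEmoji else { continue }
        result.append(scalar)
    }
    return result
}

// Print each excluded scalar with its code point and Unicode name.
for scalar in emojiScalars(upTo: 0x238C) {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex)", scalar, scalar.properties.name ?? "<unnamed>")
}
```

The list includes the ASCII digits, `#` and `*`, which have isEmoji == true because they can start keycap sequences, so the comparison appears to be a heuristic for filtering out scalars that normally display as text.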

So how do I get a list of all emoji scalars?

In this SO post:

// NOTE: These ranges are still just a subset of all the emoji characters;
//       they seem to be all over the place...
let emojiRanges = [
    0x1F601...0x1F64F,
    0x2702...0x27B0,
    0x1F680...0x1F6C0,
    0x1F170...0x1F251
]

for range in emojiRanges {
    for i in range {
        guard let scalar = UnicodeScalar(i) else { continue }
        let c = String(scalar)
        print(c)
    }
}
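A hand-written range list like this can be checked against the standard library's isEmoji property. The following is a minimal sketch with a hypothetical helper, `emojiScalarsMissing(from:)`, that reports every isEmoji scalar not covered by the given ranges.

```swift
// The same ranges as above, typed as UInt32 so they compare with scalar values.
let handWrittenRanges: [ClosedRange<UInt32>] = [
    0x1F601...0x1F64F,
    0x2702...0x27B0,
    0x1F680...0x1F6C0,
    0x1F170...0x1F251,
]

/// Hypothetical helper: isEmoji scalars that fall outside all of `ranges`.
func emojiScalarsMissing(from ranges: [ClosedRange<UInt32>]) -> [Unicode.Scalar] {
    var missing: [Unicode.Scalar] = []
    for value in UInt32(0)...UInt32(0x10FFFF) {
        guard let scalar = Unicode.Scalar(value),
              scalar.properties.isEmoji else { continue }
        if !ranges.contains(where: { $0.contains(value) }) {
            missing.append(scalar)
        }
    }
    return missing
}

let missingScalars = emojiScalarsMissing(from: handWrittenRanges)
print(missingScalars.count)
```

For instance, U+1F600 (😀) is reported as missing because the first range starts at U+1F601.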

In this SO post, the emoji scalar ranges are given as:

unicode-range: U+0080-02AF, U+0300-03FF, U+0600-06FF, U+0C00-0C7F, U+1DC0-1DFF, U+1E00-1EFF, U+2000-209F, U+20D0-214F, U+2190-23FF, U+2460-25FF, U+2600-27EF, U+2900-29FF, U+2B00-2BFF, U+2C60-2C7F, U+2E00-2E7F, U+3000-303F, U+A490-A4CF, U+E000-F8FF, U+FE00-FE0F, U+FE30-FE4F, U+1F000-1F02F, U+1F0A0-1F0FF, U+1F100-1F64F, U+1F680-1F6FF, U+1F910-1F96B, U+1F980-1F9E0;

If this is correct, is this the complete list?

Where is the meat of Unicode.Scalar.Properties.isEmoji? I can’t find it in the GitHub source.

stdlib/public/core/UnicodeScalarProperties.swift, lines 86 and 698.

I saw that. But that doesn't show the actual OptionSet bits for Unicode.Scalar's "properties".

Line 86 is the OptionSet bit mask for isEmoji.
Line 698 returns the membership test result.

There must be a table of all Unicode scalars and their property OptionSet bits.

Where is _swift_stdlib_getBinaryProperties, the function called on line 125, defined?

As explained on line 49:

'_swift_stdlib_getBinaryProperties' where each bit indicates a unique Unicode defined binary property of a scalar

Where is this defined? I'm asking just out of curiosity.

The raw stuff is under utils/gen-unicode-data, as far as I can tell.
Also: stdlib/public/stubs/Unicode/UnicodeScalarProps.cpp

What are you trying to do? It's not that it's "not simple" to determine if a character is presented as emoji, but rather it is impossible in the general case because there's no one answer.

Although many characters have either only a non-emoji presentation or only an emoji presentation, quite a few can be presented either as emoji or not emoji (even the digit "1"). isEmojiPresentation will tell you if Unicode recommends that the base scalar be presented as emoji by default. However, different applications are allowed by the standard—indeed, encouraged—to choose whether they want to present such characters as emoji, and they do in fact make different choices (on Mac, you can compare the presentation of such characters in BBEdit and iMessage, for example; some of Apple's first-party apps even make different choices for particular characters on iOS and macOS, despite being the "same" app).

The best analogy is that some (but certainly not all) CJK characters have Simplified Chinese-specific variants, Traditional Chinese-specific variants, Japanese-specific variants, and/or Korean-specific variants—these can also be specified using variant specifiers. Although these variant specifiers can be used, and although some CJK characters are only used in (e.g.) Simplified Chinese or in Korean text (and thus have no variants, requiring no variant specifiers to disambiguate), whether an arbitrary character "is Simplified Chinese" or "is Korean" is unknowable in the general case, because it depends on the language as deduced from the surrounding text, any metadata associated with the text, the user's language/locale settings, what fonts are available, and how any specific app chooses to present the text given all of the above factors. The same goes for "is Emoji."

I want to make an emoji picker in SwiftUI like the Emoji keyboard.

I know a lot of the emoji symbols are actually composed of multiple scalars. For those, I think I'll have to enter them by hand. But for simple single-code-point emoji, I am hoping I can generate them programmatically.

I don't understand why "0" (zero) by itself has isEmoji == true, as shown in the "Discussion" section of Unicode.Scalar.Properties.isEmoji. I only see it as an emoji when it is combined with VARIATION SELECTOR-16 (U+FE0F) and something like COMBINING ENCLOSING KEYCAP (U+20E3):

"0\u{fe0f}\u{20e3}"      // 0️⃣
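The properties involved can be inspected directly. This small sketch prints the two flags for the bare digit and then the scalars that make up the keycap sequence; the variable names are illustrative only.

```swift
let zero: Unicode.Scalar = "0"
// isEmoji says the scalar can take part in an emoji;
// isEmojiPresentation says whether it is shown as emoji by default.
print(zero.properties.isEmoji)
print(zero.properties.isEmojiPresentation)

// The keycap sequence: digit, emoji variation selector, enclosing keycap.
let keycapZero = "0\u{FE0F}\u{20E3}"
for scalar in keycapZero.unicodeScalars {
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex)", scalar.properties.name ?? "<unnamed>")
}
```

The digit reports isEmoji == true but isEmojiPresentation == false, which matches what you observe: on its own it defaults to text, and it only turns into an emoji inside a sequence like the one above.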

Anyway, what's the best way to make an emoji picker? Maybe I'll just scrape this page or this table.

It looks like '_swift_stdlib_getBinaryProperties(scalar)' is a C function that does a binary search on a code-generated table.
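To make the idea concrete, here is a toy version of such a lookup written in Swift. Everything in it is an assumption for illustration: the `BinaryProps` bit positions, the `TableEntry` layout, and the three-row table are invented, and the real generated table and C implementation in the standard library are organized differently.

```swift
/// Hypothetical property flags; the bit positions are made up.
struct BinaryProps: OptionSet {
    let rawValue: UInt64
    static let isEmoji = BinaryProps(rawValue: 1 << 0)
    static let isEmojiPresentation = BinaryProps(rawValue: 1 << 1)
}

/// One row of a hypothetical table: a scalar range and its flags.
struct TableEntry {
    let range: ClosedRange<UInt32>
    let props: BinaryProps
}

// A tiny hand-written table, sorted by range, standing in for generated data.
let propertyTable = [
    TableEntry(range: 0x23...0x23, props: [.isEmoji]),
    TableEntry(range: 0x30...0x39, props: [.isEmoji]),
    TableEntry(range: 0x1F600...0x1F64F, props: [.isEmoji, .isEmojiPresentation]),
]

/// Binary search for the row containing the scalar; empty set if none matches.
func lookupProps(_ scalar: Unicode.Scalar) -> BinaryProps {
    var low = 0
    var high = propertyTable.count - 1
    while low <= high {
        let mid = (low + high) / 2
        let entry = propertyTable[mid]
        if scalar.value < entry.range.lowerBound {
            high = mid - 1
        } else if scalar.value > entry.range.upperBound {
            low = mid + 1
        } else {
            return entry.props
        }
    }
    return []
}

// The membership test is then a plain OptionSet `contains`.
print(lookupProps("5").contains(.isEmoji))
```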

The font renderer is able to decide whether a character is an emoji or not, so clearly it's not "impossible". Can't Character.isEmoji use the same logic?

My explanation above details how, for some characters, it is just as impossible to know if it's rendered as emoji as it is to know if it's rendered in red or bold. Character is not a font rendering engine.

To clarify Xiaodi's point. The emoji-ness of certain characters is like the column width of those characters. The only way to know it is to ask the rendering engine you're using.

There are four flags for emoji:

isEmoji <= this scalar is, or can be, an emoji
isEmojiPresentation <= if true, this scalar is an emoji; if false, it needs to be combined with something else to be an emoji
isEmojiModifier <= what's this?
isEmojiModifierBase <= what's this?

No, this is true when the scalar is recommended to have a default emoji presentation. However, the scalar can be followed by a text presentation selector, making it not an emoji for sure, or in the absence of any variant selector an app can still choose to render that character with text presentation. See the table in section 4 of UTS#51.

You may also notice in the nearby text of UTS#51 that Unicode characters fall into "text-only," "text-default," and "emoji-default" categories. However, notice that there is pointedly no "emoji-only" category.
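Those three categories can be approximated from the two scalar properties. The mapping below is an interpretation rather than something UTS#51 or the standard library defines in code, and `DefaultPresentation` and `defaultPresentation(of:)` are hypothetical names.

```swift
/// Hypothetical classification of a scalar's default presentation.
enum DefaultPresentation {
    case textOnly      // isEmoji is false
    case textDefault   // isEmoji is true, isEmojiPresentation is false
    case emojiDefault  // isEmojiPresentation is true
}

func defaultPresentation(of scalar: Unicode.Scalar) -> DefaultPresentation {
    let properties = scalar.properties
    if !properties.isEmoji {
        return .textOnly
    }
    return properties.isEmojiPresentation ? .emojiDefault : .textDefault
}

print(defaultPresentation(of: "A"))
print(defaultPresentation(of: "1"))
print(defaultPresentation(of: "\u{1F600}"))
```

Even an emoji-default result is only a default: a text presentation selector or the app's own choice can still override it.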

See section 1.4.4 of UTS#51.
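In short, emoji modifiers are the five skin tone scalars U+1F3FB through U+1F3FF, and modifier bases are scalars such as 👍 whose appearance a following modifier changes. A small sketch that prints both flags for each scalar of a modified thumbs-up (the variable name is illustrative):

```swift
// THUMBS UP SIGN followed by a medium skin tone modifier.
let thumbsUpMedium = "\u{1F44D}\u{1F3FD}"

for scalar in thumbsUpMedium.unicodeScalars {
    let properties = scalar.properties
    let hex = String(scalar.value, radix: 16, uppercase: true)
    print("U+\(hex)",
          "modifier:", properties.isEmojiModifier,
          "modifierBase:", properties.isEmojiModifierBase)
}
```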

You can read the circa-2018 nitty-gritty here: SE-0221 – Character Properties - #30 by Michael_Ilseman

The text presentation selector ( U+FE0E ) dictates that it must not be rendered as emoji

It's not consistent:

In an Xcode playground, it behaves as you said only when printing to the console area, not in the results area on the right side:

let renderAsText = "😀\u{FE0E}"     // "😀︎"  <== this is shown as emoji on the right side output area
print(renderAsText)     // print out text form of 😀︎ in console output, but as "😀︎\n" on right side output area

But in an iOS SwiftUI view, the emoji appears both in the rendered SwiftUI.Text and in the console output:

let _ = print("\u{1F600}\u{FE0E}")  // this prints 😀 in the console, not text!
Text("\u{1F600}\u{FE0E}")   // this renders 😀 on screen, not text!

Is SwiftUI text rendering broken? It behaves like this on both iOS and macOS.

Edit: I tried Terminal.app with zsh and a Swift command-line program; both show the emoji:

print '\U1f600\UFE0E'
😀︎
cat emoji.swift
print("\u{1F600}\u{FE0E}")
swift emoji.swift
😀︎

So only in the Xcode playground console area is 😀 followed by the text presentation selector (U+FE0E) shown as text. Since it works in this one case, I think macOS font rendering is capable of respecting the text presentation selector (U+FE0E). But why doesn't it work in the other cases?
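One thing that can be ruled out in code is the string itself. The sketch below dumps the scalars of the test string to confirm that the selector is really there; `describeScalars` is a hypothetical helper name. If both scalars show up, the differences between environments come from how each one renders the text, not from the data.

```swift
/// Hypothetical helper: one "U+XXXX NAME" line per scalar in the text.
func describeScalars(_ text: String) -> [String] {
    text.unicodeScalars.map { scalar in
        let hex = String(scalar.value, radix: 16, uppercase: true)
        return "U+\(hex) \(scalar.properties.name ?? "<unnamed>")"
    }
}

let textStyleGrinning = "\u{1F600}\u{FE0E}"
for line in describeScalars(textStyleGrinning) {
    print(line)
}
```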

Edit: It does not work in iPad Playgrounds either; it only shows the regular emoji 😀.

:man_shrugging:
