How to check whether a character is printable (from a C++ perspective)?

Until now I have been using this Objective-C++ method to check whether a character is printable:

- (BOOL)isPrintable:(UniChar)charCode
{
    return isprint(charCode);
}

Is there already a Swift-native alternative?

That sort of information is available, but it is more detailed because Character’s answer to that is not as clear‐cut. What you will need is something like the following, but it will require some tuning to determine exactly what you want it to do in your particular use case.

extension Unicode.Scalar {
  var isPrintable: Bool {
    switch properties.generalCategory {
    case .uppercaseLetter, .lowercaseLetter, .titlecaseLetter,
      /* ... */
      .mathSymbol, .currencySymbol, .modifierSymbol, .otherSymbol:
        return true
    case .control, .format:
      return false
    }
  }
}

extension Character {
  var isPrintable: Bool {
    return unicodeScalars.contains(where: { $0.isPrintable })
  }
}

The list of general categories is here, and some additional properties are listed here.
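For anyone who wants something to paste into a playground, here is a minimal, self-contained sketch of the same idea. The name `isLikelyVisible` and the particular set of categories treated as non-printing are assumptions made for illustration only; which categories count as "printable" is exactly the tuning mentioned above.

```swift
extension Unicode.Scalar {
  /// Hypothetical helper: false for general categories that normally have
  /// no visible form of their own, true for everything else.
  /// Note that this choice counts spaces (.spaceSeparator) as visible.
  var isLikelyVisible: Bool {
    switch properties.generalCategory {
    case .control, .format, .surrogate, .unassigned,
         .lineSeparator, .paragraphSeparator:
      return false
    default:
      return true
    }
  }
}

extension Character {
  /// True when at least one scalar in the grapheme cluster is visible.
  var isLikelyVisible: Bool {
    return unicodeScalars.contains(where: { $0.isLikelyVisible })
  }
}

print(Character("A").isLikelyVisible)
print(("\u{B}" as Unicode.Scalar).isLikelyVisible)
```

Using `default` here keeps the sketch short; spelling out every category, as in the snippet above, makes the compiler flag any case you forgot to decide on.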

Thanks for the demonstration.
I just noticed that this requires converting the Swift UniChar to a Character first, which involves an extra unwrapping step.
Is there a way to handle the Swift type "UniChar" directly in this case?

this is Unicode.GeneralCategory, it's an enum, not Bool

Also completely unreachable. It was a holdover text fragment that somehow survived a refactor; it was originally the start of a property chain. That is what I get for neglecting to copy it back into a playground afterward to check for typos. Sorry for any confusion.

That does not look like a Swift type. Was it imported from C somewhere, or is it maybe the ancient unichar typealias in Foundation?

Assuming that like unichar it is really just a UInt16 representing a UTF‐16 code unit, then the closest you could get would be the following:

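extension UniChar {
  // Relies on the Unicode.Scalar.isPrintable extension shown earlier.
  func isPrintable() throws -> Bool {
    // Fails for surrogate code units, which are not scalars on their own.
    guard let scalar = Unicode.Scalar(UInt32(self)) else {
      struct NotAWholeScalar: Error {}
      throw NotAWholeScalar()
    }
    return scalar.isPrintable
  }
}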

Note that you cannot see through surrogates without their context, hence it throws for anything outside the Basic Multilingual Plane.

It would probably be better to load the UTF‐16 into a String and work from there instead.
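A rough sketch of that String-based route, using only the standard library. The helper name `scalars(fromUTF16:)` and the sample code units are made up for illustration. `String(decoding:as:)` pairs up surrogates for you and substitutes U+FFFD for any unpaired one, so nothing has to throw.

```swift
/// Hypothetical helper: decode UTF-16 code units into Unicode scalars.
/// Unpaired surrogates come back as U+FFFD (the replacement character).
func scalars(fromUTF16 units: [UInt16]) -> [Unicode.Scalar] {
  let text = String(decoding: units, as: UTF16.self)
  return Array(text.unicodeScalars)
}

// "A", a surrogate pair encoding U+1F600, and a vertical tab.
let units: [UInt16] = [0x0041, 0xD83D, 0xDE00, 0x000B]

for scalar in scalars(fromUTF16: units) {
  // Print the scalar value in hex next to its general category.
  print(String(scalar.value, radix: 16), scalar.properties.generalCategory)
}
```

From there, each scalar (or each Character of the string) can be passed to whatever printability test you settle on.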

Was it imported from C somewhere

UniChar originated on traditional Mac OS and is found on modern systems courtesy of Core Foundation (which ran on traditional Mac OS). Consider, for example, the CFStringCreateWithCharacters(_:_:_:) function.

Interestingly, there’s no doc page for it. I suspect that’s because it comes from <MacTypes.h>, and stuff in /usr/include isn’t generally imported into the doc system )-:

Assuming that like unichar it is really just a UInt16 representing a UTF‐16 code unit

Yep.

Share and Enjoy

Quinn “The Eskimo!” @ DTS @ Apple

Thank you so much. I modified it a little:

// Ref: https://forums.swift.org/t/57085/5
extension UniChar {
  public func isPrintable() -> Bool {
    guard Unicode.Scalar(UInt32(self)) != nil else {
      struct NotAWholeScalar: Error {}
      return false
    }
    return true
  }
}

If you're not going to throw an error, you can reduce this to just Unicode.Scalar(UInt32(self)) != nil.

Are you sure about the return true at the end instead of the return scalar.isPrintable I recommended? As written, your variant asserts that all of ASCII is printable. The name isUnicodeScalar() would better describe what your variant actually does.
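To make the difference concrete, here is a small sketch (the constant names are mine, chosen for illustration). A control code converts to a scalar without any trouble, so a bare `!= nil` check accepts it; only the general category reveals that it is a control character. The only BMP code units that fail the conversion are surrogates.

```swift
let verticalTab: UInt16 = 0x0B      // a C0 control code
let loneSurrogate: UInt16 = 0xD83D  // half of a surrogate pair

let tabScalar = Unicode.Scalar(UInt32(verticalTab))
let surrogateScalar = Unicode.Scalar(UInt32(loneSurrogate))

print(tabScalar != nil)                                   // true
print(tabScalar?.properties.generalCategory == .control)  // true
print(surrogateScalar != nil)                             // false
```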

I am actually using the ObjC code block above to handle key input events in the input method project I am working on. Unfortunately, the Swift version I posted above (the "modified a little" block) prevents my input method from correctly handling function keys like PgUp/PgDn. I eventually re-enabled the ObjC code above.

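The project repository is here for your reference: GitHub - ShikiSuen/vChewing-macOS: The macOS version of the vChewing input method, built mainly on the McBopomofo engine and paired with a purpose-built Simplified Chinese lexicon. It is a pure Simplified Chinese Zhuyin input method (it also has a native Traditional Chinese input mode, which is not a Simplified-to-Traditional conversion). Development is currently managed on Gitee.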

P.S.: I am afraid there may be no way to help with this issue, since I can't find a way to build a minimal working sample for it.

Did you read my last comment? These seem to be the control characters macOS expects to come from various function keys. The last Swift iteration you posted would determine Page Up’s \u{B} to be “printable”. My last suggestion would fix that (assuming your scalar method reports false for the .control general category).

But may I ask what your definition of “printable” really is? To me that phrase means roughly, “it occupies space in the text visually and not just in memory”. C++’s isprint answers that question only for ASCII. I at first thought you wanted that definition applied to the whole of Unicode supported by String, and hence I was showing you how to inspect the Unicode properties. However, your latest comment suggests your real intent might just be filtering out keyboard control codes that are not really text input, so as not to interfere with the operating system while processing text coming from a keyboard. If so, what you are really asking is not “Is this character visible?”, but “Is this input code even text?” It makes a big difference because, for example, a zero‐width joiner is invisible, but still most definitely text. In the event that all you want to do is leave function input codes alone, then what you want is probably just this:

extension UniChar {
  var isTextKey: Bool {
    switch self {
    case 0x00..<0x20,
      0x7F:
      return false
    default:
      return true
    }
  }
}

And no, there is no native function for this particular purpose in any computing language, because it is an implementation detail of the operating system. To me it seems a bug that such function keys even rely on passing codes through the text processing system in the first place.

I did read it but didn't understand it. Now that I am reading your current reply, I realize I am really bad at describing product needs. I apologize for the confusion above.

The KeyHandler module in my IME (vChewing) is my own Swift rewrite, derived from the ObjC version of the same module used in the McBopomofo IME. Here's how "isprint" is used in McBopomofo:

By the way, I tried your latest UniChar extension as follows, but the PgUp/PgDn keys still fail:

      // If ASCII but not printable, don't use insertText:replacementRange:
      // Certain apps don't handle non-ASCII char insertions.
      if charCode < 0x80, !charCode.isTextKey {
        return false
      }

There is nothing wrong with your code. I now suspect that the "isprint" used in the upstream ObjC code may always return true. McBopomofo's ObjC codebase is extremely old: it was initially developed in 2011 with lots of workarounds for compatibility with Mac OS X 10.5 Leopard. Some of those workarounds are still kept in the current codebase.

Maybe I should just remove this compatibility check and see which client applications, if any, still misbehave today.

I think this is the end of the question.

I just tried using this array to detect whether it contains the charCode:

    let blockedRange: [UniChar] = [0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F, 0x20, 0x7F]

This also stopped my input method from handling PgUp / PgDn.

Both my function and Jeremy's function work well in Swift, but both prevent the IME from correctly handling PgUp / PgDn. My conclusion is that I should remove this detection from my IME.
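One thing worth double-checking: the literal list above jumps from 0x09 straight to 0x1A, so it does not contain 0x0A through 0x19, and it does contain 0x20 (space). If the intent was "every C0 control code plus DEL", a range-based check avoids that kind of slip. A sketch, with `isBlockedControlCode` as a hypothetical name and UInt16 standing in for UniChar:

```swift
// The list exactly as posted above.
let blockedRange: [UInt16] = [0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F, 0x20, 0x7F]

/// Hypothetical helper: true for U+0000...U+001F and U+007F.
func isBlockedControlCode(_ code: UInt16) -> Bool {
  return (0x00...0x1F).contains(code) || code == 0x7F
}

print(blockedRange.contains(0x0B))  // false: 0x0B is missing from the list
print(isBlockedControlCode(0x0B))   // true
print(blockedRange.contains(0x20))  // true: space is in the list
print(isBlockedControlCode(0x20))   // false
```

This may or may not be related to the PgUp / PgDn behavior; it only shows that the array and the range-based check are not equivalent.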

If the isprint(x) you are using is the one from the chart here, then across the ASCII range it and my x.isTextKey give exactly the same answers. That would mean the real problem is elsewhere. On the other hand, if...

...it isn’t the standard isprint and has been overridden to behave differently in some way like you suspect, then it may not be obvious what it is really doing. The way to reverse engineer it would be to loop it through all the possibilities (0x0000..<0x10000), printing each pair of input character code and resulting Boolean.
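A sketch of that comparison loop, written in Swift against the C library's `isprint` as exposed through Foundation. Several things here are assumptions: the helper name `disagreements(in:)` is invented, `isTextKey` is restated on UInt16 so the block stands alone, and the loop is limited to ASCII because passing larger values to `isprint` is not portable. Widen the range only on a platform where you know that is safe. Note too that this exercises the `isprint` Swift sees, which may not be the one the Objective-C++ file actually resolves to.

```swift
import Foundation  // exposes the C library's isprint

extension UInt16 {
  /// Restated from earlier in the thread so this block is self-contained.
  var isTextKey: Bool {
    switch self {
    case 0x00..<0x20, 0x7F:
      return false
    default:
      return true
    }
  }
}

/// Hypothetical helper: every code in `codes` where isprint and
/// isTextKey give different answers.
func disagreements(in codes: Range<UInt16>) -> [UInt16] {
  return codes.filter { code in
    (isprint(Int32(code)) != 0) != code.isTextKey
  }
}

for code in disagreements(in: 0x00..<0x80) {
  // Show each mismatching code in hex with both verdicts.
  print(String(code, radix: 16), isprint(Int32(code)) != 0, code.isTextKey)
}
```

If nothing is printed for the ASCII range, the two checks agree there and the difference in behavior has to come from somewhere else.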

If this works, go with it. The Objective C++ code you posted appears to be withholding control codes from getting any further into the text system. So it seems plausible that macOS could have changed the order it processes things. What the operating system used to handle and filter out before ever passing anything to your input method could now be deferred until afterward, but be getting caught in your net first and never reaching the operating system’s handling later on.

Thanks for the explanation.

Removing this code won't solve the issue.
The only option I can see is to leave this code in ObjC++.

I think so, too.