
In Dart/Flutter, given a Mandarin character such as "中", how do I get its Unicode value in hex (4E2D) or decimal (20013)?

I got these values from this website.

The only way I have found to convert the character to hex is HEX.encode(utf8.encode("中")), which gives e4b8ad, not the Unicode value I need. I'm confused and would appreciate an explanation.


Answers


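  1. utf8.encode gives you the UTF-8 byte values, but what you actually want is the Unicode code point. For a character in the Basic Multilingual Plane (BMP) such as 中, the code point is equal to its single UTF-16 code unit, which Dart exposes through String.codeUnits.

    This example prints both the UTF-8 bytes and the UTF-16 code units.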

    import 'dart:convert';

    void main() {
      const input = '中';
      print('Input: $input');

      // UTF-8: 中 is encoded as three bytes.
      print('-----------------');
      print('Check UTF8 Values');
      final bytes = utf8.encode(input);
      print('Bytes: $bytes');
      for (final byte in bytes) {
        print('Byte Value [$byte] Converted to hex 0x${byte.toRadixString(16)}');
      }

      // UTF-16: Dart strings are sequences of UTF-16 code units; 中 needs only one.
      print('---------------------');
      print('Check CodeUnit Values');
      final codeUnits = input.codeUnits;
      print('Code Units: $codeUnits');
      for (final codeUnit in codeUnits) {
        print('CodeUnit Value [$codeUnit] Converted to hex 0x${codeUnit.toRadixString(16)}');
      }
    }
    

    And the output:

    Input: 中
    -----------------
    Check UTF8 Values
    Bytes: [228, 184, 173]
    Byte Value [228] Converted to hex 0xe4
    Byte Value [184] Converted to hex 0xb8
    Byte Value [173] Converted to hex 0xad
    ---------------------
    Check CodeUnit Values
    Code Units: [20013]
    CodeUnit Value [20013] Converted to hex 0x4e2d
    

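    This works for your character because 中 lies in the Basic Multilingual Plane and fits in one UTF-16 code unit. Characters outside the BMP (for example rare CJK characters or many emoji) are stored as a surrogate pair of two code units, so if your input may contain them, use string.runes instead of string.codeUnits: runes yields whole Unicode code points.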

  2. To get the Unicode code point of a Chinese character as a hex or decimal value, try either of the following.

    void main() {
      const mandarinWord = '中';
      // runes iterates over Unicode code points rather than UTF-16 code units.
      final int codePoint = mandarinWord.runes.first;
      final String unicodeHex = codePoint.toRadixString(16).toUpperCase(); // "4E2D"
      final String unicodeDecimal = codePoint.toString(); // "20013"
      print('$unicodeHex $unicodeDecimal');
    }
    

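    Alternatively, for a character inside the BMP you can read its single UTF-16 code unit directly (for a surrogate pair, codeUnitAt(0) would return only the high surrogate):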

    void main() {
      const mandarinWord = '中';
      final unicodeHex = mandarinWord.codeUnitAt(0).toRadixString(16).toUpperCase(); // "4E2D"
      final unicodeDecimal = mandarinWord.codeUnitAt(0); // 20013
      print('$unicodeHex $unicodeDecimal');
    }
    
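The first answer's output shows 中 encoded as the UTF-8 bytes [228, 184, 173] (0xe4, 0xb8, 0xad), which is why hex-encoding the result of utf8.encode does not give 4E2D. The sketch below shows how those three bytes carry the code point: a 3-byte UTF-8 sequence keeps 4 payload bits in the lead byte and 6 in each continuation byte. The helper name decodeThreeByteUtf8 is hypothetical and for illustration only; it does no validation and ignores 1-, 2- and 4-byte sequences, so real code should rely on utf8.decode.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes one 3-byte UTF-8 sequence
/// (lead byte 1110xxxx, then two 10xxxxxx bytes) into a code point.
/// Illustration only: no validation, no other sequence lengths.
int decodeThreeByteUtf8(List<int> bytes) {
  assert(bytes.length == 3);
  final lead = bytes[0] & 0x0F; // low 4 bits of 1110xxxx
  final mid = bytes[1] & 0x3F; // low 6 bits of 10xxxxxx
  final last = bytes[2] & 0x3F; // low 6 bits of 10xxxxxx
  return (lead << 12) | (mid << 6) | last;
}

void main() {
  final bytes = utf8.encode('中'); // [228, 184, 173]
  final codePoint = decodeThreeByteUtf8(bytes);
  print(codePoint); // 20013
  print(codePoint.toRadixString(16).toUpperCase()); // 4E2D
  // The library decoder agrees with the manual bit arithmetic.
  print(utf8.decode(bytes).runes.first == codePoint); // true
}
```

Worked by hand: 0xe4 & 0x0f = 0x4, 0xb8 & 0x3f = 0x38, 0xad & 0x3f = 0x2d, and (0x4 << 12) | (0x38 << 6) | 0x2d = 0x4E2D.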
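To illustrate the surrogate-pair caveat from the first answer, here is a small sketch comparing codeUnits and runes for a character outside the BMP. It assumes U+20000, a CJK Extension B ideograph, as the example input; the helpers hexCodeUnits and hexRunes are hypothetical names introduced only for this demonstration.

```dart
/// Hypothetical helpers: list the UTF-16 code units and the Unicode
/// code points (runes) of a string, each as lowercase hex.
List<String> hexCodeUnits(String s) =>
    s.codeUnits.map((u) => u.toRadixString(16)).toList();

List<String> hexRunes(String s) =>
    s.runes.map((r) => r.toRadixString(16)).toList();

void main() {
  // U+20000 lies outside the BMP, so Dart stores it as two
  // UTF-16 code units (a surrogate pair).
  const rare = '\u{20000}';
  print(hexCodeUnits(rare)); // [d840, dc00]  (two surrogate halves)
  print(hexRunes(rare)); // [20000]  (one code point)

  // For a BMP character such as 中, both views agree.
  print(hexCodeUnits('中')); // [4e2d]
  print(hexRunes('中')); // [4e2d]
}
```

Neither surrogate half (0xd840, 0xdc00) is the code point, which is also why codeUnitAt(0) is only safe for BMP characters.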
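Since the question speaks of a "word", the runes approach from the second answer extends naturally to every character in a string. This is a minimal sketch, assuming the conventional U+XXXX notation (at least four hex digits) is the desired output; describeCodePoints is a hypothetical helper name, not a Dart API.

```dart
/// Hypothetical helper: describes each Unicode code point in [word]
/// as "U+XXXX (decimal)". Uses runes so characters outside the BMP
/// are reported as one code point rather than two surrogate halves.
List<String> describeCodePoints(String word) {
  return word.runes.map((cp) {
    // Pad to at least 4 hex digits, as in the usual U+ notation.
    final hex = cp.toRadixString(16).toUpperCase().padLeft(4, '0');
    return 'U+$hex ($cp)';
  }).toList();
}

void main() {
  for (final line in describeCodePoints('中文')) {
    print(line);
  }
}
```

For '中' alone this yields the single entry U+4E2D (20013), matching the values in the question.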