
I’m still playing around with C to understand how it works.

I’m having trouble printing characters from the extended ASCII table (128-255). If I do printf("Â") (for example), it prints Â (everything works fine). However, if I assign a value to a variable, for instance a = 194, and then print it with printf("%c", a), it prints � instead of Â.

By the way, it works fine with characters 32-127 (for example, 35 prints #).

-> How could I print one of the 128-255 characters from an integer (decimal or binary)? Any help will be appreciated.

I am using gcc 11.3 on Ubuntu 20.04.1 LTS.

2 Answers


  1. It is likely that both your compiler and your terminal use UTF-8 to encode non-ASCII characters. You can verify this with this code:

    #include <stdio.h>
    #include <string.h>
    
    int main(void) {
        const char *s = "Â";
        int len = strlen(s);
        printf("%s: len=%d, bytes=", s, len);
        for (int i = 0; i < len; i++) {
            /* print each byte in hex; separator is a space, or a newline after the last byte */
            printf("%02hhX%c", s[i], " \n"[i == len - 1]);
        }
        return 0;
    }
    

    The output should be Â: len=2, bytes=C3 82.
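
    The two bytes map back to 194 as follows: the lead byte (110xxxxx) carries the top five bits of the code point and the continuation byte (10xxxxxx) the low six, so (0xC3 & 0x1F) << 6 | (0x82 & 0x3F) = 194. A minimal sketch of that arithmetic, assuming a two-byte sequence; the helper name decode_utf8_2byte is made up for illustration:

```c
/* Hypothetical helper: decode a two-byte UTF-8 sequence.
   Returns the code point, or -1 if the bytes do not have the
   110xxxxx 10xxxxxx shape or the encoding is overlong. */
int decode_utf8_2byte(unsigned char b0, unsigned char b1)
{
    if ((b0 & 0xE0) != 0xC0 || (b1 & 0xC0) != 0x80)
        return -1;                          /* not lead + continuation */

    int cp = ((b0 & 0x1F) << 6) | (b1 & 0x3F);
    return cp < 0x80 ? -1 : cp;             /* reject overlong forms */
}
```

    Calling decode_utf8_2byte(0xC3, 0x82) gives 194, the value the question tried to print with %c.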

    To convert non-ASCII characters to UTF-8 sequences on output streams, you can use the locale functions from <locale.h> and wide character output:

        setlocale(LC_ALL, "en_US.UTF-8");
        printf("%lc\n", 194);
    

    Output:

    Â
    

    If the locale is correctly configured in the terminal, you can select the default locale with setlocale(LC_ALL, "");
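
    If changing the locale is not an option, the same bytes can also be built by hand. The following is only a sketch, assuming the terminal expects UTF-8; utf8_encode is a hypothetical helper, not a standard library function, and it rejects surrogates and values above U+10FFFF:

```c
#include <stddef.h>

/* Hypothetical helper: write the UTF-8 encoding of code point cp into out
   (room for 4 bytes plus a terminating NUL) and return the number of bytes
   written, or 0 if cp is not a valid Unicode scalar value. */
size_t utf8_encode(unsigned long cp, char out[5])
{
    size_t n;

    out[0] = '\0';
    if (cp < 0x80) {                        /* plain ASCII: one byte */
        out[0] = (char)cp;
        n = 1;
    } else if (cp < 0x800) {                /* 110xxxxx 10xxxxxx */
        out[0] = (char)(0xC0 | (cp >> 6));
        out[1] = (char)(0x80 | (cp & 0x3F));
        n = 2;
    } else if (cp < 0x10000) {              /* 1110xxxx 10xxxxxx 10xxxxxx */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                       /* surrogates are not encodable */
        out[0] = (char)(0xE0 | (cp >> 12));
        out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (char)(0x80 | (cp & 0x3F));
        n = 3;
    } else if (cp <= 0x10FFFF) {            /* 11110xxx + three continuations */
        out[0] = (char)(0xF0 | (cp >> 18));
        out[1] = (char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (char)(0x80 | (cp & 0x3F));
        n = 4;
    } else {
        return 0;                           /* outside the Unicode range */
    }
    out[n] = '\0';
    return n;
}
```

    With this helper, char buf[5]; utf8_encode(194, buf); printf("%s\n", buf); should print Â on a UTF-8 terminal, since 194 encodes to the bytes C3 82.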

  2. As pointed out by @interjay, as well as written on Wikipedia:

    … There is no formal definition of "extended ASCII", and even the use of the term is sometimes criticized because it can be mistakenly interpreted to mean that the American National Standards Institute (ANSI) had updated its ANSI X3.4-1986 standard to include more characters, or that the term identifies a single unambiguous encoding, neither of which is the case. …

    🔗 Wikipedia: Extended ASCII


    Also, you are able to print Â with printf("Â"); because you are passing it as a string. Â is interpreted as a multi-byte UTF-8 character (by both your compiler and your terminal). You can check this by compiling the following:

    #include <stdio.h>
    
    int main() {
        char c = 'Â'; // set single character variable to be Â
    
        printf("%c", c); // print the variable
    
        return 0;
    }
    

    On my system, my compiler gives me this warning:

    extended_ascii.c: In function ‘main’:
    extended_ascii.c:4:18: warning: multi-character character constant [-Wmultichar]
        4 |         char c = 'Â'; // set single character variable to be Â
          |                  ^~~
    extended_ascii.c:4:18: warning: overflow in conversion from ‘int’ to ‘char’ changes value from ‘50050’ to ‘-126’ [-Woverflow]
    

    which suggests that Â is indeed a Unicode multi-byte character.


    You can also try running this code to check what Â expands to:

    #include <stdio.h>
    
    int main() {
        char c[] = "Â"; // set a string variable to be Â
    
        for(int i = 0; c[i] != '\0'; i++) { // loop through each byte until the terminating NUL
            printf("%d ", c[i]); // print integer value of each character
        }
    
        return 0;
    }
    

    and, its output:

    -61 -126
    

    So, your Â is a multi-byte character expanding to those two values. If you print each of these bytes on its own (for example, with a space between them), you will again see ��. When they are printed together, the terminal interprets them as a single Unicode character and prints the desired result.
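
    The numbers are negative because plain char is signed on this platform; converted to unsigned char they are 195 and 130, that is, the UTF-8 bytes 0xC3 and 0x82. A small sketch of that conversion (the helper name byte_value is made up for illustration):

```c
/* Hypothetical helper: return the value of a char as an unsigned byte
   (0-255), whether or not plain char is signed on this platform. */
int byte_value(char c)
{
    return (unsigned char)c;
}
```

    Printing byte_value(c[i]) instead of c[i] in the loop above should give 195 130 rather than -61 -126.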


    BTW, I also found this on the internet:

    In the Windows-1252 character set, ASCII code 194 is represented by the character Â …

    Notice the words "Windows-1252 character set". Although I have no idea about that character set, the most probable reason for 194 not printing as Â is that your terminal does not use that character set.
