C++ function produces different output on an ARM device vs. an x64 machine - Ubuntu

doctor
March 2, 2023
267 views
0 votes
2 Answers

I am trying to write a C++ Python extension that converts an image into CbCrA, but the same function produces different output when run on a Raspberry Pi and on an Ubuntu laptop. I know that some variables have a different size on ARM, but I'm not able to pinpoint what is going wrong.

static PyObject *
method_rgb_to_atem(PyObject *self, PyObject *args)
{
    Py_buffer input_buffer;
    Py_ssize_t data_length;
    unsigned int width, height, premultiply;
    PyObject *res;

    /* Parse arguments */
    if (!PyArg_ParseTuple(args, "y*IIp", &input_buffer, &width, &height, &premultiply)) {
        return NULL;
    }

    data_length = input_buffer.len;
    unsigned char *buffer;
    buffer = input_buffer.buf;

    char *outbuffer = (char *) malloc(data_length);
    if (outbuffer == NULL) {
        return PyErr_NoMemory();
    }

    char *writepointer = outbuffer;

    int pixel_size = 8;
    for (int i = 0; i < data_length; i += pixel_size) {
        // Convert RGBA 8888 to 10-bit BT.709 Y'CbCrA
        float r1 = (float)buffer[0] / 255;
        float g1 = (float)buffer[1] / 255;
        float b1 = (float)buffer[2] / 255;
        float r2 = (float)buffer[4] / 255;
        float g2 = (float)buffer[5] / 255;
        float b2 = (float)buffer[6] / 255;

        if (premultiply) {
            // PNG files have straight alpha, for BMD switchers premultipled alpha is easier
            float a1 = (float)buffer[3] / 255;
            float a2 = (float)buffer[7] / 255;
            r1 = r1 * a1;
            g1 = g1 * a1;
            b1 = b1 * a1;
            r2 = r2 * a2;
            g2 = g2 * a2;
            b2 = b2 * a2;
        }

        float y1 = (0.2126 * r1) + (0.7152 * g1) + (0.0722 * b1);
        float y2 = (0.2126 * r2) + (0.7152 * g2) + (0.0722 * b2);
        float cb = (b2 - y2) / 1.8556;
        float cr = (r2 - y2) /  1.5748;

        unsigned short a10a = ((buffer[3] << 2) * 219 / 255) + (15 << 2) + 1;
        unsigned short a10b = ((buffer[7] << 2) * 219 / 255) + (15 << 2) + 1;
        unsigned short y10a = clamp((unsigned short)(y1 * 876) + 64, 64, 940);
        unsigned short y10b = clamp((unsigned short)(y2 * 876) + 64, 64, 940);
        unsigned short cb10 = clamp((unsigned short)(cb * 896) + 512, 44, 960);
        unsigned short cr10 = clamp((unsigned short)(cr * 896) + 512, 44, 960);

        writepointer[0] = (unsigned char) (a10a >> 4);
        writepointer[1] = (unsigned char) (((a10a & 0x0f) << 4) | (cb10 >> 6));
        writepointer[2] = (unsigned char) (((cb10 & 0x3f) << 2) | (y10a >> 8));
        writepointer[3] = (unsigned char) (y10a & 0xff);
        writepointer[4] = (unsigned char) (a10b >> 4);
        writepointer[5] = (unsigned char) (((a10b & 0x0f) << 4) | (cr10 >> 6));
        writepointer[6] = (unsigned char) (((cr10 & 0x3f) << 2) | (y10b >> 8));
        writepointer[7] = (unsigned char) (y10b & 0xff);
        writepointer += pixel_size;
        buffer += pixel_size;
    }

    res = Py_BuildValue("y#", outbuffer, data_length);
    free(outbuffer);
    return res;
}

On Ubuntu (22.04), for the input b'\xff\x00\x00\xff\xff\x00\x00\xff', the output is

b':\x96h\xfa:\x9f\x00\xfa'

or (hexdump)

00000000: 3A 96 68 FA 3A 9F 00 FA                           :.h.:...

On a Raspberry Pi, both 32-bit and 64-bit ARM (Raspberry Pi OS), for the same input I get the output

b':\x98\x00\xfa:\x9f\x00\xfa'

or (hexdump)

00000000: 3A 98 00 FA 3A 9F 00 FA                           :...:...
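
Unpacking the two dumps makes the difference easier to see. The sketch below inverts the bit packing at the end of the loop for one 4-byte group. `Group` and `unpack` are hypothetical names introduced only for this illustration; they are not part of the extension.

```cpp
#include <cstdint>

// Hypothetical container for the three fields packed into one 4-byte group.
struct Group {
    unsigned a10;  // alpha
    unsigned c10;  // chroma (Cb in the first group, Cr in the second)
    unsigned y10;  // luma
};

// Inverse of the packing in the loop:
//   byte0 = a10 >> 4
//   byte1 = ((a10 & 0x0f) << 4) | (c10 >> 6)
//   byte2 = ((c10 & 0x3f) << 2) | (y10 >> 8)
//   byte3 = y10 & 0xff
constexpr Group unpack(std::uint8_t b0, std::uint8_t b1,
                       std::uint8_t b2, std::uint8_t b3) {
    return Group{
        (unsigned(b0) << 4) | (b1 >> 4),
        ((unsigned(b1) & 0x0f) << 6) | (b2 >> 2),
        ((unsigned(b2) & 0x03) << 8) | b3
    };
}
```

For the first group of each dump, alpha decodes to 937 and luma to 250 on both machines. Only the Cb field differs: 410 on x64 (3A 96 68 FA) and 512 on ARM (3A 98 00 FA). 512 is what the expression yields if (unsigned short)(cb * 896) evaluates to 0 for a negative cb, and 410 is what it yields if the cast wraps modulo 65536. Converting a negative float to an unsigned type is undefined behavior in C and C++, so both results are permitted.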

Tags: arm byte c#raspberry-pi sizeof

Answers

Chosen as BEST ANSWER
- doctor
- March 10, 2023 at 11:10 am
- 0 votes
0
@AlanBirtles, thanks for pointing it out. I was able to write a custom clamp function that takes a float rather than a short, and that made the conversion work correctly.
```
unsigned short
clamp(float v, unsigned short min, unsigned short max)
{
    const short t = v < min ? min : v;
    return t > max ? max : t;
}
```
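
A float-taking clamp like the one above only helps if the value reaches it while it is still a float, so the call sites need to drop the (unsigned short) cast as well. The following is a minimal sketch of that shape, not the code actually used in the extension: `clamp_to_u16`, `chroma10` and `luma10` are hypothetical names, and the scale, offset and limit values are taken from the question.

```cpp
// Hypothetical variant of the float-taking clamp: compare in float first and
// convert to unsigned short only once the value is known to be in range.
// NaN is not handled here.
constexpr unsigned short clamp_to_u16(float v, unsigned short lo, unsigned short hi) {
    return v < lo ? lo : (v > hi ? hi : static_cast<unsigned short>(v));
}

// Assumed call-site shape: scale and offset are applied in float and no cast
// happens before the clamp.
constexpr unsigned short chroma10(float c) {
    return clamp_to_u16(c * 896.0f + 512.0f, 44, 960);
}

constexpr unsigned short luma10(float y) {
    return clamp_to_u16(y * 876.0f + 64.0f, 64, 940);
}
```

Because the value is truncated after the offset is added, the Cb sample from the question (cb = -0.114572) comes out as 409 here, not the 410 that x64 produced by truncating -102.66 to -102 first and adding 512 afterwards.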


- AlanBirtles
- March 2, 2023 at 3:11 pm
- 0 votes
0
In
```
float cb = -0.114572;
unsigned short cb10 = std::clamp((unsigned short)(cb * 896) + 512, 44, 960);
```
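The arguments to std::clamp are deduced as int. cb * 896 is about -102.66, and casting that to unsigned short gives 65434 on x64 (rather than the intended -102). Converting a negative floating-point value to an unsigned type is undefined behavior, so another platform such as ARM is free to produce a different value. Adding 512 gives 65946, which then gets clamped to 960 rather than the expected value of 410. I’m guessing you meant to do this instead: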
```
unsigned short cb10 = std::clamp((unsigned short)(cb * 896 + 512), 44, 960);
```
You’ll then get an error that the argument types don’t match. You’ll need to either cast all of your arguments to unsigned short or specify the template type explicitly:
```
unsigned short cb10 = std::clamp<unsigned short>(cb * 896 + 512, 44, 960);
```
I’m not sure how your code ever worked, and I can’t reproduce it working. std::clamp<unsigned short>((unsigned short)(cb * 896) + 512, 44, 960) does work, because converting 65946 to unsigned short gets you back to 410, but std::clamp should deduce int in your code.
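
As a worked version of the arithmetic above, the sketch below reproduces the wrapped value using only well-defined conversions (float to int, then int to unsigned short), and then shows one way to avoid the problem by clamping in float before converting. `wrap_u16` and `scale_clamp` are hypothetical helper names, and std::clamp requires C++17.

```cpp
#include <algorithm>

// Well-defined reproduction of the x64 result: -102.66f truncates to the int
// -102, and converting a negative int to unsigned short wraps modulo 65536.
constexpr unsigned short wrap_u16(float v) {
    return static_cast<unsigned short>(static_cast<int>(v));
}

// Hypothetical helper: scale, offset and clamp entirely in float, so the
// final conversion only ever sees a value between lo and hi.
constexpr unsigned short scale_clamp(float v, float scale, float offset,
                                     float lo, float hi) {
    return static_cast<unsigned short>(std::clamp(v * scale + offset, lo, hi));
}
```

With cb = -0.114572, wrap_u16(cb * 896) is 65434, adding 512 gives the int 65946, and converting that back to unsigned short gives 410. scale_clamp(cb, 896, 512, 44, 960) gives 409 instead, because it truncates 409.34 after the offset has been added; whether 409 or 410 is the wanted value depends on the rounding the target format expects.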