I am trying to write a C++ Python extension that converts an image into Y'CbCrA, but the same function produces different output when run on a Raspberry Pi and on an Ubuntu laptop. I know that some types have a different size on ARM, but I'm not able to pinpoint what is messing it up.
static PyObject *
method_rgb_to_atem(PyObject *self, PyObject *args)
{
    Py_buffer input_buffer;
    Py_ssize_t data_length;
    unsigned int width, height, premultiply;
    PyObject *res;

    /* Parse arguments */
    if (!PyArg_ParseTuple(args, "y*IIp", &input_buffer, &width, &height, &premultiply)) {
        return NULL;
    }

    data_length = input_buffer.len;
    unsigned char *buffer;
    buffer = input_buffer.buf;
    char *outbuffer = (char *) malloc(data_length);
    if (outbuffer == NULL) {
        return PyErr_NoMemory();
    }
    char *writepointer = outbuffer;
    int pixel_size = 8;

    for (int i = 0; i < data_length; i += pixel_size) {
        // Convert RGBA 8888 to 10-bit BT.709 Y'CbCrA
        float r1 = (float)buffer[0] / 255;
        float g1 = (float)buffer[1] / 255;
        float b1 = (float)buffer[2] / 255;
        float r2 = (float)buffer[4] / 255;
        float g2 = (float)buffer[5] / 255;
        float b2 = (float)buffer[6] / 255;

        if (premultiply) {
            // PNG files have straight alpha, for BMD switchers premultipled alpha is easier
            float a1 = (float)buffer[3] / 255;
            float a2 = (float)buffer[7] / 255;
            r1 = r1 * a1;
            g1 = g1 * a1;
            b1 = b1 * a1;
            r2 = r2 * a2;
            g2 = g2 * a2;
            b2 = b2 * a2;
        }

        float y1 = (0.2126 * r1) + (0.7152 * g1) + (0.0722 * b1);
        float y2 = (0.2126 * r2) + (0.7152 * g2) + (0.0722 * b2);
        float cb = (b2 - y2) / 1.8556;
        float cr = (r2 - y2) / 1.5748;

        unsigned short a10a = ((buffer[3] << 2) * 219 / 255) + (15 << 2) + 1;
        unsigned short a10b = ((buffer[7] << 2) * 219 / 255) + (15 << 2) + 1;
        unsigned short y10a = clamp((unsigned short)(y1 * 876) + 64, 64, 940);
        unsigned short y10b = clamp((unsigned short)(y2 * 876) + 64, 64, 940);
        unsigned short cb10 = clamp((unsigned short)(cb * 896) + 512, 44, 960);
        unsigned short cr10 = clamp((unsigned short)(cr * 896) + 512, 44, 960);

        writepointer[0] = (unsigned char) (a10a >> 4);
        writepointer[1] = (unsigned char) (((a10a & 0x0f) << 4) | (cb10 >> 6));
        writepointer[2] = (unsigned char) (((cb10 & 0x3f) << 2) | (y10a >> 8));
        writepointer[3] = (unsigned char) (y10a & 0xff);
        writepointer[4] = (unsigned char) (a10b >> 4);
        writepointer[5] = (unsigned char) (((a10b & 0x0f) << 4) | (cr10 >> 6));
        writepointer[6] = (unsigned char) (((cr10 & 0x3f) << 2) | (y10b >> 8));
        writepointer[7] = (unsigned char) (y10b & 0xff);

        writepointer += pixel_size;
        buffer += pixel_size;
    }

    res = Py_BuildValue("y#", outbuffer, data_length);
    free(outbuffer);
    return res;
}
On Ubuntu (22.04), for this input:
b'\xff\x00\x00\xff\xff\x00\x00\xff'
the output is:
b':\x96h\xfa:\x9f\x00\xfa'
or (hexdump):
00000000: 3A 96 68 FA 3A 9F 00 FA :.h.:...
On a Raspberry Pi (Raspberry Pi OS, both 32-bit and 64-bit ARM), for the same input I get:
b':\x98\x00\xfa:\x9f\x00\xfa'
or (hexdump):
00000000: 3A 98 00 FA 3A 9F 00 FA :...:...
2 Answers
@AlanBirtles, thanks for pointing it out. I was able to write a custom clamp function that uses float rather than short, and that made the conversion happen correctly.
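A minimal sketch of what such a float-based clamp might look like. The name clamp_to_ushort and its signature are hypothetical, not the commenter's actual code; the idea is only that the range check happens while the value is still a float, so a negative value is never converted to an unsigned type.

```cpp
#include <algorithm>

// Hypothetical helper (an assumption, not the code from the post):
// clamp while the value is still a float, then convert the in-range result.
static unsigned short clamp_to_ushort(float value, float lo, float hi)
{
    float clamped = std::min(std::max(value, lo), hi);
    // clamped now lies in [lo, hi], so the conversion cannot go negative.
    return static_cast<unsigned short>(clamped);
}
```

It would be called as clamp_to_ushort(cb * 896 + 512, 44, 960). The conversion truncates after the offset has been added, so a value such as 409.34 becomes 409.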
In the question's clamp calls, the arguments to std::clamp are deduced as int. cb * 896 cast to unsigned short evaluates to 65434 (rather than the correct value of -102); adding 512 gives 65946, which then gets clamped to 960 rather than the expected value of 410. I'm guessing you meant to cast the whole expression instead, i.e. (unsigned short)(cb * 896 + 512).

You'll then get an error that the argument types don't match; you'll either need to cast all of your arguments to unsigned short or just specify the template type explicitly.

I'm not sure how your code ever worked (and I can't reproduce it working). std::clamp<unsigned short>((unsigned short)(cb * 896 + 512), 44, 960) does work, because casting 65946 to unsigned short does get you back to 410, but std::clamp should deduce to int in your code.
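The arithmetic above can be sketched as follows, assuming C++17 for std::clamp. The helper name chroma_to_10bit is hypothetical and does not appear in the original post. It does the scaling and the offset in a signed int, so the negative intermediate value (-102 for pure red) stays negative and never has to pass through an unsigned type.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (an assumption, not from the original post):
// map a normalized chroma value to the 10-bit range used in the question.
static std::uint16_t chroma_to_10bit(float c)
{
    // Convert to a signed int first; the conversion truncates toward zero,
    // so -102.66f becomes -102, and adding 512 then gives 410.
    int scaled = static_cast<int>(c * 896.0f) + 512;

    // All three arguments are int, so template deduction is unambiguous.
    return static_cast<std::uint16_t>(std::clamp(scaled, 44, 960));
}
```

For the red pixel in the question, cb is roughly -0.1146, so chroma_to_10bit(cb) should return 410 regardless of how the platform treats out-of-range float to unsigned conversions, since no such conversion takes place.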