Code that relies on wrapping

With the launch of the Apple M1, more and more code originally written for x86 machines is being ported to ARM. For C code, porting can be easy if the code relies only on well-defined behavior: you just need to recompile it. But things get complicated if your code relies on undefined behavior, which can manifest differently on the two architectures.

A few days ago I was porting otfcc, an optimized OpenType builder and inspector, to the ARM platform. I expected that a simple recompilation would be enough and everything would run out of the box. Instead I ran into segmentation faults and malformed output. After about a day of debugging, I found that these errors were caused by a small difference in wrapping behavior between x86 and ARM! On x86, when a negative floating-point number is cast to an unsigned integer, the result wraps around. …
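As a rough illustration of how code can avoid depending on the platform here, the sketch below goes through a signed integer first. In C, converting a floating-point value to an unsigned type is undefined when the truncated value does not fit, while converting a signed integer to an unsigned type is defined as reduction modulo 2^N. The helper name wrap_to_u32 and its range check are assumptions made for this example; this is not code from otfcc.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from otfcc): wrap a double into uint32_t portably. */
static uint32_t wrap_to_u32(double x) {
    /* Assumption: x is finite and well inside the int64_t range.
       NaN fails this check as well, since every comparison with NaN is false. */
    assert(x > -9.0e18 && x < 9.0e18);

    /* double -> int64_t truncates toward zero; defined because the value fits. */
    const int64_t truncated = (int64_t)x;

    /* signed -> unsigned conversion is defined as reduction modulo 2^32. */
    return (uint32_t)truncated;
}
```

With this helper, wrap_to_u32(-1.0) yields 4294967295 on any conforming compiler, regardless of how the target CPU handles out-of-range float-to-unsigned conversions.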


TL;DR: Modern compilers are very good at auto-vectorization. So just write loops and let the compiler optimize them (at an appropriate optimization level)!

Why We Need to Vectorize Our Code

In many cases, we need to perform the same operation on a huge amount of data. For example, if I need to calculate a single value of the probability density function of the Cauchy distribution, I would write:

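#include <math.h> /* M_1_PI (1/pi) is a POSIX extension, not ISO C */

/* Cauchy PDF at x: 1 / (pi * scale * (1 + ((x - location) / scale)^2)) */
float dcauchy_single(float x, float location, float scale) {
    const float fct = (float)M_1_PI / scale;  /* 1 / (pi * scale) */
    const float frc = (x - location) / scale; /* standardized distance */
    const float y = fct / (1.0f + frc * frc);
    return y;
}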
The PDF of the Cauchy distribution

That works fine for a single…
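To make the TL;DR concrete, here is a minimal sketch of the "just write a loop" approach applied to the same formula over an array. The function name dcauchy_loop, its signature, the restrict qualifiers and the hard-coded 1/pi constant are all assumptions made for this example, not the article's own code. Whether the loop actually gets vectorized depends on the compiler, its version and the optimization level (for instance -O2 or -O3 with GCC or Clang).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical batch version of dcauchy_single: evaluates the Cauchy PDF
   for n inputs. restrict tells the compiler x and y do not overlap, which
   makes the loop easier to auto-vectorize. */
void dcauchy_loop(const float *restrict x, float *restrict y, size_t n,
                  float location, float scale) {
    assert(scale > 0.0f); /* the Cauchy scale parameter must be positive */

    const float inv_pi = 0.318309886f; /* 1/pi, avoids the non-standard M_1_PI */
    const float fct = inv_pi / scale;  /* loop-invariant factor 1 / (pi * scale) */

    /* Each iteration is independent of the others, so the compiler is free
       to process several elements per SIMD instruction. */
    for (size_t i = 0; i < n; ++i) {
        const float frc = (x[i] - location) / scale;
        y[i] = fct / (1.0f + frc * frc);
    }
}
```

The loop body is the same arithmetic as the single-value function; the only design choice is keeping it free of branches and cross-iteration dependencies so the optimizer has nothing to work around.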

About

Misaki Kasumi

A programming enthusiast
