This is a code that outputs in a .ppm file the mandelbrot fractal. How can I optimize this?
#include<bits/stdc++.h>
using namespace std;
int findMandelbrot(double cr, double ci, int max_iterations)
{
int i = 0;
double zr = 0.0, zi = 0.0;
while (i < max_iterations && zr * zr + zi * zi < 4.0)
{
double temp = zr * zr - zi * zi + cr;
zi = 2.0 * zr * zi + ci;
zr = temp;
++i;
}
return i;
}
double mapToReal(int x, int imageWidth, double minR, double maxR)
{
double range = maxR - minR;
return x * (range / imageWidth) + minR;
}
double mapToImaginary(int y, int imageHeight, double minI, double maxI)
{
double range = maxI - minI;
return y * (range / imageHeight) + minI;
}
int main()
{
ifstream f("input.txt");
int imageWidth, imageHeight, maxN;
double minR, maxR, minI, maxI;
if (!f)
{
cout << "Could not open file!" << endl;
return 1;
}
f >> imageWidth >> imageHeight >> maxN;
f >> minR >> maxR >> minI >> maxI;
ofstream g("output_image.ppm");
g << "P3" << endl;
g << imageWidth << " " << imageHeight << endl;
g << "255" << endl;
double start = clock();
for (int i = 0; i < imageHeight; i++)
{
for (int j = 0; j < imageWidth; j++)
{
double cr = mapToReal(j, imageWidth, minR, maxR);
double ci = mapToImaginary(i, imageHeight, minI, maxI);
int n = findMandelbrot(cr, ci, maxN);
int r = ((int)sqrt(n) % 256);
int gr = (2*n % 256);
int b = (n % 256);
g << r << " " << gr << " " << b << " ";
}
g << endl;
if(i == imageHeight / 2) break;
}
cout << "Finished!" << endl;
double stop = clock();
cout << (stop-start)/CLOCKS_PER_SEC;
return 0;
}
I go until imageHeight / 2 because in a photoshop I can just copy the other half.
I was thinking about loghartimic power but tried something and only works with integers…
3
Answers
So this is the hot loop:
I know how to implement non-integer power in fast CPU instructions but it won’t get you out of a bind as it doesn’t work for complex numbers at all. Nor will using std::complex help at all. You won’t gain anything to pay for the non-inlining and certainly can’t apply optimizations as you find them. So the best I can do is this:
Yes I know taking one integer test out of the loop didn’t buy much over all that. I only found one other optimizer and you’ll have to check if its really faster:
That’s all there is folks.
In
findMandelbrot
you use the expressionszr * zr
andzi * zi
in the loop test but then recalculate the same two expressions if the test succeeds. So one obvious improvement might be to cache those with something like…There are many ways to optimize the mandelbrot fractal.
One way is to optimize the code for your CPU or even your GPU. An impressive speedup is shown in: Mandelbrot with SSE, AVX and OpenCL. That optimizes the inner loop by nearly a factor of 1000. That’s 3 orders of magnitude faster.
But there are other ways to optimize. The first you have already mentioned: The mandelbrot set is mirrored at y=0. So you only need half of it. There are some more trivial ways to avoid running the inner loop. If you scroll way down on Wikipedia on Mandelbrot to Optimizations you see “Cardioid / bulb checking”. That is a simple check for points in the apple shaped main part or the circle directly left of it. For a poster that can cover a lot of points.
Another speedup I’ve seen uses the distance estimation (also on Wikipedia) method when generating previews or just the outline of the mandelbrot set in black&white. Random points in the image are computed and if outside the mandelbrot set a circle is drawn with radius given by the distance estimation method. Anything in that circle is not part of the mandelbrot set. This can quickly cover lots of points outside the mandelbrot set.
Then there are methods of approximating results that might not result in a perfect image but usually are good enough. For example:
Calculate points following the boundary where pixels take N and N+1 iterations to escape. Same for N+1 and N+2. The everything between those two boundaries takes N iterations.
Calculate the border of a rectangle (start with the whole image). If the border all takes N iterations then the inside of the rectangle takes N iterations and is filled in. Otherwise split the rectangle in 4 and repeat for each. The result might be improved by calculating a 2 or 3 pixel wide border but then less is saved.
Calculate the image at a low resolution and then double the resolution keeping the calculated points. Go over the image and if 5×5 of the original points have the same N then fill a rectangle around the inner 3×3 points (works with 3×3, 7×7, 9×9, … points too). Points not filled in you calculate. Then repeat the whole thing till you git the final resolution.
This is the hardest to get right and I’ve only ever seen one implementation of it. The idea is that points close together behave the same in the iteration. If you iterate for a grid of 3×3 points (covering the whole image to start with) you can interpolate the values for points in between using newton interpolation. That works quite well — until it doesn’t.
So besides the 3×3 points grid you also iterate 4 test points, the middle of each of the grid cells. Do 8 iteration for those 13 points and then interpolate the 4 test points from the grid points. If the calculates and interpolated results drift apart too much (this is the hard part) you throw away the last 8 iterations and subdivide the grid in 4 half sized grids. The missing points you interpolate. Repeat till you reach the final resolution.
The potential benefits are extreme even if it only works for a few iterations. Say you want your 40000 x 40000 pixel image. If SOI works for 10 loops (80 iterations) before the first subdivision then you saves 80 * 40000 * 40000 = 128_000_000_000 iteration by calculating 1040 and some interpolations and checks. Or a speedup of 123_076_923, 8 orders of magnitudes. But only for the first 80 iterations. As the grid is divided more and more the speedup becomes less. But every bit saved adds up.
The great thing about this method is that is works for smooth/continious coloring or mapping to heights. The other methods only work with mapping iterations to color bands.