
Beginner programmer here.

Recently I noticed that my program was using a truly excessive amount of memory, as reported by Visual Studio. Pic for reference: [screenshot showing 400+ MB of memory usage]

I determined that the problem was a loop where a new instance of a massive bool array was being created every cycle. I changed the code from this (simplified):

for (int i = 0; i < TotalLoops; i++)
{
    bool[,] BigBoolArray = new bool[x,y];
    // Do stuff
}

To this:

bool[,] BigBoolArray = new bool[x,y];
for (int i = 0; i < TotalLoops; i++)
{
    Array.Clear(BigBoolArray, 0, BigBoolArray.Length);
    // Do stuff
}

This reduced memory usage all the way down to this: [screenshot showing much lower memory usage]

Problem solved, right? Sort of…

Having made this discovery, my OCD kicked in, and I went through and removed every instance of the "new" keyword found inside a loop. Two things happened as a result of this:
1. My code became a lot less readable. Worse, I must’ve screwed something up in the process of making all of these changes, because it’s spitting out some very strange results.
2. I declared a few variables outside of a method (i.e. outside the block of code that runs when I click a button on the form), and that seems to make the program slower. (Perhaps they are being stored on the heap instead of the stack?)

So what’s the right approach here? Where should I declare and initialise variables, with both performance and readability in mind? Should I just accept the increased memory usage in some cases, and only initialise outside the loop in extreme cases like my massive bool array? Does it have to do with the type of variable? (Besides the bool array, I’m also using a lot of hashsets, in case that makes a difference.)

2 Answers


  1. Short answer: there is no single right approach… it all depends.

    But before I get deeper into it, I would love to see some of your code after you changed it, just to get an understanding of what you mean by "spitting out some very strange results".

    Either way, be careful about focusing on optimizing things early. Optimizations should (roughly speaking) only be made if the requirements call for it. You say, for example, that 400+ MB of memory usage is way too much… but the obvious question is – to whom? I have Visual Studio open right now, and according to Task Manager it is eating up more than 1 GB of memory on my system. Is that unacceptable? I don’t think so, and I don’t think it’s a problem either. If your requirements say that your application can only use 16 MB of RAM, or that it must be able to "process 1K requests per second", then you will probably have to make some changes and rethink your approach.

    Consider the use case of your application. If you’re creating an application for fun, then the requirements may be (and probably are) totally different from those for an application going into a production environment at a business, for example. I don’t think there’s anything wrong with trying to optimize your code, especially if you’re still learning or want to try things out, but be careful about making it your primary concern when other aspects (such as readability) might be equally or even more important.

    Generally speaking, I would always aim for readability first. Not only because it will help you (and others working on the code) in the future, but also because readability and optimizations are not necessarily opposites – one does not have to rule out the other.

  2. While certain people are going to vehemently disagree with this, I don’t think that it’s ever acceptable to intentionally make a program behave poorly just for the sake of making it "easier to read".

    However, that doesn’t mean that you should over-adjust either.


    In general, allocating and releasing memory is an expensive operation – and this applies especially to garbage-collected runtimes such as .NET, where a GC pass carries significantly more overhead than a traditional malloc() / free() pair.

    Also note that in your first example, that memory is not actually "lost" – what’s happening is simply that your program may run many iterations of that loop between GC cycles. Under the hood, the garbage collector does not immediately return freed memory to the operating system; it adds it to its internal free list so it can be reused by later allocations.

    Therefore, allocating large data structures in a loop should be avoided whenever possible.
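
    To make this concrete, here’s a small standalone sketch (my own illustration, not your code – the sizes and loop counts are made up) that uses GC.CollectionCount to show how much collector work each pattern causes:

    using System;

    class AllocationDemo
    {
        static void Main()
        {
            const int TotalLoops = 10_000;

            int before = GC.CollectionCount(0);
            for (int i = 0; i < TotalLoops; i++)
            {
                bool[,] big = new bool[200, 200]; // ~40 KB per iteration, ~400 MB of total churn
                big[0, 0] = true;                 // touch it so the allocation isn't optimized away
            }
            Console.WriteLine($"allocating inside the loop: {GC.CollectionCount(0) - before} Gen0 collections");

            before = GC.CollectionCount(0);
            bool[,] reused = new bool[200, 200];  // one allocation, reused every iteration
            for (int i = 0; i < TotalLoops; i++)
            {
                Array.Clear(reused, 0, reused.Length);
                reused[0, 0] = true;
            }
            Console.WriteLine($"reusing one buffer:         {GC.CollectionCount(0) - before} Gen0 collections");
        }
    }

    The first loop will rack up a pile of Gen0 collections; the second should report zero (or close to it).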


    However, there are exceptions to this rule, because a fresh allocation ensures that you’re actually getting a "fresh" object.

    For instance, in plain C you have to be very careful with "use after free" issues. On some operating systems – such as OpenBSD – malloc() actually gives you a new, randomly placed chunk of memory, even across repeated malloc() / free() calls.

    This doesn’t really need to concern you too much in a managed language such as C# / .NET – just keep in mind that there are legitimate use cases for doing allocations in a loop.
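
    Here’s a quick sketch of what that "fresh object" guarantee looks like in C# – a newly allocated array is always zero-initialized, while a reused one keeps whatever the previous pass wrote:

    using System;

    class FreshVsReused
    {
        static void Main()
        {
            bool[,] reused = new bool[3, 3];

            for (int i = 0; i < 2; i++)
            {
                bool[,] fresh = new bool[3, 3];  // the runtime guarantees this is all-false
                Console.WriteLine($"pass {i}: fresh={fresh[1, 1]}, reused={reused[1, 1]}");
                reused[1, 1] = true;             // stale state: still true on the next pass,
                                                 // unless Array.Clear(...) is called first
            }
        }
    }

    This is exactly why your Array.Clear call is needed in the reused version – it restores the "freshness" that new would otherwise give you for free.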


    One thing you do need to be aware of in C# is lexical scoping.

    Declaring a variable inside a block (such as a loop) ensures that it cannot possibly be used outside of that block.

    This is important because if that variable holds any security-critical content, you should always limit its scope as much as possible.
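
    For example (a minimal sketch of my own):

    using System;

    class ScopeDemo
    {
        static void Main()
        {
            for (int i = 0; i < 3; i++)
            {
                // `buffer` only exists inside this block...
                int[] buffer = new int[8];
                buffer[0] = i;
                Console.WriteLine(buffer[0]);
            }
            // ...so it cannot leak out; uncommenting the next line is a
            // compile-time error (CS0103: the name 'buffer' does not exist):
            // Console.WriteLine(buffer[0]);
        }
    }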


    So basically, by moving all of the new calls out, you massively over-adjusted and extended the scope – and therefore the lifetime – of these objects beyond what’s actually needed.

    This could even lead to performance degradations, because from the GC’s point of view, all of these objects are still reachable.


    Furthermore, class fields are intended for data that needs to be shared between multiple methods / multiple invocations of a method.

    A case can be made for using a field to store some really large object that’s being used by repeated invocations of some "high profile" method – but to avoid any scoping issues (other methods accessing that field), it’s generally best to put it into a nested class.
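
    Something along these lines (all of the names here are made up for illustration):

    using System;

    class Processor
    {
        // Only Processor can see Scratch, so no other code on the form
        // can accidentally reach the buffer.
        private sealed class Scratch
        {
            public readonly bool[,] Grid = new bool[1000, 1000];
        }

        private readonly Scratch _scratch = new Scratch();

        public void RunOnce()
        {
            // Reset the reused state instead of reallocating ~1 MB per call.
            Array.Clear(_scratch.Grid, 0, _scratch.Grid.Length);
            // ... per-invocation work against _scratch.Grid goes here ...
        }
    }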


    Since you mentioned hash sets, it really depends on what you’re storing in them. The elements stored in a set have to be garbage-collected regardless – reusing the set via HashSet<T>.Clear() only retains the internal bucket storage, not the elements themselves.

    So if you have a hash set that grows to thousands of elements, and you use it in a loop with hundreds of iterations, then the hash set should probably be allocated outside the loop, as sketched below.
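
    Roughly like this (the element counts and the work inside the loop are placeholders):

    using System;
    using System.Collections.Generic;

    class HashSetReuse
    {
        static void Main()
        {
            // Allocated once; Clear() drops the elements (making them collectible)
            // while keeping the already-grown internal storage for reuse.
            var seen = new HashSet<int>();

            for (int i = 0; i < 100; i++)
            {
                seen.Clear();
                for (int n = 0; n < 10_000; n++)
                    seen.Add(n % 5_000);  // placeholder for the real per-iteration work
            }
            Console.WriteLine($"final pass held {seen.Count} distinct values");
        }
    }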

    But other than that, I’d really allocate them inside the loop.
