Stack vs. Heap
To understand what a garbage collector does, you first need to understand the difference between memory stored on the stack, and memory stored on the heap. Both are specific memory locations in the memory assigned to your program, from your computer’s available RAM.
The Stack is fast, and is used for value types that have a fixed byte size. It’s the same physical memory as the heap of course, it’s just fast because it’s a very orderly First-In, Last-Out data structure. Whenever you make a local variable, it stores it on the stack, and whenever your function exits and the variable goes out of scope, it is automatically cleaned off the stack.
This is a very quick process and makes stack allocations basically free. Though there is a performance penalty, it’s as cheap as it’s gonna get.
The Heap, on the other hand, is used for large objects like lists and strings that are too big to be stored on the stack, or need to be stored long after functions go out of scope, which stack allocations can’t do by design. Whenever you do new object, you’re making a heap allocation. You’re also probably making a stack allocation, since if you’re storing a reference into a local variable, that local variable must be created to point to the actual memory of the new object.
The heap is quite a bit slower, both to allocate memory onto, and remove from. This applies to all languages using this model, with or without a garbage collector.
Cleaning Up Your Garbage
Of course, it’s not as simple as just “allocate once, and forget about it.” If we never removed memory, we’d have a memory leak, which is very bad and will quickly eat up your machine’s RAM. Allocations like local lists will go out of scope immediately, but without cleanup, would clog up the heap forever. So, programs must have a way to clean up memory that isn’t needed anymore.
In manual memory management languages like C++, memory is handled, well, manually. You must manually free memory and delete objects that you are no longer using, using a reference, or pointer, to that object’s memory location. While that can be extremely fast, it isn’t fun to code, and can lead to memory bugs and exploits. This is one of the major reasons C++ is seen as a “hard” programming language to learn and code in.
The alternative to manual management is having the machine do it for you automatically. This is what’s know as garbage collection.
A garbage collector runs on a background thread and periodically scans your application’s heap and stack, and looks for objects that no longer have any references. This means the object is worthless, and can be safely removed without affecting the program.
For example, take the following pseudocode, which creates and “deletes” an object
Since refToObject no longer references the new object() that was created, the garbage collector will see the the new object is dangling without a reference to it from anywhere, and remove it whenever it collects garbage next time.
The garbage collector is also pretty smart, and is able to resolve circular dependencies. For example, if you have two objects that reference each other, but nothing else knows about them, it’s garbage. In most cases, if an object doesn’t have a reference chain starting from the root of the program and leading to the object, it’s garbage.
Garbage collection can be triggered at any time, usually:
When the system is low on memory. The percentage of memory on the heap surpasses a certain threshold. This threshold is adjusted automatically, and is basically whenever the GC sees your program needs cleaning. When it’s triggered manually, like with GC. Collect().
Impacts On Performance
Of course, garbage collection isn’t free, at all. If it was, every language would use it. GC is slow, mostly because it needs to pause program execution to collect garbage.
Think of it like this — your CPU can only work on one thing at a time. With C++, it’s always working on your code, including the bits that delete memory. With a GC, your program doesn’t delete memory, and runs up until it makes some garbage. Then, it’s paused, and the CPU swaps to working on garbage collection. If it’s doing this often, it can slow down application performance.
Usually, it’s fairly quick though, usually less than a couple of milliseconds at the most. For .NET, this depends on what kind of memory is being cleaned up though, as it keeps track of memory in different “generations”:
Generation 0, the youngest generation which contains short-lived objects like temporary variables. Generation 1, which acts as a buffer between short and long term objects. If an object survives a garbage collection attempt, it will be “promoted” to a higher generation. Generation 2, the last one, which tracks long term objects.
The GC will check objects in Gen0, then Gen1, then Gen2. Since they only contain temporary or newly created objects, cleaning up Gen0 and Gen1 is usually pretty fast, but Gen2 contains a lot of memory. Doing a “full garbage collection” can be a lot slower than ephemeral garbage collections.
How To Speed Up Performance?
So what can you do to prevent this? Well, at the end of the day, your garbage must be picked up. The only real thing you can do is reduce the amount of garbage your program is throwing around.
One of the major ways to reduce garbage is to utilize Object Pooling. The core principle behind this is that it’s often faster to reset objects to default than it is to make a new one and throw the old one away.
For example, the following code iterates 10k times, and makes a new list every time to do something with it. However, this is horrible on the GC, so a better method is to make one big list, and clear it after you’re done with it and want a new one.
In practice, this is usually done with a generic “Object Pool,” which manages a list of objects that it can “rent out” to your program. When your code is done, it frees the object back to the pool, and resets it, ready to be used again when requested.