Managed object lifetime - How a Garbage Collector Works

This is part 1 (if you will) of a couple of posts. I am surprised how very little most people know how the GC actually works. If you are reading this and are going to be interviewed by me, this is one of my favourite questions. *grin* This is based on an interesting post I found long ago I think on gotdotnet (sorry I forgot who the author was, I’ll try and google it and see if I can find the link to the original post).

It is not possible to state exactly when a managed object will be collected. The garbage collector schedules itself based on various heuristics. Even if a garbage collection occurs, it may only collect the younger generations of the heap. And the JIT has some freedom to lengthen or shorten the lifetime of instances, based on how it generates code and reports liveness.

class C {
    IntPtr _handle;
    Static void OperateOnHandle(IntPtr h) { ... }
    void m() { 
    OperateOnHandle(_handle); 
    ... 
    } 
    ... 
   } 
  
class Other {
     void work() {
     if (something) { 
     C aC = new C();
     aC.m();
     ... // most guess here 
     } else {
     ... 
     } 
     } 
    }

So we cannot say how long “aC” might live in the above code (line 15). The JIT might report the reference until Other.work() completes. It might inline Other.work() into some other method, and report aC even longer. Even if you add “aC = null;” after your usage of it, the JIT is free to consider this assignment to be dead code and eliminate it. Regardless of when the JIT stops reporting the reference, the GC might not get around to collecting it for some time.

You should worry more about the earliest point that aC could be collected. If you are like most people, you’ll guess that the soonest aC becomes eligible for collection is at the closing brace of Other.work()’s “if” clause (line 16). The fact though is, that braces don’t exist in the IL; they are a syntactic contract between you and your language compiler. Other.work() is free to stop reporting aC as soon as it has initiated the call to aC.m().

Another common guess is that the soonest aC could be collected is when C.m() stops executing. Or perhaps after the call to C.OperateOnHandle(). Actually, aC could become eligible for collection before C.m() even calls C.OperateOnHandle(). Once we’ve extracted _handle from “this”, there are no further uses of this object. In other words, “this” can be collected even while you are executing an instance method on that object.

Why should you care? For the example above, you don’t; the GC’s reachability will ensure that objects won’t be collected until we are finished with them. But what if class C has a Finalize() method which closes _handle? When we call C.OperateOnHandle(), we now have a race between the application and the GC / Finalizer - eventually we’re going to lose that race.

One way to fix this race is to add a call to GC.KeepAlive(this) right after the call to OperateOnHandle(). This indicates that we need the JIT to keep reporting “this” to the GC until we get to that point in the execution. KeepAlive is just a light-weight method call that is opaque to the JIT. So the JIT cannot inline the call and recognize that the call has no real side effects and hence could be eliminated. The reason you need to add this call is that you have really broken the encapsulation of the _handle resource. The lifetime of the enclosing object and the required lifetime of the _handle are separated when you extract the value from the object’s field.

It’s bad enough that you must use GC.KeepAlive() to tie those two lifetimes back together in your encapsulation. It would be disastrous if you required the clients of your class to be responsible for calling KeepAlive. Public fields on classes are a bad idea for many reasons. As we’ve seen, when they expose a resource that is subject to finalization, they are an exceptionally bad idea.

You may wonder why we don’t just extend all lifetimes to the end of methods. This has a terrible impact on code quality, particularly on x86 where we are cursed with limited registers. And a change like that doesn’t really fix the problem. It is still possible for you to return the _handle, place it in a static field, or otherwise cause its lifetime to escape the lifetime of the enclosing object.

There’s another wrinkle to this issue. So far we’ve seen how the Finalizer thread and the application can race when the resource can be separated from its enclosing object. The same sort of thing can happen when you expose IDisposable on your class. Now a multi-threaded application can simultaneously use the resource on one thread and imperatively call Dispose on another thread. GC.KeepAlive isn’t going to solve this problem, since you’ve provided a public API to disassociate the lifetime of the resource from the lifetime of the enclosing object.

This is more than application issue. It can also be used to mount security attacks. If malicious code can open a file to an uninteresting part of the filesystem, it could simultaneously Read and Dispose that file object on two different threads. In a server environment, it’s possible that some other component is opening a file to a sensitive part of the filesystem. Eventually, the malicious code could exploit the race condition to read the other component’s file. This is a handle-recycling attack.

When the .NET framework uses a resource via a PInvoke to the OS (like reading from a file handle), it places a reference count on the resource. If malicious (or poorly-timed) code calls Dispose, this simply removes the reference count that was created when the resource was acquired. The result is that all current uses of the resource will be drained, the resource will then be safely disposed, and subsequent attempts to use the resource will be failed gracefully.

For now, you should consider similar approaches if you are encapsulating sensitive resources like this, which are subject to recycling. Also, this probably should be the second post, I’ll probably change it the title (or repost) to avoid confusion.