Wednesday, 4 July 2012

Garbage Collection in Java


Definition

In our daily life we come across lot of things we don't need forever (called as garbage) and throw them into trash can. The garbage collector collects those garbage periodically and dispose/recycle the garbage in many different ways.

Similarly in Java, when we are done using an object and are no more useful, we dereference them. All these java garbage have to be collected and claim the memory occupied them so that the memory space can be re-used in future. This is done by Garbage Collector and the technique is called Garbage Collection or Memory Recycling.

Precisely, garbage collection is a technique of reclaiming the memory from the objects that do not have any references.

Why Garbage Collection?

Note: I will be writing GC (short name for Garbage Collector) now onwards.

If you do not free memory when an object is no longer needed, at time you may end up taking all spaces and your program may crash suffering out of memory. We must clean up those objects and reclaim memory periodically to make the program healthy. In addition to this GC also does heap fragmentation. Heap fragmentation occurs during normal program execution. New objects are allocated, and unreferenced objects are freed such that free portions of heap memory are left in between portions occupied by active objects. Requests to allocate new objects may have to be filled by extending the size of the heap even though there is enough total unused space in the existing heap. This will happen if there is not enough contiguous free heap space available where the new object will fit. In the process of freeing unreferenced objects, GC must run finalizers of objects (if any) being freed.

GC relieves the burden of freeing allocated memory. Explicit memory clearance is always risky and tricky. Assigning this task to JVM has many advantages. It can make the programmer more productive and he/she can concentrate on the actual logic rather thinking of memory clean up.

The GC Engineering

There is only one GC runs for a JVM for the whole JVM lifetime and gets started during JVM startup. GC runs as a daemon thread (background thread) and pops up periodically for freeing memory. We can request JVM to call GC by calling System.gc() or gc() method of Runtime class (System.gc() just delegates the call to Runtime). But there is no guarantee that GC will act instantly.

GC Algorithm

Any GC algorithm must do some basic things like detecting unreferenced objects, reclaim the heap space used by them and defragment the free space to make sure we have enough contiguous free space.

Garbage detection is accomplished by defining a set of roots and determining reachability from the roots. An object is reachable if and only if there is at least a reference to the object exists by which the program can access the object. The roots are always accessible by the program. Objects those are reachable from the roots are considered LIVE and those are not reachable are considered GARBAGE. The best example of root set is a stack.

Figure: Reachable and unreachable objects in heap and their references from the Root set of references.

GC algorithm depends on JVM implementation. The JVM can be implemented such that the GC knows the difference between a genuine object reference and a primitive type that appears to be a valid object reference. (One example is an int that, if it were interpreted as a native pointer, would point to an object on the heap.) However, some GCs may choose not to distinguish between genuine object references and look-alikes. Such GCs are called conservative because they may not always free every unreferenced object. Conservative GC trades off an increase in garbage collection speed for occasionally not freeing some actual garbage.

Two basic approaches of distinguishing live objects from garbage are reference counting and tracing.
Reference counting GCs distinguish live objects from garbage objects by keeping a count for each object in the heap. The count keeps track of the number of references to that object.
Tracing GCs actually trace out the graph of references starting with the root nodes. Objects that are encountered during the trace are marked in some way. After the trace is complete, unmarked objects are known to be unreachable and can be garbage collected.

Finalization

In Java, an object may have a finalizer. finalize is a method that the garbage collector must run on the object prior to freeing the object. However, the existence of finalizers complicates the job of GC. finalize method needs to be overridden from Object class. GC gives you the final chance to clean up any objects/resources you may have before it gets garbage collected.

Because of finalizers, the GC must perform some extra steps each time it garbage collects. First, the GC must in some way detect unreferenced objects (call this Pass I). If it has enough time, it may at this point in the garbage collection process finalize all unreferenced objects that declare finalizers. After executing all finalizers, the GC must once again detect unreferenced objects starting with the root nodes (call this Pass II). This step is needed because finalizers can "resurrect" unreferenced objects and make them referenced again. Finally, the GC can free all objects that were found to be unreferenced in both Passes I and II.

To reduce the time it takes to free up some memory, a GC can optionally insert a step between the detection of unreferenced objects having finalizers and the running of those finalizers. Once the GC has performed Pass I and found the unreferenced objects that need to be finalized, it can run a miniature trace starting not with the root nodes but with the objects waiting to be finalized. Any objects that are (1) not reachable from the root nodes (those detected during Pass I) and (2) not reachable from the objects waiting to be finalized cannot be resurrected by any finalizer. These objects can be freed immediately.

Finalizer for an object run at most once in the object lifetime. If an object with a finalizer becomes unreferenced, and its finalizer is run, if that object is resurrected by its own finalizer or some other object's finalizer and later becomes unreferenced again, the GC treat it as an object that has no finalizer.

References



Hope you enjoyed this post on garbage collection. Feel free to comment on this. Thank you.


2 comments:

  1. Finalization part is interesting.

    ReplyDelete
  2. Also have a look at the below reference:

    http://www.oracle.com/webfolder/technetwork/tutorials/obe/java/gc01/index.html

    ReplyDelete