Measuring Java Object Sizes

Probably pure coincidence, but I stumbled upon the blog post Estimating Java Object Sizes with Instrumentation. Given that we had just released Ehcache 2.5 with Automatic Resource Control, aka ARC, I had to read through it. Among other things, we use an instance of java.lang.instrument.Instrumentation to measure object sizes ourselves. Yet we found some shortcomings with that approach:

Getting a reference to an Instrumentation instance!

As the blog mentions, you need an agent to get to that instance. Yet forcing every Ehcache user who wants to use ARC to add a -javaagent: option just didn’t feel like a great idea. Looking for a way around this, we found that Java 6 introduced the Attach API. With it, we can try to attach to the VM while it’s running and load the agent.
And when I say try, I do mean try! This can fail for all kinds of reasons: we’re running on JDK 5, so there is no Attach API for us to use; or attaching to the VM itself fails, which again can have different causes. One particularly weird cause is a bug on OS X that shows up when the java.io.tmpdir system property is set! And even if we manage to attach and load the agent, the Ehcache code still needs to get a reference to that Instrumentation instance. The agent classes are loaded by the system class loader, but the Ehcache classes aren’t necessarily, and we might not have direct access to the system class loader. We don’t strictly need it, but we do want to avoid picking up another copy of the agent class, loaded by some other class loader. That generally can’t happen anyway, as we hide the Java agent jar within the Ehcache-core jar, so the classes it contains can’t be present multiple times…
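To make the agent side a bit more concrete, here is a minimal sketch of a class that keeps hold of the Instrumentation instance, whichever way the agent got loaded. The class name SizeOfAgentSketch and its helper methods are made up for illustration and are not Ehcache’s actual agent. The attach step itself is left out.

```java
import java.lang.instrument.Instrumentation;

// Hypothetical agent class. In a real agent jar it would be public and
// named by the Premain-Class and Agent-Class manifest attributes.
final class SizeOfAgentSketch {

    // Set by the JVM through premain/agentmain; stays null when no agent was loaded.
    private static volatile Instrumentation instrumentation;

    // Called by the JVM when it is started with -javaagent:<jar>.
    public static void premain(String options, Instrumentation inst) {
        instrumentation = inst;
    }

    // Called by the JVM when the agent is loaded into an already running VM.
    public static void agentmain(String options, Instrumentation inst) {
        instrumentation = inst;
    }

    // Lets callers decide whether they have to fall back to another strategy.
    static boolean isAvailable() {
        return instrumentation != null;
    }

    // Shallow size of a single object, as reported by the VM.
    static long sizeOf(Object o) {
        Instrumentation inst = instrumentation;
        if (inst == null) {
            throw new IllegalStateException("No Instrumentation instance available");
        }
        return inst.getObjectSize(o);
    }
}
```

A caller would check isAvailable() first and only then call sizeOf(), which is roughly the point where a fallback to another sizing strategy would kick in.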

What if we can’t access an Instrumentation instance?

Ehcache’s sizeOf engine, as we named it, falls back to other mechanisms to size POJOs. We’ve added two other implementations, which we fall back to should we not be able to access the Instrumentation instance: the Unsafe-based one and, finally, the reflection-based one.
The UnsafeSizeOf will try to get a reference to sun.misc.Unsafe#theUnsafe. Using that reference, we can query the offset in memory of an object’s last non-static field using Unsafe.objectFieldOffset and do some math to calculate the object’s size in memory. I’ll come back to the “some math” part later…
And finally, should we not be able to gain access to theUnsafe, we use reflection-based sizing. This measures all primitives and references within an object and sums the size they use in memory. Dr. Heinz Kabutz published more details on that approach in his Java Specialists Newsletter #78: MemoryCounter for Java 1.4, back in 2003.
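As a rough illustration of the reflection-based idea, the sketch below sums the sizes of all non-static fields up the class hierarchy, adds a header and rounds up to the alignment. The header, reference and alignment constants are assumptions for a 64-bit HotSpot VM with compressed OOPs. The class and the Point sample type are hypothetical, and padding between fields and arrays are ignored, so treat the result as an estimate only.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;

final class ReflectionSizeSketch {

    // Assumed layout values; a real engine would detect these at runtime.
    static final int HEADER_BYTES = 12;
    static final int REFERENCE_BYTES = 4;
    static final int ALIGNMENT_BYTES = 8;

    // Size a field of the given declared type occupies inside its owner.
    static int fieldBytes(Class<?> type) {
        if (type == long.class || type == double.class) return 8;
        if (type == int.class || type == float.class) return 4;
        if (type == short.class || type == char.class) return 2;
        if (type == byte.class || type == boolean.class) return 1;
        return REFERENCE_BYTES; // any non-primitive field holds a reference
    }

    // Shallow size estimate: header plus all instance fields, rounded up.
    static long shallowSize(Class<?> type) {
        long size = HEADER_BYTES;
        for (Class<?> c = type; c != null; c = c.getSuperclass()) {
            for (Field f : c.getDeclaredFields()) {
                if (!Modifier.isStatic(f.getModifiers())) {
                    size += fieldBytes(f.getType());
                }
            }
        }
        return (size + ALIGNMENT_BYTES - 1) / ALIGNMENT_BYTES * ALIGNMENT_BYTES;
    }

    // Hypothetical sample type used to try the estimate out.
    static final class Point {
        long id;
        int x;
        int y;
        Object label;
    }
}
```

With these assumed constants, Point adds up to 12 + 8 + 4 + 4 + 4 = 32 bytes, and a plain java.lang.Object to 12 bytes rounded up to 16.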

Now that’s all very simple, isn’t it?

Well… Sadly it isn’t. But luckily, we’ve mostly sorted it all out for you! We had just finished the agent-based implementation (which didn’t auto-attach yet) and started testing. Surely, since this calls into the VM’s internals, it would magically figure everything out by itself. Well, no. CMS wasn’t properly accounted for. CMS needs a certain minimal amount of memory to store information when an object is garbage collected and its memory allocation is “freed”. That affects the minimal size an object will use on heap. And that was HotSpot only… We then moved on to testing on JRockit, which required some finer adjustments, but I won’t go into those here.
CMS, compressed OOPs and minimum object size were just some of the things we needed to account for. The “some math” in the other implementations also has to deal with pointer sizes (32- vs. 64-bit VMs), object alignment, field offset adjustment (on JRockit) and “object header” size. All of this required us to gather that information about the VM the sizing happens in, in order to measure object sizes properly, even when using the Instrumentation instance to measure.
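The kind of corrections described above could be sketched roughly as follows: gather a few facts about the running VM, then apply the minimum object size and the alignment to a raw size. The class VmLayoutSketch is hypothetical. The alignment of 8 and the minimum of 16 bytes used in detect() are assumed illustrative values, not numbers read from the VM, and the compressed OOPs lookup only works on HotSpot-style VMs.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class VmLayoutSketch {
    final int referenceBytes;
    final int alignmentBytes;
    final int minObjectBytes;

    VmLayoutSketch(int referenceBytes, int alignmentBytes, int minObjectBytes) {
        this.referenceBytes = referenceBytes;
        this.alignmentBytes = alignmentBytes;
        this.minObjectBytes = minObjectBytes;
    }

    // Applies the minimum object size first, then rounds up to the alignment.
    long adjust(long rawBytes) {
        long size = Math.max(rawBytes, minObjectBytes);
        long remainder = size % alignmentBytes;
        return remainder == 0 ? size : size + alignmentBytes - remainder;
    }

    // The property is set by HotSpot; other VMs may not provide it.
    static boolean is64Bit() {
        return "64".equals(System.getProperty("sun.arch.data.model"));
    }

    // Asks the VM whether compressed OOPs are on; false when we can't tell.
    static boolean compressedOops() {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return false;
            }
            return Boolean.parseBoolean(bean.getVMOption("UseCompressedOops").getValue());
        } catch (RuntimeException e) {
            return false; // not a HotSpot VM, or the option is unknown (e.g. 32-bit)
        }
    }

    static VmLayoutSketch detect() {
        int referenceBytes = (is64Bit() && !compressedOops()) ? 8 : 4;
        // Assumed values: 8-byte alignment and a 16-byte minimum object size.
        return new VmLayoutSketch(referenceBytes, 8, 16);
    }
}
```

A collector-specific minimum, like the one CMS imposes, would simply show up as a larger minObjectBytes value here.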

Know what to measure!

As you could read in Heinz’s newsletter, there are some objects you probably don’t want to account for, especially when measuring the size that cached entries use on heap. There are all the obvious ones: statics, classes and other “flyweight”-type objects. These can all be discarded automatically by the sizing engine. But at other times, you also don’t want every cached entry to account for a particular part of an object graph, simply because every cached entry will reference that same part. Hibernate’s 2nd level cache is a good example of that. For that particular case, we’ve added a “resource” file that describes fields and types to be discarded when measuring a cache entry’s size on heap. For application types though (ones not going into the cache through Hibernate, but from applications using the Ehcache API directly), we’ve added the @IgnoreSizeOf annotation. Annotating a field, a type or even an entire package with it results in the sizing engine skipping that part of the graph (that field, those types or the types in those packages, respectively) while doing the sizing.
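To illustrate how a sizing engine might honor such an annotation, here is a small self-contained sketch. The @SkipWhenSizing annotation is a hypothetical stand-in defined right in the example (it is not Ehcache’s @IgnoreSizeOf), as are the Entry and SharedMetadata types. Package-level annotations are not covered.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.List;

final class IgnoreFilterSketch {

    // Hypothetical marker annotation; must be retained at runtime to be visible.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.FIELD, ElementType.TYPE})
    @interface SkipWhenSizing {}

    // A field is followed unless it is static, annotated, or of an annotated type.
    static boolean shouldVisit(Field f) {
        if (Modifier.isStatic(f.getModifiers())) return false;
        if (f.isAnnotationPresent(SkipWhenSizing.class)) return false;
        return !f.getType().isAnnotationPresent(SkipWhenSizing.class);
    }

    // Names of the declared fields a graph walk would descend into.
    static List<String> visitedFields(Class<?> type) {
        List<String> names = new ArrayList<>();
        for (Field f : type.getDeclaredFields()) {
            if (shouldVisit(f)) {
                names.add(f.getName());
            }
        }
        return names;
    }

    // Shared by every entry, so it should not be counted per entry.
    @SkipWhenSizing
    static final class SharedMetadata {
        String description;
    }

    static final class Entry {
        static int instances;          // static: skipped
        long id;                       // counted
        String name;                   // counted
        @SkipWhenSizing Object owner;  // annotated field: skipped
        SharedMetadata metadata;       // annotated type: skipped
    }
}
```

For Entry, only id and name would be followed; the static field, the annotated field and the field of an annotated type are left out of the walk.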

Try it now!

Ehcache 2.5 is out now and available for direct download or through Maven Central. With Ehcache ARC, you can size your caches simply using values in bytes. You can read more about cache sizing on the ehcache.org website.

6 thoughts on “Measuring Java Object Sizes”

  1. Very interesting article. Just wondering how Ehcache calculates entry size. My conclusion is that, to avoid a performance decrease, one should set a limit on the number of entries rather than on the amount of memory taken by entries. This way that complicated code to calculate entry size will not be used. Do you agree?

    1. Depends on what you want me to agree on. Yes, if you use count-based caches, the entries won’t be sized and you will save CPU cycles on put. Now whether size-based or count-based is the most desirable, well, that depends mainly on your use case, including the complexity of the object graph you cache. Class instances with 2 Strings and a couple of primitives will have very negligible overhead.

  2. Did I understand you correctly? When using ARC with Ehcache as a Hibernate 2nd level cache, I should exclude all my cacheable domain model classes from sizing to avoid oversizing, because they will be put into a separate cache region by Hibernate?

    1. Not quite sure what it is I say that makes you think that. But the answer is no. Hibernate doesn’t actually put instances of your domain model classes in its second-level cache. Or are you saying you are putting Hibernate entities in Ehcache yourself? If that’s the case, there is probably more you may want to consider than sizing anyways. But even then, your classes shouldn’t be ignored from being sized.

      1. birnbuazn says:

        Sorry, I don’t get you there. I’m not putting Hibernate entities manually into Ehcache, but rely on Hibernate’s 2nd level caching (= Ehcache in my instance). What exactly do you mean by “Hibernate doesn’t actually put instances of [my] domain model classes in its second-level cache”? I thought this is what Hibernate’s 2nd level cache is all about?

        I was originally referring to you mentioning “Simply because every, for instance, every cached entry will reference that particular bit. Hibernate’s 2nd level cache is good example of that. For that particular example, we’ve added a “resource” file that describes fields and types to be discarded when measuring a cache entry’s size on heap.”

      2. Well, yes and no. Hibernate stores “some dehydrated representation of your domain model instances”, so that it then knows how to reconstruct a given instance (say User#123). The reason is that every thread (and transaction) needs its own copy of that User to operate on. As such, the actual type of the instance being put in the cache holds references that are “shared” across all entries (like the metadata information about, say, the User entity). Since all those cached dehydrated “User” instances reference these common bits, we don’t want to size them over and over again.

        Now the good news is, you have nothing to do. We deal with these internally. Hope this clarifies this post a bit. I should have been more precise about how the Hibernate 2nd level cache actually works when using it as an example, rather than expecting people to know about these internals.
