Java, GlassFish v3, High CPU and Memory Usage, Locked Threads, Death

We’ve been successfully running a core Java system at work to handle student assignment submissions for approximately 2 years. It has been running on an old version of GlassFish (and by old, I mean old – the Oracle consultants didn’t recognise it as an official version). It was recommended we move versions at the earliest possible opportunity.

And so we did. We tested fairly thoroughly on GlassFish v3.1.1 and everything went through fine.

We deployed to the live architecture, with some minor changes to the code base to keep GlassFish v3 happy – the old version of GlassFish seemed to be a bit more forgiving of the way we packaged our EJBs. We also made some changes to break out separate functionality, to avoid downtime for key users during upgrades and planned maintenance.

And everything went fine – although over the next 2 days the CPU usage on the Solaris boxes kept rising. The stats revealed by the command ‘prstat -a’ were, according to our Sys Admin, particularly unhealthy and indicated imminent meltdown. Sure enough, not long after, the boxes fell over and died. Upon restart we saw the same behaviour repeat – the CPU usage slowly rising, until eventually the box would give up the ghost and keel over.

We went back to basics: we eliminated the load balancer and reduced the functionality being provided, but alas, it made zero difference.

After taking countless stack traces (note this important command to get a dump of the active threads from GlassFish):

./asadmin generate-jvm-report --type=thread --user admin --passwordfile passfile
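
If asadmin isn’t to hand, a similar dump can be produced from inside the JVM with the standard java.lang.management API – a minimal sketch for illustration only (the ThreadDumper class is our own naming for this example, not something we actually deployed; we used the asadmin route above):

import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class ThreadDumper {

    // Prints the state and (truncated) stack trace of every live thread.
    // Could be wired into a diagnostic servlet or a scheduled task.
    public static void dumpThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // true, true = include locked monitors and locked synchronizers
        for (ThreadInfo info : threads.dumpAllThreads(true, true)) {
            System.out.print(info); // ThreadInfo.toString() shows only the top frames
        }
    }
}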

And looking through the thread list, nothing was terribly worrying. There were no deadlocks and no locked threads. There was one frame that kept being repeated, but it was deep within the Java core libraries rather than in our own code. We would normally expect to see a maximum of 5–6 of these stack traces at any one time, but on one occasion when the box keeled over, there were 47 threads in this state. The line of importance was:

Thread "http-thread-pool-8080(76)" thread-id: 27,866 thread-state: RUNNABLE
   at: java.io.FileInputStream.readBytes(Native Method)
   at: java.io.FileInputStream.read(FileInputStream.java:177)

This led us to this post on Stack Overflow where similar symptoms were being experienced:

http://stackoverflow.com/questions/10178162/infinite-100-cpu-usage-at-java-io-fileinputstream-readbytesnative-method

We put in place the temporary fix described in the SO post:

// Workaround from the SO post: if no bytes are available, wait a
// second and re-check before giving up, rather than spinning in read().
if (in.available() == 0) {
    Thread.sleep(1000);
    if (in.available() == 0) {
        return;
    }
}
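
For context, this is roughly where the check sits for us – a minimal sketch rather than our production code (the SubmissionStreamer name and the copy loop are illustrative; we stream submission files out ourselves, so we control the code around the FileInputStream):

import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;

public class SubmissionStreamer {

    // Copies a stored submission file to an output stream. The available()
    // guard is the temporary workaround: if the stream claims to have no
    // bytes, wait a second and bail out rather than letting read() spin.
    static void copy(FileInputStream in, OutputStream out)
            throws IOException, InterruptedException {
        if (in.available() == 0) {
            Thread.sleep(1000);
            if (in.available() == 0) {
                return;
            }
        }
        byte[] buffer = new byte[8192];
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
        }
    }
}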

Latest Status

With this code fix in place we appear to have been relatively stable for the past 4 days. It’s still early days, but at the moment this certainly appears to be the solution.

We have noted that we’re running Java 1.6.0_21 on the box, which is quite dated – indeed the support matrix for GlassFish 3 suggests a minimum of 1.6.0_26. We’ve been testing our application with the latest version of Java 6 and all appears good. We’ll look to move to that version, and then see whether removing the temporary fix gives us a more permanent solution.

It’s worth noting that we have experienced this on both GlassFish 3.1.1 and GlassFish 3.1.2 (we upgraded one of our servers during a stage of panic in the hope GF would have resolved the issue).

17 thoughts on “Java, GlassFish v3, High CPU and Memory Usage, Locked Threads, Death”

  1. Hi,

    we’re currently having the same problem. I saw you seem to have it solved, but I don’t understand your fix. Where did you put this in? Does one have to recompile the JDK to use it? Any help would be highly appreciated 🙂

    Best,

    K.B.

    1. Hi KB,

      This fix comes in your own code – not in the JDK, JRE or JVM.

      After you’ve opened your input stream, add the in.available() check shown in the post above before you start reading. We’re testing next week whether a later version of the JRE makes any difference – which version are you currently at?

      Cheers,
      Greg

      1. Hi Gregor,

        so this basically means that every time a FileInputStream is used we need to call it that way? That’s nearly impossible for us, as we don’t consume the files directly but use ModeShape as a JCR repo, which handles the abstraction to disk, as well as JasperReports etc. Changing all those libs – no clue how to do that.

        I upgraded today to JDK 1.6.0_32 and hope it behaves better now. Currently it’s fine, but last time it took about 24 hours until the problem appeared. However, I made a mistake by using a JRE up front – the GlassFish docs point out that a JRE is not enough to run GlassFish successfully. I’ll keep you informed if you like.

        During my search through the web I noticed that most of the posts about high CPU/stuck threads etc. mentioned Ubuntu 10.04 LTS – do you think it might be related to the OS?

      2. Ah – in which case best of luck! We do the streaming directly ourselves, so we at least have a little control!

        I don’t think it’s OS-related – we experienced it on Solaris and the original post on Stack Overflow hinted at Windows and Linux (can’t remember if it was Ubuntu).

        We did at first consider it being Solaris based, but suspect not – especially if you’re getting similar symptoms.

        Fingers crossed 6.0_32 solves it for you – will be really interesting to hear one way or the other.

      3. Thanks,

        if 1.6.0_32 doesn’t solve it, I’ll either need to move to another app server or maybe try JDK 7u4 – either way it’s more of a “hope”.

        Did you try different appservers/ versions that aren’t affected?

      4. No, we stuck completely with GlassFish. We’ve been successful with it for a number of years so didn’t want to lose that history. It was a point of discussion though.

        We didn’t experience this on earlier versions of GlassFish. If you can replicate it outside of a production environment it might be worth trying GlassFish v2. During my research I came across a number of discussion posts from people ‘degrading’ to solve the issue.

      5. For the last 13:18 hours the system has been running well, CPU usage is low and there are no errors in the log; I’ll keep you informed.

  2. My apps require Java EE 6 as they use EJB 3.1 and CDI a lot – so GF v2 is not an option.

    Do you use any special -XX config options?

    I have
    -XX:+UseConcMarkSweepGC
    -XX:+CMSPermGenSweepingEnabled
    -XX:+CMSClassUnloadingEnabled
    -server
    -XX:MaxPermSize=384m
    -XX:LargePageSizeInBytes=256m

    For the last 4 hours the server has been running well, but last time it took 23 hours till the CPU started growing, so I’m still hoping it’s OK now…

  3. Just a little update to the original post – we’ve now been running with the temporary fix, Java 6.0_21 and GF 3.1.2 for approximately 2 weeks and no issues.

    Next week we’re looking to upgrade Java to 6.0_32 to see what difference, if any, it makes – although with the feedback from KB it looks unlikely anything will change.

    1. Can you share your Http-Thread-Pool config settings as well as the JVM config the server is running with?

  4. @gregorbowie:

    I dug a bit deeper; seemingly the size allowed for the thread pool makes some kind of difference – I reduced it for testing purposes to min 5/max 5, and for the last 17 hours the server has run without problems. It’s quite interesting though.

    Besides that, I saw that OpenJDK seems to differ in those IO operations and might be worth another try – even if it’s not officially supported, GlassFish runs with OpenJDK 7. An alternative is trying JRockit…
