Extreme Performance Tuning

There are many articles about basic performance tuning of Java applications. They all discuss simple techniques such as using a StringBuffer instead of a String, and the overhead of the synchronized keyword.

This article doesn't cover any of this. Instead we focus on tips that can help make your Web-based application faster and highly scalable. Some tips are detailed, others brief, but all should be useful. I end with some recommendations that you can present to your manager.

I was inspired to write this article when a co-worker and I were reminiscing about our dot-com days: how we designed systems that supported thousands of users, wrote tight code, and hit aggressive deadlines. Sometimes there's a trade-off between designing for reuse and designing for performance. Based on my background, performance wins every time. Your business customers understand fast-performing systems even if they don't necessarily understand code reuse. Let's get started on our tips.

How to Use Exceptions
Exceptions degrade performance. Throwing an exception first requires creating a new object. The constructor of the Throwable class calls a native method named fillInStackTrace(), which walks the stack to collect trace information. Throwing the exception then requires the VM to unwind the call stack until it finds a matching handler.
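The cost described above is paid when the exception object is constructed, not only when it is thrown. The sketch below shows this by creating exceptions without ever throwing them. The LightweightException class is an illustration only: it relies on the protected four-argument Throwable/Exception constructor added in Java 7 (well after this article was written), whose writableStackTrace flag skips the stack walk entirely.

```java
public class StackTraceCost {

    // An ordinary exception: the Throwable constructor calls fillInStackTrace()
    static class OrdinaryException extends Exception {
        OrdinaryException(String message) { super(message); }
    }

    // Hypothetical lightweight exception using the Java 7+ constructor
    // Exception(message, cause, enableSuppression, writableStackTrace);
    // with writableStackTrace = false no stack walk is performed
    static class LightweightException extends Exception {
        LightweightException(String message) { super(message, null, false, false); }
    }

    // Reports how many stack frames an exception captured when it was created
    static int framesCaptured(Exception e) {
        return e.getStackTrace().length;
    }

    public static void main(String[] args) {
        // Neither exception is thrown, yet the ordinary one already holds a trace
        System.out.println(framesCaptured(new OrdinaryException("ordinary")));
        System.out.println(framesCaptured(new LightweightException("lightweight"))); // 0
    }
}
```

Suppressing the trace makes debugging harder, so it is at most an option for exceptions you deliberately reuse for expected, high-frequency conditions; the advice in the next paragraph (don't use exceptions for control flow at all) remains the better fix.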

Exceptions should be used for error conditions only, not for control flow. I had the opportunity to see code at a site that specializes in marketplaces for wireless content (name intentionally withheld) where the developer could have used a simple comparison to check whether an object was null. Instead, he or she skipped this check and actually threw a NullPointerException.
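A minimal sketch of the two styles, using a hypothetical Customer class that is not from the original code: the first method lets the VM create and throw a NullPointerException for an expected case, while the second handles the same case with a single comparison and never creates an exception object.

```java
public class NullCheckDemo {

    // Hypothetical domain class used only for illustration
    static class Customer {
        private final String name;
        Customer(String name) { this.name = name; }
        String getName() { return name; }
    }

    // Anti-pattern: uses an exception for an expected condition (control flow)
    static String displayNameViaException(Customer customer) {
        try {
            return customer.getName();
        } catch (NullPointerException e) {
            return "guest";
        }
    }

    // Preferred: a simple comparison; no exception object is created
    static String displayName(Customer customer) {
        return (customer != null) ? customer.getName() : "guest";
    }

    public static void main(String[] args) {
        System.out.println(displayName(null));                // guest
        System.out.println(displayName(new Customer("Ann"))); // Ann
    }
}
```

Both methods return the same results; only the second avoids the object creation and stack walk described above.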

Don't Initialize Variables Twice
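Java initializes instance variables to known default values when an object is created, before any constructor body runs. All object references are set to null, integer types (byte, short, int, long) are set to 0, float and double are set to 0.0, char is set to the null character, and booleans are set to false. Explicitly assigning these same values again in a field declaration or constructor simply repeats that work. This is especially important if the class extends another class, because every constructor in the chain is called automatically when you create an object with the new keyword, so redundant initializations add up across the hierarchy. (Local variables are the exception: they get no default value, and the compiler requires you to assign them before use.)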
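A short sketch of the difference, with hypothetical class names: Redundant re-assigns every default, while Lean relies on the values the VM has already set. The main method simply confirms the defaults.

```java
public class DefaultValues {

    // Redundant: each assignment repeats what the VM already did
    static class Redundant {
        int count = 0;
        double ratio = 0.0;
        boolean done = false;
        Object ref = null;
    }

    // Preferred: rely on the defaults; assign only non-default values
    static class Lean {
        int count;      // defaults to 0
        double ratio;   // defaults to 0.0
        boolean done;   // defaults to false
        Object ref;     // defaults to null
        char initial;   // defaults to the null character (code point 0)
    }

    public static void main(String[] args) {
        Lean lean = new Lean();
        // prints "0 0.0 false null"
        System.out.println(lean.count + " " + lean.ratio + " " + lean.done + " " + lean.ref);
    }
}
```

How much the redundant assignments actually cost depends on the compiler and VM; a profiler is the way to find out whether it matters in your code.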

Use Alternatives to the New Keyword
As previously mentioned, creating an instance of a class with the new keyword calls every constructor in the chain. If you need a new instance of a class, you can instead use the clone() method of an object whose class implements the Cloneable interface. Object.clone() doesn't invoke any class constructors; it copies the existing object's fields. Note that this copy is shallow: fields that refer to other objects end up shared between the original and the clone.

If you've used design patterns as part of your architecture and use the factory pattern to create objects, the change will be simple. Listed below is the typical implementation of the factory pattern.

public static Account getNewAccount() {
    return new Account();
}

The refactored code using the clone method may look something like this:

private static final Account BASE_ACCOUNT = new Account();

public static Account getNewAccount() {
    // Account must implement Cloneable and expose a public clone() method
    return (Account) BASE_ACCOUNT.clone();
}

The same approach also works for arrays, whose clone() method is public and returns a shallow copy. If you're not using design patterns within your application, I recommend that you stop reading this article and run (don't walk) to the bookstore and pick up a copy of Design Patterns by the Gang of Four.
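The refactored factory only compiles if Account cooperates with cloning. A minimal, self-contained sketch follows; the currency and balanceCents fields and the deposit method are hypothetical stand-ins for whatever state a real Account holds. clone() is overridden as public with a covariant return type, and the checked CloneNotSupportedException that Object.clone() declares is handled inside it.

```java
public class Account implements Cloneable {

    // Hypothetical state; a real Account would have its own fields
    private String currency = "USD";
    private long balanceCents;

    // Prototype that every new Account is copied from
    private static final Account BASE_ACCOUNT = new Account();

    // Factory method: copies the prototype instead of calling a constructor
    public static Account getNewAccount() {
        return BASE_ACCOUNT.clone();
    }

    // Public override; Object.clone() copies the fields without running constructors
    @Override
    public Account clone() {
        try {
            return (Account) super.clone();
        } catch (CloneNotSupportedException e) {
            // Cannot happen: this class implements Cloneable
            throw new AssertionError(e);
        }
    }

    public String getCurrency() { return currency; }
    public long getBalanceCents() { return balanceCents; }
    public void deposit(long cents) { balanceCents += cents; }
}
```

Because the copy is shallow, this works cleanly here only because the fields are primitives and an immutable String. If Account held mutable objects (a list of transactions, say), clone() would also need to copy those, or every new account would share them with the prototype. Whether cloning is actually faster than new on a given VM is worth confirming with a profiler.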

Make Classes Final Whenever Possible
Classes that are tagged as final can't be extended. There are many examples of this technique in the core Java APIs, such as java.lang.String. Tagging the String class as final prevents developers from creating their own implementation of the length method.

Furthermore, if a class is final, all of its methods are implicitly final. The compiler or VM may take the opportunity to inline final methods (this depends on the compiler's implementation). In my testing I've seen performance increase by an average of 50%.
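A small sketch of the mechanics, with a hypothetical Money class: marking the class final means no subclass can override cents(), which is what gives a compiler or VM the freedom to inline it. The reflection check confirms that java.lang.String is declared final, as mentioned above. It says nothing about whether inlining actually happens, which is up to the VM.

```java
import java.lang.reflect.Modifier;

public class FinalCheck {

    // Hypothetical final class: it cannot be subclassed,
    // so cents() can never be overridden
    public static final class Money {
        private final long cents;
        public Money(long cents) { this.cents = cents; }
        public long cents() { return cents; }
    }

    // True if the class was declared final
    static boolean isFinal(Class<?> type) {
        return Modifier.isFinal(type.getModifiers());
    }

    public static void main(String[] args) {
        System.out.println(isFinal(String.class)); // true
        System.out.println(isFinal(Money.class));  // true
        System.out.println(isFinal(Object.class)); // false
    }
}
```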

Use Local Variables Whenever Possible
Method arguments and temporary variables declared within a method are stored on the stack, which is fast. Static and instance variables, along with objects created with new, live on the heap, which is slower to access. How much further local variables are optimized depends on which compiler/VM you're using.
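One common way to apply this tip is to copy a field into a local variable before a hot loop and to accumulate results in a local rather than in a field. The class and method names below are hypothetical. Modern JIT compilers can often perform this transformation themselves, so treat it as something to verify with a profiler rather than a guaranteed win.

```java
public class LocalCaching {

    // Instance state: lives on the heap and is reached through 'this'
    private final int[] values;
    private long total;

    LocalCaching(int[] values) { this.values = values; }

    // Reads 'values' and reads/writes 'total' through 'this' on every iteration
    long sumIntoField() {
        total = 0;
        for (int i = 0; i < values.length; i++) {
            total += values[i];
        }
        return total;
    }

    // Copies the field reference into a local once and accumulates in a local;
    // the array itself is still on the heap, only the reference and sum are local
    long sumWithLocals() {
        final int[] data = values;
        long sum = 0;
        for (int i = 0; i < data.length; i++) {
            sum += data[i];
        }
        total = sum; // single write back to the field
        return sum;
    }
}
```

Both methods compute the same result; the second keeps the loop's working variables local.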

Use Nonblocking I/O
Versions of the JDK before 1.4 don't provide nonblocking I/O APIs. Many applications attempt to avoid blocking by creating a large number of threads (hopefully drawn from a pool), but creating threads in Java carries significant overhead. You typically see this thread-based approach in applications that need to support concurrent I/O streams, such as Web servers and quote and auction components.

JDK 1.4 introduces a nonblocking I/O library (java.nio). If you must remain on an earlier version of the JDK, there are third-party packages that have added support for nonblocking I/O: www.cs.berkeley.edu/~mdw/proj/java-nbio/download.html.
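A minimal sketch of the java.nio approach. To keep it self-contained it uses an in-process Pipe instead of a network socket; the same configureBlocking(false) / Selector pattern applies to SocketChannel. It also uses try-with-resources and StandardCharsets, which require Java 7 or later. The read before any data arrives returns 0 immediately instead of parking the thread, and a single thread then waits on the selector for whichever registered channel becomes ready.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.charset.StandardCharsets;

public class NonBlockingPipeDemo {

    // Returns "<bytes read before data arrived>:<text read after select()>"
    public static String run() throws IOException {
        Pipe pipe = Pipe.open();
        try (Selector selector = Selector.open();
             Pipe.SourceChannel source = pipe.source();
             Pipe.SinkChannel sink = pipe.sink()) {

            // Put the read end in nonblocking mode and register interest in reads
            source.configureBlocking(false);
            source.register(selector, SelectionKey.OP_READ);

            ByteBuffer buf = ByteBuffer.allocate(64);
            // No data yet: a nonblocking read returns 0 instead of blocking
            int early = source.read(buf);

            sink.write(ByteBuffer.wrap("quote".getBytes(StandardCharsets.UTF_8)));

            // One thread waits here for any registered channel to become ready
            selector.select(1000);
            for (SelectionKey key : selector.selectedKeys()) {
                if (key.isReadable()) {
                    ((Pipe.SourceChannel) key.channel()).read(buf);
                }
            }
            selector.selectedKeys().clear();

            buf.flip();
            return early + ":" + StandardCharsets.UTF_8.decode(buf);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(run());
    }
}
```

In a server, the selector would have many socket channels registered and the loop would run continuously, so a handful of threads can service many concurrent streams.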

Stop Being Clever
Many developers code with reuse and flexibility in mind and sometimes introduce additional overhead into their programs. At one time or another they've written code similar to:

public void doSomething(File file) throws IOException {
    FileInputStream fileIn = new FileInputStream(file);
    // do something
}

It's good to be flexible, but in this scenario they've created more overhead. The idea behind doSomething is to manipulate an InputStream, not a file, so it should be refactored as follows:

public void doSomething(InputStream inputStream) throws IOException {
    // do something
}

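A runnable sketch of the refactoring, where "do something" is replaced by a hypothetical byte count. The InputStream version accepts files, sockets, or in-memory buffers alike. A File convenience overload is kept only to show that the caller who has a File opens and closes the stream itself.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class StreamProcessor {

    // Works with any InputStream; counting bytes stands in for "do something"
    public static long countBytes(InputStream in) throws IOException {
        byte[] buffer = new byte[8192];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            total += n;
        }
        return total;
    }

    // Callers holding a File open the stream themselves and close it afterward
    public static long countBytes(File file) throws IOException {
        try (InputStream in = new FileInputStream(file)) {
            return countBytes(in);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream memory = new ByteArrayInputStream("hello".getBytes(StandardCharsets.UTF_8));
        System.out.println(countBytes(memory)); // 5
    }
}
```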
Multiplication and Division
Too many of my peers count on Moore's Law, the observation that the number of transistors on a chip (and, roughly, CPU power) doubles about every two years. The "McGovern Law" states that the amount of bad code being written by developers triples every year, ruling out any benefit to Moore's Law. Consider the following code:

int shiftX, myRaise;
for (int val = 0; val < 100000; val += 5) { shiftX = val * 8; myRaise = val * 2; }

If we use bit shifting instead, performance can increase by up to six times. Here's the refactored code:

int shiftX, myRaise;
for (int val = 0; val < 100000; val += 5) { shiftX = val << 3; myRaise = val << 1; }

Instead of multiplying by 8, we used the equivalent left shift (<<) by 3. Each one-bit shift to the left multiplies the value by 2, as the variable myRaise demonstrates. Likewise, shifting right (>>) by one bit divides a non-negative value by 2. Of course this makes execution faster, but it may be difficult for your peers to understand at a later date; therefore it should be commented.
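A quick check of the equivalences, with hypothetical helper names. Left shifts match multiplication by a power of two for every int value, including when the result overflows. Right shifts match division only for non-negative values: for negative numbers, >> rounds toward negative infinity while / truncates toward zero. Many modern JIT compilers already turn multiplication by a constant power of two into a shift, so it is worth measuring before hand-optimizing.

```java
public class ShiftDemo {

    // Same result as value * 8 for every int, including on overflow
    static int timesEight(int value) {
        return value << 3;
    }

    // Same result as value / 2 only when value is non-negative
    static int halve(int value) {
        return value >> 1;
    }

    public static void main(String[] args) {
        System.out.println(timesEight(25)); // 200
        System.out.println(halve(25));      // 12
        // Caveat for negative values:
        System.out.println(halve(-7));      // -4 (rounds toward negative infinity)
        System.out.println(-7 / 2);         // -3 (truncates toward zero)
    }
}
```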

Choosing a VM Based on Its Garbage Collection Implementation
Many people would be surprised that the Java Virtual Machine specification doesn't dictate how garbage collection must be implemented; the choice of technique is left to each VM implementer. The garbage collector is responsible for finding and throwing away (hence "garbage") objects that are no longer needed. It must determine which objects are no longer referenced by the program and free the heap memory those objects consume. It's also responsible for running any finalizers on objects being freed.

While garbage collection helps ensure program integrity by not letting you free memory explicitly (so you can never free an object that is still in use), this process also incurs overhead, because the JVM must decide when the garbage collector runs and how much CPU time it gets. Garbage collectors generally take one of two approaches.

Garbage collectors that implement reference counting keep a count for each object on the heap. When a reference to an object is assigned to a variable, the object's count is incremented; when that reference goes out of scope or is overwritten, the count is decremented. When the count reaches zero, the object can be garbage collected. This approach lets the collector do its work in small increments interleaved with the execution of the program. Reference counting doesn't work well in applications in which parent and child objects hold references to each other: the counts in such a cycle never drop to zero, even after the rest of the program can no longer reach either object. There's also the overhead of incrementing and decrementing the reference count every time a reference is created or dropped.
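A toy model of reference counting makes the parent/child problem concrete. This is an illustration only, not how any particular JVM works; Node, retain, release, and link are hypothetical names. The program drops its only reference to the parent, but because the child still points back at it, neither count reaches zero and neither node is freed.

```java
import java.util.ArrayList;
import java.util.List;

// Toy reference-counting model, not a real JVM collector
public class RefCountDemo {

    static class Node {
        final String name;
        final List<Node> children = new ArrayList<>();
        int refCount;
        boolean freed;
        Node(String name) { this.name = name; }
    }

    // A new reference to the node has been stored somewhere
    static void retain(Node n) {
        n.refCount++;
    }

    // A reference was dropped; at zero the node is freed
    // and the references it holds are released in turn
    static void release(Node n) {
        if (--n.refCount == 0) {
            n.freed = true;
            for (Node child : n.children) {
                release(child);
            }
        }
    }

    // Makes 'from' hold a reference to 'to'
    static void link(Node from, Node to) {
        from.children.add(to);
        retain(to);
    }

    public static void main(String[] args) {
        Node parent = new Node("parent");
        retain(parent);          // a program variable refers to parent
        Node child = new Node("child");
        link(parent, child);     // parent -> child
        link(child, parent);     // child -> parent: a cycle

        release(parent);         // the program drops its only reference
        // parent's count is still 1 (held by child), so nothing is freed
        System.out.println(parent.freed + " " + child.freed); // false false
    }
}
```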

Garbage collectors that implement tracing follow references outward starting from the root nodes (such as local variables and static fields), and every object found while tracing is marked. After this process is complete, any unmarked objects are known to be unreachable and can be garbage collected. The marks may be kept in a separate bitmap or as flags in the objects themselves. This technique is referred to as "Mark and Sweep."
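A toy mark-and-sweep pass over an explicit object graph, again an illustration rather than a real collector; Obj, mark, and sweep are hypothetical names, and the marks are stored as a flag in each object. Unlike reference counting, the b/c cycle is collected because nothing reachable from the root points at it.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Toy mark-and-sweep model over an explicit object graph
public class MarkSweepDemo {

    static class Obj {
        final String name;
        final List<Obj> refs = new ArrayList<>();
        boolean marked; // the "flag in the object" variant of marking
        Obj(String name) { this.name = name; }
    }

    // Mark phase: flag everything reachable from the roots
    static void mark(List<Obj> roots) {
        Deque<Obj> work = new ArrayDeque<>(roots);
        while (!work.isEmpty()) {
            Obj o = work.pop();
            if (!o.marked) {
                o.marked = true;
                work.addAll(o.refs);
            }
        }
    }

    // Sweep phase: keep marked objects, drop the rest, clear marks for next cycle
    static List<Obj> sweep(List<Obj> heap) {
        List<Obj> survivors = new ArrayList<>();
        for (Obj o : heap) {
            if (o.marked) {
                o.marked = false;
                survivors.add(o);
            }
        }
        return survivors;
    }

    public static void main(String[] args) {
        Obj root = new Obj("root"), a = new Obj("a"), b = new Obj("b"), c = new Obj("c");
        root.refs.add(a);
        b.refs.add(c);
        c.refs.add(b); // b <-> c form a cycle that nothing else references

        mark(Arrays.asList(root));
        List<String> names = new ArrayList<>();
        for (Obj o : sweep(Arrays.asList(root, a, b, c))) {
            names.add(o.name);
        }
        System.out.println(names); // [root, a]
    }
}
```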

Recommendations for Your Manager
Other approaches can be used to make your Web-based application faster and more scalable. The easiest technology to implement is usually a strategy that supports clustering. With a cluster, a group of servers can work together to transparently provide services. Most application servers allow you to gain clustering support without having to change your application - a big win. Of course you may need to consider additional licensing charges from your application server vendor before taking this approach.

When looking at clustering strategies, there will be many additional things to consider. One flaw frequently made in architecture is relying on stateful sessions. If a server or process in the cluster crashes, the cluster will usually fail the application over to another member. For this to happen, the cluster has to constantly replicate the state of the session bean to all members in the cluster. Make sure you also limit the size and number of objects stored in the session, as these will need to be replicated.
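One rough way to see what a session attribute costs to replicate is to measure its serialized size. This sketch assumes your application server replicates session state using standard Java serialization, which is common but not universal, so check your server's documentation. The attribute names and contents are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;

public class SessionSizeCheck {

    // Number of bytes the attribute occupies once Java-serialized
    static int serializedSize(Serializable attribute) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(attribute);
        }
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical attributes: a small key versus a cached result list
        String customerId = "C-1001";
        ArrayList<String> cachedQuotes = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            cachedQuotes.add("QUOTE-" + i);
        }
        System.out.println("customerId:   " + serializedSize(customerId) + " bytes");
        System.out.println("cachedQuotes: " + serializedSize(cachedQuotes) + " bytes");
    }
}
```

Storing the small key in the session and re-fetching the large list when needed keeps the replicated state small.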

Clusters also allow you to scale portions of your Web site in increments. If you need to scale static portions, you can add Web servers. If you need to scale dynamically generated parts, you can add application servers.

After you've put your system in a cluster, the next recommended approach to making your application run faster is choosing a better VM. Look at the HotSpot VM or other VMs that perform optimization on the fly. Along with the VM, it's a good idea to look at a better compiler.

If you've employed several industry techniques plus the ones mentioned here and still can't achieve the scalability and high availability you seek, then I recommend a solid tuning strategy. The first step in this strategy is to examine the overall architecture for potential bottlenecks. These are usually easy to recognize in your UML diagrams as single-threaded components or components with many connecting lines attached.

The final step is to conduct a detailed performance assessment of all code. Make sure your management has set aside at least 20% of the total project time for this undertaking; otherwise, insufficient time may not only compromise your overall success but also cause you to introduce new defects into the system.

Many organizations are also guilty of not having the proper test beds in place due to cost considerations. Make sure your QA environment mirrors your production environment, and that your QA tests exercise the application at different loads, including a low load and a fully scaled load based on the maximum anticipated number of concurrent users. Tests that gauge the stability of a system may require running different scenarios over the course of days, even weeks.

Under no circumstances should you undertake tuning an application without a profiler. We use OptimizeIt, but Sitraka's JProbe and NuMega's profiler are also good. These tools will show you bottlenecks in your code, such as threads that are blocked by other threads, unused objects that survive garbage collection, and excessive object creation. Once you've captured the output of these tools, make simple changes and limit the scope of those changes to things that will make your code faster. Don't worry about reuse, style issues, or anything other than performance. Usually the easily identifiable bottlenecks will be contained within loops and algorithms.

About the Author

James McGovern is an industry thought leader and the author of the bestselling book A Practical Guide to Enterprise Architecture (Prentice Hall). He is working on two upcoming books entitled Agile Enterprise Architecture and Enterprise SOA. He is employed as an Enterprise Architect for The Hartford Financial Services Group, Inc. He holds industry certifications from Microsoft, Cisco and Sun. He is a member of the Java Community Process and of the Worldwide Institute of Software Architects.
