Microservices Expo: Article

Performing Under Pressure | Part 1

Load-Testing with Multi-Mechanize

Many types of performance problems can result from the load created by concurrent users of web applications, and all too often these scalability bottlenecks go undetected until the application has been deployed in production. Load-testing, the generation of simulated user requests, is a great way to catch these types of issues before they get out of hand. I recently presented on load testing with Canonical's Corey Goldberg at the Boston Python Meetup and thought the topic deserved blog discussion as well.

In this two-part series, I'll walk through generating load using the Python multi-mechanize load-testing framework, then collect and analyze data about app performance using Tracelytics.

Also, a request: there's mechanize documentation available, but I unfortunately haven't found any full documentation of the Python mechanize API online. Post a comment if you know where to find it!

Meet the app: Reddit
The web app that I'll be using for all the examples in this post is an open source reddit running on a single node in EC2. You don't need to understand how it works in order to enjoy this post, but if you do want to play along, there is a super-easy install script that sets up a whole stack.

Generating the data
Performance testing can start off simple: hit pages in your app, and monitor how long they take to load. You can automate this using something like the mechanize library in Python, or even something lower-level like httplib/urllib2. This is a good start, but today we're looking for concurrency as well.
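As a rough illustration of the "hit a page and time it" idea, here is a minimal sketch using only the standard library (urllib.request is the Python 3 successor to urllib2). The function name time_page_load and the fetch parameter are hypothetical, introduced here only so that the fetch step can be swapped out.

```python
import time
from urllib.request import urlopen


def time_page_load(url, fetch=urlopen):
    """Return the seconds taken to fetch `url` and read the whole body.

    `fetch` is a hypothetical hook (defaults to urlopen) so the network
    call can be replaced, for example in a unit test.
    """
    start = time.perf_counter()
    with fetch(url) as response:
        response.read()  # include the body download in the measurement
    return time.perf_counter() - start
```

Calling time_page_load with the address of your own app returns a single latency sample; it does nothing about concurrency, which is the gap the rest of this post addresses.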

Enter multi-mechanize. Multi-mechanize takes simple request or transaction simulation scripts you've written and fires them repeatedly from many threads simultaneously in configurable patterns. It's as easy as writing a few scripts that simulate users doing different actions on your website (login, browse, submit comment, etc.) and then writing a short config that tells multi-mechanize how to play them back.

Since our site is reddit, I'm going to write a few scripts that read discussion threads, one that posts comments, and one that votes on comments. I'll walk through writing two of the scripts: a simple read-only request, and a more complicated one that logs in and submits a comment. The rest of them are available in the full source of the load tests on github.

A simple mechanize script

First, the test is wrapped in a Transaction class; this is how multi-mechanize will run each of your scripts. The __init__() is called once per worker thread, then run() will be invoked repeatedly to generate the requests of your transaction. For development and debugging, it's easier to just run the scripts individually, so the __main__ block at the end provides that functionality.

All of the work here is happening in the run() method. A mechanize browser is instantiated, and our simple request for the front page of Reddit is performed. Finally, we make sure that a valid page was returned.

There's one more thing: custom timers. Multi-mechanize can collect timing information about the requests it performs. If you store that information in a dict named custom_timers on the Transaction, it will be able to generate charts of the data later.
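A script matching the description above might look roughly like the following. This is a sketch, not the original listing: the URL, the timer name 'Front_Page', and the make_browser helper are assumptions, and the __main__ block mentioned earlier is left out for brevity.

```python
import time


class Transaction(object):
    def __init__(self):
        # Called once per worker thread. multi-mechanize reads this dict
        # after each run() to collect per-request timings.
        self.custom_timers = {}
        self.url = 'http://localhost/'  # assumed address of the test site

    def make_browser(self):
        # Hypothetical helper; mechanize is a third-party package, imported
        # here so the module can still be loaded without it.
        import mechanize
        br = mechanize.Browser()
        br.set_handle_robots(False)
        return br

    def run(self):
        # Called repeatedly: fetch the front page and time the request.
        br = self.make_browser()
        start = time.time()
        resp = br.open(self.url)
        resp.read()
        self.custom_timers['Front_Page'] = time.time() - start
        # Make sure a valid page came back.
        assert resp.code == 200, 'Bad HTTP response: %s' % resp.code
```

To try it by hand, instantiate Transaction, call run() once, and print custom_timers.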

A more complicated script
Now, let's take a look at a slightly more complicated script. This one posts a comment on a particular story, so it'll have to take the following actions:

  • Log in as a user
  • Open thread page on Reddit
  • Post comment

It's a bit longer, so I've broken it up according to the bullets above with accompanying explanation.  The full version of the example can be found here:  https://gist.github.com/1529242

The first thing that's happening here is familiar: pulling up a page in our mechanize browser. In order to log in, though, we need to start interacting with forms on the page, and this means tweaking some of the default mechanize browser settings. We need to set three attributes on our browser: follow 30x redirects (lets the login redirect back), send the Referer header (validation for comment post), and ignore robots.txt rules (Reddit doesn't like robots playing human).
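The three settings described above correspond to three setter methods on a mechanize Browser. A minimal sketch, with configure_browser being a hypothetical helper name:

```python
def configure_browser(br):
    """Apply the three settings to a mechanize.Browser-like object."""
    br.set_handle_redirect(True)   # follow 30x redirects after login
    br.set_handle_referer(True)    # send the Referer header on requests
    br.set_handle_robots(False)    # do not fetch or obey robots.txt
    return br
```

In a real script you would pass in a freshly created mechanize.Browser() before opening any pages.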

After that, it's on to forms. The mechanize browser interface with forms is pretty simple: you can list all the forms on the page with browser.forms(), select a form to interact with using select_form, and then manipulate the fields of the form using the browser.form object.

select_form  can take a variety of selection predicates, most of which revolve around using attributes such as the form's CSS ID. Our example, Reddit, doesn't have much identifying information associated with forms, so I've used numeric selection to grab them. The login form happens to be form 1.
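The form handling just described can be sketched as follows. The helper names are hypothetical, and the field names 'user' and 'passwd' are assumptions about Reddit's login form that you should check against the actual page (describe_forms is one way to look).

```python
def describe_forms(br):
    """List (index, name) for every form on the current page."""
    return [(index, form.name) for index, form in enumerate(br.forms())]


def log_in(br, username, password, form_index=1):
    """Fill in and submit the login form, selected by position."""
    br.select_form(nr=form_index)   # numeric selection, as in the text
    br.form['user'] = username      # assumed field name
    br.form['passwd'] = password    # assumed field name
    return br.submit()
```

Selecting by position is brittle: if the page layout changes, the index changes with it, so it is worth re-running describe_forms whenever a login starts failing.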

Pretty straightforward: head to the thread page now that we're logged in.  Now we want to actually submit the comment.  Here's the heavy lifting:

Comment submission is a little bit different because it works via AJAX. The mechanize browser doesn't process JavaScript, meaning that we'll have to take things into our own hands here. So, we inspect two forms to grab the state information that JavaScript on the page would use to submit the form, and we construct our own request manually. (Form 0 provides the 'uh' value in a hidden field; form 12 is the top-level comment submit form.)
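To make the manual request concrete, here is a sketch of the two pieces: pulling a hidden value out of a form, and building the POST body by hand. Only the 'uh' field comes from the description above; the 'thing_id' and 'text' field names and both helper names are assumptions for illustration.

```python
from urllib.parse import urlencode


def read_hidden_value(br, form_index, field_name):
    """Select a form by position and return one of its field values."""
    br.select_form(nr=form_index)
    return br.form[field_name]


def build_comment_payload(modhash, thing_id, text):
    """URL-encode the fields the page's JavaScript would have sent."""
    return urlencode({
        'uh': modhash,          # hidden value taken from form 0
        'thing_id': thing_id,   # assumed name: ID of the story being replied to
        'text': text,           # assumed name: body of the comment
    })
```

The resulting string is what you would pass as the data argument when opening the comment endpoint with the browser, which turns the request into a POST.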

In this simple example, user credentials are provided in __init__. However, a more realistic example might involve many different users logging in. In the code on github, I've written a user pool implementation that takes care of this problem by instantiating a pool of logged-in users for each script (then, each invocation of run can check out a different user).
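The check-out idea can be sketched with a thread-safe queue. This is not the implementation from the github repository, just one possible shape; the class name, method name, and the use of (username, password) tuples are all assumptions. In practice the pooled items could just as well be logged-in browser objects.

```python
import queue
from contextlib import contextmanager


class UserPool(object):
    """Hand out each pooled user to at most one caller at a time."""

    def __init__(self, users):
        self._available = queue.Queue()
        for user in users:
            self._available.put(user)

    @contextmanager
    def checked_out(self, timeout=None):
        # Blocks until a user is free (or raises queue.Empty on timeout).
        user = self._available.get(timeout=timeout)
        try:
            yield user
        finally:
            # Always return the user, even if the transaction failed.
            self._available.put(user)
```

Inside run(), a script would wrap its work in "with pool.checked_out() as user:" so that concurrent threads never share the same account.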

(Debugging note for those playing along: if comments are not showing up in the thread but are showing up in the user's profile, that means that some of the background jobs may not be running correctly. The site re-caches the comment tree asynchronously after posts.)

Running the full load test
After writing a few individual mechanize scripts, the final step is putting them all together with a multi-mechanize config. Multi-mechanize organizes load tests in terms of "projects" which are represented by subdirectories of a directory called projects. Each project contains a config file and a directory called test_scripts which contains your individual load test scripts. It should look like this:
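One way to see the layout is to create it. The sketch below builds the directory skeleton described above; scaffold_project is a hypothetical helper, and the config file name config.cfg is an assumption.

```python
from pathlib import Path


def scaffold_project(root, project_name, config_name='config.cfg'):
    """Create projects/<project_name>/ with a config file and test_scripts/."""
    project_dir = Path(root) / 'projects' / project_name
    (project_dir / 'test_scripts').mkdir(parents=True, exist_ok=True)
    (project_dir / config_name).touch()  # empty config, to be filled in
    return project_dir
```

Your individual Transaction scripts then go inside the test_scripts directory.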

The config file specifies how long the load test should run for, whether it should ramp up the amount of pain or keep it constant,  a few output and statistics settings, and of course the number of threads and scripts you want to run. Here's an example config:

Runtime sets the duration of the test, in seconds. Ramp up, if nonzero, tells multi-mechanize to linearly increase the number of threads up to the specified numbers during the ramp-up period.
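For reference, a config along these lines is sketched below, embedded in a short Python snippet that parses it with the standard configparser module. The values, script names, and the summarize helper are illustrative assumptions rather than the settings used for the results in this post; the key names follow multi-mechanize's sample config as I remember it, so check them against your installed version.

```python
import configparser

# Illustrative config text; values and script names are made up.
EXAMPLE_CONFIG = """
[global]
run_time = 1800
rampup = 1800
results_ts_interval = 30
progress_bar = on
console_logging = off
xml_report = off

[user_group-1]
threads = 20
script = read_front_page.py

[user_group-2]
threads = 5
script = post_comment.py
"""


def summarize(config_text):
    """Return the run length, ramp-up time, and total thread count."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    groups = [s for s in parser.sections() if s.startswith('user_group')]
    return {
        'run_time': parser.getint('global', 'run_time'),
        'rampup': parser.getint('global', 'rampup'),
        'total_threads': sum(parser.getint(g, 'threads') for g in groups),
    }
```

With these values, the test would run for 30 minutes and spend the whole run ramping up to 25 threads across the two user groups.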

And here's how to invoke them, finally:

Learning from our load tests
Multi-mechanize collects statistics about timing information that you provide in your tests (custom_timers) and dumps the output in a results subdirectory of your project. This can easily be plotted in your favorite graphics package.  Here's an example of the average times from a read load increasing over 30 minutes:
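If you would rather compute the averages yourself before plotting, the bucketing step can be sketched like this. It assumes you have already parsed the results into (elapsed_seconds, latency_seconds) pairs; the parsing itself is not shown, and bucket_averages is a hypothetical helper.

```python
from collections import defaultdict


def bucket_averages(samples, bucket_seconds=60):
    """Average latency per time bucket.

    `samples` is an iterable of (elapsed_seconds, latency_seconds) pairs.
    Returns a sorted list of (bucket_start_seconds, mean_latency).
    """
    totals = defaultdict(lambda: [0.0, 0])  # bucket -> [latency sum, count]
    for elapsed, latency in samples:
        bucket = int(elapsed // bucket_seconds)
        totals[bucket][0] += latency
        totals[bucket][1] += 1
    return [(bucket * bucket_seconds, total / count)
            for bucket, (total, count) in sorted(totals.items())]
```

The output pairs can be handed directly to whatever plotting tool you prefer, with bucket start time on the x-axis and mean latency on the y-axis.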

Ok, so it's getting slower, but why? These timers treat the application like a black box: they'll show you that it can be slow, but you won't know why or what layers of the stack are slow. In the next article, we'll talk about how to gather actionable data from your load tests.

Related Articles

Performing under pressure, pt. 2: Collecting and visualizing load-test performance data

Python and Gevent

Tracing Python - An API

More Stories By Dan Kuebrich

Dan Kuebrich is a web performance geek, currently working on Application Performance Management at AppNeta. He was previously a founder of Tracelytics (acquired by AppNeta), and before that worked on AmieStreet/Songza.com.
