Testing large web applications can be really challenging. For this effort we focused on testing the site on 4 different fronts:
- Unit testing
- Functional testing
- UI testing
- Performance testing
Unit Testing
We invested quite a bit of time in covering our code with as many unit tests as possible. The decision on how to develop unit tests was pretty easy, given that we are working with Django: we use Django's testing framework together with the unittest module shipped in django.utils. Currently we have an overall coverage of ~85%, and here is an example of how it changes over time.
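To give a flavor, here is a minimal sketch of such a test using Django's test client; the URL and the expected content are hypothetical, not taken from our actual suite.

```python
from django.test import TestCase


class HomePageTest(TestCase):
    """Minimal example: fetch a page with Django's dummy test client and check the response."""

    def test_home_page_renders(self):
        response = self.client.get('/')              # built-in dummy client, no browser involved
        self.assertEqual(response.status_code, 200)
        self.assertContains(response, 'Find work')   # hypothetical piece of expected content
```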
Functional Testing
Selenium is the standard way of performing functional testing: it simulates a user's browsing behavior on a site, and it is what we use for oDesk's functional testing as well. Since version 1.4, Django natively supports Selenium testing through LiveServerTestCase. As the Django documentation puts it:
[...] LiveServerTestCase allows the use of automated test clients other than the Django dummy client such as, for example, the Selenium client, to execute a series of functional tests inside a browser and simulate a real user's actions.

For every basic group of pages oDesk's visitor site supports, extensive functional tests are written to identify and report broken pieces of the site.
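As an illustration, here is a minimal sketch of what such a Selenium-backed test looks like with LiveServerTestCase; the assertion about the page is hypothetical.

```python
from django.test import LiveServerTestCase
from selenium import webdriver


class VisitorSiteTest(LiveServerTestCase):
    """Drive a real browser (Firefox via Selenium) against Django's live test server."""

    @classmethod
    def setUpClass(cls):
        cls.browser = webdriver.Firefox()
        super(VisitorSiteTest, cls).setUpClass()

    @classmethod
    def tearDownClass(cls):
        super(VisitorSiteTest, cls).tearDownClass()
        cls.browser.quit()

    def test_home_page_loads(self):
        self.browser.get(self.live_server_url + '/')
        self.assertIn('oDesk', self.browser.title)   # hypothetical expectation about the page
```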
UI Testing
This is where things got really interesting. When we talk about UI testing we mainly mean testing the actual visual result that the user sees in their browser, and also making sure that the structure of the page is as expected. To give some examples of what our expectations were, we wanted to detect:
- broken images in the site
- broken layout on a page
- missing text
To cover these cases we looked into a few tools: Quality Bots, Fighting layout bugs, and validator.nu. All of those looked pretty promising and are open source.
Quality Bots
This tool is really promising. It is developed by Google and its primary goal is to cut down the regression test suite and provide free web testing at scale with minimal human intervention. UI testing in most frameworks is done via image comparison, which, although it sounds promising, is not a de facto industry quality assurance methodology. As described on the Quality Bots site:
[it] will crawl the website on a given platform and browser, while crawling it will record the HTML elements rendered at each pixel of the page. Later this data will be used to compare and calculate layout score.
The approach Quality Bots follows sounded really promising, but integrating such a tool into our infrastructure turned out to be more time consuming than we wanted, so we decided to defer it for later. However, I strongly recommend that anyone working on testing read through the Quality Bots wiki/code to understand how it works. Even if you don't end up using the tool, you can definitely get ideas from Google's testing procedure.
Fighting layout bugs (FLB)
Fighting layout bugs is a library for the automatic detection of layout bugs in web pages. It currently supports detection of the following (a rough illustration of the first check follows the list below):
- invalid image URLs
- text near or overlapping horizontal edge
- text near or overlapping vertical edge
- text with too low contrast
- elements with invisible focus
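FLB itself is a Java library that hooks into Selenium tests, so what follows is not FLB code: it is a rough Python/Selenium sketch of the first check in the list, detecting broken image URLs on a page, shown only to illustrate the idea.

```python
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By


def find_broken_images(url):
    """Load a page in a real browser and report <img> sources that do not return HTTP 200."""
    browser = webdriver.Firefox()
    broken = []
    try:
        browser.get(url)
        for img in browser.find_elements(By.TAG_NAME, 'img'):
            src = img.get_attribute('src')
            if not src:
                continue
            try:
                status = requests.head(src, allow_redirects=True, timeout=5).status_code
            except requests.RequestException:
                status = None
            if status != 200:
                broken.append(src)
    finally:
        browser.quit()
    return broken
```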
validator.nu
As a sanity/lint check, we also validate the structure of our HTML, since invalid HTML can easily lead to ugly layout bugs. validator.nu is used by the W3C for HTML5 validation. It validates HTML5, SVG, MathML, RDF, IRI and more, and it also runs as a standalone service. So for us it was a no-brainer to use it.
We integrated it by implementing a middleware. This middleware sends the page content to a local instance of validator.nu in process_response. An HtmlValidationError is raised when the HTML is invalid; in this case, we add the list of HTML errors to the response and output it at the bottom of the page. Here is an example of how it looks:
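As for the middleware itself, here is a rough sketch of the idea, assuming an old-style (pre-1.10) Django middleware class, a local validator.nu instance listening on port 8888, and its out=json interface; apart from the HtmlValidationError name mentioned above, everything here is a simplified stand-in for our actual implementation.

```python
import requests


class HtmlValidationError(Exception):
    """Raised when validator.nu reports errors for the rendered page."""


class HtmlValidationMiddleware(object):
    """Post HTML responses to a local validator.nu service and surface its errors."""

    # Assumed local instance; out=json asks validator.nu for machine-readable results.
    VALIDATOR_URL = 'http://localhost:8888/?out=json'

    def process_response(self, request, response):
        if 'text/html' not in response.get('Content-Type', ''):
            return response
        result = requests.post(
            self.VALIDATOR_URL,
            data=response.content,
            headers={'Content-Type': 'text/html; charset=utf-8'},
        ).json()
        errors = [m['message'] for m in result.get('messages', []) if m.get('type') == 'error']
        if errors:
            # Our real middleware raises HtmlValidationError and renders the error list
            # at the bottom of the page; appending an HTML comment keeps this sketch short.
            response.content += ('<!-- HTML validation errors: %s -->' % errors).encode('utf-8')
        return response
```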
Performance Testing
We use various tools to test our site's performance. A well-known one is Apache's ab, a tool for benchmarking Apache's HTTP server: it shows how many requests per second (RPS) an Apache installation is capable of serving.
We also use Apache JMeter and bash scripts to produce heavy load on our servers and test how they hold up under different load types. With those tests (a rough sketch of the idea follows the list below):
- we check response codes for various groups of pages
- we measure the min, max, average response time for accessing these links
- we display the success rate for accessing all of the links
- we issue random requests to our servers with various concurrency levels
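The JMeter test plans and bash scripts are hard to reproduce here, so here is a rough Python sketch of what these measurements boil down to, assuming a simple thread pool and an entirely hypothetical group of URLs.

```python
import time
from concurrent.futures import ThreadPoolExecutor

import requests

URLS = ['http://localhost:8000/', 'http://localhost:8000/jobs/']   # hypothetical link group
CONCURRENCY = 20                                                    # concurrent workers
REPEAT = 50                                                         # how many times to hit the group


def timed_get(url):
    """Fetch one URL and return (status_code or None, elapsed seconds)."""
    start = time.time()
    try:
        status = requests.get(url, timeout=10).status_code
    except requests.RequestException:
        status = None
    return status, time.time() - start


def run_load_test(urls, concurrency, repeat):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(timed_get, urls * repeat))
    timings = [elapsed for _, elapsed in results]
    successes = [status for status, _ in results if status == 200]
    print('min/avg/max response time: %.3f / %.3f / %.3f s'
          % (min(timings), sum(timings) / len(timings), max(timings)))
    print('success rate: %.1f%%' % (100.0 * len(successes) / len(results)))


if __name__ == '__main__':
    run_load_test(URLS, CONCURRENCY, REPEAT)
```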
Last but not least, something we are currently looking into is a log replay mechanism for measuring our performance. In general, with performance testing we can test various loads against specific URLs, but the traffic we produce is not realistic. With log replay functionality we can "replay" requests based on Apache's access log, which lets us measure our performance under traffic produced by real users.
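To make the idea concrete, here is a minimal sketch of replaying the GET requests recorded in an Apache access log (common/combined format) against a target host; it is an illustration only, not the tool we are evaluating, and it ignores request timing, headers and POST bodies.

```python
import re

import requests

# First quoted field of Apache's common/combined log format: "METHOD /path HTTP/1.x"
REQUEST_RE = re.compile(r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')


def replay_log(log_path, base_url):
    """Re-issue the GET requests found in an Apache access log against base_url."""
    with open(log_path) as log:
        for line in log:
            match = REQUEST_RE.search(line)
            if not match or match.group('method') != 'GET':
                continue                       # skip malformed lines and non-GET requests
            response = requests.get(base_url + match.group('path'), timeout=10)
            print(response.status_code, match.group('path'))


if __name__ == '__main__':
    replay_log('/var/log/apache2/access.log', 'http://localhost:8000')
```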
If you are interested in reading more about performance testing, I would strongly recommend going through this resource: http://www.igvita.com/2008/09/30/load-testing-with-log-replay/; it's really useful.
Testing Results Presentation
All our tests are run with a single Fabric command, to which we can pass arguments to disable specific stages if we want to. This command is invoked in every build we run via Jenkins, and if our tests fail, the build fails too. Code coverage, counts of failing tests, screenshots of broken layouts (found via UI testing) and, soon, performance results are all presented as graphs in Jenkins. Here are a few example screenshots:
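As for the single Fabric command itself, here is a rough sketch of the idea, assuming Fabric 1.x and entirely hypothetical task, flag and script names.

```python
from fabric.api import local, task


@task
def test(skip_ui='no', skip_perf='no'):
    """Run all test stages; e.g. `fab test:skip_ui=yes` disables the UI stage (hypothetical)."""
    local('coverage run manage.py test')      # unit + functional (Selenium) tests with coverage
    local('coverage xml')                     # coverage report picked up by Jenkins
    if skip_ui != 'yes':
        local('python run_ui_checks.py')      # hypothetical wrapper around the UI/layout checks
    if skip_perf != 'yes':
        local('python run_perf_tests.py')     # hypothetical wrapper around the load tests
```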