Wednesday, September 5, 2012

Test infrastructure of a webapp

Testing large web applications can be really challenging. For this effort we focused on testing the site on four fronts:
  1. Unit testing
  2. Functional testing
  3. UI testing
  4. Performance testing

Unit Testing

We invested quite a bit of time in covering our code with as many unit tests as possible. The decision on how to develop unit tests was pretty easy, given that we are working with Django: we used Django's testing framework together with the unittest library that Django exposes through django.utils. Currently we have an overall coverage of ~85%, and here is an example of how it changes over time.
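To give a feel for the style of test involved, here is a minimal sketch using only the standard library's unittest. The page_title helper and its behaviour are hypothetical, made up for illustration and not taken from our codebase; in a Django project the test class would typically subclass django.test.TestCase instead.

```python
import unittest


def page_title(name, site="oDesk"):
    """Hypothetical helper: build the <title> text for a page."""
    name = name.strip()
    return "%s | %s" % (name, site) if name else site


class PageTitleTest(unittest.TestCase):
    def test_named_page(self):
        self.assertEqual(page_title(" Pricing "), "Pricing | oDesk")

    def test_blank_name_falls_back_to_site(self):
        self.assertEqual(page_title("   "), "oDesk")


def run_suite():
    """Run the tests programmatically and return the TestResult."""
    suite = unittest.TestLoader().loadTestsFromTestCase(PageTitleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

Calling run_suite() returns a result object whose wasSuccessful() method tells a build script whether to fail the build.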

Functional Testing

Selenium is the standard way of performing functional testing. It simulates a user's browsing behavior on a site, and it is also what we use for oDesk's functional testing. Since version 1.4, Django has natively supported Selenium testing.
[...] LiveServerTestCase allows the use of automated test clients other than the Django dummy client such as, for example, the Selenium client, to execute a series of functional tests inside a browser and simulate a real user's actions.
For every basic group of pages oDesk's visitor site supports, extensive functional tests are written to identify and report broken pieces of the site.

UI Testing

This is where things got really interesting. When we talk about UI testing we mainly mean testing the actual visual result that the user sees in their browser, and also making sure that the structure of the page is as expected. To give some examples of what our expectations were, we wanted to detect:
  • broken images in the site
  • broken layout on a page
  • missing text
There are various tools that can do this, but integration is not always easy. The tools that we ended up researching were:
  1. Quality Bots
  2. Fighting layout bugs
  3. validator.nu
All of those looked pretty promising and are open source.

Quality Bots

This tool is really promising. It is developed by Google, and its primary goal is to reduce the regression test suite and provide free web testing at scale with minimal human intervention. UI testing in other frameworks is usually done via image comparison, but even though that sounds promising it is not a de facto industry standard for quality assurance. As described on the Quality Bots site:
[it] will crawl the website on a given platform and browser, while crawling it will record the HTML elements rendered at each pixel of the page. Later this data will be used to compare and calculate layout score.

The approach Quality Bots follows sounded really promising, but integrating such a tool into our infrastructure turned out to be more time-consuming than we wanted, so we decided to defer it. However, I strongly recommend that anyone working on testing read through the Quality Bots wiki/code to understand how it works. Even if you don't end up using the tool, you can definitely get ideas out of Google's testing procedure.

Fighting layout bugs (FLB)

Fighting layout bugs is an automatic library for the detection of layout bugs. It currently supports detection for the following scenarios:
  • invalid image URLs
  • text near or overlapping horizontal edge
  • text near or overlapping vertical edge
  • text with too low contrast
  • elements with invisible focus
All these scenarios are commonly found in software, and instead of catching them manually we integrated FLB with our framework so that detection happens automatically. FLB is written in Java, and we integrated it with Django through py4j. The py4j gateway server is started automatically by the fabric script that executes the tests. FLB is used with Firefox via the WebDriver implementation provided by Selenium. FLB test cases are invoked each time the selenium.get method is executed. Here is how this is implemented:

validator.nu

As a sanity/lint check, we also validate the structure of our HTML. Invalid HTML can easily lead to ugly layout bugs. validator.nu is used by the W3C for HTML5 validation. It validates HTML5, SVG, MathML, RDF, IRI and more, and it also runs as a standalone service, so for us it was a no-brainer to use it. We integrated it by implementing a middleware that sends the content to a local instance of validator.nu on process_response. An HtmlValidationError is raised when the HTML is invalid. In this case, we add the list of HTML errors to the response and output it at the bottom of the page; here is an example of how it looks:
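The flow of that middleware can be sketched roughly as follows. This is a simplified, framework-free illustration and not our actual middleware: the validate argument is a stand-in for the HTTP call to the local validator.nu instance, and the function names, the html-errors class and the markup of the error list are all assumptions.

```python
from html import escape


class HtmlValidationError(Exception):
    """Raised when the validator reports errors; carries the messages."""

    def __init__(self, errors):
        super().__init__("%d HTML validation error(s)" % len(errors))
        self.errors = errors


def check_html(html, validate):
    """Call `validate` (a stand-in for the request to the local
    validator.nu instance) and raise if it returns any messages."""
    errors = validate(html)
    if errors:
        raise HtmlValidationError(errors)


def append_errors(html, errors):
    """Render the errors as a list just before </body> (or at the end)."""
    items = "".join("<li>%s</li>" % escape(e) for e in errors)
    block = '<ul class="html-errors">%s</ul>' % items
    head, sep, tail = html.rpartition("</body>")
    return head + block + sep + tail if sep else html + block


def process_response(html, validate):
    """Middleware-style hook: return the page, annotated if invalid."""
    try:
        check_html(html, validate)
    except HtmlValidationError as exc:
        return append_errors(html, exc.errors)
    return html
```

Passing the validator in as a callable keeps the sketch testable without a running validator service; a real middleware would work on the response object rather than on a bare string.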

Performance Testing

We use various tools to test our site's performance. A well-known one is Apache's ab. ab is a tool for benchmarking the Apache HTTP server. It shows how many requests per second (RPS) an Apache installation is capable of serving.

We also use Apache JMeter and bash scripts to put heavy load on our servers and test how they hold up under different load types. With those tests:
  • we check response codes for various groups of pages
  • we measure the min, max, average response time for accessing these links
  • we display the success rate for accessing all of the links
  • we issue random requests to our servers with various concurrency levels
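As a rough illustration of the numbers we collect, the summary for one group of pages could be computed like this. The input format (a list of status code and response time pairs) and the choice to count 2xx and 3xx responses as successes are assumptions made for the sketch; JMeter and ab produce their own reports.

```python
def summarize(samples):
    """samples: list of (status_code, seconds) pairs for one group of pages."""
    if not samples:
        raise ValueError("no samples")
    times = [seconds for _, seconds in samples]
    # Assumption: any 2xx or 3xx response counts as a success.
    ok = sum(1 for status, _ in samples if 200 <= status < 400)
    return {
        "min": min(times),
        "max": max(times),
        "avg": sum(times) / len(times),
        "success_rate": ok / len(samples),
    }
```

For example, summarize([(200, 0.25), (500, 0.75)]) reports a success rate of 0.5 and an average response time of 0.5 seconds.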

Last but not least, something that we are currently looking into is a log replay mechanism to measure our performance. With ordinary performance testing we can test with various loads and for some specific URLs, but the traffic we produce is not realistic. With log replay we can "replay" requests based on Apache's access log. This way, we can measure our performance under traffic that was produced by real users.
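The first half of a log replay tool is turning the access log into a replay plan. Here is a minimal sketch of that step, assuming Apache's common or combined log format and replaying GET requests only; the function names are made up, and actually issuing the requests (and choosing the concurrency) is left out.

```python
import re
from datetime import datetime

# Matches the leading fields of Apache's common/combined log format.
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def parse_access_log(lines):
    """Yield (timestamp, method, path) for each parseable log line."""
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip malformed lines
        when = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        yield when, match.group("method"), match.group("path")


def replay_plan(lines, methods=("GET",)):
    """Return (delay_seconds, path) pairs preserving the original pacing."""
    entries = [e for e in parse_access_log(lines) if e[1] in methods]
    if not entries:
        return []
    entries.sort(key=lambda e: e[0])
    start = entries[0][0]
    return [((when - start).total_seconds(), path) for when, _, path in entries]
```

Each delay is the offset from the first request in the log, so a replayer can sleep until that offset before issuing the request, or scale the offsets down to compress the traffic.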

If you are interested in reading about performance testing, I strongly recommend going through this resource: http://www.igvita.com/2008/09/30/load-testing-with-log-replay/. It's really useful.

Testing Results Presentation

All our tests are run with a single fabric command, to which we can pass arguments to disable specific stages if we want to. This command is invoked in every build we run via Jenkins, and if our tests fail, the build also fails. Code coverage, counts of failing tests, screenshots of broken layouts (found via UI testing) and soon performance results are all presented with graphs in Jenkins. Here are a few example screenshots:




Friday, July 13, 2012

Building a structured and distributed CMS

As a first step of oDesk's rebranding project, we had to move from our home-grown CMS to a more robust and full-featured one. Keeping in mind that we would eventually need to develop the rest of the visitor site on the same infrastructure, and after benchmarking drupal vs django-cms (see the analysis here), we picked django-CMS. So currently all our static pages are built via our django-CMS installation.

Building static pages with reusable SCSS components


One of our primary goals was to be able to use reusable CSS components in our static pages, so that we don't end up with a lot of page-specific CSS. We were already using Compass to define CSS components, but we wanted to expose them in our CMS as well, so we had to create a custom plugin for django-CMS. The plugin includes 5 basic fields which you can see here:


And the way this works is the following:

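from cms.models import CMSPlugin
from django.db import models


class SassStylesheetPlugin(CMSPlugin):
    name = models.CharField(...)
    css_media = models.CharField(...)
    scss_body = models.TextField(...)
    compiled_css_body = models.TextField(...)
    upload = models.BooleanField(...)

    def _get_compiled_css(self):
        # scss_compiler and compressor are set up elsewhere (not shown here).
        compiled_body = scss_compiler.compile_scss(self.scss_body)
        content = '<style type="text/css" media="%s">%s</style>' % (
            self.css_media, compiled_body)
        if not self.upload:
            # No upload requested: keep the compiled CSS inline in the page.
            return content
        # Upload requested: return the compressor's output instead.
        return compressor.output()

    def save(self, *args, **kwargs):
        # Recompile the SCSS every time the plugin is saved.
        self.compiled_css_body = self._get_compiled_css()
        return super(SassStylesheetPlugin, self).save(*args, **kwargs)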

The plugin reads the input SCSS and compiles it to a CSS output. The compressor we use is django-compressor and the SCSS compiler we use is pyScss.

This way writing static pages in our CMS becomes easier and less messy. CMS administrators can @import reusable SCSS components that are globally defined in our stylesheets and use them in their pages. As a result we can have static pages that are using less CSS code and actually look and feel similar to the rest of the site.

Using S3 as a shared resource


One of the things we always have in mind while developing the visitor site is that we are operating in a distributed environment in the cloud—our services are replicated on multiple EC2 instances in multiple availability zones. As a result, a single server instance failure or even a whole availability zone failure should not impact our reliability.

When admins create static pages or upload content such as images, these files need to be stored in a central location so that they can be accessed by all servers. S3 provides an ideal solution for this, as it is a storage service that can also serve the content directly to the visitors.

Every time a page on the CMS is modified, we automatically compile the SCSS to CSS, create a file, upload it to S3 (just as we do with images and other files) and then append a link to the stylesheet to the HTML of the page.
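The last two steps (upload the file, then link to it) can be sketched like this. It is only an illustration under assumptions: upload(key, data) is a hypothetical callable standing in for whatever S3 client is used, base_url is the public URL prefix of the bucket, and naming the file after a hash of its content is a choice made for this sketch, not necessarily what our plugin does.

```python
import hashlib
from html import escape


def publish_stylesheet(css, upload, base_url, media="all"):
    """Upload compiled CSS and return the <link> tag to append to the page.

    `upload(key, data)` is a hypothetical callable standing in for the
    S3 client; `base_url` is the public URL prefix of the bucket.
    """
    data = css.encode("utf-8")
    # Assumption: name the file after its content so each change gets a new URL.
    key = "css/%s.css" % hashlib.md5(data).hexdigest()
    upload(key, data)
    href = "%s/%s" % (base_url.rstrip("/"), key)
    return '<link rel="stylesheet" href="%s" media="%s">' % (
        escape(href, quote=True), escape(media, quote=True))
```

Because every server only needs the returned tag, any instance can render the page without having the compiled file on its local disk.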

Monday, July 2, 2012

New oDesk brand launched!

After a long period of silence, I'm back! About a year ago I transitioned to oDesk, where I got the chance to learn and implement many new technologies.

The new oDesk visitor site (i.e. whatever users see before they sign up) is built on a Django stack, using Apache and running on multiple EC2 instances, with an RDS backend. We also use continuous integration with Jenkins and have an automated deployment process which allows us to push our code changes to multiple servers seamlessly. On the front end we are using HTML5, Compass and OOCSS to make our code cleaner and more reusable, and django-compressor to minify it.

The road was long, rough and exciting. I learned a great many things and am still learning during this project...since it's not over yet. :)

I'll try to share my experiences in the following months, but for now please browse through the odesk site as a visitor and enjoy the experience.

Stay tuned for more updates.

Monday, September 5, 2011

Performance Comparison between django-cms and drupal

Recently, I ran a benchmark to compare the performance of a django-cms installation against a drupal one. The results were pretty interesting. Let me start by pointing you to this article I found online, which is the most comprehensive comparison between drupal and django I've seen.

I'll summarize the points of that article here, but I'd strongly recommend reading it to get the big picture. The author focuses on eight areas:

  • Templating system
  • OOP
  • MVC/MTV
  • ORM
  • Learning Curve
  • Community Resources
  • Security
  • Performance

Let's take a look at each one briefly.

Templating system

Django uses a clean inheritance-based templating system which makes it easy to nest templates within templates, use different templates for different parts of the site and completely eliminate redundancy in template code. In contrast, with Drupal it can be difficult to customize templates.

OOP

Python is an object-oriented language by nature, and so is Django. PHP, by contrast, is not inherently object oriented.

MVC/MTV

Django uses the model-view-controller architecture (which is known as MTV within the Django community). MVC is the most widely accepted architecture for building webapps. Drupal follows the PAC (Presentation-Abstraction-Control) model rather than MVC.

ORM

In Django, you define the data models that describe the site. All models have datatypes and can depend on one another. From the defined data models, database tables are auto-generated, and the system becomes internally “aware” of data relationships. To query the database you use the Object Relational Mapper (ORM). So rather than writing SQL queries, in Django one writes “querysets” like:
customPage = CustomPages.objects.filter(author='foo')
This way we avoid long, error-prone SQL queries, which helps with both security and development effort. Drupal does not have an ORM. I should also note that using an ORM is not always ideal. Keep in mind that with ORM-based queries we do not see the actual SQL that runs behind the scenes. For simple, flat queries the ORM is fine, but as queries get more complicated it may become a performance headache.

Learning Curve / Language

Drupal is often accused of having a steep learning curve. I would argue that learning django/python is way easier, but I am probably biased towards Python, since I am familiar with the language.

Security

It’s true that Drupal has a history of security issues, whereas similar issues in django are extremely rare.

Performance

In this section the article fell a bit short, so I tried to find out which one is the winner. Below are the results of what I found.

Performance Comparison

I ran a benchmark of a django-cms vs drupal installation on my laptop, using FunkLoad. The results in the following table were measured on that laptop, which has 8 cores (2GHz Intel i7) and 8 GB of memory. For these tests mod_wsgi is configured to spawn 20 processes. Be aware: by default mod_wsgi uses just one process. Both the drupal and django pages tested are pretty simple HTML pages with minimal content (one 12K CSS file, no JS). I have to note here that the django-cms page included one plugin, which was loading a news item.

Installation   Concurrent Users   RPS (requests per second)   Response Time (sec)
Django-cms     50                 99                          0.25
Drupal         50                 111                         0.23

Below you can find more detailed graphs.

[Graphs: Drupal (concurrent users; requests per sec)]

[Graphs: Django-cms (concurrent users; requests per sec)]

Results

As you can see from the results, the difference in performance between the two installations is negligible. Django-cms loads the page 0.02 sec slower, although this is probably related to the fact that the django-cms page included one plugin while drupal's page was pretty much empty ("plugin free"). The bottom line is that there is no major difference between the two frameworks performance-wise. So at the end of the day it's up to the developer/decision-maker to pick the language they prefer and build a CMS with one of the two frameworks. Performance shouldn't be a concern.





References
  1. Django documentation
  2. Drupal documentation
  3. PAC
  4. MTV
  5. Drupal or Django?

Saturday, April 2, 2011

When does DOM access slow down rendering?

Page speed is becoming a pretty hot topic. There are quite a few tools out there that help developers make their UIs more responsive. My personal favorite is Page Speed, which has existed for a while for Firefox and was recently released for Chrome and as an online tool. Tools like this can give you precious information about your website's performance, but if your JS execution still takes a long time, then repaints and reflows could be the main cause of your problem.

How does a browser render a page?

  1. Browser reads the HTML and constructs a DOM tree (tag=node, text=text node and root=html element)
  2. Browser parses the CSS. Keep in mind the order in which rules take precedence (listed from strongest to weakest):
    • IDs with !important declaration in a rule
    • Classes with !important declaration in a rule
    • Elements with !important declaration in a rule
    • Inline styles
    • IDs
    • Classes
    • Elements
  3. Browser constructs the render tree, which is the DOM tree but with all styling rules applied. Every node in the render tree is called a frame or a box (from the W3C box model).
  4. Last but not least, the browser starts drawing (painting) the render tree nodes on the screen.
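The precedence order in step 2 can be expressed as a sort key. The sketch below is written in Python purely for illustration (browsers obviously don't do this in Python), it only understands simple selectors made of ids, classes and element names, and the rule format (a dict with selector, value and important keys, where a selector of None means an inline style) is an assumption of the sketch.

```python
import re


def specificity(selector):
    """Count (ids, classes, elements) in a simple selector such as
    'div#main .note p'. Attribute selectors, pseudo-classes and
    combinators other than whitespace are not handled."""
    ids = len(re.findall(r"#[\w-]+", selector))
    classes = len(re.findall(r"\.[\w-]+", selector))
    elements = len(re.findall(r"(?:^|\s)[a-zA-Z][\w-]*", selector))
    return ids, classes, elements


def rank(rule):
    """Sort key: larger tuples win. `rule` is a dict with keys
    'selector' (None for an inline style), 'value' and 'important'."""
    inline = rule["selector"] is None
    counts = (0, 0, 0) if inline else specificity(rule["selector"])
    return (rule["important"], inline) + counts


def winning_value(rules):
    """Return the value of the strongest rule; later rules win ties."""
    best = None
    for rule in rules:
        if best is None or rank(rule) >= rank(best):
            best = rule
    return best["value"]
```

Comparing the tuples left to right reproduces the list: anything marked !important beats anything that is not, inline styles beat selector-based rules, and after that ids outrank classes, which outrank elements.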

Be aware! Repaints and reflows may be really expensive!

When we change the data that was used to construct the render tree, we force the browser to do one of the following:
  • part or all of the render tree is revalidated and node dimensions are recalculated. This is called reflow. There's always at least one reflow, for the initial construction of the render tree.
  • parts of the screen are updated (because of a change in a node's dimensions or a stylistic change). This is called repaint, or redraw.

As most of us may already know, a script's running time is usually spent executing the JS byte code. What we may not realize is that a lot of this time is often also spent performing DOM operations triggered by the script. Reflow is one example of these. The larger and more complex the DOM, the more expensive this operation may be. Reflows and repaints may cause your UI to appear sluggish.

What calls cause repaints and reflows?

  • Add, remove, update DOM nodes
  • Hide a DOM node with display: none (reflow and repaint)
  • Hide a DOM node with visibility: hidden (repaint only)
  • Move or animate a DOM node
  • Add a stylesheet
  • User action like resizing window, changing font size, or scrolling

Can we be more specific?

Sure we can...so here is a list of properties and methods that can cause reflow in webkit as they are given by Tony Gentilcore:

Elements
clientHeight, clientLeft, clientTop, clientWidth, focus(), getBoundingClientRect(), getClientRects(), innerText, offsetHeight, offsetLeft, offsetParent, offsetTop, offsetWidth, outerText, scrollByLines(), scrollByPages(), scrollHeight, scrollIntoView(), scrollIntoViewIfNeeded(), scrollLeft, scrollTop, scrollWidth

Frame, image
height, width

Range
getBoundingClientRect(), getClientRects()

SVGLocatable
computeCTM(), getBBox()

SVGTextContent
getCharNumAtPosition(), getComputedTextLength(), getEndPositionOfChar(), getExtentOfChar(), getNumberOfChars(), getRotationOfChar(), getStartPositionOfChar(), getSubStringLength(), selectSubString()

SVGUse
instanceRoot

Window
getComputedStyle(), scrollBy(), scrollTo(), scrollX, scrollY, webkitConvertPointFromNodeToPage(), webkitConvertPointFromPageToNode()

So how can we fix that?

Don't worry, there is still hope! Browsers are becoming smarter and smarter and they are trying to save us some time. They realize that these operations cost a lot, so they help by queueing the changes our scripts require and performing them in batches. The queue keeps growing for a while and then at some point it's flushed, causing only one reflow. But we have to be careful...
  • offsetTop, offsetLeft, offsetWidth, offsetHeight, scrollTop/Left/Width/Height, clientTop/Left/Width/Height and getComputedStyle() force the browser to reflow in order to return the correct values. So be cautious and access these properties only when you really need them.
  • Batch methods that manipulate the DOM separately from those that query its state. If this is not possible, perform your DOM changes "offline", i.e. use a documentFragment to hold your changes temporarily.
  • Change class names rather than individual styles. If this is not possible, set the cssText property once rather than setting individual style properties one by one.
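To see why batching matters, here is a toy model of the queueing behaviour described above. It is written in Python rather than JavaScript and is not how any real layout engine works internally: ToyDocument, set_width and offset_width are invented names, and the only thing modelled is that a write marks the layout as stale and a read flushes it.

```python
class ToyDocument:
    """Toy model of a browser's layout bookkeeping (not a real engine)."""

    def __init__(self):
        self.widths = {}
        self.dirty = False   # True when queued style changes await layout
        self.reflows = 0

    def set_width(self, node, px):
        """A write: queue the change and mark the layout as stale."""
        self.widths[node] = px
        self.dirty = True

    def offset_width(self, node):
        """A read: flush queued changes first so the value is correct."""
        if self.dirty:
            self.reflows += 1
            self.dirty = False
        return self.widths.get(node, 0)


def interleaved(doc, nodes):
    for node in nodes:
        doc.set_width(node, 100)      # write
        doc.offset_width(node)        # read forces a reflow every time


def batched(doc, nodes):
    for node in nodes:
        doc.set_width(node, 100)      # all writes first
    for node in nodes:
        doc.offset_width(node)        # only the first read reflows
```

With three nodes, the interleaved version counts three reflows while the batched version counts one, which is the whole argument for separating DOM writes from DOM reads.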

In conclusion, we have to be aware of how browsers work and try to reduce the amount of work they have to perform; otherwise we end up with slow, unresponsive UIs that make users unhappy. When you build UIs, always keep in mind the render tree and how many changes the browser will have to make to it once your JS is executed.







Thursday, March 31, 2011

Yahoo User Interface 2.X Cookbook by Matt Snider

Recently I read Yahoo User Interface 2.X Cookbook by Matt Snider.

Here is what I think about it:

This is a well-written book. Beware... it doesn't teach JavaScript development. If you are not familiar with the basic concepts of JavaScript and event-driven programming, this is not the right book for you. The book focuses on the development of UIs using the YUI2 framework. It gets right to the point: it gives you a handful of examples of how to manipulate the DOM and use many different components of YUI2 (like Menus, Elements, Buttons, Drag&Drop), not in a quick and dirty way -- it shows you the right way to do it.

Other books have lengthy sections of text explaining all the details of some topic, followed by huge blocks of code towards the end of the chapter. This book is not written that way. It is organized into sections covering many different areas of the YUI2 framework. Within a section you'll find headings for the topics "Getting ready", "How to do it", "How it works" and "There's more", a series of instructions that spell out exactly what to do for a sample scenario.

"Getting ready" shows what you need to have already setup in your environment.
"How to do it" is a section showing coding steps you need to take to achieve your goal.
"How it works" is a section explaining step by step what the code given in previous section does.
"There's more" is a section giving additional notes and examples you may find useful in further development.

I believe that the YUI2 Cookbook is good for an experienced web developer who is tied to YUI2. However, I can't wait to see the YUI3 Cookbook. YUI2 is certainly a pretty good library, but YUI3 has way more potential. I wouldn't suggest that anyone start a project using YUI2 at this point, given that YUI3 is out there in pretty good shape and YUI4 is on its way.

Sunday, January 16, 2011

Upcoming CSS and JS Features that I'm looking forward to!

Recently I attended a talk from Tab Atkins about future CSS and JS features that are currently being developed by Chrome's team in Webkit, and will be proposed to the W3C as standards, once the team feels confident about them. He mentioned quite a few interesting things I'd like to see soon in both languages.

CSSOM (CSS object model -- http://dev.w3.org/csswg/cssom/)

From W3C's site:
CSSOM defines APIs (including generic parsing and serialization rules) for Media Queries, Selectors, and of course CSS itself.
In simple words, CSSOM will be the way that CSS and JS interact. With those APIs the browser will be able to give us meaningful values for DOM elements in JS. For example, so far when we need to get the CSS width of a DOM element what we usually do is:
document.getElementById("some_id").style.width
If we have defined "width: 100px" for the element (#some_id), what this returns is the string "100px". So we have to strip the "px" out of that string before we can use the number, and that is the lucky case where the width is defined in px. If it were defined in, say, ems, we would first have to convert the value to px and then manipulate it. With CSSOM we would be able to do:
document.getElementById("some_id").style.values.width.px
which will return the width of the element in px (e.g. 100). Cool, right?
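The manual work that CSSOM would save us looks roughly like the following. The sketch is in Python only to stay consistent with the other examples on this blog; it handles px and em only, and the 16px default font size used for the em conversion is an assumption (in a real page it depends on the element's computed font size).

```python
import re

# A number followed by a px or em unit, e.g. "100px" or "1.5em".
LENGTH = re.compile(r"^\s*(-?\d+(?:\.\d+)?)(px|em)\s*$")


def to_px(value, font_size_px=16.0):
    """Convert a CSS length string to a number of pixels."""
    match = LENGTH.match(value)
    if match is None:
        raise ValueError("unsupported length: %r" % value)
    number, unit = float(match.group(1)), match.group(2)
    # ems are relative to the font size, so they need extra context.
    return number if unit == "px" else number * font_size_px
```

Every script that reads a style value today has to carry some version of this parsing, which is exactly the boilerplate a typed accessor would remove.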

CSS Variables

At last, what we've all been waiting for! CSS Variables define stylesheet-wide values identified by a token, and they can be used in all CSS declarations. A recurring value, like a color or background-color, will be way easier to update if a developer only has to change it in one place instead of modifying every rule that uses that property:value pair. Here is the proposed syntax:
@var header-color color #006

.header {
   color: var(header-color);
}
We will even be able to get these variables' values from JS, in the following way:
dom.stylesheets[0].vars['header-color']
Notice that the type of the variable will need to be declared (in this case "color"). This will allow the browser to give more meaningful data to any JS that accesses the variable. For example, for a color, you can get
.red
to get the red component of the variable, etc. We will also be able to define local variables, which will have visibility only within a rule (with the @local keyword).

CSS Mixins (http://oocss.org/spec/css-mixins.html)

From OOCSS:
Mixins allow document authors to define patterns of property value pairs, which can then be reused in other rulesets. 
In simple words, a mixin is a set of declarations which we can define once and include in other rules. Here is an example:
@mixin clearfix {
  overflow: hidden;
  _overflow: visible;
  zoom: 1;
}

.mainContent {
  color: #222;
  @mixin clearfix;
}


CSS Nesting

There is usually a lot of repetition in long stylesheets. You may have run into something similar to the following rules:
#content >  ul {
  list-style-type:none;
  margin:0 2em;
}
#content > ul li {
  list-style-type:none;
  padding:2px 5px;
}
With CSS nesting that could be translated to a more compact and less verbose form, like:
#content {
  @this > ul {
    list-style-type:none;
    margin:0 2em;
    @this li {
       list-style-type:none;
       padding:2px 5px;
    }
  }
}
Wouldn't it be great to have all of the above features? According to Tab Atkins, all of them are still being developed by Chrome's engineers and are going to be implemented first in WebKit (Chrome and Safari) and then proposed to the W3C. So we should expect to see these things in their finalized form (and implemented by other browsers) by the end of this year. Those who really want to work with these features can download Chrome's dev channel and start playing with them. They should be coming really soon now. Enjoy! :)





References
Tab Atkins' slides from this talk: http://www.xanthir.com/talks/2011-01-12/slides.html