Mod_Pagespeed Is Now Out Of Beta

Understanding Amazon EBS

Redis and Memcached benchmarking on AWS

Why you should buy solid-state drives #SSD


Elasticsearch Index Warmup API: it really whips real-time search's ass!

If you are into search and have not checked out Elasticsearch yet, you are really doing yourself a disservice. Elasticsearch has a fantastic set of features: it’s very approachable, super hackable and unit-testable (in-memory nodes rock). It also scales to hundreds of nodes out of the box and provides automatic replication to boot. Combine that with simple administration and you have a real boon for tech startups that can’t afford dedicated sysadmins. All of the above contributed to our recent migration from Solr.

The Elasticsearch engine comes with real-time search capabilities out of the box, which have been documented before. That said, you still need to tune its settings (refresh times, merge sizes, GC new vs. tenured generation ratios, etc.) to your system’s usage patterns to get optimal performance out of it.

Even then, if you frequently add large numbers of new documents to your index like we do at Traackr, you can’t really escape the inevitable merges that have to occur in the background to keep your Lucene segments to a manageable number.

It’s either that or watch your search times slow to a crawl as your segments get out of control. What this means is that your caches will inevitably get cleared as old segments are merged into new ones. And if you happen to use a cache-heavy query like top_children, then some search requests get the short end of the stick as they sit around waiting for those caches to repopulate.

Enter the fantastic Index Warmup API, currently available in master. This feature lets you define warm-up queries that get executed as merges occur and caches get invalidated. Those queries run on separate threads, doing all the heavy data loading in the background while users keep running regular queries without interruption. As a result, you get an almost perfectly smooth request latency curve.
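As a rough sketch of what registering a warm-up query can look like: in the releases that shipped this feature, a warmer was an ordinary search body stored with an HTTP PUT to `/{index}/_warmer/{name}`. The Python below only builds that request and does not send it. The index name, warmer name and child type are hypothetical, and later Elasticsearch versions dropped the endpoint, so check the documentation for the release you run.

```python
import json
from urllib.request import Request


def build_warmer_request(base_url, index, warmer_name, search_body):
    """Build (but do not send) the PUT request that registers a warmer."""
    url = f"{base_url.rstrip('/')}/{index}/_warmer/{warmer_name}"
    data = json.dumps(search_body).encode("utf-8")
    return Request(
        url,
        data=data,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical cache-heavy search to pre-load: a top_children query
# over an assumed child type named "comment".
warmup_search = {
    "query": {
        "top_children": {
            "type": "comment",
            "query": {"match_all": {}},
        }
    }
}

# "articles" and "top_children_warmer" are made-up names for illustration.
request = build_warmer_request(
    "http://localhost:9200", "articles", "top_children_warmer", warmup_search
)
print(request.get_method(), request.full_url)
```

Passing `request` to `urllib.request.urlopen` would submit it to a running node. The body should mirror the expensive queries your users actually run, so that the same caches get loaded.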

If your requirements include near-real-time (NRT) search, you should absolutely give Elasticsearch a spin.

PS1: if you want to get great insights into how your Solr/Elasticsearch engine is behaving (pretty graphs above), try the performance monitor from Sematext.

PS2: for those of you too young to recognize the Nullsoft nod in the title, take a short stroll down memory lane. Since NRT search is a beast in and of itself, I figured it would be fitting.


Amazon now offers SSD-backed EC2 instances

Amazon just announced that they are now offering SSD-backed instances. This is pretty awesome. I have been hearing great feedback on performance from everybody using SSDs for I/O-intensive applications, obviously. We can now finally try them on AWS. Here are the specs they offer for now:

  • 8 virtual cores, clocking in at a total of 35 ECU (EC2 Compute Units)
  • 60.5 GB of RAM
  • 10 Gigabit Ethernet connectivity
  • 2 TB of local SSD-backed storage

The only downside? It ain’t cheap: $3.10/h


Rajiv's blog: Scaling lessons learned at Dropbox, part 1

Maven - Separating Unit / Integration Tests

We use Maven as our internal build tool for our Java projects, and most of the time it works great. But recently, when trying to separate the execution of our integration tests from our unit tests, I ran into an unexpected hurdle.

Our internal code is a multi-module project, where each module provides its set of unit tests and integration tests for our internal build to execute. If the tests are successful (among other things), then each module is deployed as a snapshot JAR to our internal Nexus repository. Deploying snapshots allows our developers to work on other areas of our project’s code-base without having to build all internal dependencies locally.

Now, we need our internal build to be a) fast and b) reliable, because we commit changes frequently throughout the day and want to build and deploy code just as often. As you would guess, integration tests are exactly the opposite: slow (non-millisecond execution times) and unreliable.

While Maven distinguishes between unit tests and integration tests, it does not exclude the latter from its default build lifecycle. So, when running “mvn clean deploy”, Maven executes every build phase leading up to deploy (..compile->test->..->integration-test->..->install->deploy). For us, this means that every internal build also runs each module’s integration tests (no thanks).

So, I wanted to know how I could run our modules’ integration tests separately from our regular build. There were quite a few questions on Stack Overflow around this, but none of the responses were 100% accurate. So, after a bunch of trial-and-error to get Maven to behave how I wanted, I figured I’d share our solution.

Note: this solution assumes that integration tests follow the *IntegrationTest.java naming convention.

1) Exclude integration tests from the regular build test phase. Include the following in the build section of your POM:

       <plugin>
         <artifactId>maven-surefire-plugin</artifactId>
         <groupId>org.apache.maven.plugins</groupId>
         <version>2.9</version>
         <configuration>
             <skip>false</skip>
             <useFile>false</useFile>
             <argLine>-Xms1024m -Xmx1024m -XX:MaxPermSize=512m</argLine>
             <excludes>
                 <exclude>**/*IntegrationTest.java</exclude>
             </excludes>
         </configuration>
       </plugin>

Now our regular build on our CI environment can run the standard “mvn clean deploy” quickly and efficiently. The only tests that run are unit tests.

2) Create a separate Maven profile that will only execute integration tests (and exclude all unit tests):

<profiles>
    <profile>
        <id>integration-test-builder</id>
        <build>
            <plugins>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <groupId>org.apache.maven.plugins</groupId>
                    <version>2.9</version>
                    <configuration>
                        <skip>false</skip>
                        <useFile>false</useFile>
                        <argLine>-Xms1024m -Xmx1024m -XX:MaxPermSize=512m</argLine>
                        <excludes>
                            <exclude>none</exclude>
                        </excludes>
                        <includes>
                            <include>**/*IntegrationTest.java</include>
                        </includes>
                    </configuration>
                    <executions>
                        <execution>
                            <id>integration-tests</id>
                            <phase>integration-test</phase>
                            <goals>
                                <goal>test</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </profile>
</profiles>

Run this profile with: mvn clean integration-test -P integration-test-builder. We can then have a separate build job on our CI environment whose only purpose is to run integration tests once per day using this profile.
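To sanity-check the naming convention before touching the build, here is a small Python sketch that sorts test sources into the two groups the configurations above aim for. It looks only at the file name and approximates the `**/*IntegrationTest.java` pattern. It is not how Surefire itself resolves includes and excludes, and the sample paths are made up.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name part of the pattern used in the POM snippets above.
INTEGRATION_PATTERN = "*IntegrationTest.java"


def split_tests(paths):
    """Return (unit, integration) lists, classifying by file name only."""
    unit, integration = [], []
    for path in paths:
        name = PurePosixPath(path).name
        if fnmatchcase(name, INTEGRATION_PATTERN):
            integration.append(path)
        else:
            unit.append(path)
    return unit, integration


# Hypothetical test sources in one module.
unit, integration = split_tests([
    "src/test/java/com/example/ParserTest.java",
    "src/test/java/com/example/SearchIntegrationTest.java",
    "src/test/java/com/example/IndexerTest.java",
])
print("regular build runs:", unit)
print("integration profile runs:", integration)
```

Any integration test that does not end in `IntegrationTest.java` would land in the first group and slow down the regular build, which is why the convention matters.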


Conversion Optimization

Amazon's CloudFront adds support for dynamic content
