Boston ElasticSearch Meetup, 3.27.13 - Language Search presentation

External link: http://www.slideshare.net/bwarner77/language-search


EasyMock Delegation: Testing Twitter... without Twitter

So you’ve decided the pain of maintaining your third-party unit tests has finally outweighed the pain of reworking them into something more predictable, and you’ve entered the wild world of mock objects. There’s no way around it, really… it was bound to happen. Look, even if your app didn’t talk to a third-party service, you’d still want to eliminate any possible variation in the environment your test runs in, outside of the exact thing you’re testing, so that everything is absolutely predictable. This way if your tests fail, you’ll know exactly where to look. But why am I telling you this? You are all pros! There are many resources out there for the “hows” and “whys” of mock objects, but if you’re already familiar with the concepts and libraries, this post is for you.

This post is about using EasyMock to mock Twitter, and about how EasyMock’s Delegation lets you hand complex behavior off to an existing implementation instead of recreating it yourself.

Disclaimer: We’re using EasyMock, but you might also want to check out jMock and Mockito as other options.

The Use Case

So let’s say you’ve built an app that integrates with a third-party API such as Twitter, and you want to test your app against specific responses from Twitter to make sure the business logic is performing correctly. For example, let’s just say every time Justin Bieber gets retweeted, your app sets off a fireworks show. Now, your unit tests need to test this behavior, and you are using EasyMock to mock whatever homegrown REST client you’ve written, returning the mock JSON response you’ve put together. You set your Expectations and Answers, and voila! You’ve mocked Twitter!

But then you realize that popular third-party Java clients exist for a reason, and instead of going through the process of maintaining every single endpoint that Twitter provides, you decide to switch to something like Twitter4J, which makes integrating with Twitter very easy (it really does). So you figure, “I’ll just mock Twitter4J”, and it should be easy, right? Wrong. Here’s why:

Twitter4J’s User Timeline endpoint is defined as:

ResponseList<Status> Twitter.getUserTimeline(String screenName, Paging paging)

And this is the code you expect to write:

ResponseList<Status> list = ... // Figure out how to construct a ResponseList object with the Statuses you want.
Twitter mockTwitter = createMock(Twitter.class);
expect(mockTwitter.getUserTimeline(EasyMock.isA(String.class), EasyMock.isA(Paging.class))).andReturn(list);

… but you find yourself in a rabbit hole. At this point you’ve painfully realized that creating a ResponseList object from custom JSON is not exactly trivial, and here’s why: ResponseList is an interface, and so is Status. Are there implementations? Well, if you follow the Twitter4J source far enough down, you’ll find a series of factories that translate the JSON responses from the HttpResponse objects into implementations of the return types we’re working with. If you want full control of the response, that’s a lot of code to write, because essentially you’d have to recreate what Twitter4J does under the hood. So, I’ve put together some code that lets you easily mock Twitter4J’s getUserTimeline() using some of EasyMock’s tricks, in a way that puts control back in your hands without writing a lot of code. So without further ado:

* Drumroll *

Step 1: Define Your JSON Response

This is the hard part :) Creating your own custom tweets as JSON Strings is such a pain. So do yourself a favor: run a few calls against Twitter, copy/paste the responses, escape all those quotes, and make any necessary changes via grep, etc. So let’s say that in your unit test you want the first Twitter UserTimeline call to return 30 statuses from Oprah, the second call to return 20 statuses from Justin Bieber, and the last call to return nothing. Your String array looks something like:

public static String OPRAH_RESPONSE  = "[ { ... // 30 statuses from Oprah
public static String JBIEBER_RESPONSE = "[ { ..// 20 statuses from Biebs
public static String EMPTY_RESPONSE = "[]";

String[] expectedJsonResponses = new String[]{ OPRAH_RESPONSE, JBIEBER_RESPONSE, EMPTY_RESPONSE };

And let’s parse one of those JSON Strings (twitterRawJSONResponse below) into a JSON array:

JSONArray array = new JSONArray(twitterRawJSONResponse);

Step 2: Create A Method Which Mocks ResponseList

If you’ve taken a look at the Twitter4J source, you’ll notice that ResponseList is also an extension of java.util.List, so a bunch of its methods are already defined by the List interface. We’re going to use EasyMock to delegate those particular ResponseList methods to an implementation of List that we can more easily create and work with… an ArrayList. Creating an ArrayList is a lot less verbose than recreating Twitter4J’s ResponseList. Go ahead and check out the source; it’s rather complex.

This is what EasyMock refers to as Delegation. EasyMock describes delegation as:

“ <Delegation allows mock objects> to delegate the call to a concrete implementation of the mocked interface that will then provide the answer. The pros are that the arguments found in EasyMock.getCurrentArguments() for IAnswer are now passed to the method of the concrete implementation. This is refactoring safe. The cons are that you have to provide an implementation which is kind of doing a mock manually… Which is what you try to avoid by using EasyMock. It can also be painful if the interface has many methods.”

… however in this case, it works so beautifully!  Instead of redefining a behavior from scratch, you can just delegate to a similar object that already has the behavior defined, except your delegate is completely in your control.

Another scenario in which you would want to use Delegation could be around a database cursor, for example MongoDB’s DBCursor. It implements the Iterator interface. If you were mocking some database behavior and you had a pre-constructed list of objects you wanted to work with, instead of trying to mock the behavior of DBCursor.next() and manually plotting out what each subsequent next() would return (because remember, a mock object is a recording that needs to be replayed in a certain order), you can just delegate the behavior to a more easily constructed type that you can work with.
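
To make the cursor scenario a bit more concrete without pulling in EasyMock or MongoDB, here is a minimal sketch of the same idea using a plain JDK dynamic proxy. To be clear, this swaps java.lang.reflect.Proxy in for EasyMock’s andDelegateTo(), and the Cursor interface and the cursorDelegatingTo() helper are hypothetical stand-ins invented for illustration, not MongoDB’s DBCursor. The point is only that the calls you care about get answered by an object you fully control:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class DelegationSketch {

    /** Hypothetical stand-in for a hard-to-construct cursor type (NOT MongoDB's DBCursor). */
    public interface Cursor extends Iterator<String> {
        int count(); // deliberately left without a delegate
    }

    /** Returns a fake Cursor whose hasNext()/next() calls are answered by the delegate. */
    public static Cursor cursorDelegatingTo(final Iterator<String> delegate) {
        InvocationHandler handler = (proxy, method, args) -> {
            switch (method.getName()) {
                case "hasNext":
                    return delegate.hasNext();
                case "next":
                    return delegate.next();
                default:
                    // Like an unexpected call on a mock: fail loudly instead of guessing.
                    throw new UnsupportedOperationException("Unexpected call: " + method.getName());
            }
        };
        return (Cursor) Proxy.newProxyInstance(
                Cursor.class.getClassLoader(), new Class<?>[] {Cursor.class}, handler);
    }

    public static void main(String[] args) {
        // The pre-constructed data we want the "cursor" to walk through.
        List<String> rows = Arrays.asList("first", "second");
        Cursor cursor = cursorDelegatingTo(rows.iterator());
        while (cursor.hasNext()) {
            System.out.println(cursor.next());
        }
    }
}
```

Notice that nobody had to script what each successive next() returns; the list’s own iterator keeps track of that, which is exactly the convenience Delegation buys you.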

Let’s see it in action and create an ArrayList object and populate it with some tweets!

final List<Status> list = new ArrayList<Status>();
if (array.length() > 0) {
  for (int x = 0; x < array.length(); x++) {
    JSONObject obj = (JSONObject) array.get(x);
    Status status = DataObjectFactory.createStatus(obj.toString());
    list.add(status);
  }
}

Now we’re going to create a mock of the ResponseList interface and delegate the methods being used by Twitter4J over to our List that we just created:

ResponseList<Status>  responseList = EasyMock.createMock(ResponseList.class);

Now you can use Delegation! Since you’re mocking the ResponseList and you’ll probably want to treat your ResponseList as a regular ArrayList to iterate through your results, all you have to do is simply delegate those method calls to your list object:

EasyMock.expect(responseList.toArray(EasyMock.isA(Status[].class))).andDelegateTo(list).anyTimes();

Your list contains the custom Status objects you’ve generated from your JSON, and they’ve been loaded into a list, so all you’d need to do is just delegate the work to ArrayList.  This way,  you don’t have to write custom code for every single Twitter scenario you want to test. Simply load the JSON into objects, parse them into Status objects, put them in a collection and delegate ResponseList’s behavior to that collection.

Let’s put all this code into a method called: buildTwitterServiceStatusResponseList(String jsonResponse)

Step 3. Mock Twitter4J

Remember this from above?

String[] expectedJsonResponses = new String[]{ OPRAH_RESPONSE, JBIEBER_RESPONSE, EMPTY_RESPONSE };

Let’s pass those into the above method and create a mock Twitter4J:

Twitter twitter = EasyMock.createMock(Twitter.class);
IExpectationSetters<ResponseList<Status>> expectation = EasyMock.expect(twitter.getUserTimeline(EasyMock.isA(String.class), EasyMock.isA(Paging.class)));
for (String resp : expectedJsonResponses) {
     expectation.andReturn(buildTwitterServiceStatusResponseList(resp)).times(1);
}
EasyMock.replay(twitter);

That’s all there really is to it. I’ve put a gist together where you can see the code in its entirety.

Remember: If your code uses other methods for ResponseList, you’ll have to create an expectation for those method calls that you can either Answer, Return, or DelegateTo…  or EasyMock will fail.

In summary, delegation in EasyMock is useful if you care about the results that are coming back from your mocked service and constructing a delegate is trivial compared to actually working with the objects themselves.

At TRAACKR our team spends a LOT of time thinking about and implementing the ins and outs, as well as the dos and don’ts, of interfacing with third-party APIs, and so we hope to provide engineers with some of the tools of the trade that will make your lives a little easier.


Simple math in Unix

I was trying to get some basic stats from our Apache access logs today. For each request we log how long it took to serve the request. The obvious choice would be to download the logs locally and load them into Excel to calculate, say, the average request duration. But there is a simpler and faster way to do this directly on the server. ’awk’ to the rescue! You can simply do:

grep "/path/request" access.log | awk 'BEGIN{k=0;s=0}{k++; s=s+$14}END{print s/k}'

The ’BEGIN’ block initializes the variables: ’k’ for the number of rows found and ’s’ for the sum of the request durations.

The main block increments the row count and adds field ’$14’ to the sum. Remember that awk splits input lines on whitespace by default (you can change that with the -F option), so choose the field you are interested in accordingly.

Finally the ’END’ block will print the average.

Easy as pie!


Chrome issues

Why you should buy solid-state drives #SSD


#Java #Programming Tidbits

Observations from the world of Traackr programming

  1. Using Apache Commons CLI - Very helpful library for parsing command line arguments passed to your script. You are able to construct an org.apache.commons.cli.Options object and add an option for each acceptable argument to your script (the argument name, whether it accepts a value, a user-friendly description, etc.)
    options.addOption("h", true, "required: server host name:port");

    You can parse the arguments passed to your script into a org.apache.commons.cli.CommandLine object, and quickly validate it against your predefined options. The library can also nicely format your Options object into a user-friendly system message. #propz to @gstathis for finding this!

  2. Using org.slf4j.profiler.Profiler - Nifty little class for performing time profiling in your scripts, and being able to chain or nest different Stopwatches together. Again, #propz to @gstathis for this find.
  3. Java’s jvisualvm Plugins - Pretty cool that jvisualvm offers additional plug-ins to supplement its profiling capabilities. One in particular that came in really handy was the Threads Inspector plugin. Recently, one of our long-running processes stopped reporting anything in our logs, and it was unclear whether it had failed because no exceptions were thrown. The Threads Inspector plugin allowed us to select one of our managed threads for this process, and we could see that it was stuck waiting indefinitely on a CountDownLatch! A recent change had caused one of the child processes to die unexpectedly and the countdown latch was never properly updated (plus, we mistakenly never configured a time-out when waiting on this particular lock). #propz to the random Java developer who created this plugin, wherever you are…
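
For what it’s worth, the two mistakes mentioned in the last item (a latch that never gets counted down, and a wait with no time-out) can be guarded against with the standard java.util.concurrent API. The following is only a rough sketch, not our production code; the class and the helper names startWorker() and awaitWorkers() are made up for illustration:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class LatchTimeoutSketch {

    /** Starts a worker that always counts the latch down, even if the task throws. */
    public static Thread startWorker(final Runnable task, final CountDownLatch latch) {
        Thread worker = new Thread(() -> {
            try {
                task.run();
            } finally {
                latch.countDown();
            }
        });
        worker.start();
        return worker;
    }

    /** Waits with a bound; returns false if the time-out elapses or the wait is interrupted. */
    public static boolean awaitWorkers(CountDownLatch latch, long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for callers
            return false;
        }
    }

    public static void main(String[] args) {
        // Two workers are expected, but only one is ever started (the other "died").
        CountDownLatch latch = new CountDownLatch(2);
        startWorker(() -> System.out.println("worker done"), latch);

        if (!awaitWorkers(latch, 500, TimeUnit.MILLISECONDS)) {
            System.err.println("Gave up waiting; outstanding count: " + latch.getCount());
        }
    }
}
```

With the bounded await(), a missing countDown() shows up as a log line after the time-out rather than as a thread that silently hangs forever.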

Rajiv's blog: Scaling lessons learned at Dropbox, part 1

Maven - Separating Unit / Integration Tests

We use Maven as our internal build tool for our Java projects, and most of the time it works great. But recently, when trying to separate the execution of our integration tests from our unit tests, I ran into an unexpected hurdle.

Our internal code is a multi-module project, where each module provides its set of unit tests and integration tests for our internal build to execute. If the tests are successful (among other things), then each module is deployed as a snapshot JAR to our internal Nexus repository. Deploying snapshots allows our developers to work on other areas of our project’s code-base without having to build all internal dependencies locally.

Now, we need our internal build to be a) fast and b) reliable as we commit changes frequently throughout the day, which means we want to be building & deploying code frequently throughout the day as well. As you would guess, this is exactly the opposite of what integration tests are: slow (non-millisecond execution times) and unreliable.

While Maven distinguishes between unit tests and integration tests, it does not exclude the latter from its default build cycle. So, when running “mvn clean deploy”, Maven will execute all the build phases that precede deploy (..compile->test->..->integration-test->..->install->deploy). For us, this means that every time we execute an internal build, each module’s integration tests will be run as a result (no thanks).

So, I wanted to know how I could run our modules’ integration tests separately from our regular build. There were quite a few questions on Stack Overflow around this, but none of the responses were 100% accurate. So, after a bunch of trial-and-error to get Maven to behave how I wanted, I figured I’d share our solution.

Note: Solution assumes that Integration tests conform to the naming convention of *IntegrationTest.java

1) Exclude integration tests from the regular build test phase. Include the following in the build section of your POM:

       <plugin>
         <artifactId>maven-surefire-plugin</artifactId>
         <groupId>org.apache.maven.plugins</groupId>
         <version>2.9</version>
         <configuration>
             <skip>false</skip>
             <useFile>false</useFile>
             <argLine>-Xms1024m -Xmx1024m -XX:MaxPermSize=512m</argLine>
             <excludes>
                 <exclude>**/*IntegrationTest.java</exclude>
             </excludes>
         </configuration>
       </plugin>

Now our regular build on our CI environment can run the standard “mvn clean deploy” fast and efficiently. The only tests that will be run will be unit tests.

2) Create a separate Maven profile that will only execute integration tests (and exclude all unit tests):

<profiles>
    <profile>
        <id>integration-test-builder</id>
        <build>
            <plugins>
                <plugin>
                    <artifactId>maven-surefire-plugin</artifactId>
                    <groupId>org.apache.maven.plugins</groupId>
                    <version>2.9</version>
                    <configuration>
                        <skip>false</skip>
                        <useFile>false</useFile>
                        <argLine>-Xms1024m -Xmx1024m -XX:MaxPermSize=512m</argLine>
                        <excludes>
                            <exclude>none</exclude>
                        </excludes>
                        <includes>
                            <include>**/*IntegrationTest.java</include>
                        </includes>
                    </configuration>
                    <executions>
                        <execution>
                            <id>integration-tests</id>
                            <phase>integration-test</phase>
                            <goals>
                                <goal>test</goal>
                            </goals>
                        </execution>
                    </executions>
                </plugin>
            </plugins>
        </build>
    </profile>
</profiles>

This profile can be run like so: mvn clean integration-test -P integration-test-builder. We can have a separate build job on our CI environment whose only purpose is to run integration tests once per day using this profile.


Geohashing Demystified by xkcd

Geohashing Demystified: xkcd


The Simpsons Love Traackr

The Simpsons are very impressed with Traackr (@ SF office)