Book Review: SQL Antipatterns

SQL is an interesting yet temperamental tool in the programmer's kit. You can use it to find amazing insights in your data, but you can also watch a poorly formed query grind your application to a halt. SQL Antipatterns wants to help you avoid the latter by identifying common pitfalls in designing and using databases.

The book identifies twenty-four antipatterns and breaks each down using the same formula -- the objective one is trying to achieve, the antipattern typically used as a solution, how to spot the antipattern, its legitimate uses, and alternative solutions. The author introduces each antipattern with a brief example scenario followed by a summary of the objective people are typically trying to achieve.

Next, examples of the antipattern are given along with an explanation of why you would not want to do this. Beyond the examples, the author provides other ways to recognize the antipattern; most of these are questions you are likely to ask or hear colleagues ask. If you hear or ask "How do I do X?", that is a good opportunity to pull this book off the shelf, give it a quick skim, and possibly avoid a pitfall.

However, the author recognizes there are legitimate uses of each antipattern and presents them before the solutions for avoiding it. This lets you decide whether it is a pitfall to avoid or a necessary evil in your application. Finally, the appropriate solutions are presented, with attention given to how they might differ between database vendors.

Easy to read and digest, this is a worthy reference book to add to your shelf. As you extend or update a database and its related application code, you will find yourself reaching for it to reduce the likelihood of being paged at night by a misbehaving database.



Fresh Baked Roles with Zend Permissions RBAC

As our Traackr application has grown, so has our need to restrict access to features or pieces of features. Recently, our search feature added new filters. However, some of these filters are only meant to be available to enterprise customers. This was not a new problem for us; our analytics feature had added a new report type, one only enterprise customers should have.

Feature flags were our initial solution to both efforts. This worked well enough for our analytics use case. Our customer success team attaches an account to the feature flag, and the account can view the new report. Cracks started showing in this approach when applied to our search effort.

Let's say we added four new filters to search. That is four new feature flags to create, not a terrible burden from a coding standpoint. But our poor customer success team... They now have to go account by account and flip four new switches to make sure the account has access to the search filters granted to them by their subscription. Yuck...

Role-based Access Control to the Rescue!

During one of our hack weeks, I had tinkered with the Zend Permissions RBAC package without a solid use case in mind. It was just an interesting concept that crossed my path at the time. Now a perfect use case for it was sitting right in front of me. Before we dive into details, we should first answer an important question.

What is role-based access control (RBAC)?

A standard access control list (ACL) focuses on objects and who can act on them. Your computer's file system is an example of an ACL. On a Linux file system, each file and directory is an object, and the ACL for a file contains three entities -- the owner of the object, the group associated with the object, and everyone else. A feature in our application would be an object, and the feature flag ended up being the ACL attached to it.

The statement an ACL makes is "This object is associated with these people, and each person can access the object in specific ways." Phrased in the context of our application: "This feature can be accessed by the following accounts."

How does RBAC differ then? I'll let the package's documentation make the initial introduction:

RBAC differs from access control lists (ACL) by putting the emphasis on roles and their permissions rather than objects (resources).

RBAC makes the statement "These people are associated with these roles. These roles grant them these permissions." Again, in the context of our application, the statement becomes "This account is associated with this role. This role allows them access to these features." This concept has an immediate benefit for us. Accounts can be granted a specific role. As we add a new feature, we update the role to have new permissions for the feature, and every account is automatically able to use the feature. No longer do we need to edit accounts individually.

Baking up our roles

Let's start building this out! We first need to define our roles and the permissions associated with them. Here is a stripped down sketch of a class to manage that (the structure is illustrative, but the role and permission names are the ones described below):
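
<?php
use Zend\Permissions\Rbac\Rbac;
use Zend\Permissions\Rbac\Role;

// Class and method names here are illustrative.
class SubscriptionRoles
{
    public function buildEnterprise(): Role
    {
        // Parent role: the subscription itself. Only general permissions
        // (user counts, concurrent logins, quotas) would live here.
        $subscription = new Role('subscription.enterprise');

        // Child role: the search feature and its filters.
        $search = new Role('search');
        $search->addPermission('search');
        $search->addPermission('search.topics');
        $search->addPermission('search.brands');
        $search->addPermission('search.locations');

        // In Zend Permissions RBAC, a parent role is granted every
        // permission held by its child roles.
        $subscription->addChild($search);

        return $subscription;
    }

    public function buildRbac(): Rbac
    {
        $rbac = new Rbac();
        $rbac->addRole($this->buildEnterprise());

        return $rbac;
    }
}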

Our "Enterprise Subscription Package" starts with a parent role representing just the subscription for the account. It contains no feature specific permissions and would contain more general permissions instead -- number of users allowed in the account, number of concurrent logins, quotas, etc. A child "Search" role is added to the subscription role. Within the "Search" role we grant permission (search) to use the search feature and grant permission to use the topic (search.topics), brand (search.brands), and location (search.locations) filters. The dot notation is our own convention for role and permissions names. Names can be any string, but this gives us a visual hierarchy of the pieces of a feature.

May I?

When a customer logs into our site, we check the subscription attached to their account and attach the appropriate Role object to their session. Our search page must now check that they can access the page and, if so, which filters they should see.
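
A minimal sketch of those checks, assuming the session hands back a Role object from the Zend package:

// $role is the Role object attached to the session at login.
if (! $role->hasPermission('search')) {
    return $this->redirect('/'); // no search feature for this subscription
}

// Decide which filters to render.
$filters = [];
foreach (['search.topics', 'search.brands', 'search.locations'] as $permission) {
    if ($role->hasPermission($permission)) {
        $filters[] = $permission;
    }
}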

That smells good...

The sample checks above may not seem much different than what you would do with feature flags. We are just replacing "are they on this feature flag" with "do they have this role and permission". The benefit is in the ease of maintenance, especially for managing accounts and features. Using RBAC, a new feature's permissions are added to a role, that role is attached to a subscription, and that role addition / subscription update ships with the feature. Every account tied to that subscription has immediate access to the feature. Our customer success team no longer has to take time to turn the feature on for each account.

RBAC has also made it easy to extend the system even further. As an example, we built an override and exclusion system on top of it. If a customer abuses the location search feature, we can shut it off just for them. Need an incentive for a customer to upgrade? We can override the permissions to allow them access to a feature for a short time.

We will be migrating our existing feature-flagged items to RBAC. Using it has also sparked ideas for improving other areas of our code, particularly user-based permissions. Our code and process have been much more manageable since we baked RBAC into them. But don't take my word for it; give the recipe a try yourself.


Yoda conditionals... Change my mind I should?

I had another blog topic in mind but ran across the article "The Mistakes I Made As a Beginner Programmer". The entire article is worth reading, but this item leaped off the page at me:

... avoid Yoda conditions and assignments within conditionals.

I have encountered this statement in other articles and in the coding standards of various projects. Reduced readability is the criticism typically levied against the pattern, though critics seldom elaborate on why readability suffers. I picked up the habit of using Yoda conditionals from a former mentor, but it has never been a practice I strongly advocated. Perhaps it is time to reexamine whether there is any value in using it at all.

What exactly is a "Yoda conditional"? Rather than saying this:
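
if ($order == 66) {
    // execute Order 66
}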

a Yoda conditional reverses the expression:
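
if (66 == $order) {
    // execute Order 66
}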

It is a coding practice where you always phrase a conditional statement with the "constant" portion, 66 in this example, first. The goal of the practice is to avoid errors where you unintentionally perform variable assignment inside the conditional rather than making an actual comparison.
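
Drop a single equals sign, and the two forms behave very differently:

// Typo: assignment instead of comparison. The expression evaluates to
// the assigned value (66), which is truthy, so the block always runs.
if ($order = 66) {
    // execute Order 66
}

// The same typo in Yoda form fails immediately with a parse error,
// because you cannot assign to a constant.
if (66 = $order) {
    // never reached
}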

The first conditional will always pass because the assignment expression evaluates to the assigned value, 66, which is truthy; your clone troopers will exterminate the Jedi on every order given. Having Yoda on the field, though, causes an immediate disturbance in the Force. Hidden this subtle bug is not.

There is a clear benefit here, but is it enough to trump the benefits of readable code? Is there truly a problem with it being less readable? On the surface, there appears to be no issue. You see the "if" and know some sort of comparison will be made. What does it matter if 66 is listed first?

Our brains do not read code the way a machine does. It makes no difference to the machine what the order is. However, say the conditional out loud. "If sixty-six equals the order variable." "If the order variable equals sixty-six." The latter falls into more familiar patterns of speech. Reading one slightly awkward statement may not be a big deal, but try reading more and more of them while debugging a large code base. It will tax your mental capacity.

My development environment also includes a bevy of code sniffers and linters. They constantly remind me to use triple equals for equality checks and warn me when I assign a variable inside a conditional. I am less likely to fat-finger away an equals sign and end up with an assignment rather than a comparison.

So... Hrm... Yeah, I think I finally agree; the benefit of using Yoda conditionals is not enough to outweigh the cost to code readability. I would much rather have my code be easy for people to read and understand. No longer use this pattern will I. Wait... Apologies... I will no longer use this pattern. There, that is much better.


A pipeline for cleaning up technical debt?

Traackr's customer-facing application is built on top of CakePHP and has accumulated its fair share of technical debt over time. One particular piece of debt weighed on my mind. Our application fetches a core set of data from an API and then augments it with additional data our customers have entered. A single method handles augmenting the initial data through a series of transformations. This beast of a method cannot be tested; there are far too many pieces in play to write maintainable tests. It was long overdue for a refactor, and there was a particular pattern and accompanying package I wanted to try out.

A Pipeline for Success?

I decided to use the Pipeline library from The League of Extraordinary Packages for a few reasons. First, the pattern lends itself to writing code that does one thing and does it very well. Here is a sample pipe (a sketch; the class name and its dependency are illustrative):
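
<?php
use League\Pipeline\StageInterface;

class NotesPipe implements StageInterface
{
    private $notes;

    public function __construct($notes)
    {
        // A CakePHP model class holding customer-entered notes.
        $this->notes = $notes;
    }

    // Each pipe receives a single payload, augments it, and returns it
    // for the next pipe (or as the final output of the pipeline).
    public function __invoke($payload)
    {
        foreach ($payload as &$record) {
            // Illustrative finder; attach this customer's notes.
            $record['notes'] = $this->notes->findAllById($record['id']);
        }

        return $payload;
    }
}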

A pipe is invoked and passed a single parameter, a payload of data. You act upon that payload and then return it for the next pipe to use or to serve as the final output of the entire pipeline. Since you can chain any number of pipes together, each pipe's logic can focus on performing a single task. This in turn makes it possible to write and maintain unit tests for a pipe.

Second, it introduces a level of reuse the beast method could not. Each pipe is a reusable element we can chain together in any fashion. If we only needed to augment the data with a small subset of our customer data, we could easily put together a different pipeline. For example, here is the shape of our entire augmentation pipeline (the pipe class names are illustrative stand-ins):
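
<?php
use League\Pipeline\PipelineBuilder;

// $this->InfData and $this->TagUtil are CakePHP model / component
// classes; the other dependencies follow the same pattern.
$pipeline = (new PipelineBuilder())
    ->add(new NotesPipe($this->Note))
    ->add(new ProjectsPipe($this->Project))
    ->add(new TagsPipe($this->TagUtil))
    ->add(new InfluencerDataPipe($this->InfData))
    ->build();

$augmented = $pipeline->process($payload);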

Just need to augment the data with notes and projects? Simple:
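
$pipeline = (new PipelineBuilder())
    ->add(new NotesPipe($this->Note))
    ->add(new ProjectsPipe($this->Project))
    ->build();

$augmented = $pipeline->process($payload);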

Small, easy to read classes... Classes that are easy to test and maintain... Easy to reuse items in a flexible way... Yes, please!

How's It Flowin'?

Excitement must be tempered with reality though. Yes, the beast may be bloated, but it runs very fast. Would the pipeline run just as fast? The initial results were... not promising. The augmentation was not horribly slow, but it was noticeably slower. Timing logic revealed the beast ran in under half a second on average, while the pipeline took a full second on average.

Why is this approach slower? I cannot provide concrete answers, but I have a theory. Notice the pipes in our full pipeline take in a number of arguments like $this->InfData or $this->TagUtil. These are CakePHP component and model classes. My best guess is there is some overhead in passing these items around into the pipe classes. The beast does not suffer from this overhead because the framework has already bootstrapped those classes and made them available.

So... Now what?

While the execution time of the pipeline makes using it impractical, it was worth the effort to go through the exercise. Yes, we may not be able to use these pipe classes, but they and their tests can easily be ported back into CakePHP component methods. We will still get the benefit of small, easy to test, and reusable units of code.

This library may not work out in conjunction with our CakePHP code base, but it is certainly a tool I will reach for outside of that context. More importantly, using it reinforces the mindset of writing small, testable code. And that, more than any library or tool, is the best pipeline toward reducing your technical debt.


Book Review: Agile Retrospectives

*cough* *cough* Goodness... It's a bit dusty around here. Nearly a year of silence will do that. One of the goals our team had last year was to blog more. This effort failed. But why did it fail? How do we determine what went wrong and what actions to take to ensure we do better going forward? There happens to be a handy tool in the agile / scrum toolkit known as the "retrospective". A mini retrospective of what happened with our blogging might look like this.

What did we do well? While we did not generate much content, what we did generate was something the writer was passionate about. What could we have done better? We could have encouraged the team to write about anything; it does not need to be a long, involved post. Just write about something that interested you. How can we improve things going forward? Have someone lead by example and serve as a resource to cajole, er, ask people to provide more content in addition to their own.

This is a simple example from my own mind. Things become more complex when asking these questions of an entire team. The most invaluable resource I have found for learning how to lead retrospectives is Agile Retrospectives: Making Good Teams Great. It is one of the books I regularly gift to co-workers.

The book is only 142 pages long but contains a wealth of information and tools. It is organized into three sections. First, three chapters explain how to prepare to lead a retrospective, how to tailor it to your team, and the actual process of the retrospective meeting. The next five chapters outline various activities you can use for each step of the meeting. The final two chapters provide guidance on how to take all these ideas and apply them to leading retrospectives.

The best advice I have for consuming this book is to really absorb the first three chapters and final two chapters. Much of the advice there will help immensely in leading a retrospective and many other meetings. Skim the next five chapters; find a few activities your team might connect best with and start using those. Revisit these chapters every so often to find new activities to keep your retrospectives fresh.

Even if you are not in charge of leading retrospectives or other agile meetings, the advice carries over to leading or participating in other meetings. Also, the retrospective process is a useful tool outside of software development. You can have a retrospective about anything -- your recent home renovation project, your car buying experience, your vacation plans. There are few titles I gush over and push onto people. This is one of them, a title I firmly believe should be on every developer's shelf.


Adventures in Two Factor Authentication

Off we go!

Another hack week was upon us, and I had a definite idea in mind: I wanted to add two factor authentication to our web application. Having enabled it on web sites I use regularly, I was curious to see how difficult it was to implement and to learn the best practices associated with it. A bare-bones implementation turned out to be remarkably trivial.

Our application runs on PHP, and it took minimal Google-fu to determine the RobThree/TwoFactorAuth library was the best option to try first -- well maintained, with great documentation. One quick

composer require robthree/twofactorauth

to pull the library in, and it was right into setting it up. First hurdle reached! Our authentication code needed some tender loving refactoring to support logins with and without two factor authentication enabled. Ah... the satisfaction of cleaning up code, including squashing a bug that has long irritated you...

Once the authentication code was cleaned up, actually adding two factor authentication was trivial. It took more time to refactor our existing code than it did to use the library to generate the shared secret, display the QR code for scanning, and verify the entered token against the shared secret.
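
The heart of it is only a few calls. A sketch against the library's API (the issuer, label, and variable names here are placeholders):

<?php
use RobThree\Auth\TwoFactorAuth;

$tfa = new TwoFactorAuth('Traackr');

// Enrollment: create a shared secret, store it with the user's account,
// and display a QR code for their authenticator app to scan.
$secret = $tfa->createSecret();
$qrCodeDataUri = $tfa->getQRCodeImageAsDataUri('user@example.com', $secret);

// Login: verify the token the user entered against the stored secret.
$isValid = $tfa->verifyCode($secret, $enteredToken);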

Which thirds to invite to the party?

By default, the PHP library reaches out to an external Google service to generate the QR code. However, you can use a local library to generate it if sending the shared secret over the wire is a security concern. Hrm... Should I be concerned about that? Should I use an external third party service for generating QR codes, or should I compose another third party library into our application?

I did a bit of research and found no compelling argument either way. Thinking it through, let's say I use an external third party and send the shared secret to them. What is the attack surface? They could sniff and record the secret, but they get no other details. How would they tie it back to an actual username? They would need access to our database or server to make that association. If they have that, the game is already up, and they likely don't need the secret anyway.

What other drawbacks might there be then? There are two that spring to mind. First is the need to make an HTTP request to the service to get the QR code back. You are now dependent on the service being up and the network being fast and stable between you and the service. Second is the (likely) closed nature of the service; you have no way of vetting the code to make sure there are no shenanigans behind the curtain.

Given all of that, I stuck with the default Google service. Google knows just a little bit about keeping a service up and running, and their QR code service has likely been battle tested as much as, if not more than, any local third party library I could use. However, the RobThree documentation provides an easy example of using a local library should the need or desire to switch arise.

Drifting away...

It was demo day; time to practice before presenting my work to the rest of the team! I'll just plug in my username and password... Type in the token from my device, and... Wait... Why am I not logged in? What did I break with my last commit?! All my work is for naught! I'm a frauuuud!

OK... OK... calm down... There must be a logical explanation. What would make a token work yesterday but not today? This is a time-based protocol... My phone is set up to automatically keep its time correct. Let's check the time on my server... AHA! My local Vagrant instance is not running any time synchronization and has decided to drift away by about an hour. One quick NTP query to fix that, and... We're back in business! Whew... Crisis averted; the demo proceeds without issue.

I've learned an important lesson about TOTP. Make sure your server is automatically keeping the time correct!
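
Related tip: the library can tolerate a little drift for you. The third argument to verifyCode is a discrepancy window measured in time slices (30 seconds each by default):

// Accept tokens from one time slice before or after the server's
// current time (this is the library's default).
$isValid = $tfa->verifyCode($secret, $enteredToken, 1);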

Dude... Where's my phone?

If you are like me, you have used two factor authentication on other sites and have come across "recovery codes". These are meant to be written down and stored offline; if you were to ever lose the device that generated the token for the site, you could enter one of these codes to complete authentication and establish the shared secret on a new device. While I had an idea in mind of how this could be implemented, I wanted to see if there were any established best practices people follow. How many codes should be generated? How should you generate each code? What is the best way to securely store them?

After some digging, I found... nothing. I have seen sites provide anywhere from five to ten codes, with the codes varying in length. I have yet to implement this feature but am leaning toward generating at least five codes. However I end up generating them, I will store them exactly like passwords -- as salted hashes.
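
When I do get around to it, generation and storage will likely look something like this (a sketch of my plan, not shipped code):

// Generate five random, human-typeable recovery codes.
$codes = [];
for ($i = 0; $i < 5; $i++) {
    $codes[] = bin2hex(random_bytes(5)); // e.g. "3f9a1c07b2"
}

// Show $codes to the user exactly once, then persist only the hashes,
// just as we would for passwords.
$hashes = array_map(function ($code) {
    return password_hash($code, PASSWORD_DEFAULT);
}, $codes);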

I need more...

One other flow I have encountered on other two factor enabled sites is the need to input multiple consecutive tokens when the server thinks your token is out of sync. Again, I could find no information on best practices. How many tries should you give the user before entering this flow? How many tokens should you ask for? Does this really add any security value? For now, I am unlikely to implement this flow. While some big names use it, I see little benefit versus simply letting the user keep entering a single token.

That's all folks!

The RobThree library made it easy to add two factor authentication to our application. Follow the examples in the documentation, and you'll be up and running in no time. However, there seem to be no best practices (none I could easily surface, at least) around items like recovery codes and the "out of sync... give us more than one token to proceed" flow. Did I miss anything? Have any best practices from your own implementations? Sound off in the comments and let me know.