Athena vs Q: Battle of the Data Parsers

What are we fighting for?

Here at Traackr, we pull together data from a variety of sources and visualize it in various forms -- bar charts, timelines, etc. Typically, this data comes from APIs in the form of JSON or XML, formats most developers and their tools are well equipped to handle. Now, our team has been tasked with a new data source in the format of... CSV... Artisanally crafted and delivered by our data team... Oh, joy... We will be getting these on a regular basis, and they are not small files.

The APIs we use typically have an interface to search through the data and return what we need. CSV files are just lumps of text formatted into columns. What we need is some way to easily query that data. We could roll our own solution to parse the files and store them away in MySQL, MongoDB, etc. However, someone must have invented this wheel already. Surely, there must be a solution we could leverage rather than building our own process and infrastructure.

Here comes a new challenger...

Traackr uses a variety of Amazon services, and Amazon's services are legion. I began searching through their list and found an interesting item... Amazon Athena. The one line service description was "Query data in S3 using SQL". Intriguing... Let's read on...

" with a variety of standard data formats, including CSV..." What?! You're telling me our data team could dump the CSV files into a bucket, and we could query it using a language we are very familiar with. That sounds too good to be true! I must play with this... Now...

A quick chat with the keeper of our Amazon access got the ball rolling to be able to access Athena and start tinkering. While discussing the use case with them, they mentioned they use a similar tool locally called q. Excellent! I now have two combatants to duke it out in the arena and see what solution might best fit our needs. The initial battle is over; time to break down the fight blow by blow.

Ease and cost of setup

Setting up q is a straightforward process of installing the package through your operating system's package manager. I use OS X myself making it a quick call to brew. It is also a free open source tool, so no out of pocket expenses incurred. After installation, I am ready to run queries against CSV files on my drive.

If you have ever worked with any Amazon service, you'll know setting something up is never quite that simple. Athena needs a bucket in S3 to house the CSV files. Right... Not a huge time consuming task, but that storage will add to our monthly costs. Bucket created... Some folders under the bucket created to separate the CSV files by type... CSV files uploaded... Now what?

Next, you have to create a database in Athena. There is a decent UI for doing this, so not a huge time hurdle there. A database has to have some tables to query; Athena creates tables by defining what S3 bucket the source data is held and what data types and names you want for each column in the data. Some of the non-CSV data sources have ways to auto-discover and build the tables for you.

Unfortunately, there was no auto-generation I could find for CSV files. The UI for building the tables manually is decent but a bit clunky. However, once you have built one table, you can use the UI to issue a command to get the CREATE TABLE syntax used to create it. Armed with this, you can easily modify the query for other tables and issue it directly. With my tables built, what then will Amazon charge me for using this service? Turns out the pricing is US$5 per terabyte of data scanned. Amazon has recommendations on how you can partition and compress your data to reduce the amount of scanning. Our CSV files are large but not large enough to likely push us over that price point too often. Time will tell...

There is one additional cost to Athena that is not readily apparent though. When you query data, Athena creates and stores two files -- one is a CSV file with the results of the query and the other is a file with metadata about the query. These files are stored in another S3 bucket (you can configure where that bucket lives or just have it use an auto-generated bucket) meaning you will be paying storage fees beyond the fees for storing your source data. We'll likely throw a caching layer in front of our use of Athena if we move forward with it, as we do with other API sources. Still, something we need to keep in mind.

Winner: q takes the opening round for being quick to install and completely free.

Current Score: q 1 - Athena 0

Show me how you set a table!

We've already shown how to build a table in Athena. The nice bit is adding data to that table is as easy as uploading a new CSV file into the S3 bucket. That's it! Any query you issue against that table is going to pull from all files in the bucket. No need for any "query against these things" logic beyond the standard "issue a query against this table" SQL syntax.

q, however, is a command line tool requiring you to somehow tell it what files to operate on. This is done in the FROM portion of the query you issue:

SELECT * FROM /path/to/file1.csv+/path/to/file2.csv+... WHERE...

Hmm... It would be easy enough to build a string in the proper format by globbing all the files out of directory. However, this syntax raises a question. The list of files will do nothing but grow. How long before we reach some odd limit on the length of the command you can issue? That's the sort of question that tends to be answered around 2AM when your most important customer in Europe needs the data this would provide ASAP. We could build in logic to issue multiple commands and then massage the data together. Ick... Sounds like a big ball of complex mud filled with nails...

Winner: Athena evens things up by having tables grow simply by uploading a new file.

Current Score: q 1 - Athena 1

I have questions... How do I get my answers?

Both Athena and q boast the ability to query the CSV files using SQL, and they both do exactly that. I can issue this SQL against either and get the same data back:

SELECT first_name, last_name, location
FROM users
WHERE first_name = 'Odin'

The only initial difference is the table name for q is a file path. q supports the SQLite syntax. Athena is based on Presto but does not support all of Presto's features. However, the list of what is supported seems sufficient for our needs.

One item did stand out though. I can use LIMIT and OFFSET with q to get pages of data. Athena supports LIMIT but not OFFSET. After some research, this appears to be due to the architectural and conceptual mindsets behind the Presto engine. You get back all the data in one go due to how it parses things.

Winner: Too close to call... Both satisfy the SQL needs to pull the data from the files. The loss of OFFSET in Athena is not a significant blow. Call it a draw.

Current Score: q 2 - Athena 2

I want to change the design of my table

We all know data formats never, ever change. #sarcasm If the data were to change, how easy is it to alter your tables accordingly? q has a straightforward answer. It simply doesn't care. It reads the header row from your CSV file and uses that information for column names. Just adjust the column names in your query and you are good to go.

Athena handles this differently. We had to explicitly define the table structure in Athena. Each column in the table maps to a column in the CSV file in order. e.g., Column 0 in the table maps to column 0 in the CSV file, column 1 in the table to column 1 in the file, etc. Any alteration in the file or alteration you wish to make to the table itself -- new columns added to the file, you want to rename a column in the table -- require dropping the table and recreating it.

Recall the Athena UI has a way to get the CREATE TABLE syntax used to build a table. You would need to get that statement, alter it, drop the existing table, and then issue the new command. The upside is you can drop the table entirely without worrying about the data. It is stored in S3, and the table is "repopulated" once you create it again.

What happens though if your CSV files have no header row defining the column names? Athena has a slight edge here given it doesn't need a header row. You define column X in the file is column Y in my table with the name I choose. q can handle not having a header row, but the syntax for column references becomes numeric:

SELECT c1, c2, c3
FROM users
WHERE c2 = 'Odinson'

Winner: Again, too close to call... You are going to have to update something with either tool when the table structure changes. Neither the drop / recreate pattern in Athena nor q's numeric pattern for handling no header row in the data seem to sway the judges either way. Another draw...

Current Score: q 3 - Athena 3


Just how fast are these tools? It is difficult to make a true comparison here. I do not have enough files to work with currently to say how they perform at scale. I also have yet to throw the actual SQL statements we would need (they are still being forged in the dark places) to use against both tools. q certainly feels slightly faster, but nothing about Athena's performance has given me concerns yet.

Winner: Not enough data to go on. No points awarded.

Current Score: q 3 - Athena 3

How would I integrate them?

The final and most important round... If I were to use this tool, how would I go about integrating it into our stack and workflow? Either tool requires some interface for our data team to upload new files. They don't care where the files end up; they just need to ensure they are available for use. The difference between having a web form dump those files into S3 or a disk elsewhere is marginal.

q is a command line tool, so it would not be terribly difficult to issue a system call and get the results back. The results returned though are just the lines directly from the files. We would have to do some additional parsing after the fact to break things down into named bits. Hrm... That means carrying the logic of the row result structure around and needing to alter it if the data format ever changes. Combine that with the concern on having to find and concatenate file paths and the length of said file paths on the command line execution... I think q may be on the ropes.

Amazon provides a SDK for most of the frequently used programming languages, and the ones I would need have support for using Athena. Query results are just an API call away via the SDK, and they come back in an array of data. The format of that array is interesting.

Aws\Result Object
    [data:Aws\Result:private] => Array
            [UpdateCount] => 0
            [ResultSet] => Array
                    [Rows] => Array
                            [0] => Array
                                    [Data] => Array
                                            [0] => Array
                                                    [VarCharValue] => first_name

                                            [1] => Array
                                                    [VarCharValue] => last_name

                                            [2] => Array
                                                    [VarCharValue] => location
                            [1] => Array
                                    [Data] => Array
                                            [0] => Array
                                                    [VarCharValue] => Thor

                                            [1] => Array
                                                    [VarCharValue] => Odinson

                                            [3] => Array
                                                    [VarCharValue] => Asgard

The first row in the data set is essentially the table definition with subsequent rows being the actual data. While we don't get back the data in a typical SQL-esque fashion -- each entry in the array is an associative array with the key being column name, we could easily build a parser / formatter to take that first row of data and use it to return a similar structure. That is one bit of coding that can handle any data changes down the line.

Winner: Athena lands a solid blow to q's gut by not needing to worry about having to update code any time the data format changes.

Current Score: Athena 4 - q 3

Here is your winner...

Your winner... Athena (Battlestar Galactica)

Athena wins with a decisive knock out blow in the final round. While you do have some operating expenses in its use of S3 and scanning the data, Athena provides everything we need to process CSV data and serve it up to our customers. Our developers can query the data in SQL like they would with any other data source through the SDK API calls. Given it can parse other file formats, this is definitely a tool I can see reaching for more frequently.

Book Review: SQL Antipatterns

SQL is an interesting yet temperamental tool in the programmer's kit. You can use it to find amazing insights in your data, but you can watch a poorly formed query grind your application to a halt. SQL Anitpatterns wants to help you avoid the latter by identifying common pitfalls when designing and using databases.

The book identifies twenty-four antipatterns and breaks each down in the same formula -- an objective one is trying to achieve, the antipattern typically used as a solution, how to spot this antipattern, the legitimate uses of the antipattern, and alternative solutions. The author introduces each antipattern with a brief example scenario followed by a summary of the objective people are typically trying to achieve.

Next, examples of the antipattern are given with an explanation of why you would not want to do this. Even with examples, the author provides other ways to recognize the antipattern; most of these are questions you are likely to ask or hear colleagues ask. If you hear or ask "How do I do X", that would be a good opportunity to pull this book off the shelf, give it a quick skim, and possibly avoid a pitfall.

However, the author recognizes there are legitimate uses of each antipattern and presents them before the solutions to avoid it. This allows you to decide if this is a pitfall to avoid or a necessary evil in your application. Finally, the appropriate solutions are presented with attention given to how the solutions might differ between database vendors.

Easy to read and digest, this is the worthy reference book to add to your shelf. As you extend or update a database and related application code, you will find yourself reaching for it to reduce the likelihood of being paged at night by a misbehaving database.

Photo by Karolina Szczur on Unsplash

Fresh Baked Roles with Zend Permissions RBAC

As our Traackr application has grown, so has our need to restrict access to features or pieces of features. Recently, our search feature added new filters to use. However, some of these filters are only meant to be available to enterprise customers. This was not a new problem for us; our analytics feature added a new report type, one only enterprise customers should have.

Feature flags were our initial solution to both efforts. This worked well enough for our analytics use case. Our customer success team attaches an account to the feature flag, and the account can view the new report. Cracks started showing in this approach when applied to our search effort.

Let's say we added four new filters to search. That is four new feature flags to create, not a terrible burden from a coding standpoint. But our poor customer success team... They now have to go account by account and flip four new switches to make sure the account has access to the search filters granted to them by their subscription. Yuck...

Role-based Access Control to the Rescue!

During one of our hack weeks, I had tinkered with the Zend Permissions RBAC package without a solid use case in mind. It was just an interesting concept that crossed my path at the time. Now, it seemed a perfect use case for it was in front of me. Before we dive into details, we should first answer an important question.

What is role-based access control (RBAC)?

A standard access control list (ACL) focuses on objects and who can act on them. Your computer's file system is an example of an ACL. Using a Linux file system, each file and directory is an object. The ACL for a file contains three entities -- the owner of the object, the group associated with the object, and everyone else. A feature in our application would be an object, and the feature flag ended up being the ACL attached to it.

The statement ACL makes is "This object is associated with these people, and each person can access the object in specific ways". Phrased in the context of our application, "this feature can be accessed by the following accounts."

How does RBAC differ then? I'll let the package's documentation make the initial introduction:

RBAC differs from access control lists (ACL) by putting the emphasis on roles and their permissions rather than objects (resources).

RBAC makes the statement "These people are associated with these roles. These roles grant them these permissions." Again, in the context of our application, the statement becomes "This account is associated with this role. This role allows them access to these features." This concept has an immediate benefit for us. Accounts can be granted a specific role. As we add a new feature, we update the role to have new permissions for the feature, and every account is automatically able to use the feature. No longer do we need to edit accounts individually.

Baking up our roles

Let's start building this out! We first need to define our roles and the permissions associated with them. Here is a stripped down version of our class to manage that:

Our "Enterprise Subscription Package" starts with a parent role representing just the subscription for the account. It contains no feature specific permissions and would contain more general permissions instead -- number of users allowed in the account, number of concurrent logins, quotas, etc. A child "Search" role is added to the subscription role. Within the "Search" role we grant permission (search) to use the search feature and grant permission to use the topic (search.topics), brand (search.brands), and location (search.locations) filters. The dot notation is our own convention for role and permissions names. Names can be any string, but this gives us a visual hierarchy of the pieces of a feature.

May I?

When a customer logs into our site, we check the subscription attached to their account and attach the appropriate Role object to their session. Our search page must now check they can access the page, and if so, what filters they should see.

That smells good...

The sample checks above may not seem much different than what you would do with features flags. We are just replacing "are they on this feature flag" with "do they have this role and permission". The benefit here is in the ease of maintenance, especially for managing accounts and features. Using RBAC, a new feature is added to a role, that role attached to a subscription, and that role addition / subscription update is shipped with the feature. Every account tied to that subscription has immediate access to the feature. Our customer success team no longer has to take time to turn the feature on for an account.

RBAC has also made it easy to extend the system even further. As an example, we built an override and exclusion system on top of it. If a customer abuses the location search feature, we can shut it off just for them. Need an incentive for a customer to upgrade? We can override the permissions to allow them access to a feature for a short time.

We will be migrating our existing feature flagged based items to RBAC. Using it has also sparked ways to improve other areas of our code, particularly user based permissions. Our code and process has been much more manageable since we baked RBAC into it. But don't take my word; give the recipe a try for yourself.

Yoda conditionals... Change my mind I should?

I had another blog topic in mind but ran across the article "The Mistakes I Made As a Beginner Programmer". The entire article is worth reading, but this item leaped off the page at me:

... avoid Yoda conditions and assignments within conditionals.

I have encountered this statement in other articles and in the coding standards for various projects. Reduced readability is the criticism typically levied against the pattern, though they seldom elaborate on why readability suffers. I picked up the habit of using Yoda conditionals from a former mentor, but it has never been a practice I strongly advocated using. Perhaps, it is time to reexamine if there is any value in using it at all.

What exactly is a "Yoda conditional"? Rather than saying this

a Yoda conditional reverses the expression:

It is a coding practice where you always phrase a conditional statement with the "constant" portion, 66 in this example, first. The goal of the practice is to avoid errors where you unintentionally perform variable assignment inside the conditional rather than making an actual comparison.

The first conditional will always pass because variable assignment results in a truth-y value; your clone troopers will exterminate the Jedi on every order given. Having Yoda on the field though causes an immediate disturbance in the Force. Hidden this subtle bug is not.

There is a clear benefit here, but is it enough to trump the benefits of readable code? Is there truly a problem with it being less readable? On the surface, there appears to be no issue. You see the "if" and know some sort of comparison will be made. What does it matter if 66 is listed first?

Our brains do not read code the way a machine does. It makes no difference to the machine what the order is. However, say the conditional out loud. "If sixty-six equals the order variable." "If the order variable equals sixty-six." The latter falls into more familiar patterns of speech. Reading one slightly awkward statement may not be a big deal, but try reading more and more of them while debugging a large code base. It will tax your mental capacity.

My development environment also includes a bevy of code sniffers and linters. They constantly remind me to use the triple equals for equality checks along with warnings when assigning variables inside a conditional. I am less likely to fat finger away an equal sign and end up with an assignment rather than a conditional check.

So... Hrm... Yeah, I think I finally agree; the benefit of using Yoda conditionals is not sufficient enough compared against code readability. I would much rather have my code be easy for people to read and understand. No longer use this pattern will I. Wait... Apologies... I will no longer use this pattern. There, that is much better.

A pipeline for cleaning up technical debt?

Traackr's customer facing application is built on top of CakePHP and has accumulated its fair share of technical debt over time. One particular piece of debt weighed on my mind. Our application fetches a core set of data from an API and then augments it with additional data our customers have entered. A single method handles augmenting the initial data through a series of transformations. This beast of a method cannot be tested. There are far too many pieces going on to write maintainable tests. It was long overdue for a refactor, and there was a particular pattern and accompanying package I wanted to try out.

A Pipeline for Success?

I decided to use the Pipeline library from The League of Extraordinary Packages for a few reasons. First, the pattern lends itself to writing code that does one thing and one thing very well. Here is a sample pipe:

A pipe is invoked and passed a single parameter, a payload of data. You act upon that payload and then return it for the next pipe to use or to be used as the final output of the entire pipeline. Given you can chain any number of pipes together, your logic can be focused on performing a single task. This in turn leads to being able to write and maintain unit tests for a pipe.

Second, it introduces a level of reuse the beast method could not. Each pipe is a reusable element we can chain together in any fashion. If we only needed to augment the data with a small subset of our customer data, this could easily be done by putting together a different pipeline. For example, here is our entire augmentation pipeline:

Just need to augment the data with notes and projects? Simple:

Small, easy to read classes... Classes that are easy to test and maintain... Easy to reuse items in a flexible way... Yes, please!

How's It Flowin'?

Excitement must be tempered with reality though. Yes, the beast may be bloated, but it does run very fast. Would this pipeline run just as fast? The initial results were... not promising. The augmentation was not horribly slow, but it was noticeably slower. Timing logic revealed the beast ran under half a second on average, while the pipeline was running a full second on average.

Why is this approach slower? I cannot provide concrete answers, but I have a theory. Notice the pipes in our full pipeline take in a number of arguments like $this->InfData or $this->TagUtil. These are CakePHP component and model classes. My best guess is there is some overhead in having these items passed around into the pipe classes. The beast does not suffer from this overhead because the framework has already bootstrapped those classes and made it available.

So... Now what?

While the execution time of the pipeline makes using it impractical, it was worth the effort to go through the exercise. Yes, we may not be able to use these pipe classes, but they and their tests can easily be ported back into CakePHP component methods. We will still get the benefit of small, easy to test, and reusable units of code.

This library may not be used in conjunction with our CakePHP code base, but it is certainly a tool I will reach for outside of that context. More importantly, using it reinforces the mind set of writing small, testable code. And that, more than any library or tool, is the best pipeline towards reducing your technical debt.

Book Review: Agile Retrospectives

Agile Retrospectives: Making Good Teams Great*cough* *cough* Goodness... It's a bit dusty around here. Nearly a year of silence will do that. One of the goals our team had last year was to blog more. This effort failed. But, why did it fail? How do we determine what went wrong and what actions to take to ensure we do better going forward? There happens to be a handy tool in the agile / scrum toolkit known as the "retrospective". A mini retrospective of what happened with our blogging might look like this.

What did we do well? While we did not generate much content, the content we did generate was something the writer was passionate about. What could we have done better? We could have encouraged the team to write about anything. It does not need to be a long, involved post. Just write about something that interested you. How can we improve things going forward? Have someone leading by example and being a resource to cajole ask people to provide more content in addition to theirs.

This is a simple example from my own mind. Things become more complex when trying to ask these questions from an entire team. The most invaluable resource I have found for learning how to lead retrospectives is Agile Retrospectives: Making Good Teams Great. This is typically one of the books I regularly gift to co-workers.

The book is only 142 pages long but contains a wealth of information and tools to use. It is organized into three sections. First, three chapters are dedicated to explaining how to prepare to lead a retrospective, how to tailor it to your team, and the actual process of the retrospective meeting. The next five chapters outline various activities you could use for each step of the meeting. The final two chapters provide guidance on how to take all these ideas and apply them to lead retrospectives.

The best advice I have for consuming this book is to really absorb the first three chapters and final two chapters. Much of the advice there will help immensely in leading a retrospective and many other meetings. Skim the next five chapters; find a few activities your team might connect best with and start using those. Revisit these chapters every so often to find new activities to keep your retrospectives fresh.

Even if you are not in charge of leading retrospectives or other agile meetings, the advice carries over to leading or participating in other meetings. Also, the retrospective process is a useful tool outside of software development. You can have a retrospective about anything -- your recent home renovation project, your car buying experience, your vacation plans. There are few titles I gush over and push onto people. This is one of them, a title I firmly believe should be on every developer's shelf.

Adventures in Two Factor Authentication

Off we go!

Another hack week was upon us, and I had a definite idea in mind. I wanted to add two factor authentication to our web application. Having enabled it on web sites I use regularly, I was curious to see how difficult it was to implement and learn about the best practices associated with it. A bare bones implementation turned out to be remarkably trivial.

Our application runs on PHP, and it took minimal Google-fu to determine the RobThree/TwoFactorAuth library was the best option to try first -- well maintained, great documentation. One quick

composer require RobThree/TwoFactorAuth

to pull the library in, and it was right into setting it up. First hurdle reached! Our authentication code needed some tender loving refactoring to be able to support having two factor authentication enabled for your login or not. Ah... the satisfaction of being able to clean code up including squashing a bug that has long irritated you...

Once the authentication code was cleaned up, actually adding two factor authentication was trivial. It took more time to refactor our existing code than it did to use the library to generate the shared secret, display the QR code for scanning, and verify the entered token against the shared secret.

Which thirds to invite to the party?

By default, the PHP library reaches out to an external Google service to generate the QR code. However, you can use a local library to generate it if security of sending the shared secret over the wire is a concern. Hrm... Should I be concerned about that? Should I use an external third party service for generating QR codes, or should I compose in another third party library into our application?

Did a bit of research and found no compelling argument either way. Thinking it through, let's say I use an external third party and send the shared secret to them. What is the attack surface? They could sniff and record the secret, but they get no other details. How would they tie it back and use it with an actual username? They would need access to our database or server to make that association. If they have that, the game is already up, and they likely don't care or need the secret anyway.

What other drawbacks might there be then? There are two that spring to mind. First is the need to make an HTTP request to the service to get the QR code back. You are now dependent on the service being up and the network being fast and stable between you and the service. Second is the (likely) closed nature of the service; you have no way of vetting the code to make sure there are no shenanigans behind the curtain.

Given all of that, I stuck with the default Google service. Google knows just a little bit about keeping a service up and running, and their QR code service has likely been battle tested as much if not more than any other third party library I could use locally. However, the RobThree documentation provides an easy example of using a local library should the need or desire to switch arise.

Drifting away...

It was demo day; time to practice before presenting my work to the rest of the team! I'll just plug in my username and password... Type in the token from my device, and... Wait... Why am I not logged in? What did I break with my last commit?! All my work is for naught! I'm a frauuuud!

OK... OK... calm down... There must be a logical explanation. What would make a token work yesterday but not today? This is a time based protocol... My phone is setup to automatically keep my time correct. Let's check the time on my server... AHA! My local vagrant instance is not running any time synchronization and has decided to drift away by about an hour. One quick NTP query to fix that, and... We're back in business! Whew... Crisis averted; demo proceeds without issue.

I've learned an important lesson about TOTP. Make sure your server is automatically keeping the time correct!

Dude... Where's my phone?

If you are like me, you have used two factor authentication on other sites and have come across "recovery codes". These are meant to be written down and stored offline; if you were to ever lose the device that generated the token for the site, you could enter one of these codes to complete authentication and establish the shared secret on a new device. While I had an idea in mind of how this could be implemented, I wanted to see if there were any established best practices people follow. How many codes should be generated? How should you generate each code? What is the best way to securely store them?

After some digging, I found... nothing. I've seen sites provide anywhere from five to ten codes with the codes varying in length. I have yet to implement this feature but am leaning to at least generating five codes. However I end up generating them, I will store them exactly like passwords -- as encrypted salted hashes.

I need more...

One other flow I have encountered with other two factor enabled sites is the need to input multiple tokens if the server seems to think your token is out of sync. Again, I could find no information on best practices. How many tries should you give the user before entering this flow? How many tokens should you ask for? Does this really add any security value? For now, I am unlikely to implement this flow. While some big names use it, I'm not seeing much benefit from this flow versus just allowing the user to keep entering a single token.

That's all folks!

The RobThree library made it easy to add two factor authentication to our application. Follow the examples in the documentation, and you'll be up and running in no time. However, there seem to be no best practices (none I could easily surface at least) around the items like recovery codes and "out of sync... give us more than one token to proceed". Did I miss anything? Have any best practices from your own implementations? Sound off in the comments and let me know.