The perils of experimentation

Lincoln Anderson
August 10, 2022

Well, it’s been a bit of a rough few days! Late last week I launched a couple of fun new features: the ability to share a “view-only” link, and the ability to see who’s viewing your timer. The view-only link was simple enough to develop – mostly copying and pasting existing code. The ability to see your viewers, however, was another story. I had opted for a quick solution that would “ping” our backend system with a timestamp for each user.
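In hindsight, even a cheap client-side throttle would have cut that traffic dramatically. Here’s a minimal sketch of the idea – the function name, endpoint, and interval are all invented for illustration, not taken from the actual timer code:

```typescript
// Hypothetical sketch: throttle viewer "pings" so each client reports
// presence at most once per interval, instead of on every update.
type SendFn = (timestamp: number) => void;

function makeThrottledPing(
  send: SendFn,
  intervalMs: number,
  now: () => number = Date.now
) {
  let lastSent = -Infinity;
  return () => {
    const t = now();
    if (t - lastSent >= intervalMs) {
      lastSent = t;
      send(t); // e.g. POST to a /viewers/ping endpoint (made up here)
    }
  };
}

// Demonstration with a fake clock: five calls, only two pings go out.
const sent: number[] = [];
let fakeTime = 0;
const ping = makeThrottledPing((t) => sent.push(t), 10_000, () => fakeTime);
for (const t of [0, 1_000, 5_000, 10_000, 12_000]) {
  fakeTime = t;
  ping();
}
// sent is [0, 10000]
```

Injecting the clock (`now`) keeps the throttle trivially testable without waiting on real time – the same property I wish the rest of the prototype had.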

Oh, fiddlesticks.

Unfortunately, the approach that I used caused our lean prototype infrastructure to get swamped with traffic, and to go offline. And, since I didn’t have any robust guards in place to handle this sort of thing, there wasn’t any way for a user to tell what was going wrong, or for me to quickly resolve the issue. It was a few days of the prototype going up and down. There’s no two ways about it – it was a bonehead move on my part, and I didn’t have a great recovery path in place.

At the same time, I just happened to notice a couple of bugs caused by other experiments launched in the previous week, related to editing segments and resetting timers.

Right when all this happened, I was wrapping up a recap of the entire project so far, highlighting lessons learned, and raising questions about when the right time is to go from prototype to production. Getting smacked with all these issues certainly took me from a high point to a low one!

I’ll admit, I lost some confidence in the project this week. I can tolerate making mistakes. I can fix those and move on. The one thing I really don’t want to do is waste people’s time. If a user invests their energy in trying out my prototype, I want it to at least do what I intended. I don’t want to rely on users to find mistakes in my work. Ideally, all user feedback would be suggestions, not bug reports.

Looking up

However, now that I’ve rolled back the buggy feature and hot-fixed the other defects, I can take a breath and look at the bigger picture a little more clearly.

I knew going into this project that I was trading design-time speed for traditional enterprise concerns like performance, accessibility, scalability, and durability. I’m starting to feel the pains of that tradeoff, but they shouldn’t come as a surprise. If I had to go back and do it all over again, I wouldn’t change too much. But I would add a couple of things.

To-dos

A few best practices would have helped me avoid these gaffes, even in scaled-back form:

  • Graceful error handling. Even in professionally tested, enterprise-grade software, we gracefully handle errors. This means giving users some visual feedback that tells them the system is at fault, not them. It also may contain information that will help with fixing the error later. I had skipped this entirely in the prototype, but now I think a minimum of root-level error handling is called for. No one using an InVision or Balsamiq prototype would ever dream that it could simply stop working altogether, and the same should be true of our prototyping setup.
  • Mission-critical unit testing. Again, in my professional work I’m a huge fan of unit-testing any critical or complex functionality. For most apps, I don’t believe every line of code needs a unit test, but it’s not uncommon to see upwards of 20% coverage on the most important, most reused bits. In this experiment, I have done zero unit testing. I would amend that approach and add unit tests for the most critical, core functionality – especially areas that have caused confusion in the past. For the timer, this would certainly include the event reducer. This is the component that takes in the entire history of the timer and determines its state (time remaining, playing/stopped state, etc.). If anything (and I mean any single thing) goes wrong with this procedure, it throws off the entire timer, resulting in incorrect time, missed events, broken controls, and more. In a timer app, you could tolerate lots of issues, but the timer itself absolutely needs to work correctly! Unit testing would have avoided at least two bugs I’ve encountered so far.
  • Server monitoring. This shouldn’t be a big concern with a simple prototype, but most hosting solutions have some kind of monitoring, and can even alert you when you’re receiving dangerous levels of traffic. I can’t really see a downside to setting up some kind of alert in this situation, even if you don’t ever foresee it being triggered. As any UX tester can tell you, it’s tough to get testers to dig into your app and provide meaningful feedback. You don’t want them to hit a wall during the precious window of time you have their attention.
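To make the event-reducer point concrete: because the real reducer isn’t shown in this post, here is a minimal stand-in with invented event names and state shape, folding a timer’s event history into its current state, plus the kind of replay-based checks a unit test would make:

```typescript
// Minimal stand-in for a timer event reducer (event names and state
// shape are invented for this sketch, not taken from the real app).
// It folds the full event history into the timer's current state.
type TimerEvent =
  | { type: "start"; at: number }
  | { type: "stop"; at: number }
  | { type: "reset" };

interface TimerState {
  running: boolean;
  startedAt: number | null; // timestamp of the most recent "start"
  elapsedMs: number;        // accumulated time from completed runs
}

const initial: TimerState = { running: false, startedAt: null, elapsedMs: 0 };

function reduce(state: TimerState, event: TimerEvent): TimerState {
  switch (event.type) {
    case "start":
      // Ignore duplicate starts so a replayed event can't double-count.
      return state.running
        ? state
        : { ...state, running: true, startedAt: event.at };
    case "stop":
      return state.running && state.startedAt !== null
        ? {
            running: false,
            startedAt: null,
            elapsedMs: state.elapsedMs + (event.at - state.startedAt),
          }
        : state;
    case "reset":
      return { ...initial };
  }
}

// Unit-test style check: replay a history, assert on the result.
const history: TimerEvent[] = [
  { type: "start", at: 0 },
  { type: "stop", at: 5_000 },
  { type: "start", at: 8_000 },
  { type: "stop", at: 10_000 },
];
const final = history.reduce(reduce, initial);
// final.elapsedMs is 7000 (5s + 2s); a stray duplicate "stop" is a no-op:
const afterDupStop = reduce(final, { type: "stop", at: 11_000 });
```

Because the reducer is a pure function of the event history, tests like these need no mocks or timers at all – which is exactly what makes this the cheapest place to start testing.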

At the end of the day, I think these steps will only add a small percentage to my overall effort, and will greatly increase my confidence in the prototype.

Finally I’d like to give a special thanks to any users who were tripped up by these issues and still came back! Your support is greatly appreciated.
