Postmortem Sitemetrics

During the last month, I’ve been working side by side with my friend and coworker Lucas Contreras on a web application. In this post, I will try to highlight both the things that went great and the things that went not-so-great.

The problem.

Everything we do in our daily lives arises from a problem we want to solve. This is no less true in the field of software development.

One day, as I was finishing another project of my own, Lucas asked me if I wanted to be part of a brand-new monitoring project, one that was the pinnacle of the SRE discipline. We were asked to monitor both the latency and the error budgets of a large-scale e-commerce site. To do that, we had to develop from scratch a fully automated dashboard that displays all the data fed into it. The dashboard was meant to replace an existing tool that, due to a lack of maintenance, had suddenly stopped working. We all know that things don’t just stop working for no reason, but in this case tracing the cause and fixing the tool would have been a waste of time: the old tool had been written by someone who was no longer part of the team, and it was rather difficult to understand in terms of both code and design.

As if the situation wasn’t complicated enough already, we were told that this tool was used by very important people with high ranks, fine suits, and expensive shoes. People whose decisions can determine the future of our company.

As you might guess, it seemed rather unlikely that two members of the team who were not even Junior but “Shadowing” would be put in charge of such a task. But as you might have experienced, seniority is (sometimes) something that companies like to play with.

So… what do we do?

After a short meeting with the client, we got a bit (and just a bit) more knowledge on the whole thing. We were given three weeks to recreate the whole system that was working before but with added functionality.

We started discussing a plan that would allow us to get it done in time. The first idea was to use React for the frontend and build a RESTful API for the backend, but we ditched it rather quickly: the time constraint didn’t allow us to use a technology we were inexperienced in.

In the end, we settled on AdonisJS, a fully-fledged full-stack JavaScript web framework made for when you need to deliver software as fast as bread comes out of the oven. This full-stack monster comes packed with a handful of solutions that made our development much easier and let us focus on the design of the application. Things like authentication, connecting to the DB, and integrating an ORM were not an issue, so the next step was to think about how we were going to get the data (latency metrics and error budget) from the site. Luckily, the old tool used Splunk for that.

App Infrastructure + 3 microservices
With these scheduled “Splunk reports”, which could be easily integrated with our dashboard over webhooks, all that was left was to build a friendly REST API where Splunk could POST all this data.
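
To give an idea of what such an endpoint has to do, here is a minimal, framework-agnostic sketch in TypeScript. It is not the code we shipped. It assumes the JSON body that Splunk's webhook alert action sends (a `search_name`, a `sid`, and a `result` object holding the first result row), and the metric field names `latency_ms` and `error_rate`, the `MetricSample` type, and the `ingestSplunkReport` function are all hypothetical names picked for the example. The array stands in for the database table.

```typescript
// Subset of the JSON body sent by a Splunk webhook alert action.
// The keys inside `result` depend on the saved search; `latency_ms` and
// `error_rate` are hypothetical names used only for this sketch.
interface SplunkWebhookPayload {
  search_name: string;
  sid: string;
  result: Record<string, string>;
}

// One stored data point (hypothetical shape of a dashboard row).
interface MetricSample {
  report: string;
  latencyMs: number;
  errorRate: number;
  receivedAt: Date;
}

interface IngestResponse {
  status: number;
  message: string;
}

// Type guard: checks only the fields this sketch relies on.
function isSplunkPayload(body: unknown): body is SplunkWebhookPayload {
  if (typeof body !== "object" || body === null) return false;
  const candidate = body as Record<string, unknown>;
  return (
    typeof candidate.search_name === "string" &&
    typeof candidate.sid === "string" &&
    typeof candidate.result === "object" &&
    candidate.result !== null
  );
}

// Validates the POSTed body and appends a sample to the store.
// Returns the HTTP status the endpoint would answer with.
function ingestSplunkReport(body: unknown, store: MetricSample[]): IngestResponse {
  if (!isSplunkPayload(body)) {
    return { status: 400, message: "unexpected payload shape" };
  }
  // Splunk sends result values as strings, so convert them explicitly.
  const latencyMs = Number(body.result.latency_ms);
  const errorRate = Number(body.result.error_rate);
  if (!Number.isFinite(latencyMs) || !Number.isFinite(errorRate)) {
    return { status: 422, message: "missing or non-numeric metric fields" };
  }
  store.push({
    report: body.search_name,
    latencyMs,
    errorRate,
    receivedAt: new Date(),
  });
  return { status: 201, message: "stored" };
}
```

In a real app this function would sit behind a POST route of the web framework, and the store would be a model backed by the ORM. Rejecting malformed bodies early keeps a misconfigured report from silently writing garbage to the dashboard.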

After some faulty merge requests and pushes directly to master, we managed to have the dashboard set up except for the login. The large-scale company we were developing this app for uses LDAP. Yes, LDAP for everything. Dealing with it is kind of a pain, but, not by accident, we had already abstracted away that complexity by building an API for this LDAP instance (you can read this post on building a microservice stack for more info). This way, something that could have taken us weeks was solved in a matter of days. Nevertheless, it added a dependency just for authentication, which means that if the API were down for some reason, we wouldn’t be able to access our app. In the end, we solved this by persisting the user login in the Dashboard database and updating it (if needed) every time a user changes their LDAP password.
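
The fallback idea can be sketched roughly like this. Again, this is an illustration under assumptions rather than our actual implementation: `checkLdap` stands in for the call to the LDAP API, the `Map` stands in for the Dashboard database, `PasswordHasher` is a hypothetical interface for whatever password hashing you use, and the calls are synchronous only to keep the example short. Dropping the cached entry when LDAP rejects a password is a choice made for this sketch.

```typescript
// Outcome of asking the LDAP API to verify a password.
type LdapResult = "ok" | "rejected" | "unavailable";

// Hypothetical hashing interface; a real app would back it with a slow,
// salted password hash instead of storing anything reversible.
interface PasswordHasher {
  hash(password: string): string;
  verify(password: string, storedHash: string): boolean;
}

interface LoginResult {
  ok: boolean;
  source: "ldap" | "cache" | "none";
}

function login(
  username: string,
  password: string,
  checkLdap: (username: string, password: string) => LdapResult,
  cache: Map<string, string>, // username -> stored password hash
  hasher: PasswordHasher
): LoginResult {
  const outcome = checkLdap(username, password);

  if (outcome === "ok") {
    // Refresh the local copy, so a changed LDAP password replaces the old hash.
    cache.set(username, hasher.hash(password));
    return { ok: true, source: "ldap" };
  }

  if (outcome === "rejected") {
    // LDAP is authoritative whenever it is reachable: forget stale credentials.
    cache.delete(username);
    return { ok: false, source: "ldap" };
  }

  // LDAP API is down: fall back to the last credentials that LDAP accepted.
  const stored = cache.get(username);
  if (stored !== undefined && hasher.verify(password, stored)) {
    return { ok: true, source: "cache" };
  }
  return { ok: false, source: "none" };
}
```

The trade-off is that a user who has never logged in while the LDAP API was up cannot log in during an outage, and a password changed in LDAP during an outage is only picked up on the next successful LDAP check.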

Conclusion

The most practical solution is usually the best one. Especially when you have limited time to do something (and that’s always). Adonis allowed us to quickly develop a working solution that met all the requirements we were given.

Remember to check out Lucas Contreras’s blog, where he talks about more techy things.


Author: Lucas Contreras
