Scaling Crisp to The Next 100k Users

Looking for tips on how to scale your startup in its early stages? This article is for you.

May has been a tremendous month of growth for Crisp!

We have been open in beta for the last 9 months, working closely with a restricted user base who have been helping us improve the software quality on a daily basis, as well as giving us guidance on which features they need implemented.

We are now proud to say we host more than 2600 users on the platform, of all kinds: startups, corporations, individuals and enthusiasts. The Crisp chatbox is now served to over 10 million users per month (a quarter of a terabyte of data served). This is marvelous for such a young project. We love you!

Attacking a market

Our team has been hard at work bootstrapping the service from the ground up over the last few months. Now it is time for us to look back at what we have achieved and reflect on it. We wish to be more transparent and communicate more about what we do, so we will begin publishing regular blog posts on the matter.

What has been done so far?

1. Service hardening

Crisp was started from scratch in the summer of 2015 as a part-time project. We first focused on proving the idea that a modern chatbox could be an upgrade over what competitors offer.

We managed to get a basic chatbox service online, with minimal features, in early August 2015. We quickly focused on developing a great workflow that would support our long-term vision of delivering a free service to the masses, while developing some specific features (plugins) for paying users (e.g. Slack integration).

We started with a very small user base when we began rolling out the very first chatbox codes (10 users on the alpha version!). We are very grateful for their feedback, which helped us envision the platform as it is today.

We quickly began focusing on improving platform scalability. Since we mostly provide free-tier accounts, we needed to ensure our infrastructure would support high loads for peanuts (in other words: a very low cost per served user). This will later enable us to sell plugins at very competitive prices.

Minimizing this cost under load also means blocking most external load attack vectors. Denial of Service (DoS) attacks are prevented by aggressive policies on our backends: we developed a per-IP and per-user network threshold system, which helps us prevent abuse and attacks. It is available as an open-source module for Node.js: fast-ratelimit. fast-ratelimit protects our API systems, as well as our real-time messaging systems.
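To make the per-IP and per-user threshold idea more concrete, here is a minimal sketch of a fixed-window limiter. It is not the fast-ratelimit code or its API: the class name `WindowRateLimiter`, its options and the key format are all made up for illustration, and a production limiter would also need to prune expired entries.

```javascript
// Hypothetical fixed-window rate limiter (illustration only, not fast-ratelimit).
// Each key (e.g. an IP or a user ID) gets `threshold` requests per `ttlSeconds`.
class WindowRateLimiter {
  constructor({ threshold, ttlSeconds, now = () => Date.now() }) {
    this.threshold = threshold;
    this.ttlMs = ttlSeconds * 1000;
    this.now = now; // injectable clock, handy for testing
    this.buckets = new Map(); // key -> { remaining, expiresAt }
  }

  // Returns true if the request identified by `key` is allowed, false otherwise.
  consume(key) {
    const current = this.now();
    let bucket = this.buckets.get(key);

    // Open a fresh window if none exists or the previous one has expired.
    if (!bucket || bucket.expiresAt <= current) {
      bucket = { remaining: this.threshold, expiresAt: current + this.ttlMs };
      this.buckets.set(key, bucket);
    }

    if (bucket.remaining <= 0) {
      return false; // over the threshold for this window
    }

    bucket.remaining -= 1;
    return true;
  }
}

// Usage: at most 2 requests per second and per key (values are arbitrary).
let fakeTime = 0;
const limiter = new WindowRateLimiter({
  threshold: 2,
  ttlSeconds: 1,
  now: () => fakeTime,
});

const results = [
  limiter.consume("ip:203.0.113.7"), // allowed
  limiter.consume("ip:203.0.113.7"), // allowed
  limiter.consume("ip:203.0.113.7"), // rejected, threshold reached
  limiter.consume("user:42"),        // allowed, separate key
];
console.log(results);
```

Keying the same limiter by IP for anonymous traffic and by user ID for authenticated traffic is one way to get both kinds of thresholds from a single mechanism.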

Speaking of attack protection, some attack vectors are harder to prevent with per-IP limits on our side (e.g. Distributed Denial of Service attacks). We prevent those by using CloudFlare as a proxy for our Web assets, API and sockets. CloudFlare serves not only as a security service, but also as a CDN (Content Delivery Network). It greatly improves chatbox load performance on websites by reducing network latency. They have a bunch of servers spread all over the world (in a little less than a hundred locations!), reachable through anycast IP addresses.

When you connect to Crisp, you connect to the nearest CloudFlare server on earth, which then connects to our servers in the Netherlands (often this second hop is not even needed, because CloudFlare holds a cache). Since data cannot travel faster than light (and no, quantum entanglement will not help us there), reducing the distance can bring the mean time it takes to connect down from ~400 milliseconds to ~10 milliseconds (e.g. without CloudFlare, it would take almost 400ms to connect from San Francisco to our servers in Amsterdam, which is slow enough for a human to perceive the delay).
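As a back-of-the-envelope check on why distance matters, the sketch below estimates the physical floor for a single network round trip between San Francisco and Amsterdam. The city coordinates and the signal speed in optical fiber (about two thirds of the speed of light) are assumptions for illustration, not figures from our measurements.

```javascript
// Rough lower bound on round-trip time imposed by distance alone.
const EARTH_RADIUS_KM = 6371;
const FIBER_SPEED_KM_PER_S = 200000; // assumption: roughly 2/3 of c in optical fiber

function toRadians(degrees) {
  return (degrees * Math.PI) / 180;
}

// Great-circle distance between two points, using the haversine formula.
function haversineKm(a, b) {
  const dLat = toRadians(b.lat - a.lat);
  const dLon = toRadians(b.lon - a.lon);
  const h =
    Math.sin(dLat / 2) ** 2 +
    Math.cos(toRadians(a.lat)) * Math.cos(toRadians(b.lat)) * Math.sin(dLon / 2) ** 2;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Time for a signal to travel there and back, ignoring routing and processing.
function minRoundTripMs(distanceKm) {
  return ((2 * distanceKm) / FIBER_SPEED_KM_PER_S) * 1000;
}

// Approximate city coordinates (assumed values).
const sanFrancisco = { lat: 37.77, lon: -122.42 };
const amsterdam = { lat: 52.37, lon: 4.9 };

const distanceKm = haversineKm(sanFrancisco, amsterdam);
console.log(`${distanceKm.toFixed(0)} km`);
console.log(`${minRoundTripMs(distanceKm).toFixed(0)} ms minimum per round trip`);
```

With these assumptions the distance comes out at roughly 8,800 km and the floor at a bit under 100 ms per round trip. The ~400 ms observed in practice is well above that floor, presumably because opening a connection takes several round trips (TCP and TLS handshakes) on top of routing and processing overhead. A nearby edge server shortens every one of those round trips at once.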

CloudFlare Map (May 2015)

Of course, there is a massive amount of things that you do not see, which make up the skeleton of our platform: the backend systems. Most of the code we wrote makes up the Crisp backend, which we call "relay".

The Crisp backend is fully written in JavaScript/Node.js; more specifically in ECMAScript 6 (the latest and shiniest JavaScript standard).

Our backend embraces a distributed approach: we split functionality into what we call "nodes". To compare it with a real-life situation: you see people interacting together and exchanging messages from mouth to ear; similarly, in our infrastructure we have multiple "nodes" speaking together, in a distributed fashion, via a message broker (RabbitMQ).

More importantly, each node is specialized in one task and can be replicated indefinitely. To draw a parallel with the business world: some people specialize in plumbing, others in electrical systems, others are truck drivers, and so forth. A Crisp "node" can be specialized in handling messages, dispatching emails, managing user availability (online/offline states), and so on. This allows us to quickly update nodes without disturbing the service as a whole if something goes wrong. We push multiple updates a day, and you don't even get to perceive them! Moreover, if a specific node crashes, the service will continue seamlessly since this node is replicated.
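The idea of specialized, replicated nodes talking through a broker can be sketched in a few lines. This is only an illustration, not the "relay" code: the `InMemoryBroker` class is a made-up, in-process stand-in for RabbitMQ, and the queue and node names are hypothetical. Each queue hands messages to its subscribed replicas in turn, so when one replica goes away the others keep consuming.

```javascript
// Hypothetical in-memory stand-in for a message broker (we use RabbitMQ in
// production; this class only mimics work-queue style dispatch).
class InMemoryBroker {
  constructor() {
    this.queues = new Map(); // queue name -> { consumers: [], next: 0 }
  }

  // Register a node on a queue; returns a function that removes it again,
  // which we use below to simulate a crashed replica.
  subscribe(queue, handler) {
    if (!this.queues.has(queue)) {
      this.queues.set(queue, { consumers: [], next: 0 });
    }
    const q = this.queues.get(queue);
    q.consumers.push(handler);
    return () => {
      q.consumers = q.consumers.filter((h) => h !== handler);
    };
  }

  // Deliver a message to one replica of the queue, in round-robin order.
  // Returns false when no node is available for that queue.
  publish(queue, message) {
    const q = this.queues.get(queue);
    if (!q || q.consumers.length === 0) {
      return false;
    }
    const handler = q.consumers[q.next % q.consumers.length];
    q.next += 1;
    handler(message);
    return true;
  }
}

const broker = new InMemoryBroker();
const handled = [];

// A "node" here is just a named handler specialized in one queue.
const makeNode = (name) => (message) => handled.push(`${name}:${message}`);

// Two replicas of the email node, one availability node.
const stopEmailA = broker.subscribe("email", makeNode("email-a"));
broker.subscribe("email", makeNode("email-b"));
broker.subscribe("availability", makeNode("availability-a"));

broker.publish("email", "welcome");
broker.publish("email", "digest");
stopEmailA(); // simulate a crash of the first email replica
broker.publish("email", "reminder"); // still handled, by the surviving replica
broker.publish("availability", "user-online");

console.log(handled);
```

A real broker adds what this sketch leaves out, such as persistence, acknowledgements and delivery across machines, but the shape is the same: publishers only know queue names, never which replica will pick up the work.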

2. Platform migration

The Crisp chatbox is served directly from Crisp servers, and can be installed on any website. Websites differ in the nature and volume of traffic they receive (e.g. one may get 100 visitors a day while another gets hundreds of thousands!).

We do not ask Crisp users to pay based on how much traffic their website gets. Instead, we sell extra features that users can opt in to and out of freely. We call them plugins. This means we need a cost-efficient way to deliver the same quality of service to anyone, no matter how much load they bring to the platform.

Initial platform structure

We initially built Crisp on the Firebase realtime database platform. Firebase is a great and effective service when you need to build great apps fast, and it definitely helped us move fast on the initial Crisp release. However, user data and accounts are tied to their servers (it's a SaaS platform, which we cannot run on our own servers), and its pricing model doesn't fit our mostly free-tier business model (it makes us pay a lot for free Crisp users). Moreover, Firebase belongs to Google, a US company whose network is restricted by the US embargo (e.g. we had users complaining they could not use Crisp from Iran, which is now possible!). On top of that, we had a lot of small issues with Firebase: connecting to Firebase from our backend servers consumed a huge amount of system resources, which rapidly brought our infrastructure to its knees.

Thus, we decided 2 months ago to migrate from Firebase to something that fits our model better. This means building our own data storage systems, as well as our own realtime data serving systems. In fact, using Firebase nearly killed our service, so we knew we had to move away from it, fast!

About the platform migration

Crisp was successfully migrated to the new system over the weekend of 14 to 15 May 2016. It is now based on MongoDB, an API system and a realtime system that pushes messages to the operator apps as they come in.

Some of you using the Crisp desktop apps may have had issues, and we sincerely apologize for that. Our team was in a rush to get things done fast (which we did), but we didn't communicate or plan the migration well enough to offer a transparent transition for everyone. We ask everyone using an outdated version of the Crisp desktop and mobile apps (pre-May 2016 versions) to upgrade; the latest versions can be downloaded at: Crisp apps download

How will the new platform change the life of our users?

The new platform will allow us to sustain our future growth, both by handling a larger user base (more server load!) and by letting us implement new features more quickly. This means that we will very soon begin rolling out our first plugins, as part of our public plugin platform.

More generally, Crisp users can now experience faster apps on the operator side. Also, the Crisp desktop apps released from now on can update themselves, so you will never have to download them again to get the latest version.

What did we learn from it?

Before, during and after the migration, we felt we missed some critical communication practices. We did not communicate enough with people about what would change for them, and what action they would need to take. We were too busy building the new platform and migrating to it, as well as fixing a lot of bugs once it was online.

Learning from our mistakes, here are the actions we will take in the future for similar situations:

  • We will communicate on our blog about what will happen / what's happening
  • We will notify all users by email when we have something important to say / something that affects users
  • We will ensure migration steps and dates are clearly communicated to users, at least a week before the actual migration
  • We will do more thorough pre-migration tests to ensure fewer bugs are spotted once it is online for everyone (e.g. we may do a restricted migration for a group of users; it's still good to test it in real-use conditions)

Apart from the actual migration, here are the actions we have taken so far in order to be more transparent:

  • We published our public roadmap on Trello (so that you can follow and request feature development)
  • We created a release cycle for our desktop and mobile apps, avoiding too frequent releases that may break things and ensuring we publish quality code

What we plan for the next months

While working on the Crisp migration, we had one thing in mind: making our future growth possible. Indeed, the last 9 months enabled us to validate our initial market hypothesis: that a plugin-based chatbox system would meet a real need.

Our goal is now to secure our business model and provide the paid features that are needed to support our growth and the free-tier part of our service. We promise: we will never, ever, compromise the free service tier to get more paying users.

In a nutshell, here is what the new system enables Crisp to support:

  1. Plugins (we will announce how the plugin system will work)
  2. Support much more traffic (scale to the next 100,000 users, baby!)
  3. Provide a public API (for your external/custom integrations)

To achieve that, we plan to:

  • Incorporate as a business and raise money for the sake of growth (seed round)
  • Build a developers website (e.g. docs.crisp.chat) to document our public API
  • Aim for profitability as soon as possible, for the sake of stability
  • Prioritize around a strict roadmap, and enter a high-focus mode to ensure we get things done fast

A little more about plugins:

  • When can we expect the first plugins to be available? We will begin rolling out the first plugins over the month of July
  • What are the first plugins on your timeline? Slack, Zendesk, Salesforce, triggers and de-branding (remove the Crisp logo in the chatbox and emails)
  • What prices should we expect? Around $15 a month per plugin, for unlimited use (e.g. no limit on how many messages you can send from Slack using the Slack plugin)

Emphasis on the long-term vision

Here are some critical points on our long-term vision (which goes beyond our roadmap):

  1. The chatbox will stay free, forever
  2. The service will never be degraded for non-paying users
  3. Once a few plugins are out, a developers platform will be opened, as well as a plugins marketplace. Anyone will be able to build and sell plugins on the marketplace
  4. We will continue investing efforts in security and safety (we know you want your chats to stay safe)
  5. A Crisp SDK for native mobile apps (iOS and Android) will be released, so that the Crisp chatbox can be used for in-app integrations

Thanks

So much love

We'd like to thank these organizations, which have supported us since the start of the project and enabled us to grow further:

  • TheFamily (France) - they provide the entrepreneurial infrastructure and connect us with relevant companies and people
  • DigitalOcean (USA) - we host all our infrastructure on their servers, with an unprecedented QoS (we have only had one hour of downtime, on a single server, which didn't break our distributed infrastructure)
  • SendGrid (USA) - we used them to send message emails at scale (though we recently switched to a solution of our own that was a better fit for our custom needs)
