On collision course with Cloud Firestore

in #firebase, 6 years ago

In Cloud Firestore, you can only update a single document about once per second, which might be too low for some high-traffic applications.

Note: I originally published this article on Medium. This repost is the start of my cross-posting here.

The sentence at the top is the opening line for a concept called “distributed counters” in the Cloud Firestore documentation, which goes on to explain a fairly complicated random distribution of shard counters to reduce the probability of collisions upon document write. But “reduce the probability” are the words to pay attention to.

So, what’s the problem here? What happens if you ignore the warnings and decide to try your luck?

Let’s find out.

This example uses Cloud Functions for Firebase to simulate many aggressively simultaneous users hitting our highly popular (and also fictitious) movie review application.

// Fire off 20 review writes at once; none is awaited until Promise.all below.
const numReviews = 20;
const promises = [];
for (let i = 0; i < numReviews; i++) {
    const numStars = Math.trunc(Math.random() * 5) + 1;
    console.log(`#${i} creating a ${numStars} star review.`);
    promises.push(admin.firestore().collection("movies").doc("tt0081748").collection("reviews").add({
        name: `Jane Deer #${i}`,
        stars: numStars,
        comment: "Meh"
    }));
}

await Promise.all(promises);

Just do it

First we’ll make a naive example of just boldly going for it. No transactions, no regrets.

export const countReviews = functions.firestore.document("movies/{movieId}/reviews/{reviewId}").onCreate(async (event) => {
    console.log(`Got a ${event.data.data().stars} star review`);

    const review = event.data.data();
    const movieRef = admin.firestore().collection("movies").doc(event.params.movieId);
    const movie = (await movieRef.get()).data();

    movie.numReviews += 1;
    movie.totalReviewScore += review.stars;
    movie.averageScore = (movie.totalReviewScore/movie.numReviews).toPrecision(3);
    
    console.log(`[${event.data.id}] Got a ${review.stars} star review from ${review.name} (now ${movie.numReviews} total reviews)`);

    return movieRef.update(movie);
});

The cloud function is triggered on each new review, counting the reviews and aggregating their scores to get an average review score. But it’s not hard to imagine what kind of problems will arise as soon as the pace of writes picks up.
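The aggregation step itself can be isolated as a pure function, which makes it easier to reason about separately from the database. The following is a minimal sketch, not the code used in the post: the `ReviewAggregate` interface and `applyReview` helper are hypothetical names. It also keeps `averageScore` as a number, since `toPrecision(3)` returns a string such as "3.50".

```typescript
// Hypothetical shape of the aggregate fields kept on the movie document.
interface ReviewAggregate {
    numReviews: number;
    totalReviewScore: number;
    averageScore: number;
}

// Returns a new aggregate with one more review folded in (the input is not mutated).
function applyReview(aggregate: ReviewAggregate, stars: number): ReviewAggregate {
    const numReviews = aggregate.numReviews + 1;
    const totalReviewScore = aggregate.totalReviewScore + stars;
    // Round to two decimals but keep a number; toPrecision() would give a string.
    const averageScore = Math.round((totalReviewScore / numReviews) * 100) / 100;
    return { numReviews, totalReviewScore, averageScore };
}

const afterSecondReview = applyReview({ numReviews: 1, totalReviewScore: 4, averageScore: 4 }, 3);
console.log(afterSecondReview); // { numReviews: 2, totalReviewScore: 7, averageScore: 3.5 }
```

The function is correct on its own; the trouble described next comes from several callers applying it to the same stale input.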

Firebase cloud functions log

As expected, we can see that there are several simultaneous cloud function executions that all read an old state of the counter, hence creating inconsistency in the counter. We did indeed fail to count all reviews.

Created 20 reviews, only counted 15
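The lost updates can be reproduced without any database at all. Here is a small in-memory model, under the assumption that handlers run in overlapping groups where every handler reads before any of them writes. The `simulateLostUpdates` function and its `overlap` parameter are made up for illustration, and the numbers it produces are not the ones from the real run above.

```typescript
// Simulates `numWriters` handlers that run in overlapping groups of size `overlap`.
// Every handler in a group reads the counter before any of them writes it back.
function simulateLostUpdates(numWriters: number, overlap: number): number {
    let storedCount = 0; // stands in for movie.numReviews in the database
    for (let start = 0; start < numWriters; start += overlap) {
        const groupSize = Math.min(overlap, numWriters - start);
        // Phase 1: each handler in the group reads the same stale value.
        const snapshots: number[] = [];
        for (let i = 0; i < groupSize; i++) {
            snapshots.push(storedCount);
        }
        // Phase 2: each handler writes its snapshot + 1, overwriting the others.
        for (const seen of snapshots) {
            storedCount = seen + 1;
        }
    }
    return storedCount;
}

console.log(simulateLostUpdates(20, 1)); // 20: no overlap, nothing is lost
console.log(simulateLostUpdates(20, 4)); // 5: each group of four counts as one
```

The more the handlers overlap, the more increments are silently overwritten, and nothing in the naive code reports the loss.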

Transactions

Perhaps we’ll have better luck if we wrap it in a transaction, so that the operation retries until our write gets through?

export const countReviews = functions.firestore.document("movies/{movieId}/reviews/{reviewId}").onCreate(async (event) => {
    console.log(`Got a ${event.data.data().stars} star review`);

    const review = event.data.data();
    const movieRef = admin.firestore().collection("movies").doc(event.params.movieId);
    
    // Return the promise so the function is kept alive until the transaction settles.
    return admin.firestore().runTransaction(async transaction => {
        const movie = (await transaction.get(movieRef)).data();

        movie.numReviews += 1;
        movie.totalReviewScore += review.stars;
        movie.averageScore = (movie.totalReviewScore/movie.numReviews).toPrecision(3);
        
        console.log(`[${event.data.id}] Got a ${review.stars} star review from ${review.name} (now ${movie.numReviews} total reviews)`);

        return transaction.update(movieRef, movie);
    });
});

Unfortunately, as you can see in the logs below, the operation does retry as we want and expect (see the highlighted #16). But the result is less satisfying: the retries push Firestore into a congested state where an exception is thrown.

Dirty writes in transactions are retried, but end up causing “too much contention”
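Why retries alone do not save us can be shown with a toy model. This is a sketch under stated assumptions, not Firestore's actual locking or retry algorithm: all pending writers read at the start of a round, only the first commit per round succeeds, and each writer gives up after `maxAttempts` rounds. The names `simulateContention`, `VersionedCounter` and `ContentionResult` are hypothetical.

```typescript
interface VersionedCounter {
    version: number;
    numReviews: number;
}

interface ContentionResult {
    committed: number;
    failed: number;
    counter: VersionedCounter;
}

function simulateContention(numWriters: number, maxAttempts: number): ContentionResult {
    const counter: VersionedCounter = { version: 0, numReviews: 0 };
    let pending = numWriters; // writers that have not committed yet
    let committed = 0;
    for (let attempt = 1; attempt <= maxAttempts && pending > 0; attempt++) {
        // All pending writers read the same version at the start of the round.
        const readVersion = counter.version;
        const readCount = counter.numReviews;
        let retrying = 0;
        for (let w = 0; w < pending; w++) {
            if (counter.version === readVersion) {
                // First committer wins: its read is still current.
                counter.numReviews = readCount + 1;
                counter.version += 1;
                committed++;
            } else {
                // Stale read: the write is rejected and the writer must retry.
                retrying++;
            }
        }
        pending = retrying;
    }
    // Writers still pending have used up all their attempts.
    return { committed, failed: pending, counter };
}

const contended = simulateContention(20, 5);
console.log(contended.committed, contended.failed); // 5 15
```

Unlike the naive version, nothing is lost silently here: the counter always matches the number of commits. The price is that writers who run out of attempts fail outright, which is the model's equivalent of the contention error in the log.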

Distributed counters

It’s becoming clear that there is no simple, magical solution here, and the distributed counter is starting to sound like a good idea to try out. Google Cloud Datastore (as used in Google App Engine) is the very same technology that Firestore is built upon, and the shard counter concept has been around for a long time, long before the Firebase Realtime Database existed.
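To make the shard idea concrete, here is a minimal in-memory sketch. It is not the implementation from the Firestore documentation: the `ShardedCounter` class is a hypothetical stand-in in which each array slot plays the role of one shard document.

```typescript
class ShardedCounter {
    private readonly shards: number[];

    constructor(numShards: number) {
        this.shards = [];
        for (let i = 0; i < numShards; i++) {
            this.shards.push(0);
        }
    }

    // Each increment touches one randomly chosen shard and returns its index.
    increment(amount: number = 1): number {
        const index = Math.floor(Math.random() * this.shards.length);
        this.shards[index] += amount;
        return index;
    }

    // Reading the counter means summing all shards.
    total(): number {
        let sum = 0;
        for (const value of this.shards) {
            sum += value;
        }
        return sum;
    }
}

const reviewCounter = new ShardedCounter(10);
for (let i = 0; i < 20; i++) {
    reviewCounter.increment();
}
console.log(reviewCounter.total()); // 20
```

In a real database each shard would be a separate document with its own write limit, so spreading writes over ten shards raises the ceiling but does not remove it. Two writers can still pick the same shard at the same moment, which is why the documentation only promises to reduce the probability of collisions.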

Using Firebase Realtime Database for counting and aggregating Firestore data

Cloud Firestore is sitting conveniently close to the Firebase Realtime Database, and the two are easily available to use, mix and match within an application. You can freely choose to store data in both places for your project, if that serves your needs.

So why not use the Realtime Database for one of its strengths: managing fast data streams from distributed clients? That is exactly the problem that arises when trying to aggregate and count data in Firestore.

First we add a little helper cloud function that scaffolds a default state for the aggregated counter and average review score. Please note that this is written to the Realtime Database.

export const createMovieReviewAggregate = functions.firestore.document("movies/{movieId}").onCreate(event => {
    return admin.database().ref("movieRevies").child(event.params.movieId).set({
        numReviews: 0,
        totalReviewScore: 0,
        averageScore: 0
    });
});

After that, we’ll modify the cloud function above that was running a transaction on Firestore, and change the transaction to run on the Realtime Database instead.

export const countReviews = functions.firestore.document("movies/{movieId}/reviews/{reviewId}").onCreate(async (event) => {
    console.log(`Got a ${event.data.data().stars} star review`);

    const review = event.data.data();

    // Return the promise so the function is kept alive until the transaction settles.
    return admin.database().ref("movieRevies").child(event.params.movieId).transaction(movie => {
        if (!movie) {
            return movie;
        }

        console.log(`[${event.data.id}] Got a ${review.stars} star review from ${review.name} (now ${movie.numReviews} total reviews)`);
    
        movie.numReviews += 1;
        movie.totalReviewScore += review.stars;
        movie.averageScore = (movie.totalReviewScore/movie.numReviews).toPrecision(3);
        return movie;
    });
});

Once again, note that we’re running the transaction against the Realtime Database so that the high volume of writes is handled better. And as you can see in the log below, the RTDB handles it with grace.

Dirty writes on transactions are retried, and look at the absence of errors
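The `if (!movie)` guard in the function above matters because a Realtime Database transaction's update function can run more than once, and its first run may see `null` when nothing is cached locally. The following is a simplified model of that behavior, not the SDK's implementation; `runTransactionModel`, `MovieAggregate` and `UpdateFn` are hypothetical names.

```typescript
interface MovieAggregate {
    numReviews: number;
    totalReviewScore: number;
}

type UpdateFn = (current: MovieAggregate | null) => MovieAggregate | null;

// Runs `update` against a guess and commits only if the guess matched the server.
// Returns how many times the update function was called.
function runTransactionModel(
    server: { value: MovieAggregate | null },
    cachedGuess: MovieAggregate | null,
    update: UpdateFn
): number {
    let guess = cachedGuess;
    let calls = 0;
    while (true) {
        // Pass a copy so the update function may mutate its argument freely.
        const proposed = update(guess === null ? null : { ...guess });
        calls++;
        if (JSON.stringify(guess) === JSON.stringify(server.value)) {
            server.value = proposed; // the guess was current: commit
            return calls;
        }
        // The guess was stale: retry with a copy of the server's value.
        guess = server.value === null ? null : { ...server.value };
    }
}

const serverState: { value: MovieAggregate | null } = {
    value: { numReviews: 3, totalReviewScore: 12 }
};
const addFourStars: UpdateFn = (movie) => {
    if (!movie) {
        return movie; // nothing cached yet: wait for the real value
    }
    movie.numReviews += 1;
    movie.totalReviewScore += 4;
    return movie;
};
const updateCalls = runTransactionModel(serverState, null, addFourStars);
console.log(updateCalls, serverState.value); // 2 { numReviews: 4, totalReviewScore: 16 }
```

The first call sees `null` and changes nothing; the second call runs against the real value and commits. Because the update function may be re-run, it should stay free of side effects beyond computing the new value.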

Using the Realtime Database in this case is much easier to set up and manage than the shard counter. We also don’t need to worry about having to “reduce the probability” of document write collisions. We simply leave that to the RTDB implementation, which is designed to handle exactly this kind of high-pace concurrent writes.

Reader’s exercise

For further exercise, you might want to find a way to mirror the counters and aggregated values back to Firestore in a controlled way. I’d be happy to see any actual implementation that uses this method and also mirrors the values back to a Firestore document in a smart way.
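One possible starting point for the exercise, offered as a sketch rather than a solution: throttle the mirroring so that the Firestore document is written at most once per interval. The helpers `shouldMirror` and `mirroredUpdates` and the `MirrorState` type are hypothetical, and the one-second interval is an assumption based on the write limit quoted at the top.

```typescript
interface MirrorState {
    lastMirroredAtMs: number; // when the Firestore copy was last written
}

// Returns true when enough time has passed to write to Firestore again.
function shouldMirror(state: MirrorState, nowMs: number, minIntervalMs: number): boolean {
    return nowMs - state.lastMirroredAtMs >= minIntervalMs;
}

// Feeds update timestamps through the throttle and returns the ones that would be mirrored.
function mirroredUpdates(updateTimesMs: number[], minIntervalMs: number): number[] {
    const state: MirrorState = { lastMirroredAtMs: -Infinity };
    const mirrored: number[] = [];
    for (const t of updateTimesMs) {
        if (shouldMirror(state, t, minIntervalMs)) {
            mirrored.push(t);
            state.lastMirroredAtMs = t;
        }
    }
    return mirrored;
}

console.log(mirroredUpdates([0, 200, 900, 1000, 1500, 2100], 1000)); // [ 0, 1000, 2100 ]
```

A throttle like this drops the updates in between, so a burst that ends just after a mirrored write would leave Firestore stale until the next update arrives. A complete solution would also need a trailing flush of the latest value.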


The testing method is cool... I’m just starting to learn Cloud Functions, and I could easily understand the code. Thank you, Dennis. I’m sure I’ll run into many cases like this later on.
