Jan 30, 2018

GraphQL pitfalls

First of all, I love GraphQL and none of these pitfalls forced me to stop using it. Actually, I am trying hard to overcome these pitfalls by using it even more in order to find the best solutions and practices. But still, I’ve been faced with these challenging questions almost every day for more than a year and it makes me think.

Even though Facebook started working on GraphQL in 2012 and published it as open source in 2015 — it’s still relatively unknown. How many companies do you know of that are using GraphQL in production? I meet a lot of people and even though they usually know that it exists, they have no idea what it actually is and why they should be using it. The same thing is actually happening even in our company. Most of my team-mates really like it (see Yuri Yakovlev’s article on Server-side Subscriptions) but we live in a bubble and it seems impossible to convince our other colleagues to take GraphQL seriously. And this lack of interest leads to the first pitfall.

The barriers to entry are really high

GraphQL is hard. You need to learn a new syntax and understand new concepts. But even if you want to learn it proactively by building APIs, it’s not easy. GraphQL is very permissive in what you can do, and that’s why most first graph designs end up as REST API endpoints copy-pasted into GraphQL queries. They usually have exactly the same field names, in the form of one super long, one-level-deep type. It’s not necessarily wrong — you can even benefit from such a query. But everyone questions GraphQL on this point.

One of the most common questions is — why not stick with REST? And the most interesting question is: “So, is REST API dead?”.

No, it’s not. In fact, I really like what we do at Kiwi.com with our REST endpoints in GraphQL. We have a lot of REST APIs. And every REST API is going to get a little bit weird after a few years, because you cannot make dramatic backwards-compatibility (BC) breaks. This affects almost every REST API. Over the last few months, I’ve implemented around 20 different REST endpoints and it’s really painful: different authentication and authorisation mechanisms, wrong HTTP status codes, different error formats, non-determinism.

Don’t get me wrong — I kind of understand why it’s like this but this is what we did to make it less painful for us (internally):

Instead of consuming them directly from mobile devices or React applications, we hide them behind a GraphQL proxy. This greatly simplifies our clients. We don’t have to deal with all the weird APIs in every client — we already did that once, in the GraphQL API. In all the clients we just use one mechanism to get all the necessary data.
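This “fix it once, in the proxy” idea can be sketched as a GraphQL resolver wrapping a legacy endpoint. The URL and the raw field names (`booking_id`, `state`) are made up for illustration, and `fetchJson` is injected rather than being a real library call:

```javascript
// Hypothetical sketch: hide one weird REST endpoint behind a resolver,
// so every client sees only the clean GraphQL shape.
function makeBookingResolver(fetchJson) {
  return async function bookingResolver(parent, args) {
    // fetchJson is whatever HTTP helper the proxy uses (injected here).
    const raw = await fetchJson(`https://api.example.com/bookings/${args.id}`);
    // Normalise the odd REST response into the GraphQL type once, here.
    return {
      id: String(raw.booking_id),
      status: String(raw.state).toLowerCase(),
    };
  };
}
```

Whatever quirks the endpoint has (odd casing, inconsistent errors), they get absorbed at this single point instead of in every mobile and web client.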

The other good thing about this is that the REST API guys can ask whether they can, for example, remove one problematic or deprecated field (a huge BC break), and we can give the correct answer because we measure API usage and performance at the field level. GraphQL can evolve easily over time.

So yes, the barriers to entry are high. Very high. But if you understand the true benefits you’ll be much keener to jump in.

GraphQL is not production-ready by default

This one is tricky. Everyone tries to compare GraphQL to REST (I just did a few minutes ago) and that makes it even harder. The thing is that you can use GraphQL as it is but there will be a lot of questions asked and the proper answers (and implementation) are not that easy.

Two examples: queries themselves and server-side rendering (SSR).

Quick recap on how GraphQL works. You write queries on the client and you send them to the server. The server will respond with the data you asked for based on the query and it returns only this subset. Simple. There is one drawback, however. The queries are HUGE. It’s not like sending one GET request to the URL. You have to send kilobytes of text (the query itself) just to get what you need which is especially critical on mobile devices.
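To make the problem concrete, here is a minimal sketch of what a plain GraphQL request looks like on the wire (the query is the pseudo-booking query used later in this post): the full query text ships in the POST body on every single call.

```javascript
// Every plain GraphQL call POSTs the entire query text.
// With real, deeply nested selections this body easily grows to kilobytes.
const body = JSON.stringify({
  query: `
    {
      allBookings {
        departure { localTime }
        arrival { localTime }
      }
    }
  `,
  variables: {},
});

// The whole query is re-sent on every request, unlike a short GET URL.
console.log(Buffer.byteLength(body, 'utf8'));
```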

The solution is very simple — use persistent queries. However, you need to build this system first. Or use another proxy like the Apollo Engine.

And now server-side rendering. I asked a colleague whether they use proper SSR on one internal project. His response was: “No, because we don’t need it. Yet.” But I think the reason is a little bit different. We use SSR in all our JS applications thanks to Next.js, so it’s weird that suddenly we don’t need it. I know it’s just an internal project, but the true reason they’re not using SSR is that it’s actually very hard to do. We could use the Apollo client there, but we use Relay Modern, and well — good luck with that. Imagine that you could render queries on the server using Relay Modern by default. We would not even be asking questions about SSR because it would already work as it should.

Side note: we are using Relay Modern on mobile devices, but in that case it makes much more sense, because there is no such thing as server-side rendering there; we are just fetching data from an already rendered client. I would personally choose Apollo to build a web application (even though I really love Relay Modern). The New York Times did it as well.

In the end, it’s not that “GraphQL is not production-ready”. GraphQL itself is more than ready. But the tools around GraphQL are not good enough yet, and the barrier to entry is still very high. The Apollo Engine is a great example: we use AWS Lambda functions to run our GraphQL proxy, which means the complete infrastructure around GraphQL is serverless. In order to use the Apollo Engine we would need to spin up Docker images just to send query metadata to the Apollo Engine analytics — which sucks. Are there any other options available? We’re not sure…

Response cache is, well, complicated…

Currently, I spend most of my time building a React Native mobile application (and, of course, the GraphQL proxy for the frontend, which is closely related). As this application is for travellers, we use the “offline first” approach: we basically store every valid response in the offline store for later use if needed. It turns out that client caching is hard. I mean, caching is hard in general (as is naming things), but caching GraphQL responses is seriously complicated.

The thing is that with REST you can just query the URL and store the response. The URL itself can easily be the cache key. So a lot of people expect the same with GraphQL — how hard can it be? But you cannot use the GraphQL endpoint URL as a cache key because you only have one endpoint. So it makes sense to use the query itself (or query hash) as the cache key. Not such a big problem so far. Now you can store the response based on this query. And if you did this — congratulations. You now have a very nasty bug in your application and I’ll explain to you why (it’s also explained perfectly in the Relay docs).

Let’s say you have this (pseudo) query for fetching all flight bookings:

allBookings {
  departure { localTime }
  arrival { localTime }
}

And now a very similar query returns only one booking based on its opaque identifier:

booking(id: "0paqu3==") {
  departure { localTime }
  arrival { localTime }
}

This second query should return the booking which you already fetched in the first query. If you’re storing responses keyed by the query itself, then you have probably introduced an inconsistency, because the booking can change over time, leaving you rendering conflicting info for the same booking. For example, you could render the status “pending” on all bookings in the overview even though the status is “confirmed” on the booking detail.
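The bug can be sketched in a few lines (the cache keys are shorthand stand-ins for the two query texts above, and the statuses are the ones from this example): keying the cache on the query stores the same booking twice, and the two copies drift apart.

```javascript
// A response cache naively keyed by query text: the buggy approach.
const responseCache = new Map();

// The overview query was cached while the booking was still pending:
responseCache.set('query:allBookings', [
  { id: '0paqu3==', status: 'pending' },
]);

// The detail query was cached later, after the booking was confirmed:
responseCache.set('query:booking(0paqu3==)', {
  id: '0paqu3==',
  status: 'confirmed',
});

// The same booking now exists twice in the cache, and the copies disagree:
console.log(responseCache.get('query:allBookings')[0].status); // 'pending'
console.log(responseCache.get('query:booking(0paqu3==)').status); // 'confirmed'
```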

How do we fix this? The correct way is to use the ID of the booking. And by ID, I mean the GraphQL ID, which should be unique in the whole graph. You can normalise the response and use IDs as the keys in the normalised store. It’s not exactly a trivial operation considering all the possibilities that can occur. I really recommend diving deep into the Relay documentation around caching so you can understand the true complexity of this problem.
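A minimal sketch of the fix, assuming globally unique IDs (real clients like Relay and Apollo do far more — field-level merging, connections, garbage collection): normalise every response into a single record per ID, so later data overwrites earlier data and all views read the same entity.

```javascript
// Normalised cache: one record per global GraphQL ID.
const store = new Map();

function normalize(entity) {
  // Merge the new fields over whatever we already know about this ID.
  const existing = store.get(entity.id) || {};
  store.set(entity.id, { ...existing, ...entity });
}

// The overview response arrives first:
normalize({ id: '0paqu3==', status: 'pending' });
// The detail response arrives later with fresher data:
normalize({ id: '0paqu3==', status: 'confirmed' });

// Both the overview and the detail now read the same, single record:
console.log(store.get('0paqu3==').status); // 'confirmed'
```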

Also, the ID itself may be a little bit confusing and misleading. The ID represents a unique entity in the graph, but it does not guarantee that the data inside this entity is still the same. So even though you can use the ID as a key in the cache store, you cannot assume that having this ID means “we already have this record and it’s up to date”.

GraphQL’s advantages outweigh the pitfalls

I believe that it’s important to understand the pitfalls, especially when we are still dealing with them on a daily basis. And I also believe that it’s good to have these pitfalls. Having a simple system with small advantages usually means small disadvantages. The same rule applies to more advanced systems: an API technology with a huge number of advantages brings bigger disadvantages too. Fortunately, they all have a solution, or at least a good explanation, so we can focus on the positives… 🙂

High barriers are getting lower thanks to all the tools and new companies around GraphQL. I’m a really big fan of design tools like React Sketch.app using GraphQL as a way of designing with real data. Programmers and designers are realising the importance of this approach and thanks to GraphQL it’s getting more real. We can also see more and more open-sourced examples which is inspiring. Especially if the example is real and running in production. For this reason, we’re planning to open-source our GraphQL proxy for our frontend as well. Stay tuned!
