Nov 13, 2019

Designing GraphQL schema for Kiwi.com: Part I. — Why?

This is the first part of a series about Designing a GraphQL schema for Kiwi.com.

GraphQL schema design is important; we learned this the hard way, having to refactor some parts of our schema multiple times. Learn more in the following article by our Frontend Developer Jaroslav Kubíček.


It’s been more than two years since the first Kiwi.com GraphQL server saw its first rays of sun. Some of our queries have already been reshaped three times, and more than once we arrived at solutions that we later found hidden in GitHub discussions of other companies.

This article is the first in a series of four in which I’ll explain that implementing a GraphQL server is, first and foremost, about schema design.

Read on to learn from our mistakes. We can begin by answering the question: why do we even need GraphQL?


Borderless entropy of REST API

Before we dive into how we tackled some specific problems, it is worth thinking about why a REST API stopped being sufficient for our purposes, and what led us to adopt GraphQL and spend additional time (and money) on it.

Let’s have a look at what you get when you make a GET request to https://booking-api.skypicker.com/api/v0.1/users/self/bookings/{bid}

See full response at https://gist.github.com/jaroslav-kubicek/9b5e0040146b07701e2b5de8c121b044

In total, 340 prettified lines, 39 top-level fields, an array of “flights” where each item contains an object with 12 fields…

But hey, we only wanted to render this:

In short, we got back much more than we asked for. We didn’t need information about every stopover, just the origin and the final destination, and yet we got it all.

That’s most probably because all these fields are needed somewhere else: some might already be deprecated, others might be computed fields that make common scenarios easier to handle. In the end, though, it definitely takes some effort to understand the response just to render such a simple card.

Clacks have their traits

At this point, the author of the article apologizes for making these silly bookworm references to fantasy worlds.

If you ever stepped into Ankh-Morpork and wanted to send a brief message back to your relatives elsewhere on Discworld, probably the most convenient way was using the services of the Grand Trunk Company.

So, if you weren’t lucky enough to visit the city, let me briefly explain how the Grand Trunk Company operates when it’s delivering your message:

You can learn more from “Going Postal” by Terry Pratchett than you think…

Your message is encoded, translated into light signals and transmitted over the “clacks” network of telegraph towers, full of the complex machinery of pedals and levers, to the destination where it’s again decoded and delivered.

Sounds complicated, right?

Well, the actual REST API works on a similar basis!

Don’t let your API become the wall…

Most likely, you have types in your backend application and database, with references between objects, and probably also some sort of polymorphism or similar abstraction.

The same applies to your frontend as well. With TypeScript or Flow, you are also granted type safety, and JavaScript is now just as equipped with features as any other programming language.

But a lot of information, with all the types precisely defined and the references between objects, is lost in between REST calls. In the end, we spend a significant amount of time normalizing and encoding data for APIs, just so we can denormalize, decode and reconstruct it on the client. What if we didn’t need to manually define any Flow type for our data? As you will see later, this is an example of an issue that is easily circumvented when using GraphQL.
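
To make that cost concrete, here is a minimal TypeScript sketch of the manual work a REST client typically ends up doing: declaring types by hand and then checking an untyped payload against them at runtime. The `BookingCard` shape and the `parseBookingCard` and `readCode` helpers are hypothetical names made up for this illustration; only the `flights`, `departure`, `arrival`, `where` and `code` keys follow the REST response discussed in this article.

```typescript
// Hand-written view model (hypothetical): just the origin and the final destination.
// Nothing guarantees that it stays in sync with what the API really sends.
interface BookingCard {
  origin: string;
  destination: string;
}

function isRecord(value: unknown): value is Record<string, unknown> {
  return typeof value === "object" && value !== null;
}

// Reads segment.<side>.where.code, or throws when the shape is not what we expect.
function readCode(segment: unknown, side: "departure" | "arrival"): string {
  if (!isRecord(segment)) throw new Error("segment is not an object");
  const stop = segment[side];
  if (!isRecord(stop)) throw new Error(`${side} is missing`);
  const where = stop.where;
  if (!isRecord(where)) throw new Error(`${side}.where is missing`);
  const code = where.code;
  if (typeof code !== "string") throw new Error(`${side}.where.code is not a string`);
  return code;
}

// Rebuilds a typed object from the untyped JSON that came over the wire.
function parseBookingCard(payload: unknown): BookingCard {
  if (!isRecord(payload)) throw new Error("payload is not an object");
  const flights = payload.flights;
  if (!Array.isArray(flights) || flights.length === 0) {
    throw new Error("payload.flights is missing or empty");
  }
  return {
    origin: readCode(flights[0], "departure"),
    destination: readCode(flights[flights.length - 1], "arrival"),
  };
}
```

All of this exists only to recover type information that the backend already had before the data was serialized.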

Things to be improved

Below I describe some common challenges we face when developing web or mobile apps:

  • Overfetching — getting far more data than needed, potentially wasting the user’s monthly data allowance
  • Underfetching — having to make additional requests when a UI component requires more data than a single call can provide
  • Loss of types and references between entities — forces you to go through responses, check the existence of fields, and recreate object models that fit the scenario’s needs
  • Improper documentation — documentation that’s either completely missing, misleading, outdated, or not known to developers because it’s deployed as a separate service
  • Excessive boilerplate — a lot of (usually imperative) code on both ends to encode, normalize and process data. Especially in larger projects, you may have to solve the same problem three times, e.g. in Swift, Kotlin and JavaScript.
  • The unease of migration — with multiple clients, you never know exactly which field can be safely deleted, renamed or reshaped to match new requirements. This causes people to be afraid of such cleanups, increasing technical debt on the backend as a result.

At Kiwi.com, for example, we have multiple projects, maintained by several teams, where we need to display itineraries as below — e.g. on web search, account page, help center, mobile app, an internal tool for CS, etc.:

And all the people in these teams have to maintain and update it separately in their codebases, while keeping it consistent across the whole product. For that purpose, they need to use the same complex REST API shown above, which requires them to:

  • remember what each field means (can be solved by documentation, but this is not always feasible…)
  • try to forget about a plethora of other fields in the response that are not relevant at the moment
  • synchronize migration across teams when a new feature emerges — like buses on top of regular flights — to keep the product consistent
  • conditionally render the “Changing transport is your responsibility” notice when the “arrival.where.code” of one segment differs from the “departure.where.code” of the next segment:
{
  ...
  "flights": [
    {
      "departure": { ... },
      "arrival": {
        "where": {
          "city_id": "nuremberg_de",
          "code": "ZAQ"
        }
      }
    },
    {
      "departure": {
        "where": {
          "city_id": "nuremberg_de",
          "code": "NUE"
        }
      },
      "arrival": { ... }
    }
  ]
}
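
The check from the last bullet can be sketched as a small TypeScript helper. This is only an illustration of the logic every client has to reimplement on its own: the `Stop` and `Flight` types and the `requiresTransportChange` name are assumptions, modeled on the excerpt above rather than taken from any real Kiwi.com codebase.

```typescript
// Types modeled on the response excerpt; the real payload has many more fields.
interface Stop {
  where: { city_id: string; code: string };
}

interface Flight {
  departure: Stop;
  arrival: Stop;
}

// True when some segment arrives at one station and the following segment
// departs from a different one, so the traveller has to change stations.
function requiresTransportChange(flights: Flight[]): boolean {
  return flights.some((flight, index) => {
    const next = flights[index + 1];
    return next !== undefined && flight.arrival.where.code !== next.departure.where.code;
  });
}
```

With the two segments from the excerpt (arriving at ZAQ, departing from NUE), the helper returns true even though both stops share the same `city_id`.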

GraphQL to the rescue

Speaking solely about documentation, you could argue that there are plenty of tools like swagger.io for this purpose.

You would be right. However, it’s not hardwired; you always need to opt in and make an extra effort, which is usually skipped when building the next startup unicorn.

Meanwhile, let’s look at GraphQL now:

Schema definition for a single connection might look like this.

As you can see, the documentation is already part of the schema itself; no additional tooling or effort is required. Thanks to the declarative nature of the definitions, you get it for free as soon as the server boots up:

GraphiQL explorer

That was documentation in a nutshell. Now let’s get to migrations.

Have you ever wanted to remove a field from an API because it was colliding with an upcoming feature?

I have. And it wasn’t the smoothest operation I’ve ever done. In a company with multiple API consumers, you can never be 100% sure whether a field can be safely modified or deleted. Often, there is simply no way to obtain such information, except to go to every team and ask them.

Plus, there are always mobile app users running one-year-old versions. Ouch.

With GraphQL, we always ask only for what we need:
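
As a rough sketch of what such a query could look like in client code, consider the TypeScript snippet below. The field names (`booking`, `departure`, `arrival`, `airport`, `city`, `name`), the `$id` variable and the helper `bookingCardTitle` are assumptions made up for illustration and do not reproduce the real Kiwi.com schema; only the `BookingOneWay` type name comes from this article.

```typescript
// Hypothetical query: the field names are illustrative, not the real schema.
const BOOKING_CARD_QUERY = `
  query BookingCard($id: ID!) {
    booking(id: $id) {
      ... on BookingOneWay {
        departure { airport { city { name } } }
        arrival { airport { city { name } } }
      }
    }
  }
`;

// The response has the same shape as the query, so this type mirrors it one to one.
interface CityStop {
  airport: { city: { name: string } };
}

interface BookingCardData {
  booking: { departure: CityStop; arrival: CityStop } | null;
}

// Builds the card title from exactly the fields that were requested.
function bookingCardTitle(data: BookingCardData): string {
  if (data.booking === null) return "Booking not found";
  const { departure, arrival } = data.booking;
  return `${departure.airport.city.name} to ${arrival.airport.city.name}`;
}
```

Because the response mirrors the query, a type like `BookingCardData` does not have to be maintained by hand in practice; it can be generated from the schema and the query.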

And that has some useful implications:

  • you can log which fields are still in use and which are not
  • you get back exactly what is needed — no overfetching or underfetching
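
The first point can be sketched in a few lines of TypeScript. This is a simplified model under stated assumptions, not how any particular GraphQL server does it: a real server would derive the requested fields from the parsed query, while here a query is represented by a hand-built `Selection` object, and `fieldPaths` and `FieldUsageLog` are hypothetical names.

```typescript
// A selection maps a field name to `true` (a leaf) or to a nested selection.
type Selection = { [field: string]: true | Selection };

// Flattens a selection into dotted paths such as "booking.arrival.city".
function fieldPaths(selection: Selection, prefix = ""): string[] {
  const paths: string[] = [];
  for (const field of Object.keys(selection)) {
    const child = selection[field];
    const path = prefix === "" ? field : `${prefix}.${field}`;
    paths.push(path);
    if (child !== true) {
      paths.push(...fieldPaths(child, path));
    }
  }
  return paths;
}

// Counts how often each field path was requested across incoming queries.
class FieldUsageLog {
  private readonly counts = new Map<string, number>();

  record(selection: Selection): void {
    for (const path of fieldPaths(selection)) {
      this.counts.set(path, (this.counts.get(path) || 0) + 1);
    }
  }

  // Returns the paths from `known` that no recorded query has asked for.
  unused(known: string[]): string[] {
    return known.filter((path) => !this.counts.has(path));
  }
}
```

Feeding every incoming query into such a log tells you which fields nobody requests anymore, which is exactly the information that is so hard to get from a REST endpoint.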

On top of that, when requirements change, you are able to deprecate particular fields:

Since we also introduced buses and trains, it might be worth using “station” instead of “airport”.

And of course, the usage of any deprecated field can be logged as well. Clients can use tools like ESLint with the graphql/no-deprecated-fields rule to get warned about such usage and migrate proactively.
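
For illustration, a deprecation in the schema definition language uses the built-in `@deprecated` directive. The sketch below embeds a made-up type in a TypeScript string and lists its deprecated fields; the `RouteStop` and `Location` types and the `findDeprecations` helper are hypothetical, and the regular expression is a toy stand-in for real tooling, which would inspect the parsed schema instead.

```typescript
// Hypothetical schema excerpt: `airport` is kept for old clients but marked deprecated.
const ROUTE_STOP_SDL = `
  type RouteStop {
    "Airport, bus station or train station of this stop."
    station: Location
    airport: Location @deprecated(reason: "Use 'station' instead.")
  }
`;

interface Deprecation {
  field: string;
  reason: string;
}

// Naive line-based scan for fields carrying @deprecated(reason: "...").
function findDeprecations(sdl: string): Deprecation[] {
  const pattern = /^[ \t]*(\w+)\b[^\n]*@deprecated\(reason:\s*"([^"]*)"\)/gm;
  const found: Deprecation[] = [];
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(sdl)) !== null) {
    found.push({ field: match[1], reason: match[2] });
  }
  return found;
}
```

Old clients keep working because the deprecated field is still served, while the reason text tells everyone else where to migrate.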


Conclusion

After reading this article, you should understand why we need GraphQL. In the next chapter, you will learn how to design its schema.

Also, you might have noticed the ‘… on BookingOneWay’ in our query. It’s an inline fragment, required when asking for fields specific to one type, because our query actually returns the BookingInterface type. You will learn more about how we leveraged GraphQL interfaces when designing booking queries in the next chapter.

Before the next article arrives, you can learn more about GraphQL in the links and videos below.

Also, feel free to check our open positions if you’d like to join us.


Resources:
