How we developed an AR feature that measures your bags with an iOS or Android smartphone
Recently, Kiwi.com launched an augmented reality feature on both iOS and Android to measure the size of your bags. It can tell whether they’re too large for airline requirements and might incur an extra payment at the airport.
I’m going to run through the different steps of the development process that we went through to build it on iOS, and my colleague, Arcadii, will write about his experiences with Android.
I’ll highlight the changes and improvements made to our AR feature, follow the preparation of the app for the end user, and describe the challenges we faced.
I’ve refrained from getting too much into the specifics of some algorithms, because I think you could find better and more understandable descriptions in the sources I’ve used.
Both Arcadii and I have written in the first person, but we worked as a team, discussing our approach and deciding what to try next.
Last year, we worked on the first, unreleased, iteration of the app and I wrote about my first experiments with ARKit, which you can read here. We didn’t release it because it was cumbersome and didn’t work half the time.
Fortunately, I found some time to try new things and finally follow through with my work. The use case stayed the same — we were trying to measure cabin luggage using augmented reality on an iPhone.
In the previous article, I wrote about using augmented reality to solve existing problems and not creating new ones. I also explained some basics of starting with ARKit and SceneKit development on iOS, before taking a deeper dive into our solution.
The old and the new
The ultimate goal of Kiwi.com’s mobile app is that it can be used throughout a trip. It is designed to help users from the moment they open it for the first time. We aim to provide helpful and interesting features before, during and after their journey.
This is why we decided to try bag measuring — not everyone has a tape measure on hand in the forests of Vietnam.
The measuring feature should not only help users decide what to take with them on board, but should also be fun and interesting to use.
The original prototype of the app required a lot of manual input from the user:
- Allowing ARKit to scan the floor
- Placing the physical baggage somewhere within this scanned area
- Manually placing a virtual cage over the real baggage
- Confirming the placement
- Making a complete circle around the baggage
- Ending the measurement and checking it
Even just reading through this list is boring.
The flow required many difficult instructions and the measurement took a while. Both of these aspects led to a poor user experience.
The first goal of my improvements was to remove as many steps as possible, or at least automate them. I took the old prototype of the app as a starting point for the new iteration and tried to improve it as much as I could in a month or so.
Looking for a bag in a haystack
The first two steps in the list are mandatory because of the technology itself.
ARKit always requires the floor to be scanned and you can’t measure a physical object if you don’t have this scan. Therefore, the first step I was able to improve was the virtual cage placement.
However, automating this is not a simple task. Apart from correct positioning, the cage also needs to be rotated properly. In cases where it isn’t, the user still has to carry out extra steps. This is the right time to play with computer vision.
Fortunately, ARKit provides developers with a basic point cloud representation of a user’s environment. At its most simple, this is a collection of data points recognised by the camera.
Point clouds are often used in various computer vision problems. They do a much better job than a pure image when it comes to analysing an environment in 3D.
Textured objects, such as baggage, usually generate plenty of these feature points. The trouble is that the camera picks up everything else it catches, too. Looking for a bag in the data is like looking for a needle in a haystack.
There are many methods to classify the various objects in a point cloud, but I had to find one that would run in real time on a phone. I also had to keep in mind that the phone is simultaneously running augmented reality sensing and 3D rendering, so a fast algorithm was essential. However, there were none.
Scanning the bag
When solving a problem with no commonly used algorithm available, I like to approach the problem in a simple way. Usually, I consider the users’ behaviour and how they may want to use the feature.
Using augmented reality to measure your bag is very similar to taking a video of it. After having launched the app, the majority of users would quickly point the camera at the bag and then keep it pointed at it. We’ve allowed them to follow that path.
With such presumed behaviour in mind, I wrote a filtering algorithm.
It would start looking for feature points near the centre of the scanned area, where the bag would most likely be positioned.
After catching the initial point, it expands the list of points representing the bag by looking for nearby points. This basically expands from the first detected point to the edges of the bag.
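The seed-and-expand idea above can be shown as a minimal Python sketch (the real implementation was in Swift, and the radii here are made-up thresholds purely for illustration):

```python
import math

def filter_bag_points(points, center, seed_radius=0.15, grow_radius=0.10):
    """Region-growing filter: pick a seed point near the expected bag
    position, then repeatedly absorb points lying close to the region's
    current edge. All thresholds are illustrative assumptions."""
    def dist(a, b):
        return math.dist(a, b)

    # Seed: the point closest to the centre of the scanned area,
    # as long as it is reasonably close to it.
    seed = min(points, key=lambda p: dist(p, center), default=None)
    if seed is None or dist(seed, center) > seed_radius:
        return []

    bag = [seed]
    frontier = [seed]
    remaining = [p for p in points if p != seed]

    # Expansion: grow from the seed towards the edges of the bag.
    while frontier:
        current = frontier.pop()
        near = [p for p in remaining if dist(p, current) <= grow_radius]
        for p in near:
            remaining.remove(p)
            frontier.append(p)
            bag.append(p)
    return bag
```

A chain of points spaced within `grow_radius` of each other is absorbed step by step, while isolated points far from the seed are left out.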
It proved to work quite well, at least for the initial positioning of the cage. This first version was working on a per-frame basis, without remembering previous frames. It was processing the whole point cloud in every frame, and it was neither quick nor precise.
A solution to the low precision was to store the detected features and create a model of the environment.
However, the point cloud provided by ARKit is a very rough approximation of the environment. The feature points representing interesting elements might receive different coordinates in each frame and there is no guarantee that the data won’t be random noise.
Storing the feature points without any kind of post-processing and duplicate detection would result in a model made out of thousands of points, many of which duplicate others. This would, in turn, slow down filtering and further processing. We needed a data structure which could automatically detect duplicates and simplify the whole point cloud.
An octree structure (or Quadtree in 2D and many other variants) is a common method used to simplify point clouds (source).
An octree is a kind of nested tree based on recursively splitting the space into smaller nodes, with each node holding a set of points. These nodes split in turn, redistributing their points amongst the appropriate children. The splitting and redistributing repeat until the node size reaches the required resolution.
Each of the lowest level nodes — also called a leaf — represents one point in the cloud. If a point is added to the structure, it is recursively stored into appropriate child nodes, until the point reaches a leaf node.
Each leaf node can contain multiple points, but it represents just one single point. The point which the leaf node represents is calculated by averaging the coordinates of all points contained within it.
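The leaf-averaging behaviour can be sketched in a few lines of Python (our implementation was in Swift and differs in detail; the fixed depth here stands in for the resolution threshold described above):

```python
class Octree:
    """Minimal octree sketch: points recurse into octants until a fixed
    depth; each leaf averages the points it receives, so near-duplicate
    feature points collapse into a single refined point."""

    def __init__(self, center, half_size, depth):
        self.center = center
        self.half_size = half_size
        self.depth = depth
        self.children = {}         # octant index -> Octree
        self.sum = [0.0, 0.0, 0.0]
        self.count = 0             # used only by leaves

    def insert(self, p):
        if self.depth == 0:        # leaf: accumulate for averaging
            self.sum = [s + c for s, c in zip(self.sum, p)]
            self.count += 1
            return
        # Pick the octant by comparing each coordinate with the centre.
        idx = sum(1 << i for i in range(3) if p[i] >= self.center[i])
        if idx not in self.children:
            h = self.half_size / 2
            child_center = tuple(
                self.center[i] + (h if p[i] >= self.center[i] else -h)
                for i in range(3)
            )
            self.children[idx] = Octree(child_center, h, self.depth - 1)
        self.children[idx].insert(p)

    def points(self):
        """One averaged point per non-empty leaf."""
        if self.depth == 0:
            return [tuple(s / self.count for s in self.sum)] if self.count else []
        return [p for c in self.children.values() for p in c.points()]
```

Two noisy samples of the same physical feature land in the same leaf and come out as a single averaged point, which is exactly the deduplication the model needed.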
This allows a point cloud representing the bag to be stored and refined as the user continues scanning.
Thinking of boxes
At first, I tried to use the model (stored in the octree) to position the cage automatically. The image below shows the positioning kind of worked, but finding the correct rotation was a completely different, and still unsolved, problem.
I decided to take a step back and do a bit of research on fitting objects into other objects. I needed to fit a bag, represented by points, into a cuboid — the cage. This approach is often used for different purposes in game development.
During my research, I realised that I was looking at the problem from the wrong perspective. Why was I trying to fit an object into another object? Surely I could wrap this object with a shape.
This led me to an algorithm called the convex hull (more here), which shares its name with its output.
The aim of this algorithm is to find the outline of any object represented by points. To put it broadly, you can construct a model — with sides — of the object from an available point cloud.
This 3D solution to the problem probably wouldn’t run very well with the processing power of a phone, at least not at near-real-time speeds. Although I haven’t benchmarked it, I had to find a way around this problem. In other words, the whole processing flow needed to be simplified.
The final solution was to work in a 2D space, which is a lot faster and easier. ARKit already detects the floor in 3D, so it can be used as a 2D working space.
The projection of 3D coordinates onto a 2D space is a simple task — you simply ignore one axis. After simplifying the data, I fed it into a 2D implementation of a convex hull to see if it would be usable for this project. And it worked well.
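A Python sketch of the projection and hull step is below. The app itself is written in Swift, and the article doesn’t name the specific hull algorithm used, so this sketch uses one common choice, Andrew’s monotone chain:

```python
def project_to_floor(points_3d):
    """Project (x, y, z) points onto the floor plane by simply
    dropping the height axis (y, in SceneKit's convention)."""
    return [(x, z) for (x, y, z) in points_3d]

def convex_hull(points):
    """Andrew's monotone chain 2D convex hull, returned counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:                  # build lower hull
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):        # build upper hull
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]
```

Interior points, such as those scanned on the bag’s top surface, are discarded, leaving only the outline the later steps need.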
The biggest obstacle was to properly convert the SceneKit coordinates into a local 2D space and then back to 3D.
SceneKit provides useful methods for converting between different coordinate spaces. You can take the scanned points and convert them to a new coordinate space. After you work with them and convert them back, they can be properly rendered.
You can learn more about the SceneKit library in my previous article.
Airline baggage limits are defined as cuboids but, as you can see in the image, the convex hull produces a polygon. The available points representing the bag are usually very approximate, even after tweaking the octree. Therefore, using this alone as the basis for the size visualisation is not enough.
A common practice to transform a polygon into a cuboid is to find its bounding box. SceneKit even has this function included. You can easily find a bounding box of any node, including its child nodes.
If you’ve ever worked with SceneKit, you might just ask: “Why not get the bounding box of the point cloud and save all this hassle?”
Well, the bounding box SceneKit provides is always axis-aligned, no matter how the node is rotated. It does not return the minimum bounding box, but one that is usually unnecessarily large.
In this case, it is necessary to find the smallest area the baggage occupies. The difference between airline requirements is only a few centimetres, so every little bit matters.
A common, and easy, solution to finding the smallest bounding box with rotation is the rotating calipers algorithm, which takes a polygon as input (source).
The whole polygon is rotated and shifted once for each of its edges. In every iteration, one edge is aligned with the X-axis and the polygon is shifted so that the leftmost point’s X coordinate is zero. Calculating the bounding box is then as simple as finding the maximum X and Y coordinates (because its origin is always at zero, zero).
With the bounding box and rotation as an output, the app is able to measure the bag’s width and depth. Transforming the bounding box’s coordinates back to the SceneKit system and rotating it properly gives you the base of the baggage.
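The edge-alignment loop can be sketched as follows (again an illustrative Python version, not the production Swift code; the sign convention of the returned rotation is arbitrary here):

```python
import math

def min_area_rect(hull):
    """Rotating-calipers style search: for each edge of the convex
    polygon, rotate the polygon so that edge lies along the X-axis,
    take the axis-aligned extent, and keep the smallest area found.
    Returns (width, depth, rotation_in_radians)."""
    best = None
    n = len(hull)
    for i in range(n):
        (x1, y1), (x2, y2) = hull[i], hull[(i + 1) % n]
        # Rotate by the negative of the edge's angle to align it with X.
        angle = -math.atan2(y2 - y1, x2 - x1)
        cos_a, sin_a = math.cos(angle), math.sin(angle)
        rotated = [(x * cos_a - y * sin_a, x * sin_a + y * cos_a)
                   for x, y in hull]
        xs = [p[0] for p in rotated]
        ys = [p[1] for p in rotated]
        w, h = max(xs) - min(xs), max(ys) - min(ys)
        if best is None or w * h < best[0]:
            best = (w * h, w, h, -angle)
    _, width, depth, rotation = best
    return width, depth, rotation
```

For a square rotated 45 degrees, the axis-aligned bounding box would be twice as large as necessary, while this search recovers the tight fit, which is exactly the few centimetres that matter for airline limits.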
The final step in the processing pipeline was to measure the height of the baggage. I used the information about the floor’s position provided by ARKit along with the processed model of the bag. The point furthest from the floor (along the height axis) is selected, and the height is calculated as the difference between the floor and this reference point.
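Expressed as a sketch, the height measurement reduces to a single comparison over the model (assuming SceneKit’s convention of y as the vertical axis):

```python
def bag_height(points_3d, floor_y):
    """Height = distance from the floor plane to the highest
    scanned point along the vertical (y) axis."""
    return max(y for (_, y, _) in points_3d) - floor_y
```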
Visualising the measured dimensions is a simple task of creating a graphics primitive offered by SceneKit. I’ve used two SCNBox objects, one for the blue colour and the second for the white outline. This is run through a shader that makes sure it is rendered only as an outline.
The final touch was to include some nice animations as the baggage is scanned. I also wanted to provide the user with feedback about the scanning process itself, so I used the Taptic Engine for that.
The idea was that every newly scanned point would trigger a tap, so at the beginning, you feel a lot of taps. As the bag gets close to being scanned in full, there are fewer taps and, finally, none. That should prompt the user to finish the scan. However, I did not include any instructions about the feedback; I just wanted to see whether users would pick it up naturally.
You can see the whole animated algorithm flow below:
A few notes about the feature on Android
Arcadii Rubailo, Kiwi.com’s Android Developer
As mentioned earlier, the bag measuring feature is also available on Android. Although the logic is almost identical on both platforms, I would like to talk about some important distinctions.
First of all, ARKit cannot be used on Android, so an appropriate replacement was found: ARCore, an AR framework from Google released in September 2017.
ARCore is a great successor to Tango, Google’s original AR platform, which had one main drawback: the need for a special camera. That limited the supported devices to the Lenovo Phab 2 Pro and the Asus ZenFone AR. Fortunately, ARCore bypasses these limitations and supports various devices from different manufacturers, including Apple.
One of the main advantages of ARCore is the ability to check the confidence of each point. After a point cloud is acquired, we have a float array in the following format:
[x1, y1, z1, confidence, x2, y2,…]
The confidence ranges from 0 to 1 and represents the probability that the point is part of a real object. In this manner, ARCore provides a more densely populated point cloud, but one which is less accurate.
It adds significant flexibility for developers. They can control the precision depending on the current environment: light, unwanted things in the frame or uneven surfaces. Such control can be achieved by changing the minimum confidence threshold in the point filter.
Similarly, the same parameter is used to merge close points. When a new point arrives within the radius of an existing one, the application compares their confidence values and keeps the point with the higher confidence.
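Putting the flat array format, the confidence threshold, and the merge radius together, the filtering might look like this Python sketch (the Android implementation is in Kotlin against ARCore’s `PointCloud` buffer; the threshold and radius values here are made-up assumptions):

```python
import math

def parse_point_cloud(buffer):
    """ARCore delivers points as a flat float array:
    [x1, y1, z1, confidence1, x2, y2, z2, confidence2, ...]."""
    return [tuple(buffer[i:i + 4]) for i in range(0, len(buffer), 4)]

def filter_and_merge(points, min_confidence=0.3, merge_radius=0.02):
    """Drop points below the confidence threshold, then merge points
    falling within merge_radius of an already kept point. Processing
    in descending confidence order means the kept point of any close
    pair is always the more confident one."""
    kept = []
    for p in sorted(points, key=lambda p: -p[3]):
        x, y, z, confidence = p
        if confidence < min_confidence:
            continue
        if all(math.dist((x, y, z), (k[0], k[1], k[2])) > merge_radius
               for k in kept):
            kept.append(p)
    return kept
```

Raising `min_confidence` in poor light or cluttered scenes trades point density for precision, which is the flexibility described above.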
Back to basics
Along with ARCore, the Sceneform library was brought into the world. It is a scene graph API that should ease the rendering process without having to learn OpenGL.
It sounds very promising but, unfortunately, it still lacks a lot of features. One of the main obstacles is the inability to control the rotation of the bounding box in a simple way. Limited functions for drawing materials are another big minus.
After investigating sample projects, I discovered a great example of using ARCore with OpenGL. Thus, we went back to basics and used OpenGL as the main tool for rendering scene objects.
Such an approach increased the development time but gave us better control. However, the Google AR team is actively developing Sceneform, and I believe the need for OpenGL will soon disappear.
Including this feature in an app used to manage bookings is a definite advantage. Since information about airlines’ baggage restrictions is already available with every booking, it can be used to decide if the bag meets the requirements or not.
As augmented reality features are currently popular and fun to use, we also allow users without a booking to try it on the profile tab. This way they can display the measurements without any comparison.
Building a bag measurement app for both iOS and Android was a challenge, but a fun one. The lack of processing power on phones means that everything must be simplified, while maintaining accuracy — especially because people will actually use it to get their bag on a plane. But it is possible.
Join us for the next development challenge
Would you like to join us for the next challenges? Check our open positions.