The biggest European Python conference previously saw the sunny Italian beaches and the rocky cliffs of Scotland. In 2019, it moved again — this time to the Swiss city of Basel, famous for raclette, pocket knives and enormously high beer prices.
Since we at Kiwi.com love both learning and discovering nice destinations, we couldn’t miss this year’s EuroPython.
Check out some of our notes and insights.
The Python stories
Once upon a time, back in 1989, Guido was looking for a hobby project to keep him occupied during the Christmas holidays, and he decided to start writing an interpreter for a new scripting language… You probably know how the story of the Python project began, but have you ever wondered what the initial release of Python looked like in 1994? How different was it from the language we use nowadays?
Actually, not that much. You can try it out thanks to the vintage Python container images. Multiple versions of the interpreter from the previous millennium can be run with a single command, and it feels almost like travelling back in time.
Another notable milestone in the history of Python came in 2000, when Python 2.0 was released and brought a significant change in the form of a GPL-compatible license. Yes, we all wish this had also been the case with the 2 to 3 transition.
Python and its weaknesses: some notes on the Friday keynote
Python has been adopted by many people working in all kinds of fields, from web development and networking to data science and machine learning. It is elegant, readable and easy to learn.
However, if we really try to find a weakness, it is performance. “Python is slow,” people usually say. Performance was also the topic of Friday’s keynote by Victor Stinner, a CPython core developer, who gave a talk titled Python Performance: Past, Present and Future.
His talk was a great overview of the interpreter’s limits and bottlenecks. Various projects have attempted to improve the performance of the interpreter, but most of them failed. Many ambitious projects backed by big companies were abandoned once it became clear that reaching the goal was harder than expected.
A very common thing to blame for all the performance issues is the GIL (Global Interpreter Lock). The Gilectomy project, which has been trying to remove it, has not succeeded so far.
The most important thing is to correctly measure the performance of multithreaded systems and the impact of the GIL. Proper analysis and well-aimed optimisation may solve the issues. Another option is to use Cython for performance-critical code, or to try one of the performance-oriented alternatives to the CPython interpreter, such as PyPy, or a JIT compiler like Numba.
These tools might help, but they also bring drawbacks which must be taken into account. There are many ideas for future improvements, such as a new C API, a tracing garbage collector and other promising features.
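The “measure first” advice can be sketched with nothing but the standard library. The snippet below (a minimal illustration, not from the talk) runs the same CPU-bound workload sequentially and in two threads, showing that the GIL keeps pure-Python threads from speeding up CPU-bound work:

```python
import threading
import time

def cpu_bound(n=2_000_000):
    # A purely computational workload that holds the GIL while it runs.
    total = 0
    for i in range(n):
        total += i * i
    return total

# Sequential baseline: two runs back to back.
start = time.perf_counter()
cpu_bound()
cpu_bound()
sequential = time.perf_counter() - start

# Threaded: the same two runs in parallel threads.
threads = [threading.Thread(target=cpu_bound) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

# With the GIL, the threaded timing is typically close to the
# sequential one rather than roughly half of it.
print(f"sequential: {sequential:.2f}s, threaded: {threaded:.2f}s")
```

Running this on a multi-core machine makes the bottleneck tangible: the threads take turns holding the GIL instead of running in parallel.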
Being involved in CPython development is not an easy job. Pablo Galindo Salgado, another CPython core developer, described in a talk titled The soul of the beast how Python’s grammar is defined and parsed. The parser of our favourite, feature-rich language uses an LL(1) grammar! Can you believe that?
The Zen of Python’s “Simple is better than complex” principle applies here, and the simple grammar makes the language easy to read for us humans as well.
Pablo also presented the giant finite automata of the parser, which didn’t seem simple at all, and showed how easy it is to add new grammar rules to the language. A small change in the grammar can allow the testlist nonterminal in the decorator rule, so that a recompiled CPython interpreter will parse the following code snippet without raising an error:
@lambda x: 32
def function():
    return "Larry Hastings"
Can you guess what this function returns when called?
EuroPython and Machine Learning: TensorFlow 2.0 and Gradient Boosting in scikit-learn 0.21
As Python is currently the programming language of choice for data scientists and machine learning practitioners, this field was also heavily represented at EuroPython 2019.
With the anticipated release of TensorFlow 2.0, currently in beta, we were treated to a very nice sneak peek at the features we can look forward to in this new major version.
The newest developments in TensorFlow were covered by two talks: TensorFlow 2.0: TensorFlow Strikes Back by Michele De Simoni, and Deep Learning with TensorFlow 2.0 by Brad Miro, a developer programs engineer at Google.
The first difference is that the new version will ship with new libraries for deploying trained models, such as TensorFlow Extended, targeting servers, and TensorFlow Lite for mobile and small devices like the Raspberry Pi.
But what about the training interface? In the new version, the tf.contrib library, which was deemed too messy, has been moved to a standalone project. A major change is that control features like tf.cond, tf.while_loop and, most notably, Session.run are now gone. They were replaced by Python’s own control statements and by integrating Keras as the primary high-level interface, tf.keras.
For those who are worried: Keras will continue to exist as a standalone project supporting different backends such as Theano or CNTK. The only difference is that it now also conveniently ships with TensorFlow as tensorflow.keras.
Another highly anticipated feature is AutoGraph, a simple tool that can rewrite plain Python code into native TensorFlow graph code just by adding the @tensorflow.function decorator, which can speed up execution significantly, sometimes by one to two orders of magnitude.
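As a minimal sketch (assuming TensorFlow 2.x is installed, and using the conventional tf alias), AutoGraph turns an ordinary Python loop into graph control flow:

```python
import tensorflow as tf

@tf.function  # AutoGraph traces this function into a TensorFlow graph
def cumulative_sum(n):
    total = tf.constant(0)
    # A plain Python for loop over tensors; AutoGraph rewrites it
    # into graph-level control flow (a tf.while_loop under the hood).
    for i in tf.range(n):
        total += i
    return total

result = cumulative_sum(tf.constant(10))
print(int(result))  # sum of 0..9
```

The decorated function still looks like ordinary Python, but after the first call it runs as a compiled graph, which is where the speed-up comes from.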
To try these features and many more, you can install the beta version via pip install -U --pre tensorflow.
Gradient Boosting in scikit-learn 0.21
Another talk worth mentioning is Histogram-based Gradient Boosting in scikit-learn 0.21 by Olivier Grisel, a core developer of scikit-learn.
His talk showcased that deep learning is not a panacea for all machine learning problems and that neural networks are not always the way to go. Scikit-learn provides a variety of machine learning algorithms out of the box, which can be swapped with almost no effort. This is facilitated by a beautiful, homogeneous API (fit / predict / transform), which by extension makes scikit-learn a unique tool for meta-learning.
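That homogeneous API is easy to demonstrate; in this small sketch (the synthetic dataset and the particular estimators are just for illustration), two very different models are trained through the exact same interface:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor

# A synthetic dataset stands in for real data.
X, y = make_regression(n_samples=200, n_features=5, random_state=0)

# Thanks to the shared fit/predict interface, swapping estimators
# requires no change to the surrounding code.
for model in (LinearRegression(), RandomForestRegressor(random_state=0)):
    model.fit(X, y)
    predictions = model.predict(X)
    print(type(model).__name__, predictions.shape)
```

The same pattern extends to transformers (fit / transform) and to pipelines, which is what makes plugging estimators into meta-learning loops so convenient.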
The highlight of the talk was the presentation of the new histogram-based Gradient Boosting implementation, optimised for training on huge datasets. The algorithm was compared to a Multilayer Perceptron, the previous Gradient Boosting implementation and Random Forest on a real-estate pricing dataset.
The MLP reached an error of 0.215 in 8.41 s of training time, Gradient Boosting reached 0.187 in six seconds and Random Forest capped at 0.192 in 4.65 s. The new histogram-based Gradient Boosting reached the same performance as the original. So what is the benefit, you ask?
This algorithm can run on datasets with tens or hundreds of millions of samples, which traditional Gradient Boosting cannot handle. Another benefit is that the trained model can evaluate a sample in just ~4 µs, which is around a hundred times faster than Random Forest. An additional bonus is that the trained model is just 2.4 MB in size, compared to Random Forest’s 152.2 MB, making it ideal for memory-scarce devices.
Kiwi.com sharing the experience
There are a couple of other things we love at Kiwi.com besides travelling — we love technology and we love Python. A majority of our backend services are written in Python. We deal with software and API design, asynchronous tasks, code refactoring and collaboration in big teams on a daily basis. Two of our skilled engineers shared our experience in Basel this year.
Refactoring in Python
Tin Marković, a chapter lead in the Booking tribe, spoke about Refactoring in Python. He pointed out why it is important to use automated code linters and checkers like Pylint, MyPy, Black and Coala. The best approach is to make them part of the CI/CD pipeline, especially when working in bigger teams.
Side note: if you want to get a more detailed explanation of an actual CI setup, check out this blog post.
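As a rough idea of what that looks like in practice, here is a hypothetical CI job wiring the linters into a pipeline; the job name, image and paths are illustrative, not our actual setup:

```yaml
# Hypothetical CI job: names, image and paths are illustrative.
lint:
  image: python:3.7
  script:
    - pip install pylint mypy black
    - black --check .       # fail the pipeline on unformatted code
    - mypy .                # static type checking
    - pylint mypackage/     # style and error linting
```

The point is that the checks run on every push, so style and typing issues are caught before review rather than during it.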
Using actual code examples, Tin also presented some patterns and antipatterns — everything learnt the hard way on the Kiwi.com code bases. The one thing you should always keep in mind is that code is written to be replaced. The less interdependent it is, the better. Good code can be reused easily and adjusted, or even replaced, without any trouble.
Different kinds of queues for asynchronous calls
The dos and don’ts of task queues was the title of the talk given by Petr Stehlík, a developer in the Finance tribe. Petr explained the challenges of using different kinds of queues for asynchronous calls.
His story was about the initial design of a new project, based on Redis Queues, which turned out not to be the greatest idea. When the project participants realised it was not going to scale for their use case, they decided to use good old Celery instead. The lesson they learned is that a new approach is not always better than a well-tested solution already adopted by many other teams in the company.
Petr also summarised the principles that should be followed when setting up task queues and advised on what to consider when writing asynchronous calls.
Check out the official EuroPython YouTube channel. All of the 2019 edition talks will be added soon.
Pool and other sorts of parties
Do you know what we love to do when work is over? Party! Our crew was lucky this time. We found a beautiful apartment with a pool on its roof, where we stayed during the conference.
Our community team arranged a special Kiwi.com party at the pub and two rooftop parties near the pool on top of the apartment. EuroPython attendees were welcome to join, do some swimming, have a drink and enjoy the evening with us. The atmosphere was quite relaxed, so it was a great opportunity to have an informal chat with many amazing people from the conference.
Another great event is over, but don’t worry, we will be at more tech conferences soon. Stop by at the Kiwi.com booth for a chat and don’t forget to ask for a party ticket!