Looking for a job? Check out our open positions. You can also take a look at our engineering blog to learn more about the way we work.
- Clone this repo (do not fork it)
- Solve the levels in ascending order
- Only do one commit per level and include the `.git` directory when submitting your test
Please do the simplest thing that could work for the level you're currently solving.
For higher levels we are interested in seeing code that is:
- Clean
- Extensible
- Reliable
The challenge needs to be solved in Python.
Each level depends on one Python 3.7 executable and one or more libraries that you'll have to use. You can't modify them.
Your solution to each level needs to live in the `level_{N}` directory.
Launch the `level_file` program. It will write log messages into `./logs/#{id}.txt`.
Each file will contain one log message. The log looks like this:
id=0060cd38-9dd5-4eff-a72f-9705f3dd25d9 service_name=api process=api.233 sample#load_avg_1m=0.849 sample#load_avg_5m=0.561 sample#load_avg_15m=0.202
You need to write a program that will parse the messages, write the result to a JSON file in `./parsed/#{id}.json`, and delete the original message.
You need to write the JSON in the following format:

```json
{
  "id": "2acc4f33-1f80-43d0-a4a6-b2d8c1dbbe47",
  "service_name": "web",
  "process": "web.1089",
  "load_avg_1m": "0.04",
  "load_avg_5m": "0.10",
  "load_avg_15m": "0.31"
}
```
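A minimal sketch of one possible approach, using only the standard library (the parsing rules, like stripping the `sample#` prefix so that `sample#load_avg_1m` becomes `load_avg_1m`, are inferred from the example above):

```python
import json
import os

LOGS_DIR = "./logs"
PARSED_DIR = "./parsed"

def parse_message(line):
    """Split a 'key=value ...' log line into a dict, dropping the 'sample#' prefix."""
    result = {}
    for token in line.strip().split():
        key, _, value = token.partition("=")
        if key.startswith("sample#"):
            key = key[len("sample#"):]
        result[key] = value
    return result

os.makedirs(PARSED_DIR, exist_ok=True)
for name in os.listdir(LOGS_DIR):
    path = os.path.join(LOGS_DIR, name)
    with open(path) as f:
        parsed = parse_message(f.read())
    with open(os.path.join(PARSED_DIR, parsed["id"] + ".json"), "w") as f:
        json.dump(parsed, f)
    os.remove(path)  # delete the original message once it is parsed
```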
When you launch the `levels_http` program, it will send the same log messages to a local HTTP server at http://localhost:3000/. The HTTP server listens for POST requests on port 3000. The POST requests will time out after 100ms.
You need to write a simple HTTP server that will listen for these requests, parse the logs, and write the result to a JSON file in `./parsed/#{id}.json`, in the same format as Level 1.
To write a simple HTTP server, look at Flask or Bottle.
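For example, with Flask (a sketch; it assumes `levels_http` POSTs the raw log line as the request body, which you should verify):

```python
import json
import os

from flask import Flask, request

app = Flask(__name__)
PARSED_DIR = "./parsed"
os.makedirs(PARSED_DIR, exist_ok=True)

def parse_message(line):
    # Same parsing as Level 1: split key=value pairs, drop the 'sample#' prefix.
    result = {}
    for token in line.strip().split():
        key, _, value = token.partition("=")
        result[key.replace("sample#", "", 1)] = value
    return result

@app.route("/", methods=["POST"])
def handle_log():
    parsed = parse_message(request.get_data(as_text=True))
    with open(os.path.join(PARSED_DIR, parsed["id"] + ".json"), "w") as f:
        json.dump(parsed, f)
    return "", 204  # answer quickly: the client times out after 100ms

if __name__ == "__main__":
    app.run(port=3000)
```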
Launch the `levels_http` program. This time your HTTP server needs to parse the logs and push them to a Redis LIST on a local Redis instance (redis://localhost:6379).
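With the redis-py client this can stay small; the change plugs into the Level 2 handler (the list name `logs` is an arbitrary choice here, not something the level mandates):

```python
import json

import redis

r = redis.Redis.from_url("redis://localhost:6379")

def store(parsed):
    """Push a parsed log dict onto a Redis LIST ("logs" is an assumed key name)."""
    r.rpush("logs", json.dumps(parsed))
```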
Launch the `levels_http` program. Your HTTP server, after parsing the logs, needs to enrich them with a library called `slow_computation`. To use this library:
```python
import slow_computation

new_dict = slow_computation.compute(new_dict)
print(new_dict)
# {
#     "id": "2acc4f33-1f80-43d0-a4a6-b2d8c1dbbe47",
#     "service_name": "web",
#     "process": "web.1089",
#     "load_avg_1m": "0.04",
#     "load_avg_5m": "0.10",
#     "load_avg_15m": "0.31",
#     "slow_computation": "0.0009878"
# }
```
As in Level 3, you'll push the resulting JSON onto a Redis LIST. Again, the HTTP calls will time out after 100ms.
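Since `slow_computation` presumably cannot finish inside the 100ms request window, one way to stay reliable (an approach chosen for this sketch, not something the level prescribes) is to acknowledge each request immediately and enrich in a background worker:

```python
import json
import queue
import threading

import redis
import slow_computation

r = redis.Redis.from_url("redis://localhost:6379")
work = queue.Queue()

def worker():
    while True:
        parsed = work.get()
        enriched = slow_computation.compute(parsed)  # slow: keep it off the request path
        r.rpush("logs", json.dumps(enriched))
        work.task_done()

threading.Thread(target=worker, daemon=True).start()

# In the HTTP handler: parse the request body, call work.put(parsed),
# and return 204 right away, well within the 100ms timeout.
```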
We provide some (fake) data to play with. You will work with cars and rentals. You can download the CSV files here:
- cars: https://cl.ly/eDaw
- rentals: https://cl.ly/eDUn
Cars:
- `id`: the car ID
- `city`: the city where the car is available
- `created_at`: date when the car was made available on the platform

Rentals:
- `id`: the rental ID
- `car_id`: the ID of the car used for this rental
- `starts_at`: the datetime when the rental starts
- `ends_at`: the datetime when the rental ends
Remarks regarding data quality:
- rentals: the `starts_at` and `ends_at` columns only contain `00:00:00` or `12:00:00` time components, providing a half-day (AM/PM) level of detail. Rentals are clean: there is no overlap between rentals. `car_id` and dates are also clean: no NULL or erroneous values.
- cars: all fields are clean except `created_at`, which can be NULL.
You need to solve this in SQL. You are free to choose any kind of database engine.
We first want to fix the NULL `created_at` values for cars. For each car with a NULL `created_at`, we will consider that it was created on the same date as the previous car (i.e. the car with the closest lower `id` that has a non-NULL `created_at`). Assume that cars can be more than 1 ID apart.
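Sketched with SQLite through Python's `sqlite3` (any engine works; this assumes the CSVs were already loaded into `cars` and `rentals` tables in a `challenge.db` file):

```python
import sqlite3

conn = sqlite3.connect("challenge.db")

# Backfill each NULL created_at from the closest lower id with a known date.
conn.execute("""
    UPDATE cars
    SET created_at = (
        SELECT c2.created_at
        FROM cars c2
        WHERE c2.id < cars.id AND c2.created_at IS NOT NULL
        ORDER BY c2.id DESC
        LIMIT 1
    )
    WHERE created_at IS NULL
""")
conn.commit()
```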
Then, for each month, find how many cars reach their 3rd rental since their registration. Use the rental's `starts_at` to determine which month to attribute it to.
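One possible query, continuing the `sqlite3` sketch above (window functions require SQLite 3.25+):

```python
import sqlite3

conn = sqlite3.connect("challenge.db")

# Rank each car's rentals by start date; a car "reaches its 3rd rental"
# in the month of the rental ranked 3.
for month, count in conn.execute("""
    WITH ranked AS (
        SELECT car_id, starts_at,
               ROW_NUMBER() OVER (PARTITION BY car_id ORDER BY starts_at) AS rn
        FROM rentals
    )
    SELECT strftime('%Y-%m', starts_at) AS month, COUNT(*) AS cars
    FROM ranked
    WHERE rn = 3
    GROUP BY month
    ORDER BY month
"""):
    print(month, count)
```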
We hope you'll have fun doing this challenge. It shouldn't take more than a few hours. Enjoy and be reliable <3