Merging Datasets with priority using built-ins
- merge datasets based on a unique key / identifier
- complete a dataset with defaults
- built-in heros:
collections.ChainMap
- Stimulation level: 2,5/5 (can be useful, but it is not a spaceship:))
Lets start with an example: we have na Uber/Taxi-like system, where there is a certain baseline rate for each hour during the week. You want to have an option to bump or decrease a specific price without recreating or updateting the baseline table.
Baseline prices:
Hour | Weekday | Rate |
---|---|---|
12 | Tuesday | $ 5 |
13 | Tuesday | $ 5 |
14 | Tuesday | $ 5 |
15 | Tuesday | $ 7 |
16 | Tuesday | $ 7 |
17 | Tuesday | $ 7 |
... | ... | ... |
22 | Tuesday | $ 7 |
23 | Tuesday | $ 7 |
0 | Wednesday | $ 11 |
1 | Wednesday | $ 11 |
2 | Wednesday | $ 11 |
3 | Wednesday | $ 7 |
We would like to increate the priece for Tuesday, because it's new year's Eve which is one of most profitable days of the year. The issue is that Tuesdays are usually lazy and by default they have low rates. Our proposition looks like:
Hour | Weekday | Rate |
---|---|---|
22 | Tuesday | $ 17 |
23 | Tuesday | $ 17 |
0 | Wednesday | $ 21 |
1 | Wednesday | $ 21 |
2 | Wednesday | $ 21 |
3 | Wednesday | $ 17 |
We can usecollections.ChainMap
to merge multiple dictionaries with priority which may have conflicting keys, like:
from collections import ChainMap
hour_to_price = { # default daily prices
1: 15,
2: 15,
3: 10,
4: 7,
5: 5,
6: 5,
7: 5,
8: 7,
9: 10,
10: 10,
11: 7,
12: 5,
13: 5,
14: 5,
15: 7,
16: 7,
17: 7,
18: 7,
19: 7,
20: 7,
21: 10,
22: 13,
23: 13,
}
new_years_price_bump = {
22: 17,
23: 17,
0: 21,
1: 21,
2: 21,
3: 17,
}
merged_hour_to_prices = ChainMap(new_years_price_bump, hour_to_price)
print(merged_hour_to_prices[21]) # 10
print(merged_hour_to_prices[22]) # 17
print(merged_hour_to_prices[3]) # 17
print(merged_hour_to_prices[4]) # 7
We are facing 2 issues here:
- How do we add a value only for Tuesday / Wednesday? - Composite keys
- How do we know that the price is a bumped price?
Composite keys
Our data record comprises of 3 values named: hour, weekday and rate. We can just use 2 values as a dictionary key, like:
new_years_price_bump = {
(22, "Tuesday"): 17,
(23, "Tuesday"): 17,
(0, "Wednesday"): 21,
(1, "Wednesday"): 21,
(2, "Wednesday"): 21,
(3, "Wednesday"): 17,
}
We need to align the default price table if we want to merge it with our new bumped price table. We can do that dynamically:
weekdays = [
"Monday",
"Tuesday",
"Wednesday",
"Thursday",
"Friday",
"Saturday",
"Sunday",
]
hour_weekday_to_price = {
(hour, wd): price
for hour, price in hour_to_price.items()
for wd in weekdays
}
merged_hour_to_prices = ChainMap(
new_years_price_bump,
hour_weekday_to_price,
)
print(merged_hour_to_prices[22]) # fails
print(merged_hour_to_prices[(22, "Tuesday")]) # 17
Note:collections.ChainMap
will work with inconsistant keys between dictionaries and you will be able to get both values from it if at least 1 dictionary in the chain contains that key, like:
print(merged_hour_to_prices[22]) # 13
print(merged_hour_to_prices[(22, "Tuesday")]) # 17
Ideas
You can easily substitute this composite key with a datetime
object, but this solution is not limited to dates. You can add categories of products on promotion, limited editions etc. without impacting your baseline.
T.B.C.
We will discuss in the next article how to decipher what type of price did we get for a specific key e.g. New Year's Eve bump or maybe Monday's promotions? :)