In the city where I live, four different kinds of waste are collected regularly:
- organic waste,
- waste paper,
- the "yellow bag" which is for all sorts of packaging, and
- residual waste, which contains all the rest (except batteries, dangerous chemicals and a couple of other things, which people have to bring to collection facilities themselves).
At first sight, the days on which I have to take out the different bins and bags seem easy enough to remember: usually, everything is collected on the same day of the week. However, there are different schedules for the different kinds of waste (biweekly or every four weeks in my part of the city). Moreover, in weeks with bank holidays, the collection is often shifted to another weekday.
Fortunately, the city council provides iCalendar files (*.ical) with all waste collection dates for the current year on its website. The downloaded file can easily be imported into any calendar application. I found, though, that the structure of the events in the file could be made more convenient.
Structure of iCalendar files
Some readers might not be familiar with the iCalendar file format, so let's first have a quick look at the downloaded file.
It turns out that the file contains plain text. The meaning of most lines is evident, and lines are grouped into events and other kinds of components by delimiter lines like BEGIN:VEVENT and END:VEVENT. I will show you the file up to the end of the first event below.
- The source of this post is a Jupyter notebook, which you can download, modify, and run with an iCalendar file yourself.
- I am using sed to show just the first calendar event on my Linux command line. Working with the events in the calendar with Python should work the same on every operating system though. You need neither sed nor any other special tools.
- The exclamation mark in !sed tells the Jupyter kernel to execute this command in the shell, and not in the Python interpreter.
!sed -e '/END:VEVENT/q' -e 's/^\(LOCATION:\).*$/\1<my address>/' calendar.ics
BEGIN:VCALENDAR
VERSION:2.0
PRODID:regio iT
BEGIN:VEVENT
UID:6d52ed35-9b04-41bc-9e4b-a6c07d845699
DTSTAMP:20220107T185517Z
SUMMARY;LANGUAGE=de:Bio 2wö
DTSTART:20220106T050000Z
DTEND:20220106T050000Z
DESCRIPTION:Bio 2wö
LOCATION:<my address>
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT720M
DESCRIPTION:Bio 2wö
END:VALARM
END:VEVENT
In principle, we could write a script that loads and parses the lines, groups them into events, and works with these. But we do not have to reinvent the wheel - there are libraries for this purpose, of course 😉
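To give an idea of what such a hand-written script might look like, here is a minimal sketch that groups the content lines into one dictionary per event. The function name parse_events and the sample lines are my own illustration. The sketch deliberately ignores folded lines, escaped characters, and parameter values that contain colons, which is exactly the kind of detail a library takes care of.

```python
def parse_events(lines):
    """Group iCalendar content lines into one dict per VEVENT (simplified)."""
    events = []
    current = None  # the event being collected, or None outside of a VEVENT
    depth = 0       # nesting depth of sub-components such as VALARM
    for raw_line in lines:
        line = raw_line.rstrip("\r\n")
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            events.append(current)
            current = None
        elif current is not None:
            if line.startswith("BEGIN:"):
                depth += 1
            elif line.startswith("END:"):
                depth -= 1
            elif depth == 0 and ":" in line:
                # "SUMMARY;LANGUAGE=de:Bio 2wö" -> key "SUMMARY", value "Bio 2wö"
                name_and_params, value = line.split(":", 1)
                key = name_and_params.split(";", 1)[0]
                current[key] = value
    return events


# Hypothetical sample input, shortened from the file shown above.
sample = """\
BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
SUMMARY;LANGUAGE=de:Bio 2wö
DTSTART:20220106T050000Z
BEGIN:VALARM
DESCRIPTION:Bio 2wö
END:VALARM
END:VEVENT
END:VCALENDAR
""".splitlines()

print(parse_events(sample))
# [{'SUMMARY': 'Bio 2wö', 'DTSTART': '20220106T050000Z'}]
```

Properties of nested components (here the alarm's DESCRIPTION) are skipped, so they do not overwrite the properties of the event itself.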
Let's open the file and look at the first events in the calendar:
import ics

with open("calendar.ics") as file:
    calendar = ics.Calendar(file.read())

sorted(calendar.events)[:10]
[<Event 'Gelber Sack' begin:2022-01-06T05:00:00+00:00 end:2022-01-06T05:00:00+00:00>,
 <Event 'Restabfall 2wö' begin:2022-01-06T05:00:00+00:00 end:2022-01-06T05:00:00+00:00>,
 <Event 'Bio 2wö' begin:2022-01-06T05:00:00+00:00 end:2022-01-06T05:00:00+00:00>,
 <Event 'Restabfall 2wö' begin:2022-01-20T05:00:00+00:00 end:2022-01-20T05:00:00+00:00>,
 <Event 'Altpapier 4wö' begin:2022-01-20T05:00:00+00:00 end:2022-01-20T05:00:00+00:00>,
 <Event 'Bio 2wö' begin:2022-01-20T05:00:00+00:00 end:2022-01-20T05:00:00+00:00>,
 <Event 'Gelber Sack' begin:2022-01-20T05:00:00+00:00 end:2022-01-20T05:00:00+00:00>,
 <Event 'Restabfall 2wö' begin:2022-02-03T05:00:00+00:00 end:2022-02-03T05:00:00+00:00>,
 <Event 'Gelber Sack' begin:2022-02-03T05:00:00+00:00 end:2022-02-03T05:00:00+00:00>,
 <Event 'Bio 2wö' begin:2022-02-03T05:00:00+00:00 end:2022-02-03T05:00:00+00:00>]
The structure becomes more obvious if we print the date first in each line and add some grouping:
import itertools

grouped_by_date = itertools.groupby(sorted(calendar.events), key=lambda event: event.begin)
first_groups = itertools.islice(grouped_by_date, 3)
for i, (date, events) in enumerate(first_groups):
    if i > 0:
        print()
    for event in events:
        print(event.begin.datetime.date().isoformat(), event.name)
2022-01-06 Gelber Sack
2022-01-06 Restabfall 2wö
2022-01-06 Bio 2wö

2022-01-20 Restabfall 2wö
2022-01-20 Altpapier 4wö
2022-01-20 Bio 2wö
2022-01-20 Gelber Sack

2022-02-03 Restabfall 2wö
2022-02-03 Gelber Sack
2022-02-03 Bio 2wö
As you can see, three different kinds of waste are collected on some dates, and four on others (note that the suffix "2wö" is a shorthand for "2-wöchentlich", or biweekly, and "4wö" means "every four weeks"). I did not like these showing up as different events in my calendar. This makes the calendar more cluttered than it needs to be, especially on days with a number of other events.
To fix this, we merge simultaneous events into one. The name of the merged event should contain all kinds of waste that are collected on that day.
First, let's strip the name suffix that indicates the schedule.
import re

def strip_suffix(name):
    # Raw string, so that "\d" reaches the regex engine unchanged.
    if (match := re.fullmatch(r"^(.*)( \d*wö)$", name)) is not None:
        return match.group(1)
    else:
        return name

assert strip_suffix("Gelber Sack") == "Gelber Sack"
assert strip_suffix("Altpapier 4wö") == "Altpapier"
assert strip_suffix("Bio 2wö") == "Bio"
assert strip_suffix("Restabfall 2wö") == "Restabfall"
We can then write a generator that finds all simultaneous events and yields a merged event:
def merge_names(events):
    return ", ".join(sorted(strip_suffix(e.name) for e in events))

def merge_simultaneous_events(events):
    begin_and_end = lambda event: (event.begin, event.end)
    for _, group in itertools.groupby(sorted(events, key=begin_and_end), key=begin_and_end):
        # We can consume 'group' only once, but we need it twice.
        # Therefore, we put the events into a tuple.
        group = tuple(group)
        new_name = merge_names(group)
        # Use a copy of the first event as the template for the merged one.
        merged_event = group[0].clone()
        merged_event.name = new_name
        merged_event.description = new_name
        yield merged_event

merged_events = tuple(merge_simultaneous_events(calendar.events))
Now all collections on the same day are merged nicely:
for event in merged_events[:4]:
    print(event.begin.datetime.date().isoformat(), event.name)
2022-01-06 Bio, Gelber Sack, Restabfall
2022-01-20 Altpapier, Bio, Gelber Sack, Restabfall
2022-02-03 Bio, Gelber Sack, Restabfall
2022-02-17 Altpapier, Bio, Gelber Sack, Restabfall
Are we done yet, or is there more that could be improved?
Let's look at the times of events close to the daylight saving time switch:
def print_events_in_months(events, months=(3, 4)):
    for event in events:
        if (dt := event.begin.datetime).month in months:
            print(dt.date().isoformat(), dt.time().isoformat(), event.name)

print_events_in_months(merged_events)
2022-03-04 05:00:00 Bio, Gelber Sack, Restabfall
2022-03-17 05:00:00 Altpapier, Bio, Gelber Sack, Restabfall
2022-03-31 04:00:00 Bio, Gelber Sack, Restabfall
2022-04-13 04:00:00 Altpapier, Bio, Gelber Sack, Restabfall
2022-04-28 04:00:00 Bio, Gelber Sack, Restabfall
The times printed above are in UTC: 05:00 before the switch to daylight saving time and 04:00 after it, so all events start at 6 am local time. It would be nicer if they started a bit later. Maybe 7 am, because the first waste collections occur around that time.
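To double-check how these UTC times relate to local time, here is a small sketch that uses only the standard library (zoneinfo, available since Python 3.9, assuming that time zone data is installed on the system). The helper name local_time is my own, and the two sample timestamps are taken from the output above.

```python
import datetime
from zoneinfo import ZoneInfo

UTC = datetime.timezone.utc
berlin = ZoneInfo("Europe/Berlin")

def local_time(utc_dt):
    """Return the wall-clock time in Berlin for a timezone-aware datetime."""
    return utc_dt.astimezone(berlin).time().isoformat()

# One event before and one after the switch to daylight saving time in 2022.
before_switch = datetime.datetime(2022, 3, 17, 5, 0, tzinfo=UTC)
after_switch = datetime.datetime(2022, 3, 31, 4, 0, tzinfo=UTC)
print(local_time(before_switch))  # 06:00:00
print(local_time(after_switch))   # 06:00:00

# The other direction: 7 am local time in summer, expressed in UTC.
seven_am = datetime.datetime(2022, 8, 4, 7, 0, tzinfo=berlin)
print(seven_am.astimezone(UTC).time().isoformat())  # 05:00:00
```

Unlike pytz time zones, a ZoneInfo object can be passed directly as tzinfo when constructing a datetime.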
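import datetime
import pytz

berlin = pytz.timezone("Europe/Berlin")

def set_time_7am(event):
    date = event.begin.datetime.date()
    time = datetime.time(hour=7)
    # pytz time zones should be attached with localize(); passing them
    # as tzinfo directly can yield a wrong UTC offset.
    new_dt = berlin.localize(datetime.datetime.combine(date, time))
    event.end = new_dt
    event.begin = new_dt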
Note that we modify end before begin. Otherwise, ics would complain because the new begin time can be later than the old one, such that begin would be after end temporarily. This issue could be fixed better, but simply swapping the assignments works just fine for my simple task.
new_calendar = ics.Calendar()
for event in merged_events:
    set_time_7am(event)
    new_calendar.events.add(event)
The new calendar can be serialized easily to a file because a Calendar object will happily behave like an iterable of strings that produces the file contents line by line:
with open("new-calendar.ics", "w") as f:
    f.writelines(new_calendar)
We can check now that the first event in the new file looks as it should. Actually, the first event in the file is not the event that occurs first because the events are stored in a set. Unlike a dict, a Python set does not preserve the insertion order.
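The difference between the two containers can be illustrated with plain strings instead of events. This is a small sketch with names of my own choosing; sorting explicitly is the safe option whenever the order matters.

```python
names = ["Restabfall", "Bio", "Gelber Sack", "Altpapier"]

as_dict_keys = list(dict.fromkeys(names))  # insertion order is guaranteed
as_set = set(names)                        # iteration order is arbitrary

print(as_dict_keys)    # same order as 'names'
print(sorted(as_set))  # alphabetical, independent of the set's internal order
```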
Note that I run dos2unix on the file before processing it further because the Windows line breaks ("\r\n") created by ics appear to be turned into "\r\n\r" in the cell output. This is not visible in Jupyter, but it confuses the code which converts the notebook file into a blog post. I have not yet been able to investigate the root cause of this problem, so I just remove the Windows line breaks.
!dos2unix new-calendar.ics 2>/dev/null || echo "dos2unix failed!"
!sed -e '/END:VEVENT/q' -e 's/^\(LOCATION:\).*$/\1<my address>/' new-calendar.ics
BEGIN:VCALENDAR
VERSION:2.0
PRODID:ics.py - http://git.io/lLljaA
BEGIN:VEVENT
BEGIN:VALARM
ACTION:DISPLAY
DESCRIPTION:Restabfall 2wö
TRIGGER:-PT12H
END:VALARM
DTSTAMP:20220107T185517Z
DESCRIPTION:Altpapier\, Bio\, Gelber Sack\, Restabfall
DTEND:20220804T050000Z
LOCATION:<my address>
DTSTART:20220804T050000Z
SUMMARY:Altpapier\, Bio\, Gelber Sack\, Restabfall
UID:7993aafc-a134-48ea-b391-615b5ec63720
END:VEVENT
You might notice that the value of the DESCRIPTION field of the alarm is still the one from one of the original events. This is also straightforward to fix, but I think that this post is already long enough as it is 🙂
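A side note on the dos2unix step above: if the tool is not available, the line breaks could also be converted in Python. The following is a minimal sketch with function names of my own choosing. Keep in mind that the iCalendar specification asks for "\r\n" line breaks, so a converted copy is best used for display only.

```python
from pathlib import Path

def crlf_to_lf(data: bytes) -> bytes:
    """Replace Windows line breaks with Unix line breaks."""
    return data.replace(b"\r\n", b"\n")

def convert_file(path):
    """Rewrite the file at 'path' in place with Unix line breaks."""
    path = Path(path)
    path.write_bytes(crlf_to_lf(path.read_bytes()))

print(crlf_to_lf(b"BEGIN:VEVENT\r\nEND:VEVENT\r\n"))
```

Working on bytes avoids any newline translation that Python would otherwise apply to files opened in text mode.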
Perhaps surprisingly, taking out the waste can teach you things about programming.
If you ever want to perform changes on iCalendar files and enjoy coding in Python as much as I do, I recommend that you give ics a try. It's just a
pip install ics
away and is documented nicely at https://icspy.readthedocs.io/en/stable/index.html.