Getting rid of waste: manipulating calendars with Python and the ics library
In the city where I live, four different kinds of waste are collected regularly:
- paper,
- organic waste,
- the "yellow bag" which is for all sorts of packaging, and
- residual waste, which contains all the rest (except batteries, dangerous chemicals and a couple of other things, which people have to bring to collection facilities themselves).
At first sight, the days on which I have to take out the different bins and bags seem easy enough to remember: usually, everything is collected on the same day of the week. However, there are different schedules for the different kinds of waste (biweekly or every four weeks in my part of the city). Moreover, in weeks with bank holidays, the collection is often shifted to another weekday.
Fortunately, the city council provides iCalendar files (*.ical) with all waste collection dates for the current year on its website. The downloaded file can easily be imported into any calendar application. However, I found that the structure of the events in the file could be made more convenient.
Structure of iCalendar files
Some readers might not be familiar with the iCalendar file format, so let's first have a quick look at the downloaded file.
It turns out that the file contains plain text. The meaning of most lines is evident, and lines are grouped into events and other kinds of components with lines like BEGIN:VEVENT and END:VEVENT. I will show you the file up to the end of the first event below.
Note:
- The source of this post is a Jupyter notebook, which you can download, modify, and run with an iCalendar file yourself.
- I am using sed to show just the first calendar event on my Linux command line. Working with the events in the calendar with Python should work the same on every operating system though. You do not need sed or any other special tools.
- The exclamation mark in !sed tells the Jupyter kernel to execute this command in the shell, and not in the Python interpreter.
!sed -e '/END:VEVENT/q' -e 's/^\(LOCATION:\).*$/\1<my address>/' calendar.ics
In principle, we could write a script that loads and parses the lines, groups them into events, and works with these. But we do not have to reinvent the wheel - there are libraries for this purpose, of course 😉
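For comparison, here is a rough sketch of what such a hand-written parser might look like. The sample text is made up for illustration and is not taken from the city's file. The function `parse_events` is a hypothetical helper that ignores folded lines, property parameters and nested components such as alarms, all of which a real parser has to handle.

```python
# Made-up sample data, only loosely modelled on a waste collection calendar.
SAMPLE = """\
BEGIN:VCALENDAR
VERSION:2.0
BEGIN:VEVENT
SUMMARY:Bio 2wö
DTSTART:20230105T050000Z
END:VEVENT
BEGIN:VEVENT
SUMMARY:Altpapier 4wö
DTSTART:20230105T050000Z
END:VEVENT
END:VCALENDAR
"""


def parse_events(text):
    """Group the lines between BEGIN:VEVENT and END:VEVENT into dicts."""
    events = []
    current = None  # the event we are currently inside, if any
    for line in text.splitlines():
        if line == "BEGIN:VEVENT":
            current = {}
        elif line == "END:VEVENT":
            events.append(current)
            current = None
        elif current is not None and ":" in line:
            # Split "NAME:value" at the first colon only.
            key, _, value = line.partition(":")
            current[key] = value
    return events


toy_events = parse_events(SAMPLE)
for toy_event in toy_events:
    print(toy_event["DTSTART"], toy_event["SUMMARY"])
```

Even this toy version needs special cases, which is a good argument for using a library.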
import ics
Let's open the file and look at the first events in the calendar:
with open("calendar.ics") as file:
    calendar = ics.Calendar(file.read())
sorted(calendar.events)[:10]
The structure becomes more obvious if we print the date first in each line and add some grouping:
import itertools
grouped_by_date = itertools.groupby(sorted(calendar.events),
                                    key=lambda event: event.begin)
first_groups = itertools.islice(grouped_by_date, 3)
for i, (date, events) in enumerate(first_groups):
    if i > 0:
        print()
    for event in events:
        print(event.begin.datetime.date().isoformat(), event.name)
As you can see, three different kinds of waste are collected on some dates, and four on others (note that the suffix "2wö" is a shorthand for "2-wöchentlich", or biweekly, and "4wö" means "every four weeks"). I did not like these showing up as different events in my calendar. This makes the calendar more cluttered than it needs to be, especially on days with a number of other events.
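The grouping above sorts the events before passing them to itertools.groupby. This matters because groupby only merges items that are adjacent in its input. The following stand-alone sketch illustrates this with made-up (date, name) pairs instead of real calendar events. The helpers `by_date` and `group_names` are hypothetical and exist only for this illustration.

```python
import itertools

# Made-up (date, name) pairs standing in for calendar events.
pairs = [
    ("2023-01-05", "Bio 2wö"),
    ("2023-01-19", "Bio 2wö"),
    ("2023-01-05", "Altpapier 4wö"),
]


def by_date(pair):
    return pair[0]


def group_names(items):
    """Return a list of (date, [names]) for each run of equal dates."""
    return [(date, [name for _, name in group])
            for date, group in itertools.groupby(items, key=by_date)]


# Without sorting, the two entries for 2023-01-05 end up in separate groups.
unsorted_groups = group_names(pairs)

# After sorting by date, equal dates are adjacent and form one group each.
grouped = group_names(sorted(pairs, key=by_date))

print(len(unsorted_groups), len(grouped))
```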
Merging events
To fix this, we merge simultaneous events into one. The name of the merged event should contain all kinds of waste that are collected on that day.
First, let's strip the name suffix that indicates the schedule.
import re
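def strip_suffix(name):
    # Raw string, so that "\d" reaches the regex engine unchanged.
    # fullmatch() already anchors the pattern at both ends.
    if (match := re.fullmatch(r"(.*)( \d*wö)", name)) is not None:
        return match.group(1)
    else:
        return name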
assert strip_suffix("Gelber Sack") == "Gelber Sack"
assert strip_suffix("Altpapier 4wö") == "Altpapier"
assert strip_suffix("Bio 2wö") == "Bio"
assert strip_suffix("Restabfall 2wö") == "Restabfall"
We can then write a generator that finds all simultaneous events and yields a merged event:
def merge_names(events):
    return ", ".join(sorted(strip_suffix(e.name) for e in events))

def merge_simultaneous_events(events):
    begin_and_end = lambda event: (event.begin, event.end)
    for _, events in itertools.groupby(sorted(events,
                                              key=begin_and_end),
                                       key=begin_and_end):
        # We can consume 'events' only once, but we need it twice.
        # Therefore, we put them into a tuple.
        events = tuple(events)
        new_name = merge_names(events)
        merged_event = events[0].clone()
        merged_event.name = new_name
        merged_event.description = new_name
        yield merged_event
merged_events = tuple(merge_simultaneous_events(calendar.events))
Now all collections on the same day are merged nicely:
for event in merged_events[:4]:
    print(event.begin.datetime.date().isoformat(), event.name)
Are we done yet, or is there more that could be improved?
Fixing start and end times
Let's look at the times of events close to the daylight saving time switch:
def print_events_in_months(events, months=(3, 4)):
    for event in events:
        if (dt := event.begin.datetime).month in months:
            print(dt.date().isoformat(),
                  dt.time().isoformat(),
                  event.name)
print_events_in_months(merged_events)
All events have the same start time in UTC, but it would be nice if they had the same start time in local time! Maybe 7 am, because the first waste collections occur around that time.
import datetime
import pytz
berlin = pytz.timezone("Europe/Berlin")
def set_time_7am(event):
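    date = event.begin.datetime.date()
    time = datetime.time(hour=7)
    # pytz time zones should be attached with localize(); passing them
    # as tzinfo directly can pick the zone's historical mean-time offset.
    new_dt = berlin.localize(datetime.datetime.combine(date, time))
    # Set end before begin so that begin is never later than end.
    event.end = new_dt
    event.begin = new_dt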
Note that we modify event.end before event.begin. Otherwise, ics would complain because the new begin date is after the old one, such that begin would be after end temporarily. This issue could be fixed better, but simply swapping the assignments works just fine for my simple task.
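To see why a fixed local time does not correspond to a fixed UTC time, here is a small stand-alone sketch. It uses the standard library module zoneinfo (Python 3.9 or newer) instead of pytz, which is a swapped-in alternative and not what the code above uses. The helper `seven_am_local` and the two dates are chosen only for illustration.

```python
import datetime
from zoneinfo import ZoneInfo  # standard library since Python 3.9

BERLIN = ZoneInfo("Europe/Berlin")


def seven_am_local(date):
    """Return 7:00 Berlin time on the given date as an aware datetime."""
    return datetime.datetime.combine(date, datetime.time(hour=7),
                                     tzinfo=BERLIN)


# In 2023, daylight saving time in Germany started on March 26.
winter = seven_am_local(datetime.date(2023, 3, 20))
summer = seven_am_local(datetime.date(2023, 4, 3))

winter_utc = winter.astimezone(datetime.timezone.utc)
summer_utc = summer.astimezone(datetime.timezone.utc)

# Same local time, but different UTC times before and after the switch.
print(winter_utc.time().isoformat(), summer_utc.time().isoformat())
```

Unlike pytz time zones, ZoneInfo objects can be passed as tzinfo directly, so no localize() call is needed in this variant.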
Creating a new file with the merged events
Now we can put the new events into a new Calendar:
new_calendar = ics.Calendar()
for event in merged_events:
    set_time_7am(event)
    new_calendar.events.add(event)
It can be serialized easily to a file like this [1]:
with open("new-calendar.ics", "w") as f:
    f.write(new_calendar.serialize())
We can check now that the first event in the new file looks as it should. Actually, the first event in the file is not the event that occurs first because the events are stored in a set in Calendar. Unlike dict, a Python set does not preserve the insertion order.
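The difference between the two containers can be illustrated with a few made-up (date, name) tuples, which stand in for events here. If a chronological order is needed, sorting the set (or taking its minimum) is a simple way to get it.

```python
# Made-up collection dates, deliberately not in chronological order.
collections = [
    ("2023-01-19", "Bio"),
    ("2023-01-05", "Altpapier, Bio"),
    ("2023-02-02", "Bio, Restabfall"),
]

# A dict remembers the order in which its keys were inserted ...
as_dict = dict.fromkeys(collections)

# ... whereas the iteration order of a set is an implementation detail.
as_set = set(collections)

# Sorting restores a well-defined order; ISO dates sort chronologically.
chronological = sorted(as_set)
first = min(as_set)
print(first)
```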
Note that I run dos2unix on the file before processing it further because the Windows line breaks ("\r\n") created by ics appear to be turned into "\r\n\r" in the cell output. This is not visible in Jupyter, but it confuses the code which converts the Notebook file into a blog post. I have not yet been able to investigate the root cause of this problem, so I just remove the Windows line breaks.
!dos2unix new-calendar.ics 2>/dev/null || echo "dos2unix failed!"
!sed -e '/END:VEVENT/q' -e 's/^\(LOCATION:\).*$/\1<my address>/' new-calendar.ics
You might notice that the value of the DESCRIPTION field of the alarm is still the one from one of the original events. This is also straightforward to fix, but I think that this post is already long enough as it is 🙂
Conclusion
Perhaps surprisingly, taking out the waste can teach you things about programming.
If you ever want to make changes to iCalendar files and enjoy coding in Python as much as I do, I recommend that you give ics a try. It's just a pip install ics away and is documented nicely at https://icspy.readthedocs.io/en/stable/index.html.
[1] The first version of this post serialized the new calendar with f.writelines(new_calendar) because a Calendar object happily behaved like an iterable of strings that produces the file contents line by line. However, as of today (January 4, 2023), this results in a deprecation warning. In future versions of ics, this will not work any more. Moreover, even if the serialize() function is used, there is still an unnecessary deprecation warning with ics 0.7.2. This has been fixed already, but version 0.7.2 does not contain the fix yet. Until a new version is released, a branch version without this issue can be installed with pip install git+https://github.com/ics-py/ics-py@version-0.7.