Getting rid of waste: manipulating calendars with Python and the ics library

In the city where I live, four different kinds of waste are collected regularly:

  • paper,
  • organic waste,
  • the "yellow bag" which is for all sorts of packaging, and
  • residual waste, which contains all the rest (except batteries, dangerous chemicals and a couple of other things, which people have to bring to collection facilities themselves).

At first sight, the days on which I have to take out the different bins and bags seem easy enough to remember: usually, everything is collected on the same day of the week. However, there are different schedules for the different kinds of waste (biweekly or every four weeks in my part of the city). Moreover, in weeks with bank holidays, the collection is often shifted to another weekday.

Fortunately, the city council provides iCalendar files (*.ical) with all waste collection dates for the current year on its website. The downloaded file can easily be imported into any calendar application. However, I found that the structure of the events in the file could be made more convenient.

Structure of iCalendar files

Some readers might not be familiar with the iCalendar file format, so let's first have a quick look at the downloaded file.

It turns out that the file contains plain text. The meaning of most lines is evident, and lines are grouped into events and other kinds of components with lines like BEGIN:VEVENT and END:VEVENT. I will show you the file up to the end of the first event below.

Note:

  1. The source of this post is a Jupyter notebook, which you can download, modify, and run with an iCalendar file yourself.
  2. I am using sed to show just the first calendar event on my Linux command line. Working with the calendar events in Python should work the same on every operating system, though. You need neither sed nor any other special tools.
  3. The exclamation mark in !sed tells the Jupyter kernel to execute this command in the shell, and not in the Python interpreter.
In [1]:
!sed -e '/END:VEVENT/q' -e 's/^\(LOCATION:\).*$/\1<my address>/' calendar.ics
BEGIN:VCALENDAR
VERSION:2.0
PRODID:regio iT
BEGIN:VEVENT
UID:6d52ed35-9b04-41bc-9e4b-a6c07d845699
DTSTAMP:20220107T185517Z
SUMMARY;LANGUAGE=de:Bio 2wö
DTSTART:20220106T050000Z
DTEND:20220106T050000Z
DESCRIPTION:Bio 2wö
LOCATION:<my address>
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT720M
DESCRIPTION:Bio 2wö
END:VALARM
END:VEVENT

In principle, we could write a script that loads and parses the lines, groups them into events, and works with these. But we do not have to reinvent the wheel - there are libraries for this purpose, of course 😉
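
To give an idea of what such a hand-written script might look like, here is a minimal sketch that uses only the standard library. The function name parse_events and the sample text are my own assumptions for illustration. The sketch deliberately ignores folded lines, property parameters, and the contents of nested components such as alarms, which is exactly the kind of detail a real library takes care of.

```python
# Abbreviated copy of the file shown above (END:VCALENDAR added by hand).
SAMPLE = """\
BEGIN:VCALENDAR
VERSION:2.0
PRODID:regio iT
BEGIN:VEVENT
UID:6d52ed35-9b04-41bc-9e4b-a6c07d845699
SUMMARY;LANGUAGE=de:Bio 2wö
DTSTART:20220106T050000Z
DTEND:20220106T050000Z
DESCRIPTION:Bio 2wö
BEGIN:VALARM
ACTION:DISPLAY
TRIGGER;RELATED=START:-PT720M
DESCRIPTION:Bio 2wö
END:VALARM
END:VEVENT
END:VCALENDAR
"""


def parse_events(text):
    """Collect the top-level properties of each VEVENT block in a dict.

    Hypothetical helper, not part of any library. Folded (continued)
    lines are not handled, and nested components are skipped.
    """
    events = []
    current = None  # dict of the event being read, or None outside events
    nested = 0      # depth of components nested inside the current event
    for line in text.splitlines():
        if line == "BEGIN:VEVENT":
            current = {}
        elif current is None:
            continue
        elif line == "END:VEVENT":
            events.append(current)
            current = None
        elif line.startswith("BEGIN:"):
            nested += 1
        elif line.startswith("END:"):
            nested -= 1
        elif nested == 0 and ":" in line:
            name, value = line.split(":", 1)
            # Drop parameters such as ";LANGUAGE=de" from the property name.
            current[name.split(";", 1)[0]] = value
    return events


events = parse_events(SAMPLE)
print(events[0]["SUMMARY"], events[0]["DTSTART"])
```

Even this toy version needs bookkeeping for nested components, and it still does not convert the timestamps into date objects.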

Loading the iCalendar file with Python

I looked for Python libraries which can read and write iCalendar files and found that ics is easy to work with and more than powerful enough for my needs.

In [2]:
import ics

Let's open the file and look at the first events in the calendar:

In [3]:
with open("calendar.ics") as file:
    calendar = ics.Calendar(file.read())

sorted(calendar.events)[:10]
Out[3]:
[<Event 'Gelber Sack' begin:2022-01-06T05:00:00+00:00 end:2022-01-06T05:00:00+00:00>,
 <Event 'Restabfall 2wö' begin:2022-01-06T05:00:00+00:00 end:2022-01-06T05:00:00+00:00>,
 <Event 'Bio 2wö' begin:2022-01-06T05:00:00+00:00 end:2022-01-06T05:00:00+00:00>,
 <Event 'Restabfall 2wö' begin:2022-01-20T05:00:00+00:00 end:2022-01-20T05:00:00+00:00>,
 <Event 'Altpapier 4wö' begin:2022-01-20T05:00:00+00:00 end:2022-01-20T05:00:00+00:00>,
 <Event 'Bio 2wö' begin:2022-01-20T05:00:00+00:00 end:2022-01-20T05:00:00+00:00>,
 <Event 'Gelber Sack' begin:2022-01-20T05:00:00+00:00 end:2022-01-20T05:00:00+00:00>,
 <Event 'Restabfall 2wö' begin:2022-02-03T05:00:00+00:00 end:2022-02-03T05:00:00+00:00>,
 <Event 'Gelber Sack' begin:2022-02-03T05:00:00+00:00 end:2022-02-03T05:00:00+00:00>,
 <Event 'Bio 2wö' begin:2022-02-03T05:00:00+00:00 end:2022-02-03T05:00:00+00:00>]

The structure becomes more obvious if we print the date first in each line and add some grouping:

In [4]:
import itertools

grouped_by_date = itertools.groupby(sorted(calendar.events),
                                    key=lambda event: event.begin)

first_groups = itertools.islice(grouped_by_date, 3)

for i, (date, events) in enumerate(first_groups):
    if i > 0:
        print()
    for event in events:
        print(event.begin.datetime.date().isoformat(), event.name)
2022-01-06 Gelber Sack
2022-01-06 Restabfall 2wö
2022-01-06 Bio 2wö

2022-01-20 Restabfall 2wö
2022-01-20 Altpapier 4wö
2022-01-20 Bio 2wö
2022-01-20 Gelber Sack

2022-02-03 Restabfall 2wö
2022-02-03 Gelber Sack
2022-02-03 Bio 2wö

As you can see, three different kinds of waste are collected on some dates, and four on others (note that the suffix "2wö" is a shorthand for "2-wöchentlich", or biweekly, and "4wö" means "every four weeks"). I did not like these showing up as different events in my calendar. This makes the calendar more cluttered than it needs to be, especially on days with a number of other events.

Merging events

To fix this, we merge simultaneous events into one. The name of the merged event should contain all kinds of waste that are collected on that day.

First, let's strip the name suffix that indicates the schedule.

In [5]:
import re

def strip_suffix(name):
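    # Raw string: "\d" in a normal string literal is an invalid escape
    # sequence. fullmatch() already anchors the pattern at both ends.
    if (match := re.fullmatch(r"(.*)( \d*wö)", name)) is not None: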
        return match.group(1)
    else:
        return name

assert strip_suffix("Gelber Sack") == "Gelber Sack"
assert strip_suffix("Altpapier 4wö") == "Altpapier"
assert strip_suffix("Bio 2wö") == "Bio"
assert strip_suffix("Restabfall 2wö") == "Restabfall"

We can then write a generator that finds all simultaneous events and yields a merged event:

In [6]:
def merge_names(events):
    return ", ".join(sorted(strip_suffix(e.name) for e in events))

def merge_simultaneous_events(events):
    def begin_and_end(event):
        return (event.begin, event.end)

    for _, group in itertools.groupby(sorted(events, key=begin_and_end),
                                      key=begin_and_end):
        # We can consume 'group' only once, but we need it twice.
        # Therefore, we put its events into a tuple.
        group = tuple(group)

        new_name = merge_names(group)

        merged_event = group[0].clone()
        merged_event.name = new_name
        merged_event.description = new_name

        yield merged_event

merged_events = tuple(merge_simultaneous_events(calendar.events))
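
As an aside, the two details of the grouping step (sorting before grouping, and turning each group into a tuple) can be seen in isolation in the following sketch. It uses plain (date, name) tuples as hypothetical stand-ins for the events, so it runs without ics.

```python
import itertools

# Hypothetical stand-ins for events: (date, name) pairs in arbitrary order.
collections = [
    ("2022-01-20", "Bio"),
    ("2022-01-06", "Gelber Sack"),
    ("2022-01-20", "Altpapier"),
    ("2022-01-06", "Bio"),
]


def by_date(item):
    return item[0]


# Without sorting, groupby starts a new group whenever the key changes,
# so the same date shows up more than once.
unsorted_keys = [key for key, _ in itertools.groupby(collections, key=by_date)]

# Sorting by the same key first yields exactly one group per date.
merged = {}
for date, group in itertools.groupby(sorted(collections, key=by_date),
                                     key=by_date):
    group = tuple(group)  # materialize: the group iterator can be read only once
    merged[date] = ", ".join(sorted(name for _, name in group))

print(unsorted_keys)
print(merged)
```

The unsorted input produces four groups for two dates, while the sorted input produces one merged name per date.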

Now all collections on the same day are merged nicely:

In [7]:
for event in merged_events[:4]:
    print(event.begin.datetime.date().isoformat(), event.name)
2022-01-06 Bio, Gelber Sack, Restabfall
2022-01-20 Altpapier, Bio, Gelber Sack, Restabfall
2022-02-03 Bio, Gelber Sack, Restabfall
2022-02-17 Altpapier, Bio, Gelber Sack, Restabfall

Are we done yet, or is there more that could be improved?

Fixing start and end times

Let's look at the times of events close to the daylight saving time switch:

In [8]:
def print_events_in_months(events, months=(3, 4)):
    for event in events:
        if (dt := event.begin.datetime).month in months:
            print(dt.date().isoformat(),
                  dt.time().isoformat(),
                  event.name)

print_events_in_months(merged_events)
2022-03-04 05:00:00 Bio, Gelber Sack, Restabfall
2022-03-17 05:00:00 Altpapier, Bio, Gelber Sack, Restabfall
2022-03-31 04:00:00 Bio, Gelber Sack, Restabfall
2022-04-13 04:00:00 Altpapier, Bio, Gelber Sack, Restabfall
2022-04-28 04:00:00 Bio, Gelber Sack, Restabfall

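The times printed above are in UTC. They shift from 05:00 to 04:00 when daylight saving time begins at the end of March, which means that all events start at 6 am local time. I would rather have them start at 7 am local time throughout the year, because the first waste collections occur around that time.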

In [9]:
import datetime
import pytz

berlin = pytz.timezone("Europe/Berlin")

def set_time_7am(event):
    date = event.begin.datetime.date()
    time = datetime.time(hour=7)

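    # pytz time zones should be attached with localize(); passing them
    # as tzinfo directly can yield a wrong (local mean time) offset.
    new_dt = berlin.localize(datetime.datetime.combine(date, time))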
    
    event.end = new_dt
    event.begin = new_dt

Note that we modify event.end before event.begin. The new time is later than the old one, so assigning begin first would temporarily put begin after end, and ics would complain. There are cleaner ways to handle this, but simply swapping the assignments works fine for my simple task.
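
The effect of the assignment order can be illustrated with a toy class. Note that Span is a hypothetical stand-in that I made up for this sketch. It is not how ics implements its events; it only assumes that both setters refuse values that would put begin after end.

```python
import datetime


class Span:
    """Hypothetical stand-in for an event; NOT the ics implementation."""

    def __init__(self, begin, end):
        if begin > end:
            raise ValueError("begin must not be after end")
        self._begin, self._end = begin, end

    @property
    def begin(self):
        return self._begin

    @begin.setter
    def begin(self, value):
        if value > self._end:
            raise ValueError("begin must not be after end")
        self._begin = value

    @property
    def end(self):
        return self._end

    @end.setter
    def end(self, value):
        if value < self._begin:
            raise ValueError("end must not be before begin")
        self._end = value


old = datetime.datetime(2022, 1, 6, 5, 0)
new = datetime.datetime(2022, 1, 6, 6, 0)  # one hour later

span = Span(old, old)
try:
    span.begin = new  # begin would temporarily be after end
    begin_first_ok = True
except ValueError:
    begin_first_ok = False

span = Span(old, old)
span.end = new    # end first: begin <= end holds at every step
span.begin = new
end_first_ok = span.begin == span.end == new

print(begin_first_ok, end_first_ok)
```

With such a check in place, moving an event to an earlier time would require the opposite order, that is, begin before end.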

Creating a new file with the merged events

Now we can put the new events into a new Calendar:

In [10]:
new_calendar = ics.Calendar()

for event in merged_events:
    set_time_7am(event)
    new_calendar.events.add(event)

It can easily be serialized to a file like this (see footnote 1 at the end of the post):

In [11]:
with open("new-calendar.ics", "w") as f:
    f.write(new_calendar.serialize())

We can check now that the first event in the new file looks as it should. Actually, the first event in the file is not the event that occurs first because the events are stored in a set in Calendar. Unlike dict, a Python set does not preserve the insertion order.
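
If the chronological order matters, the events can be sorted explicitly instead of relying on the order in which the set yields them. Here is a minimal sketch with (date, name) tuples as hypothetical stand-ins for the events:

```python
import datetime

# Hypothetical stand-ins for events: (begin, name) pairs stored in a set.
events = {
    (datetime.date(2022, 8, 4), "Altpapier, Bio, Gelber Sack, Restabfall"),
    (datetime.date(2022, 1, 6), "Bio, Gelber Sack, Restabfall"),
    (datetime.date(2022, 1, 20), "Altpapier, Bio, Gelber Sack, Restabfall"),
}

# Iterating over the set directly gives no guaranteed order, but tuples
# compare by their first element (the date) first, so min() and sorted()
# produce a chronological result.
first = min(events)
in_order = sorted(events)

print(first[0].isoformat(), first[1])
```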

Note that I run dos2unix on the file before processing it further because the Windows line breaks ("\r\n") created by ics appear to be turned into "\r\n\r" in the cell output. This is not visible in Jupyter, but it confuses the code that converts the notebook file into a blog post. I have not yet investigated the root cause of this problem, so I just remove the Windows line breaks.

In [12]:
!dos2unix new-calendar.ics 2>/dev/null || echo "dos2unix failed!"

!sed -e '/END:VEVENT/q' -e 's/^\(LOCATION:\).*$/\1<my address>/' new-calendar.ics
BEGIN:VCALENDAR
VERSION:2.0
PRODID:ics.py - http://git.io/lLljaA
BEGIN:VEVENT
BEGIN:VALARM
ACTION:DISPLAY
DESCRIPTION:Restabfall 2wö
TRIGGER:-PT12H
END:VALARM
DTSTAMP:20220107T185517Z
DESCRIPTION:Altpapier\, Bio\, Gelber Sack\, Restabfall
DTEND:20220804T050000Z
LOCATION:<my address>
DTSTART:20220804T050000Z
SUMMARY:Altpapier\, Bio\, Gelber Sack\, Restabfall
UID:7993aafc-a134-48ea-b391-615b5ec63720
END:VEVENT

You might notice that the value of the DESCRIPTION field of the alarm is still the one from one of the original events. This is also straightforward to fix, but I think that this post is already long enough as it is 🙂

Conclusion

Perhaps surprisingly, taking out the waste can teach you things about programming.

If you ever want to perform changes on iCalendar files and enjoy coding in Python as much as I do, I recommend that you give ics a try. It's just a

pip install ics

away and is documented nicely at https://icspy.readthedocs.io/en/stable/index.html.


  1. The first version of this post serialized the new calendar with f.writelines(new_calendar) because a Calendar object happily behaved like an iterable of strings that produces the file contents line by line. However, as of today (January 4, 2023), this results in a deprecation warning. In future versions of ics, this will not work any more. Moreover, even if the serialize() function is used, there is still an unnecessary deprecation warning with ics 0.7.2. This has been fixed already, but version 0.7.2 does not contain the fix yet. Until a new version is released, a branch version without this issue can be installed with pip install git+https://github.com/ics-py/ics-py@version-0.7.
