I scraped KPN's site - 54426 times

Data: KPN; Map: Me; Pictured: A section of the map.

By Puck Meerburg

** NOTE: All the information I gathered represents only the speeds KPN can deliver. Most parts of Delft also have Ziggo, which isn’t factored into this blog post. **

Our whole family has KPN as mobile provider, and we are planning to move from another ISP, Ziggo (which uses DOCSIS/cable), to KPN (which uses ADSL/phone line). As I was looking at their site, I noticed that each address I entered showed a different maximum speed, anywhere from 100Mbit/s down to 25Mbit/s or even 14Mbit/s, and, to be honest, I’d gotten quite used to the very respectable 120/12 speed we have now. I also love collecting random data and analyzing it, so I decided to scrape their site to find the maximum speed one can get in Delft. I used a CSV download of the BAG (Basisregistraties Adressen en Gebouwen, a huge database full of information about each building), which I then filtered by city. I downloaded the dataset from nlextract.nl. Huge thanks to them for making it available in a format I could just throw into a few Python scripts!
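The filtering step could look roughly like the sketch below. The column names (straat, woonplaats and so on), the semicolon delimiter and the sample rows are all assumptions made up for illustration; check the header of the actual nlextract download before reusing this.

```python
import csv
import io


def filter_by_city(csv_file, city, city_column="woonplaats", delimiter=";"):
    """Yield the rows of a BAG-style CSV whose city column matches `city`.

    `city_column` and `delimiter` are assumptions, not the real BAG layout.
    """
    reader = csv.DictReader(csv_file, delimiter=delimiter)
    wanted = city.strip().lower()
    for row in reader:
        # Compare case-insensitively, since city names may be upper-cased.
        if (row.get(city_column) or "").strip().lower() == wanted:
            yield row


# Tiny made-up sample standing in for the real download.
sample = io.StringIO(
    "straat;huisnummer;postcode;woonplaats\n"
    "Voorbeeldstraat;1;0000AA;Delft\n"
    "Voorbeeldlaan;2;0000AB;Rotterdam\n"
    "Voorbeeldweg;3;0000AC;DELFT\n"
)
delft_rows = list(filter_by_city(sample, "Delft"))
print(len(delft_rows), "addresses in Delft")
```

For the real file you would pass an open file object instead of the in-memory sample.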

Actually scraping the data

Next was, of course, scraping all the data, but how would I do that? I started by opening the web inspector and just looking at the data sent through the pipes of the Internet when I entered an address. I quickly wrote a Python script to automate this, but soon hit a few roadblocks: I was scraping the HTML page, and some elements weren’t as static as I had presumed. I was trying to fix this when I suddenly noticed a very specific cookie being set… pcCheck. It looked like a URL-encoded JSON object, which it was! It looked a lot like this:

{
    "b":true,
    "e":"[REDACTED ZIPCODE]",
    "f":"[REDACTED HOUSE NUMBER]",
    "l":true,
    "m":true,
    // n through s are true as well
    "t":true,
    "u":"direct leverbaar",
    "x":"DELFT",
    "y":"[REDACTED STREET]",
    "ab":"BVDSL",
    "ac":"BVDSL",
    "ad":20, // <--- THE SPEED!
    "ae":2, // <--- upload speed?
    "ag":true
}

From here, I could quickly determine the address, and from looking at the website I saw that both ab and ac indicate the technology used for getting the internet to you. ad turned out to be the speed, which is of course the most important bit. But that’s not all, because a few entries also had an aa key, whose value was always FTTH. Turns out, there are places in Delft where you can get fibre internet! I decided to just set those places to 500Mbit/s, since that is the maximum that KPN offers. The speed was then put in the CSV, combined with the latitude/longitude, address and zip code, and sent to CartoDB, where I made a small map to analyse it.
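A minimal sketch of decoding such a cookie in Python, assuming the value really is plain URL-encoded JSON as described above. The function name parse_pccheck, the output field names and the example values are mine; the single-letter keys and the 500Mbit/s fibre value come from the post.

```python
import json
from urllib.parse import quote, unquote

FIBRE_SPEED_MBIT = 500  # the value the post assigns to FTTH addresses


def parse_pccheck(cookie_value):
    """Decode a pcCheck-style cookie and pull out the interesting fields."""
    data = json.loads(unquote(cookie_value))
    has_fibre = data.get("aa") == "FTTH"
    return {
        "zipcode": data.get("e"),
        "house_number": data.get("f"),
        "city": data.get("x"),
        "street": data.get("y"),
        "technology": "FTTH" if has_fibre else data.get("ab"),
        "speed_mbit": FIBRE_SPEED_MBIT if has_fibre else data.get("ad"),
    }


# Made-up example values, encoded the same way the cookie appeared to be.
example_cookie = quote(json.dumps({
    "e": "0000AA", "f": "1", "x": "DELFT", "y": "Voorbeeldstraat",
    "ab": "BVDSL", "ac": "BVDSL", "ad": 20, "ae": 2,
}))
result = parse_pccheck(example_cookie)
print(result["street"], result["speed_mbit"])
```

Reading the cookie is a lot less fragile than picking the same values out of the HTML, since the keys stay put even when the page layout changes.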

Results

All items have been put into a map, which I have embedded below for simplicity/laziness:

Things of note are:

  • In the old centre of Delft, speeds slowly increase the further north you go. Why? I don’t know!
  • There are a few houses in the fibre neighbourhood (northwest, bright green) that don’t have fibre, and have speeds below 10Mbit/s. I can only imagine that using the internet there must be torture!
  • There are a few blocks of houses where the network speed is relatively decent, except for a handful of houses where the speeds are ridiculously low! I think I am glad I don’t live there… (as of now)
  • Scraping the data from KPN’s site took a lot of time: about 24 hours of constant requesting, somewhat less after I adjusted it to do 6 requests at the same time. In practice it still took about a week of on-and-off scraping.
  • All the conclusions in this post are based only on the data I scraped from this provider. Other providers might offer fibre in other places, but I think I’ve spent enough time on this already.
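For the "6 requests at the same time" part mentioned in the list above, one possible approach is a small thread pool. This is only a sketch of the idea, not the script I actually ran: scrape_all and fake_check are hypothetical names, and fake_check stands in for the real function that would request KPN's page and read the cookie.

```python
from concurrent.futures import ThreadPoolExecutor


def scrape_all(addresses, check_address, workers=6):
    """Run `check_address` over all addresses with `workers` parallel threads.

    Results come back in the same order as `addresses`.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(check_address, addresses))


def fake_check(address):
    # Placeholder: a real version would do the HTTP request here.
    zipcode, house_number = address
    return {"zipcode": zipcode, "house_number": house_number, "speed_mbit": 20}


results = scrape_all([("0000AA", "1"), ("0000AB", "2")], fake_check)
print(len(results), "addresses checked")
```

Threads fit here because the work is almost entirely waiting on the network, and keeping the pool small avoids hammering the site.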

I am not claiming copyright on, or responsibility for, any of the data I processed into the map; I just provided it in another format that is easier to process. Do with it what you want, I do not care.