Pingin' in the Rain: Datasets

Thunderping Datasets

As part of the Thunderping data release, we are making the following files available (with compressed file sizes):
  • README: Information about the datasets (also included below).
  • One-week sample (3.8 MB): This file contains a week's worth of data (Jan 01 2018 to Jan 08 2018).
  • Annual snapshots: Each of these files contains a year's worth of data.
  • 2011–2018 (1400 MB): This file contains the Thunderping dataset from 2011 to 2018. This was the dataset analyzed in "Residential Links Under the Weather", published at ACM SIGCOMM 2019.

About the data

Thunderping samples residential addresses in areas where severe weather events are likely and pings each of these addresses from multiple vantage points.

Historical weather information collected by the U.S. National Weather Service (NWS) is available at the granularity of an hour. To correlate the address-level connectivity data collected by Thunderping with it, we process the raw ping data to obtain "responsive address hours" and "dropout hours". A responsive address hour is an hour in which an IP address probed by Thunderping is responsive to pings at the beginning of the hour. A dropout hour is a responsive address hour that contains a dropout event.
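The two definitions can be sketched in a few lines of Python. This is a minimal illustration, not the project's processing pipeline: the `AddressHour` record and its fields are hypothetical, and the sketch assumes that responsiveness at the start of the hour and the presence of a dropout event have already been derived from the raw pings.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class AddressHour:
    """Hypothetical per-address summary of one hour of pings."""
    address: str
    responsive_at_start: bool  # answered pings at the beginning of the hour
    had_dropout: bool          # a dropout event was detected during the hour


def count_address_hours(hours: Iterable[AddressHour]) -> Tuple[int, int]:
    """Return (responsive address hours, dropout hours)."""
    responsive = 0
    dropouts = 0
    for h in hours:
        if not h.responsive_at_start:
            # Not a responsive address hour, so it cannot be a dropout hour.
            continue
        responsive += 1
        if h.had_dropout:
            dropouts += 1
    return responsive, dropouts
```

Note that an address that was unresponsive at the start of the hour contributes to neither count, since a dropout hour is by definition a subset of the responsive address hours.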

The NWS records hourly weather information at weather stations, which are often located at airports. For a given hour and airport, we therefore count the addresses of each link type (Cable, DSL, Fiber, WISP, Satellite, All) that Thunderping was probing in the vicinity of that airport, and calculate the responsive address hours and dropout hours. For example, in the hour beginning at 11:00:00 on Jan 4 2018, Thunderping was receiving successful responses to its pings from 2091 addresses across all link types near Ronald Reagan Washington National Airport (ICAO: KDCA), and 13 of them had a dropout during this hour. This yields 2091 responsive address hours and 13 dropout hours.
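One way to use the two counts is as a ratio. The snippet below is an illustrative sketch only: `dropout_fraction` is a hypothetical helper, not something shipped with the dataset. It is applied here to the KDCA figures from the example above, where 13 of 2091 responsive address hours were dropout hours, or roughly 0.62%.

```python
def dropout_fraction(responsive_hours: int, dropout_hours: int) -> float:
    """Fraction of responsive address hours that were dropout hours."""
    if responsive_hours <= 0:
        raise ValueError("need at least one responsive address hour")
    if not 0 <= dropout_hours <= responsive_hours:
        raise ValueError("dropout hours must lie between 0 and responsive hours")
    return dropout_hours / responsive_hours


# KDCA, hour beginning 11:00:00 on Jan 4 2018, all link types.
kdca_fraction = dropout_fraction(2091, 13)
print(f"{kdca_fraction:.2%}")
```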

Every line in the files contains Thunderping's observations of responsive and dropout address hours for addresses belonging to a specific link type that geolocate to the vicinity of a specific airport for a specific hour. The line also contains the weather data for that hour. Each of the files also contains a header describing the fields present in the file.
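As a rough sketch of how such a file might be consumed, the snippet below reads delimited rows and sums the two counts per link type. The comma delimiter, the single header row, the timestamp format, and every column name (`airport`, `hour`, `link_type`, `responsive_hours`, `dropout_hours`) are assumptions made for illustration; the real layout is given by the header in each file and by the README. Weather columns are omitted. The first sample row uses the KDCA figures quoted above, and the second row is made up.

```python
import csv
import io
from collections import defaultdict

# Illustrative sample only: column names, delimiter, and timestamp format
# are assumptions, and the second data row contains made-up values.
SAMPLE = """airport,hour,link_type,responsive_hours,dropout_hours
KDCA,2018-01-04T11:00:00,All,2091,13
KDCA,2018-01-04T12:00:00,All,2000,10
"""


def totals_by_link_type(lines):
    """Sum responsive and dropout address hours for each link type."""
    totals = defaultdict(lambda: [0, 0])
    for row in csv.DictReader(lines):
        counts = totals[row["link_type"]]
        counts[0] += int(row["responsive_hours"])
        counts[1] += int(row["dropout_hours"])
    return {link_type: tuple(counts) for link_type, counts in totals.items()}


totals = totals_by_link_type(io.StringIO(SAMPLE))
```

To run this against a real file, the column names and delimiter would first need to be adjusted to match that file's header.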

If you have any further questions or requests, please contact