Last week a friend asked on Twitter how to extract some data from Airline On-Time Performance Reports in a Tableau public dashboard. I replied with some tricks using Tabula, for cases when neither the raw data nor a workbook is provided, and turned in.
While the data was interesting, it lacked any location reference, which ruled out putting it on a map, not to mention further geospatial analysis. But the idea of visualizing it lived on.
Getting the data
Friday night, destroyed after a hard week. I got home, had some dinner, and started poking at the FlightStats Developer Center. I grabbed a Red Bull to stay awake and wrote a script to get the rating for every route in a dataset from OpenFlights.org.
require 'csv'
require 'typhoeus'
require 'json'
FLIGHTSTATS_URL = "https://api.flightstats.com/flex/ratings/rest/v1/json/route"
APPID = ""
APPKEY = ""
CSV.open("ratings.csv", "w") do |csv|
  # csv << ratings.first.keys
  csv << ["departureAirportFsCode", "arrivalAirportFsCode", "airlineFsCode", "flightNumber", "codeshares", "directs", "observations", "ontime", "late15", "late30", "late45", "cancelled", "diverted", "ontimePercent", "delayObservations", "delayMean", "delayStandardDeviation", "delayMin", "delayMax", "allOntimeCumulative", "allOntimeStars", "allDelayCumulative", "allDelayStars", "allStars"]
  # from http://blog.cartodb.com/jets-and-datelines/
  # https://gist.githubusercontent.com/pramsey/8eae41eae99cb07fd9a7/raw/6cbc092b831a9c5d3c884400549a9ef64426db76/routes.csv
  CSV.foreach('routes.csv') do |row|
    req_rating = Typhoeus::Request.new(
      "#{FLIGHTSTATS_URL}/#{row[4]}/#{row[6]}?appId=#{APPID}&appKey=#{APPKEY}",
      method: :get,
      headers: {
        "Content-Type" => "application/json"
      }
    )
    req_rating.on_complete do |res_rating|
      if res_rating.success?
        json = JSON.parse(res_rating.body)
        ratings = json['ratings']
        unless ratings.nil?
          ratings.each do |r|
            # rating_id = "#{r['airlineFsCode']}#{r['flightNumber']}"
            csv << r.values
          end
          puts "-- Saving #{row[0]} in ratings.csv"
        end
      elsif res_rating.timed_out?
        # aw hell no
        puts "got a time out"
      elsif res_rating.code == 0
        puts res_rating.return_message
      else
        puts "HTTP request failed: #{res_rating.code.to_s}"
      end
    end
    res_rating = req_rating.run
  end
end
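One fragile spot in the script above is `csv << r.values`: it writes each rating in whatever key order the JSON arrives in, while the header row is fixed. Here is a minimal sketch of a more defensive version, assuming the rating keys match the header names. The `rating_row` helper is hypothetical, and the sample JSON is made up for illustration rather than taken from a real FlightStats response.

```ruby
require 'csv'
require 'json'

# Column order copied from the header row in the script above.
COLUMNS = %w[
  departureAirportFsCode arrivalAirportFsCode airlineFsCode flightNumber
  codeshares directs observations ontime late15 late30 late45 cancelled
  diverted ontimePercent delayObservations delayMean delayStandardDeviation
  delayMin delayMax allOntimeCumulative allOntimeStars allDelayCumulative
  allDelayStars allStars
].freeze

# Build one CSV row from a rating hash, in header order.
# Keys missing from the hash become empty cells instead of shifting columns.
def rating_row(rating)
  COLUMNS.map { |column| rating[column] }
end

# Made-up response body: keys out of order and most fields absent.
sample = JSON.parse('{"ratings":[{"flightNumber":"1234","airlineFsCode":"XX","allStars":4.5}]}')

csv_text = CSV.generate do |csv|
  csv << COLUMNS
  sample.fetch('ratings', []).each { |rating| csv << rating_row(rating) }
end

puts csv_text
```

In the real script the same helper would replace `r.values` inside the `ratings.each` loop, so a rating with a missing or reordered field still lines up with the header.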
I validated the idea with a sample of flights departing from Barcelona, using a very simple line visualization: a color ramp corresponding to a 5-star scale (red for 1, green for 5).
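The red-to-green ramp can be sketched in a few lines of Ruby. This only illustrates the idea and is not how the map itself is styled: `star_color` is a hypothetical helper, the two endpoint colors are assumptions, and the interpolation is plain linear RGB.

```ruby
# Endpoints of the ramp as [r, g, b]; the exact shades are assumptions.
RED   = [215, 48, 39].freeze
GREEN = [26, 152, 80].freeze

# Map a star rating in 1.0..5.0 to a hex color by linear interpolation.
# Out-of-range values are clamped to the ends of the scale.
def star_color(stars)
  t = (stars.to_f.clamp(1.0, 5.0) - 1.0) / 4.0
  rgb = RED.zip(GREEN).map { |from, to| (from + (to - from) * t).round }
  format('#%02x%02x%02x', *rgb)
end

[1, 2.5, 5].each { |stars| puts "#{stars} -> #{star_color(stars)}" }
```

A stepped ramp with one fixed color per star would work just as well; the continuous version is handy if ratings such as `allStars` come back as decimals.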
Several hours, a couple of "Authorization failed. usage limits are exceeded" errors, and different API keys later, I had the data.
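I got past the limit by switching keys. A gentler alternative, which I did not use, is to pause and retry. The sketch below assumes the limit resets after a short wait, which may not be true for FlightStats. `UsageLimitExceeded` and `with_backoff` are hypothetical names, and the block stands in for a real request that would raise when the response carries the "usage limits are exceeded" message.

```ruby
# Raised by the caller's block when the API reports an exhausted quota.
class UsageLimitExceeded < StandardError; end

# Run the block, retrying with exponentially growing pauses when it raises
# UsageLimitExceeded. `sleeper` is injectable so the waits can be skipped in tests.
def with_backoff(max_attempts: 5, base_delay: 1.0, sleeper: ->(seconds) { sleep(seconds) })
  attempt = 0
  begin
    attempt += 1
    yield attempt
  rescue UsageLimitExceeded
    raise if attempt >= max_attempts
    sleeper.call(base_delay * (2**(attempt - 1)))
    retry
  end
end

# Stand-in for a real request: fails twice, then succeeds.
waits = []
result = with_backoff(sleeper: ->(seconds) { waits << seconds }) do |attempt|
  raise UsageLimitExceeded, 'usage limits are exceeded' if attempt < 3
  "succeeded on attempt #{attempt}"
end

puts result
puts "waited: #{waits.inspect}"
```

If the quota is daily or monthly rather than per minute, no amount of backoff helps, and the only options are waiting it out or asking for a higher limit.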
Explore and analyze
Fast forward to this week. We've been working on some new and shiny stuff at CartoDB that has just been presented at MWC — CartoDB for Deep Insights. I had the data ready, so I uploaded it to CartoDB, and created my first Deep Insights dashboard within a couple of minutes. BOOM.
A quick look at the visualization and I can start uncovering stories.
We're continuously working to improve this, so any feedback is welcome. Go ahead and play with the data.
You can check the code on GitHub.
Disclaimer: I'm not a Data Scientist, just some guy who knows how to put together a script in Ruby and happens to work at CartoDB. I really, really, really got inspired by this post to share the visualization. I usually tinker a lot but never end up sharing anything because I'm too critical of the results, so please be kind.