curl & cron
First up, I need to build a system that makes repeated requests to a web location to download a CSV file.
This shouldn't be hard; I've done this a lot.
First Request
curl -s https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv
Nice!
Now it's just a matter of saving that nice STDOUT to a file.
curl -s https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv > eq-data.csv
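Worth noting, as an optional tweak beyond the command above: curl's -f flag makes HTTP errors exit nonzero instead of quietly saving an error page into the CSV, and -S still surfaces errors while -s hides the progress bar. Something like:

curl -sSf https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv > eq-data.csv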
Bash
Always a good idea to put repeated terminal commands into a bash script so that they don’t get lost.
I created get-eq-data.bash
and added the command.
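For reference, the script is presumably just that command behind a shebang; a minimal sketch (the absolute output path is an assumption, since cron typically runs jobs from $HOME rather than the repo directory):

#!/bin/bash
# get-eq-data.bash: fetch the past hour of USGS quake events as CSV.
# Absolute output path assumed here, because a relative eq-data.csv
# would land in whatever directory cron runs the job from.
curl -s https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/all_hour.csv > /home/paul/personal/earthquake-data/eq-data.csv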
Cron
With the bash script in place, I can now create a cron job to run this task every five minutes:
*/5 * * * * /bin/bash /home/paul/personal/earthquake-data/get-eq-data.bash
Now, every five minutes, the script will fire and write a fresh eq-data.csv containing the past 60 minutes of geo events.
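If you haven't set one up before: the entry goes in with crontab -e, and crontab -l confirms it took.

crontab -e    # opens your crontab in $EDITOR; add the line above
crontab -l    # lists installed entries so you can verify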
Concern
I planned for this: if I make a request every 5 minutes, and the feed is updated every minute and contains 60 minutes' worth of data, records in the requested CSV won't fall off until a full 60 minutes have gone by!
This means I would be heavily duplicating data: each request would overlap the previous one by 55 minutes.
I will need some way to suss out what records have already been stored.
During planning I decided I would do this with Python, since it's a language I have used a lot.
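To make the idea concrete before writing any Python, here's a rough shell sketch of the dedup step. It assumes a cumulative archive.csv and keys on the feed's id column, found by name in the header; the filenames are placeholders, not the final design.

#!/bin/bash
# Sketch only: append rows from eq-data.csv whose "id" value has not
# already been seen in archive.csv. Naive comma splitting is assumed
# safe here only because id appears before the quoted place field,
# which can contain commas.
[ -f archive.csv ] || head -n 1 eq-data.csv > archive.csv
awk -F',' '
  FNR == 1  { for (i = 1; i <= NF; i++) if ($i == "id") col = i; next }
  NR == FNR { seen[$col] = 1; next }   # first pass: record archive ids
  !seen[$col]                           # second pass: print unseen rows
' archive.csv eq-data.csv > new-rows.csv
cat new-rows.csv >> archive.csv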