Earthquake Data
To gain a better understanding of Kafka I need to build some projects, and for that I need data. So I should go find some data that I can put into Kafka. I have worked with geo-spatial data in the past, and I was impressed by the U.S. Geological Survey's earthquake data, so I think that's as good a place to start as any.
Data Source
- USGS Earthquake Hazards Program: Real-time Notifications, Spreadsheet Format
- Request for ALL EARTHQUAKES over the past hour (this link will trigger a download of the past 60 minutes of geo events in CSV format)
Plan
- Make a `curl` request to USGS every 5 minutes via `cron`
- Use `python` to:
  - hash each row of the requested data
  - compare the hash to a store of previously hashed records
  - if the hash does not exist in the store, the row has not yet been recorded, so save the hash and write the row to a new CSV file of `exactly-one-geo-events`
- Start the Kafka server
- Create a Kafka topic `geo-events`
- Use `Scala` and `KafkaProducer` to store data in the topic
- Use `Scala` and `KafkaConsumer` to read data out of the topic, filtering for earthquakes of greater than `3.9` magnitude
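The `python` dedup step above can be sketched as follows. This is a minimal version that keeps the hash store as an in-memory set; the function name and the choice of SHA-256 are mine, since the plan doesn't specify an algorithm, and a real run would persist the set to disk between `cron` invocations.

```python
import csv
import hashlib
import io

def dedupe_rows(csv_text, seen_hashes):
    """Hash each CSV row; return only rows whose hash has not been seen.

    `seen_hashes` is the store of previously hashed records (a set here;
    a file or small database between cron runs would work the same way).
    """
    new_rows = []
    for row in csv.reader(io.StringIO(csv_text)):
        digest = hashlib.sha256(",".join(row).encode("utf-8")).hexdigest()
        if digest not in seen_hashes:
            seen_hashes.add(digest)
            new_rows.append(row)
    return new_rows
```

Because each hourly download overlaps the previous one, feeding successive downloads through the same `seen_hashes` set returns only the rows that are genuinely new.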
That would achieve a quasi-stream of USGS-reported earthquake data.
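The consumer-side magnitude filter is independent of the Kafka wiring, so it can be sketched as a pure function over the parsed CSV. The plan calls for `Scala` and `KafkaConsumer`; this Python sketch just shows the filtering logic, and assumes the magnitude column in the USGS spreadsheet-format feed is named `mag`.

```python
import csv
import io

def quakes_over(csv_text, threshold=3.9):
    """Parse USGS-style CSV and keep rows with magnitude above `threshold`.

    Assumes a header row and a magnitude column named 'mag'; rows with an
    empty magnitude field are skipped.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row["mag"] and float(row["mag"]) > threshold]
```

In the Kafka version, the same predicate would run inside the consumer's poll loop before the record is passed downstream.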