Earthquake Data

To gain a better understanding of Kafka I need to build some projects! And really, that means I need data — some data I can put into Kafka. I have worked with geospatial data in the past, and I was impressed by the US Geological Survey's earthquake data, so I think that's as good a place to start as any.

Data Source

Plan

  1. Make a curl request to USGS every 5 minutes via cron
  2. Use Python to:
    1. hash each row of the requested data
    2. compare the hash to a store of previously hashed records
    3. if the hash does not exist in the store, the row has not yet been recorded, so save the hash and write the row to a new CSV file of exactly-one-geo-events
  3. Start the Kafka server
  4. Create a Kafka topic, geo-events
  5. Use Scala and KafkaProducer to write the data to the topic
  6. Use Scala and KafkaConsumer to read the data out of the topic, filtering for earthquakes with a magnitude greater than 3.9
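The dedup logic in step 2 could look something like this — a minimal sketch, assuming the curl response has already been saved as a CSV file (all filenames here are illustrative, not part of the plan):

```python
import csv
import hashlib
from pathlib import Path

HASH_STORE = Path("seen_hashes.txt")          # illustrative: store of previously seen hashes
RAW_FILE = Path("usgs_latest.csv")            # illustrative: output of the curl request
OUT_FILE = Path("exactly_one_geo_events.csv") # illustrative: deduplicated rows

def row_hash(row):
    """Stable SHA-256 digest of a CSV row's fields."""
    return hashlib.sha256("|".join(row).encode("utf-8")).hexdigest()

def dedup(raw_file=RAW_FILE, store=HASH_STORE, out_file=OUT_FILE):
    """Append only previously unseen rows to out_file; return how many were new."""
    seen = set(store.read_text().split()) if store.exists() else set()
    new_rows = []
    with raw_file.open(newline="") as f:
        for row in csv.reader(f):
            h = row_hash(row)
            if h not in seen:      # hash not in the store: row not yet recorded
                seen.add(h)
                new_rows.append(row)
    with out_file.open("a", newline="") as f:
        csv.writer(f).writerows(new_rows)
    store.write_text("\n".join(seen))
    return len(new_rows)
```

Run against the same file twice, the second pass should find nothing new — which is the whole point of the exactly-once file.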

That would give me a quasi-stream of USGS-reported earthquake data.
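The consumer in step 6 will be Scala, but the filtering logic itself is simple enough to sketch in Python — assuming each consumed record has been parsed into a dict with a `mag` field (the field name is my assumption about the feed format, not something the plan specifies):

```python
def strong_quakes(events, threshold=3.9):
    """Keep only events whose magnitude is strictly greater than threshold."""
    return [e for e in events if float(e["mag"]) > threshold]

sample = [{"id": "a", "mag": "4.2"}, {"id": "b", "mag": "3.9"}, {"id": "c", "mag": "1.0"}]
strong_quakes(sample)  # only the 4.2 event survives; 3.9 is not *greater than* 3.9
```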