Hi Sandeep,
I have an idea for a project that I think would suit cloudxlab.
I recently stumbled across an article offering SETI (the Search for Extraterrestrial Intelligence) data to the public, along with a general request for data-crunching assistance. This sounds fantastic, but they then direct you to sign up for an ‘IBM Bluemix’ account in order to handle the big data. From what I can see on the IBM website, they offer a free but very limited account, and after that it’s pay as you go for anything more serious. They have a ‘cost estimator’ on the website, but it’s not particularly clear.
New SETI data is uploaded on a regular basis at http://setiquest.info/. They state that “The setiQuest data archive contains 4 TB of quadrature waveform samples of interesting targets that have been collected at the Allen Telescope Array. The data and waterfall links are distributed via Amazon Web Services. The goal of the data availability is to be an educational resource and to aid setiQuest app developers”.
There is also a collection of Python starter scripts available, along with a markdown file stating the science goals of the project. They state that “The goal of this project is to help improve observations performed by SETI Institute at the Allen Telescope Array (ATA). We are aiming for citizen scientists to make significant contributions in the following ways … buried treasure (signal from ET) and improved signal detection”. There is even a dedicated Python module called ibmseti.
I also found a second GitHub repository that appears to differ slightly from the material above. This one looks like it came from a machine learning (deep learning) challenge that SETI held last year. Again, starter scripts are available at https://github.com/setiQuest/ML4SETI. On this one they state that “We intended to make access and analysis as democratic as possible: there’s no platform or language requirements”, so it should all be possible on cloudxlab?
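As a rough illustration of the kind of processing involved, here is a minimal sketch of how quadrature (complex I/Q) samples can be turned into a waterfall, that is, a time-versus-frequency power image. It does not read real setiQuest files and does not use the ibmseti module. The synthetic signal, the `waterfall` helper and all the parameter values are my own assumptions, chosen only to show the idea with plain NumPy.

```python
import numpy as np


def waterfall(iq, n_bins):
    """Turn a 1-D array of complex I/Q samples into a 2-D power image.

    Each row is one time slice and each column is one frequency bin.
    (Hypothetical helper for illustration, not part of any SETI library.)
    """
    n_rows = len(iq) // n_bins
    # Drop any trailing samples that do not fill a whole row.
    frames = iq[: n_rows * n_bins].reshape(n_rows, n_bins)
    # FFT along each row, then shift so zero frequency sits in the middle.
    spectra = np.fft.fftshift(np.fft.fft(frames, axis=1), axes=1)
    return np.abs(spectra) ** 2


# Synthetic stand-in for real data (an assumption): a steady narrowband
# tone buried in complex Gaussian noise.
rng = np.random.default_rng(0)
n_bins, n_rows = 256, 64
n = n_bins * n_rows
t = np.arange(n)
f0 = 0.125  # tone frequency in cycles per sample
tone = np.exp(2j * np.pi * f0 * t)
noise = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
iq = tone + noise

power = waterfall(iq, n_bins)
peak_bins = power.argmax(axis=1)  # brightest frequency bin in each time slice
print(power.shape, peak_bins[:5])
```

In the resulting image a narrowband signal shows up as a bright line running through the time slices, which is the sort of pattern a convolutional neural network could then be trained to classify.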
So,
Objective: Get the SETI starter code working on cloudxlab, demonstrating big data handling and machine learning, and then build on their work to take it further.
Source of Data: The official SETI data source
Tags: Big data, python, Spark, machine learning, deep learning, convolutional neural networks, aliens!