Hi,
From your experience, what is the best way to implement a genetic algorithm on Spark? Could you please provide some examples or tutorials?
Thanks.
Best regards,
MJ
Hi,
Genetic algorithms are ordinary algorithms; what changes on Spark is the approach.
So, besides basic knowledge of Spark, you would need a good understanding of genetic algorithms.
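To make that approach a little more concrete, here is a minimal sketch of one common pattern, sometimes called a master-slave (or global) parallel GA: the population stays on the driver and only the fitness evaluation, usually the expensive part, is distributed. It is plain Python so it runs without a cluster. The hypothetical `map_fn` parameter defaults to the built-in `map`; with PySpark you could instead pass something like `lambda f, pop: sc.parallelize(pop).map(f).collect()`, assuming an existing SparkContext `sc`. The OneMax fitness, tournament selection, one-point crossover and all parameter values are illustrative assumptions, not a recommended configuration.

```python
import random

def onemax(genome):
    # Hypothetical fitness: the number of 1-bits (maximum = genome length).
    return sum(genome)

def tournament(scored, rng, k=3):
    # Pick k random (fitness, genome) pairs and return the fittest genome.
    return max(rng.sample(scored, k), key=lambda fg: fg[0])[1]

def crossover(a, b, rng):
    # One-point crossover: prefix of a, suffix of b.
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]

def mutate(genome, rng, rate):
    # Flip each bit independently with probability `rate`.
    return [1 - g if rng.random() < rate else g for g in genome]

def run_ga(fitness, genome_len=20, pop_size=30, generations=40,
           mutation_rate=0.02, seed=0, map_fn=None):
    rng = random.Random(seed)
    # map_fn(f, population) -> list of fitness values. This is the only step
    # you would distribute (assumption: e.g. parallelize().map().collect()).
    if map_fn is None:
        map_fn = lambda f, pop: list(map(f, pop))
    pop = [[rng.randint(0, 1) for _ in range(genome_len)]
           for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = list(zip(map_fn(fitness, pop), pop))
        gen_best = max(scored, key=lambda fg: fg[0])
        if best is None or gen_best[0] > best[0]:
            best = gen_best
        # Elitism: carry this generation's best genome over unchanged.
        children = [gen_best[1]]
        while len(children) < pop_size:
            a, b = tournament(scored, rng), tournament(scored, rng)
            children.append(mutate(crossover(a, b, rng), rng, mutation_rate))
        pop = children
    return best  # (fitness, genome)
```

With a Spark-backed `map_fn`, each generation costs exactly one action (the `collect()`), while selection, crossover and mutation stay on the driver; that trade-off is usually reasonable when the population is small and each fitness call is costly.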
Regards,
Sandeep Giri
I understand genetic algorithms very well, and I also have a good knowledge of Spark. I think my problem is Python, and that I have to use Scala to speed up my implementation.
The problem is that I have never programmed in Scala before.
So far, in most of my distributed computing work, I have found the choice of language to be mostly irrelevant, because the advantage you get from a particular language is very small compared to what you gain by distributing the data properly.
The following are generally the culprits:
1. Multiple Actions
Scan through your code and note down how many actions it has. Actions include take, collect, reduce, count, save, foreach, etc. Every action executes a job, and unless the data is cached, that job recomputes everything the action depends on.
This is the biggest issue and the hardest one to solve. Most of us plug in Spark after we are done writing the program and then try to tweak it. Instead, we should rewrite the entire program with distributed computing in mind.
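For instance, a GA loop that separately asks an RDD of fitness values for its `count()`, `mean()` and `max()` triggers three jobs per generation. Below is a minimal sketch of folding those statistics into a single pass, assuming the values are plain numbers. `seq_op` and `comb_op` have the shape PySpark's `RDD.aggregate(zeroValue, seqOp, combOp)` expects, but here they run over a list of lists standing in for partitions; the function names are hypothetical.

```python
from functools import reduce

ZERO = (0, 0.0, float("-inf"))  # (count, sum, max)

def seq_op(acc, x):
    # Fold one fitness value into a partial (count, sum, max).
    n, s, m = acc
    return (n + 1, s + x, max(m, x))

def comb_op(a, b):
    # Merge two partial results, e.g. from different partitions.
    return (a[0] + b[0], a[1] + b[1], max(a[2], b[2]))

def fitness_stats(partitions):
    # One pass over all partitions instead of three separate actions.
    partials = [reduce(seq_op, part, ZERO) for part in partitions]
    n, s, m = reduce(comb_op, partials, ZERO)
    return n, (s / n if n else 0.0), m
```

On a cluster the same two functions would be used as `rdd.aggregate(ZERO, seq_op, comb_op)`, which runs one job instead of three.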
2. Improper Distribution
Sometimes only a small amount of data is left after a filter, spread thinly over many partitions; in those cases you could coalesce() the data into fewer partitions.
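As a simplified model of why this helps (not Spark's actual implementation): after a selective filter most partitions are nearly empty, yet each one still costs a task. The hypothetical `coalesce` below merges runs of neighboring partitions into `n` larger ones without redistributing individual records by key, which is the idea behind `rdd.coalesce(numPartitions)` avoiding a full shuffle by default.

```python
def coalesce(partitions, n):
    # Simplified model: merge existing partitions into n groups of
    # consecutive partitions, keeping records in their current order.
    # Asking for more partitions than exist leaves the count unchanged.
    n = max(1, min(n, len(partitions)))
    size, extra = divmod(len(partitions), n)
    out, start = [], 0
    for i in range(n):
        end = start + size + (1 if i < extra else 0)
        out.append([x for part in partitions[start:end] for x in part])
        start = end
    return out
```

For example, eight sparsely filled partitions coalesced to two still hold exactly the same records in the same order, but would run as two tasks instead of eight.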
3. Improper Caching
If you are using the same dataset again and again in a Spark job, it is better to cache it. But we often end up abusing the cache, so be very judicious when it comes to caching. Caching is not a substitute for #1.
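Here is a toy model of what caching changes, assuming nothing about Spark beyond lazy evaluation: transformations such as `map` are only recorded, every action re-runs the recorded lineage, and `cache()` makes the first action keep its result for later ones. The `LazyData` class is hypothetical and deliberately tiny; real RDDs also deal with partitions, eviction and recomputing lost data.

```python
class LazyData:
    """Toy lineage: each node knows its parent and the function to apply."""

    def __init__(self, source=None, parent=None, func=None):
        self._source = source
        self._parent = parent
        self._func = func
        self._persist = False
        self._cached = None

    def map(self, func):
        # Transformation: recorded, not executed.
        return LazyData(parent=self, func=func)

    def cache(self):
        # Mark for persistence; the next action materializes the data.
        self._persist = True
        return self

    def _compute(self):
        if self._cached is not None:
            return self._cached
        if self._parent is None:
            data = list(self._source)
        else:
            data = [self._func(x) for x in self._parent._compute()]
        if self._persist:
            self._cached = data
        return data

    # Actions: each one triggers a full computation unless cached.
    def collect(self):
        return list(self._compute())

    def count(self):
        return len(self._compute())
```

Counting calls to an expensive fitness function shows the difference: two actions on an uncached pipeline evaluate it twice per element, while after `cache()` the second action reuses the stored results. Note that caching only removes repeated work; it does not reduce the number of jobs the actions trigger.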
4. Use Proper Partitioning
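Partitioning matters most for key-based work. For example, in an island-model GA keyed by a hypothetical island ID, a simplified model of hash partitioning (similar in spirit to PySpark's `rdd.partitionBy(numPartitions)`) sends every record with the same key to the same partition, so a per-key aggregation can then run partition by partition without moving data again. Both functions are illustrative, not Spark code.

```python
from collections import defaultdict

def hash_partition(pairs, num_partitions):
    # Simplified hash partitioning: each (key, value) pair goes to
    # partition hash(key) % num_partitions, so equal keys share a partition.
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[hash(key) % num_partitions].append((key, value))
    return parts

def sum_by_key_locally(partitions):
    # After partitioning by key, no key spans two partitions, so each
    # partition can be aggregated on its own.
    results = []
    for part in partitions:
        acc = defaultdict(int)
        for key, value in part:
            acc[key] += value
        results.append(dict(acc))
    return results
```

Because the per-partition results never share a key, combining them is a plain merge rather than another round of data movement.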
5. Use Proper Serialization
6. Ensure You Are Running on a Proper Cluster
Check the --master option (e.g. --master yarn); without it, the job may be running in local mode.
7. Fine-Tune Hardware (Number of Cores, etc.)
Add more machines, upgrade their memory, fine-tune the YARN settings so that more containers can run, etc.
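The core and memory arithmetic can be sketched with a commonly quoted rule of thumb; treat the defaults (5 cores per executor, 1 core and 1 GB reserved per node for the OS and Hadoop daemons) as assumptions to adjust for your cluster. The memory overhead follows Spark's default for `spark.executor.memoryOverhead` on YARN, max(384 MiB, 10% of executor memory). The function name is hypothetical.

```python
import math

def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, reserved_cores=1, reserved_gb=1):
    """Rule-of-thumb executor sizing (an assumption, not an official formula).

    Reserves cores/memory per node, packs executors of `cores_per_executor`
    cores onto each node, then picks the largest heap such that
    heap + max(384 MiB, 0.10 * heap) fits in each executor's memory share.
    """
    usable_cores = cores_per_node - reserved_cores
    per_node = usable_cores // cores_per_executor
    if per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    budget_mb = (mem_gb_per_node - reserved_gb) * 1024 // per_node
    # Below ~4224 MB the 384 MiB floor dominates; above it the 10% factor does.
    heap_mb = math.floor(min(budget_mb - 384, budget_mb / 1.1))
    return {
        "num_executors": nodes * per_node,
        "executor_cores": cores_per_executor,
        "executor_memory_mb": heap_mb,
    }
```

The results map onto the `spark-submit` options `--num-executors`, `--executor-cores` and `--executor-memory` (in MB with an `m` suffix). On YARN you may also want to leave room for the ApplicationMaster, for example by requesting one executor fewer.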
8. Maybe Use Scala
If all of the above optimizations fail, then use Scala, but trust me, it is not going to give you a huge improvement over Python.