Airavata Sandbox for Machine Learning, Rspark, Sage

Created On: Sep-28-2015    Author: Sasikanth
What is it?

It is an Ubuntu Server sandbox on Hyper-V with the major Big Data machine learning tools preinstalled, aimed largely at mathematicians.

What does it contain?

SageMath, Anaconda Python, Octave, Theano for deep learning, Spark on YARN with Hadoop, and R with RStudio Server plus Rspark and R Shiny pre-set up.

Why another sandbox?
  1. For one, it is targeted at mathematicians and statisticians more than programmers.
  2. It is a VHD file, so it can be run on any Windows machine.
  3. All the software packages can be used through notebook interfaces: RStudio, Sage, and IPython/Jupyter.
  4. All you need on the Windows side is PuTTY (http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html).
  5. Accessing all this software from the web on your Windows machine is as easy as if it were installed locally, so you don't need to know any Linux.
How to set it up?
  1. Enable Hyper-V.
  2. Create an internal network and share your network connection to it; by default, Windows assigns an IP in the 192.168.137.x range.
  3. Create a VM from the VHD with networking and start the machine.
  4. Install PuTTY and run the command:
  5. "call "c:\Program Files (x86)\PuTTY\putty.exe" -ssh -l airavata -pw password -L 8081:localhost:8081 -L 8080:localhost:8080 -L 8787:localhost:8787 -L 8888:localhost:8888 -L 8088:localhost:8088 -L 4040:localhost:4040 -L 50070:localhost:50070 192.168.137.159"
  6. Save the above command in a batch file so you can reuse it anytime.
  7. All the required software is preinstalled and already running on the machine as part of system startup, except SageMath.
  8. To start Sage, run sage -n & on your PuTTY console.
  • username: airavata, password: password (for all other services, e.g. RStudio)
  • username: admin, password: password (SageMath credentials)
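The tunnel command in step 5 can also be generated programmatically instead of being hand-maintained in a batch file. Here is a minimal sketch in Python; the PuTTY path, credentials, and VM address are the ones from this README, so adjust them to your own setup:

```python
import os
import subprocess

# Values taken from this README; change them to match your install.
PUTTY = r"c:\Program Files (x86)\PuTTY\putty.exe"
VM_IP = "192.168.137.159"
PORTS = [8081, 8080, 8787, 8888, 8088, 4040, 50070]

# Build the same command line as the batch file in step 6.
args = [PUTTY, "-ssh", "-l", "airavata", "-pw", "password"]
for port in PORTS:
    args += ["-L", f"{port}:localhost:{port}"]  # forward each service port
args.append(VM_IP)

if os.path.exists(PUTTY):  # only launch when PuTTY is actually installed
    subprocess.Popen(args)
```

This is equivalent to the batch file, but makes it easy to add or remove forwarded ports in one place.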

9. Port numbers (through PuTTY you can use localhost as the IP):
  • 8081 for the Sage notebook
  • 8888 for the IPython notebook
  • 8080 for Apache Tomcat
  • 8088 for the Hadoop cluster UI
  • 50070 for the Hadoop filesystem (HDFS) UI
  • 4040 for the Spark UI
10. To find the home folders for Hadoop, Spark, R Shiny, etc., run echo $PATH; most of the software is installed in /opt or /usr/local.
11. For Rspark, launch RStudio (localhost:8787 through PuTTY, or $ip:8787) and find the sample script (Rspark-single.R) for executing a distributed data frame on Spark over YARN.
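Once the tunnels are up, it is handy to verify which services are actually listening before opening them in a browser. A minimal sketch in Python, assuming the port numbers listed above and active PuTTY forwards (the service labels are informal, not official names):

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Service ports from the list above; with the PuTTY tunnels active,
# localhost works from the Windows side.
SERVICES = {
    8081: "Sage notebook",
    8888: "IPython notebook",
    8080: "Apache Tomcat",
    8787: "RStudio Server",
    8088: "Hadoop cluster UI",
    50070: "Hadoop filesystem UI",
    4040: "Spark UI",
}

if __name__ == "__main__":
    for port in sorted(SERVICES):
        state = "up" if port_open("localhost", port) else "down"
        print(f"{port:5d}  {SERVICES[port]}: {state}")
```

Any port reported as "down" usually means either the tunnel is not running or the corresponding service failed to start on the VM.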

  • VHD file: https://drive.google.com/file/d/0B-RYaz_F3Lr6RC05T2lzT0xtTzg/view?usp=sharing
  • username: airavata
  • password: password