Tom Gurion: data analysis


Tuesday, August 4, 2015

My Jupyter (tmpnb) server and Thebe

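%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from ipywidgets import interact  # older IPython versions used IPython.html.widgets

def plot_sine(frequency=1.0, amplitude=1.0):
    """Plot amplitude * sin(frequency * x) on a fixed y-axis range."""
    x = np.linspace(0, 10, 1000)
    plt.plot(x, amplitude * np.sin(x * frequency))
    plt.ylim(-1.0, 1.0)  # keep the axis fixed so amplitude changes are visible
    plt.show()

# Each (min, max) tuple of floats becomes a slider for that argument.
interact(plot_sine, frequency=(0.5, 10.0), amplitude=(0.0, 1.0));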

Isn't that amazing?!?

I've recently installed a tmpnb server on my DigitalOcean VM; you can access it at nagasaki45.com:8000.

So, what's the big deal?
This configuration allows anyone to use Python (or one of the other supported and installed kernels) on the web, using my server. You don't have to ask for permission; you can just go to the provided address and start coding without any local installation.

And it goes way beyond that:

You can open a new terminal, 'git clone' your project, and demonstrate it to someone else. You can do it on mobile devices too. Again, no installation required; everything runs on the server.
You can use Thebe to add code snippets like the one above to any static HTML page (your blog, for example). Even interactive widgets work: the computation runs on the server and the results travel back and forth to the web frontend for presentation.

So go ahead, write some code, let me execute it for you ;-)

# your python playground

Edit 1.9.15:

My DigitalOcean VM has "only" 512MB of RAM. I decided to spawn tmpnb with 4 Docker containers, 50MB of RAM each, to keep the server load to a minimum. Apparently this caused some issues, as 50MB is probably not enough.

Right now the example above uses the same tmpnb server as the one in the Thebe example (here), namely https://oreillyorchard.com:8000/. It works much better now, as there are no kernel failures when running the examples.
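For a sense of the numbers in this edit, here is a back-of-the-envelope memory budget. The 512MB VM, the 4 containers and the 50MB limit come from the post; the 200MB reserved for the OS, the Docker daemon and tmpnb's own services is a hypothetical figure assumed for illustration, not a measurement.

```python
# Back-of-the-envelope memory budget for a small tmpnb host.
# VM size, container count and per-container limit are from the post;
# OS_RESERVED_MB is a hypothetical assumption, not a measured value.
VM_RAM_MB = 512
CONTAINERS = 4
PER_CONTAINER_MB = 50
OS_RESERVED_MB = 200  # assumed: OS, Docker daemon, tmpnb proxy/orchestrator


def budget(vm_mb, containers, per_container_mb, reserved_mb):
    """Return (MB used by containers, MB left unused after the reservation)."""
    used = containers * per_container_mb
    return used, vm_mb - reserved_mb - used


def max_per_container(vm_mb, containers, reserved_mb):
    """Largest equal per-container limit that still fits in the VM."""
    return (vm_mb - reserved_mb) // containers


used, spare = budget(VM_RAM_MB, CONTAINERS, PER_CONTAINER_MB, OS_RESERVED_MB)
print(f"containers use {used}MB, {spare}MB unused")
print(f"each container could get up to "
      f"{max_per_container(VM_RAM_MB, CONTAINERS, OS_RESERVED_MB)}MB")
```

Under that assumption the same 4 containers could each get about 78MB instead of 50MB; the other lever is running fewer containers with more memory each.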

Edit 20.9.15:

I'm stopping the service on my server due to some number crunching tasks I'm running on it.

Saturday, May 23, 2015

Writing a programming book? Don't compose a utility library!

I came across two books recently in which the authors decided to write a utility library. The first book was Python in Practice, by Mark Summerfield (my opinion about the book can be found here), and the second, which I'm still reading, is Doing Bayesian Data Analysis, Second Edition, by John Kruschke. A separate review will be added when I finish reading it.
The books are different in nature: one is about Python programming, while the other is about statistical methods and uses the R programming language for hands-on examples and exercises; the first book is of average quality overall (IMHO) and the second is absolutely amazing! However, I believe I can criticize the utility libraries that come with both books in the same manner: Don't do this!

And why?

Installation process breaks conventions

When I need an external tool in a Python project, I know I can rely on PyPI for finding packages. I have pip to easily install them, and I prefer to work with virtualenv whenever possible. This set of tools helps me maintain a sane codebase and reduces the effort of managing dependencies on my own.
There is no chance that I will copy an external module into my project and put it under source control unless I have to, so why do it in an educational project in the first place?
I really don't know what the convention for installing external R packages is, but I believe that Kruschke's suggestion of sourcing his supplied scripts is not the proper way to do it (enlighten me if I'm wrong).
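For readers coming from R, the pip/virtualenv workflow described above boils down to two steps: create an isolated environment, then install packages from PyPI into it. A minimal sketch using the standard library's `venv` module (the built-in counterpart of virtualenv, swapped in here so the example has no dependencies); the directory name and the package in the usage comment are just examples.

```python
import os
import subprocess
import sys
import venv


def make_env(env_dir, with_pip=True):
    """Create an isolated virtual environment and return its interpreter path."""
    venv.create(env_dir, with_pip=with_pip)
    # The interpreter lives in Scripts\ on Windows and bin/ elsewhere.
    bin_dir = "Scripts" if sys.platform == "win32" else "bin"
    exe = "python.exe" if sys.platform == "win32" else "python"
    return os.path.join(env_dir, bin_dir, exe)


def pip_install(python, *packages):
    """Install packages from PyPI into the environment that owns `python`."""
    subprocess.check_call([python, "-m", "pip", "install", *packages])


# Usage (the install step needs network access):
# python = make_env("env")
# pip_install(python, "requests")
```

Calling pip through the environment's own interpreter (`python -m pip`) makes sure the packages land in that environment rather than in the system Python.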

Package maintenance / code quality

Before installing an external package, I tend to check its quality. The first thing I look at is how many stars the package has on GitHub and how many times it was downloaded from PyPI.
And there is a reason behind it: I can rely on packages that are used often to have better code quality; through GitHub I can browse the package's issues and latest commits and make sure it is still maintained.
I'm sure that book authors invest a large amount of time in writing their utility libraries. But bug-free code doesn't exist, and I prefer to know that the codebase is maintained before I use it (again, without distinguishing between educational and "real" projects).
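Part of the check described above can be scripted. The sketch below assumes GitHub's public REST endpoint `https://api.github.com/repos/{owner}/{repo}`, whose JSON response includes `stargazers_count`, `open_issues_count` and `pushed_at`. The thresholds (100 stars, a push within the last year) are arbitrary values picked for illustration, and the heuristic only covers the GitHub side, not PyPI download counts.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

GITHUB_API = "https://api.github.com/repos/{owner}/{repo}"


def fetch_repo(owner, repo):
    """Download repository metadata from GitHub's REST API (needs network)."""
    with urllib.request.urlopen(GITHUB_API.format(owner=owner, repo=repo)) as resp:
        return json.load(resp)


def looks_maintained(meta, min_stars=100, max_idle_days=365, now=None):
    """Rough heuristic: popular enough and pushed to recently.

    The thresholds are arbitrary illustration values, not a standard.
    """
    now = now or datetime.now(timezone.utc)
    # pushed_at is ISO 8601 with a trailing "Z", e.g. "2015-05-20T10:00:00Z".
    pushed = datetime.strptime(meta["pushed_at"], "%Y-%m-%dT%H:%M:%SZ")
    pushed = pushed.replace(tzinfo=timezone.utc)
    return (meta["stargazers_count"] >= min_stars
            and now - pushed <= timedelta(days=max_idle_days))


# Usage (needs network access; GitHub rate-limits anonymous requests):
# meta = fetch_repo("django", "django")
# print(meta["stargazers_count"], meta["open_issues_count"], looks_maintained(meta))
```

Stars and recent pushes are only a proxy; browsing the open issues and latest commits by hand, as described above, still tells you more.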

Not specific enough

If your utility library is a mix of different solutions for different problems, it might not be worth keeping in our toolbox. This is probably more relevant to Python in Practice than to Doing Bayesian Data Analysis, but I think it's still worth mentioning.

Documentation

When I choose a tool to work with, I want its documentation to be top notch! Take Django, for example. The project's documentation is nothing short of perfect, including a great tutorial for beginners. I really don't want to go looking through a book when I'm interested in putting some less obvious function from a utility library to use.

What I'm expecting from authors instead

If you think your utility functions are worth it, package them and publish them like any other package.

I really don't mind reading one or two additional pages of code in your book if there's something interesting in it. Again, if the code deserves to be in your book, it probably also deserves to be discussed explicitly.

If this functionality exists elsewhere, you should reference it and advise the reader to use it. I had never written code in R, but I was ready to learn how to work with its ecosystem. I expected Kruschke to teach me that, instead of showing me how to source his supplied scripts.

Late disclaimer

Don't get me wrong, supplying code as part of your book is great! But there are different ways to do it: David Beazley's Python Cookbook is full of code snippets, fully commented and explained; in Test-Driven Development with Python, Harry Percival guides the reader through developing a web app, with reference code available on GitHub.
Don't get me wrong 2: The above doesn't mean that the books are bad.

Edit:

Don't miss Kruschke's comment below! He sheds light on the above topics from a different angle and supplies great arguments for his decisions.

Monday, October 27, 2014

Participants movement tracking animations from my MA experiment #2

The following animated renditions are a byproduct of the video tracking and analysis of my MA thesis's second experiment.

The figure above shows a schematic diagram of the experiment design. The videos show sessions 1 to 3 of each group (the last session wasn't analyzed). They have been of great help in gaining insights into the social interactions among the participants themselves and between the participants and the system components.

The analysis repository can be found at github.
Additional information about the research can be found here.