Friday, March 4, 2016

Video demo of my Sign-language project

This project converts sign language / gestures to speech. You can read more about it on github or on my site. I just uploaded a video demo of the project. Feel free to comment.

Monday, February 29, 2016

docker-compose in production

Deployment sucks! I'm not a dev ops / sys admin type of person, and every time I deploy a web project I start to rethink the whole process and get confused. Recently, I decided to restart work on one of my older django projects, Xteams, and one of the first tasks was to migrate it from heroku to my VPS - a digital ocean droplet. Don't get me wrong, heroku is great, but I'm already paying 5 bucks per month to digital ocean, where I host all of my web projects, and the single web dyno that heroku gives for free is a real limitation.
Before starting to migrate the project, I decided on the following deployment requirements:
  1. I want a consistent production environment, with the fewest possible system wide dependencies, mainly due to...
  2. I have several projects already deployed to this VPS, so I need a production environment which will play nicely with them, both in terms of dependencies and in hostname routing (resolving foo.com and bar.com to their respective ports / apps).
  3. Staging and production environments should be as similar as possible, and I want to be able to run the staging environment on my local machine.
  4. Moreover, if I can utilize parts of the production configurations for development - like DB / job queue and workers - it's even better.
  5. Keep deployment scripts to the bare minimum.
  6. Not over-engineer the issue.

Dokku

I liked heroku. Deploying is really easy, and if I can accomplish a task using git alone I will probably do it that way :-) So I checked dokku. It clearly solves the first requirement easily: apart from dokku itself there are no system wide configurations or dependencies to worry about. It also solves issues 5 and 6 really nicely: deployment is done by pushing to a remote repository, and production specific configurations (or secrets) can be added with environment variables, which I like. On the other hand, I'm not sure if dokku would play nicely with my other projects, there is no way to run an environment similar to production locally, and I can't reuse components for development.

Deployment scripts (like ansible / fabric)

Almost a year ago I read Harry Percival's great book "TDD with Python". He teaches how to automate deployment with ansible. I managed to deploy the sample app for the book, and later I used the same technique to deploy one more django app of mine. However, I really don't like this approach. It seems very fragile, touching too many configurations too often, making me worry about my other projects on the server. It's a lot of work too, and work that I can't reuse for development. Overall, I feel that it only answers requirements 3 and 6. The rest are not even close to being answered.

Finally: docker and docker-compose to the rescue

docker-compose containers in production
You weren't expecting this, were you?!

Let's follow the diagram and I will try to convince you why using docker-compose in production (and also partly in development) is a good idea. With docker, you can create an image of an application, together with all of its dependencies, and run it in an isolated environment, which is called a docker container. docker-compose lets you take a set of such images, define the links between containers (in terms of network and volume access), and orchestrate all of the containers together, from building to running.

In the current example I have a DB container running the official postgres image. Every time I need to configure postgres on my local machine I find myself reading through half of stackoverflow and the official docs for information. This time it was really easy: I grabbed the official postgres image from docker hub and that's it - no more configuration needed.
Second, there is the web container that runs the django app itself. This is the main container in my project. I wrote a Dockerfile to describe how the image should be built. It contains only a few lines: starting from the official python 3.5 image, pip installing dependencies, and collecting static files. Secrets are written in a special file which docker-compose passes to the container as environment variables. This file is not source controlled: I created one manually on my local machine and another one, slightly different, on my server. Here's the Dockerfile for this image:
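A minimal sketch along the lines described above (this is not the project's actual file; paths and the requirements layout are illustrative assumptions):

```dockerfile
# Start from the official python 3.5 image
FROM python:3.5

ENV PYTHONUNBUFFERED 1
WORKDIR /code

# Install dependencies first, so docker can cache this layer
COPY requirements.txt /code/
RUN pip install -r requirements.txt

COPY . /code/

# Collect static files into the shared volume location
# (depending on your settings, this may need dummy env vars at build time)
RUN python manage.py collectstatic --noinput
```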

Above the web container there is the Nginx container, which has access to a shared volume from the web container containing all of the static files, so static files are served by Nginx directly. Here is the Nginx container configuration file:
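Roughly, such a configuration looks like this (a sketch, not the project's actual file; the "web" hostname comes from the docker-compose link, and the port and paths are illustrative):

```nginx
server {
    listen 80;

    # Static files collected by django, shared via a volume from the web container
    location /static/ {
        alias /code/static/;
    }

    # Everything else is proxied to the django app
    location / {
        proxy_pass http://web:8000;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```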

Outside of the docker orchestration there is one more Nginx instance, whose job is to route each incoming request to the correct app on the server. Every app listens on a different port, and Nginx routes traffic based on the hostname in the HTTP header. Here's the configuration file:
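A sketch of one such server block (hostname and port are illustrative; each app on the server gets a similar block with its own server_name and port):

```nginx
server {
    listen 80;
    server_name xteams.example.com;

    # Forward everything to the app's published docker port
    location / {
        proxy_pass http://127.0.0.1:8001;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```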

Here's what my docker-compose configuration file looks like:
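A sketch of such a file, in the docker-compose v1 format that was current at the time (service names, the wsgi module, ports, and paths are illustrative assumptions, not the project's actual file):

```yaml
db:
  image: postgres

web:
  build: .
  # gunicorn is assumed to be listed in requirements.txt
  command: gunicorn xteams.wsgi:application --bind 0.0.0.0:8000
  # the non-source-controlled secrets file, passed as environment variables
  env_file: .env
  # anonymous volume holding the collected static files
  volumes:
    - /code/static
  links:
    - db

nginx:
  image: nginx
  ports:
    - "8001:80"
  volumes:
    - ./nginx.conf:/etc/nginx/conf.d/default.conf:ro
  # gives nginx access to the static files volume
  volumes_from:
    - web
  links:
    - web
```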

Building and running these containers is really simple:
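Assuming docker and docker-compose are installed, it comes down to:

```shell
docker-compose build    # build the images for all services
docker-compose up -d    # start all of the containers in the background
```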

So now, let's try to tackle the requirements list again:
  1. The only system wide dependencies are docker and docker-compose. Apart from that there is the system wide Nginx server, which is already there for the other apps.
  2. Running the new project side by side with the other projects is just a matter of adding one more server configuration file to the system wide Nginx (more info is available in the project README). This is no different from any other app on the server, whether it's a django app or a static website.
  3. There is no difference at all between staging and production. Spinning a staging environment locally is just a matter of building and running the docker-compose environment.
  4. I'm not using a system wide postgres instance in development. Instead, I use the same postgres docker image I run in production. Moreover, if I need more building blocks, such as a job queue and workers, I will be able to add their respective images to both the development and production docker-compose configuration files.
  5. I do have a script for deployment, but it doesn't do much except pulling the latest source from github, building and running. That's all.
  6. One might argue that I did over-engineer the issue. Compared to using dokku this solution is definitely more complex. However, I'm not sure if maintaining this deployment mechanism is harder than maintaining ansible deployment scripts, especially when there are several different apps on the same server.
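The deployment script mentioned in point 5 can be sketched roughly like this (the repository path is illustrative; the force-removal of the web container is explained in the cons below):

```shell
#!/bin/sh
# Deployment sketch: pull, build, run. Paths are illustrative.
cd ~/xteams
git pull

docker-compose build

# Force-remove the old web container so the static files
# volume is refreshed with the newly collected files
docker-compose stop
docker-compose rm -f web

docker-compose up -d
```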

Cons

  • Provisioning, although very simple, is done manually: I create a folder on the server, clone the project, and add the django "secrets" file. It could be automated too, of course, but I'm not sure I see a reason for that now.
  • I wish I could run functional tests from a dedicated selenium container against the staging environment. This is not trivial, as it requires bidirectional network access between the selenium driver and the web app. I gave up on the idea because of its complexity, and I'm running selenium tests only against the development environment, outside of any docker container.
  • Sharing a volume between the web container and the Nginx container is a neat trick. However, I must force-remove the old web container after every build and before running the new container, to "refresh" the volume with the latest collected static files. It's a hack I don't like, but I can live with it.

Summary

I really like docker-compose. At first it looks like a tool with a steep learning curve, but don't be too intimidated. Give it a try and you might find an elegant solution for deployments, one which will hopefully scale well with your requirements.
I'm sure there are lots of approaches I'm not covering here, and all of the above only reflects my limited experience in the field. Therefore, feel free to criticize and share your experience on the subject!

Thursday, February 4, 2016

My new portfolio site

I have a new site!

My portfolio header

The site was created to present the different projects I'm working on as a portfolio. I will keep posting ideas and explorations here. But I feel that a site with a proper home page and unique design (compared to this blog, at least) will present the projects and my skillset in a better light.
And now for the technical part: this is a static, pelican-based site. The theme was adapted from the Hugo Creative theme, with several modifications. Feel free to fork and change it for your own needs.
As usual, comments are more than welcome!

Saturday, October 24, 2015

Poor man's trick to add and remove conda from $PATH

Conda is great for managing dependencies such as matplotlib and scipy: try to install these with pip, in a virtualenv, and you will be convinced that conda is better in that regard.

But!
Somehow, the folks at continuum analytics decided that using conda should override the default python environment (the system-wide python installation). There are some recommendations, but AFAIK there is no official solution to the problem.

Here is my solution to keep the system-wide python installation as my default environment and start to use conda only when I want to:

~/bin/unconda
# Remove the miniconda bin directory from $PATH: wrap PATH with colons,
# delete the ":$HOME/miniconda3/bin:" segment, then strip the extra colons
export PATH=`echo ":${PATH}:" | sed -e "s:\:$HOME/miniconda3/bin\::\::g" -e "s/^://" -e "s/:$//"`

Got the trick from here. Thanks Natsuki!
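To see what the sed pipeline does, here's a quick sanity check against a fabricated PATH value (the directory names are just an example):

```shell
# A PATH that contains the miniconda bin directory somewhere in the middle
demo_path="/usr/bin:$HOME/miniconda3/bin:/bin"

# Wrap with colons, delete the miniconda segment, strip the extra colons
stripped=`echo ":${demo_path}:" | sed -e "s:\:$HOME/miniconda3/bin\::\::g" -e "s/^://" -e "s/:$//"`

echo "$stripped"   # prints /usr/bin:/bin
```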

~/bin/reconda
export PATH="$HOME/miniconda3/bin:$PATH"

Now just add $HOME/bin to your path if it's not already there and you are ready to go. Note that the scripts must be sourced (e.g. ". unconda" or "source unconda") rather than executed, so that the PATH change affects your current shell.

Don't forget to remove the line in your .bashrc that adds miniconda to the path in the first place.

Thursday, August 27, 2015

Back to music with Malinka

According to this blog's title, there should be music involved.
It's been a while, but recently I started playing bass guitar in a band again. The band, "Malinka", is led by Stav German, and we have our first live show next week in Tel-Aviv.
Feel free to listen, comment, and come to the show - it's free.

Tuesday, August 4, 2015

My Jupyter (tmpnb) server and Thebe


%matplotlib inline
import matplotlib.pyplot as plt
import numpy as np
from IPython.html.widgets import interact

def plot_sine(frequency=1.0, amplitude=1.0):
    plt.ylim(-1.0, 1.0);
    x = np.linspace(0, 10, 1000)
    plt.plot(x, amplitude*np.sin(x*frequency));

interact(plot_sine, frequency=(0.5, 10.0), amplitude=(0.0, 1.0));

Isn't that amazing?!?

I've recently installed a tmpnb server on my digitalocean server; you can access it at nagasaki45.com:8000.

So, what's the big deal?
This configuration allows anyone to use python (or one of the other supported / installed kernels) on the web, using my server. You don't have to ask for permission; you can just go to the provided address and start coding without any local installation.

And it goes way beyond:
  • You can open a new terminal, 'git clone' your project, and demonstrate it to someone else. And you can do it on mobile devices too. Again, no installation required - everything runs on the server.
  • You can use thebe to add code snippets like the one above to any static html page (your blog, for example). Even interactive widgets will work, with the computation passed back and forth between the server and the web frontend for presentation.
So go ahead, write some code, let me execute it for you ;-)

# your python playground 

Edit 1.9.15:

My digitalocean VM has "only" 512MB of RAM. I decided to spawn tmpnb with 4 docker containers, 50MB of RAM each, to keep the server load to a minimum. Apparently, this posed some issues, as 50MB is probably not enough.

Right now the example above uses the same tmpnb server as the one in the thebe example (here), namely https://oreillyorchard.com:8000/. It works much better now, as there are no kernel failures when running the examples.

Edit 20.9.15:

I'm stopping the service on my server due to some number crunching tasks I'm running on it.

Saturday, May 23, 2015

Writing a programming book? Don't compose a utility library!

I came across two books recently in which the authors decided to write a utility library. The first book was Python in Practice, by Mark Summerfield (my opinion about the book can be found here), and the second, which I'm still reading, is Doing Bayesian Data Analysis, Second Edition, by John Kruschke. A separate review will be added when I finish reading it.
The books are different in their nature: one is about python programming, while the other is about statistical methods and uses the R programming language for hands-on examples and exercises; the first book is average quality overall (IMHO) and the second is absolutely amazing! However, I believe I can criticize the utility libraries that came with both books in the same manner: don't do this!

And why?

Installation process breaks conventions 

When I need an external tool in a python project I know I have pypi to rely on for finding packages. I have pip to easily install the package, and I prefer to work with virtualenv whenever possible. This set of tools helps me maintain a sane codebase and reduces the effort of managing dependencies on my own.
There is no chance that I will copy an external module into my project and source control it unless I have to, so why use this approach in an educational project in the first place?
I really don't know what the convention is for installing external R packages, but I believe that Kruschke's suggestion of sourcing his supplied scripts is not the proper way to do it (enlighten me if I'm wrong).
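For reference, the python workflow described above comes down to a few commands (the package name is an arbitrary example):

```shell
virtualenv env                 # create an isolated environment
. env/bin/activate             # activate it
pip install requests           # install a package from pypi
pip freeze > requirements.txt  # record the dependencies
```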

Package maintenance / code quality

Before installing an external package I tend to research the package's quality. The first thing is checking how many stars the package has on github and how many times it was downloaded from pypi.
And there is a reason behind it: packages that are used often tend to have better code quality, and through github I can browse the package's issues and latest commits to make sure that it is still maintained.
I'm sure that book authors invest a large amount of time in writing their utility libraries. But bug-free code doesn't exist, and I prefer to know that a codebase is maintained before I use it (again, without distinction between educational and "real" projects).

Not specific enough

If your utility library is a mix of different solutions for different problems, it might not be worth keeping in our toolbox. This is probably more relevant to Python in Practice than to Doing Bayesian Data Analysis, but I think it's still worth mentioning.

Documentation

When I choose a tool to work with I want its documentation to be top notch! Take django for example: the project's documentation is nothing less than perfect, including a great tutorial for beginners. I really don't want to go looking through the book when I'm interested in using some less obvious function from a utility library.


What I'm expecting from authors instead

  • If you think your utility functions are worth it, package them and publish them like any other package.
  • I really don't mind reading one or two additional pages of code in your book, if there's something interesting in them. Again, if the code deserves to be mentioned in your book, it probably also deserves to be discussed explicitly.
  • If the functionality exists elsewhere you should reference it and advise the reader to use it. I've never written code in R, but I was ready to learn how to work with its ecosystem. I expected Kruschke to teach me that, instead of showing me how to source his supplied scripts.

Late disclaimer

Don't get me wrong, supplying code as part of your book is great! But there are different ways to do it: David Beazley's Python Cookbook is full of code snippets, fully commented and explained; in Test-Driven Development with Python, Harry Percival guides the reader in developing a webapp with reference code available on github.
Don't get me wrong 2: The above doesn't mean that the books are bad.

Edit:

Don't miss Kruschke's comment below! He illuminates the above topics from a different angle and supplies great arguments for his decisions.