Installing leveldb system-wide
Licence Creative Commons

At Botify, we make extensive use of Elevator to allocate and store URL ids. Elevator relies on Google's LevelDB key-value store library, which was meant to be statically compiled and embedded in projects rather than shared system-wide. But you know... sometimes you just don't want to do what you're supposed to do.

Fortunately, most modern Unix systems provide libleveldb1 and libleveldb-dev packages. But when you're stuck on an older Linux distribution that you can't upgrade (production rules), you generally don't have access to these packages.

You might be tempted to download the packages and install them manually. You could try, and it might actually work. But if, like me, you run into a libc version dependency problem, here's a shell script you can use to install the LevelDB library system-wide.

#!/bin/sh

SANDBOX_DIR=/tmp/leveldb_install

## Bootstraps a sandbox dir
create_sandbox() {
    if [ ! -d "$SANDBOX_DIR" ]; then
        mkdir -p "$SANDBOX_DIR"
    fi
}

## Pulls and statically compiles the latest snappy version from the repo
compile_snappy() {
    cd $SANDBOX_DIR
    svn checkout http://snappy.googlecode.com/svn/trunk/ snappy-read-only

    cd snappy-read-only
    ./autogen.sh
    ./configure --enable-shared=no --enable-static=yes
    make clean
    make CXXFLAGS='-g -O2 -fPIC'
}

## Pulls and installs the latest leveldb version from the repo
compile_leveldb() {
    cd $SANDBOX_DIR
    git clone https://code.google.com/p/leveldb/ || (cd leveldb; git pull)

    cd leveldb
    make clean
    make LDFLAGS='-L../snappy-read-only/.libs/ -Bstatic -lsnappy -shared' OPT='-fPIC -O2 -DNDEBUG -DSNAPPY -I../snappy-read-only' SNAPPY_CFLAGS=''

    sudo cp -f $SANDBOX_DIR/leveldb/libleveldb.so* /usr/local/lib/
    sudo cp -rf $SANDBOX_DIR/leveldb/include/leveldb /usr/local/include/
}

create_sandbox
compile_snappy
compile_leveldb

## Destroy the sandbox
rm -rf $SANDBOX_DIR
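
Once the script has run, a quick way to check that the library is visible system-wide is to ask the dynamic linker for it. Here is a minimal sketch in Python, assuming a Linux box where the linker cache has been refreshed after the copy to /usr/local/lib (typically with sudo ldconfig). The helper name is made up for this example.

```python
import ctypes.util


def shared_library_available(name):
    """Return True if the dynamic linker can locate lib<name>."""
    # find_library returns the library file name, or None when nothing is found
    return ctypes.util.find_library(name) is not None


if __name__ == "__main__":
    # "leveldb" is the short name for libleveldb.so* on Linux
    print("leveldb found: %s" % shared_library_available("leveldb"))
```

If it reports False right after the install, the linker cache is the first thing I would look at.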

Javascript: Swap divs on media query using enquire.js

If you've experimented with the responsive features of this blog, you may have noticed that the sidebar, originally on the right side of the screen, moves to the top when the screen is resized to mobile size.

To do this, I needed to be able to swap the main container and the sidebar div.

<!-- TRANSFORMING THIS -->
<div id="narration"></div>
<div id="sidebar"></div>

<!-- INTO THIS -->
<div id="sidebar"></div>
<div id="narration"></div>

Unfortunately, CSS3 media queries weren't sufficient to achieve this on their own, and I couldn't play with the divs' float properties because I use Foundation and didn't want to risk messing with its native behavior.

The idea, then, was to swap the two divs using jQuery's .after() and .before() functions. To do so I used enquire.js, a very lightweight JavaScript library (1 kB gzipped) that lets you run JS functions when a media query starts or stops matching, that is, according to the window size.

The library exposes a very simple and intuitive API, so it was easy to find my way around:

<script>
$(function() {
    // register the media query to trigger on
    enquire.register("screen and (max-width:768px)", {

        // define the action whenever the media query matches
        match : function() {
            $("#narration").before( $("#sidebar") );
        },

        // define the action whenever the media query does not match anymore
        unmatch : function() {
            $("#narration").after( $("#sidebar") );
        },

        // OPTIONAL, Here you can set up some context (constructor-like)
        setup : function() {},
        // OPTIONAL, There you can unset your context
        destroy : function() {},
    }).listen(20); // Define the refresh rate in ms (triggering detection)
});
</script>

Tmux : Named sessions with autocomplete

Tmux is a great tool, but I usually find it hard to keep track of my sessions. After a bit of googling I found a trick that lets you name live tmux sessions and easily retrieve them (with autocomplete).

Here's the function (zsh):

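function tm() {
    [[ -z "$1" ]] && { echo "usage: tm <session>" >&2; return 1; }
    # Attach to the session if it already exists, otherwise create it
    if tmux has-session -t "$1" 2>/dev/null; then
        tmux attach -t "$1"
    else
        tmux new -s "$1"
    fi
}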

function __tmux-sessions() {
    local expl
    local -a sessions
    sessions=( ${${(f)"$(command tmux list-sessions)"}/:[ $'\t']##/:} )
    _describe -t sessions 'sessions' sessions "$@"
}
compdef __tmux-sessions tm

Example :

$ tm testsession1  # Launch a tmux session named testsession1
$ tm testsession2  # Launch a second tmux session named testsession2
$ tm <tab>  # List (autocomplete) existing sessions

Source

Python: container destructuration pattern

One of the functional programming patterns I enjoy and use the most is sequence destructuring, the kind you get with pattern matching, for example (Haskell has the (x:xs) syntactic sugar).

Destructuring is just a fancy name for slicing a list.

Most of the time you use it to get the head of a list and the rest of it separately. I found it lacking in Python (correct me if I'm wrong) when I was trying to do things like this:

commands = ([signal, key, value], ...)
for command in commands:
    signal, args = **destructuring command**

Of course, I could have used two lines like this:

signal = command.pop(0)
args = command

But it seemed important to me to be able to do it in one line, and without altering the original container. So here's my snippet for destructuring:

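try:
    from collections.abc import Sequence  # Python 3.3+
except ImportError:
    from collections import Sequence  # Python 2


class DestructurationError(Exception):
    """Raised when the container cannot be split into head and tail."""


def destructurate(container):
    """Returns (head, tail) of a sequence without modifying it."""
    if isinstance(container, Sequence):
        return container[0], container[1:]
    raise DestructurationError("Can't destructurate a non-sequence container")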

In my case, it is used like this:

>>> signal, args = destructurate(command)
>>> signal
signal
>>> args
[key, value]
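
For the record, Python 3 closes this gap natively with extended iterable unpacking (PEP 3132), so the helper is mostly useful on Python 2. A small sketch follows. The command values are placeholders invented for the example.

```python
# A command is a [signal, key, value] list, as in the loop above
command = ["PUT", "some_key", "some_value"]

# The starred name collects everything after the head into a new list
signal, *args = command

# The original container is left untouched
print(signal, args, command)
```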

Git: how to track branches activity

When using Git, you can lose a bit of focus on your project (read: "you have dozens of unmerged branches and can't remember what the hell they were for").

You and your team might find it useful to list all the branches ordered by their last activity. It gives a good idea of what's useful, what's used, and what is useless or even forgotten.

Command

git for-each-ref --sort=-committerdate --format='%(committerdate:short) %(refname)' refs/heads refs/remotes

Source
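
If you'd rather consume that listing from a script, the same command can be wrapped in Python. This is a rough sketch, assuming git is on the PATH and the script runs inside a repository. Both function names are made up for the example.

```python
import subprocess

GIT_COMMAND = [
    "git", "for-each-ref",
    "--sort=-committerdate",
    "--format=%(committerdate:short) %(refname)",
    "refs/heads", "refs/remotes",
]


def parse_refs(output):
    """Turn 'YYYY-MM-DD refname' lines into (date, refname) tuples."""
    refs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The short date never contains a space, so split on the first one only
        date, refname = line.split(" ", 1)
        refs.append((date, refname))
    return refs


def branches_by_activity():
    """Run git and return the branches, most recently committed first."""
    output = subprocess.check_output(GIT_COMMAND)
    return parse_refs(output.decode("utf-8"))
```

Keeping the parsing apart from the git call makes it easy to filter the result, for instance to keep only the branches untouched for months.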

Zsh : pretty print json in shell

I found Luis Nell's trick for pretty-printing JSON in a shell or code editor very useful, so I decided to port it into my zsh config as a function.

Since it might be useful to someone else, here's the code. It might work with bash and other shells with a few adjustments.

function pjson {
    # Each argument is either a path to a JSON file or a JSON string
    for arg in "$@"
    do
        if [ -f "$arg" ]; then
            python -m json.tool < "$arg"
        else
            echo "$arg" | python -m json.tool
        fi
    done
}

It can print either from a string passed on the command line or from a file, and both kinds of arguments can be mixed.

pjson '{"test": "test"}', pjson myjsonfile.json, or even pjson '{"test": "test"}' myjsonfile.json will all work.
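
Under the hood, python -m json.tool simply parses its input and dumps it back with indentation. The same logic can be sketched directly in Python. The function name and the indent and sort_keys choices are mine, not part of the original trick.

```python
import json
import os


def pretty_json(arg):
    """Pretty print a JSON document given as a file path or a raw string."""
    if os.path.isfile(arg):
        # The argument names an existing file: parse its content
        with open(arg) as json_file:
            data = json.load(json_file)
    else:
        # Otherwise treat the argument itself as JSON text
        data = json.loads(arg)
    return json.dumps(data, indent=4, sort_keys=True)


if __name__ == "__main__":
    print(pretty_json('{"test": "test"}'))
```

As in the shell function, an argument that is neither an existing file nor valid JSON ends in a parsing error.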

Pip : install a specific github repository tag or branch

I came across a rather simple problem lately: how to install a specific tag of one of my GitHub-hosted repos using pip. It can be very useful when you want to add a tag of one of your projects to a requirements.txt, for example.

Here's how you do it:

pip install -e git://github.com/{ username }/{ reponame }.git@{ tag name }#egg={ desired egg name }

Thanks to CodeIntel who posted the trick
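
To generate such lines for a requirements.txt, a tiny helper can do the formatting. This is a sketch only. The function name is hypothetical, and it emits the git+https:// form rather than the git:// one shown above, on the assumption that you are using a recent pip (which expects a git+ prefix on VCS URLs) and that anonymous git:// access is not available to you.

```python
def github_requirement(username, reponame, ref, egg):
    """Build an editable pip requirement pointing at a GitHub tag or branch."""
    return "-e git+https://github.com/{0}/{1}.git@{2}#egg={3}".format(
        username, reponame, ref, egg)


if __name__ == "__main__":
    # Placeholder values, replace with your own account, repo and tag
    print(github_requirement("someuser", "somerepo", "v1.0", "somerepo"))
```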

Sublime text 2 : useful python snippets

I find myself repeating the same operations dozens of times a day when coding in Python. So here are some of my Sublime Text 2 snippets to enhance my productivity and simplify my life.

Type pdb and press tab to automatically insert import pdb; pdb.set_trace() at the cursor position

<snippet>
    <content><![CDATA[
        import pdb; pdb.set_trace()
    ]]></content>
    <tabTrigger>pdb</tabTrigger>
    <scope>source.python</scope>
</snippet>

Type docstring and press tab to automatically insert a numpy-style valid docstring pattern right after your function definition

<snippet>
    <content><![CDATA[
    """${1:One liner description}

    Parameters
    ----------
    ${2}
    Returns
    -------
    ${3}
    """
    ]]></content>
    <tabTrigger>docstring</tabTrigger>
    <scope>source.python</scope>
    <description>Adds a docstring skeleton to function</description>
</snippet>

Type testcase and press tab to automatically insert a unittest.TestCase test class definition pattern

<snippet>
    <content><![CDATA[
    class ${1:ClassTestName}(unittest.TestCase):
        def setUp(self):
            ${2:pass}

        def tearDown(self):
            ${3:pass}

    ]]></content>
    <tabTrigger>testcase</tabTrigger>
    <scope>source.python</scope>
    <description>Adds a unittest TestCase skeleton at current pointer</description>
</snippet>

Git hook - prohibit a pip package installation

I recently came across a rather simple problem: my project mixes C++ and Python application code, with LevelDB as a common dependency. LevelDB was available on pip but its installation failed, so I had to compile its Python version from source. The problem: when freezing my virtualenv packages, the leveldb dependency showed up in the list, which meant that every time I tried to reinstall packages from my frozen requirements list, pip would fetch the broken version from PyPI.

Workaround

I'm using Git as my version control system, and it supports hooks, which bind scripts to Git actions. As a hook can be any kind of executable script, I wrote a pre-commit hook in Python. It is evaluated each time you run the git commit command, and checks whether packages marked as prohibited are present in my requirements.txt. Of course, if they are, the commit fails.

Note: requirements.txt has to be at the root of your repo, and package names have to be added to the hook manually.

#!/usr/bin/env python
# -*- coding: utf-8 -*-

import sys
import re

# packages marked as prohibited, add yours.
MANUAL_PACKAGES = dict.fromkeys([
    'leveldb',
])

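# Compile blacklisted packages regexp: the exact package name, not followed
# by a character that could still belong to a longer package name
for name in MANUAL_PACKAGES.keys():
    MANUAL_PACKAGES[name] = re.compile(r'^%s(?![\w.-])' % re.escape(name), re.IGNORECASE)

with open('requirements.txt', 'r') as requirements:
    for line in requirements:
        for r in MANUAL_PACKAGES.values():
            if r.match(line):
                sys.stderr.write("Error: requirements.txt contains prohibited packages references\n")
                sys.exit(1)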

How to make a twitter bot using python and redis

My girlfriend started a Tumblr not long ago. Its concept is very simple: it aggregates pictures, taken by her or sent in by people, of funny hairdresser names (there are a lot of these puns in French). The French speakers among you might be interested in visiting it here: Lolcoiffeurs.

As we launched a competition to encourage people to elect the best of all these "lolcoiffeurs", we wanted to create a Twitter bot to remind people of the competition. I had to find a clever way to do this. I chose Python, the python-twitter module and Redis to build a simple, light and balanced Twitter bot (it had to avoid being reported as spam).

Here are my notes on how to do it.

Twitter python api

First, you have to be aware that there are a lot of different Python implementations of the Twitter API. I have tested many of them, with more or less success; the one I selected is the most classic and "official" one: python-twitter. It has been actively developed and is frequently updated; plus, it implements OAuth authentication (the only method supported by Twitter today).

If you're not comfortable with it yet, go take a look at their repository (Google Code, yeah I know... nobody's perfect) and never hesitate to call help() on the Api methods and classes; the code is very well documented. It all comes down to three classes and a lot of plain methods, so you should need 20 to 30 minutes to feel at home with it. Once you're done, you'll need to register an app on the Twitter developer platform to get a consumer key/secret and an access token key/secret. Don't forget to request read/write access: your bot won't be able to tweet if you don't.

Redis

Another major prerequisite to run, and eventually re-use or customize, this bot is to install Redis and the Redis Python client. Redis, if you're not aware (so JCVD), is a NoSQL database system written by @antirez. It is a key/value store which implements very common persistent data structures like Strings, Lists, Hashes, Sets and Sorted Sets. The point here is not to convert you to the Redis religion, but you should know it embodies all my most secret C-Pythonist wishes... In a word: try it!

If you're using a Unix-like system ('cause if you're not, I don't really care about you), you should be able to easily find a Redis package for your distribution. And you can install the Python Redis client using pip:

pip install redis

Authenticate to twitter

Enough chattering, let's code... When you're done with the previous steps, you're ready to authenticate to Twitter.

import sys
from time import sleep

import redis
import twitter

API_CREDENTIALS = {
    'consumer_key': 'my consumer key',
    'consumer_secret': 'my consumer secret',
    'access_token_key': 'my access token key',
    'access_token_secret': 'my access token secret',
}

def auth_to_twitter(api_credentials):
    """Returns an authenticated twitter session api object"""
    api = twitter.Api(consumer_key=api_credentials['consumer_key'],
                      consumer_secret=api_credentials['consumer_secret'],
                      access_token_key=api_credentials['access_token_key'],
                      access_token_secret=api_credentials['access_token_secret'])

    if api.VerifyCredentials() is not None:
        return api

    return None

Fill the API_CREDENTIALS dict with your Twitter application's information and pass it to the auth_to_twitter function, and you will get an Api object instance (if your credentials are correct, of course ;)). This object gives you access to all the common Twitter operations, like GetPublicTimeline, GetUserTimeline, PostUpdate or the one we're interested in: GetSearch. This Api method asks Twitter to serve us the results of a classic search query using their engine. Once again, I encourage you to use the help function on the Api, Status or User classes of the twitter package to get more information.

Redis over Python

Now that we're able to authenticate to Twitter and get a valid Api object allowing us to query their search engine, it's time for an overview of the bot as a whole. Basically, the bot's main function performs three very basic operations that need no explanation: authenticate, connect to the Redis database and run the bot operations. That's it.

  • Main Function

    LOLCOIFFEURS_KEYWORDS = ('coiffeur', 'coiffeurs')
    
    if __name__ == "__main__":
        twitter_session = auth_to_twitter(API_CREDENTIALS)
        redis = redis.Redis("localhost")
        run_bot(twitter_session, redis, LOLCOIFFEURS_KEYWORDS)
    
  • run_bot Function

    def run_bot(api_session, redis, keywords):
        """
        Runs the lolcoiffeurs bot. Manages the redis keywords search stack
        and the last replied id (so as not to query already answered tweets,
        and to save API calls), and posts an update for each found tweet.
        """
        while 42:
            for key in keywords:
                update_search_stack(api_session, redis, key)
                # Updates the stored last since id with the last found tweet one
                update_since_id(redis, "%s:%s" % (LOLCOIFFEURS_LIST, key))
                tweet_and_shout(api_session, redis, key, timeout=1)
            # Wait for six hours before new update and reply
            sleep(6 * 60 * 60)
    

The really interesting part actually lies in the operations performed by the run_bot function. So what is our bot going to do? First of all, it loops through all the LOLCOIFFEURS_KEYWORDS, and for each of them calls three functions:

  • update_search_stack

    def get_since_id(redis, key):
        """Tries to retrieve a keyword stored last_since_id"""
        # Fetching last value in keyword redis place
        since_id = redis.get(key + ":last_since_id")
    
        return since_id if since_id else None
    
    def update_search_stack(api_session, redis, keyword):
        """
        Searches for a specific term on twitter public timeline,
        right pushes found tweets in redis "lolcoiffeurs:keyword" List,
        adds each tweet information in redis "lolcoiffeurs:tweet:%s" % tweet_id, Hash
        """
        # Storing last fetched id in order to make fewer requests
        since_id = get_since_id(redis, "%s:%s" % (LOLCOIFFEURS_LIST, keyword))
        search_tweet = api_session.GetSearch(term=keyword, since_id=since_id)
    
        for t in search_tweet:
            computed_tweet = {
                "keyword": keyword,
                "username": t.user.screen_name,
                "created_at": t.created_at,
                "text": t.text,
            }
            sys.stdout.write("adding tweet with id %s by user %s to database\n" % (str(t.id), str(t.user.screen_name)))
            if (computed_tweet["username"] not in BLACKLISTED_USERS):
                redis.rpush((LOLCOIFFEURS_LIST + ":%s" % (keyword)), t.id)
                redis.hmset("%s:tweet:%s" % (LOLCOIFFEURS_LIST, t.id), computed_tweet)
    
        return
    

This function calls the Twitter API's GetSearch method with a keyword to look for, and stores the results in two different ways. First, for every retrieved tweet, we store its content and metadata in a hash that lives under its own Redis key ("lolcoiffeurs:tweet:the_retrieved_tweet_id"). That way, if we need for any reason (debugging, for example) to retrieve a specific tweet's information, we can query it directly from Redis using only its id. Then, we right-push every retrieved tweet id onto a list (which we will use as a FIFO).
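
To make the FIFO idea concrete without a running Redis server, here is an in-memory stand-in using collections.deque, where append plays the role of RPUSH and popleft the role of LPOP. It is only an illustration of the queue discipline, not the bot's code, and the tweet ids are invented for the example.

```python
from collections import deque

# Stand-in for the redis list holding one keyword's tweet ids
tweet_ids = deque()

# RPUSH equivalent: new ids are appended on the right
for tweet_id in (101, 102, 103):
    tweet_ids.append(tweet_id)

# LPOP equivalent: ids are consumed from the left,
# so they come out in the order they were pushed (first in, first out)
processed = []
while tweet_ids:
    processed.append(tweet_ids.popleft())

print(processed)
```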

  • update_since_id

    def update_since_id(redis, key):
        """Updates the last_since_id stored for a keyword"""
        stored_since_id = redis.get(key + ":last_since_id")
    
        try:
            last_tweet_id = str(redis.lrange(key, 0, 0)[0])
        except IndexError:
            last_tweet_id = None
    
        if last_tweet_id and (last_tweet_id != stored_since_id):
            redis.set(key + ":last_since_id", last_tweet_id)
    
        return
    

An important part of the whole bot lies in the last_since_id management. We're not going to dwell on it but, basically, for each keyword we keep a value up to date in Redis containing the last id we've replied to, in order, first, not to answer the same tweet multiple times and, second, to save some precious API calls. As you might have noticed, since_id was already read in update_search_stack (via get_since_id).

  • tweet_and_shout

    RESPONSE = ("Les coiffeurs sont plus rigolos qu'ils n'en ont l'hair! "
                "toi aussi vote pour tes lolcoiffeurs preferes! http://bit.ly/uiu8BZ")
    
    def tweet_and_shout(api_session, redis, key, timeout=600):
        """
        Replies to found tweets. Store their ids in a "answered" set.
        Pops out element from the keyword tweets list, in order to keep the
        buffer clean.
        """
        for tweet_id in redis.lrange("%s:%s" % (LOLCOIFFEURS_LIST, key), 0, -1):
            tweet_dict = redis.hgetall("%s:tweet:%s" % (LOLCOIFFEURS_LIST, tweet_id))
    
            # Tracking answered tweets in a set
            redis.sadd((LOLCOIFFEURS_LIST + ":%s:answered" % (key)), tweet_id)
            # Posting reply update
            api_session.PostUpdate("@%s %s" % (tweet_dict["username"], RESPONSE), in_reply_to_status_id=tweet_id)
            # Popping out element from the left of the list
            # as we answer it (lrange walks the list from the left)
            redis.lpop("%s:%s" % (LOLCOIFFEURS_LIST, key))
    
            # Wait timeout before replying again
            sleep(timeout)
    
        return
    

Here comes the replying process. For each stored tweet, we retrieve its metadata hash from Redis (we stored it under the "lolcoiffeurs:tweet:the_tweet_id" key, remember?), store its id in an "already answered tweets" set, post a reply to it, prefixed by "@the_user_who_originally_posted_the_tweet", and pop its id out of the "lolcoiffeurs:keyword" list. That way, we keep track of which tweets we've answered and keep our keyword stack clean at the same time.

In the end

That's it: a fully capable Twitter bot. You can customize it to use your own keywords or expressions, and set the response you'd like to make to those tweets. Note that this is a totally destructured presentation of the bot; you can find the whole code on Gist, or download it.

Note

Any bug found? Any suggestion? Don't love my Frenchy English? Comment below, thanks!

Installing OwnCloud using nginx and postgresql

As you might already know, OwnCloud 2.0 has been released. It is a wonderfully useful private Dropbox-like application. You set it up on a web host and voilà, you can use it for file hosting, music streaming, and calendar and contacts management. Kind of magic, isn't it? (I can see those little stars in your eyes.)

But when the time came to install it on my private Debian server running Nginx, PostgreSQL and PHP5-FPM: what a pain!

Indeed, like many open-source PHP projects, OwnCloud was first designed for the Apache/MySQL couple. Though the OwnCloud developers made the effort of using PDO to bring more genericity and abstraction to their database management, and wrote some (good) documentation about how to set it up with Nginx, it gave me some headaches trying to make it work with "uncommon" versions of these tools.

Note: I'm assuming here that you're already using Nginx, PostgreSQL and PHP5-FPM, and are comfortable with their configuration.

Uncommon solutions for uncommon problems

The Nginx configuration examples in the documentation only worked with the official php5, not the -fpm fork. Installation errors sent me down the wrong path for a long time, searching where the problem definitely was not: the database. It was instead a file permissions/ownership problem. In the following steps, I assume you already have Nginx, PostgreSQL and PHP5-FPM up and running on your Unix system.

Database flirting with filesystems, it's always a matter of rights

Following the install steps described in the OwnCloud documentation, I ran into problems with file permissions/ownership which weren't obvious at all. Indeed, everything went fine until I clicked the "start installation" button. OwnCloud kept throwing me an error that looked damn well like a DB one:

PHP Warning:  pg_query(): Query failed: ERROR:  relation "users" does not exist
LINE 1: SELECT * FROM users
                      ^ in /var/www/owncloud/lib/setup.php on line 182

I googled a lot without success, until I thought it might be caused by a wrong package version or something similar. I restored the code from the official OwnCloud 2.0 tarball and, this time, successfully set the permissions and group of the data/ and config/ folders by strictly following the documentation's recommendations:

$ cd my_owncloud_install_dir
$ mkdir data
# chown :www-data data
$ chmod 770 data
# chown :www-data config
$ chmod g+w config
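
Since the error message says nothing about permissions, a small check can save some time. Here is a sketch in Python, assuming it is run from the OwnCloud install directory and that the web server user belongs to the directories' group (www-data above). The helper name is made up.

```python
import os
import stat


def is_group_writable(path):
    """Return True if the group permission bits of path include write access."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IWGRP)


if __name__ == "__main__":
    for directory in ("data", "config"):
        if os.path.isdir(directory):
            print("%s group-writable: %s" % (directory, is_group_writable(directory)))
        else:
            print("%s is missing" % directory)
```

It only inspects the permission bits; checking that the group really is www-data is left to ls -l.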

Nginx won't talk to php if they're not properly introduced to each other

As the Nginx configuration examples were clearly not compatible with PHP5-FPM, I had to tweak them a little bit. Note that a prerequisite for the following Nginx configuration is to enable HTTPS connection handling.

To do this, I'll refer you to the amazingly simple and clear Nginx doc section. Once you're ready and your Nginx SSL certs are in /usr/local/nginx/certs, you can use this kind of configuration for your domain:

upstream backend {
    server 127.0.0.1:9000;
}

server {
    listen                              443;
    server_name                         my_domain;
    root                                /path/to/owncloud;

    access_log                          /var/log/nginx/my_access_log.log;
    error_log                           /var/log/nginx/my_error_log.log info;

    ssl                                 on;
    ssl_certificate                     /usr/local/nginx/conf/server.crt;
    ssl_certificate_key                 /usr/local/nginx/conf/server.key;

    keepalive_timeout                   70;

    client_max_body_size                1000M;

    dav_methods                         PUT DELETE MKCOL COPY MOVE;
    create_full_put_path                on;
    dav_access                          user:rw group:rw all:r;

    location / {
       index index.html index.htm index.php;
    }

    location ~ \.php$ {
        fastcgi_split_path_info         ^(.+\.php)(.*)$;
        fastcgi_pass                    backend;
        fastcgi_index                   index.php;
        fastcgi_param                   SCRIPT_FILENAME /path/to/owncloud$fastcgi_script_name;
        include                         fastcgi_params;
        fastcgi_intercept_errors        on;
        fastcgi_ignore_client_abort     off;
        fastcgi_connect_timeout         60;
        fastcgi_send_timeout            180;
        fastcgi_read_timeout            180;
        fastcgi_buffer_size             128k;
        fastcgi_buffers                 4 256k;
        fastcgi_busy_buffers_size       256k;
        fastcgi_temp_file_write_size    256k;
    }

    location /owncloud {
        index  index.php;
        try_files $uri $uri/ @webdav;
    }

    location @webdav {
        fastcgi_split_path_info         ^(.+\.php)(/.+)$;
        fastcgi_param                   HTTPS on;
        fastcgi_pass                    backend;
        fastcgi_index                   index.php;
        fastcgi_param                   SCRIPT_FILENAME /path/to/owncloud$fastcgi_script_name;
        include                         fastcgi_params;
        fastcgi_intercept_errors        on;
        fastcgi_ignore_client_abort     off;
        fastcgi_connect_timeout         60;
        fastcgi_send_timeout            180;
        fastcgi_read_timeout            180;
        fastcgi_buffer_size             128k;
        fastcgi_buffers                 4 256k;
        fastcgi_busy_buffers_size       256k;
        fastcgi_temp_file_write_size    256k;

    }

}

You're done

You're now ready to run OwnCloud using Nginx, PostgreSQL and PHP5-FPM. Just restart both services:

# /etc/init.d/nginx restart
# /etc/init.d/php5-fpm restart

And you should be able to start a clean and successful installation process from your domain URL.

Tip: by default, PHP5-FPM will not handle file uploads over 2 MB. Try increasing post_max_size and upload_max_filesize in your php.ini.