At Botify, we make extensive use of Elevator to allocate and
store URL ids. Elevator relies on Google's LevelDB key-value store library, which was meant to be statically compiled
and embedded in projects rather than shared system wide. But you know... Sometimes you just don't want to do
what you're supposed to do...
Fortunately, most modern Unix systems expose libleveldb1 and libleveldb-dev packages. But whenever
you're stuck on a Linux distribution that is a bit too old and that you're unable to upgrade (production rules), you
generally don't have access to these packages.
You might be tempted to download the packages and install them manually. Okay. You could try, and
it might actually work. But still, if just like me you run into a libc version dependency problem,
here's a shell script you can use to install the leveldb library system wide.
#!/bin/sh
SANDBOX_DIR=/tmp/leveldb_install

## Bootstraps a sandbox dir
create_sandbox() {
    if [ ! -d "$SANDBOX_DIR" ]
    then
        mkdir -p "$SANDBOX_DIR"
    fi
}

## Pulls and statically compiles last snappy version from repo
compile_snappy() {
    cd "$SANDBOX_DIR"
    svn checkout http://snappy.googlecode.com/svn/trunk/ snappy-read-only
    cd snappy-read-only
    ./autogen.sh
    ./configure --enable-shared=no --enable-static=yes
    make clean
    make CXXFLAGS='-g -O2 -fPIC'
}

## Pull and install last leveldb version from repo
compile_leveldb() {
    cd "$SANDBOX_DIR"
    git clone https://code.google.com/p/leveldb/ || (cd leveldb; git pull)
    cd leveldb
    make clean
    make LDFLAGS='-L../snappy-read-only/.libs/ -Bstatic -lsnappy -shared' OPT='-fPIC -O2 -DNDEBUG -DSNAPPY -I../snappy-read-only' SNAPPY_CFLAGS=''
    sudo cp -f "$SANDBOX_DIR"/leveldb/libleveldb.so* /usr/local/lib/
    sudo cp -rf "$SANDBOX_DIR"/leveldb/include/leveldb /usr/local/include/
}

create_sandbox
compile_snappy
compile_leveldb

## Destroy the sandbox
rm -rf "$SANDBOX_DIR"
If you've experimented with the responsive features of this blog, you may have noticed that the sidebar,
originally on the right side of the screen, moves to the top when the screen is resized to mobile size.
To do this, I needed to be able to swap the main container and the sidebar div.
<!-- TRANSFORMING THIS -->
<div id="narration"></div>
<div id="sidebar"></div>
<!-- INTO THIS -->
<div id="sidebar"></div>
<div id="narration"></div>
Unfortunately, CSS3 media queries weren't sufficient to achieve this on their own, and I was unable to play with the divs' float properties, as I use the Foundation framework and didn't want to risk messing with its native behavior.
The idea, then, was to swap the two divs using the jQuery .after() and .before() functions. To trigger the swap, I used Enquire.js, a very lightweight JavaScript library (1 kB gzipped) that lets you call JavaScript functions when a media query starts or stops matching, that is, according to the window size.
The lib exposes a very simple and intuitive API, so it was quite easy to find my way around:
<script>
$(function() {
    // register the media query to trigger on
    enquire.register("screen and (max-width:768px)", {
        // define the action whenever the media query matches
        match : function() {
            $("#narration").before( $("#sidebar") );
        },
        // define the action whenever the media query does not match anymore
        unmatch : function() {
            $("#narration").after( $("#sidebar") );
        },
        // OPTIONAL, Here you can set up some context (constructor-like)
        setup : function() {},
        // OPTIONAL, There you can unset your context
        destroy : function() {}
    }).listen(20); // Define the refresh rate in ms (triggering detection)
});
</script>
Tmux is a great tool, but I usually find it hard to keep track of my sessions. So after a bit of googling I found a trick which lets you name, and easily retrieve, live tmux sessions (with autocomplete).
Here's the function (zsh):
function tm() {
    [[ -z "$1" ]] && { echo "usage: tm <session>" >&2; return 1; }
    tmux has -t $1 && tmux attach -t $1 || tmux new -s $1
}

function __tmux-sessions() {
    local expl
    local -a sessions
    sessions=( ${${(f)"$(command tmux list-sessions)"}/:[ $'\t']##/:} )
    _describe -t sessions 'sessions' sessions "$@"
}

compdef __tmux-sessions tm
Example :
$ tm testsession1 # Launch a tmux session named testsession1
$ tm testsession2 # Launch a second tmux session named testsession2
$ tm <tab> # List (autocomplete) existing sessions
Source
One of the functional programming patterns I enjoy and use the most is sequence destructuring, for example when you use pattern matching (Haskell has the (x:xs) syntactic sugar for it).
Destructuring is just a fancy name for slicing a list.
Most of the time you use it to get the head of your list and the rest of it separately. And I found it lacking in Python (correct me if I'm wrong) when I was trying to do such things:
commands = ([signal, key, value], ...)
for command in commands:
signal, args = **destructuring command**
Of course I could have used two lines like this :
signal = command.pop(0)
args = command
But it felt important to me to be able to do it in one line, and without altering the original container. So here's my snippet for destructuring:
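try:
    from collections.abc import Sequence  # Python 3.3+
except ImportError:
    from collections import Sequence  # Python 2


class DestructurationError(Exception):
    pass


def destructurate(container):
    """Returns the head of a sequence and the rest of it, without altering it"""
    if isinstance(container, Sequence):
        return container[0], container[1:]
    raise DestructurationError("Can't destructurate a non-sequence container")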
Which, in my case, you can use like this:
>>> signal, args = destructurate(command)
>>> signal
signal
>>> args
[key, value]
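For what it's worth, Python 3 can do this natively: extended iterable unpacking (PEP 3132) lets a starred name collect the rest of the sequence. A minimal sketch, where the command values are made up for illustration:

```python
# Hypothetical command, shaped like the [signal, key, value] items above
command = ["SET", "key", "value"]

# The starred name collects everything after the head into a new list
signal, *args = command

# The original container is left untouched
untouched = command == ["SET", "key", "value"]
```

Note that args is always a list here, even when command is a tuple, whereas slicing with container[1:] keeps the original sequence type.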
When using Git, you sometimes lose a little bit of focus in your project (read: "you have dozens of unmerged branches and you don't remember what the hell they were for").
You and your team may find it useful to list all the branches, ordered by their last activity. It gives a good idea of what's useful, what's used, and what is useless or even forgotten.
Command
git for-each-ref --sort=-committerdate --format='%(committerdate:short) %(refname)' refs/heads refs/remotes
Source
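If you want to reuse that listing from a script, here is a rough Python sketch. It runs the same command through subprocess and splits each output line into a date and a ref name. The function names and the repo_dir parameter are my own, for illustration only.

```python
import subprocess

# Same command as above, split into an argument list for subprocess
GIT_COMMAND = [
    "git", "for-each-ref",
    "--sort=-committerdate",
    "--format=%(committerdate:short) %(refname)",
    "refs/heads", "refs/remotes",
]


def parse_refs(output):
    """Turns the command output into a list of (date, refname) tuples."""
    refs = []
    for line in output.splitlines():
        line = line.strip()
        if not line:
            continue
        # The date comes first, the ref name is the rest of the line
        date, refname = line.split(" ", 1)
        refs.append((date, refname))
    return refs


def list_branches_by_activity(repo_dir="."):
    """Runs git in repo_dir (hypothetical parameter) and parses its output."""
    output = subprocess.check_output(GIT_COMMAND, cwd=repo_dir)
    return parse_refs(output.decode("utf-8"))
```

The order is the one git returns, most recently committed first, so nothing is sorted again on the Python side.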
I found Luis Nell's trick to easily pretty print JSON in a shell or a code editor very useful, so I decided to port it to my zsh config as a function.
As it might be useful to someone else, here's the code. It might be compatible with bash and other shells with a few adjustments.
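function pjson {
    if [ $# -gt 0 ];
    then
        for arg in "$@"
        do
            if [ -f "$arg" ];
            then
                # the argument is a file: feed it directly to json.tool
                python -m json.tool < "$arg"
            else
                # otherwise treat the argument as a json string
                echo "$arg" | python -m json.tool
            fi
        done
    fi
}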
It can print either from a string passed on the command line or from a file, and both kinds of arguments can be mixed.
pjson '{"test": "test"}'
or
pjson myjsonfile.json
or even
pjson '{"test": "test"}' myjsonfile.json will work.
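For the record, the same idea can be sketched directly in Python with the json module, which is what python -m json.tool relies on. The pretty_json name is mine, and the 4-space indent is an assumption chosen to look like the json.tool output.

```python
import json
import os


def pretty_json(arg):
    """Pretty prints arg, either a path to a JSON file or a JSON string."""
    if os.path.isfile(arg):
        # The argument is an existing file: parse its content
        with open(arg) as json_file:
            data = json.load(json_file)
    else:
        # Otherwise treat the argument as a JSON string
        data = json.loads(arg)
    return json.dumps(data, indent=4)


if __name__ == "__main__":
    print(pretty_json('{"test": "test"}'))
```

As with the shell function, invalid JSON simply raises an error instead of being printed.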
I came across a rather simple problem lately: how to install a specific tag of one of my GitHub-hosted repos using pip.
It can be very useful when you want to add a tag of one of your projects to a requirements.txt, for example.
Here's how you do it:
pip install -e git+https://github.com/{ username }/{ reponame }.git@{ tag name }#egg={ desired egg name }
Thanks to CodeIntel who posted the trick
I find myself repeating the same operations dozens of times a day when coding in Python.
So here are some of my Sublime Text 2 snippets to enhance my productivity and simplify my life.
Type pdb and press tab to automatically insert import pdb; pdb.set_trace() at the cursor position
<snippet>
<content><![CDATA[
import pdb; pdb.set_trace()
]]></content>
<tabTrigger>pdb</tabTrigger>
<scope>source.python</scope>
</snippet>
Type docstring and press tab to automatically insert a numpy-style valid
docstring pattern right after your function definition
<snippet>
<content><![CDATA[
"""${1:One liner description}
Parameters
----------
${2}
Returns
-------
${3}
"""
]]></content>
<tabTrigger>docstring</tabTrigger>
<scope>source.python</scope>
<description>Adds a docstring skeleton to function</description>
</snippet>
Type testcase and press tab to automatically insert a unittest.TestCase test class definition pattern
<snippet>
<content><![CDATA[
class ${1:ClassTestName}(unittest.TestCase):
    def setUp(self):
        ${2:pass}

    def tearDown(self):
        ${3:pass}
]]></content>
<tabTrigger>testcase</tabTrigger>
<scope>source.python</scope>
<description>Adds a unittest TestCase skeleton at current pointer</description>
</snippet>
I recently came across a rather simple problem: my project mixes
some C++ and Python application code which share leveldb as a common dependency.
LevelDB was available through pip, but its installation failed, and I had to compile
its Python version from source. Problem: when freezing my virtualenv packages, the leveldb
dependency appeared in the list, which means that every time I tried to reinstall packages
from my frozen environment packages list, pip would fetch the broken version from PyPI.
Workaround
I'm using git as a version control system, and it lets you bind scripts to git actions
through hooks. As hooks can be any kind of executable script,
I wrote a pre-commit hook in Python which is run each time you run the git commit command,
and checks whether packages marked as prohibited are present in my requirements.txt. Of course, if they are, the commit fails.
Nota: requirements.txt has to be at the root of your repo, and package names have to be added
to the hook manually.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
import sys
import re

# packages marked as prohibited, add yours.
MANUAL_PACKAGES = dict.fromkeys([
    'leveldb',
])

# Compile blacklisted packages regexp: the name must be followed by a
# version specifier, extras, a marker, a comment or the end of the line
for name in list(MANUAL_PACKAGES):
    MANUAL_PACKAGES[name] = re.compile(r'^%s\s*([=<>!~\[;#]|$)' % re.escape(name), re.IGNORECASE)

with open('requirements.txt', 'r') as requirements:
    for line in requirements:
        for r in MANUAL_PACKAGES.values():
            if r.match(line):
                print("Error : requirements.txt contains prohibited packages references")
                sys.exit(1)
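The check performed by the hook can also be isolated in a small function, which makes it easy to try out without touching git. This is only a sketch: the find_prohibited name and the exact pattern are my own choices. The pattern requires the package name to be followed by a version specifier, extras, a marker, a comment or the end of the line, so that a package such as leveldb-tools (a made-up name) is not flagged by mistake.

```python
import re


def find_prohibited(requirement_lines, prohibited):
    """Returns the prohibited package names found in requirement_lines."""
    found = []
    for name in prohibited:
        # Anchor on the start of the line and on what may follow the name
        pattern = re.compile(
            r'^%s\s*([=<>!~\[;#]|$)' % re.escape(name), re.IGNORECASE)
        if any(pattern.match(line.strip()) for line in requirement_lines):
            found.append(name)
    return found


if __name__ == "__main__":
    lines = ["redis==2.4.9\n", "leveldb==0.1\n"]
    print(find_prohibited(lines, ["leveldb"]))
```

A hook would then exit with a non-zero status whenever the returned list is not empty.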
My girlfriend started a tumblr not so long ago. Its concept is very simple: it aggregates pictures, taken by her or sent by other people, of funny hairdresser names (there are a lot of these in French). The French speakers among you might be interested in visiting it here: Lolcoiffeurs.
As we launched a competition to encourage people to elect the best of all these "lolcoiffeurs", we wanted to create a twitter bot to remind people of this competition. I had to find a clever way to do this. I chose Python, the python-twitter module and Redis to build a simple, light and balanced twitter bot (it had to avoid being reported as spam).
Here are my notes on how to do it.
Twitter python api
First, be aware that there are a lot of different Python implementations of the Twitter API. I have tested many of them, with more or less success; the one I selected is the most classic and "official" one: python-twitter. It is actively developed and frequently updated; plus, it implements OAuth authentication (the only scheme supported by Twitter today).
If you're not comfortable with it yet, go take a look at their repository (Google Code, yeah I know... nobody's perfect) and never hesitate to call help() on the Api methods and classes: the code is very well documented. It all boils down to three classes and a lot of plain methods; 20 to 30 minutes should be enough to feel at home with it.
Once you're done, you'll need to register an app on the Twitter developer platform in order to get a consumer key/secret and an access token key/secret. Don't forget to request read/write access: your bot won't be able to tweet without it.
Redis
Another major prerequisite to run, and eventually re-use or customize, this bot is to install redis and the redis Python client. Redis, if you're not aware (So JCVD), is a NoSQL database system written by @antirez. It is a key/value store which implements very common persistent data structures like Strings, Lists, Hashes, Sets and Sorted Sets. The point here is not to convert you to the redis religion, but you have to know it embodies all my most secret C-Pythonist wishes... In a word: try it!
If you're using a Unix-like system ('cause if you're not, I don't really care about you), you should be able to easily find a redis package for your distribution. And you can install the Python redis client using pip:
pip install redis
Authenticate to twitter
Enough chattering, let's code... Once you're done with the previous steps, you're ready to authenticate to twitter.
import twitter

API_CREDENTIALS = {
    'consumer_key': 'my consumer key',
    'consumer_secret': 'my consumer secret',
    'access_token_key': 'my access token key',
    'access_token_secret': 'my access token secret',
}


def auth_to_twitter(api_credentials):
    """Returns an authenticated twitter session api object"""
    api = twitter.Api(consumer_key=api_credentials['consumer_key'],
                      consumer_secret=api_credentials['consumer_secret'],
                      access_token_key=api_credentials['access_token_key'],
                      access_token_secret=api_credentials['access_token_secret'])
    if api.VerifyCredentials() is not None:
        return api
    return None
By filling the API_CREDENTIALS dict with your twitter application information and passing it to the auth_to_twitter function, you get an Api object instance (if your credentials are correct, of course ;)). This object gives you access to all the common twitter operations, like GetPublicTimeline, GetUserTimeline, PostUpdate, or the one we're interested in: GetSearch. This Api method asks twitter for the results of a classic search query run on their engine.
Once again, I encourage you to use the help function on the Api, Status, or User classes from the twitter package to get more information.
Redis over Python
Now that we're able to authenticate to twitter and to get a valid Api object allowing us to query their search engine, it's time to get an overview of the whole bot. Basically, the bot's main function performs three very basic operations which need no explanation: authenticate, connect to the redis database, and run the bot operations. That's it.
-
Main Function
import redis

LOLCOIFFEURS_KEYWORDS = ('coiffeur', 'coiffeurs')

if __name__ == "__main__":
    twitter_session = auth_to_twitter(API_CREDENTIALS)
    # named redis_client so that it does not shadow the redis module
    redis_client = redis.Redis("localhost")
    run_bot(twitter_session, redis_client, LOLCOIFFEURS_KEYWORDS)
-
run_bot Function
from time import sleep


def run_bot(api_session, redis, keywords):
    """
    Runs the lolcoiffeurs bot. Manages the redis keywords search stack,
    last replied id in order not to query already answered tweets and
    save API calls, and postupdate for each found tweet.
    """
    while 42:
        for key in keywords:
            update_search_stack(api_session, redis, key)
            # Updates the stored last since id with the last found tweet one
            update_since_id(redis, "%s:%s" % (LOLCOIFFEURS_LIST, key))
            tweet_and_shout(api_session, redis, key, timeout=1)
        # Wait for six hours before new update and reply
        sleep(6 * 60 * 60)
The really interesting part actually resides in the operations made by the run_bot function. So what is our bot going to do?
First of all, it will loop over all the LOLCOIFFEURS_KEYWORDS, and for each of them use three functions:
-
update_search_stack
import sys


def get_since_id(redis, key):
    """Tries to retrieve a keyword stored last_since_id"""
    # Fetching last value in keyword redis place
    since_id = redis.get(key + ":last_since_id")
    return since_id if since_id else None


def update_search_stack(api_session, redis, keyword):
    """
    Searches for a specific term on twitter public timeline,
    right pushes found tweets in redis "lolcoiffeurs:keyword" List,
    adds each tweet information in redis "lolcoiffeurs:tweet:%s" % tweet_id, Hash
    """
    # Storing last fetched id in order to make fewer requests
    since_id = get_since_id(redis, "%s:%s" % (LOLCOIFFEURS_LIST, keyword))
    search_tweet = api_session.GetSearch(term=keyword, since_id=since_id)
    for t in search_tweet:
        computed_tweet = {
            "keyword": keyword,
            "username": t.user.screen_name,
            "created_at": t.created_at,
            "text": t.text,
        }
        sys.stdout.write("adding tweet with id %s by user %s to database\n" % (str(t.id), str(t.user.screen_name)))
        if computed_tweet["username"] not in BLACKLISTED_USERS:
            redis.rpush((LOLCOIFFEURS_LIST + ":%s" % (keyword)), t.id)
            redis.hmset("%s:tweet:%s" % (LOLCOIFFEURS_LIST, t.id), computed_tweet)
    return
This function calls the twitter api GetSearch method with a keyword to search for, and stores the results in two different ways.
First, for every retrieved tweet, we store its content and metadata in a hash which lives under its own redis key ("lolcoiffeurs:tweet:the_retrieved_tweet_id"). That way, if we need for any reason (debugging, for example) to get the details of a specific retrieved tweet, we can query them directly from redis using only its id.
Then, we right push every retrieved tweet id to a list (which we will use as a FIFO).
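To make these two storage shapes concrete, here is a small in-memory sketch using plain Python structures as stand-ins for redis: a dict plays the role of the per-tweet hashes, and a deque plays the role of the keyword list. Nothing here talks to redis, and the names are made up for the illustration.

```python
from collections import deque

tweets = {}        # stands in for the "lolcoiffeurs:tweet:<id>" hashes
pending = deque()  # stands in for the "lolcoiffeurs:<keyword>" list


def store_tweet(tweet_id, username, text):
    """Stores the tweet details under its id and queues the id."""
    tweets[tweet_id] = {"username": username, "text": text}
    pending.append(tweet_id)  # same effect as a right push (RPUSH)


def next_tweet():
    """Takes the oldest queued id (like LPOP) and returns it with its details."""
    tweet_id = pending.popleft()
    return tweet_id, tweets[tweet_id]
```

Pushing on the right and popping from the left is what makes the list behave as a FIFO: ids come out in the order they were stored, while the details stay reachable by id.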
-
update_since_id
def update_since_id(redis, key):
    """Updates a keyword last_since_id stored"""
    stored_since_id = redis.get(key + ":last_since_id")
    try:
        last_tweet_id = str(redis.lrange(key, 0, 0)[0])
    except IndexError:
        last_tweet_id = None
    if last_tweet_id and (last_tweet_id != stored_since_id):
        redis.set(key + ":last_since_id", last_tweet_id)
    return
An important part of the whole bot resides in the last_since_id management. We're not gonna dwell on it, but basically, for each keyword we keep a value in redis holding the id of the last tweet we've replied to. The goal is, first, not to answer the same tweet several times and, second, to save some precious Api calls. As you might have noticed, since_id was already read in update_search_stack (through get_since_id).
-
tweet_and_shout
RESPONSE = """Les coiffeurs sont plus rigolos qu'ils n'en ont l'hair! \
toi aussi vote pour tes lolcoiffeurs preferes! http://bit.ly/uiu8BZ"""


def tweet_and_shout(api_session, redis, key, timeout=600):
    """
    Replies to found tweets. Store their ids in a "answered" set.
    Pops out element from the keyword tweets list, in order to keep the
    buffer clean.
    """
    for tweet_id in redis.lrange("%s:%s" % (LOLCOIFFEURS_LIST, key), 0, -1):
        tweet_dict = redis.hgetall("%s:tweet:%s" % (LOLCOIFFEURS_LIST, tweet_id))
        # Tracking answered tweets in a set
        redis.sadd((LOLCOIFFEURS_LIST + ":%s:answered" % (key)), tweet_id)
        # Posting reply update
        api_session.PostUpdate("@%s %s" % (tweet_dict["username"], RESPONSE), in_reply_to_status_id=tweet_id)
        # Popping out element from the left of the list
        # as we answer it
        redis.lpop("%s:%s" % (LOLCOIFFEURS_LIST, key))
        # Wait timeout before replying again
        sleep(timeout)
    return
Here goes the replying process. For each stored tweet, we retrieve its metadata hash from redis (we stored it under the "lolcoiffeurs:tweet:the_tweet_id" redis key, remember?), store its id in an "already answered tweets" set, post a reply to it, prefixed with "@the_user_who_originally_posted_the_tweet", and pop this id out of the "lolcoiffeurs:keyword" list.
That way, we keep track of which tweets we've answered, and keep our keyword stack clean at the same time.
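The bookkeeping around the reply can be tried out without twitter or redis. The sketch below is a variation, not the bot's code: it uses a deque, a dict and a set as in-memory stand-ins, takes a hypothetical post callable instead of PostUpdate, and additionally skips ids that are already in the answered set.

```python
from collections import deque


def reply_to_pending(pending, usernames, answered, post, response):
    """Replies to each pending tweet id, oldest first.

    pending   -- deque of tweet ids (stand-in for the redis list)
    usernames -- dict mapping tweet id to author (stand-in for the hashes)
    answered  -- set of already answered ids (stand-in for the redis set)
    post      -- callable taking (text, in_reply_to_id), hypothetical
    """
    while pending:
        tweet_id = pending.popleft()
        if tweet_id in answered:
            continue  # never answer the same tweet twice
        post("@%s %s" % (usernames[tweet_id], response), tweet_id)
        answered.add(tweet_id)


if __name__ == "__main__":
    sent = []
    reply_to_pending(deque([1, 2]), {1: "alice", 2: "bob"}, set(),
                     lambda text, tweet_id: sent.append(text), "hello")
    print(sent)
```

Passing the posting function as an argument keeps the loop easy to test; in the real bot that role is played by api_session.PostUpdate.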
In the end
That's it: a fully capable twitter bot. You can customize it to use your own keywords or expressions, and set the response you'd like to send to those tweets. Note that this is a piece-by-piece presentation of the bot; you can find the whole code on Gist, or download it.
Nota
Any bug found? Any suggestion? Don't love my frenchy english intonation?
Comment it out below, thanks!
As you might already know, OwnCloud 2.0 has been released. It is a wonderfully useful private Dropbox-like application. You set it up on a web host and voilà, you can use it as a file hosting, music streaming, calendar and contacts manager system. Kind of magic, isn't it? (I can see those little stars in your eyes)
But when the time came to install it on my private Debian server running Nginx, PostgreSQL and Php5 fpm: what a pain!
Indeed, like many PHP open-source projects, OwnCloud was first designed for the Apache/MySQL couple. Though the OwnCloud developers made the effort of using PDO to bring some more genericity and abstraction to their database management, and wrote some (good) documentation about how to set it up with Nginx, getting it to work with "uncommon" versions of these programs gave me some headaches.
Nota: assuming here you're already using nginx, postgreSQL and php5-fpm, and are comfortable with their configuration.
Uncommon solutions for uncommon problems
The Nginx configuration examples from the documentation only worked with the official php5, not with the -fpm fork. Installation errors also led me down the wrong path for a long time: I kept looking for the problem in the database, where it definitely was not, when it was actually a file permissions/ownership problem.
Database flirting with filesystems, it's always a matter of rights
Following the install steps described in the OwnCloud documentation, I ran into file permissions/ownership problems which were far from obvious. Indeed, everything went fine until I clicked on the "start installation" button. OwnCloud kept throwing an error that looked damn close to a database one:
PHP Warning: pg_query(): Query failed: ERROR: relation "users" does not exist
LINE 1: SELECT * FROM users
^ in /var/www/owncloud/lib/setup.php on line 182
I googled a lot without success, until I thought it might be caused by a wrong package version or something similar. I then restored the code from the official OwnCloud 2.0 tarball and, this time, set the permissions/ownership of the data/ and config/ folders strictly following the documentation recommendations:
$ cd my_owncloud_install_dir
$ mkdir data
# chown :www-data data
$ chmod 770 data
# chown :www-data config
$ chmod g+w config
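If you'd rather script the permission part, the two chmod calls translate to Python roughly as follows. This is a sketch only: the function name is mine, and the chown to the www-data group is deliberately left out because it needs root and depends on your system.

```python
import os
import stat


def prepare_owncloud_dirs(install_dir):
    """Creates data/ with mode 770 and makes config/ group-writable."""
    data_dir = os.path.join(install_dir, "data")
    config_dir = os.path.join(install_dir, "config")

    if not os.path.isdir(data_dir):
        os.mkdir(data_dir)
    os.chmod(data_dir, 0o770)  # chmod 770 data

    # chmod g+w config: add the group write bit, keep the other bits
    config_mode = stat.S_IMODE(os.stat(config_dir).st_mode)
    os.chmod(config_dir, config_mode | stat.S_IWGRP)
    return data_dir, config_dir
```

The function expects config/ to exist already, as it does in the OwnCloud tarball.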
Nginx won't talk to php if they're not properly introduced to each other
As the Nginx configuration examples were clearly not compatible with Php5-fpm, I had to tweak them a little bit.
Note that a prerequisite for the following Nginx configuration is to enable https connection handling.
To do this, I'll let you refer to the amazingly simple and clear Nginx doc section.
Once you're ready, and your nginx SSL certs are in place (the configuration below looks for them in /usr/local/nginx/conf), you can use this kind of configuration for your domain:
upstream backend {
server 127.0.0.1:9000;
}
server {
listen 443;
server_name my_domain;
root /path/to/owncloud;
access_log /var/log/nginx/my_access_log.log;
error_log /var/log/nginx/my_error_log.log info;
ssl on;
ssl_certificate /usr/local/nginx/conf/server.crt;
ssl_certificate_key /usr/local/nginx/conf/server.key;
keepalive_timeout 70;
client_max_body_size 1000M;
dav_methods PUT DELETE MKCOL COPY MOVE;
create_full_put_path on;
dav_access user:rw group:rw all:r;
location / {
index index.html index.htm index.php;
}
location ~ \.php$ {
fastcgi_split_path_info ^(.+\.php)(.*)$;
fastcgi_pass backend;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME /path/to/owncloud$fastcgi_script_name;
include fastcgi_params;
fastcgi_intercept_errors on;
fastcgi_ignore_client_abort off;
fastcgi_connect_timeout 60;
fastcgi_send_timeout 180;
fastcgi_read_timeout 180;
fastcgi_buffer_size 128k;
fastcgi_buffers 4 256k;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
}
location /owncloud {
index index.php;
try_files $uri $uri/ @webdav;
}
location @webdav {
fastcgi_split_path_info ^(.+\.php)(/.+)$;
fastcgi_param HTTPS on;
fastcgi_pass backend;
fastcgi_index index.php;
fastcgi_param SCRIPT_FILENAME /path/to/owncloud$fastcgi_script_name;
include fastcgi_params;
fastcgi_intercept_errors on;
fastcgi_ignore_client_abort off;
fastcgi_connect_timeout 60;
fastcgi_send_timeout 180;
fastcgi_read_timeout 180;
fastcgi_buffer_size 128k;
fastcgi_buffers 4 256k;
fastcgi_busy_buffers_size 256k;
fastcgi_temp_file_write_size 256k;
}
}
You're done
You're now ready to run OwnCloud using nginx, postgreSQL and php5-fpm. Just restart both services:
# /etc/init.d/nginx restart
# /etc/init.d/php5-fpm restart
And you should be able to start a clean and successful installation process from your domain URL.
Tip: by default, Php5-fpm will probably not be able to handle file transfers over 2MB. Try increasing post_max_size and upload_max_filesize in your php.ini.