Archive for the ‘ironman’ Category

The Dancer Ecosystem

Monday, April 19th, 2010

Even though it’s still a young project, an active community is starting to emerge around Dancer. Modules are starting to appear on CPAN and github to add functionality or to extend existing features.

Templates

By default, Dancer comes with support for two templating systems: Template Toolkit and Dancer::Template::Simple, a small templating engine written by sukria. But support for other templating systems is available:

Logger

Out of the box, Dancer only has a simple logging system that writes to a file, but more logging systems are available:

The last one writes your log messages directly via Plack. You can use a middleware like P::M::LogDispatch or P::M::Log4perl to handle logs for your application. Even better, if you use P::M::ConsoleLogger, you can have logs from your Dancer application in your JavaScript console.

Debug

To debug your application with Plack, you can use the awesome Plack::Middleware::Debug. I’ve written Dancer::Debug (which requires my fork of P::M::Debug), a middleware that adds panels with information specific to Dancer applications.

Dancer::Debug middleware

To activate this middleware, update your app.psgi to make it look like this:

use Plack::Builder;    # provides builder and enable

my $handler = sub {
    my $env     = shift;
    my $request = Dancer::Request->new($env);
    Dancer->dance($request);
};
$handler = builder {
    enable "Debug", panels => [
        qw/
            Dancer::Settings Dancer::Logger Environment Memory
            ModuleVersions Response Session Parameters Dancer::Version
        /
    ];
    $handler;
};

Plugins

Dancer has supported plugins for a few versions now. There are not many plugins at the moment, but this will soon improve. Plugin support is one of the top priorities for the 1.2 release.

Dancer::Plugin::REST

This one is really nice. This plugin, used with the serialization support, lets you easily write REST applications.

resource user =>
    get    => sub { },    # return user where id = params->{id}
    create => sub { },    # create a new user with params->{user}
    delete => sub { },    # delete user where id = params->{id}
    update => sub { };    # update user with params->{user}

And you get the following routes:

# GET /user/:id
# GET /user/:id.:format
# POST /user/create
# POST /user/create.:format
# DELETE /user/:id
# DELETE /user/:id.:format
# PUT /user/:id
# PUT /user/:id.:format
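
The naming scheme behind this list can be pictured as a small table. The sketch below is not the plugin’s code: routes_for is a hypothetical helper, written in plain Perl, that only reproduces the route list above for a given resource name and registers nothing.

```perl
use strict;
use warnings;

# Hypothetical helper: list the routes shown above for a resource name.
# It only illustrates the naming scheme; no route is registered.
sub routes_for {
    my ($resource) = @_;
    my @base = (
        [ GET    => "/$resource/:id"    ],
        [ POST   => "/$resource/create" ],
        [ DELETE => "/$resource/:id"    ],
        [ PUT    => "/$resource/:id"    ],
    );

    # Every route also exists with a ".:format" suffix for the serializer.
    return map { ( "$_->[0] $_->[1]", "$_->[0] $_->[1].:format" ) } @base;
}

print "$_\n" for routes_for('user');
```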

Dancer::Plugin::Database

This plugin, by bigpresh, adds the database keyword to your app.

    use Dancer;
    use Dancer::Plugin::Database;
 
    # Calling the database keyword will get you a connected DBI handle:
    get '/widget/view/:id' => sub {
        my $sth = database->prepare(
            'select * from widgets where id = ?',
        );
        # Bind values are passed to execute(), not to prepare()
        $sth->execute(params->{id});
        template 'display_widget', { widget => $sth->fetchrow_hashref };
    };

Dancer::Plugin::SiteMap

With this plugin, by James Ronan, a sitemap of your application is created.

Plugin module for the Dancer web framework that automagically adds sitemap routes to the webapp. Currently adds /sitemap and /sitemap.xml where the former is a basic HTML list and the latter is an XML document of URLs.

you can help! :)

There is still a lot of stuff to do. Don’t hesitate to join us on #dancer@irc.perl.org to discuss ideas or new features that you would like.

presque, a Redis / Tatsumaki based message queue

Wednesday, April 14th, 2010

presque is a small message queue service built on top of Redis and Tatsumaki. It’s heavily inspired by RestMQ for its features and by resque for its name.

  • Communications are done in JSON over HTTP
  • Queues and messages are organized as REST resources
  • A worker can be written in any language that can make an HTTP request and read JSON
  • Thanks to redis, the queues are persistent

Overview

presque needs a configuration file, written in YAML, that contains the host and port of the Redis server.

redis:
  host: 127.0.0.1
  port: 6379

Let’s start the server:

$ plackup app.psgi --port 5000

The application provides some HTTP routes:

  • /: a basic HTML page with some information about the queues
  • /q/: REST API to get and post jobs to a queue
  • /j/: REST API to get some information about a queue
  • /control/: REST API to control a queue (start or stop consumers)
  • /stats/: REST API to fetch some stats (displayed on the index page)

Queues are created on the fly, when a job for an unknown queue is inserted. When a new job is created, the JSON sent in the POST is stored “as is”. There is no restriction on the schema or the content of the JSON.

Creating a new job is as simple as:

curl -X POST "http://localhost:5000/q/foo" -d '{"foo":"bar", "foo2":"bar" }'

and fetching the job:

curl "http://localhost:5000/q/foo"

When a job is fetched, it’s removed from the queue.
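
The same two calls can be made from Perl. This is a minimal sketch using only the core modules HTTP::Tiny and JSON::PP; the helper names (queue_url, push_job, fetch_job) are made up for the example, the Content-Type header is an assumption, and what the server answers when the queue is empty is not covered here.

```perl
use strict;
use warnings;
use HTTP::Tiny;
use JSON::PP;

my $base = 'http://localhost:5000';    # assumed: the plackup address used above

# Build the URL of a queue under the /q/ route.
sub queue_url {
    my ($queue) = @_;
    return "$base/q/$queue";
}

# POST a job: the body is the JSON document, stored as is by the server.
sub push_job {
    my ($queue, $job) = @_;
    return HTTP::Tiny->new->post(
        queue_url($queue),
        {   headers => { 'Content-Type' => 'application/json' },    # assumption
            content => encode_json($job),
        },
    );
}

# GET the next job; returns undef if the request did not succeed.
sub fetch_job {
    my ($queue) = @_;
    my $res = HTTP::Tiny->new->get( queue_url($queue) );
    return $res->{success} ? decode_json( $res->{content} ) : undef;
}
```

With the server running, push_job('foo', { foo => 'bar' }) followed by fetch_job('foo') mirrors the two curl commands.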

A basic worker

I’ve also uploaded presque::worker to github. It’s based on AnyEvent::HTTP and Moose. Let’s write a basic worker using this class:

use strict;
use warnings;
use 5.012; # w00t
 
package simple::worker;
use Moose;
extends 'presque::worker';
 
sub work {
    my ($self, $job) = @_;
    say "job's done";
    ...; # yadda yadda!
    return;
}
 
package main;
use AnyEvent;
 
my $worker = simple::worker->new(base_uri => 'http://localhost:5000', queue => 'foo');
 
AnyEvent->condvar->recv;

A worker has to extend the presque::worker class and implement the work method. When the object is created, the class checks that this method is available. You can also provide a fail method, which will be called when an error occurs.
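
To illustrate the idea of checking for work at construction time, here is a sketch in plain Perl. It is not the presque::worker code: the My::* class names and the run_job method are invented for the example, and a Moose class would more likely express the same constraint with a role that requires the method.

```perl
use strict;
use warnings;

# Stand-in for a base worker class (NOT the real presque::worker code).
package My::BaseWorker {
    sub new {
        my ($class, %args) = @_;

        # Refuse to build a worker whose class does not implement work().
        die "$class must implement a work method\n"
            unless $class->can('work');
        return bless {%args}, $class;
    }

    # Run one job; hand any exception to the optional fail() method.
    sub run_job {
        my ($self, $job) = @_;
        my $ok = eval { $self->work($job); 1 };
        $self->fail( $job, $@ ) if !$ok && $self->can('fail');
        return $ok ? 1 : 0;
    }
}

package My::GoodWorker {
    our @ISA = ('My::BaseWorker');
    sub work { my ($self, $job) = @_; die "empty job\n" unless %$job; return }
    sub fail { my ($self, $job, $err) = @_; $self->{last_error} = $err }
}

package My::BadWorker {
    our @ISA = ('My::BaseWorker');    # no work() method, on purpose
}
```

Building a My::BadWorker dies immediately, while a My::GoodWorker only reaches fail when work throws.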

The future

I plan to add support for websocket, and probably XMPP. I also want more functionality in the worker: logging, forking, handling many queues, … I would also like to add priorities to queues, and maybe job scheduling for a given date (not sure if it’s feasible with Redis).

More fun with Tatsumaki and Plack

Saturday, April 3rd, 2010

Lately I’ve been toying a lot with Plack and two Perl web frameworks: Tatsumaki and Dancer. I use both of them for different purposes, as their features complement each other.

Plack

If you don’t already know what Plack is, you may want to take a look at the following Plack resources:

As sukria is planning to talk about Dancer during the FPW 2010, I will probably do a talk about Plack.

After reading some code, I’ve started to write two middlewares: the first one adds an ETag header to the HTTP response, and the second one provides a way to limit access to your application.

Plack::Middleware::ETag

This middleware is really simple: for each request, an ETag header is added to the response. The ETag value is a SHA-1 digest of the response’s content. If the content is a file, it works like Apache, using various information from the file: inode, modification time and size. This middleware can be used with Plack::Middleware::ConditionalGET: the client receives the ETag for the page and sends it back in an If-None-Match header on its next request. If the ETag is the same, a 304 response is sent, meaning the content has not been modified. This module is available on CPAN.

Let’s see how it works. First, we create a really simple application (we call it app.psgi):

#!/usr/bin/env perl
use strict;
use warnings;
use Plack::Builder;
 
builder {
    enable "Plack::Middleware::ConditionalGET";
    enable "Plack::Middleware::ETag";
    sub {
        [ '200', [ 'Content-Type' => 'text/html' ], ['Hello world'] ];
    };
};

Now we can test it:

> plackup app.psgi&
 
> curl -D - http://localhost:5000
HTTP/1.0 200 OK
Date: Sat, 03 Apr 2010 09:31:43 GMT
Server: HTTP::Server::PSGI
Content-Type: text/html
ETag: 7b502c3a1f48c8609ae212cdfb639dee39673f5e
Content-Length: 11
 
> curl -H "If-None-Match: 7b502c3a1f48c8609ae212cdfb639dee39673f5e" -D - http://localhost:5000
HTTP/1.0 304 Not Modified
Date: Sat, 03 Apr 2010 09:31:45 GMT
Server: HTTP::Server::PSGI
ETag: 7b502c3a1f48c8609ae212cdfb639dee39673f5e
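
Stripped to its core, the behaviour seen in this transcript could be sketched as a plain PSGI wrapper. This is not the code of the two middlewares: with_etag is a hypothetical function that merges both roles, and it assumes the body is an array ref of strings (no file handles, no streaming).

```perl
use strict;
use warnings;
use Digest::SHA qw(sha1_hex);

# Wrap a PSGI app: add an ETag and answer 304 when the client already has it.
# Assumes the body is an array ref of strings.
sub with_etag {
    my ($app) = @_;
    return sub {
        my ($env) = @_;
        my ( $status, $headers, $body ) = @{ $app->($env) };
        my $etag = sha1_hex( join '', @$body );

        # The client echoes the ETag back in the If-None-Match header.
        my $seen = $env->{HTTP_IF_NONE_MATCH} // '';
        return [ 304, [ ETag => $etag ], [] ] if $seen eq $etag;

        return [ $status, [ @$headers, ETag => $etag ], $body ];
    };
}

my $app = with_etag(
    sub { [ 200, [ 'Content-Type' => 'text/html' ], ['Hello world'] ] }
);
```

Calling $app with an empty environment returns the full response, and calling it again with HTTP_IF_NONE_MATCH set to the returned ETag returns the 304.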

Plack::Middleware::Throttle

With this middleware, you can control how many times a client may access your application. This module is not yet on CPAN, as I want to add some features, but you can get the code on github. There are four ways to control access:

  • Plack::Middleware::Throttle::Hourly: how many times in one hour someone can access the application
  • P::M::T::Daily: the same, but for a day
  • P::M::T::Interval: the interval a client must wait between two requests
  • any combination of the three previous methods

To store session information, you can use any cache backend that provides get, set and incr methods. By default, if no backend is provided, the information is stored in a hash. You can easily modify the default throttling strategies by subclassing these classes.
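
As an illustration of that three-method contract, here is a sketch of an in-memory backend and of an hourly counter built on it. Both My::HashBackend and the allowed helper are hypothetical names, and the key layout (client plus hour number) is an assumption, not the middleware’s actual scheme.

```perl
use strict;
use warnings;

# Minimal in-memory backend exposing get, set and incr.
# Hypothetical class; counters are lost when the process exits.
package My::HashBackend {
    sub new  { return bless { store => {} }, shift }
    sub get  { my ($self, $key) = @_; return $self->{store}{$key} }
    sub set  { my ($self, $key, $value) = @_; return $self->{store}{$key} = $value }
    sub incr { my ($self, $key) = @_; return ++$self->{store}{$key} }
}

# Hypothetical helper: count one request for a client in the current hour
# and report whether the client is still within the limit.
sub allowed {
    my ($backend, $client, $max, $now) = @_;
    my $hour  = int( $now / 3600 );    # one counter per client per hour
    my $count = $backend->incr( join ':', $client, $hour );
    return $count <= $max ? 1 : 0;
}
```

Any object with the same three methods, such as a Redis client, could replace My::HashBackend without changing allowed.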

Let’s write another application to test it:

#!/usr/bin/env perl
use strict;
use warnings;
use Plack::Builder;
 
builder {
    enable "Plack::Middleware::Throttle::Hourly", max => 2;
    sub {
        [ '200', [ 'Content-Type' => 'text/html' ], ['Hello world'] ];
    };
};

$ curl -D - http://localhost:5000/
HTTP/1.0 200 OK
Date: Sat, 03 Apr 2010 09:57:40 GMT
Server: HTTP::Server::PSGI
Content-Type: text/html
X-RateLimit-Limit: 2
X-RateLimit-Remaining: 1
X-RateLimit-Reset: 140
Content-Length: 11
 
Hello world
 
$ curl -D - http://localhost:5000/
HTTP/1.0 200 OK
Date: Sat, 03 Apr 2010 09:57:40 GMT
Server: HTTP::Server::PSGI
Content-Type: text/html
X-RateLimit-Limit: 2
X-RateLimit-Remaining: 0
X-RateLimit-Reset: 140
Content-Length: 11
 
Hello world
 
$ curl -D - http://localhost:5000/
HTTP/1.0 503 Service Unavailable
Date: Sat, 03 Apr 2010 09:57:41 GMT
Server: HTTP::Server::PSGI
Content-Type: text/plain
X-RateLimit-Reset: 139
Content-Length: 15
 
Over rate limit

Some HTTP headers are added to the response:

  • X-RateLimit-Limit: how many requests are allowed
  • X-RateLimit-Remaining: how many requests are still available
  • X-RateLimit-Reset: the number of seconds before the counter is reset
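
The values in the transcript above are consistent with a counter that resets at the top of each hour: at 09:57:40 the reset value is 140, which is the time left until 10:00:00. Under that assumption, the headers could be computed as in this sketch. rate_limit_headers is a hypothetical helper, not the middleware’s code, and it does not model the over-limit response, which only carries the reset header.

```perl
use strict;
use warnings;

# Hypothetical helper: build the X-RateLimit-* headers for an hourly limit.
# $count is the number of requests already counted in the current hour,
# $now is an epoch timestamp in seconds.
sub rate_limit_headers {
    my ($max, $count, $now) = @_;
    my $remaining = $max - $count;
    $remaining = 0 if $remaining < 0;
    my $reset = 3600 - ( $now % 3600 );    # seconds until the next full hour
    return (
        'X-RateLimit-Limit'     => $max,
        'X-RateLimit-Remaining' => $remaining,
        'X-RateLimit-Reset'     => $reset,
    );
}
```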

This middleware could be a very good companion to the Dancer REST stuff added recently.

Another Tatsumaki application with Plack middlewares

To demonstrate the use of these two middlewares, I’ve written a small application with Tatsumaki. This application fetches a page, parses it to find all the declared feeds, and returns the result as JSON.

GET http://feeddiscover.tirnan0g.org/?url=http://lumberjaph.net/blog/

will return

[{"href":"http://lumberjaph.net/blog/index.php/feed/","type":"application/rss+xml","title":"i’m a lumberjaph RSS Feed"}]

This application is composed of one handler, which handles only GET requests. The handler fetches the URL given in the url parameter, scrapes the content to find the links to feeds, and caches the result in Redis. The response is a JSON string with this information.

The interesting part is the app.psgi file:

 
use Plack::Builder;
use Redis;
use Tatsumaki::Application;
use FeedDiscovery::Handler;

my $app = Tatsumaki::Application->new( [ '/' => 'FeedDiscovery::Handler' ] );
 
builder {
    enable "Plack::Middleware::ConditionalGET";
    enable "Plack::Middleware::ETag";
    enable "Plack::Middleware::Throttle::Hourly",
        backend => Redis->new(
        server => '127.0.0.1:6379',
        ),
        max => 100;
    $app;
};

The application itself is really simple: for a given URL, Tatsumaki::HTTPClient fetches the page and I use Web::Scraper to find the link rel=”alternate” elements in it. If something is found, it’s stored in Redis, then a JSON string is returned to the client.

Github Poster

Friday, April 2nd, 2010

The Github poster is available as a PDF on the linkfluence atlas.
It’s distributed under an Attribution-Noncommercial-No Derivative Works 3.0 Unported Creative Commons license, and you are free to print it.

It’s optimized for A2 size. You shouldn’t have any problem printing it at a bigger size though, as it’s a vector graphic.

We (linkfluence) really wanted to print and send the poster to people who were interested. But this is too complicated and too much work to handle by ourselves. If you do print a poster, please send us a notice, for we would love to know where it may end up :)

However, for lazy/busy people, I plan to attend the following Perl conferences:

so if you are interested in buying a poster and contact me early enough, I’ll print it and bring it with me. The cost should be between 35 and 50 euros per poster (this is the raw cost).

I would like to thank all the people who emailed me, and I’m really sorry I’m unable to provide each of you a poster.



Github explorer

Thursday, March 25th, 2010

More information about the poster is available in this post.

Last year, with help from my coworkers at Linkfluence, I created two sets of maps of the Perl and CPAN communities. For this, I collected data from CPAN to create three maps:

I wanted to do something similar again, but not with the same data. So I took a look at what could be a good subject. One of the things that we saw from the map of the websites is the importance github is gaining inside the Perl community. Github provides a really good API, so I started to play with it.

This graph will be printed on a poster, in A2 and A1 sizes. Please contact me (franck.cuny [at] linkfluence.net) if you are interested in one.





This time, I didn’t aim for the Perl community only, but for the whole github community. I’ve created several graphs:

All the graphs are available on my flickr account.

I think a disclaimer is important at this point. I know that github doesn’t represent the whole open source community. With these maps, I don’t claim to represent what the open source world looks like right now. This is not a troll about which language is best, or used at large. It’s ONLY about github.

Also, I won’t provide deep analysis for each of these graphs, as I lack insight about some of those communities. So feel free to re-use these graphs and provide your own analyses.

Methodology

I didn’t collect all the profiles. We (with Guilhem) decided to limit the collection to people who are followed by at least two other people. We did the same for repositories, keeping only those that have been forked at least once. Using this technique, more than 17k profiles have been collected, and nearly as many repositories.

For each profile, using the github API, I’ve tried to determine what the main language for this person is. With the help of the geonames API, I’ve also tried to find the right country to attach the profile to.

Each profile is represented by a node. For each node, the following attributes are set:

  • name of the profile
  • main language used by this profile, determined by github
  • name of the country
  • follower count
  • following count
  • repository count

An edge is a link between two profiles. Each time someone follows another profile, a link is created. By default, the weight of this link is 1. For each project this person forked from the target profile, the weight is incremented.
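
The weighting rule can be sketched in a few lines of Perl. The input format (pairs of logins) and the edge_weights name are made up for the example, and ignoring forks that have no matching follow is an assumption drawn from the description above, where only a follow creates an edge.

```perl
use strict;
use warnings;

# Hypothetical input: who follows whom, and who forked a project from whom.
# Each entry is [ source_login, target_login ].
sub edge_weights {
    my ($follows, $forks) = @_;
    my %weight;

    # A follow creates the edge with the default weight of 1.
    $weight{ $_->[0] }{ $_->[1] } = 1 for @$follows;

    # Each project forked from the followed profile adds 1 to that edge.
    # Assumption: a fork without a follow does not create an edge.
    for my $fork (@$forks) {
        my ( $from, $to ) = @$fork;
        $weight{$from}{$to}++
            if exists $weight{$from} && exists $weight{$from}{$to};
    }
    return \%weight;
}
```

For example, if alice follows bob and has forked two of his projects, the edge from alice to bob gets a weight of 3.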

As always, I’ve used Gephi (now in version 0.7) to create the graphs. Feel free to download the various graph files and use them with Gephi.

Github

properties of the graph: 16443 nodes / 130650 edges

Github - All - by languages

The first map is about all the languages available on github. This one was really heavy, with more than 17k nodes, and 130k edges. The final version of the graph uses the 2270 most connected nodes.

You can’t miss Ruby on this map. As github uses Ruby on Rails, it’s not really surprising that the Ruby community has a particular interest in this website. The main languages on github are what we can expect, with PHP, Python, Perl, Javascript.

Some languages are not really well represented. We can assume that most Haskell projects might use darcs, and therefore are not on github. Some other languages may use other platforms, like launchpad, or sourceforge.

Perl

properties of the graph: 365 nodes / 4440 edges

Perl community on Github

The Perl community is split into two parts. On the left side is the Western community, driven by people like Florian, Yuval, rjbs, … The second part is made up of the Japanese Perl hackers, with tokuhirom, Typester, Yappo, … In between them, Miyagawa acts as a glue. This map looks a lot like the previous map of the CPAN. We can see that this community is international, with the exception of the Japanese hackers, who don’t mix with the others.

There is no main project on github that gathers people, even though we can see a fair amount of MooseX:: projects. Most of the developers work on different modules, which may not have the same purpose. Lately we have seen a fair amount of work on various Plack stuff, mainly middleware, but also HTTP servers (twiggy, starman, …) and web frameworks (Dancer).

One important project that is not (deliberately) represented on this graph is the gitpan, Schwern’s project. The gitpan is an import of all the CPAN modules, with complete history using the Backpan.

To conclude about Perl, there are only 365 nodes on this graph, but no less than 4440 edges. That’s nearly two times the number of edges compared to the Python community. Perl is a really well structured community, probably thanks to the CPAN, which already acted as a hub for contributors.

Python

properties of the graph: 532 nodes / 2566 edges

Python community, by country, on Github

The Python community looks a lot like the Perl community, but only in the structure of the graph. If we look closely, Django is the main project that represents Python on Github, in contrast with Perl where there is no leader. Some small projects gather small communities of developers.

PHP

properties of the graph: 301 nodes / 1071 edges

PHP community on Github

PHP is the only community that is structured this way on Github. We can clearly see that people cluster around the project to which they mainly contribute.

CakePHP and Symphony are the two main projects. Nearly all the projects gather an international community, with the exception of a few Japanese-only projects.

Ruby

properties of the graph: 3742 nodes / 24571 edges

Ruby community, by country, on Github

As in the Github graph, we can clearly see that some countries are isolated. On the right side, the Japanese community is at the bottom and the Spanish one at the top. Australians are in the upper right corner, while the Brazilians are on the left side.

The main projects that gather most of the hackers are Rails and Sinatra, two famous web frameworks.

Europe

properties of the graph: 2711 nodes / 11259 edges

Europe community on Github

This one shows interesting features. Some countries are really isolated. If we look at Spain, we can see a community of Ruby programmers who are strongly connected to each other, but have no really strong connection with any foreign developers. We can clearly see the Perl community exists as only one community, and is not split by country. The same is true for Python.

Japanese hackers community

properties of the graph: 559 nodes / 5276 edges

Japan community on github

This community is unique on github. In 2007, Yappo created coderepos.org, a repository for open source developers in Japan. It was a subversion repository, with Trac as an HTTP front-end. It gathered around 900 developers, with all kinds of projects (Perl, Python, Ruby, Javascript, …). Most of these users have switched to github now.

Three main communities are visible on this graph: Perl; Ruby; PHP. As always, the Javascript community acts as a glue between them. And yes, we can confirm that Perl is big in Japan.

We have seen in the previous graphs that the Japanese hackers are always isolated. We can assume that their language is an obstacle.

This is a really well-connected graph too.

Conclusions and graphs

I may not have provided a deep analysis of all the graphs. I don’t have knowledge of most of the communities outside of Perl. Feel free to download the graphs, load them in Gephi, experiment, and provide your own thoughts.

I would like to thank everybody at Linkfluence (guilhem for his advice, camille for giving me time to work on this, and antonin for the amazing poster), who have helped me and let me use time and resources to finish this work. Special thanks to blob for reviewing my prose and cdlm for the discussion :)