Posts Tagged ‘visualization’

github explorer - a preview

Sunday, March 7th, 2010

For the last few weeks, I’ve been working on the successor to CPAN Explorer. This time, I’ve decided to create several visualizations (probably 8) of the various communities on github. I’m happy with the results and will soon start publishing the maps (static and interactive) along with some analyses. I’m publishing two previews: the Perl community and the European developers. These are not final results: the colors, fonts, and layout may change, but the structure of the graphs will stay the same. All the data was collected using the github API.

the Perl community on github

Each node on the graph represents a developer. When a developer “follows” another developer on github, a link between them is created. On the Perl community map, the color represents the developer’s country. One of the most visible things on this graph is that the Japanese community is tightly connected and has very little contact with foreign developers. miyagawa obviously acts as the glue between Japanese and worldwide Perl hackers.
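One way to make “acts as the glue” measurable is to count, for each developer, how many of their links cross a country boundary. The sketch below assumes the follow data has already been fetched from the github API as (follower, followed) pairs, plus a developer-to-country mapping; the function names and the scoring rule are illustrative assumptions, not the method used to build the map.

```python
from collections import defaultdict

def build_follow_graph(follows):
    """Build an undirected adjacency map from (follower, followed) pairs.

    The direction is dropped because the map draws a single link
    between two developers when one follows the other.
    """
    graph = defaultdict(set)
    for follower, followed in follows:
        if follower != followed:
            graph[follower].add(followed)
            graph[followed].add(follower)
    return graph

def cross_country_links(graph, country):
    """For each developer, count the neighbours located in another country."""
    return {
        dev: sum(1 for n in neighbours if country.get(n) != country.get(dev))
        for dev, neighbours in graph.items()
    }

if __name__ == "__main__":
    # Hypothetical toy data, not real github accounts.
    follows = [("a", "b"), ("b", "c"), ("c", "a"),      # closely knit group
               ("glue", "a"), ("glue", "x"), ("x", "y"), ("y", "glue")]
    country = {"a": "JP", "b": "JP", "c": "JP", "glue": "JP",
               "x": "US", "y": "US"}
    scores = cross_country_links(build_follow_graph(follows), country)
    print(max(scores, key=scores.get))  # prints "glue"
```

Developers inside a closed national group score zero, while a bridge like the hypothetical “glue” account scores highest.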

European developers on github

The second graph is a little more complex. It represents the European developers on github, and here the colors represent the languages they use. It appears that Ruby is by far the most represented language on github, as it dominates the whole map. Perl is the blue cluster at the bottom of the map, and the green snake is… Python.

Thanks to bl0b for his suggestions :)

cpan-explorer update and three new maps

Tuesday, July 28th, 2009

The cpan-explorer site has been updated with three new maps for YAPC::EU. These three maps differ from the previous ones: this time, instead of a big snapshot of all the distributions and authors on CPAN, we used Task::Kensho to obtain a representation of what we call modern Perl.

distributions map

Task::Kensho acted as the seed for this map. It contains a list of modules recommended for modern Perl development, so we extracted the distributions that depend on at least one of these modules and created the graph from this data.
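The seed-based extraction can be sketched as a filter over the dependency data: keep only the edges that point at a seed module. This is a minimal Python sketch assuming the dependencies are already available as a mapping from distribution to required modules; the seed list below is just a couple of module names standing in for Task::Kensho, and the distributions are hypothetical.

```python
def seed_edges(seed, dependencies):
    """Keep only the dependency edges that point at a seed module.

    `dependencies` maps each distribution to the modules it depends on.
    Returns a sorted list of (distribution, seed_module) edges.
    """
    seed = set(seed)
    return sorted(
        (dist, module)
        for dist, deps in dependencies.items()
        for module in deps
        if module in seed
    )

if __name__ == "__main__":
    # A few module names standing in for the Task::Kensho list,
    # and hypothetical distributions.
    seed = ["Moose", "DBIx::Class"]
    dependencies = {
        "Dist-A": ["Moose", "JSON"],
        "Dist-B": ["LWP"],
        "Dist-C": ["DBIx::Class", "Moose"],
    }
    for dist, module in seed_edges(seed, dependencies):
        print(f"{dist} -> {module}")
```

Distributions with no seed dependency (Dist-B here) simply never appear in the graph.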

authors map

The authors on this map are the ones from the previous map. There are far fewer authors than on the previous authors map, but it’s more readable. A lot of information is encoded in the map: label size, node size, edge size, and node color.

web communities map

This map looks a lot like the previous one, as we used nearly the same data; only a few websites were added to the seed.

cpan-explorer

cpan-explorer is now hosted on WordPress, so you can leave comments or suggest new maps you would like to see (a focus on web development modules, Test::* modules, etc.). All the new maps are also searchable and give you a permalink for your search (I’m here and here).

I will give a talk about this work at YAPC::EU. Each map has also been printed and will be given to the auction.

This work is a collective effort from all the guys at RTGI (antonin created the WordPress template, niko wrote the JavaScript and the tools to extract information from the SVG for the searchable map, julian helped me create the graphs and extract valuable information, and I got a lot of feedback from other coworkers). Thanks, guys!

cpan-explorer

Sunday, July 26th, 2009

We (RTGI) have been working on an update to cpan-explorer. A new version will be available this week, before YAPC::EU. Three new maps have been created using different data from the previous ones, and you will be able to search the browsable maps and pinpoint what you are looking for.

CPANHQ and dependencies graph

Thursday, July 16th, 2009

CPANHQ is a new project that “aims to be a community-driven, meta-data-enhanced alternative to such sites as http://search.cpan.org/ and http://kobesearch.cpan.org/”.

I believe that a good visualization can help us better understand data. One thing missing from the current search.cpan.org is information about a distribution’s dependencies, so my first contribution to the CPANHQ project was to add it.

For each distribution, a graph of its dependencies is generated. For this, I use Graph::Easy and the data available in the CPANHQ database. I also include a simple list of the dependencies after the graph.

Only first-level dependencies are displayed, because the distribution’s metadata are analysed when the request is made. I could follow all the dependencies at request time, but for some distributions this could take a really long time, which is not suitable for this kind of service.
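The cost difference is easy to see in a small sketch: first-level dependencies need one lookup, while the full set needs a traversal that touches every reachable distribution. This is a Python illustration assuming a hypothetical `requires` mapping standing in for the CPANHQ database; it is not CPANHQ’s code.

```python
from collections import deque

def first_level(dist, requires):
    """Direct dependencies only: a single lookup, cheap enough per request."""
    return sorted(requires.get(dist, []))

def all_dependencies(dist, requires):
    """Every transitive dependency, found with a breadth-first walk.

    In a live service each step may mean loading and parsing another
    distribution's metadata, which is what makes this slow per request.
    """
    seen = set()
    queue = deque(requires.get(dist, []))
    while queue:
        dep = queue.popleft()
        if dep in seen:
            continue  # already visited; also protects against cycles
        seen.add(dep)
        queue.extend(requires.get(dep, []))
    seen.discard(dist)  # a cycle may lead back to the starting point
    return sorted(seen)

if __name__ == "__main__":
    # Hypothetical data, including a cycle C -> App.
    requires = {"App": ["A", "B"], "A": ["C"], "C": ["D", "App"], "B": []}
    print(first_level("App", requires))
    print(all_dependencies("App", requires))
```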

edit: you can find CPANHQ here: CPANHQ on github

shape of CPAN

Friday, June 12th, 2009

My talk at FPW this year is about the shape of the Perl and CPAN community. It was prepared by some $coworkers and me.

map of the Perl community on the web

We generated two maps (authors and modules) using CPANTS data. For the websites, we crawled the web starting from a seed generated from the CPAN pages of these authors.

Each of these graphs was laid out using a force-based algorithm, with the help of Gephi.
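To illustrate what “force-based” means, here is a minimal Fruchterman-Reingold-style layout in Python: every pair of nodes repels, every edge attracts, and the maximum step shrinks over time so the layout settles. This is a swapped-in classic algorithm, not necessarily the one Gephi applied to these maps, and the parameters (iteration count, cooling factor 0.95, random start) are arbitrary assumptions.

```python
import math
import random

def force_layout(nodes, edges, iterations=200, width=1.0, seed=0):
    """Minimal Fruchterman-Reingold-style layout; returns {node: (x, y)}."""
    nodes = list(nodes)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(0, width), rng.uniform(0, width)] for n in nodes}
    k = width / math.sqrt(len(nodes))  # ideal distance between nodes
    temp = width / 10                  # maximum move per iteration
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between every pair of nodes: k^2 / d
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
                disp[v][0] -= dx / d * f
                disp[v][1] -= dy / d * f
        # Attraction along each edge: d^2 / k
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node by at most `temp`, then cool down
        for n in nodes:
            dx, dy = disp[n]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[n][0] += dx / d * step
            pos[n][1] += dy / d * step
        temp *= 0.95
    return {n: tuple(p) for n, p in pos.items()}

if __name__ == "__main__":
    layout = force_layout(["a", "b", "c"], [("a", "b")])
    for node, (x, y) in layout.items():
        print(node, round(x, 2), round(y, 2))
```

Connected nodes end up close together and unconnected ones drift apart, which is why clusters such as the “modern Perl” group become visible. The pairwise repulsion is quadratic in the number of nodes, so graphs the size of the CPAN map need the approximations that tools like Gephi provide.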

All the maps are available as PDF files under a Creative Commons licence. The slides are in French, but I will explain the three maps here.

CPAN’s modules

The first map shows the modules available on CPAN. We selected the modules that are listed as dependencies by at least 10 other modules, plus the modules that use them. The graph is composed of 7193 nodes (modules) and 17510 edges. Some clusters are interesting:

  • LWP and URI are really at the center of CPAN
  • a lot of web modules (XML::*, Template Toolkit, HTML::Parser, …)
  • Tk is isolated from the rest of CPAN
  • Moose, DBIx::Class and Catalyst form a group. This data is from March; we will try to produce a newer version of this map this summer, which should be really interesting now that Catalyst has switched to Moose
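The selection rule behind this map (modules required by at least 10 other modules, plus the modules that use them) can be sketched as an in-degree filter. This Python sketch assumes the CPANTS data has been reduced to a mapping from each module to the modules it depends on; the function name and the toy data are hypothetical.

```python
from collections import Counter

def popular_subgraph(dependencies, threshold=10):
    """Keep modules required by at least `threshold` other modules,
    plus every module that uses one of them.

    `dependencies` maps a module to the modules it depends on.
    Returns (nodes, edges).
    """
    # In-degree: how many distinct other modules depend on each module
    used_by = Counter()
    for module, deps in dependencies.items():
        for dep in set(deps):
            if dep != module:
                used_by[dep] += 1
    popular = {m for m, count in used_by.items() if count >= threshold}

    # Keep only the edges that point at a popular module
    edges = {
        (module, dep)
        for module, deps in dependencies.items()
        for dep in deps
        if dep in popular and dep != module
    }
    nodes = popular | {module for module, _ in edges}
    return nodes, edges

if __name__ == "__main__":
    # Toy data with a threshold of 2 instead of 10.
    deps = {"A": ["LWP", "URI"], "B": ["LWP"], "C": ["URI", "Tk"]}
    nodes, edges = popular_subgraph(deps, threshold=2)
    print(sorted(nodes))
    print(sorted(edges))
```

In the toy data, Tk is only used once, so it falls below the threshold and disappears from the graph.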

The CPAN’s authors

This map shows the authors on CPAN: about 700 authors and their connections. Each time an author uses a module written by another author, a link is created between them.
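The author map can be derived from the module dependency data by replacing each module with its author. The Python sketch below assumes a mapping from module to author, and it assumes (this is not stated above) that repeated uses between the same two authors increase the weight of a single link rather than creating separate links; the author IDs are hypothetical.

```python
from collections import Counter

def author_graph(dependencies, author_of):
    """Collapse a module dependency graph into a weighted author graph.

    Each time a module by one author uses a module by another author,
    the weight of the (user, owner) link grows by one. Self-links and
    modules with an unknown author are skipped.
    """
    weights = Counter()
    for module, deps in dependencies.items():
        user = author_of.get(module)
        for dep in deps:
            owner = author_of.get(dep)
            if user and owner and user != owner:
                weights[(user, owner)] += 1
    return weights

if __name__ == "__main__":
    # Hypothetical modules and author IDs.
    dependencies = {"Foo": ["Bar", "Baz"], "Qux": ["Bar"], "Bar": []}
    author_of = {"Foo": "ALICE", "Qux": "ALICE", "Bar": "BOB", "Baz": "ALICE"}
    print(dict(author_graph(dependencies, author_of)))
```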

  • Modern Perl, made up of Moose, Catalyst and DBIx::Class. Important authors are Steven, Sartak, perigin, jrockway, mstrout, nothingmuch, marcus ramberg
  • Slaven Rezić and other Tk developers are on the border
Web map

We crawled the web starting from a seed generated from the CPAN authors’ pages.

  • again, the “modern group”, at the top of the map, with the Moose/Catalyst/DBIx::Class developers
  • some companies, like shadowcat and iinteractive in the middle of the “modern Perl” group, Booking in the middle of the YAPC websites (they are a major sponsor of these events), 6apart, …
  • perl.org is the reference for the Perl community (the site is oriented on their side of the map)
  • cpan.org is the reference for the open source community
  • github is at the center of the Perl community. It is widely adopted by Perl developers and offers all the “social media” features that are missing from CPAN
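The web map was built by crawling outward from a seed. A breadth-first crawl with a depth limit is one common way to do this; the Python sketch below is an assumption about the general approach, not RTGI’s crawler. A hypothetical in-memory `links` mapping stands in for fetching and parsing real pages, and `max_depth` is an arbitrary parameter.

```python
from collections import deque

def crawl(seed, links, max_depth=2):
    """Breadth-first crawl starting from seed URLs.

    `links` maps a page to the pages it links to; a real crawler would
    fetch and parse each page instead. Links found on pages at the
    depth limit are recorded but not followed. Returns the hyperlink
    edges found, in crawl order.
    """
    depth = {url: 0 for url in seed}
    queue = deque(seed)
    edges = []
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            edges.append((page, target))
            if target not in depth and depth[page] < max_depth:
                depth[target] = depth[page] + 1
                queue.append(target)
    return edges

if __name__ == "__main__":
    # Hypothetical pages; the seed would come from the CPAN authors' pages.
    links = {"a": ["b", "perl.org"], "b": ["c"], "c": ["d"], "perl.org": []}
    for source, target in crawl(["a"], links, max_depth=1):
        print(source, "->", target)
```

The resulting edges form the site graph that is then laid out; a site such as perl.org that many crawled pages link to ends up with a high in-degree.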

I hope you like these visualisations; have fun analysing them :)