Last year I did a small exploration of GitHub to show the various communities using GitHub and how they work. I wanted to do it again this year, but I was lacking time and motivation to start over. A couple of months ago, I got a message from mojombo asking me if I was planning to do a new poster. This triggered the motivation to work on it again.
And of course, the poster. Feel free to print it yourself; the poster is A1 size.
All the data are available! Last year I got some emails asking me for the dataset. So this time I asked first whether I could release the data with the code and the poster, and the answer is yes! So if you're interested, you can download it.
The data are stored in MongoDB, so I provide a dump that you can easily use:
% wget http://maps.stargit.net/dump/github.
% tar xvzf github.tgz
% cd github
% mongorestore -d github .
Now you can use MongoDB to browse the imported database. There are 5 collections: profiles / repositories / relations / contributions / edges.
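If you prefer to browse from a script, here is a minimal sketch in Python. It assumes the pymongo driver is installed and that mongod runs locally on the default port, neither of which is part of the dump itself. The database name `github` and the five collection names come from the commands and the list above; the helper names are mine.

```python
# The five collections listed above.
COLLECTIONS = ["profiles", "repositories", "relations", "contributions", "edges"]


def collection_sizes(db, names=COLLECTIONS):
    """Return {collection name: number of documents} for the given database."""
    return {name: db[name].count_documents({}) for name in names}


def connect(uri="mongodb://localhost:27017", db_name="github"):
    """Open the restored database (assumption: local mongod, default port)."""
    # Imported here so collection_sizes() itself has no hard dependency.
    from pymongo import MongoClient
    return MongoClient(uri)[db_name]


def report(db):
    """Print the size of each collection, then one sample profile."""
    for name, size in collection_sizes(db).items():
        print(f"{name}: {size} documents")
    # Peek at one profile to discover which fields the dump really contains.
    print(db["profiles"].find_one())
```

Calling `report(connect())` after the `mongorestore` step should list the five collections with their sizes.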
Last year I did a simple "follower/following" graph. It was already interesting, but it was also really too simple. This time I wanted to go deeper in the exploration.
The various steps to process all this data are:
For all the graphs, I've used the following colors:
Feel free to do your own analysis in the comments :) For each map, you'll find a PDF of the map, and the graph to explore using Gephi (in GEXF or GDF format).
It took me about a month to collect the data and to build the appropriate tools.
The following chart shows the number of accounts created per month. "Everyone" means the total number of accounts created. You can also see the numbers for each community.
On the "Everyone" graph, you can see a huge peak around April 2008: that's when GitHub was launched.
For most of the communities, the number of created accounts has been decreasing since 2010. I think the reason is that most of the developers from those communities are now on GitHub.
(Keep in mind that these numbers come from the profiles I was able to tag, roughly 40k.)
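The counting behind this kind of chart can be sketched in a few lines of Python. This is not the code used for the poster: the field name `created_at` and its ISO 8601 format are assumptions about the profile documents, so check them against the dump first.

```python
from collections import Counter
from datetime import datetime


def accounts_per_month(profiles, date_field="created_at"):
    """Count profiles per creation month, keyed as 'YYYY-MM'.

    `date_field` is a hypothetical field name; profiles without it are skipped.
    """
    counts = Counter()
    for profile in profiles:
        raw = profile.get(date_field)
        if not raw:
            continue
        # Assumption: dates are ISO 8601 strings such as "2008-04-10T12:00:00".
        created = datetime.fromisoformat(raw)
        counts[created.strftime("%Y-%m")] += 1
    # Sort by month so the result can be plotted directly.
    return dict(sorted(counts.items()))
```

Passing every profile gives the "Everyone" series; passing only the profiles tagged with one community gives the series for that community.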
Those numbers don't really match "what GitHub gave":https://github.com/languages, but that could be explained by the way I selected my users.
The United States is still the most represented country on GitHub, no surprise here.
If you are interested in the "geography" of Open Source, you should read these two articles: Coding Places and Investigating the Geography of Open Source Software through GitHub.
Looking at the "company" field on users' profiles, here are some stats about which companies have employees using GitHub:
I didn't know the first company, ThoughtWorks, and I was expecting to see Facebook or Twitter as the company with the most developers on GitHub. It's also interesting to see Yandex here.
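For the curious, a ranking like this can be sketched as below. It is an illustration rather than the exact code I ran: it assumes each profile is a dict with an optional free-text "company" value, and it only folds case and surrounding whitespace, so variants like "Thoughtworks Inc." would still be counted separately.

```python
from collections import Counter


def top_companies(profiles, limit=10):
    """Rank the values of the free-text "company" field by number of profiles."""
    counts = Counter()
    for profile in profiles:
        # The field may be missing, None, or blank; treat all three as "no company".
        company = (profile.get("company") or "").strip()
        if company:
            # Fold case so "ThoughtWorks" and "thoughtworks" count together.
            counts[company.lower()] += 1
    return counts.most_common(limit)
```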
The main difference from last year is the Android / modders community. They develop mostly in C and Java. The poster was created from this map.
Here we have some clusters. I'm not familiar with the Python community, so I can't really give any insight.
I really like this graph since it shows (in my opinion) one of the real strengths of this community: everybody works with everybody. People working on a web framework will collaborate with people working on Moose, or an ORM, or other tools. It shows that in this community, people are competent in more than one field.
The Perl community is about the same size as last year. However, we can extract the following information:
As we can see on the previous charts, the number of accounts created by Perl developers is stalling.
This one is really nice. We can clearly see all the communities. There is something interesting:
I'll let you draw your own conclusions on this one ;)
We have a lot of small clusters on this one, and some very big authorities.
There are three dominant clusters on this one:
The Ruby and Perl ones are well connected. There are a lot of Japanese hackers on CPAN using both languages.
I would like to thank the whole GitHub team for being interested in the previous poster and for asking for another one this year :)
A huge thanks to Alexis for his help on building the awesome StarGit. Another big thanks to Antonin for his work on the poster.