[Kolab-devel] Switching the Kolab server to GIT?

Thomas Arendsen Hein thomas at intevation.de
Mon Nov 15 13:15:42 CET 2010


* Christoph Wickert <wickert at kolabsys.com> [20101112 17:58]:
> the Kolab server recently switched from CVS to HG. While this was definitely a 
> step into the right direction, I feel however that it is not the end of the 
> journey and we can still improve.
> 
> My proposal is to do another switch and move to GIT. Why that?
> 
> 1. GIT will allow us better integration with other parts of the FOSS 
> ecosystem. As Kolab strongly depends on a lot of projects, this is an 
> important topic. ATM we have Cyrus, ClamAV and Horde using GIT. KDE is already 
> in the process of switching to GIT, and others are likely to do the same.

I don't think that ClamAV matters here. Cyrus might be somewhat
interesting in general, but so might Dovecot (which uses
Mercurial).

But this leaves Horde as an important argument, as long as Kolab
Server depends on it and wants to contribute back to it.

> 2. It's not only upstream projects that use GIT but also downstream packaging 
> efforts like Debian, Fedora or SUSE. Our repo would be a chain link from 
> upstream to downstream.

Even if distributions use Git as their primary SCM, they should
still have ways to interact with upstream projects that use a
different well-known SCM, be it CVS, Subversion or Mercurial.

Additionally, distributions usually package released versions, so
the SCM does not matter that much anyway.

Of course it might become easier for the maintainer of a package to
contribute to Kolab Server if he is familiar with our SCM, but this
is already discussed in the next point:

> 3. GIT is the more well-known tool, which lowers the barrier to entry into the 
> Kolab community by not having to learn another tool.

Of course, for people who already know Git the barrier is lower, but
I still think it is higher for people who don't.

> Think of projects like 
> gitorious with its incredible growth over the past years.

The same holds true for the Mercurial counterpart, bitbucket.

> I'm sure the user 
> base of GIT is larger than that of any other distributed SCM, which gives us more 
> potential contributors.

You might be right here, but what matters more is this: when someone
is interested in becoming a contributor, will he be content with using
our SCM? Remember, you could count me as one of the many Git users :)

> 4. GIT has a vast ecosystem of additional tools that develops rapidly due to 
> the large and growing user base. To name a few that come to mind quickly as I 
> think they could be useful for us:
> * Gerrit online code review: http://code.google.com/p/gerrit
> * SCMbug - Bug tracker - SCM integration: http://freshmeat.net/projects/scmbug
> * GitOlite - User and permission management: 
> http://github.com/sitaramc/gitolite

The same holds true for Mercurial, see
http://mercurial.selenic.com/wiki/OtherTools#Project_support
(Trac, Maven, Hudson, Bugzilla, Redmine, ReviewBoard, JIRA, ...)

> 5. GIT has a nice Web interface: If you don't like gitweb, there is cgit. It 
> not only looks good but is also very fast due to its intelligent caching.

As I already wrote more than once: cgit is really nice, gitweb is
definitely not.

> 6. GIT is fast:
> $ time git clone ssh://wickert@git.kolabsys.com/git/server.git
> ...
> real	4m9.560s
> user	0m7.599s
> sys	0m1.957s
> 
> $ time hg clone ssh://hg@hg.kolab.org/server
> ...
> real	6m11.644s
> user	0m15.824s
> sys	0m2.846s
> 
> That is real +49%, user +108% and sys +68%. Both repositories are on the same 
> server.

Hmm, both times seem quite long. I have seen both Git and Mercurial
perform much better, so I ran some clones, too.

Here are the results (best of two tries):

$ time git clone git://git.kolabsys.com/git/server.git
Cloning into server...
remote: Counting objects: 38662, done.
remote: Compressing objects: 100% (14189/14189), done.
remote: Total 38662 (delta 22975), reused 38662 (delta 22975)
Receiving objects: 100% (38662/38662), 52.49 MiB | 386 KiB/s, done.
Resolving deltas: 100% (22975/22975), done.

real    2m36.756s
user    0m7.892s
sys     0m1.220s

(2m44.899s on a second try)

$ time hg clone http://hg.kolab.org/server
requesting all changes
adding changesets
adding manifests
adding file changes
added 4714 changesets with 19198 changes to 7740 files (+9 heads)
updating to branch default
994 files updated, 0 files merged, 0 files removed, 0 files unresolved

real    1m22.664s
user    0m21.841s
sys     0m1.232s

(1m58.027s on a second try)

$ time hg clone http://hg.intevation.org/mirrors/kolab.org/server
requesting all changes
adding changesets
adding manifests
adding file changes
added 4714 changesets with 19198 changes to 7740 files (+9 heads)
updating to branch default
994 files updated, 0 files merged, 0 files removed, 0 files unresolved

real    1m7.835s
user    0m21.621s
sys     0m1.212s

(1m12.022s on the first try)

I also compared cloning the repository via ssh; here the times (best
of two tries) are 1m31.674s with hg vs. 2m20.578s with git.

These results show that Git is not always faster than Mercurial,
even though many people claim it is. Both tools perform very well here
considering how long a simple checkout (only one branch at a time
and without transferring the history) of the old CVS repository
takes.
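The relative differences quoted from Christoph's mail (real +49%, user +108%, sys +68%) can be recomputed from the `time` figures. Below is a minimal sketch in POSIX shell and awk; `pct_diff` is a hypothetical helper name, and it assumes the `XmY.ZZZs` format printed by the shell's `time` keyword. Recomputed this way, the sys line comes out at about +45% rather than +68%, while real and user match.

```sh
#!/bin/sh
# pct_diff BASE OTHER: print how much longer OTHER took than BASE, in percent.
# Hypothetical helper; expects durations like "4m9.560s" as printed by "time".
pct_diff() {
  awk -v base="$1" -v other="$2" '
    # Convert "4m9.560s" to seconds: split at "m", strip the trailing "s".
    function secs(t,    parts) {
      split(t, parts, "m")
      sub(/s$/, "", parts[2])
      return parts[1] * 60 + parts[2]
    }
    BEGIN { printf "%+.0f%%\n", (secs(other) / secs(base) - 1) * 100 }'
}

# git (base) vs. hg figures from the ssh clones in Christoph's mail
pct_diff 4m9.560s 6m11.644s   # real: +49%
pct_diff 0m7.599s 0m15.824s   # user: +108%
pct_diff 0m1.957s 0m2.846s    # sys:  +45%
```

The same helper can be pointed at the git:// and http:// clone times above by passing them in the same order (base first).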

> 7. GIT takes little disk space:
> $ du -s linux/kolab/git/server/
> 62544	linux/kolab/git/server/
> $ du -s linux/kolab/hg/server/
> 74040	linux/kolab/hg/server/
> 
> That is +18,3%

Hmm, strange. With fresh clones I get:

$ du -s server.*/.{hg,git}
58744   server.hg/.hg
55076   server.git/.git

The checked-out files take less than 7M, so I don't know why you get 74040K.

With Mercurial 1.7 and the parentdelta format, the .hg directory even
shrinks to 47552K; this format will become the default in a future
version of Mercurial.

So both are in the same range, and this is really no argument in
favour of one or the other.
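A fair size comparison measures only the repository metadata of fresh clones, as in the `du -s server.*/.{hg,git}` run above, rather than the whole checkout. Below is a minimal POSIX shell sketch; `meta_kib` and `compare_meta` are hypothetical helper names. It uses `du -sk` because plain `du -s` reports 512-byte units on some non-GNU systems.

```sh
#!/bin/sh
# meta_kib CLONE: KiB used by the clone's .hg or .git directory only,
# so that working-copy files do not distort the comparison.
meta_kib() {
  for meta in "$1/.hg" "$1/.git"; do
    if [ -d "$meta" ]; then
      # -k forces 1024-byte units regardless of the du implementation.
      du -sk "$meta" | awk '{ print $1 }'
      return 0
    fi
  done
  echo "meta_kib: no .hg or .git directory in $1" >&2
  return 1
}

# compare_meta HG_CLONE GIT_CLONE: print both sizes and how much larger
# (or smaller) the Mercurial metadata is relative to Git's.
compare_meta() {
  hg_kib=$(meta_kib "$1") || return 1
  git_kib=$(meta_kib "$2") || return 1
  awk -v h="$hg_kib" -v g="$git_kib" 'BEGIN {
    if (g == 0) { print "compare_meta: git size is 0" > "/dev/stderr"; exit 1 }
    printf "hg %d KiB, git %d KiB, hg %+.1f%%\n", h, g, (h / g - 1) * 100
  }'
}
```

For example, run `compare_meta server.hg server.git` in the directory holding both clones. The fresh-clone figures above (58744K vs. 55076K) correspond to roughly +6.7% for Mercurial.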

> 8. Back in February when the question whether to move away from CVS was raised 
> on this mailing list, several people already proposed GIT. Obviously people 
> like GIT and as developers, we should not ignore our community.
> 
> For me the 'soft' facts like the larger ecosystem and tooling community, the 
> tighter link to up- and downstream and the preference of the community or the 
> user base are even more important than technical numbers.
> 
> Let's go one step further to the community. I'm sure we will not regret it.

You already know that some members of the Kolab community (including
me) will not be that happy using Git, but:

- We can live with it; it is still better than using Subversion :)
- We can mirror it as a read-only Mercurial repository, so where
  appropriate we can still use the features that we did not yet find
  in Git, e.g.:
  hg outgoing (the closest seems to be "git log origin..HEAD",
               but I don't know if it works with branches)
  hg incoming (something like "git fetch && git log HEAD..origin",
               but after this you either need "git merge
               FETCH_HEAD" or a way to discard the incoming changes
               if you don't want them, so it is probably better to
               use a separate temporary clone)

And the main point:

- We are not as active as in the past, so it is more important that
  the people that are more active now are happy.
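For the `hg outgoing` / `hg incoming` workflow mentioned above, Git offers rough equivalents. Below is a minimal sketch, assuming the current branch has an upstream configured (as `git clone` or `git push -u` sets up). The helper names are hypothetical; the git commands themselves are standard. `git fetch` only updates remote-tracking branches such as `origin/master` and never touches local branches, so incoming changes you decide not to take need no discarding: you simply don't merge them.

```sh
#!/bin/sh
# git_outgoing: commits on any local branch that no remote-tracking branch
# contains yet, i.e. roughly what pushing all branches would send.
git_outgoing() {
  git log --oneline --branches --not --remotes
}

# git_incoming: refresh the remote-tracking branches (local branches stay
# as they are), then list commits on the current branch's upstream that
# HEAD does not have yet.
git_incoming() {
  git fetch --quiet && git log --oneline 'HEAD..@{u}'
}
```

Unlike `git log origin..HEAD`, `--branches --not --remotes` covers every local branch at once.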

* Christoph Wickert <wickert at kolabsys.com> [20101112 18:18]:
> On Friday 12 November 2010 17:58:10 Christoph Wickert wrote:
> > My proposal is to do another switch and move to GIT. Why that?
> 
> Ah, and while we are at it, we should also:
> 
> * Separate the repo into smaller repositories. For example, the Server and the 
> Webclient should be separated, same goes for the webadmin. Details of the new 
> repo layout are of course subject to further discussion.

Generally yes, but please don't make it too complicated. People
should not have to clone more than a few repositories to find the
code they want to look at.

Does Git have something like Mercurial's subrepositories or
Subversion's externals, so cloning everything needed to get started
will be just one git command?
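Git's counterpart to Mercurial subrepositories and Subversion externals is submodules, and `git clone --recursive` fetches a superproject together with all of its submodules in one command. Below is a minimal sketch of such an umbrella layout. The helper names and the component path `webclient` are hypothetical examples, not an agreed Kolab layout.

```sh
#!/bin/sh
# Hypothetical umbrella repository whose content is a set of submodules,
# one per split-out component repository.

# add_component URL PATH: register the repository at URL as a submodule
# at PATH inside the current (umbrella) repository and commit it.
add_component() {
  git submodule add "$1" "$2" &&
    git commit --quiet -m "Add $2 as submodule"
}

# clone_all URL DIR: clone the umbrella repository into DIR together with
# every submodule it references, in a single command.
clone_all() {
  git clone --recursive "$1" "$2"
}
```

A newcomer would then only run `clone_all <umbrella-url> kolab`. Each submodule is pinned to a specific commit, so moving a component forward is an explicit commit in the umbrella repository, much like Mercurial's `.hgsubstate`.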

> * Sourcecode and packaging should be separated completely. The mixture of both 
> made people do weird things like committing patches that affect both. By 
> separating this we will make it easier for the distributions to package Kolab.
> We have already been working on that and need to continue the work.
> 
> All of these are just suggestions, and we should discuss them once we have 
> made a decision about the SCM. I'm just mentioning this now because we should 
> do several things in one step because splitting will also cause a rewrite of 
> the history anyway.

As I wrote, I can live with using Git, so I don't want to put a
spoke in your wheel.
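Christoph's remark that splitting the repository rewrites history anyway can be illustrated with `git filter-branch --subdirectory-filter`, which turns one subdirectory into the root of a new history and drops unrelated commits. Below is a minimal sketch; the helper name and the example directory names are hypothetical. It works on a fresh clone so the source stays untouched, rewrites only the branch checked out in that clone, and sets `FILTER_BRANCH_SQUELCH_WARNING=1` to skip the warning pause that newer Git versions add to `filter-branch`.

```sh
#!/bin/sh
# split_subdir SOURCE SUBDIR TARGET: clone SOURCE into TARGET and rewrite
# the clone so that SUBDIR becomes the repository root. Commits that did not
# touch SUBDIR are dropped; SOURCE itself is left untouched.
split_subdir() {
  git clone --quiet --no-local "$1" "$3" || return 1
  (
    cd "$3" || exit 1
    # Remove the remote so that only the checked-out branch (and tags)
    # are rewritten, not the remote-tracking branches.
    git remote rm origin &&
      FILTER_BRANCH_SQUELCH_WARNING=1 \
        git filter-branch --prune-empty --subdirectory-filter "$2" -- --all
  )
}

# Example with hypothetical names:
# split_subdir server webclient kolab-webclient
```

Other branches would have to be created locally in the clone before the rewrite if they should be kept.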

Regards,
Thomas

-- 
thomas at intevation.de - http://intevation.de/~thomas/ - OpenPGP key: 0x5816791A
Intevation GmbH, Neuer Graben 17, 49074 Osnabrueck - AG Osnabrueck, HR B 18998
Geschaeftsfuehrer: Frank Koormann, Bernhard Reiter, Dr. Jan-Oliver Wagner