[Kolab-devel] OpenLDAP replication issues: slurpd vs syncrepl

Thu Feb 23 14:24:42 CET 2006

Note: I am not running Kolab anywhere at present, but I would like to correct 
some perceptions about OpenLDAP.

On Monday 13 February 2006 13:02, Fabio Pietrosanti wrote:
> Martin Konold wrote:
> > On Sunday, 12 February 2006 12:33, Fabio Pietrosanti wrote:
> >
> > Hi Fabio,
> >
> > I don't think that syncrepl is a benefit even with 80 000 users
> > currently. If time tells that syncrepl is much more robust than slurpd
> > (which could be the case because the architecture is more robust) I am
> > willing to have a look at it for this single reason.

In our experience with a smaller LDAP deployment (30 000 entries), 
syncrepl is much more robust than slurpd.

> > In short the amount of LDAP data is small, mostly read and seldom written
> > and is required in all  locations with the current Kolab model.
>
> I agree that time is needed to evaluate which replication method is
> more robust; however, I really don't like that slurpd's only interaction
> with slapd is reading replication log files.

The real problem with slurpd is dealing with reject files: whenever a change 
is rejected (for whatever reason), the reject file must be processed manually.
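
As a rough illustration of that manual step, the sketch below summarises a 
reject file by error reason before anyone decides how to fix it. It assumes 
each rejected change begins with an "ERROR: <reason>" line; the function name 
and the file path are placeholders, not anything Kolab ships.

```shell
# Summarise a slurpd reject file by error reason, most frequent first.
# Assumption: every rejected change starts with an "ERROR: <reason>" line.
summarise_rejects() {
    grep '^ERROR:' "$1" | sort | uniq -c | sort -rn
}

# Usage (the file name is a placeholder):
#   summarise_rejects /path/to/replica.rej
# After fixing the cause on the slave, the file can be replayed with
# slurpd in one-shot mode:
#   slurpd -r /path/to/replica.rej -o
```

None of this is automatic, which is the point: somebody has to look at the 
file, fix the cause and replay it by hand.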

> I would like to take the opportunity to understand and discuss some
> Kolab design aspects that I don't understand and that I consider
> limitations for the project on its way to enterprise markets.
>
> >> This would give many improvements:
> >> - security
> >>   With syncrepl it is possible to specify what has to be replicated
> >> and where.
> >
> > Currently Kolab needs all data anyway.
>
> Let's discuss why and how we should reduce/rationalize the data needed
> for kolab.
>
> >>    It should be possible to replicate to slave server B only the users
> >> that have KolabHomeServer: B .
> >
> > No, e.g. for public folder access control all users are required on all
> > servers.
>
> OK, I understand the need.
>
> I'm wondering whether there is some way to reduce the LDAP data
> replicated to the slave servers.
>
> Would it be possible to introduce LDAP referrals for "non-local users",
> or to copy not all the attributes but only the CN and the data needed
> for public folder ACL management?
>
> >>    Or the whole LDAP database could be replicated, but without the
> >> "password" for "non-local users".
> >
> > If you don't trust in the physical security of Kolab servers you are in
> > trouble anyway.
> >
> > In general you may though avoid the password hashes entirely when relying
> > on a third party authentication mechanism. (SASL is your friend.)
>
> Consider the traditional scenario of a big organization (a bank, a postal
> service, or any other with a HQ and branch offices of 5 to 20
> people).
> In a Microsoft scenario each branch office has a BDC along with a small
> Exchange server.

A BDC holds full password information (ignoring domain trusts for now).

> Does the HQ trust the IT staff at the branch office, and would it give
> them complete directory access? No!
> Does the HQ trust the physical security at the branch office? No!
>
> So why should they replicate the whole directory, along with the
> passwords and other sensitive data, to the branch office?
>
> IMHO this is a real scenario that would severely limit the introduction
> of Kolab in the enterprise (big organizations) market.
>
> >> - network performance
> >>   Only the data needed to allow a slave server to work should be
> >> replicated.
> >
> > I don't buy this mainly because the amount of data for LDAP replication
> > is negligible.
>
> When OpenLDAP starts generating 2.0 GB of transaction logs for a 300 MB
> database, and these are replicated between all Kolab servers, the network
> performance problem will show up.

Transaction logs are not transferred. Only the changes that arrive in 
slurpd.replog for the specific server are replicated. Run slapcat on the 
database to get a better idea of how much data is replicated (probably ~50 MB).
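
A small sketch of that check: the helper below reads an LDIF dump on standard 
input and reports how many entries and bytes it contains. The function name 
and the example suffix are made up for illustration.

```shell
# Count entries and bytes in an LDIF stream read from stdin.
ldif_stats() {
    tmp=$(mktemp) || return 1
    cat > "$tmp"
    entries=$(grep -c '^dn:' "$tmp")          # one "dn:" line per entry
    bytes=$(wc -c < "$tmp" | tr -d ' ')
    rm -f "$tmp"
    echo "$entries entries, $bytes bytes"
}

# Usage on the LDAP server (the suffix is a placeholder):
#   slapcat -b "dc=example,dc=com" | ldif_stats
```

The byte count is an upper bound on what a full replica needs, and it is 
usually far smaller than the Berkeley DB files plus their transaction logs.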

> I don't know whether this could be solved by removing the OpenLDAP
> functionality that creates log.xxxxxx inside the openldap-data directory;

This is not OpenLDAP functionality but Berkeley DB functionality, and you 
can configure the following (via the DB_CONFIG file in the database directory):
- location of transaction log files
- size of transaction log files
- automatic removal of transaction log files

You should also tune the amount of memory used for the Berkeley DB database 
cache.
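
As a hedged example, the function below writes a DB_CONFIG covering the three 
log settings and the cache size. All values and the log directory are 
illustrative assumptions, not recommendations, and DB_LOG_AUTOREMOVE is the 
flag name used by the Berkeley DB 4.2-4.4 era releases; check the 
documentation for the version you actually run.

```shell
# Write an example DB_CONFIG into the given database directory.
# Every value below is an assumption chosen for illustration.
write_db_config() {
    dbdir=$1
    cat > "$dbdir/DB_CONFIG" <<'EOF'
# Database cache: 0 GB + 64 MB, in a single segment
set_cachesize 0 67108864 1
# Keep transaction logs outside the data directory (hypothetical path,
# the directory must already exist)
set_lg_dir /var/lib/ldap-logs
# Maximum size of each log.* file: 10 MB
set_lg_max 10485760
# Remove log files that are no longer needed
set_flags DB_LOG_AUTOREMOVE
EOF
}

# Usage (the path is a placeholder): write_db_config /var/lib/ldap
```

Berkeley DB reads DB_CONFIG when the environment is opened, so expect to stop 
slapd before changing it.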

> however, after the 
> import of 78k users OpenLDAP created more than 2 GB of transaction logs,
> which were part of the replica and caused network congestion in the
> infrastructure I was setting up.

No, you are mistaken. The transaction logs belong to the database backend and 
have no impact on slurpd. They will also be present when using syncrepl.

Also, for bulk loading, you can disable transactions.
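
One way to do this, sketched under the assumption of OpenLDAP 2.3 or later, 
is slapadd's -q (quick) mode. The wrapper name, the DRY_RUN switch and the 
default config path are my own illustration. On older releases, setting the 
Berkeley DB flag DB_TXN_NOSYNC in DB_CONFIG for the duration of the import is 
a commonly suggested alternative.

```shell
# Bulk-load an LDIF file offline with slapadd in quick mode.
# With DRY_RUN set, the command is printed instead of executed.
bulk_load() {
    ldif=$1
    conf=${2:-/etc/openldap/slapd.conf}   # assumed default location
    # -q trades safety for speed: if the load is interrupted, the
    # database must be wiped and loaded again.
    ${DRY_RUN:+echo} slapadd -q -f "$conf" -l "$ldif"
}

# Usage, with slapd stopped (the file name is a placeholder):
#   DRY_RUN=1 bulk_load users.ldif
```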

> For this reason, replicating only the needed data, carefully selected,
> would dramatically improve network performance.

This conclusion is based partially on incorrect assumptions.

>
> >> - cyrus performance
> >>   Only mailboxes of local users should be created.
> >
> > For proper ACLs it is beneficial if the cyrus imapd knows about all
> > users. Performance should not be affected by this. Can you provide any
> > measurements?
>
> When you have a lot of users, the:
> - creation of thousands of unused mailboxes
> - verification and setup of all ACLs for each mailbox at each kolabd restart
> - each kolabd LDAP-related activity that needs to crawl the directory
> (like transport/virtual postfix map creation)
>
> would create severe performance problems, because kolabd and cyrus
> have to do *a lot* of unneeded work.
>
> Crawling the LDAP directory, loading 78k users into memory, creating
> mailboxes and comparing ACLs for each user (even when not needed because
> the user is not local) causes, trust me, severe performance
> problems.

But isn't this just a design issue on the Kolab side? Retrieving all 30 000 
entries from our OpenLDAP servers takes a few seconds:

$ time ldapsearch -x -b dc=cybertrade,dc=co,dc=za,dc=isp \
    -D cn=Manager,dc=telkomsa,dc=net -w $rootpw -z0 -l0 >/dev/null

real    0m2.211s
user    0m1.643s
sys     0m0.340s

$ time ldapsearch -x -b dc=cybertrade,dc=co,dc=za,dc=isp \
    -D cn=Manager,dc=telkomsa,dc=net -w $rootpw -z0 -l0 | grep ^dn | wc -l
30045

real    0m9.960s
user    0m12.274s
sys     0m0.233s

> I don't think that Microsoft Exchange and Active Directory replicate all
> mailboxes along with all LDAP objects to every slave server.

AFAIK this depends on the scenario.

> For performance reasons only "local mailboxes" and only "needed local LDAP
> objects" should be replicated from the master server.
>
> IMHO Kolab should go that way, at least for cyrus mailboxes.
>
> If there is a valid reason for Kolab to replicate all cyrus mailboxes
> across all slave servers, let me know; I'm quite new to the
> Kolab project and don't know all the design decisions that were made in
> the past.
>
> >> - Kolab design simplicity enhancements
> >>    Slurpd should be used only for kolabd notification and not for
> >> replication, leaving that task to the more feature-rich syncrepl.
> >
> > How will this simplify the Kolab design?
>
> When you manage a huge database you should really reduce the amount of
> data that each component manages.
> Slurpd doesn't allow careful selection of what needs to be replicated,
> whereas syncrepl does.
>
> Additionally, we all know that OpenLDAP is not a robust product; it often
> crashes because of misconfiguration or data integrity problems.

OpenLDAP is stable; data integrity is really only affected by whether database 
recovery is run when appropriate. Some locking can occur (which may look like 
instability to some) if you don't tune the database backend correctly.

> Having two 
> processes (slapd + slurpd) means having to recover from crashes of two
> different processes instead of one.
>
> In the infrastructure I set up, slurpd has some stability problems and we
> need to restart it after "unknown" crashes, otherwise replication will not
> work properly.
>
> And when you have inconsistencies with slurpd, you are in for
> trouble and have to do some manual and non-intuitive work to recover the
> situation.

I agree on this point. Recovery with syncrepl is much easier: at worst you 
can remove the entire database on the slave, start slapd on a different port 
with logging at sync, and wait for it to finish syncing ('slapd -h 
ldap://localhost:3389 -u ldap -g ldap -d stats' or similar).
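
Spelled out as a sketch, the procedure might look like the function below. It 
only prints the commands unless APPLY=1 is set, it moves the old database 
aside instead of deleting it (my choice, to be safe), and the function name, 
the ldap user/group and the directory layout are assumptions about a typical 
install.

```shell
# Re-initialise a syncrepl slave from scratch. slapd must already be
# stopped. Commands are only printed unless APPLY=1 is set.
resync_slave() {
    dbdir=$1
    if [ "${APPLY:-0}" = 1 ]; then run=; else run=echo; fi
    $run mv "$dbdir" "$dbdir.old"             # keep the old data for now
    $run mkdir "$dbdir"
    $run cp "$dbdir.old/DB_CONFIG" "$dbdir/"  # keep the BDB tuning
    $run chown -R ldap:ldap "$dbdir"
    # Runs in the foreground on a spare port until interrupted.
    $run slapd -h ldap://localhost:3389 -u ldap -g ldap -d stats
}

# Usage (the path is a placeholder): resync_slave /var/lib/ldap
```

Once the log shows that the initial sync has finished, stop this instance and 
start slapd normally on its usual port.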

> Those are the main reasons that made me work on syncrepl as an
> alternative for data replication.

With syncrepl it is also possible to monitor the replication status: not 
whether the last replication attempt worked, but whether the slave is actually 
in sync with the master.
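
A minimal sketch of such a check, assuming OpenLDAP 2.3, where the provider 
and consumer each keep a contextCSN value on the suffix entry and your ACLs 
allow reading it. The function names, host names and suffix are placeholders.

```shell
# Fetch the contextCSN of a suffix from one server.
# $1 = LDAP URI, $2 = suffix
get_csn() {
    ldapsearch -x -LLL -H "$1" -s base -b "$2" contextCSN |
        sed -n 's/^contextCSN: //p'
}

# Compare two CSN values and report the replication state.
csn_status() {
    if [ "$1" = "$2" ]; then echo "in sync"; else echo "out of sync"; fi
}

# Usage (host names and suffix are placeholders):
#   csn_status "$(get_csn ldap://master.example.com dc=example,dc=com)" \
#              "$(get_csn ldap://slave.example.com dc=example,dc=com)"
```

A slave that is briefly behind a busy master is normal. What matters for 
monitoring is whether the values converge again.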

However, I wonder if the use of sync-repl may simplify the kolabd issues ...

Regards,
Buchan

-- 
Buchan Milne
ISP Systems Specialist
B.Eng,RHCE(803004789010797),LPIC-2(LPI000074592)