Pruning the dbmail_headervalue table

Saturday, February 13. 2010

In DBMail 2.2 the dbmail_headervalue table will contain all header-values for all messages in store. Even for medium-sized installations this can easily result in a very large table.

However, since that table is just about only used for IMAP search you might consider dropping a couple of headernames from the cache.

For example:

DELETE FROM dbmail_headername WHERE headername = 'received';
will delete from dbmail_headervalue all entries regarding the Received header.

To determine which headernames should be dropped you could use a view like this:

CREATE VIEW header_count AS 
    SELECT count(1) AS count, n.id, n.headername 
    FROM dbmail_headervalue v 
    LEFT JOIN dbmail_headername n ON v.headername_id=n.id 
    GROUP BY n.id;

After that you can do:

SELECT * FROM header_count ORDER BY count; 
and delete all headernames from dbmail_headername for those headers you deem unlikely to ever be used in IMAP search.
DELETE FROM dbmail_headername WHERE headername = 'Received';

Doing this for a couple of the most prolific - but un-used headers - will drastically reduce the size of the dbmail_headervalue tables. But remember you will have to keep an eye on the header_count, and re-issue the delete queries on a regular basis.

A RESTful interface is coming

Friday, July 3. 2009

Again and again, dbmail administrators ask us about the best way for their application to talk to dbmail. And our answer has always been the same: talk IMAP.

However, IMAP is a notorious protocol for many. Not because it is inherently evil, but because creating a parser for imap formatted data is not a trivial thing. It's simply a lot of work, and easy to get wrong. It's also top-heavy; a lot of the capabilities in IMAP are simply not needed at all by your typical webapp developer.

So the question was: how can we design a solution that will allow an application developer to quickly integrate with dbmail, using standard tools and languages.

Being forward compatible could be considered a secondary, but no less important requirement. The chosen approach - short of IMAP - often comes as custom build queries talking directly with the database backend. Few if any who walked this path, spent any effort on supporting database servers other than the one they use. This does not make for long-term viability, and does not support the community at large. Also, dbmail's schema has and will - quite possibly - change. Retrieving a message from the database was simple, but is no longer. Finally, not being able to leverage the scalability features included in the recent releases means you have to worry about a denial of service on your database server when the number of visitors of your webapp increases.

So enters a solution: dbmail-httpd. A simple event-driven daemon that will expose the object model of dbmail through a RESTful interface. A php5 userland module will also be provided both to test the interface and to demonstrate the power of this approach. Where possible, requested data will be returned as JSON. This makes parsing completely trivial of course.

It is our hope and expectation that this will drive development for a new class of webmail interfaces built on top of dbmail, but also enable construction of a new - and native - administration interface that will not require additional hosting sit-ups.

so stay tuned for more.

DBMail 2.3.6 released

Friday, July 3. 2009

I've just released dbmail 2.3.6, the latest development release.

There are still some rough edges in the packaging and documentation, but otherwise the code is approaching productionlevel quality.

New features in this release:

Single-instance header storage

The header caching tables used since 2.2 have been replaced with a new schema, optimized for a much smaller storage footprint, and therefor faster access. Headers are now cached using a single-instance storage pattern, similar to the one used for the message parts. This change also introduces for the first time the appearance of views in the database, which is somewhat experimental because of some uncertainties with regard to the possible performance impact this may have.

Authentication logging

A new table was added to the schema to log a couple of key metrics for users connecting to one of the daemons.

Storage migration

dbmail-util now supports migrating your old content into the single-instance storage.

Of course, a number of bugs have also been fixed along the way:

  • 0000689: [Command-Line programs (dbmail-users, dbmail-util)] dbmail-exports fails with File size limit exceeded (paul) - resolved.
  • 0000775: [PIPE delivery (dbmail-smtp/dbmail-deliver)] Issue with multiple inline attachments (paul) - resolved.
  • 0000783: [General] Boundary missing in message construction (paul) - resolved.
  • 0000681: [General] message reconstruction fails on message (paul) - resolved.
  • 0000774: [IMAP daemon] SQLException using dbmail-imapd - resolved.
  • 0000766: [POP3 daemon] dbmail-pop3d crash (paul) - resolved.
  • 0000754: [General] single instance storage for headervalues (paul) - resolved.
  • 0000760: [LMTP daemon] DNS Regresion in 2.3.5 (netvulture) - resolved.
  • 0000743: [LMTP daemon] Memory leak in lmtpd (paul) - resolved.
  • 0000755: [POP3 daemon] POP3D crash when fetchmail tries to connect (paul) - resolved.
  • 0000720: [Command-Line programs (dbmail-users, dbmail-util)] Missing operations on dbmail-util (paul) - resolved.

Changelog

Download

Interfacing with the DBMail database

Tuesday, May 20. 2008

A typical question that pops up now and then is about direct database access:

I'm looking for an application to help me save emails to a Database. I read about your email solution, DBMail, and it looks really good. I already have a mail server I'm using for my webmail, but my question is whether it would be possible to setup logging of emails to a database using my current mail server and DBMail.

DBMail uses a database to store its messages. Currently PostgreSQL, MySQL and Sqlite are supported. The intent of the database backend is to provide speed, scalability and integrity in storage. The database backend is not especially suited for direct access. The database schema is heavily normalized and contains numerous indexes and caching tables for speed, as well as trigger logic to ensure data integrity.

It's best to let DBMail manage the database contents and do message storage and retrieval through the appropriate mail protocols (LMTP for storage, POP3 or IMAP4 for retrieval). An additional advantage of this approach is, you can swap in or out any mail server under your webmail scripts layer. These protocols are widely used and well understood.