

Archives

Why does a message get split into multiple messages with no headers?

If you are processing UUCP mailbox files, messages are separated by a line starting with "From " (i.e., the word "From" followed by a space). Some mail software will prefix such lines in message bodies with a `>' to prevent MUAs from incorrectly treating the line as a message separator. However, some mail software doesn't.

To avoid incorrect separator detection, many MUAs apply a stricter test for separators than just "From ". MHonArc, by default, will treat any line starting with "From " as a message separator, which can lead to incorrect message termination if a "From " line in a message body has not been escaped with a `>'.

To fix the problem, use the MSGSEP resource to instruct MHonArc to use a stricter test for detecting a message separator. The following MSGSEP resource setting is known to work well:

<MsgSep>
^From \S+\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+
</MsgSep>
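
To see what the stricter pattern buys you, here is a small sketch in Python (used only for illustration; MHonArc itself is Perl). It applies the pattern from the MSGSEP setting, copied unchanged, and a loose "From " test to a made-up mailbox. The sample addresses, the names STRICT_SEP, LOOSE_SEP and count_messages are all hypothetical.

```python
import re

# The pattern from the MSGSEP setting, copied unchanged.
STRICT_SEP = re.compile(r"^From \S+\s+\S+\s+\S+\s+\d+\s+\d+:\d+:\d+\s+\d+")
# A loose test that only checks for "From " at the start of a line.
LOOSE_SEP = re.compile(r"^From ")


def count_messages(mbox_text, separator):
    """Count the lines in mbox_text that the pattern treats as separators."""
    return sum(1 for line in mbox_text.splitlines() if separator.match(line))


# Hypothetical mailbox: ONE message whose body has an unescaped "From " line.
SAMPLE = "\n".join([
    "From ehood@example.com Fri Jun  6 19:47:51 1997",
    "Subject: test",
    "",
    "From what I can tell, the archive is fine.",
    "",
])

if __name__ == "__main__":
    print("loose test sees", count_messages(SAMPLE, LOOSE_SEP), "message(s)")
    print("strict test sees", count_messages(SAMPLE, STRICT_SEP), "message(s)")
```

The loose test splits the single message in two at the body line; the strict test only accepts the line that also carries a sender, a date and a time.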

If this fails, you can try the CONLEN resource available in v2.0. The CONLEN resource, when set, tells MHonArc to utilize the Content-Length fields in the message head. If your MTA defines this field accurately (sendmail on Solaris does), then you can utilize this feature.
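
The idea behind splitting on Content-Length can be sketched as follows. This is Python for illustration only and is not MHonArc's actual CONLEN code; it assumes every message carries an accurate Content-Length field giving the body size in bytes, and the helper names and sample addresses are hypothetical.

```python
def split_by_content_length(data):
    """Split mailbox bytes into messages using Content-Length fields.

    Assumes each message has a header block ended by a blank line and an
    accurate Content-Length field (body size in bytes).
    """
    messages = []
    pos = 0
    while pos < len(data):
        head_end = data.find(b"\n\n", pos)
        if head_end == -1:
            break  # no complete header block left
        length = None
        for line in data[pos:head_end].split(b"\n"):
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.strip())
        if length is None:
            raise ValueError("message without a Content-Length field")
        body_end = head_end + 2 + length
        messages.append(data[pos:body_end])
        # Skip the blank line(s) between this message and the next one.
        pos = body_end
        while data[pos:pos + 1] == b"\n":
            pos += 1
    return messages


def make_message(sender, body):
    """Build a hypothetical mbox message with a correct Content-Length."""
    head = b"From " + sender + b" Fri Jun  6 19:47:51 1997\n"
    head += b"Content-Length: " + str(len(body)).encode() + b"\n\n"
    return head + body


if __name__ == "__main__":
    mailbox = (make_message(b"a@example.com", b"From here on, all is well.\n")
               + b"\n"
               + make_message(b"b@example.com", b"Second body.\n"))
    print(len(split_by_content_length(mailbox)), "messages found")
```

Because the body is skipped by byte count, the unescaped "From " line in the first body is never examined as a separator. The flip side is that a wrong Content-Length value derails everything after it, which is why the field must be accurate.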

Can I move a message from one archive to another?

No. In order to achieve the same effect, you must add the original, unprocessed, message to the destination archive, then remove the appropriate HTML version of the message from the source archive.

Can I reconstruct a database from the HTML messages?

Yes. The following was contributed by Stephane Bortzmeyer:

... some text deleted ...

Having rmed my database :-( I had to write such a program. I include it at the end,
it seems quite simple, while necessiting a few text edition after (you just have to
include the output of my program in an empty database).


#!/usr/local/bin/perl

require 'timelocal.pl';
require '/web/mail/MHonArc/lib/mhutil.pl';
require '/web/mail/MHonArc/lib/mhtime.pl';

$dir = shift (@ARGV);

opendir (DIR, "$dir") || die "Cannot open $dir: $!";
while ($file = readdir (DIR)) {
    if ($file =~ /^msg([0-9]+)\.html$/) {
        $no = $1;
        %headers = ();   # forget the headers of the previous message
        open (FILE, "< $dir/$file") || die "Cannot open $file: $!";
        while (<FILE>) {
            chop;
            if (/^<!--X-([^:]*): (.*)-->$/) {
                $headers{$1} = $2;
                $headers{$1} =~ s/ *$//;
            }
        }
        close (FILE);
        @date = &parse_date ($headers{'Date'});
        $date = &get_time_from_date ($date[1], $date[2], $date[3], $date[4], $date[5], $date[6]);
        $id = "$date $no";
        print STDERR "Message $id:\n";
        foreach $header (keys (%headers)) {
            print STDERR "$header: $headers{$header}\n";
            $name = $header;
            $name =~ s/-//;
            $$name{$id} = $headers{$header};
        }
    } 
}
closedir (DIR);
print "%ContentType = (\n";
foreach $key (keys (%ContentType)) {
    print "\'$key\', \'$ContentType{$key}\',\n";
}
print ");\n";
print "%Date = (\n";
foreach $key (keys (%Date)) {
    print "\'$key\', \'$Date{$key}\',\n";
}
print ");\n";
print "%From = (\n";
foreach $key (keys (%From)) {
    print "\'$key\', \'$From{$key}\',\n";
}
print ");\n";
print "%MsgId = (\n";
foreach $key (keys (%MessageId)) {
    print "\'$key\', \'$MessageId{$key}\',\n";
}
print ");\n";
print "%Subject = (\n";
foreach $key (keys (%Subject)) {
    print "\'$key\', \'$Subject{$key}\',\n";
}
print ");\n";
print "%IndexNum = (\n";
foreach $key (keys (%MessageId)) {
    ($garbage, $num) = split (' ', $key);
    print "\'$key\', \'$num\',\n";
}
print ");\n";

Is it safe to add messages to an archive as they are received?

Yes. MHonArc performs archive locking to prevent multiple MHonArc processes from writing to an archive at the same time. This locking allows MHonArc to be used safely to add messages as they are received.
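
For readers curious how this kind of locking can work in general, here is a minimal sketch in Python (for illustration only). It assumes a lock taken by creating a directory, which is one common technique; it is not taken from MHonArc's source, and the lock name, retry count and delay are hypothetical.

```python
import os
import tempfile
import time
from contextlib import contextmanager


@contextmanager
def archive_lock(outdir, name="archive.lck", tries=10, delay=0.1):
    """Hold an exclusive lock on outdir by creating a lock directory.

    os.mkdir either creates the directory or fails, so only one process
    can succeed at a time. Name and retry values are hypothetical.
    """
    lock_path = os.path.join(outdir, name)
    for _ in range(tries):
        try:
            os.mkdir(lock_path)
            break
        except FileExistsError:
            time.sleep(delay)  # another writer holds the lock; retry
    else:
        raise TimeoutError("could not lock " + outdir)
    try:
        yield lock_path
    finally:
        os.rmdir(lock_path)  # release the lock even if the update failed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as outdir:
        with archive_lock(outdir) as lock_path:
            print("locked:", os.path.isdir(lock_path))
        print("released:", not os.path.exists(lock_path))
```

A second process that arrives while the lock is held simply waits and retries, so two deliveries arriving close together update the archive one after the other.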

So it is safe. How do I do it?

The following example assumes you are on a Unix system using sendmail as the mail transfer agent. Please refer to documentation about sendmail if you are not familiar with it (sendmail, 2nd edition, from O'Reilly is an excellent source).

The approach shown here uses a .forward file in the home directory of the account whose mail you want archived. For this example, let's assume it is my account. Here is how to set up the .forward file to invoke MHonArc on incoming mail:

\ehood, "|/home/ehood/bin/webnewmail #ehood"

NOTE on .forward entry:

The "\ehood" tells sendmail to still deposit the incoming message in my mail spool file. The "#ehood" Bourne shell comment is needed to ensure the command is distinct from any other user's command. Otherwise, sendmail may not invoke the program for you or for the other user.

webnewmail is a Perl program that calls MHonArc with the appropriate arguments. A wrapper program is used instead of calling MHonArc directly to keep the .forward file simple, but you can call MHonArc directly if you want. Here is the code to the webnewmail program:

#!/usr/local/bin/perl
# Edit above path to point to where perl is on your system.

##	Specify a package to protect names from MHonArc.
##	MHonArc uses package main for most stuff; a minor
##	inconvenience.

package webnewmail;

##	Edit to point to installed mhonarc.

$MHonArc = "/home/ehood/bin/mhonarc";

##	Define ARGV (ARGV is same across all packages).
##	Edit options as required/desired.

@ARGV = ("-add",
	 "-quiet",
	 "-outdir", "/home/ehood/public_html/newmail");

##	Just require mhonarc, this prevents the overhead of a
##	fork/exec.  We reset the namespace to main just in-case.

package main;
require $webnewmail'MHonArc;
	# Or, $webnewmail::MHonArc (Perl 5 style)

The webnewmail program has to have the executable bit set. This is achieved by using "chmod a+x webnewmail".

Can I get MHonArc to filter messages to different archives?

No. This is outside MHonArc's scope. You can write your own filter, using the method described in the previous question, to scan the message header and invoke MHonArc with the proper arguments. Or you can use a tool like Procmail (http://www.ii.com/internet/robots/procmail/). Here are some messages from users about using Procmail:

... some text deleted ...

Here is what I use in .procmailrc to archive the mhonarc list:

NEWDATE="`/usr/bin/date +%Y-%m`"
MHONARC_MBOX="/local/mail/lists/mhonarc/$NEWDATE.mbox"
:0: $MHONARC_MBOX$LOCKEXT
* ^Sender:.*owner-mhonarc@
{
        :0 c
        $MHONARC_MBOX

        :0 c
        | /local/mail/mhonarc-1.2.2/mailarchive -add mhonarc "$NEWDATE"
}

Mailarchive is nothing more than a wrapper around mhonarc with my long
list of options.

Achim
P.S. Procmail itself comes with an example manual page. It's worth
     looking into it.

You can actually dispense with the wrapper if you use environment
variables to pass options to MHonArc, but I'm sure Achim has a good
reason for doing it his way.  Just for the purposes of comparison,
here's how I do it:

eeeweb% cat .procmailrc
#Set on when debugging
VERBOSE=off
#Replace `mail' with your mail directory (Pine uses mail, Elm uses Mail)
MAILDIR=$HOME/Mail
#Directory for storing procmail log and rc files
PMDIR=$HOME/.procmail
#Path and options for mhonarc
MHONARC='/dcs/packages/infosys/bin/mhonarc -add -quiet -umask 022 -idxfname index.html'
:0
* ^Originator:.*@classes.uci.edu
{
  MHHOME=$HOME/classarc
  LOGFILE=$PMDIR/classlists.log
  INCLUDERC=$PMDIR/rc.classlists
}
:0 E
{
  MHHOME=$HOME/mail-arc
  LOGFILE=$PMDIR/otherlists.log
  INCLUDERC=$PMDIR/rc.otherlists
}

and then in the file .procmail/rc.classlists or rc.otherlists (depending
on the Originator: of the message), lots of the following:

# Procmail Entry for uci-www
:0 E
* ^TOuci-www
{
  :0 c
  uci-www/.

  :0
  |$MHONARC -rcfile $MHHOME/uci-www/0-rcfile.html -outdir $MHHOME/uci-www
}

Eric D. Friedman
friedman@uci.edu
... some text deleted ...

I use procmail to drive mhonarc archives from Majordomo.  I set up a
single pseudouser and drive several archives from the one pseudouser. 

Here's a sample .forward file:

"|/usr/ucb/rsh cappuccino \"set IFS=' '; exec /usr/local/procmail/bin/procmail #widget\""

Another example is:

"|/bin/csh -c \"set IFS=' '; exec /usr/local/procmail/bin/procmail #widget\""

Two reasons to use the "rsh cappuccino":
1. doesn't require the user to be able to login to server, although
   the username must still be valid
2. gets the processing load off the mail server

Here's an example .procmail recipe:

LOGFILE=$HOME/procmail_errors
LOGABSTRACT=all
LOCKEXT=.lock
VERBOSE=on
UMASK=003

# widget: list short description
:0 H
* ^List-Name: widget
{
  # The rotate call (under construction) does archive rotation
  # leave commented!
  #:0c i
  #| /home/web-arch/bin/rotate /usr/local/web/webarchive/widget

  # Put the mail in the mailbox, which is used by archiver to re-generate
  # the html indexes
  :0 cA
  /usr/local/web/webarchive/widget/current/mbox

  # The mhonarc call examines mbox, turns the mail messages into .html
  # documents, and compiles the indexes.
  # -reverse -treverse\
  :0 ia
  | /usr/local/mhonarc/bin/mhonarc \
    -idxfname index.shtml \
    -tidxfname threads.shtml \
    -rcfile widget.rc \
    -outdir /usr/local/web/webarchive/widget/current \
    /usr/local/web/webarchive/widget/current/mbox

}

I have a directory per archive, and put the current period in directory
"current".  Then I have an index page per archive that indexes the
periods, plus gives information about the list and how to
subscribe/unsubscribe.  The widget.rc file resides in the pseudouser's
home directory.

Note the 
* ^List-Name: widget
I put the following in the majordomo list's config file:

message_headers   <<  END
List-Name: widget
END

This adds the "List-Name" header to messages, which is what procmail
filters for.

Hope this helps

Paul McKinley
Unix SysAdmin Contractor

Is it safe to specify -add when no archive exists?

Yes. If MHonArc sees that no archive exists when performing an add, it will automatically create the archive.

WARNING

Make sure the file maillist.html (or the file named by the IDXFNAME resource) does not exist if no archive exists and -add has been specified. If such a file exists and is not in the proper format, the resulting maillist.html may be unpredictable.

Why are there "jumps" in message numbers?

Big gaps in the message number sequence may occur if you have defined the MAXSIZE resource and you have MHonArc rescan a mail folder to add new messages. The problem occurs when MHonArc reads in messages that will automatically get deleted due to MAXSIZE; the messages subject to automatic deletion are the oldest ones. If the input contains old messages that will get deleted at the end of processing, those messages still use up message numbers, since the messages to be deleted are not determined until all input has been read. Since MHonArc does not keep information about deleted messages, the "jumping" will occur again if the messages are fed into MHonArc again (and the jump will get larger with each additional update).
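
The effect is easy to reproduce with a toy model. The sketch below is Python for illustration only and is not MHonArc code: it assumes numbers are handed out as messages are read, that messages already in the archive are skipped, and that trimming to the size limit happens only after all input is read. The function and variable names are hypothetical.

```python
def rescan(archive, next_num, folder, maxsize):
    """Feed a whole folder to a toy archive; return the next free number.

    archive maps message-id -> (date, message number). New messages get a
    number on input; only afterwards are the oldest ones dropped.
    """
    for msgid, date in folder:
        if msgid in archive:
            continue  # already archived, no number consumed
        archive[msgid] = (date, next_num)
        next_num += 1
    # Trim after all input has been read, oldest date first.
    for msgid in sorted(archive, key=lambda m: archive[m][0]):
        if len(archive) <= maxsize:
            break
        del archive[msgid]
    return next_num


def numbers(archive):
    """Return the message numbers currently in the archive, sorted."""
    return sorted(num for _, num in archive.values())


if __name__ == "__main__":
    archive, next_num = {}, 0
    folder = [("m%d" % i, i) for i in range(1, 6)]   # five messages
    for new in (None, "m6", "m7"):
        if new:
            folder.append((new, len(folder) + 1))    # one new message arrives
        next_num = rescan(archive, next_num, folder, maxsize=3)
        print(numbers(archive))
```

With a limit of three messages, the first run leaves numbers 2, 3, 4. Each later run re-reads the old, already deleted messages, gives them fresh numbers, and deletes them again, so the one genuinely new message lands further and further from its predecessor.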

To avoid the problem, try to pass only new, never processed, messages to MHonArc instead of having MHonArc rescan the same mail folder for new messages. Another approach is to set either the EXPIREAGE or the EXPIREDATE resource (available in v2.0 beta 2, or later). These work as an alternative to MAXSIZE and help prevent message number jumping, since expiration of a message is checked when it is initially read (bypassing the assignment of a message number).

Why do some messages get re-added each time MHonArc processes a mail folder?

This condition may occur when you have MHonArc examine the same folder periodically to add any new messages. If there are messages in the folder without message-ids, then those messages will be re-added each time MHonArc runs.

Why? Well, MHonArc uses message-ids for determining if a message has been archived, or not. Therefore, if a message-id is missing for a message, then MHonArc believes it is new.

In general, mail has message-ids. They get assigned by MTAs. However, if messages are generated by a CGI program, or other non-mail specific software, then the program in question should create a message-id. Otherwise, you will need to move already-processed messages into a different area so MHonArc does not read them again.
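
The behaviour can be sketched with a toy duplicate check. This is Python for illustration only, not MHonArc's code; it assumes nothing beyond what is described above, namely that a message counts as archived only if its message-id has been seen. The function name and sample data are hypothetical.

```python
def add_new_messages(archived_ids, folder):
    """Return the subjects of messages that an id-based check treats as new.

    folder is a list of (message_id, subject) pairs; message_id is None
    when the header is missing. archived_ids is updated in place.
    """
    added = []
    for msgid, subject in folder:
        if msgid is not None and msgid in archived_ids:
            continue  # id already seen: message is known to be archived
        added.append(subject)
        if msgid is not None:
            archived_ids.add(msgid)
    return added


if __name__ == "__main__":
    folder = [("<1@example.com>", "normal mail"),
              (None, "generated by a CGI form")]
    seen = set()
    print("first run :", add_new_messages(seen, folder))
    print("second run:", add_new_messages(seen, folder))
```

On the second pass over the same folder, the message with an id is skipped, while the one without an id has nothing to be matched against and is added again.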




97/06/06 19:47:51
MHonArc
Copyright © 1997, Earl Hood, ehood@medusa.acs.uci.edu