MHonArc Resource List

CHARSETCONVERTERS


Syntax

Envariable

N/A

Element

<CHARSETCONVERTERS>
charset-filter-specification
</CHARSETCONVERTERS>

Command-line Option

N/A


Description

The CHARSETCONVERTERS resource specifies the Perl routines that convert the characters of a given character set into characters that are legal in HTML. The conversion applies to message header data encoded according to the MIME standard for non-ASCII header text (RFC 2047). The following example shows headers containing encoded data:

From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
CC: =?ISO-8859-1?Q?Andr=E9_?= Pirard <PIRARD@vm1.ulg.ac.be>
Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
 =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
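Each encoded-word above has the form =?charset?encoding?text?=, where the encoding is "Q" (quoted-printable style) or "B" (base64). The following Perl sketch decodes one such word into its charset name and raw bytes. It is an illustration only, not MHonArc's own decoder. decode_encoded_word is a hypothetical name, and the sketch assumes the MIME::Base64 module is available.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# Hypothetical helper: decode one encoded-word "=?charset?encoding?text?=".
# Returns (lowercased charset, raw bytes), or an empty list if $word
# is not an encoded-word.
sub decode_encoded_word {
    my ($word) = @_;
    my ($charset, $enc, $text) =
        $word =~ /^=\?([^?]+)\?([QqBb])\?([^?]*)\?=$/ or return;
    my $bytes;
    if (uc($enc) eq 'B') {
        $bytes = decode_base64($text);               # "B": base64
    } else {
        ($bytes = $text) =~ tr/_/ /;                 # "Q": '_' stands for a space
        $bytes =~ s/=([0-9A-Fa-f]{2})/chr(hex($1))/ge;   # =XX is a hex-encoded byte
    }
    return (lc($charset), $bytes);
}
```

For example, the To: header's word =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= yields the charset "iso-8859-1" and the bytes "Keld J\xF8rn Simonsen". The sketch handles a single word; adjacent encoded-words such as the two in the Subject: header are decoded one at a time.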

The CHARSETCONVERTERS resource can only be set in the resource file. Each line of the element specifies a character set, the Perl routine that filters data in that character set, and the Perl source file that defines the routine.

Example:

<CharsetConverters>
iso-8859-1:iso_8859'str2sgml:iso8859.pl
</CharsetConverters>

The fields are separated by colons. The first field is the character set specification. The second field is the routine name, which should include a package qualifier. The third field is the source file in which the routine is defined. The source file may be omitted if the routine is defined in a source file that is already listed.
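The three-field line format can be illustrated with a short Perl sketch that splits one line of the element into its parts. parse_converter_line is a hypothetical helper, not MHonArc's parser, and lowercasing the charset is an assumption of this sketch (MIME charset names are case-insensitive).

```perl
use strict;
use warnings;

# Hypothetical parser for one CHARSETCONVERTERS line:
#   charset:routine[:source-file]
sub parse_converter_line {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;                  # trim surrounding whitespace
    return if $line eq '';                     # skip blank lines
    my ($charset, $routine, $file) = split /:/, $line, 3;
    return {
        charset => lc $charset,                # assumption: compare charsets case-insensitively
        routine => $routine,                   # e.g. "iso_8859'str2sgml" or "-ignore-"
        file    => $file,                      # undef when the third field is omitted
    };
}
```

A line such as "default:-ignore-" has no third field, so its file entry is left undefined.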

NOTE

Package qualification must use Perl 4 syntax, i.e. an apostrophe, as in iso_8859'str2sgml. The '::' qualifier is not yet supported.
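Perl 5 traditionally accepts the apostrophe as an older spelling of the '::' package separator, so iso_8859'str2sgml and iso_8859::str2sgml name the same routine. The following sketch shows how a routine name taken from this resource could be mapped to a callable code reference. resolve_routine is a hypothetical helper, not part of MHonArc.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a Perl 4 style routine name such as
# "iso_8859'str2sgml" into a code reference, or undef if no such
# routine has been loaded.
sub resolve_routine {
    my ($name) = @_;
    (my $qualified = $name) =~ s/'/::/g;    # iso_8859'str2sgml -> iso_8859::str2sgml
    no strict 'refs';                        # allow lookup by symbolic name
    return defined(&{$qualified}) ? \&{$qualified} : undef;
}
```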

The following special character set specifications are recognized:

plain

This specifies text that is not explicitly encoded in a specific character set.

default

This specifies the default routine to invoke for encoded data if no entry exists for the data's character set.

The following special values may be given in place of a converter routine:

-ignore-

Leave the data as-is; that is, the MIME encoding is preserved.

-decode-

Decode the data without any further conversion. This is useful when the character set is known to be the native character set of the system.

WARNING

If the decoded data contains any of the characters '<', '>', or '&', it may conflict with HTML markup, since "-decode-" does not convert them. See DECODEHEADS, where "-decode-" can be used.
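The lookup rules above (a specific character set entry, then "default", and the special values "-ignore-" and "-decode-") can be summarized in a small Perl sketch. apply_converter and the table layout are hypothetical; in particular, leaving data unchanged when neither a specific entry nor "default" exists is an assumption of this sketch, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical dispatcher for one piece of header data.
#   $table   - hashref: lowercase charset => code ref, '-ignore-' or '-decode-'
#   $charset - character set named in the encoded data
#   $encoded - the original MIME-encoded text
#   $raw     - the same data already decoded to raw bytes
sub apply_converter {
    my ($table, $charset, $encoded, $raw) = @_;
    my $conv = $table->{lc $charset};
    $conv = $table->{default} unless defined $conv;    # fall back to "default"
    return $encoded unless defined $conv;               # assumption: nothing configured
    return $conv->($raw, $charset) if ref($conv) eq 'CODE';   # ordinary converter
    return $raw if $conv eq '-decode-';                 # decoded bytes, not HTML-escaped
    return $encoded;                                    # '-ignore-': keep the MIME encoding
}
```

Because the "-decode-" branch returns the raw bytes untouched, any '<', '>', or '&' in them reaches the HTML output unescaped, which is the conflict the warning describes.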

Each charset converter function is invoked as follows:

    $converted_data = &function($data, $charset);

The data passed in has already been decoded from quoted-printable or base64, as indicated by the MIME syntax, so the routine receives the raw byte data. The routine must convert the data into a form suitable for inclusion in HTML markup, for example by escaping '<', '>', and '&'.
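As a minimal sketch of a converter that follows this calling convention, the routine below only escapes the three characters that conflict with HTML markup. The package name my_conv, the routine str2html, and the file name myconv.pl are hypothetical; this is not the behavior of the standard iso_8859'str2sgml routine.

```perl
# myconv.pl: hypothetical converter file
package my_conv;
use strict;
use warnings;

# Convert raw header bytes into text that is safe inside HTML markup.
# Invoked as: $converted_data = &my_conv'str2html($data, $charset);
sub str2html {
    my ($data, $charset) = @_;   # $charset is not used by this sketch
    $data =~ s/&/&amp;/g;        # must come first so later entities are not re-escaped
    $data =~ s/</&lt;/g;
    $data =~ s/>/&gt;/g;
    return $data;
}

1;   # a file loaded with require must return a true value
```

Such a routine could then be registered with a resource entry like the following, using Perl 4 package qualification:

<CharsetConverters>
iso-8859-1:my_conv'str2html:myconv.pl
</CharsetConverters>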


Default Setting

<CharsetConverters>
plain:iso_8859'str2sgml:iso8859.pl
us-ascii:iso_8859'str2sgml:iso8859.pl
iso-8859-1:iso_8859'str2sgml:iso8859.pl
iso-8859-2:iso_8859'str2sgml:iso8859.pl
iso-8859-3:iso_8859'str2sgml:iso8859.pl
iso-8859-4:iso_8859'str2sgml:iso8859.pl
iso-8859-5:iso_8859'str2sgml:iso8859.pl
iso-8859-6:iso_8859'str2sgml:iso8859.pl
iso-8859-7:iso_8859'str2sgml:iso8859.pl
iso-8859-8:iso_8859'str2sgml:iso8859.pl
iso-8859-9:iso_8859'str2sgml:iso8859.pl
iso-8859-10:iso_8859'str2sgml:iso8859.pl
default:-ignore-
</CharsetConverters>

Resource Variables

N/A


Examples

The following example specifies that iso-8859-1 data should simply be decoded, since ISO-8859-1 is the default character set used by most browsers:

<CharsetConverters>
iso-8859-1:-decode-
</CharsetConverters>

Version

2.0


See Also

DECODEHEADS


97/06/03 17:03:56
MHonArc
Copyright © 1997, Earl Hood, ehood@medusa.acs.uci.edu