INTERNET DRAFT to be NEWS sec. -
News Article Format and Transmission
Henry Spencer
Status of this Memo
This document is intended to become an Internet Draft.
Internet Drafts are working documents of the Internet Engi-
neering Task Force (IETF), its Areas, and its Working
Groups. Note that other groups may also distribute working
documents as Internet Drafts.
Internet Drafts are draft documents valid for a maximum of
six months. Internet Drafts may be updated, replaced, or
obsoleted by other documents at any time. It is not appro-
priate to use Internet Drafts as reference material or to
cite them other than as a "working draft" or "work in
progress".
Please check the I-D abstract listing contained in each
Internet Draft directory to learn the current status of this
or any other Internet Draft. (Actually, this draft is at
too early a stage to even be listed there yet.)
It is hoped that a later version of this Draft will obsolete
RFC 1036 and will become an Internet standard.
References to the "successor to this Draft" refer not to
later versions of this draft, but to a hypothetical future
rewrite of this Draft (in the same way that this Draft is a
rewrite of RFC 1036).
Distribution of this memo is unlimited.
Abstract
This Draft defines the format and procedures for interchange
of network news articles. It is hoped that a later version
of this Draft will obsolete RFC 1036, reflecting more recent
experience and accommodating future directions.
Network news articles resemble mail messages but are broad-
cast to potentially-large audiences, using a flooding algo-
rithm that propagates one copy to each interested host (or
group thereof), typically stores only one copy per host, and
does not require any central administration or systematic
registration of interested users. Network news originated
as the medium of communication for Usenet, circa 1980.
2 June 1994 - 1 - expires 15 July 1994
Since then Usenet has grown explosively, and many Internet
sites participate in it. In addition, the news technology
is now in widespread use for other purposes, on the Internet
and elsewhere.
This Draft primarily codifies and organizes existing prac-
tice. A few small extensions have been added in an attempt
to solve problems that are considered serious. Major exten-
sions (e.g. cryptographic authentication) that need signifi-
cant development effort are left to be undertaken as inde-
pendent efforts.
Table of Contents
TBW
1. Introduction
Network news articles resemble mail messages but are broad-
cast to potentially-large audiences, using a flooding algo-
rithm that propagates one copy to each interested host (or
groups thereof), typically stores only one copy per host,
and does not require any central administration or system-
atic registration of interested users. Network news origi-
nated as the medium of communication for Usenet, circa 1980.
Since then Usenet has grown explosively, and many Internet
sites participate in it. In addition, the news technology
is now in widespread use for other purposes, on the Internet
and elsewhere.
The earliest news interchange used the so-called "A News"
article format. Shortly thereafter, an article format
vaguely resembling Internet mail was devised and used
briefly. Both of those formats are completely obsolete;
they are documented in appendix A for historical reasons
only. With publication of RFC 850 [rrr] in 1983, news arti-
cles came to closely resemble Internet mail messages, with
some restrictions and some additional headers. RFC 1036
[rrr] in 1987 updated RFC 850 without making major changes.
In the intervening five years, the RFC 1036 article format
has proven quite satisfactory, although minor extensions
appear desirable to match recent developments in areas such
as multi-media mail. RFC 1036 itself has not proven quite
so satisfactory. It is often rather vague and does not
address some issues at all; this has caused significant
interoperability problems at times, and implementations have
diverged somewhat. Worse, although it was intended primar-
ily to document existing practice, it did not precisely
match existing practice even at the time it was published,
and the deviations have grown since.
This Draft attempts to specify the format of articles, and
the procedures used to exchange them and process them, in
sufficient detail to allow full interoperability. In addi-
tion, some tentative suggestions are made about directions
for future development, in an attempt to avert unnecessary
divergence and consequent loss of interoperability. Major
extensions (e.g. cryptographic authentication) that need
significant development effort are left to be undertaken as
independent efforts.
NOTE: One question this all may raise is: why is
there no News-Version header, analogous to MIME-
Version, specifying a version number corresponding
to this specification? The answer is: it doesn't
appear to be useful, given news's backward-
compatibility constraints. The major use of a
version number is indicating which of several
INCOMPATIBLE interpretations is relevant. The
impossibility of orchestrating any sort of simul-
taneous change over news's installed base makes it
necessary to avoid such incompatible changes (as
opposed to extensions) entirely. MIME has a ver-
sion number mostly because it introduced incompat-
ible changes to the interpretation of several
"Content-" headers. This Draft attempts no
changes in interpretation and it appears doubtful
that future Drafts will find it feasible to intro-
duce any.
UNRESOLVED ISSUE: Should this be reconsidered?
Only if the header has SPECIFIC IDENTIFIABLE uses
today. Otherwise it's just useless added bulk.
As in this Draft's predecessors, the exact means used to
transmit articles from one host to another is not specified.
NNTP [rrr] is probably the most common transmission method
on the Internet, but a number of others are known to be in
use, including the UUCP protocol [rrr] extensively used in
the early days of Usenet and still much used on its fringes
today.
Several of the mechanisms described in this Draft may seem
somewhat strange or even bizarre at first reading. As with
Internet mail, there is no reasonable possibility of updat-
ing the entire installed base of news software promptly, so
interoperability with old software is crucial and will
remain so. Compatibility with existing practice and robust-
ness in an imperfect world necessarily take priority over
elegance.
2. Definitions, Notations, and Conventions
2.1. Textual Notations
Throughout this Draft, "MAIL" is short for "RFC 822 [rrr] as
amended by RFC 1123 [rrr]". (RFC 1123's amendments are
mostly relatively small, but they are not insignificant.)
See also the discussion in section 3 about this Draft's
relationship to MAIL. "MIME" is short for "RFCs 1341 and
1342" (or their updated replacements).
UNRESOLVED ISSUE: Update these numbers.
"ASCII" is short for "the ANSI X3.4 character set" [rrr].
While "ASCII" is often misused to refer to various character
sets somewhat similar to X3.4, in this Draft, "ASCII" means
X3.4 and only X3.4.
NOTE: The name is traditional (to the point where
the ANSI standard sanctions it) even though it is
no longer an acronym for the name of the standard.
NOTE: ASCII, X3.4, contains 128 characters, not
all of them printable. Character sets with more
characters are not ASCII, although they may
include it as a subset.
Certain words used to define the significance of individual
requirements are capitalized. "MUST" means that the item is
an absolute requirement of the specification. "SHOULD"
means that the item is a strong recommendation: there may be
valid reasons to ignore it in unusual circumstances, but
this should be done only after careful study of the full
implications and a firm conclusion that it is necessary,
because there are serious disadvantages to doing so. "MAY"
means that the item is truly optional, and implementors and
users are warned that conformance is possible but not to be
relied on.
The term "compliant", applied to implementations etc., indi-
cates satisfaction of all relevant "MUST" and "SHOULD"
requirements. The term "conditionally compliant" indicates
satisfaction of all relevant "MUST" requirements but viola-
tion of at least one relevant "SHOULD" requirement.
This Draft contains explanatory notes using the following
format. These may be skipped by persons interested solely
in the content of the specification. The purpose of the
notes is to explain why choices were made, to place them in
context, or to suggest possible implementation techniques.
NOTE: While such explanatory notes may seem super-
fluous in principle, they often help the less-
than-omniscient reader grasp the purpose of the
specification and the constraints involved. Given
the limitations of natural language for descrip-
tive purposes, this improves the probability that
implementors and users will understand the true
intent of the specification in cases where the
wording is not entirely clear.
All numeric values are given in decimal unless otherwise
indicated. Octets are assumed to be unsigned values for
this purpose. Large numbers are written using the North
American convention, in which "," separates groups of three
digits but otherwise has no significance.
2.2. Syntax Notation
Although the mechanisms specified in this Draft are all
described in prose, most are also described formally in the
modified BNF notation of RFC 822. Implementors will need to
be familiar with this notation to fully understand this
specification, and are referred to RFC 822 for a complete
explanation of the modified BNF notation. Here is a brief
illustrative example:
sentence = clause *( punct clause ) "."
punct = ":" / ";"
clause = 1*word [ "(" clause ")" / "," 1*word ]
word = <any English word>
This defines a sentence as some clauses separated by puncts
and ended by a period, a punct as a colon or semicolon, a
clause as at least one <word> optionally followed by either
a parenthesized clause or a comma and at least one more
<word>, and a <word> as (informally) any English word. <>
are used to enclose names when (and only when) distinguish-
ing them from surrounding text is useful. The full form of
the repetition notation is "<m>*<n><item>", denoting <m>
through <n> repetitions of <item>; <m> defaults to zero, <n>
to infinity, and the "*" and <n> can be omitted if <m> and
<n> are equal, so 1*word is one or more words, 1*5word is
one through five words, and 2word is exactly two words.
The character "\" is not special in any way in this nota-
tion.
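As an illustration only, the repetition rule can be read mechanically. The Python sketch below (the function name repetition_bounds is hypothetical, not part of any news software) returns the minimum count, maximum count, and item for an expression such as 1*5word, assuming that item names begin with a letter, a parenthesis, a double quote, or "<".

```python
import math
import re

# "<m>*<n><item>": every part of the prefix is optional.  Item names are
# assumed (for this sketch) to start with a letter, "(", '"', or "<".
_REPETITION = re.compile(r'^(\d*)(\*?)(\d*)([A-Za-z("<].*)$')


def repetition_bounds(expr: str) -> tuple[int, float, str]:
    """Return (minimum, maximum, item) for a repetition expression."""
    match = _REPETITION.match(expr)
    if match is None:
        raise ValueError(f"not a repetition expression: {expr!r}")
    low, star, high, item = match.groups()
    if star:
        # With "*", <m> defaults to zero and <n> to infinity.
        return (int(low) if low else 0,
                int(high) if high else math.inf,
                item)
    # Without "*", <n> was omitted because it equals <m>; a bare item
    # with no count at all means exactly one occurrence.
    count = int(low) if low else 1
    return (count, count, item)
```

For example, repetition_bounds("1*word") yields a minimum of 1 and an unbounded maximum, matching "one or more words".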
This Draft is intended to be self-contained; all syntax
rules used in it are defined within it, and a rule with the
same name as one found in MAIL does not necessarily have the
same definition. The lexical layer of MAIL is NOT, repeat
NOT, used in this Draft, and its presence must not be
assumed; notably, this Draft spells out all places where
white space is permitted/required and all places where con-
structs resembling MAIL comments can occur.
NOTE: News parsers historically have been much
less permissive than MAIL parsers.
2.3. Definitions
The term "character set", wherever it is used in this Draft,
refers to a coded character set, in the sense of ISO charac-
ter set standardization work, and must not be misinterpreted
as meaning merely "a set of characters".
In this Draft, ASCII character 32 is referred to as "blank";
the word "space" has a more generic meaning.
An "article" is the unit of news, analogous to a MAIL "mes-
sage".
A "poster" is a human being (or software equivalent) submit-
ting a possibly-compliant article to be "posted": made
available for reading on all relevant hosts. A "posting
agent" is software that assists posters to prepare articles,
including determining whether the final article is compli-
ant, passing it on to a relayer for posting if so, and
returning it to the poster with an explanation if not. A
"relayer" is software which receives allegedly-compliant
articles from posting agents and/or other relayers, files
copies in a "news database", and possibly passes copies on
to other relayers.
NOTE: While the same software may well function
both as a relayer and as part of a posting agent,
the two functions are distinct and should not be
confused. The posting agent's purpose is (in
part) to validate an article, supply header infor-
mation that can or should be supplied automati-
cally, and generally take reasonable actions in an
attempt to transform the poster's submission into
a compliant article. The relayer's purpose is to
move already-compliant articles around efficiently
without damaging them.
A "reader" is a human being reading news articles. A "read-
ing agent" is software which presents articles to a reader.
NOTE: Informal usage often uses "reader" for both
these meanings, but this introduces considerable
potential for confusion and misunderstanding, so
this Draft takes care to make the distinction.
A "newsgroup" is a single news forum, a logical bulletin
board, having a name and nominally intended for articles on
a specific topic. An article is "posted to" a single news-
group or several newsgroups. When an article is posted to
more than one newsgroup, it is said to be "cross-posted";
note that this differs from posting the same text as part of
each of several articles, one per newsgroup. A "hierarchy"
is the set of all newsgroups whose names share a first com-
ponent (see the name syntax in section 5.5).
A newsgroup may be "moderated", in which case submissions
are not posted directly, but mailed to a "moderator" for
consideration and possible posting. Moderators are typi-
cally human but may be implemented partially or entirely in
software.
A "followup" is an article containing a response to the con-
tents of an earlier article (the followup's "precursor"). A
"followup agent" is a combination of reading agent and post-
ing agent that aids in the preparation and posting of a fol-
lowup.
Text comparisons are "case-sensitive" if they consider
uppercase letters (e.g. "A") different from lowercase let-
ters (e.g. "a"), and "case-insensitive" if letters differing
only in case (e.g. "A" and "a") are considered identical.
Categories of text are said to be case-(in)sensitive if com-
parisons of such texts to others are case-(in)sensitive.
A "cooperating subnet" is a set of news-exchanging hosts
which is sufficiently well-coordinated (typically via a cen-
tral administration of some sort) that stronger assumptions
can be made about hosts in the set than about news hosts in
general. This is typically used to relax restrictions which
are otherwise required for worst-case interoperability; mem-
bers of a cooperating subnet MAY interchange articles that
do not conform to this Draft's specifications, provided all
members have agreed to this and provided the articles are
not permitted to leak out of the subnet. The word "subnet"
is used to emphasize that a cooperating subnet is typically
not an isolated universe; care must be taken that traffic
leaving the subnet complies with the restrictions of the
larger net, not just those of the cooperating subnet.
A "message ID" is a unique identifier for an article, usu-
ally supplied by the posting agent which posted it. It dis-
tinguishes the article from every other article ever posted
anywhere (in theory). Articles with the same message ID are
treated as identical copies of the same article even if they
are not in fact identical.
A "gateway" is software which receives news articles and
converts them to messages of some other kind (e.g. mail to a
mailing list), or vice-versa; in essence it is a translating
relayer that straddles boundaries between different methods
of message exchange. The most common type of gateway
connects newsgroup(s) to mailing list(s), either unidirec-
tionally or bidirectionally, but there are also gateways
between news networks using this Draft's news format and
those using other formats.
A "control message" is an article which is marked as con-
taining control information; a relayer receiving such an
article will (subject to permissions etc.) take actions
beyond just filing and passing on the article.
NOTE: "Control article" would be more consistent
terminology, but "control message" is already well
established.
An article's "reply address" is the address to which mailed
replies should be sent. This is the address specified in
the article's From header (see section 5.2), unless it also
has a Reply-To header (see section 6.3).
The notation (e.g.) "(ASCII 17)" following a name means
"this name refers to the ASCII character having value 17".
An "ASCII printable character" is an ASCII character in the
range 33-126. An "ASCII control character" is an ASCII
character in the range 0-31, or the character DEL (ASCII
127). A "non-ASCII character" is a character having a value
exceeding 127.
NOTE: Blank is neither an "ASCII printable charac-
ter" nor an "ASCII control character".
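Software handling raw article text often needs to apply these categories to individual octets. A minimal Python sketch, assuming the input is a single octet value in the range 0-255; the function name classify_octet is hypothetical:

```python
def classify_octet(value: int) -> str:
    """Classify an octet using the categories defined in section 2.3."""
    if not 0 <= value <= 255:
        raise ValueError(f"not an octet: {value}")
    if value == 32:
        return "blank"        # neither printable nor control
    if 33 <= value <= 126:
        return "printable"    # ASCII printable character
    if value <= 31 or value == 127:
        return "control"      # ASCII control character (0-31, or DEL)
    return "non-ASCII"        # any value exceeding 127
```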
2.4. End Of Line
How the end of a text line is represented depends on the
context and the implementation. For Internet transmission
via protocols such as SMTP [rrr], an end-of-line is a CR
(ASCII 13) followed by an LF (ASCII 10). ISO C [rrr] and
many modern operating systems indicate end-of-line with a
single character, typically ASCII LF (aka "newline"), and
this is the normal convention when news is transmitted via
UUCP. A variety of other methods are in use, including out-
of-band methods in which there is no specific character that
means end-of-line.
This Draft does not constrain how end-of-line is represented
in news, except that characters other than CR and LF MUST
not be usurped for use in end-of-line representations.
Also, obviously, all software dealing with a particular copy
of an article must agree on the convention to be used.
"EOL" is used to mean "whatever end-of-line representation
is appropriate"; it is not necessarily a character or
sequence of characters.
NOTE: If faced with picking an EOL representation
in the absence of other constraints, use of a sin-
gle character simplifies processing, and the ASCII
standard [rrr] specifies that if one character is
to be used for this purpose, it should be LF
(ASCII 10).
NOTE: Inside MIME encodings, use of the Internet
canonical EOL representation (CR followed by LF)
is mandatory. See [rrr].
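For example, software moving articles between the CR LF convention used with SMTP and the single-LF convention normal for UUCP might convert them as in the following Python sketch (function names hypothetical). It rewrites only complete CR LF pairs and never treats any other character as an end-of-line marker.

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert Internet (CR LF) line ends to single-LF line ends."""
    # Only complete CR LF pairs are rewritten; a lone CR is passed
    # through as ordinary data.
    return data.replace(b"\r\n", b"\n")


def lf_to_crlf(data: bytes) -> bytes:
    """Convert single-LF line ends to Internet (CR LF) line ends."""
    # Normalize first so that existing CR LF pairs do not become CR CR LF.
    return crlf_to_lf(data).replace(b"\n", b"\r\n")
```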
2.5. Case-Sensitivity
Text in newsgroup names, header parameters, etc. is case-
sensitive unless stated otherwise.
NOTE: This is at variance with MAIL, which is
case-insensitive unless stated otherwise, but is
consistent with news historical practice and
existing news software. See the comments on back-
ward compatibility in section 1.
2.6. Language
Various constant strings in this Draft, such as header names
and month names, are derived from English words. Despite
their derivation, these words do NOT change when the poster
or reader employing them is interacting in a language other
than English. Posting and reading agents SHOULD translate
as appropriate in their interaction with the poster or
reader, but the forms that actually appear in articles are
always the English-derived ones defined in this Draft.
3. Relation To MAIL (RFC 822 etc.)
The primary intent of this Draft is to completely describe
the news article format as a subset of MAIL's message format
augmented by some new headers. Unless explicitly noted oth-
erwise, the intent throughout is that an article MUST also
be a valid MAIL message.
NOTE: Despite obvious similarities between news
and mail, opinions vary on whether it is possible
or desirable to unify them into a single service.
However, it is unquestionably both possible and
useful to employ some of the same tools for manip-
ulating both mail messages and news articles, so
there is specific advantage to be had in defining
them compatibly. Furthermore, there is no appar-
ent need to re-invent the wheel when slight exten-
sions to an existing definition will suffice.
Given that this Draft attempts to be self-contained, it
inevitably contains considerable repetition of information
found in MAIL. This raises the possibility of unintentional
conflicts. Unless specifically noted otherwise, any wording
in this Draft which permits behavior that is not MAIL-
compliant is erroneous and should be followed only to the
extent that the result remains compliant with MAIL.
NOTE: RFC 1036 said "where this standard conflicts
with [RFC 822], RFC-822 should be considered cor-
rect and this standard in error". Taken liter-
ally, this was obviously incorrect, since RFC 1036
imposed a number of restrictions not found in RFC
822. The intent, however, was reasonable: to
indicate that UNINTENTIONAL differences were
errors in RFC 1036.
Implementors and users should note that MAIL is deliberately
an extensible standard, and most extensions devised for mail
are also relevant to (and compatible with) news. Note par-
ticularly MIME [rrr], summarized briefly in appendix B,
which extends MAIL in a number of useful ways that are defi-
nitely relevant to news. Also of note is the work in
progress on reconciling PEM (Privacy Enhanced Mail, which
defines extensions for authentication and security) with
MIME, after which this may also be relevant to news.
UNRESOLVED ISSUE: Update the MIME/PEM information.
Similarly, descriptions here of MIME facilities should be
considered correct only to the extent that they do not
require or legitimize practices that would violate those
RFCs. (Note that this Draft does extend the application of
some MIME facilities, but this is an extension rather than
an alteration.)
4. Basic Format
4.1. Overall Syntax
The overall syntax of a news article is:
article = 1*header separator body
header = start-line *continuation
start-line = header-name ":" space [ nonblank-text ] eol
continuation = space nonblank-text eol
header-name = 1*name-character *( "-" 1*name-character )
name-character = letter / digit
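letter = <ASCII letter A-Z or a-z>
digit = <ASCII digit 0-9>
separator = eol
body = *( [ nonblank-text / space ] eol )
eol = <EOL>
nonblank-text = [ space ] text-character *( space-or-text )
text-character = <any ASCII character except NUL (ASCII 0),
                  HT (ASCII 9), LF (ASCII 10), CR (ASCII 13),
                  or blank (ASCII 32)>
space = 1*( <HT (ASCII 9)> / <blank (ASCII 32)> )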
space-or-text = space / text-character
An article consists of some headers followed by a body. An
empty line separates the two. The headers contain struc-
tured information about the article and its transmission. A
header begins with a header name identifying it, and can be
continued onto subsequent lines by beginning the continua-
tion line(s) with white space. (Note that section 4.2.3
adds some restrictions to the header syntax indicated here.)
The body is largely-unstructured text significant only to
the poster and the readers.
NOTE: Terminology here follows the current custom
in the news community, rather than the MAIL con-
vention of (sometimes) referring to what is here
called a "header" as a "header field" or "field".
Note that the separator line must be truly empty, not just a
line containing white space. Further empty lines following
it are part of the body, as are empty lines at the end of
the article.
NOTE: Some systems make no distinction between
empty lines and lines consisting entirely of white
space; indeed, some systems cannot represent
entirely empty lines. The grammar's requirement
that header continuation lines contain some print-
able text is meant to ensure that the empty/space
distinction cannot confuse identification of the
separator line.
NOTE: It is tempting to authorize posting agents
to strip empty lines at the beginning and end of
the body, but such empty lines could possibly be
part of a preformatted document.
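A minimal Python sketch of locating the separator, assuming the article is held as a string that uses a single LF as EOL (see section 2.4); the function name split_article is hypothetical. Only a truly empty line counts, so a line containing just white space is not mistaken for the separator, and empty lines after the separator stay in the body.

```python
def split_article(text: str) -> tuple[list[str], str]:
    """Split an LF-terminated article into header lines and body."""
    # The separator is the first truly empty line: an EOL immediately
    # following the EOL that ends a header line.
    index = text.find("\n\n")
    if index < 0 or text.startswith("\n"):
        # No separator at all, or an empty line before any header
        # (the grammar requires 1*header).
        raise ValueError("not an article: missing headers or separator")
    header_lines = text[:index].split("\n")
    body = text[index + 2:]  # may be empty; later empty lines belong to it
    return header_lines, body
```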
Implementors are warned that trailing white space, whether
alone on the line or not, MAY be significant in the body,
notably in early versions of the "uuencode" encoding for
binary data. Trailing white space MUST be preserved unless
the article is known to have originated within a cooperating
subnet that avoids using significant trailing white space,
and SHOULD be preserved regardless. Posters SHOULD avoid
using conventions or encodings which make trailing white
space significant; for encoding of binary data, MIME's
"base64" encoding is recommended. Implementors are warned
that ISO C implementations are not required to preserve
trailing white space, and special precautions may be neces-
sary in implementations which do not.
NOTE: Unfortunately, the signature-delimiter con-
vention (described in section 4.3.2) does use sig-
nificant trailing white space. It's too late to
fix this; there is work underway on defining an
organized signature convention as part of MIME,
which is a preferable solution in the long run.
Posters are warned that some very old relayer software mis-
behaves when the first non-empty line of an article body
begins with white space.
4.2. Headers
4.2.1. Names and Contents
Despite the restrictions on header-name syntax imposed by
the grammar, relayers and reading agents SHOULD tolerate
header names containing any ASCII printable character other
than colon (":", ASCII 58).
NOTE: MAIL header names can contain any ASCII
printable character (other than colon) in theory,
but in practice, arbitrary header names are known
to cause trouble for some news software. Section
4.1's restriction to alphanumeric sequences sepa-
rated by hyphens is believed to permit all widely-
used header names without causing problems for any
widely-used software. Software is nevertheless
encouraged to cope correctly with the full range
of possibilities, since aberrations are known to
occur.
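The two levels of acceptance can be expressed as a pair of checks. The Python sketch below is illustrative only (function names hypothetical): the strict form follows the header-name rule of the section 4.1 grammar, and the tolerant form accepts any name that relayers and reading agents SHOULD tolerate.

```python
import re

# Section 4.1 grammar: runs of ASCII letters/digits joined by single hyphens.
_STRICT_NAME = re.compile(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*")
# Section 4.2.1 tolerance: any ASCII printable character (33-126) except
# colon (ASCII 58).
_TOLERABLE_NAME = re.compile(r"[\x21-\x39\x3b-\x7e]+")


def is_strict_header_name(name: str) -> bool:
    """True if the name matches the grammar posting agents should follow."""
    return _STRICT_NAME.fullmatch(name) is not None


def is_tolerable_header_name(name: str) -> bool:
    """True if relayers and reading agents should still accept the name."""
    return _TOLERABLE_NAME.fullmatch(name) is not None
```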
Relayers MUST disregard headers not described in this Draft
(that is, with header names not mentioned in this Draft),
and pass them on unaltered.
Posters wishing to convey non-standard information in head-
ers SHOULD use header names beginning with "X-". No stan-
dard header name will ever be of this form. Reading agents
SHOULD ignore "X-" headers, or at least treat them with
great care.
The order of headers in an article is not significant. How-
ever, posting agents are encouraged to put mandatory headers
(see section 5) first, followed by optional headers (see
section 6), followed by headers not defined in this Draft.
NOTE: While relayers and reading agents must be
prepared to handle any order, having the signifi-
cant headers (the precise definition of "signifi-
cant" depends on context) first can noticeably
improve efficiency, especially in memory-limited
environments where it is difficult to buffer up an
arbitrary quantity of headers while searching for
the few that matter.
Header names are case-insensitive. There is a preferred
case convention, which posters and posting agents SHOULD
use: each hyphen-separated "word" has its initial letter (if
any) in uppercase and the rest in lowercase, except that
some abbreviations have all letters uppercase (e.g. "Mes-
sage-ID" and "MIME-Version"). The forms used in this Draft
are the preferred forms for the headers described herein.
Relayers and reading agents are warned that articles might
not obey this convention.
NOTE: Although software must be prepared for the
possibility of random use of case in header names
(and other case-independent text), establishing a
preferred convention reduces pointless diversity,
and may permit optimized software that looks for
the preferred forms before resorting to less-
efficient case-insensitive searches.
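A minimal Python sketch of the preferred convention and of case-insensitive name comparison. The abbreviation table is an assumption: it holds only the two examples cited above, and a real posting agent would need its own list. Function names are hypothetical.

```python
# Hypothetical table of words written entirely in uppercase; it lists only
# the examples given in the text ("Message-ID", "MIME-Version").
_UPPERCASE_WORDS = {"id": "ID", "mime": "MIME"}


def preferred_case(name: str) -> str:
    """Rewrite a header name in the preferred case convention."""
    return "-".join(
        # Initial letter (if any) uppercase, the rest lowercase.
        _UPPERCASE_WORDS.get(word.lower(), word[:1].upper() + word[1:].lower())
        for word in name.split("-")
    )


def same_header_name(a: str, b: str) -> bool:
    """Header names are case-insensitive; lowercasing suffices for ASCII."""
    return a.lower() == b.lower()
```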
In general, a header can consist of several lines, with each
continuation line beginning with white space. The EOLs pre-
ceding continuation lines are ignored when processing such a
header, effectively combining the start-line and the contin-
uations into a single logical line. The logical line, less
the header name, colon, and any white space following the
colon, is the "header content".
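A Python sketch of this unfolding, assuming the header lines have already been separated from the body and their EOLs removed; the function names are hypothetical. The first function groups each start-line with its continuation lines; the second forms the logical line and strips the name, the colon, and the white space following the colon.

```python
def group_headers(lines: list[str]) -> list[list[str]]:
    """Group header lines into [start-line, continuation, ...] lists."""
    groups: list[list[str]] = []
    for line in lines:
        if groups and line[:1] in (" ", "\t"):
            groups[-1].append(line)   # continuation: begins with white space
        else:
            groups.append([line])     # a new start-line
    return groups


def header_content(group: list[str]) -> tuple[str, str]:
    """Return (header name, header content) for one header."""
    # Dropping the EOLs joins the lines into one logical line; the white
    # space that begins each continuation line remains part of it.
    logical = "".join(group)
    name, colon, rest = logical.partition(":")
    if not colon:
        raise ValueError(f"no colon in start-line: {group[0]!r}")
    return name, rest.lstrip(" \t")
```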
4.2.2. Undesirable Headers
A header whose content is empty is said to be an empty
header. Relayers and reading agents SHOULD not consider
presence or absence of an empty header to alter the seman-
tics of an article (although syntactic rules, such as
requirements that certain header names appear at most once
in an article, MUST still be satisfied). Posting agents
SHOULD delete empty headers from articles before posting
them.
Headers that merely state defaults explicitly (e.g., a Fol-
lowup-To header with the same content as the Newsgroups
header, or a MIME Content-Type header with contents
"text/plain; charset=us-ascii") or state information that
reading agents can typically determine easily themselves
(e.g. the length of the body in octets) are redundant, con-
veying no information whatsoever. Headers that state infor-
mation which cannot possibly be of use to a significant num-
ber of relayers, reading agents, or readers (e.g., the name
of the software package used as the posting agent) are use-
less and pointless. Posters and posting agents SHOULD avoid
including redundant or useless headers in articles.
NOTE: Information that someone, somewhere, might
someday find useful is best omitted from headers.
(There's quite enough of it in article bodies.)
Headers should contain information of known util-
ity only. This is not meant to preclude inclusion
of information primarily meant for news-software
debugging, but such information should be included
only if there is real reason, preferably based on
experience, to suspect that it may be genuinely
useful. Articles passing through gateways are the
only obvious case where inclusion of debugging
information appears clearly legitimate. (See sec-
tion 10.1.)
NOTE: A useful rule of thumb for software imple-
mentors is: "if I had to pay a dollar a day for
the transmission of this header, would I still
think it worthwhile?".
4.2.3. White Space and Continuations
The colon following the header name on the start-line MUST
be followed by white space, even if the header is empty. If
the header is not empty, at least some of the content MUST
appear on the start-line. Posting agents MUST enforce these
restrictions, but relayers (etc.) SHOULD accept even arti-
cles that violate them.
NOTE: MAIL does not require white space after the
colon, but it is usual. RFC 1036 required the
white space, even in empty headers, and some
existing software demands it. In MAIL, and
arguably in RFC 1036 (although the wording is
vague), it is technically legitimate for the white
space to be part of a continuation line rather
than the start-line, but not all existing software
will accept this. Deleting empty headers and
placing some content on the start-line avoids this
issue... which is desirable because trailing
blanks, easily deleted by accident, are best not
made significant in headers.
In general, posters and posting agents SHOULD use blank
(ASCII 32), not tab (ASCII 9), where white space is desired
in headers. Existing software does not consistently accept
tab as synonymous with blank in all contexts. In particu-
lar, RFC 1036 appeared to specify that the character immedi-
ately following the colon after a header name was required
to be a blank, and some news software insists on that, so
this character MUST be a blank. Again, posting agents MUST
enforce these restrictions but relayers SHOULD be more tol-
erant.
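The start-line rules above can be checked mechanically. A minimal Python sketch of a posting-agent check (hypothetical function name), assuming one header is supplied as its start-line plus continuation lines with EOLs removed. As the text says, relayers SHOULD still accept articles that fail these checks.

```python
def posting_agent_problems(group: list[str]) -> list[str]:
    """Check one header (start-line plus continuations, EOLs removed)
    against the white-space rules a posting agent must enforce."""
    _, colon, after = group[0].partition(":")
    if not colon:
        return ["start-line has no colon"]
    problems = []
    if not after.startswith(" "):
        # The character after the colon MUST be a blank, not a tab.
        problems.append("colon not followed by a blank (ASCII 32)")
    on_start_line = after.strip(" \t") != ""
    in_continuations = any(line.strip(" \t") for line in group[1:])
    if in_continuations and not on_start_line:
        problems.append("content does not begin on the start-line")
    if not on_start_line and not in_continuations:
        # Permitted, but section 4.2.2 asks posting agents to delete it.
        problems.append("empty header (should be deleted before posting)")
    return problems
```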
Since the white space beginning a continuation line remains
a part of the logical line, headers can be "broken" into
multiple lines only at white space. Posting agents SHOULD
not break headers unnecessarily. Relayers SHOULD preserve
existing header breaks, and SHOULD not introduce new breaks.
Breaking headers SHOULD be a last resort; relayers and read-
ing agents SHOULD handle long header lines gracefully. (See
the discussion of size limits in section 4.6.)
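The start-line and continuation rules above can be sketched in Python. This is a minimal illustration, not a reference parser. It assumes header lines arrive as strings with EOLs already removed, and the function names unfold_headers and check_header are hypothetical.

```python
def unfold_headers(header_lines):
    """Join continuation lines onto the header they continue.

    A line beginning with a blank or tab continues the previous header.
    Its leading white space is kept, because it remains part of the
    logical line; only the EOL between the physical lines disappears.
    """
    logical = []
    for line in header_lines:
        if line.startswith((" ", "\t")):
            if not logical:
                raise ValueError("continuation line before any start-line")
            logical[-1] += line
        else:
            logical.append(line)
    return logical


def check_header(physical_lines):
    """Apply the posting-agent checks to one header's physical lines.

    Returns None if the header passes, otherwise a short complaint.
    """
    name, colon, rest = physical_lines[0].partition(":")
    if not colon or not name or any(c in name for c in " \t"):
        return "malformed header name"
    if not rest.startswith(" "):
        return "colon must be followed by a blank (ASCII 32)"
    continued = "".join(physical_lines[1:])
    if continued.strip() and not rest.strip():
        return "some content must appear on the start-line"
    return None
```

A relayer, being more tolerant, might still unfold headers this way but merely log, rather than reject, what check_header reports.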
4.3. Body
Although the article body is unstructured for most of the
purposes of this Draft, structure MAY be imposed on it by
other means, notably MIME headers (see appendix B).
4.3.1. Body Format Issues
The body of an article MAY be empty, although posting agents
SHOULD consider this an error condition (meriting returning
the article to the poster for revision). A posting agent
which does not reject such an article SHOULD issue a warning
message to the poster and supply a non-empty body. Note
that the separator line MUST be present even if the body is
empty.
NOTE: An empty body is probably a poster error
except, arguably, for some control messages... and
even they really ought to have a body explaining
the reason for the control message. Some old
reading agents are known to generate empty bodies
for "cancel" control messages, so posting agents
might opt not to reject body-less articles in such
cases (although it would be better to fix the
reading agents to request a body). However, some
existing news software is known to react badly to
body-less articles, hence the request for posting
agents to insert a body in such cases.
NOTE: A possible posting-agent-supplied body text
(already used by one widespread posting agent) is
"This article was probably generated by a buggy
news reader.". (The use of "reader" to refer to
the reading agent is traditional, although this
Draft uses more precise terminology.)
NOTE: The requirement for the separator line even
in a bodyless article is inherited from MAIL, and
also distinguishes legitimately-bodyless articles
from articles accidentally truncated in the middle
of the headers.
Note that an article body is a sequence of lines terminated
by EOLs, not arbitrary binary data, and in particular it
MUST end with an EOL. However, relayers SHOULD treat the
body of an article as an uninterpreted sequence of octets
(except as mandated by changes of EOL representation and by
control-message processing) and SHOULD avoid imposing con-
straints on it. See also section 4.6.
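A posting agent's checks on the separator line and the body might look like the following sketch. It assumes EOLs have already been converted to "\n"; split_article and body_problems are hypothetical names.

```python
def split_article(text):
    """Split an article into (header_lines, body).

    The separator is the first empty line. If there is none, the
    article is treated as truncated in the middle of its headers.
    """
    head, sep, body = text.partition("\n\n")
    if not sep:
        raise ValueError("no separator line: article truncated in headers?")
    return head.split("\n"), body


def body_problems(body):
    """Return a list of complaints about the body, empty if it is fine."""
    if body == "":
        return ["empty body (legal, but a posting agent should query it)"]
    if not body.endswith("\n"):
        return ["body does not end with an EOL"]
    return []
```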
4.3.2. Body Conventions
Although body lines can in principle be very long (see sec-
tion 4.6 for some discussion of length limits), posters
SHOULD restrict body line lengths to circa 70-75 characters.
On systems where text is conventionally stored with EOLs
only at paragraph breaks and other "hard return" points,
with software breaking lines as appropriate for display or
manipulation, posting agents SHOULD insert EOLs as necessary
so that posted articles comply with this restriction.
NOTE: News originated in environments where line
breaks in plain text files were supplied by the
user, not the software. Be this good or bad, much
reading-agent and posting-agent software assumes
that news articles follow this convention, so it
is often inconvenient to read or respond to arti-
cles which violate it. The "70-75" number comes
from the widespread use of display devices which
are 80 columns wide, and the desire to leave a bit
of margin for quoting etc. (see below).
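For a posting agent on a system that stores EOLs only at hard returns, inserting EOLs can be sketched with Python's standard textwrap module. The width of 72 is an assumption picked from within the "circa 70-75" range, and hard_wrap is a hypothetical name.

```python
import textwrap


def hard_wrap(text, width=72):
    """Insert EOLs so no posted line is longer than `width` characters.

    `text` holds EOLs only at hard returns (paragraph breaks, etc.).
    Long words such as URLs are never split, so a line can still
    exceed `width` if a single word does.
    """
    out = []
    for para in text.rstrip("\n").split("\n"):
        if para.strip() == "":
            out.append("")  # keep empty lines between paragraphs
        else:
            out.extend(textwrap.wrap(para, width=width,
                                     break_long_words=False,
                                     break_on_hyphens=False))
    return "\n".join(out) + "\n"
```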
Reading agents confronted with body lines much longer than
the available output-device width SHOULD break lines as
appropriate. Posters are warned that such breaks may not
occur exactly where the poster intends.
NOTE: "As appropriate" would typically include
breaking lines when supplying the text of an arti-
cle to be quoted in a reply or followup, something
that line-breaking reading agents often neglect to
do now.
Although styles vary widely, for plain text it is usual to
use no left margin, leave the right edge ragged, use a sin-
gle empty line to separate paragraphs, and employ normal
natural-language usage on matters such as upper/lowercase.
(In particular, articles SHOULD not be written entirely in
uppercase. In environments where posters have access only
to uppercase, posting agents SHOULD translate it to lower-
case.)
NOTE: Most people find substantial bodies of text
entirely in uppercase relatively hard to read,
while all-lowercase text merely looks slightly
odd. The common association of uppercase with
strong emphasis adds to this.
Tone of voice does not carry well in written text, and mis-
understandings are common when sarcasm, parody, or exaggera-
tion for humorous effect is attempted without explicit warn-
ing. It has become conventional to use the sequence ":-)",
which (on most output devices) resembles a rotated "smiley
face" symbol, as a marker for text not meant to be taken
literally, especially when humor is intended. This practice
aids communication and averts unintended ill-will; posters
are urged to use it. A variety of analogous sequences are
used with less-standardized meanings [Sanderson].
The order of arrival of news articles at a particular host
depends somewhat on transmission paths, and occasionally
articles are lost for various reasons. When responding to a
previous article, posters SHOULD not assume that all readers
understand the exact context. It is common to quote some of
the previous article to establish context. This SHOULD be
done by prefacing each quoted line (even if it is empty)
with the character ">". This will result in multiple levels
of ">" when quoted context itself contains quoted context.
NOTE: It may seem superfluous to put a prefix on
empty lines, but it simplifies implementation of
functions such as "skip all quoted text" in read-
ing agents.
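Quoting every line, including empty ones, and the "skip all quoted text" function it enables, can be sketched as follows. The names quote and strip_quoted are hypothetical; text is assumed to use "\n" as its EOL.

```python
def quote(text, prefix=">"):
    """Prefix every line of `text`, empty lines included, with `prefix`."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # text ended with an EOL; there is no extra line to quote
    return "".join(prefix + line + "\n" for line in lines)


def strip_quoted(text):
    """Drop quoted lines, as a "skip all quoted text" command might."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()
    return "".join(line + "\n" for line in lines if not line.startswith(">"))
```

Quoting already-quoted context simply adds another ">", giving the multiple levels described above.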
Readability is enhanced if quoted text and new text are sep-
arated by an empty line.
Posters SHOULD edit quoted context to trim it down to the
minimum necessary. However, posting agents SHOULD not
attempt to enforce this by imposing overly-simplistic rules
like "no more than 50% of the lines should be quotes".
NOTE: While encouraging trimming is desirable, the
50% rule imposed by some old posting agents is
both inadequate and counterproductive. Posters do
not respond to it by being more selective about
quoting; they respond by padding short responses,
or by using different quoting styles to defeat
automatic analysis. The former adds unnecessary
noise and volume, while the latter also defeats
more useful forms of automatic analysis that read-
ing agents might wish to do.
NOTE: At the very least, if a minimum-unquoted
quota is being set, article bodies shorter than
(say) 20 lines, or perhaps articles which exceed
the quota by only a few lines, should be exempt.
This avoids the ridiculous situation of complain-
ing about a 5-line response to a 6-line quote.
NOTE: A more subtle posting-agent rule, suggested
for experimental use, is to reject articles that
appear to contain quoted signatures (see below).
This is almost certainly the result of a careless
poster not bothering to trim down quoted context.
Also, if a posting agent or followup agent pre-
sents an article template to the poster for edit-
ing, it really should take note of whether the
poster actually made any changes, and refrain from
posting an unmodified template.
Some followup agents supply "attribution" lines for quoted
context, indicating where it first appeared and under whose
name. When multiple levels of quoting are present and
quoted context is edited for brevity, "inner" attribution
lines are not always retained. The editing process is also
somewhat error-prone. Reading agents (and readers) are
warned not to assume that attributions are accurate.
UNRESOLVED ISSUE: Should a standard format for
attribution lines be defined? There is already
considerable diversity... but automatic news anal-
ysis would be substantially aided by a standard
convention.
Early difficulties in inferring return addresses from arti-
cle headers led to "signatures": short closing texts, auto-
matically added to the end of articles by posting agents,
identifying the poster and giving his network addresses etc.
If a poster or posting agent does append a signature to an
article, the signature SHOULD be preceded with a delimiter
line containing (only) two hyphens (ASCII 45) followed by
one blank (ASCII 32). Posting agents SHOULD limit the
length of signatures, since verbose excess bordering on
abuse is common if no restraint is imposed; 4 lines is a
common limit.
NOTE: While signatures are arguably a blemish,
they are a well-understood convention, and convey-
ing the same information in headers exposes it to
mangling and makes it rather less conspicuous. A
standard delimiter line makes it possible for
reading agents to handle signatures specially if
desired. (This is unfortunately hampered by
extensive misunderstanding of, and misuse of, the
delimiter.)
NOTE: The choice of delimiter is somewhat unfortu-
nate, since it relies on preservation of trailing
white space, but it is too well-established to
change. There is work underway to define a more
sophisticated signature scheme as part of MIME,
and this will presumably supersede the current
convention in due time.
NOTE: Four 75-column lines of signature text is
300 characters, which is ample to convey name and
mail-address information in all but the most
bizarre situations.
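A reading or posting agent might locate and measure a signature roughly as sketched here. Splitting at the last delimiter line is an assumption, and split_signature and signature_too_long are hypothetical names. Note that a line of just "--", with its trailing blank lost, is deliberately not treated as a delimiter.

```python
SIG_DELIMITER = "-- "  # two hyphens (ASCII 45), then one blank (ASCII 32)


def split_signature(body_lines):
    """Split body lines into (text_lines, signature_lines).

    Uses the last delimiter line, in case the body contains more than
    one. The delimiter itself is not included in either part.
    """
    for i in range(len(body_lines) - 1, -1, -1):
        if body_lines[i] == SIG_DELIMITER:
            return body_lines[:i], body_lines[i + 1:]
    return body_lines, []


def signature_too_long(sig_lines, limit=4):
    """True if the signature exceeds `limit` lines, ignoring trailing blanks."""
    while sig_lines and not sig_lines[-1].strip():
        sig_lines = sig_lines[:-1]
    return len(sig_lines) > limit
```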
4.4. Characters And Character Sets
Header and body lines MAY contain any ASCII characters other
than CR (ASCII 13), LF (ASCII 10), and NUL (ASCII 0).
NOTE: CR and LF are excluded because they clash
with common EOL conventions. NUL is excluded
because it clashes with the C end-of-string con-
vention, which is significant to most existing
news software. These three characters are
unlikely to be transmitted successfully.
However, posters SHOULD avoid using ASCII control characters
except for tab (ASCII 9), formfeed (ASCII 12), and backspace
(ASCII 8). Tab signifies sufficient horizontal white space
to reach the next of a set of fixed positions; posters are
warned that there is no standard set of positions, so tabs
should be avoided if precise spacing is essential. Formfeed
signifies a point at which a reading agent SHOULD pause and
await reader interaction before displaying further text.
Backspace SHOULD be used only for underlining, done by a
sequence of underscores (ASCII 95) followed by an equal num-
ber of backspaces, signifying that the same number of text
characters following are to be underlined. Posters are
warned that underlining is not available on all output
devices and is best not relied on for essential meaning.
Reading agents SHOULD recognize underlining and translate it
to the appropriate commands for devices that support it.
NOTE: Interpretation of almost all control charac-
ters is device-specific to some degree, and
devices differ. Tabs and underlining are sup-
ported, to some extent, by most modern devices and
reading agents, hence the cautious exemptions for
them. The underlining method is specified because
the inverse method, text and then underscores, is
tempting to the naive... but if sent unaltered to
a device that shows only the most recent of sev-
eral overstruck characters rather than a compos-
ite, the result can be utterly unreadable.
NOTE: A common interpretation of tab is that it is
a request to space forward to the next position
whose number is one more than a multiple of 8,
with positions numbered sequentially starting at
1. (So tab positions are 9, 17, 25, ...) Reading
agents not constrained by existing system conven-
tions might wish to use this interpretation.
NOTE: It will typically be necessary for a reading
agent to catch and interpret formfeed, not just
send it to the output device. The actions per-
formed by typical output devices on receiving a
formfeed are neither adequate for nor appropriate
to the pause-for-interaction meaning.
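A reading agent could interpret tabs and underlining along these lines. This sketch uses the common tab interpretation from the NOTE above, and it returns device-neutral (text, underlined) pairs rather than device commands, since those commands vary. The names expand_tabs and underline_segments are hypothetical.

```python
def expand_tabs(line):
    """Expand tabs to positions 9, 17, 25, ... (numbering from 1).

    str.expandtabs counts columns from 0 with stops at multiples of 8,
    which is the same set of stops.
    """
    return line.expandtabs(8)


def underline_segments(line):
    """Split a line into (text, underlined) pairs.

    Underlining is n underscores, n backspaces, then the n characters
    to be underlined. Anything else is returned as plain text, stray
    backspaces included, for the caller to deal with.
    """
    segments, plain, i = [], [], 0
    while i < len(line):
        n = 0
        while i + n < len(line) and line[i + n] == "_":
            n += 1
        if n and line[i + n:i + 2 * n] == "\b" * n and i + 3 * n <= len(line):
            if plain:
                segments.append(("".join(plain), False))
                plain = []
            segments.append((line[i + 2 * n:i + 3 * n], True))
            i += 3 * n
        else:
            plain.append(line[i])
            i += 1
    if plain:
        segments.append(("".join(plain), False))
    return segments
```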
Cooperating subnets which wish to employ non-ASCII character
sets by using escape sequences (employing, e.g., ESC (ASCII
27), SO (ASCII 14), and SI (ASCII 15)) to alter the meaning
of superficially-ASCII characters MAY do so, but MUST use
MIME headers to alert reading agents to the particular char-
acter set(s) and escape sequences in use. A reading agent
SHOULD not pass such an escape sequence through, unaltered,
to the output device unless the agent confirms that the
sequence is one used to affect character sets and has reason
to believe that the device is capable of interpreting that
particular sequence properly.
NOTE: Cooperating-subnet organizers are warned
that some very old relayers strip certain control
characters out of articles they pass along. ESC
is known to be among the affected characters.
NOTE: There are now standard Internet encodings
for Japanese [rrr] and Vietnamese [rrr] in partic-
ular.
Articles MUST not contain any octet with value exceeding
127, i.e. any octet that is not an ASCII character.
NOTE: This rule, like others, may be relaxed by
unanimous consent of the members of a cooperating
subnet, provided suitable precautions are taken to
ensure that rule-violating articles do not leak
out of the subnet. (This has already been done in
many areas where ASCII is not adequate for the
local language(s).) Beware that articles contain-
ing non-ASCII octets in headers are a violation of
the MAIL specifications and are not valid MAIL
messages. MIME offers a way to encode non-ASCII
characters in ASCII for use in headers; see sec-
tion 4.5.
NOTE: While there is great interest in using 8-bit
character sets, not all software can yet handle
them correctly. Hence the restriction to cooper-
ating subnets. MIME encodings can be used to
transmit such characters while remaining within
the octet restriction.
In anticipation of the day when it is possible to use non-
ASCII characters safely anywhere, and to provide for the
(substantial) cooperating subnets that are already using
them, transmission paths SHOULD treat news articles as unin-
terpreted sequences of octets (except perhaps for transfor-
mations between EOL representations) and relayers SHOULD
treat non-ASCII characters in articles as ordinary charac-
ters.
NOTE: 8-bit enthusiasts are warned that not all
software conforms to these recommendations yet.
In particular, standard NNTP [rrr] is a 7-bit pro-
tocol, and there may be implementations which
enforce this rule. Be warned, also, that it will
never be safe to send raw binary data in the body
of news articles, because changes of EOL represen-
tation may (will!) corrupt it.
Except where cooperating subnets permit more direct
approaches, MIME [rrr] headers and encodings SHOULD be used
to transmit non-ASCII content using ASCII characters; see
section 4.5, appendix B, and the MIME RFCs for details. If
article content can be expressed in ASCII, it SHOULD be.
Failing that, the order of preference for character sets is
that described in MIME [rrr].
NOTE: Using the MIME facilities, it is possible to
transmit ANY character set, and ANY form of binary
data, using only ASCII characters. Equally impor-
tant, such articles are self-describing and the
reading agent can tell which octet-to-symbol map-
ping is intended! Designation of some preferred
character sets is intended to minimize the number
of character sets that a reading agent must under-
stand in order to display most articles properly.
Articles containing non-ASCII characters, articles using
ASCII characters (values 0 through 127) to refer to non-
ASCII symbols, and articles using escape sequences to shift
character sets SHOULD include MIME headers indicating which
character set(s) and conventions are being used, and MUST do
so unless such articles are strictly confined to a
cooperating subnet which has its own pre-agreed conventions.
MIME encodings are preferred over all these techniques. If
it comes to a relayer's attention that it is being asked to
pass an article using such techniques outward across what it
knows to be the boundary of such a cooperating subnet, it
MUST report this error to its administrator, and MAY refuse
to pass the article beyond the subnet boundary. If it does
pass the article, it MUST re-encode it with MIME encodings
to make it conform to this Draft.
NOTE: Such re-encoding is a non-trivial task, due
to MIME rules such as the prohibition of nested
encodings. It's not just a matter of pouring the
body through a simple filter.
Reading agents SHOULD note MIME headers and attempt to show
the reader the closest possible approximation to the
intended content. They SHOULD not just send the octets of
the article to the output device unaltered, unless there is
reason to believe that the output device will indeed inter-
pret them correctly. Reading agents MUST not pass ASCII
control characters or escape sequences, other than as dis-
cussed above, unaltered to the output device; only by chance
would the result be the desired one, and there is serious
potential for harmful side effects, either accidental or
malicious.
NOTE: Exactly what to do with unwanted control
characters/sequences depends on the philosophy of
the reading agent, but passing them straight to
the output device is almost always wrong. If the
reading agent wants to mark the presence of such a
character/sequence in circumstances where only
ASCII printable characters are available, trans-
lating it to "#" might be a suitable method; "#"
is a conspicuous character seldom used in normal
text.
NOTE: Reading agents should be aware that many old
output devices (or the transmission paths to them)
zero out the top bit of octets sent to them. This
can transform non-ASCII characters into ASCII con-
trol characters.
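The "#" translation suggested above might be sketched as follows. It assumes underlining has already been interpreted, so any backspace still present is unwanted; mask_controls is a hypothetical name. ESC is masked, but the printable remainder of an escape sequence is left in place as ordinary text.

```python
ALLOWED = {"\t", "\f", "\n"}  # tab, formfeed, and the line terminator


def mask_controls(text):
    """Replace unwanted ASCII control characters (and DEL) with "#"."""
    return "".join(
        "#" if (ord(ch) < 32 or ord(ch) == 127) and ch not in ALLOWED else ch
        for ch in text
    )
```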
Followup agents MUST be careful to apply appropriate trans-
formations of representation to the outbound followup as
well as the inbound precursor. A followup to an article
containing non-ASCII material is very likely to contain non-
ASCII material itself.
4.5. Non-ASCII Characters In Headers
All octets found in headers MUST be ASCII characters. How-
ever, it is desirable to have a way of encoding non-ASCII
characters, especially in "human-readable" headers such as
Subject. MIME [rrr] provides a way to do this. Full
details may be found in the MIME specifications; herewith a
quick summary to alert software authors to the issues...
encoded-word = "=?" charset "?" encoding "?" codes "?="
charset = 1*tag-char
encoding = 1*tag-char
tag-char = <ASCII printable character except !()<>@,;:\"[]/?=>
codes = 1*code-char
code-char = <ASCII printable character except ?>
An encoded word is a sequence of ASCII printable characters
that specifies the character set, encoding method, and bits
of (potentially) non-ASCII characters. Encoded words are
allowed only in certain positions in certain headers. Spe-
cific headers impose restrictions on the content of encoded
words beyond that specified in this section. Posting agents
MUST ensure that any material resembling an encoded word
(complete with all delimiters), in a context where encoded
words may appear, really is an encoded word.
NOTE: The syntax is a bit ugly, but it was
designed to minimize chances of confusion with
legitimate header contents, and to satisfy diffi-
cult constraints on use within existing headers.
An encoded word MUST not be more than 75 octets long. Each
line of a header containing encoded word(s) MUST be at most
76 octets long, not counting the EOL.
NOTE: These limits are meant to bound the looka-
head needed to determine whether text that begins
"=?" is really an encoded word.
The details of charsets and encodings are defined by MIME
[rrr]; the sequence of preferred character sets is the same
as MIME's. Encoded words SHOULD not be used for content
expressible in ASCII.
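The encoded-word syntax and the 75-octet limit can be checked mechanically. This sketch assumes "ASCII printable" means ASCII 33 through 126, and builds words only with MIME's "B" (base64) encoding; is_encoded_word and encode_word are hypothetical names.

```python
import base64

PRINTABLE = {chr(c) for c in range(33, 127)}   # assumed: ASCII 33-126
TAG_CHARS = PRINTABLE - set('!()<>@,;:\\"[]/?=')
CODE_CHARS = PRINTABLE - {"?"}


def is_encoded_word(s):
    """Check s against the encoded-word syntax and the 75-octet limit."""
    if len(s) > 75 or not (s.startswith("=?") and s.endswith("?=")):
        return False
    parts = s[2:-2].split("?")
    if len(parts) != 3:
        return False
    charset, encoding, codes = parts
    return (bool(charset) and bool(encoding) and bool(codes)
            and set(charset) <= TAG_CHARS
            and set(encoding) <= TAG_CHARS
            and set(codes) <= CODE_CHARS)


def encode_word(text, charset="iso-8859-1"):
    """Build a single encoded word using MIME's "B" (base64) encoding."""
    codes = base64.b64encode(text.encode(charset)).decode("ascii")
    word = "=?%s?B?%s?=" % (charset, codes)
    if len(word) > 75:
        raise ValueError("longer than 75 octets; split into several words")
    return word
```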
When an encoded word is used, other than in a newsgroup name
(see section 5.5), it MUST be separated from any adjacent
non-space characters (including other encoded words) by
white space. Reading agents displaying the contents of
encoded words (as opposed to their encoded form) should
ignore white space separating two adjacent encoded words.
UNRESOLVED ISSUE: Should this section be deleted
entirely, or made much more terse? The material
is relevant, but too complex to discuss fully.
NOTE: The deletion of intervening white space per-
mits using multiple encoded words, implicitly con-
catenated by the deletion, to encode text that
will not fit within a single 75-character encoded
word.
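Splitting long text across several encoded words, and reassembling it on display, can be sketched as follows. Following MIME, the decoder drops white space only where it separates two encoded words; white space next to ordinary text is kept. Only the "B" encoding is handled, text is split only between characters so each word decodes on its own, and encode_words and decode_b_words are hypothetical names.

```python
import base64
import re

ENCODED_B = re.compile(r"=\?([^?\s]+)\?[Bb]\?([^?\s]*)\?=")


def encode_words(text, charset="utf-8", limit=75):
    """Encode text as blank-separated "B" encoded words of <= limit octets."""
    def word(chunk):
        codes = base64.b64encode(chunk.encode(charset)).decode("ascii")
        return "=?%s?B?%s?=" % (charset, codes)

    words, chunk = [], ""
    for ch in text:
        if chunk and len(word(chunk + ch)) > limit:
            words.append(word(chunk))
            chunk = ch
        else:
            chunk += ch
    if chunk:
        words.append(word(chunk))
    return " ".join(words)


def decode_b_words(value):
    """Decode "B" encoded words, dropping white space between two of them."""
    tokens = re.split(r"(\s+)", value)  # odd indices hold the white space
    out = []
    for i, tok in enumerate(tokens):
        if i % 2:
            if ENCODED_B.fullmatch(tokens[i - 1]) and ENCODED_B.fullmatch(tokens[i + 1]):
                continue
            out.append(tok)
        else:
            m = ENCODED_B.fullmatch(tok)
            out.append(base64.b64decode(m.group(2)).decode(m.group(1)) if m else tok)
    return "".join(out)
```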
Reading-agent implementors are warned that although this
Draft completely specifies where encoded words may appear in
the headers it defines, there are other headers (e.g. the
MIME Content-Description header) that MAY contain them.
4.6. Size Limits
Implementations SHOULD avoid fixed constraints on the sizes
of lines within an article and on the size of the entire
article.
Relayers SHOULD treat the body of an article as an uninter-
preted sequence of octets (except as mandated by changes of
EOL representation and processing of control messages), not
to be altered or constrained in any way.
If it is absolutely necessary for an implementation to
impose a limit on the length of header lines, body lines, or
header logical lines, that limit shall be at least 1000
octets, including EOL representations. Relayers and trans-
mission paths confronted with lines beyond their internal
limits (if any) MUST not simply inject EOLs at random
places; they MAY break headers (as described in 4.2.3) as a
last resort, and otherwise they MUST either pass the long
lines through unaltered, or refuse to pass the article at
all (see section 9.1 for further discussion).
NOTE: The limit here is essentially the same mini-
mum as that specified for SMTP mail in RFC 821
[rrr]. Implementors are warned that Path (see
section 5.6) and References (see section 6.5)
headers, in particular, often become several hun-
dred characters long, so 1000 is not an overly
generous limit.
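A relayer that does have an internal line limit might identify offending lines as sketched here, so that it can pass them through unaltered or refuse the article, rather than injecting EOLs. The CRLF default and the name overlong_lines are assumptions.

```python
def overlong_lines(article, eol=b"\r\n", limit=1000):
    """Return 1-based numbers of lines longer than `limit` octets,
    counting the EOL representation as part of each line."""
    lines = article.split(eol)
    if lines and lines[-1] == b"":
        lines.pop()  # the article ended with an EOL, as it must
    return [n for n, line in enumerate(lines, 1)
            if len(line) + len(eol) > limit]
```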
All implementations MUST be able to handle an article
totalling at least 65,000 octets, including headers and EOL
representations, gracefully and efficiently. All implemen-
tations SHOULD be able to handle an article totalling at
least 1,000,000 (one million) octets, including headers and
EOL representations, gracefully and efficiently. "Grace-
fully and efficiently" is intended to preclude not only
failures, but also major loss of performance, serious prob-
lems in error recovery, or resource consumption beyond what
is reasonably necessary.
NOTE: The intent here is to prohibit lowering the
existing de-facto limit any further, while
strongly encouraging movement towards a higher
one. Actually, although improvements are desir-
able in some cases, much news software copes rea-
sonably well with very large articles. The same
cannot be said of the communications software and
protocols used to transmit news from one host to
another, especially when slow communications links
are involved. Occasional huge articles that
appear now (by accident or through ignorance) typ-
ically leave trails of failing software, system
problems, and irate administrators in their wake.
NOTE: It is intended that the successor to this
Draft will raise the "MUST" limit to 1,000,000 and
the "SHOULD" limit still further.
Posters SHOULD limit posted articles to at most 60,000
octets, including headers and EOL representations, unless
the articles are being posted only within a cooperating sub-
net which is known to be capable of handling larger articles
gracefully. Posting agents presented with a large article
SHOULD warn the poster and request confirmation.
NOTE: The difference between this and the earlier
"MUST" limit is margin for header growth, differ-
ing EOL representations, and transmission over-
heads.
NOTE: Disagreeable though these limits are, it is
a fact that in current networks, an article larger
than 64K (after header growth etc.) simply is not
transmitted reliably. Note also the comments
above on the trauma caused by single extremely-
large articles now; the problems are real and cur-
rent. These problems arguably should be fixed,
but this will not happen network-wide in the imme-
diate future. Hence the restriction of larger
articles to cooperating subnets, for now.
Posters using non-ASCII characters in their text MUST take
into account the overhead involved in MIME encoding, unless
the article's propagation will be entirely limited to a
cooperating subnet which does not use MIME encodings for
non-ASCII characters. For example, MIME base64 encoding
involves growth by a factor of approximately 4/3, so an
article which would likely have to use this encoding should
be at most about 45,000 octets before encoding.
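The 45,000-octet figure follows from base64's 4/3 growth, as this arithmetic sketch shows. The second function also counts line breaks, assuming MIME's 76-character base64 lines and one-octet EOLs; all function names are hypothetical.

```python
def base64_size(n):
    """Octets of base64 output for n input octets, ignoring line breaks."""
    return 4 * ((n + 2) // 3)


def base64_size_with_eols(n, line_len=76, eol_len=1):
    """Same, plus one EOL per output line of at most line_len characters."""
    encoded = base64_size(n)
    return encoded + ((encoded + line_len - 1) // line_len) * eol_len


def max_input_for(limit):
    """Largest input whose bare base64 output fits within `limit` octets."""
    return (limit // 4) * 3
```

Bare 4/3 growth takes 45,000 octets to exactly 60,000; the line breaks add about 790 more before any headers, which is why the figure is only "about" 45,000.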
Posters SHOULD use MIME "message/partial" conventions to
facilitate automatic reassembly of a large document split
into smaller pieces for posting. It is recommended that the
content identifier used should be a message ID, generated by
the same means as article message IDs (see section 5.3), and
that all parts should have a See-Also header (see section
6.16) giving the message IDs of at least the previous parts
and preferably all the parts.
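Generating the headers for such a series might look like this sketch. The message IDs are placeholders, and both the blank-separated See-Also list and the name partial_headers are assumptions. The id, number, and total parameters are those of MIME's "message/partial" type.

```python
def partial_headers(part_ids, content_id):
    """Headers for each part of a "message/partial" series.

    part_ids are the message IDs of the parts, in order; content_id is
    the shared identifier, itself generated like a message ID.
    """
    total = len(part_ids)
    result = []
    for number, msg_id in enumerate(part_ids, 1):
        headers = {
            "Message-ID": msg_id,
            "MIME-Version": "1.0",
            "Content-Type": 'message/partial; id="%s"; number=%d; total=%d'
                            % (content_id, number, total),
        }
        others = [p for p in part_ids if p != msg_id]
        if others:
            headers["See-Also"] = " ".join(others)  # all the other parts
        result.append(headers)
    return result
```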
NOTE: See-Also is more correct for this purpose
than References, although References is in common
use today (with less-formal reassembly arrange-
ments). MIME reassemblers should probably examine
articles suggested by References headers if See-
Also headers are not present to indicate the
whereabouts of the other parts of "mes-
sage/partial" articles.
To repeat: implementations SHOULD avoid fixed constraints on
the sizes of lines within an article and on the size of the
entire article.
4.7. Example
Here is a sample article:
From: jerry@eagle.ATT.COM (Jerry Schwarz)
Path: cbosgd!mhuxj!mhuxt!eagle!jerry
Newsgroups: news.announce
Subject: Usenet Etiquette -- Please Read
Message-ID: <642@eagle.ATT.COM>
Date: Mon, 17 Jan 1994 11:14:55 -0500 (EST)
Followup-To: news.misc
Expires: Wed, 19 Jan 1994 00:00:00 -0500
Organization: AT&T Bell Laboratories, Murray Hill

body
body
body
5. Mandatory Headers
An article MUST have one, and only one, of each of the fol-
lowing headers: Date, From, Message-ID, Subject, Newsgroups,
Path.
NOTE: MAIL specifies (if read most carefully) that
there must be exactly one Date header and exactly
one From header, but otherwise does not restrict
multiple appearances of headers. (Notably, it
permits multiple Message-ID headers!) This
appears singularly useless, or even harmful, in
the context of news, and much current news soft-
ware will not tolerate multiple appearances of
mandatory headers.
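Checking that each mandatory header appears exactly once can be sketched as follows. Comparing header names case-insensitively, as MAIL does, is an assumption here, and mandatory_header_errors is a hypothetical name.

```python
MANDATORY = ("Date", "From", "Message-ID", "Subject", "Newsgroups", "Path")


def mandatory_header_errors(header_names):
    """Return one complaint per mandatory header not present exactly once."""
    counts = {}
    for name in header_names:
        counts[name.lower()] = counts.get(name.lower(), 0) + 1
    return ["%s appears %d times" % (name, counts.get(name.lower(), 0))
            for name in MANDATORY if counts.get(name.lower(), 0) != 1]
```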
Note also that there are situations, discussed in the rele-
vant parts of section 6, where References, Sender, or
Approved headers are mandatory.
In the discussions of the individual headers, the content of
each is specified using the syntax notation. The convention
used is that the content of, for example, the Subject header
is defined as <Subject-content>.
5.1. Date
The Date header contains the date and time when the article
was submitted for transmission:
Date-content = [ weekday "," space ] date space time
weekday = "Mon" / "Tue" / "Wed" / "Thu"
/ "Fri" / "Sat" / "Sun"
date = day space month space year
day = 1*2digit
month = "Jan" / "Feb" / "Mar" / "Apr" / "May" / "Jun"
/ "Jul" / "Aug" / "Sep" / "Oct" / "Nov" / "Dec"
year = 4digit / 2digit
time = hh ":" mm [ ":" ss ] space timezone
timezone = "UT" / "GMT"
/ ( "+" / "-" ) hh mm [ space "(" zone-name ")" ]
hh = 2digit
mm = 2digit
ss = 2digit
zone-name = 1*( <ASCII printable character except ()\> / space )
This is a restricted subset of the MAIL date format.
If a weekday is given, it MUST be consistent with the date.
The modern Gregorian calendar is used, and dates MUST be
consistent with its usual conventions; for example, if the
month is May, the day must be between 1 and 31 inclusive.
The year SHOULD be given as four digits, and posting agents
SHOULD enforce this; however, relayers MUST accept the two-
digit form, and MUST interpret it as having the implicit
prefix "19".
NOTE: Two-digit year numbers can, should, and must
be phased out by 1999.
The time is given on the 24-hour clock, e.g. two hours
before midnight is "22:00" or "22:00:00". The hh must be
between 00 and 23 inclusive, the mm between 00 and 59
inclusive, and the ss between 00 and 61 inclusive.
NOTE: Leap seconds very occasionally result in
minutes that are 61 or 62 seconds long.
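A relayer's validation of Date content might be sketched as below. It simplifies the grammar's space to a single blank, which is an assumption, and parse_date is a hypothetical name. Seconds are range-checked by hand because Python's datetime cannot represent second 60 or 61.

```python
import datetime
import re

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]  # weekday() order

DATE_RE = re.compile(
    r"(?:(Mon|Tue|Wed|Thu|Fri|Sat|Sun), )?"
    r"(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4}|\d{2}) "
    r"(\d{2}):(\d{2})(?::(\d{2}))? "
    r"(UT|GMT|[+-]\d{4}(?: \([^()\\\x00-\x1f\x7f]+\))?)"
)


def parse_date(value):
    """Return (date, (hh, mm, ss), offset_minutes) or raise ValueError."""
    m = DATE_RE.fullmatch(value)
    if not m:
        raise ValueError("not in Date-content syntax")
    wday, day, mon, year_str, hh, mm, ss, zone = m.groups()
    year = int(year_str)
    if len(year_str) == 2:
        year += 1900  # two-digit years carry an implicit "19"
    # datetime.date rejects impossible dates such as 31 Apr
    d = datetime.date(year, MONTHS.index(mon) + 1, int(day))
    if wday and WEEKDAYS[d.weekday()] != wday:
        raise ValueError("weekday inconsistent with date")
    hh, mm, ss = int(hh), int(mm), int(ss or 0)
    if hh > 23 or mm > 59 or ss > 61:  # ss may reach 61 for leap seconds
        raise ValueError("time out of range")
    if zone in ("UT", "GMT"):
        offset = 0
    else:
        sign = -1 if zone[0] == "-" else 1
        offset = sign * (int(zone[1:3]) * 60 + int(zone[3:5]))
    return d, (hh, mm, ss), offset
```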
The date and time SHOULD be given in the poster's local
timezone, including a specification of that timezone as a
numeric offset (which SHOULD include the timezone name, e.g.
"EST", supplied in parentheses like a MAIL comment). If
not, they MUST be given in Universal Time (abbreviated "UT";
"GMT" is a historical synonym for "UT"). The timezone name
in parentheses, if present, is a comment; software MUST
ignore it, except that reading agents might wish to display
it to the reader. Timezone names other than "UT" and "GMT"
MUST appear only in the comment.
NOTE: Attempts to deal with a full set of timezone
names have all foundered on the vast number of
such names in use and the duplications (for exam-
ple, there are at least FIVE different timezones
called "EST" by somebody). Even the limited set
of North American zone names authorized by MAIL is
subject to confusion and misinterpretation. Hence
the flat ban on non-UT timezone names except as
comments.
NOTE: RFC 1036 specified that use of GMT (aka UT,
UTC) was preferred. However, the local time (in
the poster's timezone) is arguably information of
possible interest to the reader, and this requires
some indication of the poster's timezone. Numeric
offsets are an unambiguous way of doing this, and
their use was indeed sanctioned by RFC 1036 (that
is, this is a change of preference only).
NOTE: There is frequent confusion, including
errors in some news software, regarding the sign
of numeric timezones. Zones west of Greenwich
have negative offsets. For example, North Ameri-
can Eastern Standard Time is zone -0500 and North
American Eastern Daylight Time is zone -0400.
NOTE: Implementors are warned that the hh in a
timezone can go up to about 14; it is not limited
to 12. This is because the International Date
Line does not run exactly along the boundary
between zone -1200 and zone +1200.
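A posting agent generating Date content in the poster's local zone, with the sign convention and the timezone name as a comment only, could work as sketched here. format_date is a hypothetical name, and it requires a timezone-aware datetime.

```python
import datetime

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
WEEKDAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def format_date(dt, zone_name=None):
    """Format an aware datetime as Date-content in its own local zone.

    zone_name, if given, is added only as a parenthesized comment.
    """
    offset = dt.utcoffset()
    if offset is None:
        raise ValueError("need a timezone-aware datetime")
    minutes = int(offset.total_seconds()) // 60
    sign = "-" if minutes < 0 else "+"  # west of Greenwich is negative
    hh, mm = divmod(abs(minutes), 60)
    text = "%s, %d %s %04d %02d:%02d:%02d %s%02d%02d" % (
        WEEKDAYS[dt.weekday()], dt.day, MONTHS[dt.month - 1], dt.year,
        dt.hour, dt.minute, dt.second, sign, hh, mm)
    if zone_name:
        text += " (%s)" % zone_name
    return text
```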
NOTE: The comments in section 2.6 regarding trans-
lation to other languages are relevant here. The
Date-content format, and the spellings of its com-
ponents, as found in articles themselves, are
always as defined in this Draft, regardless of the
language used to interact with readers and
posters. Reading and posting agents should trans-
late as appropriate. Actually, even English-
language reading and posting agents will probably
want to do some degree of translation on dates, if
only to abbreviate the lengthy format and
(perhaps) translate to and from the reader's time-
zone.
5.2. From
The From header contains the electronic address, and possi-
bly the full name, of the article's author:
From-content = address [ space "(" paren-phrase ")" ]
/ [ plain-phrase space ] "<" address ">"
paren-phrase = 1*( paren-char / space / encoded-word )
paren-char = <ASCII printable character except ()<>\>
plain-phrase = plain-word *( space plain-word )
plain-word = unquoted-word / quoted-word / encoded-word
unquoted-word = 1*unquoted-char
unquoted-char =
@,;:\".[]>
quoted-word = quote 1*( quoted-char / space ) quote
quote = <" (ASCII 34)>
quoted-char = \>
address = local-part "@" domain
local-part = unquoted-word *( "." unquoted-word )
domain = unquoted-word *( "." unquoted-word )
(Encoded words are described in section 4.5.) The full name
is distinguished from the electronic address either by
enclosing the former in parentheses (making it resemble a
MAIL comment, after the address) or by enclosing the latter
in angle brackets. The second form is preferred. In the
first form, encoded words inside the full name MUST be com-
posed entirely of <paren-char>s. In the second form,
encoded words inside the full name may not contain charac-
ters other than letters (of either case), digits, and the
characters "!", "*", "+", "-", "/", "=", and "_". The local
part is case-sensitive (except that all case counterparts of
"postmaster" are deemed equivalent), the domain is case-
insensitive, and all other parts of the From content are
comments which MUST be ignored by news software (except
insofar as reading agents may wish to display them to the
reader). Posters and posting agents MUST restrict them-
selves to this subset of the MAIL From syntax; relayers MAY
accept a broader subset, but see the discussion in section
9.1.
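The case rules above can be captured in a comparison key. The helper below is a hypothetical sketch and is not part of any existing news software. It assumes the address has already been extracted from the From content.

```python
from __future__ import annotations


def address_key(address: str) -> tuple[str, str]:
    """Return a key under which equivalent From addresses compare equal.

    The local part is case-sensitive, except that every case variant of
    "postmaster" is equivalent; the domain is case-insensitive.
    """
    local, at, domain = address.partition("@")
    if not at or not local or not domain:
        raise ValueError(f"not a local-part@domain address: {address!r}")
    if local.lower() == "postmaster":
        local = "postmaster"
    return (local, domain.lower())
```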
NOTE: The syntax here is a restricted subset of
the MAIL From syntax, with quoting particularly
restricted, for simple parsing. In particular,
the presence of "<" in the From content indicates
that the second form is being used, otherwise the
first form is being used. The major restrictions
here are those already de-facto imposed by exist-
ing software.
NOTE: Overly-lenient posting agents sometimes per-
mit the second form with a full name containing
"(" or ")", but it is extremely rare for a full
name to contain "<" or ">" even in mail. Accord-
ingly, reading agents wishing to robustly deter-
mine which form is in use in a particular article
should key on the presence or absence of "<", not
the presence or absence of "(".
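A reading agent following that advice might split the From content as in the sketch below. The names are hypothetical. The sketch does only the minimum needed to separate the address from the full name, and it does not decode encoded words.

```python
from __future__ import annotations


def split_from(content: str) -> tuple[str, str | None]:
    """Split a From content into (address, full name or None).

    The form is chosen by the presence of "<", not "(".  The full name
    is returned as written (quotes included), for display only.
    """
    content = content.strip()
    if "<" in content:
        # Second form: [ plain-phrase space ] "<" address ">"
        lt = content.index("<")
        gt = content.index(">", lt)  # raises ValueError if ">" is missing
        name = content[:lt].strip()
        return content[lt + 1:gt], name or None
    # First form: address [ space "(" paren-phrase ")" ]
    address, _, rest = content.partition(" ")
    rest = rest.strip()
    if rest.startswith("(") and rest.endswith(")"):
        return address, rest[1:-1]
    return address, None
```

The last test case below shows why keying on "<" is more robust: a lenient posting agent let "(" into a second-form full name, and the address is still found correctly.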
The address SHOULD be a valid and complete Internet domain
address, capable of being successfully mailed to by an
Internet host (possibly via an MX record and a forwarder).
The pseudo-domain ".uucp" MAY be used for hosts registered
in the UUCP maps (e.g. name "xyz.uucp" for registered site
"xyz"), but such hosts SHOULD discontinue this usage (either
by arranging a proper Internet address and forwarder, or by
using the "% hack" (see below)), as soon as possible. Bit-
net hosts SHOULD use Internet addresses, avoiding the obso-
lescent ".bitnet" pseudo-domain. Other forms of address
MUST not be used.
NOTE: "Other forms" specifically include UK-style
"backward" domains ("uk.oxbridge.cs" is in the
Czech Republic, not the UK), pure-UUCP addressing
("knee!shin!foot" instead of
"foot%shin@knee.uucp"), and abbreviated domains
("zebra.zoo" instead of "zebra.zoo.toronto.edu").
If it is necessary to use the local part to specify a rout-
ing relative to the nearest Internet host, this MUST be done
using the "% hack", using "%" as a secondary "@". For exam-
ple, to specify that mail to the address should go to Inter-
net host "foo.bar.edu", then to non-Internet host "ein",
then to non-Internet host "deux", for delivery there to
mailbox "fred", a suitable address would be:
fred%deux%ein@foo.bar.edu
Analogous forms using "!" in the local part MUST not be
used, as they are ambiguous; they should be expressed in the
"%" form.
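The ordering of the "%" hops is easy to get backwards. This small sketch uses a hypothetical function to build the address above from the route.

```python
from __future__ import annotations


def percent_hack(mailbox: str, hops: list[str], internet_host: str) -> str:
    """Build a "% hack" address.

    `hops` lists the non-Internet hosts in the order mail should visit
    them after reaching `internet_host`.  Each "%" acts as a secondary
    "@", so the first hop ends up rightmost in the local part.
    """
    local_part = "%".join([mailbox, *reversed(hops)])
    return f"{local_part}@{internet_host}"


print(percent_hack("fred", ["ein", "deux"], "foo.bar.edu"))  # fred%deux%ein@foo.bar.edu
```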
NOTE: "a!b@c" can be interpreted as either "b%c@a"
or "b%a@c", and there is no consistency in which
choice is made. Such addresses consequently are
unreliable. The "%" form does not suffer from
this problem, and although its use is officially
discouraged, it is a de-facto standard, to the
point that MAIL recognizes it.
Relayers MUST not, repeat MUST not, repeat MUST not, rewrite
From lines, in any way, however minor or innocent-seeming.
Trying to "fix" a non-conforming address has a very high
probability of making things worse. Either pass it along
unchanged, or reject the article.
NOTE: An additional reason for banning the use of
"!" addressing is that it has a much higher proba-
bility of being rewritten into mangled unrecogniz-
ability by old relayers.
Posters and posting agents SHOULD avoid use of the charac-
ters "!" and "@" in full names, as they may trigger unwanted
header rewriting by old, simple-minded news software.
NOTE: Also, the characters "." and ",", not infre-
quently found in names (e.g., "John W. Campbell,
Jr."), are NOT, repeat NOT, allowed in an unquoted
word. A From header like the following MUST not
be written without the quotation marks:
From: "John W. Campbell, Jr."
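A posting agent can decide mechanically whether a full name needs quoting by using the character sets from the grammar in section 5.2. This sketch is hypothetical and handles ASCII names only. Names that need non-ASCII characters would require encoded words, which it does not produce. It also does not enforce the advice against "!" and "@"; it merely quotes them.

```python
_UNQUOTED_BAD = set('!()<>@,;:\\".[]')  # not allowed in unquoted-char
_QUOTED_BAD = set('"()<>\\')            # not allowed in quoted-char


def _printable_ascii(word: str) -> bool:
    return all("!" <= ch <= "~" for ch in word)


def full_name_phrase(name: str) -> str:
    """Render a full name as a plain-phrase for the "<address>" form of From.

    White space is collapsed to single blanks.  Quotes are added only if
    some word contains a character forbidden in an unquoted word.
    """
    words = name.split()
    if not words or not all(_printable_ascii(w) for w in words):
        raise ValueError("empty or non-ASCII name (encoded words not handled)")
    if not any(set(w) & _UNQUOTED_BAD for w in words):
        return " ".join(words)
    if any(set(w) & _QUOTED_BAD for w in words):
        raise ValueError("name contains characters not allowed in a quoted word")
    return '"' + " ".join(words) + '"'
```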
5.3. Message-ID
The Message-ID header contains the article's message ID, a
unique identifier distinguishing the article from every
other article:
Message-ID-content = message-id
message-id = "<" local-part "@" domain ">"
As with From addresses, a message ID's local part is case-
sensitive and its domain is case-insensitive. The "<" and
">" are parts of the message ID, not peculiarities of the
Message-ID header.
NOTE: News message IDs are a restricted subset of
MAIL message IDs. In particular, no existing news
software copes properly with MAIL quoting conven-
tions within the local part, so they are forbid-
den. This is unfortunate, particularly for X.400
gateways that often wish to include characters
which are not legal in unquoted message IDs, but
it is impossible to fix net-wide. See the notes
on gatewaying in section 10.
The domain in the message ID SHOULD be the full Internet
domain name of the posting agent's host. Use of the ".uucp"
pseudo-domain (for hosts registered in the UUCP maps) or the
".bitnet" pseudo-domain (for Bitnet hosts) is permissible,
but SHOULD be avoided.
Posters and posting agents MUST generate the local part of a
message ID using an algorithm which obeys the specified syn-
tax (words separated by ".", with certain characters not
permitted; see section 5.2 for details) and will not
repeat itself (ever). The algorithm SHOULD not generate
message IDs which differ only in case of letters. Note the
specification in section 6.5 of a recommended convention for
indicating subject changes. Otherwise the algorithm is up
to the implementor.
NOTE: The crucial use of message IDs is to distin-
guish circulating articles from each other and
from articles circulated recently. They are also
potentially useful as permanent indexing keys,
hence the requirement for permanent uniqueness...
but indexers cannot absolutely rely on this
because the earlier RFCs urged it but did not
demand it. All major implementations have always
generated permanently-unique message IDs by
design, but in some cases this is sensitive to
proper administration, and duplicates may have
occurred by accident.
NOTE: The most popular method of generating local
parts is to use the date and time, plus some way
of distinguishing between simultaneous postings on
the same host (e.g. a process number), and encode
them in a suitably-restricted alphabet. An older
but now less-popular alternative is to use a
sequence number, incremented each time the host
generates a new message ID; this is workable, but
requires careful design to cope properly with
simultaneous posting attempts, and is not as
robust in the presence of crashes and other mal-
functions.
NOTE: Some buggy news software considers message
IDs completely case-insensitive, hence the advice
to avoid relying on case distinctions. The
restrictions placed on the "alphabet" of local
parts and domains in section 5.2 have the useful
side effect of making it unnecessary to parse mes-
sage IDs in complex ways to break them into case-
sensitive and case-insensitive portions.
The local part of a message ID MUST not be "postmaster" or
any other string that would compare equal to "postmaster" in
a case-insensitive comparison. Message IDs MUST be no
longer than 250 octets, including the "<" and ">".
NOTE: "Postmaster" is an irksome exception to
case-sensitivity in local parts, inherited from
MAIL, and simply avoiding it is the best way to
deal with it (not that it's likely, but the issue
needs to be dealt with). The length limit is
undesirable, but is present in widely-used exist-
ing software. The limit is actually 255, but a
small safety margin is wise.
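A relayer or posting agent could check the message-ID rules of this section along the lines of the hypothetical sketch below. It applies the Draft's syntax literally and reports problems rather than rewriting anything.

```python
from __future__ import annotations

_FORBIDDEN = set('!()<>@,;:\\".[]')  # excluded from unquoted words (section 5.2)


def _dotted_words(s: str) -> bool:
    """True if s matches unquoted-word *( "." unquoted-word )."""
    return all(
        word and all("!" <= c <= "~" and c not in _FORBIDDEN for c in word)
        for word in s.split(".")
    )


def message_id_problems(mid: str) -> list[str]:
    """List the ways `mid` violates the message-ID rules of this section."""
    problems = []
    if len(mid.encode("utf-8")) > 250:
        problems.append("longer than 250 octets")
    if len(mid) < 2 or mid[0] != "<" or mid[-1] != ">":
        problems.append("not enclosed in < and >")
        return problems
    local, at, domain = mid[1:-1].partition("@")
    if not at or not _dotted_words(local) or not _dotted_words(domain):
        problems.append("local part or domain violates the word syntax")
    elif local.lower() == "postmaster":
        problems.append('local part is a case variant of "postmaster"')
    return problems
```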
5.4. Subject
The Subject header's content (the "subject" of the article)
is a short phrase describing the topic of the article:
Subject-content = [ "Re: " ] nonblank-text
Encoded words MAY appear in this header.
If the article is a followup, the subject SHOULD begin with
"Re: " (a "back reference"). If the article is not a fol-
lowup, the subject MUST not begin with a back reference.
Back references are case-insensitive, although "Re: " is the
preferred form. A followup agent assisting a poster in
preparing a followup SHOULD prepend a back reference, UNLESS
the subject already begins with one. If the poster deter-
mines that the topic of the followup differs significantly
from what is described in the subject, a new, more descrip-
tive, subject SHOULD be substituted (with no back refer-
ence). An article whose subject begins with a back refer-
ence MUST have a References header referencing the precur-
sor.
NOTE: A back reference is FOUR characters, the
fourth being a blank. RFC 1036 was confused about
this. Observe also that only ONE back reference
should be present.
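For a followup agent, the back-reference rule reduces to a four-character, case-insensitive test, as in this hypothetical sketch.

```python
def followup_subject(subject: str) -> str:
    """Subject for a followup: prepend "Re: " unless a back reference is present.

    A back reference is exactly four characters ("Re: " in any case), so
    "Re:Widgets" (no blank) does not count as one.
    """
    if subject[:4].lower() == "re: ":
        return subject
    return "Re: " + subject
```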
NOTE: There is a semi-standard convention, often
used, in which a subject change is flagged by mak-
ing the new Subject-content of the form:
new topic (was: old topic)
possibly with "old topic" somewhat truncated.
Posters wishing to do something like this are
urged to use this exact form, to simplify auto-
mated analysis.
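A posting agent offering this convention might construct the new subject as sketched below. The 40-character truncation limit is an arbitrary illustrative choice, not something this Draft specifies. The sketch also drops any back reference from the old topic.

```python
def changed_subject(new_topic: str, old_subject: str, max_old: int = 40) -> str:
    """Build "new topic (was: old topic)" for a followup that changes topic.

    Any back reference is dropped from the old topic, which is truncated
    to at most `max_old` characters (an illustrative limit).
    """
    old = old_subject[4:] if old_subject[:4].lower() == "re: " else old_subject
    if len(old) > max_old:
        old = old[:max_old].rstrip()
    return f"{new_topic} (was: {old})"
```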
For historical reasons, the subject MUST not begin with
"cmsg " (note that this sequence ends with a blank).
NOTE: Some old news software takes a subject
beginning with "cmsg " as an indication that the
article is a control message (see sections 6.6 and
7). This mechanism is obsolete and undesirable,
but accidental triggering of it is still possible.
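The MUST-level subject rules above can be checked together, as in this hypothetical sketch. It takes the presence of a References header as a separate input.

```python
from __future__ import annotations


def subject_problems(subject: str, has_references: bool) -> list[str]:
    """Check a Subject content against the MUST-level rules of this section."""
    problems = []
    if not subject.strip():
        problems.append("subject is blank")
    if subject.startswith("cmsg "):
        problems.append('subject begins with "cmsg "')
    if subject[:4].lower() == "re: " and not has_references:
        problems.append("back reference without a References header")
    return problems
```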
The subject SHOULD be terse. Posters SHOULD avoid trying to
cram their entire article into the headers; even the sim-
plest query usually benefits from a sentence or two of
elaboration and context, and the details of header display
vary widely among reading agents.
NOTE: All-in-the-subject articles are sometimes
the result of misunderstandings over the interac-
tion protocol of a posting agent. Posting agents
might wish to give special attention to the possi-
bility that a poster specifying a very long sub-
ject might have thought he was typing the body of
the article.
5.5. Newsgroups
The Newsgroups header's content specifies which newsgroup(s)
the article is posted to:
Newsgroups-content =
newsgroup-name *( ng-delim newsgroup-name )
newsgroup-name = plain-component *( "." component )
component = plain-component / encoded-word
plain-component = component-start *13component-rest
component-start = lowercase / digit
lowercase = <letter a-z>
component-rest = component-start / "+" / "-" / "_"
ng-delim = ","
Encoded words used in newsgroup names MUST not contain char-
acters other than letters, digits, "+", "-", "/", "_", "=",
and "?" (although they may encode them).
A newsgroup name consists of one or more components, which
may be plain components or (except for the first) encoded
words. A plain component MUST contain at least one letter,
MUST begin with a letter or digit, and MUST not be longer
than 14 characters. The first component MUST begin with a
letter; subsequent components SHOULD begin with a letter.
Newsgroup names MUST not contain uppercase letters, except
where required by encodings in encoded words. The sequences
"all" and "ctl" MUST not be used as components.
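The rules above for plain components translate fairly directly into a check like the following hypothetical sketch. It skips encoded-word components rather than validating them; it assumes such components can be recognized by a leading "=?". It also does not enforce the SHOULD-level preference that later components begin with a letter.

```python
from __future__ import annotations

import re

# plain-component: a lowercase letter or digit, then up to 13 of
# lowercase letters, digits, "+", "-", "_" (14 characters at most).
_PLAIN = re.compile(r"[a-z0-9][a-z0-9+_-]{0,13}")
_LETTER = re.compile(r"[a-z]")


def newsgroup_name_problems(name: str) -> list[str]:
    """List the ways a newsgroup name breaks the MUST-level rules above."""
    problems = []
    for i, comp in enumerate(name.split(".")):
        if i > 0 and comp.startswith("=?"):
            continue  # assumed encoded word: not validated in this sketch
        if not _PLAIN.fullmatch(comp):
            problems.append(f"{comp!r}: bad character, bad start, or over 14 characters")
            continue
        if not _LETTER.search(comp):
            problems.append(f"{comp!r}: contains no letter")
        if i == 0 and not comp[0].isalpha():
            problems.append(f"{comp!r}: first component must begin with a letter")
        if comp in ("all", "ctl"):
            problems.append(f"{comp!r}: reserved component")
    return problems
```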
NOTE: The alphabet and syntax specified encom-
passes all existing names of widespread news-
groups, while avoiding various forms that are
known to cause problems. Important existing soft-
ware uses various non-alphanumeric characters as
punctuation adjacent to newsgroup names. (It
would, in fact, be preferable to ban "+" from
newsgroup names, were it not that several
widespread newsgroups related to the C++ program-
ming language already use it.)
NOTE: Much existing software converts the news-
group name into a directory path and stores the
articles themselves using numeric filenames, so
all-digit name components can be troublesome; the
"Great Renaming" early in the history of Usenet
included revisions of several newsgroup names to
eliminate such components.
NOTE: The same storage technique is the reason for
the 14-character limit. The limit is now largely
historical, since most modern systems have much
larger limits on the length of a directory entry's
name, but many old systems are still in use. Sys-
tems with shorter limits also exist, but news
software on such systems has had to deal with the
problem already, since there are several
widespread newsgroups with 14-character components
in their names. Implementors are warned that it
is intended that the successor to this Draft will
increase the 14-character limit, and are urged to
fix their software to handle longer names grace-
fully (if such fixes are necessary, given the
intended domain of application of the particular
software).
NOTE: The requirement that the first character of
a name be a letter accommodates existing software
which assumes it can tell the difference between a
newsgroup name and other possible syntactic enti-
ties by inspecting the first character. Similar
considerations motivate excluding "+", "-", and
"_" from coming first in a component, and the
preference for components that do not begin with
digits. The "all" sequence is used as a wildcard
symbol in much existing software, and the "ctl"
sequence was involved in an obsolete historical
mechanism for marking control messages, so they
are best avoided.
NOTE: Possibly newsgroup names should have been
case-insensitive, but all existing software treats
them as case-sensitive. (RFC 977 [rrr] claims
that they are case-insensitive in NNTP, but exist-
ing implementations are believed to ignore this.)
The simplest solution is just to ban use of upper-
case letters, since no widespread newsgroup name
uses them anyway; this avoids any possibility of
confusion.
NOTE: The syntax has the disadvantage of contain-
ing no white space, making it impossible to con-
tinue a Newsgroups header across several lines.
Implementors of relayers and reading agents are
warned that it is intended that the successor to
this Draft will change the definition of ng-delim
to:
ng-delim = "," [ space ]
and are urged to fix their software to handle
(i.e., ignore) white space following the commas.
Meanwhile, posters must avoid inserting such space
(despite the natural-language convention which
permits it) and posting agents should strip it
out.
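Software following this advice today might parse and emit Newsgroups contents as in this hypothetical sketch. On input it accepts white space after commas, but it never writes any.

```python
from __future__ import annotations


def parse_newsgroups(content: str) -> list[str]:
    """Split a Newsgroups content, ignoring white space after each comma.

    This accepts both the current syntax and the intended future
    ng-delim = "," [ space ].
    """
    return [name.lstrip() for name in content.strip().split(",")]


def format_newsgroups(names: list[str]) -> str:
    """Serialize for posting with the current syntax: commas, no spaces."""
    return ",".join(names)
```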
NOTE: Encoded words as components are somewhat
problematic, but are clearly desirable for use in
non-English-speaking nations. They are not sub-
ject to the 14-character limit, and this (plus the
possibility of "/" within them) may require spe-
cial handling in news software.
Encoded words are allowed in newsgroup names ONLY where non-
ASCII characters are necessary to the name, and must use the
"b" encoding [rrr] and the first suitable character set in
the MIME order of preferred character sets [rrr].
NOTE: Since the newsgroup name is the encoded
form, NOT the underlying non-ASCII form, there is
room for terrible confusion here if the choice of
encoding for a particular name is not fully stan-
dardized.
Posters SHOULD use only the names of existing newsgroups in
the Newsgroups header, because newsgroups are NOT created
simply by being posted to. However, it is legitimate to
cross-post to newsgroup(s) which do not exist on the posting
agent's host, provided that at least one of the newsgroups
DOES exist there, and followup agents MUST accept this
(posting agents MAY accept it, but SHOULD at least alert the
poster to the situation and request confirmation). Relayers
MUST not rewrite Newsgroups headers in any way, even if some
or all of the newsgroups do not exist on the relayer's host.
NOTE: Early experience with news software that
created newsgroups when they were mentioned in a
Newsgroups header was thoroughly negative: posters
frequently mistype newsgroup names.
NOTE: While it is legitimate for some of an arti-
cle's newsgroups not to exist on the host where it
is posted, this IS a rather unusual situation
except in followups (which should go to all news-
groups the precursor was posted to, even if not
all of them reach the site where the followup is
being posted).
NOTE: Rewriting Newsgroups headers to strip
locally-unknown newsgroups is superficially
attractive. However, early experience with
exactly that policy was thoroughly negative: news
propagation is more redundant and much less
orderly than many people imagine, and in particu-
lar it is not unheard-of for the (sometimes)
fastest path between two (say) U of Toronto sites
to pass outside U of Toronto... in which case
newsgroup stripping can cause incomplete propaga-
tion. Having an article's set of newsgroups
change as it propagates can also result in fol-
lowups not achieving the same propagation as the
original. It's been tried; it's more trouble than
it's worth; don't do it.
NOTE: In particular, newsgroup stripping superfi-
cially looks like a solution to the problem of
duplicate regional newsgroup names. For example,
both University of Toronto and University of Texas
have "ut.general" newsgroups, and material cross-
posted to that name and a global newsgroup appears
in both universities' local newsgroups. However,
the side effects of stripping are sufficiently
unacceptable to disqualify it for this purpose.
Don't do it.
Cross-posting an article to several relevant newsgroups is
far superior to posting separate articles with duplicated
content to each newsgroup, because reading agents can detect
the situation and show the article to a reader only once.
Posters SHOULD cross-post rather than duplicate-post.
NOTE: On the other hand, cross-posting to a large
number of newsgroups usually indicates that the
poster has not thought about his audience; arti-
cles are rarely pertinent to more than (say) half
a dozen newsgroups. Posting agents might wish to
request confirmation when the number of newsgroups
exceeds (say) five in the presence of a Followup-
To header, or (say) two in the absence of such a
header.
NOTE: One problem with cross-postings is what to
do with an article cross-posted to a set of news-
groups including both moderated and unmoderated
ones. Posters tend to expect such an article to
show up immediately in the unmoderated newsgroups,
especially if they do not realize that one or more
of the newsgroups is moderated. However, since it
is not possible for a moderator to retroactively
add an already-posted article to a moderated news-
group, the only correct action is to mail such an
article to one (and only one) of the moderators
for action. It is probably best for the posting
agent to detect this situation and ask the poster
what action is preferred. The acceptable choices
are to alter the newsgroup list or to mail to a
moderator of the poster's choice; the posting
agent should NOT offer duplicate-posting as an
easy-to-request option (if only because many mod-
erators will reject a submission that has already
been posted to unmoderated newsgroups).
NOTE: An article cross-posted to multiple moder-
ated newsgroups really should have approval from
all the moderators involved. In practice, the
only straightforward way to do this is to send the
article to one of them and have him consult the
others.
A newsgroup SHOULD not appear more than once in the News-
groups header.
Newsgroup names having only one component are reserved for
newsgroups whose propagation is restricted to a single host
(or the administrative equivalent). It is inadvisable to
name a newsgroup "poster" because that word has special
meaning in the Followup-To header (see section 6.1). The
names "control" and "junk" are frequently used for pseudo-
newsgroups internal to relayer implementations, and hence
are also best avoided.
NOTE: Beware of the duplicate-regional-newsgroup-
names problem mentioned above. In particular,
there are many, many hosts with a newsgroup named
"general", and some surprising things show up in
such newsgroups when people cross-post. It is
probably better to use multi-component names,
which are less likely to be duplicated. Fred's
Widget House should use "fwh.general" rather than
just "general" as its in-house general-topics
newsgroup.
It is conventional to reserve newsgroup names beginning with
"to." for test messages sent on an essentially point-to-
point basis (see also the ihave/sendme protocol described in
section 7.2); newsgroup names beginning with "to." SHOULD
not be used for any other purpose. The second (and possibly
later) components of such a name should, together, comprise
the relayer name (see section 5.6) of a relayer. The news-
group exists only at the named relayer and its neighbors.
The neighbors all pass that newsgroup to the named relayer,
while the named relayer does not pass it to anyone.
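The propagation rule for "to." newsgroups can be stated as a small predicate. The hypothetical sketch below shows how a relayer's feed logic might apply it.

```python
def should_pass_to_group(group: str, my_name: str, neighbor: str) -> bool:
    """Decide whether relayer `my_name` passes a "to." group to `neighbor`.

    The rest of the group name is a relayer name (it may contain dots).
    Neighbors pass the group only to the named relayer, which in turn
    passes it to no one.  Relayer names are compared case-sensitively.
    """
    if not group.startswith("to."):
        raise ValueError(f"not a to. newsgroup: {group!r}")
    target = group[len("to."):]
    if my_name == target:
        return False
    return neighbor == target
```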
The order of newsgroup names in the Newsgroups header is not
significant.
5.6. Path
The Path header's content indicates which relayers the arti-
cle has already visited, so that unnecessary redundant
transmission can be avoided:
Path-content = [ path-list path-delimiter ] local-part
path-list = relayer-name *( path-delimiter relayer-name )
relayer-name = 1*rn-char
rn-char = letter / digit / "." / "-" / "_"
path-delimiter = "!"
The Path content is a list of relayer names, separated by
path delimiters, followed (after a final delimiter) by the
local part of a mailing address. Each relayer MUST prepend
its name, and a delimiter, to the Path content in all arti-
cles it processes. A relayer MUST not pass an article to a
neighboring relayer whose name is already mentioned in an
article's path list, unless this is explicitly requested by
the neighbor in some way. The Path content is case-
sensitive.
NOTE: The Path header supplied by a posting agent
should normally contain only the local part. The
relayer that the posting agent passes the article
to for posting will prepend its relayer name to
get the path list started.
NOTE: Observe that the trailing local part is NOT
part of the path list. This Path header:
Path: fee!fie!foe!fum
contains three relayer names: "fee", "fie", and
"foe". A relayer named "fum" is still eligible to
be sent this article.
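The Path rules above can be sketched as follows, using hypothetical function names. Parsing ignores white space after "!", which anticipates the change described in the next NOTE. The eligibility check uses exact, case-sensitive comparison.

```python
from __future__ import annotations


def parse_path(content: str) -> tuple[list[str], str]:
    """Split a Path content into (path list, trailing local part).

    White space after "!" is ignored, anticipating the intended future
    path-delimiter = "!" [ space ].
    """
    *relayers, local_part = (part.strip() for part in content.split("!"))
    return relayers, local_part


def may_send(content: str, neighbor: str) -> bool:
    """True unless `neighbor` already appears in the path list."""
    relayers, _ = parse_path(content)
    return neighbor not in relayers


def add_relayer(content: str, my_name: str) -> str:
    """Prepend this relayer's name and a delimiter, as every relayer must."""
    return f"{my_name}!{content}"
```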
NOTE: This syntax has the disadvantage of contain-
ing no white space, making it impossible to con-
tinue a Path header across several lines. Imple-
mentors of relayers and reading agents are warned
that it is intended that the successor to this
Draft will change the definition of path delimiter
to:
path-delimiter = "!" [ space ]
and are urged to fix their software to handle
(i.e., ignore) white space following the exclama-
tion points. They are urged to hurry; some ill-
behaved systems reportedly already feel free to
add such white space.
NOTE: RFC 1036 allows considerably more flexibil-
ity in choice of delimiter, in theory, but this
flexibility has never been used and most news
software does not implement it properly. The
grammar reflects the current reality. Note, in
particular, that RFC 1036 treats "_" as a delim-
iter, but in fact it is known to appear in relayer
names occasionally.
Because an article will not propagate to a relayer already
mentioned in its path list, the path list MUST not contain
any names other than those of relayers the article has
passed through AS NEWS. This is trivially obvious for nor-
mal news articles, but requires attention from the modera-
tors of moderated newsgroups and the implementors and main-
tainers of gateways.
NOTE: For the same reason, a relayer and its
neighbors need to agree on the choice of relayer
name, and names should not be changed without
notifying neighbors.
Relayer names need to be unique among all relayers which
will ever see the articles using them. A relayer name is
normally either an "official" name for the host the relayer
runs on, or some other "official" name controlled by the
same organization. Except in cooperating subnets that agree
to some other convention, and don't let articles using it
escape beyond the subnet, a relayer name MUST be either a
UUCP name registered in the UUCP maps (without any domain
suffix such as ".UUCP"), or a complete Internet domain name.
Use of a (registered) UUCP name is recommended, where prac-
tical, to keep the length of the path list down.
The use of Internet domain names in the path list presents
one problem: domain names are case-insensitive, but the path
list is case-sensitive. Relayers using domain names as
their relayer names MUST pick a standard form for the name,
and use that form consistently to the exclusion of all oth-
ers. The preferred form for this purpose, which relayers
SHOULD use, is the all-lowercase form.
NOTE: It is arguably unfortunate that the path
list is case-sensitive, but it is much too late to
change this. Most Internet sites do, in any
event, use one standardized form of their name
almost everywhere.
In the ordinary case, where the poster is the author of the
article, the local part following the path list SHOULD be
the local part of the poster's full Internet domain mailing
address.
NOTE: It should be just the local part, not the
full address. The character "@" does not appear
in a Path header.
The Path content somewhat resembles a mailing address, par-
ticularly in the UUCP world with its manual routing and "!"
address syntax. Historically, this resemblance was impor-
tant, and the Path content was often used as a reply
address. This practice has always been somewhat unreliable,
since news paths are not always mail paths and news relayer
names are not always recognized by mail handlers, and its
reliability has generally worsened in recent times. The
widespread use and recognition of Internet domain
addresses, even outside the actual Internet, has largely
eliminated the problem. Readers SHOULD not use the Path
content as a reply address. On the other hand, relayer
administrators are urged not to break this usage without
good reason; where practical, paths followed by news SHOULD
be traversable by mail, and mail handlers SHOULD recognize
relayer names as host names.
It will typically be difficult or impractical for gateways
and moderators to supply a Path content that is useful as a
reply address for the author, bearing in mind that the path
list they supply will normally be empty. (To reiterate: the
path list MUST not contain any names other than those of
relayers the article has passed through AS NEWS.) They
SHOULD supply a local part that will result in replies to a
Path-derived address being returned to the sender with a
brief explanation. Software permitting, the local part
"not-for-mail" is recommended.
NOTE: A moderator or gateway administrator who
supplies a local part that delivers such mail to
an administrative mailbox will quickly discover
why it should be bounced automatically! It is
best, however, for the returned message to include
an explanation of what has probably happened,
rather than just a mysterious "undeliverable mail"
complaint, since the sender may not be aware that
his/her software is unwisely using the Path con-
tent as a reply address. Reply software might
wish to question attempts to reply to a Path-
derived address ending in "not-for-mail" (which is
why a specific name is being recommended here).
6. Optional Headers
Many MAIL headers, and many of those specified in present
and future MAIL extensions, are potentially applicable to
news. Headers specific to MAIL's point-to-point transmis-
sion paradigm, e.g. To and Cc, SHOULD not appear in news
articles. (Gateways wishing to preserve such information
for debugging probably SHOULD hide it under different names;
prefixing "X-" to the original headers, resulting in e.g.
"X-To", is suggested.)
The following optional headers are either specific to news
or of particular note in news articles; an article MAY con-
tain some or all of them. (Note that there are some circum-
stances in which some of them are mandatory; these are
explained under the individual headers.) An article MUST
not contain two or more headers with any one of these header
names.
NOTE: The ban on duplicate header names does not
apply to headers not specified in this Draft at
all, such as "X-" headers. Software should not
assume that all header names in a given article
are unique.
6.1. Followup-To
The Followup-To header contents specify which newsgroup(s)
followups should be posted to:
Followup-To-content = Newsgroups-content / "poster"
The syntax is the same as that of the Newsgroups content,
with the exception that the magic word "poster" means that
followups should be mailed to the article's reply address
rather than posted. In the absence of Followup-To, the
default newsgroup(s) for a followup are those in the News-
groups header.
NOTE: The way to request that followups be mailed
to a specific address other than that in the From
line is to supply "Followup-To: poster" and a
Reply-To header. Putting a mailing address in the
Followup-To line is incorrect; posting agents
should reject or rewrite such headers.
NOTE: There is no syntax for "no followups
allowed" because "Followup-To: poster" accom-
plishes this effect without extra machinery.
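A followup agent might choose the followup destination as in this hypothetical sketch. It returns None when "Followup-To: poster" asks for a mailed reply.

```python
from __future__ import annotations


def followup_groups(newsgroups: str, followup_to: str | None = None) -> list[str] | None:
    """Newsgroups a followup goes to, or None if it should be mailed instead.

    Without Followup-To, followups go to the Newsgroups list; the magic
    word "poster" means reply by mail to the article's reply address.
    """
    if followup_to is None:
        content = newsgroups
    elif followup_to.strip() == "poster":
        return None
    else:
        content = followup_to
    return [name.strip() for name in content.split(",")]
```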
Although it is generally desirable to limit followups to the
smallest reasonable set of newsgroups, especially when the
precursor was cross-posted widely, posting agents SHOULD not
supply a Followup-To header except at the poster's explicit
request.
NOTE: In particular, it is incorrect for the post-
ing agent to assume that followups to a cross-
posted article should be directed to the first
newsgroup only. Trimming the list of newsgroups
should be the poster's decision, not the posting
agent's. However, when an article is to be cross-
posted to a considerable number of newsgroups, a
posting agent might wish to SUGGEST to the poster
that followups go to a shorter list.
6.2. Expires
The Expires header content specifies a date and time when
the article is deemed to be no longer useful and should be
removed ("expired"):
Expires-content = Date-content
The content syntax is the same as that of the Date content.
In the absence of Expires, the default is decided by the
administrators of each host the article reaches, who MAY
also restrict the extent to which the Expires header is hon-
ored.
The Expires header has two main applications: removing arti-
cles whose utility ends on a specific date (e.g., event
announcements which can be removed once the day of the event
is past) and preserving articles expected to be of prolonged
usefulness (e.g., information aimed at new readers of a
newsgroup). The latter application is sometimes abused.
Since individual hosts have local policies for expiration of
news (depending on available disk space, for instance),
posters SHOULD not provide Expires headers for articles
unless there is a natural expiration date associated with
the topic. Posting agents MUST not provide a default
Expires header. Leave it out and allow local policies to be
used unless there is a good reason not to. Expiry dates are
properly the decision of individual host administrators;
posters and moderators SHOULD set only expiry dates that
most administrators would agree with.
NOTE: A poster preparing an Expires header for an
article whose utility ends on a specific day
should typically specify the NEXT day as the
expiry date. A meeting on July 7th remains of
interest on the 7th.
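A posting agent that helps with Expires headers might compute the next-day rule as shown below. The sketch relies on Python's email.utils.format_datetime, which produces an RFC 822-style date; that output is assumed, not verified here, to match the Date-content syntax of section 5.1. The function name is hypothetical.

```python
from datetime import date, datetime, time, timedelta, timezone
from email.utils import format_datetime


def expires_for_event(event_day: date, zone: timezone) -> str:
    """Expires content for an announcement of an event held on `event_day`.

    It names the NEXT day: the article expires at 00:00 on the day after
    the event, in the poster's timezone `zone`.
    """
    next_day = event_day + timedelta(days=1)
    return format_datetime(datetime.combine(next_day, time(0, 0), tzinfo=zone))


# A meeting on 7 July 1994, announced from a -0400 (EDT) site.
print(expires_for_event(date(1994, 7, 7), timezone(timedelta(hours=-4))))
```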