<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML Strict//EN">
<HTML>
<HEAD>
<TITLE>OpenSP - SGML declaration</TITLE>
</HEAD>
<BODY>
<H1>Handling of the SGML declaration in OpenSP</H1>
<H2>Extended Naming Rules</H2>
<P>
OpenSP supports the Extended Naming Rules as specified in Annex J
of ISO 8879:1986 (added by the 1996 technical corrigendum).
<H2>Web SGML Adaptations</H2>
<P>
OpenSP supports most of the Web SGML Adaptations as specified in
Annex K of ISO 8879:1986 (added by the second technical corrigendum, 1998).
<H2>Default SGML declaration</H2>
<P>
If the SGML declaration is omitted
and there is no applicable
<A HREF="catalog.htm#sgmldecl"><SAMP>SGMLDECL</SAMP></A>
or <A HREF="catalog.htm#dtddecl"><SAMP>DTDDECL</SAMP></A>
entry in a catalog,
the following declaration will be implied:
<PRE>
&lt;!SGML "ISO 8879:1986"
CHARSET
BASESET "ISO 646-1983//CHARSET
         International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET 0 9 UNUSED
        9 2 9
        11 2 UNUSED
        13 1 13
        14 18 UNUSED
        32 95 32
        127 1 UNUSED
CAPACITY PUBLIC "ISO 8879:1986//CAPACITY Reference//EN"
SCOPE DOCUMENT
SYNTAX
SHUNCHAR CONTROLS 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
         18 19 20 21 22 23 24 25 26 27 28 29 30 31 127 255
BASESET "ISO 646-1983//CHARSET International Reference Version
         (IRV)//ESC 2/5 4/0"
DESCSET 0 128 0
FUNCTION RE 13
         RS 10
         SPACE 32
         TAB SEPCHAR 9
NAMING LCNMSTRT ""
       UCNMSTRT ""
       LCNMCHAR "-."
       UCNMCHAR "-."
       NAMECASE GENERAL YES
                ENTITY NO
DELIM GENERAL SGMLREF
      SHORTREF SGMLREF
NAMES SGMLREF
QUANTITY SGMLREF
         ATTCNT 99999999
         ATTSPLEN 99999999
         DTEMPLEN 24000
         ENTLVL 99999999
         GRPCNT 99999999
         GRPGTCNT 99999999
         GRPLVL 99999999
         LITLEN 24000
         NAMELEN 99999999
         PILEN 24000
         TAGLEN 99999999
         TAGLVL 99999999
FEATURES
MINIMIZE DATATAG NO
         OMITTAG YES
         RANK YES
         SHORTTAG YES
LINK SIMPLE YES 1000
     IMPLICIT YES
     EXPLICIT YES 1
OTHER CONCUR NO
      SUBDOC YES 99999999
      FORMAL YES
APPINFO NONE>
</PRE>
<P>
with the exception that all characters that are neither significant
nor shunned will be assigned to DATACHAR.
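<P>
As an illustration, a catalog can supply an explicit declaration so that
the implied one is not used.
The following is a minimal sketch: the file names
<SAMP>custom.dcl</SAMP> and <SAMP>example.dcl</SAMP>
and the DTD public identifier are hypothetical,
and the catalog documentation linked above remains the reference
for the exact semantics of each entry.
<PRE>
  -- implied for a document that has no SGML declaration of its own --
SGMLDECL "custom.dcl"

  -- implied only when the document's DTD has this public identifier --
DTDDECL "-//Example//DTD Sample//EN" "example.dcl"
</PRE>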
<H2><A NAME="charset">Character sets</A></H2>
<P>
A character in a base character set is described either by giving its
number in a <i>universal</i> character set, or by specifying a minimum
literal.
The first 65536 character numbers in the <i>universal</i> character
set are assumed to be the same as in Unicode 2.0 (ISO/IEC 10646).
The remaining character numbers can be assigned in any convenient way.
<P>
The public identifier of a base character set can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment
of an SGML declaration
consisting of the
portion of a character set description
that follows the DESCSET keyword;
that is, it must be a sequence of character descriptions,
where each character description specifies a described character
number, the number of characters, and
either a character number in the universal character set, a minimum literal,
or the keyword
<SAMP>UNUSED</SAMP>.
Character numbers in the universal character set can be as large as
99999999.
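<P>
As a sketch of what such an entity might look like, consider an 8-bit set
whose characters 0 to 127 and 160 to 255 have the same numbers in the
universal character set and whose characters 128 to 159 are unused.
The public identifier and the file name <SAMP>sample.chr</SAMP>
are hypothetical, and it is an assumption of this example that the
catalog entry is the one consulted for this identifier.
The catalog entry would be:
<PRE>
  -- hypothetical public identifier and file name --
PUBLIC "-//Example//CHARSET Sample 8-bit Set//ESC 2/5 4/0" "sample.chr"
</PRE>
<P>
and the entity <SAMP>sample.chr</SAMP> would contain only the character
descriptions, without the <SAMP>DESCSET</SAMP> keyword:
<PRE>
0   128 0
128 32  UNUSED
160 96  160
</PRE>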
<P>
In addition, OpenSP has built-in knowledge of many character sets.
These are identified using the designating sequence in the
public identifier. The following designating sequences are
recognized:
<DL>
<DT>
<SAMP>ESC 2/5 4/0</SAMP>
<DD>
The full set of ISO 646 IRV.
This is not a registered character set,
but is recommended by ISO 8879 (clause 10.2.2.4).
<DT>
<SAMP>ESC 2/8 4/0</SAMP>
<DD>
G0 set of ISO 646 IRV,
ISO Registration Number 2.
<DT>
<SAMP>ESC 2/8 4/2</SAMP>
<DD>
G0 set of ASCII,
ISO Registration Number 6.
<DT>
<SAMP>ESC 2/1 4/0</SAMP>
<DD>
C0 set of ISO 646,
ISO Registration Number 1.
<DT>
<SAMP>ESC 2/13 4/1</SAMP>
<DD>
G1 set of ISO 8859-1
<DT>
<SAMP>ESC 2/13 4/2</SAMP>
<DD>
G1 set of ISO 8859-2
<DT>
<SAMP>ESC 2/13 4/3</SAMP>
<DD>
G1 set of ISO 8859-3
<DT>
<SAMP>ESC 2/13 4/4</SAMP>
<DD>
G1 set of ISO 8859-4
<DT>
<SAMP>ESC 2/13 4/12</SAMP>
<DD>
G1 set of ISO 8859-5
<DT>
<SAMP>ESC 2/13 4/7</SAMP>
<DD>
G1 set of ISO 8859-6
<DT>
<SAMP>ESC 2/13 4/6</SAMP>
<DD>
G1 set of ISO 8859-7
<DT>
<SAMP>ESC 2/13 4/8</SAMP>
<DD>
G1 set of ISO 8859-8
<DT>
<SAMP>ESC 2/13 4/13</SAMP>
<DD>
G1 set of ISO 8859-9
<DT>
<SAMP>ESC 2/8 4/10</SAMP>
<DD>
Roman set from JIS X 0201.
JIS version of ISO 646.
ISO Registration Number 14.
<DT>
<SAMP>ESC 2/8 4/9</SAMP>
<DD>
Katakana set from JIS X 0201.
ISO Registration Number 13.
<DT>
<SAMP>ESC 2/4 4/2</SAMP>
<DT>
<SAMP>ESC 2/6 4/0 ESC 2/4 4/2</SAMP>
<DD>
JIS X 0208-1990.
ISO Registration Numbers 87 and 168.
<DT>
<SAMP>ESC 2/4 2/8 4/4</SAMP>
<DD>
JIS X 0212-1990.
ISO Registration Number 159.
<DT>
<SAMP>ESC 2/4 4/1</SAMP>
<DD>
GB 2312-80.
ISO Registration Number 58.
<DT>
<SAMP>ESC 2/4 2/8 4/3</SAMP>
<DD>
KS C 5601-1992.
ISO Registration Number 149.
<DT>
<SAMP>ESC 2/5 2/15 4/0</SAMP>
<DT>
<SAMP>ESC 2/5 2/15 4/3</SAMP>
<DT>
<SAMP>ESC 2/5 2/15 4/5</SAMP>
<DD>
ISO/IEC 10646 UCS-2
<DT>
<SAMP>ESC 2/5 2/15 4/1</SAMP>
<DT>
<SAMP>ESC 2/5 2/15 4/4</SAMP>
<DT>
<SAMP>ESC 2/5 2/15 4/6</SAMP>
<DD>
ISO/IEC 10646 UCS-4
</DL>
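<P>
For instance, a <SAMP>CHARSET</SAMP> section along the following lines,
modelled on declarations commonly used for ISO 8859-1 documents,
relies on two of the designating sequences listed above
(<SAMP>ESC 2/5 4/0</SAMP> and <SAMP>ESC 2/13 4/1</SAMP>)
and needs no catalog entry for either base set.
It is an illustrative fragment, not a declaration shipped with OpenSP.
<PRE>
CHARSET
BASESET "ISO 646-1983//CHARSET
         International Reference Version (IRV)//ESC 2/5 4/0"
DESCSET 0   128 0
BASESET "ISO Registration Number 100//CHARSET
         ECMA-94 Right Part of Latin Alphabet Nr. 1//ESC 2/13 4/1"
DESCSET 128 32  UNUSED
        160 96  32
</PRE>
<P>
Here document characters 160 to 255 are taken from positions 32 to 127
of the G1 set, and document characters 128 to 159 are left unused.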
<H2>Concrete syntaxes</H2>
<P>
The public identifier for a public concrete syntax can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment of an SGML declaration
consisting of a concrete syntax description
starting with the
<SAMP>SHUNCHAR</SAMP>
keyword
as in an SGML declaration.
The entity can also make use of the following extensions:
<UL>
<LI>
The Extended Naming Rules extensions can be used regardless of the minimum
literal used in the SGML declaration.
<LI>
An
<I>added function</I>
can be expressed as a parameter literal
instead of a name.
<LI>
The replacement for a reference reserved name
can be expressed as a parameter literal instead of a name.
<LI>
The total number of characters specified for
<SAMP>UCNMCHAR</SAMP>
or
<SAMP>UCNMSTRT</SAMP>
may exceed the total number of characters specified for
<SAMP>LCNMCHAR</SAMP>
or
<SAMP>LCNMSTRT</SAMP>
respectively.
Each character in
<SAMP>UCNMCHAR</SAMP>
or
<SAMP>UCNMSTRT</SAMP>
that does not have a corresponding character in the same position in
<SAMP>LCNMCHAR</SAMP>
or
<SAMP>LCNMSTRT</SAMP>
is simply assigned to <SAMP>UCNMCHAR</SAMP> or <SAMP>UCNMSTRT</SAMP>
without making it the upper-case form of any character.
<LI>
Within the specification of the short reference delimiters,
a parameter literal containing exactly one character
may be followed by the delimiter <SAMP>-</SAMP>
and another parameter literal containing exactly one character.
This has the same meaning as a sequence of parameter literals,
one for each character number that is greater than or equal
to the number of the character in the first parameter literal
and less than or equal to the number of the character in the
second parameter literal.
<LI>
A number may be used as a delimiter in the
<SAMP>DELIM</SAMP>
section with the same meaning as a parameter literal
containing just a numeric character reference with that number.
</UL>
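<P>
The last two extensions can be sketched with a fragment of the
<SAMP>DELIM</SAMP> section of such an entity.
This is an illustration only, not a complete concrete syntax,
and it assumes that the characters chosen are otherwise acceptable
as delimiters in the syntax being defined.
Using the extensions, the section could be written as:
<PRE>
DELIM GENERAL SGMLREF
      PIC 62
      SHORTREF NONE "[" - "]"
</PRE>
<P>
which, following the rules above, has the same meaning as the
form without extensions
(character numbers 91, 92 and 93 are <SAMP>[</SAMP>, <SAMP>\</SAMP>
and <SAMP>]</SAMP>):
<PRE>
DELIM GENERAL SGMLREF
      PIC "&amp;#62;"
      SHORTREF NONE "[" "\" "]"
</PRE>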
<H2>Capacity sets</H2>
<P>
The public identifier for a public capacity set can be associated
with an entity that describes it by using a
<SAMP>PUBLIC</SAMP>
entry in the catalog entry file.
The entity must be a fragment of an SGML declaration
consisting of a sequence of capacity names and numbers.
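<P>
As a minimal sketch, with a hypothetical public identifier and a
hypothetical file name <SAMP>sample.cap</SAMP>,
the catalog entry might be:
<PRE>
  -- hypothetical public identifier and file name --
PUBLIC "-//Example//CAPACITY Sample Limits//EN" "sample.cap"
</PRE>
<P>
and the entity <SAMP>sample.cap</SAMP> would then hold only
capacity names and numbers, for example:
<PRE>
TOTALCAP 150000
GRPCAP   150000
ENTCAP   150000
</PRE>
<P>
An SGML declaration would refer to this set with
<SAMP>CAPACITY PUBLIC "-//Example//CAPACITY Sample Limits//EN"</SAMP>,
in the same way that the default declaration refers to the reference capacity set.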
</BODY>
</HTML>