@@ -11,12 +11,11 @@ on EBCDIC based computers.
 
 Portions of this document that are still incomplete are marked with XXX.
 
-Early Perl versions worked on some EBCDIC machines, but the last known
-version that ran on EBCDIC was v5.8.7, until v5.22, when the Perl core
-again works on z/OS. Theoretically, it could work on OS/400 or Siemens'
-BS2000 (or their successors), but this is untested. In v5.22 and 5.24,
-not all
-the modules found on CPAN but shipped with core Perl work on z/OS.
+Early Perl versions worked on some EBCDIC machines, but versions after
+v5.8.7 and before v5.22 likely didn't. Theoretically, Perl could work
+on OS/400 or Siemens' BS2000 (or their successors), but this is
+untested. In v5.22 and v5.24, not all the modules found on CPAN that
+also ship with core Perl work on z/OS.
 
 If you want to use Perl on a non-z/OS EBCDIC machine, please let us know
 at L<https://github.com/Perl/perl5/issues>.
@@ -35,7 +34,7 @@ If your code just uses the 52 letters A-Z and a-z, plus SPACE, the
 digits 0-9, and the punctuation characters that Perl uses, plus a few
 controls that are denoted by escape sequences like C<\n> and C<\t>, then
 there's nothing special about using Perl, and your code may very well
-work on an ASCII machine without change.
+work on an EBCDIC machine without change.
 
 But if you write code that uses C<\005> to mean a TAB or C<\xC1> to mean
 an "A", or C<\xDF> to mean a "E<yuml>" (small C<"y"> with a diaeresis),
@@ -95,7 +94,7 @@ Most are for European languages, but there are also ones for Arabic,
 Greek, Hebrew, and Thai. There are good references on the web about
 all these.
 
-=head2 Latin 1 (ISO 8859-1)
+=head3 Latin 1 (ISO 8859-1)
 
 A particular 8-bit extension to ASCII that includes grave and acute
 accented Latin characters. Languages that can employ ISO 8859-1
@@ -109,6 +108,19 @@ to ASCII and is commonly encountered in World Wide Web work.
 In IBM character code set identification terminology, ISO 8859-1 is
 also known as CCSID 819 (or sometimes 0819 or even 00819).
 
+Unicode uses ASCII plus Latin 1 as its base, adding many more
+characters.
+
+=head3 Other ISO 8859 encodings
+
+Every one of these encodings includes every character in ASCII
+(encoded identically); the differences are in the additional
+characters added, which are tailored for the language(s) the encoding
+is designed to support.
+
+To access these, Perl's locale system must be used. See
+L<perllocale>.
+
 =head2 EBCDIC
 
 The Extended Binary Coded Decimal Interchange Code refers to a
@@ -127,7 +139,8 @@ Some IBM EBCDIC character sets may be known by character code set
 identification numbers (CCSID numbers) or code page numbers.
 
 Perl can be compiled on platforms that run any of three commonly used EBCDIC
-character sets, listed below.
+character sets, listed below. (It should be easy to add others,
+apart from the inevitable glitches that may crop up.)
 
 =head3 The 13 variant characters
 
@@ -146,6 +159,18 @@ mistakenly and silently choose one of the three.
 The Line Feed (LF) character is actually a 14th variant character, and
 Perl checks for that as well.
 
+These variant characters are the main reason that EBCDIC can't be
+handled by Perl's L<locale system|perllocale>. All of them are used
+throughout Perl programs. When you type one of them at your keyboard,
+its meaning must be what you expect, and that expectation could easily
+be violated if a different code page were in use. Therefore the Perl
+interpreter must be compiled for a particular code page.
+
+(The implementation is mostly table driven. To add a new code page,
+add a new table to F<regen/charset_translations.pl> that translates
+from ASCII to the new page, regenerate, and then deal with any
+glitches.)
+
 =head3 EBCDIC code sets recognized by Perl
 
 =over
@@ -157,6 +182,9 @@ characters (i.e. ISO 8859-1) to an EBCDIC set. 0037 is used
 in North American English locales on the OS/400 operating system
 that runs on AS/400 computers. CCSID 0037 differs from ISO 8859-1
 in 236 places; in other words they agree on only 20 code point values.
+All but one of those are control characters. The only printable
+character with the same ordinal in this code page (and in the others
+below) as in ISO 8859-1 is the PILCROW SIGN, C<E<182>>.
 
 =item B<1047>
 
@@ -168,10 +196,11 @@ and from ISO 8859-1 in 236.
 
 =item B<POSIX-BC>
 
-The EBCDIC code page in use on Siemens' BS2000 system is distinct from
-1047 and 0037. It is identified below as the POSIX-BC set.
-Like 0037 and 1047, it is the same as ISO 8859-1 in 20 code point
-values.
+This code page is no longer generated (although it would be easy to
+re-enable it). The Siemens BS2000 systems that used it have been
+discontinued. It is distinct from 1047 and 0037, and is identified
+below as the POSIX-BC set. Like 0037 and 1047, it is the same as ISO
+8859-1 in 20 code point values.
 
 =back
 
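The 0037 claims in this change (20 code points shared with ISO 8859-1, only one of them printable) can be spot-checked with a short script. This is a minimal sketch that assumes an ASCII-based perl and that the `cp37` table Encode provides through Encode::EBCDIC matches IBM's CCSID 0037. Decoding a byte as `cp37` yields its Unicode code point; because ISO 8859-1 byte values equal their Unicode code points, a byte "agrees" when the two numbers are equal.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumes an ASCII-based perl, so chr($n) for $n < 256 is the
# ISO 8859-1 character whose Unicode code point is $n.
my $encoding = 'cp37';    # Encode::EBCDIC's name for CCSID 0037

# Collect every byte whose CCSID 0037 meaning has the same code point
# as its ISO 8859-1 meaning.
my @same;
for my $byte (0 .. 255) {
    my $code_point = ord(decode($encoding, chr($byte)));
    push @same, $byte if $code_point == $byte;
}

# \p{...} always uses Unicode rules, so this keeps only the
# agreeing code points that are printable characters.
my @printable = grep { chr($_) =~ /\p{Print}/ } @same;

printf "%d code points agree with ISO 8859-1\n", scalar @same;
printf "printable: %s\n", join ' ', map { sprintf 'U+%04X', $_ } @printable;
```

If Encode's table matches the text, this reports 20 agreeing code points with U+00B6 as the only printable one. Setting `$encoding` to `cp1047` or `posix-bc`, which Encode::EBCDIC also provides, would check the corresponding claims for the other code pages.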