Commit c7edb9f

Update perlebcdic.pod
1 parent f1a8d7d commit c7edb9f

1 file changed

+42
-13
lines changed

pod/perlebcdic.pod

Lines changed: 42 additions & 13 deletions
@@ -11,12 +11,11 @@ on EBCDIC based computers.
1111

1212
Portions of this document that are still incomplete are marked with XXX.
1313

14-
Early Perl versions worked on some EBCDIC machines, but the last known
15-
version that ran on EBCDIC was v5.8.7, until v5.22, when the Perl core
16-
again works on z/OS. Theoretically, it could work on OS/400 or Siemens'
17-
BS2000 (or their successors), but this is untested. In v5.22 and 5.24,
18-
not all
19-
the modules found on CPAN but shipped with core Perl work on z/OS.
14+
Early Perl versions worked on some EBCDIC machines, but after v5.8.7,
15+
until v5.22, it likely didn't. Theoretically, it could work on OS/400
16+
or Siemens' BS2000 (or their successors), but this is untested. In
17+
v5.22 and 5.24, not all the modules found on CPAN but shipped with core
18+
Perl work on z/OS.
2019

2120
If you want to use Perl on a non-z/OS EBCDIC machine, please let us know
2221
at L<https://github.com/Perl/perl5/issues>.
@@ -35,7 +34,7 @@ If your code just uses the 52 letters A-Z and a-z, plus SPACE, the
3534
digits 0-9, and the punctuation characters that Perl uses, plus a few
3635
controls that are denoted by escape sequences like C<\n> and C<\t>, then
3736
there's nothing special about using Perl, and your code may very well
38-
work on an ASCII machine without change.
37+
work on an EBCDIC machine without change.
3938

4039
But if you write code that uses C<\005> to mean a TAB or C<\xC1> to mean
4140
an "A", or C<\xDF> to mean a "E<yuml>" (small C<"y"> with a diaeresis),
@@ -95,7 +94,7 @@ Most are for European languages, but there are also ones for Arabic,
9594
Greek, Hebrew, and Thai. There are good references on the web about
9695
all these.
9796

98-
=head2 Latin 1 (ISO 8859-1)
97+
=head3 Latin 1 (ISO 8859-1)
9998

10099
A particular 8-bit extension to ASCII that includes grave and acute
101100
accented Latin characters. Languages that can employ ISO 8859-1
@@ -109,6 +108,19 @@ to ASCII and is commonly encountered in World Wide Web work.
109108
In IBM character code set identification terminology, ISO 8859-1 is
110109
also known as CCSID 819 (or sometimes 0819 or even 00819).
111110

111+
Unicode uses ASCII plus Latin 1 as its base, adding many, many more
112+
characters.
113+
114+
=head3 Other ISO 8859 encodings
115+
116+
Every one of these encodings includes every character in ASCII (encoded
117+
identically); the differences are in the additional characters added,
118+
which are tailored for the language(s) the encoding is designed to
119+
support.
120+
121+
To access these, the locale system of Perl must be used. See
122+
L<perllocale>.
123+
112124
=head2 EBCDIC
113125

114126
The Extended Binary Coded Decimal Interchange Code refers to a
@@ -127,7 +139,8 @@ Some IBM EBCDIC character sets may be known by character code set
127139
identification numbers (CCSID numbers) or code page numbers.
128140

129141
Perl can be compiled on platforms that run any of three commonly used EBCDIC
130-
character sets, listed below.
142+
character sets, listed below. (And it should be easy to add additional
143+
ones, except for the inevitable glitches that could crop up.)
131144

132145
=head3 The 13 variant characters
133146

@@ -146,6 +159,18 @@ mistakenly and silently choose one of the three.
146159
The Line Feed (LF) character is actually a 14th variant character, and
147160
Perl checks for that as well.
148161

162+
These variant characters are the main reason that EBCDIC can't be
163+
handled by Perl's L<locale system|perllocale>. All the characters are
164+
used all over the place in Perl programs. When you type one of them in
165+
at your keyboard, its meaning must be what you expect it to be; which
166+
could easily be violated if another code page is in use. Therefore the
167+
Perl interpreter must be compiled for a particular code page.
168+
169+
(The implementation is mostly table driven. If a new code page needed
170+
to be added, simply add a new table to F<regen/charset_translations.pl>
171+
that translates from ASCII to the new page, and then regenerate. And
172+
then go deal with any glitches.)
173+
149174
=head3 EBCDIC code sets recognized by Perl
150175

151176
=over
@@ -157,6 +182,9 @@ characters (i.e. ISO 8859-1) to an EBCDIC set. 0037 is used
157182
in North American English locales on the OS/400 operating system
158183
that runs on AS/400 computers. CCSID 0037 differs from ISO 8859-1
159184
in 236 places; in other words they agree on only 20 code point values.
185+
All but one of those is a control character. The only printable
186+
character that has the same ordinal number in this code page (and the
187+
others below) as ISO 8859-1 is the PILCROW SIGN, C<E<182>>.
160188

161189
=item B<1047>
162190

@@ -168,10 +196,11 @@ and from ISO 8859-1 in 236.
168196

169197
=item B<POSIX-BC>
170198

171-
The EBCDIC code page in use on Siemens' BS2000 system is distinct from
172-
1047 and 0037. It is identified below as the POSIX-BC set.
173-
Like 0037 and 1047, it is the same as ISO 8859-1 in 20 code point
174-
values.
199+
This code page is no longer generated (although it would be easy to
200+
re-enable it). The Siemens' BS2000 systems which used it have been
201+
discontinued. It is distinct from 1047 and 0037, and is identified
202+
below as the POSIX-BC set. Like 0037 and 1047, it is the same as ISO
203+
8859-1 in 20 code point values.
175204

176205
=back
177206
