KOI8-R
KOI8-R is an 8-bit character encoding designed to cover Russian, which is written in a Cyrillic alphabet. It also covers Bulgarian, but has not been used for that language since CP1251 was adopted. A derivative encoding, KOI8-U, adds Ukrainian characters. The original KOI-8 encoding was designed by Soviet authorities in 1974. KOI8 remains much more widely used than ISO 8859-5, which never gained wide adoption. Another common Cyrillic character encoding is Windows-1251 (CP1251). These older code pages are gradually being replaced by Unicode, which can represent Cyrillic together with other languages.
Microsoft Windows assigns KOI8-R the code page number 20866; IBM assigns it code page 878.[1]
KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit".
The KOI8 character sets have the property that the Russian Cyrillic letters are in pseudo-Roman order rather than the normal Cyrillic alphabetical order as in ISO 8859-5 or Unicode. Although this may seem unnatural, it has the useful property that if the 8th bit is stripped, the text is partially readable in ASCII and may convert to syntactically correct KOI7. For instance, "Русский Текст" in KOI8-R becomes rUSSKIJ tEKST ("Russian Text") if the 8th bit is stripped; attempting to interpret the ASCII string rUSSKIJ tEKST as KOI7 yields "РУССКИЙ ТЕКСТ". KOI8 was based on Russian Morse code, which was created from Latin Morse code based on sound similarities, and which has the same connection to the Latin Morse codes for A-Z as KOI8 has with ASCII.
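The bit-stripping behaviour described above can be reproduced in a few lines. The following is a minimal sketch, assuming Python's built-in `koi8_r` codec; the helper name `strip_high_bit` is hypothetical and used only for illustration.

```python
def strip_high_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the 8th bit of every byte,
    and read the remaining 7-bit values as ASCII."""
    raw = text.encode("koi8_r")
    # 0x7F keeps the low seven bits and drops the high (8th) bit.
    stripped = bytes(b & 0x7F for b in raw)
    return stripped.decode("ascii")


print(strip_high_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The case comes out inverted because lowercase Cyrillic occupies the C_ and D_ rows, which land on uppercase Latin once the high bit is cleared, while uppercase Cyrillic in the E_ and F_ rows lands on lowercase Latin. The result is only partially readable: ё, for example, is byte 163 and becomes `#` rather than a Latin letter.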
Codepage layout
The following table shows the KOI8-R encoding.[1] Each character is shown with its equivalent Unicode code point (hexadecimal) and its byte value in decimal.
| | _0 | _1 | _2 | _3 | _4 | _5 | _6 | _7 | _8 | _9 | _A | _B | _C | _D | _E | _F |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0_ | | | | | | | | | | | | | | | | |
| 1_ | | | | | | | | | | | | | | | | |
| 2_ | SP 0020 32 | ! 0021 33 | " 0022 34 | # 0023 35 | $ 0024 36 | % 0025 37 | & 0026 38 | ' 0027 39 | ( 0028 40 | ) 0029 41 | * 002A 42 | + 002B 43 | , 002C 44 | - 002D 45 | . 002E 46 | / 002F 47 |
| 3_ | 0 0030 48 | 1 0031 49 | 2 0032 50 | 3 0033 51 | 4 0034 52 | 5 0035 53 | 6 0036 54 | 7 0037 55 | 8 0038 56 | 9 0039 57 | : 003A 58 | ; 003B 59 | < 003C 60 | = 003D 61 | > 003E 62 | ? 003F 63 |
| 4_ | @ 0040 64 | A 0041 65 | B 0042 66 | C 0043 67 | D 0044 68 | E 0045 69 | F 0046 70 | G 0047 71 | H 0048 72 | I 0049 73 | J 004A 74 | K 004B 75 | L 004C 76 | M 004D 77 | N 004E 78 | O 004F 79 |
| 5_ | P 0050 80 | Q 0051 81 | R 0052 82 | S 0053 83 | T 0054 84 | U 0055 85 | V 0056 86 | W 0057 87 | X 0058 88 | Y 0059 89 | Z 005A 90 | [ 005B 91 | \ 005C 92 | ] 005D 93 | ^ 005E 94 | _ 005F 95 |
| 6_ | ` 0060 96 | a 0061 97 | b 0062 98 | c 0063 99 | d 0064 100 | e 0065 101 | f 0066 102 | g 0067 103 | h 0068 104 | i 0069 105 | j 006A 106 | k 006B 107 | l 006C 108 | m 006D 109 | n 006E 110 | o 006F 111 |
| 7_ | p 0070 112 | q 0071 113 | r 0072 114 | s 0073 115 | t 0074 116 | u 0075 117 | v 0076 118 | w 0077 119 | x 0078 120 | y 0079 121 | z 007A 122 | { 007B 123 | \| 007C 124 | } 007D 125 | ~ 007E 126 | |
| 8_ | ─ 2500 128 | │ 2502 129 | ┌ 250C 130 | ┐ 2510 131 | └ 2514 132 | ┘ 2518 133 | ├ 251C 134 | ┤ 2524 135 | ┬ 252C 136 | ┴ 2534 137 | ┼ 253C 138 | ▀ 2580 139 | ▄ 2584 140 | █ 2588 141 | ▌ 258C 142 | ▐ 2590 143 |
| 9_ | ░ 2591 144 | ▒ 2592 145 | ▓ 2593 146 | ⌠ 2320 147 | ■ 25A0 148 | ∙ 2219 149 | √ 221A 150 | ≈ 2248 151 | ≤ 2264 152 | ≥ 2265 153 | NBSP 00A0 154 | ⌡ 2321 155 | ° 00B0 156 | ² 00B2 157 | · 00B7 158 | ÷ 00F7 159 |
| A_ | ═ 2550 160 | ║ 2551 161 | ╒ 2552 162 | ё 0451 163 | ╓ 2553 164 | ╔ 2554 165 | ╕ 2555 166 | ╖ 2556 167 | ╗ 2557 168 | ╘ 2558 169 | ╙ 2559 170 | ╚ 255A 171 | ╛ 255B 172 | ╜ 255C 173 | ╝ 255D 174 | ╞ 255E 175 |
| B_ | ╟ 255F 176 | ╠ 2560 177 | ╡ 2561 178 | Ё 0401 179 | ╢ 2562 180 | ╣ 2563 181 | ╤ 2564 182 | ╥ 2565 183 | ╦ 2566 184 | ╧ 2567 185 | ╨ 2568 186 | ╩ 2569 187 | ╪ 256A 188 | ╫ 256B 189 | ╬ 256C 190 | © 00A9 191 |
| C_ | ю 044E 192 | а 0430 193 | б 0431 194 | ц 0446 195 | д 0434 196 | е 0435 197 | ф 0444 198 | г 0433 199 | х 0445 200 | и 0438 201 | й 0439 202 | к 043A 203 | л 043B 204 | м 043C 205 | н 043D 206 | о 043E 207 |
| D_ | п 043F 208 | я 044F 209 | р 0440 210 | с 0441 211 | т 0442 212 | у 0443 213 | ж 0436 214 | в 0432 215 | ь 044C 216 | ы 044B 217 | з 0437 218 | ш 0448 219 | э 044D 220 | щ 0449 221 | ч 0447 222 | ъ 044A 223 |
| E_ | Ю 042E 224 | А 0410 225 | Б 0411 226 | Ц 0426 227 | Д 0414 228 | Е 0415 229 | Ф 0424 230 | Г 0413 231 | Х 0425 232 | И 0418 233 | Й 0419 234 | К 041A 235 | Л 041B 236 | М 041C 237 | Н 041D 238 | О 041E 239 |
| F_ | П 041F 240 | Я 042F 241 | Р 0420 242 | С 0421 243 | Т 0422 244 | У 0423 245 | Ж 0416 246 | В 0412 247 | Ь 042C 248 | Ы 042B 249 | З 0417 250 | Ш 0428 251 | Э 042D 252 | Щ 0429 253 | Ч 0427 254 | Ъ 042A 255 |
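Individual cells of the table can be checked programmatically. The sketch below assumes Python's built-in `koi8_r` codec agrees with the layout shown here; the function name `describe` is hypothetical. It prints a byte's character, Unicode code point in hexadecimal, and decimal byte value in the same order as a table cell.

```python
def describe(byte_value: int) -> str:
    """Return 'char XXXX decimal' for one KOI8-R byte, mirroring a table cell."""
    # Decode the single byte to its Unicode character.
    char = bytes([byte_value]).decode("koi8_r")
    return f"{char} {ord(char):04X} {byte_value}"


for b in (163, 192, 225, 255):
    print(describe(b))
# ё 0451 163
# ю 044E 192
# А 0410 225
# Ъ 042A 255
```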
References
- [1] "CPGID 00878". Code page identifiers. IBM. Retrieved 2012-10-25.
External links
- RFC 1489
- All about KOI8-R
- Universal Cyrillic decoder, an online program that may help recovering Cyrillic texts with broken KOI8-R or other character encodings.
- A brief history of Cyrillic encodings
- IBM CDRA