Processing Byte Strings and Character Strings

Byte Strings and Character Strings

Byte strings are the content of byte-type data objects. A byte-type data object always has a byte-type data type x or xstring.

Character strings are the content of character-type data objects. A character-type data object either has one of the character-type data types c, d, n, t, or string, or is a structure with purely character-type components. Before Release 6.10, any flat structure or byte-type data object could also be treated as character-type.

Since Release 6.10, flat structures and byte-type data objects can be treated as character strings only outside Unicode programs.
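
The distinction between the two kinds of data object can be sketched with a few declarations. This is only an illustrative fragment; the report name and all variable names are made up for the example and do not come from the original documentation.

```abap
REPORT zdemo_string_types.  " hypothetical report name

* Byte-type data objects: types x and xstring
DATA: hex_fixed TYPE x LENGTH 4 VALUE 'CAFE0001',
      hex_var   TYPE xstring.

* Character-type data objects: types c, n, d, t, string
DATA: text_fixed TYPE c LENGTH 10 VALUE 'ABAP',
      digits     TYPE n LENGTH 4  VALUE '0042',
      some_date  TYPE d,
      some_time  TYPE t,
      text_var   TYPE string.

* A structure with purely character-type components
* is itself character-type
DATA: BEGIN OF address,
        city TYPE c LENGTH 20,
        zip  TYPE n LENGTH 5,
      END OF address.

hex_var      = hex_fixed.    " byte string content
text_var     = text_fixed.   " character string content
address-city = 'Walldorf'.
address-zip  = '69190'.
```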

Instructions for Byte and Character String Processing

The following table lists the keywords for byte and character string processing and states which process is supported.

Keyword	Byte string processing	Character string processing
CLEAR ... WITH	X	X
CONCATENATE	X	X
CONDENSE	_	X
CONVERT TEXT	_	X
FIND	X	X
OVERLAY	_	X
REPLACE	X	X
SEARCH	X	X
SHIFT	X	X
SPLIT	X	X
TRANSLATE	_	X

Since Release 6.10, byte string processing and character string processing are clearly distinguished. Keywords that support both kinds of processing have an optional addition:

... IN { BYTE | CHARACTER } MODE ...

This addition defines which process is carried out. If the addition is not specified, character string processing is carried out.

Before Release 6.10, this addition cannot be specified, and the system always carries out character string processing. Flat structures and byte strings are treated as character strings (implicit casting). All statements for which explicit byte string processing is possible since Release 6.10 also return the correct result when accessing byte strings before Release 6.10, because one character is exactly one byte. As of Release 6.10, character string processing of byte strings is only possible outside Unicode programs.
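
As a rough illustration of the addition, the following fragment uses CONCATENATE and FIND once in character mode and once in byte mode. It assumes a release in which the addition and MATCH OFFSET are available; all variable names are invented for the example.

```abap
DATA: result TYPE string,
      hex1   TYPE xstring,
      hex2   TYPE xstring,
      hexres TYPE xstring,
      off    TYPE i.

* Character string processing is the default ...
CONCATENATE 'byte' 'string' INTO result SEPARATED BY space.

* ... and can also be requested explicitly
CONCATENATE 'byte' 'string' INTO result
            IN CHARACTER MODE SEPARATED BY space.

* Byte string processing must always be requested explicitly
hex1 = 'CAFE'.   " two bytes: CA FE
hex2 = '0001'.   " two bytes: 00 01
CONCATENATE hex1 hex2 INTO hexres IN BYTE MODE.

* Search byte by byte; off receives the byte offset of the match
FIND hex2 IN hexres IN BYTE MODE MATCH OFFSET off.
```

After the last statement, hexres should contain the four bytes CAFE0001 and off the value 2, since the search pattern starts at the third byte.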

Operands in Byte and Character String Processing

Operands in Byte String Processing

With byte string processing, that is, when the IN BYTE MODE addition is used, the relevant operands must be byte-type. The system accesses the operands byte by byte. This condition applies within and outside classes and in all programs (Unicode and non-Unicode programs).
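
The operand rule for byte mode can be sketched as follows. The variable names are hypothetical, and the commented-out statement is shown only to indicate what the syntax check is expected to reject.

```abap
DATA: hex TYPE xstring,
      txt TYPE string.

hex = '0A0B0C'.   " three bytes
txt = `ABC`.

* Allowed: the operand is byte-type, access is byte by byte
SHIFT hex BY 1 PLACES IN BYTE MODE.

* Not allowed: txt is character-type, not byte-type
* SHIFT txt BY 1 PLACES IN BYTE MODE.
```

Because hex has the variable-length type xstring, shifting it left by one place is expected to drop the first byte and leave 0B0C.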

Operands in Character String Processing

With character string processing, that is, when the IN CHARACTER MODE addition or no addition is used, the relevant operands must be character-type. The system accesses the operands character by character, depending on the code page used.

This condition is essential for character string processing to function properly, but is checked in different ways:

Before Release 6.10, the condition is checked strictly only within classes. Outside classes, a violation sometimes causes only a syntax check warning.

As of Release 6.10, the condition is strictly checked within classes and in Unicode programs.

Note that in Unicode programs, the term character-type has a more specific meaning than in non-Unicode programs:

In Unicode programs only data objects of the character-type data types c, d, n, t, string or structures with purely character-type components can be character-type.

In non-Unicode programs any flat structures and byte-type data objects can also be character-type. Operands of non-character-type data types are treated like character-type data objects irrespective of their type.

In non-Unicode programs, which include in particular all programs before Release 6.10, the last point allows character string processing of byte strings with the same results as byte string processing as of Release 6.10, provided the statement is appropriate.
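
For a Unicode program, the stricter meaning of character-type can be sketched like this. The fragment is only an illustration with invented variable names; the commented-out line marks a statement that is expected to be rejected there because its operand is byte-type.

```abap
DATA: txt TYPE string,
      hex TYPE xstring,
      len TYPE i.

txt = `  byte   and  character  `.

* Statements that support character string processing only
CONDENSE txt.
TRANSLATE txt TO UPPER CASE.

* Statement that supports both modes, here in character mode
REPLACE ALL OCCURRENCES OF `BYTE` IN txt WITH `byte`
        IN CHARACTER MODE.
len = strlen( txt ).     " length in characters

hex = 'FF00FF'.
* CONDENSE hex.          " byte-type operand in character processing
len = xstrlen( hex ).    " length in bytes
```

In a non-Unicode program the commented-out CONDENSE would be accepted, because the byte-type operand is treated like a character-type data object there.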

Note

Because of the container problems mentioned under Byte Order, you should not store byte-type content in character-type fields.

The application help is available under:

Processing Character Strings