Concepts and Conventions

Overview

1. Data types
2. Data layout of structures
3. Unicode fragment view
4. Permitted characters
5. Character-type and numeric-type operands



1. Data types

The data types that are interpreted as character-type in a Unicode program are:

C Character (letters, digits, special characters)
N Numeric character (digits)
D Date
T Time
STRING String
Character-type structures Structures that contain only fields of types C, N, D, or T, either directly or in substructures

In a non-Unicode system, a character of one of these types has a length of 1 byte. In a Unicode system, its length corresponds to the length of one character on the relevant platform.

Variables of the types X and XSTRING are called byte-type.
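
The difference between character length and byte length can be observed with DESCRIBE FIELD. The following is a minimal sketch only: the report name and the field names text, raw, chars, and bytes are made up for illustration, and the byte length reported in a Unicode system depends on the platform.

```abap
REPORT z_demo_type_lengths.

DATA: text(6) TYPE c,          " character-type field of 6 characters
      raw(6)  TYPE x,          " byte-type field of 6 bytes
      chars   TYPE i,
      bytes   TYPE i.

" Length counted in characters.
DESCRIBE FIELD text LENGTH chars IN CHARACTER MODE.

" Length counted in bytes: 6 in a non-Unicode system; in a Unicode
" system, 6 times the character length of the platform.
DESCRIBE FIELD text LENGTH bytes IN BYTE MODE.
WRITE: / 'text:', chars, 'characters,', bytes, 'bytes'.

" A byte-type field is measured in bytes only.
DESCRIBE FIELD raw LENGTH bytes IN BYTE MODE.
WRITE: / 'raw: ', bytes, 'bytes'.
```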

The main characteristics of the different kinds of structures are as follows:





2. Data layout of structures

For several data types, such as I and F or object references, alignment requirements apply that depend on the platform used. Fields of these types must begin and end at memory addresses that are divisible by 4 or 8. Under Unicode, alignment requirements also apply to character-type fields; here the alignment is determined by the character length.

Within structures, bytes can be inserted before or after components with alignment requirements to achieve the necessary alignment. These bytes are referred to as alignment gaps. The beginning and the end of structures, substructures, and includes are always aligned according to the component with the highest alignment requirement. The sample structure below contains three fields; no alignment gaps are created for it in either a non-Unicode system or a Unicode system.

BEGIN OF struc1,
  a(1) TYPE X,
  b(1) TYPE X,
  c(6) TYPE C,
END OF struc1.

In the next example, however, alignment gaps are created in a Unicode system but not in a non-Unicode system. The first alignment gap results from the alignment of the structure struc3, the second from the alignment of the C field c, and the third from the alignment of the integer field d. In the diagram, A marks an alignment gap.

BEGIN OF struc2,
  a(1) TYPE X,
  BEGIN OF struc3,
    b(1) TYPE X,
    c(6) TYPE C,
  END OF struc3,
  d    TYPE I,
END OF struc2.

Non-Unicode system   [ a | b | cccccc | dddd ]
Unicode system       [ a | A | b | A | cccccccccccc | AA | dddd ]
                             |        struc3        |
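
To see the layout on a given system, the byte offsets of the components can be measured with DESCRIBE DISTANCE. This is a sketch rather than a definitive test: the report name and the variables off_b, off_c, and off_d are illustrative, and the values in a Unicode system depend on the platform's character length and alignment rules.

```abap
REPORT z_demo_alignment.

DATA: BEGIN OF struc2,
        a(1) TYPE x,
        BEGIN OF struc3,
          b(1) TYPE x,
          c(6) TYPE c,
        END OF struc3,
        d    TYPE i,
      END OF struc2.

DATA: off_b TYPE i,
      off_c TYPE i,
      off_d TYPE i.

" Distance in bytes from the first component to each later component.
" Any alignment gaps in between are included in the distance.
DESCRIBE DISTANCE BETWEEN struc2-a AND struc2-struc3-b INTO off_b IN BYTE MODE.
DESCRIBE DISTANCE BETWEEN struc2-a AND struc2-struc3-c INTO off_c IN BYTE MODE.
DESCRIBE DISTANCE BETWEEN struc2-a AND struc2-d        INTO off_d IN BYTE MODE.

WRITE: / 'Offset of b:', off_b,
       / 'Offset of c:', off_c,
       / 'Offset of d:', off_d.
```

Without alignment gaps, as in a non-Unicode system, the offsets are simply the sums of the preceding field lengths. Larger values indicate where gaps were inserted.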



3. Unicode fragment view

The data layout of structures is relevant to the Unicode program checks, for example the checks that decide whether assignments and comparisons are permitted. This data layout is represented in the Unicode fragment view. The fragment view breaks down the structure into alignment gaps, byte-type areas, character-type areas, and all other types such as P, I, F, strings, references, or internal tables.

Adjacent character-type components of a structure, except strings, are internally combined into one group if there are no alignment gaps between them. All possible alignment requirements for characters are taken into account. Adjacent byte-type components are grouped in the same way.



BEGIN OF struc,
  a(2) TYPE C,
  b(4) TYPE N,
  c    TYPE D,
  d    TYPE T,
  e    TYPE F,
  f(2) TYPE X,
  g(4) TYPE X,
  h(8) TYPE C,
  i(8) TYPE C,
END OF struc.

In the following diagram, F1 to F6 mark the individual fragments of the structure struc, and A marks an alignment gap:

[ aa | bbbb | cccc | ddd | AA | eeee | f | gg | AA | hhhhhhhh | iiiiiiii ]
[           F1           | F2 |  F3  |   F4   | F5 |         F6          ]
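
As an illustration of why the fragment view matters for assignments, the sketch below uses two hypothetical structures, s1 and s2, that have different components but the same fragment view: a single character-type fragment of 6 characters. It assumes the usual Unicode rule that an assignment between flat structures of different types is accepted when their fragment views match. The report name is also made up.

```abap
REPORT z_demo_fragments.

" Different components, same fragment view:
" one character-type fragment of 6 characters each.
DATA: BEGIN OF s1,
        a(2) TYPE c,
        b(4) TYPE n,
      END OF s1,
      BEGIN OF s2,
        c(6) TYPE c,
      END OF s2.

s2-c = 'AB1234'.

" Assumption: accepted by the Unicode check because the fragment
" views of s1 and s2 match.
s1 = s2.

WRITE: / s1-a, s1-b.
```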



4. Permitted characters

In a Unicode system, all ABAP program sources are also stored as Unicode. As in ABAP Objects, only the following characters may be used in identifiers in programs for which the Unicode flag is set:

For compatibility reasons, the characters %, $, ?, -, #, * and / are also permitted. You should, however, use them only in exceptional cases where you cannot avoid them.

To ensure that programs can be transported from a Unicode system to a non-Unicode system without losing information during conversion, comments and literals should not contain characters that cannot be represented in a non-Unicode system, even if the program is written in a Unicode system.



5. Character-type and numeric-type operands

Up to now, you have been able to use flat structures as arguments of ABAP statements wherever single fields of type C were expected. In a Unicode program this is no longer generally permitted. In a Unicode program, you can use a structured field in a statement expecting a single field only if this structured field consists of character-type elementary types or purely character-type substructures. The structure is treated like a single field of type C.
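
A minimal sketch of the permitted case follows. The report name, the structure name_rec, and the field outtext are invented for illustration. Because name_rec contains only type C components, it can be used where a single character field is expected.

```abap
REPORT z_demo_char_struct.

DATA: BEGIN OF name_rec,
        fname(10) TYPE c,
        lname(10) TYPE c,
      END OF name_rec,
      outtext(20) TYPE c.

name_rec-fname = 'Ada'.
name_rec-lname = 'Lovelace'.

" name_rec is purely character-type, so it is treated like a single
" field of type C with length 20.
outtext = name_rec.
WRITE / outtext.
```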

The main restrictions that apply in a Unicode system, in contrast to a non-Unicode system, result from two facts: flat structures are considered character-type only if they are purely character-type, and fields of type X or XSTRING are never considered character-type. Numeric-type arguments include, for example, offset or index specifications as in READ TABLE ... INDEX i. The following examples show a structure that is not character-type and a structure that is:

Not character-type:          Character-type:

BEGIN OF struc1,             BEGIN OF struc2,
  a(2) TYPE C,                 a(2) TYPE C,
  b(2) TYPE C,                 n(6) TYPE N,
  x(1) TYPE X,                 d    TYPE D,
  i    TYPE I,                 t    TYPE T,
END OF struc1.               END OF struc2.

Another example is control break processing for internal tables, which is triggered by the AT keyword. In non-Unicode systems, fields of type X to the right of the control key are treated as character-type and filled with asterisks. In Unicode systems, by contrast, type X fields are set to their initial value.
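
The control break behavior can be sketched as follows. The report name, the line type, and its components region, note, and flag are hypothetical. The comments describe the behavior in a Unicode program as stated above.

```abap
REPORT z_demo_control_break.

TYPES: BEGIN OF line_type,
         region(2) TYPE c,     " control key
         note(4)   TYPE c,     " character-type, right of the key
         flag(1)   TYPE x,     " byte-type, right of the key
       END OF line_type.

DATA: itab TYPE STANDARD TABLE OF line_type WITH DEFAULT KEY,
      wa   TYPE line_type.

wa-region = 'EU'. wa-note = 'abcd'. wa-flag = 'FF'. APPEND wa TO itab.
wa-region = 'US'. wa-note = 'efgh'. wa-flag = '01'. APPEND wa TO itab.

SORT itab BY region.

LOOP AT itab INTO wa.
  AT NEW region.
    " Inside AT ... ENDAT the components to the right of the control
    " key are overwritten in the work area: note is filled with
    " asterisks, while flag is set to its initial value.
    WRITE: / wa-region, wa-note, wa-flag.
  ENDAT.
ENDLOOP.
```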