Tuesday, January 8, 2008

Trigraphs in C

A trigraph is a sequence of three characters from the ISO 646 invariant set that is treated as if it were a single character of the C alphabet. Every trigraph starts with two question marks (??), which helps signal that ‘something funny’ is going on. The table below shows the nine trigraphs defined in the Standard.

C Character Trigraph
# ??=
[ ??(
] ??)
{ ??<
} ??>
\ ??/
| ??!
~ ??-
^ ??'
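The table can also be read as a lookup from the third character of a trigraph (the one after the two question marks) to the C character it replaces. The sketch below encodes that mapping as a C function; the name `trigraph_char` is only an illustration, not anything defined by the Standard. The source deliberately never writes two question marks in a row, not even in a comment, because a compiler with trigraphs enabled would translate them.

```c
/* Illustrative helper (hypothetical name): given the character that
   follows two question marks, return the C character the trigraph
   stands for, or '\0' if c does not complete a trigraph. */
char trigraph_char(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case ')':  return ']';
    case '<':  return '{';
    case '>':  return '}';
    case '/':  return '\\';
    case '!':  return '|';
    case '-':  return '~';
    case '\'': return '^';
    default:   return '\0';   /* not one of the nine trigraphs */
    }
}
```

For instance, `trigraph_char('=')` yields `'#'`, while `trigraph_char('?')` yields `'\0'`, since no trigraph ends in a question mark.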

As an example, suppose your terminal has no # symbol. Then you cannot type the preprocessor line "#define MAX 32767" directly; you must use trigraph notation instead: "??=define MAX 32767".

Of course, trigraphs work even if you do have a # symbol; they are there to help in difficult circumstances rather than to be used for routine programming.


The ? ‘binds to the right’: in any run of consecutive question marks, only the rightmost two could possibly be part of a trigraph, depending on the character that follows. This removes any ambiguity; for example, ???= is read as a plain ? followed by the trigraph ??=, giving ?#.
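A left-to-right scan that checks whether the character after two question marks completes a trigraph gets right-binding for free: since ? itself never ends a trigraph, the scanner emits the extra leading ? unchanged and then recognises the trigraph one position later. The sketch below assumes NUL-terminated strings and an output buffer at least as large as the input; `decode_trigraphs` and `tri_map` are hypothetical names. Inside string literals the code spells a question mark as the escape `\?` so that no trigraph ever appears in its own source.

```c
/* Hypothetical helper: map the character after two question marks to
   the character the trigraph stands for, or '\0' if there is none.
   '?' is not in this list, which is what makes a run of question
   marks bind to the right. */
static char tri_map(char c)
{
    switch (c) {
    case '=':  return '#';
    case '(':  return '[';
    case ')':  return ']';
    case '<':  return '{';
    case '>':  return '}';
    case '/':  return '\\';
    case '!':  return '|';
    case '-':  return '~';
    case '\'': return '^';
    default:   return '\0';
    }
}

/* Replace every trigraph in src, writing the result to dst.
   The output is never longer than the input, so dst needs room for
   strlen(src) + 1 bytes. Returns dst. */
char *decode_trigraphs(char *dst, const char *src)
{
    char *out = dst;
    while (*src) {
        char r;
        /* Short-circuit evaluation never reads past the terminator:
           src[2] is only examined when src[1] is a question mark. */
        if (src[0] == '?' && src[1] == '?' && (r = tri_map(src[2])) != '\0') {
            *out++ = r;      /* whole trigraph becomes one character */
            src += 3;
        } else {
            *out++ = *src++; /* ordinary character, copy it through */
        }
    }
    *out = '\0';
    return dst;
}
```

Decoding `"?\?\?="` (three question marks and an equals sign) produces `"?#"`, and decoding `"?\?=define MAX 32767"` produces `"#define MAX 32767"`.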

It would be a mistake to assume that programs written to be highly portable would use trigraphs ‘in case they had to be moved to systems that only support ISO 646’. If your system can handle all 96 characters in the C alphabet, then that is what you should be using. Trigraphs will only be seen in restricted environments, and it is extremely simple to write a character-by-character translator between the two representations. However, all compilers that conform to the Standard will recognize trigraphs when they are seen.
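As a rough illustration of how simple such a translator is, here is a sketch of the other direction: rewriting ordinary C text so it uses only ISO 646 invariant characters, spelling each of the nine characters from the table as a trigraph. It assumes NUL-terminated input that contains no accidental trigraphs of its own, and `encode_trigraphs` and `tri_third` are hypothetical names.

```c
/* Hypothetical helper: return the third character of the trigraph
   that spells c, or '\0' if c has no trigraph spelling. */
static char tri_third(char c)
{
    switch (c) {
    case '#':  return '=';
    case '[':  return '(';
    case ']':  return ')';
    case '{':  return '<';
    case '}':  return '>';
    case '\\': return '/';
    case '|':  return '!';
    case '~':  return '-';
    case '^':  return '\'';
    default:   return '\0';
    }
}

/* Write src to dst with every trigraph-able character replaced by its
   three-character spelling. Worst case every character expands, so
   dst needs room for 3 * strlen(src) + 1 bytes. Returns dst. */
char *encode_trigraphs(char *dst, const char *src)
{
    char *out = dst;
    for (; *src; ++src) {
        char t = tri_third(*src);
        if (t != '\0') {
            *out++ = '?';   /* first question mark */
            *out++ = '?';   /* second question mark */
            *out++ = t;     /* character identifying the trigraph */
        } else {
            *out++ = *src;  /* already an ISO 646 invariant character */
        }
    }
    *out = '\0';
    return dst;
}
```

Because ? binds to the right, the encoding stays unambiguous even when a question mark precedes one of the encoded characters: `?#` becomes `???=`, which a trigraph-aware reader turns back into `?#`.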

* Trigraph substitution is the very first operation that a compiler performs on its input text.
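One practical consequence is that trigraphs are replaced even inside string literals and comments, before the compiler knows it is looking at either. Under a compiler that honours trigraphs (GCC, for example, does so with `-trigraphs` or a strict `-std=c99`/`-std=c11` mode), a string written as What followed by two question marks and ! silently becomes `"What|"`. The sketch below shows two standard ways to keep both question marks; the variable names are only illustrative.

```c
/* Each array holds the seven characters W h a t ? ? ! plus the
   terminating '\0', whether or not trigraphs are enabled. */

/* The escape sequence \? stands for a single question mark, so the
   source never contains two question marks in a row. */
const char escaped[] = "What?\?!";

/* Adjacent string literals are joined in a much later translation
   phase, after trigraph replacement has already been done. */
const char joined[] = "What?" "?!";
```

The same caution applies to comments: a comment ending in ??/ would be read as ending in a backslash, splicing the following line into the comment.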
