
Definition of EOF and how to use it effectively

The use and meaning of EOF seem to cause a lot of confusion for some new coders. Hopefully this explanation will help you understand it better. Before I go into too much detail about what EOF is, I'll tell you what it isn't.

EOF is NOT:

# A char

# A value that exists at the end of a file

# A value that could exist in the middle of a file

And now to what it actually is.

EOF is a macro that expands to an int constant with a negative value. It is returned by functions that perform read operations to denote either an error or end of input. Due to variable promotion rules (discussed in detail later), it is important to ensure you use an int to store the return code from these functions, even if the function appears to be returning a char, such as getchar() or fgetc().

Here are some code examples that you might use:

int c;

while ((c = fgetc(fp)) != EOF)
{
putchar (c);
}



int ch;

while ((ch = cin.get()) != EOF)
{
cout <<(char)ch;
}



char to int Promotion

An int is at least as wide as a char, and on almost every platform it is wider, so an int can hold values that no char can. When you compare an int with a char, the char is promoted to an int to account for the difference in size of the variables. The value of a promoted char is affected by its sign, and unfortunately a plain char can be either signed or unsigned by default; this is compiler dependent.

To understand this better, let's look at the representation of a few numbers in both ints and chars.

The following assumes 2 byte ints (your compiler might use a larger amount). A char uses only 1 byte (this will be the same amount on your compiler). With the exception of the first column, the values are shown in hexadecimal.

---------------------------    ----------------------------
| char and int comparison |    | char to int promotion    |
---------------------------    ----------------------------
| Decimal | int    | char |    | char | unsigned | signed |
|---------|--------|------|    |------|----------|--------|
| 2       | 00 02  | 02   |    | 02   | 00 02    | 00 02  |
| 1       | 00 01  | 01   |    | 01   | 00 01    | 00 01  |
| 0       | 00 00  | 00   |    | 00   | 00 00    | 00 00  |
| -1      | FF FF  | FF   |    | FF   | 00 FF    | FF FF  |
| -2      | FF FE  | FE   |    | FE   | 00 FE    | FF FE  |
---------------------------    ----------------------------


The "char to int promotion" table makes it clear that the sign of a char produces a very different number in the int.

So what does all this mean to me as a programmer?

Well, let's have a look at a revised version of the code shown above, this time incorrectly using a char variable to store the return code from fgetc().

char c;

while ((c = fgetc(fp)) != EOF)
{
putchar (c);
}



Now let's assume that within the file we are reading from is a byte with value 0xff. fgetc() returns this value within an int, so it looks like this: 0x00 0xff (again, I'm assuming 2 byte ints). To store this value in a char, it must be demoted, and the char value becomes 0xff. Next, the char c is compared with the int EOF. Promotion rules apply, and c must be promoted to an int. However, in the sample code, the sign of c isn't explicitly declared, so we don't know if it's signed or unsigned, and the int value could become either 0xff 0xff or 0x00 0xff. If it becomes 0xff 0xff, it compares equal to EOF (assuming EOF is -1, as it usually is) and the loop stops at the 0xff byte, before the real end of the file. Therefore, the code is not guaranteed to work in the way we require.

The following is a short program to help show the promotion:

#include <stdio.h>

int main(void)
{
int i = -1;
signed char sc = 0xff;
unsigned char usc = 0xff;

printf ("Comparing %x with %x\n", i, sc);
if (i == sc) puts("i == sc");
else puts("i != sc");
putchar ('\n');
printf ("Comparing %x with %x\n", i, usc);
if (i == usc) puts("i == usc");
else puts("i != usc");

return 0;
}

/*
 * Output (with 2 byte ints; with 4 byte ints the promoted values print as ffffffff)

Comparing ffff with ffff <--- Notice this has been promoted
i == sc

Comparing ffff with ff
i != usc

*
*/

Another scenario to consider is where the char is unsigned. In this case, the process of demoting and promoting the returned value from fgetc() will have the effect of corrupting the EOF value, and the program will get stuck in an infinite loop. Let's follow that process through:

- EOF (0xff 0xff) is returned by fgetc() due to end of input
- Value demoted to 0xff to be stored in unsigned char c
- unsigned char c promoted to an int, value goes from 0xff to 0x00 0xff
- EOF is compared with c, meaning comparison is between 0xff 0xff and 0x00 0xff.
- The result is FALSE (the values are different), which is undesirable.
- fgetc() is called again, and still returns EOF. The endless loop begins.



The following code demonstrates this problem.

#include <stdio.h>

int main(void)
{
FILE *fp;
unsigned char c;

if ((fp = fopen("myfile.txt", "rb")) == NULL)
{
perror ("myfile.txt");
return 0;
}

while ((c = fgetc(fp)) != EOF)
{
putchar (c);
}

fclose(fp);
return 0;
}







Why it's bad to use feof() to control a loop




When reading in a file, and processing it line by line, it's logical to think of the code loop as "while not at the end of the file, read and process data". This often ends up looking something like this:

i = 0;

while (!feof(fp))
{
fgets(buf, sizeof(buf), fp);
printf ("Line %4d: %s", i, buf);
i++;
}



This apparently simple snippet of code has a bug in it, though. The problem stems from the method feof() uses to determine if EOF has actually been reached. Let's have a look at the C standard:

The feof function

Synopsis

1 #include <stdio.h>
int feof(FILE *stream);

Description
2 The feof function tests the end-of-file indicator for the stream pointed to by stream.

Returns
3 The feof function returns nonzero if and only if the end-of-file indicator is set for stream.



Do you see the problem yet? The function tests the end-of-file indicator, not the stream itself. This means that another function is actually responsible for setting the indicator to denote EOF has been reached. This would normally be done by the function that performed the read that hit EOF. We can then follow the problem to that function, and we find that most read functions only set the indicator when they try to read past the end of the data. Reading the last byte is not enough; it takes one more read, which finds nothing left, to set it.

With this in mind, how does it manifest itself into a bug in our snippet of code? Simple... as the program goes through the loop to get the last line of data, fgets() works normally, without setting the end-of-file indicator, and we print out the data. The loop returns to the top, and the call to feof() returns FALSE, and we start to go through the loop again. This time, fgets() hits end of file before reading anything, sets the indicator and returns NULL, but thanks to our poor logic, we go on to process the buffer anyway, without realising that its content is simply left over from the last loop (or, after a read error, indeterminate). This problem results in the last line being printed twice. Now, with the various code and compilers I've tried, I've seen varying results when using this poor quality code. Some give the wrong answer as described here, but some do seem to get it right, and print the last line only once. The usual reason is the input file rather than the compiler: if the last line has no trailing newline, fgets() runs into end of file while reading that line and sets the indicator there and then, so the loop happens to stop at the right place.

Here is a full example of the broken code. It's pointless providing sample results, as they're not necessarily going to be the same as yours. However, if you compile this code, and run it against an empty file (0 bytes), it should output nothing. If it's doing it wrong, as I expect it will, you'll get a line similar to this:

Line 0: Garbage

Here, Garbage was left in the buffer from the initialisation, but should not have been printed. Anyway, enough talk, here's the code.

#include <stdio.h>
#include <stdlib.h>

#define MYFILE "junk1.txt"

int main(void)
{
FILE *fp;
char buf[BUFSIZ] = "Garbage";
int i;

if ((fp = fopen(MYFILE, "r")) == NULL)
{
perror (MYFILE);
return (EXIT_FAILURE);
}

i = 0;

while (!feof(fp))
{
fgets(buf, sizeof(buf), fp);
printf ("Line %4d: %s", i, buf);
i++;
}

fclose(fp);

return(0);
}



To correct the problem, always follow this rule: use the return code from the read function to determine when you've hit EOF. Here is a revised edition of the same code, this time checking the return code from fgets() to determine when the read fails. The code is exactly the same, except for the loop.

#include <stdio.h>
#include <stdlib.h>

#define MYFILE "junk1.txt"

int main(void)
{
FILE *fp;
char buf[BUFSIZ] = "Garbage";
int i;

if ((fp = fopen(MYFILE, "r")) == NULL)
{
perror (MYFILE);
return (EXIT_FAILURE);
}

i = 0;

while (fgets(buf, sizeof(buf), fp) != NULL)
{
printf ("Line %4d: %s", i, buf);
i++;
}

fclose(fp);

return(0);
}



When this is run against an empty file (0 bytes), it will not print anything.

Here are some other read functions being used to control loops:

total = 0;

while (fscanf(fp, "%d", &num) == 1)
{
total += num;
}

printf ("Total is %d\n", total);



int c;

while ((c = fgetc(fp)) != EOF)
{
putchar (c);
}


