UNIX Utility Recreation, cat: Part One

The first utility I decided to recreate was cat. I am sure that the overwhelming majority of people reading this post are familiar with how cat works, but in the interest of padding the word count I'll explain it anyway. Cat is a shell utility that concatenates the contents of one or more files and writes the result to stdout. It also works with input from stdin. It is an extremely simple utility, almost as simple as echo. This simplicity is the reason I chose to replicate cat first, as it seemed to be the most approachable of the shell utilities from a design standpoint. It also has quite a few options and flags that, while simple, may prove to be an interesting challenge. I will go over those options in the next part. For now, let's just focus on the most basic functionality.

In this case, the basic functionality can be broken down like this:

  1. Check the argument list
    1. If no arguments are encountered, echo stdin
    2. If arguments are encountered, move to step 2
  2. Parse the arguments by iterating over them
    1. If a flag is encountered, process it (in this case "processing" just means skip)
    2. If a file name or "-" is encountered, add it to the array of files to concatenate
  3. Iterate over the list of files to concatenate, and echo stdin when "-" is encountered
  4. Output the contents of each file concatenated
  5. Exit

I made a few simplifications to this process in my implementation, as we will see shortly. My implementation gives the same results as GNU cat with no flags, which I verified by sending the output of each to a file and running diff on the files. It does, however, have some issues handling CTRL-D, which in GNU cat ends the echoing of stdin and moves on to the next file in the list. (Strictly speaking, CTRL-D is not a signal: the terminal reports it as end-of-file on stdin.) I hope to resolve this issue before the next stage of this utility, and may have to examine the source of GNU cat earlier than anticipated to ensure I am doing things right. This is a learning exercise after all, so I am not too upset if I do. I just don't want to make that a habit, so that I'm not just copy-pasting what already exists.

Let's get into the meat and potatoes of my attempt at recreating this iconic utility. The first feature that I implemented was simply echoing input from stdin to stdout. This function was trivial: all I did was use fgets to read into an input buffer. The input buffer has a defined size, which is also the limit that fgets uses when reading input. This will fail if an input is received that is larger than the buffer size.
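
The real echoStandardIn is not shown in this post, so what follows is only a minimal sketch of one way such a function could be written. The name echo_stream, its FILE pointer parameters, and the BUFFER_SIZE value are all assumptions made for illustration. Because fgets is called in a loop, a line longer than the buffer arrives in several pieces instead of being lost. The clearerr call at the end is a possible precaution for the CTRL-D case: it resets the end-of-file indicator so that a later read from the same stream is attempted again.

```c
#include <stdio.h>

#define BUFFER_SIZE 4096 /* assumed value; the post does not give the real one */

/*
 * Copy everything from `in` to `out` using a fixed-size line buffer.
 * fgets() stores at most BUFFER_SIZE - 1 characters per call, so a line
 * longer than the buffer is simply handled over several iterations.
 * Returns 0 on success, -1 on a read or write error.
 */
int echo_stream(FILE *in, FILE *out) {
	char buffer[BUFFER_SIZE];

	while (fgets(buffer, sizeof buffer, in) != NULL) {
		if (fputs(buffer, out) == EOF) {
			return -1;
		}
	}
	if (ferror(in)) {
		return -1;
	}
	/* Reset the end-of-file indicator so the same stream can be read
	 * again later, for example when "-" appears twice on the command line. */
	clearerr(in);
	return 0;
}
```

With this shape, echoing stdin would be a call like echo_stream(stdin, stdout), and taking the streams as parameters makes the function easy to exercise with temporary files.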

Next, I implemented simple file output. My solution is extremely simple, and is included here so that we may all mock it together.


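int simpleCat(const char *filename) {
	FILE *input_fp;
	char line_buffer[BUFFER_SIZE];
	char *stat;
	const int chars_to_read = BUFFER_SIZE;

	input_fp = fopen(filename, "r");
	if (input_fp == NULL) {
		/* report errors on stderr so they never mix with the file output */
		fprintf(stderr, "Unable to open file: %s\n", filename);
		return -127;
	}

	while (1) {
		/* fgets stores at most chars_to_read - 1 characters per call */
		stat = fgets(line_buffer, chars_to_read, input_fp);
		if (stat == NULL) {
			if (feof(input_fp) != 0) {
				break;
			} else {
				fprintf(stderr, "Unable to read from file: %s\n", filename);
				fclose(input_fp);
				return -126;
			}
		}
		fputs(line_buffer, stdout);
	}

	/* close the file so we do not leak a handle per argument */
	fclose(input_fp);
	return 0;
}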

As you can see, this is an extremely brutal solution. All it does is open a file, ensure the file was opened correctly, then read the file line by line and output each line once a good read is confirmed. It terminates when EOF is reached. I also have some return codes, but this is so simple that there is no real need for them, as the program will just exit after printing the error.
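
One thing a line-based fgets and fputs loop cannot do is pass NUL bytes through unchanged: fputs stops at the first NUL in the buffer, so a binary file would come out shorter than it went in. As a possible alternative, here is a minimal sketch of a chunk-based copy using fread and fwrite. The function name copy_stream and the CHUNK_SIZE value are hypothetical and chosen only for this example.

```c
#include <stdio.h>

#define CHUNK_SIZE 4096 /* arbitrary size chosen for this sketch */

/*
 * Copy `in` to `out` in fixed-size chunks.
 * fread() reports how many bytes it actually read and fwrite() is told
 * exactly how many to write, so NUL bytes inside the data are preserved.
 * Returns 0 on success, -1 on a read or write error.
 */
int copy_stream(FILE *in, FILE *out) {
	char chunk[CHUNK_SIZE];
	size_t n;

	while ((n = fread(chunk, 1, sizeof chunk, in)) > 0) {
		if (fwrite(chunk, 1, n, out) != n) {
			return -1;
		}
	}
	return ferror(in) ? -1 : 0;
}
```

The trade-off is that the copy no longer knows where lines begin and end, which could matter for options that work on a per-line basis.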

Finally, we can take a brief look at how I am handling arguments. Again, it is an extremely naive and simplistic solution.


int main(int argc, char *argv[]) {
	if (argc > 1) {
		int arg; /* index into argv; a plain int so it compares cleanly with argc */
		for (arg = 1; arg < argc; arg++) {
			if (strcmp(argv[arg], "-") == 0) {
				/* a lone "-" means read from stdin at this position */
				echoStandardIn();
			} else if (argv[arg][0] != '-') {
				int stat = simpleCat(argv[arg]);
				if (stat != 0) {
					exit(stat);
				}
			}
			/* anything else starting with '-' looks like a flag and is skipped */
		}
	}
	else {
		/* no arguments at all: just echo stdin */
		echoStandardIn();
	}
	return 0;
}

This most certainly won't win any awards for elegance, but it does get the job done. Now you can see the simplification I made to the functionality stated above. Instead of handling the flags and shunting the file list into a temporary array, we simply treat argv as the file list and skip over anything that looks like a flag. This will not work in the second stage, as I will have to know which flags have been sent and how they affect my output once I reach that step. However, this solution performs as expected in the tests that I have run. I am sure there is something I am missing, but I am pleased with my results thus far.
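
For comparison, here is a minimal sketch of what the un-simplified version might look like: one pass over argv that separates file operands from flags before any output happens. The struct cat_args type and the collect_args function are hypothetical names invented for this example, and the sketch only counts flags rather than decoding them.

```c
#include <stddef.h>

/* Hypothetical container for the result of one pass over argv. */
struct cat_args {
	const char **files;  /* points at the caller's array */
	size_t file_count;   /* number of file operands, including "-" */
	int flag_count;      /* number of flag-like arguments seen */
};

/*
 * Split argv into file operands and flags.
 * `files` must have room for at least argc entries. A lone "-" counts as
 * a file operand (stdin); anything else starting with '-' counts as a flag.
 */
struct cat_args collect_args(int argc, char *argv[], const char **files) {
	struct cat_args result = { files, 0, 0 };
	int i;

	for (i = 1; i < argc; i++) {
		if (argv[i][0] == '-' && argv[i][1] != '\0') {
			result.flag_count++; /* a later stage would decode the flag here */
		} else {
			files[result.file_count++] = argv[i];
		}
	}
	return result;
}
```

Separating the passes like this means every flag is known before the first byte is written, at the cost of one extra array.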

Conclusion

This project is going smoothly, but after just this little bit, I have seen some issues as far as validation and testing are concerned. Ideally, I would like to have a test suite that looks for things like buffer overflows and other weird issues, and that automates validating my outputs against those of the GNU utilities. But I currently have no idea how to write something like that. I am going to look at testing code for projects like this and try to recreate it before moving on to the next stage. This early on, I am comfortable taking breaks to make my life better in the long run. Also, the CTRL-D issue has made me realize that I might be missing some key knowledge for developing shell utilities like this, so I will also be on a knowledge hunt. If anyone has suggestions for a decent testing framework, or knows where I can find information about how shell utilities are supposed to talk to the OS as a whole, feel free to contact me and let me know.
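
As a possible starting point for automating the diff step, here is a minimal sketch of a byte-by-byte comparison of two streams. The function name streams_identical is hypothetical, and this is far from a full test framework. It only answers the question "are these two outputs exactly the same?".

```c
#include <stdio.h>

/*
 * Compare two streams byte by byte, in the spirit of running diff on two
 * output files. Returns 1 if the contents match exactly, 0 otherwise.
 */
int streams_identical(FILE *a, FILE *b) {
	int ca, cb;

	do {
		ca = fgetc(a);
		cb = fgetc(b);
		if (ca != cb) {
			return 0; /* different byte, or one stream ended early */
		}
	} while (ca != EOF);
	return 1;
}
```

A test could save the output of GNU cat and of my version to two files, open both with fopen, and pass the resulting streams to this function.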

Anyway, thank you all for reading and I hope to see you in the next part!