Reference
The material in this page of notes is organized after "C Programming,
A Modern Approach" by K.N. King, Chapter 13. Some content may also be
from other sources.
Overview
- The C language does not have a specific "String" data type, the way some other languages such as C++ and Java do.
- Instead C stores strings of characters as arrays of chars, terminated by a null byte.
- This page of notes covers all the details of using strings of
characters in C, including how they are stored, how to read and write
them, common functions from the string library for dealing with them,
and how to process command-line arguments.
- Note that many of the concepts covered in this section also apply
to other situations involving combinations of arrays, pointers, and
functions.
String Literals
- A String Literal, also known as a string constant or constant string, is a string of characters enclosed in double quotes, such as "To err is human - To really foul things up requires a computer."
- String literals are stored in C as an array of chars, terminted by a null byte.
- A null byte is a char having a value of exactly zero, noted as '\0'.
- Do not confuse the null byte, '\0', with the character '0', the integer 0, the double 0.0, or the pointer NULL.
- String literals may contain as few as one or even zero characters.
- Do not confuse a single-character string literal, e.g. "A"
with a character constant, 'A'. The former is actually two characters,
because of the terminating null byte stored at the end.
- An empty string, "", consists of only the null byte, and is
considered to have a string length of zero, because the terminating null
byte does not count when determining string lengths.
- String literals may contain any valid characters, including escape
sequences such as \n, \t, etc. Octal and hexadecimal escape sequences
are technically legal in string literals, but not as commonly used as
they are in character constants, and have some potential problems of
running on into following text.
Passing String Literals to Functions
- String literals are passed to functions as pointers to a stored string.
- For example, given the statement:
printf( "Please enter a positive value for the angle: " );
the string literal "Please enter . . .
" will be stored somewhere in memory, and the address will be passed to
printf. The first argument to printf is actually defined as a char *.
Continuing a String Literal
If a string literal needs to be continued across multiple lines, there are three options:
- If the new line character is included between the double quotes, then it is included as part of the string:
printf( "This will
print over three
lines, ( and will include extra tabs or spaces. )" );
- Escaping the newline character with the backslash will cause it to be ignored:
printf( "This will \
print over a single \
line, ( but will still include extra tabs or spaces. )" );
- However the best approach is to note that when two strings
appear adjacent to one another, C will automatically concatenate them
into a single string:
printf( "This will "
"print over a single "
"line, ( without extra tabs or spaces! )" );
Operations on String Literals
- Character pointers may hold the address of string literals:
char *prompt = "Please enter the next number: ";
- String literals may be subscripted:
- "abc" [ 2 ] results in 'c'
- "abc" [ 0 ] results in 'a'
- "abc" [ 3 ] results in the null byte.
- Attempting to modify a string literal is undefined, and may cause problems in different ways depending on the compiler:
prompt[ 0 ] = 'S'; // From above. Don't do this.
String Variables
String variables are typically stored as arrays of chars, terminated by a null byte.
Initializing String Variables
String variables can be initialized either with individual characters or more commonly and easily with string literals:
char s1[ ] = { 'h', 'e', 'l', 'l', 'o', '\0' };
char s2[ ] = "hello"; // Array of size six
char s3[ 20 ] = "hello"; // Fifteen null bytes
char s4[ 5 ] = "hello"; // Five chars, sacrifices the null byte
char s5[ 3 ] = "hello"; // Illegal
Character Arrays versus Character Pointers
Character arrays and character pointers are often interchangeable,
but there can be some very important differences. Consider the following
two variables:
char s6[ ] = "hello", *s7 = "hello";
- s6 is a fixed constant address, determined by the compiler; s7 is a pointer variable, that can be changed to point elsewhere.
- s6 allocates space for exactly 6 bytes; s7 allocates space for
10 ( typically ) - 6 for the characters plus another 4 for the pointer
veriable.
- The contents of s6, can be changed, e.g. s6[ 0 ] = 'J'; The contents of s7 should not be changed.
Reading and Writing Strings
Writing Strings Using printf and puts
- The standard format specifier for printing strings with printf is %s
- The width parameter can be used to print a short string in a long space
- The .precision parameter limits the length of longer strings.
- int puts ( const char * str );
- Writes the string out to standard output ( the screen ) and automatically appends a newline character at the end.
This is a lower-level routine that printf, without formatting, and possibly faster.
Reading Strings Using scanf, gets, and fgets
- The standard format specifier for reading strings with scanf is %s
- scanf reads in a string of characters, only up to the first
non-whitespace character. I.e. it stops reading when it encounters a
space, tab, or newline character.
- scanf appends a null byte to the end of the character string stored.
- scanf does skip over any leading whitespace characters in order to find the first non-whitespace character.
- The width field can be used to limit the maximum number of characters read.
- The argument passed to scanf needs to be of type
pointer-to-character, i.e. char *, and is normally the name of a char
array. ( In this case the size of the array -1 should be given as the
width field, to make sure scanf can append the null byte without
overflowing the array.
- char * gets ( char * str );
- The gets function will read a string of characters from
standard input ( the keyboard ) until it encounters either a newline
character or the end of file.
- Unlike scanf, gets does not care about spaces in the input, treating them the same as any other character.
- Note that gets has no provision for limiting the number of characters read, leading to possible overflow problems.
- When gets encounters a new line character, it stops reading, but does NOT store it.
- A null byte is always appended to the end of the string of stored characters.
- char * fgets ( char * str, int num, FILE * stream );
- fgets is equivalent to gets, except for two additional arguments and one key difference:
- num provides for a limit on the maximum number of characters read
- The FILE * argument allows fgets to read from any file. stdin may be entered to read from the keyboard.
- fgets stops reading when it encounters a new line character, and it DOES store the newline character.
- A null byte is always appended to the end of the string of stored characters.
Reading Strings Character by Character
- Another option for reading in strings is to read them in character by character.
- This is useful when you don't know how long the string might be,
or if you want to consider other stopping conditions besides spaces and
newlines. ( E.g. stop on periods, or when two successive slashes, //,
are encountered. )
- The scanf format specifier for reading individual characters is %c
- If a width greater than 1 is given, then multiple characters are read, and stored in successive positions in a char array.
- Characters scanned this way are often stored in an array,
although the could also be read into an ordinary char variable for
further processing first.
Accessing the Characters in a String
Characters in a string can be acessed using either the [ ] notation or a char pointer.
Using the C String Library
- The C language includes a standard library full of routines for working with null-byte terminated arrays of chars.
- #include < string.h > to use any of these functions.
- CPlusPlus.com documents the library at http://cplusplus.com/reference/cstring/
- Some of the more commonly used and useful routines include:
- Returns the length of the string, not counting the null byte
- Copies characters from source to destination, until a null byte is eventually encountered
- Copies from source to destination, up to a maximum of num characters.
- Pads destination with extra null bytes if the length of source is smaller than num.
- May sacrifice the null byte if the source is longer than num.
- Concatenates ( appends ) source onto the end of destination.
- The terminating null byte of destination is over-written by the
first char of source, and a new null-byte is appended at the end of the
concatenated string.
- char * strncat ( char * destination, char * source, size_t num ); is a version that sets a limit on the number of characters concatenated.
- Compares strings str1 and str2, up until the first encountered null byte
- Returns zero if the two strings are equal
- Returns a positive value ( 1? ) if the first encountered difference has a larger value in str1 than str2
- Returns a negative value ( -1? ) if the first encountered differenc has a smaller value in str1 than str2
- int strncmp ( const char * str1, const char * str2, size_t num ); checks only up to num characters.
- The methods stricmp and strnicmp do case-insensitive comparisons, but may be available for C++ programs only.
- parses a string of numeric characters into a number of type int, double, long int, or long long int, respectively.
- Most often used when parsing command line arguments.
- Can also be used for rigorous checking of user input - Read in
an entire line into a char array using gets, fgets, or char-by-char,
then rigorously test all the characters in the string, ( e.g. are they
all numeric ), and then use one of these routines to convert the char
array into a number once all the checks have been made.
sprintf( char [ ], format [ , data . . . ] );
- "Prints" into a character array, similar to printf. Useful for
converting data from other data types into a character string and/or
combining things.
sscanf( char *, format [ , address . . . ] );
- Reads data from a character string, just as if scanf read it from the keyboard.
- One approach to rigorous input checking is to use getline( ) to
read in an entire line of input from a user, then examine it character
by character for any problems, e.g. non-numeric characters when trying
to read in a number. Then when the string has been verified, use scanf
to read in the data from the character string.
See also strchr, strrchr, strstr, and strtok for searching and tokenizing strings.
String Idioms
Searching for the End of a String
Copying a String
void strcpy( const char * s, char * d ) {
do {
*d++ = *s++;
} while( *s );
return;
}
Arrays of Strings
- There is a subtle difference between:
- a one-dimensional arrays of pointers to chars, ( char * [ ] ) and
- a two-dimensional array of chars, ( char [ ] [ ] )
- Both can be accessed using double subscripts
- However the storage is different.
Command Line Arguments
- int main( void );
- int main( int argc, char * argv[ ] );
- int main( int argc, char ** argv ); // Same as above
- argv is an array of character pointers, of length argc
- int main( int argc, char ** argv, char ** envp );
- envp is an array of character pointers, terminated by a NULL pointer
- Sample Program: argprinter.c
- If integers, doubles, etc. need to be read in from the command
line, use atoi, atof, etc. to convert them from character strings.
No comments:
Post a Comment