Real Computer Science begins where we almost stop reading ...: Structures, Unions, and Enumerated Types

Tuesday, 1 October 2013

Structures, Unions, and Enumerated Types

Structures

Recall that an array is a collection of data items, all having the same data type and accessed using a common name and an integer index into the collection.
A struct is also a collection of data items, except with a struct the data items can have different data types, and the individual fields within the struct are accessed by name instead of an integer index.
Structs are very powerful for bundling together data items that collectively describe a thing, or are in some other way related to each other.

Declaring New Structure Types and struct variables

In order to use variables of type struct, it is first necessary to define the particular type of struct to be used.

Tagged ( Named ) Structs

The most common and perhaps best way to define a new structure type involves naming it, with a tag, as shown in the following example:
struct Part {
	int number, on_hand;
	char name [ NAME_LEN + 1 ];
	double price;
};
In the example above, the tag for the newly defined struct is "Part", and the names of the fields within the struct are number, on_hand, name, and price. Note that the fields can have different types, although that is not necessary.

At this point "struct Part" is a valid data type, and so variables can be declared of this type:
          struct Part p1, p2;
It is also possible to create a struct type and declare variables of the type at the same time:
struct Student {
	int nClasses;
	char name [ NAME_LEN + 1 ];
	double gpa; 
} joe, sue, mary;
Now that struct Student is a declared type, additional variables of the same type can also be created:
 struct Student mike, carla;
It is also possible to create struct variables without creating a named struct type in the process. However, in this case it is not possible to create additional variables of the same type later, nor to name the type in a function parameter list. Even if other struct variables are created with the same data types and field names in the same order, they are not considered the same type:
struct {
	int nClasses;
	char name [ NAME_LEN + 1 ];
	double gpa; 
} alice, bill;  // alice and bill are of the same type, but not the same as struct Student
struct {
	int nClasses;
	char name [ NAME_LEN + 1 ];
	double gpa; 
} charlie, daniel; // charlie and daniel are the same type as each other, but not anyone else.

Use of typedef with structs

typedef is a powerful tool that allows programmers to define and then use their own data types. For example:

typedef int Integer;

Integer nStudents, nCourses, studentID;

Note that in the typedef statement, the newly defined type name goes in the place where a variable name would normally go.

There are a few benefits of using typedef with simple types such as the example above:
- For readability, "Integer" may be easier to understand than "int".
- The typedef can be easily changed later, ( say to "typedef long int Integer;" ), which would then affect all variables of the defined type.

However the real benefit of typedef comes into play with complex data structures, such as structs:
typedef struct Student {
	int nClasses;
	char name [ NAME_LEN + 1 ];
	double gpa; 
} Student;
Now that struct Student has been typedefed to the name "Student", additional variables of the same type can also be created:
 Student phil, georgina;
In this particular example the tag name and the newly defined type name are the same. This is allowed, but not required. In fact it is not even necessary to have the tag name, so the following would do the same thing:
typedef struct {
	int nClasses;
	char name [ NAME_LEN + 1 ];
	double gpa; 
} Student;

Scope of struct type definitions

The scope of struct type definitions ( with or without the use of typedef ) follows the same rules as variable declarations.

Obviously this means that the struct definition must be within scope before variables of that type can be declared.

If only one function needs to know about a particular struct definition, then it should be defined inside that function, but if more than one function needs to know the definition, then it should be defined globally. ( Even though the variables should still normally be declared locally. )
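
As a minimal sketch of the scoping rules for struct type definitions ( the names Point, Delta, and manhattan are hypothetical and do not come from these notes ), a struct type needed by several functions is defined at file scope, while a helper type needed by only one function can be defined inside it:

```c
#include <stdlib.h>

/* File scope: visible to every function that follows in this file. */
struct Point {
    int x, y;
};

/* Hypothetical helper: Manhattan distance between two points. */
int manhattan( struct Point a, struct Point b ) {
    /* Block scope: struct Delta is known only inside this function. */
    struct Delta {
        int dx, dy;
    } d = { abs( a.x - b.x ), abs( a.y - b.y ) };

    return d.dx + d.dy;
}
```

Any other function in the file can declare variables of type struct Point, but an attempt to declare a struct Delta variable outside manhattan would not compile.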

Initializing variables of type struct

Struct variables are initialized in much the same way as arrays, by enclosing multiple data fields in curly braces. In the simplest approach the fields are initialized in the order declared in the struct, and if the struct is only partially initialized, then the remaining fields are initialized to zero:

          Student john = { 3, "John Doe", 4.0 }, susie = { 5, "Susie Jones" }; // Susie has a gpa of 0.0

In C99 it is also possible to use designated initializers to initialize particular fields, in any order. An unnamed value initializes the field that follows the one initialized just before it:

          Student jane = { .gpa = 4.0 }, sue = { .name = "Sue Smith", 3.5 }; // Sue has 0 classes and a gpa of 3.5
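
The two initialization styles can be checked with a small sketch. The value of NAME_LEN and the function names make_partial and make_designated are assumptions made for illustration only:

```c
#define NAME_LEN 25   /* assumed value; these notes never define NAME_LEN */

typedef struct {
    int nClasses;
    char name[ NAME_LEN + 1 ];
    double gpa;
} Student;

/* Positional, partial initialization: gpa is not listed, so it becomes 0.0. */
Student make_partial( void ) {
    Student s = { 5, "Susie Jones" };
    return s;
}

/* Designated initializer ( C99 ): 3.5 follows .name, so it initializes gpa.
   nClasses is never mentioned, so it becomes 0. */
Student make_designated( void ) {
    Student s = { .name = "Sue Smith", 3.5 };
    return s;
}
```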

Accessing data fields within structs

The dot operator ( period ) is used to access named fields within struct variables:

          john.nClasses = 4;

          totalGPA += sue.gpa;

Assigning one struct to another

If two variables are of the same struct type, then they can be directly assigned one to the other.
- See the variable definitions above to see why some of these are invalid.
- ( Note: There is a potential problem if the structs contain pointers. See below for details. )

          joe = sue;      // Legal, both of the same type
          mary = carla;   // Also legal, both of the same type
          alice = bill;   // Still legal.  Both the same unnamed type

          alice = charlie;  // Illegal.  Different types, even though identical
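
A short sketch, assuming a value for NAME_LEN and a hypothetical test function copy_is_independent, shows that assignment copies every field, including an array embedded in the struct, so the two variables are independent afterwards:

```c
#include <string.h>

#define NAME_LEN 25   /* assumed value; these notes never define NAME_LEN */

struct Student {
    int nClasses;
    char name[ NAME_LEN + 1 ];
    double gpa;
};

/* Returns 1 if changing the copy leaves the original untouched, else 0. */
int copy_is_independent( void ) {
    struct Student joe = { 4, "Joe", 3.0 };
    struct Student sue;

    sue = joe;               /* copies nClasses, the whole name array, and gpa */
    sue.name[ 0 ] = 'Z';     /* changes only sue's copy of the array           */

    return strcmp( joe.name, "Joe" ) == 0 && strcmp( sue.name, "Zoe" ) == 0;
}
```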

Nested Structs

Structs can be nested, either with previously defined structs or with new internally defined structs. In the latter case the struct names may not be necessary. ( Note that in C, unlike C++, a struct type defined inside another struct is not private to that struct: its tag has the same scope as the tag of the enclosing struct. ) For example:

        struct Date {
            int day, month, year;
        };

        struct Exam {
            int room, nStudents;

            struct Date date;

            struct {
                int hour, minute;
                bool AM;             // bool requires #include <stdbool.h>
            } time;

            struct Score {
                int part1, part2, total;
            } scores[ 100 ];
        };

In a nested struct situation, it may take multiple dot operators to reach a field of a basic data type:

        struct Exam CS107Exam2 = { 1000, 50, { 5, 4, 2013 }, { 6, 0, false } }; // All scores zero

        CS107Exam2.nStudents = 90;
        CS107Exam2.date.month = 4;   // April
        CS107Exam2.time.hour = 6;
        CS107Exam2.scores[ i ].part2 = 20;

Structs and Arrays

As illustrated above:
- Structs can contain arrays
- Arrays can contain structs
- Structs and Arrays can be nested in arbitrary levels of complexity.
When dealing with complex combinations of arrays and structs, start with a variable name, and ask "What kind of data is this?"
- If the answer is "array", then apply the array subscript operator, [ ], and ask the question again.
- If the answer is "struct", then apply the dot operator, ., and ask the question again.
- Continue until a basic type ( or the desired type ) is reached, and then use it.
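
The question-and-answer procedure above can be sketched as follows. The cut-down Exam struct and the function record_score are hypothetical, chosen only to keep the example short:

```c
struct Score {
    int part1, part2, total;
};

struct Exam {
    int nStudents;
    struct Score scores[ 100 ];
};

/* exams       is an array        -> apply [ e ], giving a struct Exam
   .scores     is an array        -> apply [ s ], giving a struct Score
   .part1 etc. are ints, a basic type, so the expression is complete.  */
void record_score( struct Exam exams[], int e, int s, int p1, int p2 ) {
    exams[ e ].scores[ s ].part1 = p1;
    exams[ e ].scores[ s ].part2 = p2;
    exams[ e ].scores[ s ].total = p1 + p2;
}
```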

Structs and Functions

Structs may be passed to and from functions just like any basic data types.
Individual structs are always passed by value, even if they contain arrays.
Arrays of structs are passed by pointer / address, just like any other array, even if they only contain a single element.
- Some programmers use this trick ( a one-element array ) to pass struct variables by pointer/address.
The struct definition must be known to both the calling and called functions, which means it must be defined globally.
- Struct definitions are often placed in header files, which get included using #include
- However the actual struct variables should still be declared locally, just like any other variables.
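
The difference between the two passing mechanisms can be seen in a minimal sketch. The type Triple and both function names are hypothetical:

```c
typedef struct {
    int values[ 3 ];
} Triple;

/* Receives a copy of the struct ( array included ), so the caller's
   data is unchanged by this assignment. */
void zero_copy( Triple t ) {
    t.values[ 0 ] = 0;
}

/* Receives the address of the first array element, so the caller's
   data is modified. */
void zero_first( Triple ts[] ) {
    ts[ 0 ].values[ 0 ] = 0;
}
```

Declaring a one-element array, e.g. Triple one[ 1 ], and calling zero_first( one ) is the "trick" mentioned above: the single struct is effectively passed by address.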

Structs and Pointers

Pointers can hold the address of any kind of data, including structs. For example, by analogy to ints:
```
        int i, j, *iptr = &i, k;
        struct Student alice, bob, *sptr = &alice, carol;
```
- In this example, the variable sptr is of type struct Student *, i.e. a pointer to a Student struct.
- Functions often use pointer-to-struct arguments, to avoid passing entire structures by value.
  - If the intention is for the function not to change the structure pointed to by the pointer, then the parameter should be labelled const, e.g. ( for a hypothetical function ) void printStudent( const struct Student *s );
  - When calling such a function, the address of the struct is what is passed, e.g. printStudent( &alice );
  - Unfortunately the dot operator has higher precedence than the pointer dereference operator, which means that *sptr.gpa would ( incorrectly ) try to access the thing pointed to by sptr.gpa, which does not work, because sptr is not a struct. Parentheses are needed to get the intended meaning: ( *sptr ).gpa
  - Because that form is awkward, a new operator is introduced, the "pointer to structure member reference operator", ->, which accesses a field within a structure pointed to by a structure pointer. sptr->gpa is equivalent to ( *sptr ).gpa
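
The points in this list can be put together in a minimal sketch. The functions read_gpa and set_gpa and the value of NAME_LEN are assumptions made for illustration:

```c
#define NAME_LEN 25   /* assumed value; these notes never define NAME_LEN */

struct Student {
    int nClasses;
    char name[ NAME_LEN + 1 ];
    double gpa;
};

/* const: the function may read through s but not modify the struct. */
double read_gpa( const struct Student *s ) {
    return ( *s ).gpa;     /* parentheses required: . binds tighter than * */
}

/* No const: the function is allowed to change the caller's struct. */
void set_gpa( struct Student *s, double gpa ) {
    s->gpa = gpa;          /* same meaning as ( *s ).gpa = gpa */
}
```

A caller passes the address of a struct variable, e.g. set_gpa( &alice, 4.0 ), and only a pointer is copied rather than the whole struct.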
Pointers can also point to specific fields of structures, in which case the type of the pointer must match the type of the field being pointed to:

     double *gpaPtr = &carol.gpa;  // Dot has higher precedence than & - same as &(carol.gpa)
     *gpaPtr = 4.0;

Structures can also contain pointers, either to basic types or to structs of the same or different types.
```
     struct Employee {
        char *name;
        struct Date * hiringDate;
     };
```
- This can cause potential problems if a struct containing a pointer is copied using the assignment operator, or passed by value to a function.
- The problem is that a simple assignment of one struct to another of the same type will copy the pointer, but not the data pointed to by the pointer, which means that both structs will now contain pointers to the same location. This is termed a "shallow copy". Any changes made via one pointer will be seen by the other pointer.
- The solution is to write a function that will perform a "deep copy", by first copying the data pointed to by the pointer, and then creating a new pointer that points to the newly copied data.
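
A deep copy might look like the following sketch. To keep it short, the Employee struct is reduced to its name member, and the function name employee_deep_copy and its return convention are assumptions:

```c
#include <stdlib.h>
#include <string.h>

struct Employee {
    char *name;     /* points to storage outside the struct */
};

/* Deep copy: duplicates the string that src->name points to, so that dst
   owns separate storage. Returns 0 on success, -1 if allocation fails.
   The caller is responsible for calling free( dst->name ) later. */
int employee_deep_copy( struct Employee *dst, const struct Employee *src ) {
    size_t len = strlen( src->name ) + 1;    /* + 1 for the terminating '\0' */
    char *copy = malloc( len );

    if( copy == NULL )
        return -1;

    memcpy( copy, src->name, len );
    dst->name = copy;
    return 0;
}
```

With a plain assignment the two structs would share one string. After employee_deep_copy, a change made through one name pointer is not visible through the other.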
Many powerful and complicated data structures can be realized using pointers to structs containing pointers. These will be covered in a later section of these notes.

Struct bit fields

On occasion it is desired to hold a number of small integer items in a structure.
To save space, the fields within a structure are not required to occupy a full word. Instead, they can occupy a specified number of bits.
Multiple consecutive bit fields in a structure will share a single word of memory, insofar as each field fits completely. This reduces storage requirements, at the expense of slower execution.
If the next bit field does not fit in the currently unallocated portion of the current word, then it will be put entirely into the next word. The remainder of the current word will be wasted.
The size of a bit field is indicated by appending a colon and the number of desired bits after the field name.
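If a bit field size is specified as zero, it forces the next bit field to be placed on a word boundary, which can make that field quicker to access. A zero-length bit field must be unnamed.
Structure bit fields must be of an integer type ( standard C guarantees int, signed int, unsigned int, and, since C99, _Bool ). Whether a bit field declared as plain int is signed or unsigned is up to the implementation, so it is best to state the signedness explicitly.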
Example:

     struct Packed_data {
        unsigned  int is_element:1;   /* = 1 if element */
        unsigned  int atomic_number:8;     /* Maximum 255 */
        unsigned  int is_reactant:1;
        unsigned  int is_product:1;
        unsigned  int is_catalyst:1;
        unsigned  int Stock_Index:16; /* Maximum 65,535 */
     } chemical_inventory[ 10000 ];

Each data item in the above array takes up one 32-bit word ( with four bits wasted ), for a total of 10,000 words of storage for the entire array, as opposed to 60,000 words of storage if bitfields were not used.
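
The following sketch illustrates the space saving and the limited range of a bit field. The struct and function names are hypothetical, and because the exact layout of bit fields is implementation-defined, the size comparison describes typical compilers rather than a guarantee of the language:

```c
struct Packed {
    unsigned int is_element    : 1;
    unsigned int atomic_number : 8;     /* holds 0 .. 255    */
    unsigned int stock_index   : 16;    /* holds 0 .. 65,535 */
};

struct Unpacked {
    unsigned int is_element;
    unsigned int atomic_number;
    unsigned int stock_index;
};

/* Stores n in the 8-bit field and returns what was actually kept.
   Unsigned bit fields wrap modulo 2^width, so 256 is stored as 0. */
unsigned int store_atomic_number( unsigned int n ) {
    struct Packed p = { 0 };
    p.atomic_number = n;
    return p.atomic_number;
}
```

On common compilers sizeof( struct Packed ) is one word while sizeof( struct Unpacked ) is three, which is the saving described above.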

Enumerated Data Types

Enumerated data types are a special form of integers, with the following constraints:
- Only certain pre-determined values are allowed.
- Each valid value is assigned a name, which is then normally used instead of integer values when working with this data type.
Enumerated types, variables, and typedefs, operate similarly to structs:

     enum suits { CLUBS, HEARTS, SPADES, DIAMONDS, NOTRUMP } trump;
     enum suits ew_bid, ns_bid;

     typedef enum Direction{ NORTH, SOUTH, EAST, WEST } Direction;
     Direction nextMove = NORTH;

Values may be assigned to specific enum value names.
- Any names without assigned values will get one higher than the previous entry.
- If the first name does not have an assigned value, it gets the value of zero.
- It is even legal to assign the same value to more than one name.
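- Example: in enum Errors { NONE, MINOR = 100, MAJOR, FATAL = 100 }, NONE is 0, MINOR and FATAL are both 100, and MAJOR is 101.
Because enumerated data types are integers, they can be used anywhere integers are allowed. One of the best places is in switch statements: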

     switch( nextMove ) {

          case NORTH:
               y++;
               break;

          // etc.
     }

The compiler will allow the use of ordinary integers with enumerated variables ( e.g. trump = 2; ), but it is bad practice.
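
A complete version of such a switch might look like the following sketch. The function step and its coordinate convention ( y grows to the north, x to the east ) are assumptions made for illustration:

```c
typedef enum Direction { NORTH, SOUTH, EAST, WEST } Direction;

/* Moves the point ( *x, *y ) one step in direction d. */
void step( Direction d, int *x, int *y ) {
    switch( d ) {
        case NORTH: ( *y )++; break;
        case SOUTH: ( *y )--; break;
        case EAST:  ( *x )++; break;
        case WEST:  ( *x )--; break;
    }
}
```

Using the names rather than the underlying integers 0 to 3 keeps the switch readable, and many compilers can warn when a case for one of the enum values is missing.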

Unions

Unions are declared, created, and used exactly the same as structs, EXCEPT for one key difference:
- Structs allocate enough space to store all of the fields in the struct. The first one is stored at the beginning of the struct, the second is stored after that, and so on.
- Unions only allocate enough space to store the largest field listed, and all fields are stored at the same address: the beginning of the union.
This means that all fields in a union share the same space, which can be used for any listed field but not more than one of them.
In order to know which union field is actually stored, unions are often nested inside of structs, with an enumerated type indicating what is actually stored there. For example:

     typedef struct Flight {
        enum { PASSENGER, CARGO } type;
        union {
            int npassengers;
            double tonnages;  // Units are not necessarily tons.
        } cargo;
    } Flight;
    
    Flight flights[ 1000 ];
    
    flights[ 42 ].type = PASSENGER;
    flights[ 42 ].cargo.npassengers = 150;
    
    flights[ 20 ].type = CARGO;
    flights[ 20 ].cargo.tonnages = 356.78;

The example above does not actually save any space, because the space saved by using a union instead of a struct for the cargo is offset by the enumerated type ( plus any alignment padding ) needed to record which field is in use. However a lot of space could potentially be saved if the items in the union were larger, such as nested structs or large arrays.
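
Code that reads such a struct should consult the type field first and then touch only the matching union field. A minimal sketch, in which the function flight_load is hypothetical:

```c
typedef struct Flight {
    enum { PASSENGER, CARGO } type;
    union {
        int npassengers;
        double tonnages;
    } cargo;
} Flight;

/* Returns the load as a double, reading only the union field that
   the type tag says is currently stored. */
double flight_load( const Flight *f ) {
    switch( f->type ) {
        case PASSENGER: return f->cargo.npassengers;
        case CARGO:     return f->cargo.tonnages;
    }
    return 0.0;    /* unknown tag */
}
```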
Unions are sometimes also used to break up larger data items into smaller pieces, such as this code to extract four 8-bit bytes from a 32-bit int. ( The order in which the bytes come out depends on the byte order of the machine. )

    unsigned int nRead;

    union {
        unsigned int n;
        unsigned char c[ 4 ];
    } data;

     // ( Code to read in nRead, from the user or a file, has been omitted in this example )

     data.n = nRead;
     for( int i = 0; i < 4; i++ )
         printf( "Byte number %d of %u is %u\n", i, nRead, ( unsigned int ) data.c[ i ] );
