Real Computer Science begins where we almost stop reading ...: Basic Data Types

Tuesday, 1 October 2013

Basic Data Types

Fundamentals of Data Storage

Variables are named storage locations where data is stored, which may be changed as a program runs. E.g. "nStudents".
Constants are values that are hard-coded into a program, and which do not change value. E.g. "3.14159".
Ultimately all data stored on a computer, both variables and constants, is stored as a sequence of binary digits, i.e. strings of zeros and ones.
- These binary digits are referred to as "bits".
- Physically these zeros and ones may be implemented using wires with two different voltages, magnetic particles with two different alignments, spots on an optical disk having two different optical properties, or by other means.
The "type" of a particular variable or constant determines how many bits are used for that particular data item, and how the bits are to be interpreted.
Basic C recognizes four basic categories of data types: Integral, Floating Point, Character, and Character String. Modern C adds a few special types to this list.
Data may be converted from one type to another, ( possibly with loss of precision ), and new types may be user defined.

Integral Types

Integral data types include all whole numbers, that is numbers not having any fractional component.
The bits of integral types are interpreted as simple powers of two:
- The right-most bit, known as the least significant bit, represents the number of 1s. ( 2^0 )
- The next bit represents the number of 2s. ( 2^1 )
- The next bit represents the number of 4s. ( 2^2 )
- The next bit represents the number of 8s. ( 2^3 )
- In general the nth bit from the right represents 2^(n-1)
For unsigned integral types, the leftmost bit, known as the most significant bit, represents 2^(N-1), where N is the total number of bits in the data item.
- The range of possible values for an unsigned integer of N bits is from 0 to 2^N - 1. ( All 0s to all 1s )
- So for example, a 4-bit unsigned integer could range from 0 to 15, and an 8-bit unsigned integer could range from 0 to 255.
For signed integral types, the leftmost bit can be thought of as representing a negative 2^(N-1).
- ( The real interpretation in the computer is more complicated, but if you think of it this way you will get the right answers. )
- The most negative value would be the first bit a 1 and all other bits 0s, yielding negative 2^(N-1).
- The most positive value would be the first bit a 0 and all other bits 1s, yielding 2^(N-1) - 1.
- So for example, a 4-bit signed integer could range from -8 to +7, and an 8-bit signed integer could range from -128 to +127.
- A signed integral type having all bits 1 is equal to -1, regardless of how many bits are in the number.
Signed and unsigned integers with the same number of total bits have the same number of different possible values.
- Unsigned integers use one bit pattern ( all 0s ) to represent zero and all others to represent positive values.
- Signed integers use half of the possible bit patterns to represent negative numbers, one pattern to represent zero, and half minus 1 to represent positive values.

Specific details of the integer types available on a particular implementation, along with the number of bits allocated to each one and their minimum and maximum allowable values can be found in the file limits.h

int

The most basic and commonly used integral type is "int".

The int data type is intended to be the natural, most efficient size for the particular computer it is running on, typically 32 bits.

Format specifiers for ints are either %d or %i, for either printf or scanf.

long int

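A long int is at least 32 bits wide. On many 64-bit Unix-like systems it uses 64 bits, twice as many as a regular int, allowing it to hold much larger numbers; on other systems it is the same size as an int.

( The C standard only specifies that a long must have at least 32 bits and cannot use fewer bits than a regular int. )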

printf and scanf replace %d or %i with %ld or %li to indicate the use of a long int.

long int may also be specified as just long.

long long int

C99 introduces the long long int, which has at least 64 bits ( on some systems the same size as a long int ), and is printed/scanned using %lld or %lli format specifiers.

short int

A short int may use fewer bits than a regular int, thereby saving storage space.

( The C standard only specifies that a short int has at least 16 bits and cannot use more bits than a regular int. On some machines short ints use the same number of bits as regular ints. )

printf and scanf replace %d or %i with %hd or %hi to indicate the use of a short int.

short int may also be specified as just short.

unsigned ints

Unless otherwise specified, all of the aforementioned int types are signed numbers.

Any of the above may be preceded by the keyword "unsigned" to indicate that they are unsigned.

e.g. "unsigned int", "unsigned long int", "unsigned long", etc.

"unsigned" by itself implies "unsigned int"

The format specifier for unsigned integers in decimal form is %u. The u may be preceded by l, ll, or h for long, long long, and short unsigned types respectively.

Unsigned integers can also be printed or scanned in octal or hexadecimal form using the %o, %x, or %X format specifiers in place of %u.

char

Normally chars are interpreted as characters ( see below )

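Technically the char data type is an integral type occupying exactly one byte, which is 8 bits on nearly all current machines. ( The C standard requires at least 8 bits; the exact number is given by CHAR_BIT in limits.h. )

Signed 8-bit chars can have values from -128 to +127, and can be printed as integers using %d or %i format specifiers. Whether a plain char is signed or unsigned depends on the implementation.

chars can also be specified as unsigned, giving them a range from 0 to +255.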

Unsigned chars can be printed using %u, %o, %x, or %X format specifiers.

Integral Constants
int constants in decimal form must begin with a digit from 1 to 9, and can be followed by additional digits from 0 to 9.

In octal form they must begin with a 0, and following digits must be in the range from 0 to 7.

In hexadecimal form they begin with 0x. Following digits must be in the range 0 to 9, or A to F, or a to f.

Any of the above may be preceded by a minus sign.

int constants may be followed by a u, l, or ll to indicate unsigned, long, or long long constants respectively.
Allowable formats are as follows, where the [ square brackets ] denote optional characters:
Decimal:         [±]1-9[0-9...][Ll][Uu]
Octal:           [±]0[0-7...][Ll][Uu]
Hexadecimal:     [±]0x[0-9a-fA-F...][Ll][Uu]
    
Integer Overflow

When integer math yields a result that is too big ( or too negative ) to be represented by the corresponding integer type, then overflow ( or underflow ) is said to have occurred.

With unsigned numbers the result is defined to "wrap around" to the other end of the integer's range, so if you add 1 to an integer that is at the maximum value for its type, the result is zero.

With signed numbers the result of overflow is undefined. In some cases adding 1 to the maximum positive value will wrap around to the most negative value, but in other cases erratic behaviour or even a program crash may occur.

Floating Point Types

Floating point types include all types in which a number may have a fractional component. Fortunately there are only three that we need to worry about - float, double, and long double.

Specific details of the floating point types available on a particular implementation, along with the number of bits allocated to each one and their minimum and maximum allowable values can be found in the file float.h

float

The most basic type of floating point number is the float type.

According to the IEEE standard, a single precision floating point number is exactly 32 bits long, comprised of:

one bit for indicating the sign

23 bits ( plus one implied ) for recording the digits as a 24-bit integer. This works out to about 6 or 7 decimal digits of precision.

8 bits for a binary exponent, to shift the digits left or right. This works out to an absolute range from about 10^-38 to 10^38

Note that because a float only has 24 bits available for storing the digits of the number, it can actually be less precise than a 32-bit int for certain large integers.

double

The double precision data type uses twice as many bits as a float, yielding approximately twice the number of digits of precision.

According to the IEEE standard, a double precision floating point number is 64 bits long, comprised of:

one sign bit.

52 bits ( plus one implied ) for recording digits, which works out to about 15 decimal digits of precision.

11 bits for the exponent, which works out to a range of about 10^-308 to 10^308

The double data type is the preferred floating point type for most scientific and engineering calculations.

long double

The long double type is guaranteed to have at least as much precision and range as a double, but the exact number of bits may vary from one platform to another. Typical implementations use 80 or 128 bits, and some use the same 64 bits as a double.

The IEEE standard for quadruple precision floating point numbers is 128 bits consisting of:

one sign bit

112 bits ( plus one implied ) for digits, working out to about 34 decimal digits of precision

15 bits for the exponent, giving a range from about 10^-4932 to 10^4932

Floating point constants
Floating-point constants are normally indicated by the presence of a decimal point, and are normally doubles.
Floating point constants may be followed by either an "F" to indicate an ordinary float, or an "L" to indicate a long double.

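Floating point constants can also be expressed in scientific notation.
Allowable formats are as follows:
[±]1-9[0-9...].[0-9...][Ee[±]0-9...][FfLl]
[±][0].[0-9...][Ee[±]0-9...][FfLl]
[±]1-9[0-9...]Ee[±]0-9...[FfLl]
Note that a constant with neither a decimal point nor an exponent is an integer constant, and cannot take an F suffix. ( So 486F is illegal, while 486.F and 486e0F are floats. )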

Characters

Although the char data type is technically a small integer ( see above ), its most common use is to hold numeric codes representing ( ASCII ) characters.
For example, the ASCII code for the capital letter 'A' is 65.
Note that the character '9' is not the same as the integer value 9.
- In this case the ASCII code for the character '9' is 57.
- The numerical value 9 in the ASCII code set happens to represent a horizontal tab character.
The full ASCII code table can be found in the back of most programming textbooks, or online at http://www.asciitable.com/ and many other sites.

Character Constants

Character constants are enclosed in single quotes, such as 'A'.

Character constants may also include escape characters such as '\n' or '\t', as shown here:

Escape Sequence	Meaning
\a	alarm ( bell )
\b	Backspace
\f	Form feed ( clears screen )
\n	New line
\r	Carriage return
\t	Horizontal tab
\v	Vertical tab
\\	Backslash
\?	Question mark
\'	Single quote
\"	Double quote
\0	Numerical zero ( null byte )

The \ can also be used to escape an octal numerical constant of up to three digits, or a hexadecimal constant beginning with \x

So '\112' and '\x4A' are both a capital 'J'. See the ASCII table to confirm.

Character Arithmetic

Because chars are really small integers, it is possible to do mathematical operations on them. For example:
char letter = 'G', lower, upper;

// Presume lower has been given a value somehow

letter = letter + 3;     // letter has now been changed from 'G' to 'J'

if( lower >= 'a' && lower <= 'z' )    // If lower is a lower-case letter
	upper = lower + ( 'A' - 'a' );	  // Convert it to upper-case by adding an offset


Character Strings

C stores character strings as arrays of type char, terminated by a null byte.
Constant character strings are enclosed in double quotes, and may include escape characters, such as "\n\n\t Please enter X > "
These notes will postpone further discussion of character strings until after arrays have been covered.

Special Types

The enumerated type is an integer with a restricted list of legal values, referred to by names. It will be covered in full detail in the section on structs and unions.
C99 introduces new types _Bool, _Complex, and _Imaginary.

Type Conversions

Implicit

There are certain cases in which data will get automatically converted from one type to another:
- When data is being stored in a variable, if the data being stored does not match the type of the variable.
  - The data being stored will be converted to match the type of the storage variable.
- When an operation is being performed on data of two different types.
  - The "smaller" data type will be converted to match the "larger" type.
    - For example, when an int is added to a double, the computer uses a double version of the int and the result is a double.
- When data is passed to or returned from functions.

Explicit

Data may also be explicitly converted, using the typecast operator.

The following example converts the value of nTotal to a double precision value before performing the division.
( nStudents will then be implicitly promoted, following the guidelines listed above. )
```
      average = ( double ) nTotal / nStudents;
```
Note that nTotal itself is unaffected by this conversion.

Constants Exercise

For each of the constants in the following table, indicate whether the constant is legal or illegal, what type of constant it is if legal, and why illegal otherwise:

Constant	Legal?	Explanation
486	Yes	Decimal int
98.6	Yes	Double
98.6f	Yes	Float - The f indicates float, as opposed to double.
02.479	Yes	Double. The decimal point makes this a floating point constant, so the leading 0 does not make it octal.
0.2479	Yes	Double
"A"	Yes	Character string - Two chars including the null byte.
'A'	Yes	A single char. No null byte.
'ABC'	No	Single quotes are for single chars only
'\n'	Yes	The escape sequence represents a single newline character
000042	Yes	Octal. 4 * 8 + 2 = 34 in decimal
0ffH	No	No 'f' in octal. ( Resembles an old FORTRAN format. )
0xC2F9	Yes	Hexadecimal
+37.1	Yes	Double
.0000	Yes	Double
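0xFLU	Yes	Hexadecimal. Value = 15, long, unsigned
0x48.6	No	A hexadecimal constant with a point is only legal in C99 with a p exponent, e.g. 0x48.6p0
0x486e1	Yes	'e' is a valid hexadecimal digit
486e1	Yes	Double. 486 * 10^1 = 4860
0486e1	Yes	Double. The exponent makes this a floating point constant, so the leading 0 does not make it octal. 486 * 10^1 = 4860
0486	No	No '8' allowed in octal. :-)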

Type-Related Functions and Concepts ( Advanced, Optional )

The sizeof( ) operator returns the number of bytes needed to store a variable or data type, so on most systems, sizeof( int ) would yield 4, as would sizeof( number ) if number were a variable of type int.
The keyword typedef allows programmers to define their own type names.
- For example, "typedef float Dollars;" would define a new type named "Dollars" that is really the same as float.
- In this case the programmer can now declare variables to be of type "Dollars" instead of type float.
- One advantage is for portability purposes. If the typedef is changed to "typedef double Dollars", then it will affect all variables of type Dollars in the entire program, with one small change.
- It can also make programs more readable when complicated types are used. For example, "typedef unsigned long long int BigInteger;"
- ( typedef is most commonly used to rename complicated types that we have not yet covered, such as structs and pointers. )

Enumerated Types ( Advanced, Optional )

Enumerated ( enum ) data types are basically ints, except that they are restricted to a limited set of values, and those values are referred to by name not by number.
The use of enums where applicable helps make code more readable and also limits the possibilities for bad values, thereby reducing bugs and making the code more maintainable and overall better.
The enum keyword is used to define a new data type, having a new data type name and list of acceptable named values.
Once the new enum type has been declared, variables can be declared of the new type, and assigned the named values.

For example:

     enum SizeType { small, medium, large };  // Declares a new data type, "enum SizeType"
     
     enum SizeType item;    // Declares a variable of type "enum SizeType"
   
     // ( Some code left out here. )
     
     if( num < 25 )
         item = small;  // Use as an int, using the named values instead of numbers

     printf( "\nThe item is " );
   
     switch( item ) {
   
         case small:              // Named values are valid integers
             printf( "tiny\n" );
             break;

         // ( Remaining cases left out here. )
     }

Named values can be assigned specific numbers. Those not assigned will get successive values. So in the following example, minor2 will have the value 2 and major2 will have the value 101:

     enum errorType { none = 0, minor1 = 1, minor2, major1 = 100, major2, fatal1 = 1000 };

Enumerated type variables can also be initialized. For example:

enum errorType errorCode = none;
enum SizeType bookSize = large;

It is sometimes a good idea to include values such as "invalid", "undefined" or "none" among the list of enumerated values.
C allows enum variables to be mixed freely with ordinary integers, ( e.g. using numbers instead of names ), but doing so is poor practice.
Printing enumerated variables prints the assigned integer value.
One should not attempt to do any math using enumerated variables.
