Real Computer Science begins where we almost stop reading ...: TOC 1.4

Saturday, 26 January 2013

TOC 1.4

1.4. Languages: Basic Concepts

Basic data structure or input to grammars or automaton are strings. Strings are defined over an alphabet which is finite. Alphabet may vary depending upon the application. Elements of an alphabet are called symbols. Usually we denote the basic alphabet set either as Σ or T. For example, the following are a few examples of an alphabet set.

Σ₁ = {a,b} Σ₂ = {0,1,2} Σ₃ = {0,1,2,3,4,5,6,7,8,9}.

A string or word is a finite sequence of symbols from that alphabet, usually written as concatenated symbols and not separated by gaps or commas. For example, if Σ = {a, b}, a string abbab is a string or word over Σ. If w is a string over an alphabet Σ, then the length of w written as len(w) or |w| is the number of symbols it contains. If |w| = 0, then w is called as empty string denoted either as λ or ε.

For any word w, w ε = εw = w. For any string w = a₁ ... a_n of length n, the reverse of w is written as w^R which is the string a_na_n₋₁ ... a₁, where each symbol a_i belongs to the basic alphabet Σ. A string z that is appearing consecutively within another string w is called a substring or subword of w. For example, aab is a substring of baabb.

The set of all strings over an alphabet Σ is denoted by Σ* which includes the empty string ε. For example, for Σ = {0, 1}, Σ* = {ε, 0, 1, 00, 01, 10, 11, ...}. Note that Σ* is a countably infinite set. Also, Σⁿ denotes the set of all strings over Σ whose length is n. Hence, Σ* = Σ⁰ ∪ Σ¹ ∪ Σ² ∪ Σ³ ... and Σ⁺ = Σ¹ ∪ Σ² ∪ Σ³ .... Subsets of Σ* are called languages.

For example, if Σ = {a, b}, the following are languages over Σ

L₁ = {ε, a, b} L₂ = {ab, aabb, aaabbb, ...} L₃ = {w ε Σ*\| number of a’s and number of b’s in w are equal.}

In the above example, L₁ is finite, and L₂ and L₃ are infinite languages. φ denotes an empty language.

We have the following inductive definition of Σ⁺ and Σ*, where Σ is any basic alphabet set.

Definition 1.4

Let Σ be any alphabet set. Σ⁺ is a set of non-empty strings over Σ defined as follows:

Basis: If a ∊ Σ, then a ∊ Σ⁺.
Induction: If α ∊ Σ⁺ and a ∊ Σ, αa, aα are in Σ⁺.
No other element belongs to Σ⁺.

Clearly, the set Σ⁺ contains all strings of length n, n ≥ 1.

Example 1.11.

Let Σ = {0,1,2}. Then

Σ⁺ = {0, 1, 2, 00, 01, 02, 10, 11, 12, 20, 21, 22, ... }

Suppose we wish to include ‘ε’ in Σ⁺, we modify the above definition as given below.

Definition 1.5

Let Σ be any alphabet set. Σ* is defined as follows:

Basis: ε ∊ Σ*.
Induction: If α ∊ Σ*, α ∊ Σ, then aα, α a ∊ Σ*.
No other element is in Σ*.

Since languages are sets, one can define the set-theoretic operations of union, intersection, difference, and complement in the usual fashion.

The following operations are also defined for languages. If x = a₁ ... a_n, y = b₁ ... b_m, the concatenation of x and y is defined as xy = a₁ ... a_nb₁ ... b_m. The catenation (or concatenation) of two languages L₁ and L₂ is defined by, L₁L₂ = {w₁w₂|w₁ ∊ L₁ and w₂ ∊ L₂}.

Note that concatenation of languages is associative because concatenation of strings is associative. Also L⁰ = {ε} and Lφ = φL = φ, Lε = εL = L.

The concatenation closure (Kleene closure) of a language L, in symbols L* is defined to be the union of all powers of L:

The right quotient and right derivative are the following sets, respectively.

Similarly, left quotient of a language L₁ by a language L₂ is defined by

L₂/L₁ = {z\|yz ∊ L₁ for some y ∊ L₂}.

The left derivative of a language L with respect to a word y is denoted as

L which is equal to {z|yz ∊ L}.

The mirror image (or reversal) of a language is the collection of the mirror images of its words and mir (L) = {mir (w)|w ∊ L} or L^R = {w^R|w ∊ L}.

The operations, substitutions and homomorphisms are defined as follows.

For each symbol a of an alphabet Σ, let σ(a) be a language over Σ_a. Also, σ(ε) = ε, σ(αβ)= σ(α).σ(β) for α, β ∊ Σ⁺. σ is a mapping from Σ* to 2^v* or use

(v*) where V is the union of the alphabets Σ_a, is called a substitution. For a language L over Σ, we define:

σ(L) = {α\|α ∊ σ(β) for some β ∊ L}.

A substitution is ε-free if and only if none of the language σ(a) contains ε. A family of languages is closed under substitution if and only if whenever L is in the family and σ is a substitution such that σ(a) is in the family, then σ(L) is also in the family.

A substitution σ such that σ(a) consists of a single word w_a is called a homomorphism. It is called ε-free homomorphism if none of σ(a) is ε.

Algebraically, one can see that Σ* is a free semigroup with ε as its identity. The homomorphism which is defined above agrees with the customary definition of homomorphism of one semigroup into another.

Inverse homomorphism can be defined as follows:

h⁻¹(w) = {x\|h(x) = w} h⁻¹(L) = {x\|h(x) is in L}

It should be noted that h(h^-1(L)) need not be equal to L. Generally h(h⁻¹(L))⊆ L and h⁻¹(h(L)) ⊇ L.

Asymptotic Behavior of Functions

The time taken to execute any algorithm depends upon the machine on which it is implemented and also on the algorithm. Hence, efficiency of any algorithm is measured by the amount of time it takes and also the space it needs for execution on the machine. Comparison of algorithms has become an important topic. We give here a mathematical basis for comparing algorithms.

A complexity function f is a function of n, the size of the problem or parameter on which the problem is dependent. That is, f(n) is either a measure of the time required to execute an algorithm on a problem of size n or the measure of memory space. If f(n) is describing the measure of time, then f(n) is called the time-complexity function; if f(n) is describing the measure of space, then it is called space-complexity function.

We have the following important definitions of complexity functions.

Definition 1.6

Let f and g be two functions from N to R. Then g asymptotically dominates f or is an asymptotic upper bound for f or f is asymptotically dominated by g if there exist k ≥ 0 and c ≥ 0 such that f(n) ≤ cg(n) for all n ≥ k.
The set of all functions which are asymptotically dominated by a given function g is denoted by O(g) and read as ‘big-oh’ of g. That is f∊ O(g), then f is said to be in O(g).

Example 1.12.

Let f(n) = n, g(n) = n². Then clearly f ∊ O(g) as n ≤ 1.n², whereas g ∉ O(f).
O(1) ⊂ O(logn) ⊂ O(n) ⊂ O(nlogn) ⊂ O(n²) ⊂ O(cⁿ) ⊂ O(n!).

Definition 1.7

Let f and g be two functions from N to R. Then g is asymptotically tight bound for f(n) if there exist positive constants c₁, c₂ and k such that 0 ≤ c₁g(n) ≤ f(n) ≤ c₂g(n) for all n ≥ k.
The set of all functions for which g is asymptotically tight bound is denoted by θ(g).

Example 1.13.

If f(n) = an³ + bn² + cn + d, where a, b, c, d are constants and a >0. Then f(n) ∊ θ(n³).

Definition 1.8

Let f and g be any two functions from N to R. Then g is said to be asymptotic lower bound for f if there exist positive constants c and k such that 0 ≤ cg(n) ≤ f(n) for all n ≥ k.
The set of all functions for which g is asymptotic lower bound is denoted by Ω(g).

Example 1.14.

f(n) = an² + bn + c, a, b, c, are constants, a > 0, belongs to Ω(n²).

Real Computer Science begins where we almost stop reading ...