next up previous
Next: Minimal Deterministic Automata Up: Automata Theory Previous: Automata Theory

Finite State Automata and Regular Languages

As above, we assume that A is the finite alphabet set. The set of all finite words in A is denoted with A*. A subset of A* is called a language. 1 is the null word in A* and A+ is the set of words with positive length.

Definition A Finite State Automaton is a graph determined with a 5-tuple $\cal {A}=(V,E,\lambda,I,T)$ where V is a finite set of vertices (states), E is a finite set of edges, $\lambda:E \rightarrow A$ is the labeling function, $I\subset V$ is the set of initial states, and $T\subset V$ is the set of terminal states. So every SA is also a finite state automaton by taking I=T=V. We extend $\lambda$ to a labeling function $\lambda:P\rightarrow A^*$ where P is the set of all finite paths in $\cal {A}$ in the natural way: if $p=e_1,\dots,e_k$ is a path in $\cal {A}$ than $\lambda(p)=\lambda(e_1)\dots\lambda(e_k)$. As before we consider two functions $s,t:E\rightarrow V$ as the source and the target of an edge. For a finite path $p=e_1,\dots,e_k$ we will call s(e1) the source of p and write s(p) and t(ek) will be the target of p and we will write t(p).

A finite state automaton is called deterministic if for every vertex the outcomming edges are labeled distinctly.

In the case of a deterministic finite state automaton we will use the following notations: Let p be a path with v=s(p), w=t(p) and $x=\lambda(p)$. We will write vx=w and if there is no path starting at v with label x, we will write $vx=\emptyset$.

Definition A language recognized with a finite state automaton $\cal {A}$ is the set

\begin{displaymath}L(\cal {A})= \{\,\lambda(p) \,\vert\, p\in P,\,s(p)\in I,\,t(p)\in T\,\}
\end{displaymath}

A language in A* for which there is a finite state automaton that recognizes it, is called a regular language.

The language recognized by a finite state automaton will not be changed if we erase every state v in the automaton for which: (a) there is no path from an initial state to v, or (b) there is no path from v to a terminal state. The process of erasing such states is called trimming and the obtained automaton is called trimmed. From now on, we will always assume that our automata are trimmed.

Definition Let $v\in V$. The right (left) context of v in the graph G is the set

\begin{displaymath}R(v,G)=\{\,\lambda(p)\,\vert\,p\in P,\,s(p)=v\,\}\quad (L(v,G)=\{\,\lambda(p)\,\vert\,p\in P,\,
t(p)=v\,\})
\end{displaymath}

We will write only R(v) and L(v) when G is known. Definition Let K be a language and $x\in A^*$. The right (left) context of x with respect to K is the set

\begin{displaymath}R(x,K)=\{\,y\in A^*\,\vert\, xy\in K\,\}\quad (L(x,K)=\{\,y\in A^*\,\vert\, yx \in K\,\})
\end{displaymath}

We will write only R(x) and L(x) when the language K is known.

Consider the following relation on the set of vertices V:

\begin{displaymath}v\sim w \Leftrightarrow R(v)=R(w)
\end{displaymath}

The relation $\sim$ is an equivalence and we call it the reduction on $\cal {A}$. We construct the automaton $\cal {A} /_{\sim} =(V/_{\sim},E_{\sim},\lambda_{\sim},I_{\sim},T_{\sim})$ where $V/_{\sim}$ is the set of equivalence classes of $\sim$ and there is an edge $e_\sim$ in $E_{\sim}$ from [v] to [w] with label a iff there is an edge $e\in E$ with $s(e)\in [v]$, $t(e)\in [w]$ and $\lambda(e)=a$. $[v]\in I_\sim$ $(T_\sim)$ iff $\exists w\in [v],\,\, w\in I\,(T)$.

Definition An automaton $\cal {A}$ is reduced if the equivalence classes of $\sim$ are singletons.

Note 1.2.1 $\cal {A}_{\sim}$ is reduced, $L(\cal
{A}_{\sim})=L(\cal {A})$ and if $\cal {A}$ is deterministic then so is $\cal {A}/_{\sim}$.

We now show that the example of an SFT given by the Figure 1.1 has no deterministic representation with less than 4 vertices and has no representation with 2 vertices.

Consider the words aa, aab, c, cb and the vertices 1, 2, 3, 4. All paths with label aa end at vertex 1, all paths with label aab end at vertex 2, all paths with label c end at vertex 3 and all paths with label cb end at vertex 4. So we have R(aa)=R(1), R(aab)=R(2), R(c)=R(3) and R(cb)=R(4). We can note that each of R(aa), R(c) and R(cb) has a word that is not in any other. For example $aab\in R(aa) - (R(c)\cup R(cb))$. This means that every reduced representation of S has at least three vertices 1',2',3', which are determined with the words aa,c,cb. The words aa,c,cb are in the left contexts of 1',2',3' respectively. Where will a path with label aab end? It can't end at 1' since $aa\in R(aa)-R(aab)$. It can't end only at 2' or only at 3' since $R(aab)=R(c)\cup R(cb)$. So it must be that $aab\in L(2')$ and $aab\in L(3')$ i.e. a path with label aab ends at 2' and at 3'. Where can a path with label aab start? It cannot start at 2' because $aab\not\in R(c)$, and cannot start at 3' because $aab\not \in R(cb)$. So it must be that there is a path labeled with aab starting at 1' and ending at 2' and at 3', but that is in contradiction with the determinism.


next up previous
Next: Minimal Deterministic Automata Up: Automata Theory Previous: Automata Theory
Natasha Jonoska
2000-05-17