fbpx
Wikipedia

String operations

In computer science, in the area of formal language theory, frequent use is made of a variety of string functions; however, the notation used is different from that used for computer programming, and some commonly used functions in the theoretical realm are rarely used when programming. This article defines some of these basic terms.

Strings and languages edit

A string is a finite sequence of characters. The empty string is denoted by  . The concatenation of two string   and   is denoted by  , or shorter by  . Concatenating with the empty string makes no difference:  . Concatenation of strings is associative:  .

For example,  .

A language is a finite or infinite set of strings. Besides the usual set operations like union, intersection etc., concatenation can be applied to languages: if both   and   are languages, their concatenation   is defined as the set of concatenations of any string from   and any string from  , formally  . Again, the concatenation dot   is often omitted for brevity.

The language   consisting of just the empty string is to be distinguished from the empty language  . Concatenating any language with the former doesn't make any change:  , while concatenating with the latter always yields the empty language:  . Concatenation of languages is associative:  .

For example, abbreviating  , the set of all three-digit decimal numbers is obtained as  . The set of all decimal numbers of arbitrary length is an example for an infinite language.

Alphabet of a string edit

The alphabet of a string is the set of all of the characters that occur in a particular string. If s is a string, its alphabet is denoted by

 

The alphabet of a language   is the set of all characters that occur in any string of  , formally:  .

For example, the set   is the alphabet of the string  , and the above   is the alphabet of the above language   as well as of the language of all decimal numbers.

String substitution edit

Let L be a language, and let Σ be its alphabet. A string substitution or simply a substitution is a mapping f that maps characters in Σ to languages (possibly in a different alphabet). Thus, for example, given a character a ∈ Σ, one has f(a)=La where La ⊆ Δ* is some language whose alphabet is Δ. This mapping may be extended to strings as

f(ε)=ε

for the empty string ε, and

f(sa)=f(s)f(a)

for string sL and character a ∈ Σ. String substitutions may be extended to entire languages as [1]

 

Regular languages are closed under string substitution. That is, if each character in the alphabet of a regular language is substituted by another regular language, the result is still a regular language.[2] Similarly, context-free languages are closed under string substitution.[3][note 1]

A simple example is the conversion fuc(.) to uppercase, which may be defined e.g. as follows:

character mapped to language remark
x fuc(x)
a { ‹A› } map lowercase char to corresponding uppercase char
A { ‹A› } map uppercase char to itself
ß { ‹SS› } no uppercase char available, map to two-char string
‹0› { ε } map digit to empty string
‹!› { } forbid punctuation, map to empty language
... similar for other chars

For the extension of fuc to strings, we have e.g.

  • fuc(‹Straße›) = {‹S›} ⋅ {‹T›} ⋅ {‹R›} ⋅ {‹A›} ⋅ {‹SS›} ⋅ {‹E›} = {‹STRASSE›},
  • fuc(‹u2›) = {‹U›} ⋅ {ε} = {‹U›}, and
  • fuc(‹Go!›) = {‹G›} ⋅ {‹O›} ⋅ {} = {}.

For the extension of fuc to languages, we have e.g.

  • fuc({ ‹Straße›, ‹u2›, ‹Go!› }) = { ‹STRASSE› } ∪ { ‹U› } ∪ { } = { ‹STRASSE›, ‹U› }.

String homomorphism edit

A string homomorphism (often referred to simply as a homomorphism in formal language theory) is a string substitution such that each character is replaced by a single string. That is,  , where   is a string, for each character  .[note 2][4]

String homomorphisms are monoid morphisms on the free monoid, preserving the empty string and the binary operation of string concatenation. Given a language  , the set   is called the homomorphic image of  . The inverse homomorphic image of a string   is defined as

 

while the inverse homomorphic image of a language   is defined as

 

In general,  , while one does have

 

and

 

for any language  .

The class of regular languages is closed under homomorphisms and inverse homomorphisms.[5] Similarly, the context-free languages are closed under homomorphisms[note 3] and inverse homomorphisms.[6]

A string homomorphism is said to be ε-free (or e-free) if   for all a in the alphabet  . Simple single-letter substitution ciphers are examples of (ε-free) string homomorphisms.

An example string homomorphism guc can also be obtained by defining similar to the above substitution: guc(‹a›) = ‹A›, ..., guc(‹0›) = ε, but letting guc be undefined on punctuation chars. Examples for inverse homomorphic images are

  • guc−1({ ‹SSS› }) = { ‹sss›, ‹sß›, ‹ßs› }, since guc(‹sss›) = guc(‹sß›) = guc(‹ßs›) = ‹SSS›, and
  • guc−1({ ‹A›, ‹bb› }) = { ‹a› }, since guc(‹a›) = ‹A›, while ‹bb› cannot be reached by guc.

For the latter language, guc(guc−1({ ‹A›, ‹bb› })) = guc({ ‹a› }) = { ‹A› } ≠ { ‹A›, ‹bb› }. The homomorphism guc is not ε-free, since it maps e.g. ‹0› to ε.

A very simple string homomorphism example that maps each character to just a character is the conversion of an EBCDIC-encoded string to ASCII.

String projection edit

If s is a string, and   is an alphabet, the string projection of s is the string that results by removing all characters that are not in  . It is written as  . It is formally defined by removal of characters from the right hand side:

 

Here   denotes the empty string. The projection of a string is essentially the same as a projection in relational algebra.

String projection may be promoted to the projection of a language. Given a formal language L, its projection is given by

 [citation needed]

Right and left quotient edit

The right quotient of a character a from a string s is the truncation of the character a in the string s, from the right hand side. It is denoted as  . If the string does not have a on the right hand side, the result is the empty string. Thus:

 

The quotient of the empty string may be taken:

 

Similarly, given a subset   of a monoid  , one may define the quotient subset as

 

Left quotients may be defined similarly, with operations taking place on the left of a string.[citation needed]

Hopcroft and Ullman (1979) define the quotient L1/L2 of the languages L1 and L2 over the same alphabet as L1/L2 = { s | ∃tL2. stL1 }.[7] This is not a generalization of the above definition, since, for a string s and distinct characters a, b, Hopcroft's and Ullman's definition implies yielding {}, rather than { ε }.

The left quotient (when defined similar to Hopcroft and Ullman 1979) of a singleton language L1 and an arbitrary language L2 is known as Brzozowski derivative; if L2 is represented by a regular expression, so can be the left quotient.[8]

Syntactic relation edit

The right quotient of a subset   of a monoid   defines an equivalence relation, called the right syntactic relation of S. It is given by

 

The relation is clearly of finite index (has a finite number of equivalence classes) if and only if the family right quotients is finite; that is, if

 

is finite. In the case that M is the monoid of words over some alphabet, S is then a regular language, that is, a language that can be recognized by a finite state automaton. This is discussed in greater detail in the article on syntactic monoids.[citation needed]

Right cancellation edit

The right cancellation of a character a from a string s is the removal of the first occurrence of the character a in the string s, starting from the right hand side. It is denoted as   and is recursively defined as

 

The empty string is always cancellable:

 

Clearly, right cancellation and projection commute:

 [citation needed]

Prefixes edit

The prefixes of a string is the set of all prefixes to a string, with respect to a given language:

 

where  .

The prefix closure of a language is

 

Example:
 

A language is called prefix closed if  .

The prefix closure operator is idempotent:

 

The prefix relation is a binary relation   such that   if and only if  . This relation is a particular example of a prefix order.[citation needed]

See also edit

Notes edit

  1. ^ Although every regular language is also context-free, the previous theorem is not implied by the current one, since the former yields a shaper result for regular languages.
  2. ^ Strictly formally, a homomorphism yields a language consisting of just one string, i.e.  .
  3. ^ This follows from the above-mentioned closure under arbitrary substitutions.

References edit

  • Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages and Computation. Reading, Massachusetts: Addison-Wesley Publishing. ISBN 978-0-201-02988-8. Zbl 0426.68001. (See chapter 3.)
  1. ^ Hopcroft, Ullman (1979), Sect.3.2, p.60
  2. ^ Hopcroft, Ullman (1979), Sect.3.2, Theorem 3.4, p.60
  3. ^ Hopcroft, Ullman (1979), Sect.6.2, Theorem 6.2, p.131
  4. ^ Hopcroft, Ullman (1979), Sect.3.2, p.60-61
  5. ^ Hopcroft, Ullman (1979), Sect.3.2, Theorem 3.5, p.61
  6. ^ Hopcroft, Ullman (1979), Sect.6.2, Theorem 6.3, p.132
  7. ^ Hopcroft, Ullman (1979), Sect.3.2, p.62
  8. ^ Janusz A. Brzozowski (1964). "Derivatives of Regular Expressions". J ACM. 11 (4): 481–494. doi:10.1145/321239.321249. S2CID 14126942.

string, operations, computer, science, area, formal, language, theory, frequent, made, variety, string, functions, however, notation, used, different, from, that, used, computer, programming, some, commonly, used, functions, theoretical, realm, rarely, used, w. In computer science in the area of formal language theory frequent use is made of a variety of string functions however the notation used is different from that used for computer programming and some commonly used functions in the theoretical realm are rarely used when programming This article defines some of these basic terms Contents 1 Strings and languages 2 Alphabet of a string 3 String substitution 4 String homomorphism 5 String projection 6 Right and left quotient 7 Syntactic relation 8 Right cancellation 9 Prefixes 10 See also 11 Notes 12 ReferencesStrings and languages editA string is a finite sequence of characters The empty string is denoted by e displaystyle varepsilon nbsp The concatenation of two string s displaystyle s nbsp and t displaystyle t nbsp is denoted by s t displaystyle s cdot t nbsp or shorter by s t displaystyle st nbsp Concatenating with the empty string makes no difference s e s e s displaystyle s cdot varepsilon s varepsilon cdot s nbsp Concatenation of strings is associative s t u s t u displaystyle s cdot t cdot u s cdot t cdot u nbsp For example b l e a h b l a h b l a h displaystyle langle b rangle cdot langle l rangle cdot varepsilon cdot langle ah rangle langle bl rangle cdot langle ah rangle langle blah rangle nbsp A language is a finite or infinite set of strings Besides the usual set operations like union intersection etc concatenation can be applied to languages if both S displaystyle S nbsp and T displaystyle T nbsp are languages their concatenation S T displaystyle S cdot T nbsp is defined as the set of concatenations of any string from S displaystyle S nbsp and any string from T displaystyle T nbsp formally S T s t s S t T displaystyle S cdot T s cdot t mid s in S land t in T nbsp Again the concatenation dot displaystyle cdot nbsp is often omitted for brevity The language e displaystyle varepsilon nbsp consisting of just the empty string is to be distinguished from the empty language displaystyle nbsp Concatenating any language with the former doesn t make any change S e S e S displaystyle S cdot varepsilon S varepsilon cdot S nbsp while concatenating with the latter always yields the empty language S S displaystyle S cdot cdot S nbsp Concatenation of languages is associative S T U S T U displaystyle S cdot T cdot U S cdot T cdot U nbsp For example abbreviating D 0 1 2 3 4 5 6 7 8 9 displaystyle D langle 0 rangle langle 1 rangle langle 2 rangle langle 3 rangle langle 4 rangle langle 5 rangle langle 6 rangle langle 7 rangle langle 8 rangle langle 9 rangle nbsp the set of all three digit decimal numbers is obtained as D D D displaystyle D cdot D cdot D nbsp The set of all decimal numbers of arbitrary length is an example for an infinite language Alphabet of a string editThe alphabet of a string is the set of all of the characters that occur in a particular string If s is a string its alphabet is denoted by Alph s displaystyle operatorname Alph s nbsp The alphabet of a language S displaystyle S nbsp is the set of all characters that occur in any string of S displaystyle S nbsp formally Alph S s S Alph s displaystyle operatorname Alph S bigcup s in S operatorname Alph s nbsp For example the set a c o displaystyle langle a rangle langle c rangle langle o rangle nbsp is the alphabet of the string c a c a o displaystyle langle cacao rangle nbsp and the above D displaystyle D nbsp is the alphabet of the above language D D D displaystyle D cdot D cdot D nbsp as well as of the language of all decimal numbers String substitution editLet L be a language and let S be its alphabet A string substitution or simply a substitution is a mapping f that maps characters in S to languages possibly in a different alphabet Thus for example given a character a S one has f a La where La D is some language whose alphabet is D This mapping may be extended to strings as f e efor the empty string e and f sa f s f a for string s L and character a S String substitutions may be extended to entire languages as 1 f L s L f s displaystyle f L bigcup s in L f s nbsp Regular languages are closed under string substitution That is if each character in the alphabet of a regular language is substituted by another regular language the result is still a regular language 2 Similarly context free languages are closed under string substitution 3 note 1 A simple example is the conversion fuc to uppercase which may be defined e g as follows character mapped to language remarkx fuc x a A map lowercase char to corresponding uppercase char A A map uppercase char to itself ss SS no uppercase char available map to two char string 0 e map digit to empty string forbid punctuation map to empty language similar for other charsFor the extension of fuc to strings we have e g fuc Strasse S T R A SS E STRASSE fuc u2 U e U and fuc Go G O For the extension of fuc to languages we have e g fuc Strasse u2 Go STRASSE U STRASSE U String homomorphism editA string homomorphism often referred to simply as a homomorphism in formal language theory is a string substitution such that each character is replaced by a single string That is f a s displaystyle f a s nbsp where s displaystyle s nbsp is a string for each character a displaystyle a nbsp note 2 4 String homomorphisms are monoid morphisms on the free monoid preserving the empty string and the binary operation of string concatenation Given a language L displaystyle L nbsp the set f L displaystyle f L nbsp is called the homomorphic image of L displaystyle L nbsp The inverse homomorphic image of a string s displaystyle s nbsp is defined asf 1 s w f w s displaystyle f 1 s w f w s nbsp while the inverse homomorphic image of a language L displaystyle L nbsp is defined asf 1 L s f s L displaystyle f 1 L s f s in L nbsp In general f f 1 L L displaystyle f f 1 L neq L nbsp while one does havef f 1 L L displaystyle f f 1 L subseteq L nbsp andL f 1 f L displaystyle L subseteq f 1 f L nbsp for any language L displaystyle L nbsp The class of regular languages is closed under homomorphisms and inverse homomorphisms 5 Similarly the context free languages are closed under homomorphisms note 3 and inverse homomorphisms 6 A string homomorphism is said to be e free or e free if f a e displaystyle f a neq varepsilon nbsp for all a in the alphabet S displaystyle Sigma nbsp Simple single letter substitution ciphers are examples of e free string homomorphisms An example string homomorphism guc can also be obtained by defining similar to the above substitution guc a A guc 0 e but letting guc be undefined on punctuation chars Examples for inverse homomorphic images are guc 1 SSS sss sss sss since guc sss guc sss guc sss SSS and guc 1 A bb a since guc a A while bb cannot be reached by guc For the latter language guc guc 1 A bb guc a A A bb The homomorphism guc is not e free since it maps e g 0 to e A very simple string homomorphism example that maps each character to just a character is the conversion of an EBCDIC encoded string to ASCII String projection editIf s is a string and S displaystyle Sigma nbsp is an alphabet the string projection of s is the string that results by removing all characters that are not in S displaystyle Sigma nbsp It is written as p S s displaystyle pi Sigma s nbsp It is formally defined by removal of characters from the right hand side p S s e if s e the empty string p S t if s t a and a S p S t a if s t a and a S displaystyle pi Sigma s begin cases varepsilon amp mbox if s varepsilon mbox the empty string pi Sigma t amp mbox if s ta mbox and a notin Sigma pi Sigma t a amp mbox if s ta mbox and a in Sigma end cases nbsp Here e displaystyle varepsilon nbsp denotes the empty string The projection of a string is essentially the same as a projection in relational algebra String projection may be promoted to the projection of a language Given a formal language L its projection is given by p S L p S s s L displaystyle pi Sigma L pi Sigma s vert s in L nbsp citation needed Right and left quotient editThe right quotient of a character a from a string s is the truncation of the character a in the string s from the right hand side It is denoted as s a displaystyle s a nbsp If the string does not have a on the right hand side the result is the empty string Thus s a b s if a b e if a b displaystyle sa b begin cases s amp mbox if a b varepsilon amp mbox if a neq b end cases nbsp The quotient of the empty string may be taken e a e displaystyle varepsilon a varepsilon nbsp Similarly given a subset S M displaystyle S subset M nbsp of a monoid M displaystyle M nbsp one may define the quotient subset as S a s M s a S displaystyle S a s in M vert sa in S nbsp Left quotients may be defined similarly with operations taking place on the left of a string citation needed Hopcroft and Ullman 1979 define the quotient L1 L2 of the languages L1 and L2 over the same alphabet as L1 L2 s t L2 st L1 7 This is not a generalization of the above definition since for a string s and distinct characters a b Hopcroft s and Ullman s definition implies yielding rather than e The left quotient when defined similar to Hopcroft and Ullman 1979 of a singleton language L1 and an arbitrary language L2 is known as Brzozowski derivative if L2 is represented by a regular expression so can be the left quotient 8 Syntactic relation editThe right quotient of a subset S M displaystyle S subset M nbsp of a monoid M displaystyle M nbsp defines an equivalence relation called the right syntactic relation of S It is given by S s t M M S s S t displaystyle sim S s t in M times M vert S s S t nbsp The relation is clearly of finite index has a finite number of equivalence classes if and only if the family right quotients is finite that is if S m m M displaystyle S m vert m in M nbsp is finite In the case that M is the monoid of words over some alphabet S is then a regular language that is a language that can be recognized by a finite state automaton This is discussed in greater detail in the article on syntactic monoids citation needed Right cancellation editThe right cancellation of a character a from a string s is the removal of the first occurrence of the character a in the string s starting from the right hand side It is denoted as s a displaystyle s div a nbsp and is recursively defined as s a b s if a b s b a if a b displaystyle sa div b begin cases s amp mbox if a b s div b a amp mbox if a neq b end cases nbsp The empty string is always cancellable e a e displaystyle varepsilon div a varepsilon nbsp Clearly right cancellation and projection commute p S s a p S s a displaystyle pi Sigma s div a pi Sigma s div a nbsp citation needed Prefixes editThe prefixes of a string is the set of all prefixes to a string with respect to a given language Pref L s t s t u for t u Alph L displaystyle operatorname Pref L s t vert s tu mbox for t u in operatorname Alph L nbsp where s L displaystyle s in L nbsp The prefix closure of a language is Pref L s L Pref L s t s t u s L t u Alph L displaystyle operatorname Pref L bigcup s in L operatorname Pref L s left t vert s tu s in L t u in operatorname Alph L right nbsp Example L a b c then Pref L e a a b a b c displaystyle L left abc right mbox then operatorname Pref L left varepsilon a ab abc right nbsp A language is called prefix closed if Pref L L displaystyle operatorname Pref L L nbsp The prefix closure operator is idempotent Pref Pref L Pref L displaystyle operatorname Pref operatorname Pref L operatorname Pref L nbsp The prefix relation is a binary relation displaystyle sqsubseteq nbsp such that s t displaystyle s sqsubseteq t nbsp if and only if s Pref L t displaystyle s in operatorname Pref L t nbsp This relation is a particular example of a prefix order citation needed See also editComparison of programming languages string functions Levi s lemma String computer science definition and implementation of more basic operations on stringsNotes edit Although every regular language is also context free the previous theorem is not implied by the current one since the former yields a shaper result for regular languages Strictly formally a homomorphism yields a language consisting of just one string i e f a s displaystyle f a s nbsp This follows from the above mentioned closure under arbitrary substitutions References editHopcroft John E Ullman Jeffrey D 1979 Introduction to Automata Theory Languages and Computation Reading Massachusetts Addison Wesley Publishing ISBN 978 0 201 02988 8 Zbl 0426 68001 See chapter 3 Hopcroft Ullman 1979 Sect 3 2 p 60 Hopcroft Ullman 1979 Sect 3 2 Theorem 3 4 p 60 Hopcroft Ullman 1979 Sect 6 2 Theorem 6 2 p 131 Hopcroft Ullman 1979 Sect 3 2 p 60 61 Hopcroft Ullman 1979 Sect 3 2 Theorem 3 5 p 61 Hopcroft Ullman 1979 Sect 6 2 Theorem 6 3 p 132 Hopcroft Ullman 1979 Sect 3 2 p 62 Janusz A Brzozowski 1964 Derivatives of Regular Expressions J ACM 11 4 481 494 doi 10 1145 321239 321249 S2CID 14126942 Retrieved from https en wikipedia org w index php title String operations amp oldid 1184535743 String homomorphism, wikipedia, wiki, book, books, library,

article

, read, download, free, free download, mp3, video, mp4, 3gp, jpg, jpeg, gif, png, picture, music, song, movie, book, game, games.