Algorithm For String Matching Objectives Computer Science Essay

Published: November 9, 2015

String matching is a very significant subject in the wide domain of text processing. The problem has received a great deal of attention because of its many applications. String matching algorithms play a key role in many computer science problems and in the implementation of computer software. String matching algorithms work as follows.

Let the text have length n and the pattern have length m. First align the left ends of the pattern and the text, then compare the text characters with the pattern characters. After a mismatch, or after a complete match of the pattern, shift the pattern to the right. Repeat the same procedure until the pattern reaches the right end of the text.
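The procedure just described can be sketched in its simplest form, where the pattern is always shifted by exactly one position. This is only an illustrative brute-force version; the function name naive_search is an assumption made for this example and does not come from any of the works reviewed here.

```python
def naive_search(text, pattern):
    """Return every index at which pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    matches = []
    # Try every alignment of the pattern against the text, left to right.
    for shift in range(n - m + 1):
        j = 0
        # Compare characters until a mismatch or a complete match.
        while j < m and text[shift + j] == pattern[j]:
            j += 1
        if j == m:
            matches.append(shift)
        # In both cases the pattern is then shifted right by one position.
    return matches
```

In the worst case this version performs on the order of n * m character comparisons. The algorithms reviewed later try to shift the pattern by more than one position at a time.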

Performing this type of search quickly is important for many applications, not only in natural language processing. Spelling correctors, Optical Character Recognition applications and syntactic parsers, among others, all rely on quick approximate matching of some string against a pattern of strings. The intent of this research is to study the theory of various string matching algorithms that use finite automata, and then to analyse their performance on the basis of different parameters (pre-processing time complexity, searching time complexity, space complexity, etc.).
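As a rough illustration of what string matching with a finite automaton looks like, the sketch below builds a deterministic automaton for a pattern and then scans the text once. It is a textbook-style construction chosen for clarity, not the method of any particular paper reviewed here. The names build_dfa and dfa_search are assumptions made for this example.

```python
def build_dfa(pattern):
    """Build the transition table of the matching automaton for pattern.

    State q means "the last q characters read equal pattern[:q]".
    This simple construction is slow (roughly m^3 times the alphabet size).
    """
    m = len(pattern)
    alphabet = set(pattern)
    dfa = [dict() for _ in range(m + 1)]
    for q in range(m + 1):
        for c in alphabet:
            # Largest k such that pattern[:k] is a suffix of pattern[:q] + c.
            k = min(m, q + 1)
            while k > 0 and not (pattern[:q] + c).endswith(pattern[:k]):
                k -= 1
            dfa[q][c] = k
    return dfa


def dfa_search(text, pattern):
    """Return every index at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    dfa = build_dfa(pattern)
    state = 0
    matches = []
    for i, c in enumerate(text):
        # Characters that do not occur in the pattern reset the automaton.
        state = dfa[state].get(c, 0)
        if state == m:
            matches.append(i - m + 1)
    return matches
```

The sketch makes the three parameters above concrete: the pre-processing cost is the time spent in build_dfa, the searching cost is one table lookup per text character, and the space cost is the size of the transition table.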

4. Work done (discussed below)

4.1. Literature review (ongoing)

4.2. Publications

4.3. Course work

4.1. Literature review

Books

ANSI C - Balagurusamy, TMH Publication, August 2010

Mastering C - Venugopal Prasad, TMH Publication, 2006

Data structure string Using C - Rizwan Khan, ACME Learning, 2010

E-book on Design Algorithm and Analysis - Rashid

Journals

Source: ACM Journal 2010 - String Matching Problem: A review of the most recent result

Authors: SIMONE FARO and THIERRY LECROQ

The paper reviews the string matching algorithms proposed in the last decade and presents experimental results.

The authors compared 85 string matching algorithms on 12 texts of different types. However, each algorithm is efficient only in certain situations.

Source: Springer Journal 2004 - Factor Automata of Automata and Applications.

Authors: MEHRYAR MOHRI, PEDRO MORENO, AND EUGENE WEINSTEIN

This paper presents a novel analysis of the size of the factor automaton of an automaton in terms of the size of the original automaton, and describes the use of factor automata in a large-scale application. However, the analysis shows that factor automata are practical only for a large number of strings.

Source: Springer Journal 1999 - Factor oracle: a new structure for pattern matching

Authors: ALLAUZEN, C., CROCHEMORE, M., AND RAFFINOT, M.

This paper introduces a new automaton on a word (a sequence of letters taken from an alphabet Σ), which the authors call the factor oracle. This automaton is acyclic, recognizes at least the factors of the pattern p, has m+1 states and has a linear number of transitions.

The factor oracle allows new string matching algorithms. It also gives an idea of the average memory space required by such algorithms.

String matching based on the factor oracle is among the slower approaches: scanning takes longer, which increases the running time.

Among automata with m+1 states, the factor oracle is not minimal in its number of transitions. The saving offered by a reduced automaton is small, however, and does not help to reduce the complexity.
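The properties listed for the factor oracle (m+1 states, a linear number of transitions, recognition of at least the factors of p) can be checked on a small example. The sketch below follows the usual on-line construction with a supply (failure) link per state. The function names build_factor_oracle and oracle_accepts are assumptions made for this illustration.

```python
def build_factor_oracle(pattern):
    """Return the transitions of the factor oracle of pattern.

    trans[q] maps a character to the target state; states are 0..m.
    """
    m = len(pattern)
    trans = [dict() for _ in range(m + 1)]
    supply = [-1] * (m + 1)  # supply link of each state; -1 for state 0
    for i in range(1, m + 1):
        c = pattern[i - 1]
        trans[i - 1][c] = i  # "spine" transition from state i-1 to state i
        k = supply[i - 1]
        # Add shortcut transitions along the chain of supply links.
        while k > -1 and c not in trans[k]:
            trans[k][c] = i
            k = supply[k]
        supply[i] = 0 if k == -1 else trans[k][c]
    return trans


def oracle_accepts(trans, word):
    """Return True if word can be read from state 0 (all states accept)."""
    state = 0
    for c in word:
        if c not in trans[state]:
            return False
        state = trans[state][c]
    return True
```

For the pattern abbbaab this construction yields 8 states and 11 transitions. Every factor of the pattern is accepted, and so is the word abba, which is not a factor. This is why the oracle is said to recognize at least the factors of p.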

Source: ACM Journal 1990 - A very fast substring search algorithm.

Author: SUNDAY D.M

Sunday proposed a new algorithm for string matching, Quick Search, which is reported to be faster than the Boyer-Moore algorithm and is considered one of the fastest algorithms in the string matching field. Its preprocessing phase takes O(m + σ) time and O(σ) space, where σ is the alphabet size, and its search phase takes O(mn) time in the worst case. In the way it detects matches between two strings, Quick Search looks similar to the Boyer-Moore algorithm. The difference is that Quick Search uses only the bad-character shift table, while Boyer-Moore uses both the bad-character and the good-suffix shift tables. Moreover, Quick Search compares characters from the left-most character to the right.

The Sunday algorithm becomes impractical when the alphabet is large, because it relies mainly on a shift table indexed by character.

When the algorithm is run in parallel, the execution time and speedup of Quick Search improve considerably compared with the sequential mode. On the other hand, the efficiency of the processors decreases.
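A minimal sketch of Quick Search is given below to show how the single bad-character table drives the shifts. The shift is decided by the text character just to the right of the current window. The function name quick_search and the use of a dictionary for the table are assumptions made for this example. A dictionary stores only the characters that occur in the pattern, which sidesteps the large-alphabet table size at some cost per lookup.

```python
def quick_search(text, pattern):
    """Return every index at which pattern occurs in text (Quick Search)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Bad-character table: distance from the rightmost occurrence of each
    # character to the position just past the end of the pattern.
    shift = {c: m - i for i, c in enumerate(pattern)}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        if pos + m >= n:
            break  # no character to the right of the window
        # Characters absent from the pattern allow the maximal shift m + 1.
        pos += shift.get(text[pos + m], m + 1)
    return matches
```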

Source: Software - Practice and Experience, Vol. 10, pp. 501-506, 1980 - Practical Fast Searching in Strings

Author: R. NIGEL HORSPOOL

The problem is that of searching a large block of text to find the first occurrence of a substring (which we will call the 'pattern').

The purpose of this paper is to demonstrate the Boyer-Moore algorithm and to show the circumstances under which it should be employed.

Many computers, particularly the larger machines, possess instructions to search for individual characters within main memory. One might think that these instructions would permit coding of routines that could beat the Boyer and Moore algorithm.

There is one important factor that was not considered in their experiments. The timings do not include the work of initializing tables - on the assumption that they want to find the limiting speed of each algorithm (i.e. the speed when searching an infinite volume of text).
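To make the role of table initialization concrete, the sketch below shows the simplified Boyer-Moore variant commonly known as the Boyer-Moore-Horspool algorithm. It uses only a bad-character table keyed on the text character aligned with the last pattern position. The function name horspool_search is an assumption made for this example, and the window comparison uses a plain slice for brevity.

```python
def horspool_search(text, pattern):
    """Return every index at which pattern occurs in text (Horspool)."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Table initialization: for every character of the pattern except the
    # last one, the distance from its rightmost occurrence to the end.
    shift = {c: m - 1 - i for i, c in enumerate(pattern[:-1])}
    matches = []
    pos = 0
    while pos <= n - m:
        if text[pos:pos + m] == pattern:
            matches.append(pos)
        # Shift according to the text character under the last position;
        # characters not in the table allow a full shift of m.
        pos += shift.get(text[pos + m - 1], m)
    return matches
```

Building the shift table takes time proportional to the pattern length (plus the alphabet size if an array is used instead of a dictionary). For a short text this cost can outweigh the time saved during the search.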

Source: SIAM Journal on Computing 1977 - Fast Pattern Matching in Strings

Authors: Donald E. Knuth, James H. Morris, Jr. and Vaughan R. Pratt

An algorithm is presented which finds all occurrences of one given string within another, in running time proportional to the sum of the lengths of the strings.

The constant of proportionality is low enough to make this algorithm of practical use, and the procedure can also be extended to deal with some more general pattern-matching problems.

However, the running time is highest when the pattern occurs only at the end of the text string, since the whole text must then be scanned.

When the alphabet is large and partial matches are rare, the algorithm gains little over the naive approach, so the time spent on preprocessing is largely wasted.
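The linear running time comes from a failure table computed from the pattern alone, which tells the search how far to fall back after a partial match without ever moving backwards in the text. The sketch below is a common formulation of this idea. The function names kmp_failure and kmp_search are assumptions made for this example.

```python
def kmp_failure(pattern):
    """fail[i] = length of the longest proper prefix of pattern[:i+1]
    that is also a suffix of it."""
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k > 0 and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail


def kmp_search(text, pattern):
    """Return every index at which pattern occurs in text."""
    if not pattern:
        return []
    fail = kmp_failure(pattern)
    matches = []
    k = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        # On a mismatch, fall back in the pattern; the text index never
        # moves backwards.
        while k > 0 and c != pattern[k]:
            k = fail[k - 1]
        if c == pattern[k]:
            k += 1
        if k == len(pattern):
            matches.append(i - k + 1)
            k = fail[k - 1]  # continue, allowing overlapping matches
    return matches
```

If partial matches are rare, k is almost always 0 and the failure table is hardly consulted, which is the situation described above.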

Source: ACM Journal 1977 - A Fast String Searching Algorithm

Author: Robert S Boyer, Stanford Research Institute, J Strother Moore, Xerox Palo Alto Research Center

They present a search algorithm which is usually "sublinear". Furthermore, the algorithm is sublinear in another sense: it has been implemented so that on average it requires the execution of fewer than i + patlen machine instructions per search, where i is the position at which the pattern is found and patlen is the length of the pattern.

However, there are several situations in which it may not be advisable to use this algorithm:

If the expected penetration i at which the pattern is found is small, the pre-processing time is significant, and one might therefore consider using the obvious intuitive algorithm.

The algorithm typically skips through the string in steps larger than 1, and it may back up through the string. Unless these processes are coded efficiently, it is probably not worthwhile to use this algorithm.

Furthermore, because the algorithm can back up through the string, it is possible to cross a page boundary more than once.

A final situation in which it is inadvisable to use this algorithm is when the string matching problem to be solved is actually more complicated than merely finding the first occurrence of a single substring.

4.2. Publications

Jamuna Bhandari, Anil Kumar, Priyanka Dahiya, "Review Paper of String Matching Using Finite Automata", National Conference on Advance in Computing Communication Networks & Electrical System, UIET, Maharshi Dayanand University, Rohtak, Haryana, India, March 2012.

4.3. Course work

Course work on 'Research Methodology' was successfully completed between 14th May 2012 and 7th July 2012.

This is the work I have completed so far, in the first six months of my research.

5. Work left

More literature review needs to be done.

Paper publication.

Three seminars.

Implementation of existing algorithms.

Design of a new algorithm.

6. Schedule of work/deadlines

Timeline columns: years 2012, 2013 and 2014; months 3, 6, 9, 12, 15, 18, 21, 24.

No.  Course of Research                             Periods marked
1    Literature review                              * * * * *
2    Formulation of research objectives
3    Completion of course work/Seminars             * * *
4    Analysis and implementations of algorithms     * * *
5    Submission of Research Proposal to AICTE/DST   *
6    Designing of new algorithm                     * * *
7    Conclusion and implication                     *
8    Thesis writing                                 * *
9    Proof reading corrections                      *
10   Communication for publication of papers        * * *