Regular Expressions in Java and Perl

The two main classes of the java.util.regex API are Pattern and Matcher. In the tutorial, you start by creating a Java file, RegexTestHarness.java, that reads regular expressions from the file regex.txt. Each regular expression read from regex.txt is compiled into a pattern with the compile method of the Pattern class; the matcher method of the same class then returns a Matcher, whose find method locates the parts of the input that match the regular expression.
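
The compile, matcher, find sequence can be sketched as follows. This is a minimal illustration, not the tutorial's RegexTestHarness.java: the class name RegexSketch, the helper findAll, and the sample regex and input are assumptions made up for the example, and the expression is passed in as a string rather than read from regex.txt.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name; not part of the tutorial.
class RegexSketch {

    // Returns every substring of input that matches regex, in order of appearance.
    static List<String> findAll(String regex, String input) {
        Pattern pattern = Pattern.compile(regex);  // compile the expression once
        Matcher matcher = pattern.matcher(input);  // bind the pattern to the input text
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {                   // advance to the next match, if any
            matches.add(matcher.group());          // text of the current match
        }
        return matches;
    }

    public static void main(String[] args) {
        // Sample values chosen for illustration only.
        System.out.println(findAll("fo+", "foo fooo bar fo"));  // prints [foo, fooo, fo]
    }
}
```

Compiling the pattern separately from matching means the same Pattern object can be reused against many inputs without recompiling the expression.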

You should already be acquainted with basic string processing in Java, i.e., with the methods of the String class. If you aren’t, have a look at the online Java documentation or at Chapter 5 of Oliver Mason’s book, which also contains a gentle introduction to Java if you don’t know it. In this lab we’ll be mainly concerned with the StringTokenizer class.

As discussed in class, tokenization is one of the fundamental tasks in NLE: extracting tokens from the input text. The definition of ‘token’ depends on the application, but in most cases complete words count as tokens; sometimes punctuation marks do as well. Finite state methods are typically used for tokenization because of their efficiency. In Java, the methods of the class StringTokenizer can be used for a very basic form of tokenization. The Java documentation for the StringTokenizer class at java.sun.com gives a short example of this, together with an example of the use of split.
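
A minimal sketch of this basic form of tokenization, first with StringTokenizer and then with the split method of String. The class name TokenizeSketch, the two helper methods, and the sample sentence are assumptions chosen for illustration; the sketch is modeled on the style of example found in the StringTokenizer documentation rather than copied from this lab.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical class name; not part of the lab materials.
class TokenizeSketch {

    // Tokenizes with StringTokenizer using its default delimiters
    // (space, tab, newline, carriage return, form feed).
    static List<String> tokenize(String text) {
        StringTokenizer st = new StringTokenizer(text);
        List<String> tokens = new ArrayList<>();
        while (st.hasMoreTokens()) {
            tokens.add(st.nextToken());
        }
        return tokens;
    }

    // Tokenizes with String.split, treating any run of whitespace as one separator.
    static List<String> splitOnWhitespace(String text) {
        return Arrays.asList(text.split("\\s+"));
    }

    public static void main(String[] args) {
        String sentence = "this is a test";  // sample input, for illustration only
        System.out.println(tokenize(sentence));           // prints [this, is, a, test]
        System.out.println(splitOnWhitespace(sentence));  // prints [this, is, a, test]
    }
}
```

Both versions split only on whitespace, so punctuation stays attached to the neighboring word; a tokenizer that treats punctuation as separate tokens needs a more specific delimiter set or regular expression.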
