Skip to main content

Java - Regular Expressions


Java provides the java.util.regex package for pattern matching with regular expressions. Java regular expressions are very similar to the Perl programming language and very easy to learn.
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings, using a specialized syntax held in a pattern. They can be used to search, edit, or manipulate text and data.
The java.util.regex package primarily consists of the following three classes:
  • Pattern Class: A Pattern object is a compiled representation of a regular expression. The Pattern class provides no public constructors. To create a pattern, you must first invoke one of its public static compile methods, which will then return a Pattern object. These methods accept a regular expression as the first argument.
  • Matcher Class: A Matcher object is the engine that interprets the pattern and performs match operations against an input string. Like the Pattern class, Matcher defines no public constructors. You obtain a Matcher object by invoking the matcher method on a Pattern object.
  • PatternSyntaxException: A PatternSyntaxException object is an unchecked exception that indicates a syntax error in a regular expression pattern.

Capturing Groups:

Capturing groups are a way to treat multiple characters as a single unit. They are created by placing the characters to be grouped inside a set of parentheses. For example, the regular expression (dog) creates a single group containing the letters "d", "o", and "g".
Capturing groups are numbered by counting their opening parentheses from left to right. In the expression ((A)(B(C))), for example, there are four such groups:
  • ((A)(B(C)))
  • (A)
  • (B(C))
  • (C)
To find out how many groups are present in the expression, call the groupCount method on a matcher object. The groupCount method returns an int showing the number of capturing groups present in the matcher's pattern.
There is also a special group, group 0, which always represents the entire expression. This group is not included in the total reported by groupCount.

Example:

Following example illustrate how to find a digit string from the given alphanumeric string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main( String args[] ){

      // String to be scanned to find the pattern.
      String line = "This order was places for QT3000! OK?";
      String pattern = "(.*)(\\d+)(.*)";

      // Create a Pattern object
      Pattern r = Pattern.compile(pattern);

      // Now create matcher object.
      Matcher m = r.matcher(line);
      if (m.find( )) {
         System.out.println("Found value: " + m.group(0) );
         System.out.println("Found value: " + m.group(1) );
         System.out.println("Found value: " + m.group(2) );
      } else {
         System.out.println("NO MATCH");
      }
   }
}
This would produce following result:
Found value: This order was places for QT3000! OK?
Found value: This order was places for QT300
Found value: 0

Regular Expression Syntax:

Here is the table listing down all the regular expression metacharacter syntax available in Java:
SubexpressionMatches
^Matches beginning of line.
$Matches end of line.
.Matches any single character except newline. Using m option allows it to match newline as well.
[...]Matches any single character in brackets.
[^...]Matches any single character not in brackets
\ABeginning of entire string
\zEnd of entire string
\ZEnd of entire string except allowable final line terminator.
re*Matches 0 or more occurrences of preceding expression.
re+Matches 1 or more of the previous thing
re?Matches 0 or 1 occurrence of preceding expression.
re{ n}Matches exactly n number of occurrences of preceding expression.
re{ n,}Matches n or more occurrences of preceding expression.
re{ n, m}Matches at least n and at most m occurrences of preceding expression.
a| bMatches either a or b.
(re)Groups regular expressions and remembers matched text.
(?: re)Groups regular expressions without remembering matched text.
(?> re)Matches independent pattern without backtracking.
\wMatches word characters.
\WMatches nonword characters.
\sMatches whitespace. Equivalent to [\t\n\r\f].
\SMatches nonwhitespace.
\dMatches digits. Equivalent to [0-9].
\DMatches nondigits.
\AMatches beginning of string.
\ZMatches end of string. If a newline exists, it matches just before newline.
\zMatches end of string.
\GMatches point where last match finished.
\nBack-reference to capture group number "n"
\bMatches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
\BMatches nonword boundaries.
\n, \t, etc.Matches newlines, carriage returns, tabs, etc.
\QEscape (quote) all characters up to \E
\EEnds quoting begun with \Q

Methods of the Matcher Class:

Here is the lists of useful instance methods:

Index Methods:

Index methods provide useful index values that show precisely where the match was found in the input string:
SNMethods with Description
1public int start() 
Returns the start index of the previous match.
2public int start(int group)
Returns the start index of the subsequence captured by the given group during the previous match operation.
3public int end()
Returns the offset after the last character matched.
4public int end(int group)
Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.

Study Methods:

Study methods review the input string and return a boolean indicating whether or not the pattern is found:
SNMethods with Description
1public boolean lookingAt() 
Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
2public boolean find() 
Attempts to find the next subsequence of the input sequence that matches the pattern.
3public boolean find(int start
Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
4public boolean matches() 
Attempts to match the entire region against the pattern.

Replacement Methods:

Replacement methods are useful methods for replacing text in an input string:
SNMethods with Description
1public Matcher appendReplacement(StringBuffer sb, String replacement)
Implements a non-terminal append-and-replace step.
2public StringBuffer appendTail(StringBuffer sb)
Implements a terminal append-and-replace step.
3public String replaceAll(String replacement) 
Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
4public String replaceFirst(String replacement)
Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
5public static String quoteReplacement(String s)
Returns a literal replacement String for the specified String. This method produces a String that will work as a literal replacement s in the appendReplacement method of the Matcher class.

The start and end Methods:

Following is the example that counts the number of times the word "cats" appears in the input string:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "\\bcat\\b";
    private static final String INPUT =
                                    "cat cat cat cattie cat";

    public static void main( String args[] ){
       Pattern p = Pattern.compile(REGEX);
       Matcher m = p.matcher(INPUT); // get a matcher object
       int count = 0;

       while(m.find()) {
         count++;
         System.out.println("Match number "+count);
         System.out.println("start(): "+m.start());
         System.out.println("end(): "+m.end());
      }
   }
}
This would produce following result:
Match number 1
start(): 0
end(): 3
Match number 2
start(): 4
end(): 7
Match number 3
start(): 8
end(): 11
Match number 4
start(): 19
end(): 22
You can see that this example uses word boundaries to ensure that the letters "c" "a" "t" are not merely a substring in a longer word. It also gives some useful information about where in the input string the match has occurred.
The start method returns the start index of the subsequence captured by the given group during the previous match operation, and end returns the index of the last character matched, plus one.

The matches and lookingAt Methods:

The matches and lookingAt methods both attempt to match an input sequence against a pattern. The difference, however, is that matches requires the entire input sequence to be matched, while lookingAt does not.
Both methods always start at the beginning of the input string. Here is the example explaining the functionality:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;

    public static void main( String args[] ){
       pattern = Pattern.compile(REGEX);
       matcher = pattern.matcher(INPUT);

       System.out.println("Current REGEX is: "+REGEX);
       System.out.println("Current INPUT is: "+INPUT);

       System.out.println("lookingAt(): "+matcher.lookingAt());
       System.out.println("matches(): "+matcher.matches());
   }
}
This would produce following result:
Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
lookingAt(): true
matches(): false

The replaceFirst and replaceAll Methods:

The replaceFirst and replaceAll methods replace text that matches a given regular expression. As their names indicate, replaceFirst replaces the first occurrence, and replaceAll replaces all occurences.
Here is the example explaining the functionality:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " +
                                    "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
       Pattern p = Pattern.compile(REGEX);
       // get a matcher object
       Matcher m = p.matcher(INPUT); 
       INPUT = m.replaceAll(REPLACE);
       System.out.println(INPUT);
   }
}
This would produce following result:
The cat says meow. All cats say meow.

The appendReplacement and appendTail Methods:

The Matcher class also provides appendReplacement and appendTail methods for text replacement.
Here is the example explaining the functionality:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
   private static String REGEX = "a*b";
   private static String INPUT = "aabfooaabfooabfoob";
   private static String REPLACE = "-";
   public static void main(String[] args) {
      Pattern p = Pattern.compile(REGEX);
      // get a matcher object
      Matcher m = p.matcher(INPUT);
      StringBuffer sb = new StringBuffer();
      while(m.find()){
         m.appendReplacement(sb,REPLACE);
      }
      m.appendTail(sb);
      System.out.println(sb.toString());
   }
}
This would produce following result:
-foo-foo-foo-

PatternSyntaxException Class Methods:

A PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular expression pattern. The PatternSyntaxException class provides the following methods to help you determine what went wrong:
SNMethods with Description
1public String getDescription()
Retrieves the description of the error.
2public int getIndex() 
Retrieves the error index.
3public String getPattern() 
Retrieves the erroneous regular expression pattern.
4public String getMessage() 
Returns a multi-line string containing the description of the syntax error and its index, the erroneous regular expression pattern, and a visual indication of the error index within the pattern.

Popular posts from this blog

ORA-02051 Another Session Or Branch In Same Transaction Failed

ORA-02051 Another Session Or Branch In Same Transaction Failed (Doc ID 2253226.1)          SYMPTOMS for ORA-02051 Another Session Or Branch In Same Transaction Failed. Database performance is slow and caused   the transactions ORA-02051 another session or branch in same transaction failed or finalized CAUSE for ORA-02051 Another Session Or Branch In Same Transaction Failed. Session transactions branches caused the issue Excessive Waits On The Event "Global transaction acquire instance locks" SOLUTION Please use below sql and identified underscore parameter values for ORA-02051 Another Session Or Branch In Same Transaction Failed : SQL> select a.ksppinm "Parameter", b.ksppstvl "Session Value",c.ksppstvl "Instance Value"  FROM x$ksppi a,x$ksppcv b, x$ksppsv c  WHERE a.indx = b.indx AND a.indx = c.indx AND a.ksppinm LIKE '/_%' escape '/'  AND (a.ksppinm like '%clusterwide_global%' or a.ksppinm like '%disable_autotune_...

Video Conferencing Project in Java Source Code

Video Conferencing Project in Java Source Code     ################################################################################# FEATURE ################################################################################# 1.Multi Chat(Used Threadpole) 2.P2P Chat 3.P2P Audio Chat 4.P2P Video Chat 5.Complete Automated 6.H.263 compression Video 7.raw audio PREREQUISITE: 1. JUST INSTALL jmf-2.1.1 e @@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ *****/  just need to copy the client side code and run it to a pc   not need any manual IP ***/  How to run : Just run server side code in a PC and then Run Client side code to different  PC.Then the work is done. Server Side Code: ClientListener.java Clients.java Main.java MessageListener.java ServerConstant.java ServerManager.java ServerMonitor.java ServerStatusListener.java   ClientListener.java /*  * To change this template, choose Tools | Templates  * and open the tem...

DBA_SCHEDULER_JOB_RUN_DETAILS and PURGE_LOG

How to purge DBA_SCHEDULER_JOB_RUN_DETAILS? Manually deleting from DBA_SCHEDULER_JOB_RUN_DETAILS is not recommended by oracle.DBA_SCHEDULER_JOB_RUN_DETAILS is a view that is using two master tables (scheduler$_job_run_details and scheduler$_event_log) and display the information about jobs history. As there is one procedure named PURGE_LOG and Oracle have Scheduler for this procedure. It will purges all rows in the job log that are older than 30 days.This is the default behavior of this procedure. You can change this to any number of days you want by setting the attribute "SET_SCHEDULER_ATTRIBUTE". e.g. exec DBMS_SCHEDULER.SET_SCHEDULER_ATTRIBUTE('log_history','15'); It will purge all logs older than 15days and it will maintain the history of 15days. But If you want manually purge these logs, you can use below solution:- exec DBMS_SCHEDULER.PURGE_LOG(log_history => 15, which_log => 'JOB_LOG'); It will purge all entries from the jog log that are o...

JAX-WS Hello World Example – RPC Style

JAX-WS Hello World Example – RPC Style AX-WS is bundled with JDK 1.6, which makes Java web service development easier to develop. This tutorial shows you how to do the following tasks: Create a SOAP-based RPC style web service endpoint by using JAX-WS. Create a Java web service client manually. Create a Java web service client via  wsimport  tool. Create a Ruby web service client. You will be surprise of how simple it is to develop a RPC style web service in JAX-WS. Note In general words, “ web service endpoint ” is a service which published outside for user to access; where “ web service client ” is the party who access the published service. JAX-WS Web Service End Point The following steps showing how to use JAX-WS to create a RPC style web service endpoint. 1. Create a Web Service Endpoint Interface File : HelloWorld.java package com.mkyong.ws ;   import javax.jws.WebMethod ; import javax.jws.WebService ; import javax.jws.soap.SOA...

Oracle character AL32UTF8

The character set determines what languages can be represented in the database. Oracle recommends using Unicode (AL32UTF8) as the database character set. AL32UTF8 is Oracle's name for the UTF-8 encoding of the Unicode standard. The Unicode standard is the universal character set that supports most of the currently spoken languages of the world. The use of the Unicode standard is indispensable for any multilingual technology, including database processing. Changing the database character set is a time consuming and complex project. Therefore, it is very important to select the right character set at installation time. If the language is American English or a Western European language, then the default character set is WE8MSWIN1252. Each Microsoft Windows ANSI Code Page can store data from only one language or a limited group of languages, such as only Western European, or only Eastern European, or only Japanese. AL32UTF8 is a multibyte character set, database operations on character...

ORA-02291: integrity constraint violated - parent key not found

“Error: ORA-02291: integrity constraint violated - parent key not found” Reason:    A Primary key does not have the same value as the foreign key. We will discuss it in detail later in this article. Action:   For  ORA-02291: integrity constraint violated - parent key not found  You may either delete the foreign key or the matching primary key can be added. In either way, you may try to get this error corrected. Let's understand more about ORA-02291: integrity constraint violated - parent key not found? The Oracle software brought us the strength by which multiple tables in the database can pass on information so efficiently. Not only this, there are numerous devices in this software which enables access to and sourcing data from multiple tables. You can easily execute complicated database issues without an unusual uncertainty by creating statements with the fantastic characteristic of this software. Realistically, if we talk about user or database no one is perf...