Regular Expressions in Oracle Database 10g

Oracle Database 10g supports regular expression in search and replace. It provides a set of new functions that provides regular expression capability. Oracle10g's implementation of regular expression complies with Portable Operating System for Unix (POSIX) standards.

Regular expression is a pattern that contains special characters, which specify what exactly we are looking for. Let us now understand some of the limitations we have with LIKE operator and INSTR function.

Let us assume we have to search for a name that starts with either N, M or P. The following LIKE operator, though lengthy, can do it.

select * from employees where first_name like 'N%' or first_name like 'M%' or first_name like 'P%';

But what if we want to get all names that start with any uppercase letter and contain only 5 letters. It is possible but it is not an easy condition (not at least for beginners). This is the purpose regular expressions can serve. They allow you to make your searching more precise. The following example searches for employees whose first name is starting with uppercase letter and contain only 5 letters.

select first_name from employees where regexp_like ( first_name, '^[:aplha:]{4}$');

Special characters in regular expression

The above example used a few special characters to specify what exactly we search for. Here is the list of special characters in regular expression.

Character	Description
*	Zero or more occurrences of the previous character.
+	One or more occurrences of the previous character.
?	Zero or one occurrences of the previous character.
.	Any character matches this position.
[]	Character must be any character in the brackets. Even a range of characters can be given. For Example, [A-Z] means any uppercase letter is a match.
[^]	Any character other than the characters given in the brackets is a match. Reverse of [].
^	Matches beginning of the string. For example, ^A means the string must start with A.
$	Matches end of the string. For example, A$ means string must end with A.
\|	Alternate operator. Either of the expressions must be a match. expr1\|expr2 means either expr1 or expr2 must be a match.
{n}	Previous expression must be repeated for exactly n times. For example, [0-9]{3} means three digits.
{n,m}	Previous expression must be repeated from n to m times. For example, [0-9]{3,5} means we may have 3 to 5 digits.
{n,}	Previous expression must be repeated for at least n times.
()	Groups the expression. The entire expression enclosed is treated as one expression.

The character class [:alpha:] matches any alphabet. The other characters classes available are given below:

Character Class	Description
[:alnum:]	Alphanumeric characters
[:alpha:]	Any alphabet either upper or lower case.
[:cntrl:]	Any control character. A non-printable character is called as control character.
[:digit:]	Any digit.
[:lower:]	Any lower case letter.
[:print:]	Any printable character.
[:punct:]	Any punctuation character.
[:space:]	All space characters.
[:upper:]	Any upper case letter.

New Functions

The following are the new functions and what they do. All these function extend the functionality of some existing function/operator with the capability of supporting regular expression.

Function	Description
REG_LIKE	Same as LIKE operator but supports regular expression. Allows search based on regular expression.
REG_INSTR	Same as INSTR function but supports regular expression. Allows you to locate a string in another string more precisely using regular expression.
REG_SUBSTR	Same as SUBSTR function but supports regular expression. Allows you to extract a part of another string using regular expression.
REG_REPLACE	Same as REPLACE function but supports regular expression. Allows you to search for source string using regular expression and replaces it with target string

Examples

Let us now explore these functions and see how they can be used. We start with REGEXP_LIKE function, which is extended version of LIKE operator.

select job_title from jobs where regexp_like ( job_title, '^S.*er$');

The above example displays job titles that start with S and end with er. It may contain anything in between these two.

If you want to modify the way REGEXP_LIKE compares characters then third parameter, which contains either 'c' for case sensitive or 'i' for ignore case, can be given as shown below.

select job_title from jobs where regexp_like ( job_title, '^S.*er$','i')

Now let us see how REGEXP_SUBSTR is used to extract a substring based on regular expression.

select regexp_substr('Oracle Database 10g is first grid aware database','[0-9]+') version from dual;

The above query displays 10 as it is consisting of one or more digits. The following example looks for a number followed by a character - displays 10g.

select regexp_substr('Oracle Database 10g is first grid aware database','[0-9]+[a-z]') version from dual;

The following query displays the starting position of one or more digits.

select regexp_instr('Oracle Database 10g is first grid aware database','[0-9]+') position from dual;

The following query returns the positon of first non-alphabet in the given string.

select regexp_instr('Abc123 xyz123','[^[:alpha:]]') from dual;

The following query places a space between Oracle its version using REGEXP_REPLACE function. For example, Oracle9i will become Oracle 9i, Oracle10g will become Oracle 10g. We are looking for a series of alphabets and take them as group 1. Then we are looking for a group of digits followed by any character and treat it as group 2. Then we replace the original with \1 (group 1) a space and \2 (group 2).

select regexp_replace('Oracle10g','([[:alpha:]]+)([[:digit:]]+.)','\1 \2') from dual;

Regular expressions in Oracle10g will make searching and replacing more precise. Certain searches that were not possible with simple LIKE operator are now possible with the introduction of regular expressions.