“Honest differences
are often a healthy sign of progress.” Mahatma Gandhi |
SQL Section:
Regular Expression
Introduction
It is a method for simple
and complex patterns for searching and manipulating a text. You can
search, extract, format, and manipulate a text in the database. At the
beginning, it appears that the syntax is not very intuitive but by
second look, it may look easy. The technique more reflects as UNIX
style regular expressions.
Interfaces: Oracle
Regular Expressions are implemented by the following functions
available in SQL and PL/SQL.
· REGEXP_LIKE,
· REGEXP_REPLACE,
· REGEXP_INSTR,
and
· REGEXP_SUBSTR
Meta-characters:
The following is a list of supported Oracle Meta-characters use in
Oracle Regular Expressions.
Syntax |
Description |
Classification |
|
Match any character |
Dot |
a? |
Match ‘a’ zero or
one time |
Quantifier |
a* |
Match ‘a’ zero or
more time |
Quantifier |
a+ |
Match ‘a’ one or
more time |
Quantifier |
a|b |
Match either ‘a’
or ‘b’ |
Alternation |
a{m} |
Match ‘a’ exactly
m times |
Quantifier |
a{m,} |
Match ‘a’ at
least m times |
Quantifier |
a{m,n} |
Match ‘a’ between
m and n times |
Quantifier |
[abc] |
Match either ‘a’
or ‘b’ or ‘c’ |
Bracket Expression |
(…) |
Group an expression |
Sub-expression |
|
Match nth
sub-expression |
Back-reference |
[:cc:] |
Match character class
in bracket expression |
Character Class |
[.ce.] |
Match collation
element in bracket expression |
Collation Element |
[=ec=] |
Match equivalence
class in bracket expression |
Equivalence Class |
NOTE:
It is important to realize
that the results of a particular regular expression query could be
different under a different locale such as changing NLS_SORT, Case and
Accent Sensitivity, etc.
Character Classes:
They are sensitive to the underlying character set such as the
[:lower:] character class.
The following is a list of
Oracle supports character classes, based on character class
definitions in NLS classification data:
Character Class
Syntax |
Meaning |
[:alnum:] |
All alphanumeric
characters |
[:alpha:] |
All alphabetic
characters |
[:blank:] |
All blank space
characters. |
[:cntrl:] |
All control
characters (nonprinting) |
[:digit:] |
All numeric digits |
[:graph:] |
All [:punct:],
[:upper:], [:lower:], and [:digit:] characters. |
[:lower:] |
All lowercase
alphabetic characters |
[:print:] |
All printable
characters |
[:punct:] |
All punctuation
characters |
[:space:] |
All space characters
(nonprinting) |
[:upper:] |
All uppercase
alphabetic characters |
[:xdigit:] |
All valid hexadecimal
characters |
Example: Consider a simple
query to convert the ‘McLean’ city name to a more readable format.
You should look for any instance for a lower case letter immediately
followed by an upper case letter. Your query should record these two
letters in back-references by using sub-expressions, then replaces the
first one, followed by a space, then followed by the second letter:
SQL> SELECT
REGEXP_REPLACE(‘McLean’,
‘([[:lower:]])([[:upper:]])’,
‘1 2’) as “City”
FROM dual;
Equivalence Class:
It allows searching for all characters that have a common base letter.
Example:
SQL> SELECT
REGEXP_SUBSTR
(‘iSelfSchooling NOT
ISelfSchooling’, ‘[[=i=]]SelfSchooling’) as name
FROM dual;
Datatypes Support
Keep this in your mind that
these functions support CHAR, VARCHAR2, CLOB, NCHAR, NVARCHAR, and
NCLOB datatypes.
REGEXP_LIKE function
It returns a Boolean
indicating whether the pattern matched or not.
Example: Consider to write
an expression that could search for common inflections of the verb ‘try’.
The following regular expression will match try, trying, tried, and
tries.
SQL> SELECT
REGEXP_LIKE (‘We are
trying to make the subject easier.’,
‘tr(y(ing)? | (ied) | (ies))’)
as REGEXT_SAMPLE
FROM dual;
REGEXP_SUBSTR function
It returns the actual data
that matches the specified pattern.
Example: Consider to write
an expression that could return the ‘trying’ specified pattern.
SQL> SELECT
REGEXP_SUBSTR (‘We are
trying to make the subject easier.’,
‘tr(y(ing)? | (ied) | (ies))’)
as REGEXT_SAMPLE
FROM dual;
REGEXP_INSTR function
It returns the character
position of either the beginning or end of the match.
Example: Consider to write
an expression that could return the position of ‘trying’ specified
pattern.
SQL> SELECT
REGEXP_INSTR (‘We are
trying to make the subject easier.’,
‘tr(y(ing)? | (ied) | (ies))’)
as REGEXT_SAMPLE
FROM dual;
REGEXP_REPLACE function
It looks for an occurrence
of a regular expression and replaces it with the contents of a
supplied text literal.
Example:
SQL> SELECT
REGEXP_REPLACE (‘We are
trying to make the subject easier.’,
‘.’, ‘ for you.’ )
as REGEXT_SAMPLE
FROM dual;
Or, query a list of all
employees’ name that hired between 1996 and 1999.
SQL> SELECT ename FROM
emp
WHERE REGEXP_REPLACE
(TO_CHAR(hire_date, ‘YYYY’),
‘^199[6-9]$’);
You used ‘^’ to
indicate that the beginning of the line has to be 199, and [-] with $
to specify range of valid characters.
Occurrence in the REGEXP
functions
All functions take an
occurrence that specifies you require the nth matching expression in
REGEXP_SUBSTR and REGEXP_INSTR, the default for which is 1.
Example: Consider
extracting the 3rd field being the Oracle system identification in a
column.
SQL> SELECT
REGEXP_SUBSTR(‘system/password@myhost:1521:mysid’,
‘[^:]+’, 1, 3) as “SID
name”
FROM dual;
You also can the REGEXP
function in the DDL statement. Consider to configure a column to allow
only alphabetical characters within a VARCHAR2 column.
SQL> CREATE TABLE mytest
(c1 VARCHAR2(20),
CHECK (REGEXP_LIKE(c1, ‘^[[:alpha:]]+$)));
“I have a dream
that my four little children will one day live in a nation where
they will not be judged by the color of their skin, but by the
content of their character.” Martin Luther King Jr. |
Questions:
Questions on
Regular Expression
Q: What is Regular
Expression (REGEXP) in the Oracle 10g Database?
Q: What are functions of
REGEXP?
Q: What are the
Meta-characters in REGEXP?
Q: What are the Character
Classes?
Q: Consider a simple query
to convert the ‘McLean’ city name to a more readable format (Mc
Lean). You should look for any instance for a lower case letter
immediately followed by an upper case letter. Your query should record
these two letters in back-references by using sub-expressions, then
replaces the first one, followed by a space, then followed by the
second letter.
Q: How to use REGULAR
EXPRESSIONS in Oracle
Q: What does the
REGEXP_LIKE function?
Q: Consider to write an
expression that could search for common inflections of the verb ‘try’.
Q: What does the
REGEXP_SUBSTR function?
Q: Consider to write an
expression that could return the ‘trying’ specified pattern.
Q: What does the
REGEXP_INSTR function?
Q: Consider to write an
expression that could return the position of ‘trying’ specified
pattern.
Q: What does the
REGEXP_REPLACE function?
Q: Query a list of all
employees’ name that hired between 1996 and 1999.
Q: What is occurrence in
the REGEXP functions?
Q: Consider extracting the
third field being the Oracle system identification in a column.
|