lib_mysqludf_preg
Introduction
lib_mysqludf_preg is a library of mysql UDFs (user-defined-functions) that provide access to the PCRE (perl compatible-regular-expressions) library for pattern matching. The PCRE library is a set of functions that implement regular expression pattern matching using the same syntax and semantics as Perl 5. This syntax can often handle more complex expressions and capturing than standard regular expression implementations. For more information about PCRE, please see: http://www.pcre.org/
lib_mysqludf_preg is a useful performance optimization for those applications that are already performing these regular expression matches in a high level language (ie. PHP) on the client side. It is also helpful when there is a need to capture a parenthesized subexpression from a regular expression, or simply as a slight performance boost over the builtin RLIKE/REGEXP functions.
Installation
lib_mysqludf_preg is distributed as a source package. (Binaries are available for the Windows version.) The instructions below provide some information about how to configure and compile the library. Please consult the INSTALL file included with the source package for more details. If installing from source for Windows please consult the doc/INSTALL.win file.From Source
Prerequisites
These UDFs require that the libpcre headers and library are installed. For debian/ubuntu type systems, installing libpcre3-dev should be sufficient. (apt-get install libpcre3-dev).Compilation
Most users should be able to simply type: ./configure ; make installIf mysql is an unusual place, you might need to add --with-mysql=<mysql directory>/bin/mysql_config.
Similarly, if licpcre is in an unusual place, --with-pcre can be added.
Example (for osx using fink): ./configure --with-pcre=/sw --with-mysql=/sw/bin/mysql_config
Provided the library has been installled into a directory that mysql server already has in its LD_LIBRARY_PATH, installation of the functions should be as easy as typing: make installdb. Any problems encountered are likely related to the server's environment and the installation directory. If no problems are encountered, type make test to perform some basic tests.
Windows Binaries
There is a zip file, downloadable from this site, that contains the necessary dll files to run this UDF on windows. One of these dll files is the UDF, itself, and the other required dll file is the PCRE library. These files both need to be installed in directories where the MySQL server can find them. For the 5.0 series of mysql server, this should be the bin directory directly under the Mysql\ Server installation. For the 5.1 MySQL servers this is the directory as specified by the 'plugin_dir' MySQL variable. After copying the files and restarting the server, the 3rd file in the archive can be used to create the functions.For more information, please consult the doc/INSTALL.windows file in the source package.
Functions
lib_mysqludf_preg provides the following functions that interface with the PCRE library.
- preg_capture capture a parenthesized subexpression from a PCRE pattern
- preg_check check if a string is a valid perl-compatible regular expression
- preg_position get position of the of a regular expression capture group in a string
- preg_replace perform regular expression search & replace using PCRE.
- preg_rlike test if a string matches a perl-compatible regular expression
- lib_mysqludf_preg_info get information about the installed lib_mysqludf_preg library
preg_capture
capture a parenthesized subexpression from a PCRE pattern
- Function Installation
- CREATE FUNCTION preg_capture RETURNS STRING SONAME 'lib_mysqludf_preg.so';
- Synopsis
- PREG_CAPTURE( pattern , subject [, group] [, occurence] )
- Parameters:
-
pattern - is a string that is a perl compatible regular expression as documented at: http://us.php.net/manual/en/ref.pcre.php This expression passed to this function should have delimiters and can contain the standard perl modifiers after the ending delimiter. subject -is the data to perform the match & capture on group - is the capture group that should be returned. This can be a numeric capture group or a named capture group. Numeric groups should be passed in as integers while named groups should be strings. If not speficied, this defaults to 0, which will capture the entire matching regular expression. occurence - which match of the regex to perform capture on. This is useful for subjects that have multiple matches of the pattern. If not speficied, this defaults to 1, which will capture the requested group, from the first matching occurence of the pattern.
- Returns:
- - string that was captured - if there was a match and the desired capture group is valid
- string that is the entire portion of subject which matches the pattern - if 0 is passed in as the group and pattern matches subject
- NULL - if pattern does not match the subject or group is not a valid capture group for the given pattern and subject.
- Examples:
Yields:
+----------------------------------------------------------+
| PREG_CAPTURE('/(.*?)(fox)/' , 'the quick brown fox' ,2 ) |
+----------------------------------------------------------+
| fox |
+----------------------------------------------------------+
SELECT PREG_CAPTURE( '/"([^"]+)"/' , 'the "quick" brown fox "jumped" over the "lazy" dog' , 1,2 );
+--------------------------------------------------------------------------------------------+ | PREG_CAPTURE( '/"([^"]+)"/' , 'the "quick" brown fox "jumped" over the "lazy" dog' , 1,2 ) | +--------------------------------------------------------------------------------------------+ | jumped | +--------------------------------------------------------------------------------------------+
SELECT PREG_CAPTURE( '/b[^\s]+/' , 'the quick brown fox jumped over' )
Yields:
+------------------------------------------------------------------+ | PREG_CAPTURE( '/b[^\s]+/' , 'the quick brown fox jumped over' ) | +------------------------------------------------------------------+ | brown | +------------------------------------------------------------------+
- Note:
- Remember to add a backslash to escape patterns that use \ notation
preg_check
Test if a perl-compatible regular expression is valid
- Function Installation
- CREATE FUNCTION preg_check RETURNS INTEGER SONAME 'lib_mysqludf_preg.so';
- Synopsis
- PREG_CHECK( pattern )
- Parameters:
-
pattern - is a string that might be a perl compatible regular expression as documented at: http://us.php.net/manual/en/ref.pcre.php
- Returns:
- 1 - the pcre is valid
0 - the pcre is NULL, empty, or a bad regex
- Examples:
Yields:
+---------------------------------------------------------------+
| PREG_CHECK('/The quick brown fox/i' ) |
+---------------------------------------------------------------+
| 1 |
+---------------------------------------------------------------+
SELECT * from patterns WHERE PREG_CHECK( pattern );
Yields: all of the rows containing valid pcre's.
preg_position
get position of the of a regular expression capture group in a string
- Function Installation
- CREATE FUNCTION preg_position RETURNS INTEGER SONAME 'lib_mysqludf_preg.so';
- Synopsis
- PREG_POSITION( pattern , subject [, group] [, occurence] )
- Parameters:
-
pattern - is a string that is a perl compatible regular expression as documented at: http://us.php.net/manual/en/ref.pcre.php This expression passed to this function should have delimiters and can contain the standard perl modifiers after the ending delimiter. subject -is the data to perform the match & position capture on group - the capture group whose position that should be returned. This can be a numeric capture group or a named capture group. Numeric groups should be passed in as integers, while named groups should be strings. 0 should be used to request to position of the entire matching expression. This parameter defaults to 0. occurence - which match of the regex to perform capture on. This is useful for subjects that have multiple matches of the pattern. This parameter defaults to 1.
- Returns:
- - integer position of the string that was captured - if there was a match and the desired capture group and occurence is valid
- NULL if pattern does not match the subject or group is not a valid capture group or the occurence is larger than the number of matches for the given pattern and subject.
- Examples:
Yields:
+-----------------------------------------------------------+
| PREG_POSITION('/(.*?)(fox)/' , 'the quick brown fox' ,2 ) |
+-----------------------------------------------------------+
| 17 |
+-----------------------------------------------------------+
SELECT PREG_POSITION('/"[^"]+"/' , '"quick","brown","fox" "jumped"',0,4)
Yields:
+-------------------------------------------------------------------+
| PREG_POSITION('/"[^"]+"/' , '"quick","brown","fox" "jumped"',0,4) |
+-------------------------------------------------------------------+
| 23 |
+-------------------------------------------------------------------+
- Note:
- Remember to add a backslash to escape patterns that use \ notation. The returned position starts at 1 for the first character, as is standard for MySQL string functions.
preg_replace
performs regular expression search & replace using PCRE.
- Function Installation
- CREATE FUNCTION preg_replace RETURNS STRING SONAME 'lib_mysqludf_preg.so';
- Synopsis
- PREG_REPLACE( pattern , replacement , subject [ , limit ] )
- Parameters:
-
pattern - is a string that is a perl compatible regular expression as documented at: http://us.php.net/manual/en/ref.pcre.php This expression passed to this function should have delimiters and can contain the standard perl modifiers after the ending delimiter. replacement - is the string to use as the replacement. This string may contain capture group references such as . You can also use for these in a similar fashion as in PHP. subject -is the data to perform the match & replace on limit - optional number that is the maximum replacements to perform. Use -1 (or leave empty) for no limit.
- Returns:
- - string - 'subject' with the instances of pattern replaced
- string - the same as passed in if there were no matches
- Examples:
Yields:
+-----------------------------------------------------------------+
| PREG_REPLACE('/(.*?)(fox)/' , 'dog' , 'the quick brown fox' ) |
+-----------------------------------------------------------------+
| the quick brown dog |
+-----------------------------------------------------------------+
SELECT PREG_REPLACE('/\s\s/+', ' ' , products.title FROM products;
Yields: The product names with all of the extra whitespace removed
- Note:
- Remember to add a backslash to escape patterns that use \ notation. Also, using $ notation makes things a little clearer when using backreferences in the replacement.
preg_rlike
Test if a string matches a perl-compatible regular expression
- Function Installation
- CREATE FUNCTION preg_rlike RETURNS INTEGER SONAME 'lib_mysqludf_preg.so';
- Synopsis
- PREG_RLIKE( pattern , subject )
- Parameters:
-
pattern - is a string that is a perl compatible regular expression as documented at: http://us.php.net/manual/en/ref.pcre.php This expression passed to this function should have delimiters and can contain the standard perl modifiers after the ending delimiter. subject - is the data to perform the test on.
- Returns:
- 1 - a match was found
0 - no match
- Examples:
Yields:
+---------------------------------------------------------------+
| PREG_RLIKE('/The quick brown fox/i' , 'the quick brown fox' ) |
+---------------------------------------------------------------+
| 1 |
+---------------------------------------------------------------+
SELECT * from products WHERE PREG_RLIKE( '/organic/i' , products.title )
Yields: all of the products with 'organic' in their titles
lib_mysqludf_preg_info
Return version information for lib_mysqludf_preg package
- Function Installation
- CREATE FUNCTION lib_mysqludf_preg_info RETURNS STRING SONAME 'lib_mysqludf_preg.so' ;
- Synopsis
- LIB_MYSQLUDF_PREG_INFO()
- Returns:
- string - version information for the lib_mysqludf_preg package
- Examples:
- SELECT LIB_MYSQLUDF_PREG_INFO();
+--------------------------+ | LIB_MYSQLUDF_PREG_INFO() | +--------------------------+ | lib_mysqludf_preg 0.6.1 | +--------------------------+