Useful Regex for Python Part II
Continuing on with our explanation of regular expressions (RE) from Part I, we will now turn to the functions in Python's re module that apply a pattern to a string. I will go over them one at a time and describe their use cases with examples.
re.search(pattern, string, flags=0)
re.match(pattern, string, flags=0)
re.search() scans the whole string for the first place the pattern matches, while re.match() only tries the pattern at the beginning of the string. Both return a match object describing the match, or None if nothing matched. Calling .span() on the match object gives the start and end indices of the match in the string.
re.search(r'(Har\b)+', string).span()
will return a (start, end) tuple giving the location of the first match in the string.
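To make the difference between the two functions concrete, here is a minimal sketch. The sample text and the variable names are my own assumptions, since the input string for the call above is not shown.

```python
import re

# Hypothetical sample text, chosen only for illustration.
text = "Say hello to Har and Harold"

# re.search scans the whole string and stops at the first match.
# \b requires a word boundary, so the "Har" inside "Harold" is skipped.
found = re.search(r"Har\b", text)
print(found.span())   # (13, 16)
print(found.group())  # Har

# re.match only tries the pattern at position 0, so it finds nothing here.
at_start = re.match(r"Har\b", text)
print(at_start)       # None

# Both functions return None on failure, so check before calling .span().
missing = re.search(r"Zed", text)
location = missing.span() if missing else None
print(location)       # None
```

Checking for None first avoids the AttributeError you would get from calling .span() on a failed search.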
re.split(pattern, string, maxsplit=0, flags=0)
splits the string at every occurrence of the pattern and returns the pieces in a list. For example
string = '$279,000\n2bd1ba\n950sqft\n100W57thSt#17S\nMidtown,NewYork,NY'
re.split('\n', string)
will return
['$279,000', '2bd1ba', '950sqft', '100W57thSt#17S', 'Midtown,NewYork,NY']
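The maxsplit argument in the signature caps how many splits are made, and the pattern can describe more than one delimiter. A small sketch of both, reusing the listing string from the example (the variable names are mine):

```python
import re

listing = '$279,000\n2bd1ba\n950sqft\n100W57thSt#17S\nMidtown,NewYork,NY'

# maxsplit=1 splits only at the first newline; the remainder stays in one piece.
price, rest = re.split(r'\n', listing, maxsplit=1)
print(price)  # $279,000

# A character class splits on either a newline or a comma.
fields = re.split(r'[\n,]', rest)
print(fields)
# ['2bd1ba', '950sqft', '100W57thSt#17S', 'Midtown', 'NewYork', 'NY']
```

Peeling off the price first keeps its comma intact, which a single split on '[\n,]' over the whole string would not.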
The next method I will discuss is
re.findall(pattern, string, flags=0)
which finds all occurrences of the pattern in the string. For example
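string = 'Smiling and laughing at the same time, Gerald was prancing happily up the street.'
re.findall(r'\w+ing', string)
would return ['Smiling', 'laughing', 'prancing']
re.findall(r'\w+ly', string)
would return ['happily']
In both cases we were able to use findall to pull out words by their endings, which is a rough way to find parts of speech such as -ing verb forms and -ly adverbs in our text. Keep in mind that this only looks at suffixes, so a noun like 'string' would match the first pattern too.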
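Two details of findall are worth seeing in code: adding \b word boundaries keeps the pattern from matching in the middle of a longer word, and when the pattern contains a capture group, findall returns the captured part instead of the whole match. This is a sketch with variable names of my own choosing:

```python
import re

sentence = ('Smiling and laughing at the same time, '
            'Gerald was prancing happily up the street.')

# \b anchors the pattern to whole words that end in the suffix.
ing_words = re.findall(r'\b\w+ing\b', sentence)
print(ing_words)  # ['Smiling', 'laughing', 'prancing']

ly_words = re.findall(r'\b\w+ly\b', sentence)
print(ly_words)   # ['happily']

# With one capture group, findall returns only the captured text,
# here the part of each word before the "ing" suffix.
stems = re.findall(r'\b(\w+)ing\b', sentence)
print(stems)      # ['Smil', 'laugh', 'pranc']
```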
And finally
re.sub(pattern, repl, string, count=0, flags=0)
replaces every non-overlapping occurrence of the pattern in the string with repl. repl can also be a function: it is called with the match object for each occurrence and must return the replacement string. I wrote a function just for fun to illustrate. For example
string = 'All visitors of New York City must visit its five boroughs, Manhattan, Queens, The Bronx, Staten Island, and Brooklyn.'
def repl_function(match):
    return 'the best county'
re.sub(r'(Brooklyn)', repl_function, string)
would return
All visitors of New York City must visit its five boroughs, Manhattan, Queens, The Bronx, Staten Island, and the best county.
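A function is most useful as repl when the replacement depends on what was matched. The sketch below is my own variation on the example: the shout helper is hypothetical, and it also shows the count argument from the signature.

```python
import re

text = ('All visitors of New York City must visit its five boroughs, '
        'Manhattan, Queens, The Bronx, Staten Island, and Brooklyn.')

def shout(match):
    # match.group(0) is the exact text that the pattern matched.
    return match.group(0).upper()

# Every match is passed to shout, and its return value is substituted in.
result = re.sub(r'Brooklyn|Queens', shout, text)
print(result)

# count=1 replaces only the first match, scanning left to right.
first_only = re.sub(r'Brooklyn|Queens', shout, text, count=1)
print(first_only)
```

In the first call both borough names come back in upper case; in the second only Queens changes, because it appears first in the string.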
And that is all I have to say about regular expressions (RE)! I find combining RE methods for data cleaning to be a useful application. I use RE methods at least a few times a month for my data science projects. Time and patience spent on RE will be useful in any computer science related field.
RE can be applied in NLP work, where text must be cleaned before it is tokenized. It can be used to extract certain elements from a text, such as phone numbers, names, and adverbs. The data cleaning uses are endless. This completes my two part series on RE! I hope you gained a better understanding of RE through this blog. Happy coding!