Photo by s w on Unsplash

Useful Regex for Python Part II

Jeffrey Ng
2 min read · Dec 14, 2020

Continuing our explanation of regular expressions (RE) from Part I, we now turn to the functions in Python's re module that apply a pattern to a string. I will go over them one at a time and describe their use cases with examples.

re.search(pattern, string, flags=0)
re.match(pattern, string, flags=0)

re.search() scans through the whole string for the first place the pattern matches, while re.match() only checks for a match at the beginning of the string. When they find a match, both return a match object that holds information about that match; when they don't, they return None. Calling .span() on the match object gives the match's position in the string.

re.search(r'(Har\b)+', string).span()

returns a (start, end) tuple giving the location of the first match in the string.
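A minimal runnable sketch of the difference between the two functions, reusing the `(Har\b)+` pattern from above. The sample text is hypothetical, since the article does not show the string it searched.

```python
import re

# Hypothetical sample string; the article does not show the one it used
text = "Ha! Har Har Har, laughed the pirate."

# re.match() only checks position 0, where the text reads "Ha!",
# so the pattern does not match and None is returned
match_result = re.match(r"(Har\b)+", text)
print(match_result)  # None

# re.search() scans forward and stops at the first match
search_result = re.search(r"(Har\b)+", text)
print(search_result.group())  # Har
print(search_result.span())   # (4, 7)

# Both functions return None when nothing matches, so check before calling .span()
missing = re.search(r"Arr", text)
if missing is None:
    print("no match")
```

Because the group contains no space, `(Har\b)+` can only ever match a single "Har" in this text; a pattern such as `(Har\b ?)+` would be needed to capture the whole "Har Har Har" run.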

re.split(pattern, string, maxsplit=0, flags=0)

splits the string at every match of the pattern and returns the pieces as a list (if maxsplit is nonzero, at most that many splits are made). For example

string = '$279,000\n2bd1ba\n950sqft\n100W57thSt#17S\nMidtown,NewYork,NY'
re.split('\n', string)
will return the list
['$279,000', '2bd1ba', '950sqft', '100W57thSt#17S', 'Midtown,NewYork,NY']
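The sketch below shows two things the example above does not: `maxsplit` stopping early, and a real regex (rather than a single character) as the separator. The second input string is made up for illustration.

```python
import re

listing = '$279,000\n2bd1ba\n950sqft\n100W57thSt#17S\nMidtown,NewYork,NY'

# maxsplit=2 stops after two splits; the remainder stays in the last element
parts = re.split(r'\n', listing, maxsplit=2)
print(parts)
# ['$279,000', '2bd1ba', '950sqft\n100W57thSt#17S\nMidtown,NewYork,NY']

# The separator can be any pattern, e.g. a comma with optional spaces around it
location = re.split(r'\s*,\s*', 'Midtown, New York ,NY')
print(location)  # ['Midtown', 'New York', 'NY']
```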

The next method I will discuss is

re.findall(pattern, string, flags=0)

which returns all non-overlapping matches of the pattern in the string as a list of strings. For example

string = 'Smiling and laughing at the same time, Gerald was prancing happily up the street.'
re.findall(r'\w+ing', string)
would return ['Smiling', 'laughing', 'prancing'], and
re.findall(r'\w+ly', string)
would return ['happily'].

In both cases we used findall() to pick out parts of speech (here, -ing words and -ly adverbs) by their suffixes.
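A short sketch of two findall() details worth knowing, using the same sentence: when the pattern has a capturing group, only the group's text is returned, and the flags argument (here `re.IGNORECASE`) changes how letters are matched. The `\b` word boundaries are an addition to the article's patterns.

```python
import re

sentence = 'Smiling and laughing at the same time, Gerald was prancing happily up the street.'

# \b anchors restrict matches to whole words that end in "ing"
ing_words = re.findall(r'\b\w+ing\b', sentence)
print(ing_words)  # ['Smiling', 'laughing', 'prancing']

# With one capturing group, findall returns only the group's text:
# here, the stem in front of the "ing" suffix
stems = re.findall(r'\b(\w+)ing\b', sentence)
print(stems)  # ['Smil', 'laugh', 'pranc']

# re.IGNORECASE lets a lowercase "s" also match the capital S in "Smiling"
s_words = re.findall(r'\bs\w*', sentence, flags=re.IGNORECASE)
print(s_words)  # ['Smiling', 'same', 'street']
```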

And finally

re.sub(pattern, repl, string, count=0, flags=0)

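replaces each match of the pattern in the string with repl (if count is nonzero, at most count replacements are made). repl can also be a function: it is called with each match object, and its return value is used as the replacement text. I wrote a function just for fun to illustrate. For example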

string = 'All visitors of New York City must visit its five boroughs, Manhattan, Queens, The Bronx, Staten Island, and Brooklyn.'
def repl_function(match):
    # Called once per match; the returned string replaces the matched text
    return 'the best county'
re.sub(r'(Brooklyn)', repl_function, string)
would return 'All visitors of New York City must visit its five boroughs, Manhattan, Queens, The Bronx, Staten Island, and the best county.'
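The sketch below extends the example with three more re.sub() features: `count` limiting the number of replacements, a replacement function that actually uses the match object it receives, and a `\1` backreference in a plain string replacement. The function name `shout` and the sample string are hypothetical.

```python
import re

boroughs = 'Manhattan, Queens, The Bronx, Staten Island, and Brooklyn'

# A plain string replacement; count=1 stops after the first substitution
first_only = re.sub(r', ', '; ', boroughs, count=1)
print(first_only)
# Manhattan; Queens, The Bronx, Staten Island, and Brooklyn

# A replacement function receives each match object, so it can build
# the replacement from the matched text (here, by upper-casing it)
def shout(match):
    return match.group(0).upper()

shouted = re.sub(r'\b(?:Manhattan|Queens|Brooklyn)\b', shout, boroughs)
print(shouted)
# MANHATTAN, QUEENS, The Bronx, Staten Island, and BROOKLYN

# \1 in the replacement string reuses whatever the first group captured
renamed = re.sub(r'(\w+) Island', r'\1 Isle', boroughs)
print(renamed)
# Manhattan, Queens, The Bronx, Staten Isle, and Brooklyn
```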

And that is all I have to say about regular expressions (RE)! Combining RE methods is especially useful for data cleaning; I use them at least a few times a month in my data science projects. Time and patience spent on RE will pay off in any computer-science-related field.
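As one possible illustration of combining these methods for data cleaning, the sketch below turns the apartment listing string from the re.split() section into a dictionary. The field names and the assumed line order (price, rooms, size, address, area) are hypothetical choices for this example, not something the article specifies.

```python
import re

listing = '$279,000\n2bd1ba\n950sqft\n100W57thSt#17S\nMidtown,NewYork,NY'

# Step 1: split the raw listing into one field per line (order assumed)
price_raw, rooms_raw, size_raw, address, area = re.split(r'\n', listing)

# Step 2: remove every non-digit character from the price
price = int(re.sub(r'\D', '', price_raw))

# Step 3: pull the bedroom and bathroom counts out with capturing groups
beds, baths = re.search(r'(\d+)bd(\d+)ba', rooms_raw).groups()

# Step 4: take the leading number from the square footage
sqft = int(re.match(r'\d+', size_raw).group())

# Step 5: split the neighborhood/city/state field on commas
neighborhood, city, state = re.split(r',', area)

record = {
    'price': price,
    'beds': int(beds),
    'baths': int(baths),
    'sqft': sqft,
    'address': address,
    'neighborhood': neighborhood,
    'city': city,
    'state': state,
}
print(record)
```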

RE can be applied in NLP pipelines, where text must be cleaned before it is tokenized. It can also extract specific elements from a text, such as phone numbers, names, and adverbs. The data cleaning uses are endless. This completes my two-part series on RE! I hope this blog gave you a better understanding of RE. Happy coding!

Written by Jeffrey Ng

Data Analyst | Lifelong Learner
