Regular Expressions: '+?'

If I am using ‘thi+’, it will return the combination of letters ‘t’ followed by ‘h’ followed by ‘i’, where the ‘i’ has to occur at least once.

If I am using ‘thi?’, it will return ‘t’ followed by ‘h’ followed by ‘i’ (only once) if the ‘i’ is present, and also every other ‘th’ that is not followed by ‘i’.

Please explain ‘thi+?’. I was not able to understand it properly in the video.

THANKS.

Can someone clarify this?
Thanks.

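  1. “+” means the preceding character (here “i”) should be present one or more times.
    “thi+” --> thi, thii, thiii… (a string like “thithiddddddsssaart” contains two separate “thi” matches, not one “thithi”)

  2. “?” means the preceding character may
    or may not be present in the string to be matched.
    “thi?” --> finds “thi” in thi/this and “th” in the/these… etc.
    This will return “thi” (with only one “i”) where the “i” is present, and just “th” for other strings starting with “th”.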
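
To check statements 1 and 2, here is a minimal sketch with Python's re.findall; the sample strings are my own, not from the video.

```python
import re

# "+" repeats only the character right before it ("i"), one or more times.
# "thithi" is not one match: findall returns two separate "thi" matches.
plus_matches = re.findall(r"thi+", "thi thii thithi")

# "?" makes the "i" optional (zero or one time), so "these" gives just "th",
# and "thiii" gives "thi" because at most one "i" is taken.
optional_matches = re.findall(r"thi?", "this these thiii")

print(plus_matches)      # ['thi', 'thii', 'thi', 'thi']
print(optional_matches)  # ['thi', 'th', 'thi']
```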

In the third statement, I was combining statement 1 (“thi+”) and statement 2 (“?”).

This “?” means the preceding character (i) is optional: it may or may not be present, i.e. 0 (absent) or 1 (present) times.
It tries to take the “i” first (“thi”) and falls back to “th” only when there is no “i”; preferring the longer match like this is called the greedy approach.
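
One way to see that the plain ? is greedy is to compare it with its lazy form ?? (listed in the Python docs alongside *? and +?). A small sketch, using my own one-word input:

```python
import re

text = "thi"

# Greedy "?": tries to take the "i" first, and only skips it if that fails.
greedy = re.match(r"thi?", text).group()

# Lazy "??": tries to skip the "i" first; since nothing else follows, skipping works.
lazy = re.match(r"thi??", text).group()

print(greedy, lazy)  # thi th
```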

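In the third pattern, “thi+?”:

Here the character before ? is “+”, so “+?” acts as a single non-greedy quantifier on the “i”.

Case 1: + needs at least one “i”, so the pattern starts out like the first one, “thi+”. As the minimum value + can take is 1, our search string is “thi”.

Case 2: if more “i”s follow, + would normally take them all, but the ? makes it stop as soon as one “i” has matched, so it again finds exactly “thi”.

So overall this expression finds every exact “thi” in the given string “a”.
This gives the same result as the first pattern as long as no “i” is repeated.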

This is what I was comparing!
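
A quick sketch to compare the first and third patterns side by side; compare is a hypothetical helper made up for this, and the inputs are my own examples:

```python
import re


def compare(text):
    """Hypothetical helper: (greedy "thi+" matches, lazy "thi+?" matches)."""
    return re.findall(r"thi+", text), re.findall(r"thi+?", text)


# No repeated "i": both patterns agree.
print(compare("this thing"))  # (['thi', 'thi'], ['thi', 'thi'])

# Repeated "i": greedy takes all of them, lazy stops after one.
print(compare("thiiis"))      # (['thiii'], ['thi'])
```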

Additionally, for “*”:

n = re.findall("thi*", a)

“*” can take the value 0, 1, or more.
If it takes 1, the matched string is “thi” (and “thii”, “thiii”… for more “i”s).
If it takes 0, the matched string is “th”.

So * can ignore “i” or take it.

So it will greedily find all the “th”, “thi”, “thii”… strings, taking every “i” that is present.
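
Since the string “a” is not shown in the thread, here is a sketch with a made-up value for it:

```python
import re

a = "the thin thiii"  # hypothetical sample; the thread's own "a" is not shown

# "*" takes as many "i"s as are present (zero or more).
n = re.findall(r"thi*", a)

print(n)  # ['th', 'thi', 'thiii']
```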

o = re.findall("thi*?", a)

Case 1: * can take the minimum value 0, and the ? tells it to take as few “i”s as possible, so our search string is “th”.

Case 2: even if “i”s follow, the lazy *? does not need them, so it still finds exactly “th”.

So overall this expression finds every exact “th” in the given string “a”.
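
The lazy version, as a sketch, again on a made-up value for “a” since the thread's string isn't shown:

```python
import re

a = "the thin thiii"  # hypothetical sample; the thread's own "a" is not shown

# "*?" takes as few "i"s as possible; with nothing after it, that is zero.
o = re.findall(r"thi*?", a)

print(o)  # ['th', 'th', 'th']
```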

Note:
This is a custom example, and my explanation is based on the regex definitions and the results I observed!

I have given examples here; kindly run them and observe.

*?, +?, ??

The '*', '+', and '?' qualifiers are all *greedy*; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in *non-greedy* or *minimal* fashion; as *few* characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
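
The example from the docs quote can be run directly; this sketch just uses re.match on the same '<H1>title</H1>' string:

```python
import re

html = "<H1>title</H1>"

# Greedy ".*" runs to the last ">" in the string.
greedy_match = re.match(r"<.*>", html).group()

# Lazy ".*?" stops at the first ">" that completes the pattern.
lazy_match = re.match(r"<.*?>", html).group()

print(greedy_match)  # <H1>title</H1>
print(lazy_match)    # <H1>
```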

@Abhinav_Singh, is it clear now? Or shall I explain with more examples? If you have code you would like explained, kindly send it.


In Case 1 that you explained, thi+ will return “thi” because one “i” is the minimum.
But + will also look for more than one ‘i’ if present; here it is constrained by the ‘?’ in the search pattern “thi+?”.

This is the reason it is referred to as ‘non-greedy’, i.e. returning only “thi” if the string is ‘thiiiis’, right?

Yes, that is true. The “?” cancels the greedy action of + (searching for more), so it matches as little text as possible. It will search for exactly “thi”; that is why it is called “non-greedy”.

The same goes for *?: * can take the minimum value 0, so it will search for “th”.
These are all part of regular expressions, which are based on the theory of computation.

Is it clear now?

In simpler terms:

Greedy: keep matching as long as the condition is still satisfied, i.e. find the longest possible string. This is how all the plain quantifiers above (*, +, ?) behave.

Non-greedy/lazy: stop matching as soon as the condition is satisfied, i.e. match the shortest possible string (exact).
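
A sketch of the "stop once the condition is satisfied" idea, using the ‘thiiiis’ string from the earlier question. The condition is the whole pattern, so a lazy quantifier still takes more characters when the rest of the pattern needs them; the second pattern, “thi+?s”, is my own addition for illustration.

```python
import re

s = "thiiiis"

# Nothing follows "+?", so one "i" already satisfies the pattern.
lazy_alone = re.search(r"thi+?", s).group()

# The trailing "s" must match too, so "+?" has to grow until it reaches the "s".
lazy_with_s = re.search(r"thi+?s", s).group()

print(lazy_alone)   # thi
print(lazy_with_s)  # thiiiis
```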

This is also a good read: https://docs.python.org/3/howto/regex.html#greedy-versus-non-greedy

@Abhinav_Singh, this is a last reminder: is this clear now? Do you have any code to be explained, any OOP-related doubts, or any searching, sorting, or optimization algorithms that you want to implement?
Reply immediately with your doubts.

Yes, it is clear.

Thanks :slight_smile:
