Regular Expressions: '+?'

If I use ‘thi+’, it returns ‘t’ followed by ‘h’ followed by ‘i’, where the ‘i’ has to occur at least once.

If I use ‘thi?’, it returns ‘t’ followed by ‘h’ followed by a single ‘i’ when the ‘i’ is present, and just ‘th’ when it is not followed by ‘i’.

Please explain ‘thi+?’. I was not able to understand it properly in the video.

THANKS.

Can someone clarify this?
Thanks.

  1. “+” means the preceding character must be present one or more times.
    “thi+” --> thi, thii, thiii, etc. (the “+” applies only to the “i”)

  2. “?” means the preceding character may or may not be present in the string being matched.
    “thi?” --> matches “thi” in “thin” and “th” in “the” or “these”.
    The “i” is taken at most once; where it is missing, only “th” is returned.
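A minimal sketch to try both rules in Python. The sample string “a” below is made up for illustration; it is not the one from the video.

```python
import re

# Hypothetical sample text, chosen only to exercise the two patterns.
a = "thi thii thiii the these"

# "+" applies only to the character right before it ('i'): one or more i's.
plus = re.findall("thi+", a)

# "?" makes the preceding 'i' optional: zero or one i.
optional = re.findall("thi?", a)

print(plus)      # ['thi', 'thii', 'thiii']
print(optional)  # ['thi', 'thi', 'thi', 'th', 'th']
```

Note that “thi?” on “thii” still returns only “thi”: the second “i” is simply left behind.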

In the third statement, I was combining statement 1 (“thi+”) with statement 2 (“?”).

This “?” means the preceding character (i) is optional: it may be absent (0 times) or present (1 time).
It tries the longer option first, taking “thi” when the “i” is there and falling back to “th” otherwise; this is called the greedy approach.
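To see the greedy behaviour of “?” directly, here is a small sketch comparing it with its non-greedy form “??” (the word “thin” is just a made-up sample):

```python
import re

word = "thin"  # hypothetical sample word

# Greedy "?": tries to include the 'i' first.
greedy = re.match("thi?", word).group()

# Lazy "??": tries to skip the 'i' first; nothing else in the pattern forces it to take the 'i'.
lazy = re.match("thi??", word).group()

print(greedy)  # thi
print(lazy)    # th
```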

In the third, “thi+?”:

Here the “?” comes right after “+”. The two are read together as a single quantifier, “+?”, which is the non-greedy (lazy) form of “+”; the “?” does not make the “+” optional.

So the “i” must still occur at least once, but the engine takes as few “i”s as possible. The minimum value “+” allows is 1, so each match is just “thi”.

So, overall this expression finds only “thi” substrings in the given string “a”, even where “thi+” would have matched “thii” or “thiii”.
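A small sketch contrasting “thi+” and “thi+?” on a made-up string with repeated i's (not the string “a” from the video). The difference only shows up when the “i” repeats:

```python
import re

a = "thi thii thiii"  # hypothetical sample string

greedy = re.findall("thi+", a)  # "+" takes as many i's as it can
lazy = re.findall("thi+?", a)   # "+?" stops after the first (minimum) i

print(greedy)  # ['thi', 'thii', 'thiii']
print(lazy)    # ['thi', 'thi', 'thi']
```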

This is what I was comparing!

Additional “*”

n = re.findall("thi*", a)

“*” lets the preceding “i” occur 0, 1, or more times.
If it takes 1, the match is “thi”.
If it takes 0, the match is “th”.

So “*” can skip the “i” or take as many as it finds.

So it will greedily find all the “th” and “thi” strings (and “thii”, “thiii”, etc. where the “i” repeats).
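A quick check of the greedy “thi*” pattern, using a made-up sample string rather than the original “a”:

```python
import re

a = "the thi thiii"  # hypothetical sample string

# "*" lets 'i' occur zero or more times and, being greedy, takes every 'i' it can.
n = re.findall("thi*", a)
print(n)  # ['th', 'thi', 'thiii']
```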

o = re.findall("thi*?", a)

Here “*?” is the non-greedy form of “*”: the “i” may occur zero or more times, but the engine takes as few as possible. The minimum value “*” allows is 0, so each match is just “th”, whether or not an “i” follows.

So, overall this expression finds only “th” strings in the given string “a”.
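And the non-greedy version on the same made-up string. The second call uses another hypothetical input to show that “*?” still takes i's when the rest of the pattern needs them:

```python
import re

a = "the thi thiii"  # hypothetical sample string

# "*?" takes as few i's as possible; with nothing after it in the pattern, that is zero.
o = re.findall("thi*?", a)
print(o)  # ['th', 'th', 'th']

# Laziness means "as few as needed": if an 'n' must follow, the i's are still consumed.
p = re.findall("thi*?n", "thiiin")
print(p)  # ['thiiin']
```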

Note:
This is a custom example, and my explanation is based on the regex definitions and the results I observed.

Kindly run these examples and observe the output.

*?, +?, ??

The '*', '+', and '?' qualifiers are all *greedy*; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE <.*> is matched against '<H1>title</H1>', it will match the entire string, and not just '<H1>'. Adding '?' after the qualifier makes it perform the match in *non-greedy* or *minimal* fashion; as *few* characters as possible will be matched. Using .*? in the previous expression will match only '<H1>'.
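The documentation example can be reproduced in a few lines (a minimal sketch using the standard re.match and re.findall functions):

```python
import re

html = "<H1>title</H1>"

greedy = re.match("<.*>", html).group()   # ".*" runs on to the last '>'
lazy = re.match("<.*?>", html).group()    # ".*?" stops at the first '>'
tags = re.findall("<.*?>", html)          # the lazy form also separates the tags

print(greedy)  # <H1>title</H1>
print(lazy)    # <H1>
print(tags)    # ['<H1>', '</H1>']
```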

@Abhinav_Singh, is it clear now? Or shall I explain with more examples? If you have code you would like explained, kindly send it.