A student asked me to explain this wonderful regular expression:
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{6,}$
There are two things you need to understand well before attempting to determine what this regular expression matches: the use of ?= and the meaning of \\S. I'm assuming you can already follow the rest of the regular expression.
The latter is the easier one to understand. It represents a non-whitespace character. The usual regular expression \S has been preceded by an extra backslash because the programming language, Java in this case, treats the backslash inside a string literal as an escape character, so a literal backslash has to be written as \\. The Java string "\\S" therefore hands the regex engine the two characters \S.
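To see the escaping in action, here is a minimal sketch using Java's standard java.util.regex package. The class EscapeDemo and its members are hypothetical names chosen for illustration. It shows that the Java literal "\\S" is a two-character string, and that the regex engine treats it as "one non-whitespace character". Writing "\S" directly in Java source does not compile, because \S is not a valid string escape.

```java
import java.util.regex.Pattern;

public class EscapeDemo {
    // In Java source, "\\S" is a two-character string: a backslash followed by 'S'.
    // The regex engine receives \S, meaning "one non-whitespace character".
    static final String NON_WHITESPACE = "\\S";

    // Pattern.matches requires the whole input to match,
    // so this is true only for a single non-whitespace character.
    static boolean isSingleNonWhitespace(String s) {
        return Pattern.matches(NON_WHITESPACE, s);
    }

    public static void main(String[] args) {
        System.out.println(NON_WHITESPACE);               // \S
        System.out.println(NON_WHITESPACE.length());      // 2
        System.out.println(isSingleNonWhitespace("a"));   // true
        System.out.println(isSingleNonWhitespace(" "));   // false
        System.out.println(isSingleNonWhitespace("\t"));  // false
    }
}
```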
So, the more "regular" regular expression, without the Java escaping, would be:
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\S+$).{6,}$
Now, let's understand ?=
The technical term for ?= is "positive lookahead". It asserts that the specified pattern can be matched at the current position, but the characters it matches are not consumed, so the rest of the expression still starts matching from that same position. Because each lookahead here sits right after the ^ anchor and begins with .*, it is used to check whether a string contains the characters we are interested in, irrespective of their order or location in the string. Thus, (?=.*[0-9]) succeeds for any string that has zero or more characters followed by a digit. In much simpler terms, we expect the string to contain a digit.
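A minimal sketch of the "not consumed" part, assuming Java's java.util.regex; LookaheadDemo and matchEnd are hypothetical names for illustration. The anchored lookahead succeeds or fails, but when it succeeds the match still ends at index 0, because nothing was consumed.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookaheadDemo {
    // From the start of the input, assert that a digit appears somewhere ahead.
    static final Pattern HAS_DIGIT_AHEAD = Pattern.compile("^(?=.*[0-9])");

    // Returns the index at which the match ends, or -1 if the assertion fails.
    static int matchEnd(String input) {
        Matcher m = HAS_DIGIT_AHEAD.matcher(input);
        return m.find() ? m.end() : -1;
    }

    public static void main(String[] args) {
        // The lookahead succeeds but consumes nothing, so the match ends at index 0.
        System.out.println(matchEnd("abc1"));  // 0
        System.out.println(matchEnd("1abc"));  // 0 (position of the digit does not matter)
        System.out.println(matchEnd("abcd"));  // -1 (no digit anywhere)

        // Because nothing was consumed, the literal "abc1" is matched from index 0 again.
        System.out.println("abc1".matches("(?=.*[0-9])abc1"));  // true
    }
}
```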
So, to understand the entire regular expression, we first need to break it down into smaller chunks. You will notice that there are five "positive lookahead" blocks. If we remove them, we are left with
^.{6,}$
This matches a string that has at least 6 characters (any characters except line terminators). The table below should help you visualise this better.
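A quick way to check the length-only pattern is a sketch like the one below, assuming Java's java.util.regex; MinLengthDemo and check are hypothetical names. Note that spaces still count as characters at this stage.

```java
import java.util.regex.Pattern;

public class MinLengthDemo {
    // Six or more characters; '.' matches anything except a line terminator by default.
    static final Pattern MIN_SIX = Pattern.compile("^.{6,}$");

    static boolean check(String s) {
        return MIN_SIX.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(check("abc"));       // false: only 3 characters
        System.out.println(check("abcdef"));    // true: exactly 6
        System.out.println(check("a b c d"));   // true: spaces count as characters here
        System.out.println(check("abc\ndef"));  // false: '.' does not match the newline
    }
}
```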
Now, let's add the first "positive lookahead" block
^(?=.*[0-9]).{6,}$
This matches a string that has at least 6 characters. It also validates that the string has at least one digit, as specified in the first "positive lookahead" block. The table below should help you visualise this better.
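Here is a minimal sketch of this step, assuming Java's java.util.regex; DigitRuleDemo and check are hypothetical names. It contrasts a string that fails only the digit rule with one that fails only the length rule.

```java
import java.util.regex.Pattern;

public class DigitRuleDemo {
    // Length check plus the first lookahead: at least one digit somewhere.
    static final Pattern DIGIT_AND_LENGTH = Pattern.compile("^(?=.*[0-9]).{6,}$");

    static boolean check(String s) {
        return DIGIT_AND_LENGTH.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(check("abcdef"));  // false: long enough, but no digit
        System.out.println(check("abc1"));    // false: has a digit, but too short
        System.out.println(check("abcde1"));  // true
        System.out.println(check("1abcde"));  // true: the digit can be anywhere
    }
}
```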
Now, let's add the second "positive lookahead" block
^(?=.*[0-9])(?=.*[a-z]).{6,}$
This matches a string that has at least 6 characters. It also validates that the string meets "all" of the following conditions:
- at least one digit, as specified in the first "positive lookahead" block
- at least one lowercase letter (a-z), as specified in the second "positive lookahead" block
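A small sketch of the two combined lookaheads, assuming Java's java.util.regex; LowercaseRuleDemo and check are hypothetical names. Both conditions must hold, in any order.

```java
import java.util.regex.Pattern;

public class LowercaseRuleDemo {
    // Digit lookahead AND lowercase lookahead, both evaluated from the start.
    static final Pattern DIGIT_LOWER =
            Pattern.compile("^(?=.*[0-9])(?=.*[a-z]).{6,}$");

    static boolean check(String s) {
        return DIGIT_LOWER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(check("ABCDE1"));  // false: no lowercase letter
        System.out.println(check("123456"));  // false: no lowercase letter
        System.out.println(check("ABCDe1"));  // true: digit and lowercase both present
    }
}
```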
Similarly, let's add the third "positive lookahead" block
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{6,}$
This matches a string that has at least 6 characters. It also validates that the string meets "all" of the following conditions:
- at least one digit, as specified in the first "positive lookahead" block
- at least one lowercase letter (a-z), as specified in the second "positive lookahead" block
- at least one uppercase letter (A-Z), as specified in the third "positive lookahead" block
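A minimal sketch of the three-lookahead version, assuming Java's java.util.regex; UppercaseRuleDemo and check are hypothetical names.

```java
import java.util.regex.Pattern;

public class UppercaseRuleDemo {
    // Digit, lowercase and uppercase lookaheads, plus the length check.
    static final Pattern DIGIT_LOWER_UPPER =
            Pattern.compile("^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z]).{6,}$");

    static boolean check(String s) {
        return DIGIT_LOWER_UPPER.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(check("abcde1"));  // false: no uppercase letter
        System.out.println(check("ABCDE1"));  // false: no lowercase letter
        System.out.println(check("Abcde1"));  // true
    }
}
```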
Now, let's add the fourth "positive lookahead" block
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=]).{6,}$
This matches a string that has at least 6 characters. It also validates that the string meets "all" of the following conditions:
- at least one digit, as specified in the first "positive lookahead" block
- at least one lowercase letter (a-z), as specified in the second "positive lookahead" block
- at least one uppercase letter (A-Z), as specified in the third "positive lookahead" block
- at least one special character (one of @#$%^&+=), as specified in the fourth "positive lookahead" block
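A sketch of this four-lookahead step, assuming Java's java.util.regex; SpecialCharRuleDemo and check are hypothetical names. It shows that only the listed characters count as "special", and that a space is still accepted at this stage.

```java
import java.util.regex.Pattern;

public class SpecialCharRuleDemo {
    // Adds a lookahead for one of the characters @ # $ % ^ & + =
    static final Pattern FOUR_RULES =
            Pattern.compile("^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=]).{6,}$");

    static boolean check(String s) {
        return FOUR_RULES.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(check("Abcde1"));    // false: no special character
        System.out.println(check("Abcde1!"));   // false: '!' is not in the set
        System.out.println(check("Abcde1#"));   // true
        System.out.println(check("Abc de1#"));  // true: spaces are not rejected yet
    }
}
```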
Finally, let's add the fifth "positive lookahead" block
^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{6,}$
This matches a string that has at least 6 characters. It also validates that the string meets "all" of the following conditions:
- at least one digit, as specified in the first "positive lookahead" block
- at least one lowercase letter (a-z), as specified in the second "positive lookahead" block
- at least one uppercase letter (A-Z), as specified in the third "positive lookahead" block
- at least one special character (one of @#$%^&+=), as specified in the fourth "positive lookahead" block
- no whitespace characters at all (spaces, tabs and so on), as specified in the fifth "positive lookahead" block
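Putting it all together, here is a minimal sketch of how the full expression might be used from Java, assuming the standard java.util.regex package. PasswordRuleDemo and isValidPassword are hypothetical names, not part of any library. The pattern string is written exactly as in the original, with \\S escaped for the Java string literal, and it is compiled once and reused.

```java
import java.util.regex.Pattern;

public class PasswordRuleDemo {
    // The complete expression in its Java string form ("\\S" reaches the engine as \S).
    static final Pattern PASSWORD = Pattern.compile(
            "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[@#$%^&+=])(?=\\S+$).{6,}$");

    static boolean isValidPassword(String s) {
        return PASSWORD.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isValidPassword("Abcde1#"));    // true: all five rules and length
        System.out.println(isValidPassword("Abc de1#"));   // false: contains a space
        System.out.println(isValidPassword("Abcde1#\t"));  // false: trailing tab is whitespace
        System.out.println(isValidPassword("Ab1#x"));      // false: only 5 characters
    }
}
```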