Magialisk wrote: I wish I had my own crazy findstr examples with me here at home. I'll have to grab them from my work PC and see if you guys can make any sense of them...
One of them was that there were two characters that seemed to be related, let's say the colon and the semicolon for example. If I told findstr to match a pattern that included the semicolon it would *also* match any string that included a colon, even when there was no semicolon in the string and I wasn't asking it to match colons! I don't think those were the actual characters that exhibited this; it might have been periods and commas, or some other pair entirely. I'll get the code and post an update...
The second oddity was that one character (I think this one was the colon?) could not be matched by findstr at all. If I put it in the regular expression set to be matched, it wouldn't just fail to match strings with colons (or whatever character this was), it would crash the parser with some kind of syntax error. No matter what I did I couldn't get it to accept that character in the set.
Someone just reminded me that I wanted to revisit this thread and provide sample code from my work PC that exhibited weird Findstr errors.
The errors aren't exactly as I described them above from memory, but close enough, and I whipped up some sample code to demonstrate them:
Code:
@echo off
SETLOCAL DISABLEDELAYEDEXPANSION
:LOOP
set /p testStr="Input: "
:: Protects against double quotes in the input, by changing them to a different disallowed character
set "testStr=%testStr:"=Z%"
:: Protects against <, <<, > and >> by changing them to a different disallowed character
set "testStr=%testStr:>>=Z%"
set "testStr=%testStr:<<=Z%"
set "testStr=%testStr:>=Z%"
set "testStr=%testStr:<=Z%"
:: Three variations on FindStr RegEx (will be important later)
set /p ans="Colon test in front: " <nul
echo "%testStr%" | findstr /R "[\:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_;=\+\-[{\]}\\|',./?]" >nul
IF %ERRORLEVEL%==0 (echo BAD INPUT) ELSE (echo GOOD INPUT)
set /p ans="Colon test in middle: " <nul
echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_\:;=\+\-[{\]}\\|',./?]" >nul
IF %ERRORLEVEL%==0 (echo BAD INPUT) ELSE (echo GOOD INPUT)
set /p ans="Colon test at end: " <nul
echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_;=\+\-[{\]}\\|',./?\:]" >nul
IF %ERRORLEVEL%==0 (echo BAD INPUT) ELSE (echo GOOD INPUT)
:: colon and semicolon get caught in this findstr even though they aren't in the set...
set /p ans="Semi&Colon Weirdness: " <nul
echo "%testStr%" | findstr /R "[~`!@#$%^&*()-_=+[{\]}\\|',<.>/?]" >nul
IF %ERRORLEVEL%==0 (echo BAD INPUT) ELSE (echo GOOD INPUT)
GOTO:LOOP
What you see above is a simple loop that takes user input and runs it against 4 different Findstr RegEx's. The top three are supposed to match any non-number character as "bad input". I was using this code to accept IP addresses from users in one script, except with the '.' removed from the RegEx in that case. The only difference between the three is the position of the escaped colon character in the RegEx. In the first one it's the first character, the second one has it in the middle, which I felt was the "logical" place for it in relation to the other characters, and the third has it as the last character. This may not seem important, but it will be later...
The 4th RegEx is completely different, and demonstrates a completely different issue where semicolons and colons get matched even though neither character exists in the RegEx set. I was using this RegEx in a script about 18 months ago to match special characters, and I guarantee it has other things wrong with it (just see the lack of escaping compared to my "newer" expressions 1-3), but it works well enough to show this specific issue.
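As a rough illustration of one possible cause (an assumption on my part, not something confirmed anywhere above): in most regex engines an unescaped '-' between two characters inside a set defines a range, and the 4th expression contains )-_ in exactly that position. The Python sketch below is not findstr and doesn't claim to reproduce it; it only shows that in plain ASCII order a range from ) to _ takes in both : and ; along the way. The names RANGE, LITERAL and hits are made up for this illustration.

```python
import re

# Hypothetical reduction of the 4th expression to the part of interest.
# This uses Python's regex engine, NOT findstr, so it only illustrates
# the general "unescaped hyphen makes a range" rule.
RANGE = re.compile(r"[()-_]")     # ')-_' is a range from ')' to '_' by code point
LITERAL = re.compile(r"[()\-_]")  # escaped hyphen: just ( ) - _ as literals


def hits(pattern, chars):
    """Return the characters from `chars` that the pattern matches."""
    return [c for c in chars if pattern.search(c)]


probe = ":;+-7"
print(hits(RANGE, probe))    # [':', ';', '+', '-', '7']
print(hits(LITERAL, probe))  # ['-']
```

Note that in ASCII order the range also covers digits and '-' itself, which is not what Findstr did in my tests (it missed '-'), so if a range really is the cause, Findstr must be ordering the characters some other way. Escaping the hyphen as \- like expressions 1-3 do sidesteps the question either way.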
So here are the tests to perform if you want to see these issues:
1.) Run the code as is (delayed expansion disabled) and try various strings:
- Any string with ; or : will say "BAD INPUT" on the 4th RegEx, even though it wasn't supposed to match those.
Code:
Input: 123;
Colon test in front: BAD INPUT
Colon test in middle: BAD INPUT
Colon test at end: BAD INPUT
Semi&Colon Weirdness: BAD INPUT
Input: +
Colon test in front: BAD INPUT
Colon test in middle: BAD INPUT
Colon test at end: BAD INPUT
Semi&Colon Weirdness: BAD INPUT
- The first 3 RegEx's work perfectly (as far as I can tell)
2.) Modify any or all of the first three RegEx to remove the escape '\' from the + and - characters
- Notice that even though the 4th RegEx works fine matching un-escaped + (but not -), the first three now fail to match either + or -:
Code:
Input: +
Colon test in front: GOOD INPUT
Colon test in middle: GOOD INPUT
Colon test at end: GOOD INPUT
Semi&Colon Weirdness: BAD INPUT
Input: -
Colon test in front: GOOD INPUT
Colon test in middle: GOOD INPUT
Colon test at end: GOOD INPUT
Semi&Colon Weirdness: GOOD INPUT
Input: 123+++321---+-+-+-
Colon test in front: GOOD INPUT
Colon test in middle: GOOD INPUT
Colon test at end: GOOD INPUT
Semi&Colon Weirdness: BAD INPUT
Input:
3.) OK, now put the escapes back in for + and - so the code is back to its original state. Before continuing, change to ENABLEDELAYEDEXPANSION at the top of the script. (We're about to go off the deep end.)
- Testing with all the special characters shows them separating into 4 groups, which I call BBB, BBG, BGG and GGG. The G's and B's represent the "Good input" (failure to detect) or "Bad Input" (expected operation) displayed by each of the first three RegEx's, in order. So for example any character in the "BGG" group is still correctly caught by the first RegEx with delayed expansion on, but is no longer caught by the 2nd or 3rd RegEx (failure to operate as expected). I'm ignoring the 4th RegEx here since it served its purpose demonstrating the other issues; however, you will notice that it does catch some of the things that the other RegEx's miss. Just more confusion to add to the mix... Here are the groupings:
Code:
BBB ~ `
BBG + - = | \ ' ; ? [ ] { } , . /
BGG @ # $ % & * ( ) _ :
GGG ^ !
Obviously the ! character doesn't get caught because of the delayed expansion mode, and I found out that doubling the carets in the RegEx's (ie: ...@#$%%^^&*...) moves the caret into the BGG group. Of note, the 4th RegEx found the caret just fine without the doubling, which is annoying. Either way, what this leaves me with is the nonsensical situation where having my colon at the front of my RegEx allows it to match just about all special characters, but having it in the middle causes several characters to be missed, and having it at the end causes it to miss everything but ~ and `... Seriously?!
Hopefully that's enough examples of complete craziness with Findstr to demonstrate my point that I think it's just buggy. I admit that some of you experts will probably find sensible reasons for why a few of these things happen under these specific circumstances. User error on my part is even likely for some of these; maybe I could have escaped something better... But the larger point is that Findstr does not operate in a logical and consistent fashion. Depending on the order of the items in the RegEx, some characters break, but only in certain circumstances. It's a very manual trial-and-error process to fully test (and regression test) your Findstr RegEx's to make sure they're actually doing what you expect, even after seemingly minor changes.
Thanks for having a look, hope you guys have fun with the explanations.