DosTips.com • FINDSTR exclusion pattern does not work. Backslashes prob



Posted: 03 May 2014 00:34

by pstein

I have a command-line program whose output includes (among other text) lines with filenames.

Some of these lines should be excluded from the output, so I use the FINDSTR command (in a DOS batch script) like this:

mycmd.exe /someflags |FINDSTR /V /C:"D:\temp\" |FINDSTR "...."

Unfortunately the output still contains lines which contain the pattern

D:\temp\

why?

Do I have to mask the backslashes in the exclusion filter somehow?

Peter

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 06:21

by Squashman

pstein wrote: why?

Do I have to mask the backslashes in the exclusion filter somehow?

Yes. Read the help for the FINDSTR command and you will see why.

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 07:21

by foxidrive

Squashman wrote:
pstein wrote: why?

Do I have to mask the backslashes in the exclusion filter somehow?

Yes. Read the help for the FINDSTR command and you will see why.

I don't see why - it seems like a bug in findstr and is probably mentioned on Dave's StackOverflow findstr page.

Code: Select all

echo D:\temp\|FINDSTR /I /L /V "D:\temp\"
D:\temp\

echo D:\temp\|FINDSTR /I /L /V "D:\temp"

It only fails when the trailing backslash is added, even when the /literal switch is used.

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 08:23

by Squashman

I suppose you only need to escape with a regular expression. But since they are not using a regular expression, why not use FIND instead?
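
For what it's worth, a FIND version of the original filter might look like the sketch below. The two ECHO lines and their file names are hypothetical stand-ins for the output of mycmd.exe /someflags; /V inverts the match and /I ignores case. The second filter from the first post is left out because its pattern was not given.

```batch
@echo off
rem Sketch only. The two ECHO lines stand in for "mycmd.exe /someflags";
rem the file names are made up for illustration.
rem FIND takes the quoted string literally, so the backslashes are not doubled.
(
  echo D:\temp\scratch.txt
  echo D:\data\keep.txt
) | find /V /I "D:\temp\"
```

The intent is that only lines without D:\temp\ survive the filter.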

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 10:14

by Compo

foxidrive wrote: It only fails when the trailing backslash is added, even when the /literal switch is used.

…well it appears to be the double quotes that are problematic

Code: Select all

>echo(D:\temp\|findstr/iv d:\temp\

>echo(D:\temp\|findstr/ivl d:\temp\

>echo(D:\temp\|findstr/iv "d:\temp\"
D:\temp\

>echo(D:\temp\|findstr/ivl "d:\temp\"
D:\temp\

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 12:41

by foxidrive

It must be a parsing error: findstr is escaping the double quote, with the \" at the end.

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 16:03

by dbenham

FINDSTR backslash and quote escaping is whacked when doing a literal search


Code: Select all

C:\test>echo d:\temp\|findstr /xlv "d:\temp\"
d:\temp\

C:\test>echo d:\temp\|findstr /xlv "d:\\temp\\"

C:\test>echo d:\temp\|findstr /xlv "d:\temp\\"

C:\test>echo d:\temp\|findstr /xlv d:\temp\

C:\test>echo d:\temp\|findstr /xlv d:\\temp\\

C:\test>

It's not the whole story, but the best I've got is at http://stackoverflow.com/a/8844873/1012053, under the headings "Escaping Quote within command line search strings" and "Escaping Backslash within command line literal search strings".

I find it is generally safe to always escape quotes as \" and backslashes as \\
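
Applied to the original question, that rule might look like the minimal sketch below. The two ECHO lines and their file names are hypothetical stand-ins for the output of mycmd.exe /someflags, and /L and /I are added here only to make the literal, case-insensitive intent explicit.

```batch
@echo off
rem Sketch only. The two ECHO lines stand in for "mycmd.exe /someflags";
rem the file names are made up for illustration.
rem Each backslash in the search string is written as \\ so that the final
rem backslash is not read as escaping the closing quote.
(
  echo D:\temp\scratch.txt
  echo D:\data\keep.txt
) | findstr /V /L /I /C:"D:\\temp\\"
```

The intent is that lines containing D:\temp\ are dropped and everything else passes through.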

But there is a nasty special case when the search string includes quotes and backslashes with a particular pattern.

Code: Select all

C:\test>echo "d:\temp\"|findstr /xlv \"d:\\temp\\\"
"d:\temp\"

C:\test>echo "d:\temp\"|findstr /xlv \"d:\\temp\\\\\"

C:\test>

All the quotes and backslashes must be escaped, except that the last backslash must be double escaped.

I try to explain the pattern in my StackOverflow link

Dave Benham

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 16:19

by carlos

@dbenham: Why use /x and /v together?
Aren't they opposites?

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 16:33

by penpen

Backslashes and double quotes are not the only characters affected by escaping:
it happens to all metacharacters (characters with a special meaning in regular expressions),
so I think findstr is just not well documented.

The metacharacter escape sequences are recognized in every mode (at least under my Win XP Home 32-bit),
although they are not of much use with literal search strings:

Code: Select all

Z:\>echo("<>.a\*^$"|findstr /l "\"\^<\^>\.a\\\*\^^\$\""
"<>.a\*^$"

penpen

Edit: Added <>.

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 17:25

by carlos

Squashman wrote: I suppose you only need to escape with a regular expression. But since they are not using a regular expression, why not use FIND instead?

Yes, find works OK, unlike findstr /L:

Code: Select all

C:\Users\Carlos>echo "d:\temp\"|find "d:\temp\"
"d:\temp\"

C:\Users\Carlos>echo "d:\temp\"|findstr /L "d:\temp\"

findstr is really buggy

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 23:04

by dbenham

carlos wrote: @dbenham: Why use /x and /v together?
Aren't they opposites?

No, they work fine together. Combined, they return all lines that don't match the search string exactly.

My examples probably are easier to follow without /V, but I kept it to be more in line with the original post.

penpen wrote: Backslashes and double quotes are not the only characters affected by escaping:
it happens to all metacharacters (characters with a special meaning in regular expressions),
so I think findstr is just not well documented.

The metacharacter escape sequences are recognized in every mode (at least under my Win XP Home 32-bit),
although they are not of much use with literal search strings:
Code: Select all
Z:\>echo("<>.a\*^$"|findstr /l "\"\^<\^>\.a\\\*\^^\$\""
"<>.a\*^$"

Yes, but the only characters that must be escaped when doing a literal search are backslash and quote.

Code: Select all

C:\test>echo("<>.a\*^$"|findstr /lx \"<>.a\\*^$\"
"<>.a\*^$"

C:\test>echo("<>.a\*^$"|findstr /lx "<>.a\*^$"

C:\test>

It is interesting that non-meta-characters cannot be escaped with literal searches, but can be escaped with regular expressions:

Code: Select all

C:\test>echo ABC |findstr /r \A\B\C
ABC

C:\test>echo ABC |findstr /l \A\B\C

C:\test>

Dave Benham

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 03 May 2014 23:35

by Magialisk

I wish I had my own crazy findstr example with me here at home. I'll have to grab it from my work PC and see if you guys can make any sense of it...

I had a findstr sequence that was intended to test a string for special characters. Of course it needed various escaping for some of them, but after a lot of trial and error I got it working *almost* perfectly. I can remember two specific problems that it had.

One of them was that there were two characters that seemed to be related, let's say the colon and the semicolon for example. If I told findstr to match a pattern that included the semicolon, it would *also* match any string that included a colon, even when there was no semicolon in the string and I wasn't asking it to match colons! I don't think those were the actual characters that exhibited this; it might have been periods and commas, or some other pair entirely. I'll get the code and post an update...

The second oddity was that one character, this one I think was the colon?, could not be matched by findstr at all. If I put it in the regular expression set to be matched, it wouldn't just fail to match strings with colons (or whatever character this was) it would crash the parser with some kind of syntax error. No matter what I did I couldn't get it to accept that character in the set.

So I ended up with a Findstr expression that matched one character even when I didn't explicitly ask it to, while the same expression couldn't match another character no matter how nicely I asked.

I agree with the statement above that Findstr is just buggy. We might think we mostly understand it, but I think there are dragons inside.

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 04 May 2014 02:09

by penpen

dbenham wrote:
penpen wrote:(...)
Code: Select all
Z:\>echo("<>.a\*^$"|findstr /l "\"\^<\^>\.a\\\*\^^\$\""
"<>.a\*^$"
Yes, but the only characters that must be escaped when doing a literal search are backslash and quote.
Code: Select all
C:\test>echo("<>.a\*^$"|findstr /lx \"<>.a\\*^$\"
"<>.a\*^$"

C:\test>echo("<>.a\*^$"|findstr /lx "<>.a\*^$"

C:\test>

I think the double \ above is needed because the following character is a metacharacter, so with a single backslash findstr searches for a different string, as in the original problem:

Code: Select all

Z:\>echo("<>.a*^$"|findstr /L ^"\"<>.a\*^$\"^"
"<>.a*^$"

Findstr is also able to find this:

Code: Select all

Z:\>echo D:\temp\ | findstr /L D:\temp\
D:\temp\

Findstr just closes unfinished (open) strings (so foxidrive was right that findstr "is escaping the double quote, with the \" at the end", although I don't see this as a parsing error):

Code: Select all

Z:\>echo D:\temp^"|findstr /L "D:\temp\"
D:\temp"

Z:\>echo D:\temp^"|findstr /L "D:\temp\""
D:\temp"

dbenham wrote: It is interesting that non-meta-characters cannot be escaped with literal searches, but can be escaped with regular expressions:
Code: Select all
C:\test>echo ABC |findstr /r \A\B\C
ABC

C:\test>echo ABC |findstr /l \A\B\C

C:\test>

I assume this is because (for example) \A is a regular expression, whose computation is disabled when using literal searches, while "\\" (and similar) are escape sequences, that should be recognized always.

penpen

Edit: Corrected some flaws.

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 27 May 2014 14:04

by Magialisk

Magialisk wrote: I wish I had my own crazy findstr examples with me here at home. I'll have to grab them from my work PC and see if you guys can make any sense of it...

One of them was that there were two characters that seemed to be related, let's say the colon and the semicolon for example. If I told findstr to match a pattern that included the semicolon, it would *also* match any string that included a colon, even when there was no semicolon in the string and I wasn't asking it to match colons! I don't think those were the actual characters that exhibited this; it might have been periods and commas, or some other pair entirely. I'll get the code and post an update...

The second oddity was that one character, this one I think was the colon?, could not be matched by findstr at all. If I put it in the regular expression set to be matched, it wouldn't just fail to match strings with colons (or whatever character this was) it would crash the parser with some kind of syntax error. No matter what I did I couldn't get it to accept that character in the set.

Someone just reminded me that I wanted to revisit this thread and provide sample code from my work PC that exhibited weird Findstr errors.

The errors aren't exactly as I described them above from memory, but close enough, and I whipped up some sample code to demonstrate them:

Code: Select all

@echo off
SETLOCAL DISABLEDELAYEDEXPANSION

:LOOP
set /p testStr="Input:  "

:: Protects against double quotes in the input, by changing them to a different disallowed character
set "testStr=%testStr:"=Z%"

:: Protects against <, <<, > and >> by changing them to a different disallowed character
set "testStr=%testStr:>>=Z%"
set "testStr=%testStr:<<=Z%"
set "testStr=%testStr:>=Z%"
set "testStr=%testStr:<=Z%"

:: Three variations on FindStr RegEx (will be important later)
set /p ans="Colon test in front:  " <nul
echo "%testStr%" | findstr /R "[\:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_;=\+\-[{\]}\\|',./?]" >nul
IF %ERRORLEVEL%==0 (echo BAD INPUT) ELSE (echo GOOD INPUT)

set /p ans="Colon test in middle: " <nul
echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_\:;=\+\-[{\]}\\|',./?]" >nul
IF %ERRORLEVEL%==0 (echo BAD INPUT) ELSE (echo GOOD INPUT)

set /p ans="Colon test at end:    " <nul
echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_;=\+\-[{\]}\\|',./?\:]" >nul
IF %ERRORLEVEL%==0 (echo BAD INPUT) ELSE (echo GOOD INPUT)

:: colon and semicolon get caught in this findstr even though they aren't in the set...
set /p ans="Semi&Colon Weirdness: " <nul
echo "%testStr%" | findstr /R "[~`!@#$%^&*()-_=+[{\]}\\|',<.>/?]" >nul
IF %ERRORLEVEL%==0 (echo BAD INPUT) ELSE (echo GOOD INPUT)

GOTO:LOOP

What you see above is a simple loop that takes user input and runs it against 4 different Findstr RegEx's. The top three are supposed to match any non-number character as "bad input". I was using this code to accept IP addresses from users in one script, except with the '.' removed from the RegEx in that case. The only difference between the three is the position of the escaped colon character in the RegEx. In the first one it's the first character, the second one has it in the middle, where I felt was the "logical" place for it in relation to other characters, and the third has it as the last character. This may not seem important, but it will be later...

The 4th RegEx is completely different, and demonstrates a completely different issue where semi-colons and colons get matched even though neither character exists in the RegEx set. I was using this RegEx in a script about 18 months ago to match special characters, and I guarantee it has other things wrong with it (just see the lack of escaping for example compared to my "newer" expressions 1-3), but it works well enough to show this specific issue.

So here's the tests to perform if you want to see these issues:
1.) Run the code as is (delayed expansion disabled) and try various strings:
- Any string with ; or : will say "BAD INPUT" on the 4th RegEx, even though it wasn't supposed to match those.

Code: Select all

Input:  123;
Colon test in front:  BAD INPUT
Colon test in middle: BAD INPUT
Colon test at end:    BAD INPUT
Semi&Colon Weirdness: BAD INPUT
Input:  +
Colon test in front:  BAD INPUT
Colon test in middle: BAD INPUT
Colon test at end:    BAD INPUT
Semi&Colon Weirdness: BAD INPUT

- The first 3 RegEx's work perfectly (as far as I can tell)

2.) Modify any or all of the first three RegEx to remove the escape '\' from the + and - characters
- Notice that even though the 4th RegEx works fine matching un-escaped + (but not -), the first three now fail to match either + or -:

Code: Select all

Input:  +
Colon test in front:  GOOD INPUT
Colon test in middle: GOOD INPUT
Colon test at end:    GOOD INPUT
Semi&Colon Weirdness: BAD INPUT
Input:  -
Colon test in front:  GOOD INPUT
Colon test in middle: GOOD INPUT
Colon test at end:    GOOD INPUT
Semi&Colon Weirdness: GOOD INPUT
Input:  123+++321---+-+-+-
Colon test in front:  GOOD INPUT
Colon test in middle: GOOD INPUT
Colon test at end:    GOOD INPUT
Semi&Colon Weirdness: BAD INPUT
Input:

3.) OK, now put the escapes back in for + and - so the code is back to its original state. Before continuing, change to ENABLEDELAYEDEXPANSION at the top of the script. (We're about to go off the deep end.)
- Testing with all the special characters shows them separating into 4 groups, which I call BBB, BBG, BGG and GGG. The G's and B's represent the "Good input" (failure to detect) or "Bad Input" (expected operation) displayed by each of the three RegEx's. So for example any character in the "BGG" group is still correctly caught by the first RegEx with delayed expansion on, but no longer caught by the 2nd or 3rd RegEx (failure to operate as expected). I'm ignoring the 4th RegEx here since it served its purpose demonstrating the other issues; however, you will notice that it does catch some of the things that the other RegEx's miss. Just more confusion to add to the mix... Here are the groupings:

Code: Select all

BBB ~ `
BBG + - = | \ ' ; ? [ ] { } , . /
BGG @ # $ % & * ( ) _ :
GGG ^ !

Obviously the ! character doesn't get caught because of the delayed expansion mode and I found out that doubling the carets in the RegEx's (ie: ...@#$%%^^&*...) moves the caret into the BGG group. Of note, the 4th RegEx found the caret just fine without the doubling, which is annoying. Either way, what this leaves me with is the nonsensical situation where having my colon at the front of my RegEx allows it to match just about all special characters, but having it in the middle causes several characters to be missed and at the end causes it to miss everything but ~ and `... Seriously?!

Hopefully that's enough examples of complete craziness with Findstr to demonstrate my point that I think it's just buggy. I admit that some of you experts will probably find sensible reasons for why a few of these things happen under these specific circumstances. User error on my part is even likely for some of these, maybe I could have escaped something better... But the larger point is that Findstr does not operate in a logical and consistent fashion. Depending on the order of the items in the RegEx, some characters break, but only in certain circumstances. It's a very manual trial-and-error process to fully test (and regression test) your Findstr RegEx's to make sure they're actually doing what you expect, even after seemingly minor changes.

Thanks for having a look, hope you guys have fun with the explanations

Re: FINDSTR exclusion pattern does not work. Backslashes pro

Posted: 27 May 2014 17:56

by penpen

1.)
The 4th regular expression does contain the ';'.
It is hidden in ")-_" ('-' is a metacharacter that defines character ranges), which should match ")*,./:;?@[\]^_" (using codepages compatible with ASCII):

Code: Select all

echo ")" | findstr /R "[)-_]"
echo "*" | findstr /R "[)-_]"
echo "," | findstr /R "[)-_]"
:: ...
echo "_" | findstr /R "[)-_]"

The '-' character is a metacharacter which has to be escaped if you want to recognize it.
In addition the '%' has to be doubled in batch files, so this should work:

Code: Select all

echo "%testStr%" | findstr /R "[~`!@#$%%^&*()\-_=+[{\]}\\|',<.>/?]"

Side note: character ranges are implemented a little strangely (I don't understand how they are implemented);
the characters in the ASCII range ')' to '_' are ")*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_".
But a range often seems to use only characters of the same type, so here the signs (+ and -), digits, comparison operators, and letters are left out.
Sadly, that isn't the rule for all ranges; for example:
- "+--" will match nothing,
- ")-A" will match (among others) ")*+,.\0123456789:;<=>?@A[\]^_a{|}~" (in my eyes the a and A part is especially bad),
- "A-B" will match "ABbâäàåÄÅæÆáªÁÂÀãÃ",
- ...
So the character ranges aren't based on ASCII, UCS-2, UTF-16, or any other character order I've ever seen.

2.)
If you unescape the metacharacter '-' (an unescaped '+' should be no problem, as it is not a metacharacter),
then the regular expression part "+-[" is read as a range that matches nothing, so none of the characters in "+-[" is recognized anymore
(codepages as above):

Code: Select all

(echo ";" & echo "-" & echo "_" ) | findstr /R "[+-[]"

3.)
This is not a findstr issue, because delayed expansion alters the command that is executed:
http://stackoverflow.com/questions/4094699/how-does-the-windows-command-interpreter-cmd-exe-parse-scripts

Code: Select all

:: example 1
echo "%testStr%" | findstr /R "[\:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_;=\+\-[{\]}\\|',./?]" ^>nul

:: result after delayedExpansion (which is executed; without the leading :: )
:: echo "%testStr%" | findstr /R "[\:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`@#$%&*()_;=\+\-[{\]}\\|',./?]" >nul
:: missing characters: !^

:: delayed expansion version of example 1
echo "%testStr%" | findstr /R "[\:abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`^!@#$%%^^&*()_;=\+\-[{\]}\\|',./?]" >nul

:: #####################################

:: example 2
echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_\:;=\+\-[{\]}\\|',./?]" ^>nul

:: result after delayedExpansion (which is executed; without the leading :: )
:: echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`;=\+\-[{\]}\\|',./?]" >nul
:: missing characters: !@#$%^&*()_\:

:: delayed expansion version of example 2
echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`^!@#$%%^^&*()_\:;=\+\-[{\]}\\|',./?]" >nul

:: #####################################

:: example 3
echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`!@#$%%^&*()_;=\+\-[{\]}\\|',./?\:]" ^>nul

:: result 3 after delayedExpansion (which is executed; without the leading :: )
:: echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`]" >nul
:: missing characters: !@#$%^&*()_;=\+\-[{\]}\\|',./?\:

:: delayed expansion version of example 3
echo "%testStr%" | findstr /R "[abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ~`^!@#$%%^^&*()_;=\+\-[{\]}\\|',./?\:]" >nul

penpen