findstr for file extension at end of line

Posted: 11 Apr 2017 15:07
by aisha
Hello DOS Tips
We have a list of URLs in a txt file "c:\files\list.txt"

We want to keep only the URLs that end in one of the filetypes we care about.
For instance, only keep URLs that end in PDF, DOC, PPT, htm, or html and dump only these into "c:\files\list_scrubbed.txt"

What we have is:
findstr ".pdf" c:\files\list.txt >c:\files\list_scrubbed.txt
findstr ".doc" c:\files\list.txt >>c:\files\list_scrubbed.txt
findstr ".ppt" c:\files\list.txt >>c:\files\list_scrubbed.txt
findstr ".htm" c:\files\list.txt >>c:\files\list_scrubbed.txt
findstr ".html" c:\files\list.txt >>c:\files\list_scrubbed.txt

However, if the filetype appears anywhere else in the URL, not just at the end, the URL still gets fed into the scrubbed list.
We are trying to drop any URLs that do not actually END in the filetype.
So http://www.somedomain.com/FlashyBadgers ... hadows.asp
is still ending up in the scrubbed file.

Any advice?

Aisha

Re: findstr for file extension at end of line

Posted: 11 Apr 2017 15:40
by aGerman
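FINDSTR treats the search string as a "Regular Expression", where the . stands for "any character". To match a literal period you need to escape it with a backslash, and to match only at the end of a line you need the /e option.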

findstr /rie "\.pdf \.doc \.ppt \.html*" c:\files\list.txt >c:\files\list_scrubbed.txt

Options:
/r use regular expressions
/i ignore case
/e match only at the end of a line

Regular expressions:
\. literal period
* zero or more occurrences of the character before the asterisk, so \.html* matches both .htm and .html
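
To try the options and expressions above on something small, here is a minimal sketch. It assumes a throwaway sample file in %TEMP% and made-up example.com URLs (both are hypothetical, not from the original list), and it passes each extension as its own /c: search string instead of one space-separated string. It is only an illustration of the technique, not a drop-in replacement.

```bat
@echo off
rem Hypothetical sample data, written to a temporary file for illustration only.
set "list=%TEMP%\list_demo.txt"
> "%list%" (
  echo http://example.com/report.PDF
  echo http://example.com/page.html
  echo http://example.com/page.htm
  echo http://example.com/pdf-viewer.asp
  echo http://example.com/file.pdf?download=1
)

rem Only the first three sample URLs END in a wanted extension.
rem /r = regular expressions, /i = ignore case, /e = match only at end of line.
rem Each /c:"..." is one search string; a line is kept if any of them matches.
findstr /r /i /e /c:"\.pdf" /c:"\.doc" /c:"\.ppt" /c:"\.html*" "%list%"

rem Clean up the temporary sample file.
del "%list%"
```

Keep in mind that with /e a URL ending in .docx or .pptx is not matched by \.doc or \.ppt. If those are wanted too, patterns such as \.docx* and \.pptx* would cover both spellings.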

For further information execute FINDSTR /? or have a look at the command index.

Steffen

Re: findstr for file extension at end of line

Posted: 11 Apr 2017 16:21
by aisha
that is perfect - thank you sir

Aisha