Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings

Posted: 02 Mar 2022 07:56
by DigitalXanatos
Hey everyone! I'm very new to batch scripting. I have some Python and VBA experience, but batch is new to me :)

I'm trying to use a list of employee names from a termination list and search the files within a folder that contains the application lists of all the company's apps. This way if the employee appears on a user list, we know which app we must remove them from.

For example:

Termination List.txt - The list of employee names. With each name on a new line:

Ex:
John Doe
Jane Smith
etc

Folder containing all the Application user lists: "C:\Desktop\Application Userlists"
The user lists are in various formats: App1.xlsx, App2.csv, App3.pdf

The output file I'm aiming for:
TerminationScanner.xlsx or TerminationScanner.txt, etc - The file would show which employees appear on which lists:

App1.xlsx: John Doe, Jane Smith
App2.csv: Jane Smith
App3.pdf: John Doe

I've been trying to look at various websites and forums to try and understand it, but couldn't quite piece them together to make this. :?

Any help would be greatly appreciated! If you could also explain how it works, I'd appreciate it even more, as I do want to learn! :)

Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings

Posted: 04 Mar 2022 13:07
by DigitalXanatos
[UPDATE] - Added a batch script I was trying to use below.

Below is the code I was trying to use:

@ECHO OFF
SETLOCAL
SET "sourcedir=\\FS01.winmain.local\main\InternalAudit\3. SOX\SOX 2021\2021 ITGC SOX\2. Final Fieldwork\Year-End Testing\YE.ITGC.04 - New Hires\Evidence\Application Userlists\Application Userlists - Year End"
rem Source directory of application files
SET "destdir=C:\Users\tpiccirillo\Desktop\Termination Scanner Output files"
rem Destination directory where report will go
for /f "delims=" %%a in ('dir /s /b "%sourcedir%\*.*"') do (
  echo(===== file %%~nxa ===== FINDSTR /N /g:"C:\Users\tpiccirillo\Documents\Senior IT Auditor\2022 Goals\Termination Scanner Process\Batch Script Version\searchlist.txt" "%%~fa") >> "%destdir%\results.csv"
  rem nxa = file name and extension only, without drive or path
GOTO :EOF
But it appears to just list the different files in that source directory without matching and stating which employee names it found. I know for a fact that not all the files have names that match. I'm not sure exactly what I'm doing wrong.

Below is what appears for each file in that source directory:

===== file Evidence\Application Userlists\Application Userlists - Year End\ActiveEmpGroupMembership-Report -01-31-2022.csv ===== FINDSTR /N /g:"C:\Batch Script Version\searchlist.txt" "C:\Evidence\Application Userlists\Application Userlists - Year End\ActiveEmpGroupMembership-Report -01-31-2022.csv"

Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings

Posted: 05 Mar 2022 09:30
by aGerman
Your approach is doomed to failure.
DigitalXanatos wrote:
The user lists are in various formats: App1.xlsx, App2.csv, App3.pdf
CSV is the only one of these file formats that contains plain text. PDF may or may not work. XLSX is a compressed file format, so a plain-text search with the capabilities of batch scripting won't find anything in it.

Steffen
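
As a rough illustration of the plain-text case, here is a minimal Python sketch (Python rather than batch, since the original poster mentions Python experience) that searches CSV files for each name and builds the "file: names" report described in the first post. The function names `scan_folder` and `format_report`, the `*.csv` pattern, UTF-8 decoding, and case-insensitive substring matching are all assumptions made for this example, not something taken from the thread.

```python
from pathlib import Path


def scan_folder(folder, names, pattern="*.csv"):
    """Map each matching file (path relative to folder) to the names found in it."""
    report = {}
    for path in sorted(Path(folder).rglob(pattern)):
        # Assumption: files are UTF-8 text; undecodable bytes are replaced.
        text = path.read_text(encoding="utf-8", errors="replace").casefold()
        # Plain substring test, ignoring case (no word-boundary check here).
        found = [name for name in names if name.casefold() in text]
        if found:
            report[path.relative_to(folder).as_posix()] = found
    return report


def format_report(report):
    """Render one 'file: name, name' line per file that had at least one hit."""
    return "\n".join(
        f"{fname}: {', '.join(found)}" for fname, found in report.items()
    )
```

Calling `print(format_report(scan_folder(folder, names)))` with the folder path and the list of names would print one line per file that contains at least one name. Files without any hit are left out of the report.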

Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings

Posted: 05 Mar 2022 14:42
by DigitalXanatos
Hey Steffen! I can have them changed to CSV if need be. And I believe XLSX can be unzipped according to Microsoft: “Xlsx files are just ZIP files, so you can simply unzip them right away using your favourite ZIP tool.” https://answers.microsoft.com/en-us/mso ... 1a3f358040

Any ideas why my script’s output doesn’t seem to be working?

Any clarification would be great!

Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings

Posted: 05 Mar 2022 17:27
by aGerman
Try something like this:

@echo off &setlocal

set "sourcedir=\\FS01.winmain.local\main\InternalAudit\3. SOX\SOX 2021\2021 ITGC SOX\2. Final Fieldwork\Year-End Testing\YE.ITGC.04 - New Hires\Evidence\Application Userlists\Application Userlists - Year End"
set "destdir=%userprofile%\Desktop\Termination Scanner Output files"
set "listfile=%userprofile%\Documents\Senior IT Auditor\2022 Goals\Termination Scanner Process\Batch Script Version\searchlist.txt"

:: make sure they are all absolute paths
for %%i in ("%sourcedir%\.") do set "sourcedir=%%~fi"
for %%i in ("%destdir%\.") do set "destdir=%%~fi"
for %%i in ("%listfile%") do set "listfile=%%~fi"

:: get the list separator for CSV data from your registry
for /f "tokens=3" %%i in ('reg query "HKCU\Control Panel\International" /v "sList"') do set "listSeparator=%%i"

:: update the current directory (this makes FINDSTR return relative paths to ensure they don't contain a colon)
pushd "%sourcedir%"

REM redirect the stdout to the CSV file
>"%destdir%\results.csv" (
  REM read name by name in the list (NOTE: lines in the list must not contain trailing spaces!)
  for /f "usebackq tokens=*" %%i in ("%listfile%") do (
    REM recursively (/s) search the whole name incl. space (/c) in all files ("*.*"),
    REM use regex (/r) where "\<" and "\>" are for "word boundary" to ensure we
    REM  don't match something like "John Doeskin" while looking for "John Doe"
    for /f "delims=" %%j in ('findstr /src:"\<%%i\>" "*.*"') do (
      REM we get something like "[file name]:[whole line containing the name]" and thus, we have to cut off the colon and the rest of the output
      for /f "delims=:" %%k in ("%%~j") do echo "%sourcedir%\%%k"%listSeparator%"%%i"
    )
  )
)

popd

pause
DigitalXanatos wrote:
05 Mar 2022 14:42
And I believe xlsx can be unzipped according to Microsoft: “Xlsx files are just ZIP files, so you can simply unzip them right away using your favourite ZIP tool.”
Fair enough. Feel free to implement it.

Steffen
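
The word-boundary idea behind `\<` and `\>` in the script above can be sketched in Python as well, for readers more at home there. This is only an illustration: `contains_name` and its `ignore_case` parameter are hypothetical names, and Python's `\b` is a close cousin of FINDSTR's word-boundary markers, not a guaranteed exact equivalent.

```python
import re


def contains_name(line, name, ignore_case=False):
    """Return True if name occurs in line as a whole word sequence."""
    # re.escape keeps characters such as "." in a name from acting as regex syntax.
    pattern = r"\b" + re.escape(name) + r"\b"
    flags = re.IGNORECASE if ignore_case else 0
    return re.search(pattern, line, flags) is not None
```

With this helper, a line containing "John Doeskin" is not reported for "John Doe", because the "s" that follows "Doe" means there is no word boundary at that position. Matching is case-sensitive unless `ignore_case=True` is passed.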

Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings

Posted: 07 Mar 2022 09:33
by ShadowThief
XLSX is specifically a zip of XML files, so if you're going to try and parse them in a language that isn't vbscript, you're going to have a really bad time. I cannot discourage this strongly enough.
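
To make the "zip of XML files" point concrete, here is a hedged Python sketch that opens an XLSX file as a ZIP archive and pulls out the shared string table. It assumes the workbook stores its text in the conventional `xl/sharedStrings.xml` part. Cells holding numbers or inline strings would not show up there, so this is not a complete reader. The function name `shared_strings` is made up for the example.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace used by SpreadsheetML parts such as sharedStrings.xml.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}


def shared_strings(xlsx):
    """Return the text of every entry in xl/sharedStrings.xml (empty list if absent)."""
    with zipfile.ZipFile(xlsx) as archive:
        if "xl/sharedStrings.xml" not in archive.namelist():
            return []
        root = ET.fromstring(archive.read("xl/sharedStrings.xml"))
    # An <si> entry may split its text across several <t> elements (rich text),
    # so the pieces are joined back together.
    return [
        "".join(t.text or "" for t in si.iterfind(".//m:t", NS))
        for si in root.iterfind("m:si", NS)
    ]
```

The returned list could then be searched for the names on the termination list. For real workbooks, a dedicated spreadsheet library or a one-off export to CSV is likely the less fragile route.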

Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings

Posted: 08 Mar 2022 06:13
by DigitalXanatos
Thanks, ShadowThief, for explaining the issue with xlsx - Maybe I can use a macro to change all the xlsx files to CSV before having the script run.

Thanks, aGerman - I tried the script, but for some reason it only listed one file path at the top of the results file and then just listed all the employee names underneath.

results.csv:
\\FS01.winmain.local\main\InternalAudit\3. SOX\SOX 2021\2021 ITGC SOX\2. Final Fieldwork\Year-End Testing\YE.ITGC.04 - New Hires\Evidence\Application Userlists\Application Userlists - Year End\App1.xls John Doe
Jane Doh
Bruce Wayne
Peter Parker


In the searchlist, I have each name on one line and they are surrounded in quotes:

Searchlist.txt:
"John Doe"
"Jane Doh"
"Bruce Wayne"
"Peter Parker"

I tried running the script again without the quotes, and with a comma at the end of each name, but then I get the error message that the search string is too long. How does the batch script expect the names to be listed?

Also, any ideas why the output file isn't getting the intended results?
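
On the list format: the script reads each line of the search list as the literal text to look for (its comment also warns about trailing spaces), so quotes or commas on a line appear to become part of the search string. One way to sidestep that is to normalize the list before searching. The Python sketch below shows the idea. `clean_names` is a hypothetical helper, and the choice to drop duplicates is an assumption.

```python
def clean_names(lines):
    """Strip whitespace, trailing commas and surrounding double quotes; skip blanks."""
    names = []
    for line in lines:
        name = line.strip().rstrip(",").strip().strip('"').strip()
        # Keep the first occurrence of each name and ignore empty lines.
        if name and name not in names:
            names.append(name)
    return names
```

Each cleaned name is then just the bare text, for example `John Doe`, one per line, which is the form the search expects.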

Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings

Posted: 08 Mar 2022 09:19
by aGerman
Try to restrict the file search mask to plain text files (e.g. *.csv or *.txt rather than *.*), just to see if it works. IIRC, I once got this error message for files with binary content, where everything is treated as one long line because it doesn't contain the byte 0x0A (line feed).

Steffen
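
If the folder mixes text and binary files, a quick pre-check can separate them before any searching. The Python sketch below uses a common heuristic, assumed here rather than taken from the thread: a file whose first bytes contain a NUL byte is treated as binary. The names `looks_like_text` and `plain_text_files` and the 8192-byte sample size are illustrative. UTF-16 text files would be misclassified as binary by this test.

```python
from pathlib import Path


def looks_like_text(sample: bytes) -> bool:
    """Heuristic: plain text rarely contains NUL bytes, binary formats usually do."""
    return b"\x00" not in sample


def plain_text_files(folder, sample_size=8192):
    """Yield files under folder whose first sample_size bytes pass the heuristic."""
    for path in sorted(Path(folder).rglob("*")):
        if path.is_file():
            with path.open("rb") as handle:
                if looks_like_text(handle.read(sample_size)):
                    yield path
```

An XLSX file fails the check right away, since the ZIP header at its start contains NUL bytes, while an ordinary CSV export passes.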