Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
Moderator: DosItHelp
-
- Posts: 4
- Joined: 02 Mar 2022 07:45
Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
Hey everyone! I'm very new to batch scripting. I have some Python and VBA experience, but batch is new to me.
I'm trying to use a list of employee names from a termination list and search the files within a folder that contains the application lists of all the company's apps. This way if the employee appears on a user list, we know which app we must remove them from.
For example:
Termination List.txt - The list of employee names, with each name on a new line:
Ex:
John Doe
Jane Smith
etc
Folder containing all the Application user lists: "C:\Desktop\Application Userlists"
The user lists are in various formats: App1.xlsx, App2.csv, App3.pdf
The output file I'm aiming for:
TerminationScanner.xlsx or TerminationScanner.txt, etc - The file would show which employees appear on which lists:
App1.xlsx: John Doe, Jane Smith
App2.csv: Jane Smith
App3.pdf: John Doe
I've been trying to look at various websites and forums to try and understand it, but couldn't quite piece them together to make this.
Any help would be greatly appreciated! If you can help, I'd appreciate it even more if you could also explain how it works, as I do want to learn!
Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
[UPDATE] - Added a batch script I was trying to use below.
I've been trying to look at various websites and forums, as well as some questions here in Stack Overflow to try and understand it, but couldn't quite piece them together to make this. :/
Any help would be greatly appreciated! If you can help, I'd appreciate it even more if you could also explain how it works, as I do want to learn!
Below is the code I was trying to use. It appears to just list the different files in the source directory without matching and stating which employee names it found, even though I know for a fact that not all the files contain matching names. I'm not sure exactly what I'm doing wrong.
Code: Select all
@ECHO OFF
SETLOCAL
SET "sourcedir=\\FS01.winmain.local\main\InternalAudit\3. SOX\SOX 2021\2021 ITGC SOX\2. Final Fieldwork\Year-End Testing\YE.ITGC.04 - New Hires\Evidence\Application Userlists\Application Userlists - Year End"
rem Source directory of application files
SET "destdir=C:\Users\tpiccirillo\Desktop\Termination Scanner Output files"
rem Destination directory where report will go
for /f "delims=" %%a in ('dir /s /b "%sourcedir%\*.*"') do (
echo(===== file %%~nxa ===== FINDSTR /N /g:"C:\Users\tpiccirillo\Documents\Senior IT Auditor\2022 Goals\Termination Scanner Process\Batch Script Version\searchlist.txt" "%%~fa") >> "%destdir%\results.csv"
rem pnxa = path name and extension
GOTO :EOF
Below is what appears for each file in that source directory:
===== file Evidence\Application Userlists\Application Userlists - Year End\ActiveEmpGroupMembership-Report -01-31-2022.csv ===== FINDSTR /N /g:"C:\Batch Script Version\searchlist.txt" "C:\Evidence\Application Userlists\Application Userlists - Year End\ActiveEmpGroupMembership-Report -01-31-2022.csv"
Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
DigitalXanatos wrote:
The user lists are in various formats: App1.xlsx, App2.csv, App3.pdf
Your approach is doomed to failure. CSV is the only file format containing plain text. PDF may or may not work. XLSX is a compressed file format where you won't find anything using the capabilities of Batch scripting.
Steffen
Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
Hey Steffen! I can have them changed to csv if need be. And I believe xlsx can be unzipped according to Microsoft: “Xlsx files are just ZIP files, so you can simply unzip them right away using your favourite ZIP tool.“ https://answers.microsoft.com/en-us/mso ... 1a3f358040
Any ideas why my script’s output doesn’t seem to be working?
Any clarification would be great!
Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
Try something like this:
Code: Select all
@echo off &setlocal
set "sourcedir=\\FS01.winmain.local\main\InternalAudit\3. SOX\SOX 2021\2021 ITGC SOX\2. Final Fieldwork\Year-End Testing\YE.ITGC.04 - New Hires\Evidence\Application Userlists\Application Userlists - Year End"
set "destdir=%userprofile%\Desktop\Termination Scanner Output files"
set "listfile=%userprofile%\Documents\Senior IT Auditor\2022 Goals\Termination Scanner Process\Batch Script Version\searchlist.txt"
:: make sure they are all absolute paths
for %%i in ("%sourcedir%\.") do set "sourcedir=%%~fi"
for %%i in ("%destdir%\.") do set "destdir=%%~fi"
for %%i in ("%listfile%") do set "listfile=%%~fi"
:: get the list separator for CSV data from your registry
for /f "tokens=3" %%i in ('reg query "HKCU\Control Panel\International" /v "sList"') do set "listSeparator=%%i"
:: update the current directory (this makes FINDSTR return relative paths to ensure they don't contain a colon)
pushd "%sourcedir%"
REM redirect the stdout to the CSV file
>"%destdir%\results.csv" (
  REM read name by name in the list (NOTE: lines in the list must not contain trailing spaces!)
  for /f "usebackq tokens=*" %%i in ("%listfile%") do (
    REM recursively (/s) search the whole name incl. space (/c) in all files ("*.*"),
    REM use regex (/r) where "\<" and "\>" are for "word boundary" to ensure we
    REM don't match something like "John Doeskin" while looking for "John Doe"
    for /f "delims=" %%j in ('findstr /src:"\<%%i\>" "*.*"') do (
      REM we get something like "[file name]:[whole line containing the name]" and thus, we have to cut off the colon and the rest of the output
      for /f "delims=:" %%k in ("%%~j") do echo "%sourcedir%\%%k"%listSeparator%"%%i"
    )
  )
)
popd
pause
DigitalXanatos wrote: 05 Mar 2022 14:42
And I believe xlsx can be unzipped according to Microsoft: “Xlsx files are just ZIP files, so you can simply unzip them right away using your favourite ZIP tool.”
Fair enough. Feel free to implement it.
Steffen
Last edited by aGerman on 06 Mar 2022 06:30, edited 1 time in total.
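On the earlier question of why the original script only printed the FINDSTR command: in that script `echo(` and `FINDSTR` sit on the same line, so everything up to the closing parenthesis is echoed as plain text and FINDSTR never runs. Below is a minimal sketch of the same loop with the two commands on separate lines. The paths `C:\Userlists`, `C:\Scanner\searchlist.txt` and `C:\Scanner\results.txt` are hypothetical placeholders, and the `*.csv` mask, `/l` and `/i` are assumptions added for this sketch rather than part of the original script.

```batch
@echo off
setlocal
rem Hypothetical placeholder paths; substitute your own.
set "sourcedir=C:\Userlists"
set "listfile=C:\Scanner\searchlist.txt"
set "outfile=C:\Scanner\results.txt"

rem One redirection around the whole loop, so the report is rewritten once per run.
>"%outfile%" (
  for /f "delims=" %%a in ('dir /s /b /a-d "%sourcedir%\*.csv"') do (
    rem ECHO and FINDSTR are separate commands on separate lines.
    echo ===== file %%~nxa =====
    rem /g: reads the search strings from the list file, /l takes them literally, /i ignores case
    findstr /n /i /l /g:"%listfile%" "%%~fa"
  )
)
```

With this layout each file header is followed by the numbered lines that contain any of the listed names. It does not pair each name with its file the way aGerman's script above does.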
-
- Expert
- Posts: 1166
- Joined: 06 Sep 2013 21:28
- Location: Virginia, United States
Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
XLSX is specifically a zip of XML files, so if you're going to try and parse them in a language that isn't VBScript, you're going to have a really bad time. I cannot discourage this strongly enough.
Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
Thanks, ShadowThief, for explaining the issue with xlsx - Maybe I can use a macro to change all the xlsx files to CSV before having the script run.
Thanks, aGerman - I tried the script, but for some reason it only listed one file path at the top of the results file and then just listed all the employee names underneath.
results.csv:
\\FS01.winmain.local\main\InternalAudit\3. SOX\SOX 2021\2021 ITGC SOX\2. Final Fieldwork\Year-End Testing\YE.ITGC.04 - New Hires\Evidence\Application Userlists\Application Userlists - Year End\App1.xls John Doe
Jane Doh
Bruce Wayne
Peter Parker
In the searchlist, I have each name on one line and they are surrounded in quotes:
Searchlist.txt:
"John Doe"
"Jane Doh"
"Bruce Wayne"
"Peter Parker"
I tried running the script again without the quotes or with a comma at the end of each name, but then I get the error message that the search string is too long. According to the batch script, how should the names be listed?
Also, any ideas why the output file isn't getting the intended results?
Re: Finding Multiple Strings in Multiple Files and outputting a report showing which files contained which strings
Try to restrict the file search mask to plain text files (e.g. *.csv or *.txt rather than *.*) just to see if it works. IIRC I once got this error message for files with binary content, where everything is treated as one long line because it doesn't contain byte 0x0A (line feed).
Steffen
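Two adjustments from this thread can be sketched together: restricting the mask to plain-text files, and handling quoted names in searchlist.txt. The script above already wraps each name in its own quotes inside `/c:"..."`, so quoted entries in the list would likely break that quoting. As an illustration only, the loop below strips any surrounding quotes from each name with the `%%~i` modifier and searches only `*.csv`. The paths are hypothetical placeholders, and `/i` and `/m` are assumptions added for this sketch, not part of the script above.

```batch
@echo off
setlocal
rem Hypothetical placeholder paths; substitute your own.
set "sourcedir=C:\Userlists"
set "listfile=C:\Scanner\searchlist.txt"

pushd "%sourcedir%" || exit /b 1
for /f "usebackq tokens=*" %%i in ("%listfile%") do (
  rem The ~ modifier removes surrounding quotes from the name, if present.
  echo Searching for: %%~i
  rem /m prints only the names of files containing a match; the csv mask skips binary formats.
  findstr /s /i /m /r /c:"\<%%~i\>" "*.csv"
)
popd
```

Because `/r` treats the name as a regular expression, a dot in a name (for example a middle initial) matches any character. Drop `/i` if the case must match exactly.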