Scan multiple files and look for recurrent series in windows?


zipomojiv
Posts: 1
Joined: 22 Mar 2018 08:36

Scan multiple files and look for recurrent series in windows?

#1 Post by zipomojiv » 22 Mar 2018 08:41

I have an undefined number of files (each file is just one long string) and I need to find the most common series of 8 or more recurring characters among those files.

When we get these files we don't know what string we are looking for; we only know that we need to generate a list of the most frequently recurring character series among them.

For example, say we have 3 files that contain these strings:

Code:

sdgfghybuyvadfhulookingforthissdfac gafdg342!312d00000000

Code:

123 000000002lookingforthis353245

Code:

3453453//£$lookingforthisf4 435@tew

The string

Code:

lookingforthis
is the 1st most recurrent series and

Code:

00000000
is the 2nd one.

The search output would be (CSV formatted):

Code:

lookingforthis, 00000000, [etc]
The main difficulty here is that we don't know what we are looking for. If we were looking for a specific string it would be easy.

The second difficulty is the "already on the system" requirement: the solution should ideally use a preinstalled CLI scripting language such as batch or PowerShell.

I've looked around for days but found nothing I could implement to get my list.

Even just ideas on how to do it are welcome.

penpen
Expert
Posts: 2009
Joined: 23 Jun 2013 06:15
Location: Germany

Re: Scan multiple files and look for recurrent series in windows?

#2 Post by penpen » 22 Mar 2018 11:57

Please don't take this the wrong way, but that really sounds like a homework assignment (for school/computer science).
If that is the case, then you have probably already heard the solution in a lecture:
the longest common substring algorithm.

If I am right in my assumption (== homework), then it won't help you if we post the plain solution:
you wouldn't learn anything, and if asked in an exam you might fail (on that point).


penpen

Squashman
Expert
Posts: 4486
Joined: 23 Dec 2011 13:59

Re: Scan multiple files and look for recurrent series in windows?

#3 Post by Squashman » 22 Mar 2018 12:33

I definitely would not attempt this with batch or PowerShell. There are scripting and programming languages specifically designed for data analytics. Rscript, SAS or Python would be better options.
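
As a rough illustration of the Python route, here is a minimal brute-force sketch, not a definitive solution. It assumes each file's content has already been read into a string, that "most common" means "appears in the most files", and that only maximal series should be reported (a shorter series is dropped when a longer one containing it occurs in the same number of files). The names `find_common_series`, `min_len` and `min_files` are made up for this example, and the sample strings are modeled on the ones in the question, with the target series spelled identically in all three.

```python
from collections import Counter


def find_common_series(texts, min_len=8, min_files=2):
    """Return (series, file_count) pairs, most widely shared first."""
    counts = Counter()
    for text in texts:
        # Collect every distinct substring of length >= min_len in this text,
        # so that each file contributes at most 1 to a substring's count.
        seen = set()
        n = len(text)
        for start in range(n - min_len + 1):
            for end in range(start + min_len, n + 1):
                seen.add(text[start:end])
        counts.update(seen)

    # Keep only the series that occur in at least min_files files.
    shared = {s: c for s, c in counts.items() if c >= min_files}

    # A series is not maximal if extending it by one character on either
    # side gives another shared series found in the same number of files.
    non_maximal = set()
    for longer, c in shared.items():
        for shorter in (longer[1:], longer[:-1]):
            if shared.get(shorter) == c:
                non_maximal.add(shorter)

    result = [(s, c) for s, c in shared.items() if s not in non_maximal]
    # Rank by file count, then by length, then alphabetically for stability.
    result.sort(key=lambda item: (-item[1], -len(item[0]), item[0]))
    return result


# Sample data modeled on the three example files from the question.
samples = [
    "sdgfghybuyvadfhulookingforthissdfac gafdg342!312d00000000",
    "123 000000002lookingforthis353245",
    "3453453//£$lookingforthisf4 435@tew",
]

ranked = find_common_series(samples)
csv_line = ", ".join(series for series, _ in ranked)
print(csv_line)  # lookingforthis, 00000000
```

For real files, each string could be loaded with something like `pathlib.Path(name).read_text()` before calling the function. This sketch enumerates every substring, so memory and time grow roughly with the square of each file's length; for large inputs a suffix-array or generalized suffix-tree approach would be the more usual choice.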
