JREPL: Cutting of a string after the n-th occurrence of specific character?

vin97
Posts: 35
Joined: 17 Apr 2020 08:30

#1 Post by vin97 » 21 Jul 2020 14:52

What would be the fastest way to achieve this?

I need to extract the absolute directory names from the dir /s output, truncated to the n-th recursion level.
My plan was to write a loop in which JREPL cuts off the line after the last "\" until the required recursion depth is reached.
I was hoping there is a faster way to achieve this.

aGerman
Expert
Posts: 4678
Joined: 22 Jan 2010 18:01
Location: Germany

#2 Post by aGerman » 21 Jul 2020 16:31

It's a matter of Regex rather than Batch.
Maybe this pattern will do the trick:
https://regex101.com/r/cylGiN/1

Steffen

#3 Post by vin97 » 22 Jul 2020 07:03

I'll admit I have trouble with more advanced RegEx like that.
I guess a simple findstr doesn't work because of the lack of count support.
For JREPL /JMATCHQ, I cannot find the correct syntax.

#4 Post by aGerman » 22 Jul 2020 07:54

I think the pattern should work out of the box with JREPL. Did you try it out already?

Steffen

#5 Post by vin97 » 25 Jul 2020 09:42

Code:

jrepl.bat "\b[A-Za-z]:(\\[^\\\/:*?\x22<>|\r\n]+){3}$" "$txt=$0" /jmatchq
This code works in that it outputs only the directory path of each line that matches the given number of backslashes.
So "blabla bla bla C:\Users\Guy\Desktop" turns into "C:\Users\Guy\Desktop".
But I also need a way to handle all the directories, so that the ones with a higher recursion depth are output by JREPL as well, but in shortened form.
So "blabla bla bla C:\Users\Guy\Desktop\Unnecessarily\Deep" becomes "C:\Users\Guy\Desktop".

Also, for simpler stuff like extracting the contents of brackets, what is faster: JREPL or findstr?

#6 Post by aGerman » 25 Jul 2020 11:26

Well, the site I referenced has a pretty good explanation of the pattern in the right pane. What about removing the $ from the pattern?
vin97 wrote:
25 Jul 2020 09:42
Also, for simpler stuff like extracting the contents of brackets, what is faster: JREPL or findstr?
findstr might be faster, but it has only limited regex support and it will always match the entire line where the pattern is found. In your case I suspect the dir command is the bottleneck, since recursing through all the folders and files is likely slower than the pattern matching.

Steffen

#7 Post by vin97 » 25 Jul 2020 12:00

Thanks, removing the $ worked.
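Why dropping the trailing $ helps: with the anchor, the pattern only matches when the path ends exactly at the third level. Without it, the match simply stops after the third component. Here is a minimal Python sketch of the same pattern. Python's re module stands in for JREPL's regex engine, and the assumption is that both engines treat this pattern the same way. The helper name cut is made up for illustration, and the sample lines are the ones from post #5.

```python
import re

# Pattern from the jrepl.bat call in post #5; \x22 is the double quote.
# The group is repeated {3} times: drive letter plus three path components.
ANCHORED = re.compile(r'\b[A-Za-z]:(\\[^\\/:*?\x22<>|\r\n]+){3}$')
UNANCHORED = re.compile(r'\b[A-Za-z]:(\\[^\\/:*?\x22<>|\r\n]+){3}')

def cut(pattern, line):
    """Return the matched path, or None when the line does not match."""
    m = pattern.search(line)
    return m.group(0) if m else None

shallow = r'blabla bla bla C:\Users\Guy\Desktop'
deep = r'blabla bla bla C:\Users\Guy\Desktop\Unnecessarily\Deep'

print(cut(ANCHORED, shallow))    # C:\Users\Guy\Desktop
print(cut(ANCHORED, deep))       # None
print(cut(UNANCHORED, deep))     # C:\Users\Guy\Desktop
```

Lines with fewer than three components still produce no match in either variant, so they are dropped rather than passed through.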
aGerman wrote:
25 Jul 2020 11:26
findstr will always match the entire line where the pattern is found.
Ah right, I forgot about that. What about doing it with a for loop or with other external tools like grep for Windows? Would that bring significant performance improvements over JREPL?
aGerman wrote:
25 Jul 2020 11:26
In your case I suspect the dir command is the bottleneck
This is sadly not the case. My program scans for files updated after a certain date to help me with doing incremental backups. The algorithm is very slow at the moment, that's why I am going over everything and trying to optimize it.


...I'm afraid I also need some help for another RegEx. I've been trying different stuff but I can't get it to work.
I want to extract the directory size (in bytes) from the dir /s output.
I wasn't able to figure out the proper RegEx syntax for this:
Get last match for: [Any character except 0-9][1 occurrence of 0-9][0 or more occurrences of 0-9 or ,][Any character except 0-9 and ,]
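One way to read that description: find every run that starts with a digit and continues with digits or commas, then keep the last run. Here is a minimal Python sketch under that reading. The sample line only imitates an English-locale dir /s summary line and is an assumption, as is the helper name last_number.

```python
import re

def last_number(line):
    """Return the last run of digits/commas that starts with a digit, or None."""
    runs = re.findall(r'[0-9][0-9,]*', line)
    return runs[-1] if runs else None

# Hypothetical line shaped like a dir /s summary line (English locale assumed)
sample = '              12 File(s)      1,234,567 bytes'

raw = last_number(sample)
print(raw)                        # 1,234,567
print(int(raw.replace(',', '')))  # 1234567
```

Other locales may use a different thousands separator, in which case the character class would need adjusting.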

#8 Post by vin97 » 27 Jul 2020 11:24

Scrap the last part. I figured out a way to do it by cutting down the line a bit more beforehand. It doesn't look so nice, but it requires no temp files and should work for any language setting and for sizes up to 999 TB.
Example code:

Code:

for /f "tokens=*" %%a in ('echo "%DirRawSizeOutputLine:~-25%"^| call jrepl.bat "([0-9])([0-9,])*" "$txt=$0" /jmatchq') do set RawSize=%%a
set "CleanSize=%RawSize:~-19,-16%%RawSize:~-15,-12%%RawSize:~-11,-8%%RawSize:~-7,-4%%RawSize:~-3%"
Even simpler: use a VBS file to strip the spaces, together with the changed output format that the /-c switch of dir produces:

Code:

echo WScript.Echo Eval(WScript.Arguments(0))> calc.vbs
for /f %%n in ('cscript //nologo calc.vbs "%RawLine:~-21,-6%"') do set "CleanSize=%%n"

I am still interested in whether things could be sped up by using something other than JREPL for these tasks.

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México

#9 Post by Aacini » 28 Jul 2020 07:22

vin97 wrote:
27 Jul 2020 11:24

. . . . .

Still interested in whether or not things could be sped up by using something other than JREPL for these tasks.
Perhaps if you post some examples of the input lines and the desired output for them, we could try to write a solution...

Antonio

#10 Post by vin97 » 28 Jul 2020 15:55

I know there are still a lot of bad things in my algorithm, but I want to try improving it myself first.
I was purely wondering about the speed of JREPL in general and how optimized it is.

#11 Post by Aacini » 28 Jul 2020 21:23

Mmmm... I am afraid I don't understand what exactly the problem is... However, if you have these directories on the disk:

Code:

C:\Users\Guy
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop\Unnecessarily\Deep
... then this code:

Code:

for /F "tokens=1-4 delims=\" %%a in ('dir /S /B') do (
   if "%%d" neq "" echo %%a\%%b\%%c\%%d
)
... shows this output:

Code:

C:\Users\Guy\Desktop
C:\Users\Guy\Desktop
That is, it skips line #1 because it does not have three backslashes, then shows line #2, then cuts line #3 to the third nesting level of directories.
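For readers who want to check the logic outside of cmd, the same idea can be sketched in Python: split on the backslash, require a fourth token, and join the first four. This only illustrates what the for /F line does and is not a replacement for it. The function name and the fixed input list are assumptions.

```python
SEP = '\\'

def cut_to_third_level(lines):
    # Keep only lines with at least four tokens and join the first four,
    # like: if "%%d" neq "" echo %%a\%%b\%%c\%%d
    result = []
    for line in lines:
        tokens = [t for t in line.split(SEP) if t]  # for /F skips empty tokens
        if len(tokens) >= 4:
            result.append(SEP.join(tokens[:4]))
    return result

paths = [
    r'C:\Users\Guy',
    r'C:\Users\Guy\Desktop',
    r'C:\Users\Guy\Desktop\Unnecessarily\Deep',
]
for p in cut_to_third_level(paths):
    print(p)
```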

Antonio

PS - findstr does NOT "extract contents"; it can only find complete lines with a given pattern. I don't understand why you want to use JREPL...

#12 Post by vin97 » 29 Jul 2020 06:04

This recursion depth check is only one part of the program.
The more important one is finding out which directories were modified after a certain date for doing incremental backups.


How would you modify your code for variable recursion depth?

#13 Post by Aacini » 29 Jul 2020 11:36

This is the fastest way for variable "recursion depth":

Code:

@echo off
setlocal EnableDelayedExpansion

rem Assemble output masks for variable directory nesting level
set "accum=%%a"
set "letter=bcdefghijklmnopqrstuvwxyz"
for /L %%i in (1,1,25) do (
   set "tok[%%i]=%%!letter:~0,1!"
   set "accum=!accum!\%%!letter:~0,1!"
   set "toks[%%i]=!accum!"
   set "letter=!letter:~1!"
)

:loop

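echo/
rem Only levels 1-25 have a mask (tok[1]..tok[25]); any other value prints nothing
set /P "level=Enter nesting level (1-26): "
rem Entering 0 ends the script
if "%level%" equ "0" goto :EOF

rem token = last FOR variable a line must contain; tokens = output mask up to that variable
set "token=!tok[%level%]!"
set "tokens=!toks[%level%]!"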

for /F "tokens=1-26 delims=\" %%a in (file.txt) do (
   if "%token%" neq "" echo %tokens%
)

goto loop
File.txt:

Code:

C:\Two\one
C:\Five\four\three\two\one
C:\Eight\seven\six\five\four\three\two\one
C:\Six\five\four\three\two\one
C:\Four\three\two\one
Output example:

Code:

Enter nesting level (1-26): 4
C:\Five\four\three\two
C:\Eight\seven\six\five
C:\Six\five\four\three
C:\Four\three\two\one

Enter nesting level (1-26): 2
C:\Two\one
C:\Five\four
C:\Eight\seven
C:\Six\five
C:\Four\three

Enter nesting level (1-26): 7
C:\Eight\seven\six\five\four\three\two

Enter nesting level (1-26): 5
C:\Five\four\three\two\one
C:\Eight\seven\six\five\four
C:\Six\five\four\three\two

Enter nesting level (1-26): 0
Antonio
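The selection rule behind that output (a line is printed only if it has at least "level" directories below the drive, and is then cut to exactly that many) can be cross-checked with a short Python sketch. This is only an illustration of the rule, not the batch implementation. The function name is an assumption, and unlike the batch version it has no upper limit on the level.

```python
SEP = '\\'

def cut_to_level(lines, level):
    """Truncate each path to `level` directories below the drive; skip shallower ones."""
    needed = level + 1  # drive token plus `level` directory tokens
    out = []
    for line in lines:
        tokens = [t for t in line.strip().split(SEP) if t]
        if len(tokens) >= needed:
            out.append(SEP.join(tokens[:needed]))
    return out

# Same content as File.txt above
file_txt = [
    r'C:\Two\one',
    r'C:\Five\four\three\two\one',
    r'C:\Eight\seven\six\five\four\three\two\one',
    r'C:\Six\five\four\three\two\one',
    r'C:\Four\three\two\one',
]

for path in cut_to_level(file_txt, 4):
    print(path)
```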

#14 Post by vin97 » 30 Jul 2020 04:49

Thanks for the example but file.txt would look a little different.

Since the text file is produced by the DIR command, each line has some irrelevant string preceding the directory path that needs to be removed.

#15 Post by Aacini » 31 Jul 2020 02:08

The irrelevant string preceding the directory path is omitted if you use the /B switch of the DIR command, as I did in my first answer. Didn't you see it?

Previously, I asked you to "post some examples of the input lines and the desired output for them", but you didn't do it, so I can't fathom what the core problem in this thread is...

IMHO, this is a very simple problem and I don't understand why you used JREPL in the first place.

Antonio

PS - I suggest you review this thread and also this post.
