JREPL: Cutting of a string after the n-th occurrence of specific character?

vin97 · #1 Post by **vin97** » 21 Jul 2020 14:52

What would be the fastest way to achieve this?

I need to extract the absolute directory names from the dir /s output, truncated at the n-th nesting level.
My plan was to write a loop in which JREPL cuts the line off after the last "\" until the required recursion depth is reached.
I was hoping there is a faster way to achieve this.

#2 Post by **aGerman** » 21 Jul 2020 16:31

It's a matter of Regex rather than Batch.
Maybe this pattern will do the trick:
https://regex101.com/r/cylGiN/1

Steffen

vin97 · #3 Post by **vin97** » 22 Jul 2020 07:03

I'll admit I have trouble with more advanced RegEx like that.
I guess a simple findstr doesn't work because of the lack of count support.
For JREPL /JMATCHQ, I cannot find the correct syntax.

#4 Post by **aGerman** » 22 Jul 2020 07:54

I think the pattern should work out of the box with JREPL. Did you try it out already?

Steffen

vin97 · #5 Post by **vin97** » 25 Jul 2020 09:42

Code: Select all

jrepl.bat "\b[A-Za-z]:(\\[^\\\/:*?\x22<>|\r\n]+){3}$" "$txt=$0" /jmatchq

This code works in that it outputs only the directory path for each line, provided the path has exactly the given number of backslashes.
So "blabla bla bla C:\Users\Guy\Desktop" turns into "C:\Users\Guy\Desktop".
But I also need it to handle all the directories, so that the ones at a greater recursion depth are output by JREPL as well, but in shortened form.
So "blabla bla bla C:\Users\Guy\Desktop\Unnecessarily\Deep" becomes "C:\Users\Guy\Desktop".

Also, for simpler stuff like extracting the contents of brackets, what is faster: JREPL or findstr?

#6 Post by **aGerman** » 25 Jul 2020 11:26

Well, the site I referenced has a pretty good explanation of the pattern in the right pane. What about removing the $ from the pattern?

vin97 wrote: ↑
25 Jul 2020 09:42
Also, for simpler stuff like extracting the contents of brackets, what is faster: JREPL or findstr?

findstr might be faster, but it has only poor regex support and it always returns the entire line in which the pattern is found. In your case I suspect the dir command is the bottleneck, since recursing through all the folders and files is likely slower than the pattern matching.
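
A minimal sketch of the whole-line behaviour described above, using the sample line from post #5 (the search word "Desktop" is only an illustration):

```batch
@echo off
rem Sketch: FINDSTR prints the complete matching line, not just the matched text.
rem The sample line is the one quoted in post #5.
echo blabla bla bla C:\Users\Guy\Desktop| findstr "Desktop"
```

The whole input line is printed, including the leading "blabla bla bla", so the path would still have to be cut out in a separate step.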

Steffen

vin97 · #7 Post by **vin97** » 25 Jul 2020 12:00

Thanks, removing the $ worked.
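
For reference, a sketch of what the command looks like after that change, typed at the command prompt and fed directly from dir /s (the pipe is an assumption; the thread does not show how the input reaches JREPL):

```batch
rem Sketch: print each directory path found in the DIR /S output,
rem cut off after the 3rd path component below the drive.
rem Same pattern as in post #5, only without the trailing $ anchor.
dir /s | jrepl.bat "\b[A-Za-z]:(\\[^\\\/:*?\x22<>|\r\n]+){3}" "$txt=$0" /jmatchq
```

Without the $ the match no longer has to end at the end of the line, so deeper paths match up to their third component. Changing {3} to another number changes the depth; paths with fewer components than that still produce no match.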

aGerman wrote: ↑
25 Jul 2020 11:26
findstr will always match the entire line where the pattern is found.

Ah right, forgot about that. What about doing it with a for loop or other external tools like grep for windows? Would that bring significant performance improvements over JREPL?

aGerman wrote: ↑
25 Jul 2020 11:26
In your case I suspect the dir command is the bottleneck

This is sadly not the case. My program scans for files updated after a certain date to help me with doing incremental backups. The algorithm is very slow at the moment, that's why I am going over everything and trying to optimize it.

...I'm afraid I also need some help for another RegEx. I've been trying different stuff but I can't get it to work.
I want to extract the directory size (in bytes) from the dir /s output.
I wasn't able to figure out the proper RegEx syntax for this:
Get last match for: [Any character except 0-9][1 occurrence of 0-9][0 or more occurrences of 0-9 or ,][Any character except 0-9 and ,]

vin97 · #8 Post by **vin97** » 27 Jul 2020 11:24

Scrap the last part, I figured out a way to do it by cutting down the line a bit more beforehand. Doesn't look so nice but it requires no tempfiles and should work for any language setting and up to 999TB.
Example code:

Code: Select all

for /f "tokens=*" %%a in ('echo "%DirRawSizeOutputLine:~-25%"^| call jrepl.bat "([0-9])([0-9,])*" "$txt=$0" /jmatchq') do set RawSize=%%a
set "CleanSize=%RawSize:~-19,-16%%RawSize:~-15,-12%%RawSize:~-11,-8%%RawSize:~-7,-4%%RawSize:~-3%"
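
If the thousands separator is known to be a comma (an assumption; the substring approach above does not depend on the separator character), a plain variable substitution would also strip it. A minimal sketch with a made-up sample value:

```batch
@echo off
setlocal
rem Hypothetical sample value; in the thread it comes from the dir /s summary line
set "RawSize=12,345,678"
rem Remove every comma with variable string substitution
set "CleanSize=%RawSize:,=%"
echo %CleanSize%
```

With this sample value the script prints 12345678. On systems that use a different separator, the character after the colon has to be changed accordingly.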

Even simpler, making use of a VBS file for stripping empty spaces and the changed output format when using the /-c switch for dir:

Code: Select all

echo WScript.Echo Eval(WScript.Arguments(0))> calc.vbs
for /f %%n in ('cscript //nologo calc.vbs "%RawLine:~-21,-6%"') do set "CleanSize=%%n"

Still interested in whether or not things could be sped up by using something other than JREPL for these tasks.

#9 Post by **Aacini** » 28 Jul 2020 07:22

vin97 wrote: ↑
27 Jul 2020 11:24

. . . . .

Still interested in whether or not things could be sped up by using something other than JREPL for these tasks.

Perhaps if you post some examples of the input lines and the desired output for them, we could try to write a solution...

Antonio

vin97 · #10 Post by **vin97** » 28 Jul 2020 15:55

I know there are still a lot of bad things in my algorithm but I want to try improving it first myself.
I was purely wondering about the speed of JREPL in general and how optimized it is.

#11 Post by **Aacini** » 28 Jul 2020 21:23

Mmmm... I am afraid I don't understand what exactly the problem is... However, if you have these directories on the disk:

Code: Select all

C:\Users\Guy
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop\Unnecessarily\Deep

... then this code:

Code: Select all

for /F "tokens=1-4 delims=\" %%a in ('dir /S /B') do (
   if "%%d" neq "" echo %%a\%%b\%%c\%%d
)

... shows this output:

Code: Select all

C:\Users\Guy\Desktop
C:\Users\Guy\Desktop

That is, it drops line #1 because it does not have three backslashes, then shows line #2, then cuts line #3 to the third nesting level of directories.
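
Note that DIR /S /B lists files as well as folders. A sketch of the same loop restricted to directories, assuming that only folder paths are wanted (/AD is the standard DIR switch for that):

```batch
@echo off
rem Sketch: same loop as above, but /AD makes DIR list directories only
for /F "tokens=1-4 delims=\" %%a in ('dir /S /B /AD') do (
   rem %%d is empty when the path has fewer than three backslashes
   if "%%d" neq "" echo %%a\%%b\%%c\%%d
)
```

As in the output shown above, a truncated path is printed once for every deeper folder beneath it, so duplicates are to be expected.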

Antonio

PS - findstr does NOT "extract contents"; it can only find complete lines with a given pattern. I don't understand why you want to use JREPL...

vin97 · #12 Post by **vin97** » 29 Jul 2020 06:04

This recursion depth check is only one part of the program.
The more important one is finding out which directories were modified after a certain date for doing incremental backups.

How would you modify your code for variable recursion depth?

#13 Post by **Aacini** » 29 Jul 2020 11:36

This is the fastest way for variable "recursion depth":

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem Assemble output masks for variable directory nesting level
set "accum=%%a"
set "letter=bcdefghijklmnopqrstuvwxyz"
for /L %%i in (1,1,25) do (
   set "tok[%%i]=%%!letter:~0,1!"
   set "accum=!accum!\%%!letter:~0,1!"
   set "toks[%%i]=!accum!"
   set "letter=!letter:~1!"
)

:loop

echo/
rem Masks exist only for levels 1-25, so level 26 prints nothing; enter 0 to quit
set /P "level=Enter nesting level (1-26): "
if "%level%" equ "0" goto :EOF

set "token=!tok[%level%]!"
set "tokens=!toks[%level%]!"

for /F "tokens=1-26 delims=\" %%a in (file.txt) do (
   if "%token%" neq "" echo %tokens%
)

goto loop

File.txt:

Code: Select all

C:\Two\one
C:\Five\four\three\two\one
C:\Eight\seven\six\five\four\three\two\one
C:\Six\five\four\three\two\one
C:\Four\three\two\one

Output example:

Code: Select all

Enter nesting level (1-26): 4
C:\Five\four\three\two
C:\Eight\seven\six\five
C:\Six\five\four\three
C:\Four\three\two\one

Enter nesting level (1-26): 2
C:\Two\one
C:\Five\four
C:\Eight\seven
C:\Six\five
C:\Four\three

Enter nesting level (1-26): 7
C:\Eight\seven\six\five\four\three\two

Enter nesting level (1-26): 5
C:\Five\four\three\two\one
C:\Eight\seven\six\five\four
C:\Six\five\four\three\two

Enter nesting level (1-26): 0

Antonio

vin97 · #14 Post by **vin97** » 30 Jul 2020 04:49

Thanks for the example but file.txt would look a little different.

Since the textfile is produced by the DIR command, each line has some irrelevant string preceding the directory path that needs to be removed.

#15 Post by **Aacini** » 31 Jul 2020 02:08

The irrelevant string preceding the directory path is omitted if you use the /B switch in the DIR command, as I did in my first answer. Didn't you see it?
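
As a sketch of that suggestion, the input file for the script in post #13 could be produced like this (C:\Users is only a placeholder for the folder to scan, and /AD is added on the assumption that only directories are wanted):

```batch
rem Sketch: write bare full directory paths, one per line, to file.txt
rem "C:\Users" is a placeholder for the folder to scan
dir /S /B /AD "C:\Users" > file.txt
```

Each line of file.txt then starts directly with the drive letter, which is the format the for /F loop with delims=\ expects.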

Aacini wrote: ↑
28 Jul 2020 21:23
Mmmm... I am afraid I don't understand what exactly the problem is... However, if you have these directories on the disk:
Code: Select all
C:\Users\Guy
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop\Unnecessarily\Deep
... then this code:
Code: Select all
for /F "tokens=1-4 delims=\" %%a in ('dir /S /B') do (
   if "%%d" neq "" echo %%a\%%b\%%c\%%d
)
... shows this output:
Code: Select all
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop
That is, it drops line #1 because it does not have three backslashes, then shows line #2, then cuts line #3 to the third nesting level of directories.

Antonio

Previously, I asked you to "post some examples of the input lines and the desired output for them", but you didn't do it, so I can't fathom what the core problem in this thread is...

IMHO, this is a very simple problem and I don't understand why you used JREPL in the first place.

Antonio

PS - I suggest you review this thread and also this post.

DosTips.com
