JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 21 Jul 2020 14:52
by vin97
What would be the fastest way to achieve this?
I need to extract the absolute directory names from the dir /s output, shortened to the n-th recursion.
My plan was to make a loop where JREPL cuts off the line after the last "\" until the required recursion depth is reached.
I was hoping there is a faster way to achieve this.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 21 Jul 2020 16:31
by aGerman
It's a matter of Regex rather than Batch.
Maybe this pattern will do the trick:
https://regex101.com/r/cylGiN/1
Steffen
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 22 Jul 2020 07:03
by vin97
I'll admit I have trouble with more advanced RegEx like that.
I guess a simple findstr doesn't work because of the lack of count support.
For JREPL /JMATCHQ, I cannot find the correct syntax.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 22 Jul 2020 07:54
by aGerman
I think the pattern should work out of the box with JREPL. Did you try it out already?
Steffen
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 25 Jul 2020 09:42
by vin97
Code: Select all
jrepl.bat "\b[A-Za-z]:(\\[^\\\/:*?\x22<>|\r\n]+){3}$" "$txt=$0" /jmatchq
This code works in that it outputs only the directory path for each line, provided the path has exactly the given number of backslashes.
So
"blabla bla bla C:\Users\Guy\Desktop" turns into
"C:\Users\Guy\Desktop".
But I also need this to work for all directories, so that the ones with a higher recursion depth are output by JREPL as well, but in shortened form.
So
"blabla bla bla C:\Users\Guy\Desktop\Unnecessarily\Deep" becomes
"C:\Users\Guy\Desktop".
Also, for simpler stuff like extracting the contents of brackets, what is faster: JREPL or findstr?
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 25 Jul 2020 11:26
by aGerman
Well, the site I referenced has a pretty good explanation of the pattern in the right pane. What about removing the $ from the pattern?
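To illustrate why dropping the $ matters, here is a minimal sketch in Python rather than JREPL (an assumption made only so the behaviour can be checked outside of batch). The pattern is the one posted above, with the capture group made non-capturing; the names SEGMENT, ANCHORED, UNANCHORED and first_match are made up for this example.

```python
import re

# One "\name" path segment, using the same forbidden-character class as the thread's pattern.
SEGMENT = r'\\[^\\/:*?"<>|\r\n]+'

# With "$": the path must END after exactly three segments.
ANCHORED = re.compile(r'\b[A-Za-z]:(?:' + SEGMENT + r'){3}$')
# Without "$": the match simply stops after three segments, whatever follows.
UNANCHORED = re.compile(r'\b[A-Za-z]:(?:' + SEGMENT + r'){3}')

def first_match(pattern, line):
    """Return the matched text, or None when the line does not match."""
    m = pattern.search(line)
    return m.group(0) if m else None

deep = r"blabla bla bla C:\Users\Guy\Desktop\Unnecessarily\Deep"
print(first_match(ANCHORED, deep))    # None
print(first_match(UNANCHORED, deep))  # C:\Users\Guy\Desktop
```

With the anchor, deeper paths are rejected outright; without it, they are cut after the third segment, and paths with fewer than three segments still produce no match.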
vin97 wrote: ↑25 Jul 2020 09:42
Also, for simpler stuff like extracting the contents of brackets, what is faster: JREPL or findstr?
findstr might be faster, but it has only poor regex support and it will always match the entire line where the pattern is found. In your case I suspect the dir command is the bottleneck, since the recursion through all the folders and files is likely slower than the pattern matching.
Steffen
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 25 Jul 2020 12:00
by vin97
Thanks, removing the $ worked.
aGerman wrote: ↑25 Jul 2020 11:26
findstr will always match the entire line where the pattern is found.
Ah right, forgot about that. What about doing it with a for loop or with other external tools like grep for Windows? Would that bring significant performance improvements over JREPL?
aGerman wrote: ↑25 Jul 2020 11:26
In your case I suspect the dir command being the bottleneck
This is sadly not the case. My program scans for files updated after a certain date to help me with doing incremental backups. The algorithm is very slow at the moment, that's why I am going over everything and trying to optimize it.
...I'm afraid I also need some help for another RegEx. I've been trying different stuff but I can't get it to work.
I want to extract the directory size (in bytes) from the dir /s output.
I wasn't able to figure out the proper RegEx syntax for this:
Get last match for: [Any character except 0-9][1 occurrence of 0-9][0 or more occurrences of 0-9 or ,][Any character except 0-9 and ,]
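The rule described above can be sketched as "find every run that starts with a digit and continues with digits or commas, then keep the last one". The following is a Python illustration, not JREPL syntax; the function name last_number and the sample line (modelled on an English dir summary line) are assumptions.

```python
import re

def last_number(line):
    """Return the last run of digits/commas that starts with a digit, or None."""
    # \d[\d,]* is greedy, so each run is automatically bounded by
    # characters that are neither digits nor commas.
    runs = re.findall(r'\d[\d,]*', line)
    return runs[-1] if runs else None

sample = "               3 File(s)      1,234,567 bytes"  # assumed sample line
print(last_number(sample))  # 1,234,567
```

Note that a run followed directly by a comma would keep that trailing comma, so this only fits lines where the number is followed by a space or text.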
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 27 Jul 2020 11:24
by vin97
Scrap the last part, I figured out a way to do it by cutting down the line a bit more beforehand. Doesn't look so nice but it requires no tempfiles and should work for any language setting and up to 999TB.
Example code:
Code: Select all
for /f "tokens=*" %%a in ('echo "%DirRawSizeOutputLine:~-25%"^| call jrepl.bat "([0-9])([0-9,])*" "$txt=$0" /jmatchq') do set RawSize=%%a
set "CleanSize=%RawSize:~-19,-16%%RawSize:~-15,-12%%RawSize:~-11,-8%%RawSize:~-7,-4%%RawSize:~-3%"
Even simpler, making use of a VBS file for stripping empty spaces and the changed output format when using the /-c switch for dir:
Code: Select all
echo WScript.Echo Eval(WScript.Arguments(0))> calc.vbs
for /f %%n in ('cscript //nologo calc.vbs "%RawLine:~-21,-6%"') do set "CleanSize=%%n"
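For comparison, the same clean-up (turning the raw size text into a plain number) can be sketched in Python by dropping every non-digit character, which does not depend on which thousands separator the language setting uses. This is an illustration under that assumption, not a replacement for the batch code; clean_size is a made-up name.

```python
import re

def clean_size(raw):
    """Drop every non-digit character and return the size as an int (None if no digits)."""
    digits = re.sub(r'\D', '', raw)
    return int(digits) if digits else None

print(clean_size("1,234,567"))    # 1234567
print(clean_size("  1.234.567 ")) # 1234567
```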
Still interested in whether or not things could be sped up by using something other than JREPL for these tasks.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 28 Jul 2020 07:22
by Aacini
vin97 wrote: ↑27 Jul 2020 11:24
. . . . .
Still interested in whether or not things could be sped up by using something other than JREPL for these tasks.
Perhaps if you post some examples of the input lines and the desired output for them, we could try to write a solution...
Antonio
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 28 Jul 2020 15:55
by vin97
I know there are still a lot of bad things in my algorithm, but I want to try improving it myself first.
I was purely wondering about the speed of JREPL in general and how optimized it is.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 28 Jul 2020 21:23
by Aacini
Mmmm... I am afraid I don't understand what exactly the problem is... However, if you have these directories on the disk:
Code: Select all
C:\Users\Guy
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop\Unnecessarily\Deep
... then this code:
Code: Select all
for /F "tokens=1-4 delims=\" %%a in ('dir /S /B') do (
if "%%d" neq "" echo %%a\%%b\%%c\%%d
)
... shows this output:
Code: Select all
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop
That is, it skips line #1 because it does not have three backslashes, then shows line #2, then cuts line #3 to the third nesting level of directories.
Antonio
PS - findstr does NOT "extract contents"; it can only find complete lines with a given pattern. I don't understand why you want to use JREPL...
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 29 Jul 2020 06:04
by vin97
This recursion depth check is only one part of the program.
The more important one is finding out which directories were modified after a certain date for doing incremental backups.
How would you modify your code for variable recursion depth?
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 29 Jul 2020 11:36
by Aacini
This is the fastest way for variable "recursion depth":
Code: Select all
@echo off
setlocal EnableDelayedExpansion
rem Assemble output masks for variable directory nesting level
set "accum=%%a"
set "letter=bcdefghijklmnopqrstuvwxyz"
for /L %%i in (1,1,25) do (
set "tok[%%i]=%%!letter:~0,1!"
set "accum=!accum!\%%!letter:~0,1!"
set "toks[%%i]=!accum!"
set "letter=!letter:~1!"
)
:loop
echo/
set /P "level=Enter nesting level (1-26): "
if "%level%" equ "0" goto :EOF
set "token=!tok[%level%]!"
set "tokens=!toks[%level%]!"
for /F "tokens=1-26 delims=\" %%a in (file.txt) do (
if "%token%" neq "" echo %tokens%
)
goto loop
File.txt:
Code: Select all
C:\Two\one
C:\Five\four\three\two\one
C:\Eight\seven\six\five\four\three\two\one
C:\Six\five\four\three\two\one
C:\Four\three\two\one
Output example:
Code: Select all
Enter nesting level (1-26): 4
C:\Five\four\three\two
C:\Eight\seven\six\five
C:\Six\five\four\three
C:\Four\three\two\one
Enter nesting level (1-26): 2
C:\Two\one
C:\Five\four
C:\Eight\seven
C:\Six\five
C:\Four\three
Enter nesting level (1-26): 7
C:\Eight\seven\six\five\four\three\two
Enter nesting level (1-26): 5
C:\Five\four\three\two\one
C:\Eight\seven\six\five\four
C:\Six\five\four\three\two
Enter nesting level (1-26): 0
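The behaviour shown in the output above (split on "\", skip lines that are too shallow, keep the drive plus the first n directory names) can be cross-checked with a short Python sketch. It is only an illustration of the same logic, not a translation of the batch masks; truncate and paths are names made up for the example.

```python
def truncate(path, level, sep="\\"):
    """Keep the drive plus `level` directory names; None if the path is shallower."""
    # Like for /F, ignore empty tokens produced by repeated separators.
    parts = [p for p in path.split(sep) if p]
    if len(parts) <= level:
        return None
    return sep.join(parts[:level + 1])

paths = [
    r"C:\Two\one",
    r"C:\Five\four\three\two\one",
    r"C:\Eight\seven\six\five\four\three\two\one",
    r"C:\Six\five\four\three\two\one",
    r"C:\Four\three\two\one",
]
for p in paths:
    cut = truncate(p, 4)
    if cut is not None:
        print(cut)
```

For level 4 this prints the same four lines as the first run in the output example.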
Antonio
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 30 Jul 2020 04:49
by vin97
Thanks for the example but file.txt would look a little different.
Since the textfile is produced by the DIR command, each line has some irrelevant string preceding the directory path that needs to be removed.
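To make the requirement concrete, here is a rough sketch of removing the leading text and cutting to a variable depth in one step, by putting the depth into the regex quantifier. It is written in Python purely for illustration; the function name dir_prefix and the sample line (modelled on an English " Directory of ..." line) are assumptions.

```python
import re

def dir_prefix(line, depth):
    """Find a drive-letter path in `line` and cut it after `depth` directory names."""
    # {depth} repetitions of one "\name" segment, no "$" anchor.
    pattern = r'\b[A-Za-z]:(?:\\[^\\/:*?"<>|\r\n]+){%d}' % depth
    m = re.search(pattern, line)
    return m.group(0) if m else None

line = r" Directory of C:\Users\Guy\Desktop\Unnecessarily\Deep"  # assumed sample
print(dir_prefix(line, 3))  # C:\Users\Guy\Desktop
print(dir_prefix(line, 2))  # C:\Users\Guy
```

Lines whose path is shallower than the requested depth give None and can simply be skipped.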
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Posted: 31 Jul 2020 02:08
by Aacini
The irrelevant string preceding the directory path is removed if you use the /B switch in the DIR command, as I did in my first answer. Didn't you see it?
Previously, I asked you to "post some examples of the input lines and the desired output for them", but you didn't do it, so I can't fathom out what is the core problem you have in this thread...
IMHO, this is a very simple problem and I don't understand why you used JREPL in the first place.
Antonio
PS - I suggest you review this thread and also this post.