JREPL: Cutting of a string after the n-th occurrence of specific character?
What would be the fastest way to achieve this?
I need to extract the absolute directory names from the dir /s output, shortened to the n-th recursion.
My plan was to make a loop where JREPL cuts off the line after the last "\" until the required recursion depth is reached.
I was hoping there is a faster way to achieve this.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
It's a matter of Regex rather than Batch.
Maybe this pattern will do the trick:
https://regex101.com/r/cylGiN/1
Steffen
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
I'll admit I have trouble with more advanced RegEx like that.
I guess a simple findstr doesn't work because of the lack of count support.
For JREPL /JMATCHQ, I cannot find the correct syntax.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
I think the pattern should work out of the box with JREPL. Did you try it out already?
Steffen
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Code: Select all
jrepl.bat "\b[A-Za-z]:(\\[^\\\/:*?\x22<>|\r\n]+){3}$" "$txt=$0" /jmatchq
So "blabla bla bla C:\Users\Guy\Desktop" turns into "C:\Users\Guy\Desktop".
But I also need a way to do this with all the directories, so that the ones with a higher recursion depth are output by JREPL as well, but in a shortened form.
So "blabla bla bla C:\Users\Guy\Desktop\Unnecessarily\Deep" becomes "C:\Users\Guy\Desktop".
Also, for simpler stuff like extracting the contents of brackets, what is faster: JREPL or findstr?
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Well, the site I referenced has a pretty good explanation of the pattern in the right pane. What about removing the $ from the pattern?
"Also, for simpler stuff like extracting the contents of brackets, what is faster: JREPL or findstr?"
findstr might be faster, but it has only poor regex support and it always returns the entire line in which the pattern is found. In your case I suspect the dir command is the bottleneck, since recursing through all the folders and files is likely slower than the pattern matching.
Steffen
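For anyone following along, here is a minimal sketch of why dropping the $ makes a difference. It is written in Python rather than batch purely so the behaviour is easy to check; it is not how JREPL is invoked. The function name truncate_path and its parameters are hypothetical. The pattern is the one posted above, with the capturing group made non-capturing and \x22 written as a plain double quote.

```python
import re

def truncate_path(line, depth=3, anchored=False):
    """Return the first drive path in `line`, cut to `depth` folder levels, or None."""
    # One folder level: a backslash followed by characters that are legal in a name.
    segment = r'\\[^\\/:*?"<>|\r\n]+'
    pattern = r'\b[A-Za-z]:(?:' + segment + r'){' + str(depth) + r'}'
    if anchored:
        pattern += '$'  # the original version: the path must end after `depth` levels
    match = re.search(pattern, line)
    return match.group(0) if match else None

deep = r'blabla bla bla C:\Users\Guy\Desktop\Unnecessarily\Deep'
print(truncate_path(deep, anchored=True))  # None: "$" cannot match before "\Unnecessarily"
print(truncate_path(deep))                 # C:\Users\Guy\Desktop
```

With the $ in place, the pattern only accepts paths that are exactly three levels deep. Without it, deeper paths still match, and the match simply stops after the third level. Paths with fewer than three levels match in neither case.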
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Thanks, removing the $ worked.
"findstr ... will always match the entire line where the pattern is found."
Ah right, forgot about that. What about doing it with a for loop or other external tools like grep for Windows? Would that bring significant performance improvements over JREPL?
"In your case I suspect the dir command being the bottleneck ..."
This is sadly not the case. My program scans for files updated after a certain date to help me with doing incremental backups. The algorithm is very slow at the moment, which is why I am going over everything and trying to optimize it.
...I'm afraid I also need some help with another RegEx. I've been trying different things but I can't get it to work.
I want to extract the directory size (in bytes) from the dir /s output.
I wasn't able to figure out the proper RegEx syntax for this:
Get last match for: [Any character except 0-9][1 occurrence of 0-9][0 or more occurrences of 0-9 or ,][Any character except 0-9 and ,]
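The "last match" rule described above can be sketched like this (Python, for illustration only, not JREPL syntax). last_number is a hypothetical helper, and the sample line is an assumed English-locale dir /s summary line, not one taken from this thread. Because the match is greedy, the "non-digit before" and "non-digit, non-comma after" conditions do not need to be spelled out.

```python
import re

def last_number(line):
    """Return the last run of digits and commas that starts with a digit, or None."""
    runs = re.findall(r'[0-9][0-9,]*', line)
    return runs[-1] if runs else None

sample = '              12 File(s)      1,234,567 bytes'  # assumed sample line
raw = last_number(sample)
print(raw)                        # 1,234,567
print(int(raw.replace(',', '')))  # 1234567
```

This assumes a comma as the thousands separator. Other locale settings may use a different character, in which case the character class would need adjusting.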
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Scrap the last part, I figured out a way to do it by cutting down the line a bit more beforehand. It doesn't look so nice, but it requires no temp files and should work for any language setting and up to 999 TB.
Example code:
Code: Select all
rem JREPL prints every run of digits and commas in the last 25 characters; the loop keeps the last one
for /f "tokens=*" %%a in ('echo "%DirRawSizeOutputLine:~-25%"^| call jrepl.bat "([0-9])([0-9,])*" "$txt=$0" /jmatchq') do set RawSize=%%a
rem Join the three-digit groups, skipping the separator between them
set "CleanSize=%RawSize:~-19,-16%%RawSize:~-15,-12%%RawSize:~-11,-8%%RawSize:~-7,-4%%RawSize:~-3%"
Even simpler, making use of a VBS file for stripping the spaces and of the changed output format when using the /-c switch for dir:
Code: Select all
rem One-line VBScript that evaluates its first argument and echoes the result
echo WScript.Echo Eval(WScript.Arguments(0))> calc.vbs
for /f %%n in ('cscript //nologo calc.vbs "%RawLine:~-21,-6%"') do set "CleanSize=%%n"
Still interested in whether or not things could be sped up by using something other than JREPL for these tasks.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
I know there are still a lot of bad things in my algorithm but I want to try improving it first myself.
I was purely wondering about the speed of JREPL in general and how optimized it is.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Mmmm... I am afraid I don't understand what exactly the problem is... However, if you have these directories on the disk:
Code: Select all
C:\Users\Guy
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop\Unnecessarily\Deep
... then this code:
Code: Select all
for /F "tokens=1-4 delims=\" %%a in ('dir /S /B') do (
    if "%%d" neq "" echo %%a\%%b\%%c\%%d
)
... shows this output:
Code: Select all
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop
That is, it skips line #1 because it does not have three backslashes, then shows line #2 unchanged, then cuts line #3 to the third nesting level of directories.
Antonio
PS - findstr does NOT "extract contents"; it can only find complete lines that contain a given pattern. I don't understand why you want to use JREPL...
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
This recursion depth check is only one part of the program.
The more important one is finding out which directories were modified after a certain date for doing incremental backups.
How would you modify your code for variable recursion depth?
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
This is the fastest way for variable "recursion depth":
Code: Select all
@echo off
setlocal EnableDelayedExpansion
rem Assemble output masks for variable directory nesting level
set "accum=%%a"
set "letter=bcdefghijklmnopqrstuvwxyz"
for /L %%i in (1,1,25) do (
    set "tok[%%i]=%%!letter:~0,1!"
    set "accum=!accum!\%%!letter:~0,1!"
    set "toks[%%i]=!accum!"
    set "letter=!letter:~1!"
)
:loop
echo/
set /P "level=Enter nesting level (1-26): "
if "%level%" equ "0" goto :EOF
set "token=!tok[%level%]!"
set "tokens=!toks[%level%]!"
for /F "tokens=1-26 delims=\" %%a in (file.txt) do (
    if "%token%" neq "" echo %tokens%
)
goto loop
File.txt:
Code: Select all
C:\Two\one
C:\Five\four\three\two\one
C:\Eight\seven\six\five\four\three\two\one
C:\Six\five\four\three\two\one
C:\Four\three\two\one
Output example:
Code: Select all
Enter nesting level (1-26): 4
C:\Five\four\three\two
C:\Eight\seven\six\five
C:\Six\five\four\three
C:\Four\three\two\one
Enter nesting level (1-26): 2
C:\Two\one
C:\Five\four
C:\Eight\seven
C:\Six\five
C:\Four\three
Enter nesting level (1-26): 7
C:\Eight\seven\six\five\four\three\two
Enter nesting level (1-26): 5
C:\Five\four\three\two\one
C:\Eight\seven\six\five\four
C:\Six\five\four\three\two
Enter nesting level (1-26): 0
Antonio
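The token logic can also be sketched in Python to check it against the output example. This is only an illustration of the idea, not a replacement for the batch file. cut_to_level and cut_all are hypothetical names, and FILE_TXT is the File.txt content from this post.

```python
def cut_to_level(path, level):
    """Keep the drive plus `level` folders; return None if the path is shallower."""
    parts = path.split('\\')
    if len(parts) <= level:
        return None
    return '\\'.join(parts[:level + 1])

def cut_all(paths, level):
    """Apply cut_to_level to every path, dropping the ones that are too shallow."""
    cut = (cut_to_level(p, level) for p in paths)
    return [c for c in cut if c is not None]

FILE_TXT = [
    r'C:\Two\one',
    r'C:\Five\four\three\two\one',
    r'C:\Eight\seven\six\five\four\three\two\one',
    r'C:\Six\five\four\three\two\one',
    r'C:\Four\three\two\one',
]

print('\n'.join(cut_all(FILE_TXT, 4)))
```

For level 4 this prints the same four lines as the first block of the output example. Unlike the batch version, the sketch has no fixed upper limit on the level.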
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
Thanks for the example but file.txt would look a little different.
Since the textfile is produced by the DIR command, each line has some irrelevant string preceding the directory path that needs to be removed.
Re: JREPL: Cutting of a string after the n-th occurrence of specific character?
The irrelevant string preceding the directory path is omitted if you use the /B switch in the DIR command, as I did in my first answer. Didn't you see it?
Aacini wrote: ↑28 Jul 2020 21:23
Mmmm... I am afraid I don't understand what exactly the problem is... However, if you have these directories on the disk:
Code: Select all
C:\Users\Guy
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop\Unnecessarily\Deep
... then this code:
Code: Select all
for /F "tokens=1-4 delims=\" %%a in ('dir /S /B') do (
    if "%%d" neq "" echo %%a\%%b\%%c\%%d
)
... shows this output:
Code: Select all
C:\Users\Guy\Desktop
C:\Users\Guy\Desktop
That is, it skips line #1 because it does not have three backslashes, then shows line #2 unchanged, then cuts line #3 to the third nesting level of directories.
Antonio
Previously, I asked you to "post some examples of the input lines and the desired output for them", but you didn't do it, so I can't fathom what the core problem in this thread is...
IMHO, this is a very simple problem and I don't understand why you used JREPL in the first place.
Antonio
PS - I suggest you review this thread and also this post.