Unicode Batch to inject LineNumbers

Posted: 20 Jun 2021 22:38
by ThumpieBunnyEve
⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆
Second time trying to post this, I've additions to make to the prior post that didn't show up,
and I've been waiting about 30 min or more for it to appear in the forum.
So here it is again with my edits.

so far I've formulated only this much of an idea from reading other forum posts:

Code: Select all

for /f "tokens=1,* delims=]" %v in ('find /n /v "" ^< "%1" ^| findstr "^@㏑▪" ') do ???
not that i grasp the details of that, but it appears to be on the right track.
⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆⋆

I'm looking for a separate batch file that will remove, and inject,
line numbers in the file at the path supplied as command line parameter,
wherever it finds the following string of Unicode:
@㏑▪
and up to the point where it reaches this Unicode character:
；
(Note the above character is not the ASCII semicolon, but a Unicode version.)

so for example, the batch shortcut that commits these changes is
"d:\Batch\callMes\injectLineNumbers.bat" "c:\path\folder\intoMe.bat"

injectLineNumbers.bat

will search through
c:\path\folder\intoMe.bat

and anywhere it finds
@㏑▪
it will delete ZERO or more characters until it encounters a
； ← again, this is a Unicode character, not the ASCII semicolon.
Then it injects the line number of the line it is currently editing,

perhaps resulting in @㏑▪215； or the like.

I -have- searched for such a batch function all over Google, and the closest I came required yet another library file, JREPL8.6.zip, which I am not looking to include in my builds just to inject line numbers into my batch echos. The FIND method is a bit squirrely to me and beyond my grasp at this hour. I'm guessing some sort of for /f "tokens=


My rig is Win7, and I usually favor concise code over cross-version compatibility,
but in this case I'd prefer compatibility with 7 and up.

anyone want to lend some brain storming to this concept? I don't see a lot of it going around on the net.

Re: Unicode Batch to inject LineNumbers

Posted: 21 Jun 2021 14:58
by aGerman
tl;dr Definitely use JREPL if you have to deal with characters that you call "unicode".

Long answer:
Everything is Unicode. You're talking about characters which are not ASCII. However, Batch works reliably with ASCII only. The first issue is that your text editor most likely saves the Batch script in a different text encoding than cmd.exe expects to read. The second issue is that the text file you want to update might be encoded in a completely different charset. The ▪ that you see in a text editor is just the graphical rendition of the bytes that the editor read. Keeping with this example: ▪ is the "Black Small Square". The Unicode code point is U+25AA. If your text file is UTF-16 LE encoded, the bytes which represent this character are AA 25. In contrast, the bytes which represent it in UTF-8 are E2 96 AA. The charset that the command tools expect to read defaults to a single-byte charset. (Likely CP437 or CP850, which do not even support this character.)
To make some progress, first try to determine the encoding of the file, then use JREPL with the hexadecimal escape sequences of the bytes that represent the non-ASCII characters.
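
A quick way to see which code page the console tools are currently working with is the chcp command. This is only a minimal sketch; the number it reports depends on the system's regional settings.

```batch
@echo off
rem Print the active console code page, for example 437 or 850.
rem Characters such as U+25AA have no representation in those code pages.
chcp
```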

Steffen

(FWIW First posts of a new member show up after moderation. Just to let you know ...)

Re: Unicode Batch to inject LineNumbers

Posted: 21 Jun 2021 21:07
by ThumpieBunnyEve
I mean, theoretically I can just change the search-for to something like
@Line# <search for 0 or more of any number> <until the first Space character>

then to use it anywhere,
if %debug% NEQ 0 echo script passing @Line# successfully.
or
if %debug% NEQ 0 echo script passing @Line#267 successfully.

so its ascii.
and anywhere that string is found in the searched file,
delete any numbers found, insert the current line number.

but im not sure how to structure the find option to do so.

This way cuts higher ASCII and the greater-than-2-hex-digit fuss out of the mess completely,
but still leaves me with the dilemma of how to structure the search.
I understand regular expression searches, not batch find searches.

Re: Unicode Batch to inject LineNumbers

Posted: 22 Jun 2021 03:24
by aGerman
But the question of how your file is encoded remains. Even if you use only ASCII for the search pattern, that doesn't mean the characters read from the file get printed correctly in Batch. If the ANSI encoding used by the involved command tools doesn't even support those characters, you're out of luck. FIND.exe uses ANSI, FINDSTR.exe uses ANSI, and the FOR /F loop parses a piped stream from the console host. Again, you should use a tool like JREPL, which writes directly back into a file without the console host and its character encoding sitting somewhere in between. I'm just trying to prevent you from wasting time with stuff which is doomed to fail.

Steffen

Re: Unicode Batch to inject LineNumbers

Posted: 22 Jun 2021 08:47
by ThumpieBunnyEve
Preacher Steffen wrote ""I just try to prevent you from wasting time with stuff which is doomed to fail."" ~ahh-men

Yeah i get it you all love JREPL here. But that's not what my OP asked about. the OP question asked for how to do it without JREPL

This thread wasn't started as a request to be converted to your flavor of dixie-cup juice. We can all board that spaceship to heaven on another thread.

lets assume im writing this in freshly opened notepads, saved as the default file type. no save-as unicode UTF-8 or any of that.
I understand that you swear by JREPL and it will deliver us from all evil and should be installed on every machine, and its cute you have faith in it.
But you've not solved my question with it, just told me i should convert to its holy superiority. yay! ⋆\o/⋆

My OP still stands, just now with no characters above 7F hex (127 decimal), saved as ANSI like any other default Notepad document.
How is the search structured, without the holey tablets of JREPL and its commandments upon fellow man/wo'man.

Re: Unicode Batch to inject LineNumbers

Posted: 22 Jun 2021 10:51
by aGerman
The OP asked about searching ▪ (U+25AA) and ； (U+FF1B) while now you're telling me everything is plain ASCII. OK. It's easy calling me a preacher. My answer is YCLAHTWBYCMID. Volunteers in this forum just try to help. They don't suggest taking a tool for a certain task only because they are used to it. They are doing it to provide a robust solution :wink:
@Line# <search for 0 or more of any number> <until the first Space character>

Code: Select all

findstr /rc:"[0-9][0-9]* " "whatever.txt"
Note that FINDSTR will always match the whole line which contains the pattern.
No idea what @Line# actually means. Is this the literal begin of a line or just kind of a pseudo search pattern? Keep in mind that we don't have a crystal ball to look at your display.
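
A minimal sketch of that whole-line behavior, assuming the placeholder file whatever.txt from above: adding /n makes FINDSTR prefix every matching line with its line number, and /c: keeps the trailing space as part of the search string rather than treating it as a separator between several search strings.

```batch
@echo off
rem Sketch only: whatever.txt is the placeholder file name used above.
rem /n prefixes each matching line with its line number and a colon.
rem /r treats the search string as a regular expression.
rem /c: makes the trailing space part of the pattern.
findstr /n /r /c:"[0-9][0-9]* " "whatever.txt"
```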

Steffen

Z.pmk

Posted: 22 Jun 2021 13:22
by ThumpieBunnyEve
@Line# was just a nice pseudo pattern to search for that I wouldn't commonly write in comments/rem/coding. It's a nice symbol for "At Line Number" in ASCII, as opposed to that Unicode string I dreamed up.

Using Unicode for parsing is something I do regularly with non-batch scripting, and I forgot it is not native to simple batch. Not that batch is simple. I just haven't written with it in a decade.

When I'm home I'll work with that find string and see what I can do to output a line number at the column and row where it finds those @Line# markers.

Ty for sticking with it despite the difference in direction.

Re: Unicode Batch to inject LineNumbers

Posted: 22 Jun 2021 13:44
by aGerman
Still trying to get it. I assume @Line# stands for a number which appears at the beginning of the line in your text file. If so, the FINDSTR command can be made more specific.

Code: Select all

findstr /brc:"[0-9][0-9]* " "whatever.txt"
... and because of knowing it's the first token, you can also split the line in a FOR /F loop

Code: Select all

for /f "tokens=1*" %%i in ('findstr /brc:"[0-9][0-9]* " "whatever.txt"') do (
  echo *line number* %%i
  echo *remaining text* %%j
)
Steffen

Re: Unicode Batch to inject LineNumbers

Posted: 22 Jun 2021 16:55
by ThumpieBunnyEve
no not at the start of each line. definitely not.
just, meh heres an example.

yoda.bat

Code: Select all

@echo off
REM this is a remark but i still want to see a line number after this gunky string @Line# . This is the current script line.
REM @Line# this is the next script line.
echo this here @Line# is a script line.
if debug neq 0 echo this line @Line#  has a debug based if statement. ELSE echo the script is at @Line# with no debug.
REM numbers should be inserted after the # and before the space in @Line# in this script line.
and once done should look like this

yoda.bat after processed

Code: Select all

@echo off
REM this is a remark but i still want to see a line number after this gunky string @Line#2 . This is the current script line.
REM @Line#3 this is the next script line.
echo this here @Line#4 is a script line.
if debug neq 0 echo this line @Line#5  has a debug based if statement. ELSE echo the script is at @Line#5 with no debug.
REM numbers should be inserted after the # and before the space in @Line#6 in this script line.

BUT it needs to result in the above, if the lines already have line numbers before processing.


yoda.bat before processed, should also result in the Above --^

Code: Select all

@echo off
REM this is a remark but i still want to see a line number after this gunky string @Line#16 . This is the current script line.
REM @Line#38 this is the next script line.
echo this here @Line#78554 is a script line.
if debug neq 0 echo this line @Line#521321  has a debug based if statement. ELSE echo the script is at @Line#766666 with no debug.
REM numbers should be inserted after the # and before the space in @Line#9131236 in this script line.
This mess here needs to be detected, with the numbers removed and re-inserted, so that it results in the code in the middle with proper line numbers.

The idea here is to create a smart injector that can do its job more than once on the same script, so as I code and elements move between lines, they are re-established with proper line numbers even when the current line numbers are out of place.

If this were regular expressions, something like ^(.*)(@Line#)[ 0-9]*(.*)$
would catch every line where @Line# exists, with everything before the marker in \1
and the searched string in \2.
It would erase any spaces or numbers after the marker
and then add the rest of the line with \3,
or the like, since in regular expressions \# is a captured group based on what was in the ( ).
Also, regular expressions have no way to tell you the line count, so that idea is shot down too.


but in batch I've never done such things.
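
For what it's worth, FINDSTR can at least report the line numbers that a plain regular expression lacks. A rough sketch, assuming the yoda.bat example above; it only lists the affected line numbers and does not rewrite anything:

```batch
@echo off
rem Sketch only: yoda.bat is the example file from this post.
rem findstr /n prints each matching line as N:text, where N is the line number.
rem The FOR /F splits at the first colon, so the loop variable holds just the number.
for /f "tokens=1 delims=:" %%n in ('findstr /n /c:"@Line#" "yoda.bat"') do (
  echo marker found on line %%n
)
```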

Re: Unicode Batch to inject LineNumbers

Posted: 23 Jun 2021 02:44
by aGerman
That's likely the most terrible crap I ever wrote. Should work for your examples but may fail if there are special characters like !, ), & ... in the text. Also no regex applied. Whatever follows the search pattern will be replaced.

Code: Select all

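@echo off &setlocal

rem Source file, destination file, and the marker to look for.
set "src=yoda.bat"
set "dst=yoda2.bat"
set "pattern=@Line#"
set "patternlength=6"

rem Read src line by line via SET /P and write the result to dst.
rem FIND /C /V "" counts the lines, so the FOR /L counter is the line number.
setlocal EnableDelayedExpansion
<"!src!" >"!dst!" (
  for /f %%i in ('type "!src!"^|find /c /v ""') do for /l %%n in (1 1 %%i) do (
    set "line=" &set /p "line="
    if not defined line (
      echo(
    ) else if "!line!"=="!line:%pattern%=!" (
      echo(!line!
    ) else (
      call :sub %%n
    )
  )
)
exit /b

rem Rebuild the line token by token. Every token that starts with the
rem pattern is replaced by the pattern plus the line number passed as argument.
:sub
set "newline="
:loop
for /f "tokens=1*" %%i in ("!line!") do (
  set "token=%%i"
  set "line=%%j"
  if "!token:~,%patternlength%!"=="!pattern!" (
    set "newline=!newline!!pattern!%1 "
  ) else (
    set "newline=!newline!!token! "
  )
)
if defined line goto loop
echo(!newline:~,-1!
exit /b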
Steffen

Re: Unicode Batch to inject LineNumbers

Posted: 23 Jun 2021 06:28
by ThumpieBunnyEve
so far so perfect!

I changed the first few lines to:

Code: Select all

@echo off &setlocal
REM note to self,  set "src=%~d1%~p1%~n1%~x1"   is the same as set "src=%~f1"
set "src=%~f1"
echo source file is  %src%
set "dst=%~dpn1+LN%~x1"
echo will save to  %dst%
so I can use it as a shortcut with parameters and get fully qualified paths outside of its own directory, for use with drag n drop and such.
On the batch file I was working with, it cleaned out any pre-existing line numbers after the search pattern and injected new, accurate ones, regardless of whether they were used in echos or REMs. :)
Behavior exactly as desired! It didn't appear to snag on any ( ) ! & but I've not used many of those as of yet in my echos or comments.
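
A small sketch of what those argument modifiers expand to, in case it helps anyone following along. It would be saved under some hypothetical name such as showparts.bat and called with a file path as its first parameter:

```batch
@echo off
rem Sketch: show how the argument modifiers split the first parameter.
rem Quotes keep special characters in the path from being interpreted.
echo full path       : "%~f1"
echo drive and folder: "%~dp1"
echo base name       : "%~n1"
echo extension       : "%~x1"
echo destination     : "%~dpn1+LN%~x1"
```

Called with "c:\path\folder\intoMe.bat", the last line should show c:\path\folder\intoMe+LN.bat as the destination.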

So far, this will come in handy for any and all future and even some past batch scripts i will/have make/made!

It now takes a source such as OriginalFileName.bat or SrcFile.txt
and saves to a destination file that looks like this:
OriginalFileName+LN.bat
or
SrcFile+LN.txt
whatever the extension may be.

ill be picking through what you provided to fill the gaps in my understanding. some of that script-fu you have there i haven't seen before. I've never indicated vars with !var! before for example.

ty Steffen

EDIT: at line 321 i do have some ( and ) and it seems to have done okay with those at least!

Code: Select all

rem @Line#321 FOR %%i IN (%VAR%) DO IF EXIST %%~si\NUL ECHO It's a directory
so i went ahead and tested your warnings:

Code: Select all

@echo off
REM @Line#  @Line#
REM @Line#13434  @Line#13434
REM @Line# & @Line#
REM @Line#13434 & @Line#13434
REM @Line# ! @Line#
REM @Line#13434 ! @Line#13434
REM @Line# ) @Line#
REM @Line#13434 ) @Line#13434
successfully became the following.

Code: Select all

@echo off
REM @Line#2 @Line#2
REM @Line#3 @Line#3
REM @Line#4 & @Line#4
REM @Line#5 & @Line#5
REM @Line#6 @Line#6
REM @Line#7 @Line#7
REM @Line#8 ) @Line#8
REM @Line#9 ) @Line#9
so the characters ! & ) seem safe to play with, as does multiple patterns on the same line! well done!

Re: Unicode Batch to inject LineNumbers

Posted: 23 Jun 2021 11:19
by aGerman
Good to hear. However, exclamation points are removed from the output. It'll certainly get worse if a word is surrounded with exclamation points. And a sequence of consecutive spaces will be shortened to one. Not sure what additional side effects may occur. Alas, you insisted on pure Batch so you have to grin and bear it ¯\_(ツ)_/¯
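
The loss of exclamation points comes from delayed expansion: a FOR variable is expanded first, and the delayed-expansion pass that follows treats any exclamation point in its value as the start of a variable reference. A minimal sketch of the effect, using a made-up sample string:

```batch
@echo off
setlocal DisableDelayedExpansion
rem Stored while delayed expansion is off, so the exclamation point is kept.
set "text=Wow! It works"

setlocal EnableDelayedExpansion
rem The FOR variable is expanded first; the delayed-expansion pass that
rem follows then swallows the exclamation point in its value.
for /f "delims=" %%a in ("!text!") do echo enabled : %%a
endlocal

rem Same loop with delayed expansion off again: the text is echoed unchanged.
for /f "delims=" %%a in ("%text%") do echo disabled: %%a
endlocal
```

The shortened runs of spaces have a different cause: FOR /F with "tokens=1*" treats consecutive delimiters as one, and the line is rebuilt with single spaces between tokens.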

Steffen