Edit subtitle with batch?

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Edit subtitle with batch?

#1 Post by Gamer95 » 27 Sep 2015 11:42

Hello, I have a question and I hope someone can help me.
Below is an excerpt from an .srt file.

1
00:01:01,520 --> 00:01:03,160
<i>Before they died,
my parents told me</i>

2
00:01:03,200 --> 00:01:05,521
<i>stories of how the world once was.</i>

3
00:01:06,520 --> 00:01:08,807
<i>What it was like long before I was born.</i>

4
00:01:09,720 --> 00:01:11,882
<i>Before the war with the machines.</i>

Is it possible for a bat script to look for the word "world" and then completely remove that entry (the sentence plus the timing line and number above it)?
And renumber the remaining entries accordingly?
So it becomes like this:



1
00:01:01,520 --> 00:01:03,160
<i>Before they died,
my parents told me</i>

2
00:01:06,520 --> 00:01:08,807
<i>What it was like long before I was born.</i>

3
00:01:09,720 --> 00:01:11,882
<i>Before the war with the machines.</i>


Thanks!

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Re: Edit subtitle with batch?

#2 Post by Gamer95 » 27 Sep 2015 14:37

Anyone?

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Edit subtitle with batch?

#3 Post by dbenham » 27 Sep 2015 18:36

You have not given an adequate specification of how the source file might be formatted. For example, is the text always enclosed within <i>...</i> :?:

Regardless, this is not a problem I would want to tackle using pure batch. It could be done, but, yuck.

This is much better suited to something like PowerShell, JScript, or VBS.

I like to use my hybrid JScript/batch utility called JREPL.BAT.

Assuming no section contains more than one </i>, and every section ends with </i>, then the following JREPL.BAT solution works just fine :D

Code: Select all

call jrepl "^(\d+)(\s*\n[\s\S]+?</i>\s*\n?\r?\n?)" "$2.match(/world/i)?'':(n+=1)+$2" /m /i /j /jbeg "var n=0" /f "test.txt" /o "output.txt"
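
For readers who do not have JREPL at hand, here is a rough sketch of what the one-liner above appears to do, written as plain JavaScript. It reuses the same regular expression and the same "drop or renumber" replacement rule. The function name `filterSubtitles` and the sample text are assumptions made for illustration, and file I/O is left out.

```javascript
// Sketch of the replacement logic as plain JavaScript.
// filterSubtitles is a hypothetical name, not part of JREPL.
function filterSubtitles(text) {
  var n = 0; // running counter used to renumber the sections that are kept
  // Group 1: the old section number.
  // Group 2: everything up to and including the first </i>, plus the
  //          whitespace (blank separator line) that follows it.
  var section = /^(\d+)(\s*\n[\s\S]+?<\/i>\s*\n?\r?\n?)/gmi;
  return text.replace(section, function (whole, oldNumber, body) {
    // Drop the section if it mentions "world"; otherwise emit the new number.
    return /world/i.test(body) ? "" : (n += 1) + body;
  });
}

var sample =
  "1\n00:01:01,520 --> 00:01:03,160\n<i>Before they died,\nmy parents told me</i>\n\n" +
  "2\n00:01:03,200 --> 00:01:05,521\n<i>stories of how the world once was.</i>\n\n" +
  "3\n00:01:06,520 --> 00:01:08,807\n<i>What it was like long before I was born.</i>\n";

var filtered = filterSubtitles(sample);
console.log(filtered);
```

As with the command above, this only works when every section ends with a single </i>.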


If you really want a pure batch solution, then this works as long as all lines are <= 1021 bytes long, the total length of each section is < ~8191 bytes, and each section is separated by one or more empty lines. It also uses \n (newline) instead of \r\n (carriage return and newline) at the end of each line of output. The line terminator can be fixed with a bit of additional code if needed.

Code: Select all

@echo off
setlocal enableDelayedExpansion

set "inFile=test.txt"
set "outFile=output.txt"

set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

for /f %%N in ('find /c /v "" ^<"!inFile!"') do set "cnt=%%N"

set /a n=1
set "str="
<"!inFile!" >"!outFile!" (
  for /l %%N in (1 1 !cnt!) do (
    set "ln="
    set /p "ln="
    if defined ln (
      if not defined str (
        set "str=!n!!LF!"
      ) else (
        set "str=!str!!ln!!LF!"
      )
    ) else (
      if defined str if "!str:world=!" equ "!str!" (
        echo(!str!!LF!
        set /a n+=1
      )
      set "str="
    )
  )
  if defined str if "!str:world=!" equ "!str!" echo(!str!!LF!
)


Dave Benham

Edit: additions are in blue
Last edited by dbenham on 28 Sep 2015 05:04, edited 2 times in total.

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Edit subtitle with batch?

#4 Post by Aacini » 27 Sep 2015 21:10

Try this:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

for %%v in (num times wordFound i n) do set "%%v="
(for /F "delims=" %%a in (input.txt) do (
   if not defined num (
      set "num=%%a"
   ) else if not defined times (
      set "times=%%a"
   ) else (
      set /A i+=1
      set "line[!i!]=%%a"
      set "line=%%a"
      if "!line:world=!" neq "!line!" set wordFound=true
      if "!line:~-4!" equ "</i>" (
         if not defined wordFound (
            set /A n+=1
            echo !n!
            echo !times!
            for /L %%i in (1,1,!i!) do echo !line[%%i]!
            echo/
         )
         for %%v in (num times wordFound i) do set "%%v="
      )
   )
)) > output.txt

Output:

Code: Select all

1
00:01:01,520 --> 00:01:03,160
<i>Before they died,
my parents told me</i>

2
00:01:06,520 --> 00:01:08,807
<i>What it was like long before I was born.</i>

3
00:01:09,720 --> 00:01:11,882
<i>Before the war with the machines.</i>

This solution removes exclamation marks; this can be fixed if needed.

Antonio

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Re: Edit subtitle with batch?

#5 Post by Gamer95 » 28 Sep 2015 02:40

Thank you for helping me, Dave & Antonio!

I have tried both scripts. Dave's second script gave me the following.

When I open the output.txt file I get this:

00:01:01,520 --> 00:01:03,160<i>Before they died,my parents told me</i>
00:01:06,520 --> 00:01:08,807<i>What it was like long before I was born.</i>
00:01:09,720 --> 00:01:11,882<i>Before the war with the machines.</i>

But when I copy and paste that, it comes out correct!

1
00:01:01,520 --> 00:01:03,160
<i>Before they died,
my parents told me</i>


2
00:01:06,520 --> 00:01:08,807
<i>What it was like long before I was born.</i>


3
00:01:09,720 --> 00:01:11,882
<i>Before the war with the machines.</i>


Subtitle files have a lot of lines, but I tried the same script on the original subtitle file, which is 1413 lines long,
and it still works, but only when you copy and paste it.
Is there any way to solve this?


I've also tried your script, Antonio, and that gives me this output:

35
00:05:33,840 --> 00:05:35,365
<i>But John is more.</i>

36
00:05:36,000 --> 00:05:37,764
<i>We're here because tonight,</i>

37
00:05:38,160 --> 00:05:40,447
<i>he's going to lead us to crush Skynet.</i>

38
00:05:41,080 --> 00:05:42,411
<i>For good.</i>

39
00:05:42,920 --> 00:05:45,287
Sir? Request to join
the Colorado offensive.
47
00:05:45,920 --> 00:05:47,410
I need you with me, Reese.
48
00:05:47,680 --> 00:05:50,040
We're talking about the
complete destruction of Skynet, sir.
49
00:05:50,160 --> 00:05:52,288
The Colorado unit will succeed.
50
00:05:52,360 --> 00:05:54,567
The machines will fall tonight.

All the entries with <i> are fine, but entries without it don't have a blank line between them.
Is this fixable?

I must say it's amazing that you guys know all this, and I'm very thankful that you're willing to help!
I've been trying to do this for quite some time now, and you guys did more in a day than I did in 2 weeks!
Again, thank you.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Edit subtitle with batch?

#6 Post by dbenham » 28 Sep 2015 05:29

You need to copy and paste because of the line feed vs. carriage return/line feed issue I talked about. Below is code that fixes that problem. I also fixed a minor bug that was introducing an extra line feed between each section.

Code: Select all

@echo off
setlocal enableDelayedExpansion

set "inFile=test.txt"
set "outFile=output.txt"

:: Define LF to contain a line feed character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: Define CR to contain a carriage return character
for /f %%a in ('copy /Z "%~dpf0" nul') do set "CR=%%a"

:: Count the number of lines within the input file
for /f %%N in ('find /c /v "" ^<"!inFile!"') do set "cnt=%%N"

set /a n=1
set "str="
<"!inFile!" >"!outFile!" (
  for /l %%N in (1 1 !cnt!) do (
    set "ln="
    set /p "ln="
    if defined ln (
      if not defined str (
        set "str=!n!!CR!!LF!"
      ) else (
        set "str=!str!!ln!!CR!!LF!"
      )
    ) else (
      if defined str if "!str:world=!" equ "!str!" (
        echo(!str!
        set /a n+=1
      )
      set "str="
    )
  )
  if defined str if "!str:world=!" equ "!str!" echo(!str!
)
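
The line-terminator issue is independent of batch: a file that uses bare LF can appear as one long line in editors that expect CR+LF, which matches the symptom reported above. As a minimal illustration in JavaScript (the helper name `toCrLf` is an assumption, not something from the scripts in this thread), normalizing the terminators is a single replacement:

```javascript
// toCrLf is a hypothetical helper name used only for this illustration.
function toCrLf(text) {
  // Match an optional CR followed by LF, so that existing CR+LF pairs
  // are rewritten as-is rather than turned into CR+CR+LF.
  return text.replace(/\r?\n/g, "\r\n");
}

// Sample with mixed terminators: LF, CR+LF, LF.
var mixed = "1\n00:01:01,520 --> 00:01:03,160\r\n<i>Before they died,\n";
var fixed = toCrLf(mixed);
```

Every line in `fixed` ends with CR+LF, whichever terminator it had before.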


If you think you may have many text editing tasks, then I strongly recommend you learn regular expressions and try out JREPL.BAT. There are a great many pitfalls with trying to edit text with pure batch, and a robust batch solution can be unacceptably slow with large files. JREPL is much simpler (once you learn regular expressions, and perhaps a bit of JScript), and it is much faster and more robust.

Here is an improved JREPL solution that does not rely on <i>...</i>, and instead assumes one or more empty lines are used to delimit sections.

Code: Select all

@call jrepl "^(\d+)([\s\S]+?(\n(?:\r?\n)+|(?![\s\S])))" "$2.match(/world/i)?'':(n+=1)+$2" /m /i /j /jbeg "var n=0" /f "test.txt" /o "output.txt"
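
The blank-line-delimited approach can also be sketched without regular-expression replacement, which makes it easy to check several words at once. This is only an illustration under stated assumptions: `removeSections` and the word list are hypothetical names, each section is assumed to be "number, timing line, one or more text lines", and only the text lines are searched (the command above searches the whole section). Matching is by case-insensitive substring, so "world" also matches "worlds".

```javascript
// removeSections is a hypothetical name. It drops every section whose text
// contains any of the given words and renumbers the remaining sections.
function removeSections(text, words) {
  // Trim leading/trailing newlines, then split on one or more empty lines.
  var sections = text.replace(/^[\r\n]+|[\r\n]+$/g, "").split(/(?:\r?\n){2,}/);
  var kept = [];
  for (var i = 0; i < sections.length; i++) {
    var lines = sections[i].split(/\r?\n/);
    if (lines.length < 3) continue; // expect number, timing line, text line(s)
    var body = lines.slice(2).join("\n").toLowerCase();
    var found = false;
    for (var w = 0; w < words.length; w++) {
      if (body.indexOf(words[w].toLowerCase()) !== -1) found = true;
    }
    if (!found) {
      // Replace the old number with the next free one; keep the rest as-is.
      kept.push((kept.length + 1) + "\r\n" + lines.slice(1).join("\r\n"));
    }
  }
  return kept.join("\r\n\r\n") + "\r\n";
}

var srt =
  "1\r\n00:01:01,520 --> 00:01:03,160\r\n<i>Before they died,\r\nmy parents told me</i>\r\n\r\n" +
  "2\r\n00:01:03,200 --> 00:01:05,521\r\n<i>stories of how the world once was.</i>\r\n\r\n" +
  "3\r\n00:05:45,920 --> 00:05:47,410\r\nI need you with me, Reese.\r\n";

var cleaned = removeSections(srt, ["world", "hello", "beyond"]);
```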


Dave Benham

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Re: Edit subtitle with batch?

#7 Post by Gamer95 » 28 Sep 2015 06:43

Amazing, Dave, what a great script!
Just what I was looking for. But is there also a way I can search for more than just 1 word?
Like world, hello, beyond etc.?

And how can I replace a character?
Like:

This is a line
- This is line 2


Replace "- " (dash plus space) with "-" so it becomes:

This is a line
-This is line 2

Again, thank you for your help!

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Edit subtitle with batch?

#8 Post by Aacini » 28 Sep 2015 11:18

Wow! So this is another "your solution doesn't work" and "ok, add this feature now" topic? Please read this post carefully; then realize that you didn't specify the format of the file! User dbenham clearly asked you:

dbenham wrote:You have not given an adequate specification as to how the source file might be formatted. For example, Is the text always enclosed within <i>...</i> :?:

But you just ignored that question! You showed the wrong output from my script, but you didn't show the input! How do you think I can modify my code if I don't have the data to test it? :evil: That said...

The new code:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set i=0
for %%r in (world hello beyond Colorado) do (
   set /A i+=1
   set "remove[!i!]=%%r"
)

for %%v in (num times wordFound i n) do set "%%v="
(

for /F "tokens=1* delims=:" %%a in ('findstr /N "^" input.txt') do (
   set "line=%%b"
   if not defined line (
      if not defined wordFound (
         set /A n+=1
         echo !n!
         echo !times!
         for /L %%i in (1,1,!i!) do echo !line[%%i]!
         echo/
      )
      for %%v in (num times wordFound i) do set "%%v="
   ) else if not defined num (
      set "num=!line!"
   ) else if not defined times (
      set "times=!line!"
   ) else (
      set /A i+=1
      set "line[!i!]=!line!"
      for /F "tokens=2 delims==" %%r in ('set remove') do (
         if "!line:%%r=!" neq "!line!" set wordFound=true
      )
   )
)
if defined i if not defined wordFound (
   set /A n+=1
   echo !n!
   echo !times!
   for /L %%i in (1,1,!i!) do echo !line[%%i]!
   echo/
)

) > output.txt

The input:

Code: Select all

1
00:01:01,520 --> 00:01:03,160
<i>Before they died,
my parents told me</i>

2
00:01:03,200 --> 00:01:05,521
<i>stories of how the world once was.</i>

3
00:01:06,520 --> 00:01:08,807
<i>What it was like long before I was born.</i>

4
00:01:09,720 --> 00:01:11,882
<i>Before the war with the machines.</i>

5
00:05:38,160 --> 00:05:40,447
<i>he's going to lead us to crush Skynet.</i>

6
00:05:41,080 --> 00:05:42,411
<i>For good.</i>

7
00:05:42,920 --> 00:05:45,287
Sir? Request to join
the Colorado offensive.

8
00:05:45,920 --> 00:05:47,410
I need you with me, Reese.

9
00:05:47,680 --> 00:05:50,040
We're talking about the
complete destruction of Skynet, sir.

10
00:05:50,160 --> 00:05:52,288
The Colorado unit will succeed.

11
00:05:52,360 --> 00:05:54,567
The machines will fall tonight.

The output:

Code: Select all

1
00:01:01,520 --> 00:01:03,160
<i>Before they died,
my parents told me</i>

2
00:01:06,520 --> 00:01:08,807
<i>What it was like long before I was born.</i>

3
00:01:09,720 --> 00:01:11,882
<i>Before the war with the machines.</i>

4
00:05:38,160 --> 00:05:40,447
<i>he's going to lead us to crush Skynet.</i>

5
00:05:41,080 --> 00:05:42,411
<i>For good.</i>

6
00:05:45,920 --> 00:05:47,410
I need you with me, Reese.

7
00:05:47,680 --> 00:05:50,040
We're talking about the
complete destruction of Skynet, sir.

8
00:05:52,360 --> 00:05:54,567
The machines will fall tonight.


Antonio

PS - If I had known the exact specifications of this problem beforehand, I would not have posted a pure Batch file solution! It is very slow...

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Re: Edit subtitle with batch?

#9 Post by Gamer95 » 28 Sep 2015 11:34

I'm very sorry, Antonio!
I didn't read the question right.
And I am very sorry that I didn't read the rules.
I also gave no information about the full input; that is completely my mistake, and I am very thankful that you're willing to help!

I only posted a couple of lines of the input, and I should have said that there are lines without <i> and </i>. :oops:
Again, thank you for helping me.

I've tried your new code and it works, but like you said, it's pretty slow.

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Re: Edit subtitle with batch?

#10 Post by Gamer95 » 29 Sep 2015 03:58

Can someone please help me?
This is currently the script:

Code: Select all

@echo off
setlocal enableDelayedExpansion

set inFile= %1
set outFile= %~n1_no_world.srt

>nul findstr "^[0-9].*-->" %1 && (
  goto process_srt
) || (
  goto:eof
)


:process_srt
:: Define LF to contain a line feed character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: Define CR to contain a carriage return character
for /f %%a in ('copy /Z "%~dpf0" nul') do set "CR=%%a"

:: Count the number of lines within the input file
for /f %%N in ('find /c /v "" ^<"!inFile!"') do set "cnt=%%N"

set /a n=1
set "str="
<"!inFile!" >"!outFile!" (
  for /l %%N in (1 1 !cnt!) do (
    set "ln="
    set /p "ln="
    if defined ln (
      if not defined str (
        set "str=!n!!CR!!LF!"
      ) else (
        set "str=!str!!ln!!CR!!LF!"
      )
    ) else (
      if defined str if "!str:world=!" equ "!str!" (
        echo(!str!
        set /a n+=1
      )
      set "str="
    )
  )
  if defined str if "!str:world=!" equ "!str!" echo(!str!
)


I'm looking for a way to have the batch file process an srt when I drag the srt onto it.
But currently it's saying that the path can't be found.

Also, how can I extend the script so it searches for multiple words?
Now it only finds the word "world", but how can I make it search for world, hello, beyond etc...

I've used Antonio's script above, but it's really slow.

dbenham
Expert
Posts: 2461
Joined: 12 Feb 2011 21:02
Location: United States (east coast)

Re: Edit subtitle with batch?

#11 Post by dbenham » 29 Sep 2015 04:46

:roll:

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Re: Edit subtitle with batch?

#12 Post by Gamer95 » 29 Sep 2015 06:09

:(

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Re: Edit subtitle with batch?

#13 Post by Gamer95 » 29 Sep 2015 14:19

Can someone please tell me why this doesn't work? :oops:
When I only use the word "world" it works, but when I add "hope" it doesn't.

Code: Select all

@echo off
setlocal enableDelayedExpansion

set "inFile=%~1"
set "outFile=%~n1_no_world.txt"

:: Define LF to contain a line feed character
set ^"LF=^

^" The empty line above is critical - DO NOT REMOVE

:: Define CR to contain a carriage return character
for /f %%a in ('copy /Z "%~dpf0" nul') do set "CR=%%a"

:: Count the number of lines within the input file
for /f %%N in ('find /c /v "" ^<"!inFile!"') do set "cnt=%%N"

set /a n=1
set "str="



<"!inFile!" >"!outFile!" (

     for /l %%N in (1 1 !cnt!) do (

                set "ln="
                set /p "ln="
            
            
            set "TRUE="
            if "!str:world=!" equ "!str!" set TRUE=1
            if "!str:hope=!" equ "!str!" set TRUE=1

                if defined ln (

                        if not defined str (

                                         set "str=!n!!CR!!LF!"

                        ) else (

                                         set "str=!str!!ln!!CR!!LF!"

                       )

                ) else (

                        if defined TRUE (echo(!str!
                             set /a n+=1


                        )

                        set "str="

                )

     )

     if defined TRUE echo(!str!

)
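
As far as can be told from the script above, one likely cause is the way the two tests are combined: TRUE is set when "world" is absent and again when "hope" is absent, so a section is dropped only if it contains both words. The intended rule is presumably "keep the section only when none of the words is present". The difference is sketched below in JavaScript; both function names are hypothetical, and the text is lower-cased because batch substring replacement ignores case.

```javascript
// Mirrors the two independent "set TRUE=1" tests: keep if ANY word is absent.
function keepIfAnyAbsent(section) {
  var s = section.toLowerCase();
  var keep = false;
  if (s.indexOf("world") === -1) keep = true;
  if (s.indexOf("hope") === -1) keep = true;
  return keep;
}

// Presumably intended rule: keep only if ALL words are absent.
function keepIfAllAbsent(section) {
  var s = section.toLowerCase();
  return s.indexOf("world") === -1 && s.indexOf("hope") === -1;
}

var line = "<i>stories of how the world once was.</i>";
var keptByPostedLogic = keepIfAnyAbsent(line);   // true: "hope" is absent, so the section is kept
var keptByIntendedLogic = keepIfAllAbsent(line); // false: "world" is present, so the section is dropped
```

In the batch script, the equivalent change would be to set a "found" flag when either word is present and to echo the section only when that flag is not defined.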

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Edit subtitle with batch?

#14 Post by Aacini » 29 Sep 2015 15:49

Code: Select all

@set @a=0  /*
@CScript //nologo //E:JScript "%~F0" < "%~F1" > "%~DPN1_no_world.txt"
@goto :EOF */

var fileContents = WScript.StdIn.ReadAll(),
    search = /(\d+\r\n)(.+\r\n((.+\r\n)+)(\r\n)?)/g,
    ignoreWord = /world|hello|beyond/, match, n=0;

while ( match = search.exec(fileContents) ) {
   if ( ! ignoreWord.test(match[3]) ) {
      WScript.Stdout.Write(++n+"\r\n"+match[2]);
   }
}


Code: Select all

Input data                              search = /regexp/

1                                       (\d+\r\n)       \d=a digit, +=one or more times, CR+LF     -> match[1]
00:01:01,520 --> 00:01:03,160           (.+\r\n         (.=any char, +=one or more times, CR+LF
<i>Before they died,                     ((.+\r\n)       (.=any char, +=one or more times, CR+LF)
my parents told me</i>                            +)        +=one or more times     -> match[3]
                                         (\r\n)?         empty line, ?=zero or one time (zero times in last line)
                                        )               )   -> match[2]


Full regexp details here.

Antonio

Gamer95
Posts: 32
Joined: 26 Sep 2015 03:05

Re: Edit subtitle with batch?

#15 Post by Gamer95 » 30 Sep 2015 04:28

Thank you, Antonio!
What a great script, and it's lightning fast! :D
The final thing needed to complete the script is a replace function.
I'm trying to replace - with +.
This is what I've got, but it doesn't work. :cry:

Code: Select all

@set @a=0  /*
@echo off

if NOT %~x1 == .txt goto :EOF

@CScript //nologo //E:JScript "%~F0" < "%~F1" > "%~DPN1_no_world.txt"
@goto :EOF */


var fileContents = WScript.StdIn.ReadAll(),
    search = /(\d+\r\n)(.+\r\n((.+\r\n)+)(\r\n)?)/g,
    ignoreWord = /world|hello|beyond|[)]/, match, n=0;
    replaceWord = /-/, match, n=0;



while ( match = search.exec(fileContents) ) {
   if ( ! ignoreWord.test(match[3]) ) {
      WScript.Stdout.Write(++n+"\r\n"+match[2]);
   }

   if ( ! replaceWord.test(match[3]) ) {
      WScript.Stdout.replace(search, replaceWord "+")
}
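
A few things in the attempt above stand out: `replace` is a method of strings, not of `WScript.Stdout`; the two arguments need a comma between them; the `while` loop is missing its closing brace; and replacing "-" across the whole section would also alter the "-->" in the timing line. Below is a minimal sketch of one way to combine the filter with the replacement. The function name `processSubtitles` is an assumption, and the regular expression is a variant of the one posted earlier in the thread, changed so that the timing line is captured separately from the text lines.

```javascript
// processSubtitles is a hypothetical name. It removes sections whose text
// matches ignoreWord, renumbers the rest, and replaces "-" with "+" in the
// text lines only, leaving the "-->" of the timing line untouched.
function processSubtitles(fileContents) {
  var search = /(\d+\r\n)(.+\r\n)((?:.+\r\n)+)((?:\r\n)?)/g,
      ignoreWord = /world|hello|beyond|[)]/,
      replaceWord = /-/g, // g flag: replace every occurrence, not only the first
      match, n = 0, out = "";
  // match[2] = timing line, match[3] = text lines, match[4] = optional empty line
  while ((match = search.exec(fileContents))) {
    if (!ignoreWord.test(match[3])) {
      out += (++n) + "\r\n" + match[2] +
             match[3].replace(replaceWord, "+") + match[4];
    }
  }
  return out;
}

var input =
  "1\r\n00:01:01,520 --> 00:01:03,160\r\nThis is a line\r\n- This is line 2\r\n\r\n" +
  "2\r\n00:01:03,200 --> 00:01:05,521\r\nhello there\r\n\r\n" +
  "3\r\n00:01:06,520 --> 00:01:08,807\r\n- last one\r\n";

var result = processSubtitles(input);
```

In the hybrid script, the input would come from `WScript.StdIn.ReadAll()` and the result would be written with `WScript.Stdout.Write(...)`, as in the earlier post.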
