Quick way to extract data from multiple lines and many files using jrepl?

Posted: 07 Aug 2017 03:11
by zimxavier
I would like to extract every value of which from a hundred or so files, but only when it is a parameter of set_variable or change_variable. These functions can span one line or several. Commented-out functions are excluded.
Example:

Code: Select all

ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
}
change_variable = { which = "laws" value = 1 }

change_variable = { value = -1 which = debate_score }

bad_function = { which = not_a_value }
#change_variable = { value = -1 which = not_a_value }


Correct values:
current_potion_quality
laws
debate_score



My script works fine, but it is excruciatingly slow (at least 10 minutes):

Code: Select all

for %%F in ("C:\game\decisions\*.txt") do (
call BATCH_JREPL "(^[^#]*?)(\b(set_variable|change_variable)\s*=)([\s\S]*?)(which\s*=\s*\q?)([A-Za-z0-9_]+)" "$txt=$6" /jmatchq /x /m /f "%%F" >> "TEMP\ztemp0_all_variables.txt"
)
for %%F in ("C:\game\events\*.txt") do (
call BATCH_JREPL "(^[^#]*?)(\b(set_variable|change_variable)\s*=)([\s\S]*?)(which\s*=\s*\q?)([A-Za-z0-9_]+)" "$txt=$6" /jmatchq /x /m /f "%%F" >> "TEMP\ztemp0_all_variables.txt"
)


The /m (multiline) option seems to be the cause; the rest of the script is fast.
Any ideas?
Thanks.

Re: Quick way to extract data from multiple lines and many files using jrepl?

Posted: 07 Aug 2017 06:41
by dbenham
zimxavier wrote: My script works fine...
I don't think so. Your code will mistakenly extract "current_potion_quality" from the following:

Code: Select all

#ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
}

Also, you are capturing 6 groups, when you only need 1.

Here is a fixed JREPL call with only one capturing group:

Code: Select all

call jrepl ^
  "^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
  "$txt=$1" /jmatchq /x /m /f "%%F" >>"TEMP\ztemp0_all_variables.txt"

But I don't think this will speed anything up. In fact, it may slow things down, because the bug fix can lead to additional regex backtracking.
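
To illustrate the capture-group point, here is a minimal Python sketch. It is only an approximation: Python's re module stands in for the JScript regex engine that JREPL uses, and \q (the JREPL /x escape for a double quote) is written as a literal ". The sample text is the example from the first post. On that sample both patterns extract the same three values, but the original builds six groups per match where one is enough.

```python
import re

# Sample input from the first post.
SAMPLE = '''ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
}
change_variable = { which = "laws" value = 1 }

change_variable = { value = -1 which = debate_score }

bad_function = { which = not_a_value }
#change_variable = { value = -1 which = not_a_value }
'''

# Original pattern: six capturing groups, only the last one is used.
SIX_GROUPS = re.compile(
    r'(^[^#]*?)(\b(set_variable|change_variable)\s*=)([\s\S]*?)(which\s*=\s*"?)([A-Za-z0-9_]+)',
    re.MULTILINE,
)

# Rewritten pattern: (?:...) groups without capturing, so only the value is kept.
ONE_GROUP = re.compile(
    r'^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*"?([A-Za-z0-9_]+)',
    re.MULTILINE,
)

six_values = [m.group(6) for m in SIX_GROUPS.finditer(SAMPLE)]
one_values = [m.group(1) for m in ONE_GROUP.finditer(SAMPLE)]

print(SIX_GROUPS.groups, six_values)
print(ONE_GROUP.groups, one_values)
```

The sample is small, so this says nothing about speed or about how the two patterns behave on the real game files.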

Each call to JREPL requires significant startup time. The simplest way I can think of to speed things up is to minimize the number of times JREPL is called. All your results go to a single output file, so it should be possible to get by with only one JREPL call (provided the combined size of all source text files is less than ~1 GB).

If you can guarantee that the last character of every source file is a newline character, then you can use the following:

Code: Select all

type "C:\game\decisions\*.txt" "C:\game\events\*.txt" 2>nul |^
jrepl "^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
      "$txt=$1" /jmatchq /x /m /o "TEMP\ztemp0_all_variables.txt"

But if some files are missing the final newline, then the first line of a file may be appended to the last line of the prior file.

A FOR loop can solve the problem:

Code: Select all

@echo off
(
  for %%F in (
    "C:\game\decisions\*.txt"
    "C:\game\events\*.txt"
  ) do (
    type "%%F"
    echo(
  )
)|jrepl "^(?![\s])[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
        "$txt=$1" /jmatchq /x /m /o "TEMP\ztemp0_all_variables.txt"
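
To see why the extra echo( after each file matters, here is a small Python sketch of the two ways of concatenating. The file contents and the function names are made up for illustration, and "\n" stands in for the line break that echo( writes. When the first file ends in a comment without a final newline, plain concatenation glues the next file's first line onto that comment, so the function call would be treated as commented out.

```python
def concat_raw(texts):
    """Join file contents back to back, like `type a.txt b.txt`."""
    return "".join(texts)


def concat_with_newline(texts):
    """Append a line break after every file, like `type "%%F"` followed by `echo(`."""
    return "".join(text + "\n" for text in texts)


# Hypothetical file contents: the first file lacks a final newline.
decisions = "some_flag = yes\n# end of decisions"
events = "change_variable = { which = laws value = 1 }\n"

raw_lines = concat_raw([decisions, events]).splitlines()
safe_lines = concat_with_newline([decisions, events]).splitlines()

# Last line of the raw stream: the function call is now part of a comment.
print(raw_lines[-1])
# Third line of the safe stream: the function call sits on its own line.
print(safe_lines[2])
```

The safe variant adds a blank line after files that already end with a newline, which is harmless for this extraction.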


Dave Benham

Re: Quick way to extract data from multiple lines and many files using jrepl?

Posted: 07 Aug 2017 08:50
by zimxavier
Thank you Dave :)

Code: Select all

#ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
}


Actually, in that case I do want to extract current_potion_quality, because the game still reads it (not in the ROOT scope, but the syntax is valid). Only the extra closing curly bracket is wrong, and it breaks the syntax highlighting for the rest of the file. For a value to be excluded, the # must appear earlier on the same line as change_variable/set_variable, or earlier on the same line as which (I have never seen that last case, though). Sorry I wasn't clear.

Three cases, all with valid syntax:

current_potion_quality is not valid:

Code: Select all

#ROOT = {
#    change_variable = {
#        which = current_potion_quality
#        value = 1
#    }
#}


current_potion_quality is valid:

Code: Select all

#ROOT = {
    change_variable = {
        which = current_potion_quality
        value = 1
    }
#}


current_potion_quality is not valid and a_value is valid:

Code: Select all

#ROOT = {
    change_variable = {
        #which = current_potion_quality
        which = a_value
        value = 1
    }
#}


Anyway, I tested your latest script. It took... 2 seconds. It doesn't work as expected, though: it found 94 occurrences instead of 508.

I replaced ^(?![\s])[^#]*? with ^[^#]*? (as in my initial script). This code works but takes 2 min 24 s (!):

Code: Select all

@echo off
(
  for %%F in (
    "C:\game\decisions\*.txt"
    "C:\game\events\*.txt"
  ) do (
    type "%%F"
    echo(
  )
)|BATCH_JREPL "^[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
        "$txt=$1" /jmatchq /x /m /o "TEMP\ztemp0_all_variables.txt"


I don't understand why the result is not the same.

Re: Quick way to extract data from multiple lines and many files using jrepl?

Posted: 08 Aug 2017 01:00
by zimxavier
After a good night's sleep, I believe I found it:

Code: Select all

@echo off
(
  for %%F in (
    "C:\game\decisions\*.txt"
    "C:\game\events\*.txt"
    "C:\game\common\scripted_effects\*.txt"
  ) do (
    type "%%F"
    echo(
  )
)|BATCH_JREPL "^[^#\n]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*\q?([A-Za-z0-9_]+)"^
        "$txt=$1" /jmatchq /x /m /o "TEMP\ztemp0_all_variables.txt"


I added \n to the excluded characters, so the part of the match before set_variable/change_variable can no longer span several lines. This script takes 1 or 2 seconds. Amazing! My third example is not supported, but that is not a big deal.
Thank you again, Dave. Your changes are very promising. All durations are drastically reduced.
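
For readers wondering what the \n exclusion changes, here is a minimal Python sketch. It is an approximation: Python's re module stands in for JREPL's JScript regex engine, \q is written as a literal ", and the first sample text is invented for illustration. In a negated class such as [^#], a line break is just another allowed character, so the lazy prefix can start on one line and keep growing across later lines until it reaches a function name or a #. That extra scanning from each line start is a plausible cause of the slowdown, though it was not measured here. The last part reproduces the third example, where the commented-out which is still picked up.

```python
import re

# Prefix may cross line breaks: [^#] also matches "\n".
CROSS_LINE = re.compile(
    r'^[^#]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*"?([A-Za-z0-9_]+)',
    re.MULTILINE,
)

# Prefix confined to a single line: "\n" is excluded as well.
SAME_LINE = re.compile(
    r'^[^#\n]*?(?:\bset_variable|\bchange_variable)\s*=[\s\S]*?which\s*=\s*"?([A-Za-z0-9_]+)',
    re.MULTILINE,
)

# Invented sample: two unrelated lines, then the function call.
text = (
    "bad_function = { which = not_a_value }\n"
    "another_line = yes\n"
    "change_variable = { which = debate_score value = 1 }\n"
)

cross = CROSS_LINE.search(text)
same = SAME_LINE.search(text)

# Same captured value, but the first match begins at the top of the text.
print(cross.group(1), cross.start())
print(same.group(1), same.start())

# Third example from the thread: [\s\S]*? stops at the first "which",
# even though that line is commented out.
third = (
    "#ROOT = {\n"
    "    change_variable = {\n"
    "        #which = current_potion_quality\n"
    "        which = a_value\n"
    "        value = 1\n"
    "    }\n"
    "#}\n"
)
third_values = SAME_LINE.findall(third)
print(third_values)
```

Handling the third example would need the part between the function name and which to skip commented lines as well, which this pattern does not attempt.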