A couple of points need to be clarified. Suppose a file contains 100 lines and you want 20% of them, that is, 20 lines. Does this mean that lines 81 to 100 are an acceptable result, but lines 82 to 100 are not, because that range has only 19 lines? If the request is 99%, just two results are possible: lines 1-99 or lines 2-100. Is this correct? If so, the Batch file below solves your problem:
Code:
@echo off
setlocal EnableDelayedExpansion
rem Usage: Get_PercentOfFile_ percent filename
rem Get a percentage of lines from filename
rem Get number of lines in file
for /F %%a in ('find /c /v "" ^< %2') do set numLines=%%a
rem Get the desired percentage of lines
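set /A desiredLines=numLines*%1/100
rem Nothing to show if the percentage rounds down to zero lines
if %desiredLines% lss 1 goto continue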
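rem Get a random starting position for such subset
rem (there are numLines-desiredLines+1 possible positions, so skipLines goes from 0 to numLines-desiredLines)
set /A skipLines=(numLines-desiredLines+1)*%random%/32768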
set skip=
if %skipLines% gtr 0 set skip=skip=%skipLines%
rem Show the subset of lines
for /F "%skip% delims=" %%a in (%2) do (
echo %%a
set /A desiredLines-=1
if !desiredLines! equ 0 goto continue
)
:continue
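As a quick check of the numbers in the question, the lines below compute the size of the block and the range of valid starting lines. The values 100 and 20 and the names percent and lastStart are only assumptions for this example; they are not part of the Batch file above.

```batch
@echo off
setlocal
rem Sketch with hard-coded example values (100 lines, 20 percent)
set /A numLines=100, percent=20
rem Size of the block, using the same integer division as the Batch file above
set /A desiredLines=numLines*percent/100
rem The block can start at line 1 up to this line and still fit in the file
set /A lastStart=numLines-desiredLines+1
echo %desiredLines% lines, first line of the block between 1 and %lastStart%
```

With percent=99 the same arithmetic gives 99 lines and a first line between 1 and 2, that is, lines 1-99 or lines 2-100.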
However, if you want every line in the subset to be randomly selected, then the method must be changed this way:
Code:
@echo off
rem Usage: Get_PercentOfFile_ percent filename
rem Get a percentage of lines from filename
rem Get number of lines in file
for /F %%a in ('find /c /v "" ^< %2') do set numLines=%%a
rem Get the desired percentage of lines
set /A desiredLines=numLines*%1/100
rem Nothing to show if the percentage rounds down to zero lines
if %desiredLines% lss 1 goto :EOF
rem Create a "copyThisLine" array with "desiredLines" elements,
rem each one indexed by a randomly generated line number
:nextElem
set /A randNum=numLines*%random%/32768 + 1
if defined copyThisLine[%randNum%] goto nextElem
set copyThisLine[%randNum%]=TRUE
set /A desiredLines-=1
if %desiredLines% gtr 0 goto nextElem
rem Show the subset of randomly selected lines
for /F "tokens=1* delims=:" %%a in ('findstr /N "^" %2') do (
rem "echo(" shows empty lines as empty instead of "ECHO is off."
if defined copyThisLine[%%a] echo(%%b
)
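The previous method should be adjusted to delete lines from a full array, instead of inserting lines into an empty array, if the number of required lines is greater than 50% of the file. Otherwise, the program would take too long, because the loop at :nextElem keeps drawing line numbers that were already selected and has to try again...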
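A minimal sketch of that adjustment, in case it is useful: instead of marking the lines to copy, it marks the numLines-desiredLines lines to leave out and shows the rest. The names skipThisLine, skipCount, percent and file are my own choices for this example, and the defaults used when no parameters are given (80 percent of the script's own lines) are only there so the sketch can be run as-is. I have not tried it on big files.

```batch
@echo off
setlocal
rem Sketch: choose the lines to SKIP instead of the lines to copy
rem Usage: thisScript percent filename
set "percent=%~1"
set "file=%~2"
rem Assumption for this sketch only: default values when no parameters are given
if not defined percent set percent=80
if not defined file set "file=%~f0"
rem Get number of lines in file
for /F %%a in ('find /c /v "" ^< "%file%"') do set numLines=%%a
rem Lines to show, and lines to leave out
set /A desiredLines=numLines*percent/100, skipCount=numLines-desiredLines
rem Mark "skipCount" different, randomly generated line numbers
:nextSkip
if %skipCount% leq 0 goto show
set /A randNum=numLines*%random%/32768 + 1
if defined skipThisLine[%randNum%] goto nextSkip
set skipThisLine[%randNum%]=TRUE
set /A skipCount-=1
goto nextSkip
:show
rem Show every line that was not marked
for /F "tokens=1* delims=:" %%a in ('findstr /N "^" "%file%"') do (
   if not defined skipThisLine[%%a] echo(%%b
)
```

A complete version would test the percentage first and use this variant only above 50, keeping the original "copyThisLine" method otherwise, so that the random loop never has to fill more than half of the array.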
I hope it helps.
Antonio