Combinations of N numbers in sets of K with R repeated elements

Discussion forum for all Windows batch related topics.

Moderator: DosItHelp

Post Reply
Message
Author
Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Combinations of N numbers in sets of K with R repeated elements

#1 Post by Aacini » 26 May 2023 00:11

1- Using random numbers to generate combinations

At this SO question an interesting Statistical Combinations problem is stated:
SO_question wrote: Generating lines of six different numbers between 1 and 39 that have no more than two numbers in common with any previous line.
The OP requested that 300 lines be generated this way. I liked the problem a lot, so I wrote a solution (and several additional solutions later). I posted this thread here in order to have documented and ordered these programs (and to invite you to write a solution of your own).

This problem is solved via two basic steps: 1- Generate lines with six non-repeating numbers between 1 and 39 (a numeric combination), and 2- Compare the new line versus all previously generated lines to count the numbers in common with each one. If the number is less or equal 2 in all previous lines, insert the new line in the solution. When 300 lines are generated this way, stop. This problem in complicated because the huge number of possible combinations and the number of operations involved. In the SO question the skeleton of a simple solution based on random numbers is included (in the most rudimentary and slowest way).

My first solution was also based on random numbers, but just to take 6 anyone numbers from the 39 available ones (using 6 random numbers only). After 6 numbers are taken, they are compared versus the numbers of all previously generated lines. If a number caused to have 3 or more repeated numbers with a previous line, it is eliminated and the process go back to get another available number. If the 6 numbers are correct, a new line is generated. The creation of lines continue while available numbers exists in this set. When the available numbers ends, a new set of 39 numbers is created and the whole process is repeated. (Ok, my first method certainly was not a very good one)...

Code: Select all

@echo off
setlocal EnableDelayedExpansion


rem The 300 lines are generated as a 2-dimensional matrix called "n" with this shape:
rem    n[%line%][%num%]=1
rem Where "line" is the index of the line: from 1 to 300
rem and "num" is *each number* (value between 1 and 39) that comprise such a line
rem
rem Each new line is previously generated in "num" array with 6 elements in the same way


rem Randomize
for /L %%i in (0,1,%time:~-1%) do set "x=!random!"

rem Generate 300 lines with 6 numbers each
set "line=0"
set "num[0]=0"
:nextLine

   rem Set numbers 1..39 as "available"
   for /L %%i in (1,1,39) do set "avail[%%i]=%%i"
   set "numsAvail=39"

   rem Remove previously generated numbers
   for /F "tokens=2 delims==" %%n in ('set num[') do set "num[%%n]="
   set "numbers=0"

   rem Generate 6 random numbers (from "available" ones):

   rem Get the index of anyone (random  order) of available numbers
   rem Insert such available number in "num" array
   rem Move to the (position of the) used available number the *last one*, and eliminate (the last) one available number
   rem And pass to the next number
   :nextAvail
   set /A indexAvail=%random% %% numsAvail + 1
   set /A num[!avail[%indexAvail%]!]=avail[%indexAvail%], avail[%indexAvail%]=avail[%numsAvail%], numsAvail-=1, numbers+=1
   if %numbers% lss 6 (
      if %numsAvail% gtr 0 (
         goto nextAvail
      ) else (
         goto nextLine
      )
   )

   rem Compare the new 6 numbers vs. *all* previously generated lines
   rem and eliminate the (last) number that cause to have more than 2 repeated numbers:

   for /L %%i in (1,1,%line%) do (
      set "repeated=0"
      for /F "tokens=2 delims==" %%j in ('set num[') do (
         if defined n[%%i][%%j] set /A repeated+=1
         if !repeated! gtr 2 (
            set "num[%%j]="
            set /A numbers-=1
            if %numsAvail% gtr 0 (
               goto nextAvail
            ) else (
               goto nextLine
            )
         )
      )
   )

   rem Insert (and show) the new line:
   set /A line+=1
   < NUL (
   set /P "=%time:~0,-3%  %line%- "
   for /F "tokens=2 delims==" %%n in ('set num[') do (
      set "n[%line%][%%n]=1"
      set /P "=%%n "
   ))
   echo/

if %line% lss 300 goto nextLine
I run this program for 1 hour and 50 minutes in my old-and-slow machine until it generates 100 sets. Obviously, the first sets are generated very fast and the time increases as the number of sets aument. This is the listing of the first 100 results (with timing of each line):

Code: Select all

21:52:36  1- 15 18 21 34 5 6
21:52:36  2- 10 14 15 26 33 6
21:52:36  3- 13 16 25 30 32 33
21:52:36  4- 14 23 29 2 35 39
21:52:36  5- 14 25 27 34 36 4
21:52:37  6- 12 18 20 22 36 7
21:52:37  7- 12 20 21 24 27 37
21:52:37  8- 22 28 30 35 6 9
21:52:38  9- 12 18 28 35 38 5
21:52:38  10- 18 23 25 30 36 8
21:52:39  11- 15 17 24 3 5 8
21:52:39  12- 17 22 24 26 29 31
21:52:40  13- 21 2 32 37 39 7
21:52:41  14- 11 23 28 2 33 7
21:52:41  15- 16 19 1 30 34 36
21:52:43  16- 15 21 32 38 4 9
21:52:43  17- 12 21 25 33 34 39
21:52:44  18- 16 24 32 6 8 9
21:52:46  19- 12 1 27 35 36 8
21:52:48  20- 15 18 20 33 35 3
21:52:49  21- 11 14 18 28 34 37
21:52:51  22- 20 22 25 26 35 39
21:52:53  23- 10 11 26 34 35 38
21:52:54  24- 18 19 22 28 2 39
21:52:56  25- 12 16 20 30 31 5
21:52:57  26- 15 16 23 27 36 6
21:52:59  27- 10 12 23 32 39 4
21:53:09  28- 17 23 26 28 34 4
21:53:11  29- 16 17 1 25 26 9
21:53:14  30- 12 13 15 16 2 7
21:53:15  31- 10 20 24 2 36 3
21:53:20  32- 10 11 17 1 20 23
21:53:24  33- 11 13 15 20 21 22
21:53:28  34- 14 15 20 28 29 36
21:53:30  35- 15 16 18 19 25 31
21:53:33  36- 10 15 17 18 32 37
21:53:47  37- 11 14 15 25 2 32
21:53:49  38- 19 21 28 7 8 9
21:53:55  39- 11 1 22 24 27 30
21:53:58  40- 10 14 18 20 30 4
21:54:01  41- 1 21 26 27 28 39
21:54:03  42- 10 13 25 34 6 9
21:54:04  43- 19 23 27 30 33 38
21:54:17  44- 10 12 13 18 27 3
21:54:25  45- 11 18 26 29 30 32
21:54:28  46- 13 14 15 37 38 39
21:54:30  47- 14 18 1 23 3 9
21:54:36  48- 12 13 20 23 28 6
21:54:48  49- 11 15 1 26 7 8
21:55:05  50- 11 12 13 17 26 39
21:55:10  51- 11 18 21 24 35 36
21:55:23  52- 10 12 15 29 38 8
21:55:46  53- 10 11 12 14 5 7
21:55:50  54- 13 26 27 2 33 36
21:55:58  55- 16 17 20 22 34 37
21:56:20  56- 10 11 13 31 33 4
21:56:23  57- 13 18 29 2 4 5
21:56:25  58- 1 22 23 2 31 5
21:56:30  59- 14 16 17 19 35 6
21:56:53  60- 10 12 16 17 21 36
21:57:29  61- 13 1 20 25 5 7
21:57:51  62- 10 14 17 24 25 38
21:58:03  63- 11 13 14 16 1 29
21:58:13  64- 10 15 16 28 30 39
21:58:20  65- 10 16 19 37 3 4
21:59:37  66- 10 11 21 25 37 8
21:59:40  67- 19 1 28 31 38 3
21:59:46  68- 15 25 26 27 30 3
22:00:10  69- 11 12 19 20 3 8
22:01:36  70- 11 13 30 35 3 7
22:02:09  71- 24 29 34 4 6 7
22:03:36  72- 24 26 35 37 7 9
22:05:18  73- 11 12 15 24 28 4
22:05:41  74- 22 23 24 33 34 36
22:07:15  75- 12 17 1 30 33 7
22:07:17  76- 19 25 32 4 5 6
22:07:35  77- 10 12 19 1 22 6
22:07:57  78- 12 14 15 17 22 23
22:08:57  79- 10 11 19 27 32 36
22:08:59  80- 11 25 31 36 5 9
22:10:39  81- 13 1 22 26 32 3
22:11:41  82- 11 12 22 29 33 35
22:11:54  83- 12 14 24 31 34 8
22:15:02  84- 12 13 14 19 30 9
22:17:18  85- 10 19 24 28 29 33
22:19:24  86- 17 1 24 37 39 6
22:21:04  87- 11 16 17 18 27 33
22:21:14  88- 12 1 20 2 32 38
22:22:37  89- 11 16 22 38 39 7
22:22:50  90- 10 18 1 29 31 39
22:27:39  91- 15 17 19 1 27 2
22:29:04  92- 13 19 21 23 24 2
22:36:51  93- 12 14 32 33 36 3
22:43:11  94- 13 18 20 24 31 32
22:53:34  95- 16 18 20 21 23 38
23:00:23  96- 11 14 30 31 38 6
23:02:39  97- 10 1 27 34 37 7
23:04:17  98- 13 16 17 23 31 3
23:08:38  99- 19 20 21 29 31 4
23:42:58  100- 10 12 26 28 31 37
Antonio

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Combinations of N numbers in sets of K with R repeated elements

#2 Post by Aacini » 26 May 2023 00:11

2- Divide solution in 3 parts; the last uses findstr on Files

I reviewed my program and concluded that a better method could be to divide the solution in 3 parts in order to get No, One and Two repeated elements in the generated lines via a precise (non-random) method. In this way I will be sure to generate all lines with such a number of repeated elements.

The first part generates lines with NO repeating elements in a very simple way by taking groups of 6 numbers until the initial 39 are finished:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

set /A last=39, n=6, nM1=n-1

echo Combinations of 6 numbers with No repetitions:
echo/

set /A lastI=(last/n-1)*n+1, count=0
< NUL (
for /L %%i in (1,%n%,%lastI%) do (
   set /A lastJ=%%i+nM1
   for /L %%j in (%%i,1,!lastJ!) do (
      set /P "=%%j "
   )
   echo/
   set /A count+=1
))

echo/
echo Total = %count%
Output:

Code: Select all

Combinations of 6 numbers with No repetitions:

1 2 3 4 5 6
7 8 9 10 11 12
13 14 15 16 17 18
19 20 21 22 23 24
25 26 27 28 29 30
31 32 33 34 35 36

Total = 6
The second part generates lines with ONE repeated element. I used a method based on a base number that vary from 1 to 34, because the last possible line will have the numbers 34,35,36,37,38,39; the base number will be the repeated element in each group of generated lines. I used an array of "available numbers" that is initialized with the 2..39 numbers the first time. Then, it generates lines that have the base number repeated and taking 5 numbers from the available ones until no more lines can be generated.

From the second base number on: after the available numbers are initialized, review all previously generated lines looking for the new base number and remove from the available ones all numbers in found lines.

Code: Select all

echo off
setlocal EnableDelayedExpansion

echo Combinations with ONE repeated element:
echo/

rem Initialize output file and available numbers
rem/> lines.txt
set "avail= "
for /L %%n in (2,1,39) do set "avail=!avail!%%n "
set /A numAvail=38, lines=0, base=0

:nextBase
   set /A base+=1

   :nextLine
      set "line= %base% "

      if %base% gtr 1 (

         rem Initialize available numbers
         set "avail= "
         for /L %%n in (1,1,39) do set "avail=!avail!%%n "
         set "numAvail=39"

         rem Remove available numbers that appears in previous lines for this base number
         for /F "delims=" %%a in ('findstr /C:" %base% " lines.txt') do (
            for %%b in (%%a) do if "!avail!" neq "!avail: %%b = !" (
               set "avail=!avail: %%b = !" & set /A numAvail-=1
            )
         )

      )

      rem Insert 5 numbers taking they from available ones
      set "nums=0"
      for /L %%n in (1,1,5) do if !numAvail! neq 0 (
         for /F "tokens=1*" %%x in ("!avail!") do (
            set "line=!line!%%x " & set "avail= %%y" & set /A nums+=1, numAvail-=1

            rem Remove available numbers that appears in previous lines for this new number
            for /F "delims=" %%a in ('findstr /C:" %%x " lines.txt') do (
               for %%b in (%%a) do if "!avail!" neq "!avail: %%b = !" (
                  set "avail=!avail: %%b = !" & set /A numAvail-=1
               )
            )

         )
      )

      if %nums% equ 5 (
         echo %line%
         >> lines.txt echo %line%
         set /A lines+=1
      )

   if %numAvail% geq 5 goto nextLine

if %base% lss 34 goto nextBase

echo/
echo %lines% lines created
Output:

Code: Select all

Combinations with ONE repeated element:

 1 2 3 4 5 6
 1 7 8 9 10 11
 1 12 13 14 15 16
 1 17 18 19 20 21
 1 22 23 24 25 26
 1 27 28 29 30 31
 1 32 33 34 35 36
 2 7 12 17 22 27
 2 8 13 18 23 28
 2 9 14 19 24 29
 2 10 15 20 25 30
 2 11 16 21 26 31
 3 7 13 19 25 31
 3 8 12 20 24 32
 4 7 14 18 26 30
 4 8 15 17 29 33
 5 7 15 21 23 32
 6 7 16 20 28 33
 7 24 34 37 38 39
 8 5 14 22 31 34
 9 3 15 18 22 35
 10 3 14 17 23 36
 12 4 9 21 25 28
 12 5 10 18 29 37
 13 4 10 22 32 38
 14 6 11 25 27 32
 17 5 9 13 26 39
 22 6 19 30 36 37
 31 4 20 23 35 37

29 lines created
The third part generates lines with TWO repeated numbers. I used a method similar to the previous one: set two "base numbers" that will be the repeated ones in each group of generated lines, and take 4 numbers from the "available ones" that exclude the two base numbers. However, in this case the selection of the "available numbers" is more complicated than before because from the 4 new numbers to insert, can not be more than 3 of them in the same line in anyone of previously generated lines.

I didn't wanted to use the rudimentary method of counting repeated numbers one by one, I would prefer a smarter method that gives the same result with fewer operations. This is the reasoning I used.

I need to select 4 numbers: the first two not matters because if the same two numbers exist in another line, then such a line will have just these 2 numbers repeated. The problem is with the third number: if the two previous numbers exist in any other line, then the third number can't exist in the same line; otherwise, you can insert such a third number! Instead of (slowly) count repeated numbers one-by-one in the lines, I used an "accumulative search": I first searched for the first number and get the resulting lines. Then I searched the second number in these lines: the result will be the lines that contain both numbers. Finally I searched the third number in these last lines: if a line is found then a line exists with the same three numbers, so you can't insert the third number. Note that don't matters which line is the one with the three repeated numbers; just if it exists or not.

A more complicated problem happens with the fourth number: if the first two numbers exists and also the fourth number, you can't insert it; NOR if the first and third numbers exists and also the fourth number, NOR if the second and third number exists and also the fourth number. These three cases (and the former one) uses auxiliary files that store the lines found in each previous step; they are described in a small scheme in the code below. All the accumulative searchs are completed via findstr commands on these files.

Code: Select all

@goto begin

https://stackoverflow.com/questions/76219423/batch-script-need-help-generating-lines-of-six-different-numbers-between-1-and

The method is based in TWO BASE NUMBERS that are the numbers that will be *repeated in the currently generated lines*,
plus FOUR "new numbers" that will be inserted, taken from a list of "available numbers" that exclude the two base numbers.
The purpose is that all available numbers does NOT increase beyond 2 the number of repeated numbers with previous lines.

1- Eliminate from "available numbers" the numbers in previously generated lines that already have the two base numbers.
2- Check if the next four available numbers don't cause to increment more than 2 repeated numbers vs. previously generated lines:

  Previous lines:       01 02 03 04 05 06   Base numbers: 20 and 21. Available numbers at this moment: 11 15 17 18 . . .
  Two bases + new nums: 20 21 11 15 17 18   <- The possible new line.
                                            The 11 and 15 are *always* inserted. The 17 and 18 depends on next tests:
                               A  A  A      If 11 appears AND 15 appears AND 17 appears (accumulative searchs): the new line have 3 repeated, OR
                               B  B     B   If 11 appears in 3th pos AND 15 appears in 4th pos AND 18 appears in 6th pos: the same, OR
                               C     C  C   If 11 appears in 3th pos AND 17 appears in 5th pos AND 18 appears in 6th pos: the same, OR
                                  D  D  D   If 15 appears in 4th pos AND 17 appears in 5th pos AND 18 appears in 6th pos: the same (3 repeated)
                                            If A test don't pass, the 17 is eliminated. If B, C or D don't pass, the 18 is eliminated.

3- Eliminate from "available numbers" the numbers in previously generated lines that have the new inserted number and *anyone* of base numbers.

:begin
echo off
setlocal EnableDelayedExpansion

echo Combinations with TWO repeated elements:
echo/

rem/> lines.txt
set "start=%time:~0,-3%"
set /A lines=0, base1=1, base2=1
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
for %%a in (findstr.exe) do set "findstr=%%~$PATH:a"
if not defined findstr set "findstr=C:\Windows\System32\findstr.exe"

:nextBase
   set /A base2+=1
   if %base2% equ %base1% set /A base2+=1
   if %base2% gtr 39 set /A base1+=1, base2=1
   if %base1% gtr 39 goto endLine

   set /P "=%base1% %base2%  !CR!" < NUL

   :nextLine
      set "line= %base1% %base2% "
      set "linesInBases=0"

      rem Initialize available numbers
      set "avail= "
      for /L %%n in (1,1,39) do set "avail=!avail!%%n "
      set "avail=!avail: %base1% = !"
      set "avail=!avail: %base2% = !"
      set "numAvail=37"

      rem Remove available numbers that appears in previous lines for *these same* two base numbers
      for /F "delims=" %%a in ('%findstr% /C:" %base1% " lines.txt  ^|  %findstr% /C:" %base2% "') do (
         for %%b in (%%a) do if "!avail!" neq "!avail: %%b = !" (
            set "avail=!avail: %%b = !" & set /A "numAvail-=1"
         )
      )
      if %numAvail% lss 4 goto endLine

      rem Insert 4 numbers taking they from available ones
      set "nums=0"
      :nextNum
         for /F "tokens=1*" %%x in ("%avail%") do set "num=%%x"  &  set "avail= %%y"  &  set /A numAvail-=1

         rem Search the new numbers in previous lines looking for 3 repetitions (see counting scheme above)
         if %nums% equ 0 (
            %findstr% /C:" %num% " lines.txt > ABC1.txt
         ) else if %nums% equ 1 (
            %findstr% /C:" %num% " ABC1.txt > AB2.txt
            %findstr% /C:" %num% " lines.txt > D1.txt
         ) else if %nums% equ 2 (
            %findstr% /C:" %num% " AB2.txt > NUL
            if not errorlevel 1 goto endNum
            %findstr% /C:" %num% " ABC1.txt > C2.txt
            %findstr% /C:" %num% " D1.txt   > D2.txt
         ) else (
            %findstr% /C:" %num% " AB2.txt > NUL
            if not errorlevel 1 goto endNum
            %findstr% /C:" %num% " C2.txt > NUL
            if not errorlevel 1 goto endNum
            %findstr% /C:" %num% " D2.txt > NUL
            if not errorlevel 1 goto endNum
         )
         set "line=!line!%num% "  &  set /A nums+=1

         rem Remove available numbers that appears in previous lines for this new number *and anyone* of two base numbers
         for /F "delims=" %%a in ('%findstr% /C:" %num% " lines.txt  ^|  %findstr% /C:" %base1% " /C:" %base2% "') do (
            for %%b in (%%a) do if "!avail!" neq "!avail: %%b = !" (
               set "avail=!avail: %%b = !" & set /A "numAvail-=1"
            )
         )

      :endNum
      if %nums% lss 4 if %numAvail% gtr 0 goto nextNum

      if %nums% equ 4 (
         set /A lines+=1, linesInBases+=1
         echo !lines!- %line%
         >> lines.txt echo !lines!- %line%
      )

   if %linesInBases% gtr 0 goto nextLine

   :endLine

if %base1% leq 39 goto nextBase

echo/     
echo %lines% lines created
echo Start: %start%
echo End:   %time:~0,-3%
This program takes about 15 minutes to generate 216 lines in my old-and-slow computer:

Code: Select all

1-  1 2 3 4 5 6 
2-  1 2 7 8 9 10 
3-  1 2 11 12 13 14 
4-  1 2 15 16 17 18 
5-  1 2 19 20 21 22 
6-  1 2 23 24 25 26 
7-  1 2 27 28 29 30 
8-  1 2 31 32 33 34 
9-  1 2 35 36 37 38 
10-  1 3 7 11 15 19 
11-  1 3 8 12 16 20 
12-  1 3 9 13 17 21 
13-  1 3 10 14 18 22 
14-  1 3 23 27 31 35 
15-  1 3 24 28 32 36 
16-  1 3 25 29 33 37 
17-  1 3 26 30 34 38 
18-  1 4 7 12 17 22 
19-  1 4 8 11 18 21 
20-  1 4 9 14 15 20 
21-  1 4 10 13 16 19 
22-  1 4 23 28 33 38 
23-  1 4 24 27 34 37 
24-  1 4 25 30 31 36 
25-  1 4 26 29 32 35 
26-  1 5 7 13 18 20 
27-  1 5 8 14 17 19 
28-  1 5 9 11 16 22 
29-  1 5 10 12 15 21 
30-  1 5 23 29 34 36 
31-  1 5 24 30 33 35 
32-  1 5 25 27 32 38 
33-  1 5 26 28 31 37 
34-  1 6 7 14 16 21 
35-  1 6 8 13 15 22 
36-  1 6 9 12 18 19 
37-  1 6 10 11 17 20 
38-  1 6 23 30 32 37 
39-  1 6 24 29 31 38 
40-  1 6 25 28 34 35 
41-  1 6 26 27 33 36 
42-  2 3 7 12 18 21 
43-  2 3 8 11 17 22 
44-  2 3 9 14 16 19 
45-  2 3 10 13 15 20 
46-  2 3 23 28 34 37 
47-  2 3 24 27 33 38 
48-  2 3 25 30 32 35 
49-  2 3 26 29 31 36 
50-  2 4 7 11 16 20 
51-  2 4 8 12 15 19 
52-  2 4 9 13 18 22 
53-  2 4 10 14 17 21 
54-  2 4 23 27 32 36 
55-  2 4 24 28 31 35 
56-  2 4 25 29 34 38 
57-  2 4 26 30 33 37 
58-  2 5 7 14 15 22 
59-  2 5 8 13 16 21 
60-  2 5 9 12 17 20 
61-  2 5 10 11 18 19 
62-  2 5 23 30 31 38 
63-  2 5 24 29 32 37 
64-  2 5 25 28 33 36 
65-  2 5 26 27 34 35 
66-  2 6 7 13 17 19 
67-  2 6 8 14 18 20 
68-  2 6 9 11 15 21 
69-  2 6 10 12 16 22 
70-  2 6 23 29 33 35 
71-  2 6 24 30 34 36 
72-  2 6 25 27 31 37 
73-  2 6 26 28 32 38 
74-  3 4 7 8 13 14 
75-  3 4 9 10 11 12 
76-  3 4 15 16 21 22 
77-  3 4 17 18 19 20 
78-  3 4 23 24 29 30 
79-  3 4 25 26 27 28 
80-  3 4 31 32 37 38 
81-  3 4 33 34 35 36 
82-  3 5 7 9 23 25 
83-  3 5 8 10 24 26 
84-  3 5 11 13 27 29 
85-  3 5 12 14 28 30 
86-  3 5 15 17 31 33 
87-  3 5 16 18 32 34 
88-  3 5 19 21 35 37 
89-  3 5 20 22 36 38 
90-  3 6 7 10 27 30 
91-  3 6 8 9 28 29 
92-  3 6 11 14 23 26 
93-  3 6 12 13 24 25 
94-  3 6 15 18 35 38 
95-  3 6 16 17 36 37 
96-  3 6 19 22 31 34 
97-  3 6 20 21 32 33 
98-  3 7 16 24 31 39 
99-  3 8 15 23 32 39 
100-  4 5 7 10 28 29 
101-  4 5 8 9 27 30 
102-  4 5 11 14 24 25 
103-  4 5 12 13 23 26 
104-  4 5 15 18 36 37 
105-  4 5 16 17 35 38 
106-  4 5 19 22 32 33 
107-  4 5 20 21 31 34 
108-  4 6 7 9 24 26 
109-  4 6 8 10 23 25 
110-  4 6 11 13 28 30 
111-  4 6 12 14 27 29 
112-  4 6 15 17 32 34 
113-  4 6 16 18 31 33 
114-  4 6 19 21 36 38 
115-  4 6 20 22 35 37 
116-  4 9 16 23 34 39 
117-  4 10 15 24 33 39 
118-  4 32 7 18 25 39 
119-  5 6 7 8 11 12 
120-  5 6 9 10 13 14 
121-  5 6 15 16 19 20 
122-  5 6 17 18 21 22 
123-  5 6 23 24 27 28 
124-  5 6 25 26 29 30 
125-  5 6 31 32 35 36 
126-  5 6 33 34 37 38 
127-  6 21 8 24 35 39 
128-  6 22 7 23 36 39 
129-  7 8 15 16 25 26 
130-  7 8 17 18 23 24 
131-  7 8 19 20 27 28 
132-  7 8 21 22 29 30 
133-  7 9 11 13 31 32 
134-  7 9 12 14 33 34 
135-  7 9 15 17 27 29 
136-  7 9 16 18 28 30 
137-  7 10 11 14 35 36 
138-  7 10 12 13 37 38 
139-  7 10 15 18 31 34 
140-  7 10 16 17 32 33 
141-  7 10 19 21 23 26 
142-  7 10 20 22 24 25 
143-  7 11 17 21 25 28 
144-  7 11 18 22 26 27 
145-  7 12 15 20 23 30 
146-  7 12 16 19 29 35 
147-  7 13 15 21 24 36 
148-  7 14 17 20 26 31 
149-  8 7 31 33 35 37 
150-  8 7 32 34 36 38 
151-  8 9 11 14 37 38 
152-  8 9 12 13 35 36 
153-  8 10 11 13 33 34 
154-  8 10 12 14 31 32 
155-  8 10 15 17 28 30 
156-  8 10 16 18 27 29 
157-  8 10 19 22 35 38 
158-  8 10 20 21 36 37 
159-  8 11 15 20 24 29 
160-  8 11 16 19 23 30 
161-  8 12 17 21 26 27 
162-  8 12 18 22 25 28 
163-  8 13 17 20 25 32 
164-  8 13 18 19 26 31 
165-  8 17 4 29 31 39 
166-  9 10 15 16 35 37 
167-  9 10 17 18 25 26 
168-  9 10 19 20 29 30 
169-  9 10 21 22 27 28 
170-  9 10 23 24 31 36 
171-  9 11 17 19 24 33 
172-  9 11 18 20 23 35 
173-  9 11 25 27 34 36 
174-  9 12 15 22 24 32 
175-  9 12 16 21 25 31 
176-  9 13 15 19 23 28 
177-  9 13 16 20 24 27 
178-  9 14 17 22 23 30 
179-  9 14 18 21 24 29 
180-  9 18 3 27 37 39 
181-  10 12 17 19 34 36 
182-  10 14 15 19 25 27 
183-  10 14 16 20 23 28 
184-  10 21 3 25 34 39 
185-  11 12 15 16 27 28 
186-  11 12 17 18 29 30 
187-  11 12 19 20 25 26 
188-  11 12 21 22 23 33 
189-  11 13 15 17 26 35 
190-  11 13 16 18 25 37 
191-  11 14 15 18 32 33 
192-  12 6 15 26 31 39 
193-  13 14 15 16 29 30 
194-  13 14 17 18 27 28 
195-  13 14 19 20 33 35 
196-  13 14 21 22 25 26 
197-  13 14 23 24 32 34 
198-  13 14 31 36 37 39 
199-  14 5 16 26 33 39 
200-  15 20 17 21 38 39 
201-  16 18 12 14 36 38 
202-  16 20 17 22 29 34 
203-  16 21 10 11 24 38 
204-  17 5 7 30 34 39 
205-  17 20 23 27 33 37 
206-  18 21 10 13 30 32 
207-  18 22 15 19 29 39 
208-  20 22 8 9 26 33 
209-  24 26 11 28 34 39 
210-  24 26 12 14 35 37 
211-  27 29 19 21 31 32 
212-  27 29 20 25 35 39 
213-  27 30 11 14 20 21 
214-  27 30 12 13 19 22 
215-  33 38 7 11 29 39 
216-  37 38 15 19 24 30 

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Combinations of N numbers in sets of K with R repeated elements

#3 Post by Aacini » 26 May 2023 00:12

3- Change two repetitions method to variables

I liked the relatively short time (15 minutes) that generating combinations with TWO repetitions takes. However, I immediately tried to optimize such a method. I changed the auxiliary files used in "accumulated searchs" to environment variables hoping that the multiple accesses to memory was faster than to disk. I also wanted to avoid the use of findstr command to perform succesive searchs, so I defined a series of variables that (I thought) do an equivalent task. The development of such a method was difficult because the multiple cases where the numbers could be repeated required the definition of several specific accumulative variables. For example, if I take the two base numbers as the repeated numbers, I need to know if the other 4 numbers cause to have 3 repeated ones with other line, but this could happen in 4 different cases: numbers 1,2,3 are repeated, or numbers 1,2,4 are repeated, or numbers 1,3,4 are repeated, or numbers 2,3,4 are repeated. The code below have a more extensive description of this problem.

After completed the change and run the program, it generated more than 216 lines. A simple visual test show that there were lines with more than 2 repeated elements. I need a precise method to know which lines fail in order to identify the problem, so I wrote a program that check that all lines in the result be correct:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem Check that the set of generated lines have no more than 2 repeated elements between *all* lines
rem The format of input file include a line number: line- #1 #2 #3 #4 #5 #6

if "%~1" equ "" echo Usage: %0 inputFile.txt & goto :EOF
if not exist "%~1" echo File not found: "%~1" & goto :EOF

for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
for /F "skip=4 delims=pR tokens=2" %%a in (
       'reg query hkcu\environment /v temp' ) do set "TAB=%%a"
for /F "tokens=2 delims=0" %%a in ('shutdown /? ^| findstr /BC:E') do if not defined TAB set "TAB=%%a"
set "TAB2=!TAB!"
set /A "rep=2, count=0"
< NUL (
for /F "usebackq tokens=1* delims=,- " %%a in ("%~1") do (
   set /P "=Line %%a!CR!"
   set "header="
   for /F "usebackq tokens=1-7 delims=,- " %%h in ("%~1") do (
      if %%h neq %%a (
         set "reps=0"
         for %%c in (%%b) do set /A "reps+=^!(%%c-%%i)+^!(%%c-%%j)+^!(%%c-%%k)+^!(%%c-%%l)+^!(%%c-%%m)+^!(%%c-%%n)"
         if !reps! gtr %rep% (
            if not defined header (
               set /A count+=1
               if %%a geq 100 set "TAB2="
               echo ---------------------------------
               echo !count!- #%%a!TAB!!TAB2!%%b
               set "header=1"
            )
            echo       #%%h (!reps!^)!TAB!%%i %%j %%k %%l %%m %%n
         )
      )
   )
))

echo/
echo/
echo %count% lines with more than %rep% repeated elements
This program allows me to identify the cause of the problem: when the lines were stored in the File version, the stored data are whole lines that are reviewed by findstr command. In the Variables version I can't store whole lines because I would need to process all numbers one-by-one again. Instead, I defined specific variables that allows a very quick test via IF DEFINED command. For example, to check if 4 variables (p0, p1, p2, p3) have 3 repeated numbers, I defined these 4 variables: p012, p013, p023, p123. However, these variables don't encomprises other values besides these ones so the method fail and allows more than 2 repeated elements in lines.

I completed a series of unpleasant tests and succesive modifications until I got a program that works correctly. The result was not pretty at all and was very far from what I thougth at first:

Code: Select all

@goto begin

https://stackoverflow.com/questions/76219423/batch-script-need-help-generating-lines-of-six-different-numbers-between-1-and

The method is based in TWO BASE NUMBERS that are the numbers that will be *repeated in the currently generated lines*,
plus FOUR "new numbers" that will be inserted, taken from a list of "available numbers" that exclude the two base numbers.
The purpose is that all available numbers does NOT increase beyond 2 the number of repeated numbers with previous lines.

1- Eliminate from "available numbers" the numbers in previously generated lines that already have the two base numbers.

2- Check if the last two next available numbers don't cause to increment more than 2 numbers vs. previously generated lines:

  Previous lines:       01 02 03 04 05 06   Base numbers: 20 and 21. Available numbers at this moment: 11 15 17 18 . . .
  Two bases + new nums: 20 21 11 15 17 18   <- The possible new line.
                                            The 11 and 15 are *always* inserted. The 17 and 18 depends on next tests:
                              p0 p1 p2 p3   <- Positions of *new numbers* (they are position 3, 4, 5 and 6 in new line)
                               X  X  X      If exists a line with p0 == 11 AND p1 == 15 AND p2 == 17: new line have 3 repeated, reject the 17
                               X  X     X   If exists a line with p0 == 11 AND p1 == 15 AND p3 == 18: new line have 3 repeated, reject the 18
                               X     X  X   If exists a line with p0 == 11 AND p2 == 17 AND p3 == 18: new line have 3 repeated, reject the 18
                                  X  X  X   If exists a line with p1 == 15 AND p2 == 17 AND p3 == 18: new line have 3 repeated, reject the 18

3- *ADDENDUM* The change from File to Vars method introduced several problems. In File method *complete lines* are saved to file, so findstr
              always review *all numbers* in previous lines. The method described in point 2- above exclude a third or fourth new number
              when they accumulate more than 2 repeated elements vs previous lines. This method **ASSUMES** that the two base numbers of the
              new line are NOT included in the previous lines. Unfortunately, this is not true. In order for this method to work it is necessary
              to insert several additional variables (as strings stored in arrays with name "p[") that are described in the code below...

4- Eliminate from "available numbers" the numbers in previously generated lines that have the new inserted number and *anyone* of base numbers.

:begin
@echo off
setlocal EnableDelayedExpansion

echo Combinations with TWO repeated elements:
echo/

rem/> lines.txt
set "start=%time:~0,-3%"
set /A lines=0, base1=1, base2=1
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"

:nextBase
   set /A base2+=1
   if %base2% gtr 39 set /A base1+=1, base2=1
   if %base1% gtr 34 goto endLine

   set /P "=%base1% %base2% !CR!" < NUL

   :nextLine
      set "line=%base1% %base2%"
      set "linesInBases=0"

      rem Initialize available numbers
      set "avail= "
      for /L %%n in (1,1,39) do set "avail=!avail!%%n "
      set "avail=!avail: %base1% = !"
      set "avail=!avail: %base2% = !"
      set "numAvail=37"

      rem Remove available numbers that appears in previous lines for *these same* two base numbers
      for /F "delims=x=" %%a in ('2^>NUL set x  ^|  findstr /C:" %base1% "  ^|  findstr /C:" %base2% "') do (
         for %%b in (%%a) do if "!avail!" neq "!avail: %%b = !" (
            set "avail=!avail: %%b = !" & set /A "numAvail-=1"
         )
      )
      if %numAvail% lss 4 goto endLine

      rem Insert 4 numbers taking they from available ones
      set "nums=0"
      :nextNum
         for /F "tokens=1*" %%x in ("%avail%") do set "num=%%x"  &  set "avail= %%y"  &  set /A numAvail-=1

         rem Count in how many previous lines the last four new numbers appears; limit it to 2 (see count scheme above)
         if %nums% equ 0 (
            set "p0=%num%"
         ) else if %nums% equ 1 (
            set "p1=%num%"
            set "p01=%p0%_%num%"
         ) else if %nums% equ 2 (
            set "p2=%num%"
            if defined r%p01%_%num% goto endNum
            set "p012=%p01%_%num%"

            if defined p[%p01%] if "!p[%p01%]: %num% =!" neq "!p[%p01%]!" goto endNum
            set "p02=%p0%_%num%"
            set "p12=%p1%_%num%"
         ) else ( rem %nums% equ 3
            if defined r%p01%_%num% goto endNum
            set "p013=%p01%_%num%"
            if defined r%p02%_%num% goto endNum
            set "p023=%p02%_%num%"
            if defined r%p12%_%num% goto endNum
            set "p123=%p12%_%num%"

            if defined p[%p01%] if "!p[%p01%]: %num% =!" neq "!p[%p01%]!" goto endNum
            if defined p[%p02%] (
               if "!p[%p02%]: %p1% =!" neq "!p[%p02%]!" goto endNum
               if "!p[%p02%]: %num% =!" neq "!p[%p02%]!" goto endNum
            )
            if defined p[%p0%_%num%] (  rem p03
               if "!p[%p0%_%num%]: %p1% =!" neq "!p[%p0%_%num%]!" goto endNum
               if "!p[%p0%_%num%]: %p2% =!" neq "!p[%p0%_%num%]!" goto endNum
            )
            if defined p[%p12%] (
               if "!p[%p12%]: %p0% =!" neq "!p[%p12%]!" goto endNum
               if "!p[%p12%]: %num% =!" neq "!p[%p12%]!" goto endNum
            )
            if defined p[%p1%_%num%] (  rem p13
               if "!p[%p1%_%num%]: %p0% =!" neq "!p[%p1%_%num%]!" goto endNum
               if "!p[%p1%_%num%]: %p2% =!" neq "!p[%p1%_%num%]!" goto endNum
            )
            if defined p[%p2%_%num%] (  rem p23
               if "!p[%p2%_%num%]: %p0% =!" neq "!p[%p2%_%num%]!" goto endNum
               if "!p[%p2%_%num%]: %p1% =!" neq "!p[%p2%_%num%]!" goto endNum
            )
         )
         set "r%p012%=1"
         set "r%p013%=1"
         set "r%p023%=1"
         set "r%p123%=1"

         set "line=%line% %num%"  &  set /A nums+=1

         rem Remove available numbers that appears in previous lines for this new number *and anyone* of two base numbers
         for /F "delims=x=" %%a in ('2^>NUL set x  ^|  findstr /C:" %num% "  ^|  findstr /C:" %base1% " /C:" %base2% "') do (
            for %%b in (%%a) do if "!avail!" neq "!avail: %%b = !" (
               set "avail=!avail: %%b = !" & set /A "numAvail-=1"
            )
         )

      :endNum

      if %nums% lss 4 if %numAvail% gtr 0 goto nextNum
                                                                    rem In the code below, giving "line=8 17 29 31 34 39" we need to define:
                                                                    rem          set "p[8_17]= 29 31 34 39"
                                                                    rem    set "p[8_29]= 31 34 39 "   set "p[17_29]= 31 34 39 "
                                                                    rem    set "p[8_31]= 34 39 29 "   set "p[17_31]= 34 39 29 "
                                                                    rem    set "p[8_34]= 39 29 31 "   set "p[17_34]= 39 29 31 "
                                                                    rem    set "p[8_39]= 29 31 34 "   set "p[17_39]= 29 31 34 "
                                                                    rem Posterior lines must *add* their numbers, without duplicates
      if %nums% equ 4 (
         set /A lines+=1, linesInBases+=1
         echo !lines!- %line%
         >> lines.txt echo !lines!- %line%

         set "x %line% =1"
         for /F "tokens=1,2*" %%i in ("%line%") do (                rem "line=8 17 29 31 34 39":  i=8,  j=17,  "k=29 31 34 39"
            if not defined p[%%i_%%j] (                             rem p[8_17]
               set "p[%%i_%%j]= %%k "                             & rem "p[8_17] = 29 31 34 39 "
            ) else (                                                rem "p[8_17] = X Y Z "   have previous values
               for %%z in (%%k) do (                                rem "k=29 31 34 39": z=29,31,34,39
                  set "p[%%i_%%j]=!p[%%i_%%j]: %%z = !%%z "       & rem "p[8_17]= X Y Z 29 31 34 39 "   no duplicates
               )
            )
            set "nums=%%k"                                        & rem "nums=29 31 34 39"
            for %%b in (%%i %%j) do for /L %%# in (1,1,4) do (      rem b=8, b=17: repeat the next 4 times:
               for /F "tokens=1*" %%x in ("!nums!") do (            rem "nums=29 31 34 39":  x=29,  "y=31 34 39"
                  if not defined p[%%b_%%x] (                       rem p[8_29], p[8_31],... p[8_39], p[17_29],... p[17_39]
                     set "p[%%b_%%x]= %%y "                       & rem "p[8_29]= 31 34 39 ", "p[8_31]= 34 39 29", ...
                  ) else (                                          rem "p[8_29]= X Y Z "   have previous values
                     for %%z in (%%y) do (                          rem "y=31 34 39": z=31,34,39
                        set "p[%%b_%%x]=!p[%%b_%%x]: %%z = !%%z " & rem "p[8_29]= X Y Z 31 34 39 "   no duplicates
                     )
                  )
                  set "nums=%%y %%x"                              & rem "nums=31 34 39 29"   for next cycle
               )
            )
         )

      )

   if %linesInBases% gtr 0 goto nextLine

   :endLine

if %base1% leq 34 goto nextBase

echo/     
echo %lines% lines created

echo Start: %start%
echo End:   %time:~0,-3%
The program run in 13 minutes, slightly faster than the File version (the first Vars version, the one that failed, run in 6:30 minutes: less than half the time of File version! This was one of the reasons why I was determined to complete a working version of Vars). However, it generated a different number of lines than File version. Moreover, during my multiple tests one of my modifications generated a result with 244 lines. Unfortunately I deleted such a program, but nevertheless I kept the result. Here it is:

Code: Select all

1- 1,2,3,4,5,6
2- 1,2,7,8,9,10
3- 1,2,11,12,13,14
4- 1,2,15,16,17,18
5- 1,2,19,20,21,22
6- 1,2,23,24,25,26
7- 1,2,27,28,29,30
8- 1,2,31,32,33,34
9- 1,2,35,36,37,38
10- 1,3,7,11,15,19
11- 1,3,8,12,16,20
12- 1,3,9,13,17,21
13- 1,3,10,14,18,22
14- 1,3,23,27,31,35
15- 1,3,24,28,32,36
16- 1,3,25,29,33,37
17- 1,3,26,30,34,38
18- 1,4,7,12,17,22
19- 1,4,8,11,18,21
20- 1,4,9,14,15,20
21- 1,4,10,13,16,19
22- 1,4,23,28,33,38
23- 1,4,24,27,34,37
24- 1,4,25,30,31,36
25- 1,4,26,29,32,35
26- 1,5,7,13,18,20
27- 1,5,8,14,17,19
28- 1,5,9,11,16,22
29- 1,5,10,12,15,21
30- 1,5,23,29,34,36
31- 1,5,24,30,33,35
32- 1,5,25,27,32,38
33- 1,5,26,28,31,37
34- 1,6,7,14,16,21
35- 1,6,8,13,15,22
36- 1,6,9,12,18,19
37- 1,6,10,11,17,20
38- 1,6,23,30,32,37
39- 1,6,24,29,31,38
40- 1,6,25,28,34,35
41- 1,6,26,27,33,36
42- 2,3,7,12,18,21
43- 2,3,8,11,17,22
44- 2,3,9,14,16,19
45- 2,3,10,13,15,20
46- 2,3,23,28,34,37
47- 2,3,24,27,33,38
48- 2,3,25,30,32,35
49- 2,3,26,29,31,36
50- 2,4,7,11,16,20
51- 2,4,8,12,15,19
52- 2,4,9,13,18,22
53- 2,4,10,14,17,21
54- 2,4,23,27,32,36
55- 2,4,24,28,31,35
56- 2,4,25,29,34,38
57- 2,4,26,30,33,37
58- 2,5,7,14,15,22
59- 2,5,8,13,16,21
60- 2,5,9,12,17,20
61- 2,5,10,11,18,19
62- 2,5,23,30,31,38
63- 2,5,24,29,32,37
64- 2,5,25,28,33,36
65- 2,5,26,27,34,35
66- 2,6,7,13,17,19
67- 2,6,8,14,18,20
68- 2,6,9,11,15,21
69- 2,6,10,12,16,22
70- 2,6,23,29,33,35
71- 2,6,24,30,34,36
72- 2,6,25,27,31,37
73- 2,6,26,28,32,38
74- 3,4,7,8,13,14
75- 3,4,9,10,11,12
76- 3,4,15,16,21,22
77- 3,4,17,18,19,20
78- 3,4,23,24,29,30
79- 3,4,25,26,27,28
80- 3,4,31,32,37,38
81- 3,4,33,34,35,36
82- 3,5,7,9,23,25
83- 3,5,8,10,24,26
84- 3,5,11,13,27,29
85- 3,5,12,14,28,30
86- 3,5,15,17,31,33
87- 3,5,16,18,32,34
88- 3,5,19,21,35,37
89- 3,5,20,22,36,38
90- 3,6,7,10,27,30
91- 3,6,8,9,28,29
92- 3,6,11,14,23,26
93- 3,6,12,13,24,25
94- 3,6,15,18,35,38
95- 3,6,16,17,36,37
96- 3,6,19,22,31,34
97- 3,6,20,21,32,33
98- 3,7,16,24,31,39
99- 3,8,15,23,32,39
100- 3,9,18,26,33,39
101- 3,10,17,25,34,39
102- 3,11,20,28,35,39
103- 3,12,19,27,36,39
104- 3,13,22,30,37,39
105- 3,14,21,29,38,39
106- 4,5,7,10,28,29
107- 4,5,8,9,27,30
108- 4,5,11,14,24,25
109- 4,5,12,13,23,26
110- 4,5,15,18,36,37
111- 4,5,16,17,35,38
112- 4,5,19,22,32,33
113- 4,5,20,21,31,34
114- 4,6,7,9,24,26
115- 4,6,8,10,23,25
116- 4,6,11,13,28,30
117- 4,6,12,14,27,29
118- 4,6,15,17,32,34
119- 4,6,16,18,31,33
120- 4,6,19,21,36,38
121- 4,6,20,22,35,37
122- 4,7,15,25,33,39
123- 4,8,16,26,34,39
124- 4,9,17,23,31,39
125- 4,10,18,24,32,39
126- 4,11,19,29,37,39
127- 4,12,20,30,38,39
128- 4,13,21,27,35,39
129- 4,14,22,28,36,39
130- 5,6,7,8,11,12
131- 5,6,9,10,13,14
132- 5,6,15,16,19,20
133- 5,6,17,18,21,22
134- 5,6,23,24,27,28
135- 5,6,25,26,29,30
136- 5,6,31,32,35,36
137- 5,6,33,34,37,38
138- 5,7,17,26,32,39
139- 5,8,18,25,31,39
140- 5,9,15,24,34,39
141- 5,10,16,23,33,39
142- 5,11,21,30,36,39
143- 5,12,22,29,35,39
144- 5,13,19,28,38,39
145- 5,14,20,27,37,39
146- 6,7,18,23,34,39
147- 6,8,17,24,33,39
148- 6,9,16,25,32,39
149- 6,10,15,26,31,39
150- 6,11,22,27,38,39
151- 6,12,21,28,37,39
152- 6,13,20,29,36,39
153- 6,14,19,30,35,39
154- 7,8,15,16,27,28
155- 7,8,17,18,29,30
156- 7,8,19,20,23,24
157- 7,8,21,22,25,26
158- 7,8,31,33,35,37
159- 7,8,32,34,36,38
160- 7,9,11,13,31,32
161- 7,9,12,14,33,34
162- 7,9,15,17,35,36
163- 7,9,16,18,37,38
164- 7,9,19,21,27,29
165- 7,9,20,22,28,30
166- 7,10,11,14,35,38
167- 7,10,12,13,36,37
168- 7,11,17,21,23,28
169- 7,11,18,22,24,33
170- 7,11,25,30,34,37
171- 7,12,15,20,26,29
172- 7,12,16,19,25,35
173- 7,13,15,21,24,30
174- 7,13,16,22,23,29
175- 7,14,17,20,25,31
176- 7,14,18,19,26,28
177- 8,9,11,14,36,37
178- 8,9,12,13,35,38
179- 8,10,11,13,33,34
180- 8,10,12,14,31,32
181- 8,10,15,17,37,38
182- 8,10,16,18,35,36
183- 8,10,19,21,28,30
184- 8,10,20,22,27,29
185- 8,11,15,20,30,31
186- 8,11,19,26,27,32
187- 8,12,17,21,27,34
188- 8,12,18,22,23,28
189- 8,13,17,20,26,28
190- 8,13,24,27,31,36
191- 8,14,15,24,29,35
192- 8,14,16,22,30,33
193- 9,10,15,16,29,30
194- 9,10,17,18,27,28
195- 9,10,19,20,25,26
196- 9,10,21,22,23,24
197- 9,10,31,33,36,38
198- 9,10,32,34,35,37
199- 9,11,17,19,24,30
200- 9,11,18,20,23,29
201- 9,11,25,27,33,35
202- 9,12,15,22,25,31
203- 9,12,16,21,26,36
204- 9,13,15,19,23,33
205- 9,13,16,20,27,34
206- 9,14,17,22,26,29
207- 9,14,18,21,25,30
208- 10,11,15,22,28,32
209- 10,11,16,21,25,31
210- 10,12,17,19,23,29
211- 10,12,20,24,28,33
212- 10,13,17,22,31,35
213- 10,13,18,21,26,29
214- 10,14,15,19,27,34
215- 10,14,20,23,30,36
216- 11,12,15,16,23,24
217- 11,12,17,18,25,26
218- 11,12,28,29,31,34
219- 11,13,15,25,36,38
220- 11,13,16,26,35,37
221- 11,16,17,29,32,33
222- 11,19,22,23,35,36
223- 11,20,21,24,26,38
224- 12,13,15,18,27,32
225- 12,14,17,24,36,38
226- 12,18,24,30,31,37
227- 12,19,22,26,37,38
228- 12,20,23,25,32,34
229- 13,14,17,18,23,37
230- 13,14,19,25,29,32
231- 13,14,21,28,31,33
232- 13,18,19,24,34,35
233- 14,15,21,26,32,37
234- 14,16,23,28,32,35
235- 14,16,26,27,31,38
236- 15,17,22,23,27,30
237- 15,18,28,30,33,34
238- 15,19,24,25,28,37
239- 16,17,22,24,28,34
240- 16,18,24,25,27,29
241- 17,20,24,27,32,35
242- 18,19,21,23,31,32
243- 18,22,26,30,32,36
244- 20,22,23,26,31,33
These results confuses me. Supposedly, a well defined method should generate precise results. However, this problem is extremely complex because the amount of involved data, so small differences in the code could produce very different results (like Chaos theory! :D ). How could I be sure that a program generates all the possible valid lines?

On the other hand, when I was changing the File method to Vars, the OP was still using a rudimentary program based on random numbers which he called "a working solution" ("Magoo, sincerely, thank you. The first script is really perfect. Speed wasn't ever necessary in this, time there is plenty. The second was a truly inspired idea, but really the first is perfect"). It occurred to me that I could show the OP a very clear example of the infeasibility of using random numbers to solve this problem.

Finally, I was looking for a method that allows me to change the last number N, the number K of elements, and the number R of repeated elements to any other values in a relatively simple way, because the OP was requesting it. My solutions so far required an extensive code rewrite in order to change the K or R values. I need a simpler-to-write method that could be assembled in "sections" so it can be automatically generated via a program.

For all these reasons I thought of developing a "brute force" method...

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Combinations of N numbers in sets of K with R repeated elements

#4 Post by Aacini » 26 May 2023 00:12

4- "Brute force" method

I wrote a first version of a "brute force" method program that, giving the max number N, generates all the combinations with K numbers, count the number of elements in common with each previous accepted line, and accept new lines when this number is less or equal R. The total number of combinations is given by the formula N!/(K!*(N-K)!) that gives 3262623 combinations for N=39 and K=6. The method used to generate the combinations consists on 6 (=K) number of nested FOR /L commands, each one varying its index up to 39 (=N) value. Generated lines are stored in variables in order to not use (slower) files nor findstr command, but fast SET command. The number of common elements is counted via a FOR over elements of previously generated lines that review in one step one element versus the entire other line (six iterations). The review of the rest of the lines is "cancelled" (ignored via an IF command) when a line with more than Rep repeated elements is found. Plain and simple, in order to make it as fast as possible:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem "Brute force" method to get all combinations of N numbers in sets of 6 elements
rem with a maximum of Rep repetitions

rem Set last number and max number of repetitions
set /A n=39, rep=2
rem In this version the number of elements in combinations (K) is fixed (hardcoded) to 6!

rem/> lines.txt
set /A nM1=n-1, nM2=n-2, nM3=n-3, nM4=n-4, nM5=n-5,  lines=0
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
set "start=%time:~0,-3%"

< NUL 2> NUL (
for /L %%i in (1,1,%nM5%) do (
   echo !time:~0,-3!          
   set /A iP1=%%i+1
   for /L %%j in (!iP1!,1,%nM4%) do (
      set /A jP1=%%j+1
      for /L %%k in (!jP1!,1,%nM3%) do (
         set /A kP1=%%k+1
         for /L %%l in (!kP1!,1,%nM2%) do (
            set /A lP1=%%l+1
            for /L %%m in (!lP1!,1,%nM1%) do (
               set /A mP1=%%m+1
               for /L %%n in (!mP1!,1,%n%) do (
                  set /P "=%%i %%j %%k %%l %%m %%n      !CR!"
                  set "reps=0"
                  for /F "tokens=2 delims=[]" %%a in ('set c[') do (
                     if !reps! leq %rep% (
                        for %%b in (%%a) do set /A "reps+=^!(%%b-%%i)+^!(%%b-%%j)+^!(%%b-%%k)+^!(%%b-%%l)+^!(%%b-%%m)+^!(%%b-%%n)"
                        if !reps! leq %rep% set "reps=0"
                     )
                  )
                  if !reps! leq %rep% (
                     set /A lines+=1
                     echo !lines!- %%i %%j %%k %%l %%m %%n      
                     >> lines.txt echo !lines!- %%i %%j %%k %%l %%m %%n
                     set "c[%%i,%%j,%%k,%%l,%%m,%%n]=1"
                  )
               )
            )
         )
      )
   )
))

echo Start: %start%
echo End:   %time:~0,-3%
I run this program at the same time I continue using the computer for other tasks, and stop it after almost 16 hours when 121 lines were generated (and the 3 first parts from 39 are processed, see results below). Given the huge number of combinations (3262623) I wanted to know exactly how many combinations are generated in each part with the same first number. I wrote an auxiliary program and create the table below (that include the time taken by the program in the first 3 parts):

Code: Select all

Part 1:  from 1,2,3,4,5,6       to 1,35,36,37,38,39  (501942 combinations) took 5:30 hours (5.5 decimal)
Part 2:  from 2,3,4,5,6,7       to 2,35,36,37,38,39  (435897 combinations) took 4:56 hours (4.93 decimal)
Part 3:  from 3,4,5,6,7,8       to 3,35,36,37,38,39  (376992 combinations) took 5:15 hours (with busy computer)
Part 4:  from 4,5,6,7,8,9       to 4,35,36,37,38,39  (324632 combinations)
Part 5:  from 5,6,7,8,9,10      to 5,35,36,37,38,39  (278256 combinations)
Part 6:  from 6,7,8,9,10,11     to 6,35,36,37,38,39  (237336 combinations)
Part 7:  from 7,8,9,10,11,12    to 7,35,36,37,38,39  (201376 combinations)
Part 8:  from 8,9,10,11,12,13   to 8,35,36,37,38,39  (169911 combinations)
Part 9:  from 9,10,11,12,13,14  to 9,35,36,37,38,39  (142506 combinations)
Part 10: from 10,11,12,13,14,15 to 10,35,36,37,38,39 (118755 combinations)
Part 11: from 11,12,13,14,15,16 to 11,35,36,37,38,39 ( 98280 combinations)
Part 12: from 12,13,14,15,16,17 to 12,35,36,37,38,39 ( 80730 combinations)
Part 13: from 13,14,15,16,17,18 to 13,35,36,37,38,39 ( 65780 combinations)
Part 14: from 14,15,16,17,18,19 to 14,35,36,37,38,39 ( 53130 combinations)
Part 15: from 15,16,17,18,19,20 to 15,35,36,37,38,39 ( 42504 combinations)
Part 16: from 16,17,18,19,20,21 to 16,35,36,37,38,39 ( 33649 combinations)
Part 17: from 17,18,19,20,21,22 to 17,35,36,37,38,39 ( 26334 combinations)
Part 18: from 18,19,20,21,22,23 to 18,35,36,37,38,39 ( 20349 combinations)
Part 19: from 19,20,21,22,23,24 to 19,35,36,37,38,39 ( 15504 combinations)
Part 20: from 20,21,22,23,24,25 to 20,35,36,37,38,39 ( 11628 combinations)
Part 21: from 21,22,23,24,25,26 to 21,35,36,37,38,39 (  8568 combinations)
Part 22: from 22,23,24,25,26,27 to 22,35,36,37,38,39 (  6188 combinations)
Part 23: from 23,24,25,26,27,28 to 23,35,36,37,38,39 (  4368 combinations)
Part 24: from 24,25,26,27,28,29 to 24,35,36,37,38,39 (  3003 combinations)
Part 25: from 25,26,27,28,29,30 to 25,35,36,37,38,39 (  2002 combinations)
Part 26: from 26,27,28,29,30,31 to 26,35,36,37,38,39 (  1287 combinations)
Part 27: from 27,28,29,30,31,32 to 27,35,36,37,38,39 (   792 combinations)
Part 28: from 28,29,30,31,32,33 to 28,35,36,37,38,39 (   462 combinations)
Part 29: from 29,30,31,32,33,34 to 29,35,36,37,38,39 (   252 combinations)
Part 30: from 30,31,32,33,34,35 to 30,35,36,37,38,39 (   126 combinations)
Part 31: from 31,32,33,34,35,36 to 31,35,36,37,38,39 (    56 combinations)
Part 32: from 32,33,34,35,36,37 to 32,35,36,37,38,39 (    21 combinations)
Part 33: from 33,34,35,36,37,38 to 33,35,36,37,38,39 (     6 combinations)
Part 34: from 34,35,36,37,38,39 to the same          (      1 combination)
Just the second set was generated when the computer was not used in any other task, but I think anyway that the first 3 times are representative of the process. These results makes me guess that the whole process could take about 60 hours, perhaps less...

What's the next step? Optimize the "brute force" method, obviously! 8)

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Combinations of N numbers in sets of K with R repeated elements

#5 Post by Aacini » 26 May 2023 12:19

5- "Brute force" optimization

I analyzed the "brute force" method and introduced optimizations in two main points:

Cancel loops at end. Lets pay attention to %%i, %%j and %%k loops. When a valid line is found, no other valid line with the same %%k value will be found (because %%i, %%j and %%k would be 3 repeated values). In this way, as soon as a new line is generated, the current iteration of %%k (and all nested loops inside it) is cancelled in order to pass to the next value of %%k. This is achieved via the method described at The ultimate While loop. This trick allows to omit three nested loops that may represent a very large number of null iterations depending on the point, in the loops, where the valid line was found.

Omit repeated numbers at beginning. Normally, a FOR /L loop process all its values. However, after a valid line is found we know that its last four numbers can't appear in any posterior line at indices %%k, %%l, %%m and %%n (with the same indices %%i and %%j). In this way, we could assemble a list of "already used/not use" numbers for each (%%i,%%j) pair that will be omitted in each one of the last four nested FOR /L loops. Again, depending on the point in which these values are checked, this list of not-use numbers could eliminate a lot of null iterations.

I also eliminated the environment variables with generated lines and revert to use files and findstr command. It seems that the large number of variables slow down the process in a major grade than using disk files.

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem "Brute force" *optimized* method to get all combinations of N numbers in sets of 6 elements
rem with a maximum of Rep repetitions
rem 1- Cancel loops at end
rem 2- Omit repeated numbers at beginning

rem Dispatcher for nested FOR in another CMD.exe
if "%1" equ "ProcessIndicesLMN" goto ProcessIndicesLMN

rem Set last number and max number of repetitions
set /A n=39, rep=2
rem In this version the number of elements in combination is fixed to 6!

set /A nM1=n-1, nM2=n-2, nM3=n-3, nM4=n-4, nM5=n-5,  lines=0
rem/> lines.txt
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
set "start=%time:~0,-3%"

< NUL 2> NUL (
for /L %%i in (1,1,%nM5%) do (
   echo !time:~0,-3!          
   set /A iP1=%%i+1
   for /L %%j in (!iP1!,1,%nM4%) do (
      set "not=,0,"
      for /F "delims=" %%a in ('findstr ",%%i," lines.txt ^| findstr ",%%j,"') do (
         for %%b in (%%a) do set "not=!not:,%%b,=,!%%b,"
      )
      set /A jP1=%%j+1
      for /L %%k in (!jP1!,1,%nM3%) do if "!not:,%%k,=!" equ "!not!" (
         del aNewLine.txt
         cmd /Q /C "%~F0" ProcessIndicesLMN %%i %%j %%k
         if exist aNewLine.txt (
            set /P "line=" < aNewLine.txt
            set /A lines+=1
            echo !lines!- !line!
            >> lines.txt echo ,!line!,
            for %%b in (!line!) do set "not=!not:,%%b,=,!%%b,"
         )
      )
   )
))

echo Start: %start%
echo End:   %time:~0,-3%
goto :EOF


                         %1   %2   %3   %4
         :ProcessIndicesLMN  %%i  %%j  %%k

         set /A kP1=%4+1
         for /L %%l in (!kP1!,1,%nM2%) do if "!not:,%%l,=!" equ "!not!" (
            set /A lP1=%%l+1
            for /L %%m in (!lP1!,1,%nM1%) do if "!not:,%%m,=!" equ "!not!" (
               set /A mP1=%%m+1
               for /L %%n in (!mP1!,1,%n%) do if "!not:,%%n,=!" equ "!not!" (

                  set /P "=%2,%3,%4,%%l,%%m,%%n      !CR!"
                  set "reps=0"
                  for /F "delims=" %%a in (lines.txt) do if !reps! leq %rep% (
                     for %%b in (%%a) do set /A "reps+=^!(%%b-%2)+^!(%%b-%3)+^!(%%b-%4)+^!(%%b-%%l)+^!(%%b-%%m)+^!(%%b-%%n)"
                     if !reps! leq %rep% set "reps=0"
                  )
                  if !reps! leq %rep% (
                     > aNewLine.txt echo %2,%3,%4,%%l,%%m,%%n
                     exit
                  )

               )
            )
         )

         exit
I run this code overnight and it takes 4:45 HOURS!!! to generate 244 lines, that is, less than the time that the original program tooks to generate the first of the 39 parts! We can make a very rough estimate of the time it would take for the original program to generate and review all the combinations. The second part (the only one that was generated with the computer not busy in other tasks) took 4.93 hours, so it is possible that the whole 33262623 combinations be processed in 37 hours (although I think it would be more). Lets get 40 hours. This means that my optimizations eliminated 88% of the original run time! :D

This is the result:

Code: Select all

1- 1,2,3,4,5,6
2- 1,2,7,8,9,10
3- 1,2,11,12,13,14
4- 1,2,15,16,17,18
5- 1,2,19,20,21,22
6- 1,2,23,24,25,26
7- 1,2,27,28,29,30
8- 1,2,31,32,33,34
9- 1,2,35,36,37,38
10- 1,3,7,11,15,19
11- 1,3,8,12,16,20
12- 1,3,9,13,17,21
13- 1,3,10,14,18,22
14- 1,3,23,27,31,35
15- 1,3,24,28,32,36
16- 1,3,25,29,33,37
17- 1,3,26,30,34,38
18- 1,4,7,12,17,22
19- 1,4,8,11,18,21
20- 1,4,9,14,15,20
21- 1,4,10,13,16,19
22- 1,4,23,28,33,38
23- 1,4,24,27,34,37
24- 1,4,25,30,31,36
25- 1,4,26,29,32,35
26- 1,5,7,13,18,20
27- 1,5,8,14,17,19
28- 1,5,9,11,16,22
29- 1,5,10,12,15,21
30- 1,5,23,29,34,36
31- 1,5,24,30,33,35
32- 1,5,25,27,32,38
33- 1,5,26,28,31,37
34- 1,6,7,14,16,21
35- 1,6,8,13,15,22
36- 1,6,9,12,18,19
37- 1,6,10,11,17,20
38- 1,6,23,30,32,37
39- 1,6,24,29,31,38
40- 1,6,25,28,34,35
41- 1,6,26,27,33,36
42- 2,3,7,12,18,21
43- 2,3,8,11,17,22
44- 2,3,9,14,16,19
45- 2,3,10,13,15,20
46- 2,3,23,28,34,37
47- 2,3,24,27,33,38
48- 2,3,25,30,32,35
49- 2,3,26,29,31,36
50- 2,4,7,11,16,20
51- 2,4,8,12,15,19
52- 2,4,9,13,18,22
53- 2,4,10,14,17,21
54- 2,4,23,27,32,36
55- 2,4,24,28,31,35
56- 2,4,25,29,34,38
57- 2,4,26,30,33,37
58- 2,5,7,14,15,22
59- 2,5,8,13,16,21
60- 2,5,9,12,17,20
61- 2,5,10,11,18,19
62- 2,5,23,30,31,38
63- 2,5,24,29,32,37
64- 2,5,25,28,33,36
65- 2,5,26,27,34,35
66- 2,6,7,13,17,19
67- 2,6,8,14,18,20
68- 2,6,9,11,15,21
69- 2,6,10,12,16,22
70- 2,6,23,29,33,35
71- 2,6,24,30,34,36
72- 2,6,25,27,31,37
73- 2,6,26,28,32,38
74- 3,4,7,8,13,14
75- 3,4,9,10,11,12
76- 3,4,15,16,21,22
77- 3,4,17,18,19,20
78- 3,4,23,24,29,30
79- 3,4,25,26,27,28
80- 3,4,31,32,37,38
81- 3,4,33,34,35,36
82- 3,5,7,9,23,25
83- 3,5,8,10,24,26
84- 3,5,11,13,27,29
85- 3,5,12,14,28,30
86- 3,5,15,17,31,33
87- 3,5,16,18,32,34
88- 3,5,19,21,35,37
89- 3,5,20,22,36,38
90- 3,6,7,10,27,30
91- 3,6,8,9,28,29
92- 3,6,11,14,23,26
93- 3,6,12,13,24,25
94- 3,6,15,18,35,38
95- 3,6,16,17,36,37
96- 3,6,19,22,31,34
97- 3,6,20,21,32,33
98- 3,7,16,24,31,39
99- 3,8,15,23,32,39
100- 3,9,18,26,33,39
101- 3,10,17,25,34,39
102- 3,11,20,28,35,39
103- 3,12,19,27,36,39
104- 3,13,22,30,37,39
105- 3,14,21,29,38,39
106- 4,5,7,10,28,29
107- 4,5,8,9,27,30
108- 4,5,11,14,24,25
109- 4,5,12,13,23,26
110- 4,5,15,18,36,37
111- 4,5,16,17,35,38
112- 4,5,19,22,32,33
113- 4,5,20,21,31,34
114- 4,6,7,9,24,26
115- 4,6,8,10,23,25
116- 4,6,11,13,28,30
117- 4,6,12,14,27,29
118- 4,6,15,17,32,34
119- 4,6,16,18,31,33
120- 4,6,19,21,36,38
121- 4,6,20,22,35,37
122- 4,7,15,25,33,39
123- 4,8,16,26,34,39
124- 4,9,17,23,31,39
125- 4,10,18,24,32,39
126- 4,11,19,29,37,39
127- 4,12,20,30,38,39
128- 4,13,21,27,35,39
129- 4,14,22,28,36,39
130- 5,6,7,8,11,12
131- 5,6,9,10,13,14
132- 5,6,15,16,19,20
133- 5,6,17,18,21,22
134- 5,6,23,24,27,28
135- 5,6,25,26,29,30
136- 5,6,31,32,35,36
137- 5,6,33,34,37,38
138- 5,7,17,26,32,39
139- 5,8,18,25,31,39
140- 5,9,15,24,34,39
141- 5,10,16,23,33,39
142- 5,11,21,30,36,39
143- 5,12,22,29,35,39
144- 5,13,19,28,38,39
145- 5,14,20,27,37,39
146- 6,7,18,23,34,39
147- 6,8,17,24,33,39
148- 6,9,16,25,32,39
149- 6,10,15,26,31,39
150- 6,11,22,27,38,39
151- 6,12,21,28,37,39
152- 6,13,20,29,36,39
153- 6,14,19,30,35,39
154- 7,8,15,16,27,28
155- 7,8,17,18,29,30
156- 7,8,19,20,23,24
157- 7,8,21,22,25,26
158- 7,8,31,33,35,37
159- 7,8,32,34,36,38
160- 7,9,11,13,31,32
161- 7,9,12,14,33,34
162- 7,9,15,17,35,36
163- 7,9,16,18,37,38
164- 7,9,19,21,27,29
165- 7,9,20,22,28,30
166- 7,10,11,14,35,38
167- 7,10,12,13,36,37
168- 7,11,17,21,23,28
169- 7,11,18,22,24,33
170- 7,11,25,30,34,37
171- 7,12,15,20,26,29
172- 7,12,16,19,25,35
173- 7,13,15,21,24,30
174- 7,13,16,22,23,29
175- 7,14,17,20,25,31
176- 7,14,18,19,26,28
177- 8,9,11,14,36,37
178- 8,9,12,13,35,38
179- 8,10,11,13,33,34
180- 8,10,12,14,31,32
181- 8,10,15,17,37,38
182- 8,10,16,18,35,36
183- 8,10,19,21,28,30
184- 8,10,20,22,27,29
185- 8,11,15,20,30,31
186- 8,11,19,26,27,32
187- 8,12,17,21,27,34
188- 8,12,18,22,23,28
189- 8,13,17,20,26,28
190- 8,13,24,27,31,36
191- 8,14,15,24,29,35
192- 8,14,16,22,30,33
193- 9,10,15,16,29,30
194- 9,10,17,18,27,28
195- 9,10,19,20,25,26
196- 9,10,21,22,23,24
197- 9,10,31,33,36,38
198- 9,10,32,34,35,37
199- 9,11,17,19,24,30
200- 9,11,18,20,23,29
201- 9,11,25,27,33,35
202- 9,12,15,22,25,31
203- 9,12,16,21,26,36
204- 9,13,15,19,23,33
205- 9,13,16,20,27,34
206- 9,14,17,22,26,29
207- 9,14,18,21,25,30
208- 10,11,15,22,28,32
209- 10,11,16,21,25,31
210- 10,12,17,19,23,29
211- 10,12,20,24,28,33
212- 10,13,17,22,31,35
213- 10,13,18,21,26,29
214- 10,14,15,19,27,34
215- 10,14,20,23,30,36
216- 11,12,15,16,23,24
217- 11,12,17,18,25,26
218- 11,12,28,29,31,34
219- 11,13,15,25,36,38
220- 11,13,16,26,35,37
221- 11,16,17,29,32,33
222- 11,19,22,23,35,36
223- 11,20,21,24,26,38
224- 12,13,15,18,27,32
225- 12,14,17,24,36,38
226- 12,18,24,30,31,37
227- 12,19,22,26,37,38
228- 12,20,23,25,32,34
229- 13,14,17,18,23,37
230- 13,14,19,25,29,32
231- 13,14,21,28,31,33
232- 13,18,19,24,34,35
233- 14,15,21,26,32,37
234- 14,16,23,28,32,35
235- 14,16,26,27,31,38
236- 15,17,22,23,27,30
237- 15,18,28,30,33,34
238- 15,19,24,25,28,37
239- 16,17,22,24,28,34
240- 16,18,24,25,27,29
241- 17,20,24,27,32,35
242- 18,19,21,23,31,32
243- 18,22,26,30,32,36
244- 20,22,23,26,31,33
These lines are exactly the same than a previous program generated (although I have not the code; read the story at this reply). This makes me feel confident that now I do have a method to get all the lines that comprises the result to this problem!

Whats Next? Make this method a general one that works for any value of N, K, and R. This is possible because the way this code is written allows a program to generate it simply by changing the number of nested FOR /L commands to the number K of elements in the combinations, and the position where the :ProcessIndices subroutine is called after the Rth nested FOR /L. The only value that don't needs any change is N! :mrgreen:

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Combinations of N numbers in sets of K with R repeated elements

#6 Post by Aacini » 02 Jun 2023 11:49

6- "Brute force" further optimization

After looking at the last method, I thought of a further optimization:

Early cutoff values ​​with more than 2 iterations. In the last method above, there is a "not use" vector that is initialized in the FOR %%j loop with numbers on previously generated lines that include both ​​%%i and %%j values. This vector is used in nested FOR %%k, %%l, %%m and %%n loops to skip processing those numbers.

When a value is reached in the last FOR %%n loop, the six FOR loop parameters are compared with all previously generated lines and the numbers in common are counted. If the count is less than or equal to 2 on all lines, the six parameters are inserted into the result as a new line.

We could achieve the same result in a slightly different way. Instead of comparing the six parameters in the last nested FOR, we could, for example, compare and count five parameters in the penultimate nested FOR; if the count is greater than 2, then the last nested FOR is not executed even once. We could also find the generated lines where the penultimate FOR parameter appears in addition to any of the previous FOR parameters and add all these numbers to the "not use" vector. This would further limit the number of elements processed in the following nested FOR.

We could extend this idea to all loops. For example: in the third FOR %%k loop compare and count the lines that have the parameter %%k in addition to %%i or %%j. If the test passes (repeated elements less than or equal to 2), add to the "not use" vector all the numbers on the lines with %%k and any of %%i or %%j. In the fourth FOR %%l loop compare lines with %%l and any of %%i, %%j or %%k and add these numbers to a new "not use in %%M" vector. In the fifth FOR %%m loop, compare lines with %%m and any of %%i, %%j, %%k, or %%l and update the "not use in %%N" vector.

When we get to the last FOR %%n loop, we'll be sure that numbers allowed at this level will not cause the repeated 2 elements to be exceeded, so we don't have to count nothing, just insert the numbers into new result lines! In fact, we don't count any numbers in previous FORs, we just do a "cumulative search" for new numbers in previous lines (and update the different "not use" cumulative vectors) such that the numbers allowed in each nested FOR will always comply with the rule of not exceeding 2 repeated elements.

Although these may seem like a lot of operations, the number of nested FOR's omitted early in the process could save a significant amount of time. "Cumulative searches" can be accomplished very efficiently via a couple of piped findstr commands, the second looking for several numbers (%%i, %%j, %%k, %%l) at once in a single operation. Here it is:

Code: Select all

@echo off
setlocal EnableDelayedExpansion

rem "Brute force" *further optimized* method to get all combinations of N numbers in sets of 6 elements
rem with a maximum of Rep repetitions
rem 1- Cancel loops at end
rem 2- Omit repeated numbers at beginning
rem 3- Early cutoff values with more than 2 repetitions

rem Dispatcher for nested FOR in another CMD.exe
if "%1" equ "ProcessIndicesLMN" goto ProcessIndicesLMN

rem Set last number and max number of repetitions
set /A n=39, rep=2
rem In this version the number of elements in combination is fixed to 6!

set /A nM1=n-1, nM2=n-2, nM3=n-3, nM4=n-4, nM5=n-5,  lines=0
rem/> lines.txt
for /F %%a in ('copy /Z "%~F0" NUL') do set "CR=%%a"
set "start=%time:~0,-3%"

< NUL 2> NUL (
for /L %%i in (1,1,%nM5%) do (
   echo %%i: !time:~0,-3!          
   set /A iP1=%%i+1
   for /L %%j in (!iP1!,1,%nM4%) do (
      set "notK=,0,"
      for /F "delims=" %%a in ('findstr ",%%i," lines.txt ^| findstr ",%%j,"') do (
         for %%b in (%%a) do set "notK=!notK:,%%b,=,!%%b,"
      )
      set /A jP1=%%j+1
      for /L %%k in (!jP1!,1,%nM3%) do if "!notK:,%%k,=!" equ "!notK!" (
         set /P "=%%i,%%j,%%k         !CR!"

         del aNewLine.txt
         cmd /Q /C "%~F0" ProcessIndicesLMN %%i %%j %%k
         if exist aNewLine.txt (
            set /P "line=" < aNewLine.txt
            set /A lines+=1
            echo !lines!- !line!
            >> lines.txt echo ,!line!,
            for %%b in (!line!) do set "notK=!notK:,%%b,=,!%%b,"
         )

      )
   )
))

echo Start: %start%
echo End:   %time:~0,-3%
goto :EOF


                         %1   %2   %3   %4
         :ProcessIndicesLMN  %%i  %%j  %%k

         set "notL=!notK!"
         for /F "delims=" %%a in ('findstr ",%4," lines.txt ^| findstr ",%2, ,%3,"') do (
            for %%b in (%%a) do set "notL=!notL:,%%b,=,!%%b,"
         )
         set /A kP1=%4+1
         for /L %%l in (!kP1!,1,%nM2%) do if "!notL:,%%l,=!" equ "!notL!" (
            set /P "=%2,%3,%4,%%l      !CR!"
            set "notM=!notL!"
            for /F "delims=" %%a in ('findstr ",%%l," lines.txt ^| findstr ",%2, ,%3, ,%4,"') do (
               for %%b in (%%a) do set "notM=!notM:,%%b,=,!%%b,"
            )
            set /A lP1=%%l+1
            for /L %%m in (!lP1!,1,%nM1%) do if "!notM:,%%m,=!" equ "!notM!" (
               set /P "=%2,%3,%4,%%l,%%m   !CR!"
               set "notN=!notM!"
               for /F "delims=" %%a in ('findstr ",%%m," lines.txt ^| findstr ",%2, ,%3, ,%4, ,%%l,"') do (
                  for %%b in (%%a) do set "notN=!notN:,%%b,=,!%%b,"
               )
               set /A mP1=%%m+1
               for /L %%n in (!mP1!,1,%n%) do if "!notN:,%%n,=!" equ "!notN!" (
                  > aNewLine.txt echo %2,%3,%4,%%l,%%m,%%n
                  exit
               )
            )
         )

         exit

When I ran this program

:arrow: . IT TOOK 18 MINUTES!!! . :shock: :D

... instead of the 4 hours and 45 minutes of the previous optimized version (and the 40-60 hours inferred duration of the non-optimized version). That is, around the same time as the first convoluted version based on Files!

(A very happy) Antonio :mrgreen:

miskox
Posts: 630
Joined: 28 Jun 2010 03:46

Re: Combinations of N numbers in sets of K with R repeated elements

#7 Post by miskox » 03 Jun 2023 13:26

Congratulations Antonio!

I must admit that I don't understand what are you talking (writing) about but it was an interesting to read it.

I am always amazed with the solutions you have (I remember that clever solution for that very loooong XML schema viewtopic.php?f=3&t=10579#p67850 ).

Saso

T3RRY
Posts: 250
Joined: 06 May 2020 10:14

Re: Combinations of N numbers in sets of K with R repeated elements

#8 Post by T3RRY » 04 Jun 2023 09:36

This was indeed an interesting problem. here was my own solution, a very brute force approach.
Runs reasonably quick on my machine ~ 30 seconds for 100 lines, but mines a reasonably new and not so slow machine.
Speed drops off exponentially as the total list size increases

EDIT: Testing for the 300 lines has me certain your method is far more optimized.
Sitting at just over half an hour with only 160 lines generated

Code: Select all

@Echo off & CD /D "%~dp0"
Setlocal EnableDelayedExpansion
Set /a "total=0","target=100","objects=6","min=1","max=39","rangeMax=(max-min)+1","rangeMin=min"
Set "outfile=Range_%min%-%max%_items_%objects%_repeats_%target%.txt"
Del "%outfile%" 2> nul

:newString
Set "thisString=0" & Set "String= "
:newNum
	Set /A Rand=!Random! %%rangeMax + rangeMin
	If "!String!"==" " ( Set "String= !Rand!, " )Else If not "!String: %Rand%, =!"=="!String!" (Goto:newNum
	)Else (	Set "String=!String!!Rand!, "
		Set /a ThisString+=1
		If !ThisString! NEQ !Objects! Goto:NewNum
	)
	Set /a ThisString+=1
	If !ThisString! LSS !Objects! Goto:NewNum
	For /l %%i in (0 1 9)Do Set "String=!String: %%i, = 0%%i, !"
	Set "String=!String:,=!"
	If not exist "%outfile%" (1>"%outfile%" Echo(!String!
		Set /A total+=1
		Goto:NewString
	)
	For /f "delims=" %%G in (%outfile%)Do (Set "ic=0"
		For %%i in (%%G)Do If not !ic! GTR 2 For %%n in (!String!)Do If %%i==%%n Set /a ic+=1
		If !ic! GTR 2 (Goto:NewString)
	)
	>>"%outfile%" Echo(!String!
	<nul Set /p "=."&Set /A total+=1
	If !Total! GEQ !Target! Goto:end
Goto:NewString
:sort_L_to_H <List_VarName>
	Set "i=0"
	FOR %%P In (!%~1!) DO (%= Split the list into an Array =%
		Set /A "i+=1"
		Set "%1[!i!]=%%~P"
	)
	For /L %%a In (1,1,!i!) Do (%= Sort the Array using a Temporary variable to exchange values. =%
		Set /A "S_Offset=%%a"
		For /L %%b IN (1,1,%%a) DO If not %%a==%%b (
			IF !%1[%%a]! LSS !%1[%%b]! (
				Set "tmp=!%1[%%a]!"
				Set "%1[%%a]=!%1[%%b]!"
				Set "%1[%%b]=!tmp!"
	)	)	)
	Set "Newlist="%= Rebuild the Array into a list and replace the original list. trims leading whitespace =%
	For /l %%i in (1 1 !i!)Do Set "Newlist=!Newlist! !%~1[%%i]!"
	Set "%~1=!Newlist:~1!"
Exit /B
:end
	>"%~dp0output.txt" (
		For /f "UsebackQ Delims=" %%G in ("%Outfile%")Do (
			Set "String=%%G"
			Call:sort_L_to_H String
			Echo(!String!
		)
	)
	TYPE "%~dp0output.txt" | Sort >"%Outfile%"
	Del "%~dp0output.txt"
	CLS& TYPE "%OutFile%"
	Endlocal & Exit /B
[/code

Aacini
Expert
Posts: 1914
Joined: 06 Dec 2011 22:15
Location: México City, México
Contact:

Re: Combinations of N numbers in sets of K with R repeated elements

#9 Post by Aacini » 08 Jun 2023 10:52

miskox wrote:
03 Jun 2023 13:26
Congratulations Antonio!

I must admit that I don't understand what are you talking (writing) about but it was an interesting to read it.

. . .

Saso
Mmm... Why is that?

Although English is not my milk tongue, I try to write as clear as possible! When I have a doubt about my writting, I used a web translator...

Could you indicate me in which parts my writting is not clear enough?

TIA

Antonio


==============================================================

T3RRY wrote:
04 Jun 2023 09:36
This was indeed an interesting problem. here was my own solution, a very brute force approach.
Runs reasonably quick on my machine ~ 30 seconds for 100 lines, but mines a reasonably new and not so slow machine.
Speed drops off exponentially as the total list size increases

EDIT: Testing for the 300 lines has me certain your method is far more optimized.
Sitting at just over half an hour with only 160 lines generated
Using random numbers to solve this type of problems is never a good idea. The first lines appears very quickly, but the time increases as more lines are generated. However, the worst part of using random numbers is that you never will know the exact number of lines that should be generated in the result. Your program could run for years and will never generate the total number of lines. Or it could run for a couple of days and generate all the lines! However, how would you know that? What is the ending condition for a program based on random numbers? Nobody knows...

Antonio

miskox
Posts: 630
Joined: 28 Jun 2010 03:46

Re: Combinations of N numbers in sets of K with R repeated elements

#10 Post by miskox » 09 Jun 2023 04:47

Aacini wrote:
08 Jun 2023 10:52
miskox wrote:
03 Jun 2023 13:26
Congratulations Antonio!

I must admit that I don't understand what are you talking (writing) about but it was an interesting to read it.

. . .

Saso
Mmm... Why is that?

Although English is not my milk tongue, I try to write as clear as possible! When I have a doubt about my writting, I used a web translator...

Could you indicate me in which parts my writting is not clear enough?

TIA

Antonio

I was talking about all the hard problems you are solving - your clever approaches. Not that I don't understand what you write (as for 'bad' English but for example if I read some text that has to do with some things I am not familiar with (nuclear physics or medicine)).

Saso

Saso

Post Reply