Hi,
I need a method for (lossless for the moment) compress a file. I search for 2 phase method:
- compress a file and convert to any base rappresentable in a batch file. (Slow is not problem)
- simple routine that extract the file from cmd and expand the file. (and Fast)
Hi need to compress to max 25% (or 50% for initial) of original size for the moment, but the goal is higher compression ratio.
If possible to do in dos batch (compress an decompress) and having alternative in Vbs/jsc/mshta for faster compress.
The makecab solution is not accettable because the ratio of compression is too low. But a show method is accepted.
This is HARD work, only for expert!
I show a path:
- Use procedural compression, fractal type compression, "use random number generator for compression", fuzzy compression, genetic algos
RIF:
http://www.pouet.net/prod.php?which=62917
https://the8bitpimp.wordpress.com/2014/08/06/fuzzy-compression/
TIA!
PS: There is a dos batch random generator that work via SEED?
Einstein1969
High compression for dos batch files
Moderator: DosItHelp
-
- Expert
- Posts: 960
- Joined: 15 Jun 2012 13:16
- Location: Italy, Rome
High compression for dos batch files
Last edited by einstein1969 on 13 Dec 2015 11:28, edited 1 time in total.
Re: High compression for dos batch files
Which file types do you want to compress (arbitrary binary data, or a limited somehow like text files mainly consisting of A-Z, a-z, space)?einstein1969 wrote:Hi need to compress to max 25% (or 50% for initial) of original size for the moment, but the goal is higher compression ratio.
Note that such compression ratios are illusory for arbitrary binary files.
For these filetypes there is even no guarantee that any algorithm results in a shorter file.
Makecab uses LZX (LZ77 family) and so it is not that bad:einstein1969 wrote:The makecab solution is not accettable because the ratio of compression is too low.
LZ77 can achieve the optimal compression possible for any file (but the optimal compression possible may be equal to the original filesize; because these compression families are using dictionaries the resulting archieve may exceed the original filesize up to 4MB).
Other algorithms may be better for specific files, but this is true for every algorithm. Best you could do, when you need the best compression:
Just run all algorithms (you could access and want to support) with all options and choose the best one.
You may choose an archiever based on your requirements:
https://en.wikipedia.org/wiki/Comparison_of_file_archivers#Archive_format_support
penpen
-
- Expert
- Posts: 960
- Joined: 15 Jun 2012 13:16
- Location: Italy, Rome
Re: High compression for dos batch files
I need to compress images, fonts files, and text files
The file to compress is not to big , then a dictionary is a problem.
My idea is to develop an algorithm based on family of pseudo random generator.
The file to compress is not to big , then a dictionary is a problem.
My idea is to develop an algorithm based on family of pseudo random generator.
Re: High compression for dos batch files
Most image formats are already in a compressed state.
Re: High compression for dos batch files
As Squashman said most image formats are already compressing the image data.einstein1969 wrote:I need to compress images, fonts files, and text files
Some image formats are supporting multiple compression algorithms (optimized for maximum compression, fast viewing, or a treadeoff of both).
But the default option should be maximum compression, so i won't expect that compressing image files will give you any benefit.
I don't know whether fonts store their data in compressed form, or not; because they are some kind of image files i would expect it.
Text files are good candidates for being compressed.
But for this mix, i would recommend to use something like 7-zip (or any other archiever you like; see the link in my above post),
test multiple compression types/dictionary sizes, .... and use the agorithm (and options) with the best results.
Note:
Depending on the filetype maximum compression may not lead to the best results, so you should really test all combinations.
(Maybe except "big dictionaries" which you should only needed if you want to compress very huge data portions.)
Developing compression algorithms normally is a hard task,einstein1969 wrote:My idea is to develop an algorithm based on family of pseudo random generator.
but if you have a good idea it may also work.
But then you should describe your algorithm idea in detail (if you want to share your idea with public).
penpen
Re: High compression for dos batch files
In the following text, two different encoding methods are illustrated.
Both methods give the same size at random.
Both methods give the same size at random.
Code: Select all
Huffman coding
Link: https://en.wikipedia.org/wiki/Huffman_coding
Link: http://huffman.ooz.ie/?text=NEVER%20CHANGE%20A%20RUNNING%20SYSTEM.
NEVER CHANGE A RUNNING SYSTEM. 30 characters
NEVER CHANGE A RUNNING SYSTEM.
V CH U I Y T M. character one times
A R G S character two times (last found)
# E character four times (last found)
N character five times (last found)
(V = 1) (C = 1) (H = 1) (U = 1) (I = 1) (Y = 1) (T = 1) (M = 1) (DOT = 1) 1 character
(V + I = 2) (C + Y = 2) (H + T = 2) (U + M = 2) (A = 2) (R = 2) (G = 2) (S = 2) 2
(V + I + DOT = 3) 3
(C + Y + H + T = 4) (U + M + A = 4) (R + G = 4) (SPC = 4) (E = 4) 4
(S + V + I + DOT = 5) (N = 5) 5
(C + Y + H + T + U + M + A = 8) (R + G + SPC = 8) 8
(E + S + V + I + DOT = 9) 9
(N + C + Y + H + T + U + M + A = 13) 13
(R + G + SPC + E + S + V + I + DOT = 17) 17
(N + C + Y + H + T + U + M + A + R + G + SPC + E + S + V + I + DOT = 30) 30
-------------------------------------------------- 30 char -------------------------------------------------
/ \
--------- 13 char ------- -------------------------- 17 char ------------------
/ \ / \
N
5 char ---------- 8 char --------- ------------ 8 char ------- -------- 9 char -----------
/ \ / \ / \
SPC E
4 char 4 char 4 char 4 char 4 char 5 char
/ \ / \ / \ / \
A R G S
2 char 2 char 2 char 2 char 2 char 2 char 2 char 3 char
/ \ / \ / \ / \
C Y H T U M DOT
1 char 1 char 1 char 1 char 1 char 1 char 1 char 2 char
/ \
V I
1 char 1 char
N 00
SPC 101
E 110
A 0111
R 1000
G 1001
S 1110
C 01000
Y 01001
H 01010
T 01011
U 01100
M 01101
DOT 11110
V 111110
I 111111
NEVER CHANGE A RUNNING SYSTEM.
00 110 111110 110 1000 101
01000 01010 0111 00 1001 110 101
0111 101
1000 01100 00 00 111111 00 1001 101
1110 01001 1110 01011 110 01101 11110
00110111110110100010101000010100111001001110101011110110000110000001111110010011011110010011110010111100110111110 113 bits = 15 bytes
00 110 111110 110 1000 101 01000 01010 0111 00 1001 110 101 0111 101 1000 01100 00 00 111111 00 1001 101 1110 01001 1110 01011 110 01101 11110
N E V E R SPC C H A N G E SPC A SPC R U N N I N G SPC S Y S T E M DOT
N 00
C 01000
Y 01001
H 01010
T 01011
U 01100
M 01101
A 0111
R 1000
G 1001
SPC 101
E 110
S 1110
DOT 11110
V 111110
I 111111
Shannon–Fano coding
Link: https://en.wikipedia.org/wiki/Shannon%E2%80%93Fano_coding
NEVER CHANGE A RUNNING SYSTEM. 30 characters
NEVER CHANGE A RUNNING SYSTEM.
V CH U I Y T M. character one times
A R G S character two times (last found)
# E character four times (last found)
N character five times (last found)
(N = 5) (SPC = 4) (E = 4) (A = 2) (R = 2) (G = 2) (S = 2) (V = 1) (C = 1) (H = 1) (U = 1) (I = 1) (Y = 1) (T = 1) (M = 1) (DOT = 1) result = 30
[ (N = 5) (SPC = 4) (E = 4) (A = 2) ] + [ (R = 2) (G = 2) (S = 2) (V = 1) (C = 1) (H = 1) (U = 1) (I = 1) (Y = 1) (T = 1) (M = 1) (DOT = 1) ]
/--- (N = 5) 000
/--- 9
/ \--- (SPC = 4) 001
15
/ \ /--- (E = 4) 010
/ \--- 6
/ \--- (A = 2) 011
30
\ /--- (R = 2) 1000
\ /--- 4
\ / \--- (G = 2) 1001
\ 8
\ / \ /--- (S = 2) 1010
\ / \--- 4
\ / \ /--- (V = 1) 10110
\ / \--- 2
\/ \--- (C = 1) 10111
15
\ /--- (H = 1) 11000
\ /--- 2
\ / \--- (U = 1) 11001
\ /--- 4
| / \ /--- (I = 1) 11010
| / \--- 2
\ / \--- (Y = 1) 11011
7
\ /--- (T = 1) 11100
\ /--- 2
\ / \--- (M = 1) 11101
\--- 3
\
\----------- (DOT = 1) 1111
NEVER CHANGE A RUNNING SYSTEM.
000 010 10110 010 1000 001
10111 11000 011 000 1001 010 001
011 001
1000 11001 000 000 11010 000 1001 001
1010 11011 1010 11100 010 11101 1111
00001010110010100000110111110000110001001010001011001100011001000000110100001001001101011011101011100010111011111 113 bits = 15 bytes