
Looking for (c++) programmer to assist in developing a compression algorithm

compression


This topic has been archived. This means that you cannot reply to this topic.
183 replies to this topic

#1 SoraK05

    CC Addict

  • Advanced Member
  • 115 posts

Posted 30 April 2014 - 03:06 AM

Hello.

I have spent some time looking at and researching compression.

 

I have a writeup and pseudocode for a systematic approach to compressing a file in general, treated as a string of 0 and 1 bits, irrespective of its 'data type' (since any file is ultimately an arrangement of 0 and 1 bits).

 

The concept is 'one tool to compress any file where possible', and it will generally have additional features to complement compression further.

 

Whether it is video, audio, text or general data, this should compress it as a 'one tool to compress any file'.

 

 

Considering this example:

BMP: 1.53 MB (1,606,230 bytes)
JPG: 118 KB (121,704 bytes)
PNG: 46.7 KB (47,905 bytes)

7z of BMP is 24.7KB
7z of JPG is 112KB
7z of PNG is 45.5KB

 

The BMP in mind is 85-90% one colour, and the rest is made up of 9 other colours.

Generally a 'raw' file will compress the most, as seen by comparing the 7z of the BMP (a raw string of 24-bit 'pixels') with the JPG and PNG results (formats already 'optimized for images').

 

Considering that the majority of the file is a single repeated colour, 24.7KB is enormous.

 

 

This tool, in its systematic approach to any string, should produce a file close to the ~1KB this example suggests, and it follows the same approach on any string.

This will not only provide this level of compression as a general-purpose tool, but provide it for files in general.

This is the result of a 'general purpose' scheme which isn't far off the 'best' scheme, which will generally squeeze out a little more.

 

 

The calculations and temporary storage requirements are considerable, and generous amounts of both are recommended in order to do the calculations required.

The finished product is one that can be decoded almost instantly from its compressed form with minimal additional calculation.

With patience and storage space, the result is a compressed file with instant access.

 

 

For any programmer interested, ideally in C++, I would be willing to share the information on this and construct the tool together privately.


Edited by SoraK05, 30 April 2014 - 03:20 AM.


#2 WingedPanther73


    A spammer's worst nightmare

  • Moderator
  • 17757 posts

Posted 30 April 2014 - 04:06 AM

Ummm... You do realize that people have been coming up with "new, improved, universal" compression schemes for a while now, right? Most of them end up being worse, one way or another, than what we already have.

 

What I would be interested in are metrics such as the following:

Compared to 7z, will your scheme take more or less time to achieve equivalent compression?

Compared to 7z, will your scheme take more or less time to decompress?

Compared to the MP3 format, will your scheme take more or less time to achieve equivalent compression?

Compared to the MP3 format, will your scheme take more or less time to decompress?

Have you done the math on your compression scheme already, so that you know the raw metrics for non-random and random data?


Programming is a branch of mathematics.
My CodeCall Blog | My Personal Blog

My MineCraft server site: http://banishedwings.enjin.com/


#3 SoraK05

    CC Addict

  • Advanced Member
  • 115 posts

Posted 30 April 2014 - 04:24 AM

I'm after 'most compression', which wouldn't necessarily have 'compression levels' - it should be one general result per run.

 

Considering the construction of the algorithm, one can run a scheme which is 'general' and decodes immediately (at roughly read/write speed besides minor calculations), or one can persist for ideally squeezing out compression (which will be like a Russian doll).

 

The principle is to get the smallest file, and the process involves relatively intense calculation and temporary storage.

The result is an instantly decodable file for the general scheme (like the ~1KB file from the BMP), while the 'best', most persistent scheme creates the dolls from the smallest one outward (and won't save much more).

 

 

It is meant to achieve compression, where possible, with persistence.

One component is similar to Huffman coding, which is part of what 7z and MP3 use.

However, this goes further and is more optimized, and will achieve more than they do in terms of getting the BMP down to the ~1KB it should be.
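For reference, the Huffman component being alluded to can be sketched as follows. This is the textbook frequency-based construction used by codecs such as those in 7z and MP3, not the poster's undisclosed variant:

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// Textbook Huffman code construction: repeatedly merge the two
// least-frequent nodes into a tree, then read codes off the paths.
struct Node {
    long freq;
    unsigned char sym;
    Node* left = nullptr;
    Node* right = nullptr;
};

struct ByFreq {
    bool operator()(const Node* a, const Node* b) const {
        return a->freq > b->freq;  // makes priority_queue a min-heap
    }
};

static void collect(const Node* n, const std::string& prefix,
                    std::map<unsigned char, std::string>& table) {
    if (!n->left && !n->right) {
        table[n->sym] = prefix.empty() ? "0" : prefix;  // single-symbol input
        return;
    }
    collect(n->left, prefix + "0", table);
    collect(n->right, prefix + "1", table);
}

std::map<unsigned char, std::string> huffman_table(const std::string& data) {
    std::map<unsigned char, long> freq;
    for (unsigned char c : data) ++freq[c];

    std::priority_queue<Node*, std::vector<Node*>, ByFreq> pq;
    for (const auto& [sym, f] : freq) pq.push(new Node{f, sym});
    while (pq.size() > 1) {
        Node* a = pq.top(); pq.pop();
        Node* b = pq.top(); pq.pop();
        pq.push(new Node{a->freq + b->freq, 0, a, b});
    }
    std::map<unsigned char, std::string> table;
    collect(pq.top(), "", table);  // nodes are leaked; fine for a sketch
    return table;
}
```

On data like the BMP example, the dominant symbol gets a one-bit code, which is where most of the 7z-style gain on that file comes from.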

 

 

In general, compression is calculation-intensive, and temporary storage is required.

It is not 'quick' to encode and does require temporary storage for its calculations; however, the result (with the general scheme) is immediate decoding with minimal calculations (e.g. the ~1KB file will decode to the 1.5MB BMP at close to read/write speed).

 

It is recommended to encode larger files such as videos on a quicker device, such as a high-end PC with generous temporary storage.

The result is an accessible and instantly decodable file, which can then be decoded on low-end devices.

 

 

It will take 'the time it needs' and use 'the space it needs' to get the compressed and instantly decodable result.

 

I have the details.

 

 

 

EDIT:

In terms of 'random / non-random', this will look to compress files as strings of 0 and 1 bits, irrespective of their arrangement.

 

Where compression can be detected, it will be done.

 

Consider that the resultant files are constructed so that no further compression can be detected in them.

In the spectrum of possible files, these are the 'slots' expected to be used for compressed results.

 

The result (with the best scheme) will resemble such a file as much as possible.


Edited by SoraK05, 30 April 2014 - 04:29 AM.


#4 WingedPanther73


    A spammer's worst nightmare

  • Moderator
  • 17757 posts

Posted 30 April 2014 - 09:39 AM

I'm trying to think of an application for this, as it is described, and the only ones that I can think of are the following:

 

Compressing videos for distribution: bandwidth trumps compression time, and you want the decode to be simple.

 

Compressing static content for distribution over the internet: similar to the video case above, but I'm thinking of images, etc.

 

A new video container for use in Blu-ray, etc. Here, the idea is that you compress once and distribute on physical media. Given that Blu-ray had a massive war to come out on top in the latest video-format war, I'm not convinced anyone really wants new devices. A similar option might work for video game systems, however, where content could be compressed on disc.

 

I'm hard-pressed to come up with other uses for this, as in most circumstances you don't want the time it takes to compress/uncompress a file to overshadow the gains made in transmitting it. If an uncompressed file is 1.53MB, I can download it in a few seconds. If that's going to be a one-time transfer, I don't want the compress/uncompress to take longer than a few seconds, or I've wasted my time. I regularly deal with situations where I'm transferring database backups to/from customers. When dealing with a 20GB database, it can take a while to transfer. A less-than-one-hour compression will usually drop it down to around 2GB with normal zip routines. At some point, high compression gives diminishing returns, where the resources to create the compressed file don't pay off.

 

If you actually have something valuable here, I'd look at getting it patented and taking it to Microsoft or one of the big game studios.




#5 SoraK05

    CC Addict

  • Advanced Member
  • 115 posts

Posted 30 April 2014 - 09:52 AM

There are 3 schemes:

One-off, which will determine the quickest application of a particular method (and does require thorough scans for confirmation).

While this isn't necessarily 'quick' per se, it will confirm a means of achieving a general overall compression and do so the quickest.

 

General, which will run its processes systematically where possible to produce a highly compressed file overall with immediate decoding (plus extra features such as instant decoding from particular offsets, given some precalculation).

 

Archival, which won't gain much more than general, squeezing what is left into a Russian-doll type system.

 

 

Considering the extra feature of additional information in the header, this allows specific offsets to have immediate decoding, resolution header information to be stored separately from the data, and other things.

 

For a game, all the game art can be lossless in its files, compressed together (with file shuffling in the data to gain some extra compression). Once the data has immediate access via the additional-information ability, all filenames and such can be stripped. The result is a single 'image asset' file, and the same can be done for audio, video, maps etc. A 2D game with lossless content can be in the KB range (with immediate decoding).

 

 

It is for general use, with general archiving in mind.

It is ideal to have a lossless source of image, audio or video, and with enough time and space create a compressed file with close to read/write decompression speed and a few bytes of RAM as a counter to decode (to RAM or HDD).

 

While media such as text, images and audio generally encourage repetition, this is for any file in general where compression is possible (and goes further than 7z/rar).

 

 

Nothing about the encoding is 'fast' per se, and it does require storage - it is calculated in a thorough manner, where there are no 'compression levels' and the results depend on the degree of perseverance (the schemes).

For all the calculations and storage, the result is a highly compressed file with instant decoding.

I.e. the BMP file of 85-90% one 24-bit string and nine other variations becomes the ~1KB it can and should be, with immediate decoding at read/write speed and minor calculations.

 

 

The algorithm can be summarized as "systematically compress x number of values in a length of y number of spaces where possible", where x is 0 or 1 bits and y is the total length.

It will compress where possible (with predetermined calculations for confirmation as part of the process).

 

 

There is an element similar to 'Huffman encoding', although there are no compression levels as in 7z/rar etc; with the assistance of a temporary file it will result in the shortest number of sequences and an optimized tree. Thorough and direct. It will not trade away results to save time/HDD space the way those options do.


Edited by SoraK05, 30 April 2014 - 10:02 AM.


#6 WingedPanther73


    A spammer's worst nightmare

  • Moderator
  • 17757 posts

Posted 30 April 2014 - 12:05 PM

As I indicated before, this would be a highly specialized encoding method, and you should really be pitching your idea to appropriate companies. With that said, they will likely have a healthy dose of skepticism about your claims, much as I do. It is very, very easy to come up with the idea of a compression mechanism. It is very, very hard to get the idea to work the way you expected.

 

Unfortunately, your above post is vague at best. There is preprocessing and a few analysis phases, and it will use a TON of resources to compress. Add on the "start decoding from any point" feature and I see lots of header information, which will degrade the compression ratio. You also repeatedly state a goal of "instant decompression", but with various tables to be read before decompression can start, there are obvious concerns about how well that can work, too.

 

In short, as I try to think about what your compression algorithm might be doing, I can see potential issues and nothing to indicate how you would practically overcome them.




#7 Sundance

    CC Devotee

  • Validating
  • 572 posts

Posted 30 April 2014 - 04:45 PM

If you can provide a PoC that would be great; until then, good luck!


Please read the

FaQ & Guidelines


#8 0xFACEB004

    CC Devotee

  • Senior Member
  • 625 posts

Posted 30 April 2014 - 06:15 PM

I really don't want to start a long conversation about this, but I have a question:

 

 A 2D game with lossless content can be in kb range (and immediate decoding).

 

Are you saying that a 2D game with a short music loop as a resource file -- let's say a 4-megabyte wave file, which is a very short loop -- are you saying you can compress this whole file, along with its resources, down to the kilobyte range without losing any information?

If you have found a way to do this, you are going to be the wealthiest person on CC, because Microsoft, Apple, Facebook, Google, and anyone else who works with large files, or high bandwidth traffic, will want this compression method.
 

 

The algorithm can be considered ("systematically compress x number of values in a length of y number of spaces where possible"), and x is 0 or 1 bits and y is the total length.

 

Please tell me that the algorithm is not this:

11000011 01011100

becomes:

1204120111011302

because that one has been done already.
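The scheme referred to here is bit-level run-length encoding, and a minimal sketch makes the problem obvious: on input without long runs, the output is no smaller than the input.

```cpp
#include <string>
#include <utility>
#include <vector>

// Bit-level run-length encoding as in the example above: each maximal
// run of identical bits becomes a (bit, length) pair.
std::vector<std::pair<char, int>> rle(const std::string& bits) {
    std::vector<std::pair<char, int>> runs;
    for (char b : bits) {
        if (!runs.empty() && runs.back().first == b)
            ++runs.back().second;
        else
            runs.push_back({b, 1});
    }
    return runs;
}
```

Running `rle("1100001101011100")` gives the eight pairs 1x2, 0x4, 1x2, 0x1, 1x1, 0x1, 1x3, 0x2 - exactly the digit pairs of "1204120111011302": 16 bits in, 16 digits out, so nothing was saved.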


Anyways, good luck with this...and don't forget to donate to Codecall after you're rich!




#9 SoraK05

    CC Addict

  • Advanced Member
  • 115 posts

Posted 30 April 2014 - 09:06 PM

This will do images of 24-bit sequences in general much better than 7z/rar.

Take the BMP example that is 85-90% one colour with 9 other variations: compressed, it is 24KB where it can and should be ~1KB. Where lossless content is similar in nature and repetition is prevalent, this method will be more thorough than rar/7z.

 

MP3 involves Huffman-style coding applied to WAV data (after a lossy transform), and this is generally more thorough.

 

A game with some music and some assets could be :)

 

 

And in general it is not about money. Some things should be free :)

Some compression tools, like 7z, are open source.

 

 

btw KJGino I came across these:

http://en.wikipedia....ki/KGB_Archiver

http://en.wikipedia.org/wiki/PAQ

 

They look very bloated, whereas my algorithm can decode instantly (with some minor calculations) and can create minimal data to decode instantly from an offset. The requirement to decode is a low-spec device and a few bytes of RAM in general, which can write the file to RAM/HDD. It is light - the product comes after thorough calculation and temporary storage usage, and the instant-decoding scheme is optimized to reduce the difference between it and the archival 'Russian doll' one, which will catch more where possible.

 

Also, a lossless source is generally recommended as the only data to use when compressing, which this will handle more thoroughly than 7z/rar, going by the writeup/pseudocode and the BMP example.

As a feature, additional information can store the resolution and bitrate while letting the data be the raw image(s)/audio, giving more overall compression (plus other uses like offset-decoding info), and files can be shuffled into more appropriate positions where it counts.


Edited by SoraK05, 30 April 2014 - 09:21 PM.


#10 0xFACEB004

    CC Devotee

  • Senior Member
  • 625 posts

Posted 30 April 2014 - 09:32 PM

But .mp3 is not a lossless format, which is why it works so well for size but is not so good for quality.

 

The reason I decided to ask about this is that I do a lot of work with audio. For example, a 2-hour recording @ 48kHz, 24-bit is just over 2 GB, but after compressing down to a 192 kbps MP3 it is less than 180 MB -- much better for transfer, but it suffers from the quality loss.

 

If you can compress audio that much in a lossless format, you have a winner.




#11 SoraK05

    CC Addict

  • Advanced Member
  • 115 posts

Posted 30 April 2014 - 09:35 PM

Thanks for the example.

Once I have a working tool, one which I'd use for archiving and general use over others, I can test :)

 

It is systematic in its approach - the archival scheme will have the best results, with a Russian-doll storage where each doll needs creating, while the direct streaming scheme is otherwise optimized to be compressed throughout.

 

The schemes reflect degrees of persistence and the development of layers.

The WAV would be heavily compressed with the archival scheme, yet require time to decompress each iteration/layer.



#12 Sundance

    CC Devotee

  • Validating
  • 572 posts

Posted 30 April 2014 - 09:37 PM

Hey, leave Russians alone.





