| |
| LZMA Utils history |
| ------------------ |
| |
| Tukaani distribution |
| |
| In 2005, there was a small group working on the Tukaani distribution, |
| which was a Slackware fork. One of the project goals was to fit the |
| distro on a single 700 MiB ISO-9660 image. Using LZMA instead of gzip |
| helped a lot. Roughly speaking, one could fit data that took 1000 MiB |
| in gzipped form into 700 MiB with LZMA. Naturally, the compression |
| ratio varied across packages, but this was what we got on average. |
| |
| Slackware packages have traditionally had .tgz as the filename suffix, |
| which is an abbreviation of .tar.gz. A logical naming for LZMA |
| compressed packages was .tlz, being an abbreviation of .tar.lzma. |
| |
| At the end of 2007, there is no longer any distribution under the |
| Tukaani project, but development of LZMA Utils continues. There are |
| still .tlz packages around, because at least Vector Linux (a |
| Slackware-based distribution) uses LZMA for its packages. |
| |
| The first versions of the modified pkgtools used the LZMA_Alone tool |
| from Igor Pavlov's LZMA SDK as is. This was fine, because users didn't |
| need to interact with LZMA_Alone directly. But people soon wanted to |
| use LZMA for other files too, and the interface of LZMA_Alone wasn't |
| comfortable for those used to gzip and bzip2. |
| |
| |
| First steps of LZMA Utils |
| |
| The first version of LZMA Utils (4.22.0) included a shell script called |
| lzmash. It was a wrapper that had a gzip-like command line interface. |
| It used the LZMA_Alone tool from the LZMA SDK to do all the real work. |
| zgrep, zdiff, and related scripts from gzip were adapted to work with |
| LZMA and were part of the first LZMA Utils release too. |
| |
| LZMA Utils 4.22.0 also included lzmadec, which was a small (less than |
| 10 KiB) decoder-only command line tool. It was written on top of the |
| decoder-only C code found in the LZMA SDK. lzmadec was convenient in |
| situations where LZMA_Alone (a few hundred KiB) would be too big. |
| |
| lzmash and lzmadec were written by Lasse Collin. |
| |
| |
| Second generation |
| |
| The lzmash script was an ugly and not very secure hack. The last |
| version of LZMA Utils to use lzmash was 4.27.1. |
| |
| LZMA Utils 4.32.0beta1 introduced a new lzma command line tool written |
| by Ville Koskinen. It was written in C++, and used the encoder and |
| decoder from the C++ LZMA SDK with minor modifications. This tool |
| replaced both the lzmash script and the LZMA_Alone command line tool |
| in LZMA Utils. |
| |
| Introducing this new tool caused some temporary incompatibilities, |
| because the LZMA_Alone executable was simply named lzma, like the new |
| command line tool, but the two had completely different command line |
| interfaces. The file format was still the same. |
| |
| Lasse wrote liblzmadec, which was a small decoder-only library based on |
| the C code found in the LZMA SDK. liblzmadec had an API similar to |
| zlib, although there were some significant differences, which made it |
| non-trivial to use in some applications designed for zlib and |
| libbzip2. |
| |
| The lzmadec command line tool was converted to use liblzmadec. |
| |
| Alexandre Sauvé helped convert the build system to use GNU Autotools. |
| This made it easier to test for certain less portable features needed |
| by the new command line tool. |
| |
| Since the new command line tool never got completely finished (for |
| example, it didn't support the LZMA_OPT environment variable), the |
| intent was to not call 4.32.x stable. Similarly, liblzmadec wasn't |
| polished, but appeared to work well enough, so some people started |
| using it too. |
| |
| Because the development of the third generation of LZMA Utils was |
| delayed considerably (roughly two years), the 4.32.x branch had to be |
| kept maintained. It got some bug fixes now and then, and finally it was |
| decided to call it stable, although most of the missing features were |
| never added. |
| |
| |
| File format problems |
| |
| The file format used by LZMA_Alone was primitive. It was designed with |
| embedded systems in mind, and thus provided only a minimal set of |
| features. The two biggest problems for non-embedded use were the lack |
| of magic bytes and the lack of an integrity check. |
| |
| Igor and Lasse started developing a new file format with some help from |
| Ville Koskinen, Mark Adler and Mikko Pouru. Designing the new format |
| took quite a long time. It was mostly because Lasse was quite slow at |
| getting things done due to personal reasons. |
| |
| Near the end of the year 2007 the new format was practically finished. |
| Compared to the LZMA_Alone format and the .gz format used by gzip, the |
| new .lzma format is quite complex as a whole. This means that tools |
| having *full* support for the new format would be larger and more |
| complex than the tools supporting only the old LZMA_Alone format. |
| |
| For situations where full support for the .lzma format isn't required |
| (embedded systems, operating system kernels), the new format has a |
| well-defined subset, which is easy to support with a small amount of |
| code. Such an implementation wouldn't be as small as one using the |
| LZMA_Alone format, but the difference shouldn't be significant. |
| |
| The new .lzma format allows dividing the data into multiple |
| independent blocks, which can be compressed and uncompressed |
| independently. This makes multi-threading possible with algorithms |
| that aren't inherently parallel (such as LZMA). There's also a central |
| index of the sizes of the blocks, which makes it possible to do |
| limited random-access reading with the granularity of the block size. |
| |
| The new .lzma format uses the same filename suffix that was used for |
| LZMA_Alone files. The advantage is that users using the new tools won't |
| notice the change to the new format. The disadvantage is that the old |
| tools won't work with the new files. |
| |
| |
| Third generation |
| |
| LZMA Utils 4.42.0alphas drop the rest of the C++ LZMA SDK. The LZMA and |
| other included filters (algorithm implementations) are still directly |
| based on LZMA SDK, but ported to C. |
| |
| liblzma is now the core of LZMA Utils. It has a zlib-like API, which |
| doesn't suffer from the problems of the liblzmadec API. liblzma |
| supports not only LZMA, but several other filters, which together |
| can improve compression ratio even further with certain file types. |
| |
| The lzma and lzmadec command line tools have been rewritten. They use |
| liblzma to do the actual compressing or uncompressing. |
| |
| The development of LZMA Utils 4.42.x is still in alpha stage. Several |
| features are still missing or don't fully work yet. Documentation is |
| also very minimal. |
| |