XZ Utils
========
Important
This is a beta version. The .xz file format is now stable though,
which means that files created with the beta version will be
decompressible with all future XZ Utils versions too (assuming
that there are no catastrophic bugs).
The liblzma API is fairly stable now, although minor tweaks may still
be made if really needed. The ABI is not stable yet. The major
soname will be bumped right before the first stable release.
It will probably be bumped to something like .so.5.0.0, because
some distributions using the alpha versions already had to use
versions other than .so.0.0.0.
Excluding the Doxygen style docs in liblzma API headers, the
documentation in this package (including the rest of this
README) is not very up to date, and may contain incorrect or
misleading information.
Overview
LZMA is a general purpose compression algorithm designed by
Igor Pavlov as part of 7-Zip. It provides a high compression ratio
while keeping the decompression speed fast.
XZ Utils are an attempt to make LZMA compression easy to use
on free (as in freedom) operating systems. This is achieved by
providing tools and libraries that are used much like the
equivalents for the most popular existing compression algorithms.
XZ Utils consist of a few relatively separate parts:
* liblzma is an encoder/decoder library with support for several
filters (algorithm implementations). The primary filter is LZMA.
* libzfile (or whatever the name will be) enables reading from and
writing to gzip, bzip2 and LZMA compressed and uncompressed files
with an API similar to the standard ANSI-C file I/O.
[ NOTE: libzfile is not implemented yet. ]
* The xz command line tool has almost the same syntax as gzip
and bzip2. It makes LZMA easy for average users, but also
provides advanced options to fine-tune the compression settings.
* A few shell scripts make diffing and grepping LZMA compressed
files easy. The scripts were adapted from gzip and bzip2.
Supported platforms
XZ Utils are developed on GNU+Linux, but they should work at
least on *BSDs and Solaris. They probably work on some other
POSIX-like operating systems too.
If you use GCC to compile XZ Utils, you need at least version
3.x.x. GCC version 2.xx.x doesn't support some C99 features used
in XZ Utils source code, thus GCC 2 won't compile XZ Utils.
If you have written patches to make XZ Utils work on a previously
unsupported platform, please send the patches to me! I will consider
including them in the official version. It's nice to minimize the
need for third-party patching.
One exception: Don't request or send patches to change the whole
source package to C89. I find C99 substantially nicer to write and
maintain. However, the public library headers must be in C89 to
avoid frustrating those who maintain programs that are strictly
C89 or C++.
Platform-specific notes
On some Tru64 systems using the native C99 compiler, the configure
script may reject the compiler as a non-C99 compiler. This may happen
if there is no stdbool.h available. You can still compile XZ Utils
on such a system by passing ac_cv_prog_cc_c99= to the configure script.
Fixing this bug seems to be non-trivial, because if configure
doesn't check for stdbool.h, it runs into problems at least on
Solaris.
Version numbering
The version number of XZ Utils has absolutely nothing to do with
the version number of LZMA SDK or 7-Zip. The new version number
format of XZ Utils is X.Y.ZS:
- X is the major version. When this is incremented, the library
API and ABI break.
- Y is the minor version. It is incremented when new features are
added without breaking the existing API or ABI. An even Y indicates
a stable release and an odd Y indicates an unstable release (alpha
or beta version).
- Z is the revision. This has a different meaning for stable and
unstable releases:
* Stable: Z is incremented when bugs get fixed without adding
any new features.
* Unstable: Z is just a counter. API or ABI of features added
in earlier unstable releases having the same X.Y may break.
- S indicates stability of the release. It is missing from the
stable releases where Y is an even number. When Y is odd, S
is either "alpha" or "beta" to make it very clear that such
versions are not stable releases. The same X.Y.Z combination is
not used for more than one stability level, i.e. after X.Y.Zalpha,
the next version can be X.Y.(Z+1)beta but not X.Y.Zbeta.
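The numbering rules above can be checked mechanically. The following
is only a sketch in Python: the names XZVersion and parse_version are
hypothetical and are not part of XZ Utils. It assumes that a version
string consists of exactly X.Y.Z followed by an optional "alpha" or
"beta" suffix, as described above.

```python
import re
from typing import NamedTuple


class XZVersion(NamedTuple):
    """Hypothetical container for an X.Y.ZS version (not part of XZ Utils)."""
    major: int
    minor: int
    revision: int
    stability: str  # "alpha", "beta" or "stable"


# X.Y.Z followed by an optional stability suffix S.
_VERSION_RE = re.compile(r"(\d+)\.(\d+)\.(\d+)(alpha|beta)?")


def parse_version(text: str) -> XZVersion:
    """Parse an X.Y.ZS string and enforce the even/odd rule for Y."""
    match = _VERSION_RE.fullmatch(text)
    if match is None:
        raise ValueError(f"not an X.Y.ZS version string: {text!r}")
    major, minor, revision = (int(g) for g in match.group(1, 2, 3))
    suffix = match.group(4)
    if minor % 2 == 0:
        # Even Y: stable release, so S must be missing.
        if suffix is not None:
            raise ValueError(f"even minor version must not have a suffix: {text!r}")
        return XZVersion(major, minor, revision, "stable")
    # Odd Y: unstable release, so S must be "alpha" or "beta".
    if suffix is None:
        raise ValueError(f"odd minor version needs 'alpha' or 'beta': {text!r}")
    return XZVersion(major, minor, revision, suffix)


if __name__ == "__main__":
    print(parse_version("4.999.9beta"))
    print(parse_version("5.0.0"))
```

Because the result is a tuple starting with (X, Y, Z), two parsed
versions with the same stability can be compared directly with the
usual tuple ordering.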
configure options
If you are not familiar with `configure' scripts, read the file
INSTALL first.
In most cases, the default --enable/--disable/--with/--without options
are what you want. Don't touch them if you are unsure.
--disable-encoder
Do not compile the encoder component of liblzma. This
implies --disable-match-finders. If you need only
the decoder, you can decrease the library size
dramatically with this option.
The default is to build the encoder.
--disable-decoder
Do not compile the decoder component of liblzma.
The default is to build the decoder.
--enable-filters=
liblzma supports several filters. See liblzma-intro.txt
for a little more information about these.
The default is to build all the filters.
--enable-match-finders=
liblzma includes two categories of match finders:
hash chains and binary trees. Hash chains (hc3 and hc4)
are quite fast but they don't provide the best compression
ratio. Binary trees (bt2, bt3 and bt4) give an excellent
compression ratio, but they are slower and need more
memory than hash chains.
You need to enable at least one match finder to build the
LZMA filter encoder. Usually hash chains are used only in
the fast mode, while binary trees are used when the best
compression ratio is wanted.
The default is to build all the match finders.
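To get a feel for the trade-off between the two match finder
categories, one option is to experiment through a language binding.
The sketch below uses the lzma module from the Python standard
library, which is a binding to liblzma; this README does not cover
it, and the constant names (MF_HC4, MF_BT4, MODE_FAST, MODE_NORMAL)
are those of the Python module. It assumes that the underlying
liblzma was built with both match finders enabled. The helper
compress_with and the sample data are made up for illustration.

```python
import lzma

# Illustrative input only: repetitive text compresses very well.
data = b"XZ Utils make LZMA compression easy to use. " * 2000


def compress_with(mf, mode, payload=data):
    """Compress payload to .xz with an explicit match finder and mode."""
    filters = [{
        "id": lzma.FILTER_LZMA2,
        "preset": 6,   # start from preset 6, then override mf and mode
        "mf": mf,
        "mode": mode,
    }]
    return lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters)


if __name__ == "__main__":
    fast = compress_with(lzma.MF_HC4, lzma.MODE_FAST)    # hash chain, fast mode
    best = compress_with(lzma.MF_BT4, lzma.MODE_NORMAL)  # binary tree, normal mode
    # Both streams must decode back to the original input.
    assert lzma.decompress(fast) == data
    assert lzma.decompress(best) == data
    print("hc4/fast:  ", len(fast), "bytes")
    print("bt4/normal:", len(best), "bytes")
```

The size and speed difference depends heavily on the input, so it is
worth trying this with data that resembles the real workload.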
--enable-checks=
liblzma supports multiple integrity checks. CRC32 is
mandatory, and cannot be omitted. See liblzma-intro.txt
for more information about usage of the integrity checks.
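Whether a given liblzma build includes a particular integrity check
can be queried at run time. The sketch below again uses the lzma
module of the Python standard library (a liblzma binding that this
README does not cover). The dictionary CHECKS and the helpers
supported_checks and roundtrip are hypothetical names used only for
illustration.

```python
import lzma

# Check IDs as exposed by Python's lzma module.
CHECKS = {
    "none": lzma.CHECK_NONE,
    "crc32": lzma.CHECK_CRC32,
    "crc64": lzma.CHECK_CRC64,
    "sha256": lzma.CHECK_SHA256,
}


def supported_checks():
    """Map each check name to whether the linked liblzma supports it."""
    return {name: lzma.is_check_supported(cid) for name, cid in CHECKS.items()}


def roundtrip(payload, check_id):
    """Compress with the given check; return (decoded data, check ID seen)."""
    blob = lzma.compress(payload, format=lzma.FORMAT_XZ, check=check_id)
    decoder = lzma.LZMADecompressor()
    decoded = decoder.decompress(blob)
    # decoder.check reports the integrity check stored in the stream.
    return decoded, decoder.check


if __name__ == "__main__":
    print(supported_checks())
    decoded, used = roundtrip(b"integrity check demo " * 100, lzma.CHECK_CRC32)
    print(len(decoded), "bytes decoded, check ID", used)
```

CRC32 should always be reported as supported, which matches the
statement above that it cannot be omitted.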
--disable-assembler
liblzma includes some assembler optimizations. Currently
there is only assembler code for CRC32 and CRC64 for
32-bit x86.
All the assembler code in liblzma is position-independent
code, which is suitable for use in shared libraries and
position-independent executables. So far only i386
instructions are used, but the code is optimized for i686
class CPUs. If you are compiling liblzma exclusively for
pre-i686 systems, you may want to disable the assembler
code.
--enable-small
Omits precomputed tables. This makes liblzma a few KiB
smaller. Startup time increases, because the tables need
to be computed first.
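The trade-off is easiest to see with a CRC32 lookup table. This
README does not say which tables are omitted, so the CRC32 table is
used here only as an assumed, illustrative case, and the Python code
below is a sketch rather than what liblzma does internally. A build
with precomputed tables would ship the 256 entries as constants; a
small build would run something like make_crc32_table once at startup.

```python
import zlib

# Reflected form of the IEEE 802.3 CRC32 polynomial.
CRC32_POLY = 0xEDB88320


def make_crc32_table():
    """Compute the 256-entry byte-wise CRC32 lookup table."""
    table = []
    for byte in range(256):
        crc = byte
        for _ in range(8):
            # Shift out one bit; apply the polynomial if that bit was set.
            crc = (crc >> 1) ^ CRC32_POLY if crc & 1 else crc >> 1
        table.append(crc)
    return table


def crc32(data, table):
    """Table-driven CRC32 over data, one byte per step."""
    crc = 0xFFFFFFFF
    for b in data:
        crc = table[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


if __name__ == "__main__":
    table = make_crc32_table()  # the startup cost paid by a "small" build
    sample = b"123456789"
    # Cross-check against zlib's CRC32, which uses the same polynomial.
    assert crc32(sample, table) == zlib.crc32(sample)
    print(hex(crc32(sample, table)))
```

The table takes 256 entries of 32 bits, that is 1 KiB, which gives a
rough idea of the kind of size saving a single table represents.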
--enable-debug
This enables the assert() macro and possibly some other
run-time consistency checks. It slows down things somewhat,
so you normally don't want to have this enabled.
--enable-werror
Turns all compiler warnings into errors, which abort the
compilation. This may help catch bugs, and should work
on most systems. This has no effect on the resulting
binaries.
Static vs. dynamic linking of the command line tools
By default, the command line tools are linked statically against
liblzma. There are a few reasons:
- The executable(s) can be in /bin while the shared liblzma can still
be in /usr/lib (if the distro uses such file system hierarchy).
- It's easier to copy the executables to other systems, since they
depend only on libc.
- It's slightly faster on some architectures like x86.
If you don't like this, you can get the command line tools linked
against the shared liblzma by passing --disable-static to configure.
This disables building the static liblzma completely.