Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: comp.std.unix
Subject: tar vs. cpio
Message-ID: <8188@ut-sally.UUCP>
Date: Mon, 1-Jun-87 22:25:08 EDT
Article-I.D.: ut-sally.8188
Posted: Mon Jun 1 22:25:08 1987
Date-Received: Wed, 3-Jun-87 04:11:49 EDT
Reply-To: std-u...@sally.utexas.edu
Lines: 309
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
Included below is a draft proposal for IEEE P1003.1 regarding
the recently raised issue of Archive/Data Interchange Format.
I will deliver a proposal resembling it to P1003.1 at their
next meeting, which is three weeks from today, in Seattle.
Note two things: this is a proposal for P1003.1, not P1003.2,
or any other group; if you disagree with my conclusions, you
can submit your own proposal-- the address is below.
If you agree with my approach but think it needs adjusting,
you can send me mail or submit articles. If you disagree, you
can also do those things.
tar vs. cpio IEEE P1003.1 N.___
1 June 1987
John S. Quarterman
Institutional Representative from USENIX
usenix!jsq
Secretary, IEEE Standards Board
Attention: P1003 Working Group
345 East 47th St.
New York, NY 10017
In both the Trial Use Standard and Draft 10, POSIX sS10.1
describes a data interchange format based on the tar
program. That section has appeared in every draft of IEEE
1003.1 in some form and has always been based on tar format.
The P1003.1 Working Group has recently received two related
proposals regarding that section: one to add cpio format
(including old-style, non-ASCII (non c option) format);
<N.048 Lorraine C. Kevra> <V11N14> <V11N25 Eric S. Raymond>
the other to replace the existing tar-based format with cpio
format. <N.043 X/OPEN> <V11N13> Some clarifications were
received to the former. <N.064 Dominic Dunlop> <V11N15> It
was also proposed verbally in the latest Working Group
meeting to drop sS10.1 altogether and let P1003.2 handle the
issue. <V11N08> <V11N11> <V11N09 Guy Harris> <V11N12 Doug
Gwyn>
The present note is a response to those proposals. Much of
the detail in it is derived from articles posted in the
USENET newsgroup comp.std.unix. Those articles are
referenced with this format: <V11N09 Guy Harris> which gives
the volume (11) and number of the article, and the name of
the submittor. If no submittor name is given, the posting
was by the moderator, John S. Quarterman. Thanks to those
who submitted articles. However, the content of this note
is solely the responsibility of the author.
There are a number of problems with both cpio formats.
First, those related to the non-ASCII format:
1. Numerous parameters, including inode numbers, mode
bits, and user and group IDs, are kept in two-byte
binary integers. This has historically produced
serious byte-order problems when data is moved among
systems with different byte orders. <V11N09 Guy
Harris>
2. The byte-swapping and word-swapping options to the
cpio program are inadequate patches; with an ASCII
format the problem would not be present. The options
are not consistent across versions of the program: in
System III, data blocks and file names are byte
swapped; in System V, only data blocks are byte
swapped. <V11N09 Guy Harris>
3. The two-byte integer format limits the range of inode
numbers to 1..65535. Many current file systems are
bigger than that. <V11N37 Paul Eggert> <V11N39 Henry
Spencer>
Non-ASCII cpio format is clearly not portable and should not
even be considered for standardization. <V11N12 Doug Gwyn>
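
A minimal sketch of the byte-order and range problems listed above, assuming a header field that stores an inode number as a two-byte binary integer. The helper names are hypothetical, and Python's struct module stands in for the native representation of the writing and reading machines.

```python
import struct

def pack_u16(value, order):
    """Pack value as a two-byte unsigned integer; order is "<" (little) or ">" (big)."""
    return struct.pack(order + "H", value)

def unpack_u16(raw, order):
    """Unpack a two-byte unsigned integer using the given byte order."""
    return struct.unpack(order + "H", raw)[0]

inode = 258                          # 0x0102
raw = pack_u16(inode, "<")           # as written on a little-endian machine
misread = unpack_u16(raw, ">")       # as read on a big-endian machine: 0x0201

# An ASCII field carries the digits themselves, so byte order does not matter.
ascii_field = format(inode, "06o")   # six octal digits
recovered = int(ascii_field, 8)

# A two-byte field cannot hold larger inode numbers; only the low 16 bits survive.
truncated = 70000 & 0xFFFF
```

Here the value written as 258 is read back as 513 when the byte orders differ, while the octal text round-trips unchanged on either machine.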
There are several problems that occur even with the ASCII
cpio format:
1. Many implementations of cpio only look at the lower 16
(or even 15) bits of the inode number, even in ASCII
format. <V11N39 Henry Spencer> This is because the
variable that is used to contain the value is declared
to be unsigned short, just as in binary format. Thus,
even though ASCII cpio format does not constrain this
number, it is still less than portable. <V11N37 Paul
Eggert>
2. The proposed cpio ASCII format as specified, <N.048
Lorraine C. Kevra> <V11N14> is not portable because
the proposal assumes that sizeof(int) == sizeof(long).
<N.064 Dominic Dunlop> <V11N15>
3. The file type is written in a numerical format, making it
UNIX specific rather than POSIX specific, since POSIX
(and tar) specifies symbolic, rather than numerical,
values for file types. <V11N09 Guy Harris>
4. Hard links are not handled well, since cpio format
does not record that two files are linked. If two
files that are linked are written in cpio format, two
copies will be written. There is an option to the
cpio program to detect duplicate files by matching
pairs of (h_dev, h_ino) and producing links, but that
is done after the fact. <V11N09 Guy Harris> (There is
a program, afio, that handles cpio format more
efficiently in this and other cases than the licensed
versions of the program.) <V11N21 Chuck Forsberg>
5. Symbolic links are not handled at all, and no type
value is reserved for them. This makes cpio useless
on a large class of historical implementations (those
based on 4.2BSD or its file system) for one of the
main purposes of POSIX sS10.1: archiving files for
later retrieval and use on the same system.
6. The cpio format is less common than tar format: there
are few historical implementations from Version 7 on
that do not have tar; there are many that do not have
cpio. <V11N09 Guy Harris> <V11N10 Charles Hedrick>
<V11N24 Jim Cottrell> It is true that cpio (non-ASCII
format) was invented before tar, <V11N22 Joseph S. D.
Yao> apparently in PWB System 1.0. <V11N26 Joseph S.
D. Yao> However, cpio was not available outside AT&T
before the release of System III, while tar was in
wide use with Version 7 and is still much more common.
Also, it appears that the cpio format of PWB was not
the same as that of System III. <V11N39 Henry
Spencer> Although System III and perhaps early
releases of System V did not include tar, <V11N26
Joseph S. D. Yao> current releases of System V do.
7. It is very late in the process to propose that P1003.1
adopt cpio format now, especially considering that it
was originally proposed to and rejected by the
/usr/group committee before P1003.1 was even formed.
<V11N39 Henry Spencer>
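
Items 1 and 4 above interact, which the following sketch tries to show. It assumes an extractor that recognizes hard links only by matching (h_dev, h_ino) pairs, and optionally masks the inode number to 16 bits the way an unsigned short variable would. The function names and the sample entries are hypothetical.

```python
def ascii_ino_field(ino):
    """Format an inode number as six ASCII octal digits."""
    if ino > 0o777777:
        raise ValueError("inode number does not fit in six octal digits")
    return format(ino, "06o")

def find_links(entries, ino_bits=None):
    """Return (name, earlier_name) pairs treated as hard links.

    entries is a sequence of (name, dev, ino) tuples in archive order.
    If ino_bits is given, inode numbers are masked to that many bits first.
    """
    seen = {}
    links = []
    for name, dev, ino in entries:
        if ino_bits is not None:
            ino &= (1 << ino_bits) - 1
        key = (dev, ino)
        if key in seen:
            links.append((name, seen[key]))
        else:
            seen[key] = name
    return links

# "a" and "b" really are linked; "c" is an unrelated file whose inode
# number happens to share its low 16 bits with theirs.
entries = [("a", 1, 5), ("b", 1, 5), ("c", 1, 65541)]
true_links = find_links(entries)
short_links = find_links(entries, ino_bits=16)
```

With full inode numbers only "b" is linked to "a". With the number masked to 16 bits, "c" is also reported as a link to "a", so an unrelated file would be replaced by a link on extraction.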
There are several advantages to the current tar-based format
as specified in sS10.1:
1. There are no byte- or word-swapping issues caused by
the format, since all the header values are ASCII byte
streams. <V11N17 John Gilmore>
2. There are no inode numbers recorded, and file types
are kept in symbolic form, so the format is less
implementation-specific than cpio format. <V11N17
John Gilmore>
3. Historical tar format is the most widely used, as
discussed in 6. above, despite apparent assertions to
the contrary. <N.043 X/OPEN> <V11N13>
4. The format specified in sS10.1 is upward-compatible
with tar format. Old tar archives can be extracted by
a program that implements sS10.1. Archives using some
of the extensions of sS10.1 can be extracted with old
(Version 7) tar programs, although symbolic links will
not be extracted and contiguous files will not be
handled properly (cpio does not handle these
capabilities at all). Files with very long names will
not be handled properly (cpio does no better at this).
All tar implementations are compatible to this extent.
<V11N17 John Gilmore>
5. The /usr/group working group and P1003.1 have already
done the work <P.061> <M.019 5.1.121 Pg.13> <RFC.003
#121> <P.038> <P.006> required to add optional
extensions (such as symbolic links, contiguous files,
and long file names) that are needed on many
historical implementations and that cpio format lacks.
6. The format is extensible for future facilities.
<V11N39 Henry Spencer>
7. There is a public domain implementation of the format
of sS10.1. That implementation provided feedback which
led to improvements in the current specification, and
has been in use for years in transferring data with
licensed tar implementations. <V11N17 John Gilmore>
8. Many people prefer the user interface of the cpio
program to that of the tar program, because the former
can accept a list of pathnames to archive on standard
input while the latter takes them as arguments,
limiting the length of the list. <V11N34 Andrew
Tannenbaum> However, the above-mentioned public domain
implementation of tar accepts pathnames on standard
input. <V11N17 John Gilmore> <V11N19 Jim Cottrell>
Diffs to standard tar to add an option to accept
pathnames on standard input when creating an archive
have also been posted to USENET. <V11N36 John
Gilmore> The user interface is, in any case,
irrelevant to P1003.1. <V11N39 Henry Spencer> <V11N40
Rahul Dhesi>
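
As an illustration of advantages 1 and 2, the sketch below writes one file to an in-memory archive with Python's tarfile module and decodes a few header fields by hand. The field offsets are those of the tar-based format as later published (ustar), which is an assumption here; the Draft 10 layout under discussion may differ in detail. The helper names are hypothetical.

```python
import io
import tarfile

def first_header(name, data):
    """Write one member to an in-memory archive and return its 512-byte header."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        info.mode = 0o644
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()[:512]

def octal_field(header, start, length):
    """Decode an ASCII octal field terminated by NUL or space."""
    return int(header[start:start + length].strip(b" \0").decode("ascii"), 8)

header = first_header("hello.txt", b"hello\n")
name = header[0:100].rstrip(b"\0").decode("ascii")   # name, 100 bytes
mode = octal_field(header, 100, 8)                   # mode, ASCII octal
size = octal_field(header, 124, 12)                  # size, ASCII octal
typeflag = header[156:157]                           # file type as a character
magic = header[257:263]                              # format identifier
```

Every numeric field is a string of ASCII octal digits and the file type is a single character, so nothing in the header depends on the byte order or integer sizes of the writing machine.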
There are some problems that neither tar nor cpio handles
well.
1. An option to prevent crossing mount points would be
useful for backups. <V11N19 Jim Cottrell> <V11N22
Joseph S. D. Yao> However, this appears to be more of
an implementation issue than a format issue, <V11N28
Dave Brower> <V11N32 Joseph S. D. Yao> especially
considering that there are options to find in 4.2BSD,
<V11N24 Jim Cottrell> SunOS 3.2, <V11N36 John Gilmore>
and System V Release 3.0 <V11N35 Mike Akre> that take
care of this.
2. The default block size in many tar implementations is
too large for some tape controllers to read <V11N27
Rob Lake> (the 3B20 has this problem). This is not a
problem with the interchange format, however.
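
To illustrate why item 1 is an implementation issue rather than a format issue, here is a rough sketch of how an archiver could avoid crossing mount points while choosing files, without any change to the archive format. It assumes that a mount point can be recognized by a change in st_dev; the function names are hypothetical.

```python
import os
import tempfile

def files_on_same_filesystem(root):
    """Yield file paths under root without descending into other file systems."""
    root_dev = os.lstat(root).st_dev
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending into
        # directories that live on a different device.
        dirnames[:] = [d for d in dirnames
                       if os.lstat(os.path.join(dirpath, d)).st_dev == root_dev]
        for filename in filenames:
            yield os.path.join(dirpath, filename)

def demo():
    """Build a small tree and list it; no mount points, so every file is found."""
    with tempfile.TemporaryDirectory() as root:
        os.mkdir(os.path.join(root, "sub"))
        for rel in ("a.txt", os.path.join("sub", "b.txt")):
            with open(os.path.join(root, rel), "w") as handle:
                handle.write("data\n")
        return sorted(os.path.relpath(path, root)
                      for path in files_on_same_filesystem(root))
```

The resulting list of pathnames could be handed to either archiver, which is why the choice between tar and cpio format does not affect this capability.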
There is nothing that the proposed cpio can handle that the
tar-based format already in POSIX sS10.1 cannot handle; in
fact, the former is less capable. If cpio format were
augmented to handle missing capabilities, it would be
subject to the same objections now aimed at the format given
in sS10.1: that it was not identical with an existing format.
There is no advantage in replacing the current tar-based
format of sS10.1 with cpio format. There is also no
advantage in adding cpio format, because two standards are
not as good as a single standard.
Some have recommended removing sS10.1 from POSIX altogether,
<V11N12 Doug Gwyn> perhaps with a recommendation for P1003.2
to pick up the idea. <V11N09 Guy Harris> While I believe
that that would be preferable to adding cpio format, whether
or not tar format remains, I recommend leaving sS10.1 as it
is, because
o The inclusion of an archive/interchange file format is
in agreement with the purpose of POSIX to promote
portability of application programs across interface
implementations. Some format will be used. It is to
the advantage of the users of the standard for there to
be a standard format.
o The de facto standard is tar format. The current sS10.1
standardizes that, and provides upward-compatible
extensions in areas that were previously lacking.
The Archive/Interchange File Format should be left as it is.
Thank you,
John S. Quarterman
Volume-Number: Volume 11, Number 41
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!cbosgd!ucbvax!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8199@ut-sally.UUCP>
Date: Tue, 2-Jun-87 14:35:19 EDT
Article-I.D.: ut-sally.8199
Posted: Tue Jun 2 14:35:19 1987
Date-Received: Sat, 6-Jun-87 09:44:37 EDT
Sender: std-u...@ut-sally.UUCP
Reply-To: g...@sun.com (Guy Harris)
Lines: 53
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
From: g...@sun.com (Guy Harris)
I agree with the proposal; these are just some nits.
> ...meeting to drop sS10.1 altogether...
The sequence "s^HS" appears here, and in several other places - is
this intentional or a bizarre result from "nroff"?
[ It's nroff's attempt to produce a section sign. The actual note
will be formatted with troff, which can handle it. I will incorporate
your other comments. -mod ]
> 4. Hard links are not handled well, since cpio format
> does not record that two files are linked. If two
> files that are linked are written in cpio format, two
> copies will be written. There is an option to the
> cpio program to detect duplicate files by matching
> pairs of (h_dev, h_ino) and producing links, but that
> is done after the fact.
Actually, this is the standard way "cpio" handles hard links; it's
not an option.
> 5. Symbolic links are not handled at all, and no type
> value is reserved for them. This makes cpio useless
> on a large class of historical implementations (those
> based on 4.2BSD or its file system) for one of the
> main purposes of POSIX sS10.1: archiving files for
> later retrieval and use on the same system.
(Another s^HS here) It is possible to extend this format to handle
symbolic links; we have done this.
[ But remember that what was proposed to P1003.1 was existing System V
cpio format. -mod ]
> ...However, cpio was not available outside AT&T
> before the release of System III, while tar was in
> wide use with Version 7 and is still much more common.
Actually, the old "cpio" was available with PWB/UNIX 1.0, which AT&T
did release.
> Also, it appears that the cpio format of PWB was not
> the same as that of System III. <V11N39 Henry
> Spencer> Although System III and perhaps early
> releases of System V did not include tar, <V11N26
> Joseph S. D. Yao> current releases of System V do.
No, System III and all releases of S5 included "tar".
Volume-Number: Volume 11, Number 45
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: MIKEMAC%UNBMVS1.BIT...@wiscvm.wisc.edu (Michael MacDonald)
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8208@ut-sally.UUCP>
Date: Fri, 5-Jun-87 14:04:28 EDT
Article-I.D.: ut-sally.8208
Posted: Fri Jun 5 14:04:28 1987
Date-Received: Wed, 10-Jun-87 00:47:21 EDT
References: <8188@ut-sally.UUCP>
Sender: std-u...@ut-sally.UUCP
Reply-To: MIKEMAC%UNBMVS1.BIT...@wiscvm.wisc.edu
Lines: 77
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
From: MIKEMAC%UNBMVS1.BIT...@wiscvm.wisc.edu (Michael MacDonald)
I have just finished working on a CPIO tape reader, and approximately one
year ago wrote a TAR tape reader, for our IBM3090 180/vf running MVS/XA.
The following comments may be of interest as they come from a slightly
different point of view. I do not have significant *ix experience and
the following comments come as a result of trying to pick apart these tapes
when they are used for data interchange.
TAR and CPIO are *used* for purposes of backup AND data interchange.
TAR Format comments.
1) Data is written as blocks of 512 bytes. This allows for faster
processing and this is important for BIG files.
[ Most implementations allow using tape blocks larger than that. -mod ]
2) There is room left in the header. This allows for customization
by a site while still allowing other sites to read the tape
without using the customized version (if they do it right).
3) The length of the NAME and the LINKNAME field is not enough.
Extending the length to 256 would extend the header to 2 blocks
but I think that extending the length outweighs the disadvantages.
[ In addition to
#define NAMSIZ 100
char name[NAMSIZ];
POSIX Section 10.1 also has
#define PFXSIZ 155
char prefix[PFXSIZ];
which is used when name isn't big enough. The total of the two is set
to match the minimum permissible value of PATH_MAX. -mod ]
4) All of the tape drives that I have worked with (not that many)
are capable of writing a short block. If TAR would recognize
a physical end of file rather than two blocks of hex 00's,
this would solve a number of problems with TAR.
5) Limited amount of Unix dependent information in the header.
If a *backup* system is used for data interchange, is it really
necessary to add many Operating System dependent features?
Are the advantages gained by using these dependencies *really*
advantages even in a backup system?
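
The moderator's note under item 3 describes two fields, name (100 bytes) and prefix (155 bytes). A minimal sketch of how a long pathname might be divided between them follows. Splitting at a slash, which is then implied rather than stored, is how the published form of this format works and is an assumption with respect to the draft text, which is not quoted here. The function name is hypothetical.

```python
NAMSIZ = 100   # size of the name field
PFXSIZ = 155   # size of the prefix field

def split_path(path):
    """Return (prefix, name) so that prefix + "/" + name == path.

    Short paths go entirely in name, with an empty prefix.
    """
    if len(path) <= NAMSIZ:
        return "", path
    for i, ch in enumerate(path):
        name_len = len(path) - i - 1
        # Split at the first slash that leaves both halves within their fields.
        if ch == "/" and 0 < i <= PFXSIZ and 0 < name_len <= NAMSIZ:
            return path[:i], path[i + 1:]
    raise ValueError("path does not fit in the name and prefix fields")

long_path = "a" * 150 + "/" + "b" * 50
prefix, name = split_path(long_path)
```

A 201-character path such as the one above fits, whereas a 101-character path with no slash in it does not, because there is no place to split it.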
CPIO Format comments.
1) Data is not block oriented. This slows down processing
considerably.
2) There is no room left in the header. No customization
possible (without also sending the customized program).
3) Is 128 that much better than 100? See TAR note 3.
4) The CPIO end of file mark (TRAILER!!!): why not a physical EOF?
See TAR note 4.
5) When it comes to OS dependent information the CPIO header is
full of it.
6) After writing the CPIO tape reader I came across a ?serious?
problem. The following note is from the unix manual page cpio(4):
The h_name field is "h_namesize rounded to word" long. The
header must begin on a word boundary (although not documented).
The wordsize of the machine is not a CPIO option (as far as I can
tell). This means CPIO tapes cannot be read on a machine with
a different wordsize. I question if this "feature" should be
standardized without at least a wordsize option.
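
The concern in item 6 can be made concrete with a little arithmetic. The sketch below assumes a 26-byte binary header (thirteen two-byte fields) and a two-byte "word"; both figures are assumptions, not taken from the manual page text quoted above. The function names are hypothetical.

```python
HEADER_LEN = 26   # assumption: thirteen two-byte header fields
WORD = 2          # assumption: "word" means two bytes

def round_to_word(n, word=WORD):
    """Round n up to the next multiple of word."""
    return (n + word - 1) // word * word

def next_header_offset(offset, namesize, filesize, word=WORD):
    """Offset of the following header, given where this header starts."""
    pos = offset + HEADER_LEN + round_to_word(namesize, word)
    return pos + round_to_word(filesize, word)

# A member with a 5-byte name field and 3 bytes of data.
as_written = next_header_offset(0, 5, 3)           # writer rounds to 2 bytes
as_misread = next_header_offset(0, 5, 3, word=4)   # reader assumes 4 bytes
```

If the reader and writer disagree about the rounding unit, they disagree about where the next header starts (36 versus 38 here), which is the failure described. If the unit is fixed at two bytes regardless of the machine, the problem does not arise.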
Michael MacDonald
Software Specialist, School of Computer Science
University of New Brunswick
P.O. Box 4400
Fredericton, New Brunswick
CANADA E3B 5A3
(506) 453-4566
Netnorth/BITNET: MIKEMAC@UNB
Disclaimer: The opinions stated are mine, no one likes them around here either.
Volume-Number: Volume 11, Number 50
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8209@ut-sally.UUCP>
Date: Fri, 5-Jun-87 17:25:10 EDT
Article-I.D.: ut-sally.8209
Posted: Fri Jun 5 17:25:10 1987
Date-Received: Wed, 10-Jun-87 00:47:45 EDT
References: <8188@ut-sally.UUCP>
Reply-To: bst...@gorgo.uucp
Lines: 5
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
Since the tar and cpio comments keep coming in, I will let them collect
(posting them meanwhile) until about 16 June, after which I will incorporate
them into the note I posted previously and deliver same to IEEE P1003.1.
Volume-Number: Volume 11, Number 51
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8248@ut-sally.UUCP>
Date: Sun, 7-Jun-87 13:17:33 EDT
Article-I.D.: ut-sally.8248
Posted: Sun Jun 7 13:17:33 1987
Date-Received: Sun, 14-Jun-87 18:40:33 EDT
References: <8188@ut-sally.UUCP>
Sender: std-u...@ut-sally.UUCP
Reply-To: bi...@dvlmarv.uucp (Bill Jones)
Organization: Develcon Electronics, Toronto
Lines: 19
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
From: bi...@dvlmarv.uucp (Bill Jones)
In article <8...@ut-sally.UUCP> you write:
> However, cpio was not available outside AT&T
> before the release of System III, while tar was in
> wide use with Version 7 and is still much more common.
My memory is fuzzy now, but I recall cpio having been distributed on
the V7 addendum tape, whose other contents were (I think) fsck, the
line printer driver, and a c2 cured of certain overoptimism. (This is
a nit picked for historical accuracy only: I believed then, and still
do, that tar is the better format. I'm not even keen that this should be
posted, especially if you cannot verify the assertion.)
[ Can anybody verify this? -mod ]
--
Bill Jones, Develcon Electronics, 856 51 St E, Saskatoon S7K 5C7 Canada
uucp: ...ihnp4!sask!zaphod!billj phone: +1 306 931 1504
Volume-Number: Volume 11, Number 53
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8251@ut-sally.UUCP>
Date: Thu, 11-Jun-87 09:28:37 EDT
Article-I.D.: ut-sally.8251
Posted: Thu Jun 11 09:28:37 1987
Date-Received: Sun, 14-Jun-87 18:41:15 EDT
References: <8188@ut-sally.UUCP> <8208@ut-sally.UUCP>
Sender: std-u...@ut-sally.UUCP
Reply-To: david...@steinmetz.uucp (William E. Davidsen Jr)
Organization: General Electric CRD, Schenectady, NY
Lines: 62
Keywords: tar cpio
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
From: david...@steinmetz.uucp (William E. Davidsen Jr)
In article <8...@ut-sally.UUCP>:
>From: MIKEMAC%UNBMVS1.BIT...@wiscvm.wisc.edu (Michael MacDonald)
> TAR Format comments.
I realize this hasn't stopped some people, but I will pass on tar
comments because I'm not an expert.
>
> CPIO Format comments.
> 1) Data is not block oriented. This slows down processing
> considerably.
I miss this one. It may slow things under MVS, but there's no reason
why reading less physical data should slow things down. Quite the
opposite.
> 2) There is no room left in the header. No customization
> possible (without also sending the customized program).
This is a major advantage. Save us from "custom standard" format. The
custom stuff belongs in the *file*, not the format (in my opinion).
> 3) Is 128 that much better than 100? See TAR note 3.
Although I've never been bitten by this, it could be a problem. I'm not
sure that it justifies scrapping a format which is widely used. cpio
does allow dumping from a relative directory if you have a system with
pathnames longer than the files.
> 4) The CPIO end of file mark (TRAILER!!!) why not a physical EOF
> See TAR note 4.
cpio will run nicely to other media such as floppy disks and/or
removable disk packs. Most device drivers don't support any EOF on
these other than the physical size of the media. You can also have
multiple cpio dumps on a single file, although this is most useful when
doing incremental backups.
> 5) When it comes to OS dependent information the CPIO header is
> full of it.
We *are* talking about a U*IX standard here. For data interchange
between unlike systems we have the ANSI standard for tapes, which has
been around since at least 1975 because I wrote a driver for it on a
custom o/s. In FORTRAN. Yes, barf!
[ See comments in previous article about what IEEE 1003.1 is. -mod ]
> 6) After writing the CPIO tape reader I came across a ?serious?
> problem. (The following note is from the unix manual page cpio(4)
> The h_name field is "h_namesize rounded to word" long. The
> header must begin on a word boundary (although not documented).
> The wordsize of the machine is not a CPIO option (as far as I can
> tell). This means CPIO tapes cannot be read on a machine with
> a different wordsize. I question if this "feature" should be
> standardized without at least a wordsize option.
I confess I don't understand the wording here, but cpio is *not*
limited in this way as far as I can tell. I routinely transfer files
from Xenix (16 bit) to PC/IX (16 bit), to VAX, Sun3, and unix-pc (all
32 bit), and from time to time Cray2 (64 bit). It all works, so I think
the wording is at fault here, not the method.
--
bill davidsen (w...@ge-crd.arpa)
{chinet | philabs | sesimo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me
Volume-Number: Volume 11, Number 56
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!cbosgd!ihnp4!ptsfa!ames!rutgers!
husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8262@ut-sally.UUCP>
Date: Sun, 14-Jun-87 16:09:29 EDT
Article-I.D.: ut-sally.8262
Posted: Sun Jun 14 16:09:29 1987
Date-Received: Tue, 16-Jun-87 01:16:42 EDT
Sender: std-u...@ut-sally.UUCP
Reply-To: g...@sun.com (Guy Harris)
Lines: 17
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
From: g...@sun.com (Guy Harris)
> My memory is fuzzy now, but I recall cpio having been distributed on
> the V7 addendum tape, whose other contents were (I think) fsck, the
> line printer driver, and a c2 cured of certain overoptimism. ...
> [ Can anybody verify this? -mod ]
No, but I think I can refute it with a reasonable degree of accuracy.
It's been a while since I've seen the V7 addendum tape, but I don't
remember "cpio" being on it. (There were other things, like a
beefed-up F77, some fixes to "fgrep", and a newer version of "awk".
The versions that came with various 4BSD releases seemed to be the V7
addendum tape versions; "cpio" didn't come with any 4BSD release,
which suggests that "cpio" wasn't on the V7 addendum tape, although
it doesn't indicate it for sure.)
Volume-Number: Volume 11, Number 62
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!utgpu!water!watmath!clyde!rutgers!husc6!ut-sally!std-unix
From: std-u...@ut-sally.UUCP
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <8272@ut-sally.UUCP>
Date: Mon, 15-Jun-87 12:24:13 EDT
Article-I.D.: ut-sally.8272
Posted: Mon Jun 15 12:24:13 1987
Date-Received: Thu, 18-Jun-87 00:46:05 EDT
References: <8188@ut-sally.UUCP> <8248@ut-sally.UUCP>
Sender: std-u...@ut-sally.UUCP
Reply-To: r...@seismo.CSS.GOV (Rick Adams)
Organization: Center for Seismic Studies, Arlington, VA
Lines: 96
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
Summary: cpio was definitely NOT on the v7 addendum tape
From: r...@seismo.CSS.GOV (Rick Adams)
This is the README file off of the V7 addendum tape.
cpio is clearly not on the tape. Note that the addendum tape
is in TAR format! (That should say something...)
---rick
--------------BEGIN README from V7 addendum---------------------
Addenda to UNIX 7th edition distribution tape, 12/2/80.
Format: tar(1), 800 bpi.
Contents:
README: this descriptive file.
lp.c: the missing line printer driver that
belongs in /usr/sys/dev/lp.c.
The program comes from PWB, and needs minor
changes to work in version 7; see comment
at head of program.
lpr: a directory containing the lpr utility and
its daemon lpd.
See lpr/makefile for instructions on putting it
together.
lpd.8: the manual section for the line printer daemon.
fgrep.c: new source for fgrep(1) corrects certain
troubles with keys with common prefixes
c2: directory containing C optimizer cured of
certain instances of overoptimism.
The existing C makefile works
awk: directory with complete new awk processor,
see README and makefile therein
tmac.r: macros to simulate old "roff" in "nroff",
to support -mr option mentioned in roff(1)
f77: directory with complete new fortran compiler,
contains makefiles.
Further improvements to the I/O library have
been made at UC Berkeley, and may be obtainable
from them.
malloc.c: new source for malloc(3) corrects rare bug
dev: directory with more robust mag tape drivers for /usr/sys/dev
fsck: directory with new, stringent file system checking
program and manual section, far superior to old
[ind]check. It checks some data not maintained
by v7, in particular superblock counts; resulting
complaints are harmless
Other bug fixes:
/usr/sys/h/param.h: CMAPSIZ and SMAPSIZ
should both be defined as (NPROC/2)
otherwise trouble will occur with very large
memories
/usr/sys/conf/low.s: replace br7+7. with br7+10.
/usr/src/cmd/sed/sed0.c: delete continue after
case '\0' in compile()
/usr/src/cmd/cu.c: args 1 and 2 of some ioctl calls
may be interchanged
a ~ may be lacking from references to ECHO or CRMOD
in case (f == 1) of mode(f)
The following bugs exist, no fix is included.
(1) adb does not report floating registers correctly
(2) ldiv, lmod fail with largest negative dividend
(these implement division of longs in C);
the division (unsigned)32768/1 also fails
(3) dump(1) maintains ddate incorrectly.
This bug is relatively innocuous; it causes
more dumping than necessary on some occasions.
(4) join(1) treats null keys as end of file
(5) sort -t includes the following tab in some field comparisons
(6) hs(4) is irrevocably lost
(7) exec writes arguments into swap space with buffered
I/O, which may happen physically much later, after
the space has been used for a core image. The
solution is to preallocate
a portion of swap space to this single purpose.
(8) break is turned into a DEL regardless
of what the current interrupt character is
(?) and others, see warranty
Volume-Number: Volume 11, Number 65
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!seismo!ut-sally!std-unix
From: std-u...@ut-sally.UUCP (Moderator, John Quarterman)
Newsgroups: comp.std.unix
Subject: tar vs. cpio
Message-ID: <8280@ut-sally.UUCP>
Date: Wed, 17-Jun-87 10:22:34 EDT
Article-I.D.: ut-sally.8280
Posted: Wed Jun 17 10:22:34 1987
Date-Received: Sun, 21-Jun-87 09:29:50 EDT
Reply-To: std-u...@sally.utexas.edu
Lines: 361
Approved: j...@sally.utexas.edu (Moderator, John Quarterman)
Yesterday, 16 June, was the date I had set for collecting
tar and cpio comments. Included below is the revised note
for P1003.1, incorporating those comments. I will deliver it
to P1003.1 in Seattle Monday.
tar vs. cpio IEEE P1003.1 N.___
17 June 1987
John S. Quarterman
Institutional Representative from USENIX
usenix!jsq
Secretary, IEEE Standards Board
Attention: P1003 Working Group
345 East 47th St.
New York, NY 10017
In both the Trial Use Standard and the current Draft 10,
POSIX sS10.1 describes a data interchange format based on the
tar program. That section has appeared in every draft of
IEEE 1003.1 in some form and has always been based on tar
format. The P1003.1 Working Group has recently received two
related proposals regarding that section: one to add cpio
format (including old-style, non-ASCII (non c option)
format); <N.048 Lorraine C. Kevra> <V11N14> <V11N25 Eric S.
Raymond> the other to replace the existing tar-based format
with cpio format. <N.043 X/OPEN> <V11N13> Some
clarifications were received to the former. <N.064 Dominic
Dunlop> <V11N15> It was also proposed verbally in the latest
Working Group meeting to drop sS10.1 altogether and let
P1003.2 handle the issue. <V11N08> <V11N11> <V11N09 Guy
Harris> <V11N12 Doug Gwyn>
The present note is a response to those proposals. Much of
the detail in it is derived from articles posted in the
USENET newsgroup comp.std.unix. Those articles are
referenced with this format: <V11N09 Guy Harris> which gives
the volume (always 11) and number of the article, and the
name of the submittor. If no submittor name is given, the
posting was by the moderator, John S. Quarterman. Thanks to
those who submitted articles. However, the content of this
note is solely the responsibility of the author.
This note is addressed to P1003.1, and is concerned with
data interchange formats. Although user interface issues
may be of interest to P1003.2, they are not addressed here.
There are a number of problems with both cpio formats.
First, those related to the non-ASCII format:
1. Numerous parameters, including inode numbers, mode
bits, and user and group IDs, are kept in two-byte
binary integers. This has historically produced
serious byte-order problems when data is moved among
systems with different byte orders. <V11N09 Guy
Harris>
2. The byte-swapping and word-swapping options to the
cpio program are inadequate patches; with an ASCII
format the problem would not be present. The options
are not consistent across versions of the program: in
System III, data blocks and file names are byte
swapped; in System V, only data blocks are byte
swapped. <V11N09 Guy Harris> <V11N47 Andrew
Tannenbaum>
3. The two-byte integer format limits the range of inode
numbers to 0..65535. Many current file systems are
bigger than that. <V11N37 Paul Eggert> <V11N39 Henry
Spencer>
Non-ASCII cpio format is clearly not portable and should not
even be considered for standardization. <V11N12 Doug Gwyn>
There are several problems that occur even with the ASCII
cpio format:
1. Many implementations of cpio only look at the lower 16
(or even 15) bits of the inode number, even in ASCII
format. <V11N39 Henry Spencer> This is because the
variable that is used to contain the value is declared
to be unsigned short, just as in binary format. Thus,
even though ASCII cpio format only constrains this
number to the range 0..262143, the format is still
less than portable. <V11N37 Paul Eggert>
2. The proposed cpio ASCII format as specified, <N.048
Lorraine C. Kevra> <V11N14> is not portable because
the proposal assumes that sizeof(int) == sizeof(long).
<N.064 Dominic Dunlop> <V11N15>
3. The file type is written in a numerical format, making
it UNIX specific rather than POSIX specific, since
POSIX (and tar) specifies symbolic, rather than
numerical, values for file types. <V11N09 Guy Harris>
4. Hard links are not handled well, since cpio format
does not directly record that two files are linked.
If two files that are linked are written in cpio
format, two copies will be written. The cpio program
detects duplicate files by matching pairs of (h_dev,
h_ino) and producing links, but that is done after the
fact. <V11N09 Guy Harris> <V11N45 Guy Harris> <V11N54
Ian Donaldson> (There is a program, afio, that handles
cpio format more efficiently in this and other cases
than the licensed versions of the program.) <V11N21
Chuck Forsberg>
5. Symbolic links are not handled at all, and no type
value is reserved for them. This makes cpio useless
on a large class of historical implementations (those
based on 4.2BSD or its file system) for one of the
main purposes of POSIX section 10.1: archiving files for
later retrieval and use on the same system. Although
it is possible to extend cpio to handle symbolic
links, and at least one vendor has done this, <V11N45
Guy Harris> the format proposed to P1003.1 is the
format in the SVID, and does not handle symbolic
links.
6. The cpio format is less common than tar format: there
are few historical implementations from Version 7 on
that do not have tar; there are many that do not have
cpio. <V11N09 Guy Harris> <V11N10 Charles Hedrick>
<V11N24 Jim Cottrell> It is true that cpio (non-ASCII
format) was invented before tar, <V11N22 Joseph S. D.
Yao> apparently in PWB System 1.0. <V11N26 Joseph S.
D. Yao> The cpio program was first available outside
AT&T with PWB/UNIX 1.0, <V11N45 Guy Harris> <V11N63
Joseph S. D. Yao> and later with System III. However,
in the interim, Version 7, which did not include cpio
<V11N53 Bill Jones> <V11N62 Guy Harris> but did
include tar, became the most influential system.
There was a V7 addendum tape, but it also did not
include cpio (according to its README file); <V11N65
Rick Adams> the addendum tape was in tar format.
Also, it appears that the cpio format of PWB was not
the same as that of System III. <V11N39 Henry
Spencer> And System III and all releases of System V
include tar. <V11N26 Joseph S. D. Yao> <V11N63 Joseph
S. D. Yao> <V11N45 Guy Harris> <V11N47 Andrew
Tannenbaum>
7. It is very late in the process to propose that P1003.1
adopt cpio format now, especially considering that it
was originally proposed to and rejected by the
/usr/group committee before P1003.1 was even formed.
<V11N39 Henry Spencer>
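Items 1 and 4 above interact, and one possible consequence can
be sketched as follows. The sketch assumes a six-octal-digit
inode field (which gives the 0..262143 range cited in item 1)
and a reader that keeps the value in a 16-bit variable. The
member names, inode numbers, and helper functions are all
hypothetical; this illustrates a failure mode under those
assumptions and is not taken from any cpio implementation.

```python
# Largest value that fits in six octal digits (assumed field width).
FIELD_MAX = 8 ** 6 - 1  # 262143

def truncate16(ino):
    """Mimic storing the inode number in an unsigned short (16 bits)."""
    return ino & 0xFFFF

def find_links(entries, keep=lambda ino: ino):
    """Group archive member names by (dev, ino), as a reader that
    reconstructs hard links after the fact might do."""
    groups = {}
    for name, dev, ino in entries:
        groups.setdefault((dev, keep(ino)), []).append(name)
    # Only groups with more than one name are treated as links.
    return [names for names in groups.values() if len(names) > 1]

# Hypothetical members: a and b are true links; c is unrelated, but its
# inode number differs from theirs by exactly 65536.
entries = [("a", 1, 70000), ("b", 1, 70000), ("c", 1, 70000 - 65536)]

print(find_links(entries))                   # [['a', 'b']]
print(find_links(entries, keep=truncate16))  # [['a', 'b', 'c']]
```

With the full field width only the true links are paired; with
16-bit truncation the unrelated file is mistaken for a third
link.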
Advantages of cpio format include:
1. Both X/OPEN <N.043 X/OPEN> <V11N13> and the SVID
<N.048 Lorraine C. Kevra> <V11N14> use it, although
evidently defined somewhat differently. <N.064
Dominic Dunlop> <V11N15>
2. Archives made in cpio format are often smaller than
ones in tar format. <V11N44 Mark Horton> But this is
only because of the headers, and thus the effect
diminishes with larger files.
3. On a local (non-networked) system, cpio is more
efficient at copying directory trees than tar.
<V11N46 Steve Blasingame> However, this is really an
implementation issue.
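The remark in item 2 that the size advantage diminishes with
larger files can be checked with simple arithmetic. The layouts
below are assumptions for illustration (a 512-byte header with
data padded to 512-byte blocks for tar, and a 76-byte header
plus the file name for ASCII cpio), not figures taken from the
cited articles; archive trailers and tape blocking are ignored.

```python
def tar_size(data_len, block=512):
    """Assumed tar layout: one header block, data padded to whole blocks."""
    padded = -(-data_len // block) * block  # round up to a block multiple
    return block + padded

def cpio_size(data_len, name_len, header=76):
    """Assumed ASCII cpio layout: fixed header, name plus NUL, then data."""
    return header + name_len + 1 + data_len

# Compare the two sizes for a small, a medium, and a large file,
# each with a hypothetical 20-character name.
for data_len in (100, 10_000, 1_000_000):
    t = tar_size(data_len)
    c = cpio_size(data_len, name_len=20)
    print(data_len, t, c, round(t / c, 3))
```

Under these assumptions the ratio of tar size to cpio size
approaches 1 as the file grows, since the fixed per-file
overhead is amortized over more data.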
There are several advantages to the current tar-based format
as specified in section 10.1:
1. There are no byte- or word-swapping issues caused by
the format, since all the header values are ASCII byte
streams. <V11N17 John Gilmore>
2. There are no inode numbers recorded, and file types
are kept in symbolic form, so the format is less
implementation-specific than cpio format. <V11N17
John Gilmore>
3. Historical tar format is the most widely used, as
discussed in item 6 of the ASCII cpio problems above,
despite apparent assertions to the contrary. <N.043
X/OPEN> <V11N13>
4. The format specified in section 10.1 is
upward-compatible with tar format. Old tar archives
can be extracted by a program that implements section
10.1. Archives using some of the extensions of section
10.1 can be extracted with old
(Version 7) tar programs, although symbolic links will
not be extracted and contiguous files will not be
handled properly (cpio does not handle these
capabilities at all). Files with very long names will
not be handled properly (cpio does no better at this).
All tar implementations are compatible to this extent.
<V11N17 John Gilmore>
5. The /usr/group working group and P1003.1 have already
done the work <P.061> <M.019 5.1.121 Pg.13> <RFC.003
#121> <P.038> <P.006> required to add optional
extensions (such as symbolic links, long file names,
<V11N49 Jerry Schwarz> <V11N50 Michael MacDonald> and
contiguous files) that are needed on many historical
implementations and that cpio format lacks.
6. The format is extensible for future facilities.
<V11N39 Henry Spencer>
7. There is a public domain implementation of the format
of section 10.1. That implementation provided feedback which
led to improvements in the current specification, and
has been in use for years in transferring data with
licensed tar implementations. <V11N17 John Gilmore>
8. Many people prefer the user interface of the cpio
program to that of the tar program, because the former
can accept a list of pathnames to archive on standard
input while the latter takes them as arguments,
limiting the length of the list. <V11N34 Andrew
Tannenbaum> However, the above-mentioned public domain
implementation of tar accepts pathnames on standard
input, <V11N17 John Gilmore> <V11N19 Jim Cottrell> and
at least one vendor sells a version of tar that can do
this. <V11N48 Michael Gersten> Diffs to standard tar
to add an option to accept pathnames on standard input
when creating an archive have also been posted to
USENET. <V11N36 John Gilmore> The user interface is,
in any case, irrelevant to P1003.1. <V11N39 Henry
Spencer> <V11N40 Rahul Dhesi>
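The point in item 2 about symbolic file types can be sketched as
a translation done by the archive writer. The type flag
characters below are those of the later published ustar format
and are an assumption here, since the draft text is not quoted in
this note; the table and function names are hypothetical.

```python
import stat

# Assumed type flag characters (from the later published ustar format).
TYPE_FLAGS = [
    (stat.S_ISREG, "0"),   # regular file
    (stat.S_ISLNK, "2"),   # symbolic link
    (stat.S_ISCHR, "3"),   # character special
    (stat.S_ISBLK, "4"),   # block special
    (stat.S_ISDIR, "5"),   # directory
    (stat.S_ISFIFO, "6"),  # FIFO
]

def type_flag(mode):
    """Translate a numeric st_mode into a symbolic type flag.

    The writer applies its own system's S_IS* tests, so the numeric
    encoding of the file type never reaches the archive."""
    for test, flag in TYPE_FLAGS:
        if test(mode):
            return flag
    raise ValueError("unsupported file type")

print(type_flag(stat.S_IFDIR | 0o755))  # 5
print(type_flag(stat.S_IFREG | 0o644))  # 0
```

A format that instead stores the numeric mode word directly ties
the archive to one system's file type encoding.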
Disadvantages of tar format:
1. If an attempt is made to extract only the second of a
pair of hard-linked files, the tar program will
attempt to link the second file to the nonexistent
first file, and nothing will be extracted. Although a
sufficiently clever implementation could avoid this,
the problem can be considered to be in the archive
format. <V11N66 Kenneth Almquist>
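The failure described in item 1 can be sketched as follows. The
sketch assumes, as the text implies, that the second member of a
hard-linked pair is stored as a link entry naming the first
member, with no data of its own. The member names, the in-memory
archive, and the dict standing in for the file system are all
hypothetical.

```python
# Hypothetical archive: the second member is a link to the first and
# carries no file data of its own.
archive = [
    {"name": "a", "data": b"contents"},
    {"name": "b", "linkname": "a"},
]

def extract(archive, wanted):
    """Extract only the members named in `wanted` into a dict that
    stands in for the file system."""
    fs = {}
    for member in archive:
        if member["name"] not in wanted:
            continue
        if "linkname" in member:
            target = member["linkname"]
            if target not in fs:
                # The link target was never extracted, so there is
                # nothing to link to and no data to fall back on.
                print("cannot link %s to %s" % (member["name"], target))
                continue
            fs[member["name"]] = fs[target]
        else:
            fs[member["name"]] = member["data"]
    return fs

print(extract(archive, {"a", "b"}))  # both names share the data
print(extract(archive, {"b"}))       # nothing is extracted
```

Because the data travels only with the first name, a request for
the second name alone has nothing to extract unless the
implementation goes back for the first member.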
There are some problems that neither tar nor cpio handles
well.
1. Support for file names longer than PATH_MAX (at least
255), <V11N50 Michael MacDonald> which is the length
the POSIX format allows, would be preferable (cpio
permits 128 and historical tar allows 100), although
the POSIX limit suffices for most cases. <V11N54 Ian
Donaldson>
2. An option to prevent crossing mount points would be
useful for backups. <V11N19 Jim Cottrell> <V11N22
Joseph S. D. Yao> However, this appears to be more of
an implementation issue than a format issue, <V11N28
Dave Brower> <V11N32 Joseph S. D. Yao> especially
considering that there are options to find in 4.2BSD,
<V11N24 Jim Cottrell> SunOS 3.2, <V11N36 John Gilmore>
and System V Release 3.0 <V11N35 Mike Akre> that take
care of this.
3. The default block size in many tar implementations is
too large for some tape controllers to read <V11N27
Rob Lake> (the 3B20 has this problem). This is not a
problem with the interchange format, however.
There is nothing that the proposed cpio can handle that the
tar-based format already in POSIX section 10.1 cannot handle;
in fact, the former is less capable. If cpio format were
augmented to handle missing capabilities, it would be
subject to the same objections now aimed at the format given
in section 10.1: that it was not identical with an existing
format. There is no advantage in replacing the current
tar-based format of section 10.1 with cpio format. There is
also no advantage in adding cpio format, because two
standards are not as good as a single standard.
Some have recommended removing section 10.1 from POSIX
altogether, <V11N12 Doug Gwyn> perhaps with a recommendation
for P1003.2 to pick up the idea. <V11N09 Guy Harris> While I
believe that that would be preferable to adding cpio format,
whether or not tar format remains, I recommend leaving
section 10.1 as it is, because
o The inclusion of an archive/interchange file format is
in agreement with the purpose of POSIX to promote
portability of application programs across interface
implementations. Some format will be used. It is to
the advantage of the users of the standard for there to
be a standard format.
o The de facto standard is tar format. The current section
10.1 standardizes that, and provides upward-compatible
extensions in areas that were previously lacking.
The Archive/Interchange File Format should be left as it is.
Thank you,
John S. Quarterman
Volume-Number: Volume 11, Number 67
Relay-Version: version B 2.10 5/3/83; site utzoo.UUCP
Path: utzoo!mnetor!uunet!std-unix
From: std-u...@uunet.UU.NET (Moderator, John Quarterman)
Newsgroups: comp.std.unix
Subject: Re: tar vs. cpio
Message-ID: <638@uunet.UU.NET>
Date: Wed, 15-Jul-87 00:43:41 EDT
Article-I.D.: uunet.638
Posted: Wed Jul 15 00:43:41 1987
Date-Received: Fri, 17-Jul-87 01:49:04 EDT
Lines: 50
Approved: j...@uunet.uu.net (Moderator, John Quarterman)
Organisation: USENIX Association
From: usenix!jsq (John S. Quarterman)
A belated report about the Seattle P1003 meeting,
regarding section 10.
No one proposes non-ASCII cpio format any more.
A revised cpio proposal was received. It is in the
appropriate format for P1003.1, but is still straight
System V cpio.
Its proposer has agreed to supply an updated
proposal, including optional extensions
for symbolic links, contiguous files, and a general
method of extension. This is analogous to what is
already in Draft 10 about the ustar format.
P1003.1 Draft 11 will include the updated cpio proposal
in addition to the already-present ustar format.
Some notes have been moved from Section 10 into the Rationale.
The introductory matter in 10.1 about the use of permission
information on extraction of archives has been reworded, mostly
to avoid the word "utility" (this is 1003.1, i.e., the programming
language interface standard, that we are discussing.)
A note is expected from X/OPEN to address the issues raised in my
previous note (IEEE 1003.1 N.100, "tar vs. cpio"), and to include
some comments about the motivation for the cpio proposals.
The cpio proponents have been invited to post that note and
the new cpio proposal in this newsgroup.
N.100 will appear in the next issue of ;login:, the Newsletter
of the USENIX Association. The cpio proponents have been
invited to submit equivalent material. There is a possibility
that similar articles may appear in the EUUG newsletter.
An actual decision on what format(s) will be in the IEEE 1003.1
Full Use Standard is expected at the September meeting in
Nashua, New Hampshire, although it is still possible that it
will be determined in actual balloting.
[ Note that I am posting this report as the USENIX Institutional
Representative to IEEE P1003, not as the moderator. Replies
and related submissions are solicited. -mod ]
Volume-Number: Volume 11, Number 91