Henrik Nordstrom wrote:
>
> Alex Rousskov wrote:
>
> > If you go back to the previous posts on this topic, you will
> > see that by SquidFS I mean "big", "DB-style" files with many
> > objects per file _on top_ of an existing FS. Thus, we gain
> > performance benefits (see previous posts for a long list) and
> > preserve advantages of Unix FS (recovery and such).
>
> No. You lose the structural consistency checks, block allocation and the
> crash recovery built into an OS FS. Using one big file, or writing
> directly to a partition, is essentially the same thing (except that
> writing directly to a partition has slightly less overhead).
>
> There are many things one can do to optimize the performance of a standard
> filesystem: tuning OS/FS parameters, designing the application in such a
> way that the OS caches get used efficiently, and using
> asynchronous features to avoid blocking. Much of this is needed when
> implementing a private FS as well.
>
> What we gain by using a private FS is avoiding the FS overhead imposed by
> having one extra level of indirection (the directories), and the
> user<->kernel mode switch overhead on open/close operations.
It might be worth considering as a disk storage module, however. For
many cache setups, disk access is not going to be a noticeable
bottleneck. For example, for Brisnet, I simply asked for "the slowest,
largest HD I can get for $X". Even if we triple our link capacity, a
slow IDE drive is going to keep pace for the foreseeable future.
Storing the cache as (say) one enormous gdbm file has an advantage in
inodes: you can configure a minimal number on the cache filesystem and
make use of that extra space. MUD and MUSH servers use [g|n]dbm for just
this sort of grunt work, usually with smaller objects, but sometimes
with objects of comparable size to our notional 'average 13K' object.
They also usually utilise a memory cache to front-end the database, but
the database performance seems quite acceptable regardless.
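To make that concrete, the guts of such a storage module wouldn't be much
more than the sketch below. It's only a sketch: the filename, the URL key
scheme and the lack of error handling are illustrative, not anything Squid
actually does today.

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <gdbm.h>

    int main(void)
    {
        /* One big database file holds every object; the OS sees one inode. */
        GDBM_FILE db = gdbm_open("cache.gdbm", 0, GDBM_WRCREAT, 0644, NULL);
        if (db == NULL)
            return 1;

        /* Key the object by its URL (a real module might use the MD5 key). */
        datum key, obj, hit;
        key.dptr  = "http://example.com/index.html";
        key.dsize = strlen(key.dptr);
        obj.dptr  = "HTTP/1.0 200 OK\r\n\r\n<html>cached reply</html>";
        obj.dsize = strlen(obj.dptr);

        gdbm_store(db, key, obj, GDBM_REPLACE);  /* overwrite any old copy   */

        hit = gdbm_fetch(db, key);               /* gdbm mallocs the result  */
        if (hit.dptr != NULL) {
            printf("HIT: %.*s\n", hit.dsize, hit.dptr);
            free(hit.dptr);
        }

        gdbm_close(db);
        return 0;
    }

gdbm does the block allocation inside that single file, which is exactly
the piece we'd otherwise have to write ourselves, so it would be a cheap
way to prototype the idea before anyone commits to a real SquidFS.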
Of course, I have my doubts about this sort of thing being usable for
larger setups, especially ones where disk i/o is a critical item.
It's like 'sensible behaviour', in a way. Most people seem to need to
override it at some point and do something apparently absurd or wrong
with a piece of software. Imperfect world.
> > Is reducing hit response time by half "huge"?
>
> It depends. Given the nature of uncached HTTP it probably isn't. I would
> say that high sustained throughput is far more important than snappy
> (as opposed to quick) response times, and of course most important of
> all is stability, and speedy recovery when it does fail.
I ain't all that big on snappy response times on hits. It's nice, I
guess, but the users may never notice it... not if they're on dialup
links or using slower workstations than the network administrator, or
whatever. That said, the slower the response, the higher the
file-descriptor usage for a given request rate: if requests last
longer on average, the average number of file descriptors in use
goes up.
We noticed this on the schnet setup recently, where we were so close to
the wall on file descriptors that minor network jitters on the
Australian backbone caused us to max out. Just a slight fluctuation was
all it took.
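Back-of-the-envelope, it's just Little's law: descriptors in flight is
roughly request rate times the mean time a request holds its descriptor.
Something like this (the numbers are invented purely to show the scaling,
not schnet figures):

    #include <stdio.h>

    int main(void)
    {
        double req_per_sec = 100.0;  /* offered request rate (made up)       */
        double mean_secs   = 2.0;    /* mean time a request holds its FD     */
        int    fd_limit    = 256;    /* descriptor ceiling, for illustration */

        double in_flight = req_per_sec * mean_secs;
        printf("~%.0f descriptors in use (%.0f%% of a %d limit)\n",
               in_flight, 100.0 * in_flight / fd_limit, fd_limit);

        /* Stretch mean_secs to 3.0 at the same load and you need ~300 --
           straight through a 256 descriptor ceiling. */
        return 0;
    }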
So, hit response times may be more important for existing larger caches
than we thought. Of course, most of those probably run with large
configured limits, and are not a problem...but there are all sorts of
flow-on effects of these little details.
> > At least a third of hit response time is due to local disk I/O.
> > The actual effect is probably bigger (>=50%?) because disk I/Os
> > compete with network I/Os in one select loop. I have no hard proof
> > that SquidFS can eliminate all of this overhead, of course. These
> > are just estimates.
>
> I/O will always be there unless you have enough RAM to keep all objects
> in RAM. And if the OS is tuned appropriately (enough directory and
> inode caches, and no atime updates) the number of disk accesses should
> be roughly the same.
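True enough; the atime half of that tuning, at least, is a one-line change
on most unices, something like the fstab entry below (device, paths and
option names are illustrative and differ between systems):

    # illustrative only: mount the cache spool with atime updates disabled
    /dev/hdb1   /usr/local/squid/cache   ext2   defaults,noatime   0   2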
<digress>
Apparently there's this fellow in Brisbane who's designed a
Gigabytes/Terabytes-of-static-RAM-on-something-the-size-of-a-PCMCIA-CARD-with-a-hard-disk-interface-on-it
thing. Skeptical though I was, apparently some company has examined the
dingus, signed him up, and started tooling up for manufacture. It's
possible our global storage situation may change suddenly in a year or
so.
Curious, no?
</digress>
D
--
-----BEGIN GEEK CODE BLOCK-----
Version: 3.1
GAT d- s++: a C++++$ UL++++B+++S+++C++H++U++V+++$ P+++$ L+++ E- W+++(--)$
N++ w++$>--- t+ 5++ X+() R+ tv b++++ DI+++ e- h-@
------END GEEK CODE BLOCK------