On 13/03/2012 10:14 p.m., Alexander Komyagin wrote:
> Hello. We're now trying out the new Squid 3.2 on our server, mainly
> because of its SMP feature. But our tests show that 3.2 (3.2.0.14 and
> 3.2.0.16 were tested) performs noticeably worse than 3.1 (3.1.15).
>
> We're using "httperf --client=0/1 --hog --server x.x.x.x --rate=100
> --num-conns=1000 --timeout=5 --num-calls=10" for testing. For 3.2 it
> shows about 140 client timeouts (out of 1000 connections), while for
> 3.1 there are no errors at all.
>
> Different worker counts were tested (1, 2, 4), but the results are
> identical, which is rather _strange_, since as far as I know (from the
> Squid website and from browsing the source), in our configuration the
> workers should NOT share anything but the one listening socket
> (y.y.y.y:3128).
> On top of that, CPU use is _only_ about 20% per worker (2 CPUs, 2
> workers), vmstat reports no high memory consumption, and iostat
> reports 0% iowait.
>
> Also, according to the logs, those client timeouts are caused by some
> of the new connections never being noticed and accepted at all (they
> never go through the doAccept() routine in TcpAcceptor.cc).
That sounds very much like a kernel issue, or a TCP accept
rate-limiting issue.
Once a TCP connection is picked up by oldAccept() in the doAccept()
sequence, the results can be attributed to Squid; but if connections
never actually arrive there, something is wrong at a deeper level, down
around the TCP stack or the socket libraries.
If it is only happening with workers, it would seem to be something to
do with the socket sharing at the library level.
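To illustrate: that sharing pattern is easy to exercise outside Squid.
Below is a minimal standalone sketch (plain sockets, not Squid code) of
the same setup: one socket bound and listen()ed before fork(), each
child accept()ing from it. Driving it with your httperf command and
comparing the per-child accept counts would show how your kernel and C
library distribute connections; the port and child count here are
illustrative.

    // Minimal sketch, NOT Squid code: one listening socket shared by
    // fork()ed children, the same pattern the SMP workers rely on.
    // Each child reports its own accept() count so you can see how
    // the kernel distributes incoming connections between children.
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <sys/socket.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <cstdio>

    int main() {
        int lfd = socket(AF_INET, SOCK_STREAM, 0);
        int on = 1;
        setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &on, sizeof(on));
        sockaddr_in sa = {};
        sa.sin_family = AF_INET;
        sa.sin_addr.s_addr = htonl(INADDR_ANY);
        sa.sin_port = htons(3128);            // illustrative port
        if (bind(lfd, (sockaddr*)&sa, sizeof(sa)) != 0 ||
                listen(lfd, 128) != 0) {
            perror("bind/listen");
            return 1;
        }
        const int kids = 2;                   // like "workers 2"
        for (int k = 0; k < kids; ++k) {
            if (fork() == 0) {                // child: plain accept loop
                long accepted = 0;
                for (;;) {
                    int c = accept(lfd, nullptr, nullptr);
                    if (c < 0)
                        break;                // interrupted or shut down
                    close(c);
                    fprintf(stderr, "kid %d: %ld accepts\n",
                            k + 1, ++accepted);
                }
                _exit(0);
            }
        }
        while (wait(nullptr) > 0) {}          // parent just reaps children
        return 0;
    }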
> Note that we're building Squid against uClibc, so there may be some
> differences from glibc. Also, the Squid cache is disabled for now and
> Squid is configured as a transparent proxy (config file attached).
> Configure options are: '--build=x86_64-linux' '--host=x86_64-altell-linux-uclibc'
> '--target=x86_64-altell-linux-uclibc' '--prefix=/usr'
> '--exec_prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin'
> '--libexecdir=/usr/libexec' '--datadir=/usr/share' '--sysconfdir=/etc'
> '--sharedstatedir=/com' '--localstatedir=/var' '--libdir=/usr/lib'
> '--includedir=/usr/include' '--oldincludedir=/usr/include'
> '--infodir=/usr/share/info' '--mandir=/usr/share/man' '--disable-epoll'
> '--disable-nls' '--enable-ssl' '--enable-auth-digest=LDAP'
> '--enable-auth-basic=LDAP' '--enable-epoll' '--enable-icap-client'
Did you want to disable or enable epoll? Both --disable-epoll and
--enable-epoll appear in that list; the one configure processes last
will normally take effect, so it is worth passing only the one you
actually mean.
> '--with-dl' '--enable-linux-netfilter'
> '--disable-negotiate-auth-helpers' '--with-krb5-config=no'
> '--enable-external-acl-helpers=wbinfo_group' '--enable-auth'
> '--enable-auth-ntlm=smb_lm,fake'
>
> To check that the workers share the listening socket properly and
> can, in principle, handle all the connections, I modified the
> parseHttpRequest() function in one kid to return Method-Not-Allowed
> for all requests. That way all 1000 connections were successfully
> established, with more than 800 of them refused due to this
> modification (which is expected).
>
> So it seems to me that either the Squid kids share some resources
> while processing client requests, or I am completely missing something
> important.
Each worker grabs the connections it can until it is fully loaded. By
responding immediately you have raised the load capacity of that one
worker enormously. I achieved >300K req/sec using a small in-memory
cached object when testing the 3.2 accept() routines without workers.
Your fixed-response hack should have an even higher capacity, as it
cuts out the caching and most of the parsing logic.
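For reference, the kind of short-circuit described above could look
roughly like the fragment below. The helper name and the bare
write()/close() handling are illustrative only, not the actual
parseHttpRequest() code; the point is simply that a canned 405 reply
sent before any caching or forwarding work bounds a worker's cost at
little more than accept() plus one write().

    // Illustrative only; not the real parseHttpRequest() internals.
    #include <unistd.h>

    static const char kReply[] =
        "HTTP/1.1 405 Method Not Allowed\r\n"
        "Content-Length: 0\r\n"
        "Connection: close\r\n"
        "\r\n";

    // Answer every request with a fixed 405 and drop the connection,
    // skipping parsing, caching and forwarding entirely.
    static void replyMethodNotAllowed(int clientFd)
    {
        (void)write(clientFd, kReply, sizeof(kReply) - 1);
        close(clientFd);
    }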
So from your results I conclude that one worker grabbed almost all of
the traffic and responded OK. But there is insufficient data about the
interesting part of the traffic: what was going on there, and which kid
serviced it? You can use debug_options 11,2 to get a cache.log dump of
the HTTP traffic received, for correlation.
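For example, in squid.conf (with SMP workers each cache.log line should
carry a kidN prefix, so the headers can be matched to the worker that
handled them):

    # Keep everything else quiet at level 1, but log debug section 11
    # (HTTP) at level 2 to capture the request and reply headers.
    debug_options ALL,1 11,2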
Amos