From: Rik van Riel <r...@conectiva.com.br>
Subject: TODO list for new VM
Date: 2000/09/16
Message-ID: <linux.kernel.Pine.LNX.4.21.0009160544000.1519-100000@duckman.distro.conectiva>#1/1
X-Deja-AN: 670474882
Approved: n...@nntp-server.caltech.edu
X-To: linux...@kvack.org
Content-Type: TEXT/PLAIN; charset=US-ASCII
MIME-Version: 1.0
X-cc: linux-ker...@vger.kernel.org, Linus Torvalds
<torva...@transmeta.com>, Matthew Dillon <dil...@apollo.backplane.com>
Newsgroups: mlist.linux.kernel
Hi,
Here is the TODO list for the new VM. The only thing
really needed for 2.4 is the OOM handler; the
page->mapping->flush() callback is really wanted by
the journaling filesystem folks.
The rest are mostly extras that would be nice; these
things won't be pushed for inclusion unless they turn
out to be really trivial to implement, perform well
on the cases they're supposed to affect, and have a
highly localised influence...
(sorry folks, but for 2.4 I'll be really conservative)
---> TODO list for the new VM <---
for kernel 2.4, necessary:
- out of memory handling
[integrate the OOM killer, 10 minutes work]
for kernel 2.4, really wanted:
- page->mapping->flush() callback in page_launder(),
for easier integration with journaling filesystems
and maybe the network filesystems
[about 30 minutes of work on the VM side]
for kernel 2.4, wanted:
- include Ben LaHaise's code, which moves readahead
to the VMA level, this way we can do streaming swap
IO, complete with drop_behind()
- code to make the "knee" smoother, currently the system
keeps eating memory from the cache up to a certain point
and then starts to swap a lot, it would be nice to smooth
this curve a bit
- thrashing control, maybe process suspension with some
forced swapping ?
for kernel 2.5:
- physical->virtual reverse mapping, so we can do much
better page aging with less CPU usage spikes
- better IO clustering for swap (and filesystem) IO
- move all the global VM variables, lists, etc. into
the pgdat struct for better NUMA scalability
- (maybe) some QoS things, as far as they are major
improvements with minor intrusion
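The page->mapping->flush() item above can be pictured with a small user-space model. This is only a sketch under stated assumptions: the struct layouts, the counters, journal_flush() and launder_one() are hypothetical names invented for illustration and are not the kernel's real definitions. The point it shows is that the laundering step asks the page's mapping how to write the page, falling back to a default writeout when no hook is installed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; NOT the kernel's real structures. */
struct page;

struct address_space {
    /* Optional per-mapping hook; NULL means "use the default writeout". */
    int (*flush)(struct page *page);
    int default_writes;  /* pages cleaned by the default path */
    int custom_flushes;  /* pages cleaned through the hook */
};

struct page {
    struct address_space *mapping;
    int dirty;
};

/* Example hook a journaling filesystem might install (assumption):
 * the filesystem decides for itself how and when the data goes out. */
static int journal_flush(struct page *page)
{
    page->mapping->custom_flushes++;
    page->dirty = 0;
    return 0;  /* 0 means success in this model */
}

/* Model of laundering one page. Returns 1 if the page was cleaned. */
static int launder_one(struct page *page)
{
    assert(page != NULL);
    if (!page->dirty)
        return 0;  /* nothing to do */
    if (page->mapping && page->mapping->flush)
        return page->mapping->flush(page) == 0;
    /* No hook installed: fall back to the default writeout. */
    if (page->mapping)
        page->mapping->default_writes++;
    page->dirty = 0;
    return 1;
}
```

With a mapping whose flush pointer is NULL, launder_one() takes the default path; with flush set to journal_flush, the hook is called instead and the VM side does not need to know anything about journaling.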
regards,
Rik
--
"What you're running that piece of s*** Gnome?!?!"
-- Miguel de Icaza, UKUUG 2000
http://www.conectiva.com/ http://www.surriel.com/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
Please read the FAQ at http://www.tux.org/lkml/
From: Rik van Riel <r...@conectiva.com.br>
Subject: TODO list for new VM (oct 2000)
Date: 2000/10/02
Message-ID: <linux.kernel.Pine.LNX.4.21.0010021447430.22539-100000@duckman.distro.conectiva>#1/1
X-Deja-AN: 676743861
Approved: n...@nntp-server.caltech.edu
X-To: linux-ker...@vger.kernel.org
Content-Type: TEXT/PLAIN; charset=US-ASCII
MIME-Version: 1.0
X-cc: linux...@kvack.org, Matthew Dillon <dil...@apollo.backplane.com>,
Linus Torvalds <torva...@transmeta.com>
Newsgroups: mlist.linux.kernel
[MM TODO list, updated for october 2000]
---
Here is the TODO list for the new VM. The only things
really needed for 2.4 are the OOM handler and a fix
for the highmem deadlock.
The page->mapping->flush() callback is really wanted
by the journaling filesystem folks.
The rest are mostly extras that would be nice; these
things won't be pushed for inclusion unless they turn
out to be really trivial to implement, perform well
on the cases they're supposed to affect, and have a
highly localised influence...
(sorry folks, but for 2.4 I'll be really conservative)
---> TODO list for the new VM <---
for kernel 2.4, necessary:
- out of memory handling
[integrate the OOM killer, 10 minutes work]
- fix the highmem deadlock, where the swapper cannot create
low memory bounce buffers OR swap out low memory because
it has consumed all resources
[old bug, already reported with 2.4.0-test6, probably before]
for kernel 2.4, really wanted:
- page->mapping->flush() callback in page_launder(),
for easier integration with journaling filesystems
and maybe the network filesystems
[about 30 minutes of work on the VM side]
for kernel 2.4, wanted:
- maybe rebalance the swapper a bit ... we do page aging
now so maybe refill_inactive_scan() / shm_swap() and
swap_out() need to be rebalanced a bit
for kernel 2.5: (maybe available as patch for 2.4 ???)
- physical->virtual reverse mapping, so we can do much
better page aging with less CPU usage spikes
- better IO clustering for swap (and filesystem) IO
- move all the global VM variables, lists, etc. into
the pgdat struct for better NUMA scalability
- (maybe) some QoS things, as far as they are major
improvements with minor intrusion
- thrashing control, maybe process suspension with some
forced swapping ?
- include Ben LaHaise's code, which moves readahead
to the VMA level, this way we can do streaming swap
IO, complete with drop_behind()
regards,
Rik
From: Linus Torvalds <torva...@transmeta.com>
Subject: Re: TODO list for new VM (oct 2000)
Date: 2000/10/02
Message-ID: <linux.kernel.Pine.LNX.4.10.10010021117540.828-100000@penguin.transmeta.com>#1/1
X-Deja-AN: 676743872
Approved: n...@nntp-server.caltech.edu
X-To: Rik van Riel <r...@conectiva.com.br>
Content-Type: TEXT/PLAIN; charset=US-ASCII
MIME-Version: 1.0
X-cc: linux-ker...@vger.kernel.org, linux...@kvack.org,
Matthew Dillon <dil...@apollo.backplane.com>
Newsgroups: mlist.linux.kernel
Why do you apparently ignore the fact that page-out write-back performance
is horribly crappy because it always starts out doing synchronous writes?
I pointed out previously in a private email that page_launder() must be
buggy as it stands now, you seem to have ignored that part (and the
test-program that shows 1MB/s writeout speeds due to it) completely.
The whole _point_ of the new VM was performance. Without that, the new VM
is pointless, and discussing TODO features is equally pointless.
Linus
From: Rik van Riel <r...@conectiva.com.br>
Subject: Re: TODO list for new VM (oct 2000)
Date: 2000/10/02
Message-ID: <linux.kernel.Pine.LNX.4.21.0010021524140.22539-100000@duckman.distro.conectiva>#1/1
X-Deja-AN: 676743875
Approved: n...@nntp-server.caltech.edu
X-To: Linus Torvalds <torva...@transmeta.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII
MIME-Version: 1.0
X-cc: linux-ker...@vger.kernel.org, linux...@kvack.org,
Matthew Dillon <dil...@apollo.backplane.com>
Newsgroups: mlist.linux.kernel
On Mon, 2 Oct 2000, Linus Torvalds wrote:
> Why do you apparently ignore the fact that page-out write-back
> performance is horribly crappy because it always starts out
> doing synchronous writes?
Because it is fixed in the patch I mailed yesterday?
regards,
Rik
From: Rik van Riel <r...@conectiva.com.br>
Subject: Re: TODO list for new VM (oct 2000)
Date: 2000/10/02
Message-ID: <linux.kernel.Pine.LNX.4.21.0010021531360.22539-100000@duckman.distro.conectiva>#1/1
X-Deja-AN: 676743870
Approved: n...@nntp-server.caltech.edu
X-To: Linus Torvalds <torva...@transmeta.com>
Content-Type: TEXT/PLAIN; charset=US-ASCII
MIME-Version: 1.0
X-cc: linux-ker...@vger.kernel.org, linux...@kvack.org,
Matthew Dillon <dil...@apollo.backplane.com>
Newsgroups: mlist.linux.kernel
On Mon, 2 Oct 2000, Rik van Riel wrote:
> On Mon, 2 Oct 2000, Linus Torvalds wrote:
>
> > Why do you apparently ignore the fact that page-out write-back
> > performance is horribly crappy because it always starts out
> > doing synchronous writes?
>
> Because it is fixed in the patch I mailed yesterday?
One small warning though. Please don't apply that patch
yet because I fixed 3 more small problems today. I'll
send you an updated patch...
- the compile warnings are fixed
- in try_to_free_pages(), we forgot to set
PF_MEMALLOC in current->flags (oops)
- in grow_buffers(), in case we cannot get a
buffer head, we must unlock the page
A patch against 2.4.0-test9-pre8 with these 3 changes will
be on its way once I've tested it a bit...
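Why the missing PF_MEMALLOC matters can be shown with a toy model. The sketch below is an assumption-laden illustration, not the kernel code: PF_MEMALLOC_MODEL, struct task, model_alloc() and model_free_pages() are made-up stand-ins. It models an allocator that enters reclaim under memory pressure, and a reclaim path that itself needs to allocate; the flag is what tells the allocator that the caller is already reclaiming, so it must not recurse.

```c
#include <assert.h>

#define PF_MEMALLOC_MODEL 0x1u  /* stand-in for the kernel's PF_MEMALLOC bit */

struct task { unsigned int flags; };

static int reclaim_depth;      /* current nesting depth of reclaim */
static int max_reclaim_depth;  /* deepest nesting observed */

static int model_free_pages(struct task *cur, int set_flag);

/* Model allocator under memory pressure: it tries to reclaim,
 * unless the caller is already in the middle of reclaiming. */
static void model_alloc(struct task *cur, int set_flag)
{
    assert(cur != 0);
    if (cur->flags & PF_MEMALLOC_MODEL)
        return;              /* already reclaiming: do not recurse */
    if (reclaim_depth >= 3)
        return;              /* hard stop so the model always terminates */
    model_free_pages(cur, set_flag);
}

/* Model reclaim path. set_flag selects the fixed (1) or buggy (0) variant. */
static int model_free_pages(struct task *cur, int set_flag)
{
    unsigned int saved = cur->flags;

    if (set_flag)
        cur->flags |= PF_MEMALLOC_MODEL;
    reclaim_depth++;
    if (reclaim_depth > max_reclaim_depth)
        max_reclaim_depth = reclaim_depth;

    model_alloc(cur, set_flag);  /* reclaim needs memory too */

    reclaim_depth--;
    cur->flags = saved;          /* restore the caller's flags */
    return 1;
}
```

Calling model_free_pages() with set_flag = 1 keeps the nesting depth at one; with set_flag = 0 the model recurses until its artificial depth limit stops it.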
regards,
Rik
From: Matthew Dillon <dil...@apollo.backplane.com>
Subject: Re: TODO list for new VM (oct 2000)
Date: 2000/10/04
Message-ID: <linux.kernel.200010050108.SAA83892@apollo.backplane.com>#1/1
X-Deja-AN: 677710359
Approved: n...@nntp-server.caltech.edu
X-To: Rik van Riel <r...@conectiva.com.br>
X-Cc: Linus Torvalds <torva...@transmeta.com>, linux-ker...@vger.kernel.org,
linux...@kvack.org, Matthew Dillon <dil...@apollo.backplane.com>
Newsgroups: mlist.linux.kernel
:On Mon, 2 Oct 2000, Rik van Riel wrote:
:> On Mon, 2 Oct 2000, Linus Torvalds wrote:
:>
:> > Why do you apparently ignore the fact that page-out write-back
:> > performance is horribly crappy because it always starts out
:> > doing synchronous writes?
:>
:> Because it is fixed in the patch I mailed yesterday?
:
:One small warning though. Please don't apply that patch
:yet because I fixed 3 more small problems today. I'll
:send you an updated patch...
:...
:regards,
:
:Rik
My experience with FreeBSD's asynchronous paging
is that you have to carefully limit the number of
I/Os you queue at once. Or, more specifically, you
have to limit the seeking load the async pageouts
place on the system.
The performance curve from the point of user processes
in the system looks like a bell, while the paging
performance looks like a log curve (increased performance
with diminishing returns)... if you queue too few
pages (degenerate into synchronous paging), you have low
paging performance and high user process performance,
but you can't clean pages fast enough in a heavily loaded
system. If you queue too many pages at once, you have
high paging performance (but with diminishing returns)
and low user process performance due to the seeking
load you've placed on the disk. Excessive seeking
from pageouts will ruin the disk's performance from
the point of view of other processes in the system.
FreeBSD has a sysctl variable called vm.max_page_launder
which limits the number of pages the pageout daemon
will queue to I/O at once. The default is 32. Numbers
between 16 and 32 were found to fit the sweet spot of
the curve the best. Numbers lower than 16 reduced
system performance because potentially contiguous pageouts
would get split (causing more seeking rather than less when
mixed with I/O initiated from user processes), and numbers
higher than 32 reduced user process performance due to the
additional seeking from the queued pageouts.
The sysadmin can adjust the value to effectively give
paging more or less priority. A smaller number reduces
paging performance but increases system performance
for other processes (though anything less than 4 will
reduce performance for everyone). A higher number
increases paging performance at the cost of system
performance for other processes. Virtually all FreeBSD
installations that I know about leave the sysctl variable
alone.
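The cap described above can be sketched as a bounded scan. This is a minimal user-space model, not FreeBSD's pageout daemon: launder_pass() and the dirty[] array are hypothetical, and max_launder merely plays the role that vm.max_page_launder plays in the description. Each pass queues dirty pages for asynchronous writeout until the cap is reached, then stops, leaving the rest for a later pass.

```c
#include <assert.h>
#include <stddef.h>

/* One pageout pass over npages pages. dirty[i] != 0 marks a dirty page.
 * Queues at most max_launder pages and returns how many were queued. */
static size_t launder_pass(unsigned char *dirty, size_t npages,
                           size_t max_launder)
{
    size_t queued = 0;

    assert(max_launder > 0);
    for (size_t i = 0; i < npages && queued < max_launder; i++) {
        if (dirty[i]) {
            dirty[i] = 0;  /* model: async write has been queued */
            queued++;
        }
    }
    return queued;
}
```

With 100 dirty pages and a cap of 32, four passes are needed; each pass adds a bounded amount of seeking, which is the trade-off the cap is meant to control.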
Note that the performance bell holds true whether you
sort disk requests or not, the whole bell simply moves up
or down on the graph.
There are a number of things that can be done to mitigate
the seeking issue, which I discussed with Rik a few months
ago. The gist of it, though, is that there is a trade-off
between page-in and page-out performance based on how you
try to cluster swap allocation. FreeBSD clusters swap
allocations to optimize page-out performance at the cost
of page-in performance and that seems to work very
well under heavy system loads.
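The page-out versus page-in trade-off can be made concrete with a toy model. The code below is an illustration under loud assumptions, not FreeBSD's swap allocator: count_seeks() and alloc_pageout_clustered() are hypothetical helpers, and a "seek" is crudely defined as any access that does not hit the slot immediately after the previous one. Slots are handed out in the order pages are written, so the write is one contiguous run, while reading the same pages back in virtual-address order jumps around.

```c
#include <assert.h>
#include <stddef.h>

/* Count accesses that do not land on the slot right after the previous one. */
static size_t count_seeks(const size_t *slots, size_t n)
{
    size_t seeks = 0;

    for (size_t i = 1; i < n; i++)
        if (slots[i] != slots[i - 1] + 1)
            seeks++;
    return seeks;
}

/* Page-out-clustered allocation: the k-th page written gets slot base + k.
 * slot_of[] is indexed by virtual page number and must be large enough. */
static void alloc_pageout_clustered(const size_t *pageout_order, size_t n,
                                    size_t base, size_t *slot_of)
{
    assert(pageout_order != NULL && slot_of != NULL);
    for (size_t k = 0; k < n; k++)
        slot_of[pageout_order[k]] = base + k;
}
```

For pages written in the order 7, 2, 9, 4, the write touches slots 0, 1, 2, 3 with no seeks, while paging them back in as 2, 4, 7, 9 touches slots 1, 3, 0, 2, which this model counts as three seeks.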
-Matt
Matthew Dillon
<dil...@backplane.com>