PATCH: Rewrite of truncate_inode_pages (WIP)

From: Juan J. Quintela (quintela@fi.udc.es)
Date: Mon May 29 2000 - 20:29:23 EST


Hi,
        the current truncate_inode_pages can actually *write* part of
the file that has just been truncated, which breaks things. I can
trigger the BUG in fs/inode.c:clear_inode using mmap002 and riel's
aging patch on 2.4.0-test1. The BUG triggered is:

        if (inode->i_data.nrpages)
                BUG();

The problem is that inode->i_data.nrpages can become negative; I have
seen it reach -6 while truncating a file of 384MB.
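
For reference, the accounting behind that counter: every removal from
the inode queue decrements nrpages, so if a page is taken off the
queue twice (say, a truncate racing against another path that frees
the page), the counter goes below zero. From memory, the 2.4 helper
in mm/filemap.c looks roughly like this, so the details may be
slightly off:

        static inline void remove_page_from_inode_queue(struct page * page)
        {
                struct address_space * mapping = page->mapping;

                mapping->nrpages--;     /* a second removal drives this negative */
                list_del(&page->list);
        }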

This patch deals with that problem and tries to make
truncate_inode_pages safe.

Note that the new function only works for truncating complete files;
I have left the old function in place for the partial case. If people
think this approach is OK, I will convert the partial case as well. I
have not done it yet because I want feedback from the MM people and
the FS people first, and the algorithms for complete files are
cleaner.

I would also like to know whether it is possible to have a locked
buffer without the page being locked, i.e. whether I can manipulate
the buffers of a page without locking them as long as I hold the page
lock. (I don't think so, and in that case I have to lock the buffers
as well.)
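
For context, buffer locking is independent of the page lock: locking
a buffer just means taking the BH_Lock bit in b_state, and waiting if
it is already set. From memory, include/linux/locks.h in 2.4 has
roughly:

        extern inline void lock_buffer(struct buffer_head * bh)
        {
                /* take BH_Lock; sleep until the holder unlocks */
                while (test_and_set_bit(BH_Lock, &bh->b_state))
                        __wait_on_buffer(bh);
        }

so nothing in the primitive itself ties buffer locks to the page
lock; the question is whether any code path relies on the page lock
alone to protect the buffers.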

With this patch things are rock solid here.

This patch does:

- defines a new function, discard_buffer: it locks the buffer
  (waiting if needed) and removes the buffer from all the queues.
  It is like unmap_buffer, but it makes sure that we do no IO
  and that the buffer is removed from all the lists.

- defines a new function, block_destroy_buffers: it is a mix of
  block_flushpage and try_to_free_buffers. It makes all the
  buffers in a page disappear by calling discard_buffer. Notice
  the way we iterate through the buffer heads (see the sketch
  after this list); I think it is not racy, but I would like to
  hear comments from people who know more about buffer-head
  handling.

- changes __remove_inode_page to check that there are no buffers
  left on the page. Callers of the function must make sure of
  that; all the callers have been changed to satisfy the
  requirement.

- changes invalidate_inode_pages (again). Since block_destroy_buffers
  can block, we behave like *civilized* citizens and drop any locks
  we hold before calling it, reacquiring them afterwards.

- includes my *old* rewrite of truncate_inode_pages, which uses two
  helper functions and takes the locks the same way for the partial
  page as for the rest. (That code is wrong anyway; move on to the
  next function.)

- adds a new function, truncate_all_inode_pages, which is a copy of
  invalidate_inode_pages, except that we wait for locked pages.
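
To make the iteration point concrete, here is a sketch (not the
literal patch code) of the two loop shapes. The classic traversal
caches the successor before working on each buffer, which is only
safe while nothing sleeps:

        bh = head;
        do {
                struct buffer_head *next = bh->b_this_page;
                /* ... work on bh; must not block ... */
                bh = next;
        } while (bh != head);

Because discard_buffer can block, block_destroy_buffers instead has
discard_buffer sample b_this_page while it still holds a reference
and the buffer lock, and hands the successor back:

        bh = head;
        do {
                bh = discard_buffer(bh);
        } while (bh != head);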

Comments, please?

I am especially interested in comments from the FS people about the
locking I am using in buffer.c.

Acknowledgements: this patch would not have been possible without the
                comments of arjan, bcrl, davej, riel, RogerL, ... in
                #kernelnewbies (irc.openprojects.net). They reviewed
                many previous versions of the patch. The errors are
                mine, but the good ideas came from discussions with
                them :)

Later, Juan.

diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude base/fs/buffer.c testing/fs/buffer.c
--- base/fs/buffer.c Tue May 30 01:21:47 2000
+++ testing/fs/buffer.c Tue May 30 02:33:50 2000
@@ -1281,6 +1281,56 @@
         }
 }
 
+/**
+ * discard_buffer - discard a buffer without doing any IO
+ * @bh: buffer to discard
+ *
+ * This function removes a buffer from all the queues, without doing
+ * any IO; we are not interested in the contents of the buffer.  This
+ * function can block if the buffer is locked.
+ */
+static struct buffer_head *discard_buffer(struct buffer_head * bh)
+{
+        int index = BUFSIZE_INDEX(bh->b_size);
+        struct buffer_head *next;
+
+        atomic_inc(&bh->b_count);
+        lock_buffer(bh);
+        next = bh->b_this_page;
+        clear_bit(BH_Uptodate, &bh->b_state);
+        clear_bit(BH_Mapped, &bh->b_state);
+        clear_bit(BH_Req, &bh->b_state);
+        clear_bit(BH_New, &bh->b_state);
+
+        /* Grab the lru list lock here to block bdflush. */
+        spin_lock(&lru_list_lock);
+        write_lock(&hash_table_lock);
+        spin_lock(&free_list[index].lock);
+        spin_lock(&unused_list_lock);
+
+        if (!atomic_dec_and_test(&bh->b_count))
+                BUG();
+
+        __hash_unlink(bh);
+        /* The buffer can be either on the regular
+         * queues or on the free list..
+         */
+        if (bh->b_dev != B_FREE)
+                __remove_from_queues(bh);
+        else
+                __remove_from_free_list(bh, index);
+        __put_unused_buffer_head(bh);
+        spin_unlock(&unused_list_lock);
+        write_unlock(&hash_table_lock);
+        spin_unlock(&free_list[index].lock);
+        spin_unlock(&lru_list_lock);
+        /* No need to unlock the buffer: we have just returned it
+         * to the unused list.  Ditto for the reference count.
+         */
+        return next;
+}
+
+
 /*
  * We don't have to release all buffers here, but
  * we have to be sure that no dirty buffer is left
@@ -1313,24 +1363,43 @@
                 bh = next;
         } while (bh != head);
 
-        /*
-         * subtle. We release buffer-heads only if this is
-         * the 'final' flushpage. We have invalidated the get_block
-         * cached value unconditionally, so real IO is not
-         * possible anymore.
-         *
-         * If the free doesn't work out, the buffers can be
-         * left around - they just turn into anonymous buffers
-         * instead.
-         */
-        if (!offset) {
-                if (!try_to_free_buffers(page, 0)) {
-                        atomic_inc(&buffermem_pages);
-                        return 0;
-                }
-        }
-
         return 1;
+}
+
+/**
+ * block_destroy_buffers - destroy all the buffers in a page
+ *                         without doing any IO
+ * @page: page whose buffers will be destroyed
+ *
+ * This function destroys all the buffers in one page without
+ * doing any IO.  It can block, because discard_buffer can
+ * block.
+ */
+void block_destroy_buffers(struct page *page)
+{
+        struct buffer_head *bh, *head;
+
+        if (!PageLocked(page))
+                BUG();
+        if (!page->buffers)
+                return;
+
+        head = page->buffers;
+        bh = head;
+        do {
+                /* We have to get the next buffer from discard_buffer
+                 * itself, because discard_buffer can block and anybody
+                 * else can change the buffer list under our feet.
+                 */
+                bh = discard_buffer(bh);
+        } while (bh != head);
+
+        /* Wake up anyone waiting for buffer heads */
+        wake_up(&buffer_wait);
+
+        /* And free the page */
+        page->buffers = NULL;
+        page_cache_release(page);
 }
 
 static void create_empty_buffers(struct page *page, struct inode *inode, unsigned long blocksize)
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude base/fs/inode.c testing/fs/inode.c
--- base/fs/inode.c Tue May 30 01:21:47 2000
+++ testing/fs/inode.c Tue May 30 02:00:57 2000
@@ -322,7 +322,7 @@
 
                 inode = list_entry(inode_entry, struct inode, i_list);
                 if (inode->i_data.nrpages)
-                truncate_inode_pages(&inode->i_data, 0);
+                truncate_all_inode_pages(&inode->i_data);
                 clear_inode(inode);
                 destroy_inode(inode);
         }
@@ -768,7 +768,7 @@
                                 spin_unlock(&inode_lock);
 
                                 if (inode->i_data.nrpages)
-                                truncate_inode_pages(&inode->i_data, 0);
+                                truncate_all_inode_pages(&inode->i_data);
 
                                 destroy = 1;
                                 if (op && op->delete_inode) {
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude base/include/linux/fs.h testing/include/linux/fs.h
--- base/include/linux/fs.h Tue May 30 01:21:47 2000
+++ testing/include/linux/fs.h Tue May 30 02:03:21 2000
@@ -1104,6 +1104,7 @@
 
 /* Generic buffer handling for block filesystems.. */
 extern int block_flushpage(struct page *, unsigned long);
+extern void block_destroy_buffers(struct page *);
 extern int block_symlink(struct inode *, const char *, int);
 extern int block_write_full_page(struct page*, get_block_t*);
 extern int block_read_full_page(struct page*, get_block_t*);
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude base/include/linux/mm.h testing/include/linux/mm.h
--- base/include/linux/mm.h Tue May 30 01:21:47 2000
+++ testing/include/linux/mm.h Tue May 30 02:03:21 2000
@@ -457,6 +457,7 @@
 extern unsigned long page_unuse(struct page *);
 extern int shrink_mmap(int, int);
 extern void truncate_inode_pages(struct address_space *, loff_t);
+extern void truncate_all_inode_pages(struct address_space *);
 
 /* generic vm_area_ops exported for stackable file systems */
 extern int filemap_swapout(struct page * page, struct file *file);
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude base/mm/filemap.c testing/mm/filemap.c
--- base/mm/filemap.c Tue May 30 01:21:47 2000
+++ testing/mm/filemap.c Tue May 30 03:25:41 2000
@@ -90,10 +90,16 @@
 /*
  * Remove a page from the page cache and free it. Caller has to make
  * sure the page is locked and that nobody else uses it - or that usage
- * is safe.
+ * is safe. The page must not have any buffers.
  */
 static inline void __remove_inode_page(struct page *page)
 {
+        if (!PageLocked(page))
+                PAGE_BUG(page);
+
+        if (page->buffers)
+                BUG();
+
         remove_page_from_inode_queue(page);
         remove_page_from_hash_queue(page);
         page->mapping = NULL;
@@ -101,9 +107,6 @@
 
 void remove_inode_page(struct page *page)
 {
-        if (!PageLocked(page))
-                PAGE_BUG(page);
-
         spin_lock(&pagecache_lock);
         __remove_inode_page(page);
         spin_unlock(&pagecache_lock);
@@ -116,14 +119,13 @@
  * This function only removes the unlocked pages, if you want to
  * remove all the pages of one inode, you must call truncate_inode_pages.
  */
-
 void invalidate_inode_pages(struct inode * inode)
 {
         struct list_head *head, *curr;
         struct page * page;
 
         head = &inode->i_mapping->pages;
-
+repeat:
         spin_lock(&pagecache_lock);
         spin_lock(&pagemap_lru_lock);
         curr = head->next;
@@ -135,20 +137,60 @@
                 /* We cannot invalidate a locked page */
                 if (TryLockPage(page))
                         continue;
-
-                __lru_cache_del(page);
+                if (page->buffers) {
+                        page_cache_get(page);
+                        spin_unlock(&pagemap_lru_lock);
+                        spin_unlock(&pagecache_lock);
+                        block_destroy_buffers(page);
+                        remove_inode_page(page);
+                        lru_cache_del(page);
+                        page_cache_release(page);
+                        UnlockPage(page);
+                        goto repeat;
+                }
                 __remove_inode_page(page);
+                __lru_cache_del(page);
                 UnlockPage(page);
                 page_cache_release(page);
         }
-
         spin_unlock(&pagemap_lru_lock);
         spin_unlock(&pagecache_lock);
 }
 
-/*
+static inline void truncate_partial_page(struct page *page, unsigned partial)
+{
+        memclear_highpage_flush(page, partial, PAGE_CACHE_SIZE-partial);
+
+        if (page->buffers)
+                block_flushpage(page, partial);
+
+}
+
+static inline void truncate_complete_page(struct page *page)
+{
+        if (page->buffers)
+                block_destroy_buffers(page);
+        lru_cache_del(page);
+
+        /*
+         * We remove the page from the page cache _after_ we have
+         * destroyed all buffer-cache references to it. Otherwise some
+         * other process might think this inode page is not in the
+         * page cache and creates a buffer-cache alias to it causing
+         * all sorts of fun problems ...
+         */
+        remove_inode_page(page);
+        page_cache_release(page);
+}
+
+/**
+ * truncate_inode_pages - truncate *all* the pages from an offset
+ * @mapping: mapping to truncate
+ * @lstart: offset from which to truncate
+ *
  * Truncate the page cache at a set offset, removing the pages
  * that are beyond that offset (and zeroing out partial pages).
+ * If any page is locked we wait for it to become unlocked.
  */
 void truncate_inode_pages(struct address_space * mapping, loff_t lstart)
 {
@@ -168,11 +210,10 @@
 
                 page = list_entry(curr, struct page, list);
                 curr = curr->next;
-
                 offset = page->index;
 
-                /* page wholly truncated - free it */
-                if (offset >= start) {
+                /* Is this one of the pages to truncate? */
+                if ((offset >= start) || (partial && (offset + 1) == start)) {
                         if (TryLockPage(page)) {
                                 page_cache_get(page);
                                 spin_unlock(&pagecache_lock);
@@ -183,22 +224,14 @@
                         page_cache_get(page);
                         spin_unlock(&pagecache_lock);
 
-                        if (!page->buffers || block_flushpage(page, 0))
-                                lru_cache_del(page);
-
-                        /*
-                         * We remove the page from the page cache
-                         * _after_ we have destroyed all buffer-cache
-                         * references to it. Otherwise some other process
-                         * might think this inode page is not in the
-                         * page cache and creates a buffer-cache alias
-                         * to it causing all sorts of fun problems ...
-                         */
-                        remove_inode_page(page);
+                        if (partial && (offset + 1) == start) {
+                                truncate_partial_page(page, partial);
+                                partial = 0;
+                        } else
+                                truncate_complete_page(page);
 
                         UnlockPage(page);
                         page_cache_release(page);
-                        page_cache_release(page);
 
                         /*
                          * We have done things without the pagecache lock,
@@ -209,38 +242,59 @@
                          */
                         goto repeat;
                 }
-                /*
-                 * there is only one partial page possible.
-                 */
-                if (!partial)
-                        continue;
+        }
+        spin_unlock(&pagecache_lock);
+}
 
-                /* and it's the one preceeding the first wholly truncated page */
-                if ((offset + 1) != start)
-                        continue;
+/**
+ * truncate_all_inode_pages - truncate *all* the pages
+ * @mapping: mapping to truncate
+ *
+ * Truncate all the inode pages.  If any page is locked we wait for
+ * it to become unlocked.  This function can block.
+ */
+void truncate_all_inode_pages(struct address_space * mapping)
+{
+        struct list_head *head, *curr;
+        struct page * page;
+
+        head = &mapping->pages;
+repeat:
+        spin_lock(&pagecache_lock);
+        spin_lock(&pagemap_lru_lock);
+        curr = head->next;
+
+        while (curr != head) {
+                page = list_entry(curr, struct page, list);
+                curr = curr->next;
 
-                /* partial truncate, clear end of page */
                 if (TryLockPage(page)) {
+                        page_cache_get(page);
+                        spin_unlock(&pagemap_lru_lock);
                         spin_unlock(&pagecache_lock);
+                        wait_on_page(page);
+                        page_cache_release(page);
                         goto repeat;
                 }
-                page_cache_get(page);
-                spin_unlock(&pagecache_lock);
-
-                memclear_highpage_flush(page, partial, PAGE_CACHE_SIZE-partial);
-                if (page->buffers)
-                        block_flushpage(page, partial);
-
-                partial = 0;
-
-                /*
-                 * we have dropped the spinlock so we have to
-                 * restart.
-                 */
+                if (page->buffers) {
+                        page_cache_get(page);
+                        spin_unlock(&pagemap_lru_lock);
+                        spin_unlock(&pagecache_lock);
+                        block_destroy_buffers(page);
+                        remove_inode_page(page);
+                        lru_cache_del(page);
+                        page_cache_release(page);
+                        UnlockPage(page);
+                        page_cache_release(page);
+                        goto repeat;
+                }
+                __lru_cache_del(page);
+                __remove_inode_page(page);
                 UnlockPage(page);
                 page_cache_release(page);
-                goto repeat;
         }
+
+        spin_unlock(&pagemap_lru_lock);
         spin_unlock(&pagecache_lock);
 }
 
diff -urN --exclude-from=/home/lfcia/quintela/work/kernel/exclude base/mm/swap_state.c testing/mm/swap_state.c
--- base/mm/swap_state.c Tue May 30 01:21:47 2000
+++ testing/mm/swap_state.c Tue May 30 02:00:57 2000
@@ -102,9 +102,10 @@
         if (!PageLocked(page))
                 BUG();
 
-        if (block_flushpage(page, 0))
-                lru_cache_del(page);
+        if (page->buffers)
+                block_destroy_buffers(page);
 
+        lru_cache_del(page);
         __delete_from_swap_cache(page);
         page_cache_release(page);
 }

-- 
In theory, practice and theory are the same, but in practice they 
are different -- Larry McVoy
