Re: [PCI Driver]Physical address being returned as 0 in mmap

From: newton mailinglist
Date: Sat Jul 02 2011 - 21:02:38 EST


ok yeah I think I am getting the hang of it now.

Well, let me explain some more about what I am doing. This is going to
be a bit long, as I need to describe my work to explain the issue I am
facing. My research involves accelerating software functions of a C
program in hardware (hence the fpga). Our platform is unique because it
allows the fpga to have direct access to the main memory of its host
computer.

The way I accelerate a function is I make the user mark it using a
#pragma. Then using a compiler pass I remove this function's code and
replace it by code to send the function parameters to the fpga,
execute it there and read the values back.

Say we have a program like the following :
__attribute__((user("replace")))
int foo(int a){
int b;
...
...
return b;
}
int main(void){
return foo(0);
}

Then when the program is compiled the function body is replaced by the
following code :

int foo ( int a ){
int b ;
htex_write ( a , 0 ) ;
htex_execute ( ) ;
b = htex_read ( 1 ) ;
return b ;
}

int main(void){
htex_open();
return foo(0);
}

The htex_write() call invokes the device driver's write function (you
can see molen_htex.c if you are interested, it is attached). This
causes the parameter a to be written to a memory-mapped I/O region,
from where it goes to the fpga. The htex_execute() call issues an ioctl
which sets a control register in the FPGA to signal it to start
executing (again using memory-mapped I/O). At this point the process is
put to sleep, so the C program is stalled at htex_execute() until it is
woken up.

The htex_*() functions are just wrappers for invoking device driver
operations (the device driver file is also attached). Once done, the
FPGA interrupts the OS and the driver handles the interrupt. It wakes
up the sleeping process. Then htex_read() reads back the value and
puts it in the variable b. The software function can now return. The
reads and writes to the fpga actually cause data to be written to and
read from BRAM inside the fpga (we call these exchange registers, but
that is just a mundane hardware detail).

program -> wrapper functions -> device driver -> fpga
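
To make the call chain concrete, here is a minimal user-space mock of the generated code. It is only a sketch: the exchange registers are modeled as a plain array, XREG_COUNT is a made-up size, and the "a + 1" computation is a placeholder for whatever the accelerated function does. The real wrappers go through write(), ioctl() and read() on the device node instead.

```c
#include <assert.h>

/* Hypothetical stand-in for the FPGA exchange registers (BRAM). */
#define XREG_COUNT 8
unsigned long xreg[XREG_COUNT];

/* Mock of the write wrapper: store one parameter in an exchange register. */
void htex_write(unsigned long value, unsigned long index)
{
    assert(index < XREG_COUNT);
    xreg[index] = value;
}

/* Mock of the execute wrapper. The real call blocks until the completion
 * interrupt arrives; here the "hardware" just computes a + 1 (placeholder)
 * and leaves the result in exchange register 1. */
void htex_execute(void)
{
    xreg[1] = xreg[0] + 1;
}

/* Mock of the read wrapper: fetch a value from an exchange register. */
unsigned long htex_read(unsigned long index)
{
    assert(index < XREG_COUNT);
    return xreg[index];
}

/* The body the compiler pass would generate for foo(). */
int foo(int a)
{
    int b;

    htex_write((unsigned long)a, 0);
    htex_execute();
    b = (int)htex_read(1);
    return b;
}
```

With this mock, foo(41) returns 42, and the parameter is still visible in exchange register 0 afterwards.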

So as you can see , I have quite a bit of working code setup to read
and write to the fpga from software(compiler passes , device driver
code etc).
Recently I have been trying to send some configuration data to the fpga.

This involves calling the htex_set() function defined in molen_htx.c
The function opens the fpga configuration file(the bitstream) and
memory maps it in the fpga. It then sends the address of the memory
mapped region to the fpga via an ioctl call(ioctl calls are handled in
htex_ioctl() in the driver).

Once the fpga receives this address it sends an interrupt to the OS to
translate the address to a physical address which the fpga can then
send to the dma unit. This virtual to physical address translation is
done in the driver. So this address translation interrupt is different
from an execution completion interrupt and there is code in place in
the interrupt handler to distinguish these 2 types of interrupts
properly. The actual address translation happens in
translate_address() defined in the device driver file. The address
translation involves pinning the requested user page to the memory
using get_user_pages() and then getting the physical address using
pci_map_page(). This works because the calling process is asleep when
this happens.
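
The arithmetic behind such a translation can be sketched in plain C. This is an illustration under assumptions, not the driver's code: it assumes 4 KiB pages and a direct-mapped 256-entry TLB, and the sk_* names are invented here. The real index and tag split lives in virt_to_index() and virt_to_tag() in the driver header.

```c
#include <assert.h>
#include <stdint.h>

#define SK_PAGE_SHIFT 12u                      /* assumed 4 KiB pages */
#define SK_PAGE_SIZE  (1ull << SK_PAGE_SHIFT)
#define SK_TLB_BITS   8u                       /* assumed 256-entry TLB */

struct sk_split {
    uint64_t vpn;     /* virtual page number */
    uint64_t offset;  /* byte offset inside the page */
    uint64_t index;   /* TLB slot: low bits of the VPN */
    uint64_t tag;     /* remaining high bits of the VPN */
};

/* Split a virtual address into the pieces a direct-mapped TLB needs. */
struct sk_split sk_split_addr(uint64_t virt_addr)
{
    struct sk_split s;

    s.vpn    = virt_addr >> SK_PAGE_SHIFT;
    s.offset = virt_addr & (SK_PAGE_SIZE - 1);
    s.index  = s.vpn & ((1ull << SK_TLB_BITS) - 1);
    s.tag    = s.vpn >> SK_TLB_BITS;
    return s;
}

/* Given the bus address of the pinned page, rebuild the bus address
 * that corresponds to the original virtual address. */
uint64_t sk_bus_addr(uint64_t page_bus_addr, uint64_t virt_addr)
{
    return page_bus_addr + (virt_addr & (SK_PAGE_SIZE - 1));
}
```

Only the page number is translated; the offset inside the page is carried over unchanged, which is why one pinned page and one bus address per TLB entry is enough.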

Note that even while executing a C function which is accessing a large
amount of data from memory, the FPGA only gets the virtual address of
the array(in the exchange registers) and it requests a translation to
the physical address, which is sent to the dma unit. So I decided to
use the same process for sending the configuration data as well.

The entire process works at the moment, but the only issue is that the
C program is stalled while configuration data is being sent, and I
don't want that.

So to improve the system, what I did was not put the process to sleep
but let it continue. Thus I want to continue receiving address
translation interrupts from the fpga and send physical addresses to it
while the C program runs.

But the address translation immediately failed when I tried that. I
thought it was due to trying to access the pages of a running process
and pin them using get_user_pages() as the driver fails at this exact
call.
(you can see the translate_address() function in the driver file to
see what I mean).

All I need to do is send about 1.1 MB of data to the device, and I
don't really need the user process to synchronize because I want to
copy all the data to a buffer from where the device will read it. It's
not streaming data. It's a one-time-only send.

Ok, so to finally come to the point, I decided to implement the mmap()
function call in the driver and copy the user data to kernel space.
Then I would let the process run. Doing this allows me to translate
addresses which are in kernel space to physical addresses without
bothering to pin user pages in memory, which will cause problems if the
calling process is running at the time.

This is how I modified the htex_set(), to copy all the data into
kernel space(I copy all the data using memcpy(), so you can see why I
was talking about memcpy() before) :

void molen_set(const char* filename)
{
int fd = open(filename, O_RDWR);
int length = 0;
char *arg, *bs_map;
struct stat buf;

if (fd < 0)
{
perror("molen_set : Filename Open");
return;
}
fstat(fd, &buf);
length = buf.st_size;
if(length == 0)
{
fprintf(stderr, "\nmolen_set : Empty input file\n");
close(fd); //do not leak the descriptor on the early return
return;
}

//Map bitstream file to memory to read it in via page faults(faster than explicit file I/O)
bs_map = mmap(NULL, length, PROT_WRITE | PROT_READ, MAP_PRIVATE, fd, 0);
printf("molen_set : Bitstream mapped to = %p\n", bs_map);
printf("molen_set : Bitstream size = %d\n", length);

//Get user space access to empty kernel memory of length bytes
arg = mmap(NULL, length, PROT_WRITE | PROT_READ, MAP_PRIVATE, htex_handle, 0);
printf("molen_set : Kernel buffer user space address = %p\n", arg);

//Copy bs data to kernel space buffer : this is when the page faults actually occur ?
memset(arg, 0, length);
memcpy(arg, bs_map, length);


printf("molen_set : ioctl called\n");
ioctl(htex_handle, HTEX_IOSET, arg);
printf("molen_set : ioctl returned\n");

munmap(arg, length);
munmap(bs_map, length);
close(fd); //the mappings are gone, so the file descriptor can go too
}
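
The function above does not check the mmap() results. A more defensive version of the file-to-buffer copy could look like the sketch below. The name copy_file_via_mmap() and the destination-buffer interface are made up for illustration; in molen_set() the destination would be the driver-backed mapping, which, as far as I understand, normally needs MAP_SHARED rather than MAP_PRIVATE so that the stores reach the driver's pages instead of a private copy.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Copy the whole file into dst (capacity dst_len) through a read-only
 * mapping. Returns the number of bytes copied, or -1 on any error. */
long copy_file_via_mmap(const char *filename, void *dst, size_t dst_len)
{
    struct stat st;
    void *src;
    long copied = -1;
    int fd = open(filename, O_RDONLY);

    if (fd < 0) {
        perror("open");
        return -1;
    }
    if (fstat(fd, &st) < 0) {
        perror("fstat");
        goto out_close;
    }
    if (st.st_size == 0 || (size_t)st.st_size > dst_len) {
        fprintf(stderr, "file is empty or larger than the destination\n");
        goto out_close;
    }
    /* mmap() reports failure with MAP_FAILED, not NULL. */
    src = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    if (src == MAP_FAILED) {
        perror("mmap");
        goto out_close;
    }
    memcpy(dst, src, (size_t)st.st_size);
    copied = (long)st.st_size;
    munmap(src, (size_t)st.st_size);
out_close:
    close(fd);
    return copied;
}
```

Every exit path closes the descriptor, and a failed mapping is reported instead of being handed to memcpy().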

The mmap() call to the device driver lands in the function htex_mmap()
defined in htex_driver.c, which is where I want to do the dma setup, as
I am already using htex_read() and htex_write() for function
parameters as explained before.

So from our discussion so far I think what might work is if I read in
the user file and copy it to a DMA buffer and then send the address of
this buffer to the fpga. To copy the data I was using memcpy() inside
the modified htex_set()...see above. I can perhaps continue to copy
the user data to the dma buffer using memcpy().

Hopefully once copied, I can let the C program proceed, and the DMA
api will give me a physical address which I can send to the fpga to
read the buffer from memory.
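
For comparison, the write()-based bulk transfer suggested earlier in the thread would need a user-space loop along these lines. This is a generic sketch, not code from my library: write_all() is a name invented here, and it assumes the driver's write handler may accept fewer bytes than requested.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Write len bytes from buf to fd, retrying on short writes and EINTR.
 * Returns 0 on success and -1 on error. A write() that accepts zero
 * bytes is treated as an error so the loop cannot spin forever. */
int write_all(int fd, const void *buf, size_t len)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal, try again */
            return -1;
        }
        if (n == 0)
            return -1;          /* no progress, give up */
        p += n;
        len -= (size_t)n;
    }
    return 0;
}
```

The zero-byte check matters for a handler like htex_write(), which returns 0 once the file position reaches the end of the region.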

Thanks,
Abhi


On Sat, Jul 2, 2011 at 6:24 PM, Kyle Moffett <kyle@xxxxxxxxxxxxxxx> wrote:
> Please reply inline, top-posting is unwelcome on LKML.
>
> On Sat, Jul 2, 2011 at 11:27, newton mailinglist
> <newtonmailinglist@xxxxxxxxx> wrote:
>> ok thanks Kyle, I will try the DMA API now.
>>
>>  I was wondering, when my driver gets the mmap() call then has the
>>  kernel already allocated memory and put its details in the vma
>>  parameter or do I need to allocate memory myself ?
>
> Hmm, I'm not exactly clear what you are asking here... I would need
> to see your driver code to be able to give you better advice, but let
> me give it a shot anyways:
>
> The "vma" is a vm_area_struct which describes the user side of the
> memory-map, and in your "mmap()" call you essentially populate that
> "vma" with information about the physical CPU memory addresses.
> Typically that means memory allocated with dma_alloc_coherent(),
> and I think the DMA API docs have some example code for this.
>
>
>> I am already using the driver read and write functions to read/write
>> some other parameters to the FPGA, so I will setup the DMA in mmap()
>
> The reason I suggest read()/write() is because you are talking about
> using mmap() and then doing memcpy(), whereas if you implement
> your read()/write() handlers properly then you might not need to do
> a memcpy() at all.
>
> If you're just using read()/write() for parameter data, I would switch
> those to an ioctl() call and use read()/write() for your bulk data transfer.
> Even if you elect to use mmap() the way you have described, you
> will still need to implement some ioctl() calls to allow your userspace
> program to trigger the appropriate dma_sync_*() API calls on the
> memory when communicating with the hardware.
>
> Ideally, for streaming large amounts of data out of a device, you
> would call read() from the device onto a large multiple-page buffer
> in your userspace program.  The userspace addresses get passed
> down into your kernel driver, and you use the appropriate APIs to
> get the page structs for those addresses and directly DMA-map
> those pages.
>
> That means that in the best case your device will read directly into
> user memory.  If the user memory is not accessible to your device
> then the kernel will use the "swiotlb" driver to bounce-buffer it for
> you.
>
> Cheers,
> Kyle Moffett
>
#include <stdint.h>
#include <fcntl.h>
#include <unistd.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <sys/stat.h>
#include <sys/mman.h>
#include "htex_fcntl.h"
#include <string.h>

int htex_handle = 0;

int molen_htx_init(void)
{
int handle = 0;
if(htex_handle)
return 0;
handle = open("/dev/htex",O_RDWR);
if(handle < 0)
{
perror("molen_htx_init : open");
return 1;
}
htex_handle = handle;
return 0;
}

void molen_write(unsigned long arg, unsigned long index)
{
lseek(htex_handle, index*sizeof(unsigned long), SEEK_SET);
write(htex_handle, (void *)&arg, sizeof(unsigned long));
}

unsigned long molen_read(unsigned long index)
{
unsigned long arg;
lseek(htex_handle, index*sizeof(unsigned long), SEEK_SET);
read(htex_handle, (void *)&arg, sizeof(unsigned long));
return arg;
}

void molen_execute(void)
{
ioctl(htex_handle, HTEX_IOEXECUTE);
}

void molen_set(const char* filename)
{
int fd = open(filename, O_RDWR);
int length = 0;
char *arg, *bs_map;
struct stat buf;

if (fd < 0)
{
perror("molen_set : Filename Open");
return;
}
fstat(fd, &buf);
length = buf.st_size;
if(length == 0)
{
fprintf(stderr, "\nmolen_set : Empty input file\n");
close(fd); //do not leak the descriptor on the early return
return;
}

//Map bitstream file to memory to read it in via page faults(faster than explicit file I/O)
bs_map = mmap(NULL, length, PROT_WRITE | PROT_READ, MAP_PRIVATE, fd, 0);
printf("molen_set : Bitstream mapped to = %p\n", bs_map);
printf("molen_set : Bitstream size = %d\n", length);

//Get user space access to empty kernel memory of length bytes
arg = mmap(NULL, length, PROT_WRITE | PROT_READ, MAP_PRIVATE, htex_handle, 0);
printf("molen_set : Kernel buffer user space address = %p\n", arg);

//Copy bs data to kernel space buffer : this is when the page faults actually occur ?
memset(arg, 0, length);
memcpy(arg, bs_map, length);


printf("molen_set : ioctl called\n");
ioctl(htex_handle, HTEX_IOSET, arg);
printf("molen_set : ioctl returned\n");

munmap(arg, length);
munmap(bs_map, length);
close(fd); //the mappings are gone, so the file descriptor can go too
}

char* molen_elfset(char* start)
{
printf("molen_elfset : Bitstream = %p\n", start);


printf("molen_elfset : ioctl called\n");
ioctl(htex_handle, HTEX_IOSET, start);
printf("molen_elfset : ioctl returned\n");
//ioctl(htex_handle, HTEX_IORESET);
//printf("molen_elfset : ioctl reset issued\n");
start = NULL;

return start;

}

void molen_reset(void)
{
ioctl(htex_handle, HTEX_IORESET);
}

void molen_close(void)
{
close(htex_handle);
htex_handle = 0;
}


#include <linux/init.h>
#include <linux/module.h>
#include <linux/pci.h>
//#include <linux/ioport.h>
//#include <asm/io.h>
#include <linux/interrupt.h>
//#include <linux/cdev.h>
//#include <linux/types.h>
//#include <linux/fs.h>
//#include <asm/uaccess.h>
//#include <linux/slab.h>
//#include <linux/ioctl.h>
//#include <linux/wait.h>
//#include <linux/mm.h>
#include <linux/pagemap.h>
//#include <asm/atomic.h>
#include <linux/rwsem.h>
#include "htex_fcntl.h"
#include "htex_driver.h"


//MODULE_LICENSE("GPLv2");

#define DEBUG

#ifdef DEBUG
#define DEBUG_MSG(...) printk( KERN_INFO "htex: " __VA_ARGS__)
#else
#define DEBUG_MSG(a,...)
#endif

#define ERROR_MSG(...) printk( KERN_WARNING "htex: error, " __VA_ARGS__)

static wait_queue_head_t wait_queue;


static atomic_t available = ATOMIC_INIT(1);

static struct pci_device_id ids[] ={
{ PCI_DEVICE(0x0007,0x0009)},
{0,},
};

static struct pci_driver pci_driver = {
.name = "htex_driver",
.id_table = ids,
.probe = probe,
.remove = remove,
};

static struct file_operations htex_fops = {
.owner = THIS_MODULE,
.llseek = htex_llseek,
.read = htex_read,
.write = htex_write,
.open = htex_open,
.release = htex_release,
.ioctl = htex_ioctl,
.mmap = htex_mmap
};

unsigned long j4,j2, j3;

MODULE_DEVICE_TABLE(pci, ids);

/**
register htex_driver with kernel
*/
static int __init test_init(void)
{
DEBUG_MSG("init\n");
//register pci driver
return pci_register_driver(&pci_driver);
}

/**
* unregister pci driver from kernel
*/
static void __exit test_exit(void)
{
DEBUG_MSG("exit\n");
//unregister driver
pci_unregister_driver(&pci_driver);
}

/**
* handle interrupt from device
*/
irqreturn_t interrupt_handler( int irq, void *dev_id)
{
struct htex_dev_t *htex_dev = (struct htex_dev_t *) dev_id;
//long arg = ((long *)htex_dev->bar2->bar)[1];
int status = htex_dev->bar2->bar[0];
//DEBUG_MSG("addr = %lx\n", arg);

htex_dev->irq_count++;
rdtscll(j4);
//htex_dev->wq_count++;
//in the future maybe this will be needed.
DEBUG_MSG("Interrupt received count = %d.\n", htex_dev->irq_count);
// wake_up_interruptible(&wait_queue);

if(status == 1)
{
DEBUG_MSG("CCU execution done\n");
}
else if (status == 2)
DEBUG_MSG("Address interrupt\n");
else if(status == 4)
DEBUG_MSG("SET interrupt\n");
else
DEBUG_MSG("unknown interrupt %d\n", status);
schedule_work(&htex_dev->ws);

return IRQ_HANDLED;
}

void print_tlb_entry(int index, struct htex_dev_t *htex_dev)
{
unsigned int *addr = (unsigned int *)htex_dev->bar4->bar;
DEBUG_MSG("TLB Entry = %8x%8x%8x\n", ioread32(addr + index*4), ioread32(addr + index*4+1), ioread32(addr + index*4+2));
}

void htex_interrupt_tasklet(struct work_struct *work)
{
struct htex_dev_t *htex_dev = container_of(work, struct htex_dev_t, ws);
int status = 0;
status = htex_dev->bar2->bar[0];
//signal that interrupt is handled
rdtscll(j3);
iowrite32(HTEX_IRQ_HANDLED_C, htex_dev->bar2->bar);
if(status == 1)
{
DEBUG_MSG("Execution done\n");
htex_dev->done = 1;
wake_up_interruptible(&wait_queue);
}
else if(status == 2)
{
long arg = ((long *)htex_dev->bar2->bar)[1];
DEBUG_MSG("Address translation request\n");
DEBUG_MSG("Address = %lx\n", arg);
if (!translate_address(arg, htex_dev))
{
DEBUG_MSG("Address Translation of the address %lx has failed !!\n", arg);
return;
}
else
{
//do this for now until the tlb is fixed
//iowrite32((unsigned int)htex_dev->entries[virt_to_index(arg)].hw_addr, ((long *)htex_dev->bar2->bar) + 1);
//DEBUG_MSG("xxxx hw addr = %lx\n", (unsigned long)htex_dev->entries[virt_to_index(arg)].hw_addr);
}
}
else if(status == 4)
{
DEBUG_MSG("Configuration Complete\n");
htex_dev->set_done = 1;
//wake_up_interruptible(&wait_queue);
}
else
{
DEBUG_MSG("unknown interrupt: %d\n", status);
}
rdtscll(j2);
DEBUG_MSG("%ld cycles %ld cycles\n", j3-j4, j2-j3);
}

/**
* request memory region bar and remaps the region.
* returns 0 on error or the address of a struct containing information on the remapped region.
*/
bar_t *get_bar(struct pci_dev *dev, int bar)
{
unsigned long int begin = (unsigned long int)pci_resource_start(dev, bar);
unsigned long int size = (unsigned long int)pci_resource_len(dev, bar);
char *bar_map = 0;
struct resource *io_mem;
bar_t *bar_st = kzalloc(sizeof(bar_t), GFP_KERNEL);
if(bar_st == 0)
{
ERROR_MSG("could not allocate bar struct\n");
goto err_alloc;
}
io_mem = request_mem_region(begin, size, "htex");
if (io_mem == 0)
{
ERROR_MSG("bar %d unavailable\n", bar);
goto err_req;
}
//remap the bar address space so it can be accessed
bar_map = (char *)ioremap_nocache(begin, size);
if(bar_map == 0)
{
ERROR_MSG("could not map bar %d\n", bar);
goto err_map;
}
bar_st->begin = begin;
bar_st->size = size;
bar_st->bar = bar_map;
return bar_st;

err_map:
release_mem_region(begin, size);
err_req:
kfree(bar_st);
err_alloc:
return 0;

}

/**
*unmaps the memory region mapped in bar and frees the allocated memory of the bar struct.
*/
void unget_bar(bar_t *bar)
{
iounmap(bar->bar);
release_mem_region(bar->begin, bar->size);
kfree(bar);
}

/**
* this function initializes the pci device and makes it ready to be used.
* it maps the bar address space and allocates a dma buffer that the device can access.
*/
int __devinit probe(struct pci_dev *dev, const struct pci_device_id *id)
{
int result;
struct htex_dev_t *htex_dev;


DEBUG_MSG("probe\n");

init_waitqueue_head(&wait_queue);

//allocate the structure to store driver data
htex_dev = kzalloc(sizeof (struct htex_dev_t), GFP_KERNEL);
if(!htex_dev)
goto err_out;

//since kzalloc is used everything is already set to 0
htex_dev->irq_count = 0;
htex_dev->dev = dev;
htex_dev->entries = 0;
htex_dev->tlb_size = 0;
//htex_dev->wq_count = 0;

result = pci_enable_device(dev);
if(result != 0)
goto err_enable_device;

htex_dev->bar0 = get_bar(dev, 0);
if(htex_dev->bar0 == 0)
{
goto err_get_bar0;
}

htex_dev->bar2 = get_bar(dev, 2);
if(htex_dev->bar2 == 0)
{
goto err_get_bar2;
}

htex_dev->bar4 = get_bar(dev, 4);
if(htex_dev->bar4 == 0)
{
goto err_get_bar4;
}
htex_dev->tlb_size = htex_dev->bar4->size/16;

//create irq number for htx device
htex_dev->irq = ht_create_irq(dev, 0);
if (htex_dev->irq < 0)
{
ERROR_MSG("create irq\n");
goto err_create_irq;
}
//register isr for irq number
result = request_irq(htex_dev->irq, interrupt_handler, 0, "htex", htex_dev);
if (result)
{
ERROR_MSG("request irq\n");
goto err_request_irq;
}

//checks and sets device addressable map
if (!pci_set_consistent_dma_mask(dev, DMA_64BIT_MASK))
{
DEBUG_MSG("using 64bit DMA addressing\n");
}
else if (!pci_set_consistent_dma_mask(dev, DMA_32BIT_MASK))
{
DEBUG_MSG("using 32bit DMA addressing\n");
}
else
{
ERROR_MSG("no suitable DMA available\n");
goto err_set_consistent;
}


result = alloc_chrdev_region(&htex_dev->devno, 0, 1, "htex");
if(result < 0)
{
ERROR_MSG("cannot get major nr\n");
goto err_get_major;
}
//initialize fields in structure
cdev_init(&htex_dev->cdev, &htex_fops);
htex_dev->cdev.owner = THIS_MODULE;

//store the structure so it can be accessed later
pci_set_drvdata(dev, htex_dev);

INIT_WORK(&htex_dev->ws, htex_interrupt_tasklet);
clear_tlb(htex_dev);
//finally add char device so it can be used
result = cdev_add(&htex_dev->cdev, htex_dev->devno, 1);
if (result)
{
ERROR_MSG("adding char device\n");
goto err_cdev_add;
}
return 0;
//if there was an error undo all the steps that had already completed before returning
err_cdev_add:
unregister_chrdev_region(htex_dev->devno, 1);

err_get_major:

err_set_consistent:
free_irq(htex_dev->irq, htex_dev); //dev_id must match the one passed to request_irq()

err_request_irq:
ht_destroy_irq(htex_dev->irq);

err_create_irq:
unget_bar(htex_dev->bar4);

err_get_bar4:
unget_bar(htex_dev->bar2);
err_get_bar2:
unget_bar(htex_dev->bar0);
err_get_bar0:
pci_disable_device(dev);

err_enable_device:
kfree(htex_dev);

err_out:
return -ENODEV;
}

/**
* undo everything that was done in htex_probe
*/
void __devexit remove(struct pci_dev *dev)
{
struct htex_dev_t *htex_dev = pci_get_drvdata(dev);
DEBUG_MSG("remove\n");
cdev_del(&htex_dev->cdev);
unregister_chrdev_region(htex_dev->devno, 1);
unget_bar(htex_dev->bar4);
unget_bar(htex_dev->bar2);
unget_bar(htex_dev->bar0);
free_irq(htex_dev->irq, htex_dev);
ht_destroy_irq(htex_dev->irq);
pci_disable_device(dev);
kfree(htex_dev);
}

/**
* Handle mmap of the htex example device : added by abhi
*/
void htex_vma_open(struct vm_area_struct *vma)
{
printk(KERN_NOTICE "htex VMA open, virt %lx, phys %lx\n",
vma->vm_start, vma->vm_pgoff << PAGE_SHIFT);
}

void htex_vma_close(struct vm_area_struct *vma)
{
printk(KERN_NOTICE "htex VMA close.\n");
}

static struct vm_operations_struct htex_vm_ops = {
.open = htex_vma_open,
.close = htex_vma_close,
};

static int htex_mmap(struct file * file, struct vm_area_struct * vma)
{
DEBUG_MSG("htex_mmap called start:%lx, end:%lx, size:%lx, pfn:%lx, pa:%lx\n",
vma->vm_start, vma->vm_end, vma->vm_end - vma->vm_start,
vma->vm_pgoff, virt_to_phys(vma->vm_start));

struct page *pg = pfn_to_page(vma->vm_pgoff);

DEBUG_MSG("htex_mmap page: pg : %lx\n", page_address(pg));
vma->vm_flags |= VM_RESERVED;

if (remap_pfn_range(vma, vma->vm_start, vma->vm_pgoff,
vma->vm_end - vma->vm_start,
vma->vm_page_prot))
return -EAGAIN;

vma->vm_ops = &htex_vm_ops;
htex_vma_open(vma);

return 0;
}


/**
* Handle open of the htex example device.
*/
static int htex_open(struct inode *inode, struct file *filp)
{
int minor_nr;
//get struct with driver info
struct htex_dev_t *htex_dev = container_of(inode->i_cdev, struct htex_dev_t, cdev);
minor_nr = iminor(inode);

DEBUG_MSG("open\n");
if (minor_nr != 0)
{
ERROR_MSG("Note that only minor number 0 is valid!\n");
return -ENODEV;
}
//check if device is already open
if (! atomic_dec_and_test(&available))
{
atomic_inc(&available);
ERROR_MSG("device already opened\n");
return -EBUSY;
}
//store struct for open and close
filp->private_data = htex_dev;
htex_dev->tsk = current;
htex_dev->entries = kzalloc(htex_dev->tlb_size * sizeof(struct tlb_entry), GFP_KERNEL);
if(!htex_dev->entries)
{
ERROR_MSG("could not allocate pages array\n");
//give the device back so a later open() can succeed
atomic_inc(&available);
return -ENOMEM;
}
return 0;
}

/**
* Handle close of the htex example device.
*/
static int htex_release(struct inode *inode,struct file *filp)
{
int minor_nr = iminor(inode);
struct htex_dev_t *htex_dev = container_of(inode->i_cdev, struct htex_dev_t, cdev);
if (minor_nr!=0)
{
ERROR_MSG("only minor 0 is valid\n");
return -ENODEV;
}
kfree(htex_dev->entries);
DEBUG_MSG("close\n");
atomic_inc(&available);
return 0;
}

static loff_t htex_llseek(struct file *filp, loff_t off, int whence)
{
struct htex_dev_t *htex_dev = filp->private_data;
loff_t newpos;

switch(whence)
{
case 0:
newpos = off;
break;

case 1:
newpos = filp->f_pos + off;
break;

case 2:
newpos = htex_dev->bar0->size + off;
break;

default:
return -EINVAL;
}
if (newpos < 0) return -EINVAL;
if (newpos >= htex_dev->bar0->size) return -EINVAL;
filp->f_pos = newpos;
return newpos;
}

/**
* Read reads data from the bar0 address space of the htex device
*/
static ssize_t htex_read(struct file *filp, char *buf, size_t count, loff_t *f_pos)
{
struct htex_dev_t *htex_dev = filp->private_data;
ssize_t retval = 0;
DEBUG_MSG("read\n");
//if f_pos is greater than the size of buffer we cannot read from it
if (*f_pos >= htex_dev->bar0->size)
goto out;
//make sure that if we read count bytes we dont go over the size of the buffer
if(*f_pos + count > htex_dev->bar0->size)
count = htex_dev->bar0->size - *f_pos;
//copy data over to user and check for error
if(copy_to_user(buf, htex_dev->bar0->bar + (long)*f_pos, count))
{
retval = -EFAULT;
ERROR_MSG("read\n");
goto out;
}
//update f_pos and return number of bytes read
*f_pos += count;
retval = count;
out:
return retval;
}

/**
* Write writes to the bar0 address space of the htex device at offset f_pos
*/
static ssize_t htex_write(struct file *filp, const char *buf, size_t count, loff_t *f_pos)
{
struct htex_dev_t *htex_dev = filp->private_data;
ssize_t retval = 0;
DEBUG_MSG("write\n");
//if f_pos is greater than the size of buffer we cannot write to it
if (*f_pos >= htex_dev->bar0->size)
goto out;
//make sure that if we write count bytes we dont go over the size of the buffer
if(*f_pos + count > htex_dev->bar0->size)
count = htex_dev->bar0->size - *f_pos;
//copy data over from user and check for error
if(copy_from_user(htex_dev->bar0->bar+(long)*f_pos, buf, count))
{
retval = -EFAULT;
ERROR_MSG("write\n");
goto out;
}
//update f_pos and return number of bytes written
*f_pos += count;
retval = count;
out:
return retval;
}

/**
* unmap and release the page stored at index in the array of tlb entries
*/
void release_page(struct htex_dev_t *htex_dev, unsigned int index)
{
pci_unmap_page(htex_dev->dev, htex_dev->entries[index].hw_addr, PAGE_SIZE, DMA_BIDIRECTIONAL);
//SetPageDirty(htex_dev->entries[index].page);
page_cache_release(htex_dev->entries[index].page);
htex_dev->entries[index].page = 0;
//set valid flag to 0 on device
iowrite32(0, ((unsigned int *)htex_dev->bar4->bar)+index*4 + 2);
}

void release_all_pages(struct htex_dev_t *htex_dev)
{
int i = 0;
DEBUG_MSG("Released all pages\n");
for(i = 0; i < htex_dev->tlb_size; i++)
{
if(htex_dev->entries[i].page)
{
release_page(htex_dev, i);
}
}
}

/**
* constructs the tlb entry (tag hwaddress and flags) and writes it to the device
*/
void update_htx_tlb(unsigned long int index, unsigned long int tag, unsigned long int hw_addr, struct htex_dev_t *htex_dev)
{
unsigned long int pfn;
unsigned int entry0 = 0, entry1 = 0, entry2 = 0;
//write tlb entry to htex_device
pfn = (hw_addr >> PAGE_SHIFT) & 0xFFFFFFFF;

entry0 = (unsigned int)(pfn & 0x00000000ffffffff);
entry1 = (unsigned int)((pfn | (tag << 40)) >> 32);
entry2 = (unsigned int)(tag >> 24) | ((long)1 << 31);

iowrite32(entry0, ((unsigned int *)htex_dev->bar4->bar)+index*4);
iowrite32(entry1, ((unsigned int *)htex_dev->bar4->bar)+index*4 + 1);
iowrite32(entry2, ((unsigned int *)htex_dev->bar4->bar)+index*4 + 2);
return;
}

/**
* write 0 to every tlb entry
*/
void clear_tlb(struct htex_dev_t *htex_dev)
{
int i = 0;
for(i = 0; i < htex_dev->tlb_size; i++)
{
iowrite32(0, ((unsigned int *)htex_dev->bar4->bar)+i*4);
iowrite32(0, ((unsigned int *)htex_dev->bar4->bar)+i*4 + 1);
iowrite32(0, ((unsigned int *)htex_dev->bar4->bar)+i*4 + 2);
}
}
void htx_dump_tlb(struct htex_dev_t *htex_dev)
{
int i;
for (i = 0; i < htex_dev->bar4->size/16; i++)
{
print_tlb_entry(i, htex_dev);
}
}

/**
*get the page belonging to address, lock it in memory and get the bus address of the page
*return 0 on failure and 1 on success
*/
int translate_address(unsigned long int virt_addr, struct htex_dev_t *htex_dev)
{
int result;
unsigned long int translated;
unsigned long int index = 0;
unsigned long int tag = 0;
struct page *page;

index = virt_to_index(virt_addr);
tag = virt_to_tag(virt_addr);

DEBUG_MSG("translate_address: virt_addr = %lx, index=%lx, tag=%lx\n", virt_addr, index, tag);

//check if index already has a valid entry and if so
//release this entry before replacing it
if(htex_dev->entries[index].page)
release_page(htex_dev, index);


//init_rwsem(&sem);
if(down_read_trylock(&htex_dev->tsk->mm->mmap_sem))
DEBUG_MSG("translate_address: lock was granted!\n");
else{
DEBUG_MSG("translate_address: A Lock was not granted, skipping address translation\n");
//the lock is not held here, so do not fall through to up_read(); report failure
return 0;
}

//get page
/* result = get_user_pages(htex_dev->tsk, htex_dev->tsk->mm, virt_addr, 1, 0, 0, &page, NULL);
DEBUG_MSG("translate_address: result = %d\n", result);
if (result <= 0)
{
ERROR_MSG("translate_address: Unable to get page\n");
//htx_dump_tlb(htex_dev);
return 0;
}*/

//translated = virt_to_phys((volatile void *)virt_addr);
page = vmalloc_to_page((void*)virt_addr);
//get bus address of page
translated = pci_map_page(htex_dev->dev, page, 0, PAGE_SIZE, DMA_BIDIRECTIONAL);

DEBUG_MSG("translate_address: Translated Address: %lx\n", translated);

update_htx_tlb(index, tag, translated, htex_dev);
//update entry in array
htex_dev->entries[index].page = page;
htex_dev->entries[index].hw_addr = translated;

cleanup:
up_read(&htex_dev->tsk->mm->mmap_sem);
return 1;
}

/**
*Translates the addresses for count pages starting at start and writes them to the device
*returns the number of pages mapped and translated
*/
int translate_range(unsigned long int start, unsigned int count, struct htex_dev_t *htex_dev)
{
int i, map_count = 0;
unsigned long int virt_addr = start;

for(i=0; i < count; i++)
{
if(!translate_address(virt_addr, htex_dev))
{
//try to translate next address in range
virt_addr += PAGE_SIZE;
continue;
}
map_count++;
virt_addr += PAGE_SIZE;
}
return map_count;
}

void prefill_tlb(struct htex_dev_t *htex_dev)
{
int heap_count, stack_count;
unsigned long start_heap, start_stack, end_heap;
//struct vm_area_struct *vma_stack,*vma_heap;
struct mm_struct *mm = htex_dev->tsk->mm;
DEBUG_MSG("Prefilling tlb with stack and heap addresses\n");
/*vma_stack = find_vma(mm,mm->start_stack);
vma_heap = find_vma(mm,mm->start_brk);*/
start_heap = mm->start_brk;
end_heap = mm->brk;
start_stack = mm->start_stack;
stack_count = mm->stack_vm;

//prefill stack
start_stack = start_stack & PAGE_MASK;
start_stack = start_stack - ((stack_count-1)*PAGE_SIZE);
translate_range(start_stack, stack_count, htex_dev);

//prefill heap
heap_count = (end_heap - start_heap) >> PAGE_SHIFT;
translate_range(start_heap, heap_count, htex_dev);
}


/**
* compare the tlb before and after execution
*/
void compare_tlb(struct htex_dev_t *htex_dev)
{
int i = 0;
struct tlb_entry *entries = htex_dev->entries;
htex_dev->entries = kzalloc(htex_dev->tlb_size * sizeof(struct tlb_entry), GFP_KERNEL);
prefill_tlb(htex_dev);
for(i = 0; i < htex_dev->tlb_size; i++)
{
if(entries[i].page && (entries[i].page != htex_dev->entries[i].page || entries[i].hw_addr != htex_dev->entries[i].hw_addr))
{
ERROR_MSG("tlb not the same\t");
ERROR_MSG("index = %d\n", i);
ERROR_MSG("hw addr1 = %lx\n", (unsigned long)entries[i].hw_addr);
ERROR_MSG("hw addr2 = %lx\n", (unsigned long)htex_dev->entries[i].hw_addr);
release_page(htex_dev, i);
}
}
kfree(htex_dev->entries);
htex_dev->entries = entries;
}

/**
* this ioctl is used to issue the execute, set and optionally reset instructions
*/
static int htex_ioctl(struct inode *inode, struct file *filp, unsigned int cmd, unsigned long arg)
{ long status;
struct htex_dev_t *htex_dev = filp->private_data;
int result;
u32 temp;

DEBUG_MSG("HTEX_IORESET : %d\n",HTEX_IORESET);
DEBUG_MSG("HTEX_IOEXECUTE : %d\n",HTEX_IOEXECUTE);
DEBUG_MSG("HTEX_IOSET : %ld\n",HTEX_IOSET);
DEBUG_MSG("IOCTL issued : %d\n",cmd);
if((_IOC_TYPE(cmd) != HTEX_IOC_MAGIC) || (_IOC_NR(cmd) > HTEX_IOC_MAXNR))

return -ENOTTY;

switch (cmd)
{
case HTEX_IORESET:
iowrite32(HTEX_RESET_C, htex_dev->bar2->bar);
// iowrite32(HTEX_IRQ_HANDLED_C, htex_dev->bar2->bar);
break;
case HTEX_IOEXECUTE:
//reset the tlb and ccu
iowrite32(HTEX_RESET_C, htex_dev->bar2->bar);
//prefill_tlb(htex_dev);

DEBUG_MSG("Waiting for reconfiguration to complete.\n");
result = wait_event_interruptible(wait_queue, htex_dev->set_done != 0); //wait till prefetching over
//release_all_pages(htex_dev);

DEBUG_MSG("Starting CCU EXECUTION\n");
iowrite32(HTEX_EXECUTE_C, htex_dev->bar2->bar);
htex_dev->done = 0;
result = wait_event_interruptible(wait_queue, htex_dev->done != 0);
DEBUG_MSG("TLB state %d\n", ioread32(htex_dev->bar2->bar+16));

status = ioread32(htex_dev->bar2->bar+0);
DEBUG_MSG("IOEXECUTE DD 0 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+4);
DEBUG_MSG("IOEXECUTE DD 1 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+8);
DEBUG_MSG("IOEXECUTE DD 2 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+12);
DEBUG_MSG("IOEXECUTE DD 3 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+16);
DEBUG_MSG("IOEXECUTE DD 4 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+20);
DEBUG_MSG("IOEXECUTE DD 5 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+24);
DEBUG_MSG("IOEXECUTE DD 6 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+28);
DEBUG_MSG("IOEXECUTE DD 7 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+32);
DEBUG_MSG("IOEXECUTE DD 8 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+36);
DEBUG_MSG("IOEXECUTE DD 9 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+40);
DEBUG_MSG("IOEXECUTE DD 10 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+44);
DEBUG_MSG("IOEXECUTE DD 11 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+48);
DEBUG_MSG("IOEXECUTE DD 12 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+52);
DEBUG_MSG("IOEXECUTE DD 13 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+56);
DEBUG_MSG("IOEXECUTE DD 14 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+60);
DEBUG_MSG("IOEXECUTE DD 15 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+64);

//htx_dump_tlb(htex_dev);
release_all_pages(htex_dev);
htex_dev->done = 0;
if (result != 0)
return -ERESTARTSYS;
break;
case HTEX_IOSET: //Modified for prefetch
//reset the tlb and ccu
iowrite32(HTEX_RESET_C, htex_dev->bar2->bar);
//iowrite32(arg, htex_dev->bar2->bar+8);
*(((u64 *)htex_dev->bar2->bar) + 1) = arg;
//prefill_tlb(htex_dev);
temp = ioread32(htex_dev->bar2->bar+16);

//Verify bs data
DEBUG_MSG("B.S. data = %lx\n",*((unsigned long *)arg));
DEBUG_MSG("B.S. data = %lx\n",*((unsigned long *)arg+8));

DEBUG_MSG("icap before state = %d, busy = %d\n",(temp >> 3) & 0x7, (temp >> 6)&0x1);
DEBUG_MSG("icap before data = %x\n",ioread32(htex_dev->bar2->bar+24));

DEBUG_MSG("IN SET\n");
iowrite32(HTEX_SET_C, htex_dev->bar2->bar);

//result = wait_event_interruptible(wait_queue, htex_dev->done != 0); //removed to allow c program to continue
temp = ioread32(htex_dev->bar2->bar+16);

DEBUG_MSG("icap after state = %d, busy = %d\n",(temp >> 3) & 0x7, (temp >> 6)&0x1);
DEBUG_MSG("icap after data = %x\n",ioread32(htex_dev->bar2->bar+24));

status = ioread32(htex_dev->bar2->bar+0);
DEBUG_MSG("IOSET DD 0 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+4);
DEBUG_MSG("IOSET DD 1 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+8);
DEBUG_MSG("IOSET DD 2 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+12);
DEBUG_MSG("IOSET DD 3 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+16);
DEBUG_MSG("IOSET DD 4 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+20);
DEBUG_MSG("IOSET DD 5 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+24);
DEBUG_MSG("IOSET DD 6 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+28);
DEBUG_MSG("IOSET DD 7 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+32);
DEBUG_MSG("IOSET DD 8 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+36);
DEBUG_MSG("IOSET DD 9 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+40);
DEBUG_MSG("IOSET DD 10 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+44);
DEBUG_MSG("IOSET DD 11 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+48);
DEBUG_MSG("IOSET DD 12 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+52);
DEBUG_MSG("IOSET DD 13 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+56);
DEBUG_MSG("IOSET DD 14 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+60);
DEBUG_MSG("IOSET DD 15 = %.8lx\n", status);
status = ioread32(htex_dev->bar2->bar+64);

//release_all_pages(htex_dev); //moved to execute case, so configuration pages are released
htex_dev->set_done = 0;
/*if (result != 0)
return -ERESTARTSYS;*/
break;
default:
return -ENOTTY;
}
return 0;
}

module_init(test_init);
module_exit(test_exit);