Re: [RFC PATCH V2 1/4] pci: APM X-Gene PCIe controller driver

From: Tanmay Inamdar
Date: Thu Jan 16 2014 - 20:10:58 EST


On Wed, Jan 15, 2014 at 4:39 AM, Arnd Bergmann <arnd@xxxxxxxx> wrote:
> On Wednesday 15 January 2014, Tanmay Inamdar wrote:
>> This patch adds the AppliedMicro X-Gene SOC PCIe controller driver.
>> X-Gene PCIe controller supports maxmum upto 8 lanes and GEN3 speed.
>> X-Gene has maximum 5 PCIe ports supported.
>>
>> Signed-off-by: Tanmay Inamdar <tinamdar@xxxxxxx>
>
> This already looks much better than the first version, but I have a more
> comments. Most importantly, it would help to know how the root ports
> are structured. Is this a standard root complex and multiple ports,
> multiple root complexes with one port each, or a nonstandard organization
> that is a mix of those two models?

This is multiple root complexes with one port each.

>
>> +
>> +/* When the address bit [17:16] is 2'b01, the Configuration access will be
>> + * treated as Type 1 and it will be forwarded to external PCIe device.
>> + */
>> +static void __iomem *xgene_pcie_get_cfg_base(struct pci_bus *bus)
>> +{
>> + struct xgene_pcie_port *port = xgene_pcie_bus_to_port(bus);
>> + u64 addr = (u64)port->cfg_base;
>> +
>> + if (bus->number >= (port->first_busno + 1))
>> + addr |= AXI_EP_CFG_ACCESS;
>> +
>> + return (void *)addr;
>> +}
>
> Wrong type, it should be 'void __iomem *'. Also you can't assume that
> bit operations work on virtual __iomem addresses, so it should be better
> to just add a constant integer to the pointer, which is a valid
> operation.

ok.

>
> I also wonder why you need to do this at all. If there isn't a global
> config space for all ports, but rather a separate type0/type1 config
> cycle based on the bus number, I see that as an indication that the
> ports are in fact separate domains and should each start with bus 0.

It is not a standard ECAM layout. We also have a separate RTDID
register as well to program bus, device, function. While accessing EP
config space, we have to set the bit 17:16 as 2b'01. The same config
space address is utilized for enabling a customized nonstandard PCIe
DMA feature. The bits are defined to differentiate the access purpose.
The feature is not supported in this driver yet.

Secondly I don't think it will matter if each port starts with bus 0.
As long as we set the correct BDF in RTDID and set correct bits in
config address, the config reads and writes would work. Right?

>
>> +static void xgene_pcie_setup_lanes(struct xgene_pcie_port *port)
>> +{
>> + void *csr_base = port->csr_base;
>> + u32 val;
>> +
>> + val = readl(csr_base + BRIDGE_8G_CFG_8);
>> + val = eq_pre_cursor_lane0_set(val, 0x7);
>> + val = eq_pre_cursor_lane1_set(val, 0x7);
>> + writel(val, csr_base + BRIDGE_8G_CFG_8);
>> +
>> + val = readl(csr_base + BRIDGE_8G_CFG_9);
>> + val = eq_pre_cursor_lane0_set(val, 0x7);
>> + val = eq_pre_cursor_lane1_set(val, 0x7);
>> + writel(val, csr_base + BRIDGE_8G_CFG_9);
>> +
>> + val = readl(csr_base + BRIDGE_8G_CFG_10);
>> + val = eq_pre_cursor_lane0_set(val, 0x7);
>> + val = eq_pre_cursor_lane1_set(val, 0x7);
>> + writel(val, csr_base + BRIDGE_8G_CFG_10);
>> +
>> + val = readl(csr_base + BRIDGE_8G_CFG_11);
>> + val = eq_pre_cursor_lane0_set(val, 0x7);
>> + val = eq_pre_cursor_lane1_set(val, 0x7);
>> + writel(val, csr_base + BRIDGE_8G_CFG_11);
>> +
>> + val = readl(csr_base + BRIDGE_8G_CFG_4);
>> + val = (val & ~0x30) | (1 << 4);
>> + writel(val, csr_base + BRIDGE_8G_CFG_4);
>> +}
>
> Please document what you are actually setting here. If the configuration
> of the lanes is always the same, why do you have to set it here. If not,
> why do you set constant values?

Good point. Let me check if these values should be constant or tune-able.

>
>> +static void xgene_pcie_setup_link(struct xgene_pcie_port *port)
>> +{
>> + void *csr_base = port->csr_base;
>> + u32 val;
>> +
>> + val = readl(csr_base + BRIDGE_CFG_14);
>> + val |= DIRECT_TO_8GTS_MASK;
>> + val |= SUPPORT_5GTS_MASK;
>> + val |= SUPPORT_8GTS_MASK;
>> + val |= DIRECT_TO_5GTS_MASK;
>> + writel(val, csr_base + BRIDGE_CFG_14);
>> +
>> + val = readl(csr_base + BRIDGE_CFG_14);
>> + val &= ~ADVT_INFINITE_CREDITS;
>> + writel(val, csr_base + BRIDGE_CFG_14);
>> +
>> + val = readl(csr_base + BRIDGE_8G_CFG_0);
>> + val |= (val & ~0xf) | 7;
>> + val |= (val & ~0xf00) | ((7 << 8) & 0xf00);
>> + writel(val, csr_base + BRIDGE_8G_CFG_0);
>> +
>> + val = readl(csr_base + BRIDGE_8G_CFG_0);
>> + val |= DWNSTRM_EQ_SKP_PHS_2_3;
>> + writel(val, csr_base + BRIDGE_8G_CFG_0);
>> +}
>
> Same here.
>
>> +static void xgene_pcie_program_core(void *csr_base)
>> +{
>> + u32 val;
>> +
>> + val = readl(csr_base + BRIDGE_CFG_0);
>> + val |= AER_OPTIONAL_ERROR_EN;
>> + writel(val, csr_base + BRIDGE_CFG_0);
>> + writel(0x0, csr_base + INTXSTATUSMASK);
>> + val = readl(csr_base + BRIDGE_CTRL_1);
>> + val = (val & ~0xffff) | XGENE_PCIE_DEV_CTRL;
>> + writel(val, csr_base + BRIDGE_CTRL_1);
>> +}
>
> 'program_core'?

Some of the PCIe core related misc configurations.

>
>> +static void xgene_pcie_poll_linkup(struct xgene_pcie_port *port, u32 *lanes)
>> +{
>> + void *csr_base = port->csr_base;
>> + u32 val32;
>> + u64 start_time, time;
>> +
>> + /*
>> + * A component enters the LTSSM Detect state within
>> + * 20ms of the end of fundamental core reset.
>> + */
>> + msleep(XGENE_LTSSM_DETECT_WAIT);
>> + port->link_up = 0;
>> + start_time = jiffies;
>> + do {
>> + val32 = readl(csr_base + PCIECORE_CTLANDSTATUS);
>> + if (val32 & LINK_UP_MASK) {
>> + port->link_up = 1;
>> + port->link_speed = PIPE_PHY_RATE_RD(val32);
>> + val32 = readl(csr_base + BRIDGE_STATUS_0);
>> + *lanes = val32 >> 26;
>> + }
>> + time = jiffies_to_msecs(jiffies - start_time);
>> + } while ((!port->link_up) || (time <= XGENE_LTSSM_L0_WAIT));
>> +}
>
> Maybe another msleep() in the loop? It seems weird to first do an
> unconditional sleep but then busy-wait for the result.

ok.

>
>> +static void xgene_pcie_setup_primary_bus(struct xgene_pcie_port *port,
>> + u32 first_busno, u32 last_busno)
>> +{
>> + u32 val;
>> + void *cfg_addr = port->cfg_base;
>> +
>> + val = readl(cfg_addr + PCI_PRIMARY_BUS);
>> + val &= ~PCI_PRIMARY_BUS_MASK;
>> + val |= (last_busno << 16) | ((first_busno + 1) << 8) | (first_busno);
>> + writel(val, cfg_addr + PCI_PRIMARY_BUS);
>> +}
>
> Please explain what you are doing here. As mentioned above, I would expect
> that each domain has visibility of all 255 buses. You shouldn't need any hacks
> where you try to artificially squeeze the ports into a single domain when
> they are separate in hardware.

ok. I will check and get back.

>
>> +/*
>> + * read configuration values from DTS
>> + */
>> +static int xgene_pcie_read_dts_config(struct xgene_pcie_port *port)
>
> The comment and function name don't seem to match what the function
> does. The main purpose of this function seems to be to ioremap
> the resources, which have nothing to with configuration.

ok.

>
>> +{
>> + struct device_node *np = port->node;
>> + struct resource csr_res;
>> + struct resource cfg_res;
>> +
>> + /* Get CSR space registers address */
>> + if (of_address_to_resource(np, 0, &csr_res))
>> + return -EINVAL;
>> +
>> + port->csr_base = devm_ioremap_nocache(port->dev, csr_res.start,
>> + resource_size(&csr_res));
>
> You can also use platform_get_resource() to access the resource
> that is already there, rather than creating another one.

ok.

>
>> +static void xgene_pcie_setup_ob_reg(struct xgene_pcie_port *port,
>> + u32 addr, u32 restype)
>> +{
>> + struct resource *res = NULL;
>> + void *base = port->csr_base + addr;
>> + resource_size_t size;
>> + u64 cpu_addr = 0;
>> + u64 pci_addr = 0;
>> + u64 mask = 0;
>> + u32 min_size = 0;
>
> A general note: don't initialize local variables to a bogus valus (e.g. 0)
> in their declaration. It prevents the compiler from warning about
> incorrect uses.

ok.

>
>> + u32 flag = EN_REG;
>
> This one on the other hand is ok, because you are actually going to
> use that value.
>
>> + switch (restype) {
>> + case IORESOURCE_MEM:
>> + res = &port->mem.res;
>> + pci_addr = port->mem.pci_addr;
>> + min_size = SZ_128M;
>> + break;
>> + case IORESOURCE_IO:
>> + res = &port->io.res;
>> + pci_addr = port->io.pci_addr;
>> + min_size = 128;
>> + flag |= OB_LO_IO;
>> + break;
>> + }
>
> I assume this works ok, but seems wrong in one detail: If the resource
> is marked IORESOURCE_IO, res->start is supposed to be in I/O space, not
> in memory space, which would make it the wrong number to program
> into the hardware registers.

Yes for using ioport resource. However we have decided to defer using
it since 'pci_ioremap_io' is not yet supported from arm64 side.

>From HW point of view, for memory mapped IO space, it is nothing but a
piece taken out of the ranges in address map for outbound accesses. So
while configuring registers from SOC side, it should take the CPU
address which is address from SOC address map. Right?

Later on we can have a separate io resource like 'realio' similar to
what pci-mvebu.c does.

>
>> +static int xgene_pcie_parse_map_ranges(struct xgene_pcie_port *port)
>> +{
>> + struct device_node *np = port->node;
>> + struct of_pci_range range;
>> + struct of_pci_range_parser parser;
>> + struct device *dev = port->dev;
>> +
>> + if (of_pci_range_parser_init(&parser, np)) {
>> + dev_err(dev, "missing ranges property\n");
>> + return -EINVAL;
>> + }
>> +
>> + /* Get the I/O, memory, config ranges from DT */
>
> The comment needs updating now that you don't read config space here any more.

ok.

>
>> +/* X-Gene PCIe support maximum 3 inbound memory regions
>> + * This function helps to select a region based on size of region
>> + */
>> +static int xgene_pcie_select_ib_reg(u64 size)
>> +{
>> + static u8 ib_reg_mask;
>> +
>> + if ((size > 4) && (size < SZ_16M) && !(ib_reg_mask & (1 << 1))) {
>> + ib_reg_mask |= (1 << 1);
>> + return 1;
>> + }
>> +
>> + if ((size > SZ_1K) && (size < SZ_1T) && !(ib_reg_mask & (1 << 0))) {
>> + ib_reg_mask |= (1 << 0);
>> + return 0;
>> + }
>> +
>> + if ((size > SZ_1M) && (size < SZ_1T) && !(ib_reg_mask & (1 << 2))) {
>> + ib_reg_mask |= (1 << 2);
>> + return 2;
>> + }
>> + return -EINVAL;
>> +}
>
> Shouldn't the ib_reg_mask variable be per host bridge? Static variables
> are dangerous if you ever get multiple instances of the hardware in one
> system.

Yes. You are right. Thanks.

>
>> +static int xgene_pcie_parse_map_dma_ranges(struct xgene_pcie_port *port)
>> +{
>> + struct device_node *np = port->node;
>> + struct of_pci_range range;
>> + struct of_pci_range_parser parser;
>> + struct device *dev = port->dev;
>> + int region;
>> +
>> + if (pci_dma_range_parser_init(&parser, np)) {
>> + dev_err(dev, "missing dma-ranges property\n");
>> + return -EINVAL;
>> + }
>> +
>> + /* Get the dma-ranges from DT */
>> + for_each_of_pci_range(&parser, &range) {
>> + u64 restype = range.flags & IORESOURCE_TYPE_BITS;
>> + u64 end = range.cpu_addr + range.size - 1;
>> + dev_dbg(port->dev, "0x%08x 0x%016llx..0x%016llx -> 0x%016llx\n",
>> + range.flags, range.cpu_addr, end, range.pci_addr);
>> + region = xgene_pcie_select_ib_reg(range.size);
>> + if (region == -EINVAL) {
>> + dev_warn(port->dev, "invalid pcie dma-range config\n");
>> + continue;
>> + }
>> + xgene_pcie_setup_ib_reg(port, &range, restype, region);
>> + }
>> + return 0;
>> +}
>
> I guess is could even be a local variable in this function, which you pass
> by reference.
>
>> +
>> +static int xgene_pcie_setup(int nr, struct pci_sys_data *sys)
>> +{
>> + struct xgene_pcie_port *pp = xgene_pcie_sys_to_port(sys);
>> +
>> + if (pp == NULL)
>> + return 0;
>> +
>> + sys->mem_offset = pp->mem.res.start - pp->mem.pci_addr;
>> + pci_add_resource_offset(&sys->resources, &pp->mem.res,
>> + sys->mem_offset);
>> + return 1;
>> +}
>
> Please follow the regular error handling conventions, which are to
> pass a negative errno value on error and zero on success.

ok.

>
> Also, what would be a reason for the port to be zero here? If
> it's something that can't happen in practice, don't try to handle
> it gracefully. You can use BUG_ON() for fatal conditions that
> are supposed to be impossible to reach.

This function is a hook to upper layer. We register nr_controllers in
hw_pci structure as MAX_PORTS we support. It can happen that number of
ports actually enabled from device tree are less than the number in
nr_controllers.

>
>> +static struct pci_bus __init *xgene_pcie_scan_bus(int nr,
>> + struct pci_sys_data *sys)
>> +{
>> + struct xgene_pcie_port *pp = xgene_pcie_sys_to_port(sys);
>> +
>> + pp->first_busno = sys->busnr;
>> + xgene_pcie_setup_primary_bus(pp, sys->busnr, 0xff);
>> + return pci_scan_root_bus(NULL, sys->busnr, &xgene_pcie_ops,
>> + sys, &sys->resources);
>> +}
>
> You have a number of helper functions that don't seem to gain
> much at all. Just move the call to pci_scan_root_bus() into
> xgene_pcie_setup_primary_bus() here, and use that as the .scan
> callback.
>

I can do that if I can get rid of setup_primary_bus api. Let me check.

>
>> + if (!port->link_up)
>> + dev_info(port->dev, "(rc) link down\n");
>> + else
>> + dev_info(port->dev, "(rc) x%d gen-%d link up\n",
>> + lanes, port->link_speed + 1);
>> +#ifdef CONFIG_PCI_DOMAINS
>> + xgene_pcie_hw.domain++;
>> +#endif
>> + xgene_pcie_hw.private_data[index++] = port;
>> + platform_set_drvdata(pdev, port);
>> + return 0;
>> +}
>
> Do you have multiple domains or not? I don't see how it can work if you
> make the domain setup conditional on a kernel configuration option.
> If you in fact have multiple domains, make sure in Kconfig that
> CONFIG_PCI_DOMAINS is enabled. Otherwise don't mess with the domain
> number...

It is enabled in Kconfig.

>
>> +static const struct of_device_id xgene_pcie_match_table[] __initconst = {
>> + {.compatible = "apm,xgene-pcie",},
>> + {},
>> +};
>
> Another general note: Your "compatible" strings are rather unspecific.
> Do you have a version number for this IP block? I suppose that it's related
> to one that has been used in other chips before, or will be used in future
> chips, if it's not actually licensed from some other company.

I will have to check this.

>
>> +static int __init xgene_pcie_init(void)
>> +{
>> + void *private;
>> + int ret;
>> +
>> + pr_info("X-Gene: PCIe driver\n");
>> +
>> + /* allocate private data to keep xgene_pcie_port information */
>> + private = kzalloc((XGENE_PCIE_MAX_PORTS * sizeof(void *)), GFP_KERNEL);
>
> This should not be done unconditionally: There is no point in printing
> a message or allocating memory if you don't actually run on a system
> with this device.

I am doing this here because I have one instance of hw_pci structure
with multiple pcie controllers. I can't do it from probe since it will
be called once per instance in device tree.

>
>> + if (private == NULL)
>> + return -ENOMEM;
>
> Style: if you are testing for an object, just write 'if (private)' or
> 'if (!private)', but don't compare against NULL.

ok.

>
>> + xgene_pcie_hw.private_data = private;
>> + ret = platform_driver_probe(&xgene_pcie_driver,
>> + xgene_pcie_probe_bridge);
>> + if (ret)
>> + return ret;
>> + pci_common_init(&xgene_pcie_hw);
>> + return 0;
>
> This seems wrong: You should not use platform_driver_probe() because
> that has issues with deferred probing.

I think 'platform_driver_probe' prevents the deferred probing.
'pci_common_init' needs to be called only once with current driver
structure. The probes for all pcie ports should be finished (ports
initialized) before 'pci_common_init' gets called.

>
> Arnd
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/