Skip to content

Commit

Permalink
Merge branch 'stmmac-optimizations'
Browse files Browse the repository at this point in the history
Alexandre TORGUE says:

====================
stmmac: enhance driver performances and update the version

According to Giuseppe, I send the v3 series.

This is a subset of patches to rework the driver in order to improve its
performances and make it more robust under stress conditions.

All patches have been ported on STi mainstream kernel branch and
tested on ARM STiH4xx platforms and newer ones.

This series also updates the driver version and prepares it
to include further development to support new chips.

In detail, these patches are:

o to rework and improve the internal DMA bus settings

  Fine tuning is mandatory on some platforms for both
  performance and stability issues.

o to rework and optimize the descriptor management.

  This will help a lot on performance side and preparing
  the inclusion on the GMAC4.x.

o to add a set of optimizations for both xmit and rx functions.

  These will help a lot on performance side and making the driver
  more robust in case of low memory conditions and under some
  stress test, performed for example on IP-STB.

Below some throughput figures obtained on some boxes before and after
the patches.

                       nuttcp (mbps)       iperf (Mbps)
------------------------------------------------------------------
                      tcp     udp          tcp      udp
                   tx   rx   tx  rx      tx   rx   tx  rx
                    ------------------------------------------
   old             680   800 480  506    760  800   600  700
   new             830   880 540  630    840  880   700   800

V2: - rx_copybreak is now managed by using ethtool.
V3: - improve comments on PCIe detailing that there are no regressions
    - rework some APIs to properly define some params as bool as expected
    - rework the formula to get the element inside the ring. Comparing V2,
	patches 4 and 13 have been merged because the same formula have been
	used. After this rework, no evident benefit has been noticed in terms
	of performances so the table above is still valid. Disassembling the
	code for SH4 and ARM, with the new formula just an instr is saved
	(depending on compiler flags) and this gives us not so relevanti gain,
	for example, on SH4 where some instr are executed in the same pipeline
	stage.
	Ring sizes are now fixed and maybe they can be reworked to be tuned
	w/o using stmmaceth= cmdline option. Indeed, nobody change these sizes
	and indeed the numbers selected by default respect the budget and
	avoid to pass invalid setup. These are the best driver default sizes
	for ring and chain.

====================
  • Loading branch information
David S. Miller committed Mar 2, 2016
2 parents fcb3f55 + 3796e44 commit d93e409
Show file tree
Hide file tree
Showing 20 changed files with 1,016 additions and 712 deletions.
54 changes: 37 additions & 17 deletions Documentation/devicetree/bindings/net/stmmac.txt
Original file line number Diff line number Diff line change
Expand Up @@ -17,7 +17,25 @@ Required properties:
The 1st cell is reset pre-delay in micro seconds.
The 2nd cell is reset pulse in micro seconds.
The 3rd cell is reset post-delay in micro seconds.

Optional properties:
- resets: Should contain a phandle to the STMMAC reset signal, if any
- reset-names: Should contain the reset signal name "stmmaceth", if a
reset phandle is given
- max-frame-size: See ethernet.txt file in the same directory
- clocks: If present, the first clock should be the GMAC main clock and
the second clock should be peripheral's register interface clock. Further
clocks may be specified in derived bindings.
- clock-names: One name for each entry in the clocks property, the
first one should be "stmmaceth" and the second one should be "pclk".
- clk_ptp_ref: this is the PTP reference clock; in case of the PTP is
available this clock is used for programming the Timestamp Addend Register.
If not passed then the system clock will be used and this is fine on some
platforms.
- tx-fifo-depth: See ethernet.txt file in the same directory
- rx-fifo-depth: See ethernet.txt file in the same directory
- snps,pbl Programmable Burst Length
- snps,aal Address-Aligned Beats
- snps,fixed-burst Program the DMA to use the fixed burst mode
- snps,mixed-burst Program the DMA to use the mixed burst mode
- snps,force_thresh_dma_mode Force DMA to use the threshold mode for
Expand All @@ -29,27 +47,28 @@ Required properties:
supported by this device instance
- snps,perfect-filter-entries: Number of perfect filter entries supported
by this device instance

Optional properties:
- resets: Should contain a phandle to the STMMAC reset signal, if any
- reset-names: Should contain the reset signal name "stmmaceth", if a
reset phandle is given
- max-frame-size: See ethernet.txt file in the same directory
- clocks: If present, the first clock should be the GMAC main clock
The optional second clock should be peripheral's register interface clock.
The third optional clock should be the ptp reference clock.
Further clocks may be specified in derived bindings.
- clock-names: One name for each entry in the clocks property.
The first one should be "stmmaceth".
The optional second one should be "pclk".
The optional third one should be "clk_ptp_ref".
- snps,burst_len: The AXI burst lenth value of the AXI BUS MODE register.
- tx-fifo-depth: See ethernet.txt file in the same directory
- rx-fifo-depth: See ethernet.txt file in the same directory
- AXI BUS Mode parameters: below the list of all the parameters to program the
AXI register inside the DMA module:
- snps,lpi_en: enable Low Power Interface
- snps,xit_frm: unlock on WoL
- snps,wr_osr_lmt: max write oustanding req. limit
- snps,rd_osr_lmt: max read oustanding req. limit
- snps,kbbe: do not cross 1KiB boundary.
- snps,axi_all: align address
- snps,blen: this is a vector of supported burst length.
- snps,fb: fixed-burst
- snps,mb: mixed-burst
- snps,rb: rebuild INCRx Burst
- mdio: with compatible = "snps,dwmac-mdio", create and register mdio bus.

Examples:

stmmac_axi_setup: stmmac-axi-config {
snps,wr_osr_lmt = <0xf>;
snps,rd_osr_lmt = <0xf>;
snps,blen = <256 128 64 32 0 0 0>;
};

gmac0: ethernet@e0800000 {
compatible = "st,spear600-gmac";
reg = <0xe0800000 0x8000>;
Expand All @@ -65,6 +84,7 @@ Examples:
tx-fifo-depth = <16384>;
clocks = <&clock>;
clock-names = "stmmaceth";
snps,axi-config = <&stmmac_axi_setup>;
mdio0 {
#address-cells = <1>;
#size-cells = <0>;
Expand Down
37 changes: 23 additions & 14 deletions drivers/net/ethernet/stmicro/stmmac/chain_mode.c
Original file line number Diff line number Diff line change
Expand Up @@ -31,8 +31,7 @@
static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
{
struct stmmac_priv *priv = (struct stmmac_priv *)p;
unsigned int txsize = priv->dma_tx_size;
unsigned int entry = priv->cur_tx % txsize;
unsigned int entry = priv->cur_tx;
struct dma_desc *desc = priv->dma_tx + entry;
unsigned int nopaged_len = skb_headlen(skb);
unsigned int bmax;
Expand All @@ -50,11 +49,14 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
if (dma_mapping_error(priv->device, desc->des2))
return -1;
priv->tx_skbuff_dma[entry].buf = desc->des2;
priv->hw->desc->prepare_tx_desc(desc, 1, bmax, csum, STMMAC_CHAIN_MODE);
priv->tx_skbuff_dma[entry].len = bmax;
/* do not close the descriptor and do not set own bit */
priv->hw->desc->prepare_tx_desc(desc, 1, bmax, csum, STMMAC_CHAIN_MODE,
0, false);

while (len != 0) {
priv->tx_skbuff[entry] = NULL;
entry = (++priv->cur_tx) % txsize;
entry = STMMAC_GET_ENTRY(entry, DMA_TX_SIZE);
desc = priv->dma_tx + entry;

if (len > bmax) {
Expand All @@ -64,9 +66,10 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
if (dma_mapping_error(priv->device, desc->des2))
return -1;
priv->tx_skbuff_dma[entry].buf = desc->des2;
priv->tx_skbuff_dma[entry].len = bmax;
priv->hw->desc->prepare_tx_desc(desc, 0, bmax, csum,
STMMAC_CHAIN_MODE);
priv->hw->desc->set_tx_owner(desc);
STMMAC_CHAIN_MODE, 1,
false);
len -= bmax;
i++;
} else {
Expand All @@ -76,12 +79,17 @@ static int stmmac_jumbo_frm(void *p, struct sk_buff *skb, int csum)
if (dma_mapping_error(priv->device, desc->des2))
return -1;
priv->tx_skbuff_dma[entry].buf = desc->des2;
priv->tx_skbuff_dma[entry].len = len;
/* last descriptor can be set now */
priv->hw->desc->prepare_tx_desc(desc, 0, len, csum,
STMMAC_CHAIN_MODE);
priv->hw->desc->set_tx_owner(desc);
STMMAC_CHAIN_MODE, 1,
true);
len = 0;
}
}

priv->cur_tx = entry;

return entry;
}

Expand Down Expand Up @@ -138,23 +146,24 @@ static void stmmac_refill_desc3(void *priv_ptr, struct dma_desc *p)
*/
p->des3 = (unsigned int)(priv->dma_rx_phy +
(((priv->dirty_rx) + 1) %
priv->dma_rx_size) *
DMA_RX_SIZE) *
sizeof(struct dma_desc));
}

static void stmmac_clean_desc3(void *priv_ptr, struct dma_desc *p)
{
struct stmmac_priv *priv = (struct stmmac_priv *)priv_ptr;
unsigned int entry = priv->dirty_tx;

if (priv->hw->desc->get_tx_ls(p) && !priv->extend_desc)
if (priv->tx_skbuff_dma[entry].last_segment && !priv->extend_desc &&
priv->hwts_tx_en)
/* NOTE: Device will overwrite des3 with timestamp value if
* 1588-2002 time stamping is enabled, hence reinitialize it
* to keep explicit chaining in the descriptor.
*/
p->des3 = (unsigned int)(priv->dma_tx_phy +
(((priv->dirty_tx + 1) %
priv->dma_tx_size) *
sizeof(struct dma_desc)));
p->des3 = (unsigned int)((priv->dma_tx_phy +
((priv->dirty_tx + 1) % DMA_TX_SIZE))
* sizeof(struct dma_desc));
}

const struct stmmac_mode_ops chain_mode_ops = {
Expand Down
39 changes: 27 additions & 12 deletions drivers/net/ethernet/stmicro/stmmac/common.h
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@

#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/stmmac.h>
#include <linux/phy.h>
#include <linux/module.h>
#if defined(CONFIG_VLAN_8021Q) || defined(CONFIG_VLAN_8021Q_MODULE)
Expand All @@ -41,6 +42,10 @@
#define DWMAC_CORE_3_40 0x34
#define DWMAC_CORE_3_50 0x35

#define DMA_TX_SIZE 512
#define DMA_RX_SIZE 512
#define STMMAC_GET_ENTRY(x, size) ((x + 1) & (size - 1))

#undef FRAME_FILTER_DEBUG
/* #define FRAME_FILTER_DEBUG */

Expand Down Expand Up @@ -95,7 +100,7 @@ struct stmmac_extra_stats {
unsigned long napi_poll;
unsigned long tx_normal_irq_n;
unsigned long tx_clean;
unsigned long tx_reset_ic_bit;
unsigned long tx_set_ic_bit;
unsigned long irq_receive_pmt_irq_n;
/* MMC info */
unsigned long mmc_tx_irq_n;
Expand Down Expand Up @@ -233,10 +238,19 @@ struct stmmac_extra_stats {

/* Rx IPC status */
enum rx_frame_status {
good_frame = 0,
discard_frame = 1,
csum_none = 2,
llc_snap = 4,
good_frame = 0x0,
discard_frame = 0x1,
csum_none = 0x2,
llc_snap = 0x4,
dma_own = 0x8,
};

/* Tx status */
enum tx_frame_status {
tx_done = 0x0,
tx_not_ls = 0x1,
tx_err = 0x2,
tx_dma_own = 0x4,
};

enum dma_irq_status {
Expand Down Expand Up @@ -332,17 +346,16 @@ struct stmmac_desc_ops {

/* Invoked by the xmit function to prepare the tx descriptor */
void (*prepare_tx_desc) (struct dma_desc *p, int is_fs, int len,
int csum_flag, int mode);
bool csum_flag, int mode, bool tx_own,
bool ls);
/* Set/get the owner of the descriptor */
void (*set_tx_owner) (struct dma_desc *p);
int (*get_tx_owner) (struct dma_desc *p);
/* Invoked by the xmit function to close the tx descriptor */
void (*close_tx_desc) (struct dma_desc *p);
/* Clean the tx descriptor as soon as the tx irq is received */
void (*release_tx_desc) (struct dma_desc *p, int mode);
/* Clear interrupt on tx frame completion. When this bit is
* set an interrupt happens as soon as the frame is transmitted */
void (*clear_tx_ic) (struct dma_desc *p);
void (*set_tx_ic)(struct dma_desc *p);
/* Last tx segment reports the transmit status */
int (*get_tx_ls) (struct dma_desc *p);
/* Return the transmit status looking at the TDES1 */
Expand All @@ -351,7 +364,6 @@ struct stmmac_desc_ops {
/* Get the buffer size from the descriptor */
int (*get_tx_len) (struct dma_desc *p);
/* Handle extra events on specific interrupts hw dependent */
int (*get_rx_owner) (struct dma_desc *p);
void (*set_rx_owner) (struct dma_desc *p);
/* Get the receive frame size */
int (*get_rx_frame_len) (struct dma_desc *p, int rx_coe_type);
Expand All @@ -376,8 +388,11 @@ extern const struct stmmac_desc_ops ndesc_ops;
/* Specific DMA helpers */
struct stmmac_dma_ops {
/* DMA core initialization */
int (*init) (void __iomem *ioaddr, int pbl, int fb, int mb,
int burst_len, u32 dma_tx, u32 dma_rx, int atds);
int (*reset)(void __iomem *ioaddr);
void (*init)(void __iomem *ioaddr, int pbl, int fb, int mb,
int aal, u32 dma_tx, u32 dma_rx, int atds);
/* Configure the AXI Bus Mode Register */
void (*axi)(void __iomem *ioaddr, struct stmmac_axi *axi);
/* Dump DMA registers */
void (*dump_regs) (void __iomem *ioaddr);
/* Set tx/rx threshold in the csr6 register
Expand Down
Loading

0 comments on commit d93e409

Please sign in to comment.