Abstract
This paper presents a practical implementation of a real-time Ethernet named CASNET, which modifies the Ethernet medium access control (MAC) to meet the real-time requirements of motion control. CASNET is a communication protocol for motion control systems. The MAC logic is designed in the Verilog hardware description language (Verilog HDL). The designed MAC serves as an intellectual property (IP) core and is applicable to various industrial controllers. The physical layer interface is RJ45. The other layers are implemented in C. The real-time Ethernet has been implemented using field programmable gate array (FPGA) technology, and the proposed solution has been evaluated through cycle time, synchronization accuracy, and Wireshark tests.
1. Introduction
Motion control systems have long been widely used in factory automation. A basic motion control system comprises a controller, servo drivers, and motors. Increasingly, controllers are connected to servo drivers through a communication network. Many types of networks have been used in motion control systems, ranging from proprietary to open connections. The communication system in a motion control system must guarantee the domain-specific real-time requirements: maximum transfer delay, transmission jitter, and available bandwidth.
Ethernet is fast, easy to install, and inexpensive, which has made it widely popular. However, standard Ethernet is not a real-time network and may therefore introduce delays. With modification, Ethernet can be applied effectively to real-time communication. There are different ways of modifying the Ethernet technology [1]. In principle, all the proposed solutions can be classified into three approaches: on top of TCP/IP, on top of Ethernet, and modified Ethernet [2].
Within the automation domain, the real-time requirements focus on the response time behavior of data packets. There are three real-time classes to guarantee response time: soft real-time, hard real-time, and isochronous real-time [3]. Only isochronous real-time Ethernet is used for motion control. The isochronous real-time Ethernet (RTE) devices provide more predictable and reliable real-time data transfer and means to support the precise synchronization of automation equipment according to IEEE 1588 [4, 5]. Several isochronous RTE devices are available from different vendors, such as Powerlink [6], Ethercat [7], Sercos III [8], and Profinet IO [9]. Most of them comply with both the IEC 61158 [10] and IEC 61784-2 [11] International Standards.
Field programmable gate arrays (FPGAs) are semiconductor devices built around a matrix of configurable logic blocks (CLBs) connected via programmable interconnects. All the internal logic elements and all the control procedures of an FPGA execute continuously and simultaneously. Execution on an FPGA is therefore faster than on either a digital signal processor (DSP) or a personal computer (PC). Recently, some researchers have adopted FPGA chips to implement low-cost, high-performance, real-time industrial applications [12, 13].
The protocols of the available RTE networks are complex, which makes development difficult. They also rely on specific chips and communication cards, a further obstacle to development. This paper proposes an isochronous RTE network named CASNET, which modifies the Ethernet MAC, implemented in an FPGA, to meet the real-time requirements of motion control. CASNET uses a ring topology within the segment. The MAC adopts the master/slave principle: the master node (typically the control system) sends Ethernet frames to the slave nodes, which extract data from and insert data into these frames. The physical layer is based on standard Ethernet hardware.
2. CASNET Protocol
CASNET adopts the simple three-layer configuration proposed by IEC rather than the full hierarchical OSI seven-layer model [14, 15]. It adds a fourth layer, the transport layer, to package and send/receive messages in the CASNET device, as shown in Figure 1. The data link layer of the CASNET protocol is implemented in hardware: it is written in VHDL, and its functional verification is carried out on an FPGA.

Figure 1: CASNET protocol layer structure.
The service provided by each layer is explained as follows.
The communication medium is connected directly to the physical layer, which controls both the status of the medium and synchronization. The interface of the physical layer is Ethernet (RJ45).
The data link layer makes the physical link reliable by providing a CRC check. It sends and receives telegrams, called CASNET frames, as shown in Figure 2. On arrival of a frame, the data link layer receives the incoming telegram, extracts the relevant user data, passes the user data to the transport layer, inserts the data of the slave station into the telegram, and forwards the telegram to the next CASNET slave station. Figure 2 also shows the structure of the CASNET telegrams contained in the Ethernet data field. A telegram begins with a 1-byte address that identifies the area in which slaves write or read user data. The address is followed by a 1-byte control word that specifies the type of control or operation intended (e.g., position, velocity, write, and read). A 1-byte status field indicates whether the telegram is travelling from or to the master station. A 1-byte time field records the time stamp used to synchronize the clocks. The 6 bytes of periodic data and 2 bytes of aperiodic data are the real-time data exchanged between master and slave.

Figure 2: Structure of CASNET frame.
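The telegram layout described above can be illustrated with a minimal Python sketch. It assumes that the fields appear in the order in which the text lists them and that the two data fields are carried as raw bytes; the names `Telegram`, `pack_telegram`, and `unpack_telegram` are hypothetical and are not part of the CASNET design.

```python
import struct
from collections import namedtuple

# Assumed layout, in the order listed in the text:
# address (1 B), control word (1 B), status (1 B), time (1 B),
# periodic data (6 B), aperiodic data (2 B).
# "<" selects a packed layout with no alignment padding.
TELEGRAM_FORMAT = "<BBBB6s2s"
TELEGRAM_SIZE = struct.calcsize(TELEGRAM_FORMAT)

# Hypothetical container for one telegram.
Telegram = namedtuple(
    "Telegram", "address control status time periodic aperiodic"
)


def pack_telegram(telegram):
    """Serialize a Telegram into its byte representation."""
    return struct.pack(TELEGRAM_FORMAT, *telegram)


def unpack_telegram(raw):
    """Parse the bytes of one telegram back into a Telegram."""
    return Telegram(*struct.unpack(TELEGRAM_FORMAT, raw))
```

Under these assumptions one telegram occupies 12 bytes, the sum of the field widths given in the text.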
The transport layer is not only for packaging the application data into frames and transmitting the data to the link layer, but also for reassembling telegrams coming from the link layer. In CASNET, the transport layer is incorporated into the application layer. The application layer cooperates with the transport layer and the data link layer to enable reliable communications. It provides standardized functions and data formatting with which the user can interact.
CASNET is a master/slave network that comprises one master station and up to 125 slave stations. The system topology is shown in Figure 3. The CASNET network is a ring topology formed by point-to-point connections between consecutive nodes. The network controllers (both master and slaves) are full-duplex devices capable of receiving and transmitting data concurrently. CASNET adopts standard Ethernet frames that encapsulate the telegrams specifically defined by the protocol, as shown in Figure 2. The master station is a standard Ethernet interface running the CASNET protocol. The slave consists of a standard Ethernet interface and a real-time MAC implemented in an FPGA.

Figure 3: System topology structure.
The master station may be a CNC controller or an industrial robot controller. It packages and analyzes data, initializes the system, controls data communication, synchronizes the clocks, and sends data packets.
At runtime, a single frame issued periodically by the master station circulates among all the slave stations. On arrival of the frame, each slave station receives the incoming telegrams, extracts the relevant user data, passes the user data to the controller attached to the slave, inserts its own data into the telegram, and forwards the telegram to the next CASNET slave station through the Ethernet Rx wire pairs. The last CASNET slave station sends the fully processed telegram back through the Ethernet Tx wire pairs.
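The circulation of a single frame through the ring can be sketched as follows. This is a simplified behavioural model, not the hardware data path: the frame is represented as a dictionary keyed by slave address, and the functions `slave_process` and `run_cycle` are hypothetical names introduced for illustration.

```python
def slave_process(frame, slave_addr, outputs, inputs):
    """Model one slave: extract its command data, insert its return data."""
    telegram = frame[slave_addr]
    outputs[slave_addr] = telegram["data"]   # data sent by the master
    telegram["data"] = inputs[slave_addr]    # data returned by this slave
    return frame                             # forwarded to the next station


def run_cycle(master_data, slave_inputs):
    """Send one frame around the ring and return what each side received."""
    # Assumption: one telegram per slave, addressed by the slave address.
    frame = {addr: {"data": data} for addr, data in master_data.items()}
    outputs = {}
    for addr in sorted(master_data):         # the frame visits slaves in order
        frame = slave_process(frame, addr, outputs, slave_inputs)
    returned = {addr: t["data"] for addr, t in frame.items()}
    return outputs, returned
```

After one cycle, `outputs` holds the data each slave extracted and `returned` holds the data that arrives back at the master.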
3. CASNET Slave Controller
Both the CASNET master and the slave can in principle operate over any Ethernet physical layer, such as copper cable (100BASE-TX) or optical fiber (100BASE-FX). The CASNET master normally uses standard network interfaces (in this context, any PC-compatible MAC and PHY). The CASNET slave, however, uses special hardware (e.g., an FPGA) to achieve very short packet forwarding delays in the slave devices.
In the slave controller, the data link layer of the CASNET protocol is implemented in VHDL, and the other layers are implemented in software. Functional testing is performed on an FPGA. Firmware and a test board were developed to examine the operation of the CASNET protocol implemented in the FPGA, as shown in Figure 4.

Figure 4: Board with FPGA implementing CASNET slave.
Figure 5 depicts the overall block diagram of the designed CASNET slave controller. The controller consists of three main blocks: the physical layer (PHY), the real-time media access control (MAC), and the microcontroller. Both the data interface between the real-time MAC and the microprocessor and the interface between the real-time MAC and the PHY are implemented in the real-time MAC. As the microprocessor and the PHYs are standard chips, these blocks are not discussed in this paper.

Figure 5: Overall block diagram of CASNET slave controller. RMII_mux: reduced media independent interface; Casnet_rev: CASNET receive; Clk_syn: clock synchronization; Intc: interrupt controller; μC_if: microcontroller interface; Casnet_trans: CASNET transmit; Trans_buf: slave data buffer that is to be transmitted; Rev_buf: slave receiving data buffer.
3.1. CASNET Slave Controller Registers
The CASNET slave controller has an address space of 4 kByte. The memory space in the real-time MAC is used for registers and user memory. The registers are configured by the CASNET telegram at initialization. The address range is directly addressable by the CASNET master and an attached μC_If block. The μC_If block realizes the connection between the real-time MAC and the microcontroller. Table 1 is an overview of the registers and user memory.
Table 1: CASNET slave register and user memory address.
3.2. Internal Configuration of Real-Time Media Access Control (MAC)
The architecture of the real-time MAC is shown in Figure 5. The real-time MAC is composed of six units: RMII_mux, the CASNET receiving block (Casnet_rev), the CASNET transmitting block (Casnet_trans), the clock synchronization block (Clk_syn), the interrupt controller (Intc), and the microcontroller interface (μC_if). The MAC works as follows. When a frame is received through the RMII_mux (Tx and Rx lines), Casnet_rev receives it, extracts the user data relevant to the slave, and writes the user data into the Rev_buf RAM block of the μC_if. The μC_if is the communication interface between the microprocessor and the real-time MAC. The microcontroller reads the user data from the Rev_buf RAM of the μC_if and writes the return data (the slave status, warnings, local timer, etc.) into the Trans_buf RAM block of the μC_if. Casnet_trans reads the return data from the Trans_buf RAM, inserts it into the frame, and forwards the frame to the next slave. If the user data received by Casnet_rev is synchronization data, it is written to the Clk_syn unit and the microcontroller to align the distributed clock and adjust the local timer. When user data or synchronization data has been completely received, the Intc unit generates the corresponding interrupt to notify the microcontroller. The following sections describe the functions and operation of each block in more detail.
3.2.1. RMII_Mux
The RMII_mux connects the real-time MAC to the Ethernet PHYs and supports a data rate of 100 Mbit/s. The RMII_mux unit detects whether the CASNET slave is connected to the PHYs. If it is, the RMII_mux connects the signals Rx0 and Tx0 of the real-time MAC to Rx0 and Tx0 of PHY0, and the signals Rx1 and Tx1 of the real-time MAC to Rx1 and Tx1 of PHY1. Otherwise, it can connect Rx to Tx to form a loop inside the real-time MAC.
3.2.2. Receiving and Transmitting Block
The internal structure of the receiving unit Casnet_rev is shown in Figure 6(a). The finite state machine (FSM) controls the operation while user data is extracted from the frame. Once the user data has been completely received, an RCF signal is written to the Intc unit, which immediately generates an interrupt to notify the microcontroller. If the user data is synchronization data, it is forwarded to the Clk_syn unit via Clk_syn_if to synchronize the distributed clock. Clk_syn_if is a 64-bit read/write interface. The received user data is stored in a FIFO. The FIFO size is set by parameters, up to a maximum of 32 × 16 bit. The frame of a slave is up to 2 kbit. After the user data stored in the receiving FIFO passes the 32-bit cyclic redundancy check (CRC-32), it is sent to the Rev_buf RAM of the μC_if unit.

Figure 6: (a) Casnet_rev structure. (b) Casnet_trans structure. FSM: finite state machine; CRC: cyclic redundancy check; RCF: receive complete flag; SWF: successful write flag; FIFO: first in, first out; Clk_syn_if: Clk_syn interface.
The internal structure of the transmitting unit Casnet_trans is shown in Figure 6(b). When there is data to transmit, the user data that passes the CRC-32 check is sent from the Trans_buf RAM of the μC_if unit to the transmitting FIFO. The FSM controls the transmitting process.
The CRC-32 polynomial is shown in (1). The CRC can be computed with a shift register and exclusive-or (XOR) gates, processing the data one bit per clock cycle. Obtaining an r-bit CRC code for a k-bit message then takes (k + r) cycles [15]. Although this serial implementation is simple and can run at a high clock rate, its serial input limits the data throughput. To increase the throughput of the CRC calculation, a parallel implementation, as adopted in [16], is used.
$$G(x) = x^{32} + x^{26} + x^{23} + x^{22} + x^{16} + x^{12} + x^{11} + x^{10} + x^{8} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1 \tag{1}$$
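The trade-off between serial and parallel CRC calculation can be illustrated in software. The sketch below assumes the standard Ethernet CRC-32 conventions (reflected bit order, initial value and final XOR of 0xFFFFFFFF), for which the polynomial is written as the constant 0xEDB88320. The bitwise function mirrors the one-bit-per-cycle shift register, while the table-driven function consumes eight bits per step and stands in for a parallel implementation. It is not the parallel circuit of [16], and the function names are hypothetical.

```python
POLY_REFLECTED = 0xEDB88320  # Ethernet CRC-32 polynomial, reflected bit order


def crc32_bitwise(data):
    """Serial form: one shift/XOR step per message bit."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 1:
                crc = (crc >> 1) ^ POLY_REFLECTED
            else:
                crc >>= 1
    return crc ^ 0xFFFFFFFF


def _build_table():
    """Precompute the effect of eight serial steps for every byte value."""
    table = []
    for value in range(256):
        crc = value
        for _ in range(8):
            crc = (crc >> 1) ^ POLY_REFLECTED if crc & 1 else crc >> 1
        table.append(crc)
    return table


CRC_TABLE = _build_table()


def crc32_bytewise(data):
    """Parallel-style form: eight message bits are consumed per step."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc = (crc >> 8) ^ CRC_TABLE[(crc ^ byte) & 0xFF]
    return crc ^ 0xFFFFFFFF
```

Both functions return the same value for any input; the second needs one eighth as many steps at the cost of a 256-entry table, which is the same trade of logic for throughput that a parallel hardware CRC makes.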
Based on the CASNET protocol, the FSM for receiving and transmitting data is designed as shown in Figure 7. When RMII input is detected, the state machine switches from the IDLE state to the HEADER state, which checks the frame header to determine whether the frame is a CASNET message. If the frame passes the check, the state switches to ADDRESS; otherwise, it switches to ERROR. The CASNET command or control type is received or transmitted in the CONTROLWORD state. The command types are read and write operations, and a read is performed before a write. The control types include position mode, velocity mode, acceleration mode, and IO mode. In the STATUS state, the status is read to determine whether the telegram is travelling from or to the master station. In the DATA state, user data is extracted from or inserted into the CASNET frame at the address space. Received user data is stored in the receiving FIFO. After the user data passes the CRC check, the state switches to NOTIFY. When data is being received, the NOTIFY state signals the Rev_buf RAM to store the received data. When data is being transmitted, the NOTIFY state signals that the data in the Trans_buf RAM is to be copied to the transmitting FIFO. The state returns to IDLE once the data has been completely stored in the Rev_buf RAM or the transmitting FIFO. The conditions for the FSM transitions are listed in Table 2.
Table 2: Conditions of FSM transition.

Figure 7: Finite state machine of receive/transmit.
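The state sequence described for the receive/transmit FSM can be sketched as a transition table. The state names follow the text; the event names are hypothetical, and the two transitions marked as assumptions (a CRC failure leading to ERROR, and ERROR returning to IDLE) are not stated in the text and are included only to make the sketch complete.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    HEADER = auto()
    ADDRESS = auto()
    CONTROLWORD = auto()
    STATUS = auto()
    DATA = auto()
    NOTIFY = auto()
    ERROR = auto()


# (current state, event) -> next state. Event names are hypothetical.
TRANSITIONS = {
    (State.IDLE, "rmii_input"): State.HEADER,
    (State.HEADER, "header_ok"): State.ADDRESS,
    (State.HEADER, "header_bad"): State.ERROR,
    (State.ADDRESS, "address_done"): State.CONTROLWORD,
    (State.CONTROLWORD, "control_done"): State.STATUS,
    (State.STATUS, "status_done"): State.DATA,
    (State.DATA, "crc_ok"): State.NOTIFY,
    (State.DATA, "crc_bad"): State.ERROR,   # assumption, not in the text
    (State.NOTIFY, "stored"): State.IDLE,
    (State.ERROR, "reset"): State.IDLE,     # assumption, not in the text
}


def step(state, event):
    """Return the next state; events with no listed transition are ignored."""
    return TRANSITIONS.get((state, event), state)


def run(events, state=State.IDLE):
    """Apply a sequence of events starting from the given state."""
    for event in events:
        state = step(state, event)
    return state
```

A table of this form keeps the transition conditions in one place, which is also how a hardware description would typically express them in a single case statement.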
3.2.3. Clock Synchronization Block
The clock synchronization block enables all CASNET devices (master and slaves) to share the same time. Synchronization can be achieved in hardware or in software. CASNET clock synchronization is hardware-based and uses a master clock and slave clocks, in a scheme closely related to the IEEE 1588 standard. The master clock of a designated device is used to synchronize the slave clocks of the other devices. A synchronization telegram containing the current master clock time is sent at regular intervals, and the devices with slave clocks read the time from this telegram to synchronize their local time.
In this system, three quantities are compensated: the propagation delay from the master clock device to the slave clock devices; the clock drift caused by oscillator instabilities due to temperature changes, aging, and other factors; and the offset between the local clock and the master clock, caused by the initial difference in local times that results from the devices being powered up at different times. The first two must be compensated at regular intervals, while the last can be compensated at start-up.
The clock synchronization process consists of three steps: propagation delay measurement, offset compensation, and clock drift compensation.
Propagation delay measurement is initiated by the master, using time stamping across all slaves, and follows a master-slave protocol known as the precision time protocol (PTP). The propagation delay t_delay is calculated by the master clock device and written to the DELAY register of each slave device.
The offset is calculated by the master clock device and written to the OFFTIME register of each slave device. Each slave device calculates its local copy of the master time, t_time, from its local time t_local and its local offset value t_offset:
After the propagation delay has been measured and the offset compensated, the clock drift Δt of every local clock is compensated by a clock rate algorithm implemented in hardware. Here t_reference is the copy of the master clock time, which is stored in the slave register REFTIME, as follows:
If Δt is positive, it means that the local clock is running faster than the master clock and thus needs to be slowed down; if Δt is negative, it means that the local time is running slower than the master clock and has to be sped up.
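The offset and drift calculations can be sketched as follows. The sign conventions here are assumptions chosen to agree with the interpretation of Δt given above: the offset is taken to be added to the local time, and the received reference time is taken to be t_delay old when it arrives. The function names are hypothetical, and all times are integers in the same unit (for example nanoseconds).

```python
def local_copy_of_master_time(t_local, t_offset):
    """Local estimate of the master time.

    Assumption: the offset is defined so that it is added to the local time.
    """
    return t_local + t_offset


def clock_drift(t_local, t_offset, t_delay, t_reference):
    """Drift of the local clock relative to the master clock.

    Assumption: the reference time left the master t_delay before arrival,
    so the master time on arrival is t_reference + t_delay.
    A positive result means the local clock runs fast.
    """
    t_time = local_copy_of_master_time(t_local, t_offset)
    return t_time - t_delay - t_reference
```

For example, with t_local = 1000, t_offset = 500, t_delay = 20, and t_reference = 1470, the drift is +10, so under these conventions the local clock is ahead and would be slowed down.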
The clock synchronization unit includes the timestamp capturing unit Time_Capture, the clock drift compensation unit Clock_drift, the adder Add, and the clock drift averaging unit Mean_diff, as shown in Figure 8. The adder Add is used to calculate the clock drift Δt. The following sections describe the functions and operation of each unit in the clock synchronization block.

Figure 8: Structure of clock synchronization.
(a) Timestamp Capturing. The timestamp is captured at the start of a CASNET packet arriving at Rx or Tx (Rx_DV = 1), and the time is written to the local time registers DS0 and DS1.
(b) Clock Drift Compensation. Clock drift can be compensated by using a low-drift oscillator or a clock rate algorithm. However, accurate oscillators are very expensive and consume much more power and space. A clock rate algorithm maintains clock rate stability and is much cheaper in terms of additional communication and computation cost.
A dedicated compensated-clock IP implements the clock rate algorithm, as shown in Figure 9. It is essentially a clock counter whose frequency can be fine-tuned. It is composed of a 64-bit clock calculator, a 32-bit accumulator, and a 32-bit register. The accumulator acts as a divider, and the register stores the compensating rate r_c. All three are driven by an oscillator of frequency f_osc. At every oscillator period, the accumulator adds the compensating rate r_c to its current value and stores the result. At the same time, a carry flag is generated to indicate whether the addition has overflowed. On overflow, the clock calculator is incremented at the next oscillator period by a value equal to the resolution of the compensated clock. The frequency f_cal of the clock counter is determined by r_c and f_osc as follows:

Figure 9: Structure of the 40 MHz divider.
When f_osc changes, r_c can be tuned to maintain f_cal. In this system, the oscillator frequency f_osc is 50 MHz and the frequency f_cal of the clock counter is 40 MHz. The compensating rate r_c can be obtained as follows:
where Δt1 is the current clock drift, Δt0 is the previous clock drift (default 32′b0), and k is the damping coefficient, with k = 5. Δr_c may be negative.
The value of the accumulator is
where r_c1 is the current value of the compensating rate and r_c0 is its previous value.
On receiving a synchronization frame, the clock synchronization unit compensates the frequency of the clock counter.
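The behaviour of the accumulator-based divider can be checked with a short simulation. The sketch assumes the usual relation for a 32-bit overflow accumulator, f_cal = f_osc · r_c / 2^32, which follows from the description of the carry flag but is stated here as an assumption. The function names are hypothetical, and the drift-dependent correction Δr_c is not modelled.

```python
ACC_BITS = 32
ACC_MODULUS = 1 << ACC_BITS


def compensating_rate(f_cal_hz, f_osc_hz):
    """Nominal r_c for a target counter frequency, rounded to nearest.

    Assumes f_cal = f_osc * r_c / 2**32.
    """
    return (f_cal_hz * ACC_MODULUS + f_osc_hz // 2) // f_osc_hz


def count_ticks(r_c, osc_cycles):
    """Count accumulator overflows (counter increments) over osc_cycles."""
    acc = 0
    ticks = 0
    for _ in range(osc_cycles):
        acc += r_c
        if acc >= ACC_MODULUS:   # carry flag: the 32-bit addition overflowed
            acc -= ACC_MODULUS
            ticks += 1
    return ticks
```

With the frequencies given in the text, `compensating_rate(40_000_000, 50_000_000)` yields 0xCCCCCCCD, and 50 000 oscillator cycles produce 40 000 counter increments, that is, a ratio of 40 MHz to 50 MHz. Raising or lowering r_c slightly changes this ratio in fine steps, which is what allows the clock rate to be tuned.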
(c) Clock Drift Averaging. The clock drift averaging unit calculates the mean of the latest eight clock drift values and stores it in the register DIFFTIME. The master reads this register to determine whether the synchronization process is complete.
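The averaging step can be sketched as a moving average over a window of eight samples. The class name `DriftAverager` is hypothetical, and averaging over fewer samples until the window fills is an assumption, since the text does not describe the start-up behaviour.

```python
from collections import deque


class DriftAverager:
    """Mean of the most recent clock drift values (window of eight)."""

    def __init__(self, window=8):
        # A bounded deque discards the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def update(self, drift):
        """Add a drift sample and return the current mean."""
        self.samples.append(drift)
        # Assumption: before the window is full, average what is available.
        return sum(self.samples) / len(self.samples)
```

A master polling the mean could, for instance, treat synchronization as complete once the returned value stays within a chosen bound; the bound itself is not specified in the text.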
3.2.4. Microcontroller (μC_If) Interface
The μC interface (μC_if) connects the real-time MAC to the microprocessor. Its internal structure is shown in Figure 10. On the CASNET side, the μC_if has two 512 b RAMs and a set of registers: the Trans_buf RAM serves as the transmit buffer and the Rev_buf RAM as the receive buffer. On the microprocessor side, the μC_if has a 10-bit address bus, a 16-bit data bus, a chip select signal (cs), read/write enable signals, and an interrupt control signal.

Figure 10: Structure of μC_if.
3.2.5. Interrupt Control
The interrupt controller handles interrupts selectively according to the priorities of four interrupt sources. The interrupt sources support edge-triggered mode only.
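A prioritized, edge-triggered interrupt controller of this kind can be sketched as follows. The sketch assumes rising-edge detection and that source 0 has the highest priority; neither detail is given in the text, and the class and method names are hypothetical.

```python
class InterruptController:
    """Edge-triggered interrupt controller with fixed priorities."""

    def __init__(self, n_sources=4):
        self.previous = [0] * n_sources      # last sampled input levels
        self.pending = [False] * n_sources   # latched interrupt requests

    def sample(self, levels):
        """Latch a request for every source that shows a rising edge."""
        for index, level in enumerate(levels):
            if level and not self.previous[index]:
                self.pending[index] = True
            self.previous[index] = level

    def acknowledge(self):
        """Return and clear the highest-priority pending source, or None."""
        # Assumption: a lower index means a higher priority.
        for index, flag in enumerate(self.pending):
            if flag:
                self.pending[index] = False
                return index
        return None
```

Because requests are latched on edges only, a source whose input stays high does not raise a second interrupt until the input has returned low and risen again.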
4. Verification and Simulation of the Implemented CASNET
4.1. CASNET Slave Implementation
Using the Quartus II toolset, the authors implemented the CASNET slave in an EP2C5 FPGA of the Cyclone II family, which offers high performance and high density. This device provides 4608 logic elements (LEs), 26 M4K RAM blocks, 117 Kbit of embedded memory, 13 embedded 18-bit × 18-bit multipliers, 2 PLLs, and up to 158 user I/O pins. The Quartus II synthesis and routing summary indicates that the CASNET slave uses 3569 LEs and 8.6 kbit of RAM block memory, and that the maximum frequency is 78.2 MHz.
4.2. Test System Configuration
The test environment, shown in Figure 11, is composed of a CASNET master (CNC) and 50 CASNET slave boards, each built with FPGA technology. A PC serves as the master station.

Figure 11: System configuration for test.
4.3. CASNET Cycle Time
Cycle time is one of the key variables of a motion control system. The CASNET cycle time is a weighted sum of the master packet forwarding time, the master and slave PHY delays, the slave forwarding delay, and the propagation delays along the cables. The minimum cycle time is determined in two steps. First, the minimum cycle time is calculated for a payload of 60 bytes per slave at a given constant time. Second, the minimum time in which a frame with a payload of 60 bytes per slave can be successfully sent and returned is determined. The number of devices ranges from 1 to 50. The results are shown in Figure 12.

Figure 12: Minimum achievable cycle times on a line topology network with 60 bytes payload per device.
4.4. Time Synchronization Accuracy
Time synchronization accuracy indicates the maximum deviation between the slave clocks and the master clock. Channel CH1 of the oscilloscope is connected to the master device and channel CH2 to the slave device. The phase between the master clock and the slave clocks is shown in Figure 13, where the time synchronization accuracy is 15 ns. With the number of devices ranging from 1 to 50, the time synchronization accuracy lies between 15 ns and 100 ns, and the minimum deviation between the slave clocks and the master clock is in the range of 0 to 15 ns. This accuracy is achieved at different startups. The average time from startup to synchronization is 20 s.

Figure 13: Synchronization accuracy at different startups.
4.5. Wireshark Testing
Wireshark is a network packet analyzer that captures live network traffic or reads data from a file and presents the data in a format the user can understand. Wireshark also allows user-defined packets to be analyzed. A PC is used as the master device. Packet loss was tested in two ways. In the first, each slave generates random data with a given payload, sends it to the PC, and writes the data to a file every 2 ms. The files are loaded into Wireshark and compared with the data captured by Wireshark at the master device. In an experiment with 50 slaves lasting 48 h, no errors were observed. In the second, the master device generates a packet and sends it to the slaves, which insert data into the packet and return it to the master. Wireshark generates the data and captures the returned data for analysis. No errors occurred during long-duration testing. These tests indicate that the packet loss rate is very low.
5. Conclusion
This paper proposes a real-time Ethernet named CASNET. Instead of using the standard Ethernet MAC, a modified MAC is designed to meet the real-time requirements and is implemented in Verilog HDL. The MAC provides deterministic behavior by performing functions such as frame synchronization, address recognition, and CRC comparison in hardware. CASNET has been tested in terms of its cycle time, synchronization accuracy, and reliability. The solution has proved feasible, and the next step is to optimize the hardware blocks and fabricate an application-specific IC (ASIC) for CASNET. A complete functional verification through field tests with the fabricated ASIC will then be necessary.
Acknowledgments
This work is supported by the National Major S&T Program of China (Grant no. 2009ZX04013) and the National Hi-tech Research and Development Program of China (863 Program, Grant no. 2011AA04A104).
