## **International Journal of Science and Research (IJSR)**

ISSN (Online): 2319-7064

Index Copernicus Value (2013): 6.14 | Impact Factor (2015): 6.391

# Implementation of Fault Tolerance System from the Concept of Endocrine Cell

## Prajeesh.P<sup>1</sup>, Jasmin Basheer<sup>2</sup>

<sup>1</sup>M.Tech Student, Sree Buddha College of Engineering, Department of ECE, Pattoor, Kerala University

**Abstract:** Self-healing digital systems have recently emerged as an approach to fault tolerance. However, such systems remain complex because of their hardware size and rerouting architecture, and in most cases digital circuits lose efficiency as circuit size increases. The proposed work is based on a cell communication system of the human body, carried through the bloodstream, known as endocrine cell communication. The work applies the endocrine concept to digital circuits with the aim of achieving the best possible fault tolerance. Each working unit in the design is treated as a working cell, and each working cell is surrounded by stem cells. A working cell is replaced by a stem cell if a fault occurs.

**Keywords:** Endocrine cell, Fault tolerance, Rerouting architecture, Self-healing, Stem cells.

#### 1. Introduction

In the embedded systems industry, self-healing systems have recently emerged as an approach to fault tolerance. From the customer's point of view, the reliability of a product depends on its quality, and both suffer when the system becomes faulty. Redundancy is a commonly used method for fault tolerance. Two redundancy techniques were introduced early on: dual modular redundancy (DMR) and triple modular redundancy (TMR). These techniques run identical modules in parallel and compare their outputs to detect faults; TMR additionally votes for the majority output. TMR uses three active components and a voter section for fault detection, and its drawback is that the voter is a single point of failure.
Similarly, DMR requires additional hardware, which increases both the hardware size and the power consumption. Another problem is that a large part of the hardware module must be replaced even when only a small part is damaged, which affects the NRE (non-recurring engineering) cost.

Biology offers a different concept for fault recovery, the endocrine cell system. In the human body, all endocrine cells have a similar structure and differ only in their position and in the function each cell performs. Such a system can recover from a fault by isolating the faulty cell and differentiating a spare (stem) cell with the same genetic code previously held by the faulty cell. Thus only a small part of the system needs to change, and the spare (stem) blocks do not need to operate constantly.

The main motivation for the proposed work is to overcome the challenges faced by customers in the industry. Embedded system manufacturers do not provide chip-level service when a system is damaged. They replace the entire PCB instead of the particular faulty integrated circuit (IC). Replacing an IC with more than 100 port pins is an extremely difficult task. We expect this new approach to improve the performance of digital electronic circuits and to reduce the hardware size without any loss of quality.

#### 2. Concept of Endocrine Cellular Communication for Fault Tolerance

The endocrine system consists of glands that are widely separated from each other with no direct link. Endocrine glands consist of groups of secreting cells surrounded by an extensive network of capillaries, which facilitates the diffusion of hormones (chemical messengers) from the secreting cells into the bloodstream. They are commonly referred to as ductless glands because hormones diffuse directly into the bloodstream. The hormone is then carried in the bloodstream to its target.
Endocrine cellular communication is of particular interest because, when an endocrine cell dies, a new endocrine cell with the same function as the dead cell is produced by differentiation, and the cellular communication network is recovered. Endocrine signaling is the most common type of cell signaling; it sends a signal throughout the whole body by secreting hormones into the bloodstream.

#### 3. Overview of the Proposed System

Based on the endocrine cellular communication technique, an ALU is designed that performs basic operations: addition, subtraction, multiplication and shifting. The proposed system has a two-layer architecture (Figure 1). The first layer, called the functional layer, is composed of working cells (WC) and stem cells (SC). A fault correction unit is also embedded in the functional layer. The second layer, known as the genome supervisory layer, consists of index changing units (ICU). As with endocrine cells, all working cells and stem cells have a similar structure; the only difference between cells is the code held in the cell (called the genome). Every working cell is surrounded by two stem cells, and if a fault occurs the faulty WC can be replaced by any available SC. An index changing unit (ICU) in the supervisory layer takes charge of one working cell and its surrounding stem cells. Each ALU operation is assigned to one working cell, and the working cell is selected by the input address bits.

Figure 1: Layer Structure of the architecture

<sup>2</sup>Assistant Professor, Sree Buddha College of Engineering, Department of ECE, Pattoor, Kerala University

Volume 5 Issue 7, July 2016 | www.ijsr.net | Licensed Under Creative Commons Attribution CC BY | Paper ID: ART2016151

#### 4. Functional layer

#### 4.1 Working cell architecture

Figure 2 shows the block diagram of the working cell. Every working cell has the same structure, composed of a functional unit and a fault detection unit. The functional unit, as the name suggests, performs one of the predefined ALU operations: addition, subtraction, multiplication or shifting.

Figure 2: Block diagram of working cell

#### 4.2 Fault Detection Unit

In Figure 3, CUT denotes the circuit under test (the working cell functional unit). The CUT takes two 8-bit inputs from the input port, and it also receives the test signal (test\_sg) and the output of the linear feedback shift register (LFSR). It drives two outputs, one to the fault detector (FD) and one to the fault signal checker (FSC). The FSC takes inputs from the CUT and from the FD and produces the actual WC output as well as the fault signal (the fault signal is generated only if a fault is detected). The LFSR generates 16-bit test patterns. The fault detector is a 16-bit comparator that compares the output of the CUT with the output of ROM2. ROM2 is a one-dimensional memory holding the predefined correct output for each input value from the LFSR (the values stored in ROM2 depend on the WC). The FSC checks whether the output of the FD is high. If it is, the FSC raises the fault signal and isolates the faulty WC output.

#### 4.3 Fault detection algorithm

The fault detector has two modes of operation, normal mode and advanced mode. The mode depends on the test signal input of the fault detector.

Algorithm for fault detection:

1. If test\_sg = 1 (advanced mode of operation):
2. The CUT accepts input only from the LFSR.
3. The result of the CUT operation is fed to the FD, and another output is fed to the FSC.
4. The CUT output is compared with the data from the ROM2 buffer for fault detection.
5. If they are not equal, a fault signal is generated.
6. In the FSC, if fault signal = 1, all outputs from the working cell are tri-stated and a fault signal is propagated to the ICU.
7. If test\_sg = 0 (normal mode of operation):
8. The CUT accepts only the current ALU inputs and generates the output.
9. During the initial stage, test\_sg is held at 1 for a period of time.

Figure 3: Fault detection block diagram

#### 4.4 Stem cell architecture

The architecture of a stem cell is similar to that of a working cell, with the difference that an SC can perform all of the functions (addition, subtraction, multiplication and shifting), whereas a WC performs only one of them.

#### 4.5 Rerouting architecture

To realize the routing architecture of the proposed system, the WCs and SCs are arranged as shown in Figure 4. Figure 4 shows that each SC is shared by two WCs and each WC is shared by two SCs. When a WC becomes faulty it generates a fault signal, which is fed to the ICU. The ICU finds an available free stem cell for replacement using the control signal from the supervisory layer. At the same time, the ICU modifies the status bit in the register bank. When a fault signal is received from the working cell, the ICU first checks the left stem cell. If the left stem cell (LS) is free, the ICU transfers all input signals from the faulty working cell to the LS.

#### 5. Supervisory layer architecture

In the proposed system, a WC can be replaced by either of its two neighboring SCs for fault tolerance. The supervisory layer must therefore control the functional layer so that no collision occurs. Each ICU in the supervisory layer takes charge of one WC and its two neighboring SCs. The proposed system contains a register bank that stores the status of all the stem cells in the functional layer.
Cell replacement is carried out on the basis of this register bank, shown in Fig. 4 (these registers are not for general-purpose use). Every ICU receives fault signals from its two neighboring SCs as well as from its WC. When an SC or WC is faulty, the ICU changes the index bits of the corresponding SC or WC in the register bank. Because the ICU controls the two neighboring SCs, it can isolate a WC and replace it with an SC.

Figure 4: Rerouting structure

Every WC and SC has its own index bits in the register bank, and these show the status of each cell in the top layer of the architecture. The register bank holds three types of index bits: the state bit, the direction bit and the differentiation bit. At system start-up all state bits are cleared to zero. The state bit shows whether the stem cell is available for replacement. The direction bit represents the direction of the stem cell (zero represents the left side of the WC and one represents the right side of the WC). The differentiation bit is used for isolation.

**Table 1:** ICU operation. The fault signals are the condition for the change, the state bits give the state before the fault, and the differentiation and direction bits give the state after the fault.

| Fault signal W | Fault signal LS | Fault signal RS | State bit W | State bit LS | State bit RS | Differentiation bit W | Differentiation bit LS | Differentiation bit RS | Direction bit W | Direction bit LS | Direction bit RS |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | | | | 0 | | | 1 | | | 0 | |
| 1 | 1 | | | 1 | 0 | | | 1 | | | 1 |
| 1 | | 1 | | 0 | 1 | | 1 | | | 0 | |
| 1 | 1 | 1 | Fault correction enabled | | | | | | | | |

The operation of the ICU can be explained using Fig. 4 and Table 1. In the table, W denotes the working cell and LS and RS denote the left and right stem cells. The responsibility of the ICU is to change the index bits of the two SCs and the WC.
Cell replacement follows a counterclockwise priority order, implemented by enabling the selection bit of the SCs. Table 1 lists the possible changes of the index bits of W, LS and RS after a fault signal is received from the WC and SCs. In the first row, the fault signal of W is one, and the differentiation bit of LS is set to one only if the state bit of LS is zero. Here the state bit of LS indicates that LS is the first available stem cell to replace the faulty W. In this first case the direction bit need not change, because its initial value already represents the left direction. In the second row, the fault signal of W is one and the state bit of LS is also one, which means that W has become faulty and is replaced by RS rather than LS, because LS is occupied by another WC. Hence the algorithm makes the system skip the faulty cells. The system also contains a fault correction unit, whose correction procedure is based on multiple EXOR operations.

#### 6. Fault Correction Unit

Four working cells are developed, one for each ALU operation. The input is given to the working cell as well as to the stem cells. The fault is detected by the fault detection unit. The position of a faulty bit can be located by XORing the wrong WC output with the expected correct output, and the output can be recovered by XORing the wrong output again, that is, by multiple EXOR operations. For example, if a WC becomes faulty it is replaced with an SC, which then behaves as the WC and gives the correct output. The faulty WC output and the correct SC output are then XORed. If the result is zero, no error is detected (this is one of the simplest methods of fault detection). If the result is not zero, the error can be located from the positions of the 1 bits. For this purpose the proposed system uses a bit locator circuit, which gives a count value representing the number of faulty bits.
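The XOR-based location step described in this section can be illustrated with a minimal Python sketch. It is a software illustration only, not the authors' hardware design: the function names `locate_faulty_bits` and `correct_output`, the 16-bit default width and the sample values are assumptions made for this example. XORing the faulty word with the expected word gives an error mask whose 1 bits mark the faulty positions, and XORing the faulty word with that mask restores the expected word.

```python
from typing import List, Tuple


def locate_faulty_bits(faulty_output: int, expected_output: int,
                       width: int = 16) -> Tuple[int, List[int], int]:
    """Return (error_mask, faulty_positions, faulty_count) for two words.

    The width of 16 bits is an assumption chosen to match the 16-bit
    comparator mentioned in the text.
    """
    word_mask = (1 << width) - 1
    # A 1 in the XOR result marks a bit where the two outputs disagree.
    error_mask = (faulty_output ^ expected_output) & word_mask
    positions = [bit for bit in range(width) if (error_mask >> bit) & 1]
    # The count mirrors the value produced by the bit locator circuit.
    return error_mask, positions, len(positions)


def correct_output(faulty_output: int, error_mask: int) -> int:
    """Flip the faulty bits back by XORing with the error mask."""
    return faulty_output ^ error_mask


if __name__ == "__main__":
    wc_output = 0b1100100   # hypothetical faulty working cell output
    sc_output = 0b1100001   # hypothetical correct stem cell output
    mask, positions, count = locate_faulty_bits(wc_output, sc_output)
    print(f"mask={mask:#06x} positions={positions} count={count}")
    print(correct_output(wc_output, mask) == sc_output)
```

An error mask of zero corresponds to the "no error detected" case in the text; any non-zero mask lists the bit positions that differ.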
#### 7. ALU operation based on the address

All operations performed by the ALU are selected by the 2-bit input address. Each ALU operation is assigned to its own working cell. Table 2 shows the address bits, the operation performed by the ALU and the WC selected for the operation. When the address is "00", addition is performed by working cell one (WC1). When the address is "01", subtraction is performed by working cell two (WC2). When the address is "10", multiplication is performed by working cell three (WC3). When the address is "11", shifting is performed by working cell four (WC4).

**Table 2:** ALU operation based on address bits

| Address | Operation Performed | Selected working cell |
|---------|---------------------|-----------------------|
| 00 | Addition | WC1 |
| 01 | Subtraction | WC2 |
| 10 | Multiplication | WC3 |
| 11 | Shifting | WC4 |

#### 8. Overall architecture of the proposed system

The proposed system includes a 2:4 address decoder, which takes a 2-bit input (00, 01, 10 or 11), as shown in Figure 5. The ALU also receives two 8-bit inputs. First, when the address decoder input is "00", WC1 becomes active and performs addition, because WC1 is configured as the addition module of the ALU. Second, when the input is "01", WC2 becomes active and performs subtraction. In the third and fourth cases, the inputs "10" and "11" activate WC3 and WC4, which perform multiplication and shifting respectively. If WC1 becomes faulty, the FDU becomes active and generates a fault signal, which in turn activates the ICU. In this scenario the faulty WC1 is replaced by either SC1 or SC2, whichever is free.
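The replacement scenario above, together with the left-first priority described for Table 1, can be sketched in Python as follows. This is a behavioural illustration under stated assumptions rather than the authors' implementation: the names `StemCellIndex`, `icu_select` and `active_cell` are hypothetical, and marking a selected stem cell as busy by setting its state bit is an assumption about how the register bank is updated. The address mapping is taken from Table 2, and the direction encoding (0 for left, 1 for right) follows the text.

```python
from dataclasses import dataclass

LEFT, RIGHT = 0, 1  # direction bit encoding from the text: 0 = left, 1 = right

# Address-to-cell mapping taken from Table 2.
ADDRESS_MAP = {
    0b00: ("addition", "WC1"),
    0b01: ("subtraction", "WC2"),
    0b10: ("multiplication", "WC3"),
    0b11: ("shifting", "WC4"),
}


@dataclass
class StemCellIndex:
    """Index bits held in the register bank for one stem cell."""
    state: int = 0            # 0 = available, 1 = busy or faulty
    differentiation: int = 0  # set to 1 when the ICU selects this stem cell
    direction: int = LEFT


def icu_select(wc_fault: int, ls: StemCellIndex, rs: StemCellIndex) -> str:
    """Choose which cell serves the operation, checking the left stem cell first."""
    if not wc_fault:
        return "WC"
    for name, cell, direction in (("LS", ls, LEFT), ("RS", rs, RIGHT)):
        if cell.state == 0:
            cell.differentiation = 1
            cell.direction = direction
            cell.state = 1  # assumption: a selected stem cell is marked busy
            return name
    # Last row of Table 1: no free stem cell, so fault correction is enabled.
    return "FAULT_CORRECTION"


def active_cell(address: int, wc_fault: int,
                ls: StemCellIndex, rs: StemCellIndex) -> str:
    """Describe which cell performs the operation selected by the address."""
    operation, working_cell = ADDRESS_MAP[address]
    choice = icu_select(wc_fault, ls, rs)
    served_by = working_cell if choice == "WC" else choice
    return f"{operation} served by {served_by}"


if __name__ == "__main__":
    left, right = StemCellIndex(), StemCellIndex()
    print(active_cell(0b00, 0, left, right))  # healthy WC1 performs the addition
    print(active_cell(0b00, 1, left, right))  # WC1 faulty, left stem cell is free
    print(active_cell(0b00, 1, left, right))  # left stem cell busy, right one is used
```

Checking the left stem cell before the right one reproduces the first two rows of Table 1; the final branch corresponds to the row in which W, LS and RS all report a fault.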
The same method applies to the other WCs.

Figure 5: System Architecture top layer

The proposed system is designed using the Xilinx ISE Design Suite. An additional working cell, named the faulty cell, is designed in software for fault analysis. In the first phase, the simulation is run with the correct working cell and the output is verified against the expected output. In the second phase, the correct working cell is replaced with the faulty working cell and test\_sg is enabled. The second-phase output was also verified successfully.

Figure 6: RTL schematic

Figure 6 shows the RTL schematic view of the proposed system. In the RTL schematic, fault\_sig\_out is active high only if the entire system becomes faulty. Figure 7 shows the simulation output of the adder working cell without any fault, and Figure 8 shows the adder working cell with the fault corrected by cell replacement. At that moment the ICU is activated (Figure 9) and enables the right stem cell (the working cell and the left stem cell are treated as faulty cells), as indicated by "enable 01" in Figure 9.

Figure 7: Adder working cell

#### 9. Result analysis

Figure 9 shows that when a fault is detected in the adder working cell, the ICU updates the index bits in the register bank for cell replacement. The ICU first checks the state bit of the LS (INDEX\_STATEBIT\_LS in Figure 9). The simulation window (Figure 9) shows that the state bit of the LS is one, which means that the LS is busy or has become a faulty stem cell. The ICU changes DIRECTION\_BIT to "01", which enables the right stem cell (RS), and generates an isolation signal to isolate the LS from the system. Fig. 8 shows the output of the right stem cell, whose enable pin receives DIRECTION\_BIT "01" from the ICU. This enables the RS for continued operation. At the same time, the ICU also generates the isolation signal for the faulty working cell in order to isolate it. All of this happens only if the fault signal is received from the fault detection unit in the functional layer.

Figure 8: Right stem cell output

Figure 9: ICU output

#### 10. Conclusion

In this paper, a new self-repairing architecture that provides good scalability and fault coverage was proposed. The architecture has a two-layer structure. The top layer, called the functional layer, consists of functional units, namely WCs and SCs. The bottom layer, called the supervisory layer, supervises the overall functioning of the functional layer. The bottom layer consists of the ICU, which is the heart of the system. The ICU controls the proper assignment of an SC to replace a faulty cell. The new architecture has a circular shape, which helps to reduce the hardware size and improve performance. As a result, the system is efficient.