**Ferroelectrics and Piezoelectrics**

**Mater. Res. Soc. Symp. Proc. Vol. 1129 © 2009 Materials Research Society 1129-V10-01-C08-01**

#### **Ferroelectric Random Access Memory as a Non-Volatile Cache Solution in a Multimedia Storage System**

Dong Jin Jung and Kinam Kim Memory Business Division, Samsung Electronics Co. Ltd. San #16, Banwol-Dong, Hwasung-City, Gyunggi-Do, S. Korea, 445-701

# **ABSTRACT**

We demonstrate that ferroelectric memory is well suited to serve as a non-volatile cache in a multimedia storage system such as a solid-state disk, where it offers benefits in both performance and reliability. In terms of performance, a FRAM cache removes the overhead of power-off recovery, and random WRITE performance has been improved by 250%. To assess endurance, we investigate acceleration factors for evaluating the cycle-to-failure of the ferroelectric memory at both device level and capacitor level. We find that the ferroelectric memory cells have a cycle-to-failure of $6.0 \times 10^{14}$ at the operating condition of 85 °C and 2.0 V. This is well above the lifetime READ/WRITE cycles of $9.5 \times 10^{13}$ required in such a system. From 2-dimensional stress simulation, we also conclude that the number of dummy cells plays a critical role in qualifying the high-temperature life tests.

# **INTRODUCTION**

Over the past decades, VLSI (very large-scale integration) technology has improved enormously, raising the system performance of computing platforms in many ways. For instance, the data throughput of the central processing unit (CPU) has increased about a thousandfold (e.g., several GHz in a quad-core processor, 2006) compared with that of the Intel 286 (6 MHz), which emerged at the beginning of the 1980s. Alongside this, another important platform component, the latest version of dynamic random access memory (DRAM), reaches a clock speed of 1 GHz. By contrast, a state-of-the-art HDD (hard disk drive) transfers data at around 600 MB/s (see figure 1). Note that the data rate of the latest HDD is still orders of magnitude lower than the processor and system-memory clock speeds. To achieve throughput performance more effectively, it is therefore necessary to bridge the performance gap between components. To compensate for the gap between CPU and system memory, a CPU cache<sup>\*</sup> has been required and adopted. In this paper, we discuss not only how ferroelectric random access memory (FRAM) provides NV-cache solutions with performance benefits in a multimedia storage system such as a solid-state disk (SSD), but also what must be satisfied in terms of lifetime endurance in such applications. We also demonstrate which integration technology is critical for qualifying high-temperature life tests.

# **EXPERIMENT**

<sup>\*</sup> The system cache is an area of physical memory that stores recently used data for as long as possible, permitting access to the data without having to read from the disk.

A 150-nm technology has been adopted to integrate a 1T1C FRAM with a density of 64 Mb, organized with 16 IOs. Figure 2 shows cross-sectional micrographs



**Figure 1.** Evolution of electronic components in data throughput performance.

after full integration of the FRAM, (a) in a peripheral circuitry region and (b) in a cell array region containing 15 $F^2$ cells. In terms of process features, an 80-nm MOCVD PZT film serves as the ferroelectric, SrRuO<sub>3</sub> as the top electrode (TE), and Ir as the bottom electrode (BE). The 64 Mb FRAM operates at $V_{DD} = 1.8 \pm 0.2$ V; the cycle time and read-access time are 120 ns and 100 ns, respectively; and the CMOS stand-by current is less than $20\ \mu\mathrm{A}$[1]. Capacitor-level tests have been carried out on the cell arrays of the test-element-group (TEG) in fully integrated wafers[1-3]. Package-level reliability has also been evaluated[3], both in high-temperature life tests and in endurance. After wafer-level tests, individual prime dies proceeded to a conventional packaging process for the standard device-level tests. 2-dimensional (2-D) stress simulation of the cell arrays has been performed with a commercial tool, ABAQUS CAE ver. 6.7.1[4].

# **DISCUSSION**

Thanks to the bi-stable state of ferroelectrics near ambient temperature, ferroelectric memory has two important characteristics worth mentioning from the operational point of view. First, since the core circuitry of the memory requires no stand-by power in the quiescent state and the information remains unchanged even with no power supplied, it is *non-volatile*. Second, since the original information is destroyed by a READ, it is a *destructive* read-out device. As a consequence, every READ must be followed by an operation that returns the information to its original state, the so-called RESTORE. This operation is unavoidable in a destructive read-out memory. In particular, when ferroelectric memory is used as a storage device in a computing system, such as a byte-addressable non-volatile (NV)-cache device, the memory has to ensure lifetime endurance, defined as the number of READ/WRITE (or ERASE, if such an operation is required) cycles that the memory can withstand before any of its bit information is lost. Here, we first discuss FRAM as a NV-cache solution in SSD. Next, since such a NV-cache memory has to meet the requirement of lifetime READ/WRITE cycles,



**Figure 2.** Cross-sectional micrographs (a) in a peripheral circuitry region and (b) in a cell region; (c) shows one of the cell capacitors.[1]

we evaluate the device endurance necessary to ensure a 10-year lifetime through acceleration factors (AFs) in terms of temperature and voltage. We also compare the AFs obtained from device-level tests with those from capacitor-level tests. Finally, we investigate, by means of 2-D stress simulation, which integration technology is critical for qualifying the device life tests.
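The destructive read-out behavior described in this section can be illustrated with a toy model. The sketch below is not the authors' circuit model; the class name `FerroelectricCell` and the convention that a READ together with its RESTORE counts as one stress cycle are assumptions made purely for illustration. It shows why READ operations, and not only WRITE operations, consume endurance in a ferroelectric memory.

```python
class FerroelectricCell:
    """Toy model of one memory cell with destructive read-out (hypothetical)."""

    def __init__(self, bit=0):
        self.bit = bit
        self.stress_cycles = 0  # cumulative READ/WRITE cycles seen by the cell

    def write(self, bit):
        self.bit = bit
        self.stress_cycles += 1

    def read(self):
        value = self.bit
        self.bit = None        # sensing destroys the stored state
        self._restore(value)   # RESTORE must follow every READ
        self.stress_cycles += 1  # assumption: READ plus RESTORE is one cycle
        return value

    def _restore(self, value):
        self.bit = value


cell = FerroelectricCell()
cell.write(1)
values = [cell.read() for _ in range(3)]
print(values, cell.stress_cycles)
```

In this model the stored bit survives repeated reads only because of the restore step, and every access of either kind advances the stress counter.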

#### **Performance benefits as a NV-cache solution**

An SSD, one of the multimedia storage systems, generally consists of four important devices. The first is a micro-controller with a clock speed of a few hundred MHz, running a real-time operating system (firmware). The second is a storage device such as HDD or NAND flash memory, with a capacity of several hundred gigabytes. The third is a host interface, whose primary function is to transfer data between the motherboard and the mass storage device. In particular, SATA (serial advanced technology attachment) 6G offers a sustainable 100 MB/s of



**Figure 3.** Impact of DRAM utilization in SSD on system performance, (a) Increase in sequential READ/WRITE by IO shaping, (b) Performance improvement by collective WRITE.

disk data rate in HDD. In addition, the bandwidth required of DRAM is dominated by the serial I/O ports, whose maximum speed can reach 600 MB/s. SATA adapters communicate over a high-speed serial cable. Last but not least is a buffer memory, which plays a considerable role in system performance. Using DRAM as the buffer memory in SSD brings many advantages. For example, in a DRAM-employed SSD, IO shaping in DRAM aligns the WRITE-data unit to the NAND flash page/block size, and collective WRITE also becomes possible. As shown in figure 3, the former improves performance by up to 60% as a result of sequential WRITE, and the latter improves it by a further 17% owing to the increased cache function.



**Figure 4.** Additional performance benefit for DRAM plus FRAM in SSD.

To improve system performance further, not only DRAM but also FRAM has been considered, because of its non-volatility and random accessibility. Before discussing this, it is noteworthy that in an SSD with no NV-cache, a system-log manager is needed to record and maintain a log of each transaction<sup>†</sup> in order to ensure that the file system maintains consistency even during a power failure. A log file that contains all the changes in metadata generally serves as a history list of the transactions performed by the file system over a certain period of time. Only once the changes have been recorded to this log is the actual operation executed. This is the so-called power-off recovery (POR). By contrast, POR is redundant in an SSD that employs FRAM as a NV-cache, because the metadata can be protected by FRAM. Elimination of the POR overhead is the single most critical improvement gained by using FRAM, because FRAM provides such a system with a byte-addressable, non-volatile RAM function. Thus, even in a sudden power failure, the system is safely protected without POR overhead, since the integrity of the metadata stored in the ferroelectric memory is ensured. Using many benchmark tools, we have confirmed that by eliminating this overhead, system performance has

<sup>†</sup> Each set of operations for performing a specific task.

been increased by 250% in random WRITE (see figure 4). It also removes the need for the FLUSH operation in the file system. As a consequence, performance increases by an additional 9.4%, as the cache hit ratio is maximized. Since frequently updated metadata do not necessarily go to the NAND flash medium, the endurance of the flash memories can also be increased by up to 8%. In addition, the failure rate of operations can be reduced by 20% owing to improved firmware robustness, mostly from elimination of the POR overhead.

#### **Endurance**

In FRAM, it is not easy to verify whether a memory device can endure a virtually infinite number of READ/WRITE cycles. This is because of the memory size, typically several tens or hundreds of megabits. For instance, an HTOL (high temperature operational life) test lasting 2 weeks at 125 °C amounts to merely a few million endurance cycles for each memory cell in a 64-Mb memory. Even taking into account the minimum number of cells (in this case 128 bits because of the 16 IOs), evaluating $10^{13}$ cycles takes more than 20 days. Therefore, it is essential to find acceleration factors to estimate device endurance through measurable quantities such as voltage and temperature. However, direct extraction of acceleration factors from memory chips is not as easy in practice as it seems in theory. This is because a VLSI circuit consists of many discrete CMOS components, each of which works only within a certain temperature and voltage range. Generally, 125 °C is regarded as the upper limit for proper operation. The voltage range of a memory device is also specified for a given technology node ($\pm 10\%$ of $V_{DD} = 1.8$ V in this case). Despite these difficulties, we have attempted to determine acceleration factors in terms of temperature and voltage, together with information obtained from capacitor-level tests.
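The "few million cycles per cell" figure for a 2-week HTOL test can be checked with a short calculation. The sketch below assumes that the test sweeps all addresses uniformly at the 120 ns cycle time quoted in the Experiment section and that each access exercises 16 bits (one per IO); the uniform sweep pattern and the function name `cycles_per_cell` are illustrative assumptions, not details given in the paper.

```python
# Inputs taken from the text: 64-Mb density, 16 IOs, 120 ns cycle time.
# The uniform address sweep is an assumption made for this estimate.
CYCLE_TIME_S = 120e-9
DENSITY_BITS = 64 * 2**20
IO_WIDTH = 16
TEST_DURATION_S = 14 * 24 * 3600  # 2-week HTOL test


def cycles_per_cell(duration_s, n_bits):
    """Cycles seen by each cell when all addresses are swept uniformly."""
    addresses = n_bits // IO_WIDTH             # each access touches 16 bits
    total_accesses = duration_s / CYCLE_TIME_S
    return total_accesses / addresses


full_array = cycles_per_cell(TEST_DURATION_S, DENSITY_BITS)
print(f"cycles per cell in 2 weeks: {full_array:.2e}")
```

Under these assumptions each cell sees on the order of a few million cycles, which is many orders of magnitude short of the $10^{13}$ or more cycles of interest and motivates the use of acceleration factors.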

First, in regard to package-level endurance, figure 5 represents changes in (a) peak-to-peak



**Figure 5.** Changes in (a) peak-to-peak sensing margin (SMpp) and (b) tail-to-tail sensing margin (SMtt) as a function of endurance cycles at 125 °C. (c) SMpp vs. endurance cycles at 125 °C, 2.5V. (d) SMtt vs. endurance cycles at 125 °C, 2.5V. SMt and SMi on the ordinate in figures 5(a) and (b) are the sensing margins at time t and at the initial time, respectively.



**Figure 6.** (a) A normalized polarization plot against cumulative fatigue cycles at 145 °C in a variable voltage range. (b) Logarithm of CTF vs. stress voltage,  $V_{DD}$  at 145 °C.

sensing margin (SMpp) and (b) tail-to-tail sensing margin (SMtt) as READ/WRITE cycles cumulatively stress the devices at 125 °C. Both SMpp and SMtt have been obtained by averaging over 30 package samples for each stress voltage. Function-failed packages have been observed when SMpp and SMtt lose 10% and 25% of their initial values, respectively. As seen in figure 5, the voltage acceleration factors ($AF_V$) between 2.0 V and 2.5 V have been calculated by these criteria ($AF_V = 81$ for SMpp and $AF_V = 665$ for SMtt). In other words, the test FRAMs can endure $1 \times 10^{12}$ READ/WRITE cycles at 125 °C and 2.0 V.

Second, for capacitor-level endurance, figure 6 shows a normalized polarization plot against cumulative fatigue cycles at 145 °C over a range of voltages. Here, we introduce the term CTF (cycle-to-failure), defined as the number of endurance cycles up to which the remanent polarization (or sensing margin) retains a value sufficient for the cell capacitors (or memory) to operate. Polarization drops gradually as fatigue cycles increase, and the rate of collapse accelerates as the stress voltage goes higher. Likewise, taking a 10% loss of polarization as the criterion for CTF, the CTF at 145 °C and 2.0 V is approximately $2.2 \times 10^{12}$. (NB. This criterion is reasonable because samples with a 10% loss in SMpp turned out to be functionally defective.) Figure 7 is a logarithmic plot of CTF as a function of stress voltage at various temperatures. Considering the temperature- and voltage-acceleration factors from figure 7, the acceleration condition of 145 °C, 3.5 V is more stressful than that of 85 °C, 2.0 V by 5 orders of magnitude. In other words, a CTF of $1.0 \times 10^{9}$ at 145 °C, 3.5 V is equivalent to $6.0 \times 10^{14}$ at 85 °C, 2.0 V.
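The equivalence stated above amounts to a single combined acceleration factor between the stress condition and the use condition. The following sketch reproduces only that arithmetic from the two CTF values quoted in the text, and compares the result with the lifetime requirement of $9.5 \times 10^{13}$ cycles given in the abstract. The variable names are illustrative, and the split of the combined factor into separate temperature and voltage terms is not attempted here because the paper gives only the combined result.

```python
import math

CTF_STRESS = 1.0e9        # CTF at 145 °C, 3.5 V (from the text)
CTF_USE = 6.0e14          # equivalent CTF at 85 °C, 2.0 V (from the text)
LIFETIME_CYCLES = 9.5e13  # lifetime READ/WRITE cycles (from the abstract)

# Combined temperature-and-voltage acceleration factor implied by the two CTFs.
af_total = CTF_USE / CTF_STRESS
orders_of_magnitude = math.log10(af_total)

# Margin of the projected CTF over the lifetime requirement.
margin = CTF_USE / LIFETIME_CYCLES

print(f"combined AF: {af_total:.1e} ({orders_of_magnitude:.1f} orders of magnitude)")
print(f"margin over lifetime requirement: {margin:.1f}x")
```

The combined factor falls between 5 and 6 orders of magnitude, consistent with the statement in the text, and the projected CTF exceeds the lifetime requirement by a factor of about six.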

The acceleration factors obtained from device-level tests differ from those obtained at capacitor level. For example, while $AF_V$(2.5V/2.0V)<sup>‡</sup> is 81 in device-level tests (by the SMpp criterion), it is ~16 in capacitor-level tests. We have yet to find a clear explanation for this difference. It might, however, arise from the fact that a memory device contains many different functional circuits, such as voltage-latch sense amplifiers and word-line/plate-line drivers, all of which magnify the effect of a tiny voltage difference on the cell capacitors. This tendency can also be observed in the large gap in $AF_V$ between the two definitions, SMtt ($AF_V = 665$) and SMpp ($AF_V = 81$). Tail-bit behavior of memory cells could, in general, include a certain amount of extrinsic imperfection. Thus, we believe that the capacitor-level results are closer to the fundamental nature of CTF than the device-level results, owing to the absence of extrinsic components. Figure 8 shows the Weibull distribution of endurance life in package samples tested at 125 °C over a range of voltages. The distributions in the 2.2-3.0 V range have a similar shape parameter, $m \sim 2.4$. This suggests that evaluating endurance at device level is physically meaningful. As seen in figure 8, voltage-endurance stress below 2.0 V does not produce any sign of degradation in sensing margins within a measurable time span. Nor can temperature-endurance stress be applied above 125 °C, because this lies outside the operational specifications of the device.

<sup>‡</sup> It is thought that $AF_V$ in capacitor-level tests follows the $AF_V$ of SMpp in device-level tests rather than that of SMtt, because capacitor-level tests by nature average over all the cell capacitors connected in parallel.
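A Weibull shape parameter such as the $m \sim 2.4$ quoted above is commonly estimated by fitting a straight line on a Weibull probability plot. The sketch below shows one standard way to do this, median-rank regression with Bernard's approximation; the paper does not state which fitting method was used, so this choice is an assumption. The data are synthetic, generated from an ideal Weibull distribution with a hypothetical scale of $10^{10}$ cycles, and are not measurements from the paper.

```python
import math


def median_ranks(n):
    """Bernard's approximation to the median rank of each ordered failure."""
    return [(i - 0.3) / (n + 0.4) for i in range(1, n + 1)]


def fit_weibull(cycles_to_failure):
    """Least-squares fit of ln(-ln(1-F)) = m*ln(t) - m*ln(eta)."""
    t = sorted(cycles_to_failure)
    f = median_ranks(len(t))
    xs = [math.log(ti) for ti in t]
    ys = [math.log(-math.log(1.0 - fi)) for fi in f]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    shape = slope                          # Weibull shape parameter m
    scale = math.exp(-intercept / slope)   # characteristic life eta
    return shape, scale


# Synthetic, noise-free data (hypothetical): exact Weibull quantiles.
TRUE_SHAPE = 2.4
TRUE_SCALE = 1.0e10
synthetic = [TRUE_SCALE * (-math.log(1.0 - f)) ** (1.0 / TRUE_SHAPE)
             for f in median_ranks(30)]

shape, scale = fit_weibull(synthetic)
print(f"m = {shape:.2f}, eta = {scale:.2e} cycles")
```

Distributions measured at different stress voltages that share a similar slope on such a plot indicate a common failure mechanism, which is the sense in which the similar shape parameters support the device-level evaluation.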

Meanwhile, how many endurance cycles are necessary for NV-cache applications such as data memory and code memory? To answer this question, we need to understand the access patterns of NV-cache devices in a multimedia system. We take the following into account. The first is the ratio of READ and WRITE per cycle in data memory (likewise, the number of data fetches per cycle in code memory). Generally, the ratio for data memory and code memory is 0.75 and 1.00, respectively. The second is data locality<sup>§</sup>. Figure 9 is a simulation result showing a strong locality of 1.5% when FRAM is considered as a code memory. As shown in figure 9, less than 200 bytes of code space is accessed more frequently than the rest. Provided wear-leveling in READ/WRITE against the strong locality, and taking as an example a main-memory clock frequency of 20 MHz (CPU clock ~200 MHz), we find that the endurance cycles required for a 10-year lifetime are less than $9.5 \times 10^{13}$. This is far fewer than the more than $10^{15}$ cycles we had initially assumed. Thus, we believe that the CTF of $6.0 \times 10^{14}$ at 85 °C, 2.0 V is large enough to ensure that the developed ferroelectric memory, used as a NV-cache, is free from endurance concerns in a multimedia storage system.
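The lifetime requirement can be approximated from the quantities given above. The sketch below is one reading that happens to reproduce the quoted figure, not a formula stated in the paper: it assumes that the 1.5% locality is the fraction of all memory-clock cycles that land on the most heavily accessed cells, that the code-memory access ratio of 1.00 applies, and that a year has 365 days. The function name `lifetime_cycles` is illustrative.

```python
MEMORY_CLOCK_HZ = 20e6              # main-memory clock (from the text)
LIFETIME_S = 10 * 365 * 24 * 3600   # 10-year lifetime, 365-day years assumed
LOCALITY = 0.015                    # 1.5% locality (from the text)
ACCESS_RATIO = 1.00                 # code-memory fetches per cycle (from the text)
CTF_USE = 6.0e14                    # CTF at 85 °C, 2.0 V (from the text)


def lifetime_cycles(clock_hz, lifetime_s, locality, access_ratio):
    """Cycles on the most-accessed cells over the lifetime (assumed model)."""
    return clock_hz * lifetime_s * locality * access_ratio


required = lifetime_cycles(MEMORY_CLOCK_HZ, LIFETIME_S, LOCALITY, ACCESS_RATIO)
print(f"required cycles: {required:.2e}")
print(f"CTF margin: {CTF_USE / required:.1f}x")
```

Under these assumptions the requirement comes out just under $9.5 \times 10^{13}$ cycles, and the CTF at 85 °C, 2.0 V exceeds it by roughly a factor of six.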

#### **Integration technology**

Defective cells in function-failed packages during high-temperature life tests are localized at the corners of the memory cell arrays. Figure 10(a) shows a plan view of the failed cell capacitor obtained by FIB (focused ion beam). Figure 10(b) is a cross-sectional micrograph of the defective spot observed in figure 10(a). The IrO<sub>2</sub> layer has disappeared, leaving a gap between the ATE Ir and the TE Ir. The gap turned out to be empty, as confirmed by EDX (Energy Dispersive X-ray) raster scanning[4]. This empty space could have two kinds of origin. One is the reduction of IrO<sub>2</sub> involving hydrogen-related products contained in the interlayer dielectric (ILD) and inter-metal dielectric (IMD)

<sup>§</sup> The locality of reference is the phenomenon that the collection of data locations accessed often consists of relatively well-predictable clusters of code space in bytes.