# RADIATION HARDENED AUTONOMOUS HIGH-DENSITY SOLID STATE MASS MEMORY

#### Lorenzo Gonzales<sup>(1)</sup>, Iztok Kramberger<sup>(1)</sup>

<sup>(1)</sup> Faculty of Electrical Engineering and Computer Science, University of Maribor, Koroška cesta 46, 2000 Maribor, Slovenia, Lorenzo.gonzales@um.si

#### ABSTRACT

This paper presents a radiation hardened autonomous solid state mass memory using NAND Flash ICs. The module is compliant with the PC-104 form factor and is designed for nanosatellites and small satellites. The rationale for using commercial off-the-shelf (COTS) components to achieve high storage capacity is shown. A component based FDIR policy, required for the proper operation of these COTS components, is presented. The module uses a product code ECC with matrix encoding based on a Reed-Solomon (255, 247) code and a Hamming (72, 64) code. The unit has a built-in hardware accelerated memory management module featuring address translation, garbage collection, bad block management and wear levelling. The data interface is implemented over a redundant LVDS interface, which supports bitrates up to 100 Mbit/s in both single page access and burst access modes. A redundant CAN bus using the CAN-TS protocol serves as the primary TMTC interface; it can be extended to the CANopen protocol or replaced by an RS422 interface. Mass memory units can be clustered in a multiple-unit storage solution, enabling device redundancy and storage space expansion.

#### 1 INTRODUCTION

The small satellite market is one of the fastest growing markets in the space industry and, with the introduction of satellite miniaturisation, shows great potential for a large variety of missions, especially distributed space missions (DSMs). DSM constellations are well suited to Earth observation missions, as they increase the number of observation sampling points in the spatial, spectral, temporal and angular dimensions [1]. This results in higher raw data yields, which must be stored on board for post processing or raw data download. As the number of satellites increases, the required downlink data stream of each satellite must be minimized to reduce ground station (GS) utilisation costs. Post processing of raw data on board the satellites is therefore mandatory [2], [3].

With the introduction of on-board high performance computing solutions, which are required for on-demand data processing in space, the need for a compact high-capacity non-volatile memory is pressing. Such systems need storage to accumulate raw data for information creation in space, as well as user code space for storing data processing models. Furthermore, they are demanding in terms of power budget and thermal control and are ideally not run in parallel with the data acquisition payload. This raises the need for autonomous storage devices, which can store streaming data without a powered computing module. This article presents a highly miniaturised, high density NAND Flash based solid state mass memory (SSMM) system. The system is designed as an easy to integrate autonomous satellite subsystem with high interoperability options. It is radiation hardened by design and built in a CubeSat compliant form factor, enabling its use in nanosatellites as well as in larger scale missions, where racks with multiple units can be used.

#### 2 MASS MEMORY FOR FUTURE SPACE MISSIONS

When designing an SSMM system for the next generation of small satellite and nanosatellite Earth observation platforms, we must consider high adaptability to increasingly challenging system requirements not previously encountered in typical satellite systems. Additionally, when building small satellites, cost reduction and simplicity are among the key priorities, especially when DSMs are the targeted approach for the missions. Besides cost, these mission requirements can be split into:

- Power budget requirements: Small satellites tend to have simple solar array deployment systems with limited power harvesting capabilities. The power budget is also directly linked to the thermal control of the system: with the introduction of high-performance computing units, the duty cycle of such units needs to be kept at the minimum required to fulfil the processing needs.
- Mass and structure constraints: The SSMM system needs to be highly miniaturised to fit into limited volume and mass constraints. Consequently, the usage of COTS components is unavoidable, especially for the non-volatile memory storage chips.
- System bus adaptation: The system needs to support a large variety of on-board communication standards to ensure easy integration into the satellite platform and to enable the SSMM to operate as a standalone subsystem.
- On-board computer (OBC) and payload resources: Due to differing computing resources and processing speeds, the SSMM must support high level storage operations with filesystem support, as well as raw address-based storage access with internal wear levelling and ECC.

An architecture of a micro-scale Earth observation satellite with edge computing capabilities is shown in Figure 1. When an autonomous SSMM is used in such a design, the storage can be attached directly to the payload data stream. A low power OBC orchestrates only the initialisation and error handling of the payload streaming process, while the stream handling itself is done by the SSMM, removing the need for a powered high-performance computing (HPC) module during data acquisition. When the HPC payload is activated, it acquires raw images directly from the storage, ready for post processing. With this approach, the peak power consumption of the satellite is significantly reduced, resulting in a lower depth of discharge (DoD) of the batteries and prolonging the mission lifetime. The same process applies when raw data or post-processed information is streamed towards the ground station (GS) through a high throughput downlink system. The autonomous data exchange capabilities of an SSMM are key to lower power consumption and reduced subsystem dependence.



Figure 1. Micro Earth observation satellite architecture

Furthermore, even if the HPC module fails, the satellite can still operate with raw data downlink, prolonging the mission lifetime and increasing reliability. In contrast, typical designs use dedicated storage attached to the HPC module, which is usually also the most error prone subsystem in terms of radiation hardening, particularly when COTS components are used.

To keep housekeeping requirements and device specific routines on other satellite subsystems low, the SSMM system is designed for autonomous memory management and exposes an easy to integrate system level interface, including the telemetry and telecommand (TMTC) interface. The system telemetry consists of storage status and metrics, system vitality metrics (temperature, voltages, etc.) and fault detection, isolation and recovery (FDIR) status and metrics. The TMTC interface is accessed over the primary satellite bus. Data exchange is intended to run over the high throughput bus, but the storage can also be accessed through the primary bus. Data can be exchanged in single page access, sub page access, burst mode and streaming mode. The key difference between burst mode and streaming mode is that burst mode ensures that all packets are delivered, using acknowledgments on the protocol level.

Both the primary and the high throughput buses are redundant. In nominal conditions, the redundant high throughput bus can also be used to increase the data throughput towards the device.

The SSMM prototype uses a CAN bus (CAN-TS protocol) as the primary bus and an LVDS bus (LVDS-TS protocol) as the high-throughput bus. Through device variants, these can be replaced with RS422, CANopen, SpaceWire or I2C as the primary bus and SpaceWire or Ethernet as the high-throughput bus. The LVDS bus using the LVDS-TS protocol can operate at up to 100 Mbit/s per channel.

#### 3 MASS MEMORY SUBSYSTEM

The SSMM subsystem is designed in a PC-104 compliant form factor and is built around an FPGA based System on Chip (SoC). The design can be split into the mass memory specific components and the components for subsystem vitality, FDIR and general telemetry acquisition. For raw storage, the system incorporates 15 NAND Flash ICs, arranged in a matrix of 5 rows with 3 columns per row. Each row has a separate data bus, which is shared between the ICs in the row. The component-based single event effect (SEE) protection is also implemented per row. Each column has its own chip select, meaning that the devices in a row cannot be used simultaneously for data access operations, but they can be commanded together. The SoC is designed to support high data throughput with fast wear levelling address translation. The data pipeline is hardware accelerated, featuring a dedicated LVDS controller with direct access to the data pipeline and the address translation encoder, an access controller, and an encoding pipeline per Flash device row.

The SoC is controlled by a softcore PicoSky FT processor, with dedicated peripheral units used for interacting with the mass memory control. The main satellite bus is handled by the CPU, which exposes the TMTC commands and block access to the Flash data. Block access through the primary bus is slower than through the dedicated bus and is meant for emergency mode, debugging and application specific non-volatile storage accesses. The TMTC consists of access control configuration, high level memory management commands, housekeeping commands and access to storage metrics, including bit error counters, error logs, used storage area, FDIR state and general board telemetry. Additionally, the device can be configured through TMTC for streaming data from and to the device over the high throughput bus.

The software implements a Flash translation layer including block to physical address translation, wear levelling, bad block management and garbage collection, which can also be triggered manually.
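
The interplay of these Flash translation layer functions can be illustrated with a minimal sketch. It is not the SSMM firmware: the class `ToyFTL`, its method names and the block-level granularity are assumptions made purely for illustration, and wear levelling is reduced to choosing the least-erased free block.

```python
class ToyFTL:
    """Illustrative flash translation layer (hypothetical, block granularity)."""

    def __init__(self, num_physical_blocks):
        self.erase_counts = [0] * num_physical_blocks
        self.free_blocks = set(range(num_physical_blocks))
        self.stale_blocks = set()   # hold outdated data, waiting for erase
        self.bad_blocks = set()
        self.mapping = {}           # logical block -> physical block

    def mark_bad(self, physical):
        """Bad block management: retire a block from all pools.
        A real FTL would also relocate data if the block were mapped."""
        self.bad_blocks.add(physical)
        self.free_blocks.discard(physical)
        self.stale_blocks.discard(physical)

    def collect_garbage(self):
        """Erase all stale blocks and return them to the free pool."""
        reclaimed = len(self.stale_blocks)
        for block in self.stale_blocks:
            self.erase_counts[block] += 1
            self.free_blocks.add(block)
        self.stale_blocks.clear()
        return reclaimed

    def _allocate(self):
        if not self.free_blocks:
            self.collect_garbage()
        if not self.free_blocks:
            raise RuntimeError("no free blocks left")
        # Wear levelling: prefer the free block with the fewest erase cycles.
        best = min(self.free_blocks, key=lambda b: (self.erase_counts[b], b))
        self.free_blocks.remove(best)
        return best

    def write(self, logical):
        """Out-of-place update: map the logical block to a fresh physical block."""
        new = self._allocate()
        old = self.mapping.get(logical)
        if old is not None:
            self.stale_blocks.add(old)
        self.mapping[logical] = new
        return new

    def translate(self, logical):
        """Address translation: logical block -> current physical block."""
        return self.mapping[logical]
```

In this sketch, rewriting a logical block never overwrites in place; the old physical block becomes stale and is only reused after garbage collection has erased it, which is why a manually triggered collection, as mentioned above, can be useful before a long acquisition.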

The high-throughput bus is attached directly to the data access pipeline, as can be seen in Figure 2. The LVDS controller is designed for pipelined operation and is connected directly to the address translation control and the data pipeline control. On a write operation, the physical address is fetched through the address controller and the data is passed through the data pipeline before the validity of the LVDS frame is checked. After a successful check, the write to NAND is triggered from the page buffer to the physical address. The read operation is the inverse of the write operation. On a data integrity error, i.e. when the LVDS frame ECC is corrupt, the next access is not acknowledged.



Figure 2. SSMM block scheme

The memory controller incorporates a product code based ECC scheme. The encoding uses a product matrix of a Reed-Solomon (RS) (255, 247) code and a Hamming (72, 64) code [4]. The encoding scheme of the SSMM is designed for an MLC NAND Flash with 16 KB + 2208 B pages. The matrix (Figure 3) splits the page into 64 rows of 256 bytes each. Each row holds one RS (255, 247) codeword, leaving one spare byte that is used for inter-page access tagging. Along the columns, the 64 rows supply the 64 data bits of the Hamming (72, 64) code. The Hamming parity bits are stored in the checksum section of the Flash page. As the RS code uses 8-bit symbols, this parity amounts to 2040 B and is stored at the end of the checksum region, leaving 168 B for raw page marking bytes used for bad block marking, the log-book and wear levelling marking.
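
To make the column code concrete, the following sketch implements a generic extended Hamming (72, 64) code with single error correction and double error detection. The bit layout (parity bits at power-of-two positions plus one overall parity bit at position 0) and the function names `encode` and `decode` are assumptions for illustration; the paper does not state which layout the SSMM uses.

```python
PARITY_POSITIONS = [1, 2, 4, 8, 16, 32, 64]
# Positions 1..71 that are not powers of two carry the 64 data bits.
DATA_POSITIONS = [p for p in range(1, 72) if p not in PARITY_POSITIONS]


def encode(data_bits):
    """Encode 64 data bits (list of 0/1) into a 72-bit codeword."""
    if len(data_bits) != 64:
        raise ValueError("expected 64 data bits")
    code = [0] * 72
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    # Each Hamming parity bit covers all positions whose index has that bit set.
    for p in PARITY_POSITIONS:
        parity = 0
        for pos in range(1, 72):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity
    # Position 0 holds the overall parity used for double error detection.
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (data_bits, status), status in {'ok', 'corrected', 'uncorrectable'}."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 72):
        if code[pos]:
            syndrome ^= pos
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome < 72:
        # Odd number of flips: assume a single error at the syndrome position
        # (syndrome 0 means the overall parity bit itself was hit).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with non-zero syndrome, or invalid position.
        status = "uncorrectable"
    return [code[pos] for pos in DATA_POSITIONS], status
```

Applied to the matrix described above, each bit column of the 64 rows would supply one 64-bit data word, and the 8 resulting parity bits would be the ones stored in the checksum section.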

The product code approach can handle both random and Multiple Bit Upset (MBU) errors, achieving bit error rates (BER) one decade lower than plain RS or Bose–Chaudhuri–Hocquenghem (BCH) codes of similar code length. A comparison of area, latency and additional storage also shows that product schemes have lower hardware cost and latency than plain RS codes [4].

The encoded page has a capacity of 15808 bytes of data with a 15% raw data overhead, including the marking bytes. This is a significant improvement over traditional ECC approaches for space applications, where typically a 50% overhead is used. To fit 512 B or 1024 B sub-pages, the user data size per page can be reduced to 15360 B, leaving 448 encoded bytes per page available for wear levelling information lookup, while keeping the overhead ratio at 17%. This enables the use of an efficient logbook based wear levelling approach [5].
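
The page budget quoted above can be reproduced with a few lines of arithmetic. The sketch below assumes that the Hamming code covers the 255 RS-coded byte columns of each row (2040 bit columns), which is the reading that yields the stated 2040 B of parity; the function name `page_budget` and the dictionary keys are illustrative only.

```python
PAGE_MAIN_BYTES = 16 * 1024   # 16 KB main area of the MLC NAND page
PAGE_SPARE_BYTES = 2208       # checksum (spare) area of the page
ROW_BYTES = 256               # matrix row length
RS_N, RS_K = 255, 247         # Reed-Solomon codeword and data length in bytes
HAMMING_N, HAMMING_K = 72, 64


def page_budget(user_bytes=None):
    """Return the byte budget of one product-code encoded page."""
    rows = PAGE_MAIN_BYTES // ROW_BYTES            # 64 rows, one RS codeword each
    max_user = rows * RS_K                         # RS data bytes per page
    if user_bytes is None:
        user_bytes = max_user
    bit_columns = RS_N * 8                         # assumed Hamming-protected columns
    hamming_parity_bytes = bit_columns * (HAMMING_N - HAMMING_K) // 8
    raw_page = PAGE_MAIN_BYTES + PAGE_SPARE_BYTES
    return {
        "rows": rows,
        "max_user_bytes": max_user,
        "hamming_parity_bytes": hamming_parity_bytes,
        "marking_bytes": PAGE_SPARE_BYTES - hamming_parity_bytes,
        "spare_encoded_bytes": max_user - user_bytes,
        "overhead": 1 - user_bytes / raw_page,
    }


full = page_budget()
sub_paged = page_budget(user_bytes=15360)
print(full["max_user_bytes"], round(full["overhead"] * 100))
print(sub_paged["spare_encoded_bytes"], round(sub_paged["overhead"] * 100))
```

The same function gives the 168 marking bytes as the difference between the 2208 B checksum area and the 2040 B of Hamming parity.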



Figure 3. NAND Flash page product code ECC encoding

The memory controller enables periodic memory scrubbing and automatic rewriting when a user defined RS symbol error threshold is reached. The Flash can always be accessed in bypass mode, which exposes raw physical address access, but this must be enabled in the access control configuration.
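
A possible form of the scrubbing decision is sketched below. An RS (255, 247) code can correct up to (255 - 247) / 2 = 4 symbol errors per codeword, so a useful rewrite threshold lies between 1 and 4. The function `scrub_decision`, its return values and the per-row error counts are hypothetical; the sketch also ignores the additional protection of the Hamming dimension.

```python
RS_N, RS_K = 255, 247
RS_T = (RS_N - RS_K) // 2   # correctable symbol errors per RS codeword: 4


def scrub_decision(symbol_errors_per_row, rewrite_threshold):
    """Decide what to do with a page after a scrubbing read.

    symbol_errors_per_row: symbol error count for each RS row of the page;
        a count above RS_T models a row the RS decoder could not correct.
    rewrite_threshold: user defined threshold, 1..RS_T.
    """
    if not 1 <= rewrite_threshold <= RS_T:
        raise ValueError("threshold must be between 1 and RS_T")
    worst = max(symbol_errors_per_row)
    if worst > RS_T:
        return "uncorrectable"   # beyond the RS correction capability
    if worst >= rewrite_threshold:
        return "rewrite"         # still correctable, refresh the page now
    return "ok"
```

Choosing a threshold below the correction limit trades additional program/erase cycles for margin against further error accumulation between two scrubbing passes.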

Table 1 shows the key NAND Flash array parameters. The raw storage size of the SSMM is just above 1 TB, while the ECC protected size is 992 GB. The memory actually available is reduced by 2 GB of processor storage used for the sector-based look-up and the CPU program code. As is typical for Flash based devices, the NAND access speeds are asymmetrical. The maximal raw memory read speed is not achievable with the LVDS transceivers on the SSMM engineering model (EM) and needs a high-throughput bus with bitrates up to 2 Gbit/s. In the current configuration, the maximal access speed, limited by the LVDS controller, is 30 MB/s. This covers single page read speeds; for the highest achievable read speeds, the device must be configured in streaming mode. The theoretical maximal read speed is 1250 MB/s, but it requires a 12.7 Gbps transceiver.

| Parameter                    | Raw data stream | ECC protected data stream |
|------------------------------|-----------------|---------------------------|
| Storage size (GB)            | 1166            | 992                       |
| Read speed (MB/s)            | 250             | 180                       |
| Write speed (MB/s)           | 24              | 19                        |
| Streaming read speed (MB/s)  | 1250            | 900                       |
| Streaming write speed (MB/s) | 120             | 95                        |

 Table 1. NAND Flash Array access parameters
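
The figures in Table 1 can be related to bus bitrates and transfer times with simple arithmetic. The sketch below assumes decimal units (1 MB = 10^6 B, 1 GB = 10^9 B) and counts payload bits only; protocol and line coding overheads are not modelled, which is presumably why the 12.7 Gbps transceiver figure in the text is higher than the payload rate computed here. The function names are illustrative.

```python
def payload_bitrate_gbps(throughput_mb_s):
    """Payload bitrate in Gbit/s for a throughput given in MB/s."""
    return throughput_mb_s * 8 / 1000


def transfer_hours(size_gb, throughput_mb_s):
    """Hours needed to move size_gb at a sustained throughput in MB/s."""
    return size_gb * 1000 / throughput_mb_s / 3600


print(payload_bitrate_gbps(250))              # raw single page read speed
print(payload_bitrate_gbps(1250))             # raw streaming read speed
print(round(transfer_hours(992, 30), 1))      # full read-out at the LVDS limit
print(round(transfer_hours(992, 95), 1))      # full fill at ECC streaming write speed
```

Under these assumptions, the raw read speed of 250 MB/s corresponds to the 2 Gbit/s bus mentioned above, and reading out the complete ECC protected storage at the 30 MB/s LVDS limit takes roughly nine hours.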

##### 3.1 Radiation hardened by design

To achieve a highly integrated mass memory storage (Figure 4) in a CubeSat compliant form factor, the usage of COTS components is unavoidable. To achieve high reliability and availability of the system in a harsh radiation environment, a two-level FDIR policy must be fulfilled: the device must be resistant to single event effects (SEE) on both the component level and the SSMM information integrity level.

Firstly, the device is constructed using carefully selected COTS components protected by a subsystem and component based FDIR policy [6], [7]. The Flash ICs are protected against latch-up events. To extend the mission lifetime, the unit is designed to operate with partially faulty memory, utilising only the ICs in healthy condition. Due to the Flash array architecture, a faulty IC can disable its whole row.
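
The effect of faulty ICs on the usable array can be sketched for the 5 by 3 arrangement described in Section 3. The function `usable_ics` and its `fault_disables_row` flag are hypothetical; the flag distinguishes the case where a fault takes down the shared row bus from the case where only the affected IC is excluded.

```python
ROWS, COLUMNS = 5, 3   # 15 NAND Flash ICs: 5 rows with 3 columns per row


def usable_ics(faulty, fault_disables_row=True):
    """Return the (row, column) positions of ICs that can still be used.

    faulty: set of (row, column) positions of ICs flagged as faulty.
    fault_disables_row: if True, any faulty IC disables its whole row.
    """
    usable = []
    for row in range(ROWS):
        row_faults = {c for (r, c) in faulty if r == row}
        if row_faults and fault_disables_row:
            continue   # shared row bus is considered unusable
        usable.extend((row, c) for c in range(COLUMNS) if c not in row_faults)
    return usable


print(len(usable_ics(set())))                               # healthy array
print(len(usable_ics({(2, 1)})))                            # fault disables row 2
print(len(usable_ics({(2, 1)}, fault_disables_row=False)))  # only the IC is lost
```

In the worse case a single faulty IC therefore removes three of the fifteen devices, which is the degraded capacity the memory management has to be able to operate with.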

Secondly, the device integrates ECC codes in intermediate buffers, uses register parity bits and a variety of pipeline error detection mechanisms with mitigation processes, especially designed for NAND Flash access and wear levelling, ensuring no data loss on confirmed packets.





Figure 4. SSMM engineering model (EM) system

#### 4 DEVICE CLUSTERING AND CONNECTIVITY

Mass memory units can be clustered together in a multiple unit storage solution (Figure 5), enabling device redundancy, storage space expansion and increased data throughput. The storage cluster is constructed from multiple SSMM devices, an LVDS cluster hub and an optional cluster controller. This solution is currently supported only in a CAN and LVDS bus configuration. All devices can be accessed through the primary bus, while the LVDS hub splits the transactions made to and from the SSMMs. Additionally, the cluster hub can expand the data bus interface with Ethernet, enabling full utilisation of the high-throughput buses of all SSMM devices simultaneously. The hub can be configured in a simple RAID1 configuration, duplicating write accesses and merging read accesses.
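
The RAID1 behaviour of the hub can be sketched as follows. The class `Raid1Hub` is hypothetical, the SSMM units are modelled as plain dictionaries, and "merging read accesses" is interpreted here as returning the first replica whose checksum verifies; the actual hub may merge reads differently. A CRC-32 stands in for the real integrity check.

```python
import zlib


class Raid1Hub:
    """Illustrative RAID1 hub: mirrored writes, first-valid-copy reads."""

    def __init__(self, unit_count=2):
        # Each unit maps a page address to (payload, checksum).
        self.units = [dict() for _ in range(unit_count)]

    def write(self, page, payload):
        """Duplicate the write access to every unit in the cluster."""
        record = (payload, zlib.crc32(payload))
        for unit in self.units:
            unit[page] = record

    def read(self, page):
        """Merge read accesses: return the first copy that passes its check."""
        for unit in self.units:
            record = unit.get(page)
            if record is not None and zlib.crc32(record[0]) == record[1]:
                return record[0]
        raise IOError("no valid copy of page %d" % page)


hub = Raid1Hub()
hub.write(3, b"raw image tile")
# Simulate corruption of the copy on the first unit.
hub.units[0][3] = (b"raw imagX tile", hub.units[0][3][1])
print(hub.read(3))
```

With this interpretation, a page remains readable as long as at least one unit holds an intact copy, which is the device redundancy the cluster is meant to provide.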



Figure 5. SSMM cluster architecture

To enable high level storage cluster operations, a cluster controller is integrated in the unit. The controller can handle higher level protocols than the block device nature of the SSMM allows, enabling direct integration with communication modules and payloads. Such a device can directly generate a CCSDS stream for an X-band transceiver or similar equipment. At the same time, it can control access from a payload device. Additionally, the controller can implement a file system, enabling HPC devices to mount the device as a remote drive, while the payloads and communication modules access the storage over a lower-level protocol.

#### 5 CONCLUSION

The presented autonomous SSMM shows great potential for next generation economical constellations and small satellite missions. The system is designed to operate autonomously without an orchestration subsystem, reducing power budget requirements compared to traditional configurations, where HPC modules must run in parallel with the storage solution. The module uses an electrically and physically widely compatible architecture, making the device easy to integrate and operate and thus suitable for rapid development environments. With a simple yet reliable design, the module prioritises the FDIR policy over complex storage operations. Nevertheless, the module features a streaming mode dedicated to payloads and communication modules with simple data stream sources and sinks. The device supports high bitrates and features a versatile ECC scheme. The SSMM module will undergo radiation testing to verify the effectiveness of the ECC code and the global FDIR policy. The clustering capabilities of the module make the design compatible with high storage capacity requirements and give the module high level functionalities.

#### 6 ACKNOWLEDGEMENT

This work was supported by the Slovenian Research Agency under Grant P2-0069.

#### 7 REFERENCES

- [1] S. Nag, J. LeMoigne, and O. de Weck, "Cost and risk analysis of small satellite constellations for earth observation," in 2014 IEEE Aerospace Conference, Mar. 2014, pp. 1–16. doi: 10.1109/AERO.2014.6836396.
- [2] "Modern Small Satellites - Changing the Economics of Space," University of Surrey. https://openresearch.surrey.ac.uk/esploro/outputs/journalArticle/Modern-Small-Satellites---Changing-the-Economics-of-Space/99512072802346 (accessed Apr. 09, 2022).
- [3] D. J. Barnhart, T. Vladimirova, and M. N. Sweeting, "Very-Small-Satellite Design for Distributed Space Missions," J. Spacecr. Rockets, vol. 44, no. 6, pp. 1294–1306, 2007, doi: 10.2514/1.28678.
- [4] C. Yang, Y. Emre, and C. Chakrabarti, "Product Code Schemes for Error Correction in MLC NAND Flash Memories," *IEEE Trans. Very Large Scale Integr. VLSI Syst.*, vol. 20, no. 12, pp. 2302–2314, Dec. 2012, doi: 10.1109/TVLSI.2011.2174389.
- [5] L.-P. Chang and L.-C. Huang, "A low-cost wear-leveling algorithm for block-mapping solid-state disks," ACM SIGPLAN Not., vol. 46, no. 5, pp. 31–40, Apr. 2011, doi: 10.1145/2016603.1967683.
- [6] D. Selčan, G. Kirbiš, and I. Kramberger, "Nanosatellites in LEO and Beyond: Advanced Radiation Protection Techniques for COTS-based Spacecraft," *Acta Astronaut.*, vol. 131, Nov. 2016, doi: 10.1016/j.actaastro.2016.11.032.
- [7] D. Selčan, G. Kirbiš, and I. Kramberger, "Increasing Reliability of COTS-based EPS Subsystems for Nanosatellites," *Proc. 4S Symp. 2016*, 2016.