December 23 2024

Tee and Toast With My Post

sanjaybhatikar Blog Linux, PostgreSQL, tee, TOAST

Writer's Block My database is toast?

I had a heart-stopping moment when my Postgres database stopped allowing writes disrupting a critical ETL job. Postgres croaked as follows:

	Err: index "pg_toast_36514_index" contains unexpected zero page at block 121
HINT:  Please REINDEX it.

What gives?

Toast on the Table What, Why & How of Toast Tables

What are TOAST Tables in PostgreSQL?

TOAST stands for The Oversized-Attribute Storage Technique. It is a mechanism in PostgreSQL to handle large values that do not fit into a single page. PostgreSQL uses pages to store data, with a typical page size of 8KB. When a data value exceeds this size, PostgreSQL uses TOAST to store the value in a separate table.

Why do we need TOAST?

A page is a fixed-size block of data storage in a database. It is the basic unit of I/O operation. In PostgreSQL, the default page size is 8KB.

Pages store rows of a table. Each page contains metadata and the actual data. If a row is too large to fit in a single page, special handling is required.

How does TOAST work?

TOAST is used to handle large data values (such as large text or binary data) that cannot fit in a single page. When a data value exceeds the TOAST threshold (typically around 2KB for a default 8KB page size), PostgreSQL stores the value in a TOAST table. The main table stores a pointer to the TOAST table where the actual data is stored.

Large values are automatically compressed and divided into chunks. Each chunk is stored as a row in the TOAST table. This approach allows efficient storage and retrieval of large data values.

Burnt Toast What can go wrong?

Root Cause Analysis

The error message indicates corrupted index. Corruption can occur due to several reasons:

Improper Shutdown: If a PostgreSQL instance is not shut down properly (e.g., due to a power failure or improper Docker container shutdown), it can lead to corruption.
Hardware Issues: Disk failures or memory issues can also cause corruption.
Software Bugs: Bugs in PostgreSQL or related software can sometimes lead to data corruption.

Symptoms of Corruption

Corruption in TOAST tables can manifest as errors when accessing large values. The error message I encountered indicates that an unexpected zero page was found in a TOAST table, suggesting corruption.

Toast of the Town Resolve the issue by reindexing

Reindexing can often resolve issues with corrupted indexes. Since the error suggests an index issue with a TOAST table, reindexing the table may help:

	REINDEX TABLE pg_toast.pg_toast_<oid>;

Replace <oid> with the actual object identifier of the TOAST table. The step-step process is as follows:

Step 1: Identify the TOAST table and main table. We can do this as follows:

	SELECT
    reltoastrelid::regclass AS toast_table,
    relname AS main_table
FROM
    pg_class
WHERE
    relname = 'transcript';

The output should gives us the names of the TOAST table (e.g. pg_toast.pg_toast_36514) and main table (e.g. transcript).

Explanation: In PostgreSQL, if any of the columns of a table are TOAST-able, the table will have an associated TOAST table, whose Object Identifier (OID) is stored in the table’s pg_class.reltoastrelid entry. Further, we use ::regclass as a type cast to convert the OID to a human-readable name. This mechanism is particularly useful when dealing with system catalogs like pg_class, where relations (table, index, or sequence) are typically referenced by their OIDs.

Try:

`SELECT reltoastrelid::regclass AS toast_table FROM pg_class WHERE relname = ‘transcript’;`

Contrast with:

`SELECT reltoastrelid AS toast_table FROM pg_class WHERE relname = ‘transcript’;`

Step 2: Reindex the TOAST table.

	REINDEX TABLE pg_toast.pg_toast_36514;

Step 3: Reindex the main table.

	REINDEX TABLE public.transcript;

Putting these steps together, we have the following code:

	-- Step 1: Identify the TOAST table and main table
SELECT
    reltoastrelid::regclass AS toast_table,
    relname AS main_table
FROM
    pg_class
WHERE
    relname = 'transcript';

-- Output should give the TOAST table name, e.g., pg_toast.pg_toast_36514

-- Step 2: Reindex the TOAST table
REINDEX TABLE pg_toast.pg_toast_36514;

-- Step 3: Reindex the main table
REINDEX TABLE public.transcript

This fixed the issue so I could restart the ETL job and resume writes.

Toasty Takeaways Backups & Preventive Measures

Restore From Backup

Reindexing may not always resolve the issue. In that case, our only recourse is to restore the database from a backup. Regular backups are crucial for recovering from corruption.

Preventive Measures

In addition to regular backups, the following preventive measures can minimize the likelihood of corruption:

Proper Shutdown: Always shut down PostgreSQL instances properly using commands like pg_ctl stop or docker stop for Docker containers.
Regular Maintenance: Regularly vacuum and analyze the database to keep it in good health.
Monitoring: Use monitoring tools to keep an eye on the database’s health and performance.

Summary

TOAST handles large values that don’t fit in a single page by storing them in a separate table. Here, pages are fixed-size blocks of data storage, typically 8KB in PostgreSQL. We encountered corruption which may have arisen out of improper shutdown. Among other causes of corruption are hardware issues and software bugs. We used reindexing to resolve the issue. When reindexing fails, our only recourse is restoring from backup, so it is a good practice to take regular backups.

Improper shutdown of the Docker container could indeed be the cause of the corruption. It is crucial to always shut down PostgreSQL and Docker containers properly to avoid such issues.

Tee Totaller Logging messages to a file while viewing them in console

The ETL job was a Scrapy spider that crawled websites for news about financial markets, extracting transcripts, applying transformations to structure the text for storage, and loading the information into the Postgres data warehouse. It’s important to use a standard logging module, like Python’s logging, for job monitoring and analysis. Scrapy documentation provides information about logging from spiders.

By default, the logger output is sent to standard error. To save it to disk, I redirected the stream to a text file, but then the messages no longer appeared in my console. How can we save the log messages to a file while also viewing them in the console? This is where the tee command comes in handy.

The tee command allows us to see real-time logs in the console and simultaneously save them to a file. Here is a simulation to illustrate its use:

	for i in {1..10}; do echo "Line $i"; sleep 1; done | tee long_output.txt

This command will print “Line 1” to “Line 10” with a 1-second delay between each, displaying the output in the console and saving it to long_output.txt. While the above command is running, in another terminal, we can monitor the output file:

	tail -f long_output.txt

This will show the contents of long_output.txt in real-time as new lines are added. Here’s how to use tee to allow us to see real-time logs in the console and simultaneously save them to a file.

Basic Usage

	scrapy crawl spiderfool -a start_page=210 -a number_pages=20 | tee out.txt

This command runs the Scrapy spider with some custom arguments and uses tee to display log messages in the console while saving them to out.txt.

Appending to a Log File

To append the output to an existing log file instead of overwriting it, use the -a flag:

	scrapy crawl spiderfool -a start_page=210 -a number_pages=20 | tee -a out.txt

Monitoring Log File in Real-Time

While the spider is running, we can monitor the log file in real-time using the tail command:

	tail -f out.txt

This command will display new log messages added to out.txt as they appear.

By using tee, we can effectively monitor our ETL job, debug issues in real-time, and keep a log of our output for further analysis.

References Reading List

Chapter on Database Physical Storage with TOAST from PostgresSQL.org

PostgreSQL