“Core is dumping like crazy!”
That’s what our DevOps lead exclaimed one day, clearly frustrated. After some investigation, the issue was pinned down to a high server load. A simple machine restart seemed to temporarily fix things.
But deep down, I had a sinking feeling. Something told me the problem wasn’t truly resolved — and worse, my refactoring might be to blame.
The Feature
Let me explain a bit about the feature so you can understand why my “improvement” went so wrong.
The core model, let’s call it AnalysisRequest, handles user submissions. Each request can contain anywhere from a handful to hundreds of files awaiting analysis. Analyzing each file involves a lengthy process:
Annotating and flagging decisions
Creating numerous sub-arrays of instances
Gathering notifications
Performing bulk actions
Rendering final results
Without optimization, a single AnalysisRequest with 300 files could take over 30 minutes to process. This feature is the backbone of our system, and any inefficiency or error here impacts everything.
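For context, here is a minimal sketch of the unoptimized, sequential flow; the helper functions are illustrative stand-ins, not our real API:

# Illustrative stand-ins for the real pipeline steps
def annotate_and_flag(cur_file): ...
def build_instance_sub_arrays(cur_file): return []
def gather_notifications(cur_file): ...
def perform_bulk_actions(instances): ...
def render_final_result(cur_file): return cur_file

def process_analysis_request(files):
    # Baseline: every file walks the full pipeline, one at a time
    results = []
    for cur_file in files:
        annotate_and_flag(cur_file)
        instances = build_instance_sub_arrays(cur_file)
        gather_notifications(cur_file)
        perform_bulk_actions(instances)
        results.append(render_final_result(cur_file))
    return results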
Naturally, the first and most logical idea to improve performance was to parallelize the processes.
My “Improved” Parallelizer
At first, my changes seemed like a major win. Here’s a simplified version of my parallelized method:
# filename: parallelizer
import multiprocessing
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def thread_parallelize(inputs, func, additional_variables=None, close_connections=False):
    # ... (input validation, connection handling, and the error handling
    # that fills `skipped` are elided)
    skipped = []
    partial_func = partial(func, **(additional_variables or {}))
    # Derive max_workers from the CPU count
    with ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
        results = list(executor.map(partial_func, inputs))
    return results, skipped
def _generate_aspect_list(self, generator, analysis):
    dicts_list = list(generator)
    # Retrieve the models the aspects will be matched against
    target_names = [cur_dict.get('name', '') for cur_dict in dicts_list]
    targets = models.Target.objects.filter(Q(name__in=target_names) | Q(other_id__in=target_names))
    res, skipped_aspects = parallelizer.thread_parallelize(
        dicts_list,
        self._sub_aspect_method,
        {'analysis_id': analysis.id, 'targets': targets},
        close_connections=True,
    )
    import_ids_list = [item[1][0] for item in res if len(item[1])]
    if skipped_aspects:
        logger.info(f'Analysis {analysis.id} skipped analyzing aspects: {skipped_aspects}')
    return import_ids_list
By using ThreadPoolExecutor, I reduced the analysis time from 32 minutes to just over 4 minutes. It felt like a huge win… until it wasn’t.
The Problem with ThreadPoolExecutor
The primary issue was Python’s Global Interpreter Lock (GIL), which allows only one thread to execute Python bytecode at a time. Thread pooling works well for I/O-bound tasks, where threads spend most of their time waiting, but it is ineffective for CPU-intensive workloads (see the sketch after this list). Unfortunately, the core feature wasn’t lightweight:
Unpredictable Requests: Each request had a variable number of files, ranging from a few to thousands.
Complex File Handling: Different file categories required distinct processing.
Heavy Computations: Each file analysis involved extensive database access and CPU-heavy work.
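To see the effect in isolation, here is a toy benchmark (illustrative only, not our production code). A pure-Python CPU-bound function gains nothing from a thread pool because the GIL serializes bytecode execution:

import time
from concurrent.futures import ThreadPoolExecutor

def cpu_bound(n):
    # Pure-Python arithmetic holds the GIL for its entire runtime
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == '__main__':
    start = time.perf_counter()
    for _ in range(4):
        cpu_bound(5_000_000)
    print(f'sequential: {time.perf_counter() - start:.2f}s')

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as executor:
        list(executor.map(cpu_bound, [5_000_000] * 4))
    # Wall time comes out roughly the same: only one thread runs bytecode at once
    print(f'threaded:   {time.perf_counter() - start:.2f}s')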
Moreover, when the volume of instance creation surged, even the I/O-bound portions became resource-heavy. Errors like “I/O error: file cannot open” and occasional 500 errors began to appear. Clearly, my solution was flawed.
The Initial Solution
My first attempt at a fix was a queuing strategy. The idea was simple: the backend would pool unanalyzed requests and process them sequentially. This involved adding a pooling_id to the request model.
While this resolved thread conflicts and file I/O issues, it came with a glaring drawback: it was painfully slow. Imagine submitting a large request today and waiting several days for results. It wasn’t scalable.
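Roughly, the idea looked like this (the analyzed flag and run_full_analysis are hypothetical names for illustration; our actual implementation differed):

from django.db import models

class AnalysisRequest(models.Model):
    # Queue position assigned at submission; requests are drained in order
    pooling_id = models.PositiveIntegerField(null=True, db_index=True)
    analyzed = models.BooleanField(default=False)

def process_next_request():
    # One request at a time, so analyses never compete for CPU or file handles
    request = (AnalysisRequest.objects
               .filter(analyzed=False)
               .order_by('pooling_id')
               .first())
    if request is not None:
        run_full_analysis(request)  # hypothetical sequential pipeline
        request.analyzed = True
        request.save(update_fields=['analyzed'])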
The Real Fix: Conditional Parallelization
I knew I had to parallelize the processes — but in a smarter way.
Our parallelization options were limited by dependency conflicts and the difficulty of setting up additional infrastructure, so Python’s built-in pool executors were still the best bet.
ProcessPoolExecutor to the Rescue
For CPU-intensive tasks, I switched to ProcessPoolExecutor. Here’s a revised version of the method:
# filename: parallelizer
import multiprocessing
from concurrent.futures import ProcessPoolExecutor
from functools import partial

def process_parallelize(inputs, func, additional_variables=None, close_connections=False):
    # ... (connection handling and the error handling that fills `skipped` are elided)
    skipped = []
    partial_func = partial(func, **(additional_variables or {}))
    with ProcessPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
        # Submit all tasks and collect results
        results = list(executor.map(partial_func, inputs))
    return results, skipped
# Class methods are aggregated in the AnalysisUtils class as static methods,
# so they can be pickled and sent to worker processes.
import AnalysisUtils

def _generate_aspect_list(self, generator, analysis):
    dicts_list = list(generator)
    res, skipped_aspects = parallelizer.process_parallelize(
        dicts_list,
        AnalysisUtils.sub_aspect_method,
        {'analysis_id': analysis.id},
        close_connections=True,
    )
    # ...
The Challenges
I/O Errors: Initially, I encountered _io.BufferedReader errors, which I suspected were caused by unclosed files. After verifying that all files were closed, the error persisted.
Eventually, I discovered the issue stemmed from using a class method instead of a @staticmethod. Once I fixed that, the error disappeared.
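The likely mechanism, as I reconstruct it: ProcessPoolExecutor pickles the callable to ship it to worker processes, and pickling a method bound to an instance drags the whole instance, open file handles included, along with it. A static method carries no instance state (simplified sketch):

from concurrent.futures import ProcessPoolExecutor

class AnalysisUtils:
    @staticmethod
    def sub_aspect_method(cur_dict, analysis_id=None):
        # No `self`: nothing beyond the explicit arguments gets pickled
        return cur_dict.get('name', ''), analysis_id

if __name__ == '__main__':
    dicts = [{'name': 'a'}, {'name': 'b'}]
    with ProcessPoolExecutor(max_workers=2) as executor:
        print(list(executor.map(AnalysisUtils.sub_aspect_method, dicts)))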
Database Connection Issues: Django’s ORM doesn’t gracefully handle multiprocessing. Closing connections at the start of each process led to “EOF” errors, while leaving them open caused data conflicts.
The solution? Cleanly separate CPU and I/O tasks.
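For reference, the usual way to handle fork-inherited connections (and roughly what our close_connections flag wraps, assuming Django’s standard django.db.connections API) is to close them in each worker so the process reconnects on its first query; on its own, this was not enough for us:

from concurrent.futures import ProcessPoolExecutor
import django.db

def _close_stale_connections():
    # Connections inherited from the parent via fork are unsafe to share;
    # closing them forces each worker to open a fresh connection
    django.db.connections.close_all()

with ProcessPoolExecutor(max_workers=4, initializer=_close_stale_connections) as executor:
    ...  # dispatch work as usual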
Final Refactor
After refactoring thousands of lines of code to isolate CPU-bound and I/O-bound tasks, the system finally worked seamlessly. A test run confirmed the functionality. Here’s the final pseudo-code:
import AnalysisUtils
import parallelizer

def _generate_aspect_list(self, generator, analysis):
    dicts_list = list(generator)
    # Phase 1 (CPU-bound): run in separate processes, with no ORM access
    processed_data, skipped_aspects = parallelizer.process_parallelize(
        dicts_list,
        AnalysisUtils.sub_aspect_method,
        {'analysis_id': analysis.id},
        close_connections=True,
    )
    # Phase 2 (I/O-bound): retrieve models and persist results using threads
    target_names = [cur_dict.get('name', '') for cur_dict in dicts_list]
    targets = models.Target.objects.filter(Q(name__in=target_names) | Q(other_id__in=target_names))
    res, _ = parallelizer.thread_parallelize(
        processed_data,
        AnalysisUtils.assign_aspect_data,
        {'analysis': analysis, 'targets': targets},
        close_connections=False,
    )
    # Each result carries a list of import ids at index 1; keep the first when present
    import_ids_list = [item[1][0] for item in res if len(item[1])]
    if skipped_aspects:
        logger.info(f'Analysis {analysis.id} skipped analyzing aspects: {skipped_aspects}')
    return import_ids_list
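The split is what makes this work: the process pool runs the CPU-heavy transformation without ever touching the ORM, so nothing unpicklable crosses a process boundary and no database connection is shared, while the thread pool handles the reads and writes, where the GIL is released during I/O and threads genuinely overlap.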
What I Learned From This Drama
Success in Development Doesn’t Guarantee Production Success
What works in development might fall apart in production. Always question scalability and robustness; success today doesn’t guarantee success tomorrow.
Dig Deep Into Root Causes
Surface-level fixes can hide bigger issues. If I had investigated those 500 errors earlier, I could have avoided the “core dumping” fiasco altogether.
There’s No One-Size-Fits-All Solution
Every problem requires its own approach. Separating CPU-bound and I/O-bound tasks is tedious but necessary; shortcuts don’t cut it.
Fallbacks Are Okay (Even When They Suck)
Sometimes, a slow and steady solution is better than an unstable one. In the real world, reliability often trumps speed.
Error Messages Can Be Misleading
Error messages don’t always tell the full story. Digging deeper into the _io.BufferedReader issue taught me to question and investigate beyond the obvious.
What I Would Do If Infrastructure Weren’t a Limit
If infrastructure weren’t a constraint, my first choice would be Celery, which excels at handling distributed tasks and heavy computations.
A runner-up option would be DjangoQ for its seamless Django integration.
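For illustration, here is a minimal sketch of the per-file step as a Celery task (assuming a configured broker; the task and function names here are hypothetical):

from celery import Celery, group

import AnalysisUtils

app = Celery('analysis', broker='redis://localhost:6379/0')

@app.task
def analyze_file(cur_dict, analysis_id):
    # Each file becomes one task, picked up by any available worker
    return AnalysisUtils.sub_aspect_method(cur_dict, analysis_id=analysis_id)

def generate_aspect_list(dicts_list, analysis_id):
    # Fan out one task per file and block until all results arrive
    job = group(analyze_file.s(cur_dict, analysis_id) for cur_dict in dicts_list)
    return job.apply_async().get()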
Final Thoughts
This was a close call, and I’m grateful we resolved the issue before it reached our partners. Through this experience, I gained a much better understanding of Python’s GIL, threading limitations, and parallelization strategies. Now, I feel more confident tackling similar challenges in the future.
What about you? Have you faced any debugging nightmares or tricky production issues? Share your stories below!