<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[I am Zahiruddin Tavargere (Zahere). A social-learner, here to learn, share and grow with the tech community.]]></title><description><![CDATA[I am Zahiruddin Tavargere (Zahere). A firm believer in social learning, I owe my dev career to all the tech content creators I have learned from - this is my contribution back to the community.]]></description><link>https://zahere.com</link><generator>RSS for Node</generator><lastBuildDate>Sun, 12 Apr 2026 16:40:00 GMT</lastBuildDate><atom:link href="https://zahere.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[📡 FastAPI MCP SSE Server with JWT Auth & Custom Client]]></title><description><![CDATA[📖 Introduction
In modern AI applications, communication between clients and tools isn’t always as simple as calling an API. The Model Context Protocol (MCP) provides a standardized way for clients to exchange information, invoke tools, and maintain ...]]></description><link>https://zahere.com/fastapi-mcp-sse-server-with-jwt-auth-and-custom-client</link><guid isPermaLink="true">https://zahere.com/fastapi-mcp-sse-server-with-jwt-auth-and-custom-client</guid><category><![CDATA[mcp-auth]]></category><category><![CDATA[mcp]]></category><category><![CDATA[mcp server]]></category><category><![CDATA[MCP Client]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Sun, 18 May 2025 19:36:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1747596901479/2551ab79-b5e4-420a-b86e-0b7ed39f8cdb.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-introduction">📖 Introduction</h2>
<p>In modern AI applications, communication between clients and tools isn’t always as simple as calling an API. The <strong>Model Context Protocol (MCP)</strong> provides a standardized way for clients to exchange information, invoke tools, and maintain shared context over persistent connections — typically via <strong>Server-Sent Events (SSE)</strong>.</p>
<p>In this post, I’ll walk you through:</p>
<ul>
<li><p>Building an MCP SSE server using <strong>FastAPI</strong></p>
</li>
<li><p>Securing it with <strong>JWT authentication</strong></p>
</li>
<li><p>Implementing a <strong>custom Python client</strong> to connect, authenticate, and use MCP tools<br />  We’ll use simple <code>weather</code> and <code>time</code> tools to demo tool calling through MCP.</p>
</li>
</ul>
<h2 id="heading-project-overview">🎛️ Project Overview</h2>
<p>We’ll build:</p>
<ul>
<li><p>A <strong>FastAPI server</strong> that exposes an MCP-compliant SSE endpoint</p>
</li>
<li><p>A <strong>token-based auth system</strong></p>
</li>
<li><p>Simple tools to get weather and time for a given location</p>
</li>
<li><p>A <strong>Python client</strong> that authenticates, connects via SSE, and invokes the tool dynamically</p>
</li>
</ul>
<hr />
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://youtu.be/D-m5J4rTGN8">https://youtu.be/D-m5J4rTGN8</a></div>
<p> </p>
<hr />
<h2 id="heading-tech-stack">📦 Tech Stack</h2>
<ul>
<li><p>Python 3.11+</p>
</li>
<li><p>Python MCP SDK</p>
</li>
<li><p>FastAPI</p>
</li>
<li><p>aiohttp</p>
</li>
<li><p>PyJWT</p>
</li>
<li><p>Pydantic</p>
</li>
<li><p>loguru (for clean logs)</p>
</li>
</ul>
<h2 id="heading-setting-up-the-mcp-sse-server-serverpy">Setting up the MCP SSE Server (server.py)</h2>
<p>Let’s first import the relevant libraries and load credentials and API keys from environment variables.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> datetime
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> zoneinfo <span class="hljs-keyword">import</span> ZoneInfo
<span class="hljs-keyword">from</span> fastapi <span class="hljs-keyword">import</span>  FastAPI, HTTPException, Request
<span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> starlette.applications <span class="hljs-keyword">import</span> Starlette
<span class="hljs-keyword">from</span> starlette.routing <span class="hljs-keyword">import</span> Route, Mount
<span class="hljs-keyword">import</span> jwt
<span class="hljs-keyword">from</span> mcp.server.fastmcp <span class="hljs-keyword">import</span> FastMCP
<span class="hljs-keyword">from</span> mcp.server.sse <span class="hljs-keyword">import</span> SseServerTransport
<span class="hljs-keyword">from</span> loguru <span class="hljs-keyword">import</span> logger

<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

load_dotenv()
</code></pre>
<p>Let’s initialize the MCP server and set up the tools.</p>
<pre><code class="lang-python"><span class="hljs-comment"># Initialize the MCP server with your tools</span>
mcp = FastMCP(
    name=<span class="hljs-string">"Weather and Time SSE Server"</span>
)


<span class="hljs-meta">@mcp.tool()</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">TimeTool</span>(<span class="hljs-params">input_timezone</span>):</span>
    <span class="hljs-string">"Provides the current time for a given city's timezone like Asia/Kolkata, America/New_York etc. If no timezone is provided, it returns the local time."</span>
    format = <span class="hljs-string">"%Y-%m-%d %H:%M:%S %Z%z"</span>
    current_time = datetime.datetime.now()    
    <span class="hljs-keyword">if</span> input_timezone:
        print(<span class="hljs-string">"TimeZone"</span>, input_timezone)
        current_time =  current_time.astimezone(ZoneInfo(input_timezone))
    <span class="hljs-keyword">return</span> <span class="hljs-string">f"The current time is <span class="hljs-subst">{current_time}</span>."</span>

transport = SseServerTransport(<span class="hljs-string">"/messages/"</span>)


<span class="hljs-meta">@mcp.tool()</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">weather_tool</span>(<span class="hljs-params">location: str</span>):</span>
    <span class="hljs-string">"""Provides weather information for a given location"""</span>        
    api_key = os.getenv(<span class="hljs-string">"OPENWEATHERMAP_API_KEY"</span>)
    url = <span class="hljs-string">f"http://api.openweathermap.org/data/2.5/weather?q=<span class="hljs-subst">{location}</span>&amp;appid=<span class="hljs-subst">{api_key}</span>&amp;units=metric"</span>
    response = requests.get(url)
    data = response.json()
    <span class="hljs-keyword">if</span> data[<span class="hljs-string">"cod"</span>] == <span class="hljs-number">200</span>:
        temp = data[<span class="hljs-string">"main"</span>][<span class="hljs-string">"temp"</span>]
        description = data[<span class="hljs-string">"weather"</span>][<span class="hljs-number">0</span>][<span class="hljs-string">"description"</span>]
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"The weather in <span class="hljs-subst">{location}</span> is currently <span class="hljs-subst">{description}</span> with a temperature of <span class="hljs-subst">{temp}</span>°C."</span>
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">return</span> <span class="hljs-string">f"Sorry, I couldn't find weather information for <span class="hljs-subst">{location}</span>."</span>
</code></pre>
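<p>The interesting part of <code>weather_tool</code> is how it formats the OpenWeatherMap response. As a sketch, the same formatting logic can be exercised against a hand-written dict shaped like an API reply (no network call; the values are made up for illustration):</p>

```python
# Sketch: the response-formatting logic from weather_tool, run against a
# hypothetical dict shaped like an OpenWeatherMap reply (no API call made).

def format_weather(location: str, data: dict) -> str:
    # OpenWeatherMap returns "cod": 200 on success
    if data["cod"] == 200:
        temp = data["main"]["temp"]
        description = data["weather"][0]["description"]
        return f"The weather in {location} is currently {description} with a temperature of {temp}°C."
    return f"Sorry, I couldn't find weather information for {location}."

sample = {
    "cod": 200,
    "main": {"temp": 30.37},
    "weather": [{"description": "clear sky"}],
}
print(format_weather("Dubai", sample))
```

<p>Separating the formatting from the HTTP call like this also makes the tool easy to unit-test without hitting the API.</p>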
<p>We will now set up the JWT auth system and use Starlette routes to expose the SSE and message endpoints to clients.</p>
<pre><code class="lang-python">
<span class="hljs-comment"># Demo value; in production, load the secret from an environment variable</span>
SECRET_KEY = <span class="hljs-string">"my_super_secret_key"</span>
ALGORITHM = <span class="hljs-string">"HS256"</span>     
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">check_auth</span>(<span class="hljs-params">request: Request</span>):</span>
    auth = request.headers.get(<span class="hljs-string">"authorization"</span>, <span class="hljs-string">""</span>)        
    <span class="hljs-keyword">if</span> auth.startswith(<span class="hljs-string">"Bearer "</span>):
        token = auth.split(<span class="hljs-string">" "</span>, <span class="hljs-number">1</span>)[<span class="hljs-number">1</span>]
        <span class="hljs-keyword">try</span>:
            payload = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
            <span class="hljs-keyword">return</span> <span class="hljs-literal">True</span>
        <span class="hljs-keyword">except</span> jwt.ExpiredSignatureError:
            <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">401</span>, detail=<span class="hljs-string">"Token expired"</span>)
        <span class="hljs-keyword">except</span> jwt.InvalidTokenError:
            <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">401</span>, detail=<span class="hljs-string">"Invalid token"</span>)

    <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">401</span>, detail=<span class="hljs-string">"Unauthorized"</span>)

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handle_sse</span>(<span class="hljs-params">request</span>):</span>
    check_auth(request=request)
    <span class="hljs-comment"># Prepare bidirectional streams over SSE</span>
    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> transport.connect_sse(
        request.scope,
        request.receive,
        request._send
    ) <span class="hljs-keyword">as</span> (in_stream, out_stream):
        <span class="hljs-comment"># Run the MCP server: read JSON-RPC from in_stream, write replies to out_stream</span>
        <span class="hljs-keyword">await</span> mcp._mcp_server.run(
            in_stream,
            out_stream,
            mcp._mcp_server.create_initialization_options()
        )


<span class="hljs-comment">#Build a small Starlette app for the two MCP endpoints</span>
sse_app = Starlette(
    routes=[
        Route(<span class="hljs-string">"/sse"</span>, handle_sse, methods=[<span class="hljs-string">"GET"</span>]),
        <span class="hljs-comment"># Note the trailing slash to avoid 307 redirects</span>
        Mount(<span class="hljs-string">"/messages/"</span>, app=transport.handle_post_message)
    ]
)
</code></pre>
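<p>Before wiring it into the server, the JWT round-trip that <code>check_auth</code> performs can be exercised standalone with PyJWT. This is a sketch with demo key and claims; the expired-token branch shows the error the server maps to a 401:</p>

```python
# Standalone sketch of the JWT round-trip check_auth performs (pip install PyJWT).
# Key and claims are demo values.
import datetime
import jwt

SECRET_KEY = "my_super_secret_key"
ALGORITHM = "HS256"

# Issue a token the way the /token endpoint does
payload = {
    "sub": "test_client",
    "exp": datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=60),
}
token = jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)

# check_auth decodes it; a valid signature and an unexpired "exp" succeed
decoded = jwt.decode(token, SECRET_KEY, algorithms=[ALGORITHM])
print(decoded["sub"])  # test_client

# An already-expired token raises ExpiredSignatureError, mapped to HTTP 401
expired = jwt.encode(
    {
        "sub": "test_client",
        "exp": datetime.datetime.now(datetime.timezone.utc) - datetime.timedelta(seconds=1),
    },
    SECRET_KEY,
    algorithm=ALGORITHM,
)
try:
    jwt.decode(expired, SECRET_KEY, algorithms=[ALGORITHM])
except jwt.ExpiredSignatureError:
    print("expired token rejected")
```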
<p>Now let’s set up a FastAPI app and add a route for clients to create a token. We will create a mock client store to simulate a credential manager.</p>
<pre><code class="lang-python">app = FastAPI()

<span class="hljs-comment"># Mock client store</span>
CLIENTS = {
    <span class="hljs-string">"test_client"</span>: <span class="hljs-string">"secret_1234"</span>
}


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">TokenRequest</span>(<span class="hljs-params">BaseModel</span>):</span>
    client_id: str
    client_secret: str


<span class="hljs-meta">@app.post("/token")</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_token</span>(<span class="hljs-params">request: TokenRequest</span>):</span>
    <span class="hljs-keyword">if</span> request.client_id <span class="hljs-keyword">in</span> CLIENTS <span class="hljs-keyword">and</span> CLIENTS[request.client_id] == request.client_secret:
        payload = {
            <span class="hljs-string">"sub"</span>: request.client_id,
            <span class="hljs-string">"exp"</span>: datetime.datetime.now(datetime.timezone.utc) + datetime.timedelta(minutes=<span class="hljs-number">60</span>)
        }
        token = jwt.encode(payload, SECRET_KEY, algorithm=ALGORITHM)
        <span class="hljs-keyword">return</span> {<span class="hljs-string">"access_token"</span>: token}
    <span class="hljs-keyword">else</span>:
        <span class="hljs-keyword">raise</span> HTTPException(status_code=<span class="hljs-number">401</span>, detail=<span class="hljs-string">"Invalid credentials"</span>)
</code></pre>
<p>Mount the Starlette app onto FastAPI and use uvicorn to run the app.</p>
<pre><code class="lang-python">app.mount(<span class="hljs-string">"/"</span>, sse_app)

<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-keyword">import</span> uvicorn
    uvicorn.run(app, host=<span class="hljs-string">"0.0.0.0"</span>, port=<span class="hljs-number">8100</span>)
</code></pre>
<h2 id="heading-why-jwt-based-auth">🔒 Why JWT-Based Auth?</h2>
<p>Exchanging a <strong>client_id</strong> / <strong>client_secret</strong> pair for a short-lived JWT scales well:</p>
<ul>
<li><p>You can rotate credentials</p>
</li>
<li><p>Enforce expiration</p>
</li>
<li><p>Integrate external OAuth2/OIDC providers</p>
</li>
<li><p>Track session-level auth</p>
</li>
</ul>
<h2 id="heading-setting-up-the-mcp-sse-client-clientpy">Setting up the MCP SSE Client (client.py)</h2>
<p>Setting up the client is similar to what we have seen previously in this series. The only change is that we generate a token first and pass it in the headers of the SSE client.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Optional
<span class="hljs-keyword">from</span> mcp <span class="hljs-keyword">import</span> ClientSession
<span class="hljs-keyword">from</span> mcp.client.sse <span class="hljs-keyword">import</span> sse_client
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">import</span> mcp.client.sse <span class="hljs-keyword">as</span> _sse_mod
<span class="hljs-keyword">from</span> httpx <span class="hljs-keyword">import</span> AsyncClient <span class="hljs-keyword">as</span> _BaseAsyncClient
<span class="hljs-keyword">from</span> loguru <span class="hljs-keyword">import</span> logger
<span class="hljs-keyword">import</span> aiohttp

<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv

load_dotenv()  <span class="hljs-comment"># load environment variables from .env</span>

<span class="hljs-keyword">import</span> httpx
_orig_request = httpx.AsyncClient.request

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">_patched_request</span>(<span class="hljs-params">self, method, url, *args, **kwargs</span>):</span>
    <span class="hljs-comment"># ensure follow_redirects is set so 307 → /messages/ works</span>
    kwargs.setdefault(<span class="hljs-string">"follow_redirects"</span>, <span class="hljs-literal">True</span>)
    <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> _orig_request(self, method, url, *args, **kwargs)

httpx.AsyncClient.request = _patched_request
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">llm_client</span>(<span class="hljs-params">message: str</span>):</span>
    client = OpenAI()

    completion = client.chat.completions.create(
        model=<span class="hljs-string">"gpt-4o-mini"</span>,
        messages=[
            {<span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"You are an intelligent Assistant. You will execute tasks as instructed"</span>},
            {
                <span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>,
                <span class="hljs-string">"content"</span>: message,
            },
        ],
    )

    result = completion.choices[<span class="hljs-number">0</span>].message.content
    <span class="hljs-keyword">return</span> result



<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_prompt_to_identify_tool_and_arguements</span>(<span class="hljs-params">query, tools</span>):</span>
    tools_description = <span class="hljs-string">"\n"</span>.join([<span class="hljs-string">f"<span class="hljs-subst">{tool.name}</span>: <span class="hljs-subst">{tool.description}</span>, <span class="hljs-subst">{tool.inputSchema}</span>"</span> <span class="hljs-keyword">for</span> tool <span class="hljs-keyword">in</span> tools.tools])
    <span class="hljs-keyword">return</span>  (<span class="hljs-string">"You are a helpful assistant with access to these tools:\n\n"</span>
                <span class="hljs-string">f"<span class="hljs-subst">{tools_description}</span>\n"</span>
                <span class="hljs-string">"Choose the appropriate tool based on the user's question. \n"</span>
                <span class="hljs-string">f"User's Question: <span class="hljs-subst">{query}</span>\n"</span>                
                <span class="hljs-string">"If no tool is needed, reply directly.\n\n"</span>
                <span class="hljs-string">"IMPORTANT: When you need to use a tool, you must ONLY respond with "</span>                
                <span class="hljs-string">"the exact JSON object format below, nothing else:\n"</span>
                <span class="hljs-string">"Keep the values in str "</span>
                <span class="hljs-string">"{\n"</span>
                <span class="hljs-string">'    "tool": "tool-name",\n'</span>
                <span class="hljs-string">'    "arguments": {\n'</span>
                <span class="hljs-string">'        "argument-name": "value"\n'</span>
                <span class="hljs-string">"    }\n"</span>
                <span class="hljs-string">"}\n\n"</span>)




TOKEN_URL = <span class="hljs-string">"http://localhost:8100/token"</span>
SSE_URL = <span class="hljs-string">"http://localhost:8100/sse"</span>

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_token</span>():</span>
    payload = {<span class="hljs-string">"client_id"</span>: <span class="hljs-string">"test_client"</span>, <span class="hljs-string">"client_secret"</span>: <span class="hljs-string">"secret_1234"</span>}
    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> aiohttp.ClientSession() <span class="hljs-keyword">as</span> session:
        <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> session.post(TOKEN_URL, json=payload) <span class="hljs-keyword">as</span> resp:
            <span class="hljs-keyword">if</span> resp.status != <span class="hljs-number">200</span>:
                logger.error(<span class="hljs-string">f"Failed to get token: <span class="hljs-subst">{resp.status}</span>"</span>)
                <span class="hljs-keyword">raise</span> Exception(<span class="hljs-string">"Unable to authenticate. Ensure you are using valid credentials"</span>)
            data = <span class="hljs-keyword">await</span> resp.json()
            logger.info(<span class="hljs-string">"Successfully generated token"</span>)
            <span class="hljs-keyword">return</span> data[<span class="hljs-string">"access_token"</span>]

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">main</span>(<span class="hljs-params">query:str</span>):</span>        
    <span class="hljs-keyword">try</span>:
        auth_token = <span class="hljs-keyword">await</span> get_token()    
        headers = {<span class="hljs-string">"Authorization"</span>: <span class="hljs-string">f"Bearer <span class="hljs-subst">{auth_token}</span>"</span>}
        <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> sse_client(url=SSE_URL,headers=headers) <span class="hljs-keyword">as</span> (in_stream, out_stream):
            <span class="hljs-comment"># 2) Create an MCP session over those streams</span>
            <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> ClientSession(in_stream, out_stream) <span class="hljs-keyword">as</span> session:
                <span class="hljs-comment"># 3) Initialize</span>
                info = <span class="hljs-keyword">await</span> session.initialize()
                logger.info(<span class="hljs-string">f"Connected to <span class="hljs-subst">{info.serverInfo.name}</span> v<span class="hljs-subst">{info.serverInfo.version}</span>"</span>)

                <span class="hljs-comment"># 4) List tools</span>
                tools = (<span class="hljs-keyword">await</span> session.list_tools())
                logger.info(tools)            

                prompt = get_prompt_to_identify_tool_and_arguements(query,tools)
                logger.info(<span class="hljs-string">f"Printing Prompt \n <span class="hljs-subst">{prompt}</span>"</span>)

                response = llm_client(prompt)
                print(response)

                tool_call = json.loads(response)

                result = <span class="hljs-keyword">await</span> session.call_tool(tool_call[<span class="hljs-string">"tool"</span>], arguments=tool_call[<span class="hljs-string">"arguments"</span>])
                logger.success(<span class="hljs-string">f"User query: <span class="hljs-subst">{query}</span>, Tool Response: <span class="hljs-subst">{result.content[<span class="hljs-number">0</span>].text}</span>"</span>)
    <span class="hljs-keyword">except</span> Exception <span class="hljs-keyword">as</span> e:
        print(<span class="hljs-string">f"Encountered error: <span class="hljs-subst">{e}</span>"</span>)



<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:

    queries = [<span class="hljs-string">"What is the time in Bengaluru?"</span>, <span class="hljs-string">"What is the weather like right now in Dubai?"</span>]
    <span class="hljs-keyword">for</span> query <span class="hljs-keyword">in</span> queries:
        asyncio.run(main(query))
</code></pre>
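<p>The prompt above forces the LLM to reply with a bare JSON object, which is why <code>json.loads(response)</code> works directly on the model output. As a sketch, with a hypothetical model reply for the weather tool:</p>

```python
# Sketch: parsing the JSON object the prompt instructs the LLM to emit.
# The reply text below is a hypothetical model response, not real API output.
import json

response = """{
    "tool": "weather_tool",
    "arguments": {
        "location": "Dubai"
    }
}"""

tool_call = json.loads(response)
print(tool_call["tool"])       # weather_tool
print(tool_call["arguments"])  # {'location': 'Dubai'}
```

<p>In practice you may want to wrap this parse in a try/except, since a model that ignores the format instruction will produce text that <code>json.loads</code> rejects.</p>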
<p>Here’s the output when the client is run:</p>
<pre><code>2025-05-18 18:00:48.230 | SUCCESS  | __main__:main:103 - User query: What is the weather like right now in Dubai?, Tool Response: The weather in Dubai is currently clear sky with a temperature of 30.37°C.
</code></pre>
<hr />
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/zahere-dev/mcp-labs">https://github.com/zahere-dev/mcp-labs</a></div>
<p> </p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=0KJ2oBRtUbs&amp;list=PLcV6wf9EnlujtRRE2cDeM8v69jAV2cRs9">https://www.youtube.com/watch?v=0KJ2oBRtUbs&amp;list=PLcV6wf9EnlujtRRE2cDeM8v69jAV2cRs9</a></div>
]]></content:encoded></item><item><title><![CDATA[Build an MCP Client and Server from Scratch Using Python]]></title><description><![CDATA[If you’re curious about how to build an intelligent agent using Model Context Protocol (MCP), you’re in the right place.
In this post, I’ll walk you through how to:

Create an MCP Server using FastMCP

Expose a tool that calculates BMI

Build a Clien...]]></description><link>https://zahere.com/build-an-mcp-client-and-server-from-scratch-using-python</link><guid isPermaLink="true">https://zahere.com/build-an-mcp-client-and-server-from-scratch-using-python</guid><category><![CDATA[mcp tutorial]]></category><category><![CDATA[mcp]]></category><category><![CDATA[MCP Client]]></category><category><![CDATA[mcp server]]></category><category><![CDATA[Model Context Protocol]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 07 Apr 2025 04:19:54 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1743999386558/e8b06b92-389c-4926-a3eb-325039afaac2.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you’re curious about how to build an intelligent agent using <strong>Model Context Protocol (MCP)</strong>, you’re in the right place.</p>
<p>In this post, I’ll walk you through how to:</p>
<ul>
<li><p>Create an <strong>MCP Server</strong> using FastMCP</p>
</li>
<li><p>Expose a tool that calculates <strong>BMI</strong></p>
</li>
<li><p>Build a Client that communicates with this server via <strong>stdio</strong></p>
</li>
<li><p>Use <strong>OpenAI’s GPT model</strong> to decide which tool to call and how to call it</p>
</li>
</ul>
<p>Let’s break this down line by line — code and concept.</p>
<hr />
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=hMHYhRcd_Uo">https://www.youtube.com/watch?v=hMHYhRcd_Uo</a></div>
<p> </p>
<hr />
<h2 id="heading-what-is-mcp">What is MCP?</h2>
<p>Before diving into code, let’s understand what MCP is.</p>
<p>MCP (Model Context Protocol) is an <strong>open protocol</strong> developed by <a target="_blank" href="https://www.anthropic.com/">Anthropic</a> to <strong>standardize how LLMs interact with tools</strong>. Think of it as the <strong>USB-C of AI apps</strong> — a universal way to connect and interact with tools, APIs, and services without writing tons of custom glue code.</p>
<p>Here’s a simple analogy:</p>
<ul>
<li><p><strong>USB-C Port</strong>: One port to rule them all — display, power, storage.</p>
</li>
<li><p><strong>MCP</strong>: One protocol to access tools — calculators, search, databases, or any custom service.</p>
</li>
</ul>
<p>By adopting MCP, we avoid the pain of writing one-off integration code for every LLM interaction. Instead, we define tools once and let the LLM figure out how to use them.</p>
<p><a target="_blank" href="https://norahsakal.com/blog/mcp-vs-api-model-context-protocol-explained/"><img src="https://norahsakal.com/assets/images/mcp_overview-641a298352ff835488af36be3d8eee52.png" alt="What is MCP?" class="image--center mx-auto" /></a></p>
<p><em>Courtesy:</em> <a target="_blank" href="https://norahsakal.com/blog/mcp-vs-api-model-context-protocol-explained/"><em>https://norahsakal.com</em></a></p>
<hr />
<h2 id="heading-project-overview">🛠️ Project Overview</h2>
<p>We’ll build two things:</p>
<ol>
<li><p><strong>An MCP Server</strong> that exposes a simple tool to calculate BMI.</p>
</li>
<li><p><strong>An MCP Client</strong> that communicates with the server via an LLM (like OpenAI GPT) and invokes the tool.</p>
</li>
</ol>
<p>Let’s get started.</p>
<h2 id="heading-mcp-server">MCP Server</h2>
<p>There are just 2 dependencies we need to install</p>
<pre><code class="lang-bash">pip install "mcp[cli]"
pip install openai
</code></pre>
<p>Let’s create a file called <code>bmi_server.py</code>.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> mcp.server.fastmcp <span class="hljs-keyword">import</span> FastMCP

mcp = FastMCP(<span class="hljs-string">"BMI Server"</span>)

print(<span class="hljs-string">f"Starting server <span class="hljs-subst">{mcp.name}</span>"</span>)

<span class="hljs-meta">@mcp.tool()</span>
<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">calculate_bmi</span>(<span class="hljs-params">weight_kg:float, height_m:float</span>) -&gt; float:</span>
    <span class="hljs-string">"""
    Calculate BMI given weight in kg and height in meters.
    """</span>
    <span class="hljs-keyword">if</span> height_m &lt;= <span class="hljs-number">0</span>:
        <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Height must be greater than zero."</span>)
    <span class="hljs-keyword">return</span> weight_kg / (height_m ** <span class="hljs-number">2</span>)


<span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    mcp.run(transport=<span class="hljs-string">"stdio"</span>)
</code></pre>
<p><strong>What’s happening here?</strong></p>
<ul>
<li><p>We use the FastMCP class to create the MCP server</p>
</li>
<li><p>We create a BMI tool using the <code>@mcp.tool()</code>decorator.</p>
</li>
<li><p>It takes weight and height and returns BMI.</p>
</li>
<li><p>The server exposes this tool using standard input/output transport.</p>
</li>
<li><p>We want this file to be run independently and not as a module</p>
</li>
</ul>
<p>Obviously, this is a very basic MCP server, but it should get us started on building our own MCP client.</p>
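<p>Stripped of the MCP wrapper, the tool is just the BMI formula: weight in kilograms divided by height in meters squared. A quick standalone check of that logic:</p>

```python
# The same BMI logic the @mcp.tool() wraps, callable directly for a sanity check.
def calculate_bmi(weight_kg: float, height_m: float) -> float:
    """Calculate BMI given weight in kg and height in meters."""
    if height_m <= 0:
        raise ValueError("Height must be greater than zero.")
    return weight_kg / (height_m ** 2)

print(round(calculate_bmi(70, 1.75), 2))  # 22.86
```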
<h2 id="heading-mcp-client">MCP Client</h2>
<p>Now, let’s create the client in a file named <code>bmi_client.py</code>.</p>
<p>Import all the dependencies. The key ones here being ClientSession, StdioServerParameters and stdio_client from the mcp package and OpenAI class to communicate with the LLM.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> asyncio
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">from</span> mcp <span class="hljs-keyword">import</span> ClientSession, StdioServerParameters
<span class="hljs-keyword">from</span> mcp.client.stdio <span class="hljs-keyword">import</span> stdio_client
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> json
</code></pre>
<p>We need to now establish a way to communicate with the server we just wrote. Let’s use the StdioServerParameters class from the mcp package to do that.</p>
<pre><code class="lang-python">server_params = StdioServerParameters(command=<span class="hljs-string">"python"</span>, args=[<span class="hljs-string">"bmi_server.py"</span>])
</code></pre>
<p>Essentially, we are telling the client what command to run and how to use the server file.</p>
<p>Let’s write a generic helper to communicate with the LLM API. It simply takes a prompt and returns a response.</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">llm_client</span>(<span class="hljs-params">message:str</span>):</span>
    <span class="hljs-string">"""
    Send a message to the LLM and return the response.
    """</span>
    <span class="hljs-comment"># Initialize the OpenAI client</span>
    openai_client = OpenAI(api_key=os.getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>))

    <span class="hljs-comment"># Send the message to the LLM</span>
    response = openai_client.chat.completions.create(
        model=<span class="hljs-string">"gpt-4o-mini"</span>,
        messages=[
            {<span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">"You are an intelligent assistant. You will execute tasks as prompted"</span>},
            {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: message},
        ],
        max_tokens=<span class="hljs-number">250</span>,
        temperature=<span class="hljs-number">0.2</span>
    )

    <span class="hljs-comment"># Extract and return the response content</span>
    <span class="hljs-keyword">return</span> response.choices[<span class="hljs-number">0</span>].message.content.strip()
</code></pre>
<p>We need to write a prompt that does two things:</p>
<p>1. Share the context with the LLM about the tools it has at its disposal<br />2. Request structured output so that we can execute the tool the LLM selects</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_prompt_to_identify_tool_and_arguments</span>(<span class="hljs-params">query,tools</span>):</span>
    tools_description = <span class="hljs-string">"\n"</span>.join([<span class="hljs-string">f"- <span class="hljs-subst">{tool.name}</span>, <span class="hljs-subst">{tool.description}</span>, <span class="hljs-subst">{tool.inputSchema}</span> "</span> <span class="hljs-keyword">for</span> tool <span class="hljs-keyword">in</span> tools])
    <span class="hljs-keyword">return</span>  (<span class="hljs-string">"You are a helpful assistant with access to these tools:\n\n"</span>
                <span class="hljs-string">f"<span class="hljs-subst">{tools_description}</span>\n"</span>
                <span class="hljs-string">"Choose the appropriate tool based on the user's question. \n"</span>
                <span class="hljs-string">f"User's Question: <span class="hljs-subst">{query}</span>\n"</span>                
                <span class="hljs-string">"If no tool is needed, reply directly.\n\n"</span>
                <span class="hljs-string">"IMPORTANT: When you need to use a tool, you must ONLY respond with "</span>                
                <span class="hljs-string">"the exact JSON object format below, nothing else:\n"</span>
                <span class="hljs-string">"Keep the values in str "</span>
                <span class="hljs-string">"{\n"</span>
                <span class="hljs-string">'    "tool": "tool-name",\n'</span>
                <span class="hljs-string">'    "arguments": {\n'</span>
                <span class="hljs-string">'        "argument-name": "value"\n'</span>
                <span class="hljs-string">"    }\n"</span>
                <span class="hljs-string">"}\n\n"</span>)
</code></pre>
<p>We are passing two arguments: the original query from the user and the list of tools from the server(s). Let’s now see how to get the list of tools from the server.</p>
<pre><code class="lang-python">
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>(<span class="hljs-params">query: str</span>):</span>
    <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> stdio_client(server_params) <span class="hljs-keyword">as</span> (read, write):
        <span class="hljs-keyword">async</span> <span class="hljs-keyword">with</span> ClientSession(read,write) <span class="hljs-keyword">as</span> session:

            <span class="hljs-keyword">await</span> session.initialize()

            <span class="hljs-comment"># Get the list of available tools</span>
            tools = <span class="hljs-keyword">await</span> session.list_tools()

            print(<span class="hljs-string">f"Available tools: <span class="hljs-subst">{tools}</span>"</span>)

            prompt = get_prompt_to_identify_tool_and_arguments(query,tools.tools)

            llm_response = llm_client(prompt)
            print(<span class="hljs-string">f"LLM Response: <span class="hljs-subst">{llm_response}</span>"</span>)

            tool_call = json.loads(llm_response)

            result = <span class="hljs-keyword">await</span> session.call_tool(tool_call[<span class="hljs-string">"tool"</span>], arguments=tool_call[<span class="hljs-string">"arguments"</span>])

            <span class="hljs-comment"># Use single quotes inside the f-string: reusing the same quote type is a syntax error before Python 3.12</span>
            <span class="hljs-keyword">return</span> (<span class="hljs-string">f"BMI for weight <span class="hljs-subst">{tool_call[<span class="hljs-string">'arguments'</span>][<span class="hljs-string">'weight_kg'</span>]}</span>kg "</span>
                    <span class="hljs-string">f"and height <span class="hljs-subst">{tool_call[<span class="hljs-string">'arguments'</span>][<span class="hljs-string">'height_m'</span>]}</span>m is <span class="hljs-subst">{result.content[<span class="hljs-number">0</span>].text}</span>"</span>)
</code></pre>
<p><strong>Key things happening here:</strong></p>
<ul>
<li><p>We start the BMI Server process using <code>stdio_client</code>, and get access to its read and write streams</p>
</li>
<li><p>Next, we create a session and initialize it. This sets up the communication between our client and the server.</p>
</li>
<li><p>We then ask the server for a list of all available tools. In our case, it will return the <code>calculate_bmi</code> function we exposed earlier.</p>
</li>
<li><p>We prepare a detailed prompt using the earlier function, passing in the user's query and available tools. This prompt helps the language model figure out what tool to call and what arguments to pass.</p>
</li>
<li><p>We parse the JSON response, extract the tool name and arguments, and then call the tool using <code>session.call_tool</code>.</p>
</li>
<li><p>Finally, we return the output from the tool call — in this case, the calculated BMI.</p>
</li>
</ul>
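<p>One fragility worth flagging in the flow above: <code>json.loads</code> will fail if the model wraps its JSON reply in Markdown code fences, which happens often even with strict prompt instructions. A small defensive parser helps (a hypothetical helper, not part of the repo below):</p>
<pre><code class="lang-python">import json
import re

def parse_tool_call(llm_response: str) -> dict:
    """Parse the model's tool-selection JSON, tolerating Markdown code fences."""
    text = llm_response.strip()
    # Strip a leading ```json / ``` fence and a trailing ``` fence, if present
    text = re.sub(r"^```(?:json)?\s*", "", text)
    text = re.sub(r"\s*```$", "", text)
    return json.loads(text)
</code></pre>
<p>Calling <code>parse_tool_call(llm_response)</code> in place of <code>json.loads(llm_response)</code> keeps the client working whether or not the model obeys the "nothing else" instruction.</p>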
<p>To make all of the above work, we add an entry point. When this script is run, it sets the user query (here, a BMI calculation) and runs the <code>run()</code> function with <code>asyncio</code>. The final result is printed on screen.</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-keyword">import</span> asyncio
    query = <span class="hljs-string">"Calculate BMI for height 5ft 10inches and weight 80kg"</span>
    print(<span class="hljs-string">f"Sending query: <span class="hljs-subst">{query}</span>"</span>)
    result = asyncio.run(run(query))
    print(<span class="hljs-string">f"Result: <span class="hljs-subst">{result}</span>"</span>)
</code></pre>
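<p>Note that the sample query gives height in feet and inches while <code>calculate_bmi</code> expects metres, so the LLM is also expected to convert units when it fills in the arguments. Assuming the standard BMI formula (weight in kilograms divided by height in metres squared), the arithmetic it should arrive at is:</p>
<pre><code class="lang-python"># What a correct tool call for the sample query works out to
height_m = (5 * 12 + 10) * 0.0254   # 5 ft 10 in = 70 in = 1.778 m
weight_kg = 80.0

bmi = weight_kg / height_m ** 2
print(round(bmi, 1))  # 25.3
</code></pre>
<p>If the model passed the height in feet instead, the result would be nonsense, which is why surfacing each tool's <code>inputSchema</code> in the prompt matters: it tells the model the tool wants <code>height_m</code>.</p>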
<hr />
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/zahere-dev/mcp-labs">https://github.com/zahere-dev/mcp-labs</a></div>
<p> </p>
<hr />
<p>And that’s how you build an intelligent MCP client that can call tools dynamically using OpenAI and a BMI server. The magic lies in combining tool discovery, LLM prompting, and tool invocation — all within a simple and elegant flow.</p>
]]></content:encoded></item><item><title><![CDATA[My Favorite OpenAI Agents SDK Feature (And The Most Understated!)]]></title><description><![CDATA[In our previous tutorial, we built a restaurant customer support chatbot using OpenAI's Agents SDK. In this follow-up, we’ll explore guardrails—a critical feature that enhances AI chatbot safety and reliability.
What Are Guardrails in AI Agents?
Guar...]]></description><link>https://zahere.com/my-favorite-openai-agents-sdk-feature-and-the-most-understated</link><guid isPermaLink="true">https://zahere.com/my-favorite-openai-agents-sdk-feature-and-the-most-understated</guid><category><![CDATA[agentic AI]]></category><category><![CDATA[agents]]></category><category><![CDATA[agentic workflow]]></category><category><![CDATA[openai]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 24 Mar 2025 02:04:41 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1742781696025/6f9fc9fc-94a4-462d-b54a-896fbadb34ba.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In our <a target="_blank" href="https://newsletter.adaptiveengineer.com/p/building-a-multi-agent-system-with">previous tutorial, we built a restaurant customer support chatbot</a> using OpenAI's Agents SDK. In this follow-up, we’ll explore <strong>guardrails</strong>—a critical feature that enhances AI chatbot safety and reliability.</p>
<h3 id="heading-what-are-guardrails-in-ai-agents">What Are Guardrails in AI Agents?</h3>
<p>Guardrails act as a <strong>safety net</strong> for AI agents, ensuring they operate within predefined boundaries and preventing misuse.</p>
<p>They work alongside agents, validating user inputs and outputs to safeguard against errors and inappropriate responses.</p>
<p>There are two types of guardrails:</p>
<ul>
<li><p><strong>Input Guardrails</strong>: Validate user inputs before processing.</p>
</li>
<li><p><strong>Output Guardrails</strong>: Ensure the final response is appropriate before delivering it to the user.</p>
</li>
</ul>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fe8fe4d90-98ff-4bab-845c-3606f61490e9_2665x1120.png" alt /></p>
<p>Let’s see them in action!</p>
<h2 id="heading-input-guardrails">Input Guardrails</h2>
<ul>
<li><p>These validate initial user inputs before passing them to expensive models.</p>
</li>
<li><p>They operate in three steps: receiving input, running validation functions, and triggering errors if misuse is detected.</p>
</li>
</ul>
<p>Input guardrails are mechanisms put in place to validate, sanitize, and preprocess user inputs before they reach an AI model. These safeguards help in preventing:</p>
<ul>
<li><p>Malicious injections (e.g., prompt injection attacks)</p>
</li>
<li><p>Profanity, hate speech, and harmful language</p>
</li>
<li><p>Unstructured or irrelevant input that reduces model efficiency</p>
</li>
<li><p>Bias amplification</p>
</li>
</ul>
<p>By implementing input guardrails, developers can ensure that AI models receive well-structured and appropriate input, leading to better and safer outputs.</p>
<h3 id="heading-why-are-input-guardrails-important"><strong>Why Are Input Guardrails Important?</strong></h3>
<ol>
<li><p><strong>Security</strong>: Prevents prompt injections, SQL injections, and adversarial attacks.</p>
</li>
<li><p><strong>Quality Assurance</strong>: Filters out irrelevant or poorly structured queries.</p>
</li>
<li><p><strong>Bias Mitigation</strong>: Helps remove explicit bias in prompts.</p>
</li>
<li><p><strong>User Experience</strong>: Ensures clear and understandable input for meaningful responses.</p>
</li>
<li><p><strong>Compliance</strong>: Adheres to ethical AI principles and regulatory requirements.</p>
</li>
</ol>
<h3 id="heading-how-to-implement-input-guardrails"><strong>How to Implement Input Guardrails</strong></h3>
<p>Create the guardrail agent as shown below, using the <code>@input_guardrail</code> decorator on the guardrail method.</p>
<p>More here in the <a target="_blank" href="https://openai.github.io/openai-agents-python/guardrails/">documentation</a>.</p>
<p>Input guardrails run in 3 steps:</p>
<ol>
<li><p>First, the guardrail receives the same input passed to the agent.</p>
</li>
<li><p>Next, the guardrail function runs to produce a <code>GuardrailFunctionOutput</code>, which is then wrapped in an <code>InputGuardrailResult</code></p>
</li>
<li><p>Finally, we check if <code>.tripwire_triggered</code> is true. If true, an <code>InputGuardrailTripwireTriggered</code> exception is raised, so you can appropriately respond to the user or handle the exception.</p>
</li>
</ol>
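<p>Stripped of the SDK, the tripwire pattern itself is tiny. The sketch below mirrors the SDK's names (<code>GuardrailFunctionOutput</code>, the tripwire exception) but is plain illustrative Python, not the Agents SDK API; the real check would typically be another, cheaper agent rather than a keyword list:</p>
<pre><code class="lang-python">from dataclasses import dataclass

@dataclass
class GuardrailFunctionOutput:
    """Mirrors the SDK's result object: some info plus a boolean tripwire."""
    output_info: str
    tripwire_triggered: bool

class InputGuardrailTripwireTriggered(Exception):
    pass

BLOCKED = ("pathetic", "useless")  # toy abuse list, purely for illustration

def abuse_guardrail(user_input: str) -> GuardrailFunctionOutput:
    hit = any(word in user_input.lower() for word in BLOCKED)
    return GuardrailFunctionOutput(output_info="abuse check", tripwire_triggered=hit)

def run_with_guardrail(user_input: str) -> str:
    result = abuse_guardrail(user_input)   # steps 1-2: same input, run the check
    if result.tripwire_triggered:          # step 3: raise on tripwire
        raise InputGuardrailTripwireTriggered(result.output_info)
    return f"Agent handles: {user_input}"
</code></pre>
<p>The SDK version does the same thing, with the wiring handled by the <code>@input_guardrail</code> decorator and by passing the guardrail to the agent, as shown in the screenshots below.</p>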
<hr />
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F43849546-663f-4fd3-9c96-a6b5ee456070_1687x1115.png" alt /></p>
<p>Pass the guardrail as an argument to the triage agent.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F065268de-2a29-4489-98cf-6622ce6f4821_1222x307.png" alt /></p>
<p>Handle the <code>InputGuardrailTripwireTriggered</code> exception.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fd304c0ec-edd5-4bf6-b425-c66c83cecf1f_1897x1122.png" alt /></p>
<p>Exception raised when the input is “Why is my order delayed? You guys are pathetic”.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F2f088748-229c-4d11-ab0c-14cc4fc6b076_3140x150.png" alt /></p>
<h2 id="heading-output-guardrails">Output Guardrails</h2>
<ul>
<li><p>These validate the final outputs generated by agents before they are delivered to users.</p>
</li>
<li><p>They operate similarly to input guardrails but focus on the output stage to ensure accuracy and safety.</p>
</li>
</ul>
<p>Output guardrails run in 3 steps:</p>
<ol>
<li><p>First, the guardrail receives the output produced by the agent.</p>
</li>
<li><p>Next, the guardrail function runs to produce a <code>GuardrailFunctionOutput</code>, which is then wrapped in an <code>OutputGuardrailResult</code></p>
</li>
<li><p>Finally, we check if <code>.tripwire_triggered</code> is true. If true, an <code>OutputGuardrailTripwireTriggered</code> exception is raised, so you can appropriately respond to the user or handle the exception.</p>
</li>
</ol>
<p>In the example below, we check for the word “card” in the response of the <code>order_agent</code> and raise an exception accordingly.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Fb97ddf2a-3011-4d0b-9ded-8c254ee07e0f_1575x855.png" alt /></p>
<p>Add the output guardrail as the argument to the final agent in the workflow.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4910aa32-8630-4118-a845-1e03d332b916_2905x325.png" alt /></p>
<p>We simulated the response of the <code>order_agent</code> for order 12346 so that it contains the word “card”; this is how the exception is caught.</p>
<p><img src="https://substackcdn.com/image/fetch/w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2Ff197c0bb-2593-4893-b0bc-02c4dfc5fb75_1780x80.png" alt /></p>
<h2 id="heading-code">Code</h2>
<p><a target="_blank" href="https://github.com/zahere-dev/openai-agents-sdk-tutorial">https://github.com/zahere-dev/openai-agents-sdk-tutorial</a></p>
<h2 id="heading-conclusion"><strong>Conclusion:</strong></h2>
<ul>
<li><p>Guardrails are vital components of AI agent systems, ensuring they operate safely and efficiently.</p>
</li>
<li><p>By implementing guardrails, developers can enhance user trust and prevent misuse scenarios effectively.</p>
</li>
</ul>
]]></content:encoded></item><item><title><![CDATA[How Uber Saved 140,000 Hours Monthly Using Generative AI Agents]]></title><description><![CDATA[Video
https://www.youtube.com/watch?v=UPBMkFSJdBI
 

The Problem at Hand

Uber's data platform processes approximately 1.2 million interactive queries monthly, with 36% of these coming from the operations organization. This group—comprising engineers...]]></description><link>https://zahere.com/how-uber-saved-140000-hours-monthly-using-generative-ai-agents</link><guid isPermaLink="true">https://zahere.com/how-uber-saved-140000-hours-monthly-using-generative-ai-agents</guid><category><![CDATA[txt to sql]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[#agent]]></category><category><![CDATA[uber]]></category><category><![CDATA[agents]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Tue, 14 Jan 2025 07:16:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736838883241/13fbf6b9-21bc-460d-8359-639a7c278d78.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-video">Video</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=UPBMkFSJdBI">https://www.youtube.com/watch?v=UPBMkFSJdBI</a></div>
<p> </p>
<hr />
<h2 id="heading-the-problem-at-hand"><strong>The Problem at Hand</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736837914622/fa80d083-47e5-4411-920f-601a0ef22bae.png" alt class="image--center mx-auto" /></p>
<p>Uber's data platform processes approximately <strong>1.2 million interactive queries monthly</strong>, with 36% of these coming from the operations organization. This group—comprising engineers, data scientists, and operations professionals—analyzes data from hundreds of thousands of tables across various domains to derive actionable insights.</p>
<p>However, the process of composing and executing queries was a bottleneck:</p>
<ul>
<li><p><strong>10 minutes per query</strong>: Each user spent an average of 10 minutes composing a query.</p>
</li>
<li><p><strong>Inefficiency Loop</strong>: Users would sift through datasets, run queries, and validate results in a repetitive cycle.</p>
</li>
<li><p><strong>Wasted Time</strong>: The cumulative effect of this inefficiency led to significant lost productivity.</p>
</li>
</ul>
<p>This challenge is not unique to Uber. It resonates across industries, from e-commerce to customer support, where operations teams grapple with similar inefficiencies.</p>
<hr />
<h2 id="heading-enter-querygpt-the-hackathon-solution"><strong>Enter QueryGPT: The Hackathon Solution</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736837946748/89c8877f-203e-431d-a8c3-7c47b4dc0d14.png" alt class="image--center mx-auto" /></p>
<p>In 2023, a team at Uber's hackathon introduced <strong>QueryGPT</strong>, a prototype designed to streamline the query-generation process. Here's how it worked:</p>
<ol>
<li><p><strong>Metadata-Driven Query Generation</strong>:</p>
<ul>
<li><p>Stored 20 SQL queries with metadata (table and schema information).</p>
</li>
<li><p>Mapped natural language prompts to these queries.</p>
</li>
</ul>
</li>
<li><p><strong>Few-Shot Prompting</strong>:</p>
<ul>
<li><p>Used a Retrieval-Augmented Generation (RAG) technique to fetch relevant queries.</p>
</li>
<li><p>Generated SQL queries in response to user prompts.</p>
</li>
</ul>
</li>
<li><p><strong>Initial Results</strong>:</p>
<ul>
<li><p>Reduced query composition time from 10 minutes to 3 minutes.</p>
</li>
<li><p>Achieved an <strong>18% productivity gain</strong>.</p>
</li>
</ul>
</li>
</ol>
<p>While this was a promising start, the prototype faced scalability and technical challenges, necessitating further iterations.</p>
<hr />
<h2 id="heading-challenges-and-iterative-solutions"><strong>Challenges and Iterative Solutions</strong></h2>
<h3 id="heading-key-challenges"><strong>Key Challenges</strong></h3>
<ol>
<li><p><strong>Prompt-to-Schema Mismatch</strong>:</p>
<ul>
<li>The system struggled to align user prompts with relevant schemas.</li>
</ul>
</li>
<li><p><strong>Token Limitations</strong>:</p>
<ul>
<li>Some schemas had over 200 columns, leading to token counts exceeding GPT-4's 32k limit.</li>
</ul>
</li>
</ol>
<hr />
<h2 id="heading-the-final-architecture"><strong>The Final Architecture</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736838008565/1d10b83a-a170-48ac-b1bf-db9ce48e8fcd.png" alt class="image--center mx-auto" /></p>
<p>The refined system, powered by <strong>Azure OpenAI and GPT-4</strong>, demonstrated remarkable efficiency:</p>
<ul>
<li><p><strong>Context Optimization</strong>: Leveraged a context window of 128k tokens to handle large schemas.</p>
</li>
<li><p><strong>Human Validation</strong>: Ensured precision through user acknowledgment of suggested tables.</p>
</li>
<li><p><strong>Scalable Design</strong>: Addressed the challenge of querying across hundreds of thousands of datasets.</p>
</li>
</ul>
<p>Uber's engineering team implemented a robust architecture combining <strong>SQL, RAG, agents, and custom configurations</strong>. Here's a breakdown:</p>
<ol>
<li><p><strong>Domain-Specific Curation</strong>:</p>
<ul>
<li><p>Decomposed datasets into <strong>business domains/workflows</strong> (e.g., mobility, trips, support).</p>
</li>
<li><p>Allowed the system to focus on smaller, relevant subsets of data.</p>
</li>
</ul>
</li>
<li><p><strong>Intent Agent</strong>:</p>
<ul>
<li><p>Classified user prompts to map them to the appropriate domain or workspace.</p>
</li>
<li><p>Likely employed a vector-store-based intent classifier for high accuracy.</p>
</li>
</ul>
</li>
<li><p><strong>Table Agent</strong>:</p>
<ul>
<li><p>Retrieved domain-specific tables and displayed them in a user-friendly interface.</p>
</li>
<li><p>Enabled human-in-the-loop validation to ensure table relevance.</p>
</li>
</ul>
</li>
<li><p><strong>Enhanced RAG Pipeline</strong>:</p>
<ul>
<li><p>Generated few-shot prompts tailored to the specific domain.</p>
</li>
<li><p>Sent refined prompts to GPT-4 for SQL query generation.</p>
</li>
</ul>
</li>
</ol>
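<p>The intent-routing idea is easy to picture with a toy sketch: embed each domain description and the incoming prompt, then route to the most similar domain. (Purely illustrative; word-count vectors stand in for real embeddings, and Uber's actual classifier isn't public.)</p>
<pre><code class="lang-python">import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in for a real embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

DOMAINS = {
    "mobility": "driver trips rides vehicles city mobility",
    "support": "customer support tickets refunds complaints",
}

def route_intent(prompt: str) -> str:
    """Pick the business domain whose description best matches the prompt."""
    return max(DOMAINS, key=lambda d: cosine(embed(prompt), embed(DOMAINS[d])))
</code></pre>
<p>With a real vector store in place of the word counts, the same lookup narrows hundreds of thousands of tables down to one domain's worth before any schema ever reaches the model's context window.</p>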
<h3 id="heading-real-world-impact"><strong>Real-World Impact</strong></h3>
<p>By the 20th iteration, Uber's Query GPT achieved a staggering <strong>140,000 hours saved monthly</strong> across its operations organization. This success underscores the value of combining AI, domain-specific curation, and user-centric design.</p>
<hr />
<h2 id="heading-key-takeaways-for-your-business"><strong>Key Takeaways for Your Business</strong></h2>
<p>Uber's solution offers valuable insights for tackling similar challenges in other industries:</p>
<ol>
<li><p><strong>Break Down Data Silos</strong>:</p>
<ul>
<li>Organize datasets by business domains to streamline data retrieval.</li>
</ul>
</li>
<li><p><strong>Implement Intent Detection</strong>:</p>
<ul>
<li>Use AI-driven agents to map user queries to relevant datasets or workspaces.</li>
</ul>
</li>
<li><p><strong>Leverage Human-in-the-Loop Systems</strong>:</p>
<ul>
<li>Involve users in validating AI-generated outputs for enhanced accuracy.</li>
</ul>
</li>
<li><p><strong>Iterate for Scalability</strong>:</p>
<ul>
<li>Start small, learn from challenges, and scale iteratively.</li>
</ul>
</li>
</ol>
<hr />
<h2 id="heading-the-future-of-ai-in-operations"><strong>The Future of AI in Operations</strong></h2>
<p>Uber's journey with QueryGPT exemplifies the transformative potential of generative AI in operational analytics. By reducing manual effort and empowering teams with intelligent tools, businesses can unlock unprecedented productivity gains.</p>
<p>Whether you're in e-commerce, customer support, or any data-intensive field, the principles behind Uber's success can guide your own AI-driven innovations.</p>
<p>Want to delve deeper into the technical details? Check out Uber's engineering blog <a target="_blank" href="https://www.uber.com/en-TW/blog/query-gpt/">here</a> for the full story.</p>
]]></content:encoded></item><item><title><![CDATA[A Deep Dive into Google's "Agents" White Paper: Hype or Revolution?]]></title><description><![CDATA[Video
https://www.youtube.com/watch?v=FgRGwnpd2HY
 
Google's recent white paper on "Agents" has created quite a buzz.
The paper explores the concept of AI agents and delves into their architecture and potential. Let's break down what this white paper...]]></description><link>https://zahere.com/a-deep-dive-into-googles-agents-white-paper-hype-or-revolution</link><guid isPermaLink="true">https://zahere.com/a-deep-dive-into-googles-agents-white-paper-hype-or-revolution</guid><category><![CDATA[agentic AI]]></category><category><![CDATA[#agent]]></category><category><![CDATA[agents]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Fri, 10 Jan 2025 07:39:43 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1736494697057/2b963ba4-c065-4f0a-9027-43f66dc9eb96.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-video">Video</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=FgRGwnpd2HY">https://www.youtube.com/watch?v=FgRGwnpd2HY</a></div>
<p> </p>
<p>Google's recent white paper on "Agents" has created quite a buzz.</p>
<p>The paper explores the concept of AI agents and delves into their architecture and potential. Let's break down what this white paper offers, its key takeaways, and some areas where it could improve.</p>
<h2 id="heading-the-marketing-angle-a-platform-centric-view">The Marketing Angle: A Platform-Centric View?</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736494044585/727245a6-4b01-41b8-9590-5debbe7fcccd.png" alt class="image--center mx-auto" /></p>
<p>At first glance, the white paper feels like a marketing tool for Google's <strong>Vertex AI</strong>. And that's perfectly fine—after all, companies often use such publications to showcase their platforms.</p>
<p>However, adopting a more <strong>platform-agnostic</strong> approach could have made the paper more universally applicable.</p>
<p>For instance, many examples in the white paper are tied to <strong>Vertex AI-specific features</strong>, which might be unfamiliar to those using other agentic frameworks.</p>
<p>Additionally, certain concepts, like extensions, are introduced but not elaborated on in sufficient detail, leaving room for better documentation and clarity.</p>
<p>Despite these limitations, the paper provides a solid starting point for understanding agents. Let’s dive into the key concepts.</p>
<hr />
<h2 id="heading-what-is-an-agent">What is an Agent?</h2>
<p>Google defines an agent as:</p>
<blockquote>
<p>"An application that attempts to achieve a goal by observing the world and acting upon it using the tools at its disposal."</p>
</blockquote>
<p>This definition is both simple and powerful. It captures the essence of what an agent is without overcomplicating things. While many experts on platforms like LinkedIn and YouTube often layer terms like reasoning, context awareness, and more onto the definition, the core idea remains straightforward.</p>
<p>Interestingly, my personal favorite definition comes from Hugging Face, which describes AI agents as:</p>
<blockquote>
<p>"Programs where LLM outputs control the workflows."</p>
</blockquote>
<p>This succinctly highlights the operational dynamics of agents, especially when integrated with language models.</p>
<hr />
<h2 id="heading-the-agentic-architecture-core-components">The Agentic Architecture: Core Components</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736494086693/f77bedcc-e7ed-4248-bd9a-76caca2c58a1.png" alt class="image--center mx-auto" /></p>
<p>The white paper also details the architecture of agents, a topic I’ve previously discussed on my channel. Here's a simplified breakdown of the <strong>three primary components</strong> that define an agentic system:</p>
<h3 id="heading-1-the-model">1. <strong>The Model</strong></h3>
<p>At the heart of any agent lies a <strong>language model</strong>. This serves as the foundation for the agent's intelligence and capabilities. Trained on extensive datasets, the model enables the agent to comprehend language, process instructions, and provide knowledge.</p>
<p>In an agentic framework, the model is not just a passive responder. Its capabilities drive the decision-making processes within the orchestration layer.</p>
<h3 id="heading-2-the-tools">2. <strong>The Tools</strong></h3>
<p>Tools are what set agents apart from simple LLM calls. Since LLMs are inherently limited—they can’t interact with external systems or access real-time information—tools extend their capabilities.</p>
<p>Agents use tools to interact with the external world, making them more dynamic and useful. Frameworks like <strong>LangChain</strong> and <strong>LlamaIndex</strong> exemplify how tools can augment the performance of agentic systems, enabling them to achieve their goals effectively.</p>
<h3 id="heading-3-the-orchestration-layer">3. <strong>The Orchestration Layer</strong></h3>
<p>Often referred to as the <strong>reasoning loop</strong>, this layer governs the agent's ability to:</p>
<ul>
<li><p><strong>Plan</strong>: Decide the next steps in a workflow.</p>
</li>
<li><p><strong>Reason</strong>: Analyze the gathered information.</p>
</li>
<li><p><strong>Execute</strong>: Take action based on the plan.</p>
</li>
</ul>
<p>This iterative process is the backbone of an agent’s functionality, ensuring it can adapt and respond intelligently to various scenarios.</p>
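<p>Framework aside, the reasoning loop itself is compact. A minimal sketch (stubbed model and tools chosen for illustration, not Vertex AI code):</p>
<pre><code class="lang-python">def orchestrate(goal: str, model, tools: dict, max_steps: int = 5):
    """Plan / reason -> execute -> observe, until the model decides it is done."""
    observations = []
    for _ in range(max_steps):
        action = model(goal, observations)        # plan + reason over what we know
        if action["tool"] == "finish":
            return action["answer"]
        result = tools[action["tool"]](**action["arguments"])  # execute
        observations.append(result)               # feed the observation back in
    return None

# Stub model: call the weather tool once, then finish with what it observed
def stub_model(goal, observations):
    if not observations:
        return {"tool": "weather", "arguments": {"city": "Pune"}}
    return {"tool": "finish", "answer": observations[-1]}

tools = {"weather": lambda city: f"Sunny in {city}"}
print(orchestrate("weather in Pune?", stub_model, tools))  # Sunny in Pune
</code></pre>
<p>A real agent replaces <code>stub_model</code> with an LLM prompted to emit the next action, but the loop shape stays the same.</p>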
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1736494138662/bce10d82-cc83-416a-9a53-9f0e57fc1b2f.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-tools-key-takeaways"><strong>Tools (Key Takeaways)</strong></h2>
<ol>
<li><p><strong>Extensions</strong></p>
<ul>
<li><p><strong>Definition</strong>: Extensions are interfaces that bridge the gap between APIs and agents. They allow for seamless API execution by teaching agents how to use APIs via examples.</p>
</li>
<li><p><strong>Use Case</strong>: Extensions are ideal for scenarios where the agent needs to dynamically interact with APIs like booking flights or fetching weather data. They reduce ambiguity in API calls by guiding the agent with context and examples.</p>
</li>
<li><p><strong>Example</strong>: A custom tool implementation fits here because it defines tools with a name and description, guiding the LLM to invoke the correct tool and arguments based on context.</p>
</li>
</ul>
</li>
<li><p><strong>Functions</strong></p>
<ul>
<li><p><strong>Definition</strong>: Functions are reusable logic modules that allow developers to define behavior and handle specific tasks.</p>
</li>
<li><p><strong>Difference</strong>: Unlike extensions, functions offload API execution to client-side logic or middleware, especially in cases where security or authentication constraints prevent direct calls from the LLM.</p>
</li>
<li><p><strong>Observation</strong>: Google's distinction clarifies that functions give developers fine-grained control and decouple execution from the agent, making iteration easier without redeploying infrastructure.</p>
</li>
</ul>
</li>
</ol>
]]></content:encoded></item><item><title><![CDATA[Core Skills Every Full-Stack Engineer Needs to Stay Relevant in the Age of AI]]></title><description><![CDATA[Today I want to share something I deeply believe will shape the future of software engineering.
As we approach 2025, there are rapid advancements in technology that we, as engineers, cannot afford to ignore. Whether you’re looking to add value to you...]]></description><link>https://zahere.com/core-skills-every-full-stack-engineer-needs-to-stay-relevant-in-the-age-of-ai</link><guid isPermaLink="true">https://zahere.com/core-skills-every-full-stack-engineer-needs-to-stay-relevant-in-the-age-of-ai</guid><category><![CDATA[2025]]></category><category><![CDATA[Roadmap]]></category><category><![CDATA[Full Stack Development]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[AI Engineer]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Tue, 31 Dec 2024 05:01:57 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1735621171782/f6b414a6-8048-41b9-bb85-f24d1e9857d8.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today I want to share something I deeply believe will shape the future of software engineering.</p>
<p>As we approach 2025, there are rapid advancements in technology that we, as engineers, cannot afford to ignore. Whether you’re looking to add value to your organization, grow your skillset, or simply future-proof your career, there are certain core skills that will set you apart.</p>
<p>Let’s dive right in.</p>
<h2 id="heading-the-new-baseline-ai-and-business-context-understanding">The New Baseline: AI and Business Context Understanding</h2>
<p>Gone are the days when being a full-stack engineer meant just knowing how to develop and deploy applications. Today, understanding the "why" behind what you’re building is just as important as the "how."</p>
<p>If you find yourself working on user stories without knowing the business context or understanding the decisions made by your engineering or product managers, it’s time to rethink your approach.</p>
<p>Business understanding is what differentiates an average engineer from a great one.</p>
<p>And as AI continues to integrate into every facet of technology, this understanding will become even more critical.</p>
<p>Here’s my take on the essential skills every full-stack engineer should master to thrive in this new era.</p>
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735620941850/c6ff589f-e764-4ccb-8c3b-00d64c71bb04.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-1-core-aiml-engineering-skills">1. <strong>Core AI/ML Engineering Skills</strong></h2>
<h3 id="heading-large-language-models-llms"><strong>Large Language Models (LLMs)</strong></h3>
<p>By now, you’ve probably experimented with tools like ChatGPT or integrated APIs into your projects. But to truly excel, you need to:</p>
<ul>
<li><p>Understand how LLMs work at a deeper level.</p>
</li>
<li><p>Learn about fine-tuning models for specific use cases.</p>
</li>
<li><p>Master <strong>prompt engineering</strong> – arranging instructions in ways that yield the best results.</p>
</li>
</ul>
<h4 id="heading-retrieval-augmented-generation-rag"><strong>Retrieval-Augmented Generation (RAG)</strong></h4>
<p>RAG will remain indispensable for enterprise applications, even as LLMs handle increasingly large context windows. Why? Because RAG ensures that only the most relevant data is fed into the model, optimizing performance and cost. Gaining expertise in designing RAG pipelines and integrating them with business workflows is a must.</p>
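<p>To make the idea concrete, here is a toy, dependency-free sketch of the retrieval step. The documents and their 3-dimensional "embeddings" below are made up for illustration; a real pipeline would use an embedding model and a vector store:</p>

```python
import math

# Toy corpus with hand-written "embeddings" (illustrative 3-dim vectors;
# a real pipeline would compute these with an embedding model).
corpus = [
    ("Refund policy: refunds within 30 days.", [0.9, 0.1, 0.0]),
    ("Shipping takes 3-5 business days.", [0.1, 0.9, 0.0]),
    ("Our office is closed on weekends.", [0.0, 0.2, 0.9]),
]

def cosine(a, b):
    # Cosine similarity between two vectors.
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def retrieve(query_vec, k=1):
    # Rank documents by similarity and keep only the top-k for the prompt.
    ranked = sorted(corpus, key=lambda doc: cosine(query_vec, doc[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

# A refund-related query (the vector stands in for an embedded question).
context = retrieve([0.8, 0.2, 0.1], k=1)
prompt = f"Answer using only this context:\n{context[0]}\nQuestion: Can I get a refund?"
print(prompt)
```

<p>Only the best-matching snippet ends up in the prompt, which is exactly the relevance and cost win RAG offers over dumping everything into the context window.</p>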
<h4 id="heading-ai-agents"><strong>AI Agents</strong></h4>
<p>The buzzword for 2025 is "agents." Multi-agent frameworks and systems are set to revolutionize how we build solutions. Understanding how to design and orchestrate these systems will keep you ahead of the curve.</p>
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735620980708/ec0b01d0-07c9-4279-a544-3319546b5189.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-2-technical-stack-mastery">2. <strong>Technical Stack Mastery</strong></h3>
<h4 id="heading-python-the-go-to-language"><strong>Python: The Go-To Language</strong></h4>
<p>Python’s ecosystem for AI and ML is unparalleled. From frameworks like PyTorch and scikit-learn to tools like LangChain and LlamaIndex, Python should be in every engineer’s toolkit. If you’re new to the space, start experimenting with these frameworks.</p>
<h4 id="heading-cloud-platforms-and-vector-databases"><strong>Cloud Platforms and Vector Databases</strong></h4>
<ul>
<li><p>Familiarize yourself with cloud platforms like AWS, Azure, or GCP, and tools like VMware Tanzu.</p>
</li>
<li><p>Learn about vector databases, such as PGVector, which allow efficient storage and retrieval of embeddings. Even spinning up a simple Docker instance can give you hands-on experience.</p>
</li>
</ul>
<h4 id="heading-api-development"><strong>API Development</strong></h4>
<p>Frameworks like FastAPI and Flask make it easier to create and deploy AI-powered web applications. Combine this with an understanding of socket programming for real-time communication tools like chatbots, and you’ll be unstoppable.</p>
<hr />
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1735621019139/eeb0b6f7-6ae4-4e1e-bdac-f569cb7a1537.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-3-project-and-professional-skills">3. <strong>Project and Professional Skills</strong></h3>
<h4 id="heading-business-context-awareness"><strong>Business Context Awareness</strong></h4>
<p>This is the game-changer. Knowing how your application aligns with business goals will not only make you a better engineer but also ensure your contributions are recognized. Always ask:</p>
<ul>
<li><p>Why is this feature important?</p>
</li>
<li><p>How will this decision impact the user or business?</p>
</li>
</ul>
<h4 id="heading-experimentation-over-perfection"><strong>Experimentation Over Perfection</strong></h4>
<p>Start small. Use APIs to create prototypes and simulate workflows. For instance:</p>
<ul>
<li><p>Identify areas in your organization where generative AI can add value.</p>
</li>
<li><p>Build mock solutions with sample data to prove concepts.</p>
</li>
</ul>
<h4 id="heading-mlops-and-deployment"><strong>MLOps and Deployment</strong></h4>
<p>Understanding MLOps tools and practices is becoming essential for deploying AI solutions. Even if you’re not directly managing infrastructure, knowing how to streamline deployment pipelines will make you invaluable to your team.</p>
<hr />
<h3 id="heading-my-journey-from-experimentation-to-mastery">My Journey: From Experimentation to Mastery</h3>
<p>When I transitioned into AI engineering in early 2023, I didn’t start by mastering machine learning fundamentals. Instead, I began tinkering with LLM APIs and building prototypes. This hands-on experimentation allowed me to solve real-world problems while gradually deepening my knowledge of ML fundamentals over the next six months.</p>
<p>This approach worked wonders for me, and I recommend it to anyone looking to enter the field. Don’t get bogged down by theory. Instead, balance learning with application. Keep a 60/40 or even 50/50 ratio between theory and practice to maximize your growth.</p>
<hr />
<h3 id="heading-final-thoughts">Final Thoughts</h3>
<p>The expectations for full-stack engineers are evolving, and AI skills are no longer optional.</p>
<p>By mastering core AI/ML engineering concepts, staying updated with the latest frameworks, and understanding business context, you’ll position yourself as a leader in your field.</p>
<p>So, what’s your game plan for 2025? Let me know in the comments below. And if you’re ready to start tinkering, I’ve got tons of tutorials on my channel to help you get started. Until next time, happy coding!</p>
]]></content:encoded></item><item><title><![CDATA[Unlocking the Power of Dynamic Prompting with Jinja2]]></title><description><![CDATA[Colab Notebook: https://colab.research.google.com/drive/18nzaXc7__KDYaPSRyf2mZyCK_pj4ON26
https://www.youtube.com/watch?v=Rq2zM7_5yw0
 
Dynamic prompt generation has become a cornerstone of modern AI workflows.
Whether you're building personalized em...]]></description><link>https://zahere.com/unlocking-the-power-of-dynamic-prompting-with-jinja2</link><guid isPermaLink="true">https://zahere.com/unlocking-the-power-of-dynamic-prompting-with-jinja2</guid><category><![CDATA[#PromptEngineering]]></category><category><![CDATA[Jinja2]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[agents]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Sun, 22 Dec 2024 13:42:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734874685443/845e5f96-733b-498b-8521-62ee72e13ca9.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><strong>Colab Notebook:</strong> <a target="_blank" href="https://colab.research.google.com/drive/18nzaXc7__KDYaPSRyf2mZyCK_pj4ON26"><strong>https://colab.research.google.com/drive/18nzaXc7__KDYaPSRyf2mZyCK_pj4ON26</strong></a></p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=Rq2zM7_5yw0">https://www.youtube.com/watch?v=Rq2zM7_5yw0</a></div>
<p> </p>
<p>Dynamic prompt generation has become a cornerstone of modern AI workflows.</p>
<p>Whether you're building personalized email campaigns, travel itineraries, or AI-driven recommendations, the ability to generate structured content dynamically is invaluable.</p>
<p>In this blog, we'll explore how <strong>Jinja2</strong>, a powerful templating engine, stands out in this domain and compare it with tools like LangChain for crafting dynamic prompts.</p>
<hr />
<h2 id="heading-why-dynamic-prompting-matters">Why Dynamic Prompting Matters</h2>
<p>If you manage an AI assistant tasked with creating personalized travel itineraries or summarizing user activities, static prompts won't cut it here – you need templates that adapt to the data at hand. This is where tools like <strong>Jinja2</strong> and LangChain's prompt templates shine.</p>
<hr />
<h2 id="heading-jinja2-the-all-rounder-for-dynamic-templates">Jinja2: The All-Rounder for Dynamic Templates</h2>
<p>Jinja2 is a versatile templating engine widely known for its use in web development but equally adept at generating dynamic text for emails, reports, and prompts. Here's why Jinja2 should be in your toolkit:</p>
<h3 id="heading-1-seamless-integration-of-logic">1. <strong>Seamless Integration of Logic</strong></h3>
<p>Jinja2 allows you to embed loops, conditionals, and filters directly in your templates. For example, creating tailored recommendations becomes straightforward:</p>
<pre><code class="lang-python">Dear {{ user_name }},

Here’s a summary of your recent activities:
{% for activity in activities %}
- On {{ activity.date }}: {{ activity.description }}
{% endfor %}

{% if status == "pass" %}
Congratulations on passing the test. Keep up the great work!
{% else %}
Keep trying, and you'll get there!
{% endif %}
</code></pre>
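<p>Rendering that template takes only a few lines with the <code>jinja2</code> package; the sample data below is illustrative:</p>

```python
from jinja2 import Template

template_text = """Dear {{ user_name }},

Here's a summary of your recent activities:
{% for activity in activities %}- On {{ activity.date }}: {{ activity.description }}
{% endfor %}
{% if status == "pass" %}Congratulations on passing the test. Keep up the great work!
{% else %}Keep trying, and you'll get there!
{% endif %}"""

# Render the template with concrete data.
rendered = Template(template_text).render(
    user_name="Alice",
    activities=[{"date": "2024-12-01", "description": "Completed the Python course."}],
    status="pass",
)
print(rendered)
```

<p>Because Jinja2 falls back from attribute to item lookup, <code>activity.date</code> works for plain dictionaries as well as objects.</p>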
<h3 id="heading-2-readable-and-reusable">2. <strong>Readable and Reusable</strong></h3>
<p>With its clean syntax, Jinja2 makes templates easy to maintain and reuse across projects. It's perfect for use cases like:</p>
<ul>
<li><p>Personalized emails</p>
</li>
<li><p>Travel itineraries</p>
</li>
<li><p>AI-driven content generation</p>
</li>
</ul>
<h3 id="heading-3-performance-efficiency">3. <strong>Performance Efficiency</strong></h3>
<p>Jinja2 minimizes overhead, making it an excellent choice for applications requiring rapid dynamic rendering.</p>
<hr />
<h2 id="heading-comparing-jinja2-and-langchain-for-prompt-templates">Comparing Jinja2 and LangChain for Prompt Templates</h2>
<p>While Jinja2 excels in general-purpose dynamic content generation, LangChain's <code>PromptTemplate</code> is specifically designed for AI workflows, making it the go-to for LLM integrations.</p>
<h3 id="heading-langchain-example">LangChain Example:</h3>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.prompts <span class="hljs-keyword">import</span> PromptTemplate

template = <span class="hljs-string">"""
Dear {user_name},

Here’s a summary of your recent activities:
{activities}

Here are some tailored recommendations for you:
{recommendations}

{closing_note}
"""</span>

prompt = PromptTemplate(
    input_variables=[<span class="hljs-string">"user_name"</span>, <span class="hljs-string">"activities"</span>, <span class="hljs-string">"recommendations"</span>, <span class="hljs-string">"closing_note"</span>],
    template=template,
)

email = prompt.format(
    user_name=<span class="hljs-string">"Alice"</span>,
    activities=<span class="hljs-string">"- Completed the Python course.\n- Joined the AI workshop."</span>,
    recommendations=<span class="hljs-string">"- Read 'Deep Learning for Beginners'.\n- Join the Advanced AI Projects Club."</span>,
    closing_note=<span class="hljs-string">"Congratulations on passing the test!"</span>,
)

print(email)
</code></pre>
<h3 id="heading-key-differences">Key Differences:</h3>
<ul>
<li><p><strong>Flexibility</strong>: Jinja2 supports complex logic directly in the template, while LangChain separates logic and content.</p>
</li>
<li><p><strong>AI Integration</strong>: LangChain is optimized for workflows where prompts are fed into LLMs.</p>
</li>
<li><p><strong>Learning Curve</strong>: Jinja2 has a gentler curve for general developers, whereas LangChain is ideal for those already in the AI ecosystem.</p>
</li>
</ul>
<hr />
<h2 id="heading-real-world-use-case-personalized-travel-itineraries">Real-World Use Case: Personalized Travel Itineraries</h2>
<p>Using Jinja2, you can craft luxurious travel experiences tailored to user preferences. Here's an example (the input data shown is abridged; the rendered output below assumes a fuller traveler record):</p>
<h3 id="heading-input-data">Input Data</h3>
<pre><code class="lang-python">input_data = {
    <span class="hljs-string">"name"</span>: <span class="hljs-string">"John Doe"</span>,
    <span class="hljs-string">"destination"</span>: <span class="hljs-string">"Paris, France"</span>,
    <span class="hljs-string">"interests"</span>: [<span class="hljs-string">"art"</span>, <span class="hljs-string">"history"</span>, <span class="hljs-string">"fine dining"</span>],
    <span class="hljs-string">"travel_type"</span>: <span class="hljs-string">"luxury"</span>,
    <span class="hljs-string">"suggested_activities"</span>: [
        {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Private Louvre Tour"</span>, <span class="hljs-string">"description"</span>: <span class="hljs-string">"Explore iconic art pieces with a guide."</span>},
        {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Seine River Dinner Cruise"</span>, <span class="hljs-string">"description"</span>: <span class="hljs-string">"Enjoy a gourmet dinner on a Seine cruise."</span>},
    ],
    <span class="hljs-string">"recommended_accommodations"</span>: [
        {<span class="hljs-string">"name"</span>: <span class="hljs-string">"Le Meurice"</span>, <span class="hljs-string">"description"</span>: <span class="hljs-string">"5-star luxury hotel with Michelin dining."</span>},
    ],
}
</code></pre>
<h3 id="heading-jinja2-template">Jinja2 Template</h3>
<pre><code class="lang-python"> <span class="hljs-string">"""
Traveler Profile:
- Name: {{ name }}
- Age: {{ age }}
- Travel Dates: {{ travel_dates }}
- Travel Destination: {{ destination }}
- Interests: {{ interests|join(", ") }}

{% if travel_type == 'luxury' %}
The traveler prefers a luxury experience. Suggest the following premium activities and accommodations:
{% elif travel_type == 'adventure' %}
The traveler seeks adventure. Recommend these thrilling activities and adventurous destinations:
{% else %}
The traveler is interested in a balanced experience. Consider these activities and attractions:
{% endif %}

{% for activity in suggested_activities %}
- {{ activity.name }}: {{ activity.description }}
  {% if activity.requirements %}
  Requirements: {{ activity.requirements|join(", ") }}
  {% endif %}
{% endfor %}

{% if recommended_accommodations %}
Recommended Accommodations:
{% for accommodation in recommended_accommodations %}
- {{ accommodation.name }}: {{ accommodation.description }}
  Location: {{ accommodation.location }}
  Amenities: {{ accommodation.amenities|join(", ") }}
{% endfor %}
{% endif %}

Traveler's Notes:
{% if traveler_notes %}
{% for note in traveler_notes %}
- {{ note }}
{% endfor %}
{% endif %}

Based on the above information, create a 3-day itinerary tailored to the traveler’s preferences and needs. Ensure activities, meals, and downtime are appropriately balanced.
"""</span>
</code></pre>
<h3 id="heading-output">Output</h3>
<pre><code class="lang-markdown">Traveler Profile:
<span class="hljs-bullet">-</span> Name: John Doe
<span class="hljs-bullet">-</span> Age: 35
<span class="hljs-bullet">-</span> Travel Dates: 2024-01-15 to 2024-01-18
<span class="hljs-bullet">-</span> Travel Destination: Paris, France
<span class="hljs-bullet">-</span> Interests: art, history, fine dining, luxury shopping


The traveler prefers a luxury experience. Suggest the following premium activities and accommodations:



<span class="hljs-bullet">-</span> Private Louvre Tour: Enjoy a private, guided tour of the Louvre Museum, exploring its iconic art pieces.

  Requirements: Comfortable walking shoes, Museum pass


<span class="hljs-bullet">-</span> Seine River Dinner Cruise: Experience a luxurious evening with a gourmet dinner on a Seine River cruise.

  Requirements: Formal attire


<span class="hljs-bullet">-</span> Champs-Élysées Shopping Tour: Indulge in a day of shopping at high-end boutiques along the Champs-Élysées.




Recommended Accommodations:

<span class="hljs-bullet">-</span> Le Meurice: A 5-star luxury hotel offering Michelin-star dining and exceptional service.
  Location: Rue de Rivoli, Paris
  Amenities: Spa, Fine dining, Concierge service

<span class="hljs-bullet">-</span> Hôtel Plaza Athénée: Iconic Parisian hotel with stunning views of the Eiffel Tower.
  Location: Avenue Montaigne, Paris
  Amenities: Luxury suites, Haute couture stores, Gourmet restaurants



Traveler's Notes:


<span class="hljs-bullet">-</span> Has dietary restrictions: no shellfish.

<span class="hljs-bullet">-</span> Prefers private tours over group activities.



Based on the above information, create a 3-day itinerary tailored to the traveler’s preferences and needs. Ensure activities, meals, and downtime are appropriately balanced.
</code></pre>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>If you're building general-purpose templates or working with structured data, Jinja2 is a clear winner. For AI-centric workflows, LangChain simplifies the integration with LLMs.</p>
<p>Ready to take your prompts to the next level? Dive into Jinja2 or LangChain and unlock the full potential of dynamic content creation!</p>
]]></content:encoded></item><item><title><![CDATA[How to Build a Price Monitoring Agent with Pydantic AI]]></title><description><![CDATA[Keeping track of fluctuating product prices across e-commerce platforms can be a daunting task.
Whether you're tracking a personal wishlist or monitoring competitors' pricing for your business, automating this process can save time and effort.
In thi...]]></description><link>https://zahere.com/how-to-build-a-price-monitoring-agent-with-pydantic-ai</link><guid isPermaLink="true">https://zahere.com/how-to-build-a-price-monitoring-agent-with-pydantic-ai</guid><category><![CDATA[agentic AI]]></category><category><![CDATA[agents]]></category><category><![CDATA[ai agents]]></category><category><![CDATA[Multi-Agent Systems (MAS)]]></category><category><![CDATA[llm]]></category><category><![CDATA[price-tracking]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 16 Dec 2024 06:14:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1734329453319/6840f9e6-c72a-4b00-a613-26c965f01633.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Keeping track of fluctuating product prices across e-commerce platforms can be a daunting task.</p>
<p>Whether you're tracking a personal wishlist or monitoring competitors' pricing for your business, automating this process can save time and effort.</p>
<p>In this guide, we’ll explore how to build a <strong>price monitoring agent</strong> using the <strong>Pydantic AI framework</strong>—a robust agentic framework from the creators of Pydantic, the popular data validation library.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1734328934789/a46830a9-7c76-4205-a503-50bba31bcdbf.png" alt class="image--center mx-auto" /></p>
<p>This tutorial is part one of a series. Today, we’ll focus on building a scraper agent to extract key product details like title, description, price, and more.</p>
<p>In the next part, we’ll expand this agent to store data in a database and send notifications for price changes.</p>
<hr />
<h2 id="heading-what-is-pydantic-ai">What is Pydantic AI?</h2>
<p><a target="_blank" href="https://ai.pydantic.dev/">Pydantic AI</a> is revolutionizing the way developers build applications that leverage Generative AI. As a Python Agent Framework, it simplifies the creation of production-grade applications by integrating robust data validation with the power of LLMs. Here’s why Pydantic AI stands out:</p>
<ul>
<li><p><strong>Built on Proven Foundations</strong>: Developed by the creators of Pydantic, which is widely used in various AI frameworks like OpenAI and LangChain, Pydantic AI inherits a strong legacy of type safety and structured data management.</p>
</li>
<li><p><strong>Model-Agnostic Flexibility</strong>: Currently supporting models like OpenAI, Gemini, and Groq, Pydantic AI allows developers to easily implement support for additional models through a simple interface. This flexibility ensures that your application can adapt to various AI technologies without significant overhead.</p>
</li>
<li><p><strong>Enhanced Developer Experience</strong>: With features like vanilla Python control flow and a novel dependency injection system, Pydantic AI empowers developers to apply familiar coding practices. This leads to more maintainable code and a smoother development process.</p>
</li>
<li><p><strong>Streamlined Response Validation</strong>: The framework not only validates incoming data but also ensures that responses from LLMs are structured and validated, enhancing reliability in application behavior.</p>
</li>
</ul>
<hr />
<h2 id="heading-overview-of-the-price-monitoring-agent">Overview of the Price Monitoring Agent</h2>
<p>Our agent will:</p>
<ol>
<li><p>Scrape product details (title, description, price, currency, and image URL) from a given URL.</p>
</li>
<li><p>Parse the information into a structured format.</p>
</li>
<li><p>Prepare for database storage and notification handling (to be implemented in part two).</p>
</li>
</ol>
<p>Here’s how the process works (diagram above)</p>
<ol>
<li><p><strong>Input</strong>: Product page URL</p>
</li>
<li><p><strong>Scraper Tool</strong>: Extracts structured data using Beautiful Soup and Markdownify.</p>
</li>
<li><p><strong>Agent</strong>: Processes the scraped data using Pydantic AI for type-safe responses.</p>
</li>
</ol>
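<p>The "type-safe responses" in step 3 come down to a Pydantic model. A sketch of what the product schema might look like, with field names assumed from the list above:</p>

```python
from pydantic import BaseModel

class ProductDetails(BaseModel):
    """Schema the scraper agent's output is validated against (illustrative)."""
    title: str
    description: str
    price: float
    currency: str
    image_url: str

# Validation coerces and checks types, so a scraped "999.99" string
# becomes a float; malformed data raises a ValidationError instead.
product = ProductDetails(
    title="Noise-Cancelling Headphones",
    description="Over-ear wireless headphones.",
    price="999.99",
    currency="USD",
    image_url="https://example.com/headphones.jpg",
)
print(product.price)
```

<p>If the agent ever returns a malformed payload (say, a non-numeric price), validation fails immediately instead of letting bad data flow downstream.</p>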
<h2 id="heading-watch-the-video-for-full-tutorial">Watch the video for full tutorial</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=hlropi13fO8">https://www.youtube.com/watch?v=hlropi13fO8</a></div>
]]></content:encoded></item><item><title><![CDATA[Building a Multi-Agent Orchestrator: A Step-by-Step Guide]]></title><description><![CDATA[Today, we’re diving into an exciting project: creating a Multi-Agent Orchestrator.
This post is an extension of my earlier guide, "Building an AI Agent from Scratch."

If you’re new here, I recommend revisiting that post to get up to speed, as we’ll ...]]></description><link>https://zahere.com/building-a-multi-agent-orchestrator-a-step-by-step-guide</link><guid isPermaLink="true">https://zahere.com/building-a-multi-agent-orchestrator-a-step-by-step-guide</guid><category><![CDATA[generative ai]]></category><category><![CDATA[#agent]]></category><category><![CDATA[agentic AI]]></category><category><![CDATA[Multi-Agent Systems (MAS)]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Fri, 06 Dec 2024 13:57:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1733493348874/c3e17609-baa3-4aa1-b450-e941768af604.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, we’re diving into an exciting project: <strong>creating a Multi-Agent Orchestrator</strong>.</p>
<p><a target="_blank" href="https://zahere.com/how-to-build-an-ai-agent-without-using-any-libraries-a-step-by-step-guide">This post is an extension of my earlier guide, <em>"Building an AI Agent from Scratch."</em></a></p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733458430168/17896b93-8b33-46ae-8683-e578e493ec3b.png" alt class="image--center mx-auto" /></p>
<p>If you’re new here, I recommend revisiting that post to get up to speed, as we’ll build upon its concepts and code.</p>
<p>In this project, we’ll tackle orchestrating actions between multiple agents, enabling seamless execution of tasks such as fetching weather information and the current time. Let’s jump in!</p>
<hr />
<h3 id="heading-what-is-a-multi-agent-orchestrator"><strong>What Is a Multi-Agent Orchestrator?</strong></h3>
<p>A Multi-Agent Orchestrator is a system that:</p>
<ol>
<li><p><strong>Identifies the intent</strong> of a user’s input.</p>
</li>
<li><p><strong>Selects the appropriate agent</strong> to handle the request.</p>
</li>
<li><p><strong>Executes tasks</strong> using tools associated with the agent.</p>
</li>
</ol>
<p>Think of it as a manager assigning tasks to specialized team members. This orchestration ensures complex queries involving multiple tasks are handled efficiently.</p>
<hr />
<h3 id="heading-what-well-build"><strong>What We’ll Build</strong></h3>
<p>We’ll create:</p>
<ul>
<li><p><strong>Agents</strong>: Specialized entities to handle tasks like fetching weather or time.</p>
</li>
<li><p><strong>Tools</strong>: Functional utilities for the agents, such as APIs or database queries.</p>
</li>
<li><p><strong>Orchestrator</strong>: The central system managing task delegation and execution.</p>
</li>
</ul>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1733458475635/979ad64d-2de4-4569-8c06-c8c081e78c71.png" alt class="image--center mx-auto" /></p>
<hr />
<h3 id="heading-key-components-of-an-agent"><strong>Key Components of an Agent</strong></h3>
<p>An agent has three main components:</p>
<ol>
<li><p><strong>Reasoning Loop</strong>: Decides the next action based on context.</p>
</li>
<li><p><strong>Model</strong>: Uses a language model (LLM) for decision-making.</p>
</li>
<li><p><strong>Tools</strong>: A list of utilities to perform specific tasks.</p>
</li>
</ol>
<p>Our agents will dynamically decide which tool to use, making them highly adaptable.</p>
<hr />
<h3 id="heading-the-agent-class"><strong>The Agent Class</strong></h3>
<p>Here’s a high-level breakdown of the <code>Agent</code> class:</p>
<ul>
<li><p><strong>Constructor</strong>: Initializes the agent with a name, description, tools, and an LLM model.</p>
</li>
<li><p><strong>Process Input</strong>: Takes user input, decides on a tool, and executes the task.</p>
</li>
<li><p><strong>Prompting</strong>: Constructs a prompt for the LLM to guide decision-making.</p>
</li>
</ul>
<p>Agents also handle parsing JSON responses from the LLM to ensure smooth execution.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> abc <span class="hljs-keyword">import</span> ABC, abstractmethod
<span class="hljs-keyword">import</span> ast
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> llm.llm_ops <span class="hljs-keyword">import</span> query_llm
<span class="hljs-keyword">from</span> tools.base_tool <span class="hljs-keyword">import</span> Tool
<span class="hljs-keyword">import</span> json


<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">Agent</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, Name: str, Description: str, Tools: list, Model: str</span>):</span>        
        self.memory = []
        self.name = Name
        self.description = Description
        self.tools = Tools
        self.model = Model
        self.max_memory = <span class="hljs-number">10</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">json_parser</span>(<span class="hljs-params">self, input_string</span>):</span>

      python_dict = ast.literal_eval(input_string)
      json_string = json.dumps(python_dict)
      json_dict = json.loads(json_string)

      <span class="hljs-keyword">if</span> isinstance(json_dict, dict) <span class="hljs-keyword">or</span> isinstance(json_dict,list):
        <span class="hljs-keyword">return</span> json_dict

      <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Invalid JSON response"</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">process_input</span>(<span class="hljs-params">self, user_input</span>):</span>
        self.memory.append(<span class="hljs-string">f"User: <span class="hljs-subst">{user_input}</span>"</span>)

        context = <span class="hljs-string">"\n"</span>.join(self.memory)
        tool_descriptions = <span class="hljs-string">"\n"</span>.join([<span class="hljs-string">f"- <span class="hljs-subst">{tool.name()}</span>: <span class="hljs-subst">{tool.description()}</span>"</span> <span class="hljs-keyword">for</span> tool <span class="hljs-keyword">in</span> self.tools])
        response_format = {<span class="hljs-string">"action"</span>:<span class="hljs-string">""</span>, <span class="hljs-string">"args"</span>:<span class="hljs-string">""</span>}

        prompt = <span class="hljs-string">f"""Context:
        <span class="hljs-subst">{context}</span>

        Available tools:
        <span class="hljs-subst">{tool_descriptions}</span>

        Based on the user's input and context, decide if you should use a tool or respond directly.        
        If you identify an action, respond with the tool name and the arguments for the tool.
        If you decide to respond directly to the user, make the action "respond_to_user" with args as your response, in the following format.

        Response Format:
        <span class="hljs-subst">{response_format}</span>

        """</span>

        response = query_llm(prompt)
        self.memory.append(<span class="hljs-string">f"Agent: <span class="hljs-subst">{response}</span>"</span>)

        response_dict = self.json_parser(response)

        <span class="hljs-comment"># Check if any tool can handle the input</span>
        <span class="hljs-keyword">for</span> tool <span class="hljs-keyword">in</span> self.tools:
            <span class="hljs-keyword">if</span> tool.name().lower() == response_dict[<span class="hljs-string">"action"</span>].lower():
                <span class="hljs-keyword">return</span> tool.use(response_dict[<span class="hljs-string">"args"</span>])

        <span class="hljs-keyword">return</span> response_dict
</code></pre>
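<p>The <code>Tool</code> base class imported above isn’t shown in this post. A minimal sketch consistent with how the agent calls it (<code>name()</code>, <code>description()</code>, <code>use(args)</code>) might look like this; the <code>TimeTool</code> is a hypothetical example:</p>

```python
from abc import ABC, abstractmethod
from datetime import datetime

class Tool(ABC):
    """Interface every tool exposes to an agent (assumed shape)."""

    @abstractmethod
    def name(self) -> str:
        """Identifier the LLM uses to select this tool."""

    @abstractmethod
    def description(self) -> str:
        """One-line description injected into the agent's prompt."""

    @abstractmethod
    def use(self, args) -> str:
        """Execute the tool with the arguments chosen by the LLM."""

class TimeTool(Tool):
    def name(self) -> str:
        return "get_current_time"

    def description(self) -> str:
        return "Returns the current time in HH:MM format; args are ignored."

    def use(self, args) -> str:
        return datetime.now().strftime("%H:%M")
```

<p>Any class implementing this interface can be dropped into an agent's tool list without changing the agent itself.</p>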
<hr />
<h3 id="heading-the-orchestrator"><strong>The Orchestrator</strong></h3>
<p>The orchestrator coordinates multiple agents:</p>
<ol>
<li><p>Accepts user input.</p>
</li>
<li><p>Selects the right agent based on the intent.</p>
</li>
<li><p>Manages task execution, including cases where multiple tasks are requested.</p>
</li>
</ol>
<p><strong>Core Features of the Orchestrator</strong>:</p>
<ul>
<li><p>Maintains context by storing user queries, agent responses, and intermediate results.</p>
</li>
<li><p>Uses a reasoning loop to determine the next steps.</p>
</li>
<li><p>Constructs prompts to guide the LLM in selecting the right agent and tools.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> ast
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">from</span> llm.llm_ops <span class="hljs-keyword">import</span> query_llm
<span class="hljs-keyword">from</span> agents.base_agent <span class="hljs-keyword">import</span> Agent
<span class="hljs-keyword">from</span> logger <span class="hljs-keyword">import</span> log_message

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">AgentOrchestrator</span>:</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">__init__</span>(<span class="hljs-params">self, agents: list[Agent]</span>):</span>
        self.agents = agents
        self.memory = []  <span class="hljs-comment"># Stores the reasoning and action steps taken</span>
        self.max_memory = <span class="hljs-number">10</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">json_parser</span>(<span class="hljs-params">self, input_string</span>):</span>

      print(type(input_string))

      python_dict = ast.literal_eval(input_string)
      json_string = json.dumps(python_dict)
      json_dict = json.loads(json_string)

      <span class="hljs-keyword">if</span> isinstance(json_dict, dict) <span class="hljs-keyword">or</span> isinstance(json_dict,list):
        <span class="hljs-keyword">return</span> json_dict

      <span class="hljs-keyword">raise</span> ValueError(<span class="hljs-string">"Invalid JSON response"</span>)

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">orchestrate_task</span>(<span class="hljs-params">self, user_input: str</span>):</span>        
        self.memory = self.memory[-self.max_memory:]

        context = <span class="hljs-string">"\n"</span>.join(self.memory)

        print(<span class="hljs-string">f"Context: <span class="hljs-subst">{context}</span>"</span>)

        response_format = {<span class="hljs-string">"action"</span>:<span class="hljs-string">""</span>, <span class="hljs-string">"input"</span>:<span class="hljs-string">""</span>, <span class="hljs-string">"next_action"</span>:<span class="hljs-string">""</span>}

        <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_prompt</span>(<span class="hljs-params">user_input</span>):</span>
            <span class="hljs-keyword">return</span> <span class="hljs-string">f"""

                Use the context from memory to plan next steps.                
                Context:
                <span class="hljs-subst">{context}</span>

                You are an expert intent classifier.
                You will use the context provided and the user's input to classify the intent and select the appropriate agent.
                You will rewrite the input for the agent so that the agent can efficiently execute the task.                                                

                Here are the available agents and their descriptions:
                <span class="hljs-subst">{<span class="hljs-string">", "</span>.join([<span class="hljs-string">f"- <span class="hljs-subst">{agent.name}</span>: <span class="hljs-subst">{agent.description}</span>"</span> <span class="hljs-keyword">for</span> agent <span class="hljs-keyword">in</span> self.agents])}</span>

                User Input:
                <span class="hljs-subst">{user_input}</span>              

                ###Guidelines###
                - Sometimes you might have to use multiple agents to solve the user's input. You have to do that in a loop.
                - The original user input could contain multiple tasks; use the context to understand the previous actions taken and the next steps you should take.
                - Read the context carefully to see whether there were multiple tasks and whether you executed them all.
                - If there are no actions to be taken, then make the action "respond_to_user" with your final thoughts combining all previous responses as input.
                - Respond with "respond_to_user" only when there are no agents to select from or there is no next_action
                - You will return the agent name in the form of <span class="hljs-subst">{response_format}</span>
                - Always return valid JSON like <span class="hljs-subst">{response_format}</span> and nothing else.                

                """</span>


        response = <span class="hljs-string">""</span>
        loop_count = <span class="hljs-number">0</span>
        self.memory = self.memory[<span class="hljs-number">-10</span>:]        
        prompt = get_prompt(user_input)
        llm_response = query_llm(prompt)

        llm_response = self.json_parser(llm_response)
        print(<span class="hljs-string">f"LLM Response: <span class="hljs-subst">{llm_response}</span>"</span>)

        self.memory.append(<span class="hljs-string">f"Orchestrator: <span class="hljs-subst">{llm_response}</span>"</span>)


        action = llm_response[<span class="hljs-string">"action"</span>]
        user_input = llm_response[<span class="hljs-string">"input"</span>]

        print(<span class="hljs-string">f"Action identified by LLM: <span class="hljs-subst">{action}</span>"</span>)


        <span class="hljs-keyword">if</span> action == <span class="hljs-string">"respond_to_user"</span>:
            <span class="hljs-keyword">return</span> llm_response
        <span class="hljs-keyword">for</span> agent <span class="hljs-keyword">in</span> self.agents:
            <span class="hljs-keyword">if</span> agent.name == action:
                print(<span class="hljs-string">"*******************Found Agent Name*******************************"</span>)
                agent_response = agent.process_input(user_input)
                print(<span class="hljs-string">f"<span class="hljs-subst">{action}</span> response: <span class="hljs-subst">{agent_response}</span>"</span>)
                self.memory.append(<span class="hljs-string">f"Agent Response for Task: <span class="hljs-subst">{agent_response}</span>"</span>)
                print(self.memory)
                <span class="hljs-keyword">return</span> agent_response                


    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">run</span>(<span class="hljs-params">self</span>):</span>
        print(<span class="hljs-string">"LLM Agent: Hello! How can I assist you today?"</span>)
        user_input = input(<span class="hljs-string">"You: "</span>)
        self.memory.append(<span class="hljs-string">f"User: <span class="hljs-subst">{user_input}</span>"</span>)

        <span class="hljs-keyword">while</span> <span class="hljs-literal">True</span>:            
            <span class="hljs-keyword">if</span> user_input.lower() <span class="hljs-keyword">in</span> [<span class="hljs-string">"exit"</span>, <span class="hljs-string">"bye"</span>, <span class="hljs-string">"close"</span>]:
                print(<span class="hljs-string">"See you later!"</span>)
                <span class="hljs-keyword">break</span>

            response = self.orchestrate_task(user_input)
            print(<span class="hljs-string">f"Final response of orchestrator <span class="hljs-subst">{response}</span>"</span>)
            <span class="hljs-keyword">if</span> isinstance(response, dict) <span class="hljs-keyword">and</span> response[<span class="hljs-string">"action"</span>] == <span class="hljs-string">"respond_to_user"</span>:                
                log_message(<span class="hljs-string">f"Response from Agent: <span class="hljs-subst">{response[<span class="hljs-string">'input'</span>]}</span>"</span>, <span class="hljs-string">"RESPONSE"</span>)
                user_input = input(<span class="hljs-string">"You: "</span>)
                self.memory.append(<span class="hljs-string">f"User: <span class="hljs-subst">{user_input}</span>"</span>)                
            <span class="hljs-keyword">elif</span> response == <span class="hljs-string">"No action or agent needed"</span>:
                print(<span class="hljs-string">"Response from Agent: "</span>, response)
                user_input = input(<span class="hljs-string">"You: "</span>)
            <span class="hljs-keyword">else</span>:
                user_input = response
</code></pre>
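<p>One design note on the json_parser helper above: LLMs sometimes emit Python-style dicts (single-quoted keys) rather than strict JSON, which json.loads alone would reject. A simplified, standalone sketch of the same parsing logic shows why ast.literal_eval is used first:</p>

```python
import ast

def json_parser(input_string: str):
    # ast.literal_eval accepts Python-style literals (e.g. single-quoted
    # keys) that strict json.loads would reject with a JSONDecodeError.
    parsed = ast.literal_eval(input_string)
    if isinstance(parsed, (dict, list)):
        return parsed
    raise ValueError("Invalid JSON response")

# Single-quoted output, as an LLM sometimes produces:
llm_output = "{'action': 'Weather Agent', 'input': 'weather in Bangalore', 'next_action': ''}"
parsed = json_parser(llm_output)
print(parsed["action"])  # → Weather Agent
```

<p>The trade-off: ast.literal_eval does not understand JSON's true, false, or null, so strictly JSON input containing those tokens would need json.loads instead.</p>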
<hr />
<h3 id="heading-tools-in-action"><strong>Tools in Action</strong></h3>
<p>Agents use tools to perform tasks. For example:</p>
<ul>
<li><p><strong>Weather Tool</strong>: Fetches real-time weather data from OpenWeatherMap.</p>
</li>
<li><p><strong>Time Tool</strong>: Determines the local time for a given city, even without a timezone.</p>
</li>
</ul>
<p>Each tool includes:</p>
<ul>
<li><p>A <strong>name</strong> and <strong>description</strong> to guide the LLM.</p>
</li>
<li><p>A <strong>use method</strong> to perform the task.</p>
</li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> requests
<span class="hljs-keyword">from</span> tools.base_tool <span class="hljs-keyword">import</span> Tool

<span class="hljs-class"><span class="hljs-keyword">class</span> <span class="hljs-title">WeatherTool</span>(<span class="hljs-params">Tool</span>):</span>
    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">name</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Weather Tool"</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">description</span>(<span class="hljs-params">self</span>):</span>
        <span class="hljs-keyword">return</span> <span class="hljs-string">"Provides weather information for a given location. The payload is just the location. Example: New York"</span>

    <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">use</span>(<span class="hljs-params">self, location:str</span>):</span>        
        api_key = os.getenv(<span class="hljs-string">"OPENWEATHERMAP_API_KEY"</span>)
        url = <span class="hljs-string">f"http://api.openweathermap.org/data/2.5/weather?q=<span class="hljs-subst">{location}</span>&amp;appid=<span class="hljs-subst">{api_key}</span>&amp;units=metric"</span>
        response = requests.get(url)
        data = response.json()
        <span class="hljs-keyword">if</span> data[<span class="hljs-string">"cod"</span>] == <span class="hljs-number">200</span>:
            temp = data[<span class="hljs-string">"main"</span>][<span class="hljs-string">"temp"</span>]
            description = data[<span class="hljs-string">"weather"</span>][<span class="hljs-number">0</span>][<span class="hljs-string">"description"</span>]
            response = <span class="hljs-string">f"The weather in <span class="hljs-subst">{location}</span> is currently <span class="hljs-subst">{description}</span> with a temperature of <span class="hljs-subst">{temp}</span>°C."</span>
            print(response)
            <span class="hljs-keyword">return</span> response
        <span class="hljs-keyword">else</span>:
            <span class="hljs-keyword">return</span> <span class="hljs-string">f"Sorry, I couldn't find weather information for <span class="hljs-subst">{location}</span>."</span>
</code></pre>
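<p>The Time Tool mentioned above is not shown in this post, so here is a hypothetical, self-contained sketch of what one could look like. It omits the Tool base class and resolves cities through a small hand-made city-to-timezone lookup using Python's zoneinfo; the actual implementation in the repo may resolve arbitrary cities differently (for example via an external API).</p>

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# Hypothetical lookup table standing in for real timezone resolution.
CITY_TIMEZONES = {
    "bangalore": "Asia/Kolkata",
    "new york": "America/New_York",
    "london": "Europe/London",
}

class TimeTool:
    def name(self):
        return "Time Tool"

    def description(self):
        return "Provides the current local time for a given city. The payload is just the city. Example: Bangalore"

    def use(self, city: str):
        # Normalize the city name and look up its IANA timezone.
        tz_name = CITY_TIMEZONES.get(city.strip().lower())
        if tz_name is None:
            return f"Sorry, I couldn't determine the time zone for {city}."
        now = datetime.now(ZoneInfo(tz_name))
        return f"The current time in {city} is {now.strftime('%I:%M %p')}."
```

<p>Like the Weather Tool, the name and description are what the LLM sees when deciding which tool handles the request, so they should state the expected payload clearly.</p>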
<hr />
<h3 id="heading-demo-running-the-orchestrator"><strong>Demo: Running the Orchestrator</strong></h3>
<p>Here’s a quick demonstration:</p>
<ol>
<li><p><strong>Query</strong>: <em>“What’s the weather in Bangalore, and what’s the current time?”</em></p>
</li>
<li><p><strong>Execution</strong>:</p>
<ul>
<li><p>The orchestrator identifies the intent (weather and time).</p>
</li>
<li><p>Delegates tasks to the respective agents.</p>
</li>
<li><p>Combines responses to provide the final answer.</p>
</li>
</ul>
</li>
</ol>
<p><strong>Example Output</strong>:</p>
<ul>
<li><em>"The weather in Bangalore is misty with a temperature of 22°C. The current time in Bangalore is 12:27 AM."</em></li>
</ul>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> agents.base_agent <span class="hljs-keyword">import</span> Agent
<span class="hljs-keyword">from</span> tools.weather_tool <span class="hljs-keyword">import</span> WeatherTool
<span class="hljs-keyword">from</span> tools.time_tool <span class="hljs-keyword">import</span> TimeTool
<span class="hljs-keyword">from</span> orchestrator <span class="hljs-keyword">import</span> AgentOrchestrator

<span class="hljs-keyword">from</span> dotenv <span class="hljs-keyword">import</span> load_dotenv
<span class="hljs-keyword">import</span> os

<span class="hljs-comment"># Load environment variables from .env file</span>
load_dotenv()

<span class="hljs-comment"># Create Weather Agent</span>
weather_agent = Agent(
    Name=<span class="hljs-string">"Weather Agent"</span>,
    Description=<span class="hljs-string">"Provides weather information for a given location"</span>,
    Tools=[WeatherTool()],
    Model=<span class="hljs-string">"gpt-4o-mini"</span>
)

<span class="hljs-comment"># Create Time Agent</span>
time_agent = Agent(
    Name=<span class="hljs-string">"Time Agent"</span>,
    Description=<span class="hljs-string">"Provides the current time for a given city"</span>,
    Tools=[TimeTool()],
    Model=<span class="hljs-string">"gpt-4o-mini"</span>
)

<span class="hljs-comment"># Create AgentOrchestrator</span>
agent_orchestrator = AgentOrchestrator([weather_agent, time_agent])

<span class="hljs-comment"># Run the orchestrator</span>
agent_orchestrator.run()
</code></pre>
<hr />
<h3 id="heading-whats-next"><strong>What’s Next?</strong></h3>
<p>This orchestrator is just the beginning. You can:</p>
<ul>
<li><p>Add more agents for tasks like translation, currency conversion, or database queries.</p>
</li>
<li><p>Optimize prompts for better LLM responses.</p>
</li>
<li><p>Extend the system for real-world applications like customer support or smart assistants.</p>
</li>
</ul>
<hr />
<h3 id="heading-full-code">Full Code</h3>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/zahere-dev/augmate">https://github.com/zahere-dev/augmate</a></div>
<p> </p>
<h3 id="heading-final-thoughts"><strong>Final Thoughts</strong></h3>
<p>Building a Multi-Agent Orchestrator showcases the power of combining LLMs with task-specific agents. By modularizing tasks and leveraging context effectively, you can create systems that are both scalable and intelligent.</p>
<p>Stay tuned for more updates, and feel free to share your thoughts or ask questions in the comments below. Don’t forget to check out the accompanying video for a detailed walkthrough of the code.</p>
<p>Happy coding! 🚀</p>
]]></content:encoded></item><item><title><![CDATA[Scrape Websites with Natural Language Prompts Using OpenAI Swarm Agents!]]></title><description><![CDATA[Video
https://www.youtube.com/watch?v=4y8j-LD8IK0
 
The Goal

Welcome to today's post, where we'll dive into building a web scraping agent using Firecrawl, an open-source library with a paid service option.
This guide will show you how to use FireCra...]]></description><link>https://zahere.com/scrape-websites-with-natural-language-prompts-using-openai-swarm-agents</link><guid isPermaLink="true">https://zahere.com/scrape-websites-with-natural-language-prompts-using-openai-swarm-agents</guid><category><![CDATA[generative ai]]></category><category><![CDATA[#agent]]></category><category><![CDATA[Swarm]]></category><category><![CDATA[Scraping]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 04 Nov 2024 17:33:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1730741242002/00b6e3c9-8039-4c55-97b6-edc2c524fd3b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-video">Video</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=4y8j-LD8IK0">https://www.youtube.com/watch?v=4y8j-LD8IK0</a></div>
<p> </p>
<h2 id="heading-the-goal">The Goal</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730741345457/d3acb682-c3c4-4c1e-9e05-e0a8771ebcce.png" alt class="image--center mx-auto" /></p>
<p>Welcome to today's post, where we'll dive into building a web scraping agent using Firecrawl, an open-source library with a paid service option.</p>
<p>This guide will show you how to use FireCrawl to create a scraping agent that takes a URL and custom instructions, then extracts specific content and outputs it in a structured format.</p>
<h3 id="heading-why-firecrawl">Why Firecrawl?</h3>
<p>Traditional web scraping methods rely on manually defining XPath selectors or CSS selectors, which often requires detailed knowledge of a webpage’s structure.</p>
<p><a target="_blank" href="https://github.com/mendableai/firecrawl/tree/main">Firecrawl</a>, however, allows us to leverage natural language instructions to scrape data, making it easier to gather specific elements from a page without extensive coding.</p>
<p>This post will demonstrate Firecrawl’s powerful abstraction over conventional scraping techniques by integrating it with a simple agent interface.</p>
<h3 id="heading-disclaimer">Disclaimer</h3>
<p><em>This example uses the</em> <a target="_blank" href="https://books.toscrape.com/"><em>Books to Scrape website</em></a> <em>—a practice site designed for web scraping.</em></p>
<p><em>Please always ensure that scraping is allowed by a site’s terms of service, especially with commercial websites.</em></p>
<h2 id="heading-code-walkthrough">Code Walkthrough</h2>
<p><a target="_blank" href="https://colab.research.google.com/drive/1Fd39Q0oukKrIJNyzN1ICX9L6QzIEOni-">https://colab.research.google.com/drive/1Fd39Q0oukKrIJNyzN1ICX9L6QzIEOni-</a></p>
<h3 id="heading-high-level-agent-workflow">High-Level Agent Workflow</h3>
<p>Our scraping agent comprises two main components:</p>
<ol>
<li><p><strong>User Interface Agent</strong> – This component processes user queries, confirms instructions, and determines if additional information is needed.</p>
</li>
<li><p><strong>Scraper Agent</strong> – This agent handles the scraping process by fetching content from the given URL, parsing it based on the user’s instructions, and outputting the results in a structured format.</p>
</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> google.colab <span class="hljs-keyword">import</span> userdata
<span class="hljs-keyword">import</span> json
<span class="hljs-keyword">import</span> pandas <span class="hljs-keyword">as</span> pd
<span class="hljs-keyword">from</span> io <span class="hljs-keyword">import</span> StringIO
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">from</span> swarm <span class="hljs-keyword">import</span> Agent
<span class="hljs-keyword">from</span> swarm.repl <span class="hljs-keyword">import</span> run_demo_loop
<span class="hljs-keyword">from</span> firecrawl <span class="hljs-keyword">import</span> FirecrawlApp
<span class="hljs-keyword">import</span> nest_asyncio <span class="hljs-comment"># required for notebooks</span>
nest_asyncio.apply()

os.environ[<span class="hljs-string">'OPENAI_API_KEY'</span>] = userdata.get(<span class="hljs-string">"OPENAI_API_KEY"</span>)
os.environ[<span class="hljs-string">'FIRECRAWL_API_KEY'</span>] = userdata.get(<span class="hljs-string">"FIRECRAWL_API_KEY"</span>)

client = OpenAI(api_key=os.getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>))

<span class="hljs-comment"># Initialize FirecrawlApp and OpenAI</span>
app = FirecrawlApp(api_key=os.getenv(<span class="hljs-string">"FIRECRAWL_API_KEY"</span>))
client = OpenAI(api_key=os.getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>))

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">scrape_website</span>(<span class="hljs-params">url</span>):</span>
    <span class="hljs-string">"""Scrape a website using Firecrawl."""</span>
    scrape_status = app.scrape_url(
        url,
        params={<span class="hljs-string">'formats'</span>: [<span class="hljs-string">'markdown'</span>]}
    )
    <span class="hljs-keyword">return</span> scrape_status

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_completion</span>(<span class="hljs-params">role, task, content</span>):</span>
    <span class="hljs-string">"""Generate a completion using OpenAI."""</span>
    response = client.chat.completions.create(
        model=<span class="hljs-string">"gpt-4o-mini"</span>,
        messages=[
            {<span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">f"You are a <span class="hljs-subst">{role}</span>. <span class="hljs-subst">{task}</span>"</span>},
            {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: content}
        ]
    )
    <span class="hljs-keyword">return</span> response.choices[<span class="hljs-number">0</span>].message.content

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">json_to_csv_downloadable</span>(<span class="hljs-params">json_data, filename=<span class="hljs-string">"output.csv"</span></span>):</span>
    <span class="hljs-string">"""Takes JSON data and converts to csv"""</span>
    <span class="hljs-comment"># Parse the JSON data if it's in string format</span>
    <span class="hljs-keyword">if</span> isinstance(json_data, str):
        json_data = json.loads(json_data)
    <span class="hljs-comment"># Convert JSON data to a DataFrame</span>
    df = pd.DataFrame(json_data)
    <span class="hljs-comment"># Convert the DataFrame to CSV text (the filename parameter is unused here)</span>
    csv_data = df.to_csv(index=<span class="hljs-literal">False</span>)
    display(csv_data)  <span class="hljs-comment"># Shows the CSV data as plain text in the notebook</span>

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handoff_to_parser</span>():</span>
    <span class="hljs-string">"""Hand off the website content to the parser agent."""</span>
    <span class="hljs-keyword">return</span> parser_agent

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handoff_to_csv_writer</span>():</span>
    <span class="hljs-string">"""Hand off the parsed content to csv writer"""</span>
    <span class="hljs-keyword">return</span> json_to_csv_downloadable

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handoff_to_website_scraper</span>():</span>
    <span class="hljs-string">"""Hand off the url to the website scraper agent."""</span>
    <span class="hljs-keyword">return</span> website_scraper_agent

user_interface_agent = Agent(
    name=<span class="hljs-string">"User Interface Agent"</span>,
    model=<span class="hljs-string">"gpt-4o-mini"</span>,
    instructions=<span class="hljs-string">"You are a user interface agent that handles all interactions with the user. You need to always start with a URL that the user wants to extract content from. The user will expect specific content to be extracted. Ask clarification questions if needed. Be concise."</span>,
    functions=[handoff_to_website_scraper],
)

website_scraper_agent = Agent(
    name=<span class="hljs-string">"Website Scraper Agent"</span>,
    instructions=<span class="hljs-string">"You are a website scraper agent specialized in scraping website content. If the scraped content is valid, handoff to csv writer"</span>,
    functions=[scrape_website,json_to_csv_downloadable],
)
</code></pre>
<p>The run_demo_loop method in Swarm allows us to run an agent in a loop. In this case, it is the user interface agent.</p>
<pre><code class="lang-python"><span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
    <span class="hljs-comment"># Run the demo loop with the user interface agent</span>
    run_demo_loop(user_interface_agent, stream=<span class="hljs-literal">True</span>)
</code></pre>
<h2 id="heading-talk-to-the-agent">Talk to the Agent</h2>
<p>Let’s run the code and give it the following instruction</p>
<pre><code class="lang-python">get title <span class="hljs-keyword">and</span> price <span class="hljs-keyword">from</span> the top <span class="hljs-number">2</span> products <span class="hljs-keyword">in</span> https://books.toscrape.com/
</code></pre>
<p>The agent returns the following response:</p>
<pre><code class="lang-python">User: get title <span class="hljs-keyword">and</span> price <span class="hljs-keyword">from</span> the top <span class="hljs-number">2</span> products <span class="hljs-keyword">in</span> https://books.toscrape.com/ 
User Interface Agent: handoff_to_website_scraper()
Website Scraper Agent: scrape_website()
Website Scraper Agent: json_to_csv_downloadable()
title,price\nA Light <span class="hljs-keyword">in</span> the Attic,£<span class="hljs-number">51.77</span>\nTipping the Velvet,£<span class="hljs-number">53.74</span>\n
Website Scraper Agent: The title <span class="hljs-keyword">and</span> price of the top <span class="hljs-number">2</span> books <span class="hljs-keyword">from</span> the website have been extracted <span class="hljs-keyword">and</span> formatted into a CSV file named `top_2_books.csv`. You can download it <span class="hljs-keyword">from</span> the file management system.
</code></pre>
<h3 id="heading-advanced-instructions-with-firecrawl">Advanced Instructions with FireCrawl</h3>
<p>Firecrawl shines when handling more complex queries.</p>
<p>Here’s how you can fetch additional details, such as the product description, from individual book pages.</p>
<p>By providing natural language instructions, you can automate navigation through categories and fetch granular data without manually defining selectors.</p>
<pre><code class="lang-python">User: Get the top <span class="hljs-number">2</span> products <span class="hljs-keyword">from</span> the Philosophy category
</code></pre>
<p>Response:</p>
<pre><code class="lang-python">Website Scraper Agent: Here are the product descriptions <span class="hljs-keyword">for</span> the top <span class="hljs-number">2</span> Philosophy books:

<span class="hljs-number">1.</span> **Sophie<span class="hljs-string">'s World**
   - **Description**: A page-turning novel that is also an exploration of the great philosophical concepts of Western thought, Sophie’s World has fired the imagination of readers all over the world, with more than twenty million copies in print. One day fourteen-year-old Sophie Amundsen comes home from school to find in her mailbox two notes, with one question on each: “Who are you?” and “Where does the world come from?” From that irresistible beginning, Sophie becomes obsessed with questions that take her far beyond what she knows of her Norwegian village. Through those letters, she enrolls in a kind of correspondence course, covering Socrates to Sartre, with a mysterious philosopher, while receiving letters addressed to another girl. Who is Hilde? And why does her mail keep turning up? To unravel this riddle, Sophie must use the philosophy she is learning—but the truth turns out to be far more complicated than she could have imagined.
   - **URL**: [Sophie'</span>s World](https://books.toscrape.com/catalogue/sophies-world_966/index.html)

<span class="hljs-number">2.</span> **The Death of Humanity: <span class="hljs-keyword">and</span> the Case <span class="hljs-keyword">for</span> Life**
   - **Description**: Do you believe human life <span class="hljs-keyword">is</span> inherently valuable? Unfortunately, <span class="hljs-keyword">in</span> the secularized age of state-sanctioned euthanasia <span class="hljs-keyword">and</span> abortion-on-demand, many are losing faith <span class="hljs-keyword">in</span> the simple value of human life. To the disillusioned, human beings are a cosmic accident whose intrinsic value <span class="hljs-keyword">is</span> worth no more than other animals. The Death of Humanity explores our culture<span class="hljs-string">'s declining respect for the sanctity of human life, drawing on philosophy and history to reveal the dark road ahead for society if we lose our faith in human life.
   - **URL**: [The Death of Humanity](https://books.toscrape.com/catalogue/the-death-of-humanity-and-the-case-for-life_932/index.html)</span>
</code></pre>
<h3 id="heading-the-benefits-of-llm-based-scraping">The Benefits of LLM-Based Scraping</h3>
<p>Prior to Large Language Models (LLMs), scraping required precise coding to navigate page structures. LLM integration within tools like Firecrawl reduces this complexity, letting us use natural language commands to scrape data.</p>
<p>This feature enables flexible, conversational interactions with a webpage, significantly reducing the manual effort.</p>
<p>Thank you for following along with this tutorial! If you enjoyed this guide, please share it with others interested in web scraping and stay tuned for a comparison between <strong>FireCrawl</strong> and <strong>CrowdAI</strong> in the next post!</p>
]]></content:encoded></item><item><title><![CDATA[Claude's 'Computer Use' Put to the Test: 5 Challenges and Insights]]></title><description><![CDATA[Today, I set out to challenge Claude’s 'Computer Use' with five specific tasks, each designed to evaluate its precision, adaptability, and overall functionality.
What is Claude’s ‘Computer Use’?
Claude’s Sonnet 3.5 can now interact with the OS. It ca...]]></description><link>https://zahere.com/claudes-computer-use-put-to-the-test-5-challenges-and-insights</link><guid isPermaLink="true">https://zahere.com/claudes-computer-use-put-to-the-test-5-challenges-and-insights</guid><category><![CDATA[generative ai]]></category><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[#Anthropic #AI #ComputerUse #Technology #MachineLearning #Innovation #Automation #AIFuture #DigitalTransformation #PublicBeta]]></category><category><![CDATA[claude.ai]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Sun, 27 Oct 2024 09:22:53 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1730020702357/dbcc0049-11ee-4b00-a7a8-2bc5010e9d6d.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, I set out to challenge Claude’s 'Computer Use' with five specific tasks, each designed to evaluate its precision, adaptability, and overall functionality.</p>
<h2 id="heading-what-is-claudes-computer-use">What is Claude’s ‘Computer Use’?</h2>
<p>Claude’s Sonnet 3.5 can now interact with the OS. It can perform tasks like you would: browsing websites in a web browser, filling data into an Excel sheet, and so on.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730020126511/c45f5aee-3e03-46e2-90b9-bde0b49a45b1.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-the-goal-of-this-article">The goal of this Article</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730017630836/503998d8-d4c4-46f0-8ba2-b133efce0387.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-installation">Installation</h2>
<p>A detailed explanation of running the code is available in this repo.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo">https://github.com/anthropics/anthropic-quickstarts/tree/main/computer-use-demo</a></div>
<p> </p>
<p>We need a Linux distribution to run ‘Computer Use’ in an isolated environment (Docker).</p>
<p>To install the Windows Subsystem for Linux (WSL), run the command below in CMD.</p>
<p>Follow this <a target="_blank" href="https://learn.microsoft.com/en-us/windows/wsl/install">link</a> for more details.</p>
<pre><code class="lang-bash">wsl --install
</code></pre>
<p>You will need an Anthropic API key, which you can generate from the Anthropic Console.</p>
<p>Start Docker and run the snippet below in CMD/bash/PowerShell.</p>
<pre><code class="lang-bash">SET ANTHROPIC_API_KEY=your-anthropic-api-key
docker run ^
    -e ANTHROPIC_API_KEY=%ANTHROPIC_API_KEY% ^
    -v %USERPROFILE%\.anthropic:/home/computeruse/.anthropic ^
    -p 5900:5900 ^
    -p 8501:8501 ^
    -p 6080:6080 ^
    -p 8080:8080 ^
    -it ghcr.io/anthropics/anthropic-quickstarts:computer-use-demo-latest
</code></pre>
<p>After the installation completes, you should see the agent running at <code>localhost:8080</code>.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730018304901/9b2a2daf-aba9-434f-855f-e49efa36618e.png" alt class="image--center mx-auto" /></p>
<h2 id="heading-challenge-1-extracting-article-titles-from-my-newsletter">Challenge 1: Extracting Article Titles from My Newsletter</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730019203346/35979cb8-4162-48df-a4ba-49ab9826a040.png" alt class="image--center mx-auto" /></p>
<p>The first task aimed to test the agent's ability to interact with a webpage and extract data. I provided the agent with a URL to my newsletter, asking it to capture all article titles and store them in an Excel sheet. This sounds simple, but it required the agent to navigate a few hurdles:</p>
<ol>
<li><p><strong>Navigating the Website</strong>: The agent successfully opened the URL in Firefox and accessed the page.</p>
</li>
<li><p><strong>Handling a Subscription Page</strong>: It encountered a subscription page and asked for input on how to proceed. After selecting the public archive, it moved forward.</p>
</li>
<li><p><strong>Using RSS Feeds</strong>: The agent smartly identified an RSS feed on the page and used it to extract article titles.</p>
</li>
</ol>
<p>The task was completed, and the agent saved around 20 articles in an Excel file. While it wasn’t a straightforward front-end automation use case, the agent demonstrated adaptability in handling unexpected scenarios.</p>
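<p>For the curious, the RSS step the agent discovered is easy to reproduce by hand. Below is a minimal Python sketch using only the standard library; the feed snippet and titles are invented for illustration (<code>html.unescape</code> is a no-op on plain text, so the snippet parses the same whether or not the markup is entity-escaped):</p>
<pre><code class="lang-python">import html
import xml.etree.ElementTree as ET

# Tiny stand-in for a real newsletter feed (titles are invented).
rss = html.unescape(
    "&lt;rss&gt;&lt;channel&gt;"
    "&lt;item&gt;&lt;title&gt;Post One&lt;/title&gt;&lt;/item&gt;"
    "&lt;item&gt;&lt;title&gt;Post Two&lt;/title&gt;&lt;/item&gt;"
    "&lt;/channel&gt;&lt;/rss&gt;"
)

root = ET.fromstring(rss)
titles = [item.findtext("title") for item in root.iter("item")]
print(titles)
</code></pre>
<p>From here, writing <code>titles</code> to a CSV file (which Excel opens directly) is one <code>csv.writer</code> call away.</p>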
<h2 id="heading-challenge-2-finding-the-latest-video-on-my-youtube-channel">Challenge 2: Finding the Latest Video on My YouTube Channel</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730019328474/8e8360af-e017-4f2d-a653-96c971cdc1ef.png" alt class="image--center mx-auto" /></p>
<p>Next, I asked the agent to find the latest video from my YouTube channel. Here’s how it went:</p>
<ol>
<li><p><strong>Channel Search</strong>: The agent navigated to YouTube and searched for my channel, "Zahin Tab."</p>
</li>
<li><p><strong>Fetching the Latest Video</strong>: It successfully identified my latest video titled <em>"I Created a Blogging Agent in 5 Minutes Using OpenAI's Form"</em>.</p>
</li>
</ol>
<h2 id="heading-challenge-3-find-the-number-of-likes-in-the-video-from-task-2">Challenge 3: Find the number of likes in the video (from Task 2)</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730019425939/84b424c5-bd9b-4c7f-9ef6-8a6016631523.png" alt class="image--center mx-auto" /></p>
<p><strong>Extracting Likes</strong>: I then asked the agent to find the number of likes on that video, and it efficiently navigated the page to retrieve this data.</p>
<p>The agent performed well here, understanding my commands even when they weren't perfectly specific. This task demonstrated its ability to interpret and execute instructions with some degree of context awareness.</p>
<h2 id="heading-challenge-4-finding-similar-videos-under-4-minutes">Challenge 4: Finding Similar Videos under 4 Minutes</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730019505460/506bae91-6d6f-4ec1-b062-4b469c36c35c.png" alt class="image--center mx-auto" /></p>
<p>This task was more complex and required the agent to not only search for videos but also filter results by duration. Here’s the breakdown:</p>
<ol>
<li><p><strong>Understanding the Prompt</strong>: I asked the agent to find videos similar to my latest one, but with a duration of under 4 minutes.</p>
</li>
<li><p><strong>Using Filters</strong>: The agent used the search bar to enter the query, applied a duration filter, and selected the "Under 4 minutes" option.</p>
</li>
<li><p><strong>Evaluating Results</strong>: It was able to find a few similar videos, although some of the results didn’t exactly match the duration criteria.</p>
</li>
</ol>
<p>This task highlighted the potential for LLMs to automate more complex workflows by combining search, filters, and contextual understanding, making them highly useful in automation scenarios.</p>
<h2 id="heading-challenge-5-automating-form-filling">Challenge 5: Automating Form Filling</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1730019575708/48f296ec-26f4-4dd1-becd-37e6ce858b11.png" alt class="image--center mx-auto" /></p>
<p>One of the most common use cases in the RPA (Robotic Process Automation) industry is automating form filling. Here’s how the agent performed:</p>
<ol>
<li><p><strong>Accessing the Form</strong>: I provided a URL to a dummy form from RoboForm, a password manager.</p>
</li>
<li><p><strong>Filling in Data</strong>: The agent quickly identified the form fields and filled in the first name and last name.</p>
</li>
<li><p><strong>Handling Errors</strong>: Midway through, I encountered some technical issues and rate limits, which required restarting the application.</p>
</li>
</ol>
<p>This task demonstrated the potential of using LLMs for repetitive tasks like filling out forms, which could significantly save time in enterprise environments.</p>
<h2 id="heading-the-bigger-picture-andrej-karpathys-vision-of-llms-and-automation">The Bigger Picture: Andrej Karpathy's Vision of LLMs and Automation</h2>
<p>Reflecting on these experiments, I couldn’t help but think about Andrej Karpathy's vision for the future of LLMs—essentially envisioning an "LLMOS" (Large Language Model Operating System).</p>
<p>Imagine an LLM that can access OS-level functions, with unlimited memory and a singular focus on completing specific tasks. Here’s what that could look like in practice:</p>
<ul>
<li><p><strong>Intelligent Decision-Making</strong>: LLMs trained on specific datasets could make small decisions autonomously, handling routine tasks without human intervention.</p>
</li>
<li><p><strong>Enterprise Integration</strong>: In a business setting, this could mean automating tasks like customer support, form filling, or data entry, making processes more efficient.</p>
</li>
</ul>
<p>This vision excites me as someone who has been in the automation space for a long time, and it shows how the lines between front-end automation and intelligent decision-making are beginning to blur.</p>
<h2 id="heading-conclusion-the-future-of-front-end-automation-with-llms">Conclusion: The Future of Front-End Automation with LLMs</h2>
<p>These experiments were a glimpse into the future of AI-driven automation. From extracting data to interacting with websites and filling out forms, LLMs are proving to be versatile tools. While there are challenges like rate limits and technical hiccups, the potential for transforming enterprise automation is immense.</p>
<p>If you found this exploration interesting and want to learn more about the intersection of AI and automation, don’t forget to subscribe to my YouTube channel. I regularly share insights and tutorials on how to leverage AI tools effectively.</p>
<p>See you in the next post!</p>
]]></content:encoded></item><item><title><![CDATA[Building a Technical Content Writing Agent Using Swarm: A Step-by-Step Guide]]></title><description><![CDATA[In this blog post, we’ll delve into a new multi-agent framework called Swarm, developed by OpenAI, which has recently garnered significant attention within the AI community.
What we are building

The goal is to explore how to use Swarm by building a ...]]></description><link>https://zahere.com/building-a-technical-content-writing-agent-using-swarm-a-step-by-step-guide</link><guid isPermaLink="true">https://zahere.com/building-a-technical-content-writing-agent-using-swarm-a-step-by-step-guide</guid><category><![CDATA[#agent]]></category><category><![CDATA[aiagents]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[openai]]></category><category><![CDATA[Swarm]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 21 Oct 2024 02:45:04 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1729478638963/c0bf778b-6f73-4ab4-ac74-08098c55cb4b.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In this blog post, we’ll delve into a new multi-agent framework called <em>Swarm</em>, developed by OpenAI, which has recently garnered significant attention within the AI community.</p>
<h2 id="heading-what-we-are-building">What we are building</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1729477068550/527a09d8-3f2a-46aa-aa5a-f90f4330f574.png" alt class="image--center mx-auto" /></p>
<p>The goal is to explore how to use Swarm by building a simple agent that takes a user query, performs detailed research, and generates a structured blog post based on the findings.</p>
<p>This will help you understand the capabilities of Swarm, especially in orchestrating tasks among multiple agents.</p>
<h2 id="heading-what-is-swarm">What is Swarm?</h2>
<p>Swarm is a multi-agent orchestration framework introduced by OpenAI. Although currently labeled as experimental and educational, it has quickly become a favorite among developers, earning praise as one of the best frameworks for building and coordinating multi-agent systems.</p>
<h3 id="heading-key-concepts-of-the-swarm-framework">Key Concepts of the Swarm Framework</h3>
<p>Swarm's design is centered around two fundamental components:</p>
<p><strong>Agents</strong>: These are autonomous units that operate with a set of instructions and make decisions independently.</p>
<p>Each agent can specialize in a particular task, such as gathering data or writing content.</p>
<pre><code class="lang-python">agent = Agent(
   instructions=<span class="hljs-string">"You are a helpful agent."</span>
)
</code></pre>
<p><strong>Handoffs</strong>: This mechanism allows one agent to transfer control of a task or conversation to another agent.</p>
<p>It enables seamless collaboration between agents, ensuring smooth execution of complex workflows.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> swarm <span class="hljs-keyword">import</span> Swarm, Agent

client = Swarm()
sales_agent = Agent(name=<span class="hljs-string">"Sales Agent"</span>)

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handoff_to_sales</span>():</span>
   <span class="hljs-keyword">return</span> sales_agent

agent = Agent(functions=[handoff_to_sales])

response = client.run(agent, [{<span class="hljs-string">"role"</span>:<span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>:<span class="hljs-string">"Transfer me to sales."</span>}])
print(response.agent.name)
</code></pre>
<p>By leveraging these components, Swarm helps developers create systems where tasks can be broken down into smaller, specialized sub-tasks, handled by different agents.</p>
<h2 id="heading-use-case-creating-a-technical-content-writing-agent">Use Case: Creating a Technical Content Writing Agent</h2>
<p>Code: <a target="_blank" href="https://colab.research.google.com/drive/1phDFUasrZxjChabWo_oWwWuNKwfjuJ_B?authuser=1#scrollTo=zyxmU_W9ZCxp">https://colab.research.google.com/drive/1phDFUasrZxjChabWo_oWwWuNKwfjuJ_B?authuser=1#scrollTo=zyxmU_W9ZCxp</a></p>
<p>To illustrate the potential of Swarm, let’s build a multi-agent system that generates a blog post based on user input. The system comprises three agents:</p>
<ol>
<li><p><strong>Interface Agent</strong>: Interacts with the user, refines the query if needed, and passes the query to the Researcher Agent.</p>
</li>
<li><p><strong>Researcher Agent</strong>: Conducts detailed research on the query and prepares a research report.</p>
</li>
<li><p><strong>Blogger Agent</strong>: Uses the research report to create a structured blog post.</p>
</li>
</ol>
<p>The workflow is simple: the user provides a theme or query, such as "Top 5 Technical Skills for 2025." The Interface Agent refines the query if necessary, then hands it over to the Researcher Agent, which gathers and analyzes relevant information. Finally, the Blogger Agent compiles the research into a well-written blog post.</p>
<h2 id="heading-how-the-swarm-framework-works">How the Swarm Framework Works</h2>
<h3 id="heading-illustration-overview">Illustration Overview</h3>
<p>Below is a high-level overview of the agent workflow:</p>
<ul>
<li><p><strong>User Input</strong>: The user provides a theme, e.g., "Why is R fast?".</p>
</li>
<li><p><strong>Interface Agent</strong>: Takes the user’s input, asks clarifying questions if needed, and then passes the refined query to the Researcher Agent.</p>
</li>
<li><p><strong>Researcher Agent</strong>: Gathers data from search engines, scrapes content from top results, and analyzes the information to produce a research report.</p>
</li>
<li><p><strong>Blogger Agent</strong>: Converts the research report into a blog post and returns it to the Interface Agent.</p>
</li>
<li><p><strong>Interface Agent</strong>: Delivers the completed blog post to the user.</p>
</li>
</ul>
<p>This setup demonstrates how Swarm’s agent-based architecture simplifies the process of delegating tasks and orchestrating their execution.</p>
<h3 id="heading-building-the-swarm-agents">Building the Swarm Agents</h3>
<p>Let’s dive into the code, which consists of the three agents and their respective functions.</p>
<h4 id="heading-agent-class-structure">Agent Class Structure</h4>
<p>Each agent is defined as a class with parameters like <code>name</code>, <code>model</code>, <code>instructions</code>, and <code>functions</code>. The functions let the agent interact with APIs, databases, and other agents. A key pattern is the handoff function, which returns another agent and thereby passes control to it:</p>
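<p>To make the structure concrete, here is a simplified stand-in for the <code>Agent</code> class (an illustrative mimic of the fields, not the real package's implementation):</p>
<pre><code class="lang-python">from dataclasses import dataclass, field
from typing import Callable, List, Optional, Union

# Simplified mimic of Swarm's Agent; the real class adds more behavior.
@dataclass
class Agent:
    name: str = "Agent"
    model: str = "gpt-4o"
    instructions: Union[str, Callable] = "You are a helpful agent."
    functions: List[Callable] = field(default_factory=list)
    tool_choice: Optional[str] = None

researcher = Agent(name="Researcher Agent", model="gpt-4o-mini")
print(researcher.name, researcher.model)
</code></pre>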
<div class="hn-table">
<table>
<thead>
<tr>
<td>Field</td><td>Type</td><td>Description</td><td>Default</td></tr>
</thead>
<tbody>
<tr>
<td><strong>name</strong></td><td><code>str</code></td><td>The name of the agent.</td><td><code>"Agent"</code></td></tr>
<tr>
<td><strong>model</strong></td><td><code>str</code></td><td>The model to be used by the agent.</td><td><code>"gpt-4o"</code></td></tr>
<tr>
<td><strong>instructions</strong></td><td><code>str</code> or <code>func() -&gt; str</code></td><td>Instructions for the agent, can be a string or a callable returning a string.</td><td><code>"You are a helpful agent."</code></td></tr>
<tr>
<td><strong>functions</strong></td><td><code>List</code></td><td>A list of functions that the agent can call.</td><td><code>[]</code></td></tr>
<tr>
<td><strong>tool_choice</strong></td><td><code>str</code></td><td>The tool choice for the agent, if any.</td><td><code>None</code></td></tr>
</tbody>
</table>
</div><h4 id="heading-researcher-agent">Researcher Agent</h4>
<p>The Researcher Agent uses a package called <code>GPT Researcher</code> to gather and analyze content from various search engines.</p>
<p>This agent performs a deep dive into the topic provided by the user.</p>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> nest_asyncio <span class="hljs-comment"># required for notebooks</span>
nest_asyncio.apply()

<span class="hljs-keyword">from</span> gpt_researcher <span class="hljs-keyword">import</span> GPTResearcher
<span class="hljs-keyword">import</span> asyncio

<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">from</span> openai <span class="hljs-keyword">import</span> OpenAI

client = OpenAI(api_key=os.getenv(<span class="hljs-string">"OPENAI_API_KEY"</span>))

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">get_report</span>(<span class="hljs-params">query: str</span>) -&gt; str:</span>
    report_type = <span class="hljs-string">"research_report"</span>
    researcher = GPTResearcher(query, report_type)
    research_result =  <span class="hljs-keyword">await</span> researcher.conduct_research()
    report =  <span class="hljs-keyword">await</span> researcher.write_report()

    <span class="hljs-comment"># Get additional information</span>
    research_context = researcher.get_research_context()
    research_costs = researcher.get_costs()
    research_images = researcher.get_research_images()
    research_sources = researcher.get_research_sources()

    <span class="hljs-keyword">return</span> {<span class="hljs-string">'report'</span>:report}

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">research_topic</span>(<span class="hljs-params">query: str</span>) -&gt; str:</span>
  <span class="hljs-string">"""Generate research report"""</span>
  <span class="hljs-keyword">return</span> asyncio.run(get_report(query))

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handoff_to_researcher</span>():</span>
    <span class="hljs-string">"""Hand off the user query to the researcher agent."""</span>
    print(<span class="hljs-string">"Handing off to Researcher Agent"</span>)
    <span class="hljs-keyword">return</span> researcher_agent

researcher_agent = Agent(
    name=<span class="hljs-string">"Researcher Agent"</span>,
    model=<span class="hljs-string">"gpt-4o-mini"</span>,
    instructions=<span class="hljs-string">"You are a researcher agent specialized in researching. If you are satisfied with the research, handoff the report to blogger"</span>,
    functions=[research_topic, handoff_to_blogger],  <span class="hljs-comment"># handoff_to_blogger is defined in the Blogger Agent section below</span>
)
</code></pre>
<p>The <code>handoff_to_blogger</code> function transfers control to the Blogger Agent once the research is complete.</p>
<h4 id="heading-3-blogger-agent">3. Blogger Agent</h4>
<p>The Blogger Agent creates a blog post based on the research report. It uses the language model to generate a well-structured article:</p>
<pre><code class="lang-python"><span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_completion</span>(<span class="hljs-params">role, task, content</span>):</span>
    <span class="hljs-string">"""Generate a completion using OpenAI."""</span>
    response = client.chat.completions.create(
        model=<span class="hljs-string">"gpt-4o-mini"</span>,
        messages=[
            {<span class="hljs-string">"role"</span>: <span class="hljs-string">"system"</span>, <span class="hljs-string">"content"</span>: <span class="hljs-string">f"You are a <span class="hljs-subst">{role}</span>. <span class="hljs-subst">{task}</span>"</span>},
            {<span class="hljs-string">"role"</span>: <span class="hljs-string">"user"</span>, <span class="hljs-string">"content"</span>: content}
        ]
    )
    <span class="hljs-keyword">return</span> response.choices[<span class="hljs-number">0</span>].message.content

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">handoff_to_blogger</span>():</span>
    <span class="hljs-string">"""Hand off the research report to the blogger agent."""</span>
    print(<span class="hljs-string">"Handing off to Blogger Agent"</span>)
    <span class="hljs-keyword">return</span> blogger_agent

<span class="hljs-function"><span class="hljs-keyword">def</span> <span class="hljs-title">generate_blog_content</span>(<span class="hljs-params">research_data</span>):</span>
    <span class="hljs-string">"""Generate technical blog content on research report using OpenAI."""</span>
    content = generate_completion(
        <span class="hljs-string">"Technical Content Creator"</span>,
        <span class="hljs-string">"Create compelling technical content for a blog based on the following research report."</span>,
        research_data
    )
    <span class="hljs-keyword">return</span> {<span class="hljs-string">"content"</span>: content}


blogger_agent = Agent(
    name=<span class="hljs-string">"Blogger Agent"</span>,
    model=<span class="hljs-string">"gpt-4o-mini"</span>,
    instructions=<span class="hljs-string">"You are a top technical blogger agent specialized in creating compelling technical content for blogs based on research report. Be concise."</span>,
    functions=[generate_blog_content],
)
</code></pre>
<h4 id="heading-4-interface-agent">4. Interface Agent</h4>
<p>The Interface Agent communicates with the user, refines the query, and then initiates the workflow by handing control over to the Researcher Agent:</p>
<pre><code class="lang-python">user_interface_agent = Agent(
    name=<span class="hljs-string">"User Interface Agent"</span>,
    model=<span class="hljs-string">"gpt-4o-mini"</span>,
    instructions=<span class="hljs-string">"You are a user interface agent that handles all interactions with the user. You need to always start with a theme or topic that the user wants to research. Ask clarification questions if needed. Be concise."</span>,
    functions=[handoff_to_researcher],
)
</code></pre>
<h3 id="heading-running-the-system">Running the System</h3>
<p>With the agents defined, you can now run the system:</p>
<ol>
<li><p>Install the necessary packages:</p>
<pre><code class="lang-python"> ! pip install git+https://github.com/openai/swarm.git openai gpt-researcher nest_asyncio
</code></pre>
</li>
<li><p>Initialize the agents and run the user input loop:</p>
<pre><code class="lang-python"> import os
 from google.colab import userdata
 from openai import OpenAI
 from swarm import Agent
 from swarm.repl import run_demo_loop

 os.environ['OPENAI_API_KEY'] = userdata.get("OPENAI_API_KEY")
 os.environ['TAVILY_API_KEY'] = userdata.get("TAVILY_KEY")
 FAST_LLM="openai:gpt-4o-mini"
 os.environ['FAST_LLM'] = FAST_LLM
 os.environ['SMART_LLM'] = FAST_LLM
</code></pre>
<pre><code class="lang-python"> <span class="hljs-keyword">if</span> __name__ == <span class="hljs-string">"__main__"</span>:
     <span class="hljs-comment"># Run the demo loop with the user interface agent</span>
     run_demo_loop(user_interface_agent, stream=<span class="hljs-literal">True</span>)
</code></pre>
</li>
</ol>
<p>For the theme "Top 5 Technical Skills for 2025," the system will generate a structured blog post by conducting research and formatting the findings into readable content.</p>
<h2 id="heading-example-output-blog-post-generated-by-the-agent">Example Output: Blog Post Generated by the Agent</h2>
<p>Here's an example of what the generated blog post might look like:</p>
<hr />
<h2 id="heading-essential-skills-for-software-developers-staying-competitive-in-2025"><em>Essential Skills for Software Developers: Staying Competitive in 2025</em></h2>
<p><em>As we approach 2025, the pace of technological advancement presents both opportunities and challenges for software developers. The landscape is evolving with new innovations in artificial intelligence, cloud computing, and data analytics, among others. To remain competitive and relevant in the job market, developers must cultivate certain technical skills that are poised to dominate the industry. Here’s a detailed look at the top five technical skills every software developer should focus on in the coming years.</em></p>
<h3 id="heading-1-mastering-artificial-intelligence-and-machine-learning"><em>1. Mastering Artificial Intelligence and Machine Learning</em></h3>
<p><em>Artificial Intelligence (AI) and Machine Learning (ML) are no longer just buzzwords; they are core components of modern software applications. Industries from healthcare to finance are leveraging AI to automate processes, analyze data, and enhance customer experiences.</em></p>
<p><strong><em>What to Learn:</em></strong></p>
<ul>
<li><p><strong><em>Programming languages:</em></strong> <em>Python is the go-to language for AI and ML, alongside familiarity with frameworks like TensorFlow and PyTorch.</em></p>
</li>
<li><p><strong><em>Key areas:</em></strong> <em>Focus on natural language processing (NLP) and computer vision, as they will continue to grow in demand.</em></p>
</li>
<li><p><strong><em>Real-world application:</em></strong> <em>Engage in projects that allow you to implement algorithms and create systems that learn and adapt based on data.</em></p>
</li>
</ul>
<h3 id="heading-2-embracing-cloud-computing-and-serverless-architecture"><em>2. Embracing Cloud Computing and Serverless Architecture</em></h3>
<p><em>The migration to cloud computing is reshaping how businesses operate. Proficiency in cloud platforms such as Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform is becoming a mandatory skill for developers. Furthermore, an understanding of serverless architecture, which eliminates the need to manage servers, is a differentiator that can elevate your career.</em></p>
<p><strong><em>What to Learn:</em></strong></p>
<ul>
<li><p><strong><em>Cloud platforms:</em></strong> <em>Gain hands-on experience with major cloud services and their offerings.</em></p>
</li>
<li><p><strong><em>Serverless frameworks:</em></strong> <em>Explore platforms like AWS Lambda to understand how to deploy applications more swiftly.</em></p>
</li>
<li><p><strong><em>Collaboration:</em></strong> <em>Familiarize yourself with DevOps practices to enhance collaboration in software development and operations.</em></p>
</li>
</ul>
<h3 id="heading-3-building-cybersecurity-fundamentals"><em>3. Building Cybersecurity Fundamentals</em></h3>
<p><em>With the surge in cyber threats, understanding cybersecurity is critical for developers. Knowledge of secure coding practices, data encryption, and vulnerability assessment is essential for protecting applications and sensitive user information.</em></p>
<p><strong><em>What to Learn:</em></strong></p>
<ul>
<li><p><strong><em>Security principles:</em></strong> <em>Study secure coding standards and tools for threat modeling and incident response.</em></p>
</li>
<li><p><strong><em>Sandbox testing:</em></strong> <em>Experiment with penetration testing frameworks to gain practical knowledge.</em></p>
</li>
<li><p><strong><em>Developing secure code:</em></strong> <em>Integrate security considerations into your day-to-day coding practices and workflow.</em></p>
</li>
</ul>
<h3 id="heading-4-capitalizing-on-low-code-and-no-code-development"><em>4. Capitalizing on Low-Code and No-Code Development</em></h3>
<p><em>Low-code and no-code platforms are transforming application development by making it accessible to non-developers. These platforms enable rapid application development, which can markedly accelerate a company’s digital transformation efforts.</em></p>
<p><strong><em>What to Learn:</em></strong></p>
<ul>
<li><p><strong><em>Tools and platforms:</em></strong> <em>Explore tools like OutSystems, Mendix, or Bubble to understand how they work and their capabilities.</em></p>
</li>
<li><p><strong><em>Integration skills:</em></strong> <em>Learn how to integrate low-code solutions with traditional back-end systems.</em></p>
</li>
<li><p><strong><em>Process optimization:</em></strong> <em>Understand how these platforms can streamline workflows and provide rapid iterations.</em></p>
</li>
</ul>
<h3 id="heading-5-diving-into-data-science-and-analytics"><em>5. Diving Into Data Science and Analytics</em></h3>
<p><em>In our data-centric world, the ability to analyze and interpret data is invaluable. Understanding data science principles can empower developers to turn insights into actionable strategies.</em></p>
<p><strong><em>What to Learn:</em></strong></p>
<ul>
<li><p><strong><em>Data tools and languages:</em></strong> <em>Gain proficiency in SQL, R, and Python for data manipulation and visualization.</em></p>
</li>
<li><p><strong><em>Statistical analysis:</em></strong> <em>Understand key concepts of statistics and machine learning to derive meaningful insights from data.</em></p>
</li>
<li><p><strong><em>Predictive analysis:</em></strong> <em>Use machine learning algorithms to forecast trends and contribute to data-driven decision-making processes.</em></p>
</li>
</ul>
<h2 id="heading-conclusion"><em>Conclusion</em></h2>
<p><em>As technology continues to change at a rapid pace, developers must engage in continuous learning to stay up to date. The top five skills outlined above—AI and ML, cloud computing and serverless architecture, cybersecurity fundamentals, low-code/no-code development, and data science—are critical for any developer aiming to thrive in the industry by 2025.</em></p>
<p><em>Investing in these skills will not only enhance your employability but also equip you to contribute effectively to innovative projects that shape the technology landscape. Embrace this opportunity to upgrade your expertise and solidify your place in the future of software development.</em></p>
<h2 id="heading-references"><em>References</em></h2>
<ul>
<li><p><em>Hadalgi, N. (2024). The Most In-Demand Programming Skills for 2025: Staying Ahead in a Rapidly Evolving Tech Landscape. LinkedIn.</em></p>
</li>
<li><p><em>Teal HQ. (2024). Top Skills for Software Developers in 2024 (+Most Underrated Skills). Teal HQ.</em></p>
</li>
<li><p><em>Educative. (2023). Top Software Developer Skills To Learn in 2024. Educative.</em></p>
</li>
</ul>
<hr />
<h2 id="heading-conclusion-1">Conclusion</h2>
<p>The Swarm framework simplifies the creation of complex multi-agent systems, allowing developers to break down intricate tasks into manageable components.</p>
<p>By combining specialized agents for tasks like research and content creation, developers can automate time-consuming processes, such as generating technical blog posts from user queries.</p>
<p>With this guide, you now have a foundational understanding of Swarm’s capabilities. The next step is to explore more advanced use cases, such as orchestrating agents for more complex workflows.</p>
<p>Stay tuned for future posts where we dive deeper into Swarm and its applications in AI development!</p>
]]></content:encoded></item><item><title><![CDATA[Parent Document Retrieval: Balancing Detail and Context for Complex Queries]]></title><description><![CDATA[Why Use Parent Document Retrieval?
Traditional RAG methods can struggle with intricate questions due to their reliance on smaller text segments that may not encapsulate the broader themes or details of the original documents.
When building a RAG-base...]]></description><link>https://zahere.com/parent-document-retrieval-balancing-detail-and-context-for-complex-queries</link><guid isPermaLink="true">https://zahere.com/parent-document-retrieval-balancing-detail-and-context-for-complex-queries</guid><category><![CDATA[RAG ]]></category><category><![CDATA[Retrieval-Augmented Generation]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[Artificial Intelligence]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Sun, 06 Oct 2024 13:47:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1728222317327/6a8c7f84-3ed5-4548-9a05-c0deab279ce0.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-why-use-parent-document-retrieval"><strong>Why Use Parent Document Retrieval?</strong></h2>
<p>Traditional RAG methods can struggle with intricate questions due to their reliance on smaller text segments that may not encapsulate the broader themes or details of the original documents.</p>
<p>When building a RAG-based solution, answer the following questions before creating the indexing pipeline.</p>
<p><strong>What type of queries will the system handle?</strong></p>
<ul>
<li>Are the queries typically seeking specific details, or do they require a broader contextual understanding?</li>
</ul>
<p><strong>How important is precision versus context in the system’s responses?</strong></p>
<ul>
<li>Should the system prioritize precise answers to detailed questions (favoring smaller chunks), or should it provide more comprehensive responses even if they are less precise (favoring larger chunks)?</li>
</ul>
<p><strong>How much detail or noise is acceptable in the retrieved results?</strong></p>
<ul>
<li>Will smaller chunks provide too little context, or will larger chunks introduce too much irrelevant information?</li>
</ul>
<p><strong>Can the user’s query context vary significantly?</strong></p>
<ul>
<li>If the user queries tend to be more context-dependent, would using larger chunks improve understanding, or could the system risk missing key details?</li>
</ul>
<p><strong>What is the nature of the content being used for retrieval?</strong></p>
<ul>
<li>Does the content lend itself better to smaller chunks (e.g., factual, concise information), or does it require larger chunks to capture essential relationships and context (e.g., narrative or complex data)?</li>
</ul>
<h2 id="heading-what-is-parent-document-retrieval-pdr">What is Parent Document Retrieval (PDR)?</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1728221980279/024ab00d-5b50-4218-a9c5-275208906c95.png" alt class="image--center mx-auto" /></p>
<p>Parent Document Retrieval (PDR) is a technique used in Retrieval-Augmented Generation (RAG) systems that enhances retrieval by fetching the full parent documents of matched chunks to augment the LLM's generation.</p>
<p>This method addresses the limitations of standard RAG approaches that often rely on smaller text chunks, which may lack the necessary context for complex queries.</p>
<p>By retrieving complete parent documents, PDR allows for a more comprehensive understanding of the material, leading to richer and more informative responses, particularly for nuanced inquiries.</p>
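<p>To make the mechanics concrete, here is a toy sketch of the idea in plain Python. Naive substring matching stands in for embedding search, and every name below is illustrative rather than any library’s API:</p>

```python
# Toy illustration of Parent Document Retrieval (PDR):
# small child chunks are indexed for search, but retrieval
# returns the full parent document they came from.

def split(text: str, size: int) -> list[str]:
    """Split text into fixed-size child chunks."""
    return [text[i:i + size] for i in range(0, len(text), size)]

# Two "parent documents"
parents = {
    "doc1": "Lung cancer symptoms include a cough that does not go away and chest pain.",
    "doc2": "Smoking is the leading risk factor for lung cancer worldwide.",
}

# Index each child chunk together with its parent's id
child_index = []
for pid, text in parents.items():
    for chunk in split(text, 30):
        child_index.append((chunk, pid))

def retrieve_parent(query: str) -> str:
    """Score child chunks by naive substring overlap, return the whole parent."""
    words = query.lower().split()
    best_chunk, best_pid = max(
        child_index,
        key=lambda item: sum(w in item[0].lower() for w in words),
    )
    return parents[best_pid]

print(retrieve_parent("risk factor smoking"))
```

The search hits a small child chunk, but the caller receives the entire parent document, which is exactly the trade PDR makes: precise matching on small units, rich context in the result.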
<h2 id="heading-where-is-parent-document-retrieval-applied">Where is Parent Document Retrieval Applied?</h2>
<p>PDR is applicable in various domains where context-rich responses are essential. Some common applications include:</p>
<ul>
<li><p>Customer Support: Enhancing automated systems to provide detailed responses based on comprehensive product documentation.</p>
</li>
<li><p>Legal and Compliance: Assisting in retrieving relevant legal documents or regulations that require in-depth understanding.</p>
</li>
<li><p>Research and Academia: Facilitating access to full research papers or articles when specific sections are referenced.</p>
</li>
<li><p>Content Generation: Improving the quality of content produced by language models by providing them with extensive background information.</p>
</li>
</ul>
<h2 id="heading-when-should-parent-document-retrieval-be-used">When Should Parent Document Retrieval be Used?</h2>
<p>PDR should be employed particularly in scenarios where:</p>
<ul>
<li><p>The queries are complex or multifaceted, requiring detailed context.</p>
</li>
<li><p>The available data consists of lengthy documents that need to be segmented for better comprehension.</p>
</li>
<li><p>There is a need to ensure high accuracy and relevance in responses generated by language models.</p>
</li>
<li><p>Users seek comprehensive answers rather than brief snippets of information.</p>
</li>
</ul>
<p>Using PDR can significantly enhance the performance of RAG systems in these situations.</p>
<h2 id="heading-who-benefits-from-parent-document-retrieval">Who Benefits from Parent Document Retrieval?</h2>
<p>Various stakeholders can benefit from PDR, including:</p>
<ul>
<li><p>Developers and Data Scientists: Those working on RAG systems can leverage PDR to improve model performance and user satisfaction.</p>
</li>
<li><p>Businesses: Organizations seeking efficient customer support solutions can enhance their automated systems with PDR.</p>
</li>
<li><p>Researchers and Academics: Individuals needing thorough literature reviews or data analysis can utilize PDR for more effective information retrieval.</p>
</li>
<li><p>End Users: Anyone seeking detailed and contextually rich information will benefit from systems employing PDR.</p>
</li>
</ul>
<hr />
<iframe src="https://newsletter.adaptiveengineer.com/embed" width="480" height="320" style="border:1px solid #EEE;background:white;justify-content:center"></iframe>

<hr />
<h2 id="heading-how-to-implement-pdr-using-langchain">How to implement PDR using LangChain?</h2>
<p>You can go through the entire code in this notebook. Sharing some important snippets below.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain.schema <span class="hljs-keyword">import</span> Document
<span class="hljs-keyword">from</span> langchain.vectorstores <span class="hljs-keyword">import</span> Chroma
<span class="hljs-keyword">from</span> langchain.retrievers <span class="hljs-keyword">import</span> ParentDocumentRetriever
<span class="hljs-keyword">from</span> langchain.chains <span class="hljs-keyword">import</span> RetrievalQA
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> OpenAI
<span class="hljs-keyword">from</span> langchain.text_splitter <span class="hljs-keyword">import</span> RecursiveCharacterTextSplitter
<span class="hljs-keyword">from</span> langchain.storage <span class="hljs-keyword">import</span> InMemoryStore
<span class="hljs-keyword">from</span> langchain.document_loaders <span class="hljs-keyword">import</span> TextLoader,WebBaseLoader
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> OpenAIEmbeddings
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> ChatOpenAI
<span class="hljs-keyword">from</span> langchain_core.output_parsers <span class="hljs-keyword">import</span> StrOutputParser
<span class="hljs-keyword">from</span> langchain_core.prompts <span class="hljs-keyword">import</span> ChatPromptTemplate
<span class="hljs-keyword">import</span> os
<span class="hljs-keyword">import</span> tiktoken
<span class="hljs-keyword">from</span> google.colab <span class="hljs-keyword">import</span> userdata

  <span class="hljs-comment"># Loading a single website</span>
loaders = [
    WebBaseLoader(<span class="hljs-string">"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864624/"</span>),
    WebBaseLoader(<span class="hljs-string">"https://www.mayoclinic.org/diseases-conditions/lung-cancer/symptoms-causes/syc-20374620"</span>)
]

docs = []
<span class="hljs-keyword">for</span> loader <span class="hljs-keyword">in</span> loaders:
    token_count = num_tokens_from_string(str(loader.load()),<span class="hljs-string">"cl100k_base"</span>)
    print(<span class="hljs-string">f"Tokens for <span class="hljs-subst">{loader.web_path}</span>: <span class="hljs-subst">{token_count}</span>"</span>)
    docs.extend(loader.load())
</code></pre>
<p>Output below</p>
<pre><code class="lang-plaintext">Tokens for https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3864624/: 43869
Tokens for https://www.mayoclinic.org/diseases-conditions/lung-cancer/symptoms-causes/syc-20374620: 4997
</code></pre>
<p>Create a parent splitter and a child splitter using the RecursiveCharacterTextSplitter module in LangChain.</p>
<p>Use Chroma DB as the vector store and LangChain’s InMemoryStore to store the large parent docs or chunks.</p>
<pre><code class="lang-python"><span class="hljs-comment"># This text splitter is used to create the parent documents</span>
parent_splitter = RecursiveCharacterTextSplitter(chunk_size=<span class="hljs-number">2000</span>)
<span class="hljs-comment"># This text splitter is used to create the child documents</span>
<span class="hljs-comment"># It should create documents smaller than the parent</span>
child_splitter = RecursiveCharacterTextSplitter(chunk_size=<span class="hljs-number">400</span>)
<span class="hljs-comment"># The vectorstore to use to index the child chunks</span>
vectorstore = Chroma(
    collection_name=<span class="hljs-string">"split_parents"</span>, embedding_function=OpenAIEmbeddings(model=<span class="hljs-string">"text-embedding-3-small"</span>)
)
<span class="hljs-comment"># The storage layer for the parent documents</span>
store = InMemoryStore()
</code></pre>
<p>Instantiate the ParentDocumentRetriever class using the above as parameters.  </p>
<pre><code class="lang-python">retriever = ParentDocumentRetriever(
    vectorstore=vectorstore,
    docstore=store,
    child_splitter=child_splitter,
    parent_splitter=parent_splitter,
)
</code></pre>
<p>Add the docs we created using the WebBaseLoader to the retriever object.</p>
<pre><code class="lang-python">retriever.add_documents(docs)
</code></pre>
<p>Test the vector store with a query.</p>
<pre><code class="lang-python">sub_docs = vectorstore.similarity_search(<span class="hljs-string">"What are the symptoms of Lung Cancer?"</span>)
print(sub_docs[<span class="hljs-number">0</span>].page_content)
</code></pre>
<p>The response.</p>
<pre><code class="lang-plaintext">SymptomsLung cancer typically doesn't cause symptoms early on. Symptoms of lung cancer usually happen when the disease is advanced.
Signs and symptoms of lung cancer that happen in and around the lungs may include:

A new cough that doesn't go away.
Chest pain.
Coughing up blood, even a small amount.
Hoarseness.
Shortness of breath.
Wheezing.
</code></pre>
<p>Let’s see the parent doc that is returned for the same query using the retriever object.</p>
<pre><code class="lang-python">retrieved_docs = retriever.invoke(<span class="hljs-string">"What are the symptoms of Lung Cancer?"</span>)
print(retrieved_docs[<span class="hljs-number">0</span>].page_content)
</code></pre>
<p>Response.</p>
<pre><code class="lang-plaintext">Patient Care &amp; Health Information
Diseases &amp; Conditions


Lung cancer

Request an Appointment
Symptoms &amp;causesDiagnosis &amp;treatmentDoctors &amp;departmentsCare atMayo Clinic






Print




Overview




        Lung cancer
        Enlarge image









Close



Lung cancer


Lung cancer
Lung cancer begins in the cells of the lungs.





Lung cancer is a kind of cancer that starts as a growth of cells in the lungs. The lungs are two spongy organs in the chest that control breathing.
Lung cancer is the leading cause of cancer deaths worldwide.
People who smoke have the greatest risk of lung cancer. The risk of lung cancer increases with the length of time and number of cigarettes smoked. Quitting smoking, even after smoking for many years, significantly lowers the chances of developing lung cancer. Lung cancer also can happen in people who have never smoked.Products &amp; ServicesA Book: Mayo Clinic Family Health BookNewsletter: Mayo Clinic Health Letter — Digital EditionShow more products from Mayo Clinic





SymptomsLung cancer typically doesn't cause symptoms early on. Symptoms of lung cancer usually happen when the disease is advanced.
Signs and symptoms of lung cancer that happen in and around the lungs may include:

A new cough that doesn't go away.
Chest pain.
Coughing up blood, even a small amount.
Hoarseness.
Shortness of breath.
Wheezing.

Signs and symptoms that happen when lung cancer spreads to other parts of the body may include:

Bone pain.
Headache.
Losing weight without trying.
Loss of appetite.
Swelling in the face or neck.

When to see a doctorMake an appointment with your doctor or other healthcare professional if you have any symptoms that worry you.
If you smoke and haven't been able to quit, make an appointment. Your healthcare professional can recommend strategies for quitting smoking. These may include counseling, medicines and nicotine replacement products.
Request an appointment
1925
</code></pre>
<p>Let’s create a chain using an LLM to generate responses using the query and the context from the retriever.</p>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> langchain_core.runnables <span class="hljs-keyword">import</span> RunnablePassthrough

<span class="hljs-comment"># Prompt template</span>
template = <span class="hljs-string">"""Answer the question based only on the following context:
{context}
Question: {question}
"""</span>
prompt = ChatPromptTemplate.from_template(template)

<span class="hljs-comment"># LLM</span>
model = ChatOpenAI(temperature=<span class="hljs-number">0</span>, model=<span class="hljs-string">"gpt-4o-mini"</span>)

<span class="hljs-comment"># RAG pipeline</span>
chain = (
    {<span class="hljs-string">"context"</span>: retriever, <span class="hljs-string">"question"</span>: RunnablePassthrough()}
    | prompt
    | model
    | StrOutputParser()
)

chain.invoke(<span class="hljs-string">"How are smoking and Lung Cancer related?"</span>)
</code></pre>
<p>Response</p>
<pre><code class="lang-plaintext">Smoking is the primary cause of most lung cancers.
It introduces cancer-causing substances, known as carcinogens, into the lungs, which damage the cells that line the lung tissue.
This damage can lead to changes in the cells' DNA, causing them to grow and multiply uncontrollably, ultimately resulting in cancer.
Additionally, smoking can also affect non-smokers through secondhand smoke exposure. While lung cancer can occur in individuals who have never smoked, the exact causes in these cases may not be clear. Overall, smoking significantly increases the risk of developing lung cancer.
</code></pre>
<p>As you can see, the response is neat and well grounded in the retrieved context.</p>
<p>Let me know in the comments how your experiments went.</p>
<hr />
<iframe src="https://newsletter.adaptiveengineer.com/embed" width="480" height="320" style="border:1px solid #EEE;background:white;justify-content:center"></iframe>]]></content:encoded></item><item><title><![CDATA[The Untold Story of the Engineer Who Saved React.JS]]></title><description><![CDATA[This recent social media post by Rahul Pandey, co-founder of Taro Community and ex-Meta Staff Engineer, really caught my attention.
In it, Rahul shares a story about an engineer from Khan Academy who, back in 2013, got hired by Facebook because of he...]]></description><link>https://zahere.com/the-untold-story-of-the-engineer-who-saved-reactjs</link><guid isPermaLink="true">https://zahere.com/the-untold-story-of-the-engineer-who-saved-reactjs</guid><category><![CDATA[Open Source]]></category><category><![CDATA[stories]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Sat, 28 Sep 2024 19:08:06 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727549767658/f367ec6d-183f-4460-bae3-38097b03a29f.jpeg" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727548509894/e4e1539c-0d22-4bdd-9810-af5780ee545f.png" alt class="image--center mx-auto" /></p>
<p>This recent social media post by Rahul Pandey, co-founder of Taro Community and ex-Meta Staff Engineer, really caught my attention.</p>
<p>In it, Rahul shares a story about an engineer from Khan Academy who, back in 2013, got hired by Facebook because of her open-source contributions to React. Naturally, I was curious and dove into research—and what I found was mind-blowing.</p>
<p>This engineer didn’t just contribute to open source; she actually <strong>saved ReactJS</strong>. Let me tell you the story of <strong>Sophie Alpert</strong>.</p>
<p><strong>Sophie Alpert’s journey</strong> in tech is the stuff of movies—an inspiring narrative filled with passion, innovation, and leadership.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727548553175/0fc1a735-eb51-43e5-a1a5-6e5de5200f31.png" alt class="image--center mx-auto" /></p>
<p>When we think about engineers reaching the heights of people like Zuckerberg or Musk, the odds are staggering—maybe one in a billion.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727548445404/ddf0f3e5-b4d1-42a4-8b3a-713b4c59a327.png" alt class="image--center mx-auto" /></p>
<p>But Sophie's path shows us what every engineer truly aspires to: not just a successful career at top companies, but real, lasting contributions to the community.</p>
<hr />
<p>If you prefer watching a video:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=VeT29QPYyYg">https://www.youtube.com/watch?v=VeT29QPYyYg</a></div>
<p> </p>
<hr />
<h3 id="heading-early-life-and-education"><strong>Early Life and Education</strong></h3>
<p>Sophie’s story begins in Colorado. Her parents, both familiar with coding, nurtured her love for technology from an early age. While other kids were reading comic books or novels, Sophie was reading computer manuals for fun. That’s how deep her passion ran.</p>
<p>By middle and high school, she was creating websites using Dreamweaver, coding personal projects, and even doing freelance work. Her skills were evolving fast.</p>
<p>She later went on to study computer science at <strong>Carnegie Mellon University</strong>, but the traditional classroom wasn’t cutting it for her. Sophie craved hands-on experience. So, after a summer internship at <strong>Khan Academy</strong>, where she developed interactive math tools, she made a bold decision—she dropped out of college to work there full-time.</p>
<h3 id="heading-career-milestones"><strong>Career Milestones</strong></h3>
<p>At <strong>Khan Academy</strong>, Sophie’s contributions were pivotal. She helped develop educational tools that transformed user experiences, reinforcing her belief that practical applications beat out theoretical knowledge any day.</p>
<p>But it was her <strong>transition to the React core team</strong> at Facebook that marked a defining moment in her career. Initially, Sophie was just an enthusiastic open-source contributor, but she didn’t stop there. Over time, her work was so impactful that she was invited to <strong>lead</strong> the React team at Facebook.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727549464687/cc2b1bfc-aafc-4340-970a-1ded7b7472b4.png" alt class="image--center mx-auto" /></p>
<p>Her focus? Making React more efficient and accessible for developers.</p>
<p>There’s a fantastic documentary by the YouTube channel <strong>HoneyPot</strong> that goes into this transition in detail, and it highlights just how instrumental Sophie’s contributions were.</p>
<p>Her work with React was so good, in fact, that Facebook couldn’t resist hiring her.</p>
<p>And as they say—the rest is history.</p>
<p>Today, React is <strong>the most popular web framework</strong> in the world, and its influence only continues to grow. Sure, open-source is a community effort, but if it weren’t for Sophie’s contributions and her fresh perspective back in 2013, React could’ve easily been just another failed tech experiment.</p>
<h3 id="heading-insights-and-philosophy"><strong>Insights and Philosophy</strong></h3>
<p>Throughout her career, Sophie has been vocal about the importance of side projects. To her, these unstructured projects are the ultimate test of creativity and problem-solving. They’re where real passion shines, and they often lead to breakthroughs that traditional work can’t always offer.</p>
<hr />
<p><strong>Sophie Alpert’s story</strong> is a testament to following your passion, breaking norms, and carving your own path in tech. She shows us that real-world experience can often lead to industry-changing contributions—no degree required.</p>
]]></content:encoded></item><item><title><![CDATA[Unlocking Deep Context: Why You Should Try Multi-Representation Indexing]]></title><description><![CDATA[Video
https://www.youtube.com/watch?v=v09v327xIdE
 
Picture this: AI, not just searching for keywords, but truly grasping the intent behind your query. An AI that doesn’t just retrieve what you asked for, but what you need to know. 
This is the third...]]></description><link>https://zahere.com/unlocking-deep-context-why-you-should-try-multi-representation-indexing</link><guid isPermaLink="true">https://zahere.com/unlocking-deep-context-why-you-should-try-multi-representation-indexing</guid><category><![CDATA[RAG ]]></category><category><![CDATA[advanced rag]]></category><category><![CDATA[indexing]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[Retrieval-Augmented Generation]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 23 Sep 2024 09:17:05 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1727081268825/9730e0f7-0c2e-4e63-907f-ab4009804c00.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-video">Video</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=v09v327xIdE">https://www.youtube.com/watch?v=v09v327xIdE</a></div>
<p> </p>
<p>Picture this: AI, not just searching for keywords, but truly grasping the intent behind your query. An AI that doesn’t just retrieve what you <em>asked</em> for, but what you <em>need</em> to know. </p>
<p>This is the third article in the <a target="_blank" href="https://zahere.com/series/ai">Advanced RAG series</a>, where we are following the journey of TechnoHealth Solution and how they implement RAG for their several use cases.</p>
<p>Today, we’re unlocking one of the most efficient indexing techniques — multi-representation indexing.</p>
<h2 id="heading-what-is-multi-representation-indexing">What is Multi-Representation Indexing?</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727081442960/ad0d6a5a-d68d-4221-8815-e660d406d145.png" alt class="image--center mx-auto" /></p>
<p>Based on the paper <em>Dense X Retrieval</em> (also known as proposition-based retrieval), multi-representation indexing involves creating and storing multiple representations of each document within the retrieval system.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727081459617/6721d374-9918-4ed0-a168-1310e3db6631.png" alt class="image--center mx-auto" /></p>
<p>Representations here could mean traditional keyword analysis, deep semantic understanding or summary, and even visual elements like images or diagrams of the documents.</p>
<h2 id="heading-why-is-multi-representation-indexing-used">Why is Multi-Representation Indexing Used?</h2>
<p>Multi-representation indexing improves accuracy, adapts to different types of documents, and is flexible enough to handle complex information, like research papers, code, or even e-commerce product listings.</p>
<p>The primary motivations for employing multi-representation indexing include:</p>
<ul>
<li><p>Improved Retrieval Accuracy</p>
</li>
<li><p>Contextual Understanding</p>
</li>
<li><p>Flexibility for Document Types</p>
</li>
<li><p>Handling Complex Information</p>
</li>
</ul>
<h2 id="heading-when-is-multi-representation-indexing-used">When is Multi-Representation Indexing Used?</h2>
<p>Multi-representation indexing is particularly beneficial in scenarios where:</p>
<ul>
<li><p>Complex Queries Are Common: Users often ask nuanced questions requiring understanding beyond simple keyword matching.</p>
</li>
<li><p>Diverse Document Formats Are Involved: Environments where documents come in various formats and types, such as academic research or product catalogs.</p>
</li>
<li><p>Applications Require Enhanced Semantic Understanding: For applications benefiting from deeper semantic insights, such as conversational agents or advanced search engines.</p>
</li>
</ul>
<h2 id="heading-where-is-multi-representation-indexing-used"><strong>Where is Multi-Representation Indexing Used?</strong></h2>
<p>This advanced indexing technique is applied across various domains to improve the relevance and depth of information retrieval, mainly in:</p>
<ol>
<li><p><strong>Search Engines and Information Retrieval Systems</strong></p>
</li>
<li><p><strong>Conversational AI and Chatbots</strong></p>
</li>
</ol>
<h2 id="heading-who-benefits-from-multi-representation-indexing">Who Benefits from Multi-Representation Indexing?</h2>
<ul>
<li><strong>End-users</strong> interacting with AI systems, such as search engines, chatbots, or recommendation engines, benefit from more accurate and contextually relevant information. This extends to the professionals in various fields who rely on those systems.</li>
</ul>
<hr />
<iframe src="https://newsletter.adaptiveengineer.com/embed" width="480" height="320" style="border:1px solid #EEE;background:white;justify-content:center"></iframe>

<hr />
<h2 id="heading-how-multi-representation-indexing-works">How Multi-Representation Indexing Works</h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1727081696653/f4205364-7fbd-4696-9938-0f8ef244b4a6.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-implementation-steps">Implementation Steps</h3>
<ol>
<li><p><strong>Document Loading and Chunking</strong></p>
<ul>
<li>Documents are loaded from sources such as web pages and split into manageable chunks using techniques like recursive character splitting.</li>
</ul>
</li>
<li><p><strong>Summarization</strong></p>
<ul>
<li>A Language Model (LLM) chain is used to generate concise summaries of each chunk, preserving essential details crucial for later retrieval.</li>
</ul>
</li>
<li><p><strong>Setting Up the Multi-Vector Retriever</strong></p>
<ul>
<li>A <code>MultiVectorRetriever</code> instance is initialized, linking the vector store (optimized summaries) and document store (original chunks) using unique keys.</li>
</ul>
</li>
<li><p><strong>Adding Documents</strong></p>
<ul>
<li>Optimized summaries and original chunks are added to their respective stores, ensuring they are linked correctly for retrieval.</li>
</ul>
</li>
<li><p><strong>Query and Retrieval</strong></p>
<ul>
<li>Queries are processed using the <code>MultiVectorRetriever</code>, leveraging the optimized summaries for efficient document retrieval based on relevance.</li>
</ul>
</li>
</ol>
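<p>Before the LangChain snippets, the flow above can be sketched without any framework. In this toy stand-in, word overlap replaces embedding search, and plain dicts and lists replace the vector store and docstore; all names are illustrative:</p>

```python
import uuid

# Toy multi-representation index: search runs over short summaries,
# but retrieval returns the full original chunk linked via doc_id.

chunks = [
    "Full text of section one, covering immunotherapy trial design in detail...",
    "Full text of section two, with tables of survival rates by PD-L1 level...",
]
summaries = [
    "Summary: immunotherapy trial design",
    "Summary: survival rates by PD-L1 expression",
]

docstore = {}       # doc_id -> full chunk (stand-in for InMemoryStore)
summary_index = []  # (summary, doc_id) pairs (stand-in for the vector store)

for chunk, summary in zip(chunks, summaries):
    doc_id = str(uuid.uuid4())
    docstore[doc_id] = chunk
    summary_index.append((summary, doc_id))

def retrieve(query: str) -> str:
    """Score summaries by word overlap, then return the linked full chunk."""
    words = set(query.lower().split())
    _, best_id = max(
        summary_index,
        key=lambda pair: len(words & set(pair[0].lower().split())),
    )
    return docstore[best_id]

print(retrieve("survival rates"))
```

The unique key linking the two stores is the whole trick: the compact representation is what gets searched, and the key is what gets you back to the original content.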
<p>I’ll add two key snippets below.</p>
<ol>
<li>We are extracting representations like table data and text from this <a target="_blank" href="https://github.com/zahere-dev/multi-representation-indexing-advanced-rag/blob/main/Immunotherapy_in_Non-Small-Cell_Lung_Cancer.pdf">pdf</a> using the <a target="_blank" href="https://docs.unstructured.io/open-source/core-functionality/partitioning">Unstructured</a> library.</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">from</span> typing <span class="hljs-keyword">import</span> Any

<span class="hljs-keyword">from</span> pydantic <span class="hljs-keyword">import</span> BaseModel
<span class="hljs-keyword">from</span> unstructured.partition.pdf <span class="hljs-keyword">import</span> partition_pdf

<span class="hljs-comment"># Get elements</span>
raw_pdf_elements = partition_pdf(
    filename=path,
    <span class="hljs-comment"># Unstructured first finds embedded image blocks</span>
    extract_images_in_pdf=<span class="hljs-literal">False</span>,
    <span class="hljs-comment"># Use layout model (YOLOX) to get bounding boxes (for tables) and find titles</span>
    <span class="hljs-comment"># Titles are any sub-section of the document</span>
    infer_table_structure=<span class="hljs-literal">True</span>,
    <span class="hljs-comment"># Post processing to aggregate text once we have the title</span>
    chunking_strategy=<span class="hljs-string">"by_title"</span>,
    <span class="hljs-comment"># Chunking params to aggregate text blocks</span>
    <span class="hljs-comment"># Attempt to create a new chunk 3800 chars</span>
    <span class="hljs-comment"># Attempt to keep chunks &gt; 2000 chars</span>
    max_characters=<span class="hljs-number">4000</span>,
    new_after_n_chars=<span class="hljs-number">3800</span>,
    combine_text_under_n_chars=<span class="hljs-number">2000</span>,
    image_output_dir_path=path,
)
</code></pre>
<ol start="2">
<li>Using the MultiVectorRetriever module to leverage two data stores - one for summaries and the other for full docs.</li>
</ol>
<pre><code class="lang-python"><span class="hljs-keyword">import</span> uuid

<span class="hljs-keyword">from</span> langchain.retrievers.multi_vector <span class="hljs-keyword">import</span> MultiVectorRetriever
<span class="hljs-keyword">from</span> langchain.storage <span class="hljs-keyword">import</span> InMemoryStore
<span class="hljs-keyword">from</span> langchain_chroma <span class="hljs-keyword">import</span> Chroma
<span class="hljs-keyword">from</span> langchain_core.documents <span class="hljs-keyword">import</span> Document
<span class="hljs-keyword">from</span> langchain_openai <span class="hljs-keyword">import</span> OpenAIEmbeddings

<span class="hljs-comment"># The vectorstore to use to index the child chunks</span>
vectorstore = Chroma(collection_name=<span class="hljs-string">"summaries"</span>, embedding_function=OpenAIEmbeddings(model=<span class="hljs-string">"text-embedding-3-small"</span>))

<span class="hljs-comment"># The storage layer for the parent documents</span>
store = InMemoryStore()
id_key = <span class="hljs-string">"doc_id"</span>

<span class="hljs-comment"># The retriever (empty to start)</span>
retriever = MultiVectorRetriever(
    vectorstore=vectorstore,
    docstore=store,
    id_key=id_key,
)

<span class="hljs-comment"># Add texts</span>
doc_ids = [str(uuid.uuid4()) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> texts]
summary_texts = [
    Document(page_content=s, metadata={id_key: doc_ids[i]})
    <span class="hljs-keyword">for</span> i, s <span class="hljs-keyword">in</span> enumerate(text_summaries)
]
retriever.vectorstore.add_documents(summary_texts)
retriever.docstore.mset(list(zip(doc_ids, texts)))

<span class="hljs-comment"># Add tables</span>
table_ids = [str(uuid.uuid4()) <span class="hljs-keyword">for</span> _ <span class="hljs-keyword">in</span> tables]
summary_tables = [
    Document(page_content=s, metadata={id_key: table_ids[i]})
    <span class="hljs-keyword">for</span> i, s <span class="hljs-keyword">in</span> enumerate(table_summaries)
]
retriever.vectorstore.add_documents(summary_tables)
retriever.docstore.mset(list(zip(table_ids, tables)))
</code></pre>
<p>We then leverage the retriever in the RAG chain. Invoking the chain with a user query produces the final output below.</p>
<pre><code class="lang-python">chain.invoke(<span class="hljs-string">"What is Dual Immunotherapy without Chemotherapy?"</span>)

Dual Immunotherapy without Chemotherapy refers to a treatment approach <span class="hljs-keyword">for</span> metastatic non-small cell lung cancer (NSCLC) that involves the use of two immune checkpoint inhibitors (ICIs) without the addition of chemotherapy. An example of this <span class="hljs-keyword">is</span> the combination of nivolumab <span class="hljs-keyword">and</span> ipilimumab, which has been FDA-approved <span class="hljs-keyword">for</span> first-line treatment <span class="hljs-keyword">in</span> patients <span class="hljs-keyword">with</span> PD-L1 expression ≥ <span class="hljs-number">1</span>% <span class="hljs-keyword">and</span> without EGFR/ALK alterations. 

In the phase III trial Checkmate <span class="hljs-number">227</span>, this combination demonstrated a positive overall survival (OS) benefit across all PD-L1 expression subgroups, <span class="hljs-keyword">with</span> a <span class="hljs-number">4</span>-year OS rate of <span class="hljs-number">29</span>% <span class="hljs-keyword">for</span> patients receiving nivolumab plus ipilimumab compared to <span class="hljs-number">18</span>% <span class="hljs-keyword">for</span> those receiving chemotherapy. This indicates that dual immunotherapy can provide significant long-term survival benefits <span class="hljs-keyword">for</span> patients <span class="hljs-keyword">with</span> advanced NSCLC.
</code></pre>
<p>For the full code, check out the GitHub repo below.</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://github.com/zahere-dev/multi-representation-indexing-advanced-rag">https://github.com/zahere-dev/multi-representation-indexing-advanced-rag</a></div>
<p> </p>
<hr />
<iframe src="https://newsletter.adaptiveengineer.com/embed" width="480" height="320" style="border:1px solid #EEE;background:white;justify-content:center"></iframe>

<hr />
]]></content:encoded></item><item><title><![CDATA[Book Review Snippets: The Coming Wave - Chapter 1: Containment is Not Possible]]></title><description><![CDATA[In chapter 1, Mustafa's book explores the concept of "waves" in human history, focusing on the transformative impact of technology and the existential risks posed by artificial intelligence and synthetic biology.
Mustafa explains that human history i...]]></description><link>https://zahere.com/book-review-snippets-the-coming-wave-chapter-1-containment-is-not-possible</link><guid isPermaLink="true">https://zahere.com/book-review-snippets-the-coming-wave-chapter-1-containment-is-not-possible</guid><category><![CDATA[Artificial Intelligence]]></category><category><![CDATA[book summary]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Tue, 17 Sep 2024 08:48:56 GMT</pubDate><content:encoded><![CDATA[<p>In chapter 1, Mustafa's book explores the concept of "waves" in human history, focusing on the transformative impact of technology and the existential risks posed by artificial intelligence and synthetic biology.</p>
<p>Mustafa explains that human history is shaped by world-changing events, which he calls waves. These waves can be both physical, like natural disasters, and metaphorical, such as the rise of empires, wars, and religion. Over the past two centuries, technology has emerged as a dominant wave, reshaping the world in unprecedented ways. Mustafa refers to this phenomenon as the rise of <em>Homo Technologicus</em>—humans as technological beings.</p>
<p>He argues that the next major wave will be driven by two core technologies: artificial intelligence and synthetic biology. These technologies, he warns, cannot be fully controlled, and their consequences could be dire.</p>
<p><strong>The Dilemma</strong>: Mustafa highlights the dilemma nations face in the coming wave. Ignoring AI and synthetic biology would mean missing out on their benefits, while pursuing them risks potentially catastrophic consequences. This tension defines the core dilemma: how to balance the need for progress with the existential risks these technologies present. Mustafa believes that the key to navigating this dilemma is containment, though it may not be entirely achievable.</p>
<p><strong>The Trap</strong>: Mustafa points out that many people fall into the trap of "pessimism aversion," an emotional response where they dismiss potential dangers, assuming that humanity will somehow manage. He argues that this optimistic outlook hinders serious consideration of the risks posed by these technologies.</p>
]]></content:encoded></item><item><title><![CDATA[Mastering Chunking for RAG: Semantic vs Recursive vs Fixed Size]]></title><description><![CDATA[Note: The read-time of this article was going beyond 4 minutes, so I am sharing the video instead.  
This is part of the Advanced RAG Series: Part 1
When working with Retrieval Augmented Generation (RAG) models, selecting the right chunking method ca...]]></description><link>https://zahere.com/mastering-chunking-for-rag-semantic-vs-recursive-vs-fixed-size</link><guid isPermaLink="true">https://zahere.com/mastering-chunking-for-rag-semantic-vs-recursive-vs-fixed-size</guid><category><![CDATA[RAG ]]></category><category><![CDATA[generative ai]]></category><category><![CDATA[advanced rag]]></category><category><![CDATA[AI]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 16 Sep 2024 09:07:28 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1726477366647/b0223b8e-9eec-429a-9e23-2e4803112931.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Note: The read-time of this article was going beyond 4 minutes, so I am sharing the video instead.  </p>
<p>This is part of the Advanced RAG Series: <a target="_blank" href="https://zahere.com/rag-explained-how-this-company-implemented-retrieval-augmented-generation">Part 1</a></p>
<p>When working with Retrieval Augmented Generation (RAG) models, selecting the right chunking method can make a huge difference in performance.</p>
<p>In my latest YouTube video, I dive deep into the three main chunking approaches—<strong>Semantic</strong>, <strong>Recursive</strong>, and <strong>Fixed Size</strong>—and evaluate their performance based on four critical metrics: context precision, faithfulness, answer relevancy, and context recall.</p>
<p>The chunking method you choose can impact how accurate and relevant the AI-generated answers are. So, which method strikes the perfect balance between retaining enough context and providing highly relevant, faithful responses?</p>
<p>In the video, I break down:</p>
<ul>
<li><p>How <strong>Semantic Chunking</strong> performed in capturing context but struggled with relevancy.</p>
</li>
<li><p>Why <strong>Recursive Chunking</strong> emerged as a strong contender with high accuracy and relevancy.</p>
</li>
<li><p>The surprising strengths of <strong>Fixed Size Chunking</strong>, especially in context retention.</p>
</li>
</ul>
<p>If you're interested in fine-tuning your RAG models or curious about which chunking method works best, this video is packed with insights that will help you make the right choice. Check out the full breakdown in the embedded video below!</p>
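<p>For readers who prefer code to video, here is a minimal, self-contained sketch of two of the approaches compared above. This is not the exact implementation from the video; the function names, chunk sizes, and separator list are illustrative choices. Fixed-size chunking slices the text into equal overlapping windows, while recursive chunking tries coarse separators first (paragraphs, then sentences, then words) before falling back to a hard split.</p>

```python
# Illustrative sketch (hypothetical names and sizes, not the video's exact code).

def fixed_size_chunks(text, chunk_size=40, overlap=10):
    """Slice text into fixed-size windows that overlap by `overlap` characters."""
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

def recursive_chunks(text, chunk_size=40, seps=("\n\n", "\n", ". ", " ")):
    """Split on the coarsest separator present, recursing on oversized pieces."""
    if len(text) <= chunk_size:
        return [text]
    for sep in seps:
        if sep in text:
            chunks, buf = [], ""
            for part in text.split(sep):
                candidate = buf + sep + part if buf else part
                if len(candidate) <= chunk_size:
                    buf = candidate        # keep merging small pieces
                else:
                    if buf:
                        chunks.append(buf)
                    buf = part             # start a new chunk
            if buf:
                chunks.append(buf)
            out = []
            for c in chunks:               # recurse to handle still-oversized chunks
                out.extend(recursive_chunks(c, chunk_size, seps))
            return out
    return fixed_size_chunks(text, chunk_size, overlap=0)  # no separator: hard split

doc = ("RAG retrieves chunks. Chunk quality matters.\n\n"
       "Recursive splitting respects document structure.")
print(fixed_size_chunks(doc)[0])
print(recursive_chunks(doc))
```

<p>In practice you would tune the chunk size and overlap against the retrieval metrics discussed in the video (context precision, faithfulness, answer relevancy, context recall) rather than picking them by hand.</p>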
<hr />
<p><em>Watch the full analysis and find out which chunking method is best for your use case:</em>  </p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=jEzh4IuTWtc">https://www.youtube.com/watch?v=jEzh4IuTWtc</a></div>
<p> </p>
<hr />
<iframe src="https://newsletter.adaptiveengineer.com/embed" width="480" height="320" style="border:1px solid #EEE;background:white;justify-content:center"></iframe>]]></content:encoded></item><item><title><![CDATA[RAG Explained: How 'This' Company Implemented Retrieval-Augmented Generation]]></title><description><![CDATA[Video
https://www.youtube.com/watch?v=fpbyPm5MZSM
 
The Context
TechnoHealth Solutions, a fictitious, mid-sized tech company that builds software for hospitals,  has identified a critical issue facing healthcare professionals -  information overload ...]]></description><link>https://zahere.com/rag-explained-how-this-company-implemented-retrieval-augmented-generation</link><guid isPermaLink="true">https://zahere.com/rag-explained-how-this-company-implemented-retrieval-augmented-generation</guid><category><![CDATA[generative ai]]></category><category><![CDATA[RAG ]]></category><category><![CDATA[Retrieval-Augmented Generation]]></category><category><![CDATA[genai]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 09 Sep 2024 11:43:02 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1725882089058/84295ff7-5be2-4e72-b59a-4e19f083d215.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-video">Video</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=fpbyPm5MZSM">https://www.youtube.com/watch?v=fpbyPm5MZSM</a></div>
<p> </p>
<h2 id="heading-the-context">The Context</h2>
<p>TechnoHealth Solutions, a fictitious mid-sized tech company that builds software for hospitals, has identified a critical issue facing healthcare professionals: information overload and outdated knowledge.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725881392861/4f16602f-1d1b-4f0c-8f8e-a86560f411fc.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-the-problem"><strong>The Problem:</strong></h3>
<p>Picture this:</p>
<ul>
<li><p>Doctors spend hours buried in research instead of treating patients</p>
</li>
<li><p>Nurses struggle to keep up with thousands of new medical studies published every week</p>
</li>
<li><p>Patients receive inconsistent care because their providers can't access the latest information</p>
</li>
<li><p>The constant fear of medical errors due to outdated knowledge</p>
</li>
</ul>
<p>It's a healthcare nightmare, and it's happening right now in hospitals around the world.</p>
<h3 id="heading-the-failed-solutions"><strong>The Failed Solutions:</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725881431371/162c3aa4-a709-4f63-904e-5089edacfff3.png" alt class="image--center mx-auto" /></p>
<p>TechnoHealth Solutions first thought, "Let's just build a better search engine!" But they quickly realized that would only add to the problem. Doctors don't have time to sift through endless documents.</p>
<p>Then they considered an AI chatbot. But two major roadblocks appeared:</p>
<ol>
<li><p>The risk of AI "hallucinations" – making up false medical information</p>
</li>
<li><p>The sheer volume of medical data exceeded what current AI systems could handle</p>
</li>
</ol>
<h2 id="heading-the-breakthrough"><strong>The Breakthrough:</strong></h2>
<p>Finally, TechnoHealth Solutions developed MedAssist AI, a system that leverages Retrieval Augmented Generation (RAG) technology to address these challenges.</p>
<p>But what exactly is RAG, and how does it solve this medical knowledge crisis?</p>
<h2 id="heading-what-is-rag"><strong>What is RAG?</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725881514362/6689cb02-85c0-46a5-9ae0-feb3477d8387.png" alt class="image--center mx-auto" /></p>
<p>Here’s how it works.</p>
<p><strong>Retrieval</strong>: When a user asks a question or provides a prompt, RAG first retrieves relevant passages from a vast knowledge base. This knowledge base could be the internet, a company’s internal documents, or any other source of text data.</p>
<p><strong>Augmentation</strong>: The retrieved passages are then used to “augment” the LLM’s knowledge. This can involve various techniques, such as summarizing or encoding the key information.</p>
<p><strong>Generation</strong>: Finally, the LLM leverages its understanding of language along with the augmented information to generate a response. This response can be an answer to a question, a creative text format based on a prompt, or any other form of text generation.</p>
<h2 id="heading-why-rag"><strong>Why RAG?</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725881561025/ea7738a4-5dd3-4c0e-a127-22b8aee5102b.png" alt class="image--center mx-auto" /></p>
<p><strong>Boosted Factual Accuracy</strong>: RAG strengthens LLMs by connecting them to external sources of information, like databases or live feeds. This ensures that their responses are based on real-world facts rather than relying solely on what they were trained on.  </p>
<p><strong>Expert Knowledge in Specific Domains</strong>: General-purpose LLMs are like a jack of all trades—they know a bit about everything but aren’t experts in any one field. With RAG, you can integrate specific knowledge bases, allowing the AI to answer highly specialized questions.  </p>
<p><strong>Fewer Mistakes (Reduced Hallucination)</strong>: LLMs sometimes make up information that sounds convincing but isn’t true. RAG reduces this risk by providing the AI with reliable sources to back up its claims, leading to more trustworthy responses.</p>
<p><strong>Adaptability to New Information</strong>: The world is constantly evolving, and LLMs trained on older data can quickly become outdated. RAG solves this by giving AI access to up-to-date sources, so it can always provide current information.</p>
<p><strong>Customizable and Scalable</strong>: RAG isn’t a one-size-fits-all solution. It can be adjusted to fit your needs, whether you're working with limited resources or require more power for complex tasks.</p>
<h2 id="heading-technohealth-solutions-rag-implementation"><strong>TechnoHealth Solutions’ RAG Implementation</strong></h2>
<p>Here’s how TechnoHealth Solutions built their RAG solution to make their AI smarter, faster, and more reliable.</p>
<h3 id="heading-step-1-building-an-indexing-pipeline"><strong>Step 1: Building an Indexing Pipeline</strong></h3>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725881614479/8a9f3598-2011-436b-80af-5e500d2f545f.png" alt class="image--center mx-auto" /></p>
<p>The first step was to <strong>organize and process their data</strong>. TechnoHealth consolidated all their data sources, from medical reports to research papers. They then:</p>
<ul>
<li><p><strong>Processed the documents</strong> by breaking them down into smaller, manageable chunks.</p>
</li>
<li><p>Passed these chunks through an <strong>embedding model</strong>, which turned them into vector representations, or "embeddings." These are mathematical versions of the text that make it easy for the system to compare and find similarities.</p>
</li>
<li><p>Finally, they stored all these embeddings in a <strong>vector database</strong>. Think of this as a specialized storage space for these mathematical text chunks.</p>
</li>
</ul>
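<p>The indexing steps above can be sketched in a few lines of Python. This is a toy illustration, not TechnoHealth’s actual pipeline: a term-frequency vector stands in for a real embedding model, and a plain Python list stands in for a real vector database.</p>

```python
# Toy indexing pipeline: chunk -> "embed" -> store.
# All names and data here are illustrative assumptions.
from collections import Counter

def embed(text, vocab):
    """Toy embedding: a term-frequency vector over a fixed vocabulary."""
    counts = Counter(text.lower().split())
    return [counts[w] for w in vocab]

def build_index(documents, chunk_size=8):
    vocab = sorted({w for d in documents for w in d.lower().split()})
    vector_db = []  # stands in for a real vector database
    for doc in documents:
        words = doc.split()
        # Step 1: break each document into smaller chunks
        for i in range(0, len(words), chunk_size):
            chunk = " ".join(words[i:i + chunk_size])
            # Step 2: embed the chunk; Step 3: store (vector, chunk) pairs
            vector_db.append((embed(chunk, vocab), chunk))
    return vocab, vector_db

vocab, vector_db = build_index([
    "New trial results show improved outcomes for early intervention.",
    "Updated dosing guidance was published for pediatric patients.",
])
print(len(vector_db))  # → 3 (two chunks from the first document, one from the second)
```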
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725881688417/60e71032-57a0-4a43-b880-0081a4a6403a.png" alt class="image--center mx-auto" /></p>
<h3 id="heading-step-2-building-the-retrieval-system"><strong>Step 2: Building the Retrieval System</strong></h3>
<p>When a user asks a question, TechnoHealth’s retrieval system goes to work:</p>
<ul>
<li><p>It first <strong>converts the user’s question</strong> into an embedding (just like they did with the documents).</p>
</li>
<li><p>Then, it <strong>compares</strong> this question embedding to all the text chunks stored in the vector database to find the <strong>most similar chunks</strong>.</p>
</li>
</ul>
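<p>A minimal sketch of this retrieval step, again with toy term-frequency vectors standing in for real embeddings (the vocabulary, data, and function names are illustrative): the question is embedded the same way as the documents, and stored chunks are ranked by cosine similarity.</p>

```python
# Toy retrieval step: embed the question, rank chunks by cosine similarity.
import math
from collections import Counter

VOCAB = ["dosing", "guidance", "outcomes", "pediatric", "trial"]

def embed(text):
    counts = Counter(text.lower().split())
    return [counts[w] for w in VOCAB]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# An already-built index of (vector, chunk) pairs.
vector_db = [(embed(c), c) for c in [
    "trial outcomes improved with early intervention",
    "pediatric dosing guidance was updated",
]]

def retrieve(question, k=1):
    q = embed(question)  # embed the question just like the documents
    ranked = sorted(vector_db, key=lambda vc: cosine(q, vc[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]

print(retrieve("latest pediatric dosing guidance"))
```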
<h3 id="heading-step-3-augmenting-the-llm"><strong>Step 3: Augmenting the LLM</strong></h3>
<p>Now comes the magic of <strong>augmenting the LLM’s knowledge</strong>:</p>
<ul>
<li><p>The system crafts a <strong>prompt</strong> that includes:</p>
<ul>
<li><p><strong>Instructions</strong> for the LLM.</p>
</li>
<li><p>The <strong>user’s question</strong>.</p>
</li>
<li><p>The <strong>most relevant documents</strong> retrieved from the database (the context).</p>
</li>
</ul>
</li>
<li><p>This carefully crafted prompt tells the LLM how to use the data to generate a better response.</p>
</li>
</ul>
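<p>Assembling such a prompt is straightforward string templating. The wording below is an illustrative template, not TechnoHealth’s actual prompt:</p>

```python
# Toy augmentation step: combine instructions, retrieved context, and the
# user's question into one prompt. Template wording is an assumption.

def build_prompt(question, retrieved_chunks):
    context = "\n".join(f"- {c}" for c in retrieved_chunks)
    return (
        "You are a medical assistant. Answer using ONLY the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

prompt = build_prompt(
    "What is the updated pediatric dosing guidance?",
    ["pediatric dosing guidance was updated"],
)
print(prompt)
```

<p>Grounding the instructions in the retrieved context this way is what reduces hallucination: the model is told to answer from the supplied passages rather than from its general training data.</p>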
<h3 id="heading-step-4-generating-the-response"><strong>Step 4: Generating the Response</strong></h3>
<p>Finally, the crafted prompt is sent to the LLM:</p>
<ul>
<li><p>The LLM uses the <strong>instructions</strong> and the <strong>context</strong> to generate a complete, informed response to the user’s query.</p>
</li>
<li><p>The response is sent back to the user, more accurate and grounded in real-world data than if the LLM had answered from its general knowledge alone.</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong>:</h2>
<p>By combining indexing, retrieval, augmentation, and generation, TechnoHealth Solutions built a RAG system that ensures every answer their AI provides is based on real, up-to-date knowledge.</p>
<p>In the next article, we will talk about pre-retrieval optimizations.</p>
]]></content:encoded></item><item><title><![CDATA[A Glimpse Into the Future of Software Engineering]]></title><description><![CDATA[Video
https://www.youtube.com/watch?v=992AYyvMkDo
 

These two posts—just a few lines of text by two influential voices in the tech space—have ignited a storm of conversations.
My social media feeds are buzzing—filled with commentary, debates, and th...]]></description><link>https://zahere.com/a-glimpse-into-the-future-of-software-engineering</link><guid isPermaLink="true">https://zahere.com/a-glimpse-into-the-future-of-software-engineering</guid><category><![CDATA[AI]]></category><category><![CDATA[aiagents]]></category><category><![CDATA[coding]]></category><category><![CDATA[AI Coding Assistant]]></category><category><![CDATA[AI Code Generator]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 02 Sep 2024 08:27:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1725265473827/88fe1aff-4250-49a2-b4d5-6ecdf0c5b3a1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h3 id="heading-video">Video</h3>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=992AYyvMkDo">https://www.youtube.com/watch?v=992AYyvMkDo</a></div>
<p> </p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725264559502/aec1cf4f-58ad-45cf-b3be-a6ba7403202c.png" alt class="image--center mx-auto" /></p>
<p>These two posts—just a few lines of text by two influential voices in the tech space—have ignited a storm of conversations.</p>
<p>My social media feeds are buzzing—filled with commentary, debates, and threads that seem to extend endlessly. And the chatter isn’t dying down; in fact, it’s only growing louder.</p>
<p>What’s clear is this: they’re an indication, a signal—an alarm—that something has fundamentally shifted in the software industry.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725264630174/20a57ddb-34a8-49ca-8c07-471d1e54b138.png" alt class="image--center mx-auto" /></p>
<p>In a recent LinkedIn post, Amazon CEO Andy Jassy dropped a bombshell: Amazon Q, their generative AI assistant, has dramatically redefined what’s possible in software development.</p>
<p>Take this: what used to take 50 developer days to upgrade applications to Java 17 now takes mere hours. The equivalent of 4,500 developer-years of work saved—just like that.</p>
<p>Amazon upgraded more than half of its production Java systems in less than six months. A process that usually demands extensive time and resources was completed in a fraction of the time, at a fraction of the cost.</p>
<p>And the most striking part? 79% of the auto-generated code reviews were shipped by developers without any additional tweaks.</p>
<p>The AI didn’t just assist—it delivered.</p>
<p>And it’s not just the corporate giants making waves. Influential voices in AI research are also weighing in on the transformative power of AI in coding.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725264645500/4e3567da-14d2-4bbc-b55f-d16c3ce74bd5.png" alt class="image--center mx-auto" /></p>
<p>In a recent tweet, AI researcher Andrej Karpathy shared his experience with AI-assisted programming. He found these tools so effective that they’ve completely reshaped his coding workflow.</p>
<p>Karpathy describes a new way of coding—what he calls "half-coding." He writes prompts in plain English, reviews the AI-generated code diffs, and lets the AI handle the heavy lifting, completing substantial portions of code in record time.</p>
<p>Karpathy says - he can’t imagine going back to the way things were.</p>
<p>And neither can I. I’ve been using these code generators for over a year now, and my productivity has noticeably increased.</p>
<h2 id="heading-coding-is-undergoing-a-seismic-shift">Coding is Undergoing a Seismic Shift</h2>
<p>So let’s start with a fact: coding is undergoing, or is about to undergo, a seismic shift.</p>
<p>While I was researching about this topic, I stumbled across <a target="_blank" href="https://x.com/russelljkaplan/status/1820460525802926268">Russell Kaplan's thread</a> on <a target="_blank" href="http://x.com">x.com</a> which resonated well with me.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725265882930/150cab6d-e347-4047-85ff-ec4f1b9a4d3e.png" alt class="image--center mx-auto" /></p>
<p>It’s a fascinating thread - So I will break down some of the tweets that were thought-provoking.</p>
<p>Here’s why this matters. Research labs around the world are pouring resources into making AI models better at coding and reasoning. This isn’t just incremental progress—it’s a massive leap forward. These models are being trained to write code, reason through problems, and improve themselves in ways we haven’t seen before.</p>
<p>Why coding? What makes it so special?</p>
<p>The answer lies in the unique advantage coding offers: it’s a domain where AI can learn through “self-play.”</p>
<p>Unlike other fields where data is limited by human expertise, code can be tested, tweaked, and optimized automatically. It’s a playground for AI, where models can write code, run it, and check for consistency—all without human intervention. This kind of automatic supervision is not just beneficial; it’s revolutionary.</p>
<p>But what is self-play?</p>
<hr />
<iframe src="https://newsletter.adaptiveengineer.com/embed" width="480" height="320" style="border:1px solid #EEE;background:white;justify-content:center"></iframe>

<hr />
<p>To understand self-play, let's learn about the model that pioneered the concept.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725264689573/84f2aaab-10eb-410c-9e42-c6bccd5f1526.png" alt class="image--center mx-auto" /></p>
<p>AlphaGo, developed by DeepMind, is a groundbreaking AI that made history by defeating human world champions in the ancient game of Go.</p>
<p>One of its key innovations was the use of self-play, where AlphaGo played countless games against versions of itself.</p>
<p>This technique allowed the AI to continuously improve, discovering new strategies and refining its decision-making process without human input.</p>
<p>Fascinating, isn’t it?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725264713616/d66cdacd-6c39-42e7-bf73-eeae25b1735d.png" alt class="image--center mx-auto" /></p>
<p>Using this as inspiration, researchers from Microsoft and MIT, in the paper “<strong>Language Models Can Teach Themselves to Program Better</strong>,” demonstrated an answer to the question:</p>
<p>“Can an LM design its own programming problems to improve its problem-solving ability?”</p>
<p>Rather than using English problem descriptions which are ambiguous and hard to verify, they generated puzzles.</p>
<h2 id="heading-self-play-using-programming-puzzles"><strong>Self-play using programming puzzles</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725264732367/99401093-4c0f-478f-a5eb-aef5b1cf3694.png" alt class="image--center mx-auto" /></p>
<p>Here’s how they built the pipeline.</p>
<p><strong>Puzzle Generation</strong>: The language model generates new puzzles by sampling from the training set, combining them, and creating additional puzzles within its context window. These puzzles are then filtered for syntactic validity and to exclude trivial solutions.</p>
<p><strong>Solution Generation</strong>: The model attempts to solve the valid puzzles using a few-shot learning strategy, with a predetermined number of attempts per puzzle.</p>
<p><strong>Solution Verification</strong>: Generated solutions are verified using a Python interpreter. From these, up to <em>m</em> correct and concise solutions are selected for each puzzle.</p>
<p><strong>Fine-Tuning</strong>: The model is fine-tuned on the selected puzzle-solution pairs.</p>
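<p>The verification step is worth a closer look, because it is what makes self-play possible without human labels. In the paper’s setup, a puzzle is a Python predicate, and a candidate answer counts as a solution only if running the predicate on it returns <code>True</code>. The specific puzzle and candidates below are made up for illustration:</p>

```python
# Illustrative sketch of interpreter-based solution verification.
# The puzzle and candidate list are invented for demonstration.

def f(s: str) -> bool:
    """Puzzle: find a string of length 5 whose characters are all 'a'."""
    return len(s) == 5 and set(s) == {"a"}

candidates = ["aaaa", "aaaaa", "bbbbb"]  # e.g. sampled from a language model

def verify(puzzle, candidates, m=1):
    """Keep up to m candidates that the interpreter confirms as correct."""
    correct = []
    for c in candidates:
        try:
            if puzzle(c):  # run the candidate through the puzzle predicate
                correct.append(c)
        except Exception:
            pass  # crashing candidates are simply rejected
        if len(correct) >= m:
            break
    return correct

print(verify(f, candidates))  # → ['aaaaa']
```

<p>Because correctness is checked mechanically, every verified pair becomes clean training data, which is exactly the automatic supervision described above.</p>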
<p>The result?</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725264748104/de2c9260-508e-431f-8725-ac5e6b7647ba.png" alt class="image--center mx-auto" /></p>
<p>The diagram illustrates how the iterative process of generating, verifying, and fine-tuning on synthetic data significantly improves the performance of the language model in solving puzzles.</p>
<p><strong>Initial Evaluation:</strong> Without any fine-tuning, GPT-Neo solves 7.5% of the held-out test puzzles.</p>
<p><strong>Evaluation After Fine-Tuning on Unverified Data:</strong> After fine-tuning on the unverified synthetic data, the model's performance improves, solving 21.5% of the held-out test puzzles.</p>
<p><strong>Evaluation After Fine-Tuning on Verified Data:</strong> Fine-tuning on the verified synthetic data further enhances the model's performance, allowing it to solve 38.2% of the held-out test puzzles.</p>
<p>Now, let’s look ahead.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725264828902/58797de2-9f43-43f9-b3a9-21877b6d2df5.png" alt class="image--center mx-auto" /></p>
<p>In just a few years, software engineering will be almost unrecognizable. Imagine having an army of coding agents at your disposal—each one capable of handling tasks from start to finish. This isn’t science fiction; it’s where we’re headed.</p>
<p>Engineers will transition from writing lines of code to managing these agents, overseeing the architecture of systems, and making high-level decisions.</p>
<p>This shift will redefine the role of the software engineer. In this new world, coding becomes less about the syntax and more about the strategy—understanding what needs to be built and why.</p>
<p>It’s like moving from being a craftsman to a project manager, where your focus is on the bigger picture.</p>
<p>To extend this idea, I return to Andrej Karpathy’s vision of AI resembling an operating system—essentially, a powerful agent. Andrej describes this as an entity with more knowledge than any single human on all subjects.</p>
<p>Now, imagine this as a specialized agent focused on a specific domain or business use case.</p>
<p>It’s possible, and we may not be very far from it.</p>
<p>Russell calls this “Software Abundance.”</p>
<p>As coding becomes 10 times more accessible, we’ll see a proliferation of what can be called “single-use software”—apps and websites designed for specific, often one-off purposes.</p>
<p>This abundance won’t just democratize software creation; it will fundamentally change how we think about software itself. Imagine tools being created for specific events, or small businesses commissioning custom apps for limited-time campaigns.</p>
<p>What was once impractical will become routine.</p>
<p>As this new reality unfolds, the role of the software engineer will continue to evolve. Just as engineers transitioned from assembly language to high-level languages like Python, they will adapt to a world where English, rather than code, becomes the primary tool of communication.</p>
<p>This change will require a shift in mindset—from focusing on how to write code, to understanding what needs to be accomplished.</p>
<p>The job of a software engineer will become more about defining problems and architecting solutions, with coding agents handling the execution.</p>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1725265219392/b8dbd72c-57d8-4b28-a5b2-24e72e0d9e7d.png" alt class="image--center mx-auto" /></p>
<p>We’re on the brink of a transformation that will not only change how we code but also how we think about building software altogether.</p>
<p>And what can help navigate the ambiguities during this phase is Adaptability.</p>
<p>If you’re as excited about this future as I am, hit the subscribe button to stay adaptable to the new possibilities.</p>
<p>Leave a comment below—how do you think AI will change the way you work or build?</p>
<hr />
<iframe src="https://newsletter.adaptiveengineer.com/embed" width="480" height="320" style="border:1px solid #EEE;background:white;justify-content:center"></iframe>

<hr />
]]></content:encoded></item><item><title><![CDATA[How Small Talk Can Boost Your Career and Get You Promoted]]></title><description><![CDATA[Video
https://www.youtube.com/watch?v=gQxO8bX98Jc
 
Picture this: You're at work, head down, churning out projects like a machine. You're efficient, you're productive—you're doing exactly what's expected of you, right?
Well, not quite.
There's an inv...]]></description><link>https://zahere.com/how-small-talk-can-boost-your-career-and-get-you-promoted</link><guid isPermaLink="true">https://zahere.com/how-small-talk-can-boost-your-career-and-get-you-promoted</guid><category><![CDATA[Career]]></category><category><![CDATA[career advice]]></category><category><![CDATA[communication]]></category><category><![CDATA[careers]]></category><category><![CDATA[Career Growth]]></category><category><![CDATA[#Adaptability]]></category><dc:creator><![CDATA[Zahiruddin Tavargere]]></dc:creator><pubDate>Mon, 26 Aug 2024 08:04:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1724659233281/2c218a0d-3ab9-4027-808b-374c546a4df1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<h2 id="heading-video">Video</h2>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://www.youtube.com/watch?v=gQxO8bX98Jc">https://www.youtube.com/watch?v=gQxO8bX98Jc</a></div>
<p> </p>
<p>Picture this: You're at work, head down, churning out projects like a machine. You're efficient, you're productive—you're doing exactly what's expected of you, right?</p>
<p>Well, not quite.</p>
<p>There's an invisible currency in the workplace that many of us overlook. It's not measured in ROI or any metrics.</p>
<p>It's measured in something immeasurable - your <strong>connections.</strong></p>
<p>You know, those seemingly pointless conversations about the weather, your weekend, or what you had for lunch.</p>
<p>For many of us in the engineering world, small talk feels, well… like a waste of time.</p>
<p>But here’s the thing—small talk is actually a big deal. And not just in some fluffy, feel-good way. It can genuinely boost your career. Seriously.</p>
<p>For over a year, I've been diving deep into the concept of adaptability. And I’ve realized something fascinating—being adaptive isn’t just one skill. It’s built on four core pillars.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724657879028/6546e146-69cc-44d1-b82c-3f14f20b59af.png" alt class="image--center mx-auto" /></p>
<p>These four pillars are the foundation of thriving in our rapidly changing world.</p>
<p>But today, we’re zooming in on one misunderstood pillar: Communication.</p>
<p>And when we talk about communication in the context of adaptability, we’re talking about something much more fundamental—and ironically, much more casual.</p>
<p>Small Talk.</p>
<h2 id="heading-historical-context-of-small-talk"><strong>Historical Context of Small Talk</strong></h2>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724658046827/835eb640-dab3-44ba-b7e1-756fd49a0a1b.png" alt class="image--center mx-auto" /></p>
<p><em>Let’s take a quick trip back to 1923. An anthropologist named Bronisław Malinowski coined the term ‘phatic communion’.</em> Just a fancy way of saying small talk.</p>
<p>Malinowski described it as a type of speech where connections are made simply through the exchange of words—nothing deep, nothing groundbreaking, just everyday chatter.</p>
<p>But here’s the interesting part: he recognized that this kind of talk, these seemingly meaningless words, actually create bonds between people. It’s a fundamental part of how human beings connect.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724658144676/1e09a67d-d7c3-422e-bd23-42769539553c.png" alt class="image--center mx-auto" /></p>
<p>In a follow-up study, Coupland et al. delved into the subtleties of phatic communion. They revealed that the simple phrase 'How are you?' is more than just a casual question—it's a social tool. It helps people navigate their roles in a conversation, express politeness, and even subtly influence the power dynamics at play.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1724658232994/53b332dc-480e-4ff4-901e-10fb23175d07.png" alt class="image--center mx-auto" /></p>
<p>I also came across an insightful HBR article titled 'The Surprising Power of Simply Asking Coworkers How They’re Doing.' According to a survey featured in the article, 39% of respondents said they feel the greatest sense of belonging when their colleagues check in with them, both personally and professionally.</p>
<p>Now that we have established the perceived benefits of small talk through research from respected sources, let me share my own perspective on the subject.</p>
<h2 id="heading-my-perspective"><strong>My Perspective</strong></h2>
<h3 id="heading-i-have-seen-great-leaders-do-small-talk">I have seen great leaders do small talk</h3>
<p>I’ve had the good fortune of working in three different industries, and in every single company, one thing I’ve consistently noticed is that the most successful leaders don’t dive straight into business.</p>
<p>They use small talk to start meetings, to ease the tension in the room, and to create a comfortable atmosphere.</p>
<p>It’s a simple gesture, but it's incredibly powerful and empathetic. They could just as easily dive into the agenda, but they choose to connect first.</p>
<h3 id="heading-my-relationship-with-my-managers">My relationship with my managers</h3>
<p>When it comes to my relationship with managers, the connection didn’t stop at work. I’ve bonded with great managers over WWE, comic-book movies, and football.</p>
<p>These conversations showed my personality and gave me a glimpse into theirs. This connection wasn’t just for fun—it helped me manage up more effectively.</p>
<p>And I believe it helped my managers understand how to coach me and deliver tough feedback in a way that I could truly absorb.</p>
<h3 id="heading-working-with-colleagues">Working with colleagues</h3>
<p>When working with colleagues across the globe, especially in different cultural contexts, small talk has been a game-changer for me.</p>
<p>Whenever I met a new colleague from outside India, I made sure to spend the first 10 minutes or so just chatting about anything other than work.</p>
<p>This laid a solid foundation for the meeting and, as a bonus, often led to lasting friendships.</p>
<p>Now that we’ve established why small talk matters, let me share three key benefits that can genuinely impact your career.</p>
<h2 id="heading-3-key-benefits-of-small-talk">3 Key Benefits of Small Talk</h2>
<h3 id="heading-breaking-the-ice">Breaking the Ice</h3>
<p>Small talk is a great way to ease into more serious discussions. When you engage in light conversation, you create a comfortable environment for deeper conversations later on.</p>
<h3 id="heading-building-connections">Building Connections</h3>
<p>Engaging in small talk allows you to connect with colleagues on a personal level. This connection can lead to trust, making it easier for you to collaborate and work effectively together.</p>
<h3 id="heading-visibility">Visibility</h3>
<p>Regularly engaging in small talk increases your visibility within the organization. When people see you as approachable and friendly, they are more likely to think of you when opportunities arise.</p>
<p>So, how does all this translate into promotions?</p>
<h2 id="heading-the-impact-on-promotions">The Impact on Promotions</h2>
<h3 id="heading-networking">Networking</h3>
<p>The more connections you have, the more advocates you create within your organization. These advocates can vouch for your skills and work ethic when promotion opportunities arise.</p>
<h3 id="heading-being-top-of-mind">Being Top of Mind</h3>
<p>When you engage in small talk regularly, you keep yourself on the radar of decision-makers. They are more likely to think of you when considering candidates for promotions.</p>
<h3 id="heading-creating-allies">Creating Allies</h3>
<p>Building relationships through small talk can lead to mentorship opportunities. Mentors can provide guidance, support, and even recommend you for promotions.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>I’m not suggesting that small talk is the only key to promotions or the most important soft skill.</p>
<p>It’s important to recognize that this advice is based on the assumption that you’re already excellent at your job and fully deserving of that role.</p>
<p>But when the decision comes down to equally talented individuals, small talk can be the factor that sets you apart.</p>
<p>It’s the person who’s approachable, who connects well with others, and who’s seen as easy to work with that often comes to mind first.</p>
<hr />
<p>One of my popular articles:</p>
<div class="embed-wrapper"><div class="embed-loading"><div class="loadingRow"></div><div class="loadingRow"></div></div><a class="embed-card" href="https://zahere.com/how-to-build-an-ai-agent-without-using-any-libraries-a-step-by-step-guide">https://zahere.com/how-to-build-an-ai-agent-without-using-any-libraries-a-step-by-step-guide</a></div>
]]></content:encoded></item></channel></rss>