ProxySQL: A High-Performance MySQL Proxy
https://proxysql.com/

Making MySQL AI-Ready: How MyVector and ProxySQL Work Together
https://proxysql.com/blog/making-mysql-ai-ready-how-myvector-and-proxysql-work-together/
Wed, 11 Feb 2026 18:34:22 +0000

The post Making MySQL AI-Ready: How MyVector and ProxySQL Work Together appeared first on ProxySQL.


As AI workloads become standard in modern applications, engineering teams face a familiar dilemma: MySQL is already the system of record, but vector search typically requires bolting on a separate database. That means two security models, two observability stacks, and inevitable data consistency headaches.

A concept presented at Pre-FOSDEM 2026 by René Cannaò (ProxySQL founder) and Alkin Tezuysal (Altinity) proposes a cleaner path: keep everything in MySQL, but add vector capabilities through a plugin called MyVector, with ProxySQL serving as the unified control plane.

The Core Idea

MyVector is a MySQL plugin (targeting 8.0/8.4) that adds vector storage, distance functions, and HNSW indexing directly inside MySQL — without requiring MySQL’s native VECTOR type. HNSW (Hierarchical Navigable Small World) graphs enable approximate nearest-neighbor search in roughly O(log N) time, making it practical to query millions of vectors.
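As a sketch of what this looks like in practice (the column type, function name, and dimensions below are illustrative assumptions, not the plugin’s confirmed API):

```sql
-- Hypothetical MyVector usage sketch; names are assumptions for illustration.
-- Embeddings live in an ordinary binary column instead of a native VECTOR type.
ALTER TABLE docs ADD COLUMN embedding VARBINARY(6144);

-- Approximate nearest-neighbor lookup via a plugin-provided distance UDF,
-- served by an HNSW index rather than a full table scan.
SELECT id, title
FROM docs
ORDER BY myvector_distance(embedding, @query_embedding)
LIMIT 10;
```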

ProxySQL sits in front and handles the hard operational problems: classifying queries as OLTP vs. vector workloads, routing them to separate hostgroups, enforcing concurrency limits, and triggering circuit breakers when vector bursts threaten to impact production latency.

Why This Matters

The primary use cases are RAG pipelines over documentation and knowledge bases, incident/runbook search, and code semantic search — workloads where your source of truth is already in MySQL and you don’t want to fragment your data stack.

The architecture keeps OLTP writes going to a primary hostgroup while vector similarity queries route to dedicated replicas with the MyVector plugin loaded. If vector traffic spikes and replica lag climbs, ProxySQL can automatically shed vector load before OLTP P99 is affected.
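In ProxySQL terms, that split can be expressed with ordinary query rules. The hostgroup IDs and match pattern below are illustrative assumptions about how such a deployment might be laid out:

```sql
-- Assumed layout: hostgroup 10 = OLTP primary, hostgroup 20 = vector replicas.
-- Route anything calling the (hypothetical) distance UDF to the vector replicas.
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (100, 1, 'myvector_distance\(', 20, 1);

LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```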

The RAG Ingest Pipeline

The presentation also covers a CLI tool (rag_ingest) that handles the full ingestion pipeline: fetching rows from backend MySQL sources incrementally using watermarks, chunking and transforming text, generating embeddings via any OpenAI-compatible API in configurable batches, and storing results back as vectors. This gives you a complete, observable pipeline without external orchestration infrastructure.
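The watermark pattern the ingest tool relies on can be sketched as plain SQL (table and column names here are hypothetical; the tool would persist the maximum timestamp seen in each batch as the next watermark):

```sql
-- Incremental fetch: only rows changed since the last successful run.
SELECT id, body, updated_at
FROM knowledge_base
WHERE updated_at > :last_watermark
ORDER BY updated_at
LIMIT 500;
```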

Migration Path

The proposed adoption path is pragmatic: start with ProxySQL for routing and observability alone (no vectors yet), then add vector columns with simple distance UDFs, then graduate to HNSW indexing with dedicated vector replicas, and finally operationalize freshness, backfills, and tenant-aware QoS.

The Honest Tradeoff

This is explicitly a WIP/concept — the authors are looking for design partner feedback to shape the roadmap. The approach is conservative by design: InnoDB stability comes first, vector capabilities are additive, and failure modes are meant to be boring. When retrieval breaks, OLTP stays healthy.

For teams heavily invested in MySQL who want AI retrieval without fragmenting their data operations, this is a compelling direction worth watching.

Repos: github.com/askdba/myvector and github.com/ProxySQL/proxysql-vec.

Download the presentation “Making MySQL AI-Ready: How MyVector and ProxySQL Work Together”, given by René Cannaò and Alkin Tezuysal at preFOSDEM MySQL Belgian Days 2026 in Brussels.

Powering AI Ready MySQL with MyVector and ProxySQL

Bringing GenAI to Every MySQL Instance: ProxySQL v4.0
https://proxysql.com/blog/bringing-genai-to-every-mysql-instance-proxysql-v4-0/
Sun, 08 Feb 2026 16:14:20 +0000

The post Bringing GenAI to Every MySQL Instance: ProxySQL v4.0 appeared first on ProxySQL.


The Problem with “Just Migrate”

Most organizations are sitting on MySQL deployments they can’t easily change — a mix of community editions, managed cloud services, and legacy versions. Teams want RAG pipelines and natural language querying, but adding AI capabilities typically means schema migrations, new vector database infrastructure, dual-write synchronization headaches, and AI logic sprawled across every application layer. The operational cost is real, and the governance risk is worse.
ProxySQL v4.0 takes a different approach: don’t touch your database at all.

The Transparent AI Layer

The core thesis is elegant — put the intelligence at the proxy layer, not in the database or the application. ProxySQL already sits between every client and every MySQL backend, which makes it a natural choke point for centralized governance, auth, auditing, and now AI capabilities. No connection string changes, no schema migrations, no new infrastructure your DBA team has to babysit.
The comparison with app-layer integration is stark. Where app-layer AI means fragmented governance across every service, schema changes, and multiple network hops, the ProxySQL AI layer provides unified query rules, zero schema changes, and a single enforced access path.

What’s Actually in v4.0

The MCP (Model Context Protocol) server is the centerpiece. Running on port 6071 over HTTPS with bearer token auth, it exposes 30+ tools that any MCP-compatible agent — Claude Code, GitHub Copilot, Cursor, Warp — can discover and call via standard JSON-RPC. Tools span schema discovery (list_schemas, list_tables, list_columns), safe read-only execution (run_sql_readonly, explain_sql), and full RAG search capabilities (rag.search_fts, rag.search_vector, rag.search_hybrid).

All MCP requests pass through MCP Query Rules — analogous to ProxySQL’s existing mysql_query_rules — which can allow, block, rewrite, or timeout requests before they ever reach MySQL. This is where you enforce read-only access, prevent data exfiltration, and audit everything agents are doing.
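Following the stated analogy with mysql_query_rules, an enforcement rule might look like the sketch below. The mcp_query_rules table and its columns are assumptions inferred from that analogy, not a confirmed schema:

```sql
-- Hypothetical sketch: reject any agent-issued statement that is not a SELECT.
INSERT INTO mcp_query_rules (rule_id, active, match_pattern, negate_match_pattern, error_msg, apply)
VALUES (1, 1, '^SELECT', 1, 'Agents have read-only access', 1);
```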

The Autodiscovery system (discovery.run_static) runs a two-phase process: static schema harvesting followed by LLM-enriched metadata including summaries and domain analysis. Everything lands in a local SQLite catalog (mcp_catalog.db) that agents can then search semantically via llm.search.

The NL2SQL workflow builds on this: agents search the catalog for relevant schemas, synthesize or reuse SQL templates, execute safely via run_sql_readonly, and optionally store successful query patterns as templates for future reuse — a continuous learning loop that improves accuracy over time.

What’s Still Coming

The presentation is upfront that this is a prototype showcase. Still on the roadmap: automatic embedding generation (with local or external model options), real-time indexing via MySQL replication/binlog without touching source tables, DISTANCE() SQL semantics for vector search on AI-blind MySQL backends, and additional MCP endpoints for config management, cache inspection, and observability.

The Bottom Line

The proxy layer argument is compelling: it’s operationally mature, protocol-aware, already deployed in front of critical databases, and has a battle-tested policy engine. Adding AI there rather than in application code means one place to enforce rules, one place to audit, and zero changes to the workloads that depend on MySQL stability.

The code is on the v4.0 branch at github.com/sysown/proxysql, and the Generative AI documentation section at sysown.github.io/proxysql covers the new features.

Download the presentation “Bringing GenAI to every MySQL Instance”, given by René Cannaò at preFOSDEM MySQL Belgian Days 2026 in Brussels.

Bringing GenAI to every MySQL Instance Title Page

We are hiring at ProxySQL!
https://proxysql.com/blog/we-are-hiring/
Tue, 27 Jan 2026 15:37:59 +0000

The post We are hiring at ProxySQL! appeared first on ProxySQL.


ProxySQL We are hiring!

ProxySQL is a high-performance, highly-available protocol-aware proxy for MySQL, PostgreSQL, and wire-compatible products including RDS, RDS Aurora, Percona Server, MariaDB, ClickHouse, TimescaleDB, CockroachDB, Google Cloud SQL and Azure Database.

We’re growing and seeking talented people to contribute to ProxySQL’s development while delivering best-in-class solutions to our subscription and support clients. Our team provides high-grade support services around the clock, engages in large-scale enterprise projects, and is often called upon to solve challenging situations. Join our bright and dedicated teammates working remotely across various locations and time zones.

ProxySQL C/C++ Developer [JOBSPEC-L41]

Remote (Global) • Full-time • 5-7+ years experience

What You’ll Do

  • Build and optimize query routing, load balancing, and failover capabilities in the ProxySQL core
  • Enhance connection pooling and memory management for enterprise-scale deployments
  • Debug complex production issues using GDB, core dump analysis, and performance profiling tools
  • Enhance ProxySQL GenAI components including RAG Tools, MCP and FTS
  • Contribute to embeddings, reranking algorithms, and AI-powered query optimization
  • Enhance Stats Monitoring, admin interfaces, and configuration management
  • Develop replication indexing features and support PostgreSQL/ClickHouse protocol expansion
  • Leverage AI coding assistants to accelerate development, testing, and documentation workflows

Required Skills

  • 5+ years of professional C/C++ development with multi-threaded and asynchronous networking
  • Deep understanding of MySQL and PostgreSQL wire protocols, server internals, and SQL optimization
  • Proven expertise in performance optimization and code efficiency
  • Experience with vector embeddings and similarity search algorithms
  • Knowledge of RAG (Retrieval-Augmented Generation) architectures and their application to databases
  • Experience with full-text search engines, semantic search, and natural language processing
  • Experience working with remote teams; self-motivated with strong initiative.

Modern Development & AI Tools

  • Proficiency with GDB debugging, core dump interpretation, and performance profiling
  • Experience with Git, CI/CD tools (Jenkins or similar), and Docker
  • Demonstrated ability to use AI coding assistants
  • Strong prompt engineering skills to guide AI tools for code generation, refactoring, and problem-solving
  • Ability to critically evaluate AI-generated code for correctness, performance, and security considerations
  • Active open source contributions to database or AI infrastructure projects
  • Production experience with observability and monitoring systems

If you are interested in any of the positions listed above please email your latest resume to info@proxysql.com (add the job title and spec in the subject of the email) or please send us your details here together with any examples of relevant open source project contributions, salary expectations and a brief overview of why you are interested in joining our team!

NOTE: All applications will be treated in the strictest of confidence.

ProxySQL 3.0.5: Security hardening, protocol fixes, and smoother operations
https://proxysql.com/blog/proxysql-3-0-5/
Tue, 20 Jan 2026 22:17:37 +0000

The post ProxySQL 3.0.5: Security hardening, protocol fixes, and smoother operations appeared first on ProxySQL.


ProxySQL 3.0.5: Security hardening, PostgreSQL protocol fixes, and smoother operations

We’re excited to announce ProxySQL 3.0.5, our first release of 2026.
As always, thank you to everyone who tested, reported issues, sent patches, and shared real-world feedback.
ProxySQL is shaped in production, and this release is a direct result of that collaboration.

ProxySQL 3.0.5 focuses on tightening security guarantees (especially for PostgreSQL TLS enforcement),
improving protocol correctness for PostgreSQL clients, and restoring compatibility knobs for MySQL
application stacks that are sensitive to version semantics.

Full release notes are available on GitHub:
https://github.com/sysown/proxysql/releases/tag/v3.0.5


What’s new in 3.0.5 (and why it matters)

PostgreSQL frontend SSL enforcement: closing a real security gap

One of the most important fixes in 3.0.5 is about trust boundaries.

In certain configurations, a PostgreSQL client could connect without TLS even when the user
was configured with use_ssl=1. That’s the kind of issue that can sit quietly in the background until
you have to prove enforcement for compliance, audit, or incident response.

ProxySQL 3.0.5 ensures that pgsql_users.use_ssl is properly applied at session level and that frontend
connections respect the SSL requirement.

If you run ProxySQL in front of PostgreSQL and rely on ProxySQL as the policy enforcement point, this is the first
thing you should validate after upgrading.

Illustrative workflow:

UPDATE pgsql_users
SET use_ssl = 1
WHERE username = 'app_user';

LOAD PGSQL USERS TO RUNTIME;
SAVE PGSQL USERS TO DISK;

MySQL compatibility: control SELECT VERSION() behavior

ProxySQL can answer some “metadata” queries directly, and that’s often a win. But in practice,
some application stacks are very particular—especially ORMs that interpret SELECT VERSION()
as “tell me exactly what the backend is.”

In 3.0.5 we introduced a new global variable, mysql-select_version_forwarding, that gives you
a clean compatibility switch: you can forward SELECT VERSION() to the backend when your environment requires it.

Illustrative configuration:

SET mysql-select_version_forwarding = 1;

LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;

Better parsing for SET time_zone with modern timezone identifiers

Timezone strings have a way of becoming “simple” right up to the moment your application uses a real IANA name.

3.0.5 improves parsing for MySQL SET time_zone statements to handle:

  • three-component IANA names (e.g., America/Argentina/Buenos_Aires)
  • hyphenated timezone identifiers
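For example, session statements like the following (illustrative values using real IANA names) now parse cleanly:

```sql
SET time_zone = 'America/Argentina/Buenos_Aires';  -- three-component IANA name
SET time_zone = 'Etc/GMT-5';                       -- hyphenated identifier
```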

If you’ve ever seen a “why is this failing only in production?” issue around session timezone configuration,
this is exactly the kind of fix that prevents headaches.

PostgreSQL extended protocol: Bind packet correctness

For PostgreSQL users, 3.0.5 includes an important protocol correctness fix: it addresses malformed
Bind packets when clients provide a single parameter format.

That may sound niche, but in practice it can surface as intermittent client failures depending on driver behavior.
The goal here is straightforward: make ProxySQL behave correctly and predictably across the variety of PostgreSQL
client implementations that exist in the real world.

Safer fork/exec behavior: file descriptor handling improvements

This release also fixes issues in close_all_non_term_fd() that could cause undefined behavior
(including potential deadlocks) in multi-threaded programs when used between fork() and execve().

That’s a subtle area of systems programming where “mostly works” is not good enough. These fixes reduce
operational risk in environments where process spawning patterns show up.

Admin API hardening: parameterized prepared statements

On the security front, 3.0.5 replaces SQL string concatenation paths in REST-admin execution with
parameterized prepared statements. That change reduces injection risk and strengthens the overall
security posture of administrative interfaces.


Upgrade notes: what we recommend checking

After upgrading to 3.0.5, we suggest validating these items:

  • PostgreSQL TLS enforcement: confirm that users with use_ssl=1 cannot connect without TLS.
  • MySQL application compatibility: if your stack relies on backend VERSION() semantics, consider enabling mysql-select_version_forwarding.
  • Timezone configuration: if your apps use IANA timezone identifiers in SET time_zone, re-test session initialization paths.
  • PostgreSQL drivers using extended protocol: if you previously saw edge-case Bind/Execute issues, re-test with your primary clients.
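The TLS item in particular is quick to verify from the Admin interface. This sketch assumes the runtime counterpart of pgsql_users follows ProxySQL’s usual runtime_ naming convention, and uses a placeholder username:

```sql
-- Confirm the runtime configuration actually carries the SSL requirement.
SELECT username, use_ssl
FROM runtime_pgsql_users
WHERE username = 'app_user';
```

A connection attempt from that user with TLS disabled on the client side should then be rejected.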

Thank you—and what’s next

Again, thank you to our users and contributors. Every bug report, reproduction case, and patch helps us keep ProxySQL
robust in the places it matters most: production systems under load.

This is our first release of 2026, and we plan to keep the momentum going—continuing to deliver features and fixes that
match what the community and enterprise users need. If you have feedback on 3.0.5 (or ideas for what you want to see next),
we’d genuinely like to hear it.

Full release notes:
https://github.com/sysown/proxysql/releases/tag/v3.0.5

ProxySQL 3.0.3 and 3.0.4: A Smoother Day-to-Day for PostgreSQL and MySQL Operators
https://proxysql.com/blog/release-303-304/
Fri, 09 Jan 2026 12:23:15 +0000

The post ProxySQL 3.0.3 and 3.0.4: A Smoother Day-to-Day for PostgreSQL and MySQL Operators appeared first on ProxySQL.


ProxySQL 3.0.3 and 3.0.4: A Smoother Day-to-Day for PostgreSQL and MySQL Operators

We shipped ProxySQL 3.0.3 and 3.0.4 close together, and while the release notes are detailed, the real story is simpler: day-to-day operation got smoother. PostgreSQL compatibility matured in important ways, MySQL edge cases became far less disruptive, monitoring got more trustworthy, and the configuration path got safer.

If you’re running ProxySQL in production, these two releases are about fewer surprises and more confidence—especially if you’re bringing PostgreSQL traffic into ProxySQL for the first time.

PostgreSQL: Fewer “Driver Weirdness” Moments, More Predictability

The biggest shift in 3.0.3 is support for PostgreSQL’s Extended Query Protocol. That’s the protocol most real applications use when they rely on prepared statements or richer client behavior.

What you feel as an operator: fewer client-specific incompatibilities, cleaner session behavior under load, and better visibility into prepared statement usage.

Then 3.0.4 polished the PostgreSQL experience further with a PostgreSQL-aware query tokenizer. That means more accurate query digests: similar queries are grouped correctly, analysis and caching behave more predictably, and visibility tools tell a clearer story.

MySQL: Better Behavior Around the Edges

For MySQL users, 3.0.4 focused on the things that cause unexplained hiccups in production:

Unexpected COM_PING packets are handled safely. Some clients ping mid-query; those pings now get queued and answered cleanly without interrupting the active query.

SELECT @@version no longer hits your backend. ProxySQL answers these directly, cutting out unnecessary backend connections and noise.

Client timeouts now behave like clients expect. Client-specified wait_timeout values are honored (safely clamped to global settings), reducing confusion when clients manage their own timeouts.

Monitoring: More Honest Numbers, Better Signals

Monitoring is only useful if you trust it, and a few fixes landed to make that true:

Ping latency is now more accurate. The monitor’s ping scheduling was over-reporting latency in some cases; that’s been corrected.

TCP keepalive warnings surface early. If keepalive is disabled, ProxySQL now tells you, helping prevent slow-burn connection problems.

Safety and Stability: Quiet Improvements That Matter

Some of the most important changes are the ones you don’t notice—until they’re missing.

Configuration parsing is hardened against SQL injection. This is a concrete safety improvement, especially in templated or generated configuration environments.

Session teardown and processlist handling are safer. That means fewer crashes and fewer edge cases when connections are dropped or restarted under pressure.

Replication and prepared-statement hot paths are faster. Less contention under load means more headroom without tuning.

The Bottom Line

ProxySQL 3.0.3 and 3.0.4 are less about flashy features and more about making ProxySQL a smoother, safer proxy in real production conditions—especially if you’re running mixed MySQL/PostgreSQL workloads.

If you’re on 3.0.2 or earlier, jump straight to 3.0.4 to get all of these improvements in one step.

Decrypting SSL/TLS Traffic with Wireshark and ProxySQL
https://proxysql.com/blog/decrypting-ssl-tls-traffic-with-wireshark-and-proxysql/
Wed, 07 Jan 2026 12:24:22 +0000

The post Decrypting SSL/TLS Traffic with Wireshark and ProxySQL appeared first on ProxySQL.


Decrypting SSL/TLS Traffic with Wireshark and ProxySQL

In this guide, we will walk you through the process of decrypting SSL/TLS traffic to and from ProxySQL using Wireshark. By enabling the SSLKEYLOG feature in ProxySQL and configuring Wireshark to use the SSL key log file, you will be able to view the decrypted traffic for debugging and analysis purposes.

Prerequisites

Before we begin, make sure you have the following:

  • ProxySQL installed and running.
  • Wireshark installed on your machine.

Enabling SSLKEYLOG in ProxySQL

You can enable the SSLKEYLOG feature in ProxySQL either by modifying the configuration file or via runtime queries.

Enabling SSLKEYLOG via Configuration File

  1. Open the ProxySQL configuration file (typically proxysql.cnf) in a text editor.

  2. Add or modify the following line to enable SSLKEYLOG support:

    admin-ssl_keylog_file = <path_to_sslkeylog_file>
    

    Replace ‘<path_to_sslkeylog_file>’ with the absolute path or relative path (see Understanding Absolute and Relative Paths) where you want to store the SSL key log file. If you want to disable the SSLKEYLOG feature, leave this variable empty.

  3. Save the configuration file and restart ProxySQL for the changes to take effect.

Enabling SSLKEYLOG via Runtime Queries

  1. Connect to the ProxySQL administration interface using your preferred MySQL client.

  2. Execute the following runtime query to enable the SSLKEYLOG feature:

    SET admin-ssl_keylog_file='<path_to_sslkeylog_file>';
    LOAD ADMIN VARIABLES TO RUNTIME;
    
  3. Replace ‘<path_to_sslkeylog_file>’ with the absolute path or relative path (see Understanding Absolute and Relative Paths) where you want to store the SSL key log file. If you want to disable the SSLKEYLOG feature, set the value to an empty string.

Capturing and Decrypting SSL/TLS Traffic

Capturing SSL/TLS Traffic

  1. Open Wireshark on your machine.
  2. Start a new capture in Wireshark by clicking on the Capture > Options button (or press ‘Ctrl + K’).
  3. In the capture options, select the network interface where ProxySQL’s traffic flows through and click Start.
  4. Perform the actions that generate SSL/TLS traffic to and from ProxySQL that you want to decrypt (e.g., sending query to backend server).
  5. Once you have captured the desired traffic, stop the capture in Wireshark.
  6. In the Wireshark main window, you should see the captured packets.

Wireshark-Captured-Traffic

Decrypting SSL/TLS Traffic

To decrypt the SSL/TLS traffic, Wireshark will use the SSL key log file generated by ProxySQL.

  1. Go to Edit > Preferences (or press ‘Ctrl + Shift + P’).
  2. In the Preferences window, select Protocols > TLS.
  3. Click on the (Pre)-Master-Secret log filename browse button.
  4. In the file dialog, navigate to the location where you specified the SSL key log file in ProxySQL’s configuration or runtime query.
  5. Select the SSL key log file and click Open.
  6. Click OK to close the Preferences window.

Wireshark-Decrypt-Traffic

Analyzing Decrypted Traffic

Now that you have successfully decrypted the SSL/TLS traffic, you can analyze it in Wireshark:

Decrypting SSL/TLS Traffic

  1. Select a packet that contains SSL/TLS traffic.
  2. Expand the Transport Layer Security section in the packet details.
  3. You should see decrypted information such as Client Hello, Server Hello, Application Data, etc.
  4. Analyze the decrypted information to troubleshoot issues or gather insights about the SSL/TLS communication.

Wireshark-Analyzing-Decrypt-Traffic

Decoding SSL/TLS Traffic as MySQL Protocol

  1. Select a packet that contains SSL/TLS traffic.
  2. Right-click on the packet and choose Decode As.
  3. In the Decode As dialog, select + button.
  4. In the Current column, locate the row with the TCP protocol and the appropriate source or destination port (e.g., 13306 for MySQL).
  5. In the Decode As column, select MySQL from the dropdown menu.
  6. Click OK to apply the decoding changes.
  7. Wireshark will now interpret the selected packets as MySQL traffic, allowing you to view the decrypted MySQL protocol information.
  8. Expand the Transport Layer Security section in the packet details.
  9. You should see decrypted information such as Client Hello, Server Hello, Application Data, etc., along with the MySQL protocol-specific details.
  10. Analyze the decrypted information to troubleshoot issues or gather insights about the MySQL communication.

Wireshark-Analyzing-Decrypt-Traffic-MySQL

Congratulations! You have successfully configured Wireshark to decrypt SSL/TLS traffic to and from ProxySQL using the SSLKEYLOG feature. This allows you to gain deeper visibility into the encrypted traffic for debugging and analysis purposes.

Understanding Absolute and Relative Paths

When specifying the ‘admin-ssl_keylog_file’ variable, you have two options:

  • Absolute Path: If you provide an absolute path, ProxySQL will use that exact path to create the SSL key log file. For example: ‘/var/log/sslkeylog.txt’
  • Relative Path: If you provide a relative path, ProxySQL will automatically convert it to an absolute path by prefixing it with the ProxySQL data directory path. For example: ‘sslkeylog.txt’ will be converted to ‘<data_dir>/sslkeylog.txt’. The ‘<data_dir>’ represents the ProxySQL data directory path, which is typically set in the ProxySQL configuration.

PostgreSQL Monitoring with ProxySQL
https://proxysql.com/blog/postgresql-monitoring-with-proxysql/
Fri, 29 Aug 2025 15:13:09 +0000

The post PostgreSQL Monitoring with ProxySQL appeared first on ProxySQL.


This blog post dives into the latest advancements in ProxySQL’s monitoring capabilities, specifically the recently added monitoring support for PostgreSQL. It also serves as a guide through the ProxySQL configuration process, showing how to inspect monitoring data, how errors are handled, and how ProxySQL ensures high availability and resilience for your PostgreSQL deployments.

As a final demonstration of these capabilities, and thanks to the recently added support for read-only monitoring, we will simulate an unplanned failover in a PostgreSQL replication cluster and let ProxySQL handle automatic failover detection and traffic redirection.

Primary-Replica Setup

We are going to use a very simple PostgreSQL replication setup to revisit several ProxySQL monitoring principles by example. The infra of choice is going to be a single primary and a single replica, and in the following order, we will:

  1. Configure ProxySQL to enable monitoring for backend servers.
  2. Verify monitoring is working as expected: tables, metrics and logs.
  3. Learn how ProxySQL responds to ping monitoring errors.
  4. Configure ProxySQL to take advantage of the replication setup:

    • Configure read-only monitoring for our primary and replica servers.
    • Configure basic query routing for our reader hostgroup.
    • Test our setup with sysbench.
    • Simulate a failover under load and verify how ProxySQL handled it.

For a more in-depth explanation of these core concepts and principles, please refer to the reference documentation or the user guide.

ProxySQL Configuration

Let’s clone the repo holding the infra and config that we will be using:

git clone https://github.com/ProxySQL/non-prod-demo-infras/

PostgreSQL config is condensed in the following file:

00_init.sql:

CREATE USER repluser WITH replication encrypted password 'replpass';
SELECT pg_create_physical_replication_slot('replication_slot');

CREATE USER proxymon WITH encrypted password 'proxymon';
GRANT pg_monitor TO proxymon;

-- For testing 'pgsql-monitor_dbname'
CREATE DATABASE proxymondb;

And containers are initialized with these common values:

---
POSTGRES_USER: postgres
POSTGRES_DB: postgres
POSTGRES_PASSWORD: postgres
POSTGRES_HOST_AUTH_METHOD: "scram-sha-256\nhost replication all 0.0.0.0/0 md5"
POSTGRES_INITDB_ARGS: "--auth-host=scram-sha-256"

To start and stop the infra, just use the two usual Docker commands:

docker compose up -d    # Start the infra in the background
docker compose down -v  # Destroy the infra and the volumes

After starting the infra, let’s start a ProxySQL instance with the config present in the repo:

$ export INFRA=$PATH_TO_REPO/pg-mon-primary-replica/
$ proxysql --clickhouse-server --sqlite3-server --idle-threads -f -c $INFRA/conf/proxysql/proxysql.cnf -D $INFRA/datadir

After launching the instance, we can check through the Admin interface that there are no servers yet. The configuration file defines two admin users and the Admin listening port:

# Administration variables
admin_variables=
{
...
    # IMPORTANT: Remember that default 'admin:admin' credentials can only be used from localhost. These are
    # sample defaults and should be changed for your configuration.
    admin_credentials="admin:admin;radmin:radmin"
...
    # Port for accessing Admin interface using `PostgreSQL` Protocol
    pgsql_ifaces="0.0.0.0:6132"
}

We verify that we can connect to that port with these credentials, and check that we have no servers configured:

$ PGPASSWORD='radmin' psql -h127.0.0.1 -p6132 -Uradmin
psql (16.3, server 16.1)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname | port | gtid_port | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms | comment
--------------+----------+------+-----------+--------+--------+-------------+-----------------+---------------------+---------+----------------+---------
(0 rows)

Since there are no servers configured, ProxySQL is not yet performing any monitoring. Monitoring is enabled by default and not changed in our configuration, so server monitoring (ping and connect checks) should start as soon as servers are configured. Let's double check the configuration variables:

radmin=# SELECT * FROM global_variables WHERE variable_name='pgsql-monitor_enabled';
     variable_name     | variable_value
-----------------------+----------------
 pgsql-monitor_enabled | true
(1 row)

Monitoring is enabled as expected; let's check the rest of the monitoring config:

radmin=# SELECT * FROM global_variables WHERE variable_name LIKE 'pgsql-monitor_%';
               variable_name               | variable_value
-------------------------------------------+----------------
 pgsql-monitor_enabled                     | true
 pgsql-monitor_connect_interval_window     | 50
 pgsql-monitor_ping_interval_window        | 10
 pgsql-monitor_ping_max_failures           | 3
 pgsql-monitor_ping_timeout                | 1000
 pgsql-monitor_read_only_interval          | 1000
 pgsql-monitor_read_only_interval_window   | 10
 pgsql-monitor_read_only_timeout           | 800
 pgsql-monitor_read_only_max_timeout_count | 3
 pgsql-monitor_username                    | monitor
 pgsql-monitor_password                    | monitor
 pgsql-monitor_threads                     | 2
 pgsql-monitor_history                     | 600000
 pgsql-monitor_connect_interval            | 60000
 pgsql-monitor_connect_timeout             | 600
 pgsql-monitor_ping_interval               | 8000
 pgsql-monitor_dbname                      | postgres
(17 rows)

All defaults seem correct; we will explore and play with these values later. Let's configure both servers, the primary and the replica, and check that monitoring is now properly running:

$ PGPASSWORD='radmin' psql -h127.0.0.1 -p6132 -Uradmin < pg-mon-primary-replica/conf/proxysql/servers-single_primary_replica.sql
DELETE 0
INSERT 0 2
LOAD
SAVE

To check the servers' status we query the runtime_pgsql_servers table:

radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |  comment
--------------+-----------+-------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------
 0            | 127.0.0.1 | 15432 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 1            | 127.0.0.1 | 15433 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
(2 rows)

And also the stats_pgsql_connection_pool table:

radmin=# SELECT * FROM stats_pgsql_connection_pool;
 hostgroup | srv_host  | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us
-----------+-----------+----------+--------+----------+----------+--------+---------+-------------+---------+-----------------+-----------------+------------
 0         | 127.0.0.1 | 15432    | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0               | 0               | 0
 1         | 127.0.0.1 | 15433    | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0               | 0               | 0
(2 rows)

After performing any configuration change, it's sensible to check the Error Log to verify that the configuration was indeed updated and that no errors took place, or are taking place, after the change:

...... --> Config promotion command is acknowledged in the next line
2024-10-28 13:37:11 Admin_Handler.cpp:281:is_admin_command_or_alias(): [INFO] Received LOAD PGSQL SERVERS TO RUNTIME command
2024-10-28 13:37:11 PgSQL_HostGroups_Manager.cpp:1209:commit(): [INFO] Generating runtime pgsql servers and pgsql servers v2 records.
...... --> Servers regeneration information is taking place
...... --> Second config promotion command is acknowledged in the next line
2024-10-28 13:37:11 Admin_Handler.cpp:281:is_admin_command_or_alias(): [INFO] Received SAVE PGSQL SERVERS TO DISK command
...... --> After monitoring starts, errors start to accumulate in the Error Log
2024-10-28 13:37:13 PgSQL_Monitor.cpp:628:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15433' error='FATAL:  password authentication failed for user "monitor"'
2024-10-28 13:37:13 PgSQL_Monitor.cpp:628:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='FATAL:  password authentication failed for user "monitor"'
2024-10-28 13:37:21 PgSQL_Monitor.cpp:628:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15433' error='FATAL:  password authentication failed for user "monitor"'
2024-10-28 13:37:21 PgSQL_Monitor.cpp:628:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='FATAL:  password authentication failed for user "monitor"'
2024-10-28 13:37:29 PgSQL_Monitor.cpp:628:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15433' error='FATAL:  password authentication failed for user "monitor"'
2024-10-28 13:37:29 PgSQL_Monitor.cpp:628:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='FATAL:  password authentication failed for user "monitor"'
2024-10-28 13:37:37 PgSQL_Monitor.cpp:628:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15433' error='FATAL:  password authentication failed for user "monitor"'
2024-10-28 13:37:37 PgSQL_Monitor.cpp:628:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='FATAL:  password authentication failed for user "monitor"'

As we saw in the Error Log, there is still missing configuration: we haven't updated the monitoring username and password (you can check the current ones in the previous command we executed to dump the monitoring config). Because of this, monitoring errors accumulate in the Error Log, matching the monitoring check intervals (8000 ms by default for ping).
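As a quick sanity check, that cadence can be confirmed by subtracting consecutive time_start_us values for the same server in the ping log. A minimal sketch, using two timestamps copied from the monitor.pgsql_server_ping_log output shown later in this post:

```shell
# Two consecutive ping-check timestamps for server 127.0.0.1:15432,
# copied from monitor.pgsql_server_ping_log (microseconds since epoch).
t1=1729858273243181
t2=1729858281243239
# Difference in milliseconds: matches pgsql-monitor_ping_interval (8000 ms).
echo $(( (t2 - t1) / 1000 ))
```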

Before fixing this scenario, we are going to check the rest of the places from which we can inspect the monitoring actions. From the tables monitor.pgsql_server_connect_log and monitor.pgsql_server_ping_log we can check the direct result of each check:

radmin=# SELECT * FROM monitor.pgsql_server_connect_log LIMIT 5;
 hostname  | port  |  time_start_us   | connect_success_time_us |                       connect_error
-----------+-------+------------------+-------------------------+-----------------------------------------------------------
 127.0.0.1 | 15433 | 1729858325226277 | 0                       | FATAL:  password authentication failed for user "monitor"
 127.0.0.1 | 15432 | 1729858325226394 | 0                       | FATAL:  password authentication failed for user "monitor"
 127.0.0.1 | 15432 | 1729858385226389 | 0                       | FATAL:  password authentication failed for user "monitor"
 127.0.0.1 | 15433 | 1729858385226313 | 0                       | FATAL:  password authentication failed for user "monitor"
 127.0.0.1 | 15433 | 1729858445226396 | 0                       | FATAL:  password authentication failed for user "monitor"
(5 rows)

radmin=# SELECT * FROM monitor.pgsql_server_ping_log LIMIT 5;
 hostname  | port  |  time_start_us   | ping_success_time_us |                        ping_error
-----------+-------+------------------+----------------------+-----------------------------------------------------------
 127.0.0.1 | 15433 | 1729858273243186 | 0                    | FATAL:  password authentication failed for user "monitor"
 127.0.0.1 | 15432 | 1729858273243181 | 0                    | FATAL:  password authentication failed for user "monitor"
 127.0.0.1 | 15433 | 1729858281243252 | 0                    | FATAL:  password authentication failed for user "monitor"
 127.0.0.1 | 15432 | 1729858281243239 | 0                    | FATAL:  password authentication failed for user "monitor"
 127.0.0.1 | 15433 | 1729858289243360 | 0                    | FATAL:  password authentication failed for user "monitor"
(5 rows)

And from the metrics we can see whether the actions are succeeding or failing:

radmin=# SELECT * FROM stats_pgsql_global WHERE variable_name LIKE '%PgSQL_Monitor%';

           Variable_Name           | Variable_Value
-----------------------------------+----------------
 PgSQL_Monitor_connect_check_OK    | 0
 PgSQL_Monitor_connect_check_ERR   | 4
 PgSQL_Monitor_ping_check_OK       | 0
 PgSQL_Monitor_ping_check_ERR      | 18
 PgSQL_Monitor_read_only_check_OK  | 0
 PgSQL_Monitor_read_only_check_ERR | 0
(6 rows)

Even from stats_pgsql_connection_pool we could hint that something was off in the setup: despite the servers being configured, Latency_us was 0. In a healthy setup, Latency_us is updated by ping checks and its value will never be 0.

Now, let's address the issue by updating the monitoring credentials. We only need to update the username and password, since the default database used for monitoring connections is postgres. If we wanted to connect to another, custom database, we could do so by changing pgsql-monitor_dbname:

radmin=# SET pgsql-monitor_password='proxymon';
UPDATE 1
radmin=# SET pgsql-monitor_username='proxymon';
UPDATE 1
radmin=# LOAD PGSQL VARIABLES TO RUNTIME;
LOAD
radmin=# SAVE PGSQL VARIABLES TO DISK;
INSERT 0 141

NOTE: The monitoring user is required to have the pg_monitor role in order to perform the read-only checks.

If we re-check the stats now, we can see that the number of succeeded operations is increasing:

radmin=# SELECT * FROM stats_pgsql_global WHERE variable_name LIKE '%PgSQL_Monitor%';
           Variable_Name           | Variable_Value
-----------------------------------+----------------
 PgSQL_Monitor_connect_check_OK    | 2
 PgSQL_Monitor_connect_check_ERR   | 4
 PgSQL_Monitor_ping_check_OK       | 12
 PgSQL_Monitor_ping_check_ERR      | 28
 PgSQL_Monitor_read_only_check_OK  | 0
 PgSQL_Monitor_read_only_check_ERR | 0
(6 rows)

We can also directly check the monitoring tables to see the results of each check:

radmin=# SELECT * FROM monitor.pgsql_server_ping_log ORDER BY time_start_us DESC LIMIT 5;
 hostname  | port  |  time_start_us   | ping_success_time_us | ping_error
-----------+-------+------------------+----------------------+------------
 127.0.0.1 | 15433 | 1729871929021524 | 414                  |
 127.0.0.1 | 15432 | 1729871929021411 | 475                  |
 127.0.0.1 | 15432 | 1729871921021480 | 446                  |
 127.0.0.1 | 15433 | 1729871921021392 | 447                  |
 127.0.0.1 | 15433 | 1729871913021424 | 419                  |
(5 rows)

radmin=# SELECT * FROM monitor.pgsql_server_connect_log ORDER BY time_start_us DESC LIMIT 5;
 hostname  | port  |  time_start_us   | connect_success_time_us |                       connect_error
-----------+-------+------------------+-------------------------+-----------------------------------------------------------
 127.0.0.1 | 15433 | 1729871877019975 | 19320                   |
 127.0.0.1 | 15432 | 1729871877019836 | 16824                   |
 127.0.0.1 | 15433 | 1729871817019995 | 17701                   |
 127.0.0.1 | 15432 | 1729871817019536 | 21815                   |
 127.0.0.1 | 15433 | 1729871757019843 | 0                       | FATAL:  password authentication failed for user "monitor"
(5 rows)
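The time_start_us columns store microseconds since the Unix epoch. To eyeball when a given check ran, the value can be converted with standard shell tools — a small sketch assuming GNU date, using a timestamp copied from the connect log above:

```shell
# Convert a time_start_us value from the monitoring tables into a readable UTC timestamp.
ts_us=1729871877019975
date -u -d "@$(( ts_us / 1000000 ))" '+%Y-%m-%d %H:%M:%S'   # → 2024-10-25 15:57:57
```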

The new operations now show proper success times for both servers, and no errors. We can also revisit stats_pgsql_connection_pool and check that Latency_us is now being updated thanks to the ping checks:

radmin=# SELECT * FROM stats_pgsql_connection_pool;
 hostgroup | srv_host  | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us
-----------+-----------+----------+--------+----------+----------+--------+---------+-------------+---------+-----------------+-----------------+------------
 0         | 127.0.0.1 | 15432    | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0               | 0               | 434
 1         | 127.0.0.1 | 15433    | ONLINE | 0        | 0        | 0      | 0       | 0           | 0       | 0               | 0               | 427
(2 rows)

Let's change the last connection-creation config value, pgsql-monitor_dbname, and check that ProxySQL renews the monitoring connections, connecting against this database. First we check the current connections present in PostgreSQL:

postgres=# SELECT pid,datname,application_name,state FROM pg_stat_activity;
  pid   | datname  | application_name | state
--------+----------+------------------+--------
 203632 | postgres | psql             | active
 204663 | postgres | ProxySQL-Monitor | idle
 ...
(6 rows)

Now, we change the current config for ProxySQL:

radmin=# SET pgsql-monitor_dbname='proxymondb';
UPDATE 1
radmin=# LOAD PGSQL VARIABLES TO RUNTIME;
LOAD

New monitoring connections will use the new database, eventually replacing the previous ones:

postgres=# SELECT pid,datname,application_name,state FROM pg_stat_activity;
  pid   |  datname   | application_name | state
--------+------------+------------------+--------
 203632 | postgres   | psql             | active
 204712 | proxymondb | ProxySQL-Monitor | idle
 ...
(6 rows)

Ping Errors – Servers SHUNNING

Now that we have a healthy setup, let's create a source of errors and see how ProxySQL reacts to them:

# Prevent traffic from reaching the replica
sudo iptables -I OUTPUT 1 -p tcp -d 127.0.0.1 --dport 15433 -j DROP

Now let's check how the monitoring tables record this event:

radmin=# SELECT * FROM monitor.pgsql_server_ping_log ORDER BY time_start_us DESC LIMIT 6;
 hostname  | port  |  time_start_us   | ping_success_time_us |     ping_error
-----------+-------+------------------+----------------------+---------------------
 127.0.0.1 | 15433 | 1729947305791361 | 0                    | Operation timed out
 127.0.0.1 | 15432 | 1729947305791335 | 328                  |
 127.0.0.1 | 15433 | 1729947297791230 | 0                    | Operation timed out
 127.0.0.1 | 15432 | 1729947297791219 | 453                  |
 127.0.0.1 | 15433 | 1729947289791258 | 0                    | Operation timed out
 127.0.0.1 | 15432 | 1729947289791241 | 354                  |
(6 rows)

radmin=# SELECT * FROM monitor.pgsql_server_connect_log ORDER BY time_start_us DESC LIMIT 6;
 hostname  | port  |  time_start_us   | connect_success_time_us | connect_error
-----------+-------+------------------+-------------------------+---------------
 127.0.0.1 | 15432 | 1729947357115316 | 20614                   |
 127.0.0.1 | 15432 | 1729947297115226 | 20946                   |
 127.0.0.1 | 15432 | 1729947237115265 | 18975                   |
 127.0.0.1 | 15432 | 1729947177115203 | 15218                   |
 127.0.0.1 | 15432 | 1729947117114955 | 15655                   |
 127.0.0.1 | 15432 | 1729947057115016 | 17177                   |
(6 rows)

As we can see, ping operations are timing out, and connect operations have stopped. This is expected, since only responsive servers (servers which respond to ping) are considered for further monitoring actions. As always, we can resort to the Error Log to find the motivation for the actions taken:

proxysql.log:

...
2024-10-28 09:16:43 PgSQL_Monitor.cpp:1403:shunn_non_resp_srv(): [ERROR] Server 127.0.0.1:15433 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.
...

Now that the server is SHUNNED, ProxySQL will attempt to pick another server in the hostgroup when a client tries to send traffic to it. Since this is the only server in hostgroup 1, ProxySQL will make a last attempt to wake the server in a best effort to serve the traffic. Let's verify this process with the help of the Error Log:

PGPASSWORD='postgres' psql -h127.0.0.1 -p6133 -Upostgres
psql: error: connection to server at "127.0.0.1", port 6133 failed: FATAL:  User not found
connection to server at "127.0.0.1", port 6133 failed: FATAL:  User not found

It looks like we first need to configure the correct users in ProxySQL, so let's do that:

radmin=# SHOW CREATE TABLE pgsql_users;
    table    |                                         Create Table
-------------+-----------------------------------------------------------------------------------------------
 pgsql_users | CREATE TABLE pgsql_users (                                                                   +
             |     username VARCHAR NOT NULL,                                                               +
             |     password VARCHAR,                                                                        +
             |     active INT CHECK (active IN (0,1)) NOT NULL DEFAULT 1,                                   +
             |     use_ssl INT CHECK (use_ssl IN (0,1)) NOT NULL DEFAULT 0,                                 +
             |     default_hostgroup INT NOT NULL DEFAULT 0,                                                +
             |     transaction_persistent INT CHECK (transaction_persistent IN (0,1)) NOT NULL DEFAULT 1,   +
             |     fast_forward INT CHECK (fast_forward IN (0,1)) NOT NULL DEFAULT 0,                       +
             |     backend INT CHECK (backend IN (0,1)) NOT NULL DEFAULT 1,                                 +
             |     frontend INT CHECK (frontend IN (0,1)) NOT NULL DEFAULT 1,                               +
             |     max_connections INT CHECK (max_connections >=0) NOT NULL DEFAULT 10000,                  +
             |     attributes VARCHAR CHECK (JSON_VALID(attributes) OR attributes = '') NOT NULL DEFAULT '',+
             |     comment VARCHAR NOT NULL DEFAULT '',                                                     +
             |     PRIMARY KEY (username, backend),                                                         +
             |     UNIQUE (username, frontend))
(1 row)

radmin=# INSERT INTO pgsql_users (username,password,default_hostgroup) VALUES ('postgres','postgres',0);
INSERT 0 1
radmin=# LOAD PGSQL USERS TO RUNTIME;
LOAD

We should be able to log into the PostgreSQL interface and perform a query:

PGPASSWORD='postgres' psql -h127.0.0.1 -p6133 -Upostgres
psql (16.3, server 16.1)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=> /* hostgroup=1 */ DO $$ BEGIN END $$;
ERROR:  Max connect timeout reached while reaching hostgroup 1 after 10000ms
postgres=>

In the previous query we are forcing hostgroup 1 as the destination, instead of letting ProxySQL route it to the default hostgroup 0. In the Error Log we can see ProxySQL's attempt to find a healthy server in the hostgroup and, in the absence of one, its attempt to create a connection against the currently SHUNNED server; this will be retried until either pgsql-connect_retries_on_failure or pgsql-connect_timeout_server_max is exceeded:

proxysql.log:

2024-10-28 09:18:04 PgSQL_HostGroups_Manager.cpp:1994:get_random_MySrvC(): [ERROR] Hostgroup 1 has no servers available! Checking servers shunned for more than 1 second
2024-10-28 09:18:08 PgSQL_Connection.cpp:1693:handler(): [ERROR] Connect timeout on 127.0.0.1:15433 : exceeded by 357us
2024-10-28 09:18:11 PgSQL_Connection.cpp:1693:handler(): [ERROR] Connect timeout on 127.0.0.1:15433 : exceeded by 447us
2024-10-28 09:18:11 PgSQL_Monitor.cpp:1403:shunn_non_resp_srv(): [ERROR] Server 127.0.0.1:15433 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.
2024-10-28 09:18:14 PgSQL_Connection.cpp:1693:handler(): [ERROR] Connect timeout on 127.0.0.1:15433 : exceeded by 296us
2024-10-28 09:18:14 PgSQL_HostGroups_Manager.cpp:1994:get_random_MySrvC(): [ERROR] Hostgroup 1 has no servers available! Checking servers shunned for more than 1 second
2024-10-28 09:18:14 PgSQL_Session.cpp:1977:handler_again___status_CONNECTING_SERVER(): [ERROR] Max connect timeout reached while reaching hostgroup 1 after 10000ms . HG status: [{"Bytes_recv":"0","Bytes_sent":"0","ConnERR":"3","ConnFree":"0","ConnOK":"0","ConnUsed":"0","Latency_us":"0","MaxConnUsed":"1","Queries":"0","hostgroup":"1","srv_host":"127.0.0.1","srv_port":"15433","status":"SHUNNED"}]
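Both thresholds are regular ProxySQL variables, so they can be inspected and tuned from Admin like any other. A sketch of how that might look (the SELECT is illustrative; the SET and LOAD commands mirror the ones shown in the log excerpt below):

```sql
-- Inspect the connection-retry thresholds.
SELECT * FROM global_variables
WHERE variable_name IN ('pgsql-connect_retries_on_failure',
                        'pgsql-connect_timeout_server_max');

-- Lower the retry count, then promote the change to runtime.
SET pgsql-connect_retries_on_failure = 1;
LOAD PGSQL VARIABLES TO RUNTIME;
```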

We can modify the previous behavior by, for example, changing pgsql-connect_retries_on_failure: setting its value to 1 will result in a different error on the client side, and a different report in the ProxySQL Error Log:

postgres=> /* hostgroup=1 */ DO $$ BEGIN END $$;
ERROR:  Max connect failure while reaching hostgroup 1

proxysql.log:

2024-10-28 09:39:59 Admin_Handler.cpp:774:admin_handler_command_set(): [INFO] Received command SET pgsql-connect_retries_on_failure=1
2024-10-28 09:40:01 Admin_Handler.cpp:281:is_admin_command_or_alias(): [INFO] Received LOAD PGSQL VARIABLES TO RUNTIME command
2024-10-28 09:40:12 PgSQL_HostGroups_Manager.cpp:1994:get_random_MySrvC(): [ERROR] Hostgroup 1 has no servers available! Checking servers shunned for more than 1 second
2024-10-28 09:40:15 PgSQL_Connection.cpp:1693:handler(): [ERROR] Connect timeout on 127.0.0.1:15433 : exceeded by 1050us
2024-10-28 09:40:18 PgSQL_Connection.cpp:1693:handler(): [ERROR] Connect timeout on 127.0.0.1:15433 : exceeded by 335us
2024-10-28 09:40:18 PgSQL_Monitor.cpp:1403:shunn_non_resp_srv(): [ERROR] Server 127.0.0.1:15433 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.

Let's verify the servers' status by querying runtime_pgsql_servers; we find the server still SHUNNED:

radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |  comment
--------------+-----------+-------+---------+--------+-------------+-----------------+---------------------+---------+----------------+------------
 0            | 127.0.0.1 | 15432 | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 1            | 127.0.0.1 | 15433 | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
(2 rows)

If we remove our iptables rule, we should see that the monitoring tables stop recording errors, and that once this is detected by ProxySQL, the server will be picked without issues when traffic is redirected to its hostgroup:

sudo iptables -D OUTPUT 1  # the rule was inserted into the default (filter) table, so no -t option is needed
radmin=# SELECT * FROM monitor.pgsql_server_ping_log ORDER BY time_start_us DESC LIMIT 6;
 hostname  | port  |  time_start_us   | ping_success_time_us |     ping_error
-----------+-------+------------------+----------------------+---------------------
 127.0.0.1 | 15432 | 1730105681958812 | 503                  |
 127.0.0.1 | 15433 | 1730105681958768 | 456                  |
 127.0.0.1 | 15432 | 1730105673958577 | 373                  |
 127.0.0.1 | 15433 | 1730105673958572 | 16043                |                             <-- Higher time points to connection creation; first ping after the iptables rule deletion.
 127.0.0.1 | 15432 | 1730105665958708 | 497                  |
 127.0.0.1 | 15433 | 1730105665958704 | 0                    | Operation timed out
(6 rows)

If we now check the runtime_pgsql_servers table, we will still see the server as SHUNNED; this is expected:

radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |  comment
--------------+-----------+-------+---------+--------+-------------+-----------------+---------------------+---------+----------------+------------
 0            | 127.0.0.1 | 15432 | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 1            | 127.0.0.1 | 15433 | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
(2 rows)

ProxySQL follows a lazy approach regarding server state modification: only when the server is detected as suitable for serving traffic, and picked for that purpose, will its status change back to ONLINE:

postgres=> /* hostgroup=1 */ DO $$ BEGIN END $$;
DO

Since the only server in hostgroup 1 is our previously SHUNNED server, this must be the server that was picked to serve the query. The server should have been brought back ONLINE and the Error Log should be clean:

radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |  comment
--------------+-----------+-------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------
 0            | 127.0.0.1 | 15432 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 1            | 127.0.0.1 | 15433 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
(2 rows)

radmin=# SELECT * FROM stats_pgsql_connection_pool;
 hostgroup | srv_host  | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us
-----------+-----------+----------+--------+----------+----------+--------+---------+-------------+---------+-----------------+-----------------+------------
 0         | 127.0.0.1 | 15432    | ONLINE | 0        | 1        | 1      | 0       | 1           | 6       | 243             | 393             | 482
 1         | 127.0.0.1 | 15433    | ONLINE | 0        | 1        | 1      | 12      | 1           | 1       | 43              | 14              | 437
(2 rows)

The stats also reflect our previously issued queries and connection errors.

Replication Hostgroups – Failover Handling

We have revisited how to set up server monitoring and how ProxySQL deals with monitoring errors. Now we will configure our replication setup in ProxySQL and improve our understanding of read-only checks. Let's start by enabling them, adding the required entries to pgsql_replication_hostgroups. Since no configuration is in place yet, our monitoring table for read_only should be empty:

radmin=# SELECT COUNT(*) FROM pgsql_replication_hostgroups;
 COUNT(*)
----------
 0
(1 row)

radmin=# SELECT COUNT(*) FROM monitor.pgsql_server_read_only_log;
 COUNT(*)
----------
 0
(1 row)

Let’s insert one entry and activate read-only monitoring:

-- $ PGPASSWORD='radmin' psql -h127.0.0.1 -p6132 -Uradmin < pg-mon-primary-replica/conf/proxysql/replication_hostgroups-single_primary_replica.sql

DELETE FROM pgsql_replication_hostgroups;
INSERT INTO
    pgsql_replication_hostgroups (writer_hostgroup, reader_hostgroup, check_type, comment)
VALUES
    (0, 1, 'read_only', 'pg-replication');

LOAD PGSQL SERVERS TO RUNTIME;
SAVE PGSQL SERVERS TO DISK;
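Before looking at the Error Log, the promoted configuration can also be verified from Admin. A sketch, assuming the same runtime_ table-name convention used by the other pgsql tables (runtime_pgsql_replication_hostgroups is inferred here, not shown elsewhere in this post):

```sql
-- Verify the replication hostgroup configuration was promoted to runtime.
SELECT writer_hostgroup, reader_hostgroup, check_type, comment
FROM runtime_pgsql_replication_hostgroups;
```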

Checking the Error Log we verify that our new replication_hostgroups config took effect:

proxysql.log:

...
2025-05-13 20:53:28 PgSQL_HostGroups_Manager.cpp:3673:read_only_action_v2(): [INFO] read_only_action_v2() detected RO=0 on server 127.0.0.1:15432 for the first time after commit(), but no need to reconfigure
...

As the Error Log states, there was no need for server reconfiguration after the read_only detection, since the servers were already placed in their correct hostgroups (the replica in hostgroup 1). Now we check that the read-only monitoring actions have started as expected:

radmin=# SELECT COUNT(*) FROM monitor.pgsql_server_read_only_log;
 COUNT(*)
----------
 18
(1 row)

radmin=# SELECT * FROM monitor.pgsql_server_read_only_log ORDER BY time_start_us DESC LIMIT 5;
 hostname  | port  |  time_start_us   | success_time_us | read_only | error
-----------+-------+------------------+-----------------+-----------+-------
 127.0.0.1 | 15433 | 1747162504265964 | 334             | 1         |
 127.0.0.1 | 15432 | 1747162504265879 | 361             | 0         |
 127.0.0.1 | 15433 | 1747162503265892 | 263             | 1         |
 127.0.0.1 | 15432 | 1747162503265783 | 282             | 0         |
 127.0.0.1 | 15432 | 1747162502265705 | 335             | 0         |
(5 rows)

We can confirm that the read_only values match what we expect for our primary and replica, and that the stats also reflect the success of these actions:

radmin=# SELECT * FROM stats_pgsql_global WHERE variable_name LIKE '%PgSQL_Monitor%';
           Variable_Name           | Variable_Value
-----------------------------------+----------------
 PgSQL_Monitor_connect_check_OK    | 12
 PgSQL_Monitor_connect_check_ERR   | 0
 PgSQL_Monitor_ping_check_OK       | 80
 PgSQL_Monitor_ping_check_ERR      | 12
 PgSQL_Monitor_read_only_check_OK  | 242
 PgSQL_Monitor_read_only_check_ERR | 0
(6 rows)
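As a rough cross-check of those counters: with pgsql-monitor_read_only_interval at its 1000 ms default and two servers being checked, 242 successful read-only checks correspond to about two minutes of monitoring. Back-of-the-envelope arithmetic only, not an exact accounting:

```shell
checks=242; servers=2; interval_ms=1000
# Approximate seconds of monitoring covered by the read_only counters.
echo $(( checks * interval_ms / servers / 1000 ))   # → 121
```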

Using the same technique as before, now that read-only checks are in place, we can check how ProxySQL deals with monitoring errors on our replica. For this we will add a simple iptables rule:

sudo iptables -I OUTPUT 1 -p tcp -d 127.0.0.1 --dport 15433 -j DROP

As always, actions taken on the servers will be reported in the Error Log:

proxysql.log:

2025-05-13 20:53:28 PgSQL_HostGroups_Manager.cpp:3673:read_only_action_v2(): [INFO] read_only_action_v2() detected RO=0 on server 127.0.0.1:15432 for the first time after commit(), but no need to reconfigure
2025-05-13 20:56:09 PgSQL_Monitor.cpp:1556:perf_readonly_actions(): [ERROR] Server 127.0.0.1:15433 missed 3 read_only checks. Assuming read_only=1
2025-05-13 20:56:10 PgSQL_Monitor.cpp:1556:perf_readonly_actions(): [ERROR] Server 127.0.0.1:15433 missed 3 read_only checks. Assuming read_only=1
2025-05-13 20:56:11 PgSQL_Monitor.cpp:1556:perf_readonly_actions(): [ERROR] Server 127.0.0.1:15433 missed 3 read_only checks. Assuming read_only=1
... // Error will be reported at each 'readonly' action until server is permanently shunned
2025-05-13 20:56:24 PgSQL_Monitor.cpp:1447:shunn_non_resp_srv(): [ERROR] Server 127.0.0.1:15433 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.

No further actions will be performed on the server until it is detected back as ONLINE. We turn to the Admin interface to check the monitoring info and server status:

radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |  comment
--------------+-----------+-------+---------+--------+-------------+-----------------+---------------------+---------+----------------+------------
 0            | 127.0.0.1 | 15432 | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 1            | 127.0.0.1 | 15433 | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
(2 rows)

radmin=# SELECT from_unixtime(time_start_us/1000/1000),* FROM monitor.pgsql_server_ping_log ORDER BY time_start_us DESC LIMIT 10;
 from_unixtime(time_start_us/1000/1000) | hostname  | port  |  time_start_us   | ping_success_time_us |     ping_error
----------------------------------------+-----------+-------+------------------+----------------------+---------------------
 2025-05-13 18:57:43                    | 127.0.0.1 | 15433 | 1747162663258491 | 0                    | Operation timed out
 2025-05-13 18:57:43                    | 127.0.0.1 | 15432 | 1747162663258470 | 279                  |
 2025-05-13 18:57:35                    | 127.0.0.1 | 15433 | 1747162655258380 | 0                    | Operation timed out
 2025-05-13 18:57:35                    | 127.0.0.1 | 15432 | 1747162655258354 | 266                  |
 2025-05-13 18:57:27                    | 127.0.0.1 | 15433 | 1747162647258313 | 0                    | Operation timed out
 2025-05-13 18:57:27                    | 127.0.0.1 | 15432 | 1747162647258262 | 288                  |
 2025-05-13 18:57:19                    | 127.0.0.1 | 15433 | 1747162639258129 | 0                    | Operation timed out
 2025-05-13 18:57:19                    | 127.0.0.1 | 15432 | 1747162639258095 | 227                  |
 2025-05-13 18:57:11                    | 127.0.0.1 | 15433 | 1747162631258417 | 0                    | Operation timed out
 2025-05-13 18:57:11                    | 127.0.0.1 | 15432 | 1747162631258409 | 301                  |
(10 rows)

Apparently, the only operation ProxySQL has performed is server SHUNNING, just as before, when we didn't have read-only monitoring configured. Yet the Error Log reports the following assumption:

proxysql.log:

...
2025-05-13 20:56:09 PgSQL_Monitor.cpp:1556:perf_readonly_actions(): [ERROR] Server 127.0.0.1:15433 missed 3 read_only checks. Assuming read_only=1
...

This assumption has consequences; we just don't see them because the server we were targeting was the replica, whose read_only value was already 1. Let's remove the rule for the replica and impose it on the primary to check this behavior again:

sudo iptables -D OUTPUT 1
sudo iptables -I OUTPUT 1 -p tcp -d 127.0.0.1 --dport 15432 -j DROP

NOTE: Remember that in the previous section we explained why the replica won't appear as ONLINE until it has received traffic again.

The Error Log will now report different operations over the servers:

proxysql.log:

2025-05-13 20:56:24 PgSQL_Monitor.cpp:1447:shunn_non_resp_srv(): [ERROR] Server 127.0.0.1:15433 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.
2025-05-13 20:58:34 PgSQL_Monitor.cpp:1556:perf_readonly_actions(): [ERROR] Server 127.0.0.1:15432 missed 3 read_only checks. Assuming read_only=1
2025-05-13 20:58:34 PgSQL_HostGroups_Manager.cpp:3701:read_only_action_v2(): [INFO] Server '127.0.0.1:15432' found with 'read_only=1', but not found as reader
2025-05-13 20:58:34 PgSQL_HostGroups_Manager.cpp:4588:insert_HGM(): [INFO] Creating new server in HG 1 : 127.0.0.1:15432 , weight=1, status=0
2025-05-13 20:58:34 PgSQL_HostGroups_Manager.cpp:4603:remove_HGM(): [WARNING] Removed server at address 0x50e000102860, hostgroup 0, address 127.0.0.1 port 15432. Setting status OFFLINE HARD and immediately dropping all free connections. Used connections will be dropped when trying to use them
2025-05-13 20:58:34 PgSQL_HostGroups_Manager.cpp:3709:read_only_action_v2(): [INFO] Regenerating table 'pgsql_servers' due to actions on server '127.0.0.1:15432'
2025-05-13 20:58:34 PgSQL_HostGroups_Manager.cpp:3736:read_only_action_v2(): [INFO] Checksum for table pgsql_servers is 0xC35A6D6EDD68A64C
2025-05-13 20:58:35 PgSQL_Monitor.cpp:1556:perf_readonly_actions(): [ERROR] Server 127.0.0.1:15432 missed 3 read_only checks. Assuming read_only=1
2025-05-13 20:58:36 PgSQL_Monitor.cpp:1556:perf_readonly_actions(): [ERROR] Server 127.0.0.1:15432 missed 3 read_only checks. Assuming read_only=1
2025-05-13 20:58:37 PgSQL_Monitor.cpp:1556:perf_readonly_actions(): [ERROR] Server 127.0.0.1:15432 missed 3 read_only checks. Assuming read_only=1
... // Error will be reported at each 'readonly' action until server is permanently shunned
2025-05-13 20:58:48 PgSQL_Monitor.cpp:1447:shunn_non_resp_srv(): [ERROR] Server 127.0.0.1:15432 missed 3 heartbeats, shunning it and killing all the connections. Disabling other checks until the node comes back online.

The assumption of read_only=1 leads to:

  1. Server being removed from the writer_hostgroup, hostgroup 0.
  2. Server being added to reader_hostgroup, hostgroup 1.

This mechanism is a protection ProxySQL offers: it prevents serving traffic to a server whose read_only value is unknown because it is failing to respond to the read_only checks. The threshold for these actions is controlled by pgsql-monitor_read_only_max_timeout_count, which defaults to 3.
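If the default of 3 missed checks is too aggressive for an unreliable network, the threshold can be raised through the Admin interface. A sketch using the standard variable workflow (the value of 5 is illustrative, not a recommendation):

```sql
-- Inspect the current threshold (default: 3 missed read_only checks)
SELECT * FROM global_variables
WHERE variable_name='pgsql-monitor_read_only_max_timeout_count';

-- Illustrative change: tolerate 5 missed checks before assuming read_only=1
UPDATE global_variables SET variable_value='5'
WHERE variable_name='pgsql-monitor_read_only_max_timeout_count';
LOAD PGSQL VARIABLES TO RUNTIME;
SAVE PGSQL VARIABLES TO DISK;
```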

Let’s now remove our iptables rule and recheck the Error Log:

proxysql.log:

2025-05-13 21:00:55 PgSQL_HostGroups_Manager.cpp:3639:read_only_action_v2(): [INFO] Server '127.0.0.1:15432' found with 'read_only=0', but not found as writer
2025-05-13 21:00:55 PgSQL_HostGroups_Manager.cpp:4588:insert_HGM(): [INFO] Creating new server in HG 0 : 127.0.0.1:15432 , weight=1, status=0
2025-05-13 21:00:55 PgSQL_HostGroups_Manager.cpp:3649:read_only_action_v2(): [INFO] Regenerating table 'pgsql_servers' due to actions on server '127.0.0.1:15432'
2025-05-13 21:00:55 PgSQL_HostGroups_Manager.cpp:3736:read_only_action_v2(): [INFO] Checksum for table pgsql_servers is 0x3D4560A9FDED397C

Our server has now been detected again as a writer (read_only=0), so a new server entry has been created for it in hostgroup 0. We can see how this is reflected in runtime_pgsql_servers:

radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |  comment
--------------+-----------+-------+---------+--------+-------------+-----------------+---------------------+---------+----------------+------------
 0            | 127.0.0.1 | 15432 | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 1            | 127.0.0.1 | 15432 | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 1            | 127.0.0.1 | 15433 | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
(3 rows)

The server is now present in both hostgroups, 0 and 1. This is a consequence of the setting pgsql-monitor_writer_is_also_reader, which defaults to true. This setting is only considered when a node (server) is detected to change its read_only value from 1 to 0. The reason is that this is the same scenario we encounter during a failover: a server that previously had a read_only value of 1 (replica) changes its read_only value to 0 while being promoted as the new primary.

This setting lets us control whether, during this promotion, the new primary should be placed in the writer_hostgroup only, or be duplicated from the reader_hostgroup into the writer_hostgroup.
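If we would rather have a promoted primary serve writes only, the duplication can be disabled. A sketch using the standard variable workflow:

```sql
-- Keep a newly promoted primary out of the reader hostgroup
UPDATE global_variables SET variable_value='false'
WHERE variable_name='pgsql-monitor_writer_is_also_reader';
LOAD PGSQL VARIABLES TO RUNTIME;
SAVE PGSQL VARIABLES TO DISK;
```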

When we think about server placement between our defined hostgroups for read_only, we should keep in mind that ProxySQL will at first assume that the read_only value corresponds to the placement we are performing, and that the previously described read_only actions will take place when further changes to these read_only values are detected.

This means that if we manually place our primary only in the reader_hostgroup and reload our server configuration, we will also arrive at the previous status for runtime_pgsql_servers. ProxySQL will assume that a read_only transition has taken place on the server (1 to 0) and will place it in both hostgroups:

radmin=# SELECT hostgroup_id,hostname,port,status,weight,comment FROM pgsql_servers;
 hostgroup_id | hostname  | port  | status | weight |  comment
--------------+-----------+-------+--------+--------+------------
 0            | 127.0.0.1 | 15432 | ONLINE | 1      | pg-primary
 1            | 127.0.0.1 | 15433 | ONLINE | 1      | pg-replica
(2 rows)

radmin=# UPDATE pgsql_servers SET hostgroup_id=1;
UPDATE 2
radmin=# LOAD PGSQL SERVERS TO RUNTIME;
LOAD

radmin=# SELECT hostgroup_id,hostname,port,status,weight,comment FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status | weight |  comment
--------------+-----------+-------+--------+--------+------------
 0            | 127.0.0.1 | 15432 | ONLINE | 1      | pg-primary
 1            | 127.0.0.1 | 15432 | ONLINE | 1      | pg-primary
 1            | 127.0.0.1 | 15433 | ONLINE | 1      | pg-replica
(3 rows)

proxysql.log:

2025-05-13 21:03:50 Admin_Handler.cpp:284:is_admin_command_or_alias(): [INFO] Received LOAD PGSQL SERVERS TO RUNTIME command
... // Checksum computations and table rebuilding
2025-05-13 21:03:50 PgSQL_HostGroups_Manager.cpp:1464:commit(): [INFO] PgSQL_HostGroups_Manager::commit() locked for 3ms
2025-05-13 21:03:51 PgSQL_HostGroups_Manager.cpp:3639:read_only_action_v2(): [INFO] Server '127.0.0.1:15432' found with 'read_only=0', but not found as writer
2025-05-13 21:03:51 PgSQL_HostGroups_Manager.cpp:4571:insert_HGM(): [INFO] Found server node in Host Group Container 127.0.0.1:15432 as 'OFFLINE_HARD', setting back as 'ONLINE' with: hostgroup_id=0, weight=1, compression=0, max_connections=1000, use_ssl=0, max_replication_lag=0, max_latency_ms=0, comment=pg-primary
2025-05-13 21:03:51 PgSQL_HostGroups_Manager.cpp:3649:read_only_action_v2(): [INFO] Regenerating table 'pgsql_servers' due to actions on server '127.0.0.1:15432'
2025-05-13 21:03:51 PgSQL_HostGroups_Manager.cpp:3736:read_only_action_v2(): [INFO] Checksum for table pgsql_servers is 0x3D4560A9FDED397C
2025-05-13 21:03:52 PgSQL_HostGroups_Manager.cpp:3673:read_only_action_v2(): [INFO] read_only_action_v2() detected RO=0 on server 127.0.0.1:15432 for the first time after commit(), but no need to reconfigure

With this knowledge, and clear expectations about ProxySQL's behavior, we are now ready to simulate our first failover. Let's revert this and continue with our initial server configuration:

radmin=# LOAD PGSQL SERVERS FROM DISK;
LOAD
radmin=# LOAD PGSQL SERVERS TO RUNTIME;
LOAD
radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |  comment
--------------+-----------+-------+--------+--------+-------------+-----------------+---------------------+---------+----------------+------------
 0            | 127.0.0.1 | 15432 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 1            | 127.0.0.1 | 15433 | ONLINE | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
(2 rows)

We need to set up a database for sysbench; we will do this by connecting through ProxySQL:

PGPASSWORD='postgres' psql -h127.0.0.1 -p6133 -Upostgres -dpostgres
psql (16.3, server 16.1)
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=> CREATE DATABASE sysbench;
CREATE DATABASE

We also need to prepare the tables sysbench will use, again routing this traffic through ProxySQL:

sysbench --db-driver=pgsql --db-ps-mode=disable --pgsql-user=postgres --pgsql_password=postgres --pgsql-db=sysbench --pgsql-host=127.0.0.1 --pgsql-port=6133 --rate=1 --time=0 --tables=10 --table-size=1000 --threads=16 --report-interval=1 --pgsql-sslmode="disable" oltp_read_only prepare

To make the exercise more complete, let's create some simple query rules that redirect part of our sysbench traffic to the read-only hostgroup, so it only targets our replica. For this, let's briefly run sysbench:

sysbench --db-driver=pgsql --db-ps-mode=disable --pgsql-user=postgres --pgsql_password=postgres --pgsql-db=sysbench --pgsql-host=127.0.0.1 --pgsql-port=6133 --rate=1000 --time=2 --tables=10 --table-size=1000 --threads=16 --report-interval=1 --pgsql-sslmode="disable" oltp_read_only run
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Target transaction rate: 1000/sec
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 1s ] thds: 16 tps: 1011.52 qps: 16227.21 (r/w/o: 14200.18/0.00/2027.03) lat (ms,95%): 18.28 err/s: 0.00 reconn/s: 0.00
[ 1s ] queue length: 0, concurrency: 3
[ 2s ] thds: 16 tps: 969.95 qps: 15563.23 (r/w/o: 13618.32/0.00/1944.90) lat (ms,95%): 12.98 err/s: 0.00 reconn/s: 0.00
[ 2s ] queue length: 1, concurrency: 8
SQL statistics:
...

After a couple of seconds, we should have gathered enough info inside ProxySQL to create the rules we are interested in:

radmin=# SELECT hostgroup,database,username,digest,digest_text,count_star FROM stats.stats_pgsql_query_digest LIMIT 10;
 hostgroup | database | username |       digest       |                            digest_text                             | count_star
-----------+----------+----------+--------------------+--------------------------------------------------------------------+------------
 0         | sysbench | postgres | 0x2890ffd6808a3149 | SELECT SUM(k) FROM sbtest9 WHERE id BETWEEN ? AND ?                | 195
 0         | sysbench | postgres | 0x2396d00b02545841 | SELECT SUM(k) FROM sbtest4 WHERE id BETWEEN ? AND ?                | 186
 0         | sysbench | postgres | 0x569d075505d85717 | SELECT c FROM sbtest6 WHERE id BETWEEN ? AND ? ORDER BY c          | 208
 0         | sysbench | postgres | 0x9d058b6f3bc2f754 | SELECT c FROM sbtest4 WHERE id=?                                   | 1990
 0         | sysbench | postgres | 0x578ba02c8a0c4542 | SELECT DISTINCT c FROM sbtest6 WHERE id BETWEEN ? AND ? ORDER BY c | 208
 0         | sysbench | postgres | 0x9af59b998a3688ed | SELECT c FROM sbtest2 WHERE id=?                                   | 1950
 0         | sysbench | postgres | 0x847cd40ba8ea5175 | SELECT DISTINCT c FROM sbtest5 WHERE id BETWEEN ? AND ? ORDER BY c | 215
 0         | sysbench | postgres | 0x4607134a412a3661 | SELECT c FROM sbtest3 WHERE id BETWEEN ? AND ?                     | 225
 0         | sysbench | postgres | 0x381aad21f4326865 | SELECT c FROM sbtest2 WHERE id BETWEEN ? AND ?                     | 206
 0         | sysbench | postgres | 0xd3830cc26b680e5  | SELECT c FROM sbtest4 WHERE id BETWEEN ? AND ? ORDER BY c          | 217
(10 rows)

As expected, all queries are currently routed to the default hostgroup 0. Let's take advantage of ProxySQL routing and redirect some read traffic to the replica, as we would do in a real scenario:

radmin=# INSERT INTO pgsql_query_rules (active,database,match_pattern,destination_hostgroup,apply) VALUES (1,'sysbench','SELECT c FROM sbtest*',1,1);
INSERT 0 1
radmin=# LOAD PGSQL QUERY RULES TO RUNTIME;
LOAD

If we run sysbench again, we will see that part of the traffic now reaches hostgroup 1; we should also verify that our query rules are being exercised:

sysbench:

sysbench --db-driver=pgsql --db-ps-mode=disable --skip-trx=1 --pgsql-user=postgres --pgsql_password=postgres --pgsql-db=sysbench --pgsql-host=127.0.0.1 --pgsql-port=6133 --rate=1000 --time=1 --tables=10 --table-size=1000 --threads=16 --report-interval=1 --pgsql-sslmode="disable" oltp_read_only run

Remember that it’s important to set --skip-trx, otherwise transactions will be started and the connection will be bound to the default hostgroup 0, preventing our query rules from redirecting the traffic to the correct hostgroups.

admin:

radmin=# SELECT * FROM stats_pgsql_query_rules;
 rule_id | hits
---------+-------
 1       | 11712
(1 row)

radmin=# SELECT * FROM stats_pgsql_connection_pool;
 hostgroup | srv_host  | srv_port | status | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us
-----------+-----------+----------+--------+----------+----------+--------+---------+-------------+---------+-----------------+-----------------+------------
 0         | 127.0.0.1 | 15432    | ONLINE | 0        | 17       | 17     | 0       | 16          | 34079   | 3426100         | 91216357        | 266
 1         | 127.0.0.1 | 15433    | ONLINE | 0        | 15       | 15     | 0       | 15          | 11712   | 510627          | 26156013        | 270
(2 rows)

We can now simulate traffic to both hostgroups using sysbench, and simulate a failover scenario to see how ProxySQL handles it. We are going to use a custom script that continuously relaunches sysbench, so we can measure how long it takes for traffic to recover. This is required for two main reasons:

  • sysbench fails to gracefully report an error when a connection to a server is lost; instead, it segfaults. This can be verified by connecting directly to PostgreSQL and manually killing one of the connections.
  • The sysbench driver for PostgreSQL doesn't offer an equivalent of --mysql-ignore-errors.

Our convenience script will continuously relaunch sysbench after it goes down due to the connection failure, and in this way measure the amount of time it took for ProxySQL to detect the failover and perform the server switch:

sysbench:

./pg-mon-primary-replica/scripts/sysbench-relaunch.sh
2025.05.14 08:50:27.536
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Target transaction rate: 1000/sec
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

[ 1s ] thds: 16 tps: 967.58 qps: 13588.00 (r/w/o: 13588.00/0.00/0.00) lat (ms,95%): 20.00 err/s: 0.00 reconn/s: 0.00
[ 1s ] queue length: 0, concurrency: 4
[ 2s ] thds: 16 tps: 995.06 qps: 13931.83 (r/w/o: 13931.83/0.00/0.00) lat (ms,95%): 11.65 err/s: 0.00 reconn/s: 0.00
[ 2s ] queue length: 0, concurrency: 8
[ 3s ] thds: 16 tps: 979.00 qps: 13690.06 (r/w/o: 13690.06/0.00/0.00) lat (ms,95%): 11.04 err/s: 0.00 reconn/s: 0.00
[ 3s ] queue length: 0, concurrency: 3
[ 4s ] thds: 16 tps: 1044.98 qps: 14622.68 (r/w/o: 14622.68/0.00/0.00) lat (ms,95%): 13.22 err/s: 0.00 reconn/s: 0.00
[ 4s ] queue length: 0, concurrency: 1
...

Perform the failover in the cluster:

failover:

$ date "+%Y.%m.%d %H:%M:%S.%3N"; docker stop pg-mon-primary-replica-pg_primary-1 && docker exec pg-mon-primary-replica-pg_replica-1 pg_ctl promote -D /var/lib/postgresql/data/
2025.05.14 08:50:46.702
pg-mon-primary-replica-pg_primary-1
waiting for server to promote.... done
server promoted

Back in our script, we might see sysbench crashing and being respawned several times. This is due to the previously mentioned invalid handling of broken (disconnected) connections on the sysbench side, and the fact that ProxySQL holds a connection pool for serving traffic. If sysbench were able to handle this correctly, we would just see a couple of error lines with a message like "server closed the connection unexpectedly". We are interested in the latest timestamp before sysbench was able to serve traffic again:

./pg-mon-primary-replica/scripts/sysbench-relaunch.sh: line 7: 1470640 Segmentation fault      (core dumped) sysbench --db-driver=pgsql --db-ps-mode=disable --skip-trx=1 --pgsql-user=postgres --pgsql_password=postgres --pgsql-db=sysbench --pgsql-host=127.0.0.1 --pgsql-port=6133 --rate=1000 --time=0 --tables=10 --table-size=1000 --threads=16 --report-interval=1 --pgsql-sslmode="disable" oltp_read_only run
2025.05.14 08:50:46.980

... // Initialization again

./pg-mon-primary-replica/scripts/sysbench-relaunch.sh: line 7: 1471054 Segmentation fault      (core dumped) sysbench --db-driver=pgsql --db-ps-mode=disable --skip-trx=1 --pgsql-user=postgres --pgsql_password=postgres --pgsql-db=sysbench --pgsql-host=127.0.0.1 --pgsql-port=6133 --rate=1000 --time=0 --tables=10 --table-size=1000 --threads=16 --report-interval=1 --pgsql-sslmode="disable" oltp_read_only run
2025.05.14 08:50:47.701
WARNING: Both event and time limits are disabled, running an endless test
sysbench 1.1.0-de18a03 (using bundled LuaJIT 2.1.0-beta3)

Running the test with following options:
Number of threads: 16
Target transaction rate: 1000/sec
Report intermediate results every 1 second(s)
Initializing random number generator from current time


Initializing worker threads...

Threads started!

... // Serving traffic again

[ 1s ] thds: 16 tps: 1021.40 qps: 14360.45 (r/w/o: 14360.45/0.00/0.00) lat (ms,95%): 22.69 err/s: 0.00 reconn/s: 0.00
[ 1s ] queue length: 0, concurrency: 3
[ 2s ] thds: 16 tps: 962.11 qps: 13447.52 (r/w/o: 13447.52/0.00/0.00) lat (ms,95%): 12.98 err/s: 0.00 reconn/s: 0.00
[ 2s ] queue length: 0, concurrency: 4
[ 3s ] thds: 16 tps: 1040.01 qps: 14541.14 (r/w/o: 14541.14/0.00/0.00) lat (ms,95%): 12.30 err/s: 0.00 reconn/s: 0.00
[ 3s ] queue length: 0, concurrency: 4
...

Using the previous timestamps, we can check how fast traffic recovered after the failover:

2025.05.14 08:50:46.980 // We triggered the failover
2025.05.14 08:50:47.701 // Sysbench was already serving traffic again
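The gap can be computed directly from those two timestamps, for example with GNU date (parsing fractional seconds via %N is a GNU extension, so this assumes GNU coreutils):

```shell
# Recovery window between triggering the failover and sysbench serving again
t0=$(date -d "2025-05-14 08:50:46.980" +%s%3N)
t1=$(date -d "2025-05-14 08:50:47.701" +%s%3N)
echo "$((t1 - t0)) ms"   # → 721 ms
```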

In this case, less than one second after we triggered the failover, ProxySQL had reconfigured the servers using the new read-only values and was able to resume serving traffic. Let's take a look at the events registered in the Error Log, studying the log in parts:

proxysql.log:

// Errors take place at the same time the failover is triggered
2025-05-14 08:50:46 PgSQL_Connection.cpp:528:handler(): [ERROR] Error: [57P01] terminating connection due to administrator command, Multi-Statement: 0
2025-05-14 08:50:46 PgSQL_Session.cpp:2868:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,127.0.0.1,15432,0) , FD (Conn:94 , MyDS:94) , user postgres , last_used 9ms ago : 57P01, terminating connection due to administrator command
2025-05-14 08:50:46 PgSQL_Session.cpp:2881:handler_minus1_ClientLibraryError(): [WARNING] Retrying query.
...
2025-05-14 08:50:46 PgSQL_Connection.cpp:528:handler(): [ERROR] Error: [57P01] terminating connection due to administrator command, Multi-Statement: 0
2025-05-14 08:50:46 PgSQL_Session.cpp:2868:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,127.0.0.1,15432,0) , FD (Conn:91 , MyDS:91) , user postgres , last_used 9ms ago : 57P01, terminating connection due to administrator command
2025-05-14 08:50:46 PgSQL_Thread.cpp:3725:process_all_sessions(): [WARNING] Closing unhealthy client connection 127.0.0.1:56936
2025-05-14 08:50:46 PgSQL_Thread.cpp:3725:process_all_sessions(): [WARNING] Closing unhealthy client connection 127.0.0.1:56942
...
... // Previous Monitoring checks fails due to server shutdown
2025-05-14 08:50:47 PgSQL_Monitor.cpp:589:handle_async_check_cont(): [ERROR] Monitor readonly failed   addr='127.0.0.1:15432' status=7 error='FATAL:  terminating connection due to administrator command'
... // Monitor detects the status change (read_only 1 -> 0) in the replica and promotes it as a new writer
2025-05-14 08:50:47 PgSQL_HostGroups_Manager.cpp:3639:read_only_action_v2(): [INFO] Server '127.0.0.1:15433' found with 'read_only=0', but not found as writer
2025-05-14 08:50:47 PgSQL_HostGroups_Manager.cpp:4588:insert_HGM(): [INFO] Creating new server in HG 0 : 127.0.0.1:15433 , weight=1, status=0
2025-05-14 08:50:47 PgSQL_HostGroups_Manager.cpp:3649:read_only_action_v2(): [INFO] Regenerating table 'pgsql_servers' due to actions on server '127.0.0.1:15433'
2025-05-14 08:50:47 PgSQL_HostGroups_Manager.cpp:3736:read_only_action_v2(): [INFO] Checksum for table pgsql_servers is 0xF51E868B7F7B444B
2025-05-14 08:50:47 PgSQL_Connection.cpp:528:handler(): [ERROR] Error: [57P01] terminating connection due to administrator command, Multi-Statement: 0
2025-05-14 08:50:47 PgSQL_Session.cpp:2868:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,127.0.0.1,15432,0) , FD (Conn:97 , MyDS:97) , user postgres , last_used 430ms ago : 57P01, terminating connection due to administrator command
2025-05-14 08:50:47 PgSQL_Session.cpp:2881:handler_minus1_ClientLibraryError(): [WARNING] Retrying query.
2025-05-14 08:50:47 PgSQL_Connection.cpp:528:handler(): [ERROR] Error: [57P01] terminating connection due to administrator command, Multi-Statement: 0
2025-05-14 08:50:47 PgSQL_Session.cpp:2868:handler_minus1_ClientLibraryError(): [ERROR] Detected a broken connection while running query on (0,127.0.0.1,15432,0) , FD (Conn:95 , MyDS:95) , user postgres , last_used 457ms ago : 57P01, terminating connection due to administrator command
... // Query retrials due to the sudden server shutdown still ongoing
2025-05-14 08:50:47 PgSQL_HostGroups_Manager.cpp:266:connect_error(): [ERROR] Shunning server 127.0.0.1:15432 with 5 errors/sec. Shunning for 10 seconds
... // Monitor fails to connect to the old primary
2025-05-14 08:50:48 PgSQL_Monitor.cpp:642:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='Connection refused  Is the server running on that host and accepting TCP/IP connections?'
2025-05-14 08:50:48 PgSQL_HostGroups_Manager.cpp:3673:read_only_action_v2(): [INFO] read_only_action_v2() detected RO=0 on server 127.0.0.1:15433 for the first time after commit(), but no need to reconfigure
... // Monitor keeps attempting the connections to the server at ping intervals and reporting the errors, since the old primary is still down
2025-05-14 08:50:49 PgSQL_Monitor.cpp:642:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='Connection refused  Is the server running on that host and accepting TCP/IP connections?'
2025-05-14 08:50:49 PgSQL_Monitor.cpp:642:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='Connection refused  Is the server running on that host and accepting TCP/IP connections?'
... // Server gets SHUNNED by the number of errors/sec
2025-05-14 08:50:57 PgSQL_HostGroups_Manager.cpp:266:connect_error(): [ERROR] Shunning server 127.0.0.1:15432 with 5 errors/sec. Shunning for 10 seconds
2025-05-14 08:50:57 PgSQL_Monitor.cpp:642:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='Connection refused  Is the server running on that host and accepting TCP/IP connections?'
2025-05-14 08:50:57 PgSQL_Monitor.cpp:642:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='Connection refused  Is the server running on that host and accepting TCP/IP connections?'
2025-05-14 08:50:58 PgSQL_Monitor.cpp:642:handle_async_connect_cont(): [ERROR] Monitor connect failed   addr='127.0.0.1:15432' error='Connection refused  Is the server running on that host and accepting TCP/IP connections?'
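The "Shunning server ... with 5 errors/sec. Shunning for 10 seconds" entries come from the connection-error protection, which is distinct from the monitor shunning we saw earlier. These thresholds should be tunable via global variables; rather than guessing the exact PostgreSQL-module names (the MySQL module uses mysql-shun_on_failures and mysql-shun_recovery_time_sec), we can locate them from Admin:

```sql
-- Find the connection-error shunning thresholds for this ProxySQL version
SELECT * FROM global_variables
WHERE variable_name LIKE '%shun%';
```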

Let’s check the server status and hostgroup distribution, and compare it with what we previously learned about servers placement regarding read-only monitoring:

radmin=# SELECT * FROM runtime_pgsql_servers;
 hostgroup_id | hostname  | port  | status  | weight | compression | max_connections | max_replication_lag | use_ssl | max_latency_ms |  comment
--------------+-----------+-------+---------+--------+-------------+-----------------+---------------------+---------+----------------+------------
 0            | 127.0.0.1 | 15432 | SHUNNED | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-primary
 0            | 127.0.0.1 | 15433 | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
 1            | 127.0.0.1 | 15433 | ONLINE  | 1      | 0           | 1000            | 0                   | 0       | 0              | pg-replica
(3 rows)

radmin=# SELECT * FROM stats_pgsql_connection_pool;
 hostgroup | srv_host  | srv_port | status  | ConnUsed | ConnFree | ConnOK | ConnERR | MaxConnUsed | Queries | Bytes_data_sent | Bytes_data_recv | Latency_us
-----------+-----------+----------+---------+----------+----------+--------+---------+-------------+---------+-----------------+-----------------+------------
 0         | 127.0.0.1 | 15432    | SHUNNED | 0        | 0        | 17     | 10      | 16          | 72424   | 6056378         | 331843637       | 137
 0         | 127.0.0.1 | 15433    | ONLINE  | 0        | 8        | 8      | 0       | 8           | 21624   | 1483251         | 136005016       | 202
 1         | 127.0.0.1 | 15433    | ONLINE  | 0        | 16       | 16     | 0       | 16          | 371640  | 16200232        | 829889526       | 202
(3 rows)

The previous primary (pg-primary) has been SHUNNED due to the continuous connection errors it experienced, and has been left in this state since monitoring is still unable to connect. Meanwhile, the previous replica (pg-replica) has been moved into both hostgroups (0 and 1), since it is no longer acting as a replica and the transition in its read-only value (from 1 to 0) has been detected by ProxySQL. As we previously demonstrated, when this transition is detected and pgsql-monitor_writer_is_also_reader is true, the server is placed in both the reader and writer hostgroups.

We have achieved a successful failover 🎉! The server statuses have been updated and traffic is being properly routed for both hostgroups 0 and 1 towards the new primary (the old replica). Our simulated unplanned failover was detected and contained, resulting in less than one second of traffic interruption 🎊 🎊 🎉🎉!
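From here, a typical follow-up (not covered in this walkthrough) would be to rebuild the old primary as a replica of the new one, and then return it to the reader hostgroup through Admin. A sketch, assuming the same hostgroup layout used throughout:

```sql
-- Hypothetical re-integration step: once 127.0.0.1:15432 is rebuilt as a
-- replica, place it in the reader hostgroup (1) and persist the change.
UPDATE pgsql_servers SET hostgroup_id=1
WHERE hostname='127.0.0.1' AND port=15432;
LOAD PGSQL SERVERS TO RUNTIME;
SAVE PGSQL SERVERS TO DISK;
```

Read-only monitoring will then keep it in hostgroup 1 for as long as it reports read_only=1.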

Conclusion

We have demonstrated the power and flexibility of ProxySQL in monitoring and managing PostgreSQL deployments. We've explored how to configure ProxySQL for basic monitoring, handle ping errors, and leverage read-only monitoring for replication setups. We even successfully simulated an unplanned failover, showcasing ProxySQL's ability to automatically detect the failure and redirect traffic to the new primary with minimal interruption.

Hopefully, you are now more familiar with many ProxySQL concepts and utilities to apply to your solutions, and as always, happy ProxySQLing!

The post PostgreSQL Monitoring with ProxySQL appeared first on ProxySQL.

Releasing ProxySQL v3.0.2! https://proxysql.com/blog/releasing-proxysql-v3-0-2/ https://proxysql.com/blog/releasing-proxysql-v3-0-2/#respond Mon, 25 Aug 2025 19:03:54 +0000 https://proxysql.com/?p=119343 We are proud to announce the latest release of ProxySQL version 3.0.2 ProxySQL is a high performance, high availability, protocol aware proxy for MySQL and PostgreSQL, with a GPL license! It is freely usable and accessible according to the GNU GPL v3.0 license. It can be downloaded from the ProxySQL Repository or for a Docker [...]

The post Releasing ProxySQL v3.0.2! appeared first on ProxySQL.

We are proud to announce the latest release of ProxySQL, version 3.0.2!

ProxySQL is a high performance, high availability, protocol aware proxy for MySQL and PostgreSQL, with a GPL license! It is freely usable and accessible according to the GNU GPL v3.0 license. It can be downloaded from the ProxySQL Repository or for a Docker image check out the Official ProxySQL Docker Repository.

Release Overview

ProxySQL v3.0.2 is a minor release for the 3.0 series containing some enhancements and several bug fixes. In this version, the MySQL module is considered stable, while the PostgreSQL module is still in beta and is undergoing active development.

Be sure to try out the ProxySQL 3.0.2 release and let us know your experience!

Release Highlights

As always, we strongly recommend reading the full release notes for ProxySQL 3.0.2. While we make every effort to ensure backward compatibility during minor releases to facilitate a smooth update process, there may be exceptions that require manual checks. These changes may arise from necessary improvements or updates to third-party dependencies.

Here are some of the key highlights of this release:

  • PostgreSQL:
    • Connection Reuse Enhancements: Important fixes on connection reset for connection pooling.
    • Improved Error Reporting (Backend): Improved processing and report format for errors on backend connections (‘libpq’).
  • MySQL:
    • Extended Group Replication Support: Added Group Replication support for MySQL 8.4 and 9.x.
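
For context, Group Replication topologies are described to ProxySQL through the mysql_group_replication_hostgroups table on the admin interface. A minimal sketch (all hostgroup IDs and limits below are illustrative, not recommendations):

```sql
-- Illustrative hostgroup IDs; adjust to your topology.
INSERT INTO mysql_group_replication_hostgroups
  (writer_hostgroup, backup_writer_hostgroup, reader_hostgroup, offline_hostgroup,
   active, max_writers, writer_is_also_reader, max_transactions_behind)
VALUES (10, 20, 30, 40, 1, 1, 0, 100);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```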

A special thanks to all the people who report bugs: this makes each version of ProxySQL better than the previous one. Please report any bugs or feature requests on the project’s GitHub issue tracker.

If you have any questions, please do not hesitate to contact us. Our performance and scalability experts are readily available to evaluate the overall state of your database infrastructure and help build a robust and reliable MySQL HA architecture.

We also offer consulting, long-term support, and training for ProxySQL & MySQL environments.

Enhancing Your Database Connection Pooling for Improved Performance https://proxysql.com/blog/database-connection-pool/ https://proxysql.com/blog/database-connection-pool/#respond Tue, 08 Apr 2025 11:32:34 +0000 https://proxysql.com/?p=63881 Businesses often turn to caching solutions to improve database performance. It speeds up read and write operations by storing frequently accessed data in memory, significantly reducing network latency. However, you can improve performance even more by optimizing how your database manages user connections through connection pooling.  Connection pooling is essential in reducing the overhead [...]

The post Enhancing Your Database Connection Pooling for Improved Performance appeared first on ProxySQL.

Businesses often turn to caching solutions to improve database performance. Caching speeds up read and write operations by storing frequently accessed data in memory, significantly reducing latency. However, you can improve performance even more by optimizing how your database manages user connections through connection pooling.

Connection pooling is essential in reducing the overhead of repeatedly establishing and closing database connections. It minimizes resource consumption and ensures your database can handle spikes in traffic without bottlenecks by reusing active connections.      

This blog will guide you through the essentials of connection pooling, from understanding its purpose to implementing best practices. You’ll learn how to optimize configurations, monitor usage, and address common challenges to ensure your database performs at its best—even under heavy demand.      

What is Database Connection Pooling?  

Database connection pooling is a technique for managing database connections efficiently by reusing existing connections rather than opening new ones for each request. In regular database communication, each client request initiates a new connection to the database, which can create overhead in terms of time and resources. 

Connection pooling solves this by maintaining a pool of active connections that can be reused, reducing the need to open and close connections repeatedly. 

How Does Database Connection Pooling Work?   

Database connection pooling works in the following way:

  1. Initialization: When an application starts, a pool of database connections is created and stored. Connections, including those to primary and backup servers, are established only when required, ensuring efficient resource utilization.
  2. Request Handling: Instead of establishing a new connection every time a request is made, the application simply borrows an available connection from the pool.
  3. Recycling Connections: After a complete database operation, the connection is returned to the pool to be reused by other requests. This recycling of connections ensures minimal overhead. 

Benefits of Connection Pooling

Here are some of the most essential benefits of connection pooling:  

  • Performance: Connection pooling minimizes the time spent establishing new connections, resulting in faster database interactions. This is especially beneficial for applications handling frequent and concurrent database requests.
  • Resource Efficiency: Connection pooling reduces the strain on the database server by limiting the number of active connections, preventing resource exhaustion, and ensuring more efficient database utilization.
  • Scalability: As traffic grows, connection pooling ensures the application can handle more users and requests without causing significant delays or overloading the database. By reusing a limited number of connections, pooling helps maintain a smooth and responsive user experience even under heavy load.

In high-demand environments like e-commerce sites, online gaming, and real-time applications, connection pooling can significantly improve database performance, reduce latency, and optimize system resources. It is a fundamental component for any application that relies heavily on database interactions. 

Common Challenges in Connection Pooling

While database connection pooling offers significant performance and scalability benefits, several challenges must be considered when implementing or maintaining it. If not addressed, these challenges can impact the effectiveness of connection pooling and lead to issues such as resource wastage, bottlenecks, or degraded performance. 

1. Connection Leaks 

One of the most common problems in connection pooling is a connection leak, which occurs when an application fails to release a connection back to the pool after it has finished using it. This can cause the pool to eventually run out of available connections, leading to a system slowdown or failure to process requests. Connection leaks often happen due to programming errors, such as failing to close connections properly after use, especially when exceptions are thrown. 

Solution: Implement proper exception handling and ensure that every connection is returned to the pool, even in the event of an error. Many connection pool managers have built-in features to help track and identify leaks.

2. Exhausting the Connection Pool

Another challenge is the potential for the connection pool to exhaust its available connections under high traffic or load conditions. If the application requires more connections than the pool can handle, users may experience delays or errors when accessing the database.

Solution: Set up your pool with maximum connections that reflect your system’s needs. Consider adjusting the pool size based on the application’s usage patterns and monitoring database performance to fine-tune the maximum connections. Additionally, implementing connection timeouts and pooling strategies prioritizing efficient connection reuse can help prevent exhaustion.
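
In ProxySQL, for example, the per-backend cap lives in the max_connections column of the mysql_servers table (there is also a global mysql-max_connections variable). The host and value below are illustrative:

```sql
-- Cap connections to a single backend (illustrative host and value).
UPDATE mysql_servers SET max_connections = 500
WHERE hostname = '10.0.0.11' AND port = 3306;
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```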

3. Database Connection Configuration

Improper configuration of the connection pool can lead to inefficiencies or problems with performance. Factors such as the maximum number of connections, connection timeout, idle time, and how connections are validated before use can significantly impact the system’s overall behavior.

Solution: Carefully configure pool parameters based on your application’s specific requirements. For example, set reasonable connection timeouts to avoid unnecessarily holding connections too long or allowing idle connections to linger. Use health checks or validation queries to ensure connections are healthy before being handed out. 
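
In ProxySQL these knobs are global variables. A hedged example (the values are illustrative, not recommendations):

```sql
-- Fail connection attempts to a backend after 3 seconds (milliseconds).
UPDATE global_variables SET variable_value = '3000'
WHERE variable_name = 'mysql-connect_timeout_server';
-- Recycle backend connections older than 30 minutes.
UPDATE global_variables SET variable_value = '1800000'
WHERE variable_name = 'mysql-connection_max_age_ms';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```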

4. Thread Contention

Connection pools often face thread contention issues, especially in multi-threaded environments. If many threads simultaneously request database connections, they may compete for a limited number of available connections, leading to delays in acquiring a connection and potentially blocking threads.

Solution: To manage thread contention, ensure the pool size is adequate for the volume of concurrent requests your application expects. Additionally, optimize your database queries and application logic to reduce unnecessary connection requests, minimizing the load on the connection pool. 

5. Inconsistent Connection Pool Behavior Across Environments

Connection pooling behavior can vary depending on the environment—development, staging, or production. This can cause unexpected issues when moving from one environment to another, such as differing connection pool configurations or database access patterns that weren’t foreseen during testing.

Solution: Maintain consistent configuration practices across all environments. Automate the deployment and configuration of connection pools to avoid discrepancies and ensure that performance testing includes realistic traffic patterns and database configurations to identify potential issues early.

Best Practices for Optimizing Database Connection Pooling 

Optimizing database connection pooling is crucial for ensuring efficient database operations and maintaining high application performance. When properly configured and managed, connection pooling can dramatically reduce the overhead of opening and closing database connections, leading to faster response times, better resource utilization, and improved scalability. Below are some best practices for optimizing database connection pooling: 

1. Determine the Optimal Pool Size

Your connection pool size can significantly impact performance and resource consumption. Too many connections can overload the database, while too few can result in connection bottlenecks.

  • Best Practice: Carefully assess your application’s workload and set a pool size to handle peak demand without overwhelming the database. Monitor usage and adjust the pool size accordingly.
  • Tip: Some systems use automatic scaling to adjust the pool size based on traffic, while others may require manual adjustments.

2. Configure Connection Timeout and Idle Time

Connection timeout and idle time settings are essential for managing how long connections should wait before timing out and how long idle connections can remain in the pool.

  • Best Practice: Set a reasonable connection timeout to ensure that connections that can’t be established quickly don’t block the application indefinitely. Similarly, configure idle time to ensure idle connections don’t waste resources.
  • Tip: For applications with unpredictable traffic spikes, it’s advisable to set idle connections to be released after a specific time to keep the pool from getting unnecessarily large.

3. Monitor and Manage Connection Leaks

A connection leak occurs when a connection is not returned to the pool after use, leading to resource exhaustion and eventual application failure.

  • Best Practice: Implement monitoring tools and logging mechanisms to detect and resolve connection leaks. Many connection pool libraries, such as HikariCP, have built-in leak detection features that can be enabled to automatically flag connections that were not properly closed.
  • Tip: Ensure proper error handling and the use of connection management patterns to guarantee connections are always released.

4. Use Connection Validation

If your application relies on long-lived connections, there’s a risk that connections in the pool may become stale or invalid, especially after idle periods.

  • Best Practice: Use connection validation queries (such as SELECT 1 for ProxySQL) to test the validity of connections before handing them over to the application. This ensures that the application does not try to use faulty connections.
  • Tip: Validate connections on checkout or at regular intervals to ensure all connections are functional, especially in highly dynamic environments.
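
In ProxySQL, idle backend connections are validated by periodic pings controlled by global variables. A sketch with illustrative intervals:

```sql
-- Ping idle backend connections every 10s; time out pings after 1s.
UPDATE global_variables SET variable_value = '10000'
WHERE variable_name = 'mysql-ping_interval_server_msec';
UPDATE global_variables SET variable_value = '1000'
WHERE variable_name = 'mysql-ping_timeout_server';
LOAD MYSQL VARIABLES TO RUNTIME;
SAVE MYSQL VARIABLES TO DISK;
```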

5. Implement Efficient Query Management

Efficient query execution is just as important as efficient connection pooling. Poorly optimized queries can quickly exhaust the available pool and increase load times, defeating the purpose of connection pooling.

  • Best Practice: Optimize your queries to reduce database load. Indexing frequently used columns, minimizing complex joins, and caching results where possible can significantly reduce query execution time.
  • Tip: Use tools like EXPLAIN to review query plans and identify bottlenecks without executing the query. For deeper analysis, EXPLAIN ANALYZE can provide valuable diagnostic insights but should be used with caution, as it executes the query and might double its runtime. Consider running it during maintenance windows or on a staging environment that closely mirrors production.
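
For instance (the table, columns, and index in this example are hypothetical):

```sql
-- Inspect the plan without running the query; verify an index on
-- customer_id is used rather than a full table scan.
EXPLAIN SELECT o.id, o.total
FROM orders o
WHERE o.customer_id = 42;
```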

6. Configure Pool Sizing for Failover and Load Balancing

In distributed systems or cloud environments, load balancing and failover mechanisms are essential to maintaining high availability and ensuring the database remains responsive under heavy loads.

  • Best Practice: Configure your pool to work with load balancers and failover systems when using connection pooling in a distributed environment. This ensures that connections are distributed evenly across servers and that the system remains operational even if a server goes down.
  • Tip: Use a solution like ProxySQL, which combines connection pooling and load balancing capabilities. This allows you to manage queries efficiently across multiple database instances without the need for separate tools.

7. Adjust Pool Settings for Multi-Tenant Applications

For multi-tenant applications, where each client has different database access requirements, connection pooling must be adapted to handle varying loads efficiently.

  • Best Practice: Segment connections for different tenants or prioritize certain tenants over others based on their specific needs (e.g., VIP clients may require faster database access). Many connection pool managers allow for fine-tuning these settings.
  • Tip: Implement tenant-specific connection limits to avoid one tenant consuming excessive database resources.
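
In ProxySQL, for example, per-user limits can act as tenant limits when tenants map to distinct database users (the username and value are hypothetical):

```sql
-- Limit a hypothetical tenant account to 100 frontend connections.
UPDATE mysql_users SET max_connections = 100
WHERE username = 'tenant_a';
LOAD MYSQL USERS TO RUNTIME;
SAVE MYSQL USERS TO DISK;
```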

8. Enable Connection Pool Statistics

Monitoring connection pool performance through built-in statistics can help you identify issues such as pool exhaustion, excessive waits for connections, or unexpected spikes in connection usage.

Use statistics collection for your connection pool to track metrics like connection wait time, connection pool utilization, and the number of active connections. These metrics provide actionable insights into pool performance and potential bottlenecks.
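
In ProxySQL, for instance, these metrics are exposed through the stats_mysql_connection_pool table on the admin interface:

```sql
-- Per-backend pool usage: in-use vs. free connections, errors, and latency.
SELECT hostgroup, srv_host, srv_port, status,
       ConnUsed, ConnFree, ConnOK, ConnERR, Queries, Latency_us
FROM stats_mysql_connection_pool
ORDER BY hostgroup, srv_host;
```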

Scale and Optimize Your Database Performance with ProxySQL

When managing large-scale database environments, especially with thousands of servers and 100K+ connections, ProxySQL stands out as a strong and efficient solution. With its advanced connection pooling, intelligent query routing, and high availability features, ProxySQL ensures your database infrastructure operates at peak performance, even under demanding conditions.

By adopting ProxySQL, you can: 

  • Scale Seamlessly: Handle 100K+ connections across thousands of servers easily, ensuring smooth database operations.
  • Maximize Efficiency: Leverage connection multiplexing to drastically reduce resource usage and manage traffic more effectively.
  • Ensure High Availability: Benefit from failover detection and automatic traffic rerouting to maintain uptime and prevent disruptions.
  • Boost Performance: Use advanced query rules to intelligently route traffic, optimize database workloads, and balance the load between primary and replica databases.

No matter the complexity of your database setup, ProxySQL offers a lightweight, flexible solution that is compatible across multiple platforms. 

Learn how ProxySQL can transform your database performance.

Contact us!

Setting Up a ProxySQL Cluster: A Comprehensive Guide for High Traffic Databases https://proxysql.com/blog/proxysql-cluster-setup/ https://proxysql.com/blog/proxysql-cluster-setup/#respond Wed, 26 Mar 2025 13:17:46 +0000 https://proxysql.com/?p=63092 High-traffic databases present significant challenges, especially when managing and optimizing performance across large, distributed environments. According to a recent survey, 47% of IT leaders report that poor data management hinders their ability to make strategic decisions. Additionally, 31% claim it prevents them from staying ahead of the competition. Inefficient data management blocks the ability [...]

The post Setting Up a ProxySQL Cluster: A Comprehensive Guide for High Traffic Databases appeared first on ProxySQL.

High-traffic databases present significant challenges, especially when managing and optimizing performance across large, distributed environments. According to a recent survey, 47% of IT leaders report that poor data management hinders their ability to make strategic decisions. Additionally, 31% claim it prevents them from staying ahead of the competition. Inefficient data management blocks the ability to use emerging technologies. 

Businesses need tools that help optimize database performance and ensure data flows seamlessly across systems, even under heavy traffic conditions.  

ProxySQL, with its ability to handle database load balancing and high availability, is an essential proxy for businesses facing the pressures of high-volume data operations. This blog will explore how ProxySQL can help you address these challenges and scale your database infrastructure effectively. 

Why Should Businesses Use ProxySQL for High-Traffic Databases?  

ProxySQL offers multiple benefits for high-traffic databases, making it an ideal solution for businesses seeking high availability, scalability, and optimized performance.  

  1. Load Balancing and Scalability: ProxySQL provides advanced load balancing capabilities across MySQL, MariaDB, and other relational database systems. It intelligently distributes queries across available database nodes, ensuring no single server is overwhelmed by heavy traffic.   
  2. High Availability in Database Architectures: ProxySQL plays a critical role in high-availability database setups by efficiently routing traffic and balancing loads across database nodes. It works seamlessly with high availability frameworks like Galera Cluster or MySQL InnoDB Cluster to support automatic failover and minimize downtime during server failures. This makes it an invaluable component for maintaining database reliability and performance.
  3. Query Caching: ProxySQL includes a query caching feature that can significantly improve response times for frequently executed queries. ProxySQL reduces the load on backend databases by caching results, which is especially useful for high-traffic websites or applications.          
  4. Security and Query Routing: ProxySQL improves security by allowing administrators to enforce user access controls and routing policies at the proxy level, reducing direct access to the database and minimizing exposure to security threats. This feature helps secure high-traffic databases where sensitive information is at risk. ProxySQL’s flexible query routing can also direct specific types of queries to dedicated servers based on predefined rules, optimizing query performance across different workload types. 
  5. Centralized Management: ProxySQL simplifies database administration by centralizing configuration and monitoring tasks for large organizations managing multiple database nodes. This centralized approach helps database administrators quickly identify and address performance issues.   

Also read: Advanced ProxySQL Features and How to Leverage Them 

Steps to Setting Up a 3-Node ProxySQL Cluster 

Setting up a 3-node ProxySQL cluster ensures scalability, high availability, and adequate load balancing for high-traffic databases. Here are the detailed steps you need to follow:

1. Install ProxySQL on All Three Nodes 

The first step is to install ProxySQL on each of the three nodes in your cluster. The installation process may vary depending on your operating system, but it typically involves updating your package manager and installing the ProxySQL software. 

2. Configure Basic Settings on Each Node 

After installation, configure the basic settings for each ProxySQL node. This includes setting up MySQL user credentials for administrative tasks and ensuring connectivity to your backend servers. Each node should be configured to handle queries efficiently, and the necessary system configurations (such as timeouts and connection limits) should be set. 

3. Define Nodes and Host Groups 

After installing ProxySQL, you need to configure the backend MySQL servers that the ProxySQL cluster will manage. This involves defining MySQL server entries on each ProxySQL node. These backend servers are grouped into host groups based on their role (e.g., read-write or read-only) to enable efficient load balancing and query routing. Each ProxySQL node in the cluster will share this configuration to ensure consistent query management across the setup.
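
A minimal sketch of such a definition on the admin interface, assuming hostgroup 0 for the writer and hostgroup 1 for readers (all IDs and addresses are illustrative):

```sql
-- Hostgroup 0: writer; hostgroup 1: readers (illustrative hosts).
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
  (0, '10.0.0.11', 3306),
  (1, '10.0.0.12', 3306),
  (1, '10.0.0.13', 3306);
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
```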

4. Set Up Query Routing and Load Balancing 

ProxySQL allows you to configure query routing for efficient load balancing, directing each query to the appropriate server. You define rules that send read and write queries to different host groups based on their type. ProxySQL supports load balancing across all nodes that are part of the same hostgroup.
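
A common read/write-split pattern, sketched with illustrative rule IDs and assuming hostgroup 0 holds the writer and hostgroup 1 the readers:

```sql
-- Keep locking reads on the writer; send other SELECTs to readers.
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply) VALUES
  (1, 1, '^SELECT.*FOR UPDATE', 0, 1),
  (2, 1, '^SELECT', 1, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
```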

5. Synchronize Configurations Across All Nodes 

A main benefit of deploying a 3-node ProxySQL cluster is its built-in synchronization capability. Once the cluster is established, configuration updates are automatically propagated across all nodes, ensuring consistency without requiring manual intervention.

The Core nodes, defined in the runtime_proxysql_servers table, act as the primary source for configuration changes. These nodes distribute updates to the Satellite nodes, maintaining alignment across the entire cluster. 

This semi real-time synchronization ensures that all nodes share consistent settings, including query rules, server assignments, and other vital parameters. As a result, your ProxySQL cluster remains resilient, scalable, and easy to manage with minimal effort. 
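
As a sketch, the Core nodes are declared in the proxysql_servers table on each node (hostnames below are illustrative; 6032 is the default admin port):

```sql
-- Declare the three core nodes of the cluster (illustrative hostnames).
INSERT INTO proxysql_servers (hostname, port, weight, comment) VALUES
  ('proxysql-1', 6032, 0, 'core'),
  ('proxysql-2', 6032, 0, 'core'),
  ('proxysql-3', 6032, 0, 'core');
LOAD PROXYSQL SERVERS TO RUNTIME;
SAVE PROXYSQL SERVERS TO DISK;
```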

6. Monitor Cluster Status  

After configuring and synchronizing your ProxySQL cluster, monitoring its status is essential. This includes verifying that each node is active and ensuring the load is properly balanced across the nodes. ProxySQL provides monitoring tools to track each node’s health and see how queries are distributed. 
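
One way to do this is to compare the configuration checksums each node advertises; for example, on any core node:

```sql
-- Compare per-module config checksums reported by every cluster peer.
SELECT hostname, name, version, epoch, checksum, changed_at, updated_at
FROM stats_proxysql_servers_checksums
ORDER BY hostname, name;
```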

7. Test the Setup     

After setting up your ProxySQL cluster, it’s crucial to verify its functionality by simulating traffic and running test queries. While the cluster synchronizes configuration across nodes, keep in mind that routing and load balancing are performed independently by each ProxySQL node.

To test the setup, execute sample SELECT queries and ensure they are correctly routed to the appropriate backend servers, as defined by your query rules and load balancing configuration. This process confirms that the cluster’s routing logic and load balancing features are functioning properly, with traffic being handled efficiently by the designated backend servers. 
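
To confirm where queries actually landed, the per-hostgroup digest statistics can be consulted, for example:

```sql
-- Which hostgroup served which query shapes, by frequency.
SELECT hostgroup, digest_text, count_star
FROM stats_mysql_query_digest
ORDER BY count_star DESC
LIMIT 10;
```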

Managing and Verifying Configuration in ProxySQL

Proper management and verification of configurations in ProxySQL are vital for ensuring that your high-traffic database environment remains efficient, secure, and reliable. Here’s how to manage and verify configurations in a ProxySQL setup: 

1. Managing Configuration Changes 

ProxySQL stores configurations in its internal tables, making it easy to manage settings. All changes are stored in the runtime configuration tables until explicitly saved to disk. To manage configurations, follow these steps:

  • Edit Runtime Tables: You cannot directly edit the runtime tables in ProxySQL. Instead, modify the configuration tables related to servers, users, query rules, or variables. After making changes, apply them to the runtime by issuing the appropriate LOAD … TO RUNTIME commands. This allows you to make immediate adjustments without affecting the disk-based configuration, providing flexibility for real-time changes while maintaining system consistency.
  • Saving Configuration to Disk: After making runtime changes, persist the configuration using the appropriate SAVE … TO DISK command (for example, SAVE MYSQL SERVERS TO DISK). This ensures that your changes survive a restart of the ProxySQL instance.
  • Discard Unsaved Changes: To discard changes made to the in-memory configuration, reload the persisted settings with the corresponding LOAD … FROM DISK command (for example, LOAD MYSQL SERVERS FROM DISK), then load them to runtime if needed.
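
The full change cycle for a global variable can be sketched as follows (the value is illustrative):

```sql
-- 1. Edit the in-memory configuration.
UPDATE global_variables SET variable_value = '2048'
WHERE variable_name = 'mysql-max_connections';
-- 2. Apply it to the running proxy.
LOAD MYSQL VARIABLES TO RUNTIME;
-- 3. Persist it across restarts.
SAVE MYSQL VARIABLES TO DISK;
```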

2. Verifying Configurations 

After configuring ProxySQL, it’s essential to verify that the settings are applied correctly across all nodes in the cluster and that configurations match the intended setup.

  • Checksum Verification: ProxySQL uses checksums to ensure that configurations are identical across nodes. These checksums are not related to configuration files but to the internal tables that need to be synchronized across the cluster. Each configuration table (e.g., servers, users, query rules) is associated with a checksum value, which is updated whenever changes are made. You can compare the checksums of different nodes to confirm that their configurations are in sync.  
  • Timestamps: ProxySQL tracks configuration changes using the epoch, changed_at, and updated_at fields in the configuration tables. This helps understand when a change was made and whether the configuration is up-to-date.
  • View Current Configurations:  You can query the ProxySQL database to view the current configurations and their details. For example, use the SELECT * FROM global_variables query to display the runtime configuration variables for admin, mysql, and monitor. For configurations related to servers, users, query rules, and replication hostgroups, you need to query the specific runtime tables related to those settings.
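
For example, the per-module checksums a node exposes can be inspected directly on its admin interface:

```sql
-- Checksum, version, and epoch for each synchronized module on this node.
SELECT name, version, epoch, checksum
FROM runtime_checksums_values;
```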

3. Handling Failures and Inconsistencies  

ProxySQL uses checksums to detect discrepancies in configurations and provides mechanisms to resolve them effectively. 

Detecting and Resolving Failures 

ProxySQL flags inconsistencies across nodes using the proxysql_servers table. Propagation failures are mitigated by:

  • Manual Pushes: Administrators can push configurations to nodes manually.
  • Configuration Backup: Nodes retain the last known configuration for stability. 

Self-Healing and Synchronization 

Outdated nodes automatically synchronize with the latest configurations upon reconnecting, utilizing ProxySQL’s self-healing capabilities.  

By following these strategies and utilizing ProxySQL’s tools, administrators ensure consistent, reliable configuration management. 

4. Monitoring Configuration Changes 

To ensure ongoing health and proper configuration, ProxySQL provides tools for monitoring configuration changes:

  • Cluster Status: You can query the status of the entire ProxySQL cluster to see each node’s configuration state. This helps identify discrepancies or nodes that may still need to load the latest configuration successfully.
  • Configuration Change Log: Monitor the proxysql.log or use ProxySQL’s built-in monitoring tools to track system changes and troubleshoot issues related to configuration settings.

Real-Time Configuration Synchronization and Propagation in ProxySQL

ProxySQL enables real-time configuration synchronization and propagation across nodes in a cluster, ensuring consistency and availability of settings, which is crucial for maintaining high performance in high-traffic databases. Here’s how ProxySQL handles this process: 

1. Automatic Synchronization Across Nodes     

In a ProxySQL cluster, configurations are synchronized in real time between all nodes. When a configuration change is made on one node (one of the core nodes), the changes are automatically propagated to the other (core and satellite) nodes in the cluster. This propagation ensures that all nodes consistently run with the same configuration, preventing discrepancies that could lead to performance degradation or failures.

  • Replication of Configuration Tables: ProxySQL’s internal configuration tables, such as mysql_servers (for backend servers) and mysql_users (for backend users), are replicated across all nodes in the cluster. Changes made on one core node are automatically synchronized across others through the cluster synchronization process. This ensures consistency and reduces the need for manual updates, maintaining uniform configurations throughout the cluster.
  • Consistency Checks: ProxySQL regularly verifies that configurations are synchronized between the nodes. Depending on the setup, if a discrepancy is detected, the system can alert administrators or automatically resolve the issue.

2. Configurable Propagation Timing

Configuration propagation timing can be customized. By default, changes made on the source node are propagated to the other nodes in real-time. However, you can fine-tune the propagation interval based on your needs. This allows for more control over when and how changes occur across the cluster, particularly in high-traffic environments where downtime or disruption must be minimized. 

  • Propagation Speed: The propagation process is generally very fast, but depending on the cluster size and the complexity of the changes, it might take a few seconds for the updates to fully propagate across all nodes. ProxySQL’s design aims to minimize latency, ensuring minimal impact on database traffic.
  • Load Balancing Adjustments: As part of the configuration synchronization, load balancing settings such as query rules, backend server weights, and other operational parameters are also propagated. This ensures that all nodes use the same load-balancing strategy, preventing traffic from being routed unevenly.

3. Handling Configuration Conflicts 

If configuration conflicts exist (e.g., different settings applied on different nodes), ProxySQL uses conflict resolution mechanisms to keep the cluster consistent. In most cases the conflict is resolved in favor of the core nodes’ settings, though administrators can intervene to resolve more complex issues.

  • Version Control and Rollback: ProxySQL supports configuration versioning, so administrators can roll back to a previous version if a new configuration misbehaves, preserving the system’s reliability.
  • Failed Propagation Handling: When configuration propagation fails on a node, the system notifies administrators, allowing them to review and resolve the issue manually.
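Conceptually, the resolution step can be modelled as "newest wins": each node advertises an epoch (the timestamp of its last configuration change) per module, and the cluster converges on the copy with the highest epoch. This is a simplified sketch; ProxySQL's real logic also considers version counters and node roles.

```python
def pick_winner(epochs):
    # epochs: node name -> UNIX timestamp of that node's last config change.
    # The cluster converges on the most recently changed copy.
    return max(epochs, key=epochs.get)

epochs = {"proxy1": 1700000100, "proxy2": 1700000250, "proxy3": 1700000100}
print(pick_winner(epochs))  # 'proxy2' changed most recently
```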

4. Monitoring and Verification

Administrators can monitor the synchronization status in real-time to ensure that configurations are properly synchronized and that no discrepancies arise during propagation.

  • Cluster Health Checks: Regular health checks monitor the synchronization status, check for successful propagation, and flag any issues. These checks can be automated or manually triggered.
  • Checksum and Timestamp Validation: ProxySQL uses checksums and timestamps (such as the changed_at and updated_at fields exposed in its cluster checksum status tables) to verify that configuration changes were successfully applied and synchronized across all nodes.
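The timestamp side of this validation can be sketched as follows (a toy check with hypothetical field names modelled on the updated_at column): a peer whose status has not been refreshed recently gets flagged, since that usually means the checksum fetch from that node is failing.

```python
def stale_peers(status_rows, now, max_age_s=15):
    # status_rows: (hostname, updated_at) pairs, where updated_at is the
    # last time we successfully fetched that peer's checksum status.
    return [host for host, updated_at in status_rows
            if now - updated_at > max_age_s]

now = 1700000300
rows = [("proxy1", 1700000298), ("proxy2", 1700000250)]
print(stale_peers(rows, now))  # ['proxy2'] — its status is 50 seconds old
```

A check like this is easy to wire into an existing alerting pipeline as part of the automated health checks mentioned above.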

5. Dynamic Adjustments Without Downtime 

ProxySQL allows administrators to make dynamic changes to configurations without downtime. The real-time synchronization ensures that changes are applied seamlessly without disrupting ongoing database operations.

  • Immediate Effect of Changes: Configuration changes made on one node take effect across the entire cluster almost immediately, ensuring that high-traffic databases can continue functioning without disruption.
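This zero-downtime behaviour rests on ProxySQL's layered configuration model: edits land in an in-memory staging area, LOAD ... TO RUNTIME activates them on the live proxy, and SAVE ... TO DISK persists them across restarts. A minimal sketch of the idea (illustrative code, not ProxySQL's implementation):

```python
class ConfigLayers:
    def __init__(self):
        self.memory = {}   # staging area: edits land here first
        self.runtime = {}  # what the proxy actually uses right now
        self.disk = {}     # what survives a restart

    def set(self, key, value):
        self.memory[key] = value          # no effect on live traffic yet

    def load_to_runtime(self):
        self.runtime = dict(self.memory)  # activate without a restart

    def save_to_disk(self):
        self.disk = dict(self.memory)     # persist across restarts

cfg = ConfigLayers()
cfg.set("mysql-max_connections", 2048)
print(cfg.runtime)  # {} — the staged change is not live yet
cfg.load_to_runtime()
print(cfg.runtime)  # {'mysql-max_connections': 2048}
```

Separating the staging, runtime, and persistent layers is what lets an administrator batch several edits and apply them atomically, with no proxy restart in between.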

Manage Your High-Traffic Database with ProxySQL!

ProxySQL stands out as a must-have tool for managing high-traffic databases with its powerful capabilities for load balancing, real-time configuration synchronization, and seamless scalability. Whether you’re handling increased traffic spikes, optimizing resource usage, or ensuring consistent performance across a multi-node cluster, ProxySQL offers the flexibility and reliability required for mission-critical systems.

If you’re ready to manage your database and simplify your cluster operations, setting up a ProxySQL cluster could be your game-changer. Don’t wait for performance bottlenecks to impact your user experience or business outcomes. Start implementing ProxySQL today and experience the benefits of real-time, automated configuration synchronization, robust failover support, and enhanced scalability.

Interested in getting started? Get a personalized consultation to ensure your system is optimized for the future.

Contact our experts now!

The post Setting Up a ProxySQL Cluster: A Comprehensive Guide for High Traffic Databases appeared first on ProxySQL.
