
say-spring-sql-scanner

A collection of Spring Boot auto-configuration libraries that hook into datasource-proxy to observe SQL at runtime. Designed for use in E2E/integration test suites.

| Library | What it does |
| --- | --- |
| spring-sql-scanner | Detects full table scans via EXPLAIN FORMAT=JSON; throws or warns |
| spring-sql-saver | Appends every executed SQL statement to ./sql.log as a tab-separated pair: the parameterized form, then the concrete form |

spring-sql-scanner

Quick start

1. Publish to your local Maven repository

./gradlew :spring-sql-scanner:publishToMavenLocal

2. Add mavenLocal() to your consuming project

repositories {
    mavenLocal()
    mavenCentral()
}

3. Add the dependency

// build.gradle.kts
testImplementation("io.github.sqlscanner:spring-sql-scanner:0.1.0-SNAPSHOT")

<!-- pom.xml -->
<dependency>
  <groupId>io.github.sqlscanner</groupId>
  <artifactId>spring-sql-scanner</artifactId>
  <version>0.1.0-SNAPSHOT</version>
  <scope>test</scope>
</dependency>

Usage

The library auto-configures itself. Add to your test application.properties:

full-table-scan.enabled=true
full-table-scan.fail-on-detection=true

Any SELECT that MySQL's query planner resolves with a full table scan will throw a FullTableScanException at query execution time, failing the test.

@SpringBootTest
class MyServiceTest {

    @Autowired
    lateinit var myService: MyService

    @Test
    fun `querying without an index throws FullTableScanException`() {
        assertThatThrownBy { myService.findByUnindexedColumn("value") }
            .isInstanceOf(FullTableScanException::class.java)
    }
}

Configuration

| Property | Default | Description |
| --- | --- | --- |
| full-table-scan.enabled | true | Master switch |
| full-table-scan.fail-on-detection | true | Throw FullTableScanException on detection; when false, log a warning instead |

How it works

  1. datasource-proxy wraps your DataSource and intercepts every query.
  2. After each SELECT, FullTableScanListenerImpl runs EXPLAIN FORMAT=JSON <query> on the same connection.
  3. ExplainResultParser walks the JSON and checks every query block for access_type: "ALL".
  4. If detected and fail-on-detection=true, a FullTableScanException is thrown.
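The access_type check in step 3 can be illustrated with a short Python sketch. The real parser is the Kotlin ExplainResultParser; this hypothetical `has_full_table_scan` helper only shows the traversal idea: MySQL's EXPLAIN FORMAT=JSON output nests query blocks arbitrarily (subqueries, unions), so the whole tree is walked looking for `"access_type": "ALL"`:

```python
import json

def has_full_table_scan(explain_json: str) -> bool:
    """Recursively walk EXPLAIN FORMAT=JSON output and report whether
    any table is accessed with access_type "ALL" (a full table scan)."""
    def walk(node):
        if isinstance(node, dict):
            if node.get("access_type") == "ALL":
                return True
            return any(walk(v) for v in node.values())
        if isinstance(node, list):
            return any(walk(item) for item in node)
        return False
    return walk(json.loads(explain_json))

# Example: a plan where MySQL reads the whole table.
plan = ('{"query_block": {"table": {"table_name": "users", '
        '"access_type": "ALL", "rows_examined_per_scan": 1000}}}')
print(has_full_table_scan(plan))  # True
```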

spring-sql-saver

Quick start

1. Publish to your local Maven repository

./gradlew :spring-sql-saver:publishToMavenLocal

2. Add mavenLocal() to your consuming project

repositories {
    mavenLocal()
    mavenCentral()
}

3. Add the dependency

// build.gradle.kts
testImplementation("io.github.sqlscanner:spring-sql-saver:0.1.0-SNAPSHOT")

<!-- pom.xml -->
<dependency>
  <groupId>io.github.sqlscanner</groupId>
  <artifactId>spring-sql-saver</artifactId>
  <version>0.1.0-SNAPSHOT</version>
  <scope>test</scope>
</dependency>

Usage

The library auto-configures itself with no additional properties required. After your test suite runs, sql.log at the Gradle root will contain one line per executed statement. Each line is tab-separated: the left column has the parameterized query (for deduplication), the right column has the same query with concrete parameter values substituted in (for running EXPLAIN):

SELECT * FROM users WHERE id = ?	SELECT * FROM users WHERE id = 42
INSERT INTO orders (user_id, total) VALUES (?, ?)	INSERT INTO orders (user_id, total) VALUES (3, 99.95)
SELECT * FROM users WHERE id = ?	SELECT * FROM users WHERE id = 99
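The pipelines below all deduplicate on the parameterized column (column 1) so each unique query shape is analyzed once. A hypothetical Python helper, `dedupe_sql_log`, sketches what that deduplication does to the lines above (it is not part of the library, which ships shell pipelines and scripts instead):

```python
def dedupe_sql_log(lines):
    """Keep the first concrete statement seen for each unique
    parameterized query shape (column 1 of the tab-separated log)."""
    seen = {}
    for line in lines:
        parameterized, _, concrete = line.rstrip("\n").partition("\t")
        if parameterized and parameterized not in seen:
            seen[parameterized] = concrete
    return seen

log = [
    "SELECT * FROM users WHERE id = ?\tSELECT * FROM users WHERE id = 42",
    "SELECT * FROM users WHERE id = ?\tSELECT * FROM users WHERE id = 99",
]
# Two executions of the same shape collapse to one representative statement.
print(dedupe_sql_log(log))
```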

Pipeline 1: EXPLAIN analysis (DML only)

Deduplicate on the parameterized column, filter to DML statements that EXPLAIN supports, and pipe one representative concrete statement per unique query shape into MySQL:

sort -t$'\t' -k1,1 -u sql.log \
  | cut -f2 \
  | grep -iE '^(SELECT|INSERT|UPDATE|DELETE)\b' \
  | while read -r sql; do
      mysql -h 127.0.0.1 -u root -D my_database -e "EXPLAIN $sql"
    done

Use -h 127.0.0.1 instead of -h localhost to force a TCP connection and avoid "Can't connect to local MySQL server through socket" errors when connecting to a Docker-hosted database.

For structured reporting, use the included Python script instead:

pip install mysql-connector-python
python3 scripts/explain_parser.py \
    --database my_database \
    --user root \
    --password secret \
    --min-rows 100

The script deduplicates sql.log, runs EXPLAIN FORMAT=JSON for each unique DML shape, and prints a table of full table scans sorted by estimated row count. Pass --output json for machine-readable output. Run python3 scripts/explain_parser.py --help for all options.

Pipeline 1b: CI performance gate with whitelist (check_sql_perf.py)

Use this in a test pipeline where you want the build to fail on new full table scans. The script deduplicates sql.log, runs EXPLAIN FORMAT=JSON for each unique DML shape, and checks every detected scan against a shared JSON whitelist file committed to your repo.

scripts/
  check_sql_perf.py          # the gate script
  sql_whitelist.json         # committed to your consuming project's repo
  sql_whitelist.example.json # copy this to get started

Install dependencies (once):

pip install mysql-connector-python rich   # rich is optional but recommended

Run interactively (developer workstation):

python3 scripts/check_sql_perf.py \
    --database my_database \
    --user root \
    --password secret \
    --whitelist path/to/sql_whitelist.json

When a new full table scan is found that is not on the whitelist, the script prints a risk warning and prompts:

  Options:
  1 → Fix the query (exit now with code 1)
  2 → Accept the risk (type exactly: YES, I RECOGNISE THE RISK DO AS I SAY)

  Your choice (1 / acceptance phrase):

Choosing option 2 and entering the phrase records the entry (with sql, reason, added_by, date) in sql_whitelist.json. Commit the updated file so your team can see and review the accepted risk.

Run in CI (non-interactive):

python3 scripts/check_sql_perf.py \
    --database my_database \
    --user root \
    --password "$DB_PASS" \
    --whitelist path/to/sql_whitelist.json \
    --ci

In CI mode (--ci, or whenever no TTY is attached) the script never prompts — it immediately exits with code 1 if any unwhitelisted full table scan is detected, blocking the build.
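The prompt-vs-fail decision described above can be sketched in a few lines. The helper name `should_prompt` is hypothetical; the actual check_sql_perf.py logic may differ, but the described behavior (prompt only when not in CI mode and a real TTY is attached) maps to Python's standard TTY check:

```python
import sys

def should_prompt(ci_flag: bool) -> bool:
    """Prompt the user only when --ci was not passed AND stdin is a real
    terminal; in all other cases, fail fast without prompting."""
    return not ci_flag and sys.stdin.isatty()

# With --ci the script never prompts, regardless of the terminal.
print(should_prompt(True))  # False
```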

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | All full table scans are on the whitelist (or none detected) |
| 1 | At least one unwhitelisted scan found, or the user refused to accept the risk |

Whitelist file format:

[
  {
    "sql":      "SELECT * FROM agd_service_market_segment WHERE importer_id = ?",
    "reason":   "Reference table loaded fully into memory on startup. Tracked in JIRA-456.",
    "added_by": "alice",
    "date":     "2026-03-21"
  }
]

The sql field uses the parameterized form (with ? placeholders), so one whitelist entry covers every execution of the same query shape regardless of parameter values. See scripts/sql_whitelist.example.json for a full example.
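The matching rule above (one entry per parameterized shape, independent of parameter values) can be illustrated with a hypothetical `is_whitelisted` helper; the real matching lives in check_sql_perf.py and may normalize whitespace or similar, but the principle is an exact match on the sql field:

```python
import json

def is_whitelisted(parameterized_sql: str, whitelist_json: str) -> bool:
    """A detected scan is accepted iff its parameterized form matches
    the "sql" field of some whitelist entry."""
    entries = json.loads(whitelist_json)
    return any(e.get("sql") == parameterized_sql for e in entries)

wl = ('[{"sql": "SELECT * FROM t WHERE x = ?", "reason": "tiny table", '
      '"added_by": "alice", "date": "2026-03-21"}]')
print(is_whitelisted("SELECT * FROM t WHERE x = ?", wl))  # True
```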


Pipeline 2: DDL extraction for schema change tracking

Extract and deduplicate all DDL statements that ran during the test suite:

cut -f2 sql.log \
  | grep -iE '^\s*(ALTER|CREATE|DROP|TRUNCATE|RENAME)\b' \
  | sort -u \
  > schema-ddl.txt

Commit schema-ddl.txt to your repository. Any PR that adds or modifies a migration will produce a diff in this file, making schema changes visible in code review — particularly useful when migrations are applied in code rather than via a tool like Flyway or Liquibase.

To disable the listener:

sql-saver.enabled=false

Configuration

| Property | Default | Description |
| --- | --- | --- |
| sql-saver.enabled | true | Master switch |
| sql-saver.log-file | sql.log | Path to the log file. Relative paths are resolved against the Gradle root project directory (nearest ancestor containing settings.gradle[.kts]), so all submodules in a multi-module build write to the same file. Absolute paths are used as-is. |

How it works

  1. datasource-proxy wraps your DataSource and intercepts every query.
  2. After each query, SqlSaverListenerImpl appends a tab-separated line to sql.log: the parameterized SQL (with ? placeholders) and the concrete SQL (with actual values substituted in). Temporal types (LocalDateTime, LocalDate, etc.) and java.sql.Timestamp are single-quoted so the concrete SQL is valid MySQL syntax.
  3. All statement types are recorded: SELECT, INSERT, UPDATE, DELETE, DDL, etc.
  4. Relative log file paths are resolved against the Gradle root project directory at startup via GradleRootResolver.
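The substitution in step 2 can be sketched in Python. The real implementation is the Kotlin SqlSaverListenerImpl working on datasource-proxy parameter metadata; these hypothetical `render_value`/`substitute` helpers only illustrate the quoting rule the steps describe (temporal values single-quoted, numbers passed through):

```python
from datetime import date, datetime

def render_value(v):
    """Render one bound parameter as MySQL literal text: temporal values
    and strings are single-quoted so the concrete SQL stays valid."""
    if isinstance(v, datetime):
        return f"'{v.isoformat(sep=' ')}'"
    if isinstance(v, date):
        return f"'{v.isoformat()}'"
    if isinstance(v, str):
        return "'" + v.replace("'", "''") + "'"
    return str(v)

def substitute(parameterized: str, params):
    """Replace each ? placeholder with the rendered value, left to right.
    (A naive sketch: a real implementation must skip ? inside literals.)"""
    out = parameterized
    for p in params:
        out = out.replace("?", render_value(p), 1)
    return out

print(substitute("INSERT INTO t (d, n) VALUES (?, ?)", [date(2026, 3, 21), 42]))
# INSERT INTO t (d, n) VALUES ('2026-03-21', 42)
```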

Contributing

  1. Fork the repo and create a branch.
  2. Follow TDD: write a failing test before implementing production code.
  3. Run ./gradlew build and ensure all tests pass before opening a PR.

License

MIT