
say-spring-sql-scanner

A collection of Spring Boot auto-configuration libraries that hook into datasource-proxy to observe SQL at runtime. Designed for use in E2E/integration test suites.

| Library | What it does |
| --- | --- |
| spring-sql-scanner | Detects full table scans via EXPLAIN FORMAT=JSON; throws or warns |
| spring-sql-saver | Appends every executed SQL statement to ./sql.log as a tab-separated pair: the parameterized form, then the concrete form |

spring-sql-scanner

Quick start

1. Publish to your local Maven repository

./gradlew :spring-sql-scanner:publishToMavenLocal

2. Add mavenLocal() to your consuming project

repositories {
    mavenLocal()
    mavenCentral()
}

3. Add the dependency

// build.gradle.kts
testImplementation("io.github.sqlscanner:spring-sql-scanner:0.1.0-SNAPSHOT")

<!-- pom.xml -->
<dependency>
  <groupId>io.github.sqlscanner</groupId>
  <artifactId>spring-sql-scanner</artifactId>
  <version>0.1.0-SNAPSHOT</version>
  <scope>test</scope>
</dependency>

Usage

The library auto-configures itself. Add to your test application.properties:

full-table-scan.enabled=true
full-table-scan.fail-on-detection=true

Any SELECT that MySQL's query planner resolves with a full table scan will throw a FullTableScanException at query execution time, failing the test.

@SpringBootTest
class MyServiceTest {

    @Autowired
    lateinit var myService: MyService

    @Test
    fun `querying without an index throws FullTableScanException`() {
        assertThatThrownBy { myService.findByUnindexedColumn("value") }
            .isInstanceOf(FullTableScanException::class.java)
    }
}

Configuration

| Property | Default | Description |
| --- | --- | --- |
| full-table-scan.enabled | true | Master switch |
| full-table-scan.fail-on-detection | true | Throw FullTableScanException on detection; when false, log a warning instead |

How it works

  1. datasource-proxy wraps your DataSource and intercepts every query.
  2. After each SELECT, FullTableScanListenerImpl runs EXPLAIN FORMAT=JSON <query> on the same connection.
  3. ExplainResultParser walks the JSON and checks every query block for access_type: "ALL".
  4. If detected and fail-on-detection=true, a FullTableScanException is thrown.
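The access_type check in step 3 can be illustrated with a short Python sketch. The real parser is the Kotlin ExplainResultParser; this hypothetical `has_full_table_scan` helper only shows the traversal idea: MySQL's EXPLAIN FORMAT=JSON output nests query blocks arbitrarily (subqueries, unions), so the whole tree is walked looking for `"access_type": "ALL"`:

```python
import json

def has_full_table_scan(explain_json: str) -> bool:
    """Recursively walk EXPLAIN FORMAT=JSON output and report whether
    any table is accessed with access_type "ALL" (a full table scan)."""
    def walk(node):
        if isinstance(node, dict):
            if node.get("access_type") == "ALL":
                return True
            return any(walk(v) for v in node.values())
        if isinstance(node, list):
            return any(walk(item) for item in node)
        return False
    return walk(json.loads(explain_json))

# Example: a plan where MySQL reads the whole table.
plan = ('{"query_block": {"table": {"table_name": "users", '
        '"access_type": "ALL", "rows_examined_per_scan": 1000}}}')
print(has_full_table_scan(plan))  # True
```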

spring-sql-saver

Quick start

1. Publish to your local Maven repository

./gradlew :spring-sql-saver:publishToMavenLocal

2. Add mavenLocal() to your consuming project

repositories {
    mavenLocal()
    mavenCentral()
}

3. Add the dependency

// build.gradle.kts
testImplementation("io.github.sqlscanner:spring-sql-saver:0.1.0-SNAPSHOT")

<!-- pom.xml -->
<dependency>
  <groupId>io.github.sqlscanner</groupId>
  <artifactId>spring-sql-saver</artifactId>
  <version>0.1.0-SNAPSHOT</version>
  <scope>test</scope>
</dependency>

Usage

The library auto-configures itself with no additional properties required. After your test suite runs, sql.log at the Gradle root will contain one line per executed statement. Each line is tab-separated: the left column has the parameterized query (for deduplication), the right column has the same query with concrete parameter values substituted in (for running EXPLAIN):

SELECT * FROM users WHERE id = ?	SELECT * FROM users WHERE id = 42
INSERT INTO orders (user_id, total) VALUES (?, ?)	INSERT INTO orders (user_id, total) VALUES (3, 99.95)
SELECT * FROM users WHERE id = ?	SELECT * FROM users WHERE id = 99
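The pipelines below all deduplicate on the parameterized column (column 1) so each unique query shape is analyzed once. A hypothetical Python helper, `dedupe_sql_log`, sketches what that deduplication does to the lines above (it is not part of the library, which ships shell pipelines and scripts instead):

```python
def dedupe_sql_log(lines):
    """Keep the first concrete statement seen for each unique
    parameterized query shape (column 1 of the tab-separated log)."""
    seen = {}
    for line in lines:
        parameterized, _, concrete = line.rstrip("\n").partition("\t")
        if parameterized and parameterized not in seen:
            seen[parameterized] = concrete
    return seen

log = [
    "SELECT * FROM users WHERE id = ?\tSELECT * FROM users WHERE id = 42",
    "SELECT * FROM users WHERE id = ?\tSELECT * FROM users WHERE id = 99",
]
# Two executions of the same shape collapse to one representative statement.
print(dedupe_sql_log(log))
```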

Pipeline 1: EXPLAIN analysis (DML only)

Deduplicate on the parameterized column, filter to DML statements that EXPLAIN supports, and pipe one representative concrete statement per unique query shape into MySQL:

sort -t$'\t' -k1,1 -u sql.log \
  | cut -f2 \
  | grep -iE '^(SELECT|INSERT|UPDATE|DELETE)\b' \
  | while read -r sql; do
      mysql -h 127.0.0.1 -u root -D my_database -e "EXPLAIN $sql"
    done

Use -h 127.0.0.1 instead of -h localhost to force a TCP connection and avoid "Can't connect to local MySQL server through socket" errors when connecting to a Docker-hosted database.

For structured reporting, use the included Python script instead:

pip install mysql-connector-python
python3 scripts/explain_parser.py \
    --database my_database \
    --user root \
    --password secret \
    --min-rows 100

The script deduplicates sql.log, runs EXPLAIN FORMAT=JSON for each unique DML shape, and prints a table of full table scans sorted by estimated row count. Pass --output json for machine-readable output. Run python3 scripts/explain_parser.py --help for all options.

Pipeline 1b: CI performance gate with whitelist (check_sql_perf.py)

Use this in a test pipeline where you want the build to fail on new full table scans. The script deduplicates sql.log, runs EXPLAIN FORMAT=JSON for each unique DML shape, and checks every detected scan against a shared JSON whitelist file committed to your repo.

scripts/
  check_sql_perf.py          # the gate script
  sql_whitelist.json         # committed to your consuming project's repo
  sql_whitelist.example.json # copy this to get started

Install dependencies (once):

pip install mysql-connector-python rich   # rich is optional but recommended

Run interactively (developer workstation):

python3 scripts/check_sql_perf.py \
    --database my_database \
    --user root \
    --password secret \
    --whitelist path/to/sql_whitelist.json

When a new full table scan is found that is not on the whitelist, the script prints a risk warning and prompts:

  Options:
  1 → Fix the query (exit now with code 1)
  2 → Accept the risk (type exactly: YES, I RECOGNISE THE RISK DO AS I SAY)

  Your choice (1 / acceptance phrase):

Choosing option 2 and entering the phrase records the entry (with sql, reason, added_by, date) in sql_whitelist.json. Commit the updated file so your team can see and review the accepted risk.

Run in CI (non-interactive):

python3 scripts/check_sql_perf.py \
    --database my_database \
    --user root \
    --password "$DB_PASS" \
    --whitelist path/to/sql_whitelist.json \
    --ci

In CI mode (--ci, or whenever no TTY is attached) the script never prompts — it immediately exits with code 1 if any unwhitelisted full table scan is detected, blocking the build.
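The prompt-vs-fail decision described above can be sketched in a few lines. The helper name `should_prompt` is hypothetical; the actual check_sql_perf.py logic may differ, but the described behavior (prompt only when not in CI mode and a real TTY is attached) maps to Python's standard TTY check:

```python
import sys

def should_prompt(ci_flag: bool) -> bool:
    """Prompt the user only when --ci was not passed AND stdin is a real
    terminal; in all other cases, fail fast without prompting."""
    return not ci_flag and sys.stdin.isatty()

# With --ci the script never prompts, regardless of the terminal.
print(should_prompt(True))  # False
```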

Exit codes:

| Code | Meaning |
| --- | --- |
| 0 | All full table scans are on the whitelist (or none detected) |
| 1 | At least one unwhitelisted scan found, or the user refused to accept the risk |

Whitelist file format:

[
  {
    "sql":      "SELECT * FROM agd_service_market_segment WHERE importer_id = ?",
    "reason":   "Reference table loaded fully into memory on startup. Tracked in JIRA-456.",
    "added_by": "alice",
    "date":     "2026-03-21"
  }
]

The sql field uses the parameterized form (with ? placeholders), so one whitelist entry covers every execution of the same query shape regardless of parameter values. See scripts/sql_whitelist.example.json for a full example.
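The matching rule above (one entry per parameterized shape, independent of parameter values) can be illustrated with a hypothetical `is_whitelisted` helper; the real matching lives in check_sql_perf.py and may normalize whitespace or similar, but the principle is an exact match on the sql field:

```python
import json

def is_whitelisted(parameterized_sql: str, whitelist_json: str) -> bool:
    """A detected scan is accepted iff its parameterized form matches
    the "sql" field of some whitelist entry."""
    entries = json.loads(whitelist_json)
    return any(e.get("sql") == parameterized_sql for e in entries)

wl = ('[{"sql": "SELECT * FROM t WHERE x = ?", "reason": "tiny table", '
      '"added_by": "alice", "date": "2026-03-21"}]')
print(is_whitelisted("SELECT * FROM t WHERE x = ?", wl))  # True
```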


Pipeline 2: DDL extraction for schema change tracking

Extract and deduplicate all DDL statements that ran during the test suite:

cut -f2 sql.log \
  | grep -iE '^\s*(ALTER|CREATE|DROP|TRUNCATE|RENAME)\b' \
  | sort -u \
  > schema-ddl.txt

Commit schema-ddl.txt to your repository. Any PR that adds or modifies a migration will produce a diff in this file, making schema changes visible in code review — particularly useful when migrations are applied in code rather than via a tool like Flyway or Liquibase.

To disable the listener:

sql-saver.enabled=false

Configuration

| Property | Default | Description |
| --- | --- | --- |
| sql-saver.enabled | true | Master switch |
| sql-saver.log-file | sql.log | Path to the log file. Relative paths are resolved against the Gradle root project directory (nearest ancestor containing settings.gradle[.kts]), so all submodules in a multi-module build write to the same file. Absolute paths are used as-is. |

How it works

  1. datasource-proxy wraps your DataSource and intercepts every query.
  2. After each query, SqlSaverListenerImpl appends a tab-separated line to sql.log: the parameterized SQL (with ? placeholders) and the concrete SQL (with actual values substituted in). Temporal types (LocalDateTime, LocalDate, etc.) and java.sql.Timestamp are single-quoted so the concrete SQL is valid MySQL syntax.
  3. All statement types are recorded: SELECT, INSERT, UPDATE, DELETE, DDL, etc.
  4. Relative log file paths are resolved against the Gradle root project directory at startup via GradleRootResolver.
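The substitution in step 2 can be sketched in Python. The real implementation is the Kotlin SqlSaverListenerImpl working on datasource-proxy parameter metadata; these hypothetical `render_value`/`substitute` helpers only illustrate the quoting rule the steps describe (temporal values single-quoted, numbers passed through):

```python
from datetime import date, datetime

def render_value(v):
    """Render one bound parameter as MySQL literal text: temporal values
    and strings are single-quoted so the concrete SQL stays valid."""
    if isinstance(v, datetime):
        return f"'{v.isoformat(sep=' ')}'"
    if isinstance(v, date):
        return f"'{v.isoformat()}'"
    if isinstance(v, str):
        return "'" + v.replace("'", "''") + "'"
    return str(v)

def substitute(parameterized: str, params):
    """Replace each ? placeholder with the rendered value, left to right.
    (A naive sketch: a real implementation must skip ? inside literals.)"""
    out = parameterized
    for p in params:
        out = out.replace("?", render_value(p), 1)
    return out

print(substitute("INSERT INTO t (d, n) VALUES (?, ?)", [date(2026, 3, 21), 42]))
# INSERT INTO t (d, n) VALUES ('2026-03-21', 42)
```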

Contributing

  1. Fork the repo and create a branch.
  2. Follow TDD: write a failing test before implementing production code.
  3. Run ./gradlew build and ensure all tests pass before opening a PR.

License

MIT