Bobbie Smulders Freelance Software Developer

Sudoku solving using pure SQL

A few years back, when I started using Java, I wrote an object-oriented Sudoku solver that uses a human like approach to solve the puzzles. I used the Singles and Hidden Singles algorithm that Angus Johnson used in his Simple Sudoku program. It was great way to get to know Java and understand the principles of object-oriented programming.

Now that I’ve been professionally using Microsoft SQL for a couple of years, I thought about using SQL to solve Sudoku puzzles. It would be a great challenge to test my SQL skill.

To make it a little harder on myself, the database (or should I call it a program?) should adhere to the following requirements:

  • Pure SQL only. No functions, stored procedures, triggers, cursors, loops, variables or modules. Using regular tables and views only
  • Very simplistic table and view structure
  • Everything has to be dynamic
  • Human approach to solving the Sudoku puzzle

For the impatient: A link to the database dump can be found at the end of this blog post. Please continue reading on to find out how I approached the challenge.

Tables

Lets start out with the tables. The first table is the source table. This table contains the original puzzle.

CREATE TABLE "source"
(
  "row" integer NOT NULL,
  "col" integer NOT NULL,
  "value" integer NOT NULL,
  CONSTRAINT source_pkey PRIMARY KEY ("row" , "col" )
)

The second table to be created is the solution table. This table is used to store all intermediate results and the final result.

CREATE TABLE "solution"
(
  "row" integer NOT NULL,
  "col" integer NOT NULL,
  "value" integer NOT NULL,
  CONSTRAINT solution_pkey PRIMARY KEY ("row" , "col" , "value" )
)

The last three tables are reference tables containing the rows, columns and the possible numbers for each field. I’ve inserted the values 1 to 9 in each of the tables, because the Sudoku puzzle I’m using has 9 rows, 9 columns and you can use the digits 1 to 9 in each field.

CREATE TABLE "reference_cols"
(
  "col" integer NOT NULL,
  CONSTRAINT cols_pkey PRIMARY KEY ("col" )
)

CREATE TABLE "reference_rows"
(
  "row" integer NOT NULL,
  CONSTRAINT rows_pkey PRIMARY KEY ("row" )
)

CREATE TABLE "reference_values"
(
  "value" integer NOT NULL,
  CONSTRAINT values_pkey PRIMARY KEY ("value" )
)

Views
With all the tables ready, it’s time to create a couple of views that help with solving the puzzle.

The first view to create returns the original Sudoku, but adds the candidates (values) 1 to 9 on each blank field. This is what a regular person would do when solving a Sudoku puzzle.

CREATE VIEW "candidates_from_source" AS 
	SELECT r."row", c."col", v."value"
	FROM reference_rows r
	JOIN reference_cols c ON 1 = 1
	JOIN reference_values v ON 1 = 1
	LEFT JOIN "source" s_solved 
		ON s_solved."row" = r."row" 
		AND s_solved."col" = c."col" 
		AND s_solved."value" = v."value"
	LEFT JOIN "source" s_unsolved 
		ON s_unsolved."row" = r."row" 
		AND s_unsolved."col" = c."col"
	WHERE s_unsolved."value" IS NULL 
		OR s_unsolved."value" IS NOT NULL 
		AND s_solved."value" IS NOT NULL
  	ORDER BY r."row", c."col", v."value";

Using this view, we can load the puzzle into the solution database:

TRUNCATE TABLE solution;
INSERT INTO solution
SELECT * FROM candidates_from_source

For the sake of convenience, I’ve added a view that displays the already solved fields from the solution table:

CREATE VIEW "solved_fields" AS 
	SELECT solution."row", 
		solution."col", 
		MIN(solution."value") AS "value"
	FROM solution
	GROUP BY solution."row", solution."col"
	HAVING count(*) = 1;

The Singles algorithm states that, for instance, if there is the number 2 on the first row and the third column, it is impossible to use the same number in the first row, the third column or the top-left box. By using this algorithm, we can delete these candidates based on the already solved fields.

There are four views to select the candidates to remove, one to scan the rows, one to scan the columns, one to scan the boxes and one to combine all three. Each view joins the solved_fields view to the solution table on the same row/column/box, the value and it excludes its own field.

CREATE VIEW "singles_by_column" AS 
	SELECT s."row", s."col", s."value"
	FROM solution s
	JOIN solved_fields sf 
		ON s."row" = sf."row" 
		AND s."col" != sf."col" 
		AND s."value" = sf."value";

CREATE VIEW "singles_by_row" AS 
	SELECT s."row", s."col", s."value"
	FROM solution s
	JOIN solved_fields sf 
		ON s."row" != sf."row" 
		AND s."col" = sf."col" 
		AND s."value" = sf."value";

CREATE VIEW "singles_by_box" AS 
	SELECT s."row", s."col", s."value"
	FROM solution s
	JOIN solved_fields sf 
	ON (ceil(s."row"::numeric / 3.0) * 10::numeric + ceil(s."col"::numeric / 3.0)) = (ceil(sf."row"::numeric / 3.0) * 10::numeric + ceil(sf."col"::numeric / 3.0)) 
	AND NOT (s."row" = sf."row" AND s."col" = sf."col") 
	AND s."value" = sf."value";

CREATE VIEW "singles_all" AS 
	SELECT * FROM singles_by_box
	UNION
	SELECT * FROM singles_by_column
	UNION
	SELECT * FROM singles_by_row

Now that we know which candidates are impossible, we can delete them from the intermediate solution. We do have to perform the delete query multiple times, because each time we remove candidates, we solve new fields. And each time we solve new fields, we can remove more candidates.

DELETE FROM solution s
USING singles_all sa
WHERE s."row" = sa."row"
AND s."col" = sa."col"
AND s."value" = sa."value";

Now that we have solved the puzzle, we can use the solution table to view the final result. To make everything a bit more human friendly, I’ve added the following view:

CREATE VIEW "show_solution" AS 
	SELECT SUM(CASE WHEN solved_fields."col" = 1 
		THEN solved_fields."value" ELSE NULL END) AS "1", 
	SUM(CASE WHEN solved_fields."col" = 2 
		THEN solved_fields."value" ELSE NULL END) AS "2", 
	SUM(CASE WHEN solved_fields."col" = 3 
		THEN solved_fields."value" ELSE NULL END) AS "3", 
	SUM(CASE WHEN solved_fields."col" = 4 
		THEN solved_fields."value" ELSE NULL END) AS "4", 
	SUM(CASE WHEN solved_fields."col" = 5 
		THEN solved_fields."value" ELSE NULL END) AS "5", 
	SUM(CASE WHEN solved_fields."col" = 6 
		THEN solved_fields."value" ELSE NULL END) AS "6", 
	SUM(CASE WHEN solved_fields."col" = 7 
		THEN solved_fields."value" ELSE NULL END) AS "7", 
	SUM(CASE WHEN solved_fields."col" = 8 
		THEN solved_fields."value" ELSE NULL END) AS "8", 
	SUM(CASE WHEN solved_fields."col" = 9 
		THEN solved_fields."value" ELSE NULL END) AS "9"
	FROM solved_fields
	GROUP BY solved_fields."row"
	ORDER BY solved_fields."row";

This was somewhat of a difficult decision. I didn’t want to use a proprietary pivot, so I used an aggregation. The problem is that the columns and rows are not dynamic. I haven’t adhered to my (own) requirements, but I guess I can let this one slip. Unless someone knows a solution that isn’t tied to one particular DBMS and doesn’t use dynamic SQL?

I can’t guarantee that any of the scripts above work, as I had to modify the PostgreSQL output to make them readable and to make them work in my syntax highlighter (apparently, using objects named “row”, “value” and “source” make my highlighter freak out).

Feel free to download the PostgreSQL database dump and mess about with it. I’ve added the “load_puzzle” and “delete_singles” functions to store the DELETE and INSERT queries I mentioned earlier.