postgresql - SQL duplicate elimination for tables with columns using UNION or INTERSECT in Postgres -


in postgres simplest possible query is

postgres=# select; -- (1 row) 

the result of query relation single tuple (row), i.e., empty tuple. relation {()}.

this semantically correct.

now assume have relation foo 1 attribute (value) , @ least 2 tuples (rows). example,

foo ---- value ------   1   2 

now issue following query

postgres=# select foo; -- (2 rows) 

this bag {(),()} , makes sense since projection on empty set of attributes , there 2 tuples in foo.

what want coerce bag set , obtain {()}. can done group follows:

postgres=# select foo group by(); -- (1 row) 

and correct well.

another possibility, in general, remove duplicates use union or intersect operators of sql because implicitly coerce bags sets.

so example, if r relation duplicates, query

(select * r) union (select * r) 

or query

(select * r) intersect (select * r) 

will trick.

now here observation (and question).

consider query

(select foo) union (select foo) 

one expect answer query relation {()}, i.e., relation in duplicate empty tuples removed. however, when query in postgresql not answer.

postgres=# (select foo) union (select foo); -- (4 rows) 

so, appears in case union operator behaves union all operator.

postgres=# (select foo) union (select foo); -- (4 rows) 

but stranger happens intersect operator. consider query

postgres=# (select foo) intersect (select foo); -- (4 rows) 

so again 4 empty tuples expect 1 (or @ 2).

the same happens intersect all operator

postgres=# (select foo) intersect (select foo); -- (4 rows) 

conclusion: appear there bug in postgresql when comes down union, intersect , intersect all operators concerns table no attributes.

remark:
1 may wonder why 1 consider such relations. useful because relation 0 columns can used mechanism in query processing reason true , false

the relation {()} stands true , relation {} stands false.

the fact both union , intersect return 4 rows seems self-contradictory – either empty tuples equal (in case union should have fewer elements) or empty tuples not equal (in case intersect should have fewer elements).

and intersect having more result rows either input relation insane.

you should take pgsql-hackers list or report bug.

but have doubt statement these questions of practical value:

  1. empty select lists not standard compliant. introduced implementation artifact in postgresql. keeps using (sql standard compliant) variant select 1 ...?

  2. how can interpret result of 1 empty tuple true , empty result false? meaning of result 2 empty tuple? union , intersect mean in context?


Comments