Postgres SELECT MAX slow

Discussion:

Jason Ngo

2004-09-03 09:48:47 UTC

I've searched newsgroups about a work-around in postgres to speed up a
query like:

SELECT MAX(date) FROM sales;
to
SELECT date FROM sales ORDER by date DESC LIMIT 1;

But it still takes longer to execute than a max statement in mysql. Is
there any other way to rewrite the statement to match mysql's speed? The
column is already indexed.
--
Philippine Linux Users' Group (PLUG) Mailing List
plug-sJCYDywXB4m9hMQLrSLElQC/***@public.gmane.org (#PLUG @ irc.free.net.ph)
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie

Orlando Andico

2004-09-03 09:58:31 UTC

Permalink

postgres default configuration is very memory-starved.
check your pg_hba.conf there are some memory parameters there. most
important is the sort buffer size.

Post by Jason Ngo
I've searched newsgroups about a work-around in postgres to speed up a
SELECT MAX(date) FROM sales;
to
SELECT date FROM sales ORDER by date DESC LIMIT 1;
But it still takes longer to execute than a max statement in mysql. Is
there any other way to rewrite the statement to match mysql's speed? The
column is already indexed.
--
Philippine Linux Users' Group (PLUG) Mailing List
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie

--
Philippine Linux Users' Group (PLUG) Mailing List
plug-sJCYDywXB4m9hMQLrSLElQC/***@public.gmane.org (#PLUG @ irc.free.net.ph)
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie

Bopolissimus Platypus Jr

2004-09-03 10:12:01 UTC

Permalink

Post by Jason Ngo
I've searched newsgroups about a work-around in
SELECT MAX(date) FROM sales;
to
SELECT date FROM sales ORDER by date DESC LIMIT 1;
But it still takes longer to execute than a max statement
in mysql. Is there any other way to rewrite the statement
to match mysql's speed? The column is already indexed.

so there's already an index like:

create index sales_date on sales(date)?

i have a similar table (except the index is an integer), and i
find that:

explain select max(i) from test;
QUERY PLAN
--------------------------------------------------------------------
Aggregate (cost=1344.09..1344.09 rows=1 width=4)
-> Seq Scan on test (cost=0.00..1245.47 rows=39447 width=4)

so max(i) does a sequential scan, even though there's an index on i.

on the other hand,

explain select i from test order by i desc limit 1;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Limit (cost=0.00..0.10 rows=1 width=4)
-> Index Scan Backward using test_i on test (cost=0.00..3926.92
rows=39447 width=4)

so the thing to do is avoid the max, probably.

it seems that postgresql just doesn't use an index when doing a
max(integer). it might be the same with max(date). when i try:

set enable_seqscan=off;
and then do the explains, it still does a seqscan on the table when i do max(i).

sometimes, postgres thinks that a seq scan is going to be faster than
an index scan, so even if there's already an index, it'll still use
the seq scan. to tell it to use the index if an index is available,
you do: set enable_seqscan=off;

you should be careful to turn that on again though, after the query.
this is because usually the optimizer will do the right thing. you
only do set enable_seqscan=off when you know that the optimizer is
doing the wrong thing.

tiger
--
Gerald Timothy Quimpo http://bopolissimus.sni.ph
gquimpo-***@public.gmane.org bopolissimus-***@public.gmane.org tiger-dF0/z2Fx1RPXfrYAhh+***@public.gmane.org
Public Key: "gpg --keyserver pgp.mit.edu --recv-keys 672F4C78"
Mene sakhet ur-seveh
--
Philippine Linux Users' Group (PLUG) Mailing List
plug-sJCYDywXB4m9hMQLrSLElQC/***@public.gmane.org (#PLUG @ irc.free.net.ph)
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie

Charlton Lopez

2004-09-03 13:31:41 UTC

Permalink

Hi!

A probable solution is to create a function-based
index. E.g.

CREATE INDEX sales_max_date_idx
ON sales(MAX(date));

This would create an index using the btree method.

To create an index that uses other methods:

CREATE INDEX sales_max_date_idx USING hash
ON sales(MAX(date));

CREATE INDEX sales_max_date_idx USING rtree
ON sales(MAX(date));

CREATE INDEX sales_max_date_idx USING gist
ON sales(MAX(date));

I leave it up to you to test which index method will
give you the best performance.

Charlie Lopez
Favorite Databases: Oracle Database and PostgreSQL
:-)

--- Bopolissimus Platypus Jr

On Fri, 03 Sep 2004 17:48:47 +0800, Jason Ngo

Post by Jason Ngo
I've searched newsgroups about a work-around in
SELECT MAX(date) FROM sales;
to
SELECT date FROM sales ORDER by date DESC LIMIT 1;
But it still takes longer to execute than a max

statement

Post by Jason Ngo
in mysql. Is there any other way to rewrite the

statement

Post by Jason Ngo
to match mysql's speed? The column is already

indexed.
create index sales_date on sales(date)?
i have a similar table (except the index is an
integer), and i
explain select max(i) from test;
QUERY PLAN

--------------------------------------------------------------------

Aggregate (cost=1344.09..1344.09 rows=1 width=4)
-> Seq Scan on test (cost=0.00..1245.47
rows=39447 width=4)
so max(i) does a sequential scan, even though
there's an index on i.
on the other hand,
explain select i from test order by i desc limit 1;
QUERY PLAN

-----------------------------------------------------------------------------------------------

Limit (cost=0.00..0.10 rows=1 width=4)
-> Index Scan Backward using test_i on test
(cost=0.00..3926.92
rows=39447 width=4)
so the thing to do is avoid the max, probably.
it seems that postgresql just doesn't use an index
when doing a
max(integer). it might be the same with max(date).
set enable_seqscan=off;
and then do the explains, it still does a seqscan on
the table when i do max(i).
sometimes, postgres thinks that a seq scan is going
to be faster than
an index scan, so even if there's already an index,
it'll still use
the seq scan. to tell it to use the index if an
index is available,
you do: set enable_seqscan=off;
you should be careful to turn that on again though,
after the query.
this is because usually the optimizer will do the
right thing. you
only do set enable_seqscan=off when you know that
the optimizer is
doing the wrong thing.
tiger
--
Gerald Timothy Quimpo http://bopolissimus.sni.ph
Public Key: "gpg --keyserver pgp.mit.edu --recv-keys
672F4C78"
Mene sakhet ur-seveh
--
Philippine Linux Users' Group (PLUG) Mailing List
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to
http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go
to

http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie
________________________________________________________________________
Yahoo! Messenger - Communicate instantly..."Ping"
your friends today! Download Messenger Now
http://uk.messenger.yahoo.com/download/index.html
--
Philippine Linux Users' Group (PLUG) Mailing List
plug-sJCYDywXB4m9hMQLrSLElQC/***@public.gmane.org (#PLUG @ irc.free.net.ph)
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie

Sherwin Daganato

2004-09-03 14:41:44 UTC

Permalink

Have you tried VACUUM-ing the table?
e.g
VACUUM VERBOSE ANALYZE;

Do that first. If that doesn't help, show us the EXPLAIN of those
queries and we'll see what else we can do.
--
$_=q:; # SHERWIN #
70;72;69;6e;74;20;
27;4a;75;73;74;20;
61;6e;6f;74;68;65;
72;20;50;65;72;6c;
20;6e;6f;76;69;63;
65;27;:;;s=~?(..);
?=pack q$C$,hex$1;
;;;=egg;;;;eval;;;
--
Philippine Linux Users' Group (PLUG) Mailing List
plug-sJCYDywXB4m9hMQLrSLElQC/***@public.gmane.org (#PLUG @ irc.free.net.ph)
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie

Jason Ngo

2004-09-09 08:09:05 UTC

Permalink

Post by Sherwin Daganato
Have you tried VACUUM-ing the table?
e.g
VACUUM VERBOSE ANALYZE;
Do that first. If that doesn't help, show us the EXPLAIN of those
queries and we'll see what else we can do.

Did that. Results of the explain statements:

explain select max(trandate) from possales;

Aggregate (cost=52967.84..52967.84 rows=1 width=4)
-> Seq Scan on possales (cost=0.00..49684.27 rows=1313427 width=4)

explain select trandate from possales order by trandate desc limit 1;

Limit (cost=0.00..3.92 rows=1 width=4)
-> Index Scan Backward using possales_trandate on possales
(cost=0.00..5148972.33 rows=1313427 width=4)

Anyway, I saw from the docs that it really behaves this way. I guess
we'll have to rewrite all those SQL statements.
--
Philippine Linux Users' Group (PLUG) Mailing List
plug-sJCYDywXB4m9hMQLrSLElQC/***@public.gmane.org (#PLUG @ irc.free.net.ph)
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie

Sherwin Daganato

2004-09-09 18:53:16 UTC

Permalink

Post by Jason Ngo
explain select max(trandate) from possales;
Aggregate (cost=52967.84..52967.84 rows=1 width=4)
-> Seq Scan on possales (cost=0.00..49684.27 rows=1313427 width=4)
explain select trandate from possales order by trandate desc limit 1;
Limit (cost=0.00..3.92 rows=1 width=4)
-> Index Scan Backward using possales_trandate on possales
(cost=0.00..5148972.33 rows=1313427 width=4)
Anyway, I saw from the docs that it really behaves this way. I guess
we'll have to rewrite all those SQL statements.

Yes but it would also be nice to tweak the postgresql configuration.
A similar table (t2) here -- with the same number of tuples (1,313,427)
and an index on a date attribute (f2) -- yields better numbers. OTOH it
could be just that I ran this test on a faster machine.

explain analyze select f2 from t2 order by f2 desc limit 1;

Limit (cost=0.00..0.02 rows=1 width=4) (actual time=0.03..0.03 rows=1
loops=1)
-> Index Scan Backward using t2_f2_idx on t2 (cost=0.00..30196.95
rows=1313427 width=4) (actual time=0.02..0.03 rows=2 loops=1)
Total runtime: 0.07 msec

explain analyze select max(f2) from t2;

Aggregate (cost=22856.84..22856.84 rows=1 width=4) (actual
time=2817.62..2817.62 rows=1 loops=1)
-> Seq Scan on t2 (cost=0.00..19573.27 rows=1313427 width=4)
(actual time=0.06..1830.67 rows=1313427 loops=1)
Total runtime: 2817.71 msec
--
$_=q:; # SHERWIN #
70;72;69;6e;74;20;
27;4a;75;73;74;20;
61;6e;6f;74;68;65;
72;20;50;65;72;6c;
20;6e;6f;76;69;63;
65;27;:;;s=~?(..);
?=pack q$C$,hex$1;
;;;=egg;;;;eval;;;
--
Philippine Linux Users' Group (PLUG) Mailing List
plug-sJCYDywXB4m9hMQLrSLElQC/***@public.gmane.org (#PLUG @ irc.free.net.ph)
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie

eric pareja

2004-09-10 03:56:32 UTC

Permalink

Post by Sherwin Daganato
Yes but it would also be nice to tweak the postgresql configuration.
A similar table (t2) here -- with the same number of tuples (1,313,427)
and an index on a date attribute (f2) -- yields better numbers. OTOH it
could be just that I ran this test on a faster machine.

may i point out that you guys might also have different versions of
postgresql running?
--
___ _____________________________ eric pareja (xenos AT maharlika.upm.edu.ph)
\e/ [ Philippine Linux Users' Group http://plug.linux.org.ph ] [PSHS'85]
_v_ [ Linux User #8159 http://counter.li.org ] [Debian] [Python] [ /. #2834 ]
#27 [Ubiquitous Computing] [http://www.bookcrossing.com/referral/pusakat]
Join us on IRC: #plug irc.free.net.ph:6667 | http://www.upm.edu.ph/~xenos
--
Philippine Linux Users' Group (PLUG) Mailing List
plug-sJCYDywXB4m9hMQLrSLElQC/***@public.gmane.org (#PLUG @ irc.free.net.ph)
Official Website: http://plug.linux.org.ph
Searchable Archives: http://marc.free.net.ph
.
To leave, go to http://lists.q-linux.com/mailman/listinfo/plug
.
Are you a Linux newbie? To join the newbie list, go to
http://lists.q-linux.com/mailman/listinfo/ph-linux-newbie