sql - Improving Subquery performance in Postgres

- February 15, 2012

I have these two tables in my database:

      student table                   student semester table
    | column     : type     |       | column     : type     |
    |------------|----------|       |------------|----------|
    | student_id : integer  |       | student_id : integer  |
    | satquan    : smallint |       | semester   : integer  |
    | actcomp    : smallint |       | enrolled   : boolean  |
    | entryyear  : smallint |       | major      : text     |
    |-----------------------|       | college    : text     |
                                    |-----------------------|

Here student_id is a unique key in the student table and a foreign key in the student semester table. The semester integer is 1 for the first semester, 2 for the second, and so on.

I'm doing queries where I want to get students by their entryyear (and sometimes by their SAT and/or ACT scores), and then get all of those students' associated data from the student semester table.

Currently, my queries look something like this:

    select * from student_semester
    where student_id in (
        select student_id from student_semester
        where student_id in (
            select student_id from student where entryyear = 2006
        ) and college = 'as' and ...
    )
    order by student_id, semester;

But this results in relatively long-running queries (400 ms) when I am selecting ~1k students. According to the execution plan, most of the time is spent doing a hash join. To ameliorate this, I have added the satquan, actcomp, and entryyear columns to the student_semester table. This reduces the time to run the query by ~90%, but results in a lot of redundant data. Is there a better way to do this?

These are the indexes that I currently have (along with the implicit indexes on student_id):

    create index act_sat_entryyear on student using btree (entryyear, actcomp, sattotal);
    create index student_id_major_college on student_semester using btree (student_id, major, college);

Query plan

    Hash Join  (cost=17311.74..35895.38 rows=81896 width=65) (actual time=121.097..326.934 rows=25680 loops=1)
      Hash Cond: (public.student_semester.student_id = public.student_semester.student_id)
      ->  Seq Scan on student_semester  (cost=0.00..14307.20 rows=698820 width=65) (actual time=0.015..154.582 rows=698820 loops=1)
      ->  Hash  (cost=17284.89..17284.89 rows=2148 width=8) (actual time=121.062..121.062 rows=1284 loops=1)
            Buckets: 1024  Batches: 1  Memory Usage: 51kB
            ->  HashAggregate  (cost=17263.41..17284.89 rows=2148 width=8) (actual time=120.708..120.871 rows=1284 loops=1)
                  ->  Hash Semi Join  (cost=1026.68..17254.10 rows=3724 width=8) (actual time=4.828..119.619 rows=6184 loops=1)
                        Hash Cond: (public.student_semester.student_id = student.student_id)
                        ->  Seq Scan on student_semester  (cost=0.00..16054.25 rows=42908 width=4) (actual time=0.013..109.873 rows=42331 loops=1)
                              Filter: ((college)::text = 'as'::text)
                        ->  Hash  (cost=988.73..988.73 rows=3036 width=4) (actual time=4.801..4.801 rows=3026 loops=1)
                              Buckets: 1024  Batches: 1  Memory Usage: 107kB
                              ->  Bitmap Heap Scan on student  (cost=71.78..988.73 rows=3036 width=4) (actual time=0.406..3.223 rows=3026 loops=1)
                                    Recheck Cond: (entryyear = 2006)
                                    ->  Bitmap Index Scan on student_act_sat_entryyear_index  (cost=0.00..71.03 rows=3036 width=0) (actual time=0.377..0.377 rows=3026 loops=1)
                                          Index Cond: (entryyear = 2006)
    Total runtime: 327.708 ms

I was mistaken about there not being a seq scan in the query. I think the seq scan is being done because of the number of rows that match the college condition; when I change it to a college that has fewer students, an index is used. Source: https://stackoverflow.com/a/5203827/880928
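The seq scan guess above can be checked by comparing plans directly. The following is only a sketch under assumptions: 'xx' is a placeholder for a college with few students (not a value from the post), and turning off enable_seqscan is meant purely as a session-level diagnostic, not as a production setting.

```sql
-- Plan for the inner subquery with a rarer college value.
-- 'xx' is a hypothetical placeholder; substitute a real low-count college.
explain analyze
select student_id
from student_semester
where student_id in (
    select student_id from student where entryyear = 2006
)
and college = 'xx';

-- Diagnostic only: discourage sequential scans in this session to see
-- what the planner's best non-seq-scan plan costs for the common value.
set enable_seqscan = off;

explain analyze
select student_id
from student_semester
where student_id in (
    select student_id from student where entryyear = 2006
)
and college = 'as';

-- Restore the default planner behaviour.
reset enable_seqscan;
```

If the plan with enable_seqscan off is slower than the original, the planner's choice of a seq scan for the common college value is probably reasonable.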

Query with the entryyear column included in the student semester table

    select * from student_semester
    where student_id in (
        select student_id from student_semester
        where entryyear = 2006 and collgs = 'as'
    )
    order by student_id, semester;

Query plan

    Sort  (cost=18597.13..18800.49 rows=81343 width=65) (actual time=72.946..74.003 rows=25680 loops=1)
      Sort Key: public.student_semester.student_id, public.student_semester.semester
      Sort Method: quicksort  Memory: 3546kB
      ->  Nested Loop  (cost=9843.87..11962.91 rows=81343 width=65) (actual time=24.617..40.751 rows=25680 loops=1)
            ->  HashAggregate  (cost=9843.87..9845.73 rows=186 width=4) (actual time=24.590..24.836 rows=1284 loops=1)
                  ->  Bitmap Heap Scan on student_semester  (cost=1612.75..9834.63 rows=3696 width=4) (actual time=10.401..23.637 rows=6184 loops=1)
                        Recheck Cond: (entryyear = 2006)
                        Filter: ((collgs)::text = 'as'::text)
                        ->  Bitmap Index Scan on entryyear_act_sat_semester_enrolled_cumdeg_index  (cost=0.00..1611.82 rows=60192 width=0) (actual time=10.259..10.259 rows=60520 loops=1)
                              Index Cond: (entryyear = 2006)
            ->  Index Scan using student_id_index on student_semester  (cost=0.00..11.13 rows=20 width=65) (actual time=0.003..0.010 rows=20 loops=1284)
                  Index Cond: (student_id = public.student_semester.student_id)
    Total runtime: 74.938 ms

The clean version of the query is:

    select ss.*
    from
        student s
        inner join
        student_semester ss using(student_id)
    where
        s.entryyear = 2006
        and exists (
            select 1
            from student_semester
            where
                college = 'as'
                and student_id = s.student_id
        )
    order by ss.student_id, semester
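Whether this form is actually faster can be checked by running it under explain analyze and comparing the result with the plans above. The sketch below rests on assumptions: the index student_semester_college_student_id is hypothetical (it does not appear in the original post), and the planner may still prefer a seq scan if the college value matches many rows.

```sql
-- Hypothetical index (not in the original post): lets the exists subquery
-- probe (college, student_id) pairs instead of scanning student_semester.
create index student_semester_college_student_id
    on student_semester using btree (college, student_id);

-- Refresh planner statistics so the new index is costed sensibly.
analyze student_semester;

-- Show the actual plan and timings for the join/exists form.
explain analyze
select ss.*
from student s
inner join student_semester ss using (student_id)
where s.entryyear = 2006
  and exists (
      select 1
      from student_semester
      where college = 'as'
        and student_id = s.student_id
  )
order by ss.student_id, ss.semester;
```

If the extra index does not change the plan or the runtime, it can be dropped again with drop index student_semester_college_student_id.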
