Updates to last post about Postgres not using your indexes
So I was eavesdropping on the dba and a database developer chatting about analyzing a query in Oracle while at work and the dd mentioned he was forcing the query to use the index for exploratory purposes and I was like “You can do that??” So I looked it up and Postgres has a seemingly easier way to do this from what it sounded like in Oracle.
In Postgres you simply add:
set enable_seq_scan = off;
via this post in SO
This showed that an index scan was much slower than the sequential scan.
Also, on Twitter someone pointed me in the direction of the why behind Postgres decisions/plans on when to use index vs sequential scans:
Why is your dbms not using your index?
So recently I was trying to gain some knowledge about indexes using Postgres, going beyond the tutorials I’ve done and so I attempted to make some queries faster using b-tree indexes on fish count data.
However, before getting into any complex queries I tried out a basic one:
select * from chinook where year > ‘2000’;
When I used explain analyze on this query it gave me:
Seq Scan on yearly_chinook_sum_view (cost=0.00..1.15 rows=4 width=40) (actual time=12.559..12.584 rows=12 loops=1)
Filter: ((year)::text > ‘2000’::text)
Total runtime: 12.669 ms
So why is it still doing a sequential scan?
First I looked at my table to make sure I really had added a b-tree index on the table named chinook using \d chinook which confirmed that yes, I have a b-tree index on the year column.
Then I started searching for the issue and found these articles:
There were a few possible reasons why my query wasn’t using the index, but the one that seemed most likely to me was that it wasn’t efficient due to table size.
Next step, how big is the Chinook table?
So from Wiki I used this join to find my total disk usage including indexes:
SELECT nspname || '.' || relname AS "relation",
pg_size_pretty(pg_total_relation_size(C.oid)) AS "total_size"
FROM pg_class C
LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
WHERE nspname NOT IN ('pg_catalog', 'information_schema')
AND C.relkind <> 'i'
AND nspname !~ '^pg_toast'
ORDER BY pg_total_relation_size(C.oid) DESC
And for Chinook it gave me:
public.chinook | 160 kB
So this is fairly small, but I don’t really feel like doing tests to see where the threshold is. What I did test were queries that would return one row versus a larger percentage of rows.
kenai_fishruns=# explain analyze select * from chinook where year = ‘2009’;
Index Scan using chinook_year_index on chinook (cost=0.28..9.67 rows=48 width=44) (actual time=30.929..31.151 rows=48 loops=1)
Index Cond: ((year)::text = ‘2009’::text)
Total runtime: 31.244 ms
kenai_fishruns=# explain analyze select * from chinook where year > ‘2009’;
Seq Scan on chinook (cost=0.00..13.45 rows=203 width=44) (actual time=0.033..1.258 rows=203 loops=1)
Filter: ((year)::text > ‘2009’::text)
Rows Removed by Filter: 393
Total runtime: 1.350 ms
So this started pointing me in the right direction, I figured it had to do with the amount of rows returned versus some error in how I indexed my tables, so I googled and stackoverflow came to the rescue again:
Basically the answer to this post says that if a query returns approximately more than 5-10% (roughly) of the total rows, then it will use a sequential scan versus an index scan.
This makes sense with my little test between the queries returning the one row versus the many rows it would be returning for the second query. Unfortunately this stackoverflow post didn’t cite where to find this info in the docs, but that’s research for another day. At least now I can rest easy knowing the general why behind Postgres choosing to use a sequential scan versus an index scan.
The Durable Document Store You Didn't Know You Had, But Did
How to back up a Heroku production database to staging
As it turns out, PostgreSQL has a number of ways of storing loosely structured data — documents — in a column on a table.
hstore is a data type available as a contrib package that allows you to store key/value structures just like a dictionary or hash.
- You can store data in JSON format on…
It’s right there in the docs but I didn’t notice it until recently:
heroku pgbackups:restore DATABASE `heroku pgbackups:url --remote production` --remote staging
Boom! It transfers the production Postgres database to staging.
It’s much faster than
db:push, which is what I…
View postgres.conf local modifications
To view your session specific local modifications when tuning your Postgres database & ’\x on’ prints out your query in a readable format:
select * from pg_settings
Homebrew should create a data folder or at least a pg_log folder
So I was following some advice from one of the tutorials I took at PyCon2014 called Postgres for Python People and since it was more of a list of stuff you should implement when using Postgres rather than a hands on tutorial, I thought I’d better get to it a month later. I have Postgres installed via Hombrew on my Mac and realized there was no folder for my log files, it’s just plopped in under the Postgres folder: /usr/local/var/postgres/server.log
So I created a pg_log folder under /usr/local/var/postgres/
then changed my homebrew.mxcl.postgresql.plist (which is a simple xml file) to point to the pg_log folder:
Then I shut down the Postgres server via launchctl:
launchctl unload ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
Then restarted it again via launchctl:
launchctl load ~/Library/LaunchAgents/homebrew.mxcl.postgresql.plist
And now your server.log files are under pg_log folder!
Find out what those flags in grep mean
If you don’t want to sort through a long blog post on grep flags and simply want a more interactive solution, you won’t get any help by typing (bad pun intended):
or any other combo of help in the command line. So, I recently learned that you use:
and that will give you all the dirty on grep in a very readable format. SO next time you’re typing in something like:
ps auxw | grep postgres | grep — -D
and want to know what the heck the -D flag is for just man grep and you’ll find out this: -D action, —devices=action
Specify the demanded action for devices, FIFOs and sockets. The default action is `read’, which means, that they are read as if they were normal files. If the action is set to `skip’, devices will be silently skipped.