Blog

Security Considerations While Using ssh-agent.

Recently we got contacted by a customer, and started the typical remote access setup: ask them to open ssh port for our bastion server, create a new user for us, and add the corresponding ssh public key. Well, they promptly responded, and we tested access, got into their bastion server and then tried to get to one of the broken servers but, it was asking for a password. We just asked the customer for the password, and got a reply stating that they had assumed we were using agent forwarding.

The first instinct of the engineer handling the ticket was: well, lets just start ssh-agent, load the key, and forward it! Of course, after a brief discussion, it became clear ...

Read More

Managing pg_hba.conf With Ansible

pg_hba.conf is perhaps one of the easiest to understand configuration files in PostgreSQL. Its syntax is straightforward, the concept seems to resemble that of any popular IP filter or ACL mechanism in various software packages. pg_hba.conf is also well documented, like the rest of PostgreSQL, and we love it because it lets us do what we want without getting in our way. What else could we possibly ask of it?

Perhaps, it can be a bit of a nuisance when for you pg_hba.conf means not just one file but many. You may have a dozen read-only standbys and your infrastructure is expanding -- a good sign that you're probably all the rage on the market -- so at ...

Read More

PgConf.eu is over, it was a blast but I am curious about the future

First let me say that I attended pgConf.eu like I attend every conference (that I am not running). I show up for a few hours on the first day, then I come back and attend my talk. I don't take travel lightly and as much as I bromance my fellow elephant bretheren, I want to explore the sights and this was freaking Ireland people.

I had an odd feeling for the time I was there. The community was in full force, there was at least 240 people there and that was great. It was the commerce side, the sponsor side, the **money** side that was lacking. EnterpriseDB, Cybertec and 2ndQuadrant were there with booths but I wonder if ...

Read More

5 Things a Non-Geek Girl Learned from Playing with Geeks at CMD

When I began at Command Prompt, Java was coffee, Python was a snake, and a Ruby was best used on the color of glittery slippers. If you would have asked me two and a half years ago what "PostgreSQL" does, I would have asked you what language you were speaking.

A year later, I took my first sales call with out Joshua Drake (jd, @linuxhiker). I was shaking in my boots and it was inevitable that I was going to be sick. Then something happened as soon as I heard the customer say, "Hello".

I understood what the customer needed and most importantly, I knew we could do it. I was able to say confidently and with out doubt, "Yes ...

Read More

A pg_basebackup Wish List

pg_basebackup was introduced in Postgres 9.1 as a simple way to copy the data directory of a running database cluster. This was a great addition to a small group of PostgreSQL Client Applications.

The pg_basebackup approach differs from the standard pg_start_backup(), rsync (or other file system copy), pg_stop_backup() approach in that it uses the replication protocol over a standard Postgres connection to make the base backup. A few highlights:

  • A completely standalone backup can be created easily by using the --xlog-method argument (the stream option here is particularly nice so that you don’t have to muck with the wal_keep_segments Postgres setting).
  • Thanks to the cascading replication machinery in 9.2 pg_basebackup can take a backup from an active hot ...

Read More

Just back from NYCPug August, on to more talks

In August I spoke at NYCPUG on Dumb Simple PostgreSQL Performance. The talk was well accepted and there was about 60 people in attendance. I have always enjoyed my trips to NYC but this is the first time I have taken a leisurely look at the city. I found myself enjoying a water front walk from 42nd, through the Highline, to Battery Park, all the way to the Brooklyn Bridge and over to Brooklyn to a great pub for dinner. What I enjoyed most about the walk outside of the 10 miles was the community that was present. I think it is easy to get jaded by "midtown" and all that is the tourist in that area. The hustle and ...

Read More

Compiling and installing OpenSRF 2.2 on Centos 5.9

We do quite a bit of work for King County Library systems. The library system has 45 branches and runs the Open Source Evergreen ILS. One of the very smart things that the Evergreen project decided was that their database of choice would be PostgreSQL. One of the things that the Evergreen project is not good at is supporting LTS releases of Linux and therefore certain things can be a chore. For example, by default OpenSRF 2.2 which is the current stable OpenSRF release can not be installed via RPM or compiled from source by default on CentOS 5.9.

When discussing with the community about CentOS, the response was the classic responses of, "just upgrade", "move to Fedora ...

Read More

Calling Bullsh*t in Open Source communities

We are all human. We all lose our temper. We all have our moments of, "I really wish I could take that back". Of course not if you are not Linus Torvalds. Now everyone knows that Linus has a temper, that he is a foul mouth, lacks certain social graces and is generally one of the, if not the most important developers to surface in the last 20 years. Does that mean he gets to be a jerk? In his mind, yes.

In some ways I agree with Linus. If you are a donkey butt and you don't pay attention to your community and follow its guidelines, then just leave. We don't have time for you anyway. We ...

Read More

postgres_fdw for 9.2

We have backported the postgres_fdw to 9.2. It is read only of course as the infrastructure for writes is not in 9.2 but it is usable. Enjoy it!

  • Postgres-FDW
  • Read More

    The steaming pile that is Precise with kernel 3.2

    I don't know if it is a mainline kernel problem but I can tell you that on Ubuntu Precise, Linux kernel 3.2 is a disaster for PostgreSQL. I am not even going to go into a huge rant about it. I am just posting the numbers. See for yourself. There should be a public service announcement about it.

    before upgrade to 3.9

    08:35:01 AM     CPU     %user     %nice   %system   %iowait    %steal     %idle
    08:45:01 AM     all     30.91      0.00      5.66     40.05      0.00     23.38
    08:55:02 AM     all     29.32      0.00      5.10     39.66      0.00     25.92
    09:05:02 AM     all     31.71      0.00      6 ...

    Read More

    Returning multiple results without a round trip

    My blog on changes to the wire protocol [1] prompted this question from a reader:

    "Would it be necessary to modify the wire protocol to support multiple query/result-set combinations per server round-trip? That is, to be able to send a hundred different queries (each with a different number and type of columns in the result set) and receive a hundred different results all in a single network round-trip? That is a feature supported by some other databases that can have an order-of-magnitude effect on the performance of high-latency installations (satellite, etc.)."

    I did a little research into this and it seems that we can already do this, sort of. See the following:

    postgres=# select 1; select 'two'; select 'three ...

    Read More

    Modifying the backend protocol for 9.4/10.0.

    A recent discussion on the lists about potentially incompatible changes to 9.4/10.0 of PostgreSQL the idea of things we wanted to do to the wire protocol in upcoming releases.

    The wire protocol is the language spoken between a client and the server of postgresql. The majority of programming languages out there do not implement their own version of the protocol instead opting to bind to the C library libpq. There are notable exceptions, specifically C# and Java both of which implement native versions of the wire protocol for their respective languages.

    The current wire protcol is v4 and was developed for 7.4 of PostgreSQL. That was released in 2003. I think we can safely say there ...

    Read More

    Considering PITRtools 1.4

    We quietly released PITRTool 1.3 last week. This version has been in development for a long time and over the past 6 months became a priority to complete. There is one known minor issue that may or may not be fixed as it doesn't affect production usage in a meaningful way. Release 1.3 contiues to support all the way back to 8.2 with warm standby but we also now support streaming replication and hot standby.

    With 1.4 we will be making some changes, quite a few of them in fact. Over the years replication and and log shipping has matured in PostgreSQL. We want to take advantage of that maturity. With that here are some ...

    Read More

    Remembering to check the docs: Autovacuum

    I was on a call very late last night with a good customer. Well, it was very early. They were having some performance problems and we were talking through how to resolve them before the EST wake up. It is late, we are all tired and of course there are too many people on the call.

    So what is the problem? The problem is they weren't running Autovacuum. Now many of my brethren would say, "HERESY!" but in reality there are good reasons not to run Autovacuum (although I would say not Autoanalyze). Autovacuum is unpredictable, and can cause performance problems. 99% of the time you should run Autovacuum but there is a 1% reason to consider other alternatives ...

    Read More

    GNU and the FSF should be split up

    The FSF should be broken up.

    Yes, I really did just write that. I believe the the FSF no longer fulfills its mission. Wait, let's back up a step. I can feel the torches started to be covered in pitch and the frankenstein cry of, "kill the heretic" starting to rumble through the old streets of the Free Software country. I am not here to say that the FSF is useless or that it doesn't have purpose. I am not here to say that Richard Stallman shouldn't continue on his political mission to save the world from the use of rightfully produced and licensed closed source software.

    What I am saying is that the FSF and GNU ...

    Read More

    Writing a custom conflict handler for Bucardo 4

    Bucardo, an asynchronous multi-master replication system (or, maybe the asynchronous multi-master replication system for PostgreSQL, cause I know of no other actively developed ones), deals with replication conflicts by providing conflict handlers in a form of standard (built-in) or custom code procedures. The built-in conflict handlers resolve a conflict by taking a row from the source or the target databases (in Bucardo 4 only 2 masters are supported, called 'source' and 'target' below), or using a delta row timestamp to determine the winner; other options include skipping conflicting rows altogether or picking one at random, but they are not very useful if you need the data to be consistent between replicas. For more complex conflict resolution rules it's necessary ...

    Read More

    ... and that was my last day

    This has been cooking for a while now, and now it's time to open it up: July 31st, 2012 was my last day with Command Prompt, Inc.

    I joined Command Prompt in October 2005. Back then I wasn't a very prolific blogger, it seems, because it took me six months to get this fact out. I haven't improved much since then, even though boss Josh Drake kept telling me to publish my thoughts and ideas on various PostgreSQL-related matters.

    During my time with them I had the opportunity to work on many interesting things. I got a number of patches into PostgreSQL, some of them thanks to some Command Prompt customer sponsoring it; I worked on the ...

    Read More

    Fedora 17 not so easy PostgreSQL configuration

    I don't usually post rants here, but this one might be actually helpful to others, so let's make an exception. It will be related to installing PostgreSQL from distro-specific packages. I usually prefer setting PostgreSQL from sources, unlike the majority of users; nevertheless, I'm familiar with how popular distros, like Debian or Fedora, manage their PostgreSQL layouts. Or so I thought until today.

    My task was simple: install PostgreSQL instance for testing on a fresh Linux box, use a non-standard port. The catch: the box was running a relatively new Fedora 17.

    Normally, I run a single database instance per each host, so I find Fedora's PostgreSQL layout much more straightforward then Debian's: all configuration ...

    Read More

    The Write–Ahead Log

    WAL (acronym for Write–Ahead Log) is the mechanism that Postgres uses to implement durability of data changes in the face of abrupt system crashes.

    The WAL system's central concept is that changes to data files must be written to such data files only after those changes are guaranteed to have been logged to WAL, that is, when WAL records have been flushed to permanent storage. (This flushing is accomplished by the fsync system call or equivalent mechanism.) If we follow this procedure, we need not flush data pages to disk on every transaction commit, because we know that in the event of a crash we will be able to recover the database using WAL — any changes that Postgres ...

    Read More

    In considerations of closed source development

    Open Source development has a lot going for it, as Bruce Momjian readily points out in a recent blog [1]. However, I believe he missed some key points that are positive for closed source development. Bruce asserts that with Open Source development the developers are the face of the software. That is true but certainly isn't always a good thing. There is a reason that the majority of software development, revenue generation, and software developer employment is closed source (and no it isn't because management and marketing are trying to keep the developer down).

    It is simple. Most of us Open Source developers aren't generally good with average people. We are good with our "breed" of people ...

    Read More

    Binding PostgreSQL server to specific CPU cores in Linux

    Recently we had a customer who was running PostgreSQL 8.2 on a 32 cores system with 64GB of memory. They were deploying this server in addition to the already running one with 24 total cores and 32GB of memory. PostgreSQL configuration has been adjusted for extra resources, the database has been partitioned roughly in half between the 2 servers and the queries running against both servers were similar.

    Suprisingly, when compared to the old server, the extra resources didn't improve the performance. Quite the contrary, the load average and CPU utilization on a new system was much higher during load spikes, while the TPS number plummeted. After performing initial examination of their server (applying our Audit & Tune package) we've decided that 8.2 might itself become an issue. This version of PostgreSQL is outdated and no longer supported by the community. What was suggested is that 8.2 doesn't scale well for 32 cores. How can we verify that hypothesis? Since they were running a relatively modern Linux kernel (2.6.32, supplied with RHEL 6) we were able to take advantage of the interface provided by the taskset utility.

    taskset is a small Linux tool that allows setting the CPU affinity of the process, i.e. which cores a given process is scheduled to run on. For instance, if we have total 4 cores available and willing to limit the process ID 12345 to the last two cores (the first two might be handling a lot of I/O) we can do it with the following command, assuming you are the owner of PID 12345 (CPU cores start from 0):

    taskset -pc 2,3 12345

    Alternatively, the set of CPU cores can be specified by a hexadecimal bitmask, which is convenient if the mask is calculated programatically. The last 2 cores out of 4 total are represented by the 1100 binary mask, hex 0xC. The command above is equivalent to:

    taskset -p 0xC 12345

    It's possible to start a new task already limited to a given subset of CPU cores, i.e. by applying taskset to a PostgreSQL startup script:

    taskset 0xC /etc/init.d/postgresql start

    In our case, however, we had to deal with a production server and run taskset against the already running PostgreSQL processes.

    To restrict PostgreSQL to specific cores one has to set the affinity for each of the processes the server consists of. The order in which the processes are restricted is important: if we don't handle the postmaster first it may spawn new backend processes that won't get into the list of postgres PIDs and won't be affected by taskset. Normally, one can get the postmaster's PID by getting it from ps. On a system with no PID wraparound and a single PostgreSQL instance the postmaster process has the lowest PID among all of the PostgreSQL processes, so here's the one-liner to get it:

    postmaster_pid=$(pidof postgres | xargs -n1 | sort | head -n1)

    In our case we have limited PostgreSQL it to the total of 24 cores, from 8 to 31:

    taskset -pc 8-31 $postmaster_pid

    After the postmaster's affinity is set no new postgres backends can utilize cores outside of those the postmaster process was limited to. Next we can do the same for the existing backend and auxillary processes:

    pidof postgres -o $postmaster_pid | xargs -n1 taskset -pc 8-31

    Afterwards, one can verify (via top, vmstat or any other utility that shows utilization of individual CPU cores) that postgres processes only utilize those cores they were limited to. In our initial test (not to be performed on a production instance!) we launched a number of postgres backends running a simple infinite loop:

    do $$ begin loop end loop;end;$$;

    and ...

    Read More

    Migrating hierarchical queries from Oracle to PostgreSQL

    This is the second part in a series of blog posts describing PostgreSQL analogs of common Oracle queries

    One of the most intricate Oracle specific constructions is "START WITH ... CONNECT BY". According to Oracle's documentation, the syntax is: SELECT [query] [START WITH initial_condition] CONNECT BY [nocycle] condition. This statement is commonly used to traverse hierarchical data in the parent-child order. It's easier to illustrate how it works with an example.

    Consider a table that stores opponents moves in a game of chess. Each table row contain coordinates (in algebraic notation) of a single move by whites and the move in response by blacks, as well as a column that references a preceding move, making it possible to keep ...

    Read More

    Cool and Sexy: Open Source PostgreSQL enterprise contenders

    As with any healthy project, there will be offshoots and people will take the source, fork it and try to create something new, better, different or just.... How that person feels it should be. This is a good thing, it leads to new ideas, new communities and sometimes truly interesting pieces of software.

    Postgres-XC has been around for a while, it is primarily developed by NTT and EnterpriseDB. It has a small community but a dedicated engineering/hacker backing. Postgres-XC is interesting because it keeps reasonably up to date with the latest Postgres (1.0 is set to be based on 9.1 of PostgreSQL) but provides a shared nothing clustering architecture. This type of infrastructure is one of the ...

    Read More

    PgNext: Cancelled

    It is with regret that I announce that PgNext is cancelled. I am not sure what is next for the PostgreSQL Conference series. The reasons are long and myriad and I will not bore you with them. However I will present the following video:


    If you can't see the video, here is the video link.

    That video represents why I would put on the conferences. They were fun. We had a good time.

    If you are looking for other Postgres conferences there are the following:

    Personally, I would suggest staying local and attending or help organize a local PUG day for PostgreSQL. PUG days are the best in small conferences. You are meeting with many ...

    Read More

    Remembering our roots

    Once upon a time, JD was a assistant manager for Block Buster video. This was a very long time ago and before a 23 month employment stint at Powells Books. It was at Powells that the world of computers was actually introduced to me as a viable employment option. While there I designed a special order database in DBase IV, was introduced to University Ingres, went through Book Buyer training, became a Novell Netware Administrator, and began a side business selling pre-built computers and parts. I also pretended to go to college and generally just had zero clue about life. I still don't have much of a clue about life.

    Why does this matter? It doesn't really. I ...

    Read More