skip to Main Content

I have read-only access to a Postgres database. I can not write to the database.

Q. Is there a way to construct and run a SQL query where I join a data frame (or other R object) to a table in a read-only Postgres database?

This is for accessing data from WRDS, https://wrds-www.wharton.upenn.edu/

Here’s an attempt at pseudocode

#establish a connection to a database
con <- dbConnect( Postgres(), 
          host = 'host.org',
          port = 1234,
          dbname = 'db_name',
          sslmode = 'require',
          user = 'username', password = 'password')

#create an R dataframe (or other object)
df <- data.frame( customer_id = c('a123', 'a-345', 'b0')  )


#write a sql query we will run
sql_query <- "
SELECT t.customer_id, t.* FROM df t
   LEFT JOIN table_name df
      on t.customer_id = df.customer_id
"

my_query_results <- dbSendQuery(con, sql_query)
temp <- dbFetch(res, n = 1)
dbClearResult(res)
my_query_results

Note and edit: The example query I provided is intentionally super simple for example purposes.

In my actual queries, there might be 3 or more columns I want to join on, and millions of rows I want to join on.

2

Answers


  1. If your join is on a single condition, it can be rewritten using an in clause:

    In SQL:

    SELECT customer_id
    FROM table_name 
    WHERE customer_id in ('a123', 'a-345', 'b0')
    

    Programmatically from R:

    sql_query = sprintf(
      "SELECT customer_id
       FROM table_name 
       WHERE customer_id in (%s)",
      paste(sQuote(df$customer_id, q = FALSE), collapse = ", ")
    )
    
    Login or Signup to reply.
  2. Use the copy_inline function from the dbplyr package, which was added following an issue filed on this topic. See also the question here.

    An example of its use is found here and also below.
    Before running the code below, run code like this in your R console:

    Sys.setenv(PGHOST = "wrds-pgdata.wharton.upenn.edu",
               PGPORT = 9737L,
               PGDATABASE = "wrds",
               PGUSER = "YOUR_WRDS_ID",
               PGPASSWORD = "YOUR_WRDS_PASSWORD")
    

    Then (assuming you have the dplyr, dbplyr, and RPostgres packages installed) you should be able to copy-paste the following code to check that it works for you. Many more examples can be found here.

    library(dplyr, warn.conflicts = FALSE)
    library(DBI)
    library(dbplyr, warn.conflicts = FALSE)
    
    pg <- dbConnect(RPostgres::Postgres(), sslmode='require')
    
    events <- tibble(permno = 14593L,
                     event_desc = c("WWDC 2005 ", "Apple Special Event", 
                                    "WWDC 2006"),
                     date = as.Date(c("2005-06-06", "2006-02-28", 
                                      "2006-02-28")))
    events_pg <- copy_inline(pg, events)
    
    dsf <- tbl(pg, sql("SELECT * FROM crsp.dsf"))
    
    results <-
      dsf %>%
      inner_join(events_pg, join_by(permno, date)) %>%
      select(permno, date, event_desc, ret, prc) %>%
      collect()
    
    results
    #> # A tibble: 3 × 5
    #>   permno date       event_desc                 ret   prc
    #>    <dbl> <date>     <chr>                    <dbl> <dbl>
    #> 1  14593 2005-06-06 "WWDC 2005 "          -0.00837  37.9
    #> 2  14593 2006-02-28 "Apple Special Event" -0.0352   68.5
    #> 3  14593 2006-02-28 "WWDC 2006"           -0.0352   68.5
    

    Created on 2023-04-23 with reprex v2.0.2

    Login or Signup to reply.
Please signup or login to give your own answer.
Back To Top
Search