The objective of this post is to:
- Supplement and document the process of connecting to an SQL database from RStudio via ODBC.
- Show how SQL queries can be executed directly from RStudio
The best resource to start: Follow instructions at rstudio Connect to database.
1. brew install freetds: "--with-unixodbc has depreciated, not needed"
2. brew install psqlodbc
3. brew install mysql
4. brew install sqliteodbc
5. Find odbcinst.ini
sudo find / -name odbcinst.ini (should be in usr/local/etc/)
nano /usr/local/etc/odbcinst.ini
6. Add the following & save:
[PostgreSQL Driver]
Driver = /usr/local/lib/psqlodbcw.so
[SQLite Driver]
Driver = /usr/local/lib/libsqlite3odbc.dylib
[ODBC for FreeTDS]
Driver = /usr/local/lib/libtdsodbc.so
7. Find odbc.ini
sudo find / -name odbc.ini (should be in usr/local/etc/)
nano /usr/local/etc/odbc.ini
8. Add the following & save
[PostgreSQL]
Driver = PostgreSQL Driver
Database = test_db
Servername = localhost
UserName = postgres
Password = password
Port = 5432
[SQLite]
Driver = SQLite Driver
Database=/tmp/testing
[ODBC for FreeTDS]
Driver = ODBC for FreeTDS
Database = nameofthedatabase
Servername = nameofserver.com
UserName = username
Password = passwd
Port = Portnumber
9. In R, run the console command to check if drivers are installed. . .
odbcListDrivers()
10. run the console command
install.packages("odbc")
Test it. Install the required libraries; details @Yihui’s book on rmarkdown: chapter Other language engines. First establish a connection to a database using DBI::dbConnect() function
.
#Load required libraries
library(odbc)
library(DBI)
library(tidyverse)
#Define Connection
myconnection <- DBI::dbConnect(odbc::odbc(),
driver = "libtdsodbc.so",
server = "Server Name",
database = "Database name",
UID = "userid",
Pwd = "passwd",
port = 1433)
The established/defined connection can be accessed via the Connections pane in Rstudio.
Define an sql chunk and run SQL queries in R-markdown using: {sql, connection = myconnection, output.var = “nameoftheoutputfile”}
SELECT *,
convert(INT, DATENAME(YEAR, DATEADD(month, 6, First_Pur_Date))) FY_FirstPurchase,
convert(INT, DATENAME(YEAR, DATEADD(month, 6, MaxMED))) FY_EMaintD,
convert(INT, MONTH(First_Pur_Date)) Month_FirstPurch
FROM "databasename"."tablename"."dataset"
Use SQL query similar to case_when
in R
SELECT *,
case
when Churn = 'No' AND First_Purchase > 1825 AND First_Purchase < f.fpur then '5+ Years Active'
when Churn = 'No' AND First_Purchase > 1460 AND First_Purchase < 1824 then '4+ Years Active'
when Churn = 'No' AND First_Purchase > 1095 AND First_Purchase < 1459 then '3+ Years Active'
when Churn = 'No' AND First_Purchase >730 AND First_Purchase <1094 then '2+ Years Active'
when Churn = 'No' AND First_Purchase >365 AND First_Purchase <729 then '1+ Years Active'
when Churn = 'No' AND First_Purchase <365 then 'New Active'
else 'Inactive'
end segment
FROM table.Prod_Churn c inner join (select SystemId,
max(First_Pur_Date) fpur from table.Prod_Churn
GROUP by SystemID) f on f.SystemID=c.SystemID
WHERE Product = 'name_of_product'
Thanks to Mike Griffith, our IT ninja guy, and Greg Lee (SQL Ninja) who helped me through this process.
Share this post
Twitter
Google+
Facebook
Reddit
LinkedIn
StumbleUpon
Pinterest
Email