Skip to content

R Database Packages

Curated list of R Database Packages.

Contents

Overview

As datasets become larger and larger, it is impossible for people to save them in traditional file formats such as spreadsheet, raw text file, etc., which could not fit on devices with limited storage and could not be easily shared across collaborators. Instead, people nowadays tend to store data in databases for more scalable and reliable data management.

Database systems are often classified based on the database models that they support.

Relational Databases

  • Relational databases became dominant in the 1980s. The data in relational databases is modeled as rows and columns in a series of tables with the use of SQL to express the logic for writing and querying data.
  • The tables are relational, e.g. you have a user who users your software and those software have creators and contributors.

This section includes packages that provides access to relational databases within R.

Relational Database Packages

  • DBI - The r pkg("DBI", priority = "core") package provides a database interface definition for communication between R and relational database management systems. It’s worth noting that some packages try to follow this interface definition (DBI-compliant) but many existing packages don’t.

  • RODBC - The r pkg("RODBC") package provides access to databases through an ODBC interface.

  • RMariaDB - The r pkg("RMariaDB") package provides a DBI-compliant interface to MariaDB and MySQL.

  • RMySQL - The r pkg("RMySQL") package provides the interface to MySQL. Note that this is the legacy DBI interface to MySQL and MariaDB based on old code ported from S-PLUS. A modern MySQL client based on Rcpp is available from the RMariaDB package we listed above.

  • Packages for PostgreSQL, an open-source relational database:

  • RPostgreSQL and RPostgres - The r pkg("RPostgreSQL") package and r pkg("RPostgres") package both provide fully DBI-compliant, Rcpp-backed interfaces to PostgreSQL.

  • rpostgis - The r pkg("rpostgis") package provides the interface to its spatial extension PostGIS.
  • RGreenplum - The r pkg("RGreenplum") provides a fully DBI-compliant interface to Greenplum, an open-source parallel database on top of PostgreSQL.
  • ROracle - The r pkg("ROracle") package is a DBI-compliant Oracle database driver based on the OCI.

  • The r pkg("ora") package provides convenience functions to query and browse a database through the r pkg("ROracle") connection.

  • Packages for SQLite, a self-contained, high-reliability, embedded, full-featured, public-domain, SQL database engine:

  • RSQLite - The r pkg("RSQLite") package embeds the SQLite database engine in R and provides an interface compliant with the DBI package.

  • filehashSQLite - The r pkg("filehashSQLite") package is a simple key-value database using SQLite as the backend.
  • liteq - The r pkg("liteq") package provides temporary and permanent message queues for R, built on top of SQLite.
  • bigrquery - The r pkg("bigrquery") package provides the interface to Google BigQuery, Google’s fully managed, petabyte scale, low cost analytics Data Warehouse.

  • RDruid - The r github("druid-io/RDruid") package on GitHub provides the interface to Apache Druid, a high performance analytics data store for event-driven data.

  • RH2 - The r pkg("RH2") package provides the interface to H2 Database Engine, the Java SQL database.

  • influxdbr - The r pkg("influxdbr") package provides the interface to InfluxDB, a time series database designed to handle high write and query loads.

  • odbc - The r pkg("odbc", priority = "core") package provides a DBI-compliant interface to drivers of Open Database Connectivity (ODBC), which is a low-level, high-performance interface that is designed specifically for relational data stores.

  • RPresto - The r pkg("RPresto") package implements a DBI-compliant interface to Presto, an open source distributed SQL query engine for running interactive analytic queries against data sources of all sizes ranging from gigabytes to petabytes.

  • RJDBC - The r pkg("RJDBC") package is an implementation of R’s DBI interface using JDBC as a back-end. This allows R to connect to any Database Management System (DBMS) that has a JDBC driver.

  • implyr - The r pkg("implyr") package provides the back-end for Apache Impala, which enables low-latency SQL queries on data stored in the Hadoop Distributed File System (HDFS), Apache HBase, Apache Kudu, Amazon Simple Storage Service (S3), Microsoft Azure Data Lake Store 2 (ADLS), and Dell EMC Isilon.

  • dbx - The r pkg("dbx") package provides intuitive functions for high performance batch operations and safe inserts/updates/deletes without writing SQL on top of r pkg("DBI"). It is designed for both research and production environments and supports multiple database backends such as PostgreSQL, MySQL, MariaDB, and SQLite.

  • sparklyr - The r pkg("sparklyr") package provides provides a r pkg("dplyr") interface to Apache Spark DataFrames s well as an R interface to Spark’s distributed machine learning pipelines.

  • Hmisc - The r pkg("Hmisc") provides a wrapper function Hmisc::mdb.get() that uses the mdbtools utility to read from Microsoft Access database on Unix-alike systems.

Non-Relational Databases

Non-relational databases became popular in recent years due to huge demand in storing unstructured data with the use of NoSQL as the query language. - Users generally don’t need to define the data schema up front. If there are changing requirements in the applications, non-relational databases can be much easier to use and manage.

Non-Relational Database Packages

  • Packages for Redis, an open-source, in-memory data structure store that can be used as a database, cache and message broker:

  • RcppRedis - The r pkg("RcppRedis") package provides interface to Redis using r github("redis/hiredis").

  • redux - The r pkg("redux") package provides a low-level interface to Redis, allowing execution of arbitrary Redis commands with almost no interface, and a high-level generated interface to more than 200 redis commands.
  • Packages for Elasticsearch, an open-source, RESTful, distributed search and analytics engine:

  • elastic - The r pkg("elastic") package provides a general purpose interface to Elasticsearch.

  • uptasticsearch - The r pkg("uptasticsearch") package is a Elasticsearch client tailored to data science workflows.
  • mongolite - The r pkg("mongolite") package provides a high-level, high-performance MongoDB client based on r github("mongodb/mongo-c-driver"), including support for aggregation, indexing, map-reduce, streaming, SSL encryption and SASL authentication.

  • R4CouchDB - The r pkg("R4CouchDB") package provides a collection of functions for basic database and document management operations in CouchDB.

  • RCassandra - The r pkg("RCassandra") package provides a direct interface (without the use of Java) to the most basic functionality of Apache Cassanda such as login, updates and queries.

  • aws.dynamodb - The r github("cloudyr/aws.dynamodb") package on GitHub provides access to Amazon DynamoDB.

  • rrocksdb - The r github("mrcsparker/rrocksdb") package on GitHub provides access to RocksDB.

Database Tools

This section includes packages that provides tools for working and testing with databases, databases table manipulations, etc.

  • pool - The r pkg("pool") package enables the creation of object pools, which make it less computationally expensive to fetch a new object.
  • DBITest - The r pkg("DBItest") package is a helper that tests DBI back ends for conformity to the interface.
  • dbplyr - The r pkg("dbplyr") package is a r pkg("dplyr") back-end for databases that allows you to work with remote database tables as if they are in-memory data frames. Basic features works with any database that has a DBI back-end; more advanced features require SQL translation to be provided by the package author.
  • sqldf - The r pkg("sqldf") package provides functionalities to manipulate R Data Frames Using SQL.
  • pointblank - The r pkg("pointblank") package provides tools to validate data tables in databases such as PostgreSQL and MySQL.
  • TScompare - The r pkg("TScompare") package provides utilities for comparing the equality of series on two databases.
  • dittodb - The r pkg("dittodb") package provides functionality to test database interactions with any r pkg("DBI") compliant database backend. It includes functionality to use fixtures instead of direct database calls during testing as well as functionality to record those fixtures when interacting with a real database for later use in tests.
  • tfio - The r pkg("tfio") package provides the ability to use Apache Ignite, which handles distributed database management for high-performance computing with in-memory speed.
  • dbr - The r github("daroczig/dbr") package on GitHub provides convenient database connections and queries from R using YAML configuration files and templates.
  • rocker - The r pkg("rocker") package provides a r pkg("R6") class interface for handling relational database connections using r pkg("DBI") as backend. The purpose is having an intuitive object allowing straightforward handling of SQL databases.

Resources

Index: All R Database Packages

Core Packages

PostgreSQL

SQL Server

Code:

Outdated:

MySQL

SQLite

Others

Redis:

BigQuery:

MariaDB:

Oracle:

Miscellaneous:

  • elastic
  • sparklyr
  • RGreenplum
  • influxdb
  • SciDB
  • ibmdbR: provides methods to read data from, write data to, and sample data from a Db2 database. It also provides access methods for in-database analytic functions and functions for storing R objects in the database.

Utility Packages

Testing and Validation

Configuration and Connections


Related Notes:

Parent Notes:

Backlinks:

list from [[R Database Packages]] AND -"Changelog"