Title: | Create and Query a Local 'PubTator' Database |
---|---|
Description: | 'PubTator' <https://www.ncbi.nlm.nih.gov/CBBresearch/Lu/Demo/PubTator/> is a National Center for Biotechnology Information (NCBI) tool that enhances the annotation of articles on PubMed <https://www.ncbi.nlm.nih.gov/pubmed/>. It uses text mining techniques to rapidly identify potential relationships between genes or proteins, whereas manually searching for and reading the annotated articles would be very time-consuming. 'PubTator' offers both an online interface and a RESTful API; however, neither of these approaches is well suited for frequent, high-throughput analyses. The package 'pubtatordb' provides a set of functions that make it easy for the average R user to download 'PubTator' annotations and then create and query a local version of the database. |
Authors: | Zachary Colburn [aut, cre], Madigan Army Medical Center - Department of Clinical Investigation [cph, fnd] |
Maintainer: | Zachary Colburn <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.1.4 |
Built: | 2024-11-09 03:37:23 UTC |
Source: | https://github.com/mamc-dci/pubtatordb |
Download PubTator data via ftp.
download_pt(pubtator_parent_path, ...)
pubtator_parent_path |
The path to the directory where the PubTator data folder will be created. |
... |
Additional arguments to dir.create and download.file. |
The path to the newly created directory. This can be passed to other functions as the pt_path argument.
# Use the full path. The files are large. Writing somewhere other than the
# temp directory is recommended.
download_path <- tempdir()
download_pt(download_path)
Make a path to the PubTator sqlite file.
make_pubtator_sqlite_path(pt_path)
pt_path |
A character string indicating the full path of the directory containing the pubtator gz files to be extracted. |
A character string indicating the full path to the sqlite file.
List the column names for a table in the PubTator sqlite database
pt_columns(db_con, table_name)
db_con |
A connection to the PubTator sqlite database, as created via pt_connector. |
table_name |
The name of the table of interest. Valid tables can be found using pt_tables. Capitalization does not matter. |
A character vector of the column names for a given table.
db_con <- pt_connector(pt_path)
pt_columns(db_con, "gene")
Connect to pubtator.sqlite
pt_connector(pt_path)
pt_path |
A character string indicating the full path of the directory containing the pubtator gz files to be extracted. |
A SQLiteConnection
pt_connector("D:/Reference_data/PubTator")
Retrieve data from the PubTator database.
pt_select(
  db_con,
  table_name,
  columns = NULL,
  keys = NULL,
  keytype = NULL,
  limit = Inf
)
db_con |
A connection to the PubTator sqlite database, as created via pt_connector. |
table_name |
The name of the table of interest. Valid tables can be found using pt_tables. Capitalization does not matter. |
columns |
A character vector of the names of the columns of interest. Capitalization does not matter. |
keys |
A vector specifying which values must be in the keytype column to enable retrieval. No filtering is performed if keys = NULL. |
keytype |
The column in which the keys should be searched for. |
limit |
The maximum number of rows the query should return. All rows passing filtering (if any) are returned if limit = Inf. |
A data.frame.
db_con <- pt_connector(pt_path)
pt_select(
  db_con,
  "gene",
  columns = c("ENTREZID", "Resource", "MENTIONS", "PMID"),
  keys = c("7356", "4199", "7018"),
  keytype = "ENTREZID",
  limit = 10
)
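The filtering described for keys, keytype, and limit can be pictured as a SQL query of the form SELECT ... WHERE keytype IN (keys) LIMIT n. The following is a minimal sketch on a toy in-memory table, assuming the DBI and RSQLite packages are installed. The rows are made up for illustration, and the SQL is an assumption about what such a query might look like, not the package's actual implementation.

```r
library(DBI)

# Toy in-memory database standing in for pubtator.sqlite (made-up rows).
con <- dbConnect(RSQLite::SQLite(), ":memory:")
toy_gene <- data.frame(
  PMID = c("1", "2", "3", "4"),
  ENTREZID = c("7356", "4199", "9999", "7356"),
  MENTIONS = c("mention_a", "mention_b", "mention_c", "mention_d"),
  Resource = c("toy", "toy", "toy", "toy"),
  stringsAsFactors = FALSE
)
dbWriteTable(con, "gene", toy_gene)

# Same keys as in the pt_select example above; one placeholder per key.
keys <- c("7356", "4199", "7018")
placeholders <- paste(rep("?", length(keys)), collapse = ", ")
query <- paste0(
  "SELECT ENTREZID, Resource, MENTIONS, PMID FROM gene ",
  "WHERE ENTREZID IN (", placeholders, ") LIMIT 10"
)

# Parameters are bound rather than pasted into the SQL string.
result <- dbGetQuery(con, query, params = as.list(keys))
dbDisconnect(con)
```

In this toy table only the rows whose ENTREZID appears in keys are returned; a key with no matching row ("7018" here) simply contributes nothing.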
List the tables in the PubTator sqlite database
pt_tables(db_con)
db_con |
A connection to the PubTator sqlite database, as created via pt_connector. |
A character vector of the names of the tables found in the database.
db_con <- pt_connector(pt_path)
pt_tables(db_con)
Create sqlite database from the pubtator data.
pt_to_sql(pt_path, skip_behavior = TRUE, remove_behavior = FALSE)
pt_path |
A character string indicating the full path of the directory containing the pubtator gz files to be extracted. |
skip_behavior |
TRUE/FALSE indicating whether the file should be re-extracted if it has already been extracted. |
remove_behavior |
TRUE/FALSE indicating whether the gz files should be removed following successful extraction. |
download_path <- tempdir()
current_dir <- getwd()
setwd(download_path)
pt_to_sql("PubTator")
setwd(current_dir)
See the citations for PubTator
pubtator_citations()
pubtator_citations()
NCBI's ftp url definition for PubTator.
pubtator_ftp_url()
A character string giving the ftp url for PubTator.
Table and dataset definitions
pubtator_tables()
A character vector where names are table names and values are dataset names.
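Taken together, the functions documented above suggest a download, build, connect, and query workflow. The sketch below wraps that sequence in a helper; the helper name build_and_preview and its arguments are hypothetical, and only functions listed in this manual (plus DBI::dbDisconnect to close the connection) are called. It assumes the 'pubtatordb' and 'DBI' packages are installed and that enough disk space is available, since the downloaded files are large.

```r
# Hypothetical helper: build a local PubTator database under `parent_dir`
# and return a small preview of one table. The name and arguments are
# illustrative and are not part of the package.
build_and_preview <- function(parent_dir, table_name = "gene", n = 10) {
  # Download the gz files; the return value is the new data directory.
  pt_path <- pubtatordb::download_pt(parent_dir)

  # Extract the files and write them to the sqlite database.
  pubtatordb::pt_to_sql(pt_path)

  # Connect, and make sure the connection is closed when the function exits.
  db_con <- pubtatordb::pt_connector(pt_path)
  on.exit(DBI::dbDisconnect(db_con), add = TRUE)

  # Look up the column names first so they can be passed explicitly.
  cols <- pubtatordb::pt_columns(db_con, table_name)

  list(
    tables = pubtatordb::pt_tables(db_con),
    columns = cols,
    preview = pubtatordb::pt_select(
      db_con,
      table_name,
      columns = cols,
      limit = n
    )
  )
}
```

Defining the helper does not download anything. Calling it with a full path to a directory outside the temp directory would trigger the download and database build, which may take considerable time.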