Fig. 2 SRA-Explorer interface showing an example dataset to batch download files using a bash script
text search in the NCBI-SRA database where “lung adenocarcinoma” was the search keyword, followed by some additional filters in the left-hand pane such as strategy, library layout, and platform.
The library layout of sequenced reads is available in either single-end (SE) or paired-end (PE) format. SE reads are usually sufficient for expression-level studies, while PE reads are preferred in the case of de novo transcript discovery.
However, we have taken PE data in this example as it provides greater accuracy during mapping to the reference genome or transcriptome [34]. Following are the steps to download the FASTQ files from the selected datasets using SRA Explorer (Fig. 2):
1. Search for the sample accession ID of the selected dataset.
2. Select the desired samples from the dataset and click “Add to collection”. This will provide URLs for batch download of the files in FASTQ and SRA file formats.
3. Select “Bash script for downloading FastQ files” and click download, which will generate the sra_explorer_fastq_download.sh file. Enter the following command in the Ubuntu terminal to execute the batch download of FASTQ files:
$ ./sra_explorer_fastq_download.sh
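If the downloaded script does not have execute permission, a standard workaround is to make it executable first, or to run it through bash directly:
$ chmod +x sra_explorer_fastq_download.sh
$ bash sra_explorer_fastq_download.sh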
we now have the ability to correlate (and we should take advantage of this opportunity because rarely do security organizations have the opportunity to be a service to the organization versus a perceived burden to it!). This is an opportunity to promote our security organization through our ability to provide value reporting outside the security group.
As we move further into this topic, we must keep in mind that it is not the charter of SOC staff to provide external reporting. But the question remains: why not leverage the security group’s ability to provide external report value via its system reporting without adversely affecting system performance?
Knowing that external customers who receive these batch reports don’t require the same level of detail or timeliness of reports that SOC and NOC personnel require, the challenge is to be able to generate a meaningful and detailed report to the customer within a usable timeframe without overextending security resources.
Batch file reports should all deliver some degree of detail. They should look much like what the real-time SOC staff sees, albeit received via batch process at predefined periodic intervals. In addition, given that it is for external user groups, it must be relevant enough to be of value. If there is a need for more detailed and current information, then a request can be referred to the SOC call number for further analysis and assistance.
The batch report should have a run cap on it for size, possibly only showing the top ‘nn’ number of events, or past ‘n’ number of hours so that it doesn’t get too large under extreme circumstances. It should also have a set retention period (knowing the original records are preserved, this is merely a forwarded report and, therefore, the report itself wouldn’t have any set retention requirements).
The only question here is how long the end users require the ability to view these reports. Given the fact that it is a drill-down, with detailed reporting, and probably only periodically utilized, a short retention period is recommended, possibly as short as a week.
This will all be decided by the user community and, more important, by experience and trials of the reporting infrastructure. A few additional considerations for retention include the size of the reports and storage capacities. Are the reports stored on a single central server or distributed across your enterprise?
A distributed approach spreads the storage load out among multiple servers, possibly making better use of resources, but also introduces additional risk of distributing security reports. This could also potentially mean added costs to protect these reports.
If the reports are distributed to outlying hosting servers and it were deemed necessary to encrypt the reports, then multiple SSL encryption keys would have to be purchased, at least one for each distributed host server receiving and redistributing the security reports. In some cases if the central server were sized appropriately, economies of scale in storage could make the centralized approach more cost effective.
An example of an external group that could benefit outside your core security group is your network operations group. Because you could be collecting from a multitude of devices, most importantly your firewalls, both on the perimeter and internally, you will have near-real-time data from these devices at a single point from across your enterprise.
This may cross product boundaries as well, such as between your Checkpoint and Cisco PIX firewalls; the point here is that each of these has its own management console, but it is specific to its own product arena.
You have the benefit of being able to collect from both and view them side by side on a single console. Generating reports of firewall denies and overall activity (from a security perspective) can be quite helpful to the network group by showing possible misconfiguration of other network devices or components, especially those facing the internal firewalls.
If a firewall is suddenly denying a high volume of traffic and it generates identifying records where this traffic is coming from, this could be quite useful to the network performance management group.
Events such as this may go undetected by their normal network sensors (until they reach a critical mass), but your ongoing reporting structure may provide an early warning type of alert across a heterogeneous platform of devices. (In fact, it has been the author’s experience that the network group not only wants to see such reports, but has gone to the point of requesting real-time console access to make its own queries once it was discovered that all the log data was available in a single console access.)
This discussion further makes the case for plenty of advance planning of the overall system needs prior to diving into a purchase. By having the whole architecture laid out in advance, including the aspect of reporting cycle, it will help in making cost-effective purchases based on estimated sizing of components such as mass storage, server encryption, user base, and systems to be managed.
A final consideration for generating any number of static reports is protection and access controls. The fact that the reports are static implies that they will be stored somewhere for some period of time, and this scenario creates a risk factor. If the reports are deemed valuable and include information from a multitude of security systems, they therefore warrant adequate protection.
Protections available depend on the methodology utilized to generate the batch reports. Many SEM products today have protection capabilities for the reasons stated previously (relieving the system of volumes of real-time ad hoc queries against the main data repository). Because the COTS products have built-in scheduled report generation capabilities, some may also include protected output queues. These are usually built around some form of role-based access control. This gives the report providers the ability to grant report access by user group or role-based access. In the best-case scenario, the system would
use enterprise access controls, such as an LDAP directory of users, potentially even using predefined user roles. By leveraging a preexisting repository of users, logging and report administrators would not have to enter all the users into yet another directory system, with a different set of access controls and rules. Depending on the degree of roles defined in the user directory, we may be able to grant access based on the existing roles in the manner depicted in Table 6.2.

Table 6.2 Role-Based Access Controls

Role | OU (Organizational Unit) | Access within Reporting System
ADMIN | IT | Read All, Console, online queries, batch reports
Manager | IT | Read All, batch reports
NOC Staff | IT | Read All, Console Read only, batch reports
Manager | Western Region | Read Selected (Regional), Batch Reports (Selected)
ADMIN | Western Region | Read Selected (Regional), Console Selected, Batch Reports (Selected)
This sample list of roles and access rights could be used to grant access (and protections) over potentially sensitive data as well as system functions. Note how the IT ADMINs are the only ones with broad-based access not only to the reports but also to the functions of the system, such as uninhibited console access. The IT managers are merely granted Read and Batch Report access (recall how we stated that the ad hoc query function on the console must be severely restricted in order to preserve system performance).
Then by further using roles such as “Regional” access, we can further restrict access not only to functions, but selected data within the reporting functions, specifically those related only to their region or area of the company. This could be identified by keying off the actual data itself, such as some identifier for a North American firewall log versus a European firewall log, either through the naming of the device or the designated IP ranges of addresses assigned to the various regions. If you are able to make this designation, then you could restrict the North American users to view only data associated with their device logs within their geographic region.
Our assumptions for applying these access controls rely on a couple of conditions: first, the COTS SEM product that was selected has the capability to base its access rights on an external LDAP directory and second, our LDAP directory has the same degree of “identity” (as in our sample) tied to each of the user entries. The first condition is not uncommon; the issue of separate user access rights lists has always been around; in addition, the linking to the common LDAP directory of users is a logical solution set that has become quite common.
The second condition may be slightly more difficult, as it relies on the directory of users having the information required to run a role-based access control methodology. Hopefully this won’t be too challenging, as a directory that carries user groups, as well as their placement within the company by OU (Organizational Unit), is also a fairly common concept.
Again, this all takes up-front knowledge and planning, especially when selecting a SEM system to know to ask questions such as, “Does the security event manager allow role-based access control from an external directory structure?”
We’ve begun to cover basic reporting, from SOC real-time to non-real-time batch NOC reports delivered to non-SOC personnel. Let’s dig deeper into external reporting options.
Let’s start with one of our previous external reporting scenarios, where we reported on suspicious desktop machines that were identified in reports when correlated with eradicated viruses and potential sources of suspicious network traffic. The report of course has security ramifications, but it may also be useful to desktop support personnel whose responsibility it is to ensure that current desktop protection software is deployed, or to determine whether a deployment effort or update failed, which led to the desktop (or groups of desktops) showing up on the SOC alert report. The appropriate report for this group would be generated in batch mode and (in a further attempt to maintain the efficient performance of your system) run only on a 72- to 120-hour basis (three to five days).
The recipients would be the desktop support group responsible for the timely distribution of the AV updates, as well as the base AV software. The fewer times a day or week that even batch reports are run, the better off the overall performance of this system.
Our desktop support team is a good example where the data is not so time critical as to demand frequent reporting (the infected machines have already been corrected, since one of the input sources is the enterprise virus defense system reporting those machines it has patched). So it soon becomes evident that this report doesn’t require as timely a reporting cycle: the “Infected machine, anomalous traffic” report could be batch run on a 72-hour cycle and still provide a degree of value to the end user(s). It will show those machines that had viruses detected as well as (through correlation with the network IDS) show a match to that machine as generating suspicious traffic.
If a pattern of the same machine or machines shows up on this report, analysis should be performed on that particular machine (or groups of machines) to determine if the update process is still working, or if it requires further degrees of protection because it continues to get infected
and get eradications on a regular basis. Whereas other reports (i.e., firewall failed authentications) may be more sensitive and subject to the security access controls, this type of report may not be as sensitive as some of the other batch reports that we spoke of earlier, and thus they might be able to be hosted on distributed servers, not requiring encryption or, for that matter, much in the way of access control. This follows the old rule of applying security controls such that they are commensurate with the sensitivity of the data. As we continue to work through various options, we must always keep in mind the importance of delivering the data and reports in a timely manner, while still providing adequate protection to the data contained in the report.
Security Alert Management (SAM)
SAM is also a key component of an enterprise security program; this is the alerting mechanism of any one of your security tools. It is often distributed on a product-by-product basis where each security tool has its own console and alerting mechanism.
The relationship of SAM to log consolidation efforts is that your consolidated log collection contains all the SAM alerts located within it and thus provides us with a prime opportunity to also fire off alerts on a near-real-time basis (there are usually plenty of specific consoles for this purpose), but, if we are already collecting data from the multiple security devices, then it might make sense to leverage the system to generate additional near-real-time alerts to benefit the watchful eyes of SOC or NOC (although we initially focus on the NOC alerts).
We are going to some depth of the concept of using your log consolidation system to create or supplement your security alert management infrastructure based on the log consolidation architecture that you are building. In Chapter 7 we go further by detailing the escalation process that should be developed to standardize how to address each of the alert types, but for now we just go into the basis of generating the alerts.
Let’s begin our discussion with the basics of what can be done based on our firewall log collection system. Figure 6.1 illustrates a basic data flow of our log collection system and then depicts how near-real-time alerts could be relayed to the NOC console infrastructure.
On the right are the various firewalls from which data is being collected; these firewalls are feeding the intermediary log collection device, referred to as an “Event Collector.” From the Event Collector, the data is passed to the main database collection point, with the master SOC console attached. At the SOC console, security alert messages are first displayed. The SOC console is normally staffed by security personnel; it may not be a physical console but a logically accessible (via a secure communication
$ wget http://ftp.ensembl.org/pub/release-105/fasta/homo_sapiens/dna/Homo_sapiens.GRCh38.dna.primary_assembly.fa.gz
$ wget http://ftp.ensembl.org/pub/release-105/gtf/homo_sapiens/Homo_sapiens.GRCh38.105.gtf.gz
The downloaded reference genome is then used to build the index which is used by Hisat2 for faster and memory-efficient alignment using the following commands:
$ hisat2-build -p 6 Homo_sapiens.GRCh38.dna.primary_assembly.fa genome
hisat2-build builds indices from DNA sequences and appends “.1.ht2, .2.ht2, ...” suffixes to the base name (genome) of the index files generated. Additional arguments, --ss and --exon, corresponding to splice sites and exons within the genome, can also be given in the command. Since index creation for large genomes such as the human genome requires a great deal of memory, index files for well-annotated genomes can instead be downloaded from the Hisat2 website itself (https://genome-idx.s3.amazonaws.com/hisat/grch38_genome.tar.gz). Once the index files and annotation files are ready, we can proceed to alignment of the reads using the following command:
$ hisat2 -p 12 --dta --rna-strandness RF -x /path/to/reference/genome/index/genome -1 /path/to/SRR1027983_1P_clean.fq.gz -2 /path/to/SRR1027983_2P_clean.fq.gz -S SRR1027983.sam --un-gz SRR1027983_unaln.fq.gz --summary-file SRR1027983_sum.txt --met-file SRR1027983_met.tsv
--dta reports alignments tailored for downstream transcript assemblers, --rna-strandness specifies the strand against which reads should be aligned, -x provides the base name (genome) of the index files, and -1 and -2 supply the paired-end read files. The other arguments -S, --un-gz, --summary-file, and --met-file are used to write the SAM file, export the reads that failed to align, the alignment summary, and the alignment metrics, respectively. The primary output of alignment, the SAM (Sequence Alignment/Map) format, is a generic, human-readable nucleotide alignment format that can be compressed to the Binary Alignment/Map (BAM) format using SAMtools. The BAM format stores the positions of aligned reads and is optimized for fast access; it can also be sorted and indexed so that all reads aligning to a locus can be retrieved efficiently without loading the entire file into memory. This can be achieved using SAMtools.
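A minimal sketch of these conversion, sorting, and indexing steps with SAMtools, assuming the SRR1027983.sam file produced by the alignment above (output file names are illustrative):
$ samtools view -b -o SRR1027983.bam SRR1027983.sam        # compress SAM to BAM
$ samtools sort -o SRR1027983_sorted.bam SRR1027983.bam    # sort by genomic coordinate
$ samtools index SRR1027983_sorted.bam                     # build the index for fast locus access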
Table 2 List of tools along with the steps at which quality check (QC) should be performed

QC checkpoints | QC metrics | Database/Tool | Qualifying criteria
Data acquisition | Library selection | – | cDNA library followed by mRNA enrichment or rRNA depletion
Data acquisition | Sequencing depth | – | 10-20 million PE reads
Raw read | Sequence quality | FastQC | Q30 > 70%
Read alignment to reference | Sequence coverage | Picard | 70% alignment
Read alignment to reference | Million reads aligned | Picard | 14 million reads
Read alignment to reference | Percent aligned to rRNA | Picard | <5%
Read alignment to reference | Percent aligned to coding | Picard | 50%
One can also download the paired-end data of a single FASTQ sample using wget in the Ubuntu terminal as (see Notes 4 and 5):
$ wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR102/003/SRR1027983/SRR1027983_1.fastq.gz
$ wget ftp://ftp.sra.ebi.ac.uk/vol1/fastq/SRR102/003/SRR1027983/SRR1027983_2.fastq.gz
3.2 Quality Control of Raw Reads
Ideally, quality assessment should be done at every step of data analysis, so that any bias in the data can be corrected for a reliable downstream analysis (Table 2). Among others, FastQC is a tool used for quality control of raw reads which accepts FASTQ files as input and generates quality assessment results in HTML format [25]. FastQC can be executed on a single sample as:
$ fastqc SRR1027983_1.fastq.gz
FastQC can be executed on each sample in the working directory in a batch as:
$ fastqc *.fastq.gz
Inference of Dynamic Growth Regulatory Network in Cancer Using High-Throughput Transcriptomic Data
Aparna Chaturvedi and Anup Som
Abstract
Growth is regulated by gene expression variation at different developmental stages of biological processes such as cell differentiation, disease progression, or drug response. In cancer, a stage-specific regulatory model constructed to infer the dynamic expression changes in genes contributing to tissue growth or proliferation is referred to as a dynamic growth regulatory network (dGRN).
Over the past decade, gene expression data has been widely used for reconstructing dGRN by computing correlations between the differentially expressed genes (DEGs).
A wide variety of pipelines are available to construct the GRNs using DEGs and the choice of a particular method or tool depends on the nature of the study. In this protocol, we have outlined a step-by-step guide for the analysis of DEGs using RNA-Seq data, beginning from data acquisition, pre-processing, mapping to reference genome, and construction of a correlation-based co-expression network to further downstream analysis.
We have also outlined the steps for the inclusion of publicly available interaction/regulation information into the dGRN followed by relevant topological inferences. This tutorial has been designed in a way that early researchers can refer to for an easy and comprehensive glimpse of methodologies used in the inference of dGRN using transcriptomics data.
Key words RNA-Seq, Read alignment, Read quantification, Differential gene expression, High- throughput data analysis, Growth regulatory network, Co-expression network, Pathway analysis, Systems biology
1 Introduction
Cancer is a dynamic disease, changing over time and characterized by the uncontrolled growth of cells. As growth is regulated by gene expression variations at different cancer stages, it is important to identify the regulatory roles of genes in a stage-specific manner. The network constructed based on this concept is termed a dynamic growth regulatory network (dGRN), where a regulatory model or network is built for each successive stage contributing to growth in tumor size. This concept of GRN construction is routinely used in plant sciences to identify the key genes and pathways associated with growth in tissue size [1-3]. In
multiomics data along with time-series gene expression data. Additionally, these tools can also be used to analyze multiple or cyclic time-series data in the case of other dynamic processes such as developmental processes or drug responses to chemotherapy (see [20, 21] for detailed implementation). To construct a co-expression network, individual relationships between genes are first identified based on various correlation measures, and a similarity matrix is constructed, which is then used to build a co-expression network where each node represents a gene and each edge represents an interaction regulating the genes’ expression or function [22].
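As a rough illustration of this idea (not the specific pipeline used later in this chapter), a correlation-based co-expression network can be sketched in R with the igraph package; the expression matrix expr (genes in rows, samples in columns) and the 0.8 correlation cutoff are assumptions made purely for illustration:

# expr: numeric matrix with genes in rows and samples in columns (assumed)
library(igraph)

cor_mat <- cor(t(expr), method = "pearson")     # gene-gene correlation (similarity) matrix
adj <- (abs(cor_mat) >= 0.8) * 1                # hard threshold; 0.8 chosen only for illustration
diag(adj) <- 0                                  # remove self-edges

g <- graph_from_adjacency_matrix(adj, mode = "undirected")  # nodes = genes, edges = co-expression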
The reconstructed GRNs are significantly different from random networks and therefore various parameters have been worked out to define the properties of a network such as hub nodes, modules, and so on [23].
Modules are clusters of genes showing similar temporal patterns, and therefore these clusters of genes are often believed to regulate similar functional pathways altogether. Often, modules in the GRNs are used to infer the gene ontology-enriched processes and functional pathways using DEGs that might be prominent in the cause or progression of tumors. In this article, we have provided a step-by-step guide to constructing a GRN using high-throughput lung adenocarcinoma (LUAD) samples submitted by Morton et al. in the NCBI-SRA database [24]. We believe that by simply following these steps one might be able to construct a co-expression network, from downloading and processing of RNA-Seq data, identification of DEGs, and construction of the co-expression network to the topological inferences obtained by measuring network parameters such as node-degree distribution, clustering coefficient, and betweenness centrality.
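Assuming the network has been loaded as an igraph object g (as in the sketch above, an assumption for illustration), these parameters can be computed roughly as follows:

deg <- degree(g)                              # node-degree distribution
cc  <- transitivity(g, type = "local")        # per-node clustering coefficient
bc  <- betweenness(g)                         # betweenness centrality
hist(deg, main = "Node-degree distribution")  # quick look at the degree distribution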
Computational requirements: The computational requirements vary according to the amount of data to be analyzed. In this work, we used a Dell desktop 64-bit workstation consisting of 24-core Intel Xeon CPU processors, 32 GB RAM, a 2-TB hard drive, and Ubuntu 20.04 LTS (see Note 1).
R v4.0.2 is a language used primarily for statistical analysis and data visualization purposes. The RStudio IDE (Integrated Development Environment) can be downloaded from https://www.rstudio.com/categories/rstudio-ide/ for easy handling of the R working environment (see Note 2).
FastQC v0.11.9 (see Note 2) for quality control of high-throughput sequencing data [25] that can be downloaded from https://www.bioinformatics.babraham.ac.uk/projects/fastqc/.
Fig. 1 Flowchart showing steps involved in the construction of a growth regulatory network using high-throughput data
The transcriptomic data can be downloaded primarily from three databases: the Sequence Read Archive of the National Centre for Biotechnology Information (https://www.ncbi.nlm.nih.gov/sra), the DNA Data Bank of Japan (https://www.ddbj.nig.ac.jp/), and the European Nucleotide Archive of the European Molecular Biology Laboratory (https://www.ebi.ac.uk/ena/browser/home).
However, a few other repositories are also available from which only cancer-specific high-throughput data can be downloaded: the GDC data portal of The Cancer Genome Atlas (TCGA) (https://portal.gdc.cancer.gov/), where not all of the data is publicly available, and cBioPortal of the Memorial Sloan Kettering Cancer Center (MSK) (https://www.cbioportal.org/), which also contains cell-line-specific datasets. The transcriptomic data is available as short RNA-Seq reads or raw reads in compressed FASTQ format. All the databases have a very easy method for downloading datasets, but for NCBI-SRA the downloads are made relatively easier by SRA Explorer (https://sra-explorer.info).
For demonstration purposes, we have considered a dataset with 12 tumor (6 adenocarcinoma in situ (AIS) and 6 invasive adenocarcinoma (INV)) and 6 control samples for lung adenocarcinoma (Sample accession: PRJNA227275). The dataset was identified by
great at serving HTML and other files, and it’s great at building APIs. Because the learning curve is relatively low for front-end developers, they can whip up a simple SPA server with little new learning required.
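As a hedged sketch (the folder name, route, and port are invented for illustration), a tiny SPA-style server that serves static front-end files and exposes a small JSON endpoint might look like this:

var express = require("express");
var app = express();

// Serve the SPA's HTML, CSS, and client-side JavaScript from a "public" folder
app.use(express.static("public"));

// A small REST-style endpoint for the front end to call
app.get("/api/greeting", function(req, res) {
  res.json({ message: "Hello from the API!" });
});

app.listen(3000);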
When you write applications with Express, you can’t get away from using Node.js, so you’re going to have the E and the N parts of the MEAN stack, but the other two parts (M and A) are up to you because Express is unopinionated. Want to replace Angular with Backbone.js on the front end? Now it’s the MEBN stack.
Want to use SQL instead of MongoDB? Now it’s the SEAN stack. Although MEAN is a common bit of lingo thrown around and a popular configuration, you can choose whichever you want. In this book, we’ll cover the MongoDB database, so we’ll use the MEN stack: MongoDB, Express, and Node.js.
Express also fits in side by side with a lot of real-time features. Although other programming environments can support real-time features like WebSocket and WebRTC, Node.js seems to get more of that than other languages and frameworks.
That means that you can use these features in Express apps; because Node.js gets it, Express gets it too.
1.5.3. Third-party modules for Node.js and Express
The first few chapters of this book talk about core Express-things that are baked into the framework.
In very broad strokes, these are routes and middleware.
But more than half of the book covers how to integrate Express with third-party modules.
There are loads of third-party modules for Express. Some are made specifically for Express and are compatible with its routing and middleware features. Others aren’t made for Express specifically and work well in Node.js, so they also work well with Express.
application. It has real strengths that other frameworks don’t have, like Node.js’s performance and the ubiquitous JavaScript, but it does less for you than a larger framework might do, and some people don’t think JavaScript is the finest language out there. We could argue forever about which is best and never come to an answer, but it’s important to see where Express fits into the picture.
1.5.2. What Express is used for
In theory, Express could be used to build any web application. It can process incoming requests and respond to them, so it can do things that you can do in most of the other frameworks mentioned earlier. Why would you choose Express over something else?
One of the benefits of writing code in Node.js is the ability to share JavaScript code between the browser and the server. This is helpful from a code perspective because you can literally run the same code on client and server. It’s also very helpful from a mental perspective; you don’t have to get your mind in server mode and then switch into client mode; it’s all the same thing at some level. That means that a front-end developer can write back-end code without having to learn a whole new language and its paradigms, and vice versa. There is some learning to do (this book wouldn’t exist otherwise), but a lot of it is familiar to front-end web developers.
Express helps you do this, and people have come up with a fancy name for one arrangement of an all-JavaScript stack: the MEAN stack. Like the LAMP stack stands for Linux, Apache, MySQL, and PHP, MEAN, as I mentioned earlier, stands for MongoDB (a JavaScript-friendly database), Express, Angular (a front-end JavaScript framework), and Node.js. People like the MEAN stack because it’s full-stack JavaScript and you get all of the aforementioned benefits.
Express is often used to power single-page applications (SPAs). SPAs are very JavaScript-heavy on the front end, and they usually require a server component. The server is usually required to simply serve the HTML, CSS, and JavaScript, but there’s often a REST API, too. Express can do both of these things quite well; it’s
In this book, we’ll pick a number of third-party integrations and show examples. But because Express is unopinionated, none of the contents of this book are the only options. If I cover Third-Party Tool X in this book, but you prefer Third-Party Tool Y, you can swap them out.
Express has some small features for rendering HTML. If you’ve ever used vanilla PHP or a templating language like ERB, Jinja2, HAML, or Razor, you’ve dealt with rendering HTML on the server. Express doesn’t come with any templating languages built in, but it plays nicely with almost every Node.js-based templating engine, as you’ll see. Some popular templating languages come with Express support, but others need a simple helper library. In this book, we’ll look at two options: EJS (which looks a lot like HTML) and Pug (which tries to fix HTML with a radical new syntax).
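A minimal sketch of EJS rendering (the view name and template variable are invented for illustration, and the ejs package is assumed to be installed):

var express = require("express");
var app = express();

app.set("view engine", "ejs");  // look for .ejs templates in the views folder

app.get("/", function(req, res) {
  // renders views/index.ejs, passing a local variable into the template
  res.render("index", { userName: "world" });
});

app.listen(3000);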
Express doesn’t have any notion of a database. You can persist your application’s data however you choose: in files, in a relational SQL database, or in another kind of storage mechanism. In this book, we’ll cover the popular MongoDB database for data storage. As we talked about earlier, you should never feel boxed in with Express. If you want to use another data store, Express will let you.
Users often want their applications to be secure. There are a number of helpful libraries and modules (some for raw Node.js and some for Express) that can tighten the belt of your Express applications. We’ll explore all of this in chapter 10 (which is one of my favorite chapters, personally). We’ll also talk about testing your Express code to make sure that the code powering your apps is robust.
An important thing to note: there’s no such thing as an Express module-only a Node.js module. A Node.js module can be compatible with Express and work well with its API, but they’re all just JavaScript served from the npm registry, and you install them the same way. Just like in other environments, some modules integrate with other modules, where others can sit alongside. At the end of the day, Express is just a Node.js module like any other.
based on a changing stock price or a new time of day, but GETs shouldn’t cause that change. That’s idempotent.
POST - Generally used to request a change to the state of the server. You POST a blog entry; you POST a photo to your favorite social network; you POST when you sign up for a new account on a website. POST is used to create records on servers, not modify existing records. POST is also used for actions, like buy this item. Unlike GET, POST is non-idempotent. That means that the state will change the first time you POST, and the second time, and the third time, and so on.
PUT -A better name might be update or change. If I’ve published (POSTed) a job profile online and later want to update it, I would PUT those changes. I could PUT changes to a document, or to a blog entry, or to something else. (You don’t use PUT to delete entries, though; that’s what DELETE is for, as you’ll see.) PUT has another interesting part; if you try to PUT changes to a record that doesn’t exist, the server can (but doesn’t have to) create that record. You probably wouldn’t want to update a profile that doesn’t exist, but you might want to update a page on a personal website whether or not it exists. PUT is idempotent. Let’s say I’m “Evan Hahn” on a website but I want to change it to Max Fightmaster. I don’t PUT “change name from Evan Hahn to Max Fightmaster”; I PUT “change my name to Max Fightmaster”; I don’t care what it was before. This allows it to be idempotent. I could do this once or 500 times, and my name would still be Max Fightmaster. It is idempotent in this way.
DELETE -Probably the easiest to describe because its name is obvious. Like PUT, you basically specify DELETE record 123. You could DELETE a blog entry, or DELETE a photo, or DELETE a comment.
DELETE is idempotent in the same way that PUT is. Let’s say I’ve accidentally published (POSTed) an embarrassing photo of me wearing a lampshade over my head. If I don’t want it on there, I can DELETE it. Now it’s gone! It doesn’t matter whether I ask for it to be deleted once or 500 times; it’s going to be gone. (Phew!)
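To make the verbs concrete, here is a hedged sketch of Express routes for a hypothetical profile resource (the paths, IDs, and handler bodies are invented for illustration):

var express = require("express");
var app = express();

app.get("/profiles/123", function(req, res) {
  // GET: read profile 123 without changing server state (safe and idempotent)
  res.send("profile 123");
});

app.post("/profiles", function(req, res) {
  // POST: create a new profile record (non-idempotent; every call creates another record)
  res.status(201).send("profile created");
});

app.put("/profiles/123", function(req, res) {
  // PUT: replace profile 123 with the supplied state (idempotent; repeating it changes nothing more)
  res.send("profile 123 updated");
});

app.delete("/profiles/123", function(req, res) {
  // DELETE: remove profile 123 (idempotent; deleting it again leaves it just as gone)
  res.send("profile 123 deleted");
});

app.listen(3000);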
There’s nothing that strictly enforces these constraints-you could theoretically use GET requests to do what POST requests should do-but it’s bad practice and against
