Backend Fundamentals
Communication Protocols
TCP is a stream-based connection-oriented protocol. TCP provides reliable delivery at a cost of connection setup and retransmission.
UDP is message-based and connectionless. It starts faster because there is no connection setup, but it does not guarantee delivery.
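The difference can be seen with plain sockets on the loopback interface. This is a minimal sketch, not a production pattern: the function names `udp_roundtrip` and `tcp_roundtrip` are made up for illustration, and delivery on loopback is far more dependable than on a real network.

```python
import socket
import threading


def udp_roundtrip(message: bytes) -> bytes:
    """Send one datagram over loopback; no connection is set up first."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
         socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(2.0)
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(4096)  # one call returns one whole datagram
        return data


def tcp_roundtrip(chunks: list[bytes]) -> bytes:
    """Send several writes over one TCP connection and read back the byte stream."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen(1)
        listener.settimeout(2.0)

        def client() -> None:
            # Connecting performs the TCP handshake before any data flows.
            with socket.create_connection(listener.getsockname(), timeout=2.0) as conn:
                for chunk in chunks:
                    conn.sendall(chunk)

        worker = threading.Thread(target=client)
        worker.start()
        conn, _addr = listener.accept()
        with conn:
            conn.settimeout(2.0)
            received = b""
            while True:
                part = conn.recv(4096)
                if not part:  # empty read: the peer closed the connection
                    break
                received += part
        worker.join()
        return received
```

Note that the TCP receiver sees one continuous stream: the boundaries between the sender's writes are not preserved, while each UDP datagram arrives as a separate message.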
Databases and ACID properties
- Relational databases are typically fully ACID and require a schema, while MongoDB was built as a document-based database with basic atomicity (at the document level) and no required schema. Redis built a fantastic high-performance cache by sacrificing durability by default.
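As an illustration of atomicity in a relational database, here is a small sketch using Python's built-in `sqlite3` module. The `accounts` table and the `transfer` helper are hypothetical, chosen only to show that a failed transaction leaves no partial changes behind.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # Hypothetical schema for the example: one row per account.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 0)])
    conn.commit()
    return conn


def transfer(conn: sqlite3.Connection, src: str, dst: str, amount: int) -> bool:
    """Move `amount` from src to dst as one atomic unit; return True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
            conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))
            (balance,) = conn.execute(
                "SELECT balance FROM accounts WHERE name = ?", (src,)
            ).fetchone()
            if balance < 0:
                raise ValueError("insufficient funds")
        return True
    except ValueError:
        return False


def balances(conn: sqlite3.Connection) -> dict:
    return dict(conn.execute("SELECT name, balance FROM accounts"))
```

When the second transfer fails, both updates are rolled back together, so the balances are exactly what they were before the attempt.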
Proxies
- The main purpose of a proxy is to receive requests from a client and forward them to backend servers. The proxy hides the network-layer identity of the original client from the destination server. There are two levels of proxying: layer 4 and layer 7. Layer 4 proxying works at the transport layer, while layer 7 proxying works at the application layer. Each provides different capabilities and suits different purposes. Layer 7 proxying requires the proxy to understand the application protocol, while layer 4 proxying works with any application protocol because it operates at the transport layer (TCP or UDP).
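The two levels differ in what the proxy can look at when choosing a backend. The sketch below only models that routing decision; it does not forward any traffic. The backend addresses, the route table, and the function names are all assumptions made up for the example.

```python
from dataclasses import dataclass

# Hypothetical backend pools, for illustration only.
L4_BACKENDS = ["10.0.0.1:9000", "10.0.0.2:9000"]
L7_ROUTES = {"/api/": "10.0.1.1:8080", "/static/": "10.0.1.2:8080"}
L7_DEFAULT = "10.0.1.3:8080"


@dataclass(frozen=True)
class Connection:
    client_ip: str
    client_port: int


def route_layer4(conn: Connection) -> str:
    # A layer 4 proxy sees only addresses and ports, never the payload,
    # so it works the same way for any application protocol.
    key = sum(conn.client_ip.encode()) + conn.client_port
    return L4_BACKENDS[key % len(L4_BACKENDS)]


def route_layer7(raw_request: bytes) -> str:
    # A layer 7 proxy must parse the application protocol (HTTP here)
    # before it can route on something like the request path.
    request_line = raw_request.split(b"\r\n", 1)[0].decode("ascii")
    _method, path, _version = request_line.split(" ")
    for prefix, backend in L7_ROUTES.items():
        if path.startswith(prefix):
            return backend
    return L7_DEFAULT
```

Routing by path is only possible at layer 7, which is the trade-off: more capability in exchange for having to understand the protocol.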
OSI model
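The Open Systems Interconnection (OSI) model describes seven layers that computer systems use to communicate over a network. Compared with the TCP/IP model, OSI is often described as less reliable in practice; it has strict layer boundaries, follows a vertical approach, and keeps separate session and presentation layers (TCP/IP folds both into its application layer).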
1. Physical Layer
The physical layer is responsible for the physical cable or wireless connection between network nodes. It defines the connector, the electrical cable or wireless technology connecting the devices, and is responsible for transmission of the raw data, which is simply a series of 0s and 1s, while taking care of bit rate control.
2. Data Link Layer
The data link layer establishes and terminates a connection between two physically-connected nodes on a network. It breaks up packets into frames and sends them from source to destination. This layer is composed of two parts—Logical Link Control (LLC), which identifies network protocols, performs error checking and synchronizes frames, and Media Access Control (MAC) which uses MAC addresses to connect devices and define permissions to transmit and receive data.
3. Network Layer
The network layer has two main functions. One is breaking up segments into network packets, and reassembling the packets on the receiving end. The other is routing packets by discovering the best path across a physical network. The network layer uses network addresses (typically Internet Protocol addresses) to route packets to a destination node.
4. Transport Layer
The transport layer takes data transferred in the session layer and breaks it into "segments" on the transmitting end. It is responsible for reassembling the segments on the receiving end, turning them back into data that can be used by the session layer. The transport layer carries out flow control, sending data at a rate that matches the connection speed of the receiving device, and error control, checking whether data was received incorrectly and, if so, requesting it again.
5. Session Layer
The session layer creates communication channels, called sessions, between devices. It is responsible for opening sessions, ensuring they remain open and functional while data is being transferred, and closing them when communication ends. The session layer can also set checkpoints during a data transfer—if the session is interrupted, devices can resume data transfer from the last checkpoint.
6. Presentation Layer
The presentation layer prepares data for the application layer. It defines how two devices should encode, encrypt, and compress data so it is received correctly on the other end. The presentation layer takes any data transmitted by the application layer and prepares it for transmission over the session layer.
7. Application Layer
The application layer is used by end-user software such as web browsers and email clients. It provides protocols that allow software to send and receive information and present meaningful data to users. A few examples of application layer protocols are the Hypertext Transfer Protocol (HTTP), File Transfer Protocol (FTP), Post Office Protocol (POP), Simple Mail Transfer Protocol (SMTP), and Domain Name System (DNS).
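To make the layering concrete, here is a toy sketch of encapsulation: on the way down, each layer prefixes the data from the layer above with its own header, and the receiver strips those headers in reverse order. The header layouts are heavily simplified assumptions (ports only, addresses only) and do not match real TCP, IP, or Ethernet formats.

```python
import socket
import struct


def to_segment(data: bytes, src_port: int, dst_port: int) -> bytes:
    # Transport layer: prefix the data with source and destination ports.
    return struct.pack("!HH", src_port, dst_port) + data


def to_packet(segment: bytes, src_ip: str, dst_ip: str) -> bytes:
    # Network layer: prefix the segment with source and destination IP addresses.
    return socket.inet_aton(src_ip) + socket.inet_aton(dst_ip) + segment


def to_frame(packet: bytes, src_mac: bytes, dst_mac: bytes) -> bytes:
    # Data link layer: prefix the packet with destination and source MAC addresses.
    return dst_mac + src_mac + packet


def unwrap(frame: bytes) -> dict:
    # The receiver removes the headers bottom layer first.
    dst_mac, src_mac, packet = frame[:6], frame[6:12], frame[12:]
    src_ip, dst_ip, segment = packet[:4], packet[4:8], packet[8:]
    src_port, dst_port = struct.unpack("!HH", segment[:4])
    return {
        "dst_mac": dst_mac,
        "src_mac": src_mac,
        "src_ip": socket.inet_ntoa(src_ip),
        "dst_ip": socket.inet_ntoa(dst_ip),
        "src_port": src_port,
        "dst_port": dst_port,
        "data": segment[4:],
    }
```

Each layer treats everything it receives from the layer above as opaque payload, which is why the layers can evolve independently.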
TCP/IP networking model
1. Network Access Layer
This layer corresponds to the combination of the Data Link Layer and Physical Layer of the OSI model. It handles hardware addressing, and the protocols in this layer allow for the physical transmission of data.
2. Internet Layer
This layer parallels the functions of OSI's Network layer. It defines the protocols responsible for the logical transmission of data over the entire network. The main protocols residing at this layer are:
IP – stands for Internet Protocol. It is responsible for delivering packets from the source host to the destination host by looking at the IP addresses in the packet headers. IP has 2 versions: IPv4 and IPv6. IPv4 is the one most websites currently use, but IPv6 is growing because the number of IPv4 addresses is limited compared to the number of users.
ICMP – stands for Internet Control Message Protocol. It is encapsulated within IP datagrams and is responsible for providing hosts with information about network problems.
ARP – stands for Address Resolution Protocol. Its job is to find the hardware address of a host from a known IP address. ARP has several types: Reverse ARP, Proxy ARP, Gratuitous ARP and Inverse ARP.
3. Host-to-Host Layer
This layer is analogous to the transport layer of the OSI model. It is responsible for end-to-end communication and error-free delivery of data. It shields the upper-layer applications from the complexities of data transmission. The two main protocols present in this layer are:
TCP – stands for Transmission Control Protocol. It provides reliable and error-free communication between end systems. It performs sequencing and segmentation of data, acknowledges received data, and controls the flow of data through a flow control mechanism.
UDP – stands for User Datagram Protocol. It does not provide any of these features. It is the go-to protocol if your application does not require reliable transport, as it has very low overhead. Unlike TCP, which is a connection-oriented protocol, UDP is connectionless.
4. Application Layer
This layer performs the functions of the top three layers of the OSI model: the Application, Presentation and Session layers. It is responsible for node-to-node communication and controls user-interface specifications. Some of the protocols present in this layer are: HTTP, HTTPS, FTP, TFTP, Telnet, SSH, SMTP, SNMP, NTP, DNS, DHCP, NFS, X Window, LPD.
HTTP and HTTPS – HTTP stands for Hypertext Transfer Protocol. It is used by the World Wide Web to manage communications between web browsers and servers. HTTPS stands for HTTP Secure. It is a combination of HTTP with SSL (Secure Sockets Layer), which has since been superseded by TLS. It is used where the browser needs to fill out forms, sign in, authenticate, or carry out bank transactions.
SSH – stands for Secure Shell. It is a remote terminal protocol similar to Telnet. SSH is preferred because it keeps the connection encrypted. It sets up a secure session over a TCP/IP connection.
NTP – stands for Network Time Protocol. It is used to synchronize the clocks on our computers to one standard time source. It is very useful in situations like bank transactions. Consider what happens without NTP: suppose you carry out a transaction where your computer reads the time as 2:30 PM while the server records it as 2:28 PM. A system that depends on consistent timestamps can misbehave badly when clocks are this far out of sync.
Messaging systems
At their core, messaging systems support a feature called publish-subscribe, where a client can publish a message and other clients can subscribe to consume it. How publishing and consumption are implemented is up to the messaging system. For example, Kafka uses a long-polling model while RabbitMQ
Message formats
Message formats go hand in hand with communication protocols; they describe the on-the-wire format of the message being sent. They are usually broken down into two types: human-readable and non-human-readable. Examples of human-readable formats are XML and JSON. When a client sends a message to a backend, it needs to serialize the message from the language's data structure to the on-wire message format. When the backend receives the message, it then needs to deserialize the message from this format back into a language data structure.
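A short sketch of the serialize/deserialize round trip, comparing JSON with a hand-rolled binary layout. The binary layout (a 4-byte id, a 2-byte name length, then the name bytes) is a made-up example, not a standard format, and the `user` shape is an assumption.

```python
import json
import struct

# Hypothetical binary header: unsigned 32-bit id, unsigned 16-bit name length,
# both in network byte order.
_HEADER = struct.Struct("!IH")


def to_json(user: dict) -> bytes:
    # Human-readable: the bytes can be inspected with any text tool.
    return json.dumps(user, separators=(",", ":")).encode("utf-8")


def from_json(payload: bytes) -> dict:
    return json.loads(payload.decode("utf-8"))


def to_binary(user: dict) -> bytes:
    # Not human-readable: compact, but both sides must agree on the layout.
    name = user["name"].encode("utf-8")
    return _HEADER.pack(user["id"], len(name)) + name


def from_binary(payload: bytes) -> dict:
    user_id, name_len = _HEADER.unpack_from(payload)
    start = _HEADER.size
    name = payload[start:start + name_len].decode("utf-8")
    return {"id": user_id, "name": name}
```

For `{"id": 7, "name": "ada"}` the JSON form is 21 bytes and the binary form is 9, which hints at why binary formats are popular when message volume is high.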
Security
You can secure communications with encryption such as TLS. You can prevent network intrusion with firewall rules and proper network configuration.
Man-in-the-middle attack. This is where an attacker intercepts communication between two parties and tries to eavesdrop on or modify the data being exchanged. To prevent this type of attack, it is important to encrypt the traffic and authenticate the communicating parties using Transport Layer Security (TLS).
Denial of service attack. This is where an attacker tries to prevent legitimate users from accessing a service by flooding it with requests or by finding a way to crash the backend with a special payload. To prevent this type of attack, it is important to have a firewall or a layer 7 DDoS protection layer in place to block illegitimate requests. Cloudflare has great services to detect DDoS traffic, made possible through layer 7 inspection.
There are client-side security attacks such as XSS (cross-site scripting), and there are server-side security attacks such as SQL injection.
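Both attacks come from treating untrusted input as code. The sketch below shows the usual mitigations with the Python standard library: parameterized queries for SQL injection and output escaping for XSS. The `users` table and the function names are hypothetical.

```python
import html
import sqlite3


def make_users_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    conn.executemany("INSERT INTO users VALUES (?)", [("alice",), ("bob",)])
    conn.commit()
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Vulnerable: the input is pasted into the SQL text and can change the query.
    query = f"SELECT name FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized: the input is always treated as a value, never as SQL.
    rows = conn.execute("SELECT name FROM users WHERE name = ?", (name,))
    return [row[0] for row in rows]


def render_comment(text: str) -> str:
    # Escaping turns markup characters into entities so the browser shows
    # the text instead of executing it.
    return "<p>" + html.escape(text) + "</p>"
```

With the input `x' OR '1'='1`, the unsafe version returns every user while the parameterized version returns nothing, because no user has that literal name.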
MultiThreaded model
A web application developed without Node.js typically follows the "Multi-Threaded Request-Response" synchronous model, which runs on multiple threads. Each request is handled on a new thread, so a lot of memory ends up being used while threads wait for their work to finish. We can simply call this the Request/Response model: the client sends a request to the server, the server does some processing based on the request, prepares a response, and sends it back to the client. This model uses the HTTP protocol. As HTTP is a stateless protocol, the Request/Response model is also stateless, so we can call it the Request/Response Stateless Model. However, this model uses multiple threads to handle concurrent client requests.
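The thread-per-request idea can be sketched without a real web server. In this simplified model each "request" gets its own thread and blocks on a sleep that stands in for I/O; the function names and the 50 ms delay are assumptions for illustration.

```python
import threading
import time


def handle_request(request_id: int, results: dict) -> None:
    # Stand-in for blocking work such as a database query: the thread
    # (and its stack memory) is held for the whole wait.
    time.sleep(0.05)
    results[request_id] = threading.current_thread().name


def serve(request_count: int) -> dict:
    """Handle every request on its own thread and wait for all of them."""
    results: dict = {}
    threads = [
        threading.Thread(target=handle_request, args=(i, results), name=f"worker-{i}")
        for i in range(request_count)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Calling `serve(5)` uses five distinct threads for five requests, which is simple to reason about but scales memory use with the number of concurrent requests.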
Types of languages
Compiled languages need a compiler to translate the code ahead of time, e.g. C, C++, Java, Go, TypeScript, whereas scripting languages are executed at runtime by an interpreter without a separate compile step, e.g. JavaScript, Python, PHP and Ruby.
- Compile time is the period when the source code (such as C#, Java, C, Python) is converted to machine code (i.e. binary code) or, for languages such as Java, C# and Python, to an intermediate bytecode.
- Runtime is the period of time when a program is running and generally occurs after compile time.
- Interpreters usually take less time to analyze the source code. However, overall execution is comparatively slower than with compiled code.
- Compilers usually take more time to analyze the source code. However, overall execution is comparatively faster than with interpreted code.
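The split between compile time and runtime can be observed even in Python, since CPython first compiles source to bytecode with the built-in `compile()` and only then runs it. The `classify` helper below is a made-up name for the example; it reports at which stage a snippet fails.

```python
def classify(source: str) -> str:
    """Report whether a snippet fails at compile time, at runtime, or not at all."""
    try:
        # Compile step: the source is parsed and turned into bytecode,
        # but nothing is executed yet.
        code = compile(source, "<example>", "exec")
    except SyntaxError:
        return "compile-time error"
    try:
        # Run step: only now are the statements actually executed.
        exec(code, {})
    except Exception:
        return "runtime error"
    return "ok"
```

A snippet like `x = (1 +` is rejected before anything runs, while `x = 1 / 0` compiles fine and only fails when executed.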
Fails of single-thread approach
A single-threaded app fails big if you need to do lots of CPU calculations before returning the data. Now, I don't mean a for loop processing the database result. That's still mostly O(n). What I mean is things like doing a Fourier transform (mp3 encoding for example), ray tracing (3D rendering), etc.
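Here is a small sketch of that failure mode using Python's `asyncio`, which runs a single-threaded event loop much like Node.js. The names and the workload are assumptions; the point is only the order of events when CPU work runs without yielding.

```python
import asyncio


def cpu_heavy(n: int) -> int:
    # Pure computation: nothing here hands control back to the event loop.
    return sum(i * i for i in range(n))


async def main(events: list) -> None:
    async def other_request() -> None:
        events.append("other request handled")

    # The second request is scheduled right away...
    task = asyncio.create_task(other_request())
    events.append("cpu work started")
    cpu_heavy(200_000)  # ...but the loop cannot run it until this returns.
    events.append("cpu work finished")
    await task


def run_demo() -> list:
    events: list = []
    asyncio.run(main(events))
    return events
```

The other request is only handled after the CPU work finishes, even though it was ready the whole time; with heavier computation every other client would wait that much longer.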
Another pitfall of single-threaded apps is that they only use a single CPU core. So if you have a quad-core server (not uncommon nowadays), you're not using the other 3 cores.
Fails of Multi-thread approach
A multithreaded app fails big time when handling many requests concurrently because it allocates a lot of RAM per thread, and that RAM sits idle while the thread waits. For example, when you send a query to your database, the thread and its RAM are going nowhere until the database returns, in essence blocking out your RAM for no reason. This means a multithreaded app can potentially end up slower than a single-threaded one. This is where Node.js usually wins.
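The event-loop alternative can be sketched with Python's `asyncio`, used here as a stand-in for the Node.js model. Many simulated database waits share one thread instead of each holding a thread of its own; `fake_db_query` and its 10 ms delay are hypothetical.

```python
import asyncio
import threading


async def fake_db_query(i: int, seen_threads: set) -> int:
    seen_threads.add(threading.get_ident())
    # While this request waits, the loop is free to serve other requests.
    await asyncio.sleep(0.01)
    return i * 2


async def handle_all(request_count: int) -> tuple:
    """Run all simulated queries concurrently; return results and thread count."""
    seen_threads: set = set()
    results = await asyncio.gather(
        *(fake_db_query(i, seen_threads) for i in range(request_count))
    )
    return results, len(seen_threads)
```

Running `asyncio.run(handle_all(100))` completes all hundred waits on a single thread, so no per-request thread stack sits idle during the wait.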