Capybara DBMS: Query Processor Introduction

Capybara DBMS introduces a robust Query Processor as part of its architecture, designed to facilitate efficient interaction with databases. The Query Processor is the linchpin that allows clients to perform meaningful operations on databases by processing Data Definition Language (DDL) and Data Manipulation Language (DML) statements. This document provides an overview of the components and functionalities of the Query Processor within Capybara DBMS.

Query Processor Components

DDL Interpreter

Purpose: Interprets DDL statements, incorporating the definitions directly into the data dictionary. This process ensures that the database structure can evolve as needed, maintaining flexibility and adaptability.

DML Compiler

Functionality: Translates DML statements into an evaluation plan comprising low-level instructions that the query evaluation engine can execute.
Optimization: The compiler enhances query performance through optimization, crafting a plan that reduces execution time without compromising accuracy.

Query Evaluation Engine

Execution: Carries out the instructions generated by the DML compiler, interacting directly with the database data to fulfill query requests.

SQL Parser

An integral part of the Query Processor, the SQL Parser deconstructs SQL commands into their constituent elements, making it easier for applications to interpret and process SQL queries. The parser undertakes:

Lexical Analysis: Breaks down the SQL query into tokens, identifying the fundamental elements like keywords and operators.
Syntax Analysis: Ensures the query conforms to SQL’s grammatical rules, utilizing a parse tree or AST for structural representation.
Semantic Analysis (Optional): Verifies the query’s logic, such as the existence of referenced tables and the validity of operations based on column data types.

Understanding SQL Grammar

SQL Grammar encompasses the complete set of rules that define SQL statement structures, including keywords, expressions, and operators. Mastery of SQL grammar is crucial for writing efficient and accurate SQL queries and for developing effective parsers.

ANTLR: Automated Parser Generation

ANTLR simplifies parsing by auto-generating lexer and parser code from a defined grammar, enabling developers to create applications that can process specific formats of code, data, or text.

Abstract Syntax Tree (AST)

The AST represents the abstract syntax of source code, offering a structured format that is crucial for:

Semantic Analysis: Enables in-depth analysis of code semantics, including type compatibility and variable usage.
Optimization and Transformation: Facilitates code optimization and transformation, aiding in efficient target code generation.

Custom AST Generation

Capybara DBMS opts for generating its own AST due to the limitations of ANTLR-generated ASTs, such as inconvenient access methods and immutability. By creating a custom AST, Capybara DBMS achieves:

Ease of Use: Nodes are grouped by category, allowing for intuitive access.
Customization: Adapts the AST to meet project-specific requirements, enabling performance optimization of SQL statements.

In summary, the Query Processor of Capybara DBMS is a comprehensive suite designed to interpret, compile, and execute database queries efficiently. Through components like the DML Compiler, DDL Interpreter, and a sophisticated SQL Parser, it ensures that database interactions are both effective and performance-optimized.

Capybara DBMS File I/O Layer

The persistence layer of a database is crucial for managing the storage and retrieval of data, ensuring that data persists beyond the execution of a program. The Capybara DBMS’s file I/O layer is a foundational component of this persistence layer, handling the essential file operations and locking mechanisms to ensure data integrity and concurrency control. This document outlines the key functionalities and design principles of the Capybara DBMS file I/O layer.

Overview

The file I/O layer in Capybara DBMS, implemented in C++, offers a robust set of functionalities for file operations, including opening, reading, writing, and deleting files. It integrates file locking to ensure safe concurrent access, providing a solid foundation for building reliable database systems.

Key Components

OsFile Class: Manages file operations such as open, read, write, and close. It integrates file locking to ensure safe concurrent access.
Mutex (pthread_mutex_t mutex): Ensures that locking operations are atomic, preventing race conditions.
File Descriptor (fd_) and Locking Status (locked_): Tracks the file descriptor and its locking status, facilitating file operations and access control.

File I/O Operations

The Capybara DBMS file I/O layer provides a comprehensive set of file operations:

Open File: Supports opening files in different modes, including exclusive, read-only, and read-write, with appropriate error handling.
Read and Write File: Facilitates reading from and writing to files, offering functionalities for handling specific amounts of data.
File Information: Includes methods for checking file existence, retrieving file size, and displaying file content.
Seek and Sync: Enables seeking to specific positions in a file and synchronizing file changes with the storage device.

Locking Mechanism

A robust locking mechanism is crucial for managing concurrent access to files:

FindLockInfo and ReleaseLockInfo: Manage lock information for files, handling the creation and removal of lock entries based on the file’s inode.
Lock and Unlock Operations: Provide methods for acquiring and releasing read and write locks, ensuring proper synchronization and handling contention.

Implementation Details

The implementation leverages standard C++ libraries and system calls for file operations, with platform-specific considerations for Unix and Windows environments. The design prioritizes efficiency, safety, and ease of integration with the broader database management system.

File I/O Layer Operation Example

Step 1: Create and Open Database File

Action: Create a new database file named test_SuccessfulOp.db and open it for read-write operations. Ensure any existing file with this name is removed before opening to guarantee a fresh start.
Expected Result: The file opens successfully (ResultCode::kOk).

Step 2: Verify File Existence

Action: Check if the newly created file exists.
Expected Result: The file exists (ResultCode::kOk).

Step 3: Seek to File Beginning

Action: Seek to the beginning of the database file.
Expected Result: The seek operation is successful.

Step 4: Write Test String

Action: Write the string “hello this is vfs test” to the file. Convert the string to a vector of bytes for the write operation.
Expected Result: The write operation is successful.

Step 5: Read Specific Bytes

Action: Seek to offset 6 in the file and read the next 4 bytes.
Expected Result: The read operation is successful, and the data read matches the substring “this”.

Step 6: Overwrite Data

Action: Seek to offset 11 and overwrite part of the data with “is override”. Write this new string to the file.
Expected Result: The overwrite operation is successful.

Step 7: Close and Delete File

Action: Close the database file and then delete it.
Expected Result: Both operations (close and delete) are successful.

Step 8: Confirm File Deletion

Action: Check if the file still exists after deletion.
Expected Result: The file no longer exists (ResultCode::kError).

TEST(VFS, SuccessfulOp) {
  // Step 1 : Create a db file
  std::string filename = "test_SuccessfulOp.db";
  std::remove(filename.c_str());
  OsFile file;
  bool read_only = false;
  ResultCode rc = file.OsOpenReadWrite(filename, read_only);
  EXPECT_EQ(ResultCode::kOk, rc);

  rc = file.OsFileExists();
  EXPECT_EQ(ResultCode::kOk, rc);

  // Step 2 : Seek offset 0 on the database file
  rc = file.OsSeek(0);
  EXPECT_EQ(ResultCode::kOk, rc);

  std::string exam = "hello this is vfs test";
  std::vector<std::byte> exam_bytes;
  for (char c : exam) {
    exam_bytes.push_back(static_cast<std::byte>(c));
  }
  rc = file.OsWrite(exam_bytes);
  EXPECT_EQ(ResultCode::kOk, rc);

  // Step 3 : Seek offset 6 on the database file
  rc = file.OsSeek(6);

  std::vector<char> read_buf(4);
  std::vector<std::byte> byte_buf(reinterpret_cast<std::byte *>(read_buf.data()),
                                  reinterpret_cast<std::byte *>(read_buf.data() + read_buf.size()));
  rc = file.OsRead(byte_buf, 4);

  std::string word = "this";
  std::vector<std::byte> this_bytes;
  for (char c : word) {
    this_bytes.push_back(static_cast<std::byte>(c));
  }
  EXPECT_EQ(ResultCode::kOk, rc);
  EXPECT_EQ(byte_buf, this_bytes);
  file.OsDisplay();

  // Step 4 : Seek offset 11 on the database file and overwrite the data, get this is override
  rc = file.OsSeek(11);
  std::string overwrite = "is override";
  std::vector<std::byte> overwrite_bytes;
  for (char c : overwrite) {
    overwrite_bytes.push_back(static_cast<std::byte>(c));
  }
  rc = file.OsWrite(overwrite_bytes);
  EXPECT_EQ(ResultCode::kOk, rc);
  file.OsDisplay();

  // Step 5 : Close the database file
  rc = file.OsClose();
  EXPECT_EQ(ResultCode::kOk, rc);
  rc = file.OsDelete();
  EXPECT_EQ(ResultCode::kOk, rc);

  rc = file.OsFileExists();
  EXPECT_EQ(ResultCode::kError, rc);
}

Implementation

Below is a simplified excerpt of the test case implementation:

Conclusion

The Capybara DBMS file I/O layer represents a critical component of the database’s persistence layer, providing essential functionalities for file management and concurrent access control. Through its comprehensive set of operations and robust locking mechanism, it lays the groundwork for building reliable and efficient database systems.