Scaling a Reusable, Full-Stack Learning Management System

Apr 1, 2021

Reach LMS — Background

Reach LMS is a general-purpose, open-source learning management system designed for the developing world. Reach lets organizations offer education and training to anyone — whether they’re working from a laptop in a city center or a solar-charged flip-phone in a remote village.

Last month, I worked with a team to build a first iteration of this project from scratch. We laid out a solid foundation upon which future teams could build. Users had one of three roles: Student, Teacher, or Administrator. The focus of the app revolved around creating Programs, Courses, and Modules. In this hierarchy, Programs had Courses. Courses had Modules. Admins made Programs. Admins & Teachers made Courses and Modules. Students simply had read-access to content they were attached to. Feel free to read my previous article for more details about the first month’s project: Building an Open-Source LMS From Scratch.

THIS month, I came back for round 2. But things were different this time around. Not only were there new features to implement and a new team, there were also two entire codebases from which to choose. The month prior, my team was not the only team building a first iteration of Reach LMS; we were one of two!! Both teams received the same Product Vision Document (or Roadmap) and attended the same Product Review and Stakeholder meetings; however, the teams worked independently of one another, and each created a functioning frontend and backend for this product.

Our beloved Stakeholder & Labs Manager, Frank Fusco, truly loved both products from the previous month, and I can hardly blame him: both teams did an extraordinary job. Frank left the decision of which codebase to use entirely in our team’s hands. This freedom ultimately led to the best learning experience I could have ever wished for this month.

So. We had four repositories: a frontend and a backend from each team. The two teams' apps differed hugely in UX, user-flow, and design; in the techniques and patterns used in their codebases; and in how they conceived and designed the database schema and relayed information between the backend, the frontend, and the end-user. We had a whole list of new features to implement. And, best yet, we had a team consisting of 2 members from last month’s Team A, 2 members from last month’s Team B, and 4 members who were completely new to the codebase(s). Quite the starting point.

Ramping Up — Where do we start??

Starting with TWO completely functional full-stack applications is far from a typical starting point. Most teams are lucky to walk into one semi-functional application. Initially, we thought the two codebases would be easy to deal with. “Just pick one!”, you might think. As it turns out, though, there were aspects of each worth preserving. Our entire first week mainly consisted of Zoom calls and Discord sessions where we talked through the pros and cons of each codebase; specifically the frontend, strangely enough. We meticulously looked through the flow of each app and discussed the good, the bad, and the ugly. Then we turned to the code and looked at everything: project file-structure, Redux style, state management in general, coding style, all the way down to .js vs .jsx file suffixes.

Our team generally preferred the project organization, design, and user-flow of Team A’s frontend; but then we favored the Redux patterns, Route handling, and component tree from Team B’s frontend. Essentially, we were wishing that we could inject Team B’s patterns into Team A’s project to sew the two together.

So that’s what we did.

It took one hell of a refactor, but we went through the Team A frontend and did the following:

Transformed the old-style Redux files (split into separate actions and reducers) into Redux Ducks
Wrote async thunks so that Redux drove all interaction with the backend and handled any necessary business logic (this allowed components to focus on displaying data to the user rather than fetching, organizing, and managing it)
Adopted some Redux patterns inspired by Redux Toolkit (whose createSlice API follows the same single-file idea as ducks)
Transformed the Routing in the app to utilize React Router and various team-defined utilities to make routing more consistent.
Reorganized components, stripping out any repetitive logic into reusable hooks and any repetitive layout pieces (such as the Header & NavBar) into separate components. The goal was to give each component one task; we didn't want any single component doing 5 jobs at the same time. We wanted 5 components each doing ONE job PERFECTLY. That way, we could pull them into other components and combine them so they could do their jobs in tandem.

It was a chaotic week with a LOT of meetings to discuss one thing vs another. Handling the transition from two codebases to one not only took time and patience, but also made planning for the rest of the month ridiculously hard. It felt as if our app was in limbo. The team was waiting on me for the refactor; waiting on our design manager for input on how to structure our user-flow; waiting on this, that, and the other thing.

So the whole first week was a lot of chatting about how to GET to the starting point. By late that Thursday, most of the changes listed above were merged in. The app LOOKED exactly the same; it just functioned very differently under the hood.

By the end of that week, our two frontend codebases were one. But we still had two backends to talk about and an entire month’s worth of new features that we had barely started to PLAN.

BE Database Design & Schema

That first Friday, the four of us who had worked on this project the month previously met up and talked about some backend goodness. Our backend application was using the following tech-stack: Java, Spring Boot, PostgreSQL, Okta Security, Spring Security, and Swagger-UI (for documentation).

Once again, we had two backends. They had different schemas and a handful of differences in how data was shaped and serialized.

Team A Database Schema:

Team B Database Schema:

The most notable differences relate to (1) how Students and Teachers are attached to their content, and (2) how Students and Teachers are modeled.

The largest difference is that Team A attached Students and Teachers at the Course level, while Team B attached them at the Program level. Team B's thinking was that anytime a Student or a Teacher was enrolled in a Program, they'd be attached to every Course within that Program. We quickly opted for Team A's approach on this front, attaching Student and Teacher entities at the Course level. That way, users would have a lot more flexibility to shape the Program-Course-Module relationship however they saw fit.

The next largest difference is that Team A made a join table between Students and Courses called StudentCourses, and a similar join table between Teachers and Courses called TeacherCourses. In their backend, Teachers and Students were NOT Users. They were entirely separate entities.

In Team B's backend, Students and Teachers WERE actually just general Users, and the distinction happened in a similar join table... but this join table joined Users to Programs on either the Student or the Teacher side.

Though each team had a functional solution to the whole role-based user situation, neither solution used the Roles table (and UserRoles join table) as the driving factor for how the Student or Teacher existed in the system. We decided to refactor this so that, essentially, the two approaches were combined with an additional level of distinction: Roles should be the determining factor for whether a User could be attached to a Course as a Student or a Teacher.

Refactored Schema

This was essentially our initial refactored DB Schema:

For better or for worse, Students and Teachers are no longer individual tables, nor are they individual entities. Rather, Users have Roles, which can be of type "ADMIN", "TEACHER", or "STUDENT". Users are then joined to Courses through the UserCourses join table: one User has many UserCourses rows and one Course has many UserCourses rows, which together form a many-to-many relationship between Users and Courses. Any user in the UserCourses table is GUARANTEED to have a STUDENT or TEACHER role, because of how we implemented the insertion and management of users in courses. Then, we can pull Teachers or Students out of that UserCourses table based on ROLE!!! Any user enrolled in a course is going to be in that UserCourses table; they are treated as a Teacher if their role is of RoleType TEACHER and as a Student if their role is of RoleType STUDENT.
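A minimal in-memory sketch of that refactored model may help. It assumes each User carries exactly one RoleType (as in the JSON examples in this article) and uses plain Java collections instead of the project's actual JPA entities and repositories; every class and method name here is hypothetical. It shows the two rules described above: only TEACHER or STUDENT users can be inserted into UserCourses, and teachers and students are pulled out of the same table by role.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, in-memory stand-in for the refactored schema (not the project's JPA entities).
class UserCoursesSketch {

    enum RoleType { ADMIN, TEACHER, STUDENT }

    // Assumption: each user has exactly one role.
    record User(long userid, String username, RoleType role) {}

    record Course(long courseid, String coursename) {}

    // One row of the UserCourses join table.
    record UserCourse(User user, Course course) {}

    static class UserCourses {
        private final List<UserCourse> rows = new ArrayList<>();

        // Enforces the invariant: every row belongs to a TEACHER or a STUDENT.
        void attach(User user, Course course) {
            if (user.role() != RoleType.TEACHER && user.role() != RoleType.STUDENT) {
                throw new IllegalArgumentException(user.username() + " must be a TEACHER or STUDENT");
            }
            UserCourse row = new UserCourse(user, course);
            if (!rows.contains(row)) { // skip duplicate enrollments
                rows.add(row);
            }
        }

        // Teachers and students live in the same table; the role decides which is which.
        List<User> usersInCourse(Course course, RoleType role) {
            return rows.stream()
                    .filter(r -> r.course().equals(course))
                    .map(UserCourse::user)
                    .filter(u -> u.role() == role)
                    .toList();
        }
    }

    public static void main(String[] args) {
        User teacher = new User(5, "user_teacher_01@mail.com", RoleType.TEACHER);
        User student = new User(6, "user_student_01@mail.com", RoleType.STUDENT);
        Course course1 = new Course(15, "Course1");

        UserCourses userCourses = new UserCourses();
        userCourses.attach(teacher, course1);
        userCourses.attach(student, course1);

        System.out.println("Teachers: " + userCourses.usersInCourse(course1, RoleType.TEACHER));
        System.out.println("Students: " + userCourses.usersInCourse(course1, RoleType.STUDENT));
    }
}
```

Keeping the role on the User (rather than in separate Student and Teacher tables) means promoting someone from STUDENT to TEACHER changes one field, not which table they live in.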

Shape of Data

One of the pieces of the puzzle that we knew for sure we wanted to change: the shape of the data coming from the backend.

Shape of Data: TEAM A’s BACKEND

Team A’s backend shaped its data so that every association from one entity to another was represented at that endpoint… by stuffing all the related entities in as nested data. The association part of that pattern is GREAT! It means I can see all the Courses affiliated with the specific Program I'm looking at. But what happens when I hit an endpoint that gives me ALL of the programs... and Programs can have arbitrarily many Courses... and Courses can have arbitrarily many Modules...

When I hit GET all programs, I expect to see all of the programs. But if each program has a list of potentially many courses, do I really want to see all of the courses associated with each program? Maybe! But certainly not always.

For instance, what if I were hitting the GET all programs endpoint so that I could render the following screen:

In this hypothetical situation, I hit GET all programs and received THREE programs. Only three! Then I displayed those three programs as Cards, where the Program's title, type, and description are showing.

Absolutely NOTHING about that screen above requires any information about the Courses associated with each Program. As the client, should I have to wait until the backend finds every single Course inside of every single Program? And since this issue trickles down the hierarchy, finding every Course inside each Program would require finding every single Module inside of every Course inside of every Program.

If you got lost in the maze of nested data described above, imagine how a computer feels — they think in 1s and 0s, in true and false.

NOTE: the following is an example of what the data might look like. Feel free to scroll on past.

Here’s what the JSON might’ve looked like for ONE program in this example:

// 1st (and only) Program
{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "courses": [
    // 1st Course in ONE program
    {
      "courseid": 15,
      "coursename": "Course1",
      "coursecode": "COURSE_1",
      "coursedescription": "This is course #1",
      "users": [
        // 1st User in 1st Course in ONE program
        {
          "user": {
            "userid": 5,
            "username": "user_teacher_01@mail.com",
            "email": "user_teacher_01@mail.com",
            "firstname": "Teacher001",
            "lastname": "TEACHER_001",
            "phonenumber": "0123456789",
            "role": "TEACHER"
          }
        },
        // 2nd User in 1st Course in ONE program
        {
          "user": {
            "userid": 6,
            "username": "user_student_01@mail.com",
            "email": "user_student_01@mail.com",
            "firstname": "Student001",
            "lastname": "STUDENT_001",
            "phonenumber": "987654321",
            "role": "STUDENT"
          }
        }
      ]
    },
    // 2nd Course in ONE program
    {
      "courseid": 16,
      "coursename": "Course2",
      "coursecode": "COURSE_2",
      "coursedescription": "This is course #2",
      "users": [
        // 1st User in 2nd Course in ONE Program
        {
          "user": {
            "userid": 5,
            "username": "user_teacher_01@mail.com",
            "email": "user_teacher_01@mail.com",
            "firstname": "Teacher001",
            "lastname": "TEACHER_001",
            "phonenumber": "0123456789",
            "role": "TEACHER"
          }
        },
        // 2nd User in 2nd Course in ONE Program
        {
          "user": {
            "userid": 6,
            "username": "user_student_01@mail.com",
            "email": "user_student_01@mail.com",
            "firstname": "Student001",
            "lastname": "STUDENT_001",
            "phonenumber": "987654321",
            "role": "STUDENT"
          }
        }
      ] // end List<User> in List<Course> in Program
    } // end Course_2 in List<Course> in Program
  ] // end List<Course> in Program
} // end Program

If that felt like a pain in the ass to scroll through, think about this:

THAT was ONE SINGLE PROGRAM!
That program had TWO courses
Those COURSES had TWO users
Aaaaand I didn’t even DISPLAY modules.

Imagine if that had been a LIST of Programs. What if there were 50 programs in the system? And each program had 100 courses? And each course had 50 users and 500 modules?
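To put rough numbers on that scenario, here is the arithmetic as a tiny Java sketch. It counts JSON objects, not bytes, and assumes (as in the example above, where the same teacher and student appear in both courses) that every user and module is embedded again inside every course that references it. The class and method names are made up for illustration.

```java
// Back-of-the-envelope count of JSON objects in a fully nested "GET all programs" response.
// Hypothetical helper: counts objects, not bytes, and assumes every user and module is
// re-embedded inside each course that references it.
class NestedPayloadEstimate {

    static long nestedObjectCount(long programs, long coursesPerProgram,
                                  long usersPerCourse, long modulesPerCourse) {
        long courses = programs * coursesPerProgram;   // every course of every program
        long users = courses * usersPerCourse;         // users repeated inside each course
        long modules = courses * modulesPerCourse;     // modules inside each course
        return programs + courses + users + modules;
    }

    // A flat response (top-level fields only) has exactly one object per program.
    static long flatObjectCount(long programs) {
        return programs;
    }

    public static void main(String[] args) {
        // 50 programs, 100 courses each, 50 users and 500 modules per course
        System.out.println(nestedObjectCount(50, 100, 50, 500)); // 2755050
        System.out.println(flatObjectCount(50));                 // 50
    }
}
```

With those numbers, a single request carries 2,755,050 objects (50 programs + 5,000 courses + 250,000 users + 2,500,000 modules), versus 50 for a flat list of the same programs. Each extra level of nesting multiplies the count again.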

Nested data like this is dangerous in a context such as our product. It’s far from scalable. It’s far from INCLUSIVE: in the worst case, even the best internet connections would fail.

(Side note: if you ever wondered why the hell you were learning about Big-O notation or Computer Science, here it is, people, in the most straightforward example.)

SO. Team A’s backend had a problem with the way its data was shaped. BUT it did convey all of the relationships between entities.

Shape of Data — TEAM B’s BACKEND

Team B, on the other hand, thought about the problem discussed above. They opted to completely exclude the nested data from the response. So when the client hits GET all programs, they receive a list of Program objects containing ONLY top-level information.

So the example above would turn into this:

// 1st (and only) Program
{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1"
}

A list of objects that look like THAT would be GREAT! I could have 50,000 programs and my laptop wouldn’t even think about sweating. Most computers could tank 5 million programs that looked like the above without a problem.
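One way to produce that flat shape on the backend is to map each entity onto a summary object that simply has no collection fields. The sketch below assumes hypothetical Program and ProgramSummary records whose fields mirror the JSON above; how Team B actually achieved it (separate DTOs, serializer annotations, or something else) isn't covered here.

```java
import java.util.List;

// Hypothetical entity and flat response type; field names mirror the JSON in this article.
class FlatProgramSketch {

    record Course(long courseid, String coursename) {}

    // Full entity: carries its nested courses.
    record Program(long programid, String programname, String programtype,
                   String programdescription, List<Course> courses) {}

    // Flat response shape: top-level fields only, no collections.
    record ProgramSummary(long programid, String programname, String programtype,
                          String programdescription) {

        static ProgramSummary from(Program p) {
            return new ProgramSummary(p.programid(), p.programname(),
                    p.programtype(), p.programdescription());
        }
    }

    // "GET all programs": one summary per program, however many courses each one has.
    static List<ProgramSummary> getAllPrograms(List<Program> programs) {
        return programs.stream().map(ProgramSummary::from).toList();
    }

    public static void main(String[] args) {
        Program p = new Program(8, "Program1", "12th grade", "This is program 1",
                List.of(new Course(15, "Course1"), new Course(16, "Course2")));
        System.out.println(getAllPrograms(List.of(p)));
    }
}
```

The size of each summary is fixed by its four fields, which is exactly why this shape stays cheap; it is also why the courses relationship disappears from the response.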

Wow! That sounds fantastic! But what’s missing?

As the client, if I wanted to see the courses for this program, I would have to:

Know the endpoint to get courses by programid
Keep my code updated to make sure the developer of my backend hadn’t changed the aforementioned endpoint
Store that programId somewhere... which means State-Management SOMEWHERE (be that in React.useState or react-router-dom or Redux or a fucking Sticky Note if my app was truly primitive)

That’s three bullet-points for one task. Yikes. Even though most web developers are USED TO such a situation, this is not the best-case scenario.

Shape of Data — The New & Improved Approach

So… Team A’s BE provided associations between Programs and Courses or any other related data.

But this risked an INSANELY EXPENSIVE time & space complexity. This would affect the frontend, backend, end-user, and every layer in-between.

Team B’s BE kept their response-data short, sweet, and to the point. This allowed clients to get A LOT of one specific entity at any point in time in a VERY FAST & INEXPENSIVE fashion.

But this approach left it up to the client to know all the endpoints, keep their code updated with those endpoints, and manage associations themselves in application state. This, too, would affect the frontend, backend, end-user, and every layer in between.

So what’s the BEST-case scenario? We care mainly about two things:

Brief, top-level, non-nested data…
Relevant related data — relationships between THIS data and WHAT IS RELATED TO IT

If you’ve ever used the GitHub API, you’ve probably seen the solution since the moment you started reading:

LINKS!! HYPERMEDIA!!!

What if our program data looked like this:

{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "_links": {
    "self": {
      "href": "http://localhost:2019/programs/program/8"
    },
    "courses": {
      "href": "http://localhost:2019/courses/8"
    }
  }
}

Now we have top-level program information when looking at this program. And we have an association between this program and the courses that belong to it. But that association doesn’t cost us a fully nested payload. With nesting, a single program's response grows roughly as O(c · (u + m + …)), where c is the number of courses in the program and u, m, … are the numbers of users, modules, assignments, tags, and so on inside EACH course; every further level of nesting multiplies the size again.

Now, GET all programs is O(p), where p is the NUMBER OF PROGRAMS that exist. There is no additional cost! Every program gets a courses link (even if that program has zero courses)! The good news is this: with links, it don't even matta how many courses a program has at this scope. A program exists with or without its courses. If a client wants the courses, they can snag program._links.courses.href and BOOM... now they have the link to go get those courses.

The implications of that are bigger than you might imagine at first glance.
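On the backend, attaching those links costs a constant amount per program. Below is a plain-Java sketch that builds the response shown above, assuming the base URL and paths from that example; ProgramLinks and its methods are hypothetical. In a Spring Boot backend, Spring HATEOAS (EntityModel plus WebMvcLinkBuilder) is the usual tool for this job; the sketch avoids it only to stay self-contained.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical link builder mirroring the "_links" object in the example response.
class ProgramLinks {

    // Assumption: base URL and paths taken from the example response above.
    static final String BASE_URL = "http://localhost:2019";

    record Program(long programid, String programname, String programtype,
                   String programdescription) {}

    // Builds { "self": { "href": ... }, "courses": { "href": ... } } for one program.
    static Map<String, Map<String, String>> linksFor(Program p) {
        Map<String, Map<String, String>> links = new LinkedHashMap<>();
        links.put("self", Map.of("href", BASE_URL + "/programs/program/" + p.programid()));
        links.put("courses", Map.of("href", BASE_URL + "/courses/" + p.programid()));
        return links;
    }

    // Top-level fields plus "_links": constant extra size, however many courses exist.
    static Map<String, Object> toResponse(Program p) {
        Map<String, Object> body = new LinkedHashMap<>();
        body.put("programid", p.programid());
        body.put("programname", p.programname());
        body.put("programtype", p.programtype());
        body.put("programdescription", p.programdescription());
        body.put("_links", linksFor(p));
        return body;
    }

    public static void main(String[] args) {
        Program p = new Program(8, "Program1", "12th grade", "This is program 1");
        System.out.println(toResponse(p));
    }
}
```

Note that toResponse never touches the program's courses at all: the link is computed from the programid alone, so no course rows need to be loaded to build it.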

RESTful Resources & HATEOAS

The “New and Improved Approach” described above plays into RESTful services and how we should serve our clients. The implications of the pattern above are manifold. It may look new and scary at first glance, but it’s a fairly small mental shift that gives way to a ginormous level of scalability.

Scale — Time & Space Complexity

Our new data shape is dramatically smaller, and therefore faster to build, send, and parse, than Team A’s; yet it includes the relational data that Team B left out.

All top-level info for the relevant entity is included, while nested data is replaced by a single, constant-size (O(1)) _links addition to each entity.

The worst case of any given collection endpoint is O(n), where n is the number of requested entities. If a client hits GET all programs, that n is equal to the number_of_programs that exist in the system... or the number_of_rows in our PostgreSQL Programs table. If a client hits GET programs by userId, that n is equal to the number_of_programs that the specified User has created.

This is quite literally the best Big-O you could hit for a collection endpoint: a response containing n entities can't be built in less than O(n) time.

The time & space complexity for any given SINGULAR ENTITY is O(1). If a client is getting one program—thus getting a single entity—let's treat it as such! They shouldn't have to worry about the number of courses in that program until they're REQUESTING those courses!

Evolve — Adding Properties or Changing Endpoint URIs

As the backend developer, I could change the actual endpoint URI to get all courses in a given program from GET "/courses/{programId}" to GET "/courses/at-program-id/totally-renamed/{programId}", and the client frontend wouldn't need to change a single line of code if they consumed my links as I intended.
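From the client's side, the point is that the courses URL is read from the response rather than assembled from a hard-coded template. A minimal sketch, assuming responses have already been parsed into maps and using a hypothetical in-memory fetch function in place of real HTTP calls:

```java
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical client-side helper: follows "_links" instead of hard-coding endpoint URIs.
class LinkFollowingClient {

    // Reads resource._links.<rel>.href from an already-parsed JSON response.
    @SuppressWarnings("unchecked")
    static String href(Map<String, Object> resource, String rel) {
        Map<String, Object> links = (Map<String, Object>) resource.get("_links");
        Map<String, Object> link = (Map<String, Object>) links.get(rel);
        return (String) link.get("href");
    }

    // Fetches a program's courses from whatever URL the backend advertised.
    static Object coursesOf(Map<String, Object> program, Function<String, Object> fetch) {
        return fetch.apply(href(program, "courses"));
    }

    public static void main(String[] args) {
        // The backend renamed its endpoint; the client code above is unchanged.
        String renamed = "http://localhost:2019/courses/at-program-id/totally-renamed/8";
        Map<String, Object> program = Map.of(
                "programid", 8,
                "_links", Map.of("courses", Map.of("href", renamed)));
        Map<String, Object> fakeServer = Map.of(renamed, List.of("Course1", "Course2"));

        System.out.println(coursesOf(program, fakeServer::get)); // [Course1, Course2]
    }
}
```

The only contract the client depends on is the link relation name ("courses"); the URI behind it is the backend's business.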

Access — Conditional Links

Further, the backend could add as many relationships to other entities as it wants, or even go as far as using these links to drive ACTIONS.

Imagine the following Program:

{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "_links": {
    "self": {
      "href": "http://localhost:2019/programs/program/8"
    },
    "courses": {
      "href": "http://localhost:2019/courses/8"
    }
  }
}
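A sketch makes the conditional idea concrete: the backend decides which action links to include based on who is asking. This assumes hypothetical "edit" and "delete" link relations pointing at the program's own URI, plus an assumed ownerId field, and leans on this project's rule that ADMINs manage the Programs they own; none of these link names come from the original backend. For brevity the values are bare hrefs rather than the nested { "href": ... } objects shown above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: which links a Program response carries depends on the caller.
class ConditionalLinks {

    enum RoleType { ADMIN, TEACHER, STUDENT }

    static final String BASE_URL = "http://localhost:2019";

    // ownerId: the ADMIN who created the program (assumed field).
    static Map<String, String> linksFor(long programId, long ownerId,
                                        long callerId, RoleType callerRole) {
        String self = BASE_URL + "/programs/program/" + programId;
        Map<String, String> links = new LinkedHashMap<>();
        links.put("self", self);
        links.put("courses", BASE_URL + "/courses/" + programId);

        // Only the owning ADMIN is offered edit/delete actions (assumed link names).
        if (callerRole == RoleType.ADMIN && callerId == ownerId) {
            links.put("edit", self);
            links.put("delete", self);
        }
        return links;
    }

    public static void main(String[] args) {
        System.out.println(linksFor(8, 1, 1, RoleType.ADMIN).keySet());   // [self, courses, edit, delete]
        System.out.println(linksFor(8, 1, 6, RoleType.STUDENT).keySet()); // [self, courses]
    }
}
```

A frontend built this way can render an Edit or Delete button only when the matching link is present, instead of re-implementing the permission rules itself.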

Current State of the Project

Our project is currently in a perfect place to scale up and build upon—both frontend and backend.

FEATURE LIST

Users of any role can log into our app with their Okta credentials
Users can have roles of an ADMIN, TEACHER, or STUDENT and interact with content based on the privileges associated with their role.
All users can view and update their profile information
ADMIN users will land on a page with all of the Programs they own with associated actions — they can create a new Program, edit existing Programs, delete existing Programs.
ADMINS have the ability to specify Tags within each Program with a title and a hex-code for the color that represents that Tag. ADMINs and TEACHERs can then specify the type of any Course within that Program by selecting one of the Tags available in that Program.
ADMIN users can view all users in their system.
TEACHER users can create STUDENT users.
TEACHER users can assign STUDENT users to and remove STUDENT users from courses that they are attached to
ADMIN users can Create/Edit/Delete other users with roles of TEACHER or STUDENT. This includes the ability to change an existing user’s role from TEACHER or STUDENT to STUDENT, TEACHER, or ADMIN. Note: ADMIN users cannot edit or delete other ADMIN users; they can only create them or change TEACHER and STUDENT users TO ADMINs.
ADMIN users can assign TEACHER and STUDENT users to courses
ADMIN users can upload a CSV file to create users of any type in their system. This creates Okta users for any users that don’t yet exist in the Okta application and sends each newly created user a verification email. Additionally, each user is created in our database if they do not yet exist in the DB.
ADMIN and TEACHER users are able to upload a CSV file to create users of the STUDENT type and attach them to a specified course
ADMIN users can View/Create/Edit/Delete programs over which they have ownership.
ADMINS can create, edit, delete, and view Courses within Programs that they have created.
ADMINS can create, edit, delete, and view Modules within Courses within Programs that they have created.
TEACHERS have the ability to create, edit, and view Courses and Modules that they are associated with.
ADMIN and TEACHER users have the ability to update Module content by writing with Markdown. The markdown editor has live feedback which renders the formatted preview as the user types.
STUDENTS can only view the Courses and Modules that they are associated with.
Users of any sort can search when looking at a list of their Courses and expect results to be filtered accordingly
ADMIN users can search for users when looking at a list of all the Users in the system and expect results to be filtered accordingly.
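The user-management rules in the list above (TEACHERs create STUDENTs; ADMINs create any role, edit, delete, and re-role TEACHER and STUDENT users, but never edit or delete another ADMIN) fit in a small permission check. This is a minimal sketch with hypothetical method names, not the project's actual implementation, and it leaves out the separate rule that every user can update their own profile.

```java
// Hypothetical permission checks encoding the user-management rules from the feature list.
class UserManagementRules {

    enum RoleType { ADMIN, TEACHER, STUDENT }

    // ADMINs may create any role (including other ADMINs); TEACHERs may only create STUDENTs.
    static boolean canCreate(RoleType actor, RoleType newUserRole) {
        return switch (actor) {
            case ADMIN -> true;
            case TEACHER -> newUserRole == RoleType.STUDENT;
            case STUDENT -> false;
        };
    }

    // Only ADMINs edit, delete, or change the role of other users, and never another ADMIN.
    // A permitted role change may target STUDENT, TEACHER, or ADMIN.
    static boolean canEditOrDelete(RoleType actor, RoleType target) {
        return actor == RoleType.ADMIN && target != RoleType.ADMIN;
    }

    public static void main(String[] args) {
        System.out.println(canCreate(RoleType.TEACHER, RoleType.STUDENT));   // true
        System.out.println(canEditOrDelete(RoleType.ADMIN, RoleType.ADMIN)); // false
    }
}
```

Centralizing these checks in one place is also a natural first step toward the more granular, ADMIN-configurable permissions listed under future features.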

POTENTIAL FUTURE FEATURES

Further integration between our app and Okta. For instance, each user could explicitly inherit features from Okta’s User class from the Okta SDK. Additionally, roles could be handled ENTIRELY by Okta by utilizing Okta’s user Groups
Server-side rendering. Or potentially just shifting to a completely Spring-based full-stack application. Views could be specified with Thymeleaf and everything could be neatly handled and intertwined in Spring. I personally think this route would be the most powerful version of this app. Tiny bundle-size, mostly static views, and full control over each view from the Spring Services would make for a very powerful LMS.
Markdown file upload for Modules.
Multi-part content for Modules.
More Markdown layers throughout the Entity Hierarchy. (Why should Program Description be limited to plain text when that itself could be MD…)
User-defined templates for what MD content should/can look like.
More granular permissions. Break roles down even further into specific permissions. Then, let ADMIN users select which powers other users have.