Scaling a Reusable, Full-Stack Learning Management System

Reach LMS — Background

Reach LMS is a general-purpose, open-source learning management system designed for the developing world. Reach lets organizations offer education and training to anyone — whether they’re working from a laptop in a city center or a solar-charged flip-phone in a remote village.

Last month, I worked with a team to build a first iteration of this project from scratch. We laid out a solid foundation upon which future teams could build. Users had one of three roles: Student, Teacher, or Administrator. The focus of the app revolved around creating Programs, Courses, and Modules. In this hierarchy, Programs had Courses. Courses had Modules. Admins made Programs. Admins & Teachers made Courses and Modules. Students simply had read-access to content they were attached to. Feel free to read my previous article for more details about the first month’s project: Building an Open-Source LMS From Scratch.

THIS month, I came back for round 2. But things were different this time around. Not only were there new features to implement and a new team; there were two entire codebases from which to choose. The month prior, my team was not the only team building a first iteration of Reach LMS — we were one of two!! Both teams received the same Product Vision Document (or Roadmap), attended the same Product Review and Stakeholder meetings; however, the teams worked independently of one another and each created a functioning frontend and backend for this product.

Our beloved Stakeholder & Labs Manager, Frank Fusco, truly loved both products from the month previous — I can hardly blame him, both teams did an extraordinary job. Frank left the decision for which codebase to use entirely in our team’s hands. This freedom ultimately led to the best learning experience I could have ever wished for this month.

So. We had four repositories: a frontend and a backend from each team. Both teams had some huge differences in UX, user-flow, and design; varying techniques & patterns used in the respective codebases; and some huge differences in the conception and design of the database schema and how information was being relayed between the backend, frontend, and the end-user. We had a whole list of new features to implement. And, best yet, we had a team consisting of 2 members from last month’s Team A, 2 members from last month’s Team B, and 4 members who were completely new to the codebase(s). Quite the starting point.

Ramping Up — Where do we start??

Starting with TWO completely functional full-stack applications is far from a typical starting point. Most teams are lucky to walk into one semi-functional application. Initially, we thought that having the two codebases would be a pretty easy task to deal with — “just pick one!”, you might think. As it turns out, though, there were aspects of each worth preserving. Our entire first week mainly consisted of Zoom calls and Discord sessions where we talked through the pros and cons of each codebase. Specifically the frontend, strangely enough. We meticulously looked through the flow of each app and discussed the good, the bad, the ugly. Then we turned to the code and looked at everything — project file-structure, Redux style, state management in general, coding style, all the way down to the differences of .js vs .jsx for file suffixes.

Our team generally preferred the project organization, design, and user-flow of Team A’s frontend; but then we favored the Redux patterns, Route handling, and component tree from Team B’s frontend. Essentially, we were wishing that we could inject Team B’s patterns into Team A’s project to sew the two together.

So that’s what we did.

It took one hell of a refactor, but we went through the Team A frontend and did the following:

It was a chaotic week with a LOT of meetings to discuss one thing vs another. Handling the transition from two codebases to one not only took some time and patience, but it also made planning for the rest of the month ridiculously hard. It felt as if our app was in limbo. The team was waiting on me for the refactor; waiting on our design manager for input on how to structure our user-flow; waiting on this, that, the other thing.

So the whole first week was a lot of chatting about how to GET to the starting point. By late that Thursday, most of the changes listed above were effectively merged in. But the app LOOKED exactly the same, just functioned very differently under the hood.

At the end of that week, it felt like we had discussed how to get to the starting point all week. We made some big changes to the frontend so that our two frontend codebases were one. But we still had two backends to talk about and an entire month’s worth of new features that we had barely started to PLAN yet.

BE Database Design & Schema

That first Friday, the four of us who had worked on this project the previous month met up and talked about some backend goodness. Our backend application was using the following tech-stack: Java, Spring Boot, PostgreSQL, Okta Security, Spring Security, and Swagger-UI (for documentation).

Once again, we had two backends. They had different schemas and a handful of differences in how data was shaped and serialized.

Team A Database Schema:

Team B Database Schema:

The most notable differences are related to (1) how Students and Teachers are attached to their content, and (2) how Students and Teachers are formed.

The largest difference is the fact that Team A attached Students and Teachers at the Course level. Team B attached Students and Teachers at the Program level. The thinking for Team B was that anytime a Student or a Teacher was enrolled in a Program, they'd be attached to every Course within that Program. We quickly opted to take Team A's approach on this front, attaching Student and Teacher entities at the Course level. That way, there would be a lot more flexibility for users to create the Program-Course-Module relationship however they see fit.

The next largest difference is that Team A made a join table between Students and Courses called StudentCourses, and a similar join table between Teachers and Courses called TeacherCourses. In their backend, Teachers and Students were NOT Users. They were entirely separate entities.

In Team B, Students and Teachers WERE actually just general Users, and the distinction happened in a similar join table... but this join table joined Users to Programs on either the Student or the Teacher side.

Though each team had a functional solution to the whole role-based user situation, neither solution used the Roles table (and UserRoles join table) as the driving factor for how the Student or Teacher existed in the system. We decided to refactor this so that, essentially, the two approaches were combined with an additional level of distinction: Roles should be the determining factor for whether a User could be attached to a Course as a Student or a Teacher.

Refactored Schema

This was essentially our initial refactored DB Schema:

For better or for worse, Students and Teachers are no longer individual tables, nor are they individual entities. Rather, Users have Roles, which can be of type "ADMIN", "TEACHER", or "STUDENT". Then, Users are joined to Courses with a <Many-to-One-to-Many> relationship between <Users-to-UserCourses-to-Courses>. Any user in the UserCourses table is GUARANTEED to have a STUDENT or TEACHER role, because of how we implemented the insertion and management of users to courses. Then, we can pull Teachers or Students out of that UserCourses table based on ROLE!!! Any user enrolled in a course is going to be in that UserCourses table; they are treated as a Teacher if their role is of RoleType TEACHER and treated as a Student if their role is of RoleType STUDENT.
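To make the role-driven lookup concrete, here is a minimal sketch in Python (the real backend is Java/Spring Boot, so this is only an illustration). It assumes each User carries a single role, which simplifies the Roles/UserRoles tables, and the names `enroll` and `members` are hypothetical helpers rather than anything from our codebase.

```python
from dataclasses import dataclass
from enum import Enum


class RoleType(Enum):
    ADMIN = "ADMIN"
    TEACHER = "TEACHER"
    STUDENT = "STUDENT"


@dataclass(frozen=True)
class User:
    userid: int
    username: str
    role: RoleType  # assumption: one role per user, for brevity


@dataclass(frozen=True)
class UserCourse:
    """One row of the UserCourses join table."""
    user: User
    courseid: int


def enroll(rows: list[UserCourse], user: User, courseid: int) -> None:
    """Attach a user to a course; only STUDENT and TEACHER roles are allowed in."""
    if user.role not in (RoleType.STUDENT, RoleType.TEACHER):
        raise ValueError(f"{user.role.value} users cannot be attached to a course")
    rows.append(UserCourse(user, courseid))


def members(rows: list[UserCourse], courseid: int, role: RoleType) -> list[User]:
    """Pull the Teachers or the Students of one course out of UserCourses by role."""
    return [r.user for r in rows if r.courseid == courseid and r.user.role == role]


rows: list[UserCourse] = []
teacher = User(5, "user_teacher_01@mail.com", RoleType.TEACHER)
student = User(6, "user_student_01@mail.com", RoleType.STUDENT)
enroll(rows, teacher, 15)
enroll(rows, student, 15)

teachers_of_15 = members(rows, 15, RoleType.TEACHER)
students_of_15 = members(rows, 15, RoleType.STUDENT)
```

Because the role check happens at insertion time, every row in the join table can safely be read back as either a Teacher or a Student without a separate table for each.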

Shape of Data

One of the pieces of the puzzle that we knew for sure we wanted to change: the shape of the data coming from the backend.

Shape of Data: TEAM A’s BACKEND

Team A’s backend sent data in such a way that all associations from one entity to another are represented when looking at that endpoint… by stuffing all the related entities in via nested data. The association part of that pattern is GREAT! It means I can look at all the Courses affiliated with the specific Program I'm looking at. But what happens when I hit an endpoint that gives me ALL of the programs... and Programs can have arbitrarily many Courses... and Courses can have arbitrarily many Modules...

When I hit GET all programs, I expect to see all of the programs. But if each program has a list of potentially many courses, do I really want to see all of the courses associated with each program? Maybe! But certainly not always.

For instance, what if I were hitting the GET all programs endpoint so that I could render the following screen:

In this hypothetical situation, I hit GET all programs and received THREE programs. Only three! Then I displayed those three programs as Cards, where the Program's title, type, and description are showing.

Absolutely NOTHING about that screen above requires any information about the Courses associated with each Program. As the client, should I have to wait until the backend finds every single Course inside of every single Program? And since this issue trickles down the hierarchy, finding every Course inside each Program would require finding every single Module inside of every Course inside of every Program.

If you got lost in the maze of nested data described above, imagine how a computer feels — they think in 1s and 0s, in true and false.

NOTE: the following is an example of what the data might look like. Feel free to scroll on past.

Here’s what the JSON might’ve looked like for ONE program in this example:

// 1st (and only) Program
{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "courses": [
    // 1st Course in ONE program
    {
      "courseid": 15,
      "coursename": "Course1",
      "coursecode": "COURSE_1",
      "coursedescription": "This is course #1",
      "users": [
        // 1st User in 1st Course in ONE program
        {
          "user": {
            "userid": 5,
            "username": "user_teacher_01@mail.com",
            "email": "user_teacher_01@mail.com",
            "firstname": "Teacher001",
            "lastname": "TEACHER_001",
            "phonenumber": "0123456789",
            "role": "TEACHER"
          }
        },
        // 2nd User in 1st Course in ONE program
        {
          "user": {
            "userid": 6,
            "username": "user_student_01@mail.com",
            "email": "user_student_01@mail.com",
            "firstname": "Student001",
            "lastname": "STUDENT_001",
            "phonenumber": "987654321",
            "role": "STUDENT"
          }
        }
      ]
    },
    // 2nd Course in ONE program
    {
      "courseid": 16,
      "coursename": "Course2",
      "coursecode": "COURSE_2",
      "coursedescription": "This is course #2",
      "users": [
        // 1st User in 2nd Course in ONE Program
        {
          "user": {
            "userid": 5,
            "username": "user_teacher_01@mail.com",
            "email": "user_teacher_01@mail.com",
            "firstname": "Teacher001",
            "lastname": "TEACHER_001",
            "phonenumber": "0123456789",
            "role": "TEACHER"
          }
        },
        // 2nd User in 2nd Course in ONE Program
        {
          "user": {
            "userid": 6,
            "username": "user_student_01@mail.com",
            "email": "user_student_01@mail.com",
            "firstname": "Student001",
            "lastname": "STUDENT_001",
            "phonenumber": "987654321",
            "role": "STUDENT"
          }
        }
      ] // end List<User> in List<Course> in Program
    } // end Course_2 in List<Course> in Program
  ] // end List<Course> in Program
} // end Program

If that felt like a pain in the ass to scroll through, think about this:

Imagine if that had been a LIST of Programs. What if there were 50 programs in the system? And each program had 100 courses? And each course had 50 users and 500 modules?

Nested data like this is dangerous in a context such as our product. It’s far from scalable. It’s far from INCLUSIVE: even the best internet connections would fail in the worst case.
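To put a number on that worst case, here is a quick back-of-the-envelope calculation in Python. The function name `nested_object_count` is made up for this illustration, and it only counts JSON objects (programs, courses, users, modules), not bytes.

```python
def nested_object_count(programs: int, courses_per_program: int,
                        users_per_course: int, modules_per_course: int) -> int:
    """Count every object a fully nested GET-all-programs response would carry."""
    courses = programs * courses_per_program
    nested = courses * (users_per_course + modules_per_course)
    return programs + courses + nested


# 50 programs, 100 courses each, 50 users and 500 modules per course.
worst = nested_object_count(50, 100, 50, 500)

# The same 50 programs with nothing nested inside them.
flat = nested_object_count(50, 0, 0, 0)

print(worst)  # 2755050
print(flat)   # 50
```

That is roughly 2.75 million objects in a single response, just to render a screen that needs 50 titles and descriptions.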

(Side note — if you ever wondered why the hell you were learning about Big-O notation or Computer Science, here it is people. In the most straightforward example)

SO. Team A’s backend had a problem with the fashion in which the data was displayed. BUT they did convey all of the relationships between each entity.

Shape of Data — TEAM B’s BACKEND

Team B, on the other hand, thought about the problem discussed above. They opted to completely exclude the nested data from the response. So when the client hits GET all programs, they receive a list of Program objects containing ONLY top-level information.

So the example above would turn into this:

// 1st (and only) Program
{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1"
}

A list of objects that look like THAT would be GREAT! I could have 50,000 programs and my laptop wouldn’t even think about sweating. Most computers could tank 5 million programs that looked like the above without a problem.

Wow! That sounds fantastic! But what’s missing?

As the client, if I wanted to see the courses for this program, I would have to

That’s three bullet-points for one task. Yikes. Even though most web developers are USED TO such a situation — this is not the best-case scenario.

Shape of Data — The New & Improved Approach

So… Team A’s BE provided associations between Programs and Courses or any other related data.

But this risked an INSANELY EXPENSIVE time & space complexity. This would affect the frontend, backend, end-user, and every layer in-between.

Team B’s BE kept their response-data short, sweet, and to the point. This allowed clients to get A LOT of one specific entity at any point in time in a VERY FAST & INEXPENSIVE fashion.

But this approach left it up to the client to know all the endpoints, keep their code updated with those endpoints, and manage associations themselves in application state. This, too, would affect the frontend, backend, end-user, and every layer in between.

So what’s the BEST case-scenario? We care mainly about two things:

If you’ve ever used the GitHub API, you’ve probably seen the solution coming since the moment you started reading:

LINKS!! HYPERMEDIA!!!

What if our program data looked like this:

{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "_links": {
    "self": {
      "href": "http://localhost:2019/programs/program/8"
    },
    "courses": {
      "href": "http://localhost:2019/courses/8"
    }
  }
}

Now we have top-level program information when looking at this program. And we have an association between this program and the courses that belong to it. But that association doesn’t come at a cost of roughly O(c · n · x), where c is the number of courses inside of this program, n is the variety of nested entities (modules/users/assignments/tags) INSIDE of each course, and x is the number of occurrences FOR each nested entity.

Now, GET all programs is O(p) where p is the NUMBER OF PROGRAMS that exist. There is no additional cost! Every program has courses (even if that program has zero courses)! The good news is this: with links, it don't even matta how many courses a program has at this scope. A program exists with or without its courses. If a client wants the courses, they can snag the program._links.courses.href and BOOM... now they have the link to go get those courses.
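On the client side, "snagging the link" could look something like the sketch below. It is written in Python purely for illustration (our real client is a JavaScript frontend), and `link_for`, `fetch_related`, and `fake_get` are hypothetical names. `fake_get` stands in for a real HTTP call so the example runs without a server.

```python
from typing import Any, Callable

# Same shape as the program shown above (some fields trimmed).
program = {
    "programid": 8,
    "programname": "Program1",
    "_links": {
        "self": {"href": "http://localhost:2019/programs/program/8"},
        "courses": {"href": "http://localhost:2019/courses/8"},
    },
}


def link_for(resource: dict, rel: str) -> str:
    """Return the href the server advertised for a relation such as "courses"."""
    links = resource.get("_links", {})
    if rel not in links:
        raise LookupError(f"resource has no {rel!r} link")
    return links[rel]["href"]


def fetch_related(resource: dict, rel: str, http_get: Callable[[str], Any]) -> Any:
    """Follow a link using whatever HTTP function the caller supplies."""
    return http_get(link_for(resource, rel))


# Stand-in for a real HTTP GET: records the URL and returns an empty course list.
requested: list[str] = []


def fake_get(url: str) -> list:
    requested.append(url)
    return []


courses = fetch_related(program, "courses", fake_get)
```

The client only asks for courses when it actually needs them, and it never has to assemble the URL itself.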

The implications of that are bigger than you might imagine at first glance.

RESTful Resources & HATEOAS

The “New and Improved Approach” described above plays into RESTful services and how we should serve our clients. The implications of the pattern above are manifold. It may look new and scary at first glance, but it’s a fairly small mental shift that gives way to a ginormous level of scalability.

Scale — Time & Space Complexity

The shape of our new data is dramatically cheaper to build and send than Team A’s; yet it includes the relational data that Team B missed.

All top-level info for the relevant data is included. However, nested data is replaced by one single O(1) addition to each entity: its links.

The worst case of any given collection endpoint is O(n) where n is the number of requested entities. If a client hits GET all programs, that n is equal to the number_of_programs that exist in the system... or the number_of_rows that exist in our PostgreSQL Programs table. If a client hits GET programs by userId, that n is equal to the number_of_programs that the specified User has created.

This is quite literally the best Big-O you could hit for a collection endpoint.

The time & space complexity for any given SINGULAR ENTITY is O(1). If a client is getting one program—thus getting a single entity—let's treat it as such! They shouldn't have to worry about the number of courses in that program until they're REQUESTING those courses!
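Here is a rough sketch of why each entity costs O(1) on the way out. It is Python rather than our actual Spring Boot code, `to_resource` and `get_all_programs` are hypothetical names, and the base URL and URI patterns are simply copied from the sample hrefs above.

```python
BASE_URL = "http://localhost:2019"  # assumption: taken from the sample hrefs


def to_resource(row: dict) -> dict:
    """Wrap one Programs row with its links; never touches courses or modules."""
    pid = row["programid"]
    return {
        **row,
        "_links": {
            "self": {"href": f"{BASE_URL}/programs/program/{pid}"},
            "courses": {"href": f"{BASE_URL}/courses/{pid}"},
        },
    }


def get_all_programs(rows: list[dict]) -> list[dict]:
    """One constant-time step per program row, so O(n) for n programs."""
    return [to_resource(row) for row in rows]


rows = [
    {
        "programid": 8,
        "programname": "Program1",
        "programtype": "12th grade",
        "programdescription": "This is program 1",
    },
]
programs = get_all_programs(rows)
```

Building the links is just string formatting on the program's own id, so the work per program stays the same whether it has zero courses or ten thousand.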

Evolve — Adding Properties or Changing Endpoint URIs

As the backend developer, I could change the actual endpoint URI to get all courses in a given program from GET "/courses/{programId}" to GET "/courses/at-program-id/totally-renamed/{programId}" and the client frontend wouldn't need to change a single line of code if they consumed my links how I intended.
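A small Python sketch of that claim, contrasting a client that builds the URL itself with one that reads it from the response. Both function names are hypothetical, and the two sample responses are trimmed to the fields that matter here.

```python
# The same program as served before and after the backend renames the endpoint.
before = {
    "programid": 8,
    "_links": {"courses": {"href": "http://localhost:2019/courses/8"}},
}
after = {
    "programid": 8,
    "_links": {
        "courses": {
            "href": "http://localhost:2019/courses/at-program-id/totally-renamed/8"
        }
    },
}


def courses_url_hardcoded(program: dict) -> str:
    """Client that bakes the old URI pattern into its own code."""
    return f"http://localhost:2019/courses/{program['programid']}"


def courses_url_from_link(program: dict) -> str:
    """Client that uses whatever URI the server advertises."""
    return program["_links"]["courses"]["href"]


# Both clients agree before the rename; only the link-driven one survives it.
stale = courses_url_hardcoded(after)
current = courses_url_from_link(after)
```

After the rename, `stale` still points at the old path, while `current` follows the server to the new one without any client change.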

Access — Conditional Links

Further, the backend could add as many relationships to other entities as it wants, or even dive as deep as using these links as the driver of ACTIONS.

Imagine the following Program:

{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "_links": {
    "self": {
      "href": "http://localhost:2019/programs/program/8"
    },
    "courses": {
      "href": "http://localhost:2019/courses/8"
    }
  }
}
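As a purely hypothetical sketch of links driving actions, the backend could include some links only when the requesting user's role allows the action. The "edit" and "add-course" relation names, their URIs, and the `program_links` and `can` helpers below are all assumptions made for illustration; only the "self" and "courses" links come from the example above. The role rules follow the first iteration: Admins manage Programs, and Admins and Teachers create Courses.

```python
BASE_URL = "http://localhost:2019"  # assumption: taken from the sample hrefs


def program_links(programid: int, role: str) -> dict:
    """Build the _links object for one program, tailored to the caller's role."""
    links = {
        "self": {"href": f"{BASE_URL}/programs/program/{programid}"},
        "courses": {"href": f"{BASE_URL}/courses/{programid}"},
    }
    if role in ("ADMIN", "TEACHER"):
        # Hypothetical rel: where this user may create a course in the program.
        links["add-course"] = {"href": f"{BASE_URL}/courses/{programid}"}
    if role == "ADMIN":
        # Hypothetical rel: where this user may update the program itself.
        links["edit"] = {"href": f"{BASE_URL}/programs/program/{programid}"}
    return links


def can(links: dict, action: str) -> bool:
    """A client can show or hide a control based on whether the link is present."""
    return action in links


admin_links = program_links(8, "ADMIN")
teacher_links = program_links(8, "TEACHER")
student_links = program_links(8, "STUDENT")
```

With something like this, a frontend would not need its own copy of the permission rules: it could render an "Edit" button only when `can(links, "edit")` is true.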

Current State of the Project

Our project is currently in a perfect place to scale up and build upon—both frontend and backend.

FEATURE LIST

POTENTIAL FUTURE FEATURES
