Scaling a Reusable, Full-Stack Learning Management System
Reach LMS — Background
Reach LMS is a general-purpose, open-source learning management system designed for the developing world. Reach lets organizations offer education and training to anyone — whether they’re working from a laptop in a city center or a solar-charged flip-phone in a remote village.
Last month, I worked with a team to build a first iteration of this project from scratch. We laid out a solid foundation upon which future teams could build. Users had one of three roles: Student, Teacher, or Administrator. The focus of the app revolved around creating Programs, Courses, and Modules. In this hierarchy, Programs had Courses. Courses had Modules. Admins made Programs. Admins & Teachers made Courses and Modules. Students simply had read-access to content they were attached to. Feel free to read my previous article for more details about the first month’s project: Building an Open-Source LMS From Scratch.
THIS month, I came back for round 2. But things were different this time around. Not only were there new features to implement and a new team; there were two entire codebases from which to choose. The month prior, my team was not the only team building a first iteration of Reach LMS — we were one of two!! Both teams received the same Product Vision Document (or Roadmap), attended the same Product Review and Stakeholder meetings; however, the teams worked independently of one another and each created a functioning frontend and backend for this product.
Our beloved Stakeholder & Labs Manager, Frank Fusco, truly loved both products from the month previous — I can hardly blame him, both teams did an extraordinary job. Frank left the decision for which codebase to use entirely in our team’s hands. This freedom ultimately led to the best learning experience I could have ever wished for this month.
So. We had four repositories: a frontend and a backend from each team. Both teams had some huge differences in UX, user-flow, and design; varying techniques & patterns used in the respective codebases; and some huge differences in the conception and design of the database schema and how information was being relayed between the backend, frontend, and the end-user. We had a whole list of new features to implement. And, best yet, we had a team consisting of 2 members from last month’s Team A, 2 members from last month’s Team B, and 4 members who were completely new to the codebase(s). Quite the starting point.
Ramping Up — Where do we start??
Starting with TWO completely functional full-stack applications is far from a typical starting point. Most teams are lucky to walk into one semi-functional application. Initially, we thought that having the two codebases would be a pretty easy task to deal with. “Just pick one!”, you might think. As it turns out, though, there were aspects of each worth preserving. Our entire first week mainly consisted of Zoom calls and Discord sessions where we talked through the pros and cons of each codebase. Specifically the frontend, strangely enough. We meticulously looked through the flow of each app and discussed the good, the bad, the ugly. Then we turned to the code and looked at everything: project file-structure, Redux style, state management in general, coding style, all the way down to the differences of .js vs .jsx for file suffixes.
Our team generally preferred the project organization, design, and user-flow of Team A’s frontend; but then we favored the Redux patterns, Route handling, and component tree from Team B’s frontend. Essentially, we were wishing that we could inject Team B’s patterns into Team A’s project to sew the two together.
So that’s what we did.
It took one hell of a refactor, but we went through the Team A frontend and did the following:
- Transformed the old-style Redux files (with actions and reducers split apart) into Redux Ducks
- Wrote async thunks so that Redux drove all interaction with the backend and handled any necessary business logic (this allowed components to focus on displaying data to the user rather than fetching, organizing, and dealing with it)
- Adopted some Redux patterns inspired by Redux Toolkit (which itself encourages the ducks pattern)
- Transformed the routing in the app to utilize React Router and various team-defined utilities to make routing more consistent
- Reorganized components, pulling any repetitive logic into reusable hooks and any repetitive layout pieces (such as the Header & NavBar) into separate components. The goal here was to isolate one task for each component; we didn't want any single component doing 5 jobs at the same time. We wanted 5 components doing ONE job PERFECTLY. That way, we can pull them into other components and combine them so they can do their jobs in tandem.
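To make the ducks-plus-thunks idea a bit more concrete, here is a minimal, dependency-free sketch. Every name in it (the `programs/...` action types, `fetchPrograms`, `ProgramsApi`, the hand-rolled `Dispatch` type) is an assumption for illustration and is not taken from the Reach codebase; a real app would lean on Redux and redux-thunk instead of these stand-ins.

```typescript
// A "duck": action types, reducer, and thunk for ONE feature live in ONE module.
// All names are illustrative assumptions, not the Reach LMS source.

interface Program {
  programid: number;
  programname: string;
}

interface ProgramsState {
  status: "idle" | "loading" | "success" | "error";
  programs: Program[];
  error: string | null;
}

// Action types are namespaced by feature, as the ducks convention suggests.
type ProgramsAction =
  | { type: "programs/fetchStart" }
  | { type: "programs/fetchSuccess"; payload: Program[] }
  | { type: "programs/fetchError"; payload: string };

const initialState: ProgramsState = { status: "idle", programs: [], error: null };

// Reducer: a pure function from (state, action) to the next state.
function programsReducer(
  state: ProgramsState = initialState,
  action: ProgramsAction
): ProgramsState {
  switch (action.type) {
    case "programs/fetchStart":
      return { ...state, status: "loading", error: null };
    case "programs/fetchSuccess":
      return { ...state, status: "success", programs: action.payload };
    case "programs/fetchError":
      return { ...state, status: "error", error: action.payload };
    default:
      return state;
  }
}

// Minimal stand-ins for Redux's dispatch and for a backend client.
type Dispatch = (action: ProgramsAction) => void;
type ProgramsApi = { getAll: () => Promise<Program[]> };

// Thunk: owns the backend call, so components only dispatch and render.
const fetchPrograms = (api: ProgramsApi) => (dispatch: Dispatch): Promise<void> => {
  dispatch({ type: "programs/fetchStart" });
  return api.getAll().then(
    (programs) => dispatch({ type: "programs/fetchSuccess", payload: programs }),
    (err) => dispatch({ type: "programs/fetchError", payload: String(err) })
  );
};
```

A component in this style never touches the network itself: it dispatches `fetchPrograms(api)` and renders whatever `status` and `programs` the reducer hands back.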
It was a chaotic week with a LOT of meetings to discuss one thing vs another. Handling the transition from two codebases to one not only took some time and patience, but it also made planning for the rest of the month ridiculously hard. It felt as if our app was in limbo. The team was waiting on me for the refactor; waiting on our design manager for input on how to structure our user-flow; waiting on this, that, the other thing.
So the whole first week was a lot of chatting about how to GET to the starting point. By late that Thursday, most of the changes listed above were effectively merged in. But the app LOOKED exactly the same, just functioned very differently under the hood.
At the end of that week, it felt like we had discussed how to get to the starting point all week. We made some big changes to the frontend so that our two frontend codebases were one. But we still had two backends to talk about and an entire month’s worth of new features that we had barely started to PLAN yet.
BE Database Design & Schema
That first Friday, the four of us who had worked on this project the month previously met up and talked about some backend goodness. Our backend application was using the following tech-stack: Java, Spring Boot, PostgreSQL, Okta Security, Spring Security, and Swagger-UI (for documentation).
Once again, we had two backends. They had different schemas and a handful of differences in how data was shaped and serialized.
Team A Database Schema:
Team B Database Schema:
The most notable differences are related to (1) how Students and Teachers are attached to their content, and (2) how Students and Teachers are modeled.
The largest difference is the fact that Team A attached Students and Teachers at the Course level. Team B attached Students and Teachers at the Program level. The thinking for Team B was that anytime a Student or a Teacher was enrolled in a Program, they'd be attached to every Course within that Program. We quickly opted to take Team A's approach on this front, attaching Student and Teacher entities at the Course level. That way, there would be a lot more flexibility for users to create the Program-Course-Module relationship however they see fit.
The next largest difference is that Team A made a join table between Students and Courses called StudentCourses, and a similar join table between Teachers and Courses called TeacherCourses. In their backend, Teachers and Students were NOT Users. They were entirely separate entities.
In Team B, Students and Teachers WERE actually just general Users, and the distinction happened in a similar join table... but this join table joined Users to Programs on either the Student or the Teacher side.
Though each team had a functional solution to the whole role-based user situation, neither solution used the Roles table (and UserRoles join table) as the driving factor for how the Student or Teacher existed in the system. We decided to refactor this so that, essentially, the two approaches were combined with an additional level of distinction: Roles should be the determining factor for whether a User could be attached to a Course as a Student or a Teacher.
Refactored Schema
This was essentially our initial refactored DB Schema:
For better or for worse, Students and Teachers are no longer individual tables nor are they individual entities. Rather, Users have Roles, which can be of type "ADMIN", "TEACHER", or "STUDENT". Then, Users are joined to Courses with a many-to-many relationship through the UserCourses join table (Users are one-to-many with UserCourses, and UserCourses are many-to-one with Courses). Any user in the UserCourses table is GUARANTEED to have a STUDENT or TEACHER role, because of how we implemented the insertion and management of users to courses. Then, we can pull Teachers or Students out of that UserCourses table based on ROLE!!! Any user enrolled in a course is going to be in that UserCourses table; they are treated as a Teacher if their role is of RoleType TEACHER and treated as a Student if their role is of RoleType STUDENT.
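Here is a small sketch of that idea: one join table, an insertion guard, and role-based lookups. It is written in TypeScript purely for illustration (the real backend is Java and Spring Boot), the function names `enroll` and `membersByRole` are hypothetical, and it assumes each user carries a single role, which simplifies the Roles and UserRoles tables.

```typescript
// Illustrative model of Users, Roles, and the UserCourses join table.
// Names and the single-role simplification are assumptions, not the Reach schema.

type RoleType = "ADMIN" | "TEACHER" | "STUDENT";

interface User {
  userid: number;
  username: string;
  role: RoleType;
}

// One row of the UserCourses join table.
interface UserCourse {
  userid: number;
  courseid: number;
}

// Insertion guard: only TEACHER or STUDENT users may be attached to a course.
// This is what lets every row in UserCourses be trusted later on.
function enroll(rows: UserCourse[], user: User, courseid: number): UserCourse[] {
  if (user.role !== "TEACHER" && user.role !== "STUDENT") {
    throw new Error("Only TEACHER or STUDENT users can be attached to a course");
  }
  const exists = rows.some((r) => r.userid === user.userid && r.courseid === courseid);
  return exists ? rows : [...rows, { userid: user.userid, courseid }];
}

// Pull the Teachers (or Students) of a course out of the join table by role.
function membersByRole(
  rows: UserCourse[],
  users: User[],
  courseid: number,
  role: "TEACHER" | "STUDENT"
): User[] {
  return users.filter(
    (u) =>
      u.role === role &&
      rows.some((r) => r.courseid === courseid && r.userid === u.userid)
  );
}

// Usage: one teacher and one student attached to course 15.
const teacher: User = { userid: 5, username: "user_teacher_01@mail.com", role: "TEACHER" };
const student: User = { userid: 6, username: "user_student_01@mail.com", role: "STUDENT" };
const admin: User = { userid: 1, username: "admin@mail.com", role: "ADMIN" };

let userCourses: UserCourse[] = [];
userCourses = enroll(userCourses, teacher, 15);
userCourses = enroll(userCourses, student, 15);

const teachersOf15 = membersByRole(userCourses, [teacher, student, admin], 15, "TEACHER");
```

The trade-off is that "Teacher" and "Student" stop being things you can query directly; they only exist as a role plus a row in the join table.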
Shape of Data
One of the pieces of the puzzle that we knew for sure we wanted to change: the shape of the data coming from the backend.
Shape of Data: TEAM A’s BACKEND
Team A’s backend sent data in such a way that all associations from one entity to another are represented when looking at that endpoint… by stuffing all the related entities in via nested data. The association part of that pattern is GREAT! It means I can look at all the Courses affiliated with the specific Program I'm looking at. But what happens when I hit an endpoint that gives me ALL of the programs... and Programs can have arbitrarily many Courses... and Courses can have arbitrarily many Modules...
When I hit GET all programs, I expect to see all of the programs. But if each program has a list of potentially many courses, do I really want to see all of the courses associated with each program? Maybe! But certainly not always.
For instance, what if I were hitting the GET all programs endpoint so that I could render the following screen:
In this hypothetical situation, I hit GET all programs and received THREE programs. Only three! Then I displayed those three programs as Cards, where the Program's title, type, and description are showing.
Absolutely NOTHING about that screen above requires any information about the Courses associated with each Program. As the client, should I have to wait until the backend finds every single Course inside of every single Program? And since this issue trickles down the hierarchy, finding every Course inside each Program would require finding every single Module inside of every Course inside of every Program.
If you got lost in the maze of nested data described above, imagine how a computer feels: they think in 1s and 0s, in true and false.
NOTE: the following is an example of what the data might look like. Feel free to scroll on past.
Here’s what the JSON might’ve looked like for ONE program in this example:
// 1st (and only) Program
{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "courses": [
    // 1st Course in ONE program
    {
      "courseid": 15,
      "coursename": "Course1",
      "coursecode": "COURSE_1",
      "coursedescription": "This is course #1",
      "users": [
        // 1st User in 1st Course in ONE program
        {
          "user": {
            "userid": 5,
            "username": "user_teacher_01@mail.com",
            "email": "user_teacher_01@mail.com",
            "firstname": "Teacher001",
            "lastname": "TEACHER_001",
            "phonenumber": "0123456789",
            "role": "TEACHER"
          }
        },
        // 2nd User in 1st Course in ONE program
        {
          "user": {
            "userid": 6,
            "username": "user_student_01@mail.com",
            "email": "user_student_01@mail.com",
            "firstname": "Student001",
            "lastname": "STUDENT_001",
            "phonenumber": "987654321",
            "role": "STUDENT"
          }
        }
      ]
    },
    // 2nd Course in ONE program
    {
      "courseid": 16,
      "coursename": "Course2",
      "coursecode": "COURSE_2",
      "coursedescription": "This is course #2",
      "users": [
        // 1st User in 2nd Course in ONE Program
        {
          "user": {
            "userid": 5,
            "username": "user_teacher_01@mail.com",
            "email": "user_teacher_01@mail.com",
            "firstname": "Teacher001",
            "lastname": "TEACHER_001",
            "phonenumber": "0123456789",
            "role": "TEACHER"
          }
        },
        // 2nd User in 2nd Course in ONE Program
        {
          "user": {
            "userid": 6,
            "username": "user_student_01@mail.com",
            "email": "user_student_01@mail.com",
            "firstname": "Student001",
            "lastname": "STUDENT_001",
            "phonenumber": "987654321",
            "role": "STUDENT"
          }
        }
      ] // end List<User> in List<Course> in Program
    } // end Course_2 in List<Course> in Program
  ] // end List<Course> in Program
} // end Program
If that felt like a pain in the ass to scroll through, think about this:
- THAT was ONE SINGLE PROGRAM!
- That program had TWO courses
- Those COURSES had TWO users
- Aaaaand I didn’t even DISPLAY modules.
Imagine if that had been a LIST of Programs. What if there were 50 programs in the system? And each program had 100 courses? And each course had 50 users and 500 modules?
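To put a number on that hypothetical, here is a quick back-of-the-envelope count. It assumes every program has the same number of courses and every course the same number of users and modules, exactly as in the question above; the function name is made up for this illustration.

```typescript
// Counts how many JSON objects a fully nested "GET all programs" response
// would carry, assuming uniform counts at every level (an assumption).
function nestedObjectCount(
  programs: number,
  coursesPerProgram: number,
  usersPerCourse: number,
  modulesPerCourse: number
): number {
  const courses = programs * coursesPerProgram;
  const usersAndModules = courses * (usersPerCourse + modulesPerCourse);
  return programs + courses + usersAndModules;
}

// 50 programs, 100 courses each, 50 users and 500 modules per course.
const nestedTotal = nestedObjectCount(50, 100, 50, 500);

// The same endpoint returning only top-level program data.
const topLevelOnlyTotal = 50;

console.log(nestedTotal);       // 2755050
console.log(topLevelOnlyTotal); // 50
```

That is roughly 2.75 million objects to render three... er, fifty cards.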
Nested data like this is dangerous in a context such as our product. It’s far from scalable. It’s far from INCLUSIVE: even the best internet connections would fail in the worst case.
(Side note: if you ever wondered why the hell you were learning about Big-O notation or Computer Science, here it is, people, in the most straightforward example.)
SO. Team A’s backend had a problem with the fashion in which the data was displayed. BUT they did convey all of the relationships between each entity.
Shape of Data — TEAM B’s BACKEND
Team B, on the other hand, thought about the problem discussed above. They opted to completely exclude the nested data from the response. So when the client hits GET all programs, they receive a list of Program objects containing ONLY top-level information.
So the example above would turn into this:
// 1st (and only) Program
{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1"
}
A list of objects that look like THAT would be GREAT! I could have 50,000 programs and my laptop wouldn’t even think about sweating. Most computers could tank 5 million programs that looked like the above without a problem.
Wow! That sounds fantastic! But what’s missing?
As the client, if I wanted to see the courses for this program, I would have to
- Know the endpoint to get courses by programid
- Keep my code updated to make sure the developer of my backend hadn’t changed the aforementioned endpoint
- Store that programId somewhere... which means State-Management SOMEWHERE (be that in React.useState or react-router-dom or Redux or a fucking Sticky Note if my app was truly primitive.)
That’s three bullet-points for one task. Yikes. Even though most web developers are USED TO such a situation — this is not the best-case scenario.
Shape of Data — The New & Improved Approach
So… Team A’s BE provided associations between Programs and Courses or any other related data.
But this risked an INSANELY EXPENSIVE time & space complexity. This would affect the frontend, backend, end-user, and every layer in-between.
Team B’s BE kept their response-data short, sweet, and to the point. This allowed clients to get A LOT of one specific entity at any point in time in a VERY FAST & INEXPENSIVE fashion.
But this approach left it up to the client to know all the endpoints, keep their code updated with those endpoints, and manage associations themselves in application state. This, too, would affect the frontend, backend, end-user, and every layer in between.
So what’s the BEST case-scenario? We care mainly about two things:
- Brief, top-level, non-nested data…
- Relevant related data — relationships between THIS data and WHAT IS RELATED TO IT
If you’ve ever used the GitHub API, you’ve probably seen the solution coming since the moment you started reading:
LINKS!! HYPERMEDIA!!!
What if our program data looked like this:
{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "_links": {
    "self": {
      "href": "http://localhost:2019/programs/program/8"
    },
    "courses": {
      "href": "http://localhost:2019/courses/8"
    }
  }
}
Now we have top-level program information when looking at this program. And we have an association between this program and the courses that belong to this program. But that association doesn’t come at a cost on the order of O(c · n · x), where c is the number of courses inside of this program, n is the variety of nested entities (modules/users/assignments/tags) INSIDE of each course, and x is the number of occurrences FOR each nested entity.
Now, GET all programs is O(p) where p is the NUMBER OF PROGRAMS that exist. There is no additional cost! Every program has courses (even if that program has zero courses)! The good news is this: with links, it don't even matta how many courses a program has at this scope. A program exists with or without its courses. If a client wants the courses, they can snag the program._links.courses.href and BOOM... now they have the link to go get those courses.
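On the client, "snagging the link" can be a tiny helper. The sketch below is one possible shape, not the Reach frontend's actual code: `linkHref`, `loadCourses`, and the injected `GetJson` function are hypothetical names, and the HTTP call is passed in so the example stays free of any particular fetch library.

```typescript
// Shape of a resource that carries hypermedia links, mirroring the JSON above.
interface Link {
  href: string;
}

interface ProgramResource {
  programid: number;
  programname: string;
  _links: { [rel: string]: Link | undefined };
}

// Look up a link by relation name instead of hard-coding a URL in the client.
function linkHref(resource: { _links: { [rel: string]: Link | undefined } }, rel: string): string {
  const link = resource._links[rel];
  if (!link) {
    throw new Error(`Resource has no "${rel}" link`);
  }
  return link.href;
}

// Any function that GETs a URL and resolves to parsed JSON (an assumption;
// in a browser this could wrap fetch or axios).
type GetJson = (url: string) => Promise<unknown>;

// The courses are only requested when the client actually wants them.
function loadCourses(program: ProgramResource, getJson: GetJson): Promise<unknown> {
  return getJson(linkHref(program, "courses"));
}

const program: ProgramResource = {
  programid: 8,
  programname: "Program1",
  _links: {
    self: { href: "http://localhost:2019/programs/program/8" },
    courses: { href: "http://localhost:2019/courses/8" },
  },
};

console.log(linkHref(program, "courses")); // http://localhost:2019/courses/8
```

Notice the client never needs to remember the programid or build a URL by hand; the program itself says where its courses live.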
The implications of that are bigger than you might imagine at first glance.
RESTful Resources & HATEOAS
The “New and Improved Approach” described above plays into RESTful services and how we should serve our clients. The implications of the pattern above are manifold. It may look new and scary at first glance, but it’s a fairly small mental shift that gives way to a ginormous level of scalability.
Scale — Time & Space Complexity
The shape of our new data is dramatically cheaper to send than Team A’s (linear in the number of entities requested, instead of growing with everything nested beneath them); yet it includes the relational data that Team B missed.
All top-level info for the relevant data is included. Nested data, however, is reduced to one single O(1) addition to each entity: its links.
The worst case of any given collection endpoint is O(n) where n is the number of requested entities. If a client hits GET all programs, that n is equal to the number_of_programs that exist in the system... or the number_of_rows that exist in our PostgreSQL Programs table. If a client hits GET programs by userId, that n is equal to the number_of_programs that the specified User has created.
This is quite literally the best Big-O you could hit for a collection endpoint.
The time & space complexity for any given SINGULAR ENTITY is O(1). If a client is getting one program, and thus a single entity, let's treat it as such! They shouldn't have to worry about the number of courses in that program until they're REQUESTING those courses!
Evolve — Adding Properties or Changing Endpoint URIs
As the backend developer, I could change the actual endpoint URI to get all courses in a given program from GET "/courses/{programId}" to GET "/courses/at-program-id/totally-renamed/{programId}" and the client frontend wouldn't need to change a single line of code if they consumed my links how I intended.
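A toy version of that rename, as a sketch: the server keeps its route templates in one place and stamps them into `_links`, while the client only ever asks for the "courses" relation. The `ProgramRoutes` type and `programLinks` function are invented for this illustration (in the real Java backend this job would fall to whatever link-building mechanism the team chose), and the base URL is the localhost address from the JSON example.

```typescript
// Route templates live in ONE place on the server (hypothetical structure).
interface ProgramRoutes {
  self: string;
  courses: string;
}

// Build the _links object for a program from the current route templates.
function programLinks(baseUrl: string, routes: ProgramRoutes, programId: number) {
  const fill = (template: string): string =>
    baseUrl + template.replace("{programId}", String(programId));
  return {
    self: { href: fill(routes.self) },
    courses: { href: fill(routes.courses) },
  };
}

// Version 1 of the backend, and version 2 after the endpoint was renamed.
const routesV1: ProgramRoutes = {
  self: "/programs/program/{programId}",
  courses: "/courses/{programId}",
};
const routesV2: ProgramRoutes = {
  ...routesV1,
  courses: "/courses/at-program-id/totally-renamed/{programId}",
};

// The client's code is the same line for both versions: follow "courses".
const coursesUrl = (links: { courses: { href: string } }): string => links.courses.href;

const base = "http://localhost:2019";
console.log(coursesUrl(programLinks(base, routesV1, 8)));
// http://localhost:2019/courses/8
console.log(coursesUrl(programLinks(base, routesV2, 8)));
// http://localhost:2019/courses/at-program-id/totally-renamed/8
```

The relation name ("courses") becomes the contract between frontend and backend; the URI behind it is free to move.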
Access — Conditional Links
Further, the backend could add as many relationships to other entities as it wants, or even dive as deep as using these links as the driver of ACTIONS.
Imagine the following Program:
{
  "programid": 8,
  "programname": "Program1",
  "programtype": "12th grade",
  "programdescription": "This is program 1",
  "_links": {
    "self": {
      "href": "http://localhost:2019/programs/program/8"
    },
    "courses": {
      "href": "http://localhost:2019/courses/8"
    }
  }
}
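One way links could drive actions is for the server to include a link only when the requesting user is allowed to follow it. The sketch below is purely hypothetical: the `edit` and `delete` relation names, the `method` field, and the `programLinksFor` function are all assumptions for illustration, loosely based on the rule that only ADMIN users can edit or delete Programs.

```typescript
type RoleType = "ADMIN" | "TEACHER" | "STUDENT";

// A link, optionally annotated with the HTTP method it expects (an assumption).
interface Link {
  href: string;
  method?: string;
}

type Links = { [rel: string]: Link };

// Build the _links for a program based on who is asking.
function programLinksFor(role: RoleType, programId: number, baseUrl: string): Links {
  const self = `${baseUrl}/programs/program/${programId}`;
  const links: Links = {
    self: { href: self },
    courses: { href: `${baseUrl}/courses/${programId}` },
  };
  if (role === "ADMIN") {
    // Hypothetical action links: present only for users allowed to act.
    links.edit = { href: self, method: "PUT" };
    links.delete = { href: self, method: "DELETE" };
  }
  return links;
}

const adminLinks = programLinksFor("ADMIN", 8, "http://localhost:2019");
const studentLinks = programLinksFor("STUDENT", 8, "http://localhost:2019");

console.log("edit" in adminLinks);   // true
console.log("edit" in studentLinks); // false
```

With something like this, the frontend would not need its own copy of the permission rules: it could show an Edit button when an `edit` link is present and hide it otherwise.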
Current State of the Project
Our project is currently in a perfect place to scale up and build upon—both frontend and backend.
FEATURE LIST
- Users of any role can log into our app with their Okta credentials
- Users can have a role of ADMIN, TEACHER, or STUDENT and interact with content based on the privileges associated with their role.
- All users can view and update their profile information
- ADMIN users will land on a page with all of the Programs they own with associated actions: they can create a new Program, edit existing Programs, and delete existing Programs.
- ADMINs have the ability to specify Tags within each Program with a title and a hex-code for the color to represent that Tag.
- ADMINs and TEACHERs can then specify the type of any Course within that Program by selecting one of the Tags available in that Program.
- ADMIN users can view all users in their system.
- TEACHER users can create STUDENT users.
- TEACHER users can assign STUDENT users to and remove STUDENT users from courses that they are attached to
- ADMIN users can Create/Edit/Delete other users with roles of TEACHER or STUDENT. This includes the ability to change an existing user’s role from TEACHER or STUDENT to STUDENT, TEACHER, or ADMIN. Note: ADMIN users cannot edit or delete other ADMIN users, only create them or change TEACHER and STUDENT users TO an ADMIN
- ADMIN users can assign TEACHER and STUDENT users to courses
- ADMIN users can upload a CSV file to create users of any type in their system. This will create Okta users for any users that don’t yet exist in the Okta application and email the newly created user an email verification. Additionally, each user will be created in our database if they do not yet exist in the DB.
- ADMIN and TEACHER users are able to upload a CSV file to create users of the STUDENT type and attach them to a specified course
- ADMIN users can View/Create/Edit/Delete programs over which they have ownership.
- ADMINs can create, edit, delete, and view Courses within Programs that they have created.
- ADMINs can create, edit, delete, and view Modules within Courses within Programs that they have created.
- TEACHERs have the ability to create, edit, and view Courses and Modules that they are associated with.
- ADMIN and TEACHER users have the ability to update Module content by writing with Markdown. The markdown editor has live feedback which renders the formatted preview as the user types.
- STUDENTs can only view the Courses and Modules that they are associated with.
- Users of any sort can search when looking at a list of their Courses and expect results to be filtered accordingly
- ADMIN users can search for users when looking at a list of all the Users in the system and expect results to be filtered accordingly.
POTENTIAL FUTURE FEATURES
- Further integration between our app and Okta. For instance, each user could explicitly inherit features from Okta’s User class from the Okta SDK. Additionally, roles could be handled ENTIRELY by Okta by utilizing Okta’s user Groups
- Server-side rendering. Or potentially just shifting to a completely Spring-based full-stack application. Views could be specified with Thymeleaf and everything could be neatly handled and intertwined in Spring. I personally think this route would be the most powerful version of this app. Tiny bundle-size, mostly static views, and full control over each view from the Spring Services would make for a very powerful LMS.
- Markdown file upload for Modules.
- Multi-part content for Modules.
- More Markdown layers throughout the Entity Hierarchy. (Why should Program Description be limited to plain text when that itself could be MD…)
- User-defined templates for what MD content should/can look like.
- More granular permissions. Break roles down even further into specific permissions. Then, let ADMIN users select which powers other users have.