You can view this finished project live at citethis.net.
This is a citation generator designed to help students with their papers.
Starting in middle school and throughout high school and college, students write papers that require a list of sources. These sources must be cited in proper MLA/APA/Chicago syntax in order to get full credit for the assignment.
Many students, myself included, relied on online citation generators to get the finicky syntax correct.
The Chegg Monopoly
Chegg is the king of online citation generators: they practically own the first page of any Google search relating to citations. They trick search engines by duplicating their website and presenting it as different websites. Easybib.com, citationmachine.net, citethisforme.com, and bibme.org are all the same website, just reskinned.
Can you blame them though? It's working very well for them.
Below is a screenshot of a Google search for "mla citation generator". As you can see, all but one of the results are owned by Chegg. And how often do you click on the very last result of a Google search?
Creating my own citation generator
The citation generators on the first few pages of Google are all incredibly difficult to use.
- It takes 5+ minutes just to get a few citations
- Every page is cluttered with ads
- There are restrictions on how many citations can be generated
- They ask/force you to create an account
- They ask you to upgrade for an outrageous $10/month
And this is where I got my design requirements, prioritizing speed and usability.
- Be able to cite many websites within seconds
- No ads
- No restrictions
- No user signup
- Totally free
And so, citethis.net was born.
It's really, really fast
Citethis.net is the fastest citation generator on the entire Internet. Hands down.
The time from the moment a URL is entered into the input field until the browser has the citation, with the author, date published and everything, is about 400 milliseconds. For previously crawled URLs, it's about 200 milliseconds (because the result is cached in the database).
And this all happens when the user pastes a URL – not when the user clicks the cite button. By the time the user actually clicks the button, the citation is already in the browser, ready to be displayed.
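The cached path described above can be sketched as a lookup-before-crawl. This is only an illustration, not the site's actual code: `CitationCache`, `fake_crawl`, and the dictionary standing in for the MySQL table are all hypothetical names chosen for the example.

```python
from typing import Callable, Dict


class CitationCache:
    """Hypothetical stand-in for the database cache of crawled URLs."""

    def __init__(self, crawl: Callable[[str], dict]) -> None:
        self._crawl = crawl                 # slow path: fetch and parse the page
        self._store: Dict[str, dict] = {}   # assumption: a dict replaces the MySQL table
        self.crawl_count = 0                # counts how often the slow path runs

    def lookup(self, url: str) -> dict:
        # Fast path: a previously crawled URL is answered from the cache.
        if url in self._store:
            return self._store[url]
        # Slow path: crawl once, then remember the result for next time.
        self.crawl_count += 1
        metadata = self._crawl(url)
        self._store[url] = metadata
        return metadata


def fake_crawl(url: str) -> dict:
    """Placeholder crawler; a real one would download and parse the page."""
    return {"url": url, "title": "Example page"}


cache = CitationCache(fake_crawl)
first = cache.lookup("https://example.com/article")   # crawled
second = cache.lookup("https://example.com/article")  # served from the cache
print(cache.crawl_count)
```

Calling `lookup` as soon as the URL is pasted, rather than on the button click, is what would hide the remaining latency from the user.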
User testing: time spent
When users go to these sites, they have one thing on their mind: "I need a citation".
How long does that actually take?
To get real data on this, I conducted a test in which users were instructed to cite a webpage in MLA and APA format using different citation generators.
You can see the full user test here: https://youtu.be/ZDnJCwV8vQQ
| Website | Quickest time | Longest time | Average time |
|---|---|---|---|
| citationmachine.net | 53 sec. | 3 min. 33 sec. | 1 min. 50 sec. |
| citationproducer.com | 22 sec. | 40 sec. | 32 sec. |
| citefast.com | 31 sec. | 1 min. 45 sec. | 54 sec. |
| citethis.net | 6 sec. | 32 sec. | 18 sec. |
I should note that these times include the user opening the website and copying the generated citation to the clipboard.
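To put the averages in perspective, the ratios can be worked out from the table. The times below are copied from the "Average time" column; the variable names are just for illustration.

```python
# Average times from the user test, converted to seconds.
average_seconds = {
    "citationmachine.net": 1 * 60 + 50,
    "citationproducer.com": 32,
    "citefast.com": 54,
    "citethis.net": 18,
}

baseline = average_seconds["citethis.net"]

# How many times longer each other site took, on average.
ratios = {
    site: round(seconds / baseline, 1)
    for site, seconds in average_seconds.items()
    if site != "citethis.net"
}

for site, ratio in ratios.items():
    print(f"{site}: {ratio}x the average time of citethis.net")
```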
Below is a visualization of the average times.
Data transferred for each citation
I grew up on a farm with a terrible internet connection (you'd be lucky to ever stream YouTube in 480p), so I'm very conscious of how much content a page loads. I know that these days everyone has a 6 MB parallax ultra HD fancy-framework website, but I like to keep it simple. (Guess how large this blog post is!)
I recorded the amount of data that's transferred to generate a single citation, comparing my website to other websites.
You can see the full data here: https://docs.google.com/spreadsheets/d/1HG-aTB6BGECBIHyMwuC84WbDg2dnc5jmuXuhWoJIJqY
| Website | Number of requests | Data transferred |
Citation Machine, Google's #1 ranked generator, loads 30 MB of content for every citation generated. 30 MB!! That's 300x as much data as Cite This. If you went to citethis.net every weekday for an entire year, your browser would download less content than it would generating a single citation using Citation Machine.
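The weekday claim can be checked with a little arithmetic. This sketch assumes one citation per visit and 52 weeks of 5 weekdays; the per-citation figure for Cite This is derived from the 30 MB and 300x numbers above rather than measured separately.

```python
citation_machine_mb = 30   # data per citation on Citation Machine (from the text)
ratio = 300                # Citation Machine uses 300x the data of Cite This

# Derived, not measured: roughly 0.1 MB per citation on Cite This.
cite_this_mb = citation_machine_mb / ratio

# Assumption: one citation per visit, every weekday for a year.
weekdays_per_year = 52 * 5
yearly_mb = cite_this_mb * weekdays_per_year

print(f"Cite This, one citation per weekday for a year: about {yearly_mb:.0f} MB")
print(f"Citation Machine, a single citation: {citation_machine_mb} MB")
```

Under those assumptions a year of weekday visits comes to about 26 MB, still less than the 30 MB for one Citation Machine citation.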
Languages and Technologies
I used PHP to serve the HTML. The website is simple enough that I didn't need any libraries or frameworks, and PHP got the job done with ease.
I chose Python for the crawler script because it's developer-friendly and versatile. The script parses HTML from a given URL, and finds the author, date published, article title, and website name. Libraries such as Flask, Requests and PyQuery made the task very doable.
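As a rough idea of what the parsing step involves, here is a minimal sketch. It is not the real crawler: it uses the standard library's `html.parser` instead of PyQuery so it runs with no dependencies, and the choice of tags it reads (`author`, `article:published_time`, `og:title`, `og:site_name`) is an assumption about where pages commonly put this information. `MetadataParser` and `extract_citation_fields` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class MetadataParser(HTMLParser):
    """Collects <meta> name/property values and the <title> text."""

    def __init__(self) -> None:
        super().__init__()
        self.meta: Dict[str, str] = {}
        self.title_parts: List[str] = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            attr = dict(attrs)
            key = attr.get("property") or attr.get("name")
            content = attr.get("content")
            if key and content:
                # Keep the first value seen for each key.
                self.meta.setdefault(key.lower(), content.strip())

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)


def extract_citation_fields(html: str) -> Dict[str, Optional[str]]:
    """Return author, date published, title, and website name if present."""
    parser = MetadataParser()
    parser.feed(html)
    parser.close()
    meta = parser.meta
    page_title = "".join(parser.title_parts).strip() or None
    return {
        "author": meta.get("author"),
        "date_published": meta.get("article:published_time"),
        "title": meta.get("og:title") or page_title,  # fall back to <title>
        "website": meta.get("og:site_name"),
    }


SAMPLE_HTML = """
<html><head>
<title>Fallback Title | Example News</title>
<meta name="author" content="Jane Doe">
<meta property="article:published_time" content="2021-03-05T09:00:00Z">
<meta property="og:title" content="How Citations Work">
<meta property="og:site_name" content="Example News">
</head><body></body></html>
"""

fields = extract_citation_fields(SAMPLE_HTML)
print(fields)
```

In a real crawler the HTML would come from something like `requests.get(url).text`, and pages that lack these tags would need extra fallbacks.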
For the database I used MySQL, because you can't go wrong with MySQL.
Microservices
That's right. I said the M-word. Microservice.
If you're not familiar with microservices, I recommend watching this quick video that explains monolithic architecture vs microservice architecture: https://youtu.be/RJkn9VHM7lc.
When I first put this site up, it was hosted on my $20/month AWS server, along with a dozen of my other websites. One time, the server crashed, and I didn't realize that all of my sites were down until a full 8 hours later. 8 hours of production downtime is downright embarrassing!
Enter: Microservices. Docker to build containers, and Kubernetes as the container orchestration system.
One of the cool features of Kubernetes is its self-healing ability. If a container dies, or if the pod dies, or if the node dies, Kubernetes will relaunch it. I'm proud to say that since the architectural change, I haven't had a single second of downtime.
Breaking down this application into microservices also made deployments an effortless task. This has allowed me to focus my energy on developing the application instead of worrying about maintaining production.
Having infrastructure clearly defined in code is another bonus. The containers, the ports that are exposed, the CPU and memory that's allocated, the routing of domain names to the containers, and even the SSL certificate are all explicitly defined in configuration files, in source control, alongside the rest of the files in the GitHub repository.
Mistakes I made
Initially, I only supported MLA, APA, and Chicago formats because they were the most commonly used. But once I started asking for feedback from students, they asked, "Does it support IEEE format?" I quickly realized supporting only three formats would limit my audience, so I shifted the objective to instead support every single format.
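One way to keep "every single format" manageable is to treat each format as a small function registered under its name, so adding a format doesn't touch the rest of the code. The sketch below is hypothetical and is not how citethis.net is actually built: `FORMATTERS`, `cite`, and the source fields are made-up names, and the MLA and APA layouts are simplified (plain text can't show italics, and title capitalization is left as-is).

```python
from datetime import date
from typing import Callable, Dict

MLA_MONTHS = ["Jan.", "Feb.", "Mar.", "Apr.", "May", "June",
              "July", "Aug.", "Sept.", "Oct.", "Nov.", "Dec."]
FULL_MONTHS = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]


def format_mla(src: dict) -> str:
    """Simplified MLA-style website citation."""
    d = src["published"]
    when = f"{d.day} {MLA_MONTHS[d.month - 1]} {d.year}"
    return (f'{src["last"]}, {src["first"]}. "{src["title"]}." '
            f'{src["website"]}, {when}, {src["url"]}.')


def format_apa(src: dict) -> str:
    """Simplified APA-style website citation."""
    d = src["published"]
    when = f"{d.year}, {FULL_MONTHS[d.month - 1]} {d.day}"
    return (f'{src["last"]}, {src["first"][0]}. ({when}). '
            f'{src["title"]}. {src["website"]}. {src["url"]}')


# Supporting a new format means adding one function and one entry here.
FORMATTERS: Dict[str, Callable[[dict], str]] = {
    "mla": format_mla,
    "apa": format_apa,
}


def cite(src: dict, style: str) -> str:
    """Look up the formatter by name and apply it to the source."""
    try:
        formatter = FORMATTERS[style.lower()]
    except KeyError:
        raise ValueError(f"Unsupported citation format: {style}") from None
    return formatter(src)


source = {
    "first": "Jane",
    "last": "Doe",
    "title": "How Citations Work",
    "website": "Example News",
    "published": date(2021, 3, 5),
    "url": "https://example.com/how-citations-work",
}

print(cite(source, "MLA"))
print(cite(source, "APA"))
```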
Initially, this was also just a single-page application. I didn't see a need for making this a multi-page website, and I wanted to keep it as simple as possible. However, I soon came to realize its poor impact on SEO. Chegg's websites are ranked so highly largely because of the sheer amount of content they have, and I decided to take a lesson from them. I ditched the single-page application and created a page for every format in an effort to improve SEO and to better organize content (manual citations).
One aspect of this project that significantly slowed me down was choosing the wrong tools for the job. I started this project years ago, and at the time PHP was the only backend language I was comfortable with. I wrote the entire crawl script in PHP, and it was so messy you'd think it had gone through a woodchipper. It was hacky and difficult to maintain.
If I were to redo it, I'd build the entire project using Django, a Python framework that organizes code in a pattern similar to model-view-controller. Python is also my language of choice for the crawler script. It just makes sense to be consistent, using only one language and one organization system.
Another thing that I did poorly was failing to create detailed mockups before developing the site. Too often I went straight to development before fleshing out my idea, resulting in me having to redo nearly every aspect of the website. Very inefficient.
I am proud to have imagined an idea and seen it through to creation. I'm now watching the daily user count grow and I plan on improving this website for years down the line – fixing bugs, supporting more kinds of sources, and supporting more citation formats.
Thanks for reading!