
    Earn 4,500 ($45.00)

    Time Remaining: due 2 years ago
    Completed

    Simple NestJS RPC Microservice to wrap Two Wikipedia API Calls

    DanBarrett
    Posted 2 years ago
    This Bounty has been completed!

    Bounty Description

    Problem Description

    Create a small NestJS microservice that calls the Wikipedia API to fetch the identifiers (URLs or titles) of popular pages, and the structured content of a page given its URL or id. Map the result into an array of the main body text (broken roughly, if imperfectly, into Wikipedia paragraphs/sections) and an array of any image URLs present on the page.

    Acceptance Criteria

    • Code is written in TypeScript using the NestJS framework and uses axios to make API calls
    • The NestJS app can be minimal (no more than a simple skeleton init, with a single RPC controller and a single service)
    • The RPC controller should expose two methods:
        • fetch a list of popular Wikipedia page identifiers (i.e. id, URL, title)
        • fetch a full page's structured content given its id or URL
    • Map the result into an array of text (paragraphs) and an array of image URLs
    • Add one set of integration tests (a spec file using the NestJS Test module) that actually calls out to the real API (we will run this infrequently, but on an ongoing basis, as a healthcheck)
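
    A minimal sketch of such a spec file, assuming the WikipediaService shape from the pseudocode further down (the file name, describe blocks, and assertions are illustrative, not prescriptive):

    ```typescript
    // wikipedia.service.integration.spec.ts -- hits the real Wikipedia API,
    // so run it sparingly (e.g. as a scheduled healthcheck), not in every CI run.
    import { Test, TestingModule } from '@nestjs/testing';
    import { WikipediaService } from './wikipedia.service';

    describe('WikipediaService (integration)', () => {
      let service: WikipediaService;

      beforeAll(async () => {
        const module: TestingModule = await Test.createTestingModule({
          providers: [WikipediaService],
        }).compile();
        service = module.get(WikipediaService);
      });

      it('fetches a non-empty list of popular pages', async () => {
        const pages = await service.getPopularPages();
        expect(pages.length).toBeGreaterThan(0);
        expect(pages[0].title).toBeDefined();
      });

      it('fetches the content of the first popular page', async () => {
        const pages = await service.getPopularPages();
        const content = await service.getPageContent(pages[0].title);
        expect(content.length).toBeGreaterThan(0);
      });
    });
    ```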

    Technical Details

    • Language: TypeScript
    • Framework: NestJS
    • HTTP client: axios

    Pseudocode example of the service in question; wrap this in an RPC controller and make it available to the network (may not work as written):

    import { Injectable } from '@nestjs/common';
    import axios from 'axios';

    @Injectable()
    export class WikipediaService {
      // Fetch the list of most-viewed pages from the Wikipedia API.
      async getPopularPages(): Promise<WikipediaPage[]> {
        const response = await axios.get(
          'https://en.wikipedia.org/w/api.php?action=query&list=mostviewed&format=json',
        );
        return response.data.query.mostviewed;
      }

      // Fetch the raw wikitext of a page's latest revision by title.
      async getPageContent(pageTitle: string): Promise<string> {
        const response = await axios.get(
          `https://en.wikipedia.org/w/api.php?action=query&titles=${encodeURIComponent(pageTitle)}&prop=revisions&rvprop=content&format=json`,
        );
        // The API keys the result by page id, so take the first (only) entry.
        const pages = response.data.query.pages;
        const pageId = Object.keys(pages)[0];
        return pages[pageId].revisions[0]['*'];
      }
    }

    export interface WikipediaPage {
      pageid: number;
      ns: number;
      title: string;
      count: number;
      created: string;
    }
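
    One way to wrap the service in an RPC controller is NestJS's microservice transport with @MessagePattern; a minimal sketch, where the pattern names ('popular_pages', 'page_content') are illustrative assumptions, and extractParagraphs/extractImages are the helpers pseudocoded further down:

    ```typescript
    // wikipedia.controller.ts -- exposes the service over NestJS RPC.
    import { Controller } from '@nestjs/common';
    import { MessagePattern, Payload } from '@nestjs/microservices';
    import { WikipediaService, WikipediaPage } from './wikipedia.service';
    // Helpers sketched later in this bounty description.
    import { extractParagraphs, extractImages } from './wikitext';

    @Controller()
    export class WikipediaController {
      constructor(private readonly wikipediaService: WikipediaService) {}

      @MessagePattern('popular_pages')
      getPopularPages(): Promise<WikipediaPage[]> {
        return this.wikipediaService.getPopularPages();
      }

      @MessagePattern('page_content')
      async getPageContent(@Payload() pageTitle: string) {
        const content = await this.wikipediaService.getPageContent(pageTitle);
        // Map the raw wikitext into the shape the bounty asks for:
        // an array of paragraphs and an array of image references.
        return {
          paragraphs: extractParagraphs(content),
          images: extractImages(content),
        };
      }
    }
    ```

    The app itself would then be bootstrapped with NestFactory.createMicroservice (e.g. with Transport.TCP) instead of the usual HTTP NestFactory.create.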

    Pseudocode example of extracting paragraphs from the page content (may not work as written):

    function extractParagraphs(content: string): string[] {
      const lines = content.split('\n');
      const paragraphs: string[] = [];
      let currentParagraph = '';
      for (const line of lines) {
        // Stop at the first '==' section heading (keeps only the lead section).
        if (line.startsWith('==')) {
          break;
        }
        // Skip headings, list items, and template/infobox markup.
        if (line.startsWith('=') || line.startsWith('*') || line.startsWith('{')) {
          continue;
        }
        if (line === '') {
          // A blank line ends the current paragraph; ignore repeated blanks.
          if (currentParagraph !== '') {
            paragraphs.push(currentParagraph);
            currentParagraph = '';
          }
        } else {
          currentParagraph += `${line}\n`;
        }
      }
      if (currentParagraph !== '') {
        paragraphs.push(currentParagraph);
      }
      return paragraphs;
    }

    Pseudocode example of extracting image file names from the response (may not work as written):

    function extractImages(content: string): string[] {
      const images: string[] = [];
      // Match [[File:Name.ext]] or [[File:Name.ext|options...]] links and
      // capture just the file name. (The original string-based replace of
      // '|.*]]' was a literal, not a regex, so it never stripped anything.)
      const imageRegex = /\[\[File:(.*?)(?:\|.*?)?\]\]/g;
      let match: RegExpExecArray | null;
      while ((match = imageRegex.exec(content)) !== null) {
        images.push(match[1]);
      }
      return images;
    }
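
    The acceptance criteria ask for image URLs rather than bare file names; one way to bridge that gap is Wikipedia's Special:FilePath redirect, which resolves a file name to the actual file. A sketch, where the helper name is an illustrative assumption:

    ```typescript
    // Turn a wikitext file name (e.g. "Albert Einstein 1921.jpg") into a
    // fetchable URL via the Special:FilePath redirect.
    function imageNameToUrl(imageName: string): string {
      // Wikipedia titles use underscores in place of spaces.
      const normalized = imageName.trim().replace(/ /g, '_');
      return `https://en.wikipedia.org/wiki/Special:FilePath/${encodeURIComponent(normalized)}`;
    }
    ```

    For example, imageNameToUrl('Albert Einstein 1921.jpg') yields https://en.wikipedia.org/wiki/Special:FilePath/Albert_Einstein_1921.jpg, which redirects to the hosted image.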

    Link to Project

    // no public-facing link yet, email me with any questions you have