HenrysRecords.org logo Technical Notes for HenrysRecords.org

The HenrysRecords.org code is under version control (check out or browse). If you'd like to suggest a patch, please contact us.

The search code itself is simple and self-explanatory. The more complex part is converting the data from its native home (a Symantec Q&A database running inside an MS-DOS emulator under Windows 10) to an Sqlite3 database served by Apache2 on Debian GNU/Linux. This documentation covers that process.

Save the QA4 folder to a USB drive:

  1. Find the source QA4 folder (\This PC\Desktop\Henry\QA4) and confirm that it is the correct one by checking the timestamps of some of the .asc files in it.

  2. Put a USB drive into the computer. It'll probably mount as E:.

  3. Copy the QA4 folder, using that bizarre Windows copy gesture that involves dragging an object from one location to another, which would normally suggest a "move" rather than a "copy" but apparently when the destination location is special — e.g., is a separate mount point or whatever that concept is called in Windows — then it's a "copy".

That's it. Now give the USB key to Karl, who will do the steps below...

Export the data from Q&A, and save to CD.

The commands in this section should be issued under DOS, or within dosemu under GNU/Linux. (We used to run these steps on Henry Fogel's computer, but later discovered that dosemu was good enough to run Q&A.)

  1. Start up Q&A (under dosemu, dosemu QA1.EXE).

  2. Go to File->Utilities->Export Data->Standard Ascii and hit Enter.

  3. You will be prompted for filenames to export from and to. Fill them in, for example like this:

       QA Filename:  C:\QA4\INSTRUM.DTF
         Export to:  C:\QA4\INSTRUM.ASC

    When you hit Enter after the second filename, Q&A may pop up an alert box saying that there is already a file of that name, and asking if you want to overwrite it. Say "Yes", but make sure that you're overwriting a ".ASC" file, not a ".DTF" file.

    (For the first filename, you don't actually need to type ".DTF". You can just give the front part of the name and hit Enter, Q&A will fill in the rest. However, when typing the second name, it is important that you type the ".ASC".)

  4. Hit F10 twice.

  5. Now you will be asked if you want to change the default export settings. We need to change two of them: in the first line, say no quotes, and in the second line, use return (not comma) as the field delimiter. You can leave the third line, export field template, set to no.

  6. Hit F10 again.

  7. Lather, rinse repeat: The above steps should be run for each of

  8. Exit Q&A.

Transform and publish the data.

These steps should be performed in the data/ subdirectory in the live web area on the server (which runs Debian GNU/Linux), after the .asc files have been copied there:

  1. Run './transfer QA4_DIRECTORY'.

  2. Run 'make sanitize' to remove the final ^Z (Ctrl-Z) character from all the .asc files and convert the files from DOS to Unix-style line endings.

  3. Transform orch.asc into ../orch-abbrevs.txt.

    Start by trying to apply orch-asc-diff.txt as a patch to orch.asc. If it doesn't apply cleanly, the adjustments you need to make might be obvious. In case they're not, and you have to resort to a manual transformation, here's the work that diff is trying to do for you:

    • Remove blank entries (whitespace on a line by itself) before processing. Likewise with the Fix the "sao paulo.." and "capella" entries, whose problems will be obvious when you see them.
    • Watch out for multiple contradictory entries for "Ens".
    • Convert standalone "Orch" to "Orchestra", but carefully -- some should be "Orchestre".
    • Same for "Symph".
    • Same for "Phil".
    • Same for "Univ".
    • Search for periods, as these are usually abbreviations (though we leave "St." alone).
    • Check for backslashes (e.g., "Orch\USSR").
    • Replace "Nat'l" with "National" or whatever's appropriate for that language.
    • Make sure that "Non", "non" and "xxx" are present and expand to "None".
    • No orchestras should contain a colon as part of their names; if any do, choose a new delimiter and adjust asc-to-xml.pl and ../search.py accordingly.

    If you do make any manual updates, please commit them to orch-asc-diff.txt, of course.

  4. Check that instrum.asc is unmodified ('svn diff will do the trick). If there are any changes, update the @instrum_abbrevs variable in asc-to-xml.pl accordingly. That's right, just put it in the source code — this ain't no Grand Hotel. Remember that the expansions need to be in double quotes, since some of the data may contain single quotes.

    Make sure @instrum_abbrevs contains this abbreviation:

      ["Perc", "Percussion"],

    (But in general use svn diff to check that no abbreviations get dropped.

  5. Make sure there's a copy of the excellent Awesomplete Javascript auto-completion library, with my continution-hint changes, in the top of the web server directory (which is the directory where ../search.py will run from — that is, the parent of the data directory we've been working in). The quickest way to get that is to just clone Awesomeplete from my copy and use the branch that has my changes:

      $ git clone https://code.librehq.com/kfogel/awesomplete
      $ cd awesomplete
      $ git checkout continuation-hint

    (Really we just need the awesomplete.css and awesomplete.js files there, but right now the code expects to find them in that subdirectory because I developed using a copy cloned from the full upstream source. So now we're running that way in production — that's right, not even using the minified versions. The completion lists themselves are so large they surely dwarf the code anyway.)

  6. Run 'make xml' to converts .asc files to .xml.

    You may need to run this step a few times, checking success each time. Look for "CHECK" lines on stderr that indicate problematic titles or soloists. Search for the word "gotcha" in asc-to-xml.pl for more explanation about that. Basically, asc-to-xml.pl is tentative about certain expansions of abbrevations in titles and soloists; you may have to add the title or soloist in question to the okay_titles or okay_soloists hash in asc-to-xml.pl, or, somewhat less often, modify gotcha_titles or gotcha_soloists.

    All data cleaning occurs in this step. This is because we want the .xml files to represent exactly the same data as is in the Sqlite3 database, since we encourage people to download the XML files if they need local data. Thus, the asc-to-xml.pl script is much more complex than xml_to_sql.py.

  7. Run 'make sql' to convert .xml files to .sql and build the autocompletion lists.

    This step also does some post-processing and some renaming, the latter because we don't publish one of the opera databases (it's idiosyncratic, incomplete, and mainly for Henry's private use) but we do publish the main opera database.

  8. If you're doing the data preparation somewhere other than on the main server (which is often the case in practice), run 'make upload' to copy all the important files to a holding area on the server, from which they can be moved manually into place in the live web area.

  9. On the production server, run 'make db'.

  10. Tweak the "last updated" date in index.html appropriately, and commit.

This completes the conversion. The database is now upgraded.

(Back to the search page.)