Project Overview: Citation Networks in Rampage Shootings

I’ve been working on my latest project for a long time now, but, having underestimated how complex it would get, I was waiting until I had “a draft finished” to post something. Ha! Yeah, right! There may never be a true “end” to this project, so waiting for a “complete” draft might mean waiting forever. Instead, I’m going to start posting periodic updates as I pass through stages of the project. So, without further ado, introducing:

Citation Networks in Rampage Shootings.

Background: In my experience, media coverage and anecdotal discussion of mass shooting events typically portray them as essentially unconnected natural disasters. But as it turns out, “climatology” is a very complex science…

In the last few years, the US public has started to realize that media influence plays a part in the motivation of perpetrators, and academics have begun to analyze the characteristics of perpetrators (examples: race, age, mental health diagnoses on record, year shooting committed) and even the frequency of some contents in their writings; see, for example, this Peter Langman paper (link does not directly open the large document). We are able to do this because of the great deal of shooter/event information available to the public via the Freedom of Information Act (FOIA).

However, there is little focus on rampage shootings as a covert political movement of sorts, and/or a type of abstract terrorist network operating in single-cell units. I posit that this is a useful conception, and that it can be expressed as a network (a directed acyclic graph) of “citations” between rampage shooters. The existing methodology of citation analysis can provide a framework or guide for this expression. For now, I am concentrating on creating the most thorough network I reasonably can while maintaining relevance to my interests and an appropriately tight scope (I’ll elaborate).

[Figure: sci2figure4.12] Illustration of citation network(s), from this wiki page on network analysis.

My vague “starting goal”: Obtain and organize documents (primarily manifestos) associated with rampage shooters, and datamine them for cross-references. Create a visual network.

My concretized, “actionable” goal: Scrape the web for documents written directly, or otherwise generated directly (e.g. an FBI report listing the websites a shooter visited, as found on his computer), by a restricted population of mass shooters (starting with those on Langman’s schoolshooters.info website). [✔️] Process them all into searchable text (as some are PDF, handwritten, etc). [✔️] Create a list of shooter-associated names and terms (names, nicknames, schools attacked, and unambiguous referents) [✔️], and directly search the documents for these names/terms, creating a list of citations apparent between shooters. This list will need to be cleaned for redundancy and manually checked on an ongoing basis to see whether the results make sense. Go through and throw out false positives, and attempt to identify false negatives from personal knowledge of the documents (which I viewed individually when classifying them during OCR pre-processing). Create a visualized network of citations [❌] where the graph is a directed acyclic graph, the nodes are shooters, and the edges are citations. Ideally, the nodes will be physically laid out along a timeline. [❌]
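To make the term-search step a little more concrete, here is a minimal sketch of how it could work in Python. It is not the actual code from this project: the shooter IDs, term lists, document texts, and the function names `compile_terms` and `find_citations` are all hypothetical placeholders. The idea is just one case-insensitive, whole-word pattern per person, run against every document written by somebody else.

```python
import re
from collections import defaultdict

# Hypothetical term lists: each ID maps to names/terms that refer to that person.
TERMS = {
    "shooter_a": ["Alpha Person", "Alpha High"],
    "shooter_b": ["Beta Person"],
}

# Hypothetical documents, already converted to plain text, keyed by author ID.
DOCS = {
    "shooter_a": ["Diary text with no references."],
    "shooter_b": ["I read about Alpha Person and alpha high."],
}

def compile_terms(terms):
    """Build one case-insensitive, whole-word regex per ID."""
    patterns = {}
    for person, names in terms.items():
        alternation = "|".join(re.escape(name) for name in names)
        patterns[person] = re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)
    return patterns

def find_citations(docs, terms):
    """Return {(citing, cited): [matched strings]}, skipping self-references."""
    patterns = compile_terms(terms)
    hits = defaultdict(list)
    for author, texts in docs.items():
        for text in texts:
            for cited, pattern in patterns.items():
                if cited == author:
                    continue  # an author naming themselves is not a citation
                for match in pattern.finditer(text):
                    hits[(author, cited)].append(match.group(0))
    return dict(hits)

citations = find_citations(DOCS, TERMS)
```

Keeping the matched strings (rather than just a yes/no) is a deliberate choice in this sketch: it makes the manual false-positive review easier, since each hit can be traced back to the exact term that triggered it.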

Where I’m at now:
✔️ Outline my goals.
🆕  Blog about same.

✔️ Define the population to be included (restricted set of shooters). Start with shooters represented in Langman’s original documents database, and work outward from there.
✔️ Scrape the web for documents using BeautifulSoup.
🔜 Blog about the web scraping.

✔️ Learn what OCR options are out there. Use OCR and manual/voice transcription to convert all document types to searchable .txt files, as follows. First, classify every document by status: useless; desirable and already usable as txt; desirable but needs OCR; desirable but handwritten or otherwise illegible to conventional prefab OCR. Use Tesseract and other packages until either every typed document is successfully converted or I conclude that this is not going to happen anytime soon. For everything else, decide whether it’s worth training AI to decode the writing/printing style or whether it’s better to just grind it out by transcribing manually or reading aloud into a voice-to-text processor.
🔜 Blog about the document processing.

✔️ Create list of terms and names associated with each shooter.
✔️ For each shooter, use Python to mine associated list of outgoing citations (of other shooters in this set) from documents.
✔️ Review identified citations, scrutinize sensitivity and specificity.
❌ Repeat until satisfied.
🔜 Blog about the comedic mishaps encountered along the way (no spoilers, but let’s just say I had some very confusing results for a while).

❌ Decide which language(s) to create visuals in.
❌ Create formal nodelist (easy modification of shooter list) and edgelist (slightly harder but still pretty easy translation of citations list/log once created).
❌ Plot and prettify visual graph.
❌ Partially order nodes in time, place physically along a timeline.
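Since the graphing steps are still ahead of me, here is only a rough sketch of what the node list, edge list, and timeline placement might look like, using nothing but the Python standard library (`graphlib` needs Python 3.9+). The IDs, dates, and citation log are made-up placeholders, and the helper names are hypothetical. Edges point from the citing person to the cited one.

```python
from datetime import date
from graphlib import TopologicalSorter, CycleError

# Hypothetical node list: ID -> date of the event (placeholder dates).
NODES = {
    "shooter_a": date(1999, 1, 1),
    "shooter_b": date(2005, 1, 1),
    "shooter_c": date(2012, 1, 1),
}

# Hypothetical citation log: one (citing, cited) row per mention found.
CITATION_LOG = [
    ("shooter_b", "shooter_a"),
    ("shooter_c", "shooter_a"),
    ("shooter_c", "shooter_b"),
    ("shooter_c", "shooter_a"),  # repeat mention, collapsed into a weight
]

def build_edgelist(log):
    """Collapse the log into weighted edges: {(citing, cited): count}."""
    edges = {}
    for citing, cited in log:
        edges[(citing, cited)] = edges.get((citing, cited), 0) + 1
    return edges

def is_acyclic(nodes, edges):
    """True if the citation graph has no directed cycle."""
    sorter = TopologicalSorter({node: set() for node in nodes})
    for citing, cited in edges:
        sorter.add(citing, cited)  # the cited node must come first
    try:
        sorter.prepare()
    except CycleError:
        return False
    return True

def anachronistic(nodes, edges):
    """Edges whose citing event is not later than the cited one (review these)."""
    return [(a, b) for (a, b) in edges if nodes[a] <= nodes[b]]

def timeline_positions(nodes):
    """x-coordinate for each node: days since the earliest event."""
    start = min(nodes.values())
    ordered = sorted(nodes.items(), key=lambda item: item[1])
    return {node: (when - start).days for node, when in ordered}

edges = build_edgelist(CITATION_LOG)
positions = timeline_positions(NODES)
```

The `anachronistic` check is there because a citation that points forward in time is almost certainly a false positive from the term search (or a date error), so it doubles as a sanity check on the mined data before anything gets plotted.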

💭 Node expansion goals: flesh out the network with all “iconic figures” mentioned, starting with other mass shooters (not from Langman), then enlarging to other mass killers, then killers in general (e.g. Hitler) as well as media (e.g. The Basketball Diaries). Needless to say, many of these would not need to be searched for outward citations (e.g. The Basketball Diaries is likely not citing any modern shooters).
💭 Edge expansion goals: grow the network into a set of hypergraphs with edge criteria such as: same type of weapon used (may need consultation as to what is similar enough to constitute probable mimesis), similar poses in images released, same type of manifesto (video, etc) released to the news intentionally, and so on. Each criterion could be represented by a different color, for example, if we wish to be able to toggle through criteria; perhaps create a mini interactive network where the user selects the criterion.
💭 Incorporate visual citations: Consider pictures released to the public by shooters that contain visual references to other shooters. Most of these are to Columbine. We have the “wrath” & “natural selection”-style “uniforms”, the Dylan Klebold gun-finger wave, and so on.
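One simple way the “toggle through criteria” idea might be represented is to tag every edge with the criterion that produced it, then filter by tag. The sketch below is only an assumption about how this could be stored, not a settled design: the IDs, criterion names, colors, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical typed edges: (source, target, criterion).
TYPED_EDGES = [
    ("shooter_b", "shooter_a", "text_citation"),
    ("shooter_b", "shooter_a", "similar_pose"),
    ("shooter_c", "shooter_a", "same_weapon_type"),
    ("shooter_c", "shooter_b", "text_citation"),
]

# Hypothetical display color for each criterion.
COLORS = {
    "text_citation": "black",
    "similar_pose": "blue",
    "same_weapon_type": "red",
}

def layer(edges, criterion):
    """Edges for one criterion only, i.e. what the user would toggle on."""
    return [(src, dst) for src, dst, kind in edges if kind == criterion]

def criteria_between(edges):
    """All criteria linking each ordered pair: {(src, dst): {criteria}}."""
    pairs = defaultdict(set)
    for src, dst, kind in edges:
        pairs[(src, dst)].add(kind)
    return dict(pairs)

text_layer = layer(TYPED_EDGES, "text_citation")
links = criteria_between(TYPED_EDGES)
```

Storing one row per (pair, criterion) keeps each layer independent, so a new criterion such as visual citations could be added later without touching the text-citation edges.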

Coming soon… Perils and adventures encountered putting my first toe in the web scraping waters!
