Intro to NotebookLM
One of the tools that I discovered recently and keep using more and more every day is NotebookLM from Google Labs. NotebookLM is a great tool for learning new topics, researching large amounts of information, and summarizing data.
The information is organized into notebooks, and each notebook can contain multiple sources of information.
You can upload data in various formats (web URLs, Slides, PDFs, text files, audio data, YouTube videos, …) and then use the tool to analyze it.
I usually use it to ask questions about the data, summarize it, and/or extract pieces of information.
The most useful feature for me is that when you ask a question, it answers with numbered links to the sources, so you can double-check whether the answer is correct.
Here, I’m opening the notebook Introduction to NotebookLM and asking the question What’s the maximum number of words a notebook can contain? As you can see, it answered with a link to the paragraph that lists the Source limitations (each source can contain up to 500,000 words).
That’s very helpful when you want to check whether the answer you’ve received is grounded in fact.
A WordPress hack
A few days ago I had the idea of checking whether it’s possible to analyze WordPress logs with NotebookLM (or with LLMs in general). It came up after a friend’s blog was hacked and I spent a lot of time looking at the logs, trying to make sense of them. I thought there must be an easier way to do this, since LLMs are great at analyzing structured data.
So, I set up a test WordPress blog and made it public on the internet for a few days to collect some background internet noise in the logs (to make it as realistic as possible). Then I hacked my test blog with the same exploit that was used against my friend’s blog (to reproduce the situation). The exploit is CVE-2023-6961, which affects the WordPress plugin WP Meta SEO. It is well described in this blog post from Fastly.
This is a stored XSS vulnerability via the Referer header: you send an HTTP request with an XSS payload in the Referer header.
GET /index.php/2024/10/20/973498739847943/ HTTP/1.1
Referer:
Host: weblog.thx.bz
Accept-Encoding: gzip, deflate, br
Accept: */*
Accept-Language: en-US;q=0.9,en;q=0.8
User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.6533.100 Safari/537.36
Connection: close
Cache-Control: max-age=0
When the administrator logs into the WP Admin dashboard and visits the WP Meta SEO 404 & Redirects page, the XSS payload gets executed. For the payload I used some JS code that creates a new WP admin user, similar to what happened in my friend’s case.
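To make the “stored” part concrete, here is a minimal Python sketch of the two phases. It is a toy model, not the plugin’s code (WP Meta SEO is written in PHP and keeps its 404 log in the database): `record_404` and `render_404_report_unsafe` are hypothetical names, and the `<script>` string is a placeholder, not the payload I used.

```python
# Toy model of a stored XSS via the Referer header.
# Hypothetical names; the real plugin is PHP and stores 404 entries in the database.
broken_links: list[dict[str, str]] = []


def record_404(url: str, referer: str) -> None:
    # Phase 1 (unauthenticated): any visitor who hits a missing URL adds a row,
    # and the attacker-controlled Referer is stored verbatim.
    broken_links.append({"url": url, "referer": referer})


def render_404_report_unsafe() -> str:
    # Phase 2 (admin only): the report is built by string concatenation
    # with no output encoding, so stored markup becomes live HTML.
    rows = "".join(
        f"<tr><td>{row['url']}</td><td>{row['referer']}</td></tr>"
        for row in broken_links
    )
    return f"<table>{rows}</table>"


# Placeholder payload, not the one used in this test.
record_404("/index.php/2024/10/20/973498739847943/", "<script>/* payload */</script>")
report = render_404_report_unsafe()
print("<script>" in report)  # True: the tag would run in the admin's browser session
```

Because the injected script runs in the admin’s logged-in browser, any request it makes (such as submitting the Add New User form) carries the admin’s cookies and comes from the admin’s own IP address.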
If you are interested in seeing the exact logs that I uploaded into NotebookLM, you can find them in this Kaggle dataset.
Great, now we have the Apache access logs of the WordPress hack. Let’s load them into NotebookLM and see what we can do with them.
What I uploaded to NotebookLM is a file named apache_access_log.txt (since it only accepts text files) that contains 1076 lines of access logs recorded over 3 days. It’s possible to upload much more data: the Gemini 1.5 Pro model used by NotebookLM supports a context window of up to 2 million tokens.
178.215.238.68 - - [19/Oct/2024:00:03:17 +0000] "GET /login.rsp HTTP/1.1" 404 453 "-" "Hello World"
167.99.55.110 - - [19/Oct/2024:00:13:56 +0000] "POST /wp-cron.php?doing_wp_cron=1729469636.1745829582214355468750 HTTP/1.1" 200 259 "-" "WordPress/6.6.1; http://weblog.thx.bz"
143.110.222.166 - - [19/Oct/2024:00:13:55 +0000] "GET / HTTP/1.1" 200 15340 "-" "Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Mobile/15E148 Safari/604.1"
162.158.154.86 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/certificates/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/user/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.230.7 - - [19/Oct/2024:01:03:12 +0000] "GET /.well-known/acme-challenge/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.230.7 - - [19/Oct/2024:01:03:12 +0000] "GET /.well-known/acme-challenge/plugins.php HTTP/1.1" 404 490 "-" "-"
162.158.158.139 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/customize/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/SimplePie/plugins.php HTTP/1.1" 404 489 "-" "-"
162.158.154.86 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/css/colors/blue/plugins.php HTTP/1.1" 404 489 "-" "-"
...
1076 lines of logs
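Before handing the file to an LLM, it can help to confirm that every line really is in Apache’s “combined” format. Here is a minimal Python sketch of such a parser, assuming the standard combined `LogFormat` (the regex and the `parse_log` name are mine, not part of NotebookLM or the dataset). Apache writes a literal double quote inside a quoted field as `\"`, so the Referer and User-Agent groups accept backslash escapes instead of stopping at the first quote.

```python
import re
from collections import Counter

# Apache "combined" log format:
#   ip ident user [time] "METHOD path protocol" status size "referer" "user-agent"
# Quoted fields may contain \" escapes, so they accept backslash sequences.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)


def parse_log(lines):
    """Yield one dict per line that matches; malformed lines are skipped."""
    for line in lines:
        m = LOG_RE.match(line)
        if m:
            yield m.groupdict()


# A few lines from the sample above.
SAMPLE = """\
178.215.238.68 - - [19/Oct/2024:00:03:17 +0000] "GET /login.rsp HTTP/1.1" 404 453 "-" "Hello World"
167.99.55.110 - - [19/Oct/2024:00:13:56 +0000] "POST /wp-cron.php?doing_wp_cron=1729469636.1745829582214355468750 HTTP/1.1" 200 259 "-" "WordPress/6.6.1; http://weblog.thx.bz"
162.158.154.86 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/certificates/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/user/plugins.php HTTP/1.1" 404 490 "-" "-"
"""

entries = list(parse_log(SAMPLE.splitlines()))
status_counts = Counter(e["status"] for e in entries)
print(status_counts)  # Counter({'404': 3, '200': 1})
```

To run it on the whole file, pass `open("apache_access_log.txt", encoding="utf-8")` instead of `SAMPLE.splitlines()`; comparing `len(entries)` with the line count shows how many lines did not match.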
Analyze WordPress logs with NotebookLM
Now that the logs are uploaded into NotebookLM, let’s try to analyze the data.
Let’s start with an “easy” question.
What is the IP address of the WordPress administrator?
I’m asking for the IP address of the WordPress administrator to see whether NotebookLM can understand the data and extract information from it.
Great answer: not only did it correctly determine the IP address of the WP admin (80.97.26.93), it also figured out that the user initially logged in from another IP (138.199.53.226) and then switched to the final one (80.97.26.93).
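NotebookLM doesn’t show how it reached that answer, so for comparison here is a rough, deterministic heuristic in Python. The rule is my own assumption, not what NotebookLM does: count, per IP, the `.php` pages under `/wp-admin/` that were served with HTTP 200. The sample lines are taken from the log entries quoted later in this post, plus one scanner request.

```python
import re
from collections import Counter

# Apache "combined" log format (quoted fields may contain \" escapes).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)

SAMPLE = """\
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/user/plugins.php HTTP/1.1" 404 490 "-" "-"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/user-new.php HTTP/1.1" 200 10927 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "POST /wp-admin/user-new.php HTTP/1.1" 302 459 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/users.php?update=add&id=2 HTTP/1.1" 200 12205 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
"""


def admin_ip_candidates(lines):
    """Count, per IP, the /wp-admin/*.php pages that were served with HTTP 200.

    Only 200s count: logged-out visitors to /wp-admin/ are redirected (302) to
    wp-login.php, and scanners probing made-up paths get 404s. admin-ajax.php is
    skipped because it is reachable without logging in.
    """
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["status"] != "200":
            continue
        script = m["path"].split("?", 1)[0]
        if (script.startswith("/wp-admin/") and script.endswith(".php")
                and script != "/wp-admin/admin-ajax.php"):
            hits[m["ip"]] += 1
    return hits


print(admin_ip_candidates(SAMPLE.splitlines()).most_common())  # [('80.97.26.93', 2)]
```

It is only a first filter: a few other `/wp-admin/` scripts can answer 200 to anyone, so the top IPs still need a manual look.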
That’s quite impressive, and I was curious how it knew to correlate these two IP addresses.
So, I asked next:
How do you know that these 2 IP addresses (80.97.26.93 and 138.199.53.226) belong to the same user?
Again a great answer: it noticed the Identical User Agent and the Sequential Activity.
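Those two signals are easy to reproduce by hand. The following Python sketch (the `correlate_by_agent` name and its rules are my own, assumed for illustration) groups IPs that share an exact User-Agent and orders them by their first request. On these few sample lines the only shared agent is the empty `-` of two scanner requests, which is weak evidence, so the `ignore_agents` parameter lets you drop generic values; a long, specific browser string is a much stronger link.

```python
import re
from collections import defaultdict
from datetime import datetime

# Apache "combined" log format (quoted fields may contain \" escapes).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)
TIME_FMT = "%d/%b/%Y:%H:%M:%S %z"  # e.g. 19/Oct/2024:01:03:12 +0000

SAMPLE = """\
162.158.154.86 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-includes/certificates/plugins.php HTTP/1.1" 404 490 "-" "-"
172.70.115.200 - - [19/Oct/2024:01:03:12 +0000] "GET /wp-admin/user/plugins.php HTTP/1.1" 404 490 "-" "-"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/user-new.php HTTP/1.1" 200 10927 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
"""


def correlate_by_agent(lines, ignore_agents=()):
    """Map each User-Agent seen from 2+ IPs to those IPs, ordered by first request."""
    first_seen = defaultdict(dict)  # agent -> {ip: earliest timestamp}
    for line in lines:
        m = LOG_RE.match(line)
        if not m or m["agent"] in ignore_agents:
            continue
        ts = datetime.strptime(m["time"], TIME_FMT)
        ips = first_seen[m["agent"]]
        if m["ip"] not in ips or ts < ips[m["ip"]]:
            ips[m["ip"]] = ts
    return {
        agent: sorted(ips, key=ips.get)
        for agent, ips in first_seen.items()
        if len(ips) > 1
    }


print(correlate_by_agent(SAMPLE.splitlines()))
# {'-': ['162.158.154.86', '172.70.115.200']}
print(correlate_by_agent(SAMPLE.splitlines(), ignore_agents=("-",)))
# {}
```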
That’s quite useful already. Let’s ask more complicated questions to try to identify which HTTP requests could be related to the creation of a new WP admin account (this is what we know happened in my friend’s case: a new WP user was created).
List all the IP addresses and logs that generated HTTP requests that could have resulted in a new WP admin user creation
Interesting. It found that our own WP admin IP address was used to try to create a new WP admin user.
This is quite interesting because it hints at a Stored XSS vulnerability.
The most obvious way our own IP address could be used to create a new admin user is if we visited an administrative page where the attacker had injected JS code, and our own user (from our own IP address) executed that injected code.
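The same question can be answered with a plain filter. This Python sketch keeps only POST requests to endpoints that can create a WordPress user; the `USER_CREATION_PATHS` set is my assumption and is not exhaustive (it covers the admin Add New User form and the REST API users route, but not, for example, the `?rest_route=` variant). The sample lines come from the logs quoted in this post.

```python
import re

# Apache "combined" log format (quoted fields may contain \" escapes).
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>(?:[^"\\]|\\.)*)" "(?P<agent>(?:[^"\\]|\\.)*)"'
)

# Assumed, non-exhaustive list of user-creation endpoints.
USER_CREATION_PATHS = {"/wp-admin/user-new.php", "/wp-json/wp/v2/users"}

SAMPLE = """\
167.99.55.110 - - [19/Oct/2024:00:13:56 +0000] "POST /wp-cron.php?doing_wp_cron=1729469636.1745829582214355468750 HTTP/1.1" 200 259 "-" "WordPress/6.6.1; http://weblog.thx.bz"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/user-new.php HTTP/1.1" 200 10927 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "POST /wp-admin/user-new.php HTTP/1.1" 302 459 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
"""


def user_creation_requests(lines):
    """Return (ip, path, status, referer) for POSTs to a user-creation endpoint."""
    found = []
    for line in lines:
        m = LOG_RE.match(line)
        if (m and m["method"] == "POST"
                and m["path"].split("?", 1)[0] in USER_CREATION_PATHS):
            found.append((m["ip"], m["path"], m["status"], m["referer"]))
    return found


for row in user_creation_requests(SAMPLE.splitlines()):
    print(row)
```

The one surviving row carries the admin’s own IP, and its Referer points at an admin page rather than at the Add New User form itself, which is the stored-XSS pattern described above.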
Let’s ask a more complicated question, trying to pinpoint the WP plugin that was involved in the exploit.
What WP plugin could have been exploited to create a new WP admin user?
I also added the following extra information to the question to help the LLM answer it (since we already know which WP plugins we have installed):
What WP plugin could have been exploited to create a new WP admin user?
Take into account the following known information:
The following WordPress plugins are installed in my WordPress installation:
akismet
wp-fail2ban
wp-meta-seo
hello.php
I basically asked it to identify the WP plugin that could have been used to create a new WP admin user and provided a list of the installed WP plugins.
Wow, it was able to identify the vulnerable WP plugin (WP Meta SEO) that was used during the exploit.
Not only that, it was also able to identify the WP Meta SEO admin page where the exploit happened.
The answer contains the following section:
These attempts originated from pages related to the WP Meta SEO plugin, specifically the “metaseo_broken_link” page
metaseo_broken_link is the vulnerable page where the XSS payload was executed.
It quoted the following logs:
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/user-new.php HTTP/1.1" 200 10927 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "POST /wp-admin/user-new.php HTTP/1.1" 302 459 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
80.97.26.93 - - [21/Oct/2024:08:15:49 +0000] "GET /wp-admin/users.php?update=add&id=2 HTTP/1.1" 200 12205 "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link" "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:127.0) Gecko/20100101 Firefox/127.0"
That’s great. We see a POST /wp-admin/user-new.php that results in a 302 redirect (WordPress’s response to a successful user creation) and has a Referer of http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link.
Then, from GET /wp-admin/users.php?update=add&id=2, we learn that the newly created WP user has id=2 (which is correct).
metaseo_broken_link is clearly the culprit.
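Both facts can be read straight from the URLs with the standard library. A minimal Python sketch (the helper names are hypothetical): `admin_page_slug` pulls the `page` parameter out of an `admin.php` Referer, and `created_user_id` reads the `id` from the `users.php?update=add&id=N` redirect target that WordPress sends the browser to after adding a user.

```python
from urllib.parse import parse_qs, urlsplit


def admin_page_slug(referer):
    """Return the ?page= value of a /wp-admin/admin.php Referer, else None."""
    parts = urlsplit(referer)
    if not parts.path.endswith("/wp-admin/admin.php"):
        return None
    return parse_qs(parts.query).get("page", [None])[0]


def created_user_id(path):
    """Return N from /wp-admin/users.php?update=add&id=N, else None."""
    parts = urlsplit(path)
    qs = parse_qs(parts.query)
    if parts.path == "/wp-admin/users.php" and qs.get("update") == ["add"] and "id" in qs:
        return int(qs["id"][0])
    return None


referer = "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link"
print(admin_page_slug(referer))  # metaseo_broken_link
print(created_user_id("/wp-admin/users.php?update=add&id=2"))  # 2
```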
Let’s ask one more question:
Please list all the log entries where the Referrer header contains HTML code
It correctly identified the request I used to inject the XSS payload that triggered the Stored XSS vulnerability.
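A regex check gives a quick second opinion on that answer. This Python sketch flags a Referer that contains an HTML tag, either raw or percent-encoded; the tag pattern is my own simplification and will miss payloads that avoid angle brackets (for example, attribute injection). The HTML strings in the checks are hypothetical test inputs, not the payload from this test.

```python
import re
from urllib.parse import unquote

# Matches an opening or closing tag such as <script>, </a> or <img src=x>.
TAG_RE = re.compile(r"</?[a-zA-Z][^>]*>")


def referer_has_html(referer: str) -> bool:
    """True if the Referer contains an HTML tag, raw or percent-encoded."""
    return bool(TAG_RE.search(referer) or TAG_RE.search(unquote(referer)))


for value in [
    "-",
    "http://weblog.thx.bz/wp-admin/admin.php?page=metaseo_broken_link",
    "<img src=x onerror=alert(1)>",  # hypothetical raw tag
    "%3Cscript%3E",                  # hypothetical percent-encoded tag
]:
    print(referer_has_html(value), value)
```

Applied to the `referer` field of each parsed log entry, it narrows 1076 lines down to the handful worth reading.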
As you can see, NotebookLM helped us quickly get an idea of how the WordPress blog was compromised and which plugin was potentially vulnerable.
Of course, it doesn’t work this well every time, but it can still save a lot of time.
If you are interested in the patch for this vulnerability, it’s available here (the Referer header is now HTML-encoded).
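The idea behind that kind of fix is output encoding. The plugin’s actual patch is PHP; this Python sketch with `html.escape` only illustrates the principle, and `render_referer_cell` is a hypothetical name. Once `<`, `>`, `&` and quotes are encoded, the browser displays the stored Referer as text instead of parsing it as markup.

```python
import html


def render_referer_cell(referer: str) -> str:
    # Encode &, <, >, " and ' so the stored value is shown as text, not executed.
    return f"<td>{html.escape(referer, quote=True)}</td>"


print(render_referer_cell("<script>/* payload */</script>"))
# <td>&lt;script&gt;/* payload */&lt;/script&gt;</td>
```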