-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: glean metrics logic #1626
Conversation
d779cc2
to
82a32fe
Compare
fe603d9
to
28592aa
Compare
glean/src/server_events.rs
Outdated
extra: HashMap<String, String>, | ||
} | ||
|
||
pub fn new_glean_event( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit torn.
The more "rust"y way to do this would be to create an
impl Default for GleanEvent {
fn default() -> Self {
Self {
category: String::default(),
name: String::default(),
timestamp: Utc::now().timestamp_millis(),
extra: Hashmap::new(),
}
}
and then where you call new_glean_event
you would specify it as:
GleanEvent {
category,
name,
extra,
...Default::default()
}
Generally, creating a ::new(...)
is because there are more side effects or setup required than just a quick builder.
I'm not going to really push for that sort of change for something like this, though, so consider this more an FYI.
} | ||
|
||
impl GleanEventsLogger { | ||
fn create_client_info(&self) -> ClientInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would strongly recommend converting this to an impl Default
unless there are more commonly used values to initialize this struct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In discussing, and if ok with you, were gonna leave this one as it is closely tied to the GleanEventsLogger
struct and needs values from the instance of said GleanEventsLogger
, so wouldn't provide us with what we're looking for.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, because the app_display_version
and app_channel
are defined from the values set in GleanEventsLogger
.
Technically, a more rust way of doing that would be to create a
impl From<GleanEventsLogger> for ClientInfo {
fn from(gel: GleanEventsLogger>) -> ClientInfo {
ClientInfo {
app_display_version: gel.app_display_version,
app_channel: gel.app_channel,
..Default::default()
}
}
}
(this is where you'd have a impl Default for ClientInfo
that specifies the other values.)
You could then do
let telemetry_payload: PingPayload = PingPayload {
client_info: self.clone().into(),
(This also lends itself nicely to template building, because then you can just define the non-standard stuff you want to specify and trust that rust will backfill with the standard bits.)
But if this is the way that the Glean team is advising you create this, so be it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure but this also gets a bit in to the weeds of what you want for default values of app_display_version/channel
-- I'm wondering if glean might never want those to be empty strings or "Unknown" (unlike the other fields) so you might want to discourage that from potentially ever happening by not having such a default as a footgun 🤷♂️
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the other parsers (JS, Go, Ruby, Python, etc) generate the string "Unknown" by default, so I think we want to keep it. The comment in one of the server implementations stated: "Unknown" fields below are required in the Glean schema, however they are not useful in server context.
Therefore, we don't get any benefit but it is expected that the string "Unknown" is present.
glean/src/server_events.rs
Outdated
payload: &PingPayload, | ||
) -> Ping { | ||
let payload_json = | ||
serde_json::to_string(payload).expect("unable to marshal payload to json."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should consider converting this to a thiserror
derived struct so that it can be captured by calling systems and retried. An .expect()
is a glorified panic and will kill whatever thread or process it's in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was ok with these as I can't imagine what could actually trigger the 2 expects generated in this file, maybe if you really screwed up your glean yaml/schema? In that case I'm ok with panic'ing -- and a nice side benefit is getting away without having to implement any actual Error stuff.
glean/src/server_events.rs
Outdated
let mut metrics = Metrics::new(); | ||
// Create the inner metric value map to insert into `Metrics`. | ||
metrics.insert( | ||
"string".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit nitty, but we generally call to_owned()
instead of to_string()
. Both are equally performant. to_owned()
reflects that we're converting a static string reference (that's what "string" is) to a own-able String object. Since insert(...)
essentially is consuming these values, it's semantically more correct.
syncserver/src/web/handlers.rs
Outdated
app_id: "syncstorage".to_string(), | ||
app_display_version: "1.0.0".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Take a look at the logging
module on what we use for its similar values. You probably want:
app_id: "syncstorage".to_string(), | |
app_display_version: "1.0.0".to_string(), | |
app_id: env!("CARGO_PKG_NAME").to_owned(), | |
app_display_version: env!("CARGO_PKG_VERSION").to_owned(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
app_id
is actually what the application is defined as in the probe-scraper
configuration: https://github.com/mozilla/probe-scraper/pull/834/files , but can use the app_display_version
as you show above.
syncserver/src/web/handlers.rs
Outdated
// app id will be supplied when added to probe-scraper | ||
app_id: "syncstorage".to_string(), | ||
app_display_version: "1.0.0".to_string(), | ||
app_channel: "prod".to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have a SENTRY_ENVIRONMENT
env var which could be used here but we should probably just add a new SYNC_ENV
var to the helm chart (like merino has).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, defining one that states it's the glean app_id is probably the way to go, eh? Can leave string for now while we wait till helm chart updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hardcoding the app_id
to syncstorage
wfm, I just mean something else for the app_channel
of "prod"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got the PR in place to add this, for now we will keep this in place and when the cloudops-infra changes land, we'll modify.
glean/src/server_events.rs
Outdated
// Required imports | ||
use chrono::Utc; | ||
use serde::{Deserialize, Serialize}; | ||
use serde_json; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally you don't have to specify crate roots like this. You only need to specify the various elements you're importing. Rust will automatically associate crates when you specify the root (like you do later for Value
.)
If you want, you could specify
use serde_json; | |
use serde_json::{Value, to_string}; |
and then drop the crate prefix from those uses later.
(I'm surprised that clippy didn't yell about this.)
glean/src/server_events.rs
Outdated
} | ||
|
||
// Exported type for public method parameters | ||
// Default impl empty values will be omitted in json from ping struct definition |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW: serde offers a flag for this: https://serde.rs/field-attrs.html#skip_serializing_if
} | ||
|
||
impl GleanEventsLogger { | ||
fn create_client_info(&self) -> ClientInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, because the app_display_version
and app_channel
are defined from the values set in GleanEventsLogger
.
Technically, a more rust way of doing that would be to create a
impl From<GleanEventsLogger> for ClientInfo {
fn from(gel: GleanEventsLogger>) -> ClientInfo {
ClientInfo {
app_display_version: gel.app_display_version,
app_channel: gel.app_channel,
..Default::default()
}
}
}
(this is where you'd have a impl Default for ClientInfo
that specifies the other values.)
You could then do
let telemetry_payload: PingPayload = PingPayload {
client_info: self.clone().into(),
(This also lends itself nicely to template building, because then you can just define the non-standard stuff you want to specify and trust that rust will backfill with the standard bits.)
But if this is the way that the Glean team is advising you create this, so be it.
glean/src/server_events.rs
Outdated
pub event: Option<Box<dyn EventsPingEvent>>, // valid event of `EventsPingEvent` for this ping. | ||
} | ||
|
||
// Record and submit `events` ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should make this a doc string, since we've got one for the record_events_ping()
function.
glean/src/server_events.rs
Outdated
pub syncstorage_platform: String, // Platform from which sync action was initiated. Firefox Desktop, Fenix, or Firefox iOS. | ||
} | ||
|
||
// Record and submit `sync-dau` ping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Again, probably should be a docstring
@@ -266,6 +273,14 @@ impl Server { | |||
&Metrics::from(&metrics), | |||
blocking_threadpool.clone(), | |||
)?; | |||
let glean_logger = Arc::new(GleanEventsLogger { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Future optimization would be to have glean_logger
be an Option<GleanEventsLogger>
and only instantiate it if settings.syncstorage.glean_enabled
. That saves us a byte in the ServerState
.
.and_then(|header| header.to_str().ok()) | ||
.unwrap_or("none"); | ||
let _device_info = get_device_info(user_agent); | ||
if state.glean_enabled { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if you used Option<GleanEventLogger>
you'd do something like:
if let Some(glean_logger) = state.glean_logger {
...
glean_logger.record_events_ping( ...
…send_in_pings to metrics.yaml
…er fields as not needed
…orce check in get_collections
fa2f0a8
to
cfc1e53
Compare
Made requested changes and really tightened up the documentation. As far as the From/Into impl stuff, I definitely want to implement it, but if ok with you we'll keep it the same as the generated Rust code for validating everything works and I'll open another short ticket for those refinements. Just makes it simple to validate our approach in case there's something standing out. Thanks! |
syncserver/src/web/handlers.rs
Outdated
let hashed_fxa_uid: String = meta.user_id.hashed_fxa_uid.clone(); | ||
let hashed_device_id: String = meta.user_id.hashed_device_id.clone(); | ||
let user_agent = request | ||
.headers() | ||
.get(header::USER_AGENT) | ||
.and_then(|header| header.to_str().ok()) | ||
.unwrap_or("none"); | ||
let device_info: DeviceInfo = get_device_info(user_agent); | ||
|
||
state.glean_logger.record_events_ping( | ||
&RequestInfo { | ||
user_agent: user_agent.to_owned(), | ||
ip_address: "".to_owned(), | ||
}, | ||
&EventsPing { | ||
syncstorage_device_family: device_info.device_family.to_string(), | ||
syncstorage_hashed_device_id: hashed_device_id.to_string(), | ||
syncstorage_hashed_fxa_uid: hashed_fxa_uid.to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you already have ownership of these Strings here so I don't think they need to_string'ing (or probably even the cloning as well?)
let hashed_fxa_uid: String = meta.user_id.hashed_fxa_uid.clone(); | |
let hashed_device_id: String = meta.user_id.hashed_device_id.clone(); | |
let user_agent = request | |
.headers() | |
.get(header::USER_AGENT) | |
.and_then(|header| header.to_str().ok()) | |
.unwrap_or("none"); | |
let device_info: DeviceInfo = get_device_info(user_agent); | |
state.glean_logger.record_events_ping( | |
&RequestInfo { | |
user_agent: user_agent.to_owned(), | |
ip_address: "".to_owned(), | |
}, | |
&EventsPing { | |
syncstorage_device_family: device_info.device_family.to_string(), | |
syncstorage_hashed_device_id: hashed_device_id.to_string(), | |
syncstorage_hashed_fxa_uid: hashed_fxa_uid.to_string(), | |
let user_agent = request | |
.headers() | |
.get(header::USER_AGENT) | |
.and_then(|header| header.to_str().ok()) | |
.unwrap_or("none"); | |
let device_info: DeviceInfo = get_device_info(user_agent); | |
state.glean_logger.record_events_ping( | |
&RequestInfo { | |
user_agent: user_agent.to_owned(), | |
ip_address: "".to_owned(), | |
}, | |
&EventsPing { | |
syncstorage_device_family: device_info.device_family.to_string(), | |
syncstorage_hashed_device_id: meta.user_id.hashed_device_id, | |
syncstorage_hashed_fxa_uid: meta.user_id.hashed_fxa_uid, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also I wonder if an empty string for the lack of user agent would be preferable to "none" (with us sending the glean system so many events shaving even a few bytes here and there might be worth it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good call on ""
, much better. The values from meta
will still need to be cloned since meta
is then passed to the db_pool. However, I did remove those two needless vars and just .clone()
right off the meta
struct. 👍
…ty user agent, remove meta variables
LMK @pjenvey if the env var impl looks good to you. Should be overwritten when the matching |
Description
Core Glean logic to process metrics and events.
Related to the PR presently in the glean_parser repo #772 for permanent rust server metric capabilities, so changes there may propagate to here
Testing
Issue(s)
Closes SYNC-4440.