This blog is a merger of several personal and technical blogs I’ve maintained in the past, so you’ll come across a pretty random mix of technical posts and personal ramblings about my travels and my time in Manchester, UK. Enjoy 🙂
We recently went through one of our most epic product launches at Collectors: Customers are now able to submit cards for grading to PSA, add them to their collection on the new collectors.com site, and choose to have their valuable cards stored in an actual physical vault. Oh, and all of this will be accessible via a single login, their “Collectors ID”, which replaces the multiple logins customers previously had to manage across our different products and business units like PSA and PCGS. While all these new features are integrated to provide a seamless experience to our customers, we’re dealing with a significantly more complex architecture of multiple new systems, databases, and APIs on the backend. Naturally, as with any new product launch, our product managers were keen to get analytics about the use of these features from day 1: How many users actually converted to the new “Collectors ID”? How many items have customers submitted to the Vault? Who is using the new collectors.com “My Collection” feature?
In order to provide these kinds of data insights right from the go-live, we had to coordinate with several engineering teams to get our hands on the right data and integrate it into our data warehouse. In total, we ended up pulling data from systems sitting on top of four different production databases that were being launched at the same time. Considering how many different systems and databases we were working across, the integrations went pretty smoothly! Within a day of the go-live, I had produced a few dashboards with key metrics that the product and business stakeholders started using immediately to track uptake of the new services. The road to getting there wasn’t straightforward though and involved some amount of scrambling, knocking on different doors, and a few small surprises during the go-live. In this post, I will share some of my lessons learned from integrating with a new production system when you’re looking to provide analytics from the get-go.
I’m big on keeping running docs with notes from my conversations and findings when working on a project – I always say I outsource my brain into a Google doc. Keep a doc with (datestamped) notes and “to do” items for every piece of information you find, open questions, as well as a list of who’s responsible for what on the product, e.g. product managers, engineering leads, project managers, etc.
In addition to the running notes, connect with the business stakeholders (product managers, analysts…) early on to document a set of desired metrics along with their priorities and timelines: What do we need to know from day 1? What can wait until some time after the launch? This will also be helpful when exploring the new data models to determine what is actually being captured and what data points may not be available to calculate the required metrics.
If there are standing meetings for the engineering team that’s responsible for the database setup, I strongly recommend regularly sitting in on those meetings. Even if you don’t always understand everything that’s going on, it’s helpful to have the context of what the team is focusing on, and establish a relationship with them. As data engineers, we’re often pretty removed from our counterparts on the data producer side, but knowing the people on the team (and having them know you) can be helpful in working together more effectively.
Once you know who your engineering point of contact is, the first question you’ll want to ask is: How do we get access to the data? Assuming we’re talking about data that lives in a relational database, here’s a short check list of information you need to get from the engineering team that’s responsible for the database setup:
Find out (and document) what cloud service the database is hosted in
Will you get access to a production database or a read-replica? And what permissions will you get, read-only, or will you be able to create temp tables or views if they’re needed by any of the tools in your pipeline?
Will there be dev and prod environments? What’s the timing for these being available?
How do users and services authenticate against the database? Do we need personal and/or service accounts to log in?
How will the logins be shared? Will you need access to a shared password storage?
Do you need an SSH tunnel setup to access the database from any of the tools in your data stack?
It’s best to try and get all these details ironed out as early as possible, since especially tasks like setting up SSH tunnels can take some time. Make sure you can access the database as early as possible to avoid surprises later on, even if there is no meaningful data in there yet.
Now that we’ve covered physical access to the data, let’s take a look at things to consider when you’re working with a new data model. I got looped into the production database design process early on and was able to provide input on the data modeling (see also: establishing a good connection with the upstream engineering teams! They’re your friends!). This ensured that the data would be suitable for our data extraction tool (Stitch) and contained all relevant data. Again, assuming you’re working with a relational database, here are some questions you’ll want to cover when talking about the data model:
Where is the data model documentation and how is it being kept up to date?
For any fields containing value sets, such as status codes, where are the corresponding descriptions stored? Will there be lookup tables in the database, or will these only be stored in code? The latter means you will need to be able to access the up-to-date list of lookups through your infrastructure, e.g. by querying an API (or simply reading the API documentation).
Will there be JSON columns? What is the schema for those?
What are the constraints on each table and column, e.g. foreign key relationships, NULL values, default values?
For datetime fields, will they be stored with timezone (they should)?
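To illustrate why the timezone question matters, here is a minimal sketch using only Python's standard library. The timestamps and the two-server scenario are made up for illustration and are not taken from our actual systems; `to_utc` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical scenario: the same wall-clock value written by two app servers,
# one configured for UTC and one for a UTC-8 offset.
naive = datetime.fromisoformat("2022-02-15 09:30:00")
utc_event = datetime.fromisoformat("2022-02-15 09:30:00+00:00")
offset_event = datetime.fromisoformat("2022-02-15 09:30:00-08:00")


def to_utc(ts: datetime) -> datetime:
    """Normalize a timezone-aware timestamp to UTC; refuse naive ones."""
    if ts.tzinfo is None:
        # Without an offset we cannot know which instant was meant.
        raise ValueError("naive timestamp: source timezone unknown")
    return ts.astimezone(timezone.utc)


# The two aware values look identical on the wall clock but are 8 hours apart.
gap = to_utc(offset_event) - to_utc(utc_event)
print(gap)  # 8:00:00
```

If the column is stored without a timezone, every consumer has to guess the offset, and that guess tends to be wrong for at least one upstream service.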
Application and data flow
Perhaps most importantly, when trying to make sense of data coming from a production database, we need to understand what the flow of the application is: What workflows (user-created or automated) in the application modify the data in what way? This is absolutely crucial to handling the data correctly and drawing the right conclusions from it. For example:
How and when is a record created, and what fields are populated through what input?
What workflows cause records to be modified in what way? And what metadata is there to track modifications, e.g. a “last updated” timestamp?
Will update timestamps for events such as status changes be tracked in separate fields? Or will there be some kind of changelog table that captures these kinds of changes? This also trickles down into your data warehouse models, where you might need to start tracking status change dates right from the get-go.
How are deletions being handled? Will there be “hard deletes”, i.e. the record is simply removed, or “soft deletes”, i.e. the record has an “is deleted” or “deleted timestamp” field? And, along the same lines, is there a data retention policy that means data will be dropped or archived after a certain amount of time?
If the application is replacing a legacy application, will data be migrated? How do you recognize migrated data? Will there be any gaps or differences between migrated and newly created data?
Will there be realistic dummy data (i.e. data that adheres to the constraints and workflows described above) to develop our data models and metrics against?
Is there any chance of any test or dummy data getting into the production system? And if yes, how can we recognize and filter for it?
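As a rough sketch of how the answers to the deletion and test data questions end up in your queries, here is a small example against an in-memory SQLite database. The `submission` table, its `deleted_at` column, and the `@internal.test` email convention are all assumptions made up for this illustration; your application will have its own conventions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE submission (
    id INTEGER PRIMARY KEY,
    customer_email TEXT NOT NULL,
    created_at TEXT NOT NULL,
    deleted_at TEXT              -- assumed soft delete marker, NULL = live
);
INSERT INTO submission VALUES
    (1, 'ana@example.com',  '2022-02-01T10:00:00+00:00', NULL),
    (2, 'qa@internal.test', '2022-02-01T11:00:00+00:00', NULL),
    (3, 'ben@example.com',  '2022-02-02T09:00:00+00:00', '2022-02-03T08:00:00+00:00'),
    (4, 'cy@example.com',   '2022-02-02T12:00:00+00:00', NULL);
""")

# Keep only live records that do not match the (assumed) test account pattern.
live_rows = conn.execute("""
    SELECT id
    FROM submission
    WHERE deleted_at IS NULL
      AND customer_email NOT LIKE '%@internal.test'
    ORDER BY id
""").fetchall()

live_ids = [row[0] for row in live_rows]
print(live_ids)  # [1, 4]
```

Note that this only works for soft deletes. A hard-deleted row simply disappears from the source, so you would need a changelog table or a comparison against a previous extract to notice it.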
Ideally, your engineering and database admin teams will already have a “best practice” guide for designing new databases, which usually answers a lot of these questions. Otherwise, this might be a good time to start collecting these kinds of design decisions into a guide and encoding them in setup scripts where possible.
I hope that this post has provided you with a starting point for a checklist for your next production data integration. All the questions I’ve covered in the above paragraphs should be treated as conversation prompts to elicit existing design decisions, or help influence decisions that are yet to be made. There will likely be some oversights (I have yet to work with *the* perfect production database), but coming prepared with a plan may help you catch some of the biggest issues to getting a good data integration early on. And even with the best preparation, you can probably expect to make some tweaks after the application go-live to adjust to some last-minute database changes or correct some assumptions you’ve made about the data. Developing against an empty data model or even dummy data can be challenging, and you might not nail everything at first try.
One last thing to keep in mind: As data consumers, our downstream use case will most likely be of lower priority than getting the production system stood up – and that’s totally okay. While I would love for data to always be a first-class citizen, I believe it’s pretty obvious that producing a stable production system needs to take priority, and we just need to accept that resource constrained engineering teams may move slower on supporting a data integration. This is why you’ll want to get started early and get these kinds of tasks and questions on the engineering team’s radar as soon as possible.
I wrote an epic blog post series about my experience building a data platform from scratch in my new job, using the “Modern Data Stack” (well, at least parts of it). The post is an account of my first six months at Collectors building a data platform. It is part memoir, part instructional manual for data teams embarking on a “build a data platform” journey. I figured this might be relevant for some of y’all data engineering folks and/or “data teams of one”, so check it out here: Building a data platform from scratch at Collectors: Part 1 (parts 2 and 3 are linked from the post).
This article provides an overview of (possible) steps to perform an exploratory data analysis (EDA) on a data set. These instructions are largely based on my own experience and may be incomplete or biased. I just figured this may be helpful since there’s not a lot of content out there.
The goal of EDA
The goal of exploratory data analysis is to get an idea of what a new data set we’re working with looks like. We’re mainly interested in aspects such as the size and “shape” of the data set, date ranges, update frequency, distribution of values in value sets, distribution of data over time, missing values and sparsely populated fields, the meaning of flags, as well as connections between multiple tables. This goes along with notes and documentation about the findings with the purpose of having a permanent reference for this particular dataset. Once EDA is complete, we should be able to add an integration for the data and create analyses more easily than if we started from scratch.
Tools for EDA
EDA can be done in different ways depending on which toolkit you’re most comfortable with. Generally, you can start out simply writing SQL in your SQL workbench (e.g. DataGrip) or start immediately with a notebook (e.g. Jupyter or Hex). At the time of writing this (February 2022), my data team mostly uses Hex notebooks for EDA, as they allow sharing and commenting on analyses fairly easily.
Some basic principles
In an ideal world, we wouldn’t have to do EDA, but be working with well documented data and an accurate and up-to-date entity relationship diagram. Ideally, we’d also have this data “profiled”, i.e. have some form of documentation with some basic statistics. This isn’t the case very often for production databases, which is why we do in-depth EDA. However, if we can find documentation or code, we should use it during EDA to inform our assumptions and insights.
When looking to integrate source data, make sure to query the actual source data table, rather than any modification (view, subset, extract) of it – if possible. This is to make sure we’re not looking at data that might already have some issues introduced through our code.
Keep a “running monolog” in the notebook or SQL script. Yes, the code is usually self-explanatory, but it’s good to document your thought process so that other people can follow along more easily. Examples:
“Here I’m just looking at the row count.”
“Let’s join set_item on the set table to see if the IDs match. I see that there are no missing joins, so this seems to be in sync.”
Keep in mind that you’re only looking at a snapshot of the data at this point in time, so all assumptions you’re making may not be true forever, unless they’re explicitly documented and asserted in code (see 1.).
If you’re working with a large data set that’s very slow to query, pick a “reasonable” subset, e.g. restricted to a specific time frame.
I usually just look at numbers, but occasionally having some lightweight data visualization can be helpful to see trends. The resources I’ve listed below have some more content on using data visualization for EDA.
There is no exact playbook for EDA since the steps depend on the type of data you’re looking at. Here are some high-level steps to follow:
Print some sample data of the table (first few rows, or use a sample function if available), just to look at what kind of data each column contains.
If using Pandas, transposing a dataframe (df.T) can be helpful for reading through wide tables.
Print the data types for each column. Keep in mind that the database datatype and logical datatype might be different, e.g. an integer field may be used to represent a boolean value with 0/1.
Identify the primary key and potential foreign key columns, and relevant timestamp fields (e.g. created date, last updated).
Get some basic numbers for relevant fields:
Table row count
Unique count for the “primary key” field: Does it match the row count, i.e. is it really a unique primary key? If not, is there a set of fields that can uniquely identify a record, e.g. ID field + timestamp?
Min/max for numeric and date fields: What ranges are we looking at?
Note if there are values that look like dummy values, e.g. “1900-01-01” for dates, or dates in the future.
Note if there are values that look like outliers based on the column name, e.g. a “2088” in a field named “customer_age”.
Group by and counts for value set columns, i.e. categorical variables such as boolean fields or values from a fixed set, e.g. “service category”
Pay attention to NULL values – how sparsely populated is this column? Can we expect NULL values at all? Or do we have another “dummy” value that represents NULL, e.g. “Unknown”?
For boolean fields, do we have only true/false values, or do we also have true/false/NULL and if yes, what does NULL represent?
It’s fairly critical to find out which fields are free-text and which ones have controlled input. The database datatype in both cases will be text, but the logical type is either controlled input (categorical) or free-text.
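The basic numbers above could be gathered with a handful of queries. Here is a minimal sketch against an in-memory SQLite database. The `submission` table and all of its columns and values are invented for illustration, including a duplicated ID, a dummy date, and an implausible age to show what these checks surface.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE submission (
    id INTEGER,                -- assumed primary key, deliberately not enforced
    created_at TEXT,
    customer_age INTEGER,
    service_category TEXT
);
INSERT INTO submission VALUES
    (1, '1900-01-01', 34,   'regular'),
    (2, '2022-01-15', 2088, 'express'),
    (3, '2022-02-01', 41,   NULL),
    (3, '2022-02-01', 41,   NULL),
    (4, '2022-02-10', 29,   'regular');
""")

# Row count vs. unique count of the supposed primary key.
row_count, distinct_ids = conn.execute(
    "SELECT COUNT(*), COUNT(DISTINCT id) FROM submission"
).fetchone()
print(row_count, distinct_ids)  # 5 4

# Min/max for date and numeric fields, to spot dummy values and outliers.
ranges = conn.execute("""
    SELECT MIN(created_at), MAX(created_at), MIN(customer_age), MAX(customer_age)
    FROM submission
""").fetchone()
print(ranges)  # ('1900-01-01', '2022-02-10', 29, 2088)

# Group by and count for a categorical column; NULL shows up as its own group.
category_counts = dict(conn.execute("""
    SELECT service_category, COUNT(*)
    FROM submission
    GROUP BY service_category
""").fetchall())
print(category_counts)
```

Here the row count and the unique ID count disagree, so `id` alone is not a reliable key, and the `1900-01-01` date and the age of 2088 are exactly the kind of values worth a note in your running doc.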
Look at the distribution of record counts over time: Identify the relevant date field (if exists) and count the number of records for a reasonable time period, e.g. by month or year.
This gives us an idea of the volume of data to expect over time.
It also helps see at what point the data starts to be “complete”, better than just looking at the earliest date since we might just have a handful of records for specific dates.
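A record count by month could look like the following sketch, again using an in-memory SQLite database with an invented `submission` table and made-up dates. SQLite's `strftime` is used here to truncate the date; other databases have their own functions for this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE submission (id INTEGER PRIMARY KEY, created_at TEXT)")
conn.executemany(
    "INSERT INTO submission (created_at) VALUES (?)",
    [
        ("2019-03-02",),  # a lone early record
        ("2021-11-20",),
        ("2021-11-25",),
        ("2021-12-01",),
        ("2021-12-14",),
        ("2021-12-30",),
    ],
)

# Count records per month based on the relevant date field.
monthly = conn.execute("""
    SELECT strftime('%Y-%m', created_at) AS month, COUNT(*) AS n
    FROM submission
    GROUP BY month
    ORDER BY month
""").fetchall()

for month, n in monthly:
    print(month, n)
```

In this toy data the earliest date is in 2019, but the data only starts to look “complete” in late 2021. Keep in mind that months without any records don't show up in the output at all, so gaps are easy to miss unless you join against a calendar table.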
This captures most of the basic stats listed under “Get some basic numbers” as well as more detailed histograms, correlations, etc.
It can get a little unwieldy for large tables, so it might make sense to only focus on a subset of relevant fields.
If working with multiple tables, try and draw out a simplified high-level ERD (entity relationship diagram) to get an idea of how the tables join together and whether we have referential integrity.
Run the joins and confirm whether join fields always match or whether there are some “empty joins”. E.g. ask questions such as “Does every service_level_id in the submission table have a corresponding record in the service_level lookup table?”
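The service_level_id question above can be answered with a LEFT JOIN that keeps only the rows without a match. Here is a minimal sketch with an in-memory SQLite database. The table names follow the example in the text, while the `id` and `name` columns and all values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE service_level (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE submission (id INTEGER PRIMARY KEY, service_level_id INTEGER);
INSERT INTO service_level VALUES (1, 'regular'), (2, 'express');
INSERT INTO submission VALUES (10, 1), (11, 2), (12, 3), (13, NULL);
""")

# Submissions that find no partner in the lookup table ("empty joins").
unmatched = conn.execute("""
    SELECT s.id, s.service_level_id
    FROM submission AS s
    LEFT JOIN service_level AS sl ON sl.id = s.service_level_id
    WHERE sl.id IS NULL
    ORDER BY s.id
""").fetchall()
print(unmatched)  # [(12, 3), (13, None)]

# Separate missing values from IDs that point at a non-existent lookup row.
dangling = [row for row in unmatched if row[1] is not None]
print(dangling)  # [(12, 3)]
```

The two cases usually mean different things: a NULL may be a legitimate “not set yet”, while an ID without a lookup record points to a referential integrity problem worth raising with the engineering team.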
I haven’t found too many posts that I found particularly helpful (maybe all the good content is tucked away in books?). Here are a few links to other sites that may be useful and complement this post:
This might be important context for some of y’all who come from a different background: I never exercised consistently in my life. I’ve never participated in any competitive sports. In part because that’s just how I grew up (see also my post on “Is it okay to just be okay?“), and in part because competitive sports aren’t as big a deal in German schools and colleges as they are in the US. I only got into running casually in college, and only started taking it more seriously (i.e. tracking my times) in early 2020 when I embarked on my “52 weeks, 52 albums” running project.
I also used to get absolutely debilitating exercise-induced migraines ever since I was a kid, in particular triggered by exercise in hot conditions, which made it impossible for me to run through summer even when I was in fairly decent shape. It took me until my late 20s to figure out that I had to supplement massively with electrolytes (big fan of nuun!) to make up for all the sweating, and ever since then I’ve been able to live a fairly normal life with the occasional advil thrown in.
By July 2020, I’d spent the first few months of the pandemic doing not much else and felt like I’d be physically able to do a run streak. Enter: #31DaysOfRunning.
Real simple: #31DaysOfRunning is a running “challenge” where you commit to running a certain mileage every day of the month, for 31 days. Specifically, the version I know involves committing to running a minimum of 3.1 miles (5K) every day in July, one of the hottest months of the year in the northern hemisphere. And yeah, the weather is kind of the point.
Honestly, I can’t quite recall how I came across this hashtag on Instagram, but someone in the NYC running community posted about #31DOR and I decided to commit! According to MapMyRun, it was actually started by NYC folks a few years back… which doesn’t surprise me, given how hard people go in this city. We truly have no chill 🙂 Resident Runners, the NYC running crew led by Ray who originally started the streak, keep a website up where you can sign up to “officially” participate in #31DOR and may be in the run for some Under Armour gear. According to their Instagram page it looks like it’s happening again this year!
Why is #31DaysOfRunning such a game changer?
Habits! Y’all love habits!
First of all, this is the most obvious one: You do something every day for 31 days in a row, it will likely turn into a habit. Meaning, it’ll be easier to do it and it’ll come naturally to you without having to think about it much.
For me, not having to think about whether I’d run but only figure out some of the logistics (when, where, how long, do I have anything left to wear!?) took a lot of the procrastination and “I’ll do it tomorrow” out of my running and made me… JUST DO IT. It’s almost a relief to not have to play the eternal mind game of “will I, won’t I?”.
The other thing it made me good at was the actual logistics. I was able to significantly cut down the time it took me to get ready for a run, then shower afterwards and get ready for work or other activities, which still helps me today when I want to squeeze in a run but don’t have a ton of time.
Resilience, mental and physical
If there’s one thing you’ll get from running every single day during one of the hottest months of the year (at least in the northern hemisphere), it’s mental strength. Being able to deal with The Suck, aka the feeling of discomfort that’s clearly different from actual pain and non-threatening but it still sucks and you want it to stop. Especially during the first 10 days or so, I could definitely feel the physical strain on my body, but I also knew that my mileage had been high enough and my running consistent enough that I’d be able to handle it. I was just sore and felt, well, The Suck. I just kept going and after a couple of weeks, I had a mental shift that enabled me to run an easy 5K at pretty much any time without really noticing.
And after a while, there’s also some physical resilience happening. I wasn’t sore anymore after my 3 milers. It just felt… normal. Keep in mind, I didn’t have a pace goal whatsoever and kept a lot of these runs super easy.
Daily wins. Like, DAILY. WINS.
Every single time you put on your sneakers, get out of the house, and do even the tiniest little baby run, you will feel like you’re absolutely CRUSHING your goals. And you get that 31 days in a row. Pretty awesome.
I still remember coming back home from a bike packing trip (50 miles through rolling hills with an old bike and panniers full of camping gear), dropping off my gear, and going for a run at the East River track. My brain went “wtf are you DOING“, but I felt like I’d finally accomplished… something.
What stuck with me?
After finishing the run streak, I took a day off running, believe it or not. And then went right back at it the day after, because I felt like it. Sadly, I’d been dealing with a light injury which had been caused by a toe injury the year before, so at some point in September 2020 I had to put my running on hold to get some rest.
It did take me a while to get back into a running habit in early 2021 (also in part because I was worried about the injury returning…), but I never lost the ease with which I’d put on my sneakers and run. Going for a run isn’t a big deal anymore and doesn’t take much thought, it’s just part of what I do several days a week, even when I’m not necessarily feeling it. I know I’ll get into it once I’m running, and if not, I know I have the mental strength to push through The Suck… or I can just run back home.
And in that way, #31DOR did in fact change my life. I’m very much not the same person I was before that. I’ve developed a habit, something that I just “do” without thinking about it. I know I can do hard shit, like, really hard shit, much more than I thought I could.
Pick a minimum distance/duration to commit to. Classic #31DOR would be 3.1 miles/5K, but I’d say if you’re not running on the regular, this might be physically overwhelming. Pick whatever feels like a challenge, but where you know you’re not going to overwork your body. Commit to walking if you’re not a regular runner, but I would say if you do feel like running 31 days is physically possible, go for running. Even the lightest jog requires a different mindset from walking. And keep in mind, it’s a minimum, you can always do more. I probably ended up doing more miles about 50% of the time.
Find a good regular time, or just be ready to run whenever. That might mean early morning, midday heat, or 11pm at night. Make sure you have the right equipment (see below) and have some routes planned out that are suitable for different times – shady or breezy during the day, safe, well lit, and with plenty of people at night. You might end up circling your block 10 times… that’s life.
Have the right equipment and have it ready. That means: Sunscreen, hat, sunglasses, water if you’re going for a longer run, watch or phone, headlamp and reflective stuff if you’re running at night, something to hold your stuff. ELECTROLYTES (y’all know I’m obsessed with nuun).
Track every run (or walk) as a separate workout using your favorite app (Strava, MapMyRun, Nike Run, Garmin, etc.). This is about being intentional. “I happened to walk a lot today” doesn’t count, the point is to get into the habit of putting on your shoes and getting out the door.
Workout clothes logistics: You’ll be going through an insane amount of clothes because SWEAT. I usually take my outfit (shorts and bra) into the shower with me and rinse it in soapy water, then hang it up to dry. That way I’ll get 1-2 more runs out of it before it needs a proper wash. Yes, your bathroom will likely be covered in drying/wet workout clothes. That’s part of the fun.
Sign up with Resident Runners if you want to commit to the daily 3 miles and share your runs via MapMyRun! Apparently there may be some Under Armour gear happening…
And finally, share the fun! I’ll be posting daily updates with the #31DaysOfRunning #31DOR hashtags on Twitter with the day and my mileage and maybe a fun lil snap. Feel free to share and tag me (@spbail) too and I will be HYPING YOU UP.
But also: This is a challenge for yourself. Make it your own thing, whatever feels good. My mileage isn’t as high as it was last year and I’m still coming back from an injury, so I’m only committing to a 1 mile minimum (but I’ll likely do more if I feel it’s safe). I’ll be posting daily updates on Twitter with the hashtag! Let’s goooo!
Have you ever been out to a restaurant or bar with someone you considered a friend, or maybe a partner, or a date, and it turned out they acted kinda shitty towards the wait staff? Maybe they were unnecessarily impatient, rude, dismissive, entitled, or talking down at people? Or maybe you just witnessed someone acting like that in a public setting and felt some amount of “Fremdscham” (the German word for feeling ashamed for something someone else is doing)? Yeah? That’s because acting like that is generally considered “bad behavior” and most folks are aware of the rules of common courtesy when interacting with other people, usually those in a position of delivering a form of service.
Cool, Sam, but why are you telling me that? Isn’t this like, a tech blog of sorts?
Well, I recently participated in a number of virtual tech events where I witnessed that very same rude, dismissive, impatient, disrespectful, and entitled behavior (yes, this post is a bit of a rant!) from participants towards the organizers and presenters, and it appears to be more of a systemic problem than just a few individuals being annoying.
Here’s an example from a free live training session I recently attended that was the catalyst for this blog post (note the timestamps for the correct order):
The presenter had clearly explained and demonstrated two free options for using the software at the beginning of the hands-on part, and the teaching assistants in the course had responded to every single one of the participant’s questions. And yet, he posted himself into a rage and acted like a complete ass. I can’t imagine that he’d act like that around his office – and if he did, I hope the company would tell him very clearly that’s not acceptable behavior.
(As an aside, another participant joined the live training 20 minutes before the end of the 2 hour session and demanded someone explain to them how to get started. The training was definitely interesting.)
Another example for interactions that are not necessarily disruptive but just look bad are folks asking for help in Slack channels. I just posted about this on Twitter a while ago:
I’m in quite a few tech Slack channels and I used to be a maintainer of an open source project, and the typical behavior I notice is:
New user joins the channel
Immediately posts a question asking for support, often dumping an entire error stack trace into the channel with no warning
Frequently cross-posts the same question in other channels
Occasionally posts several “anyone?” type follow-ups
(Rarely) posts some annoyed or frustrated comment when they don’t receive help
Maybe I should care less about these kinds of things, but man, seeing this is annoying. I’ve muted most Slack channels I’m in because of too many Fremdscham-inducing interactions. Especially in open source communities, this sort of Kool Aid Man behavior (kicking down the virtual door but going “HELP ME” instead of “OH YEAH”, you get the idea) makes you wonder where people left their manners.
Another version of this is the “mouse asking for milk” behavior, which often follows Kool Aid Man behavior once someone receives help. For those that don’t know, the popular children’s book tells the story of a mouse that receives a cookie, then proceeds to ask for milk (to go with the cookie), a straw (to drink the milk), and other favors. This often has the effect of pressuring the helper to dedicate more time and implicitly puts the responsibility of solving the issue on them instead of the original question asker: “If you don’t continue to help me, you’re letting me down and I can’t solve this problem”.
Look, I understand that we’re all trying to get to results as quickly as possible. Fixing bugs and production fires, figuring out a configuration after banging our heads against the wall for hours, trying to get something to work while following along with a live instructor, all these things are annoying and stressful and make us impatient and want HELP. NOW. But we always have to keep in mind that the people on the receiving end are also just… people. Who are usually trying their best to be helpful, but they might have their own stressors, deadlines, time schedules to stick with, and might not have the capacity to drop everything and help. And maybe you’re the one who’s causing the thing to not work (if you’re in tech you’re guaranteed to have had that experience) – might be time to take a step back and take a break.
I’d also like to clarify that I’m not talking about obviously “bad” or illegal behavior. While many meetup groups, conferences, and open source projects have a Code of Conduct, most of the behavior I refer to is not necessarily a violation of a Code of Conduct, but just generally unpleasant. But keep in mind, just because it doesn’t go against any of the rules doesn’t mean it’s not disruptive, disrespectful, or just plain annoying to the organizers, presenters, volunteers, and other participants. And it makes you, and potentially the company you represent, look kinda bad.
How to not be “that person”
So here’s a thought for folks attending any kind of (virtual) events or participating in Slack communities, message boards, Reddit, GitHub conversations, and other communication channels. I don’t know if anyone’s reading this who should be reading this, but here we go. Before posting anything, ask yourself the following questions:
Did I read the “welcome” message and instructions of where to post what?
Am I posting in the right channel?
Is my question clear and can people actually help me based on the information I’m providing?!
Did I use the search functionality to try and see if this question was already answered?
Am I asking an unpaid volunteer to do extra work? Have I already taken up a lot of their time?
Am I being respectful and mindful of people’s time and other responsibilities?
Would I post these kinds of things in my company chat, or say it out loud in a team meeting when my peers and managers are around?
Can I wait until it’s a good time to ask that question?
And even after posting a question, there are some things you can do to make everyone’s life easier:
Check whether someone actually answered the question, or asked for more details. Respond in a timely manner, or at least let them know that you will get back later.
Said differently, pay attention and understand that if someone responds to you, they dedicated time to helping you. Be respectful of their efforts.
If you don’t get the help you need, well, so be it. Unless you’re talking to the customer service of a service or product you pay for, you are not entitled to receiving any help, like, ever. And even if you’re paying for the service, keep in mind that customer service staff are humans you should treat with respect. Be persistent if you need to. But for goodness’ sake, please be nice.
If the problem is resolved, post that you solved it and ideally, share your solution! This will help people later on, and lets people know that you no longer need help.
Tell your coworkers to not be “that person”
And for the managers out there: I know you’re not responsible for how your reports act outside of the work environment, unless that employee is explicitly there to represent your company. But we all know that the workplace implicitly extends beyond the boundaries of your company’s office, Slack, or email, and that employees are often seen as representing the company in the “outside world”, whether that’s good or bad. If your reports or coworkers (or managers…) behave disrespectfully or somewhat disruptively (again, without necessarily violating any Code of Conduct) in an “extended work” setting, that’s just going to look bad and quite possibly make people question your company culture and what kind of people you hire. Well, it definitely makes me question what your company culture is like.
This isn’t an easy conversation to have, but I do believe that any company that onboards new employees likely shares (should be sharing?) some form of “rules” of communication, their company values, or other training that usually boils down to “don’t be rude”. It should be easy enough to include that this also applies to external venues such as (virtual) conferences, Slack channels, message boards, meetups, and other spaces in which the employee is present in a somewhat work-related context and may be seen as representing the company.
And for the presenters, maintainers, and volunteers out there…
Hey there, I see you. Well, I am you. I run workshops, teach coding classes, give conference talks, and help out in tech Slack channels. And I know that putting yourself out there and doing stuff out in public, whether that’s as a volunteer or part of your job, always comes with some amount of pressure and anxiety. Dealing with people who are rude or impatient is never pleasant. Here are some thoughts on how to help with this:
1. Set automated welcome messages in Slack and other communication channels explaining to folks where to post and how. Based on my experience, you can expect some proportion of people to actually read them, and some proportion of that to follow the rules. There will always be people who don’t pay attention, but you can make sure that the rules are actually enforced through gentle reminders: Ask your staff or volunteers to nudge people to post in the right channels, which (hopefully) also will be noticed by other members who will help with that. The dbt folks are pretty good at directing their Slack traffic to the right channels using welcome messages and periodical friendly reminders, see the screenshot below.
2. Add an “FAQ” page to your organization’s website. Reshama Shaikh, a data scientist who’s incredibly active in the NYC tech community and whom I’ve been lucky to collaborate with for years now, recently pointed me to the FAQ page of Data Umbrella, a volunteer-led community group she founded. The FAQ covers a range of questions such as “can you give me career advice” and “can you help me find a job” and kindly points out that the group is entirely run by volunteers who give up their free time and pay out of their own pocket for any kind of expenses (such as MeetUp fees).
3. Have a slide on “How we communicate” rules at the beginning of a talk or workshop. In addition to highlighting the Code of Conduct, you can remind people when and how to ask questions and to use the search function, and mention that the talk will be recorded and how the recording will be shared. If you have helpers or TAs, ask them to enforce those rules, e.g. by posting reminders to hold questions, that the talk will be recorded, or links to the material.
4. Make technology work for you. Honestly, this might be a little dramatic, but see item #1 – there’s going to be a certain number of people who don’t read the rules. One way to make technology work for you, in addition to automated welcome messages, is to lock down the “general” Slack channel to allow only staff announcements, which is a good way to avoid the “new user support question dumping ground” effect. Another option to consider for any kind of live event is to only allow participants to join until a few minutes into the event, which avoids people missing parts and then demanding help 45 minutes into a session.
5. It’s ok to not please everyone. I used to have the “will to please” like a freaking Golden Retriever. But you know what – it’s ok to say no, ignore people, or tell them to wait, for the sake of your own sanity. If someone comes into an event 30 minutes late and you’re a presenter or assistant already juggling several participants, well, maybe the person who came late simply won’t get lucky today and will have to figure things out themselves. Be kind, but firm, and let them know that you won’t be able to catch them up. Sorry. Likewise, if you’re helping someone out in a Slack channel and the mouse asks for more milk, it’s ok to let them know if you don’t have the capacity to help them any further… unless you are working in customer support of course and uh get paid to do exactly this. Otherwise, allow yourself to say no if this is turning from something you enjoy into a chore.
I focused a lot on the “don’t make your company look bad” argument in this post, but I think it’s also important to point out that general kindness and respect towards people who dedicate their time to maintaining software, running workshops, giving talks or presentations, should be a given. Whether that’s paid or unpaid, we all need to consistently make an effort to see the person on the other side and man, just give em a break. Chill. Be nice. Accept the fact that sometimes you can’t have it your way. It’s ok. The world won’t end.
So… it’s been almost exactly 8 months since everything started shutting down in NYC due to covid-19 (and, well, it’s been a clusterfuck of a year but let’s not talk about that right now ok). Like so many people, I spent my time in lockdown learning new things – I did a lot of yoga, mastered a handstand, did macrame, way too much tie-dye (anyone need tie-dye baby onesies? HMU.) and well, I taught myself how to use Garage Band and recorded a few songs.
My quarantine inspired album (ok, it’s only 3 songs so far) “iso trap” (from isolation and the music style trap but also a word play on being trapped at home, get it?) combines two of my passions: music, and comedy. In fact, it combines two wonderful things into what I consider to be one of the lowest possible art forms: musical comedy. Don’t even try and argue. Musical comedy is great because it’s just straight up bad. No one, not a single person, ever said “oh man that was an awesome song” when listening to, dunno, Flight of the Conchords. It’s just about tolerable as far as music goes, and the humor is more witty than actually funny, but somehow it’s strangely appealing nonetheless. Musical comedy is the Taco Bell of art. It’s bad and we all know that, but when it’s good, it slaps.
Anyway, here are several songs I wrote, arranged, and recorded. Two of these were performed at my friend Soheil’s virtual variety show “The Cat’s Throne Zoom”, the third one in my kitchen with no audience. Enjoy.
“Everything is canceled”: A song about everything being canceled. Inspired by the Lego Movie. Includes a dubstep break and bad white girl rapping.
“You’re not essential”: A love song / PSA about picking the right person for your quarantine bubble, or maybe waiting until Phase 4 reopening to engage. Inspired by a song we all know and love.
“Second lockdown”: Yup, we messed up. A song about starting from square one. A straight up cover of “Closer” by the Chainsmokers using the instrumental. I’m planning to do my own arrangement of this too and maybe get a second person to be the Halsey to my chainsmoker.
While I usually publish less personal posts on here, I’ve been thinking about this topic a lot since I started to learn snowboarding and skateboarding a couple of years ago – in my early 30s, which has been not just a physically painful experience, but also stirred up a lot of emotions about my own sense of worth.
I’m not an expert in anything. I’ve never learned to play an instrument. I enjoy exercise and sports, but I’m far from actually being good at anything. I’m fluent in my second language (English), but will probably never be at the level of a native speaker, and I’ve been struggling to reach any kind of fluency in Spanish for over a year. I did well (not amazing) in school and in my PhD, I’m decent at my job, and (luckily) I’ve always had the chance to work with plenty of people who are significantly smarter than I am. In short, I’m just kind of… okay at things. Good enough where necessary to not be a burden to others (like my job, organizing events, public speaking and teaching), and somewhere between pretty bad and decent at everything else.
And I’ve resented myself and my parents for that pretty much my entire life. While some of my friends joined sports teams or started taking music classes at an early age, learning a skill that would last them their whole life, I sat at home, watching TV, and eating top ramen. Others appeared to have an interest and internal drive to learn and excel at their chosen hobbies at an early age, as well as the determination and ambition to stick with it through years of hard training, competition, failures, and successes. Meanwhile, I became addicted to online chat rooms at age 13 and didn’t leave my computer for years, racking up enormous phone bills until we finally switched to a flat rate.
A lot of this is my own fault – I had the chance to speak up and ask my parents to enroll me in a class or club at any point during my childhood. But I almost never did, and the one attempt I made at taking piano lessons at age 12 (copying a friend of mine from school) failed after several months because I preferred to take afternoon naps rather than go to class. Some of this might be due to my parents leaving me to figure things out by myself at an early age, emphasis on “by myself”, and the lack of support from any adult who might have had a more long-term view of the benefits of extra-curricular activities. Besides, US schools put a lot more emphasis on extra-curricular activities than German schools (or from what I know, any European country for that matter), with scholarships being an important factor in gaining entry to college education, so any activity outside of class is usually driven by parents who did the research required to find the right clubs and lessons for their kids.
Whatever the reason, I have been struggling with my mediocrity my entire life, but ironically also never attempted to actually rectify this and put in the work required to become an expert at something – at any point, I figured it was already too late anyway. Now, at 34, I’m a decent software engineer (thanks to the internet and wonderful smart and patient coworkers) who can play a handful of songs on the harmonica, ride up and down small ramps on a skateboard, survive blue runs on a snowboard, do a few yoga poses, bake a nice cake, get a couple of laughs when doing standup, and communicate in mostly broken Spanish (old people usually find that charming though, so I guess that’s a win). I’m not great at any of these things – I’m the textbook definition of “okay”. On the other hand, my partner is a software engineer and multi-talented musician who plays and performs in several bands, has been skateboarding for decades, and knows how to ride and maintain a motorcycle, just for good measure. And I’d be lying if I said it doesn’t pain me to face my own ineptitude every time I see him perform on stage, land a seemingly impossible trick at the skate park (first try!) or bomb black runs on a snowboard.
So. Where do I go from here? Should I stop doing the things I enjoy because the chances of me becoming an expert are fairly low, given that I’ve never shown any sort of stamina and determination when it comes to learning anything outside of what’s required for my job (I have a strong sense of duty that seems to kick me into overdrive mode when needed)? Should I work hard to finally change my personality that’s been shaped by 34 years of mediocrity, self-loathing, and abandoned plans to become a better person? Throw money at the problem and pay someone to coach me? And once I’ve achieved expertise in something, will I be able to lead a happier and more fulfilled life? Why do I even believe that it’s so important for me to become an expert at something? Do I want to get better because it’s the only way I’m able to enjoy something or because it enables me to do even more enjoyable things, or do I want to achieve expertise because expertise is highly valued in society, and I crave the attention and praise that might follow from it? Why do we value expertise so highly for interests like arts and sports that have little actual impact on other people’s lives? (I’m not talking about professions that require expertise to ensure safety and wellbeing of others – being a mediocre pilot or doctor is not exactly an option.) Why the hell am I beating myself up over this so much when I don’t even know what I’m doing this for, other than my own enjoyment?
Interestingly, while writing this post I did a quick Google search on “is it okay to be mediocre”. Most results told me “NO” – it’s a sign of laziness, you should strive for greatness, no one is average – everyone is special! and so on, and so forth. And then I found a post from Mark Manson, author of “The Subtle Art of Not Giving a F*ck” which I actually read about a year ago. In his post “In defense of being average”, he writes:
It’s my belief that this flood of extreme information has conditioned us to believe that “exceptional” is the new normal. And since all of us are rarely exceptional, we all feel pretty damn insecure and desperate to feel “exceptional” all the time.
A lot of this might be obvious to others, but things only just clicked for me here:
Praise and recognition are a wonderful result of doing something, but most of what I do voluntarily is already enjoyable, even if I will never receive any compliments or applause for it. Otherwise I wouldn’t be doing it. Or maybe shouldn’t be doing it.
I also realized that one of the reasons why society values expertise so highly is because it makes your skills valuable to others – they can benefit from it, whether it’s you working for them, or providing entertainment of sorts. Just about average performance isn’t worth anything to others. Especially in tech, we’re constantly looking for rockstars and ninjas and superheroes – people who are exceptional at their job, devaluing everyone who is good but not amazing.
Mastery may open up new opportunities – maybe a black run really is disproportionately more fun to ride a snowboard on than a blue run – but for me, even easy runs are still fun and challenging. Even if I don’t practice and train at the level to become an expert, if I keep doing as much as feels right, I’ll most likely make a little bit of progress eventually and get to try out something new. Or I might not. Maybe I’ll keep doing the same blue runs over and over without ever getting any better, and eventually give up on snowboarding because it’s getting boring. And that’s… also okay?
A lot of people are afraid to accept mediocrity because they believe that if they accept being mediocre, then they’ll never achieve anything, never improve, and that their life doesn’t matter.
(Mark Manson again)
While I still wish I had invested more time and effort into mastering something earlier in my life, I should probably just accept that I can’t go back, and at the same time, that mastery might not even be the right goal for me. I’m not an expert in anything. If you say “you still have time to become one”, trust me, I won’t be, I’m not the kind of person who will. But I’m neither a pilot, nor a doctor, and I’m both confident enough in my job skills and aware enough of my deficiencies to consider myself competent, employable, and not a risk to others. Whether I’m just mediocre at the things that I enjoy doing shouldn’t matter to anyone, and it shouldn’t matter to myself. Maybe it’s okay to just be okay.
In case you missed it, I lived in Manchester for 5 years and somehow developed a proper Mancunian accent. Somehow I ended up on Nathan Rae’s podcast “Northology” in 2013, talking about Manchester Girl Geeks, a not-for-profit community group I co-founded a few years prior (they’re still going strong, 10 years later!). If you want to listen to 30 minutes of me being proper Northern, the recording is still online.
When I joined Flatiron Health in February 2014, I had no idea what to expect. I had just moved to New York City – my second ever trip to the US – with two suitcases, crashed on my friend’s couch, and walked into the office in the middle of a snowstorm (I got in late on my first day because I was left stranded by the MTA – pro move!). I was on a 1-year visa and didn’t even know whether it was going to get extended after the year was up, or whether the 20-person startup I had just joined after finishing my PhD in England was even going to last that long.
Almost 5 1/2 years later I’m now looking back on many late nights at the office, countless meals with my work family, a few drinks (just a few, really!), late night karaoke, rafting and ski trips, pipeline breaks and product launches, both great and absolutely horrifying client calls, several rounds of funding, an acquisition (us buying a company twice our size), another acquisition (this time us getting acquired), almost a thousand new employees, many farewells, wonderful relationships, challenging relationships, my first intern, my first direct report, my first time as a team lead, and my first goodbye to a company that I still talk about as “we” even though I officially left almost a month ago. As I like to tell people who ask me about my time at Flatiron: It’s been a wild ride.
So… what’s next? Honestly, I don’t know. I want to continue doing “data stuff”, but as a non-traditional (as far as the word “traditional” applies to a fairly new field) data scientist who puts data empathy and interpretability before building ML models, it’s going to be an interesting challenge to find the right fit for me. For now, I’m still based in NYC, enjoying the summer, plotting some travel, and reflecting on the things I’ve learned over the past few years.