Defining and (probably) debunking “The Datamesh”

I work for a Data Science Software company where we make tools to help people do data science (DS). Actually, we make one tool that covers the range of “data wrangling”, normalizing, testing, modeling, hosting and monitoring. There’s a lot more to DS than the cool stuff DS’s love to do. Date normalization is an easy example.

What’s a Datamesh? According to http://Starburst.io the answer is: “Data mesh is a new approach based on a modern, distributed architecture for analytical data management. It enables end users to easily access and query data where it lives without first transporting it to a data lake or data warehouse. The decentralized strategy of data mesh distributes data ownership to domain-specific teams that manage, own, and serve the data as a product.”

Let’s discuss: New approach? Not really, it’s just the correct way to manage data. Saying new approach always sounds cooler.

Modern, distributed architecture for analytical data management. So… distributed OLAP? Distributed how? Distributed why? We don’t like warehouses anymore?

It enables end users to easily access and query data where it lives without first transporting it to a data lake or data warehouse. “Easily” is always a fun word. Sorta like “user friendly” from 80’s software. Right, play the data where it lies… AKA remote query. So instead of moving data, you connect to the source and its local processing engine directly. That last bit is important: not all data sources are SQL engines. Sometimes you have to deal with clusters. You also have to pay for processing. Are you OK with the BI Team beating on your SQL Server?

The decentralized strategy of data mesh distributes data ownership to domain-specific teams that manage, own, and serve the data as a product. Awesome. What tools are you providing to enable these “domain-specific teams?” (We used to call them Domain Experts or SME’s.) I’m all about SME’s being able to curate and publish data products. I’m all about making it easily searchable. I’m also all about making sure it’s well explained, quality controlled and secure when needed.

So, is Datamesh a product? Nope. It’s a concept. It’s data management done the way it should be. Sorta like Cloud is basic IT done the way it should be. Individual corporations struggle to implement either solution effectively because it’s “overhead.” Especially if you have to build the solution. Elastic Search, Airflow, some Wiki, some platform for hosting all of this, user management, security, etc. Plus monitoring and hopefully some Data freaking Science!

Get this all in one product: Dataiku DSS. It is a centralized data platform that allows you to connect to multiple data stores, process data where it lies, create DS models, deploy them and monitor them in one stop. It can also generate model documentation to be reviewed so we don’t have another Apple Credit card incident. Plus, it allows SME’s to contribute to the process via “Visual Recipes.” I hear all of the DS’s now crying about visual tools. You can still code the cool bits, in your own IDE or a notebook. This just lets SMEs do easy stuff, like normalize date/time fields, expand log file attributes, or a couple of hundred other things to prep your data.

The easy example they use in their 101 Tutorial is normalizing T-shirts. M-T-Blk should be Men-T-Black to match all of the other records in the table. Done visually, so the SME doesn’t have to worry about learning Python, Rust, Go, R, Julia, etc.
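For the folks who would rather see it in code, here is roughly what that T-shirt cleanup amounts to. This is only a sketch, not what DSS does under the hood: the `GENDER` and `COLOR` lookup tables and the `normalize_label` helper are made up for illustration, and it assumes every label has the shape Gender-Style-Color.

```python
# Hypothetical abbreviation tables; a real catalog would supply its own.
GENDER = {"M": "Men", "W": "Women"}
COLOR = {"Blk": "Black", "Wht": "White"}


def normalize_label(label: str) -> str:
    """Expand abbreviated parts of a 'Gender-Style-Color' label."""
    parts = label.split("-")
    if len(parts) != 3:
        return label  # leave anything unexpected untouched
    gender, style, color = parts
    return "-".join([GENDER.get(gender, gender), style, COLOR.get(color, color)])


print(normalize_label("M-T-Blk"))      # Men-T-Black
print(normalize_label("Men-T-Black"))  # already normalized, unchanged
```

The point of the visual recipe is that the SME never has to write or maintain those lookup tables by hand.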

DSS also has “Project” level wikis. Document your heart out with common wiki tools. It’ll be attached to your project and you can even link to specific parts of your flow for further clarification. And when your curated data is ready, you can share it easily. Other users can find shared datasets using the Data Catalog and/or Feature Store that is also included.

I agree with the concepts of Datamesh. Just know it’s a concept, not a product and it will take a lot of work to implement. Should you? Shmaybe… Do you have Data Products that warrant the investment for building a mesh?

I imagine a large company with many unique datasets and a centralized DS Team. They allow their SMEs to “Curate” data sets for exposure to the corporate hub. Then the DS Team can search the Data Catalog to find appropriate data sets, read about how they were curated and use them in unique projects as needed.

According to the above definition, Dataiku DSS can be a Datamesh if you use it correctly. Just my $0.02.


Grease Monkey ~~ GM
Posted in Opinion

Amazon Website Feedback does not exist.

I’m an Amazon whore. My account was created in 1997, when they were a book store. I got a pre-GA Alexa invitation. I still order most of my consumables from Amazon. They are, IMHO, in decline as a whole.

Alexa used to be smart, fun and sometimes funny. Now, she’s irritating and stupid. I went from 7 Alexa devices to 1. One that I don’t use. At all. And I don’t miss it. I’m sure there are plenty of examples on the webs (sic) to support this assertion.

Amazon.com is, surprisingly, shockingly, not interested in what you think about the site. (I might be persuaded to believe they rely on “other metrics” to determine customer satisfaction, but…) There is no way to leave feedback on your searching/shopping experience. Their search engine sucks ass. Search for 2GB SSD M2 and see all of the results for 512MB, 1GB, etc. Seriously? Did you not pay your ElasticSearch bill?

Worse, and this is the thing that pushed me over the edge this time, is Subscribe and Save. I have toilet paper on S&S. It’s a long gap, because I’m a single dude. I noticed that I need another shipment (I refrained from that pun) but there is no process to say, “Hey, instead of August, I need it next week!”

I also don’t have a way to opt out of that stupid ass QVC/Shopping Network style video that auto plays when I click on today’s deals. I fucking hate that shit with mad passion.

I admit I’m part of the problem, because I still buy from them. A lot; probably too much.

Elon Musk please go kick Bezos in the balls and tell him to fix this shit. He’s an embarrassment to the “Billionaire Boys Club.” IMHO.


Grease Monkey ~~ GM
Posted in Uncategorized

You’re fired if you’re wrong.

Being a sysadmin type, I get irked by things like, “why don’t we have version 9.x, it came out last week and solves our problems,” or “why can’t I just have a VM where I can install anything I want?”

There are many, very many, examples of new (most current) libraries that are broken, subject to spyware or worse. Do your own research, I’m not your mommy.

So, rather than explaining, yet again, that life on the bleeding edge of IT involves significant risk, I say let’s try this approach. Make developers responsible for version upgrades. With the caveat that if they bring in something that breaks or exposes the project to hacking, they’re fired with prejudice!

I’m not talking about latent bugs in SSH, etc. I mean they read the release notes and they should have seen the problem. “Major change to SECAUTH” “We added SSO” “Now OKTA compatible” That shit needs to be vetted before I’d put it in my systems.

Also, please NEVER build your project from https://github.com/myproject/latest.
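If you want a cheap guard against “latest”, check that every dependency names an exact version before the build runs. A minimal sketch, assuming a Python project with a pip-style requirements file; the `unpinned` helper and the sample text are mine, and the same idea applies to any package manager with a lock or pin syntax.

```python
import re

# A requirement counts as pinned only if it names one exact version with "==".
PINNED = re.compile(r"^[A-Za-z0-9._\[\]-]+==[A-Za-z0-9.!+_-]+$")


def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version."""
    flagged = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if line and not PINNED.match(line):
            flagged.append(line)
    return flagged


sample = """
requests==2.31.0
flask>=2.0      # floating lower bound
numpy
"""
print(unpinned(sample))  # ['flask>=2.0', 'numpy']
```

Fail the build when the list is not empty and the “it changed under us” conversation mostly goes away.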


Grease Monkey ~~ GM
Posted in Administration, Experience, Opinion, Rant

FedEx is F’d up

So I ordered some flip flops. When I got the email that they had been delivered, I went outside to look around and they weren’t anywhere. I tracked the package via FedEx. Hilarity ensued. And I never got my package.

Monday, April 18, 2022
12:05 PM | Wilkes-Barre, PA | Delivered
6:05 AM | PITTSTON, PA | At local FedEx facility
3:34 AM | NORTHAMPTON, PA | Departed FedEx location
2:55 AM | NORTHAMPTON, PA | Arrived at FedEx location
Sunday, April 17, 2022
10:18 PM | STRASBURG, VA | In transit
10:10 AM | SALISBURY, NC | In transit
Saturday, April 16, 2022
10:09 PM | ORLANDO, FL | Delay: Package delayed
9:57 PM | ORLANDO, FL | Departed FedEx location
5:16 AM | ORLANDO, FL | Arrived at FedEx location
Friday, April 15, 2022
3:25 PM | DAVENPORT, FL | Shipment exception: Barcode label unreadable and replaced
10:24 AM | DAVENPORT, FL | In transit
7:06 AM | DAVENPORT, FL | Arrived at FedEx location
7:04 AM | DAVENPORT, FL | At local FedEx facility
3:38 AM | ORLANDO, FL | Departed FedEx location
12:36 AM | ORLANDO, FL | Arrived at FedEx location
Wednesday, April 13, 2022
10:26 PM | NORTHAMPTON, PA | Departed FedEx location
3:01 PM | NORTHAMPTON, PA | Arrived at FedEx location
7:21 AM | FAIRVIEW TWP, PA | In transit
Tuesday, April 12, 2022
5:00 PM | SANTEE, SC | In transit
4:52 AM | DAVENPORT, FL | Departed FedEx location
Monday, April 11, 2022
1:41 PM | DAVENPORT, FL | Arrived at FedEx location
3:44 AM | DAVENPORT, FL | Arrived at FedEx location
3:42 AM | DAVENPORT, FL | At local FedEx facility
Sunday, April 10, 2022
2:58 PM | DAVENPORT, FL | In transit
Saturday, April 9, 2022
11:00 PM | ORLANDO, FL | Departed FedEx location
5:23 AM | ORLANDO, FL | Arrived at FedEx location
2:10 AM | DAVENPORT, FL | Departed FedEx location
Friday, April 8, 2022
1:54 PM | DAVENPORT, FL | Arrived at FedEx location
5:23 AM | DAVENPORT, FL | At local FedEx facility
1:16 AM | ORLANDO, FL | Departed FedEx location
Thursday, April 7, 2022
2:14 PM | ORLANDO, FL | Arrived at FedEx location
11:51 AM | BELLE ISLE, FL | In transit
Wednesday, April 6, 2022
10:22 PM | WADE, NC | In transit
10:21 AM | NORTHAMPTON, PA | Departed FedEx location
3:12 AM | NORTHAMPTON, PA | Arrived at FedEx location
Tuesday, April 5, 2022
11:13 PM | NORTHAMPTON, PA | In transit
5:53 AM | FAYETTEVILLE, NC | In transit
Monday, April 4, 2022
4:56 PM | BELLE ISLE, FL | In transit
Sunday, April 3, 2022
9:12 PM | DAVENPORT, FL | Departed FedEx location
Saturday, April 2, 2022
6:23 PM | DAVENPORT, FL | Delay: Package delayed
4:23 PM | DAVENPORT, FL | Arrived at FedEx location
5:44 AM | DAVENPORT, FL | Arrived at FedEx location
5:41 AM | DAVENPORT, FL | At local FedEx facility
Friday, April 1, 2022
8:56 AM | DAVENPORT, FL | Delay: Package delayed
8:45 AM | DAVENPORT, FL | At local FedEx facility
4:56 AM | ORLANDO, FL | Departed FedEx location
Thursday, March 31, 2022
11:33 PM | ORLANDO, FL | Arrived at FedEx location
Wednesday, March 30, 2022
10:27 PM | NORTHAMPTON, PA | Delay: Package delayed
10:17 PM | NORTHAMPTON, PA | Departed FedEx location
3:38 PM | NORTHAMPTON, PA | Arrived at FedEx location
Monday, March 28, 2022
10:19 PM | DAVENPORT, FL | Delay: Package delayed
10:08 PM | DAVENPORT, FL | Departed FedEx location
4:47 AM | DAVENPORT, FL | Arrived at FedEx location
Saturday, March 26, 2022
12:18 AM | DAVENPORT, FL | Arrived at FedEx location
Friday, March 25, 2022
8:43 PM | Address corrected
8:43 PM | DAVENPORT, FL | Returning package to shipper: Unable to deliver shipment – Returning to shipper
5:15 AM | DAVENPORT, FL | Delay: Package delayed
5:03 AM | DAVENPORT, FL | At local FedEx facility
Thursday, March 24, 2022
3:42 PM | DAVENPORT, FL | Delivery exception: Refused by recipient – Order canceled
10:36 AM | Clermont, FL | Delivered: Signature Service not requested.
10:36 AM | DAVENPORT, FL | Delivery exception: Retrieved shipment
7:44 AM | DAVENPORT, FL | On FedEx vehicle for delivery
7:40 AM | DAVENPORT, FL | Arrived at FedEx location
7:38 AM | DAVENPORT, FL | At local FedEx facility
4:13 AM | ORLANDO, FL | Departed FedEx location
12:17 AM | ORLANDO, FL | Arrived at FedEx location
Tuesday, March 22, 2022
11:05 PM | PITTSTON, PA | Left FedEx origin facility
3:43 PM | PITTSTON, PA | Shipment arriving On-Time
3:32 PM | PITTSTON, PA | Arrived at FedEx location
12:21 PM | Shipment information sent to FedEx
12:00 AM | PITTSTON, PA | Picked up

Grease Monkey ~~ GM
Posted in Uncategorized

buzzfeed.com is evil

This shouldn’t be news to anyone, but it needs to be said. Buzzfeed has no, zero, probably negative value to a healthy society. So why does it exist?

I could go on at length from a biblical perspective, but let’s focus on the local perspective. (See how when I say “but” nothing before the but matters?)

Buzzfeed is class cannibalism. If you’re rooting for the rich peoples (Projecting your life onto theirs) or rooting for the “take down” pieces (fuck those elite fucks), either way you win. You can view everything they write from your own perspective and feel like a winner.

The only way you win this game is to totally fucking ignore anything they “publish”. There are probably a few other sites that should be ignored, but I don’t know your newsfeed. BTW, Twitter is NOT news.

IMHO.


Grease Monkey ~~ GM
Posted in Uncategorized

Time Limits for spare or “fixable” hardware

A few months ago I was thinking about buying an SSD to put in a FreeNAS system. Mind you, I already have 2 free standing NAS devices using traditional spinning disks. I was considering this because I can download a 2.4G file in about 1 minute to my OS disk, an M.2 SSD. It takes nearly 5 mins to download it directly to my NAS. Then I realized I could buy a 2TB M.2 SSD for $200 and use it instead. I still have my NAS for backup.

The point is, I was considering using old tech to solve a problem because I had it laying around when a “better” new tech solution was available cheaply.

I see IT organizations doing similar things. 5 year old technology finally broke and was replaced. The old tech still had “usable” or “fixable” parts and was put in a corner. A few years later it’s still there, covered in dust, and taking up space. It’s worthless and now takes effort to recycle it. Worse, you have several other things in similar condition that have been impeding your work by being in the way.

Those old laptops, tablets and PC’s aren’t getting any younger. Given the rate of change in technology, if it’s over 3 years old, it’s probably obsolete and better replaced than reused. This is not a hard rule, but you should always consider the reasonable lifetime when keeping replaced tech. Puppy Linux exists to give new life to old hardware. Worth it? Not for most.

I’m currently helping an organization that doesn’t have full time IT. Well intended people who want to save money stack “fixable” or “old but running” laptops all over the place. So much so that it’s now difficult to find anything. Your 10+ year old desktop is not worth re-purposing.


Grease Monkey ~~ GM
Posted in Uncategorized

That time I lost all of my code

Just prior to my time at my second start up, I was a Network Admin (Network 386) for a small plastics company. I’d leave that job and work with Gary and Phil for a few hours most evenings. I was also “working” at a PC Store selling and repairing PC’s. This is where I had access to the BigMouth and SatisFAXtion boards. They were in a demo computer at the shop.

I fully developed the phone to fax solution on this system and we coordinated demos to potential investors on the same PC.

One day I got a call that said we had a demo on Monday morning and please make sure things are working as expected. I went to the store and discovered that the system had been sold. With my code on it.

I called Phil and explained the situation. I had a PC we could use so he bought the BigMouth and Fax card and asked me to meet him in the office Sat. morning. For the next 14 hours I rewrote my entire code base, including re-recording all of the voice prompts, re-creating the templates and reloading all of the data. Phil kept the coffee and pizza coming and by the end of the night we had a new working demo.

It was interesting to re-learn some of the things I had to learn while developing it the first time around. One that I still remember is that the BigMouth library returned 65535 for a certain error code, but the var type Pascal used thought it was -1.
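That 65535 versus -1 thing is the same 16 bits read two different ways. Here is a quick sketch of it; I’m assuming the Pascal variable was a 16-bit signed integer and the library meant an unsigned 16-bit word, and the `to_unsigned16` helper is just mine for illustration.

```python
import struct

raw = struct.pack("<H", 65535)           # the library's value as an unsigned 16-bit word
(as_signed,) = struct.unpack("<h", raw)  # the same two bytes read as a signed 16-bit integer
print(raw.hex(), as_signed)              # ffff -1


def to_unsigned16(value: int) -> int:
    """Map a signed 16-bit reading back to its unsigned equivalent."""
    return value & 0xFFFF


print(to_unsigned16(as_signed))          # 65535
```

Either compare against -1 or mask the value before comparing against 65535. Both work, as long as you pick one and stick with it.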


Grease Monkey ~~ GM
Posted in Uncategorized

Green Field Linux

There were only a few things that had been determined. We were using SuSE Linux, we would (eventually) have Active Directory for Authentication/Authorization and were running on HP/Compaq servers. We started with physical, book style systems and used SAN for large storage requirements.

I learned how to use YaST in place of kickstart and started building the first project. We were running IBM Websphere and our first test was setting up a new system that would otherwise have been run on AIX. I didn’t realize until later this was more of a POC than a genuine effort. I strongly suspect the VP of my area was expecting Linux to bomb so he could keep AIX.

Fortunately, the project was a huge success and highly cost effective. The next few projects were large enough we needed to hire (rent) some additional administrators. And since I was the only full time Linux employee, I got to be in charge.

Given that we had no legacy BS to hold us back, we started doing things the right way. No servers were hand built. There was a minimal base YaST script and we had templates to add required libraries for things like Websphere (WAS). Working with the networking teams, I was able to get a “next server” added to certain networks so we could have PXE kick off the build. PXE has this cool feature where it will look for a filename based on its MAC address and work its way to less specific files if it can’t find it. Finally it will look for a file called “default” where we tell it to boot to local HD. On a new server this triggers a reboot and the process continues until we’re ready to build.
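To make that fallback concrete, here is a sketch of the lookup order, assuming the PXELINUX bootloader (I’m describing its documented behavior, not our exact setup). The `pxelinux_config_candidates` helper is made up for illustration, and it skips the client UUID lookup that PXELINUX tries first when the firmware supplies one.

```python
import ipaddress


def pxelinux_config_candidates(mac: str, ip: str) -> list[str]:
    """List config file names in the order PXELINUX would try them."""
    # MAC-based name: ARP hardware type 01 (Ethernet), lowercase hex, dash separated.
    mac_name = "01-" + mac.lower().replace(":", "-")
    # IP-based names: uppercase hex of the IPv4 address, dropping one digit per try.
    ip_hex = f"{int(ipaddress.IPv4Address(ip)):08X}"
    ip_names = [ip_hex[:n] for n in range(8, 0, -1)]
    return [mac_name, *ip_names, "default"]


for name in pxelinux_config_candidates("88:99:AA:BB:CC:DD", "192.168.2.91"):
    print(name)
```

Drop a build config under the MAC-based name and only that one server builds. Everything else falls through to “default” and boots from local disk.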

After we’d set up some common NAS mount points for tools and home dirs, etc., we were able to “push the button” and build a server in about 7 minutes. Gathering all of the data before pushing the button took a while longer. Someone had to assign an IP Address. We had to determine a hostname (using a naming convention of meaningful letters and numbers). We had to get SAN allocated, etc. etc. We usually built using the shared NAS drives to ensure consistency. Someone challenged our numbers so I ran a special build going directly to a local FTP server and it built in just over 3 minutes.

Once we had all of the details specified, we used the YaST templates to build a custom YaST file and put it in the appropriate PXE “next server” directory. We started off with some bash scripts to do this and stored all of the data in a MySQL Database. Brainstorming with some of the hired help, we built out a pretty cool Web UI system called SPAN. Server Provisioning and Notes (or Notebook).
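The template step was nothing fancy. Here is a rough sketch of the idea, not our actual scripts: the field names, the `.profile` file suffix, the host values and the key=value layout are all hypothetical stand-ins, and a real AutoYaST profile is XML and far larger.

```python
from pathlib import Path
from string import Template
import tempfile

# Hypothetical, heavily trimmed stand-in for a build template.
PROFILE_TEMPLATE = Template(
    "hostname=$hostname\n"
    "ip_address=$ip_address\n"
    "role=$role\n"
)


def write_profile(out_dir: Path, record: dict[str, str]) -> Path:
    """Render one host's profile and store it under a per-host file name."""
    text = PROFILE_TEMPLATE.substitute(record)  # raises KeyError if a field is missing
    path = out_dir / f"{record['hostname']}.profile"
    path.write_text(text)
    return path


with tempfile.TemporaryDirectory() as tmp:
    host = {"hostname": "lnxapp001", "ip_address": "10.0.0.15", "role": "was"}
    print(write_profile(Path(tmp), host).read_text())
```

Using `substitute` rather than `safe_substitute` is deliberate here: a missing field should stop the build instead of producing a half-filled profile.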

SPAN became the tool for all of our gathering stuff. We specified templates, hostnames, update schedules, business owners and basically anything we thought we’d need to deal with a server. It was pretty awesome. We had change management and consistency tools, we had monitoring, version control, templates and all of the good IT things a system should have. We even ran an LDAP Server for the first several months until the AD integration could be worked out.

In short, it was a top tier platform. Our very first Audit was rated “Generally Acceptable” by the auditors, which if you’ve ever been audited, you know is an A+. One of the issues was something like “Former employees should be removed from Groups they were members of.” 1. Their accounts were disabled, so the groups don’t matter. 2. We weren’t in charge of managing those groups. There were 2 or 3 other items, but they were all a bit nitpicky.

The system continued to cruise along until the VP was ready to strike. I was moved to a new team that was supposed to help with automation. The manager was on a PIP and we only had $500k budget for a tool that was supposed to cost $750k or so. We spent a few weeks installing and learning the tool and were able to demonstrate self serve host provisioning on both Linux and Windows. Eventually they spent our budget on something else and I … Moved on to Hadoop, which is where this whole blog started.


Grease Monkey ~~ GM
Posted in Experience

Data Center and Linux

I got my first Linux CD ROM in 1995, Walnut Creek from 1994. My previous experience with DOS and CP/M left me longing for true multitasking. I had a few web hosts over the years but nothing super serious. I basically wanted to control my own mail and web servers. The mail server lost luster when I got my Gmail invite. But I had experience.

That experience got honed working for an online check system company. I was running a backup data center. We were installing VMWare hosts and then RHEL VM’s as needed. We had a lot of custom web servers running that we built using kickstart. I learned a lot about manipulating kickstart scripts and dealing with hardware. After about 18 months I ran across an opportunity to build a Linux platform “green field” style. The large, Fortune 20, enterprise was running AIX and Windows. The closest thing they had to a mainframe was an IBM AS/400 system or two.

So… I took the new gig.


Grease Monkey ~~ GM
Posted in Experience

The Big Idea(s)

During the dot com boom, I was working in the software distribution world. At the time the world was worried about running out of IP addresses. I was irritated by this as IPv6 [1] had been around for quite some time and I didn’t understand why more people weren’t using it. All of the then current operating systems supported it, so….

I put a plan together to build a second internet using IPv6. My idea was to build a “virtual backbone” using some existing networking and firing up some dark fiber. I wanted to partner with local cable companies to run the “last mile” and deal with managing customers. We were going to use a federated Identity Management system so you could tell who was who. The idea was fleshed out to bring in other players to incentivize adoption and expansion.

Taking advantage of the features in IPv6 we could use QOS to stream TV and Audio, plus keep all traffic secure using its built-in encryption, IPSec. And relieve the IPv4 IP issues.
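For a sense of how much relief we’re talking about, here is the address arithmetic using Python’s standard `ipaddress` module. It is just the math, not part of the original plan, and `2001:db8::/64` is the reserved documentation prefix, not a real allocation.

```python
import ipaddress

ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses  # 2**32
ipv6_total = ipaddress.ip_network("::/0").num_addresses       # 2**128
print(ipv4_total)                         # 4294967296
print(ipv6_total // ipv4_total == 2**96)  # True

# A single /64, the usual size of one IPv6 subnet, already dwarfs all of IPv4.
subnet = ipaddress.ip_network("2001:db8::/64")
print(subnet.num_addresses // ipv4_total)  # 4294967296
```

One ordinary IPv6 subnet holds as many addresses as about 4.3 billion copies of the whole IPv4 internet.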

It was good enough to warrant a “shark tank” like trip to Charlotte, NC. An investor there had several VC connections for whom he sourced ideas. His usual process was a 2 1/2 hour lunch with 7 – 10 VC’s where ideas were given 15 minutes to make their pitch. For me, he invited his VC’s and we had a 3 hour dinner where I was the only pitch. By the end, I was offered $5M to do a test in a local market.

Within 1 week of that meeting, the dot com bubble popped and no one wanted anything to do with the idea.

I also pitched the idea that we could set up our own Master DNS server to offer more domains than were currently available. At the time the .com domain selling business was 100’s of millions of dollars per year. We were going to do all the fun ones. .mail, .xxx, .mall, .person, .family, etc. etc. Setting up your own DNS is pretty trivial and Google and Cloudflare are late to the party IMHO.

[1] https://en.wikipedia.org/wiki/IPv6


Grease Monkey ~~ GM
Posted in Experience