UPDATE: the outlined logic will break if your TFS instance has a mix of pre-AzDevOps collections with recently (2019+) created ones. Details here. The linked gist was updated, too.
Enter reflection. First, I've created a separate database specifically for federation. For the dozen or so tables that I'm interested in, I've created views that UNION together all copies of the tables in all collections. So, for example, for tbl_Definition, which holds build definitions, I have a view in my custom database, which goes:
create view dbo.tbl_Definition
as
select {guid'00000000-1111-2222-3333-444444444444'} as CollID, *
from Tfs_FirstCollection.Build.tbl_Definition
union
select {guid'11111111-1111-2222-3333-444444444444'} as CollID, *
from Tfs_SecondCollection.Build.tbl_Definition
union
... --and so on, for all collections
But I didn't create them by hand. Too many tables, too many collections. Instead, I've written a procedure that creates a view like this, given a schema and a table name. It uses Tfs_Configuration.dbo.tbl_ServiceHost to retrieve a list of collections, creates a CREATE VIEW statement with a UNION cascade as a string variable, then executes said statement using EXECUTE().
I've made the procedure available as a gist.
One thing it doesn't do - in TFS, some tables belong to non-dbo schemata. My procedure creates all the federating views under dbo.
When you add or delete a collection, you need to rerun this procedure for all involved tables. It makes sense to maintain another procedure with a sequence of EXEC ImportTFSTable... calls for all tables you're interested in.
Note the use of FOR XML for string concatenation. A little bird told me that MSSQL 2017 has built-in, no-hassle string concatenation, but this procedure existed long before that.
I know TFS has its own warehouse database. It's rather hard to navigate though, doesn't aggregate all the data that I want (notably, the build/release stuff is missing), and it's not realtime.
When joining cross-collection views, pay attention to matching CollID where necessary. For example, DataspaceId is unique only within its home collection, not globally, so when joining tbl_ReleaseDefinition to tbl_Dataspace, you have to do it by two fields - CollID and DataspaceId.
Naturally, this is wildly unsupported. Performance degradation from running heavy queries against the live production data is to be expected.
No comments:
Post a Comment