networking

2811 readers

1 users here now

Community for discussing enterprise networks and the ensuing chaos that comes after inheriting or building one.

founded 1 year ago

MODERATORS

[email protected]

Requesting a sanity check...help untangle my home network as I expand into more advanced networking? (lemmy.world)

submitted 8 months ago by [email protected] to c/[email protected]

12 comments fedilink hide all child comments

Hey there, I've been on a networking journey that has, over a few years, taken me from simple unmanaged networking, to managed networking, to advanced VLAN management. It's all been self taught, but mostly successful. However, I've gotten myself into a bit of a pickle and I'm hitting a wall in troubleshooting. Apologies for the length of the post, however I want to provide as much detail as possible.

High level, I have several /16 vlans for things. VLAN 99 is networking, 2, is servers, 4 is clients, 6 is wireguard clients, and there are some others. They're all 10.99.0.0/16 with a gateway at 10.99.1.254, etc.

I have had a very old Netgear Layer3 switch for some time. I've replaced it with a Brocade ICX6610, mostly so I can move my storage infrastructure to 10G fiber (I have a small hypervisor cluster). I had done a ton of preparatory work to configure the new L3 switch so that it could just be dropped in place of the old one; this was MOSTLY successful...

...However, in doing that I broke the connection to my opnsense firewall and sort of had to redo that piece from scratch. During my planning, I didn't realize some of the config changes I'd made would require changes on the firewall, and after the cut over I was locked out of the firewall. This is all my fault; that's the piece of this I understand the least, and I had followed dodgy guides when getting it to initially work. I have a backup in xml format, but even having that I'm realizing what I had been doing didn't make sense. Previously, I had a firewall interface on all of my vlans and the trunk going to it was carrying all the VLANS. Now, I set this up with only 2 vlans going to the firewall, the networking vlan and the wireguard vlan, as it seems to make more sense with my understanding of how Layer 3 routing works. All routing should happen on the Brocade L3 switch. The firewall itself has 4 physical ports, 1 going to my comcast gateway, and 2 in an LACP lagg going to my L3 switch. (I have a single interface right now going to the L3 switch separately for troubleshooting, removing the LACP lag as a complexity source).

So, in recovering this, I had to get into the firewall at the console and re-define the interfaces and IP's. I got this to work, but at this point I had tons of connection problems which I didn't understand fully. I have found some of opnsense's configuration to be a bit obfuscating, which I think is making my learning more difficult. The following were put in place:

The "LAN" interface was given a static 10.99.1.40/16 IP, and an upstream gateway was defined at 10.99.1.254.
The "WAN" interface was given DHCP, and is up and works

Once I recovered the connection to the web interface I had to make the following changes:

Under the "Firewall" sidebar, under "Aliases", I defined each of my VLANS/Subnets with a CIDR notation and a name.
Under the "Firewall" sidebar, under "NAT" and then under "Outbound" I switched the mode to "hybrid" and added a rule for each of my vlans on the "LAN" interface, with the "Source" being the aliases defined above, and the target (NAT Address) being the "WAN address"
Under the "Firewall" sidebar, under "NAT" and then under "Port Forward" I added some port forward rules.
While it's outside the scope of my immediate troubleshooting, I had a working WireGuard setup. I have an interface defined for it on that VLAN, and a second gateway defined at 10.6.1.254. It's all set up according to the opnsense documentation, and I can connect from the WAN and can access any resources on the LAN.

So onto the problem...I can access the internet from almost all of my LAN clients. I can access LAN clients via the port forward rules from the WAN. The firewall itself CANNOT access the WAN; for example, I can't check for updates. I can access the firewall web interface from anywhere on the LAN, I can ssh to the firewall from anywhere on the LAN, but once I'm ssh'd in, I can't ping back to the client I'm connecting from. The firewall CAN ping things like 8.8.8.8, but as my DNS resolver is on the LAN, DNS queries from the firewall fail. I believe in a related note, my WireGuard clients can access anything on the LAN, but cannot connect to anything on the WAN.

I believe this has to do with outbound routes from the firewall, but any time I mess with it I end up locking myself out and having to reset interfaces from the console. I tried defining some static routes in "System" -> "Routes" -> "Configuration" but that isn't working. I'm kind of stumped and have been looking at it so long that I don't think more reading and configuring is going to help me anymore. I'll post some screenshots of rules and routes as well (you'll be able to see various things enabled/disabled for experimentation), but I'm kind of in over my head and need some help.

top 12 comments

sorted by: hot top controversial new old

[–] [email protected] 0 points 8 months ago (1 children)

First off, if your firewall can ping 8.8.8.8 it can access the WAN, as 8.8.8.8 (hopefully, or you have bigger issues) is on the WAN. It not being able to do updates etc is probably a DNS issue in that case, probably caused by your firewall not being able to access your DNS server due to improper configuration on either the firewall, the switches or the DNS server itself.

Is your DNS server allowing clients coming from subnets other than its own? Can your Wireguard clients also ping 8.8.8.8? If so, they probably share the DNS issue with your firewall.

I would recommend trying to debug this iteratively, as this sort of problem has a lot of potential error sources that is hard to know of no matter how many screenshots you provide, like the configuration of your switches and DNS server. Try this:

Computer A cant reach computer B. What is the IP of A? What is the subnet of A? If it is different from the subnet of B, what route should it take to reach B? What is the next step on that route? Can we successfully reach this next step? Does the next hop on the route know where to go to reach the subnet for B? If so, what is the next step? Repeat until we've reached B, ideally ensuring each step on the way is acting as it should either trough something like wireguard or the built in tools of your firewall/switch/gateway/etc.
Assuming the problem hasnt been found, repeat from B to A, as responses might not reach us resulting in a broken connection even if we can reach B.
If the routing makes sense, is there a firewall on the way that doesnt allow us to reach B from A? Can we instead reach A from B? If not, we've found the problem.

I would strongly reccomend drawing your network layout (or at least the route you are trying to debug) in a flowchart tool (diagrams.net being a good option), as it is extremely hard to keep track of everything otherwise.

[–] [email protected] 0 points 8 months ago (1 children)

Ok, it's definitely an issue with the firewall not sending traffic from itself to LAN. It's weird, it's passing traffic, but it cannot ping or access anything on the LAN including things on the 99 VLAN (so it's local VLAN). The DNS requests are for sure failing from the firewall...but they work fine for the rest of the LAN. Any client can get a DNS response from the DNS server on the 2 VLAN, and can access the resulting site.

For now, I'm just excluding the wireguard thing, I think it's a distraction to the problem that the firewall has some sort of bad routing going on.

I have a diagram, but at this point it's pretty local to the firewall itself and I think it's around the gateway/route configuration. I got some advice on the opnsense forum that my static routes are wrong; they say to make a single static route of 10.0.0.0/8 instead of one for each VLAN, turn off "upstream gateway" on the LAN GW (which when I do that I lose all WAN connectivity...which is a concern but I can revert). When I do the cli configuration, and I assign an IP for LAN, it asks if I want to put a gateway; it kind of says "it should be yes for wan, but no for LAN" but if I do no, I can't access the internet from any clients, and if I do yes, it ticks "upstream gateway" on the lan gateway. Something is awry, but I'm gonna try again after making some static route changes.